Google Drive to Google Sheets, clean OCR tables
Copying tables out of PDFs is the kind of work that looks “quick” until you’ve done it 40 times. One missed row, one shifted column, and your spreadsheet is suddenly lying to you.
Operations managers dealing with invoices and timesheets feel this pain daily. A bookkeeper cleaning up bank statements does too. Even a marketing lead tracking ad spend receipts runs into the same mess. This Drive Sheets OCR automation turns documents in Google Drive into clean, usable rows in Google Sheets.
Below you’ll see how the workflow works, what it replaces, and what you need to get reliable tables into Sheets without the retyping.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Google Drive to Google Sheets, clean OCR tables
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n1@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n9@{ icon: "mdi:swap-vertical", form: "rounded", label: "Document URL", pos: "b", h: 48 }
n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Image URL", pos: "b", h: 48 }
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral DOC OCR1"]
n12["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral IMAGE OCR1"]
n10 --> n12
n9 --> n11
n1 --> n9
n1 --> n10
end
subgraph sg1["Flow 2"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Upload"]
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Signed URL"]
n3@{ icon: "mdi:cog", form: "rounded", label: "Import PDF", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral DOC OCR"]
n3 --> n0
n0 --> n2
n2 --> n7
end
subgraph sg2["Flow 3"]
direction LR
n4@{ icon: "mdi:cog", form: "rounded", label: "Import Image", pos: "b", h: 48 }
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Upload1"]
n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Signed URL1"]
n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral IMAGE OCR"]
n4 --> n5
n5 --> n6
n6 --> n8
end
subgraph sg3["Flow 4"]
direction LR
n13@{ icon: "mdi:swap-vertical", form: "rounded", label: "Document URL1", pos: "b", h: 48 }
n14["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Document Understanding"]
n13 --> n14
end
subgraph sg4["Flow 5"]
direction LR
n15@{ icon: "mdi:swap-vertical", form: "rounded", label: "Image URL1", pos: "b", h: 48 }
n16["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Document Mis-Understanding?"]
n15 --> n16
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n1 trigger
class n11,n12,n0,n2,n7,n5,n6,n8,n14,n16 api
classDef customIcon fill:none,stroke:none
class n11,n12,n0,n2,n7,n5,n6,n8,n14,n16 customIcon
The Problem: PDF Tables Don’t Belong in Spreadsheets
A lot of “business data” still arrives as PDFs or screenshots. Bank statements, supplier invoices, expense receipts, timesheets, compliance docs. The moment you need that data in Google Sheets, the real work starts: download the file, squint at the table, copy cells that don’t paste cleanly, then spend another pass fixing columns that drifted. It’s not only slow. It’s mentally exhausting, and it’s the kind of task that quietly creates errors you discover much later when reports don’t reconcile.
The friction compounds. Here’s where it breaks down in real life:
- Copy-paste often merges columns, so you’re forced into manual cleanup before anyone can even use the data.
- One skipped line item can throw off a full month of totals, which means rechecking the source document again.
- Teams end up saving “final_v7.xlsx” versions because no one trusts the first pass.
- When volume increases (month-end, tax season, campaign reporting), the process collapses under its own weight.
The Solution: Google Drive → Mistral OCR → Clean Tables in Sheets
This workflow pulls a PDF or an image from Google Drive, sends it to Mistral OCR, then turns the OCR output into structured, copy-friendly table data you can drop into Google Sheets. It supports two practical paths: a “public URL” method (fast to test) and a more secure approach where the document is uploaded to Mistral’s cloud storage and retrieved via a signed link. That signed link matters when you’re processing sensitive documents like bank statements, because it keeps files from being publicly accessible. The workflow also includes a document understanding option, which lets you ask questions about the document (for example, how many deposits appear) instead of extracting everything.
The workflow starts when you run it in n8n and provide a document or image URL, or it fetches the source file from Google Drive. It then uploads the file (or uses a public link), calls the Mistral OCR API via HTTP Request, and returns OCR content that’s already split by page and converted to markdown. From there, you can map the table content into rows and columns in Google Sheets, and optionally send summaries or exceptions through Gmail.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
|
|
Example: What This Looks Like
Say you process 20 bank statement pages each week and you need line items in Google Sheets. Manually, it’s often about 10 minutes per page between copying, fixing columns, and double-checking totals, so you lose roughly 3 hours weekly. With this workflow, you fetch the PDF from Google Drive, let Mistral OCR handle the page splitting and conversion, then spend about 20 minutes reviewing and pasting clean tables into Sheets. That’s a couple hours back every week, plus fewer “why doesn’t this reconcile?” moments.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Drive to store and fetch PDFs/images.
- Google Sheets to store extracted tables as rows.
- Mistral API key (get it from your Mistral Cloud dashboard)
Skill level: Intermediate. You’ll connect accounts, add an API key, and do light field mapping for the table output.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A run is triggered. In the base workflow, it starts with a manual run in n8n, which is perfect for testing. In production, many teams adapt this to start when a new file lands in a specific Google Drive folder.
The document is retrieved and prepared. The workflow can fetch a PDF or an image from Google Drive, then read the file content so it’s ready to send to an OCR service. If you already have a public URL, it can use that too.
Mistral OCR does the heavy lifting. The workflow calls Mistral OCR via HTTP Request. You can either send the public URL to OCR directly or upload the file to Mistral Cloud first and generate a signed link (more secure, usually the better move for financial docs).
The output is shaped for spreadsheets. Mistral returns OCR content with page splitting and markdown conversion, which makes tables far easier to normalize into consistent columns for Google Sheets. If you want “answers” instead of raw extraction, the workflow also shows how to run a document query step.
You can easily modify the Google Drive source (folder, naming rules) to match how your team stores documents. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
Start the workflow with a manual trigger so you can test OCR flows on demand.
- Add the Manual Run Trigger node as the workflow trigger.
- Connect Manual Run Trigger to both Assign Document URL and Assign Image URL; this creates a parallel start.
- Confirm the parallel flow: Manual Run Trigger outputs to both Assign Document URL and Assign Image URL in parallel.
Tip: Use the manual trigger during setup to validate each OCR path before you automate the trigger later.
Step 2: Connect Google Drive Inputs
Configure Google Drive downloads for file-based OCR processing.
- Open Fetch PDF from Drive and set Operation to
download. - Set File ID to the target PDF file, replacing
[YOUR_ID]with an actual ID. - Credential Required: Connect your
googleDriveOAuth2Apicredentials in Fetch PDF from Drive. - Open Fetch Image from Drive and set Operation to
download. - Set File ID to the target image file, replacing
[YOUR_ID]with an actual ID. - Credential Required: Connect your
googleDriveOAuth2Apicredentials in Fetch Image from Drive.
⚠️ Common Pitfall: Using a Drive file ID that doesn’t match the file type (PDF vs PNG) will cause upload errors downstream.
Step 3: Configure Public URL OCR Paths
Set direct URL OCR for public documents and images without uploading to Mistral.
- In Assign Document URL, set url to
=https://example.com/[YOUR_ID].pdfand replace[YOUR_ID]. - In Assign Image URL, set url to
=https://example.com/[YOUR_ID].pngand replace[YOUR_ID]. - Open OCR Public Document and confirm URL is
https://api.mistral.ai/v1/ocrwith JSON Body set to={ "model": "mistral-ocr-latest", "document": { "type": "document_url", "document_url": "{{ $json.url }}" } }. - Open OCR Public Image and confirm URL is
https://api.mistral.ai/v1/ocrwith JSON Body set to={ "model": "mistral-ocr-latest", "document": { "type": "image_url", "image_url": "{{ $json.url }}" } }. - Credential Required: Connect your
mistralCloudApicredentials in OCR Public Document and OCR Public Image.
Step 4: Configure File Upload OCR Paths
Upload files to Mistral, generate signed URLs, and run OCR via API for Drive-sourced files.
- In Upload File to Mistral, set URL to
https://api.mistral.ai/v1/files, Method toPOST, and Content Type tomultipart-form-data. - In Upload File to Mistral body parameters, set purpose to
ocrand set file toformBinaryDatawith inputDataFieldNamedata. - In Generate Signed Link, set URL to
=https://api.mistral.ai/v1/files/{{ $json.id }}/url, expiry query parameter to24, and Accept header toapplication/json. - Open OCR Document via API and confirm JSON Body is
={ "model": "mistral-ocr-latest", "document": { "type": "document_url", "document_url": "{{ $json.url }}" }, "include_image_base64": true }. - Repeat the same setup for image uploads using Upload Image to Mistral → Create Image Signed Link → OCR Image via API with the same URL patterns and parameters.
- Credential Required: Connect your
mistralCloudApicredentials in all Mistral HTTP nodes (10+ nodes including Upload File to Mistral, Generate Signed Link, Upload Image to Mistral, Create Image Signed Link, OCR Document via API, and OCR Image via API).
Tip: Ensure the binary property name is data from the Drive download nodes, or Mistral will reject the file upload.
Step 5: Configure Query-Based Analysis for Documents and Images
Send a query along with a document or image URL to Mistral chat completion for contextual answers.
- In Set Doc Query Inputs, set url to
=https://example.com/[YOUR_ID].pdfand query towhat is the total number of deposits?. - In Analyze Document Query, set URL to
https://api.mistral.ai/v1/chat/completionsand keep the JSON Body as={ "model": "mistral-small-latest", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "{{ $json.query }}" }, { "type": "document_url", "document_url": "{{ $json.url }}" } ] } ], "document_image_limit": 8, "document_page_limit": 64 }. - In Set Image Query Inputs, set url to
=https://example.com/[YOUR_ID].pngand query towhat is the total number of deposits?. - In Analyze Image Query, set URL to
https://api.mistral.ai/v1/chat/completionsand keep the JSON Body as={ "model": "pixtral-large-latest", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "{{ $json.query }}" }, { "type": "image_url", "image_url": "{{ $json.url }}" } ] } ], "document_image_limit": 8, "document_page_limit": 64 }. - Credential Required: Connect your
mistralCloudApicredentials in Analyze Document Query and Analyze Image Query.
⚠️ Common Pitfall: The query nodes use public URLs—ensure the URLs are accessible without authentication.
Step 6: Test and Activate Your Workflow
Validate each branch and confirm OCR outputs before activating the workflow.
- Click Execute Workflow to run from Manual Run Trigger.
- Verify that Assign Document URL and Assign Image URL both run in parallel and trigger their respective OCR nodes.
- Check outputs in OCR Public Document, OCR Public Image, OCR Document via API, and OCR Image via API for extracted text.
- Confirm Analyze Document Query and Analyze Image Query return a natural language answer to your query.
- When results look correct, toggle the workflow to Active to use it in production.
Common Gotchas
- Google Drive credentials can expire or need specific permissions. If things break, check the n8n credential connection and Drive file access sharing first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
About 30 minutes if your Drive, Sheets, and Mistral accounts are ready.
No. You’ll mostly connect accounts and paste in your Mistral API key. The only “technical” part is mapping OCR output into the Sheet columns you want.
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Mistral OCR usage, which depends on how many pages you process.
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Yes, and it’s a common tweak. You can keep the Google Drive fetch steps the same, then adjust the “document query inputs” (the Set nodes) to ask for invoice-specific fields like invoice number, due date, tax, and line items. If you want higher reliability, create separate paths for invoices vs. receipts and route them with an If node based on filename or Drive folder.
Usually it’s an expired Google OAuth connection in n8n or the file permissions changed after someone moved folders. Reconnect the Google Drive credential in n8n, then confirm the exact file still exists and that the connected Google account can open it in Drive. If it fails only for certain files, check shared drives and domain restrictions, because those can block API access even when you can “see” the file in the UI.
A lot, within reason. On n8n Cloud, the practical limit is your monthly execution quota and how you batch documents, while self-hosting mostly depends on your server. Mistral OCR throughput depends on file size and page count, so most teams start with a few dozen documents a day and scale up once they’re happy with the table quality.
Often, yes. This workflow is doing a few things that get awkward in simpler tools: handling binary files, choosing between public URLs vs. signed links, and optionally running a document Q&A step. n8n also makes branching and merging easier when you split PDFs and images into separate paths. Zapier or Make can still work if your use case is “new file → send to OCR → paste result” and you don’t care about secure upload methods. If you’re unsure, Talk to an automation expert and you’ll get a straight recommendation.
This is the kind of workflow you set up once and rely on for months. The tables land in Google Sheets cleanly, and you get your time (and focus) back.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.