Google Drive to Google Sheets, clean OCR tables

Copying tables out of PDFs is the kind of work that looks “quick” until you’ve done it 40 times. One missed row, one shifted column, and your spreadsheet is suddenly lying to you.

Operations managers dealing with invoices and timesheets feel this pain daily. A bookkeeper cleaning up bank statements does too. Even a marketing lead tracking ad spend receipts runs into the same mess. This Drive Sheets OCR automation turns documents in Google Drive into clean, usable rows in Google Sheets.

Below you’ll see how the workflow works, what it replaces, and what you need to get reliable tables into Sheets without the retyping.

How This Automation Works

The full n8n workflow, from trigger to final output:

n8n Workflow Template: Google Drive to Google Sheets, clean OCR tables

Click to explore

flowchart LR

    subgraph sg0["When clicking ‘Test workflow’ Flow"]
        direction LR
        n1@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
        n9@{ icon: "mdi:swap-vertical", form: "rounded", label: "Document URL", pos: "b", h: 48 }
        n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Image URL", pos: "b", h: 48 }
        n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral DOC OCR1"]
        n12["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral IMAGE OCR1"]
        n10 --> n12
        n9 --> n11
        n1 --> n9
        n1 --> n10
    end

    subgraph sg1["Flow 2"]
        direction LR
        n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Upload"]
        n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Signed URL"]
        n3@{ icon: "mdi:cog", form: "rounded", label: "Import PDF", pos: "b", h: 48 }
        n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral DOC OCR"]
        n3 --> n0
        n0 --> n2
        n2 --> n7
    end

    subgraph sg2["Flow 3"]
        direction LR
        n4@{ icon: "mdi:cog", form: "rounded", label: "Import Image", pos: "b", h: 48 }
        n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Upload1"]
        n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral Signed URL1"]
        n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Mistral IMAGE OCR"]
        n4 --> n5
        n5 --> n6
        n6 --> n8
    end

    subgraph sg3["Flow 4"]
        direction LR
        n13@{ icon: "mdi:swap-vertical", form: "rounded", label: "Document URL1", pos: "b", h: 48 }
        n14["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Document Understanding"]
        n13 --> n14
    end

    subgraph sg4["Flow 5"]
        direction LR
        n15@{ icon: "mdi:swap-vertical", form: "rounded", label: "Image URL1", pos: "b", h: 48 }
        n16["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Document Mis-Understanding?"]
        n15 --> n16
    end

    %% Styling
    classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
    classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef disabled stroke-dasharray: 5 5,opacity: 0.5
    class n1 trigger
    class n11,n12,n0,n2,n7,n5,n6,n8,n14,n16 api
    classDef customIcon fill:none,stroke:none
    class n11,n12,n0,n2,n7,n5,n6,n8,n14,n16 customIcon

The Problem: PDF Tables Don’t Belong in Spreadsheets

A lot of “business data” still arrives as PDFs or screenshots. Bank statements, supplier invoices, expense receipts, timesheets, compliance docs. The moment you need that data in Google Sheets, the real work starts: download the file, squint at the table, copy cells that don’t paste cleanly, then spend another pass fixing columns that drifted. It’s not only slow. It’s mentally exhausting, and it’s the kind of task that quietly creates errors you discover much later when reports don’t reconcile.

The friction compounds. Here’s where it breaks down in real life:

Copy-paste often merges columns, so you’re forced into manual cleanup before anyone can even use the data.
One skipped line item can throw off a full month of totals, which means rechecking the source document again.
Teams end up saving “final_v7.xlsx” versions because no one trusts the first pass.
When volume increases (month-end, tax season, campaign reporting), the process collapses under its own weight.

The Solution: Google Drive → Mistral OCR → Clean Tables in Sheets

This workflow pulls a PDF or an image from Google Drive, sends it to Mistral OCR, then turns the OCR output into structured, copy-friendly table data you can drop into Google Sheets. It supports two practical paths: a “public URL” method (fast to test) and a more secure approach where the document is uploaded to Mistral’s cloud storage and retrieved via a signed link. That signed link matters when you’re processing sensitive documents like bank statements, because it keeps files from being publicly accessible. The workflow also includes a document understanding option, which lets you ask questions about the document (for example, how many deposits appear) instead of extracting everything.

The workflow starts when you run it in n8n and provide a document or image URL, or it fetches the source file from Google Drive. It then uploads the file (or uses a public link), calls the Mistral OCR API via HTTP Request, and returns OCR content that’s already split by page and converted to markdown. From there, you can map the table content into rows and columns in Google Sheets, and optionally send summaries or exceptions through Gmail.

What You Get: Automation vs. Results

What This Workflow Automates

Results You’ll Get

Pulling a PDF or image straight from Google Drive without manual downloading.
Uploading the document to Mistral Cloud and generating a signed URL for secure OCR requests.
Running Mistral OCR for multi-page PDFs and single images, including page splitting and markdown conversion.
Using document understanding to answer targeted questions from the same document.

Turn a “retype the statement” task into about 10 minutes of review.
Fewer spreadsheet errors because the table structure survives the conversion.
A repeatable pipeline you can run every week or month-end without dread.
Better handling of messy scans, screenshots, and multi-page PDFs than basic copy-paste tools.
More control over data access when you use signed links instead of public URLs.

Example: What This Looks Like

Say you process 20 bank statement pages each week and you need line items in Google Sheets. Manually, it’s often about 10 minutes per page between copying, fixing columns, and double-checking totals, so you lose roughly 3 hours weekly. With this workflow, you fetch the PDF from Google Drive, let Mistral OCR handle the page splitting and conversion, then spend about 20 minutes reviewing and pasting clean tables into Sheets. That’s a couple hours back every week, plus fewer “why doesn’t this reconcile?” moments.

What You’ll Need

n8n instance (try n8n Cloud free)
Self-hosting option if you prefer (Hostinger works well)
Google Drive to store and fetch PDFs/images.
Google Sheets to store extracted tables as rows.
Mistral API key (get it from your Mistral Cloud dashboard)

Skill level: Intermediate. You’ll connect accounts, add an API key, and do light field mapping for the table output.

Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).

How It Works

A run is triggered. In the base workflow, it starts with a manual run in n8n, which is perfect for testing. In production, many teams adapt this to start when a new file lands in a specific Google Drive folder.

The document is retrieved and prepared. The workflow can fetch a PDF or an image from Google Drive, then read the file content so it’s ready to send to an OCR service. If you already have a public URL, it can use that too.

Mistral OCR does the heavy lifting. The workflow calls Mistral OCR via HTTP Request. You can either send the public URL to OCR directly or upload the file to Mistral Cloud first and generate a signed link (more secure, usually the better move for financial docs).

The output is shaped for spreadsheets. Mistral returns OCR content with page splitting and markdown conversion, which makes tables far easier to normalize into consistent columns for Google Sheets. If you want “answers” instead of raw extraction, the workflow also shows how to run a document query step.

You can easily modify the Google Drive source (folder, naming rules) to match how your team stores documents. See the full implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the Manual Trigger

Start the workflow with a manual trigger so you can test OCR flows on demand.

Add the Manual Run Trigger node as the workflow trigger.
Connect Manual Run Trigger to both Assign Document URL and Assign Image URL; this creates a parallel start.
Confirm the parallel flow: Manual Run Trigger outputs to both Assign Document URL and Assign Image URL in parallel.

Tip: Use the manual trigger during setup to validate each OCR path before you automate the trigger later.

Step 2: Connect Google Drive Inputs

Configure Google Drive downloads for file-based OCR processing.

Open Fetch PDF from Drive and set Operation to download.
Set File ID to the target PDF file, replacing [YOUR_ID] with an actual ID.
Credential Required: Connect your googleDriveOAuth2Api credentials in Fetch PDF from Drive.
Open Fetch Image from Drive and set Operation to download.
Set File ID to the target image file, replacing [YOUR_ID] with an actual ID.
Credential Required: Connect your googleDriveOAuth2Api credentials in Fetch Image from Drive.

⚠️ Common Pitfall: Using a Drive file ID that doesn’t match the file type (PDF vs PNG) will cause upload errors downstream.

Step 3: Configure Public URL OCR Paths

Set direct URL OCR for public documents and images without uploading to Mistral.

In Assign Document URL, set url to =https://example.com/[YOUR_ID].pdf and replace [YOUR_ID].
In Assign Image URL, set url to =https://example.com/[YOUR_ID].png and replace [YOUR_ID].
Open OCR Public Document and confirm URL is https://api.mistral.ai/v1/ocr with JSON Body set to ={ "model": "mistral-ocr-latest", "document": { "type": "document_url", "document_url": "{{ $json.url }}" } }.
Open OCR Public Image and confirm URL is https://api.mistral.ai/v1/ocr with JSON Body set to ={ "model": "mistral-ocr-latest", "document": { "type": "image_url", "image_url": "{{ $json.url }}" } }.
Credential Required: Connect your mistralCloudApi credentials in OCR Public Document and OCR Public Image.

Step 4: Configure File Upload OCR Paths

Upload files to Mistral, generate signed URLs, and run OCR via API for Drive-sourced files.

In Upload File to Mistral, set URL to https://api.mistral.ai/v1/files, Method to POST, and Content Type to multipart-form-data.
In Upload File to Mistral body parameters, set purpose to ocr and set file to formBinaryData with inputDataFieldName data.
In Generate Signed Link, set URL to =https://api.mistral.ai/v1/files/{{ $json.id }}/url, expiry query parameter to 24, and Accept header to application/json.
Open OCR Document via API and confirm JSON Body is ={ "model": "mistral-ocr-latest", "document": { "type": "document_url", "document_url": "{{ $json.url }}" }, "include_image_base64": true }.
Repeat the same setup for image uploads using Upload Image to Mistral → Create Image Signed Link → OCR Image via API with the same URL patterns and parameters.
Credential Required: Connect your mistralCloudApi credentials in all Mistral HTTP nodes (10+ nodes including Upload File to Mistral, Generate Signed Link, Upload Image to Mistral, Create Image Signed Link, OCR Document via API, and OCR Image via API).

Tip: Ensure the binary property name is data from the Drive download nodes, or Mistral will reject the file upload.

Step 5: Configure Query-Based Analysis for Documents and Images

Send a query along with a document or image URL to Mistral chat completion for contextual answers.

In Set Doc Query Inputs, set url to =https://example.com/[YOUR_ID].pdf and query to what is the total number of deposits?.
In Analyze Document Query, set URL to https://api.mistral.ai/v1/chat/completions and keep the JSON Body as ={ "model": "mistral-small-latest", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "{{ $json.query }}" }, { "type": "document_url", "document_url": "{{ $json.url }}" } ] } ], "document_image_limit": 8, "document_page_limit": 64 }.
In Set Image Query Inputs, set url to =https://example.com/[YOUR_ID].png and query to what is the total number of deposits?.
In Analyze Image Query, set URL to https://api.mistral.ai/v1/chat/completions and keep the JSON Body as ={ "model": "pixtral-large-latest", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "{{ $json.query }}" }, { "type": "image_url", "image_url": "{{ $json.url }}" } ] } ], "document_image_limit": 8, "document_page_limit": 64 }.
Credential Required: Connect your mistralCloudApi credentials in Analyze Document Query and Analyze Image Query.

⚠️ Common Pitfall: The query nodes use public URLs—ensure the URLs are accessible without authentication.

Step 6: Test and Activate Your Workflow

Validate each branch and confirm OCR outputs before activating the workflow.

Click Execute Workflow to run from Manual Run Trigger.
Verify that Assign Document URL and Assign Image URL both run in parallel and trigger their respective OCR nodes.
Check outputs in OCR Public Document, OCR Public Image, OCR Document via API, and OCR Image via API for extracted text.
Confirm Analyze Document Query and Analyze Image Query return a natural language answer to your query.
When results look correct, toggle the workflow to Active to use it in production.

🔒

Unlock Full Step-by-Step Guide

Get the complete implementation guide + downloadable template

Common Gotchas

Google Drive credentials can expire or need specific permissions. If things break, check the n8n credential connection and Drive file access sharing first.
If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.

Frequently Asked Questions

How long does it take to set up this Drive Sheets OCR automation?

About 30 minutes if your Drive, Sheets, and Mistral accounts are ready.

Do I need coding skills to automate Drive Sheets OCR?

No. You’ll mostly connect accounts and paste in your Mistral API key. The only “technical” part is mapping OCR output into the Sheet columns you want.

Is n8n free to use for this Drive Sheets OCR workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Mistral OCR usage, which depends on how many pages you process.

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I customize this Drive Sheets OCR workflow for invoice line items instead of bank statements?

Yes, and it’s a common tweak. You can keep the Google Drive fetch steps the same, then adjust the “document query inputs” (the Set nodes) to ask for invoice-specific fields like invoice number, due date, tax, and line items. If you want higher reliability, create separate paths for invoices vs. receipts and route them with an If node based on filename or Drive folder.

Why is my Google Drive connection failing in this workflow?

Usually it’s an expired Google OAuth connection in n8n or the file permissions changed after someone moved folders. Reconnect the Google Drive credential in n8n, then confirm the exact file still exists and that the connected Google account can open it in Drive. If it fails only for certain files, check shared drives and domain restrictions, because those can block API access even when you can “see” the file in the UI.

How many documents can this Drive Sheets OCR automation handle?

A lot, within reason. On n8n Cloud, the practical limit is your monthly execution quota and how you batch documents, while self-hosting mostly depends on your server. Mistral OCR throughput depends on file size and page count, so most teams start with a few dozen documents a day and scale up once they’re happy with the table quality.

Is this Drive Sheets OCR automation better than using Zapier or Make?

Often, yes. This workflow is doing a few things that get awkward in simpler tools: handling binary files, choosing between public URLs vs. signed links, and optionally running a document Q&A step. n8n also makes branching and merging easier when you split PDFs and images into separate paths. Zapier or Make can still work if your use case is “new file → send to OCR → paste result” and you don’t care about secure upload methods. If you’re unsure, Talk to an automation expert and you’ll get a straight recommendation.

This is the kind of workflow you set up once and rely on for months. The tables land in Google Sheets cleanly, and you get your time (and focus) back.

Google Drive to Google Sheets, clean OCR tables

How This Automation Works

n8n Workflow Template: Google Drive to Google Sheets, clean OCR tables

The Problem: PDF Tables Don’t Belong in Spreadsheets