Google Gemini + Google Sheets for OCR data capture
Retyping receipts, invoices, and screenshot “proof” into a spreadsheet is the kind of work that quietly wrecks your day. One wrong digit, one missed field, and suddenly your totals don’t match and you’re hunting through photos in Slack.
Operations managers feel it when expenses pile up. Marketing teams run into it with sponsorship screenshots and ad receipts. And if you run a small business, an OCR pipeline from Gemini into Google Sheets is the difference between tidy records and a constant low-grade mess.
This workflow gives you a simple API endpoint in n8n: send an image URL, get back structured JSON, then log it into Google Sheets. You’ll see how it works, what you need, and where teams usually trip up.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Google Gemini + Google Sheets for OCR data capture
Webhook → Get image from URL → Transform image to base64 → Call Gemini API (Flash Lite) → Edit fields to output requir… → Respond to Webhook
The Problem: Manual OCR data entry is slow and error-prone
Most “document capture” starts the same way: someone snaps a photo, drops it in a folder, and later another person squints at it while typing numbers into a sheet. It’s not just slow. It’s distracting, because you’re constantly switching context between images, tabs, and whatever naming convention your team used that week.
Then come the follow-ups. A receipt photo is blurry, an invoice has two totals, or the screenshot is missing a date. Multiply that by 20 documents and you’ve burned about 2 hours without producing anything that feels like real work.
The friction compounds. Here’s where it breaks down.
- You end up retyping the same fields (vendor, total, tax, date) across different tools because nothing is standardized.
- Small mistakes slip through, which means month-end cleanup becomes a mini audit.
- Photos and screenshots rarely follow a consistent format, so simple templates don’t hold up.
- Even when a teammate “helps,” you still have to verify everything because you can’t trust the inputs.
The Solution: An n8n OCR endpoint that turns images into rows
This n8n workflow creates a ready-to-use webhook endpoint you can call from anything that can send an HTTP request. You pass in an image URL (a receipt photo, a scanned ID, a screenshot, even a document image stored in Drive). The workflow fetches that image, converts it into base64 so it can be processed reliably, and sends it to Google Gemini (Flash Lite) for AI-powered OCR. Gemini returns the extracted text, but the workflow doesn’t stop there. It maps the output into clean, predictable fields, then responds with structured JSON that contains only what you asked for. From there, logging those fields into Google Sheets is straightforward, so your data becomes searchable and usable instead of stuck in image form.
The workflow starts with an incoming webhook. Next, it downloads the image and prepares it for OCR. Finally, Gemini extracts the requested fields and n8n returns a neat JSON payload you can write into Google Sheets (or any system that accepts structured data).
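Calling the finished endpoint is a single POST. Here is a sketch of the request body in Python; the webhook URL and the extraction fields are illustrative stand-ins, since your n8n instance and schema will differ:

```python
import json

# Hypothetical endpoint -- substitute your own n8n instance and webhook path.
WEBHOOK_URL = "https://your-n8n.example.com/webhook/data-extractor"

# The workflow expects these three keys in the request body.
payload = {
    "image_url": "https://example.com/receipts/receipt-0142.jpg",
    "Requirement": "Extract the vendor, total, tax, and date from this receipt.",
    "properties": {  # JSON Schema properties that shape Gemini's output
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "tax": {"type": "number"},
        "date": {"type": "string"},
    },
}

body = json.dumps(payload)
# Send it with any HTTP client, e.g.:
#   requests.post(WEBHOOK_URL, data=body,
#                 headers={"Content-Type": "application/json"})
```

Anything that can build this JSON and send an HTTP request can use the endpoint, which is why it works from forms, backends, and other workflows alike.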
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| Fetching images from a URL and converting them to base64 | Reliable OCR input even when file hosting is inconsistent |
| AI-powered OCR of receipts, invoices, and screenshots with Google Gemini (Flash Lite) | Extracted fields instead of walls of text |
| Mapping model output into named fields and returning structured JSON | Clean rows ready to log into Google Sheets |
Example: What This Looks Like
Say you process 30 receipts a week and it takes about 10 minutes each to open the image, find the total, type details, and double-check. That’s about 5 hours weekly, and it’s usually broken into annoying little chunks. With this workflow, someone pastes the image URL once (about 1 minute), the OCR runs in the background (often under a minute), and the structured fields are ready to write to Google Sheets. You still spot-check a few, but you’re no longer doing full manual entry.
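The arithmetic above, as a quick sanity check:

```python
# Manual entry: 30 receipts x 10 minutes each, per the example above.
receipts_per_week = 30
manual_minutes_each = 10
manual_hours = receipts_per_week * manual_minutes_each / 60  # 5.0 hours

# Automated: roughly 1 minute to paste each image URL.
automated_minutes_each = 1
automated_hours = receipts_per_week * automated_minutes_each / 60  # 0.5 hours

hours_saved_per_week = manual_hours - automated_hours  # 4.5 hours
```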
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Gemini API for AI OCR on images
- Google Sheets to store extracted fields as rows
- Gemini API key (get it from Google AI Studio)
Skill level: Beginner. You’ll paste credentials, set a few fields to extract, and test the endpoint with a sample image URL.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A webhook receives your request. You call the n8n endpoint with an image URL and the extraction parameters (the fields you want back). This can come from a form, a backend system, or even another workflow.
The image is fetched and prepared. n8n downloads the file from the URL, then converts it into base64 so the OCR request is consistent even when file hosting is inconsistent. This is the “make it reliable” step most DIY scripts skip.
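The conversion is easy to reproduce outside n8n if you want to test Gemini directly. A minimal sketch of the same step in Python (the image bytes are a stand-in for the downloaded file):

```python
import base64

# Stand-in bytes; in the workflow this is the image downloaded from the URL.
image_bytes = b"\x89PNG\r\n\x1a\n...fake image data for illustration..."

# Same operation the base64 conversion node performs.
image_b64 = base64.b64encode(image_bytes).decode("ascii")

# Base64 survives any JSON transport and decodes back to the original bytes.
round_trip = base64.b64decode(image_b64)
```

Because the encoded string is plain ASCII, it can be embedded directly in the JSON body of the Gemini request without escaping issues.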
Gemini extracts only the fields you asked for. The workflow sends the prepared image to Gemini Vision (Flash Lite) through an HTTP request. Instead of returning a wall of text, the response is shaped into structured data you can actually use.
A clean JSON response is returned (and can be logged to Sheets). The workflow maps the output into named fields, then responds to the original request with JSON. From there you can insert the values into Google Sheets, push them into a CRM, or route them for approval.
You can easily modify the extraction fields to match your documents, then map the output to different Google Sheets columns based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Webhook Trigger
Set up the inbound endpoint so external systems can send an image URL and parsing requirements.
- Add the Incoming Webhook Trigger node.
- Set Path to `data-extractor`.
- Set Response Mode to `responseNode` so Return Webhook Response handles the reply.
- Optionally keep Flowpast Branding as a visual reference note (no configuration required).
⚠️ Common Pitfall: The incoming payload must include body.image_url, body.Requirement, and body.properties or later nodes will fail.
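A small guard on the caller's side catches incomplete payloads before they ever reach the endpoint. This is a hypothetical helper, not part of the workflow itself:

```python
def missing_fields(body):
    """Return which required webhook body keys are absent."""
    required = ("image_url", "Requirement", "properties")
    return [key for key in required if key not in body]

# A complete payload reports nothing missing; an incomplete one tells you what to fix.
ok = missing_fields({
    "image_url": "https://example.com/r.jpg",
    "Requirement": "Extract the total.",
    "properties": {"total": {"type": "number"}},
})
bad = missing_fields({"image_url": "https://example.com/r.jpg"})
```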
Step 2: Connect the Image Source
Download the image from the URL provided in the webhook payload.
- Add the Fetch Image via URL node.
- Set URL to `={{ $json.body.image_url }}`.
- Connect Incoming Webhook Trigger → Fetch Image via URL.
Step 3: Convert and Analyze the Image with Gemini
Convert the image to Base64 and send it to Gemini Vision with your dynamic schema instructions.
- Add the Convert Image to Base64 node and set Operation to `binaryToPropery`.
- Set Destination Key to `data1` and keep Encoding as `ascii`.
- Add the Gemini Vision Request node and set URL to `=https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-lite:generateContent`.
- Set Method to `POST`, Send Body to `true`, and Body Content Type to `json`.
- Paste the full JSON Body expression: `={ "contents": [ { "role": "user", "parts": [ { "inlineData": { "data": "{{$json.data1}}", "mimeType": "image/jpeg" } } ] }, { "role": "user", "parts": [ { "text": "check this" } ] } ], "systemInstruction": { "role": "user", "parts": [ { "text": "{{ $('Incoming Webhook Trigger').first().json.body.Requirement}}" } ] }, "generationConfig": { "temperature": 1, "topK": 40, "topP": 0.95, "maxOutputTokens": 8192, "responseMimeType": "application/json", "responseSchema": { "type": "object", "properties": {{ $('Incoming Webhook Trigger').first().json.body.properties.toJsonString()}} } }}`
- Credential Required: Connect your googlePalmApi credentials in Gemini Vision Request.
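If you want to check the request shape outside n8n, the JSON Body expression above assembles the same structure as this Python sketch. The function name is ours for illustration; the field layout mirrors the expression:

```python
def build_gemini_body(image_b64, requirement, properties):
    """Assemble a generateContent body matching the n8n JSON Body expression."""
    return {
        "contents": [
            {"role": "user", "parts": [
                {"inlineData": {"data": image_b64, "mimeType": "image/jpeg"}}
            ]},
            {"role": "user", "parts": [{"text": "check this"}]},
        ],
        "systemInstruction": {
            "role": "user", "parts": [{"text": requirement}]
        },
        "generationConfig": {
            "temperature": 1,
            "topK": 40,
            "topP": 0.95,
            "maxOutputTokens": 8192,
            "responseMimeType": "application/json",
            "responseSchema": {"type": "object", "properties": properties},
        },
    }

# Example: a base64 stub, a one-line requirement, and a one-field schema.
body = build_gemini_body("aGVsbG8=", "Extract the total.",
                         {"total": {"type": "number"}})
```

Note how `responseSchema` is built from the caller's `body.properties`: that is what forces Gemini to return JSON in exactly the shape you asked for.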
💡 Tip: Use `body.properties` to control Gemini’s structured output.
Step 4: Map the Output and Respond
Extract the JSON content from Gemini and return it to the caller.
- In Map Output Fields, add an assignment with Name `result` and Value `={{ $json.candidates[0].content.parts[0].text.parseJson() }}`.
- Connect Gemini Vision Request → Map Output Fields → Return Webhook Response.
- Keep Return Webhook Response defaults to send the final JSON to the requester.
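The `parseJson()` expression does the same job as this Python sketch. The response shape is simplified and the values are illustrative, but the path into the payload is the one the mapping step uses:

```python
import json

# Simplified Gemini generateContent response (values are illustrative).
gemini_response = {
    "candidates": [
        {"content": {"parts": [
            {"text": '{"vendor": "Acme Supplies", "total": 42.5}'}
        ]}}
    ]
}

# Python equivalent of the n8n expression:
#   {{ $json.candidates[0].content.parts[0].text.parseJson() }}
raw_text = gemini_response["candidates"][0]["content"]["parts"][0]["text"]
result = json.loads(raw_text)
```

Because `responseMimeType` is set to `application/json` in the Gemini request, the `text` part is a JSON string rather than prose, so this parse step is reliable.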
Step 5: Test and Activate Your Workflow
Validate the request/response flow and activate the endpoint for production use.
- Click Test Workflow and send a POST request to the Incoming Webhook Trigger URL with `body.image_url`, `body.Requirement`, and `body.properties`.
- Confirm that Map Output Fields returns a parsed JSON object under `result`.
- Verify Return Webhook Response sends the structured JSON back to the caller.
- Toggle the workflow to Active to enable production requests.
Common Gotchas
- Google Gemini API credentials can expire or be restricted by project settings. If things break, check your Google AI Studio key and any API restrictions in Google Cloud first.
- Processing time varies with image size and model load. If callers time out waiting for the response, raise the client timeout before assuming the workflow is broken.
- The extraction prompt you send in `body.Requirement` does the heavy lifting. Spell out exact field names and formats early, or you’ll be hand-correcting outputs forever.
Frequently Asked Questions
How long does setup take?
About 30 minutes if your Gemini key is ready.
Do I need coding skills?
No. You’ll paste in credentials and edit a few fields you want extracted. Testing is usually just calling the webhook with a sample image URL.
Is this free to run?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Gemini API usage, which depends on how many images you process.
Where should I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this for different document types?
Yes, and you should. You’ll adjust the fields you request and the mapping in the “Map Output Fields” step so receipts pull items like total and tax, while invoices can include invoice number, due date, and vendor address. If your input is sometimes a Google Drive link, you can also swap the “Fetch Image via URL” part to read files from Drive first. The core stays the same: fetch, convert, extract, map, return.
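For instance, the same endpoint can serve receipts and invoices just by swapping the `properties` schema in the request. The field names below are illustrative, not prescribed by the workflow:

```python
# Illustrative schemas for body.properties -- rename fields to match your documents.
receipt_properties = {
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "tax": {"type": "number"},
    "date": {"type": "string"},
}

invoice_properties = {
    "invoice_number": {"type": "string"},
    "due_date": {"type": "string"},
    "vendor_address": {"type": "string"},
    "total": {"type": "number"},
}

def make_request_body(image_url, requirement, properties):
    """Only the schema and prompt change per document type; the endpoint stays the same."""
    return {"image_url": image_url, "Requirement": requirement,
            "properties": properties}

request_body = make_request_body(
    "https://example.com/invoices/inv-2031.jpg",
    "Extract the invoice number, due date, vendor address, and total.",
    invoice_properties,
)
```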
What if the Gemini call starts failing?
Usually it’s an API key issue or a restriction on the key. Regenerate the Gemini key in Google AI Studio, then update it in the Gemini Vision HTTP request in n8n. Also check that the image URL is publicly accessible; OCR calls fail fast when the file can’t be fetched.
How many images can I process?
A lot, as long as your n8n plan and Gemini quota can keep up. On n8n Cloud you’re mainly limited by monthly executions, while self-hosting removes that cap and shifts the limit to your server and Gemini usage limits. In practice, most small teams run batches of a few dozen images at a time without thinking about it, then scale from there.
Is n8n better than Zapier or Make for this?
Often, yes, because you can keep the logic in one place and shape the JSON exactly how your spreadsheet needs it. n8n also makes it easier to self-host, which matters if you’re processing a lot of documents. Zapier or Make can still be fine for a simple “image in, row out” setup, but costs climb once you add branching, retries, and field mapping. Another factor is data control: some teams prefer keeping OCR processing on their own instance. If you’re unsure, Talk to an automation expert and describe your volume and document types.
Once this is running, images stop being “dead data.” The workflow handles the repetitive cleanup so your Google Sheets records stay usable, week after week.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.