ScrapingBee to Google Sheets, clean rows every run
Scraping competitor pricing into a spreadsheet sounds simple until you’re juggling broken selectors, messy copy-paste, and “why did this row shift again?” moments. You lose time, and worse, you stop trusting the data.
This pain hits marketing managers building market trackers hardest, but founders and ops leads feel it too. This ScrapingBee-to-Sheets automation gives you clean, consistent Google Sheets rows every run, so tracking pricing, stock, and specs becomes routine instead of a weekly fire drill.
Below is what the workflow does, what you’ll need, and how to make it fit your use case without getting dragged into XPath or DOM spelunking.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: ScrapingBee to Google Sheets, clean rows every run
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n2@{ icon: "mdi:robot", form: "rounded", label: "Structured Output Parser", pos: "b", h: 48 }
n3@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Split Out", pos: "b", h: 48 }
n5@{ icon: "mdi:database", form: "rounded", label: "Google Sheets - Get list of ..", pos: "b", h: 48 }
n6@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set fields", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>ScrapingBee - Get page scree.."]
n8@{ icon: "mdi:wrench", form: "rounded", label: "HTML-based Scraping Tool", pos: "b", h: 48 }
n9@{ icon: "mdi:database", form: "rounded", label: "Google Sheets - Create Rows", pos: "b", h: 48 }
n10@{ icon: "mdi:robot", form: "rounded", label: "Vision-based Scraping Agent", pos: "b", h: 48 }
n4 --> n9
n6 --> n7
n3 -.-> n10
n8 -.-> n10
n2 -.-> n10
n10 --> n4
n5 --> n6
n7 --> n10
n0 --> n5
end
subgraph sg1["HTML-Scraping Tool Flow"]
direction LR
n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>ScrapingBee- Get page HTML"]
n11@{ icon: "mdi:play-circle", form: "rounded", label: "HTML-Scraping Tool", pos: "b", h: 48 }
n12@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set fields - from AI agent q..", pos: "b", h: 48 }
n13["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/markdown.dark.svg' width='40' height='40' /></div><br/>HTML to Markdown"]
n11 --> n12
n1 --> n13
n12 --> n1
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0,n11 trigger
class n2,n10 ai
class n3 aiModel
class n8 ai
class n5,n9 database
class n7,n1 api
classDef customIcon fill:none,stroke:none
class n7,n1,n13 customIcon
The Problem: Scraped data is unreliable (and takes forever)
If you’ve ever tried to keep a “simple” competitor sheet updated, you know the trap. One site changes its layout and your scraper silently fails. Another renders content differently and you end up staring at blank fields. Then you patch it with a new selector, which works until the next redesign. Meanwhile, the spreadsheet fills with half-parsed text, mismatched columns, and duplicate rows. The result is a weird mix of wasted hours and low confidence, which means decisions get delayed or made on stale numbers.
The friction compounds. Here’s where it usually breaks down.
- You spend about 2 hours a week fixing brittle selectors and one-off scraping scripts.
- Rows land messy, so someone has to clean and reformat before the data is usable.
- Small mistakes slip in (a price in the wrong currency, specs merged into one cell), and the sheet becomes hard to trust.
- Scaling from 20 URLs to 200 URLs turns into a “don’t touch it” project because the process is too fragile.
The Solution: AI-powered scraping that writes tidy Google Sheets rows
This n8n workflow starts by pulling a list of URLs from Google Sheets, then uses ScrapingBee to capture what the page looks like (a full-page screenshot) and, when needed, the underlying HTML. A vision-based AI Agent reads the screenshot and extracts the fields you care about into structured JSON, which keeps the output consistent even when the page layout changes. If the screenshot route isn’t enough, the workflow falls back to the HTML, converts it to Markdown to reduce token usage, and extracts the same structured fields. Finally, it expands the structured output into clean items and appends them as tidy rows in Google Sheets. No manual cleanup. No “why is this in one cell?” nonsense.
The workflow kicks off from a manual trigger in n8n, reads your URL sheet, and maps each URL into a scraping request. ScrapingBee fetches a screenshot first, the AI Agent extracts your target data, and Google Sheets receives normalized rows that are easy to filter, chart, and compare.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| Pulling URLs from your Google Sheet and fetching each page via ScrapingBee | Clean, consistent rows appended to Google Sheets every run |
| Vision-based AI extraction of pricing, stock, and spec fields, with an HTML fallback | No manual cleanup or reformatting before the data is usable |
| Structured-output parsing and splitting results into individual rows | Hours back each week, and a sheet you can actually trust |
Example: What This Looks Like
Say you track 50 product pages every Monday and you want price, availability, and a key spec. Manually, even a careful process takes maybe 3 minutes per page between loading, copying, and cleaning, so you’re burning about 2.5 hours weekly. With this workflow, you update the URL list once, click run, and wait for scraping and extraction to finish (often 15–30 minutes depending on volume). Your Google Sheet ends up with tidy rows, ready to chart, share, or export.
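The arithmetic above, sanity-checked in a couple of lines of Python:

```python
# Sanity check of the numbers above: 50 pages at ~3 minutes each.
pages = 50
minutes_per_page = 3                 # load, copy, clean
manual_minutes = pages * minutes_per_page
print(manual_minutes / 60)           # prints 2.5 (hours per weekly manual run)
```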
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Sheets to store URLs and save results
- ScrapingBee for screenshots and HTML retrieval
- Google Gemini API key (get it from Google AI Studio/Google Cloud credentials)
Skill level: Intermediate. You’ll connect accounts, paste API keys, and adjust the extraction fields you want in the structured output.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
You trigger a run. In the template, it starts with a Manual Start in n8n, which is perfect for weekly refreshes or ad-hoc checks before a campaign launches.
URLs are pulled from Google Sheets. The workflow reads your “URL sheet,” then maps the columns it needs (usually the URL plus any labels like brand, SKU, category, or region).
ScrapingBee captures the page, then AI extracts structured fields. A screenshot is fetched first so the Vision Extraction Agent can read what a human sees. If the screenshot path can’t confidently return the fields, the workflow falls back to HTML, converts it to Markdown for token efficiency, and extracts from that instead.
Clean rows are appended back into Google Sheets. The structured output parser ensures the AI returns JSON that matches your schema, then the workflow expands the items and appends tidy rows so your sheet stays consistent across runs.
You can easily modify the fields you extract to match your market tracker. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts manually so you can test and validate the extraction flow before production use.
- Add and keep Manual Start Trigger as the first node in the workflow.
- Connect Manual Start Trigger to Retrieve URL Sheet to begin the data flow.
Step 2: Connect Google Sheets
Pull input URLs from Google Sheets and append extracted results back to a results sheet.
- Open Retrieve URL Sheet and set Authentication to `serviceAccount`.
- Credential Required: Connect your `googleApi` credentials to Retrieve URL Sheet.
- Choose the Document named `Google Sheets - Workflow Vision-Based Scraping` and Sheet `List of URLs`.
- Open Append Sheet Rows and confirm Operation is set to `append` and Authentication to `serviceAccount`.
- Credential Required: Connect your `googleApi` credentials to Append Sheet Rows.
- Verify the column mappings in Append Sheet Rows, including `promo` → `{{$json.promo}}` and `category` → `{{$('Map URL Fields').item.json.url}}`.
Step 3: Prepare URL and Query Mapping
Map incoming fields into a consistent format for the scraping and fallback tool flows.
- In Map URL Fields, set the url assignment to `{{$json.url}}`.
- Confirm the connection flow: Retrieve URL Sheet → Map URL Fields → ScrapingBee Screenshot Fetch.
- In Map Query Fields (used by the fallback tool), set url to `{{$json.query}}`.
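If it helps to picture what those two Set nodes do, here is a minimal Python stand-in (the function names are ours, not n8n’s — the logic is just field normalization):

```python
# Hypothetical stand-ins for the two Set nodes; n8n does this with
# expressions, but both nodes simply normalize input to a `url` field.
def map_url_fields(item: dict) -> dict:
    """Mirror of `url = {{$json.url}}` in Map URL Fields."""
    return {"url": item["url"]}

def map_query_fields(item: dict) -> dict:
    """Mirror of `url = {{$json.query}}` in Map Query Fields (fallback tool)."""
    return {"url": item["query"]}
```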
Step 4: Configure ScrapingBee Requests
Capture both screenshots and HTML for AI extraction, with an HTML fallback path if the image is insufficient.
- In ScrapingBee Screenshot Fetch, set the URL to `https://app.scrapingbee.com/api/v1` and keep Send Query enabled.
- Under query parameters, set api_key to your key (replace `[CONFIGURE_YOUR_API_KEY]`) and url to `{{$json.url}}`.
- Add screenshot_full_page with value `true` and keep the User-Agent header as provided.
- In ScrapingBee Fetch HTML, set the URL to `https://app.scrapingbee.com/api/v1` and add query parameters api_key (your key) and url set to `{{$json.url}}`.
- Ensure ScrapingBee Fetch HTML connects to HTML Convert Markdown, which uses HTML set to `{{$json.data}}`.
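For reference, here is a sketch in Python of the request the Screenshot Fetch node ends up sending. It only builds the URL with the parameters listed above; the function name is illustrative, and you’d pass the result to whatever HTTP client you like:

```python
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1"

def build_screenshot_request(api_key: str, target_url: str) -> str:
    """Assemble the query string the Screenshot Fetch node sends.

    Illustrative helper; the parameters mirror the bullets above.
    """
    params = {
        "api_key": api_key,              # replaces [CONFIGURE_YOUR_API_KEY]
        "url": target_url,               # fed from {{$json.url}} in n8n
        "screenshot_full_page": "true",  # full-page capture for the agent
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"
```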
Both ScrapingBee nodes ship with the placeholder `[CONFIGURE_YOUR_API_KEY]`. Replace these placeholders with your real ScrapingBee API key or requests will fail.
Step 5: Set Up the Vision Extraction Agent
Configure the AI extraction logic, the language model, and structured parser to return consistent product data.
- Open Vision Extraction Agent and keep Text set to `=Here is the screenshot you need to use to extract data about the page: {{ $json.url }}`.
- Confirm Vision Extraction Agent has Has Output Parser enabled and Passthrough Binary Images set to `true`.
- Connect Gemini Chat Model as the language model with Model Name set to `models/gemini-1.5-pro-latest`.
- Credential Required: Connect your `googlePalmApi` credentials to Gemini Chat Model.
- Attach Structured Data Parser to enforce the JSON schema shown in its JSON Schema Example.
- Attach HTML Fallback Tool as a tool to Vision Extraction Agent for HTML-based extraction when images are insufficient.
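To see why the structured parser matters, here is an illustrative Python guard that does the same job outside n8n. The field names (`product_title`, `promo`, `promo_percentage`) are borrowed from the template’s sheet mappings; swap in whatever your JSON Schema Example actually defines:

```python
import json

# Field names are illustrative, taken from this template's column
# mappings; adjust REQUIRED_FIELDS to match your own schema.
REQUIRED_FIELDS = {"product_title", "promo", "promo_percentage"}

def parse_agent_output(raw: str) -> list:
    """Parse the agent's JSON and reject items missing required fields,
    so malformed extractions never reach the sheet."""
    items = json.loads(raw)["output"]
    bad = [item for item in items if not REQUIRED_FIELDS <= item.keys()]
    if bad:
        raise ValueError(f"{len(bad)} item(s) missing required fields")
    return items
```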
Step 6: Configure the Output Flow
Split the AI output into individual items and append each record to your results sheet.
- In Expand Output Items, set Field To Split Out to `output`.
- Verify the execution flow: Vision Extraction Agent → Expand Output Items → Append Sheet Rows.
- In Append Sheet Rows, confirm the mappings for `product_url` → `{{$json.product_title}}` and `promo_percent` → `{{$json.promo_percentage}}`.
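Conceptually, Expand Output Items plus Append Sheet Rows behave like this short Python sketch. The column order and item keys are illustrative, taken from the mappings described above:

```python
# Illustrative mirror of Expand Output Items + Append Sheet Rows:
# split the agent's `output` array into one sheet row per item.
def to_rows(agent_result: dict, source_url: str) -> list:
    rows = []
    for item in agent_result["output"]:    # Field To Split Out: output
        rows.append([
            item.get("product_title"),     # product_url column (per template)
            item.get("promo"),             # promo column
            item.get("promo_percentage"),  # promo_percent column
            source_url,                    # category column, from Map URL Fields
        ])
    return rows
```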
Step 7: Test and Activate Your Workflow
Run a manual test to verify extraction output and sheet updates before enabling automated use.
- Click Execute Workflow on Manual Start Trigger to run the workflow end-to-end.
- Confirm that Retrieve URL Sheet reads URLs and ScrapingBee Screenshot Fetch returns screenshot data.
- Check that Vision Extraction Agent outputs structured items and Expand Output Items splits them correctly.
- Verify new rows appear in the Append Sheet Rows destination sheet with populated fields.
- When satisfied, toggle the workflow to Active for production use.
Common Gotchas
- Google Sheets credentials can expire or lack access to the target spreadsheet. If appending fails, check the n8n credential connection and confirm the sheet is shared with the right Google account.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- ScrapingBee can return incomplete pages if the target site blocks bots or requires heavy client-side rendering. Check the ScrapingBee response (status code and screenshot) first, then adjust ScrapingBee options or slow batch size to avoid rate limits.
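For the rate-limit case specifically, a simple exponential-backoff wrapper is the usual fix. This is a generic sketch, not part of the template; in n8n you’d approximate it with Wait nodes and smaller batches:

```python
import time

def fetch_with_backoff(fetch, retries: int = 3, base_delay: float = 2.0):
    """Retry a rate-limited call with exponential backoff (2s, 4s, 8s...).

    `fetch` is any zero-argument callable, e.g. one ScrapingBee request.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise          # out of retries, surface the error
            time.sleep(base_delay * 2 ** attempt)
```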
Frequently Asked Questions
How long does setup take?
About 30–60 minutes if your APIs are ready.
Do I need coding skills?
No. You’ll mainly connect accounts, add your API keys, and tweak the fields you want extracted.
Is there a free way to run this?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in ScrapingBee and Gemini API usage costs.
Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize which fields get extracted?
Yes, and it’s the point of the template. You’ll adjust the schema in the Structured Data Parser and the prompt/instructions used by the Vision Extraction Agent so the JSON fields match what you track (price, currency, availability, SKU, shipping, and so on). If a site is hard for vision extraction, you can lean more on the HTML fallback path by tuning what gets passed into the HTML Convert Markdown step. You can also add extra columns in your Google Sheet (like “brand” or “category”) and map them in the “Map URL Fields” node so those labels are carried into every output row.
What should I check when ScrapingBee requests fail?
Most of the time it’s an invalid or expired API key, or the target site is blocking automated traffic. Check the ScrapingBee HTTP response and screenshot output in n8n first, because it will usually show you what the scraper actually received. If you see rate limiting, reduce how many URLs you process per run or add a longer wait between batches.
How many URLs can it handle per run?
Hundreds per run is common, but it depends on your ScrapingBee limits, AI usage costs, and how aggressively you batch requests.
Could I build this in Zapier or Make instead?
Often, yes. Zapier and Make can call APIs, but this workflow leans on more involved logic: screenshot-first extraction, structured parsing, fallback to HTML, and item expansion into clean rows. n8n handles branching and heavier workflows without forcing you into expensive task pricing for every conditional path. If you only need a simple “call endpoint → append row” flow, Zapier or Make can be quicker. If you want this level of reliability, n8n is usually the smoother route. Talk to an automation expert if you want help deciding.
Once this is running, your sheet stops being a “cleanup project” and starts being a dependable system. Honestly, that peace of mind is the whole win.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.