ScrapingBee to Google Sheets, clean rows every run
Scraping competitor pricing into a spreadsheet sounds simple until you’re juggling broken selectors, messy copy-paste, and “why did this row shift again?” moments. You lose time, and worse, you stop trusting the data.
This pain hits marketing managers building market trackers hardest, but founders and ops leads feel it too. This ScrapingBee-to-Sheets automation gives you clean, consistent Google Sheets rows every run, so tracking pricing, stock, and specs becomes routine instead of a weekly fire drill.
Below is what the workflow does, what you’ll need, and how to make it fit your use case without getting dragged into XPath or DOM spelunking.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: ScrapingBee to Google Sheets, clean rows every run
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n2@{ icon: "mdi:robot", form: "rounded", label: "Structured Output Parser", pos: "b", h: 48 }
n3@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Split Out", pos: "b", h: 48 }
n5@{ icon: "mdi:database", form: "rounded", label: "Google Sheets - Get list of ..", pos: "b", h: 48 }
n6@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set fields", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>ScrapingBee - Get page scree.."]
n8@{ icon: "mdi:wrench", form: "rounded", label: "HTML-based Scraping Tool", pos: "b", h: 48 }
n9@{ icon: "mdi:database", form: "rounded", label: "Google Sheets - Create Rows", pos: "b", h: 48 }
n10@{ icon: "mdi:robot", form: "rounded", label: "Vision-based Scraping Agent", pos: "b", h: 48 }
n4 --> n9
n6 --> n7
n3 -.-> n10
n8 -.-> n10
n2 -.-> n10
n10 --> n4
n5 --> n6
n7 --> n10
n0 --> n5
end
subgraph sg1["HTML-Scraping Tool Flow"]
direction LR
n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>ScrapingBee- Get page HTML"]
n11@{ icon: "mdi:play-circle", form: "rounded", label: "HTML-Scraping Tool", pos: "b", h: 48 }
n12@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set fields - from AI agent q..", pos: "b", h: 48 }
n13["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/markdown.dark.svg' width='40' height='40' /></div><br/>HTML to Markdown"]
n11 --> n12
n1 --> n13
n12 --> n1
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0,n11 trigger
class n2,n10 ai
class n3 aiModel
class n8 ai
class n5,n9 database
class n7,n1 api
classDef customIcon fill:none,stroke:none
class n7,n1,n13 customIcon
The Problem: Scraped data is unreliable (and takes forever)
If you’ve ever tried to keep a “simple” competitor sheet updated, you know the trap. One site changes its layout and your scraper silently fails. Another renders content differently and you end up staring at blank fields. Then you patch it with a new selector, which works until the next redesign. Meanwhile, the spreadsheet fills with half-parsed text, mismatched columns, and duplicate rows. The result is a weird mix of wasted hours and low confidence, which means decisions get delayed or made on stale numbers.
The friction compounds. Here’s where it usually breaks down.
- You spend about 2 hours a week fixing brittle selectors and one-off scraping scripts.
- Rows land messy, so someone has to clean and reformat before the data is usable.
- Small mistakes slip in (a price in the wrong currency, specs merged into one cell), and the sheet becomes hard to trust.
- Scaling from 20 URLs to 200 URLs turns into a “don’t touch it” project because the process is too fragile.
The Solution: AI-powered scraping that writes tidy Google Sheets rows
This n8n workflow starts by pulling a list of URLs from Google Sheets, then uses ScrapingBee to capture what the page looks like (a full-page screenshot) and, when needed, the underlying HTML. A vision-based AI Agent reads the screenshot and extracts the fields you care about into structured JSON, which keeps the output consistent even when the page layout changes. If the screenshot route isn’t enough, the workflow falls back to the HTML, converts it to Markdown to reduce token usage, and extracts the same structured fields. Finally, it expands the structured output into clean items and appends them as tidy rows in Google Sheets. No manual cleanup. No “why is this in one cell?” nonsense.
The workflow kicks off from a manual trigger in n8n, reads your URL sheet, and maps each URL into a scraping request. ScrapingBee fetches a screenshot first, the AI Agent extracts your target data, and Google Sheets receives normalized rows that are easy to filter, chart, and compare.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| Pulling URLs from your Google Sheet and fetching each page via ScrapingBee | Clean, consistent rows appended to Google Sheets every run |
| Vision-based AI extraction of pricing, stock, and spec fields, with an HTML fallback | No manual cleanup or reformatting before the data is usable |
| Structured-output parsing and splitting results into individual rows | Hours back each week, and a sheet you can actually trust |
Example: What This Looks Like
Say you track 50 product pages every Monday and you want price, availability, and a key spec. Manually, even a careful process takes maybe 3 minutes per page between loading, copying, and cleaning, so you’re burning about 2.5 hours weekly. With this workflow, you update the URL list once, click run, and wait for scraping and extraction to finish (often 15–30 minutes depending on volume). Your Google Sheet ends up with tidy rows, ready to chart, share, or export.
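The arithmetic above, sanity-checked in a couple of lines of Python:

```python
# Sanity check of the numbers above: 50 pages at ~3 minutes each.
pages = 50
minutes_per_page = 3                 # load, copy, clean
manual_minutes = pages * minutes_per_page
print(manual_minutes / 60)           # prints 2.5 (hours per weekly manual run)
```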
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Sheets to store URLs and save results
- ScrapingBee for screenshots and HTML retrieval
- Google Gemini API key (get it from Google AI Studio/Google Cloud credentials)
Skill level: Intermediate. You’ll connect accounts, paste API keys, and adjust the extraction fields you want in the structured output.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
You trigger a run. In the template, it starts with a Manual Start in n8n, which is perfect for weekly refreshes or ad-hoc checks before a campaign launches.
URLs are pulled from Google Sheets. The workflow reads your “URL sheet,” then maps the columns it needs (usually the URL plus any labels like brand, SKU, category, or region).
ScrapingBee captures the page, then AI extracts structured fields. A screenshot is fetched first so the Vision Extraction Agent can read what a human sees. If the screenshot path can’t confidently return the fields, the workflow falls back to HTML, converts it to Markdown for token efficiency, and extracts from that instead.
Clean rows are appended back into Google Sheets. The structured output parser ensures the AI returns JSON that matches your schema, then the workflow expands the items and appends tidy rows so your sheet stays consistent across runs.
You can easily modify the fields you extract to match your market tracker. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts manually so you can test and validate the extraction flow before production use.
- Add and keep Manual Start Trigger as the first node in the workflow.
- Connect Manual Start Trigger to Retrieve URL Sheet to begin the data flow.
Step 2: Connect Google Sheets
Pull input URLs from Google Sheets and append extracted results back to a results sheet.
- Open Retrieve URL Sheet and set Authentication to `serviceAccount`.
- Credential Required: Connect your `googleApi` credentials to Retrieve URL Sheet.
- Choose the Document named `Google Sheets - Workflow Vision-Based Scraping` and Sheet `List of URLs`.
- Open Append Sheet Rows and confirm Operation is set to `append` and Authentication to `serviceAccount`.
- Credential Required: Connect your `googleApi` credentials to Append Sheet Rows.
- Verify the column mappings in Append Sheet Rows, including `promo` → `{{$json.promo}}` and `category` → `{{$('Map URL Fields').item.json.url}}`.
Step 3: Prepare URL and Query Mapping
Map incoming fields into a consistent format for the scraping and fallback tool flows.
- In Map URL Fields, set the url assignment to `{{$json.url}}`.
- Confirm the connection flow: Retrieve URL Sheet → Map URL Fields → ScrapingBee Screenshot Fetch.
- In Map Query Fields (used by the fallback tool), set url to `{{$json.query}}`.
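If it helps to picture what those two Set nodes do, here is a minimal Python stand-in (the function names are ours, not n8n’s — the logic is just field normalization):

```python
# Hypothetical stand-ins for the two Set nodes; n8n does this with
# expressions, but both nodes simply normalize input to a `url` field.
def map_url_fields(item: dict) -> dict:
    """Mirror of `url = {{$json.url}}` in Map URL Fields."""
    return {"url": item["url"]}

def map_query_fields(item: dict) -> dict:
    """Mirror of `url = {{$json.query}}` in Map Query Fields (fallback tool)."""
    return {"url": item["query"]}
```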
Step 4: Configure ScrapingBee Requests
Capture both screenshots and HTML for AI extraction, with an HTML fallback path if the image is insufficient.
- In ScrapingBee Screenshot Fetch, set the URL to `https://app.scrapingbee.com/api/v1` and keep Send Query enabled.
- Under query parameters, set api_key to your key (replace `[CONFIGURE_YOUR_API_KEY]`) and url to `{{$json.url}}`.
- Add screenshot_full_page with value `true` and keep the User-Agent header as provided.
- In ScrapingBee Fetch HTML, set the URL to `https://app.scrapingbee.com/api/v1` and add query parameters api_key (your key) and url set to `{{$json.url}}`.
- Ensure ScrapingBee Fetch HTML connects to HTML Convert Markdown, which uses HTML set to `{{$json.data}}`.
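For reference, here is a sketch in Python of the request the Screenshot Fetch node ends up sending. It only builds the URL with the parameters listed above; the function name is illustrative, and you’d pass the result to whatever HTTP client you like:

```python
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1"

def build_screenshot_request(api_key: str, target_url: str) -> str:
    """Assemble the query string the Screenshot Fetch node sends.

    Illustrative helper; the parameters mirror the bullets above.
    """
    params = {
        "api_key": api_key,              # replaces [CONFIGURE_YOUR_API_KEY]
        "url": target_url,               # fed from {{$json.url}} in n8n
        "screenshot_full_page": "true",  # full-page capture for the agent
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"
```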
Both ScrapingBee nodes ship with the placeholder `[CONFIGURE_YOUR_API_KEY]`. Replace these placeholders with your real ScrapingBee API key or requests will fail.
Step 5: Set Up the Vision Extraction Agent
Configure the AI extraction logic, the language model, and structured parser to return consistent product data.
- Open Vision Extraction Agent and keep Text set to `=Here is the screenshot you need to use to extract data about the page: {{ $json.url }}`.
- Confirm Vision Extraction Agent has Has Output Parser enabled and Passthrough Binary Images set to `true`.
- Connect Gemini Chat Model as the language model with Model Name set to `models/gemini-1.5-pro-latest`.
- Credential Required: Connect your `googlePalmApi` credentials to Gemini Chat Model.
- Attach Structured Data Parser to enforce the JSON schema shown in its JSON Schema Example.
- Attach HTML Fallback Tool as a tool to Vision Extraction Agent for HTML-based extraction when images are insufficient.
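To see why the structured parser matters, here is an illustrative Python guard that does the same job outside n8n. The field names (`product_title`, `promo`, `promo_percentage`) are borrowed from the template’s sheet mappings; swap in whatever your JSON Schema Example actually defines:

```python
import json

# Field names are illustrative, taken from this template's column
# mappings; adjust REQUIRED_FIELDS to match your own schema.
REQUIRED_FIELDS = {"product_title", "promo", "promo_percentage"}

def parse_agent_output(raw: str) -> list:
    """Parse the agent's JSON and reject items missing required fields,
    so malformed extractions never reach the sheet."""
    items = json.loads(raw)["output"]
    bad = [item for item in items if not REQUIRED_FIELDS <= item.keys()]
    if bad:
        raise ValueError(f"{len(bad)} item(s) missing required fields")
    return items
```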
Step 6: Configure the Output Flow
Split the AI output into individual items and append each record to your results sheet.
- In Expand Output Items, set Field To Split Out to `output`.
- Verify the execution flow: Vision Extraction Agent → Expand Output Items → Append Sheet Rows.
- In Append Sheet Rows, confirm the mappings for `product_url` → `{{$json.product_title}}` and `promo_percent` → `{{$json.promo_percentage}}`.
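Conceptually, Expand Output Items plus Append Sheet Rows behave like this short Python sketch. The column order and item keys are illustrative, taken from the mappings described above:

```python
# Illustrative mirror of Expand Output Items + Append Sheet Rows:
# split the agent's `output` array into one sheet row per item.
def to_rows(agent_result: dict, source_url: str) -> list:
    rows = []
    for item in agent_result["output"]:    # Field To Split Out: output
        rows.append([
            item.get("product_title"),     # product_url column (per template)
            item.get("promo"),             # promo column
            item.get("promo_percentage"),  # promo_percent column
            source_url,                    # category column, from Map URL Fields
        ])
    return rows
```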
Step 7: Test and Activate Your Workflow
Run a manual test to verify extraction output and sheet updates before enabling automated use.
- Click Execute Workflow on Manual Start Trigger to run the workflow end-to-end.
- Confirm that Retrieve URL Sheet reads URLs and ScrapingBee Screenshot Fetch returns screenshot data.
- Check that Vision Extraction Agent outputs structured items and Expand Output Items splits them correctly.
- Verify new rows appear in the Append Sheet Rows destination sheet with populated fields.
- When satisfied, toggle the workflow to Active for production use.
Common Gotchas
- Google Sheets credentials can expire or lack access to the target spreadsheet. If appending fails, check the n8n credential connection and confirm the sheet is shared with the right Google account.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- ScrapingBee can return incomplete pages if the target site blocks bots or requires heavy client-side rendering. Check the ScrapingBee response (status code and screenshot) first, then adjust ScrapingBee options or slow batch size to avoid rate limits.
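For the rate-limit case specifically, a simple exponential-backoff wrapper is the usual fix. This is a generic sketch, not part of the template; in n8n you’d approximate it with Wait nodes and smaller batches:

```python
import time

def fetch_with_backoff(fetch, retries: int = 3, base_delay: float = 2.0):
    """Retry a rate-limited call with exponential backoff (2s, 4s, 8s...).

    `fetch` is any zero-argument callable, e.g. one ScrapingBee request.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise          # out of retries, surface the error
            time.sleep(base_delay * 2 ** attempt)
```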
Frequently Asked Questions
How long does setup take?
About 30–60 minutes if your APIs are ready.
Do I need coding skills?
No. You’ll mainly connect accounts, add your API keys, and tweak the fields you want extracted.
Is there a free way to run this?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in ScrapingBee and Gemini API usage costs.
Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize which fields get extracted?
Yes, and it’s the point of the template. You’ll adjust the schema in the Structured Data Parser and the prompt/instructions used by the Vision Extraction Agent so the JSON fields match what you track (price, currency, availability, SKU, shipping, and so on). If a site is hard for vision extraction, you can lean more on the HTML fallback path by tuning what gets passed into the HTML Convert Markdown step. You can also add extra columns in your Google Sheet (like “brand” or “category”) and map them in the “Map URL Fields” node so those labels are carried into every output row.
What should I check when ScrapingBee requests fail?
Most of the time it’s an invalid or expired API key, or the target site is blocking automated traffic. Check the ScrapingBee HTTP response and screenshot output in n8n first, because it will usually show you what the scraper actually received. If you see rate limiting, reduce how many URLs you process per run or add a longer wait between batches.
How many URLs can it handle per run?
Hundreds per run is common, but it depends on your ScrapingBee limits, AI usage costs, and how aggressively you batch requests.
Could I build this in Zapier or Make instead?
Often, yes. Zapier and Make can call APIs, but this workflow leans on more involved logic: screenshot-first extraction, structured parsing, fallback to HTML, and item expansion into clean rows. n8n handles branching and heavier workflows without forcing you into expensive task pricing for every conditional path. If you only need a simple “call endpoint → append row” flow, Zapier or Make can be quicker. If you want this level of reliability, n8n is usually the smoother route. Talk to an automation expert if you want help deciding.
Once this is running, your sheet stops being a “cleanup project” and starts being a dependable system. Honestly, that peace of mind is the whole win.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.