Scrape.do + Google Sheets, clean Amazon rows fast
Trying to compare Amazon products in a spreadsheet sounds simple. Then you spend your afternoon opening tabs, hunting for price and rating changes, and pasting messy snippets that never line up.
The pain hits marketers running competitor research first, but e-commerce operators tracking pricing and analysts building datasets feel the same drag. You end up with half-finished sheets and numbers you don’t fully trust.
This workflow pulls URLs from Google Sheets, scrapes reliably with Scrape.do, and writes back clean rows (name, price, rating, reviews, description). You’ll see how it works, what you need, and where teams usually get stuck.
How This Automation Works
See how this solves the problem:
n8n Workflow Template: Scrape.do + Google Sheets, clean Amazon rows fast
```mermaid
flowchart LR
    n0(["When clicking Test workflow"]) --> n3["1. Get Product URLs from Google Sheets"]
    n3 --> n4["2. Loop Through Each URL"]
    n4 --> n5["3. Scrape Product Page HTML"]
    n5 --> n6["4. Extract Raw Data Elements"]
    n6 --> n7["5. Clean & Structure Data with AI"]
    n7 --> n8["6. Format Final JSON Output"]
    n8 --> n9["7. Save Product Data to Google Sheets"]
    n9 --> n4
    n1["OpenAI Chat Model"] -.-> n7
    n2["Structured Output Parser"] -.-> n7
```
The Challenge: Amazon research turns into copy-paste chaos
Amazon pages are great for shoppers and awful for spreadsheets. One product shows price in a clean spot, another hides it behind a variant selector, and suddenly your “quick comparison” has five browser windows and a sticky note of guesses. Even when you do get the right numbers, you still have to normalize them into columns so you can sort, filter, and actually make a decision. And the worst part is the staleness: you do all that work, then prices change tomorrow and your sheet quietly becomes wrong.
It adds up fast. Here’s where it usually breaks down.
- Manually checking 20 products can burn about 2 hours, and you still miss things like review count changes.
- Different page layouts lead to inconsistent rows, so comparisons stop being apples-to-apples.
- Basic scrapers often get blocked or return partial HTML, which means you waste time debugging instead of researching.
- When you need weekly refreshes, the work becomes a recurring chore that never stays “done.”
The Fix: scrape Amazon URLs from Sheets and write back clean rows
This workflow turns your Google Sheet into the control center for Amazon product research. You keep a simple list of product URLs in an “input” tab, then run the automation when you want fresh data. n8n reads those URLs, processes them in batches, and sends each one to Scrape.do through an HTTP request so you get the page HTML reliably (even when Amazon tries to block automated traffic). Next, the workflow cleans the HTML and pulls out the pieces that matter. Finally, an OpenAI-powered extraction step verifies and structures the fields, so your output stays consistent even when Amazon’s layout changes. The results get appended back into a “results” tab in Google Sheets, ready to sort and compare.
The workflow starts with a manual launch trigger, then reads product links from Google Sheets. Scrape.do fetches each page, AI turns messy HTML into predictable fields, and Google Sheets receives a neat row per product. No tab juggling.
What Changes: Before vs. After
| What This Eliminates | Impact You’ll See |
|---|---|
| Manually opening and checking each product page | Roughly 2.5 hours back per 30-product batch |
| Inconsistent copy-paste rows from varying page layouts | Uniform columns (name, price, rating, reviews, description) you can sort and filter |
| Blocked requests and partial HTML from basic scrapers | Reliable page fetches through Scrape.do |
| Sheets that quietly go stale | A refresh you can rerun weekly on demand |
Real-World Impact
Say you track 30 competitor products each week. Manually, you’ll spend maybe 5 minutes per product opening the page, finding price, rating, and review count, then formatting the row, which is about 2.5 hours total. With this workflow, you paste the 30 URLs into Google Sheets and run it: a minute to start, then it processes in batches and writes structured rows back automatically. You get the same dataset without the repetitive work.
Requirements
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Sheets for the input list and results table.
- Scrape.do to fetch Amazon HTML reliably.
- Scrape.do API token (get it from your Scrape.do dashboard).
- OpenAI or OpenRouter API key (get it from your provider’s API keys page).
Skill level: Intermediate. You’ll paste credentials, set sheet IDs/tab names, and map a few columns.
Need help implementing this? Talk to an automation expert (free 15-minute consultation).
The Workflow Flow
You start the run manually. In n8n, the Manual Launch Trigger kicks off the workflow when you want a refresh, which is perfect for weekly research or before a pricing decision.
Google Sheets provides the URL queue. The workflow reads your tracking tab (the one holding Amazon product links) and prepares those URLs for batch processing so you don’t overwhelm anything.
Scrape.do fetches the product HTML. n8n sends each URL through an HTTP Request node using your Scrape.do token, then strips out irrelevant scripts and markup so the next step has cleaner input.
AI turns messy pages into consistent columns. The OpenAI Chat Model plus a structured output parser extracts name, price, rating, review count, and a usable description, then formats everything into predictable JSON fields.
Google Sheets gets a clean row per product. The final append step writes results into your results tab, so you can filter by rating, sort by price, or export to Excel if that’s your reporting flow.
You can easily modify the extracted fields to include things like brand, ASIN, or bullet features based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
Start the workflow manually so you can validate each step before running at scale.
- Add a Manual Launch Trigger node as the workflow trigger.
- Connect Manual Launch Trigger to Retrieve Product Links to start the data pipeline.
Step 2: Connect Google Sheets
Pull product URLs from a Google Sheet and prepare a destination sheet for structured outputs.
- Open Retrieve Product Links and set Document to `[YOUR_ID]` (example: Amazon Product List).
- Set Sheet to `[YOUR_ID]` (example: Sheet1) in Retrieve Product Links.
- Credential Required: Connect your googleSheetsOAuth2Api credentials in Retrieve Product Links.
- Open Append to Sheets and set Operation to `append`.
- Set Document to `[YOUR_ID]` and Sheet to `[YOUR_ID]` (example: Sheet2) in Append to Sheets.
- Credential Required: Connect your googleSheetsOAuth2Api credentials in Append to Sheets.
Note: Make sure your input sheet has a column named `url` so Fetch Page HTML can use `{{$json.url}}`.

Step 3: Batch and Scrape Product Pages
Split URLs into batches and request each page’s HTML using a scraping API.
- Connect Retrieve Product Links to Iterate URL Batches to enable batch processing.
- Connect Iterate URL Batches to Fetch Page HTML so each URL is scraped.
- In Fetch Page HTML, set URL to `=https://api.scrape.do/?token={{$vars.SCRAPEDO_TOKEN}}&url={{ encodeURIComponent($json.url) }}&geoCode=us&render=false`.
- Keep Options → Timeout at `60000` to avoid premature timeouts on slow pages.
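If you want to sanity-check the request outside n8n, the same URL can be built in a few lines of Python. This is a sketch, not part of the workflow; `TOKEN123` is a placeholder, and `urlencode` percent-encodes the target URL much like `encodeURIComponent` does in the n8n expression:

```python
from urllib.parse import urlencode

def build_scrapedo_url(product_url: str, token: str) -> str:
    """Mirror the Fetch Page HTML node: token, encoded target URL, geo and render options."""
    params = {
        "token": token,
        "url": product_url,  # urlencode percent-encodes this value
        "geoCode": "us",
        "render": "false",
    }
    return "https://api.scrape.do/?" + urlencode(params)

# Placeholder token for illustration only
url = build_scrapedo_url("https://www.amazon.com/dp/B000000000?ref=x", "TOKEN123")
```

Fetching `url` with any HTTP client should then return the product page HTML, assuming the token is valid.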
Note: The request URL references `$vars.SCRAPEDO_TOKEN`. Make sure you’ve defined this variable in n8n or the HTTP request will fail.

Step 4: Extract Raw Fields from HTML
Parse product details from the HTML response using CSS selectors.
- Connect Fetch Page HTML to Extract Raw Fields.
- Set Operation in Extract Raw Fields to `extractHtmlContent`.
- Review the extraction keys and selectors, such as productTitle with `#productTitle, h1[data-automation-id="product-title"], .product-title` and price with `.a-price .a-offscreen, .a-price-whole, .a-price-fraction, .priceToPay .a-price .a-offscreen`.
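It can help to test selectors against a saved HTML snippet before wiring them into the node. The n8n HTML node matches CSS selectors; the stdlib sketch below is a simplified stand-in that matches only by `id` and class, just to show what the extraction step produces:

```python
from html.parser import HTMLParser

class ProductExtractor(HTMLParser):
    """Grab text from #productTitle and the first .a-offscreen price span."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._capture = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == "productTitle":
            self._capture = "productTitle"
        elif "a-offscreen" in (attrs.get("class") or "").split() and "price" not in self.fields:
            self._capture = "price"

    def handle_data(self, data):
        if self._capture:
            self.fields[self._capture] = data.strip()
            self._capture = None

sample = ('<span id="productTitle"> Widget Pro </span>'
          '<span class="a-price"><span class="a-offscreen">$19.99</span></span>')
p = ProductExtractor()
p.feed(sample)
```

After `feed`, `p.fields` holds the same kind of raw values (`productTitle`, `price`) the Extract Raw Fields node passes downstream.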
Step 5: Set Up AI Data Structuring
Use an LLM to transform raw scraped fields into a clean, structured JSON output.
- Connect Extract Raw Fields to AI Data Structuring.
- In AI Data Structuring, set Text to `={{ JSON.stringify($json, null, 2) }}`.
- Ensure Has Output Parser is enabled in AI Data Structuring.
- Connect OpenAI Chat Engine to AI Data Structuring as the language model.
- In OpenAI Chat Engine, set Model to `gpt-4o-mini`, Max Tokens to `500`, Temperature to `0`, and Response Format to `json_object`.
- Connect Structured Result Parser to AI Data Structuring as the output parser and keep the schema as provided.
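The Structured Result Parser enforces a JSON contract on the model’s reply. If you ever want to validate the same contract yourself (say, in a Code node before appending to Sheets), a minimal check looks like this; the five field names match this workflow’s output, everything else is illustrative:

```python
import json

REQUIRED_FIELDS = {"name", "price", "rating", "reviews", "description"}

def parse_structured_reply(raw_reply: str) -> dict:
    """Parse the model's json_object response and fail loudly on missing keys."""
    data = json.loads(raw_reply)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {sorted(missing)}")
    return data

reply = ('{"name": "Widget Pro", "price": "$19.99", "rating": "4.5", '
         '"reviews": "1,234", "description": "A widget."}')
product = parse_structured_reply(reply)
```

Failing loudly here is deliberate: a silently incomplete row is the hardest kind of bad data to spot in a sheet.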
Step 6: Prepare and Append Structured Data
Flatten the AI response and append it to your output Google Sheet.
- Connect AI Data Structuring to Prepare JSON Fields.
- In Prepare JSON Fields, set Field to Split Out to `output`.
- Set Fields to Include to `output.name, output.description, output.rating, output.reviews, output.price`.
- Connect Prepare JSON Fields to Append to Sheets.
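Conceptually, Prepare JSON Fields just flattens the nested AI output into dotted column names that Sheets can auto-map. Recreated outside n8n, the flattening is a one-liner (a sketch, not the node’s actual implementation):

```python
def flatten_output(item: dict,
                   fields=("name", "description", "rating", "reviews", "price")) -> dict:
    """Turn {"output": {...}} into flat {"output.name": ..., "output.price": ...} columns."""
    output = item.get("output", {})
    return {f"output.{f}": output.get(f) for f in fields}

row = flatten_output({"output": {"name": "Widget Pro", "price": "$19.99",
                                 "rating": 4.5, "reviews": 1234,
                                 "description": "A widget."}})
```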
- Verify Append to Sheets is configured to auto-map input data in Columns.
Step 7: Test and Activate Your Workflow
Run a manual test to verify scraping, AI structuring, and sheet appending before activating.
- Click Execute Workflow and confirm Manual Launch Trigger fires correctly.
- Check that Fetch Page HTML returns HTML and Extract Raw Fields produces values like `productTitle` and `price`.
- Verify AI Data Structuring outputs a JSON object with `name`, `description`, `rating`, `reviews`, and `price`.
- Confirm new rows are appended in the destination sheet by Append to Sheets.
- Once verified, save the workflow and switch it to Active for production use.
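During that test run, it also helps to assert a row is actually usable before trusting the workflow at scale. Here is a small sanity check you could adapt; the field names come from this workflow, while the thresholds are arbitrary assumptions:

```python
def row_looks_valid(row: dict) -> bool:
    """Cheap sanity check: non-empty name, a parseable price, rating in the 0-5 range."""
    if not row.get("name"):
        return False
    try:
        price = float(str(row.get("price", "")).lstrip("$").replace(",", ""))
        rating = float(row.get("rating", -1))
    except ValueError:
        return False
    return price > 0 and 0 <= rating <= 5

ok = row_looks_valid({"name": "Widget Pro", "price": "$1,299.00", "rating": "4.5"})
bad = row_looks_valid({"name": "", "price": "$9.99", "rating": "4.0"})
```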
Watch Out For
- Google Sheets credentials can expire or need specific permissions. If things break, check the n8n Credentials screen and confirm the Sheet is shared with the connected Google account first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Common Questions
**How long does setup take?**
Usually about 30 minutes if your Google Sheet and API keys are ready.
**Can I set this up without coding skills?**
Yes. You’ll mostly connect accounts, paste tokens, and match columns in Google Sheets.
**Is n8n free to use?**
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Scrape.do usage and AI API costs (often a few cents per run, depending on how many products you process).
**Should I use n8n Cloud or self-host?**
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
**Can I customize what data gets extracted?**
You can change what gets extracted by updating the “Extract Raw Fields” cleaning step and the prompts used in the “AI Data Structuring” node. Common customizations include adding ASIN/brand, capturing feature bullets, and writing to a different results schema (like separate columns for “current price” and “list price”). If you want to swap the model, replace the OpenAI Chat Engine with your preferred provider and keep the same structured output format so Sheets stays consistent.
**What if Scrape.do returns errors or empty HTML?**
Usually it’s an invalid or missing SCRAPEDO_TOKEN in the HTTP request. Check your token in Scrape.do, update the value in n8n, and rerun a single URL first. If the HTML comes back empty, you may be hitting plan limits or sending the wrong URL format (some shortened Amazon links redirect oddly). Less common, but real: your workflow may be processing too many URLs too quickly, so lowering batch size can stabilize things.
**How many products can this handle per run?**
It scales mainly with your n8n plan and your Scrape.do/AI limits. On n8n Cloud Starter, most teams comfortably run small-to-medium weekly batches; if you self-host, execution count isn’t the bottleneck, your server and provider rate limits are. Practically, start with 20–50 URLs per run, confirm accuracy, then increase batch size once you see stable results.
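To see why batch size matters, here is the looping pattern in miniature. This is a stdlib sketch of the idea behind the Loop Through Each URL node, not the node itself; `fetch`, the batch size, and the pause are all placeholders:

```python
import time

def process_in_batches(urls, batch_size=10, pause_s=2.0, fetch=lambda u: u):
    """Work through URLs in chunks, pausing between chunks to stay under rate limits."""
    results = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        results.extend(fetch(u) for u in batch)
        if i + batch_size < len(urls):
            time.sleep(pause_s)  # breathing room between batches
    return results

out = process_in_batches([f"url{n}" for n in range(5)], batch_size=2, pause_s=0.0)
```

Smaller batches with longer pauses trade total runtime for fewer blocked or partial responses.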
**Is n8n the right tool for this, or should I use Zapier or Make?**
Often, yes, because you need batching, HTML cleanup, and structured AI extraction in one flow, and n8n handles that without turning it into a pile of paid steps. Zapier and Make can work, but multi-step scraping plus parsing gets expensive and fiddly fast. n8n also gives you the self-host option, which matters when this becomes a weekly habit. That said, if your goal is only “copy one value from one page,” those tools can feel quicker. Talk to an automation expert if you want a second opinion on stack fit.
Once this is running, your Amazon comparisons become a refresh button, not a recurring task. The workflow handles the repetitive parts so you can spend your time making the call.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.