Scrape.do + Google Sheets, clean Amazon rows fast
Trying to compare Amazon products in a spreadsheet sounds simple. Then you spend your afternoon opening tabs, hunting for price and rating changes, and pasting messy snippets that never line up.
The pain hits marketers running competitor research first, but e-commerce operators tracking pricing and analysts building datasets feel the same drag. You end up with half-finished sheets and numbers you don’t fully trust.
This workflow pulls URLs from Google Sheets, scrapes reliably with Scrape.do, and writes back clean rows (name, price, rating, reviews, description). You’ll see how it works, what you need, and where teams usually get stuck.
How This Automation Works
See how this solves the problem:
n8n Workflow Template: Scrape.do + Google Sheets, clean Amazon rows fast
```mermaid
flowchart LR
    n0(["When clicking Test workflow"]) --> n3["1. Get Product URLs from Google Sheets"]
    n3 --> n4["2. Loop Through Each URL"]
    n4 --> n5["3. Scrape Product Page HTML"]
    n5 --> n6["4. Extract Raw Data Elements"]
    n6 --> n7["5. Clean & Structure Data with AI"]
    n7 --> n8["6. Format Final JSON Output"]
    n8 --> n9["7. Save Product Data to Google Sheets"]
    n9 --> n4
    n1["OpenAI Chat Model"] -.-> n7
    n2["Structured Output Parser"] -.-> n7
```
The Challenge: Amazon research turns into copy-paste chaos
Amazon pages are great for shoppers and awful for spreadsheets. One product shows price in a clean spot, another hides it behind a variant selector, and suddenly your “quick comparison” has five browser windows and a sticky note of guesses. Even when you do get the right numbers, you still have to normalize them into columns so you can sort, filter, and actually make a decision. And the worst part is the staleness: you do all that work, then prices change tomorrow and your sheet quietly becomes wrong.
It adds up fast. Here’s where it usually breaks down.
- Manually checking 20 products can burn about 2 hours, and you still miss things like review count changes.
- Different page layouts lead to inconsistent rows, so comparisons stop being apples-to-apples.
- Basic scrapers often get blocked or return partial HTML, which means you waste time debugging instead of researching.
- When you need weekly refreshes, the work becomes a recurring chore that never stays “done.”
The Fix: scrape Amazon URLs from Sheets and write back clean rows
This workflow turns your Google Sheet into the control center for Amazon product research. You keep a simple list of product URLs in an “input” tab, then run the automation when you want fresh data. n8n reads those URLs, processes them in batches, and sends each one to Scrape.do through an HTTP request so you get the page HTML reliably (even when Amazon tries to block automated traffic). Next, the workflow cleans the HTML and pulls out the pieces that matter. Finally, an OpenAI-powered extraction step verifies and structures the fields, so your output stays consistent even when Amazon’s layout changes. The results get appended back into a “results” tab in Google Sheets, ready to sort and compare.
The workflow starts with a manual launch trigger, then reads product links from Google Sheets. Scrape.do fetches each page, AI turns messy HTML into predictable fields, and Google Sheets receives a neat row per product. No tab juggling.
What Changes: Before vs. After
| What This Eliminates | Impact You’ll See |
|---|---|
| Manually opening and checking each product page | Roughly 2.5 hours back per 30-product batch |
| Inconsistent copy-paste rows from varying page layouts | Uniform columns (name, price, rating, reviews, description) you can sort and filter |
| Blocked requests and partial HTML from basic scrapers | Reliable page fetches through Scrape.do |
| Sheets that quietly go stale | A refresh you can rerun weekly on demand |
Real-World Impact
Say you track 30 competitor products each week. Manually, you’ll spend maybe 5 minutes per product opening the page, finding price, rating, and review count, then formatting the row, which is about 2.5 hours total. With this workflow, you paste the 30 URLs into Google Sheets and run it: a minute to start, then it processes in batches and writes structured rows back automatically. You get the same dataset without the repetitive work.
Requirements
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Sheets for the input list and results table.
- Scrape.do to fetch Amazon HTML reliably.
- Scrape.do API token (get it from your Scrape.do dashboard).
- OpenAI or OpenRouter API key (get it from your provider’s API keys page).
Skill level: Intermediate. You’ll paste credentials, set sheet IDs/tab names, and map a few columns.
Need help implementing this? Talk to an automation expert (free 15-minute consultation).
The Workflow Flow
You start the run manually. In n8n, the Manual Launch Trigger kicks off the workflow when you want a refresh, which is perfect for weekly research or before a pricing decision.
Google Sheets provides the URL queue. The workflow reads your tracking tab (the one holding Amazon product links) and prepares those URLs for batch processing so you don’t overwhelm anything.
Scrape.do fetches the product HTML. n8n sends each URL through an HTTP Request node using your Scrape.do token, then strips out irrelevant scripts and markup so the next step has cleaner input.
AI turns messy pages into consistent columns. The OpenAI Chat Model plus a structured output parser extracts name, price, rating, review count, and a usable description, then formats everything into predictable JSON fields.
Google Sheets gets a clean row per product. The final append step writes results into your results tab, so you can filter by rating, sort by price, or export to Excel if that’s your reporting flow.
You can easily modify the extracted fields to include things like brand, ASIN, or bullet features based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
Start the workflow manually so you can validate each step before running at scale.
- Add a Manual Launch Trigger node as the workflow trigger.
- Connect Manual Launch Trigger to Retrieve Product Links to start the data pipeline.
Step 2: Connect Google Sheets
Pull product URLs from a Google Sheet and prepare a destination sheet for structured outputs.
- Open Retrieve Product Links and set Document to `[YOUR_ID]` (example: Amazon Product List).
- Set Sheet to `[YOUR_ID]` (example: Sheet1) in Retrieve Product Links.
- Credential Required: Connect your googleSheetsOAuth2Api credentials in Retrieve Product Links.
- Open Append to Sheets and set Operation to `append`.
- Set Document to `[YOUR_ID]` and Sheet to `[YOUR_ID]` (example: Sheet2) in Append to Sheets.
- Credential Required: Connect your googleSheetsOAuth2Api credentials in Append to Sheets.
Note: Make sure your input sheet has a column named `url` so Fetch Page HTML can use `{{$json.url}}`.

Step 3: Batch and Scrape Product Pages
Split URLs into batches and request each page’s HTML using a scraping API.
- Connect Retrieve Product Links to Iterate URL Batches to enable batch processing.
- Connect Iterate URL Batches to Fetch Page HTML so each URL is scraped.
- In Fetch Page HTML, set URL to `=https://api.scrape.do/?token={{$vars.SCRAPEDO_TOKEN}}&url={{ encodeURIComponent($json.url) }}&geoCode=us&render=false`.
- Keep Options → Timeout at `60000` to avoid premature timeouts on slow pages.
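If you want to sanity-check the request outside n8n, the same URL can be built in a few lines of Python. This is a sketch, not part of the workflow; `TOKEN123` is a placeholder, and `urlencode` percent-encodes the target URL much like `encodeURIComponent` does in the n8n expression:

```python
from urllib.parse import urlencode

def build_scrapedo_url(product_url: str, token: str) -> str:
    """Mirror the Fetch Page HTML node: token, encoded target URL, geo and render options."""
    params = {
        "token": token,
        "url": product_url,  # urlencode percent-encodes this value
        "geoCode": "us",
        "render": "false",
    }
    return "https://api.scrape.do/?" + urlencode(params)

# Placeholder token for illustration only
url = build_scrapedo_url("https://www.amazon.com/dp/B000000000?ref=x", "TOKEN123")
```

Fetching `url` with any HTTP client should then return the product page HTML, assuming the token is valid.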
Note: The request URL references `$vars.SCRAPEDO_TOKEN`. Make sure you’ve defined this variable in n8n or the HTTP request will fail.

Step 4: Extract Raw Fields from HTML
Parse product details from the HTML response using CSS selectors.
- Connect Fetch Page HTML to Extract Raw Fields.
- Set Operation in Extract Raw Fields to `extractHtmlContent`.
- Review the extraction keys and selectors, such as productTitle with `#productTitle, h1[data-automation-id="product-title"], .product-title` and price with `.a-price .a-offscreen, .a-price-whole, .a-price-fraction, .priceToPay .a-price .a-offscreen`.
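It can help to test selectors against a saved HTML snippet before wiring them into the node. The n8n HTML node matches CSS selectors; the stdlib sketch below is a simplified stand-in that matches only by `id` and class, just to show what the extraction step produces:

```python
from html.parser import HTMLParser

class ProductExtractor(HTMLParser):
    """Grab text from #productTitle and the first .a-offscreen price span."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._capture = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == "productTitle":
            self._capture = "productTitle"
        elif "a-offscreen" in (attrs.get("class") or "").split() and "price" not in self.fields:
            self._capture = "price"

    def handle_data(self, data):
        if self._capture:
            self.fields[self._capture] = data.strip()
            self._capture = None

sample = ('<span id="productTitle"> Widget Pro </span>'
          '<span class="a-price"><span class="a-offscreen">$19.99</span></span>')
p = ProductExtractor()
p.feed(sample)
```

After `feed`, `p.fields` holds the same kind of raw values (`productTitle`, `price`) the Extract Raw Fields node passes downstream.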
Step 5: Set Up AI Data Structuring
Use an LLM to transform raw scraped fields into a clean, structured JSON output.
- Connect Extract Raw Fields to AI Data Structuring.
- In AI Data Structuring, set Text to `={{ JSON.stringify($json, null, 2) }}`.
- Ensure Has Output Parser is enabled in AI Data Structuring.
- Connect OpenAI Chat Engine to AI Data Structuring as the language model.
- In OpenAI Chat Engine, set Model to `gpt-4o-mini`, Max Tokens to `500`, Temperature to `0`, and Response Format to `json_object`.
- Connect Structured Result Parser to AI Data Structuring as the output parser and keep the schema as provided.
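The Structured Result Parser enforces a JSON contract on the model’s reply. If you ever want to validate the same contract yourself (say, in a Code node before appending to Sheets), a minimal check looks like this; the five field names match this workflow’s output, everything else is illustrative:

```python
import json

REQUIRED_FIELDS = {"name", "price", "rating", "reviews", "description"}

def parse_structured_reply(raw_reply: str) -> dict:
    """Parse the model's json_object response and fail loudly on missing keys."""
    data = json.loads(raw_reply)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {sorted(missing)}")
    return data

reply = ('{"name": "Widget Pro", "price": "$19.99", "rating": "4.5", '
         '"reviews": "1,234", "description": "A widget."}')
product = parse_structured_reply(reply)
```

Failing loudly here is deliberate: a silently incomplete row is the hardest kind of bad data to spot in a sheet.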
Step 6: Prepare and Append Structured Data
Flatten the AI response and append it to your output Google Sheet.
- Connect AI Data Structuring to Prepare JSON Fields.
- In Prepare JSON Fields, set Field to Split Out to `output`.
- Set Fields to Include to `output.name, output.description, output.rating, output.reviews, output.price`.
- Connect Prepare JSON Fields to Append to Sheets.
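Conceptually, Prepare JSON Fields just flattens the nested AI output into dotted column names that Sheets can auto-map. Recreated outside n8n, the flattening is a one-liner (a sketch, not the node’s actual implementation):

```python
def flatten_output(item: dict,
                   fields=("name", "description", "rating", "reviews", "price")) -> dict:
    """Turn {"output": {...}} into flat {"output.name": ..., "output.price": ...} columns."""
    output = item.get("output", {})
    return {f"output.{f}": output.get(f) for f in fields}

row = flatten_output({"output": {"name": "Widget Pro", "price": "$19.99",
                                 "rating": 4.5, "reviews": 1234,
                                 "description": "A widget."}})
```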
- Verify Append to Sheets is configured to auto-map input data in Columns.
Step 7: Test and Activate Your Workflow
Run a manual test to verify scraping, AI structuring, and sheet appending before activating.
- Click Execute Workflow and confirm Manual Launch Trigger fires correctly.
- Check that Fetch Page HTML returns HTML and Extract Raw Fields produces values like `productTitle` and `price`.
- Verify AI Data Structuring outputs a JSON object with `name`, `description`, `rating`, `reviews`, and `price`.
- Confirm new rows are appended in the destination sheet by Append to Sheets.
- Once verified, save the workflow and switch it to Active for production use.
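During that test run, it also helps to assert a row is actually usable before trusting the workflow at scale. Here is a small sanity check you could adapt; the field names come from this workflow, while the thresholds are arbitrary assumptions:

```python
def row_looks_valid(row: dict) -> bool:
    """Cheap sanity check: non-empty name, a parseable price, rating in the 0-5 range."""
    if not row.get("name"):
        return False
    try:
        price = float(str(row.get("price", "")).lstrip("$").replace(",", ""))
        rating = float(row.get("rating", -1))
    except ValueError:
        return False
    return price > 0 and 0 <= rating <= 5

ok = row_looks_valid({"name": "Widget Pro", "price": "$1,299.00", "rating": "4.5"})
bad = row_looks_valid({"name": "", "price": "$9.99", "rating": "4.0"})
```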
Watch Out For
- Google Sheets credentials can expire or need specific permissions. If things break, check the n8n Credentials screen and confirm the Sheet is shared with the connected Google account first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Common Questions
**How long does setup take?**
Usually about 30 minutes if your Google Sheet and API keys are ready.
**Can I set this up without coding skills?**
Yes. You’ll mostly connect accounts, paste tokens, and match columns in Google Sheets.
**Is n8n free to use?**
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Scrape.do usage and AI API costs (often a few cents per run, depending on how many products you process).
**Should I use n8n Cloud or self-host?**
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
**Can I customize what data gets extracted?**
You can change what gets extracted by updating the “Extract Raw Fields” cleaning step and the prompts used in the “AI Data Structuring” node. Common customizations include adding ASIN/brand, capturing feature bullets, and writing to a different results schema (like separate columns for “current price” and “list price”). If you want to swap the model, replace the OpenAI Chat Engine with your preferred provider and keep the same structured output format so Sheets stays consistent.
**What if Scrape.do returns errors or empty HTML?**
Usually it’s an invalid or missing SCRAPEDO_TOKEN in the HTTP request. Check your token in Scrape.do, update the value in n8n, and rerun a single URL first. If the HTML comes back empty, you may be hitting plan limits or sending the wrong URL format (some shortened Amazon links redirect oddly). Less common, but real: your workflow may be processing too many URLs too quickly, so lowering batch size can stabilize things.
**How many products can this handle per run?**
It scales mainly with your n8n plan and your Scrape.do/AI limits. On n8n Cloud Starter, most teams comfortably run small-to-medium weekly batches; if you self-host, execution count isn’t the bottleneck, your server and provider rate limits are. Practically, start with 20–50 URLs per run, confirm accuracy, then increase batch size once you see stable results.
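To see why batch size matters, here is the looping pattern in miniature. This is a stdlib sketch of the idea behind the Loop Through Each URL node, not the node itself; `fetch`, the batch size, and the pause are all placeholders:

```python
import time

def process_in_batches(urls, batch_size=10, pause_s=2.0, fetch=lambda u: u):
    """Work through URLs in chunks, pausing between chunks to stay under rate limits."""
    results = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        results.extend(fetch(u) for u in batch)
        if i + batch_size < len(urls):
            time.sleep(pause_s)  # breathing room between batches
    return results

out = process_in_batches([f"url{n}" for n in range(5)], batch_size=2, pause_s=0.0)
```

Smaller batches with longer pauses trade total runtime for fewer blocked or partial responses.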
**Is n8n the right tool for this, or should I use Zapier or Make?**
Often, yes, because you need batching, HTML cleanup, and structured AI extraction in one flow, and n8n handles that without turning it into a pile of paid steps. Zapier and Make can work, but multi-step scraping plus parsing gets expensive and fiddly fast. n8n also gives you the self-host option, which matters when this becomes a weekly habit. That said, if your goal is only “copy one value from one page,” those tools can feel quicker. Talk to an automation expert if you want a second opinion on stack fit.
Once this is running, your Amazon comparisons become a refresh button, not a recurring task. The workflow handles the repetitive parts so you can spend your time making the call.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.