Firecrawl + Google Sheets: site maps you can use
You grab a competitor’s URL, start clicking around, and 30 minutes later you’ve got 12 tabs open and no usable list of pages. Worse, you still don’t know what’s a product page, what’s a category, and what’s just fluff.
This Firecrawl + Google Sheets mapping automation is aimed first at marketers doing competitor research. But sales teams doing lead enrichment and agency operators building audits feel the same pain. The outcome is simple: a clean Google Sheet with the site mapped and URLs sorted into product, category, and “other” tabs.
You’ll see exactly how the workflow pulls a full internal sitemap, extracts company insights, classifies URLs with AI, and writes everything into structured Sheets you can filter immediately.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Firecrawl + Google Sheets: site maps you can use
flowchart LR
subgraph sg0["Form Submission Flow"]
direction LR
n0@{ icon: "mdi:location-exit", form: "rounded", label: "Map a website and get urls", pos: "b", h: 48 }
n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/form.svg' width='40' height='40' /></div><br/>Form Submission"]
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Scrape Website URL"]
n3@{ icon: "mdi:cog", form: "rounded", label: "Extract HTML", pos: "b", h: 48 }
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Clean HTML Content"]
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse JSON Data"]
n6@{ icon: "mdi:database", form: "rounded", label: "Update Domain Scraper Sheet", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse URLs with MetaData"]
n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse Array URLs"]
n9@{ icon: "mdi:swap-vertical", form: "rounded", label: "Split in Batches", pos: "b", h: 48 }
n10@{ icon: "mdi:robot", form: "rounded", label: "Categorising AI Agent", pos: "b", h: 48 }
n11@{ icon: "mdi:robot", form: "rounded", label: "Company Info Agent", pos: "b", h: 48 }
n12["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse All URLs with categories"]
n13@{ icon: "mdi:database", form: "rounded", label: "Append Categories", pos: "b", h: 48 }
n14@{ icon: "mdi:database", form: "rounded", label: "Append Products", pos: "b", h: 48 }
n15["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse Others"]
n16["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse Products"]
n17["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse Categories"]
n18@{ icon: "mdi:database", form: "rounded", label: "Append Others", pos: "b", h: 48 }
n3 --> n4
n15 --> n18
n18 --> n9
n16 --> n14
n14 --> n9
n1 --> n2
n5 --> n6
n8 --> n9
n17 --> n13
n9 --> n10
n13 --> n9
n4 --> n11
n11 --> n5
n2 --> n3
n10 --> n12
n7 --> n8
n0 --> n7
n6 --> n0
n12 --> n17
n12 --> n16
n12 --> n15
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n1 trigger
class n10,n11 ai
class n6,n13,n14,n18 database
class n2 api
class n4,n5,n7,n8,n12,n15,n16,n17 code
classDef customIcon fill:none,stroke:none
class n1,n2,n4,n5,n7,n8,n12,n15,n16,n17 customIcon
Why This Matters: Competitor Pages Are Hard to Catalog Fast
“Just map their site” sounds easy until you actually try it. You start from the homepage, click categories, open product pages, then hit a wall because navigation hides half the catalog behind filters and JavaScript. So you switch to a crawler, export a messy CSV, and still spend your afternoon sorting URLs by hand. It’s not only time. It’s context switching, second guessing, and redoing the same work next month because nothing is structured in a way your team can reuse.
The friction compounds. Here’s where it breaks down in real life.
- People copy and paste URLs into a sheet, but the list is incomplete and the naming is inconsistent.
- Exports from crawling tools often include tracking parameters and duplicates, so you waste time cleaning before you can even analyze.
- Someone has to manually label pages as “product” or “category,” and mistakes quietly ruin the conclusions you draw later.
- When you revisit the research, you can’t compare changes because last time’s data wasn’t mapped the same way.
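The cleanup pain in that second bullet is mechanical enough to automate. Here is a minimal sketch of a normalizer that strips common tracking parameters and deduplicates a URL list before you analyze it; the parameter list is an example, extend it for whatever your exports contain.

```javascript
// Hypothetical helper: normalize crawler-export URLs before analysis.
// Strips common tracking parameters, drops fragments, and deduplicates.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign",
  "utm_term", "utm_content", "gclid", "fbclid",
];

function normalizeUrls(urls) {
  const seen = new Set();
  const out = [];
  for (const raw of urls) {
    let u;
    try {
      u = new URL(raw);
    } catch (e) {
      continue; // skip malformed entries instead of crashing the run
    }
    TRACKING_PARAMS.forEach((p) => u.searchParams.delete(p));
    u.hash = ""; // fragments never identify a distinct page
    const clean = u.toString().replace(/\/$/, "");
    if (!seen.has(clean)) {
      seen.add(clean);
      out.push(clean);
    }
  }
  return out;
}
```

Running this first means "same page, different tracking tag" rows never reach your sheet in the first place.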
What You’ll Build: An AI-Classified Site Map in Google Sheets
This workflow turns a single website URL into a structured, reusable research asset. It starts when you submit a domain through an n8n form. The workflow fetches the homepage, extracts the main text, and cleans it up so an AI agent can pull high-level company insights like industry, audience, and whether the business is B2B or B2C. Then Firecrawl maps the site and returns internal URLs with helpful metadata. Those URLs get processed in batches, classified by an AI evaluator into product pages, category pages, or non-commerce pages, and finally written into dedicated Google Sheets tabs. You end with a spreadsheet that feels like it was prepared by an analyst, not dumped out of a crawler.
The workflow starts with intake and homepage interpretation in n8n. After that, Firecrawl handles the heavy lifting of mapping internal links. Finally, AI classification and Google Sheets outputs give you clean tabs you can filter, share, and reuse for audits or competitor tracking.
What You’re Building
| What Gets Automated | What You’ll Achieve |
|---|---|
| Mapping every internal URL with Firecrawl | A complete page list without manual clicking |
| Extracting company insights from the homepage | Industry, audience, and B2B/B2C context at a glance |
| AI classification of each URL in batches | Product, category, and “other” pages sorted automatically |
| Writing results into Google Sheets tabs | Filterable, shareable research you can reuse next month |
Expected Results
Say you map 5 competitor sites in a week. Manually, you might spend about 2 hours per site between clicking, copying URLs, and labeling pages, so that’s roughly a full day lost. With this workflow, submitting each site takes a minute, then you wait while Firecrawl maps URLs and the AI classifier sorts them in batches (often 10–20 minutes per site depending on size). That’s still real time, but it’s not your time, and the output is already in clean Google Sheets tabs.
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Firecrawl for mapping internal URLs at scale
- Google Sheets to store results in structured tabs
- LLM credentials, Gemini or compatible (get a key from Google AI Studio or your provider dashboard)
Skill level: Beginner. You’ll connect accounts, paste API keys, and update a target Google Sheet.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
Form submission starts the run. You paste a website URL into the n8n form trigger, which becomes the single source of truth for the rest of the workflow.
The homepage gets cleaned for analysis. n8n pulls the homepage HTML via HTTP Request, extracts the readable text, and sanitizes it so the AI agent can summarize the business without getting distracted by navigation and boilerplate.
Firecrawl maps the whole site. Once the domain is recorded in Google Sheets, Firecrawl fetches internal URLs and the workflow interprets URL metadata, then unpacks the URL array into items that can be processed safely.
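That unpacking step can be sketched as an n8n Code node. The field names here (links, domain, url) are assumptions, so match them to the shape your Firecrawl map response actually returns before reusing this.

```javascript
// Sketch of a Code node like "Parse Array URLs": turn one item holding a
// links array into one n8n item per URL, so Split in Batches can iterate them.
function unpackUrlArray(items) {
  const results = [];
  for (const item of items) {
    const links = item.json.links || [];
    for (const link of links) {
      results.push({
        json: {
          // Firecrawl may return plain strings or objects; handle both.
          url: typeof link === "string" ? link : link.url,
          sourceDomain: item.json.domain || null,
        },
      });
    }
  }
  return results;
}

// Inside an actual n8n Code node you would end with:
// return unpackUrlArray($input.all());
```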
URLs are classified and written into tabs. A split-in-batches loop sends URLs to the AI evaluator, a sorting step routes them into product/category/other branches, and Google Sheets nodes append rows into the right place.
You can easily modify the classification rules to exclude blogs or legal pages based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Form Trigger
Set up the workflow entry point so submissions kick off the URL classification flow.
- Add and open Form Intake Trigger.
- Use the default webhook settings generated by the node (the webhook ID is created automatically).
- Ensure the form fields collect the website URL needed by Fetch Website Source.
- Tip: name the form field website or url in the form response to simplify mapping in downstream code nodes.
Step 2: Connect Website Fetch and Extraction
Configure the initial crawl, HTML retrieval, and text cleanup chain.
- Open Fetch Website Source and set the request to the submitted URL (if required by your form structure).
- Verify the connection from Fetch Website Source → HTML Extraction → Sanitize HTML Text.
- Open HTML Extraction and configure selectors for the content you want to analyze (e.g., body text or specific elements).
- Review Sanitize HTML Text to ensure it outputs clean text for the AI model.
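If you want to see roughly what Sanitize HTML Text should produce, here is a minimal regex-based sketch. It is a simplification, the actual node may use a proper HTML parser, but the goal is the same: hand the AI prose, not markup.

```javascript
// Rough sketch of an HTML-to-text cleanup pass: drop scripts, styles, and
// tags, decode non-breaking spaces, and collapse whitespace.
function sanitizeHtml(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")   // drop remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/\s+/g, " ")       // collapse runs of whitespace
    .trim();
}
```

A quick sanity check: feeding it a small homepage snippet should return only the visible text, which is exactly what the Company Insight Agent needs.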
Step 3: Set Up AI Classification and Insights
Use Gemini to generate company insights and categorize URLs.
- Open Company Insight Agent and configure the prompt to summarize the company based on sanitized text.
- Credential Required: Connect your Google Gemini credentials in Company Insight Agent.
- Open Category AI Evaluator and configure the prompt to categorize URL batches.
- Credential Required: Connect your Google Gemini credentials in Category AI Evaluator.
- Confirm the flow: Sanitize HTML Text → Company Insight Agent → Decode JSON Payload → Update Domain Sheet → Website Map & URL Fetch → Interpret URL Metadata → Unpack URL Array → Batch Iterator → Category AI Evaluator.
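The Decode JSON Payload step in that chain exists because LLMs often wrap their JSON in markdown fences or stray prose. A defensive parse, sketched below as a plain function, keeps one malformed response from killing the run; the fallback-to-null behavior is a design choice you can pair with an IF node.

```javascript
// Sketch of a "decode the model's JSON" step: try a straight parse first,
// then fall back to the first {...} block found in the text.
function decodeJsonPayload(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    const match = text.match(/\{[\s\S]*\}/);
    if (match) {
      try {
        return JSON.parse(match[0]);
      } catch (e2) {
        /* fall through */
      }
    }
    return null; // let a downstream branch handle failures explicitly
  }
}
```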
Step 4: Configure URL Processing and Parallel Routing
Split and route categorized URLs into separate paths and prepare rows for Sheets.
- Open Sort URLs by Category and confirm it outputs structured buckets for category, product, and misc links.
- Verify parallel execution: Sort URLs by Category fans out to Process Category Links, Process Product Links, and Process Misc Links in parallel.
- Review the code nodes (Process Category Links, Process Product Links, Process Misc Links, plus other code nodes in the flow) to ensure they map the URL data into row-ready structures.
- Confirm Batch Iterator is connected after each append step to continue processing additional URL batches.
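The "row-ready structures" those code nodes produce can be sketched like this. The column names (url, category, title) are assumptions, match them to your Sheet's actual headers so the append nodes line up.

```javascript
// Sketch of the row mapping done in nodes like "Process Product Links":
// reshape classified items into objects whose keys match the Sheet columns.
function toSheetRows(items, category) {
  return items.map((item) => ({
    json: {
      url: item.json.url,
      category, // "product", "category", or "other"
      title: item.json.title || "",
      scrapedAt: new Date().toISOString().slice(0, 10), // YYYY-MM-DD
    },
  }));
}
```

Keeping this mapping identical across all three branches is what makes the tabs comparable later.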
- Tip: keep field names consistent (url, category, title) to avoid mismatches across branches.
Step 5: Configure Google Sheets Outputs
Send the classified data to the appropriate Sheets tabs.
- Open Update Domain Sheet and set the target spreadsheet and sheet for domain-level data.
- Open Append Category Rows, Append Product Rows, and Append Misc Rows and set the spreadsheet and sheet names for each category.
- Credential Required: Connect your Google Sheets credentials in Update Domain Sheet, Append Category Rows, Append Product Rows, and Append Misc Rows.
Step 6: Test and Activate Your Workflow
Run a manual test to validate the full path from form intake to Sheets output.
- Click Execute Workflow and submit a sample form response to Form Intake Trigger.
- Confirm successful execution through Fetch Website Source, HTML Extraction, Sanitize HTML Text, and the AI nodes.
- Check that Update Domain Sheet and the append nodes write new rows to the correct Sheets tabs.
- When satisfied, toggle the workflow to Active to enable production processing.
Troubleshooting Tips
- Google Sheets credentials can expire or need specific permissions. If things break, check the connected Google account in n8n’s Credentials and confirm the Sheet is shared with it.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Quick Answers
How long does setup take?
About 30 minutes if your credentials are ready.
Do I need coding skills?
No. You will connect accounts, paste API keys, and edit a few fields like your target Sheet.
Is it free to run?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Firecrawl and LLM API usage, which depends on how many pages you map.
Where should I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize how pages are classified?
Yes, and you probably should. You can adjust what counts as “product” or “category” by editing the prompt in the Category AI Evaluator, and you can add exclusions in the URL processing steps (like filtering out /blog, /careers, or /legal). Many teams also extend the Google Sheets output with extra columns like “priority,” “notes,” or “target keyword.” If you prefer OpenAI instead of Gemini, swap the chat model/agent node and keep the rest of the structure the same.
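That exclusion idea can be sketched as a small path filter you drop into the URL processing step. The prefixes below are examples; tune them to whatever your research should ignore.

```javascript
// Sketch of a path-based exclusion filter for the URL processing step.
const EXCLUDED_PREFIXES = ["/blog", "/careers", "/legal", "/privacy"];

function filterUrls(urls) {
  return urls.filter((u) => {
    let path;
    try {
      path = new URL(u).pathname;
    } catch (e) {
      return false; // drop malformed URLs too
    }
    // Match the prefix as a whole path segment, so "/blogging" survives
    // while "/blog" and "/blog/post" are excluded.
    return !EXCLUDED_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
  });
}
```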
What if Firecrawl fails to map a site?
Usually it’s an API key problem or the wrong Firecrawl base settings in n8n. Regenerate the Firecrawl API key, update it in Credentials, then rerun with a small site first. If the mapping returns empty, the domain may block crawlers or require cookies, so you may need to adjust crawl settings or start from a different entry URL.
How many URLs can this handle?
A lot, as long as you batch it. On n8n Cloud, your limit is mostly your monthly executions and how many URLs you choose to classify; on self-hosting, the practical limit is server resources and API rate limits. The workflow is already designed with Split in Batches, which keeps big sites from crashing the run. If you’re mapping thousands of URLs regularly, reduce what you classify, increase batch size carefully, and expect longer waits.
Should I use n8n, Zapier, or Make for this?
For this kind of multi-step crawling and classification, n8n is usually the smoother choice because batching, branching, and code-based cleanup don’t get awkward or expensive. Zapier and Make can do parts of it, but large URL lists often turn into lots of billable tasks. n8n also gives you the option to self-host, which matters once you’re running this every week. If you only want a simple “URL in, row out” flow for a handful of pages, the lighter tools can be fine. Talk to an automation expert if you’re unsure which route fits your volume.
Once this is in place, competitor research stops being a one-off scramble and becomes a dataset you can build on. The workflow handles the sorting, and you get your headspace back.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.