Firecrawl to Google Sheets, clean sitemap hierarchy
You grab a sitemap, paste it into a spreadsheet, and five minutes later it’s chaos. Duplicate URLs. Weird encoding. No parent-child structure. Now you’re “auditing” while basically doing data cleanup.
This Firecrawl Sheets sitemap automation hits SEO specialists first, but web developers doing migrations and marketing leads prepping audits feel the drag too. The workflow turns one website URL into a clean, hierarchical sitemap in Google Sheets, so you can review architecture without losing an afternoon.
Below you’ll see exactly what the workflow produces, how it works in plain English, and what you need to run it reliably.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Firecrawl to Google Sheets, clean sitemap hierarchy
flowchart LR
subgraph sg0["When chat message received Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When chat message received", pos: "b", h: 48 }
n1@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Firecrawl OK", pos: "b", h: 48 }
n2@{ icon: "mdi:cog", form: "rounded", label: "Copy template", pos: "b", h: 48 }
n3@{ icon: "mdi:database", form: "rounded", label: "Data mapping", pos: "b", h: 48 }
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Sorting URL into table"]
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/webhook.dark.svg' width='40' height='40' /></div><br/>Bad URL"]
n6@{ icon: "mdi:location-exit", form: "rounded", label: "Map a website and get urls", pos: "b", h: 48 }
n1 --> n2
n1 --> n5
n2 --> n4
n4 --> n3
n6 --> n1
n0 --> n6
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n1 decision
class n3 database
class n5 api
class n4 code
classDef customIcon fill:none,stroke:none
class n4,n5 customIcon
The Problem: Sitemaps Are “Available,” Not Usable
A sitemap file is supposed to make life easier. In reality, it often creates a new job: turning a pile of URLs into something you can actually analyze. You end up copying from XML, then sorting, then trying to guess navigation levels by looking at slugs. Subdomains get mixed in. Parameters sneak through. And when you share the “audit sheet” with a client or teammate, you spend half the meeting explaining what they’re looking at instead of discussing what to fix.
The friction compounds. Especially when you do this more than once a month.
- You spend about 1–2 hours per site just turning URLs into a reviewable structure.
- Without parent-child relationships, it’s hard to spot orphaned pages or bloated sections quickly.
- Teams accidentally audit pages that aren’t meant to be indexed because the source data is messy.
- When a crawl fails, you often learn too late, after you’ve already created documents and folders you now have to delete.
The Solution: Generate a Hierarchical Sitemap Sheet Automatically
This workflow starts with a simple input: the website URL, sent through an n8n chat interface. From there, Firecrawl follows the site’s sitemap files (and only the sitemap files) to discover pages in a way that’s fast and respectful of crawling rules. If the crawl is blocked or the URL is invalid, the workflow stops cleanly and tells you right away. If it succeeds, n8n duplicates a Google Sheets template in your Google Drive, processes the URLs into a level-based hierarchy (Niv 0 to Niv 5), and then appends the rows into the sheet with clickable links. What you end up with is a structured sitemap you can filter, scan, and share without doing manual cleanup first.
The workflow kicks off when you submit a URL in chat. Firecrawl collects sitemap-declared URLs, then a hierarchy step organizes them into parent-child levels. Finally, Google Drive creates the spreadsheet from your template and Google Sheets receives the formatted rows.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
|
|
Example: What This Looks Like
Say you audit 5 sites in a week. Manually, even a “simple” sitemap usually takes about 60–90 minutes to copy out, dedupe, decode weird URLs, and sort into something resembling a hierarchy, so that’s roughly 6–8 hours of prep work. With this workflow, you paste a URL into chat (maybe 1 minute), wait for crawling and processing (often about 10 minutes), then open the finished Google Sheet from the link. You still do the thinking, but you get most of your week back.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Firecrawl for sitemap-based crawling and URL discovery
- Google Sheets to store the hierarchical sitemap output
- Firecrawl API key (get it from your Firecrawl dashboard)
Skill level: Intermediate. You’ll connect credentials, duplicate a template, and swap a file ID once.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
You send a website URL via chat. The workflow uses an n8n chat trigger, so there’s no form to build and no “run this every Tuesday” scheduling to manage.
Firecrawl follows sitemap-declared pages only. That matters because it keeps the crawl clean and ethical. It respects robots.txt and uses settings designed for sitemap discovery (so you’re not accidentally spidering the whole site).
The workflow checks success before doing any file work. If the website blocks crawlers, or the URL is malformed, you get a clear failure response and the run ends. No leftover spreadsheets littering your Drive.
A Google Sheets template is duplicated and filled with hierarchy levels. n8n creates a new sheet in your Google Drive, organizes URLs into a parent-child tree up to five levels deep, then appends rows into Google Sheets as clickable links.
You can easily modify the template layout to match your audit style (columns, naming, extra notes fields), so the output fits how your team already works. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Chat Trigger
Set up the entry point so users can submit a website URL via chat and start the sitemap crawl.
- Add the Incoming Chat Trigger node.
- Set Mode to
webhook. - Enable Public so the webhook can be reached externally.
- Connect Incoming Chat Trigger to Crawl Site Map URLs.
https://) in the chat input to avoid crawl errors.Step 2: Connect Firecrawl for URL Mapping
Configure the sitemap crawl using Firecrawl to map the website structure.
- Add the Crawl Site Map URLs node.
- Credential Required: Connect your firecrawlApi credentials.
- Set URL to
{{ $json.chatInput }}. - Set Operation to
map. - Enable Sitemap Only by setting it to
true. - Set Ignore Sitemap to
false. - Connect Crawl Site Map URLs to Crawl Success Check.
Step 3: Set Up the Crawl Validation and Response Logic
Use conditional logic to handle successful crawls and return a friendly error when the URL is invalid or unsupported.
- Add the Crawl Success Check node.
- Configure the condition to check Boolean True with Left Value set to
{{ $json.success }}. - Connect the true output of Crawl Success Check to Duplicate Sheet Template.
- Connect the false output of Crawl Success Check to Return Invalid URL.
- In Return Invalid URL, set Respond With to
jsonand keep the Response Body as{ "text": "L'url {{ $('Incoming Chat Trigger').item.json.chatInput }} n'est pas une url correcte ou elle n'est pas prise en compte par ce service" }.
Step 4: Connect Google Drive
Duplicate a spreadsheet template to store the site hierarchy for each new URL.
- Add the Duplicate Sheet Template node.
- Credential Required: Connect your googleDriveOAuth2Api credentials.
- Set Operation to
copy. - Set Name to
{{ $('Incoming Chat Trigger').item.json.chatInput }} - n8n - Arborescence. - Set File ID to your template spreadsheet ID (replace
[YOUR_ID]). - Connect Duplicate Sheet Template to Organize URL Hierarchy.
Step 5: Set Up URL Processing
Transform the crawled URLs into a structured hierarchy that can be written to Google Sheets.
- Add the Organize URL Hierarchy node.
- Keep the JavaScript Code as provided to build the multi-level hierarchy and hyperlink output.
- Verify the node reads data from Crawl Site Map URLs via
$('Crawl Site Map URLs').item.json. - Connect Organize URL Hierarchy to Append Hierarchy Rows.
Step 6: Configure Output to Google Sheets
Append the generated hierarchy rows into the duplicated spreadsheet.
- Add the Append Hierarchy Rows node.
- Credential Required: Connect your googleSheetsOAuth2Api credentials.
- Set Operation to
append. - Set Document ID to
{{ $('Duplicate Sheet Template').item.json.id }}. - Set Sheet Name to
FR. - Keep Columns mapping on auto-map so fields like Niv 0 to Niv 5 and URL are appended correctly.
Niv 0 through Niv 5 and URL to avoid blank rows.Step 7: Test and Activate Your Workflow
Run a full test to confirm the crawl, hierarchy creation, and spreadsheet output.
- Click Execute Workflow and send a valid website URL to Incoming Chat Trigger.
- Confirm Crawl Site Map URLs returns
success: trueand Crawl Success Check follows the true branch. - Verify a new spreadsheet is created by Duplicate Sheet Template and rows are appended by Append Hierarchy Rows.
- If the URL is invalid, ensure Return Invalid URL responds with the error message.
- When satisfied, toggle the workflow to Active for production use.
Common Gotchas
- Google Drive credentials can expire or need specific permissions. If things break, check the n8n Credentials screen first, then confirm the Google account can create copies in Drive.
- If the site blocks crawling, Firecrawl may return a failed status even when the sitemap exists. Check robots.txt and security tooling (like bot protection) before assuming the workflow is wrong.
- The template copy node depends on a file ID. If you forget to replace the default template ID with your own, you may get permission errors or end up writing to a sheet you can’t share properly.
Frequently Asked Questions
About 30 minutes if you already have your credentials ready.
No. You’ll connect accounts and paste in your template file ID once.
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Firecrawl API usage costs (it depends on crawl volume and your plan).
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Yes, and you should, honestly. The easiest approach is to duplicate your own Google Sheets template, then update the “Duplicate Sheet Template” node to use your template’s file ID. You can also adjust what gets written by tweaking the mapping in “Append Hierarchy Rows,” which is where columns and formatting decisions show up. Common customizations include adding a “Notes” column for audits, renaming Niv columns to “Level,” and separating subdomains into dedicated tabs.
Usually it’s an API key issue (wrong key, expired key, or the credential was never selected on the node). It can also be the target site blocking external crawlers via robots.txt or bot protection, which means Firecrawl can’t fetch sitemap URLs successfully. If it fails on only some sites, check those sites first, not your n8n setup. Rate limits can also show up when you run a lot of audits back-to-back.
It can handle most normal sitemaps, but the real limit is your Firecrawl plan and how big the sitemap is.
Often, yes, because this job needs crawling, validation, and hierarchy processing, not just “move data from A to B.” n8n handles branching logic cleanly (success vs. failure), and self-hosting avoids execution limits that can get pricey when you run lots of audits. You also have much more control over how the hierarchy is built before writing to Google Sheets, which is where most of the value lives. Zapier or Make can still work if you only need a basic two-step export and you’re fine with a flatter, less structured output. Talk to an automation expert if you’re not sure which fits.
Once you have a clean hierarchy in Google Sheets, the audit gets simpler and the conversation gets sharper. Set it up once, and future sitemap reviews feel almost unfairly easy.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.