January 22, 2026

Website pages to Google Sheets, research ready notes

Lisa Granqvist, Workflow Automation Expert

You start research with good intentions, then you’re buried in tabs. Copying links into a spreadsheet, grabbing images for reference, pasting chunks of text into “notes” that you’ll never clean up later. It’s slow, and it’s weirdly exhausting.

This website to Sheets automation hits marketers and SEO leads first, but it also saves agency operators who build competitor decks on tight timelines. You get one Google Sheet row per site with links, images, and Markdown-ready page content, which means research that’s searchable and reusable.

Below, you’ll see how the workflow crawls a site from the homepage, filters what matters, and appends clean outputs to Google Sheets so your “research” stops being a messy browser session.

How This Automation Works


The Challenge: Turning a Website Into Usable Research Notes

A website looks organized until you try to “capture it.” The homepage links out to product pages, documentation, blogs, case studies, and random campaign landing pages that may or may not still matter. If you collect it manually, you end up with half a list, broken URLs, missing images, and copy-pasted snippets that don’t keep their structure. Then comes the worst part: you don’t trust your notes, so you re-check everything when it’s time to write the audit or brief. That’s wasted time, plus mental load you didn’t plan for.

Here’s where it breaks down in real life.

  • You lose about 2 hours per site just collecting links and keeping them tidy.
  • Images get saved “somewhere,” and later you can’t remember which page they came from.
  • Copy-pasting content strips formatting, which makes it harder to scan and reuse in docs or AI tools.
  • Duplicates and non-HTTP links sneak in, so your sheet looks complete but behaves like junk data.

The Fix: Crawl, Filter, and Append Clean Website Notes to Sheets

This workflow turns a website into a structured, research-ready snapshot inside Google Sheets. You start by defining a target URL (usually the homepage). The automation fetches that homepage HTML, pulls every link it can find, then splits the list into individual URLs so it can clean them properly. It removes duplicates, validates that the links are real web links (not mailto, anchors, or odd formats), and then routes them into two buckets: image assets and actual content pages. For pages, it fetches the HTML, converts it into Markdown so headings and lists stay readable, bundles the results, and appends everything to your Google Sheet.

The flow begins with one website URL and one run. From there, HTTP Request does the fetching, the workflow separates images vs. pages using a Switch, and Google Sheets gets three clean appends (images, links, content) that you can search, filter, and export later.
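Under the hood, the cleanup steps are plain JavaScript of the kind you could also run in an n8n Code node. Here is a minimal sketch of the dedupe, validate, and route logic; the function names are illustrative, not the template's actual node internals:

```javascript
// Same image-extension regex the workflow's Switch rule uses.
const IMAGE_RE = /^https?:\/\/.*\.(?:png|jpe?g|gif|webp|bmp|svg|ico)(?:\?.*)?$/i;

function cleanLinks(links) {
  // Remove duplicates, then keep only real https web links
  // (drops mailto:, tel:, #anchors, and relative paths).
  return [...new Set(links)].filter((l) => l.startsWith('https://'));
}

function routeLinks(links) {
  // Split the cleaned list into image assets vs. content pages.
  const images = links.filter((l) => IMAGE_RE.test(l));
  const pages = links.filter((l) => !IMAGE_RE.test(l));
  return { images, pages };
}
```

Feeding it a raw href list with duplicates, a `mailto:` link, and an image URL leaves you with exactly one image entry and one page entry, which is the shape the two Sheets branches expect.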


Real-World Impact

Say you’re doing competitor research for 5 sites in a week. Manually, it’s easy to spend about 10 minutes collecting links, another 10 minutes grabbing images and references, and about 20 minutes copying key page text per site (roughly 40 minutes each). That’s around 3+ hours weekly, and the output is inconsistent. With this workflow, you set the URL, run it, and get links, images, and Markdown content appended to Google Sheets in one go, usually in under 20 minutes per site including waiting on requests. You get most of that time back, and the sheet is actually reusable.

Requirements

  • n8n instance (try n8n Cloud free)
  • Self-hosting option if you prefer (Hostinger works well)
  • Google Sheets for storing links, images, and content.
  • Google account (OAuth) to allow n8n to edit your sheet.
  • Target website URL (use a homepage or section hub).

Skill level: Beginner. You’ll connect Google credentials, paste a URL, and map your sheet columns once.

Need help implementing this? Talk to an automation expert (free 15-minute consultation).

The Workflow Flow

You define the website to crawl. A Set node stores your website URL so the rest of the workflow always knows what “home” is. Most people point it at the homepage, but a resources hub works too.

The homepage gets fetched and links are extracted. HTTP Request pulls the raw HTML, then the workflow parses out link targets and explodes them into a list of individual URLs it can evaluate one-by-one.

Links are cleaned, validated, and routed. Duplicates are removed, only HTTPS (or valid web) links are kept, and a Switch separates image assets from content pages so you don’t mix media with text.

Content pages are converted into Markdown and saved. The workflow fetches each page’s HTML, converts it to Markdown for readability, bundles the output, then appends images, links, and content into Google Sheets using dedicated “Add … to Sheet” actions.
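To see why the Markdown step matters, here is a deliberately tiny illustration of the idea: headings and list items survive as `#` and `-` prefixes instead of being flattened into a wall of text. The real workflow uses n8n's HTML to Markdown node; this toy converter is not its implementation.

```javascript
// Toy HTML-to-Markdown converter: keeps headings and list items readable.
function toyHtmlToMarkdown(html) {
  return html
    .replace(/<h([1-6])[^>]*>(.*?)<\/h\1>/gi,
      (_, n, text) => '#'.repeat(Number(n)) + ' ' + text + '\n')
    .replace(/<li[^>]*>(.*?)<\/li>/gi, '- $1\n')
    .replace(/<[^>]+>/g, '')   // strip any remaining tags
    .replace(/\n{2,}/g, '\n')  // collapse blank runs
    .trim();
}
```

A pricing section like `<h2>Pricing</h2><ul><li>Free</li><li>Pro</li></ul>` comes out as a `##` heading followed by two bullets, which is far easier to scan in a spreadsheet cell or feed into an AI tool.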

You can easily modify the filtering rules to crawl only certain paths (like /blog) based on your needs. See the full implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the Manual Trigger

This workflow starts with a manual run so you can test crawling before scheduling it.

  1. Add or verify the Manual Launch Start node as the trigger.
  2. Connect Manual Launch Start to Define Website URL to pass the input into the crawl flow.

Step 2: Connect Google Sheets

These nodes write the crawl results to your Google Sheet.

  1. Open Append Images to Sheet and set Operation to appendOrUpdate, Sheet Name to your-sheet-name, and Document ID to your-document-id.
  2. Set Images mapping to {{ $json.links.join('\n\n') }} and Website mapping to {{ $('Define Website URL').item.json.website_url }}.
  3. Credential Required: Connect your googleSheetsOAuth2Api credentials in Append Images to Sheet.
  4. Open Append Links to Sheet and set Operation to appendOrUpdate, Sheet Name to your-sheet-name, and Document ID to your-document-id.
  5. Set Links mapping to {{ $json.links.join('\n\n') }} and Website mapping to {{ $('Define Website URL').item.json.website_url }}.
  6. Credential Required: Connect your googleSheetsOAuth2Api credentials in Append Links to Sheet.
  7. Open Append Content to Sheet and set Operation to appendOrUpdate, Sheet Name to your-sheet-name, and Document ID to your-document-id.
  8. Set Website mapping to {{ $('Define Website URL').item.json.website_url }} and Scraped Content mapping to {{ $json.data.join('\n\n').slice(0, 50000) }}.
  9. Credential Required: Connect your googleSheetsOAuth2Api credentials in Append Content to Sheet.

⚠️ Common Pitfall: Ensure your Google Sheet has columns named Website, Links, Scraped Content, and Images to match the mappings.
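The mapping expressions above are ordinary JavaScript evaluated by n8n. A sketch of what they produce, using sample items shaped like the node outputs (`{ links: [...] }` for link rows, `{ data: [...] }` for content):

```javascript
// Sample items shaped like the aggregated node output.
const linkItem = { links: ['https://example.com/a', 'https://example.com/b'] };
const contentItem = { data: ['# Page A\nIntro text', '# Page B\nMore text'] };

// {{ $json.links.join('\n\n') }} -> one cell with blank-line-separated URLs.
const linksCell = linkItem.links.join('\n\n');

// {{ $json.data.join('\n\n').slice(0, 50000) }} -> all page Markdown joined,
// truncated so it stays under the ~50k character limit of a Sheets cell.
const contentCell = contentItem.data.join('\n\n').slice(0, 50000);
```

The blank-line separator keeps multiple URLs readable inside a single cell, and the `.slice(0, 50000)` guard is what prevents the append from failing on very long pages.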

Step 3: Set Up Website URL and Homepage Fetch

This section sets the target site and pulls the homepage HTML for link extraction.

  1. In Define Website URL, set website_url to the target site, such as https://example.com.
  2. In Fetch Homepage HTML, set URL to {{ $json.website_url }}.
Keep the default options in Fetch Homepage HTML unless your site needs special headers.

If the homepage blocks crawlers, add headers like a user agent in Fetch Homepage HTML.
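Some sites only serve full HTML to browser-like clients. In the HTTP Request node you would add these as Header Parameters; the equivalent in plain `fetch` terms looks like this (the exact User-Agent string is just an example):

```javascript
// Browser-like request options for sites that block obvious bots.
function browserLikeOptions() {
  return {
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
      'Accept': 'text/html,application/xhtml+xml',
    },
  };
}

// Usage sketch (not executed here):
//   fetch(websiteUrl, browserLikeOptions())
```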

Step 4: Extract, Filter, and Route Links

These nodes extract links, remove duplicates, validate HTTPS, and route images vs pages.

  1. In Pull Link Targets, keep Operation set to extractHtmlContent, with CSS Selector a and Attribute href returning an array into links.
  2. In Explode Link List, set Field to Split Out to links so each link becomes its own item.
  3. Keep Deduplicate URLs in place to remove duplicate link entries.
  4. In Validate HTTPS Links, use the condition {{ $json.links }} starts with https://.
  5. In Route Images vs Pages, verify the image rule tests {{ $json.links }} against the regex =^https?:\/\/.*\.(?:png|jpe?g|gif|webp|bmp|svg|ico)(?:\?.*)?$.

Route Images vs Pages outputs to both Group Image URLs and the page branch (Group Page URLs → Fetch Page Content) in parallel.

⚠️ Common Pitfall: Relative URLs will be filtered out by Validate HTTPS Links. If you need relative links, add a normalization step before validation.
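If your target site uses relative hrefs, a small normalization step (for example, a Code node placed before Validate HTTPS Links) can resolve them against the site's base URL so they pass the https:// check instead of being silently dropped. This is a hypothetical addition, not part of the template:

```javascript
// Resolve a relative href against the site's base URL.
function normalizeLink(href, baseUrl) {
  try {
    // '/pricing' resolved against 'https://example.com'
    // becomes 'https://example.com/pricing'.
    return new URL(href, baseUrl).href;
  } catch {
    return null; // href that cannot be parsed at all
  }
}
```

Anything that still is not a web link after normalization (like `mailto:` addresses) gets filtered out by the existing validation step, so the order of operations stays safe.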

Step 5: Fetch Page Content and Convert to Markdown

This branch crawls non-image pages and bundles their text content.

  1. In Group Page URLs, keep Field to Aggregate set to links to bundle page URLs for sheet output.
  2. In Fetch Page Content, set URL to {{ $json.links }} so each page is requested.
  3. In HTML to Markdown, set HTML to {{ $json.data }} to convert fetched HTML into markdown.
  4. In Bundle Page Text, aggregate data to prepare content for the sheet.

Step 6: Configure Output Aggregation to Sheets

Image URLs, page links, and content are aggregated and appended into the sheet.

  1. Verify Group Image URLs aggregates the links field before sending to Append Images to Sheet.
  2. Verify Group Page URLs connects to Append Links to Sheet for link storage.
  3. Ensure Bundle Page Text connects to Append Content to Sheet to store the scraped content.

Step 7: Test and Activate Your Workflow

Run a manual test to confirm links, images, and content are written to Google Sheets.

  1. Click Execute Workflow and trigger Manual Launch Start.
  2. Confirm Append Links to Sheet, Append Images to Sheet, and Append Content to Sheet each add a row for the target Website.
  3. If results are missing, inspect Validate HTTPS Links and Route Images vs Pages outputs for filtered items.
  4. When results look correct, set the workflow to Active for production runs.

Watch Out For

  • Google Sheets credentials can expire or need specific permissions. If things break, check the credential in n8n and confirm the Google account can edit that spreadsheet.
  • Some websites block rapid requests or return different HTML to bots. If the crawl suddenly comes back empty, slow it down with a Wait node after the homepage fetch and rerun.
  • Google Sheets cells cap out around 50k characters. If you crawl long pages, keep the slice limit in place or split content across multiple rows so you don’t lose data silently.
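If truncating at 50k characters loses too much, one alternative is to split long Markdown into chunks that each fit a cell and append one row per chunk. This is a hypothetical helper; the template's default is a single `.slice(0, 50000)`:

```javascript
// Split long page content into cell-sized chunks (one sheet row each).
const CELL_LIMIT = 50000;

function chunkContent(text, limit = CELL_LIMIT) {
  const chunks = [];
  for (let i = 0; i < text.length; i += limit) {
    chunks.push(text.slice(i, i + limit));
  }
  return chunks;
}
```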

Common Questions

How quickly can I implement this website to Sheets automation?

About 30 minutes if your Google account is ready.

Can non-technical teams implement this website to Sheets crawl?

Yes, because there’s no code involved. You’ll mainly connect Google Sheets and paste in the website URL you want to crawl.

Is n8n free to use for this website to Sheets workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Google usage (usually free for normal Sheets use).

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

How do I adapt this website to Sheets solution to my specific challenges?

You can. The easiest wins are in the filtering and routing: tweak “Validate HTTPS Links” to include only certain paths (like /blog/), and adjust “Route Images vs Pages” if you want to keep PDFs or exclude file types. If you need more depth, add another HTTP fetch pass after “Pull Link Targets” to crawl a second layer. Common customizations include saving page titles, adding a “Category” column, or splitting long Markdown into multiple rows.
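As a concrete example of that path-based tweak, the filter logic amounts to checking each URL's pathname against a prefix. A sketch (hypothetical helper, not a template node):

```javascript
// Keep only links under a given path prefix, e.g. crawl just the blog.
function keepPath(links, prefix = '/blog/') {
  return links.filter((l) => {
    try {
      return new URL(l).pathname.startsWith(prefix);
    } catch {
      return false; // skip anything that is not a parseable URL
    }
  });
}
```

In n8n you could express the same check directly in the Validate HTTPS Links condition instead of adding a Code node; the effect is identical.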

Why is my Google Sheets connection failing in this workflow?

Usually it’s an OAuth issue: the credential expired, the wrong Google account is connected, or the account doesn’t have edit access to that spreadsheet. Update the credential in n8n, then double-check the spreadsheet ID and sheet name in each “Append … to Sheet” node. If it still fails, look for a permissions prompt in your Google account and re-authorize.

What’s the capacity of this website to Sheets solution?

It depends more on the target site than on n8n. For most marketing sites, crawling a few dozen internal links per run is fine, but very large sites may hit rate limits or produce content too large for a single Google Sheets cell. If you self-host n8n, execution volume is effectively limited by your server. On n8n Cloud, higher plans support more monthly executions, so you can crawl more sites without babysitting it.

Is this website to Sheets automation better than using Zapier or Make?

Often, yes, because this is more than a simple “trigger then write a row.” You’re doing deduplication, validation, routing, aggregation, and conversion to Markdown, which is the kind of multi-step logic that gets awkward (and pricey) in Zapier. n8n also lets you self-host, so you can run lots of crawls without worrying about task counts. Zapier or Make can still be fine for a tiny version, like capturing one URL at a time from a form. If you want help picking the right tool for your exact setup, Talk to an automation expert.

Once this is running, your “research” becomes a repeatable asset instead of a one-off scramble. Set it up, crawl what you need, and move on to the work that actually pays off.

Need Help Setting This Up?

Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.

