Apify to Google Sheets, competitor topic map ready
Competitor research sounds simple until you’re ten tabs deep, copying headings into a doc, and you still can’t tell what topics actually drive their traffic.
SEO specialists feel this in every content audit. A content strategist trying to build a topic map feels it too. And if you run an agency, it turns into billable hours spent on busywork. This competitor research automation replaces the manual browsing with one structured table in Google Sheets.
You’ll learn what this workflow does, what you need to run it, and how to use the output to plan content with a lot more confidence.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Apify to Google Sheets, competitor topic map ready
```mermaid
flowchart LR
    n0["Incoming Webhook Trigger"] --> n1["Return Form Page"]
    n0 --> n2["Assemble Crawl Payload"]
    n2 --> n3{"Validate Input Presence"}
    n3 --> n4[("Log Request Details")]
    n3 --> n5["Crawl Rival Site"]
    n5 --> n6["Retrieve Crawl Dataset"]
    n6 --> n7["Derive Page Metadata"]
    n8["Gemini Content Review"] -.-> n9["Language Analysis Agent"]
    n7 --> n9
    n9 --> n10["Normalize Model Output"]
    n10 --> n11["Generate Sheet Label"]
    n11 --> n12[("Create Results Sheet")]
    n11 --> n13["Combine Streams"]
    n12 --> n13
    n13 --> n14["Map Sheet Fields"]
    n14 --> n15[("Store Captured Results")]
    n15 --> n16{"Check Email Payload"}
    n16 --> n17["Dispatch Email Report"]
```
Why This Matters: Competitor Topic Research Is Slow (and Messy)
Manual competitor research is the kind of task that steals time in tiny pieces. You review a competitor’s blog, skim categories, open articles, copy a few headings, then realize you missed the “resources” section or the glossary that’s quietly pulling links. Next thing you know, you’ve got scattered notes, inconsistent naming, and no way to compare two competitors without starting over. Honestly, the worst part is the mental load: you spend energy collecting data instead of spotting patterns and planning what to publish.
It adds up fast. Here’s where it usually breaks down:
- Copying titles, entities, and categories into spreadsheets takes about 2 hours per competitor site, even when you move quickly.
- Two people will label the same topic differently, so your “analysis” becomes cleanup.
- Important pages get missed because navigation hides them, or the site structure is deeper than you expected.
- You can’t reliably score content depth or coverage without a consistent, repeatable extraction process.
What You’ll Build: An Automated Competitor Topic Map in Google Sheets
This workflow starts with a simple form submission through an n8n webhook. You enter the competitor domain(s), plus crawl limits like depth and max pages. From there, Apify crawls the site and produces a dataset of URLs and page content you can analyze. The workflow pulls that dataset back into n8n, derives page metadata, and then uses AI (Gemini plus an analysis agent) to turn messy page text into structured outputs like topic hierarchies, entities, and depth scores. Finally, everything is normalized into clean fields, a labeled results sheet is created in Google Sheets, and rows are saved in a format you can filter, chart, or export. If you want, it can also email a report via Gmail once the run completes.
In short: you submit a crawl request, the workflow crawls and parses competitor pages into a consistent topic structure, and it finishes by writing a tidy, analyzable table to Google Sheets (and optionally sending an email summary).
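To make the output concrete, here's the kind of row each analyzed page ends up producing. The field names match the Map Sheet Fields step later in this guide; the values are purely illustrative:

```javascript
// Illustrative only: one finished row as it lands in Google Sheets.
// Field names follow the Map Sheet Fields step; the values are made up.
const exampleRow = {
  page_url: 'https://competitor.example.com/blog/email-deliverability-guide',
  main_topics: 'Email marketing > Deliverability > Authentication',
  key_words: 'SPF, DKIM, DMARC, sender reputation',
};
console.log(exampleRow);
```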
What You’re Building
| What Gets Automated | What You’ll Achieve |
|---|---|
| Apify crawls each competitor site and n8n pulls the dataset back | No more tab-hopping or copy-pasting headings into docs |
| Gemini and the analysis agent extract topics, entities, and depth scores | Consistent labels you can compare across competitors |
| Normalized rows are written to a labeled Google Sheet, with an optional Gmail report | A filterable, chartable topic map ready for content planning |
Expected Results
Say you audit 5 competitors for a quarterly content refresh. Manually, if each site takes about 2 hours to review and log, that’s roughly 10 hours before you even start planning. With this workflow, you spend about 10 minutes submitting each crawl (around 50 minutes total), then you wait for Apify and the AI steps to process. You still review the output, but the copy-paste part is gone, and most teams get the bulk of that 10 hours back.
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Sheets account for storing the topic map output
- Apify account to crawl competitor websites at scale
- Google Generative AI (Gemini) API key (get it from Google AI Studio)
- Gmail account (optional) for the email report
Skill level: Intermediate. You’ll connect accounts, add API keys, and tweak a few crawl settings safely.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
You submit a crawl request. The workflow is triggered by an incoming webhook, which returns a simple form page so you can provide a competitor domain and crawl limits without touching the workflow every time.
Your inputs get validated and logged. n8n checks that required fields are present, then writes a request log to Google Sheets so you have a record of what was crawled and when.
Apify crawls the competitor site and returns a dataset. The workflow runs the Apify crawler, then pulls the resulting dataset via HTTP Request and derives useful metadata from each page.
AI converts raw pages into a topic map. Gemini and the language analysis agent extract structured topics, entities, and depth signals, then a normalization step cleans everything into consistent fields for reporting.
Results are written to a fresh Google Sheet (and optionally emailed). A sheet label is generated, a new results sheet is created, streams are merged, and final rows are stored. If you include an email payload, Gmail sends a report.
You can easily modify crawl depth and max pages based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Webhook Trigger
Set up the inbound request and the HTML response page for submitting competitor crawl requests.
- Open Incoming Webhook Trigger and set Path to `competitors`.
- Set Response Mode to `responseNode` and enable Multiple Methods.
- Open Return Form Page and keep Respond With set to `text` with the provided HTML in Response Body.
- Copy your production webhook URL and replace `https://[YOUR_WEBHOOK_URL]` inside the HTML in Return Form Page.
Warning: if you leave `https://[YOUR_WEBHOOK_URL]` in the HTML, the form will submit to a placeholder URL and the workflow will never trigger.

Step 2: Connect Google Sheets
Prepare the logging and results storage in Google Sheets.
- Open Log Request Details, select the target spreadsheet, and set Sheet Name to `CONFIG`.
- Credential Required: Connect your `googleSheetsOAuth2Api` credentials in Log Request Details.
- Open Create Results Sheet and set Title to `{{ $json.sheet_name }}`.
- Credential Required: Connect your `googleSheetsOAuth2Api` credentials in Create Results Sheet and select the same spreadsheet.
- Open Store Captured Results and set Sheet Name to `{{ $('Generate Sheet Label').item.json.sheet_name }}`.
- Credential Required: Connect your `googleSheetsOAuth2Api` credentials in Store Captured Results.
Step 3: Set Up Crawl Assembly, Validation, and Parallel Logging
Build the crawl payload, validate inputs, and run the crawl and logging in parallel.
- Open Assemble Crawl Payload and confirm the code handles form input normalization and Apify payload construction (a sketch follows this list).
- Open Validate Input Presence and keep the condition `{{ $json.startUrls[0].method }}` equals `GET`.
- Validate Input Presence outputs to both Crawl Rival Site and Log Request Details in parallel.
- Open Crawl Rival Site and confirm Custom Body contains `{{ JSON.stringify({ ... }) }}` with `maxRequestsPerCrawl` and `maxDepth` referencing the input values.
- Credential Required: Connect your `apifyApi` credentials in Crawl Rival Site.
- Open Retrieve Crawl Dataset and keep URL set to `{{ "https://api.apify.com/v2/datasets/" + ($json.data?.defaultDatasetId || $json.defaultDatasetId || "[YOUR_ID]") + "/items?clean=true&format=json&offset=0&limit=1" }}`.
Step 4: Set Up AI Analysis and Normalization
Extract metadata, run the AI analysis, and normalize structured output for storage.
- Open Derive Page Metadata and ensure the code generates `page_url`, `title`, and `markdown` fields from the crawl result.
- Open Language Analysis Agent and keep the Text prompt with the JSON schema and the expression `{{ $json["markdown"] }}`.
- Open Gemini Content Review and ensure it is linked as the language model to Language Analysis Agent.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Content Review. This credential powers Language Analysis Agent.
- Open Normalize Model Output and ensure it formats `main_topics_flat` and `key_entities_flat` for downstream use (a sketch follows this list).
- Open Generate Sheet Label and keep the sheet naming logic so each analysis generates a unique tab name.
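As a reference, Normalize Model Output might look like the sketch below, assuming the agent returns JSON with `main_topics` and `key_entities` arrays (the actual schema lives in the Language Analysis Agent prompt):

```javascript
// Sketch of a Normalize Model Output Code node. The agent output shape
// (main_topics / key_entities arrays, optional depth_score) is an assumption
// based on the JSON schema described in the Language Analysis Agent prompt.
const raw = $input.first().json;
const parsed = typeof raw.output === 'string' ? JSON.parse(raw.output) : (raw.output || raw);

return [{
  json: {
    page_url: parsed.page_url || $('Derive Page Metadata').first().json.page_url,
    main_topics_flat: (parsed.main_topics || []).join(', '),
    key_entities_flat: (parsed.key_entities || []).join(', '),
    depth_score: parsed.depth_score ?? '',
  },
}];
```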
Step 5: Configure Output, Merging, and Email Delivery
Merge sheet creation with analysis output, store results, and send email reports conditionally.
- Generate Sheet Label outputs to both Create Results Sheet and Combine Streams in parallel.
- Open Combine Streams to ensure it merges the created sheet with the analysis output before mapping.
- Open Map Sheet Fields and confirm the mappings:
  - page_url → `{{ $('Normalize Model Output').item.json.page_url }}`
  - main_topics → `{{ $('Normalize Model Output').item.json.main_topics_flat }}`
  - key_words → `{{ $('Normalize Model Output').item.json.key_entities_flat }}`
- Open Check Email Payload and verify the conditions:
  - `{{ $json.page_url }}` matches `^https?://`
  - `{{ $json.main_topics }}` is not empty
  - `{{ $json.key_words }}` is not empty
- Open Dispatch Email Report and set:
  - Send To to `{{ $('Assemble Crawl Payload').item.json.notify_email }}`
  - Subject to `SEO Audit Report: {{ $json.page_url }}`
  - Message to the provided HTML template
- Credential Required: Connect your `gmailOAuth2` credentials in Dispatch Email Report.
Step 6: Test and Activate Your Workflow
Verify the workflow end-to-end using the webhook form and then enable it for production use.
- Click Execute Workflow and open the Incoming Webhook Trigger test URL in your browser.
- Submit the form from Return Form Page with a valid competitor URL and email address.
- Confirm a new tab is created by Create Results Sheet and rows are appended by Store Captured Results.
- Verify that Dispatch Email Report sends a report when Check Email Payload passes.
- Once successful, toggle the workflow to Active to accept production requests.
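If you'd rather test from a script than the browser form, you can POST the same fields straight to the webhook. A minimal sketch, assuming a placeholder host and the form field names used elsewhere in this guide (domain, crawl_depth_num, max_pages_num, notify_email):

```javascript
// Minimal test sketch: submit a crawl request directly to the webhook.
// The host is a placeholder and the field names are assumptions drawn
// from the rest of this guide; adjust both to match your setup.
const res = await fetch('https://your-n8n-host.example.com/webhook/competitors', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    domain: 'https://competitor.example.com',
    crawl_depth_num: 2,
    max_pages_num: 50,
    notify_email: 'you@example.com',
  }),
});
console.log(res.status); // expect 200 once the workflow accepts the request
```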
Troubleshooting Tips
- Google Sheets credentials can expire or need specific permissions. If things break, check the Google connection in n8n’s Credentials list first.
- Apify runs can fail when crawl limits are too aggressive for a site. Review the run logs in Apify, then lower crawl_depth_num or max_pages_num and try again.
- Default AI prompts are generic. Add your exact definition of “topic,” “entity,” and “depth score” early, or you will keep editing outputs later.
Quick Answers
How long does setup take?
About 30 minutes if your keys and accounts are ready.

Do I need to know how to code?
No. You’ll mostly connect services and adjust crawl settings. The included Code steps are already written, and you can run the workflow without editing them.

Is n8n free?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Apify usage and Gemini API costs for the AI analysis.

Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I customize the crawl and the output?
Yes, and you should. You can change the competitor domains and crawl limits in the form inputs, then adjust max_pages_num and crawl_depth_num in the crawl payload logic. If you want different columns in Google Sheets, update the “Map Sheet Fields” step (and the normalization/code steps that shape the output). Common tweaks include adding a “content type” column, capturing internal link counts, and routing a summary to Slack instead of email.
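For instance, a "content type" column could be derived during normalization before it's mapped to the sheet. A minimal sketch, with URL patterns that are pure assumptions (tune them to the sites you crawl):

```javascript
// Hypothetical tweak for the Normalize Model Output Code node: derive a
// content_type field from the page URL, then map it to a new column in
// Map Sheet Fields. The URL patterns below are assumptions.
const item = $input.first().json;
const url = String(item.page_url || '');

let content_type = 'article';
if (/\/glossary\//i.test(url)) content_type = 'glossary';
else if (/\/category\/|\/tag\//i.test(url)) content_type = 'category';
else if (/\/resources?\//i.test(url)) content_type = 'resource';

return [{ json: { ...item, content_type } }];
```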
Why would the Google Sheets step fail?
Usually it’s expired OAuth access or a Google account permission issue. Reconnect the Google Sheets credential in n8n and confirm the target spreadsheet is accessible to that account. Also check that the workflow is allowed to create new sheets if you’re using the “Create Results Sheet” action. If it only fails sometimes, you may be hitting Google API limits during big crawls, so batching writes helps.

How much volume can this handle?
It depends on your crawl limits and plan, but most teams run a few hundred pages per competitor without issue if the crawl depth is sane. On n8n Cloud, higher tiers support higher monthly execution volume, while self-hosting removes execution caps (your server becomes the limit). Apify and the AI steps usually become the bottleneck before n8n does.

Is n8n better than Zapier or Make for this?
For this use case, often yes. You’re combining a crawler run, dataset retrieval, multi-step AI parsing, normalization, and conditional email delivery, which is a lot of branching and data shaping. n8n is more comfortable with that kind of “workflow logic,” and self-hosting matters because this workflow includes community nodes that require it. Zapier or Make can still work for a simpler version, but you’ll usually end up compromising on the crawl/AI depth or paying more as volume grows. If you want help deciding, talk to an automation expert and explain your monthly crawl size.
Once this is running, competitor research stops being a recurring project and becomes a repeatable input to your content plan. The workflow collects the data. You make the calls.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.