Scrapeless + Pinecone: SEO drafts without busywork
Keyword research turns into a tab explosion fast. You scrape results, copy snippets into a doc, try to “feel” the intent, then stare at a blank page because your notes are scattered and your examples are inconsistent.
This pain hits content marketers hardest, but agency leads and solo founders feel it too. This automation takes you from “research soup” to a clean brief and a first draft that actually sounds like a real writer (not a generic template).
This workflow uses Scrapeless for SERP research and scraping, then Pinecone to store a reusable knowledge base, and finally an LLM to generate drafts using that knowledge. You’ll see what it automates, what you get out of it, and what to watch for when you set it up.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Scrapeless + Pinecone: SEO drafts without busywork
flowchart LR
subgraph sg0["When clicking ‘Execute workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Execute workf..", pos: "b", h: 48 }
n1@{ icon: "mdi:cube-outline", form: "rounded", label: "Pinecone Vector Store", pos: "b", h: 48 }
n2@{ icon: "mdi:robot", form: "rounded", label: "Default Data Loader", pos: "b", h: 48 }
n3@{ icon: "mdi:robot", form: "rounded", label: "Recursive Character Text Spl..", pos: "b", h: 48 }
n10@{ icon: "mdi:cog", form: "rounded", label: "Aggregate", pos: "b", h: 48 }
n11@{ icon: "mdi:cog", form: "rounded", label: "Convert to File", pos: "b", h: 48 }
n12@{ icon: "mdi:swap-vertical", form: "rounded", label: "Edit Fields1", pos: "b", h: 48 }
n13@{ icon: "mdi:robot", form: "rounded", label: "Basic LLM Chain1", pos: "b", h: 48 }
n14@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model1", pos: "b", h: 48 }
n15["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/markdown.dark.svg' width='40' height='40' /></div><br/>Markdown"]
n16["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/html.dark.svg' width='40' height='40' /></div><br/>HTML"]
n17@{ icon: "mdi:swap-vertical", form: "rounded", label: "Loop Over Items", pos: "b", h: 48 }
n18@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings Google Gemini", pos: "b", h: 48 }
n19@{ icon: "mdi:cog", form: "rounded", label: "Crawl all Blogs", pos: "b", h: 48 }
n20["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse content and extract in.."]
n21@{ icon: "mdi:cog", form: "rounded", label: "Scrape detailed contents", pos: "b", h: 48 }
n22@{ icon: "mdi:cog", form: "rounded", label: "Analyze target keywords on G..", pos: "b", h: 48 }
n23@{ icon: "mdi:swap-vertical", form: "rounded", label: "Split Out the url and text", pos: "b", h: 48 }
n15 --> n16
n10 --> n11
n12 --> n22
n11 --> n1
n19 --> n20
n17 --> n10
n17 --> n21
n13 --> n15
n2 -.-> n1
n18 -.-> n1
n21 --> n17
n14 -.-> n13
n23 --> n17
n3 -.-> n2
n0 --> n19
n0 --> n12
n20 --> n23
n22 --> n13
end
subgraph sg1["When chat message received Flow"]
direction LR
n4@{ icon: "mdi:play-circle", form: "rounded", label: "When chat message received", pos: "b", h: 48 }
n5@{ icon: "mdi:memory", form: "rounded", label: "Window Buffer Memory", pos: "b", h: 48 }
n6@{ icon: "mdi:robot", form: "rounded", label: "AI Agent1", pos: "b", h: 48 }
n7@{ icon: "mdi:cube-outline", form: "rounded", label: "Pinecone Vector Store3", pos: "b", h: 48 }
n8@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings Google Gemini3", pos: "b", h: 48 }
n9@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model3", pos: "b", h: 48 }
n5 -.-> n6
n7 --> n6
n8 -.-> n7
n9 -.-> n6
n4 --> n7
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0,n4 trigger
class n2,n3,n13,n6 ai
class n14,n9 aiModel
class n5 ai
class n1,n7 ai
class n18,n8 ai
class n20 code
classDef customIcon fill:none,stroke:none
class n15,n16,n20 customIcon
The Problem: SEO drafts get stuck between research and writing
Most SEO teams don’t struggle with “writing.” They struggle with everything around it. You gather SERP data, look at top posts, grab a few angles, pull out long-tail keywords, and try to turn that into a brief that a writer can actually use. Then you do it again next week, from scratch, with a slightly different keyword and the same messy process. The cost isn’t just time (though it’s easily a few hours per article). It’s inconsistency. Your briefs vary by who made them, your drafts drift off intent, and you end up editing the same structural problems over and over.
The friction compounds. Here’s where it breaks down in real life:
- Manual SERP collection is slow, and it’s easy to miss patterns when your notes live in five places.
- Your “examples” aren’t reusable, so every new draft starts with the same re-learning curve.
- Writers get vague direction like “make it more actionable,” which means extra rounds and slower publishing.
- Even good AI outputs need heavy editing because the model has no stable reference set for tone and structure.
The Solution: Scrapeless SERP research + Pinecone memory + AI drafting
This workflow turns your research process into a repeatable system. It starts by scraping a set of proven articles from a strong source (the template uses a well-known writer as an example) and then breaks that content into clean chunks that can be searched later. Those chunks are converted into embeddings (a numeric “fingerprint” the AI can search by meaning) and saved into Pinecone so you have a durable knowledge base. Next, when you want a new post, the workflow runs live SERP analysis with Scrapeless for your target keyword and asks an LLM to create a keyword analysis report in Markdown, then converts it into HTML for easier use in briefs. Finally, another AI step writes a draft using Retrieval-Augmented Generation (RAG), pulling the most relevant chunks from Pinecone so the output stays grounded in your chosen reference style and topics.
The workflow starts when you manually launch it for research and knowledge-base building, or when you trigger it via a chat message for generation. Scrapeless gathers SERP results and page content, Pinecone stores and retrieves the best matching references, and the LLM produces a structured report plus a full article body. The result feels less like “prompting” and more like running a content production pipeline.
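If “embeddings” is a new concept: each chunk of text becomes a long list of numbers, and searching by meaning just means comparing those lists. Here is a toy illustration of the similarity math; the three-dimensional vectors are invented for readability, while real Gemini embeddings have hundreds of dimensions:

```javascript
// Toy cosine-similarity check: a higher score means "closer in meaning."
// Real embeddings have hundreds of dimensions; these 3D vectors are made up.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const query  = [0.9, 0.1, 0.2];    // e.g. "how to structure an SEO brief"
const chunkA = [0.85, 0.15, 0.25]; // a chunk about outlining articles
const chunkB = [0.1, 0.9, 0.3];    // a chunk about server configuration

console.log(cosineSimilarity(query, chunkA).toFixed(2)); // ~1.00, retrieved
console.log(cosineSimilarity(query, chunkB).toFixed(2)); // ~0.27, skipped
```

That's all "search by meaning" is: Pinecone stores the vectors and runs this kind of comparison at scale, returning the chunks whose scores are highest for your query.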
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| Crawling and scraping reference articles with Scrapeless | A reusable knowledge base you build once instead of re-researching every week |
| Chunking content and storing embeddings in Pinecone | Drafts grounded in your chosen reference style, not generic template output |
| Live SERP analysis and keyword planning per target keyword | A structured keyword report in Markdown and HTML, ready to drop into a brief |
| RAG-based draft generation from a chat trigger | A first draft in one run instead of 2–3 hours of manual research per article |
Example: What This Looks Like
Say you publish 3 SEO posts a week. A typical manual cycle looks like: about 45 minutes pulling SERP competitors and snippets, about 45 minutes extracting angles and long-tail terms, then another hour assembling a usable brief and outline (so roughly 2–3 hours before writing even starts). With this workflow, you trigger the run with one keyword, let Scrapeless gather SERP data, and the system generates the HTML report plus a first draft after the processing finishes. You still edit, of course, but you’re starting from a structured brief and a grounded draft, not a blank doc.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Scrapeless to scrape pages and run SERP analysis
- Pinecone to store and retrieve your vector knowledge base
- LLM API key (from OpenAI or Google Gemini)
Skill level: Intermediate. You’ll connect credentials, install a community node on self-hosted n8n, and adjust a few model/index settings.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
1. Knowledge base build trigger. You manually launch the workflow to crawl and scrape a collection of reference articles through Scrapeless, then prepare them for storage.
2. Chunking and embedding. The scraped text is split into smaller chunks, converted into embeddings using a model like Gemini Embedding, and packaged into documents that Pinecone can index for similarity search.
3. SERP research and planning. You provide a keyword, Scrapeless fetches the SERP results, and an LLM turns that into a keyword analysis report (long-tail ideas, intent notes, angles) formatted in Markdown and then HTML.
4. RAG drafting output. A chat-triggered generation step pulls relevant knowledge from Pinecone and produces a full SEO draft plus supporting outputs (like title ideas and the HTML report), which you can route into Google Drive or Google Sheets if you want storage and handoff.
You can easily modify the reference source to match your niche and brand voice. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts with a manual run to initiate both content crawling and keyword planning in parallel.
- Add the Manual Launch Trigger node as the entry point.
- Confirm there are no parameters to set for Manual Launch Trigger.
- Ensure Manual Launch Trigger outputs to both Crawl Blog Pages and Assign Keyword Fields in parallel.
Step 2: Connect Scrapeless and Parse Blog Content
This branch crawls blog pages, extracts links and content, and batches the results for aggregation.
- In Crawl Blog Pages, set URL to `https://www.scrapeless.com/en/blog`, Resource to `crawler`, Operation to `crawl`, and Limit Crawl Pages to `20`.
- Credential Required: Connect your scrapelessApi credentials in Crawl Blog Pages.
- In Parse Content Details, keep the provided JavaScript to extract `title`, `mainContent`, and `extractedLinks` (a sketch of what this node does follows below).
- In Split Links and Text, set Field to Split Out to `extractedLinks`.
- In Scrape Full Content, set URL to `{{ $json.url }}` and Resource to `crawler`.
- Credential Required: Connect your scrapelessApi credentials in Scrape Full Content.
- Keep Iterate Batch Items connected so it loops through scraped pages and passes batches to Aggregate Records.
Tip: If Parse Content Details logs “Markdown content is not a string,” verify that Crawl Blog Pages is returning a valid markdown field.
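For reference, here is a minimal sketch of what a Parse Content Details-style Code node can look like. The field names (`title`, `mainContent`, `extractedLinks`) match the template; the parsing logic itself is illustrative, not the template's exact script:

```javascript
// Illustrative sketch of a Parse Content Details-style n8n Code node.
// Assumes each input item carries the crawler's output in item.json.markdown.
const results = [];

for (const item of $input.all()) {
  const markdown = item.json.markdown;
  if (typeof markdown !== 'string') {
    throw new Error('Markdown content is not a string'); // matches the tip above
  }

  // First markdown heading becomes the title (fallback: empty string).
  const titleMatch = markdown.match(/^#\s+(.+)$/m);

  // Collect every markdown link target, e.g. [text](https://example.com).
  const extractedLinks = [...markdown.matchAll(/\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/g)]
    .map((m) => ({ url: m[1] }));

  results.push({
    json: {
      title: titleMatch ? titleMatch[1].trim() : '',
      mainContent: markdown,
      extractedLinks,
    },
  });
}

return results;
```

Each `extractedLinks` entry carries a `url` property, which is why Scrape Full Content can reference `{{ $json.url }}` after Split Links and Text fans the array out into individual items.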
Step 3: Aggregate, Convert, and Store Content in Pinecone
This step aggregates scraped content into a file, chunks it, and inserts embeddings into Pinecone.
- In Aggregate Records, set Include to `specifiedFields`, Aggregate to `aggregateAllItemData`, and Fields to Include to `markdown`.
- In Convert Data to File, set Operation to `toText` and Source Property to `data`.
- Configure Recursive Text Chunker with Chunk Size `2000` and Chunk Overlap `200` (see the sketch below for what these numbers mean in practice).
- In Standard Data Loader, set Data Type to `binary` to load the file output.
- In Pinecone Index Writer, set Mode to `insert`, Pinecone Namespace to `DataPlace`, and select the Pinecone Index `seo-writer`.
- Credential Required: Connect your pineconeApi credentials in Pinecone Index Writer.
- Gemini Embedding Builder is connected as the embedding model for Pinecone Index Writer; make sure credentials are added on the parent connection for the AI embedding input, not on the sub-node.
⚠️ Common Pitfall: If Pinecone Index Writer fails to insert vectors, confirm that the seo-writer index exists and the namespace DataPlace is valid.
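To picture what Chunk Size 2000 and Chunk Overlap 200 actually do: consecutive chunks share 200 characters, so sentences sitting on a boundary aren't cut off from their context. A simplified sketch of the relationship (the real Recursive Text Chunker also prefers paragraph and sentence boundaries over hard character cuts):

```javascript
// Simplified fixed-size chunker showing the size/overlap relationship.
// The actual recursive splitter prefers natural boundaries (\n\n, \n, spaces).
function chunkText(text, chunkSize = 2000, overlap = 200) {
  const chunks = [];
  const step = chunkSize - overlap; // each chunk starts 1800 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// A 5000-character article yields chunks starting at 0, 1800, and 3600.
console.log(chunkText('x'.repeat(5000)).length); // 3
```

Larger chunks mean fewer vectors (cheaper) but coarser retrieval; smaller chunks retrieve more precisely but lose surrounding context. The 2000/200 defaults are a reasonable middle ground for blog-length content.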
Step 4: Configure Keyword Planning and Content Formatting
This branch builds keyword strategy from SERP data, then formats the AI output into HTML.
- In Assign Keyword Fields, set Keywords to `"Scraping", "Google trends"` and Search Intent to `People searching to get tips on Scraping`.
- In Analyze SERP Keywords, set q to `{{ $json.Keywords }}`.
- Credential Required: Connect your scrapelessApi credentials in Analyze SERP Keywords.
- In LLM Keyword Planner, keep the Text prompt as provided, including the expressions `{{ $('Assign Keyword Fields').item.json.Keywords }}`, `{{ $('Assign Keyword Fields').item.json['Search Intent'] }}`, and `{{ JSON.stringify($json.organic_results) }}`.
- Gemini Chat Model A is connected as the language model for LLM Keyword Planner; make sure credentials are added on the parent connection for the AI language model input, not on the sub-node.
- In Markdown Converter, set Mode to `markdownToHtml` and Markdown to `{{ $json.text }}`.
- In HTML Formatter, keep the HTML template and ensure the content placeholder uses `{{ $json.data }}` (the sketch below shows how these two nodes hand off to each other).
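The handoff here is simple: the LLM returns Markdown in `text`, Markdown Converter turns that into HTML in `data`, and HTML Formatter injects the HTML into a styled page. Conceptually it is equivalent to the following sketch; the `marked` package is just a stand-in for what the Markdown Converter node does for you, and the template is illustrative:

```javascript
// Conceptual equivalent of Markdown Converter + HTML Formatter.
// "marked" stands in for n8n's built-in conversion; the template is illustrative.
import { marked } from 'marked';

const llmOutput = { text: '# Keyword Report\n\n- long-tail: "web scraping tips"' };

// Markdown Converter: Mode = markdownToHtml, Markdown = {{ $json.text }}
const data = marked.parse(llmOutput.text);

// HTML Formatter: a template whose content placeholder receives {{ $json.data }}
const template = (body) => `<!doctype html>
<html><body style="font-family:sans-serif;max-width:720px;margin:auto">
${body}
</body></html>`;

console.log(template(data));
```

If you swap in your own HTML template, just keep a single placeholder for `{{ $json.data }}`; everything else (branding, styles, header) is yours to change.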
Step 5: Set Up Conversational Retrieval and AI Reasoning
This branch enables chat-based retrieval from Pinecone and uses a reasoning agent to generate answers.
- Use Chat Message Trigger as the entry point for chat-based queries.
- In Pinecone Index Reader, set Mode to `load` and Prompt to `{{ $json.chatInput }}`, with Pinecone Namespace `DataPlace` and Pinecone Index `seo-writer` (the sketch below shows roughly what this retrieval does).
- Credential Required: Connect your pineconeApi credentials in Pinecone Index Reader.
- In AI Reasoning Agent, set Text to `{{ $('Chat Message Trigger').first().json.chatInput }}` and keep the system message template for contextual answers.
- In Conversation Buffer Memory, set Session Key to `{{ $('Chat Message Trigger').first().json.sessionId }}` and Session ID Type to `customKey`.
- Gemini Embedding Engine is connected as the embedding model for Pinecone Index Reader; make sure credentials are added on the parent connection for the AI embedding input, not on the sub-node.
- Gemini Chat Model is connected as the language model for AI Reasoning Agent; the same parent-connection rule applies for the AI language model input.
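For intuition, the retrieval Pinecone Index Reader performs on each chat message looks roughly like this: embed the question, then run a similarity query against the index. A sketch against Pinecone's REST data plane follows; the index host is a placeholder you would copy from your Pinecone console, the topK value is illustrative, and the n8n node handles all of this internally:

```javascript
// Rough sketch of the retrieval step behind Pinecone Index Reader.
// INDEX_HOST is your index's data-plane host from the Pinecone console (placeholder).
const INDEX_HOST = 'https://seo-writer-xxxxxxx.svc.us-east-1.pinecone.io';

async function retrieveContext(queryEmbedding) {
  const res = await fetch(`${INDEX_HOST}/query`, {
    method: 'POST',
    headers: {
      'Api-Key': process.env.PINECONE_API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      namespace: 'DataPlace',  // same namespace the writer branch inserted into
      vector: queryEmbedding,  // embedding of {{ $json.chatInput }}
      topK: 4,                 // number of reference chunks to retrieve
      includeMetadata: true,   // the chunk text usually lives in metadata
    }),
  });
  const { matches } = await res.json();
  return matches.map((m) => m.metadata?.text ?? '');
}
```

The returned chunks become the context the AI Reasoning Agent writes from, which is why the namespace and index names must match exactly between the writer and reader branches.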
Step 6: Test & Activate Your Workflow
Run a full test of both branches to verify data flow and AI outputs before enabling production use.
- Click Execute Workflow from Manual Launch Trigger and confirm that HTML Formatter produces a formatted report.
- Send a test chat message through Chat Message Trigger and verify AI Reasoning Agent returns a response using Pinecone context.
- Check the execution log to confirm Pinecone Index Writer inserted vectors and Pinecone Index Reader returned documents.
- When successful, toggle the workflow to Active to enable chat-based retrieval in production.
Common Gotchas
- Scrapeless credentials can expire or need specific permissions. If things break, check the n8n Credentials tab and your Scrapeless API key status first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
**How long does setup take?**
Plan on about 1–2 hours if you already have your API keys and Pinecone index ready.

**Do I need coding skills?**
No. You’ll mostly connect accounts and edit prompts and fields. Light tweaking in a Code node is optional, not required.

**Is there a free way to run this?**
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in LLM and embedding usage (often a few dollars per month at moderate volume) plus Pinecone storage.

**Should I use n8n Cloud or self-host?**
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

**Can I adapt this to my own niche and brand voice?**
Yes, and you should. Replace the scraped source URLs in the Scrapeless crawl/scrape part with your own best posts (or your client’s), then adjust the AI Agent prompt to enforce your tone, formatting rules, and “do/don’t” list. Common customizations include changing the embedding model dimensions, storing extra metadata per chunk (topic, product line), and swapping the chat model (Gemini/OpenAI/OpenRouter) based on quality and cost.

**Why is my Scrapeless node failing?**
Usually it’s an expired or incorrect API key in n8n Credentials. It can also be an account-level permission issue, or a blocked request pattern if you’re scraping too aggressively. Check the HTTP response in the Scrapeless node execution output first because it will usually tell you what’s wrong in plain text.

**Are there execution limits?**
If you self-host, there’s no execution limit (it mostly depends on your server and API rate limits). On n8n Cloud, it depends on your plan’s monthly executions, and each scrape + embed + draft run can consume multiple executions. Practically, most small teams can run dozens of drafts a week without issues if they batch scraping and keep the knowledge base build separate from daily drafting.

**Is n8n really better than Zapier or Make for this?**
Often, yes, because this workflow relies on branching logic, batching, and RAG-style retrieval that’s awkward (or pricey) in simpler automation tools. n8n is also easier to self-host, which matters when you start running lots of scrapes and AI steps. Zapier or Make can still be fine for “keyword in, doc out” basics, but you’ll hit limits faster once you add vector databases and multi-step content pipelines. Honestly, the best choice depends on volume and how much control you want over prompts and data handling. Talk to an automation expert if you want a quick recommendation based on your setup.
Once this is running, you stop rebuilding the same research stack every week. The workflow handles the repetitive lift, and you spend your time on the parts that actually move rankings: editing, differentiation, and publishing.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.