Bright Data + Gemini: clean Wikipedia summaries
Copying Wikipedia into a doc sounds simple. Then you hit messy formatting, random footnotes, citation brackets, and you still have to turn it into something your team will actually read.
Content strategists feel it when they’re building briefs fast. Market researchers deal with it when they’re pulling sources all week. And if you run an agency, you’ve probably done this at 10pm for “one last deliverable”. This Wikipedia summary automation cleans that up and gives you a consistent output you can reuse.
You’ll set up an n8n workflow that fetches a Wikipedia page through Bright Data, has Gemini format and summarize it, then ships the summary to a webhook (and optionally into Google Sheets). You’ll also learn where to customize the extraction and the final brief so it matches your use case.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Bright Data + Gemini: clean Wikipedia summaries
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n1@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model For..", pos: "b", h: 48 }
n2@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model2", pos: "b", h: 48 }
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Summary Webhook Notifier"]
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Wikipedia Web Request"]
n5@{ icon: "mdi:robot", form: "rounded", label: "LLM Data Extractor", pos: "b", h: 48 }
n6@{ icon: "mdi:robot", form: "rounded", label: "Concise Summary Generator", pos: "b", h: 48 }
n7@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set Wikipedia URL with Brigh..", pos: "b", h: 48 }
n5 --> n6
n4 --> n5
n6 --> n3
n2 -.-> n5
n0 --> n7
n7 --> n4
n1 -.-> n6
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n5,n6 ai
class n1,n2 aiModel
class n3,n4 api
Why This Matters: Turning Wikipedia Into Usable Briefs
Wikipedia is great for quick context, but it’s not delivered in a “brief-ready” format. The moment you need repeatable research, manual copy-paste becomes a quiet tax on your day. You pull a page, skim, extract the relevant section, strip citations, rewrite in your voice, then paste it into a doc or spreadsheet. Do that for five pages and you’re already behind. Worse, the summaries vary depending on who did them, which means stakeholders don’t trust the output and you end up re-checking the source anyway.
It adds up fast. Here’s where it breaks down in real teams.
- Formatting clean-up is surprisingly slow, and it interrupts your focus every time you switch from reading to editing.
- People summarize differently, so the “brief” becomes inconsistent across projects and clients.
- Source links and section references get lost, which makes later fact-checking annoying and error-prone.
- Scaling beyond a few pages turns into a backlog, because nobody wants to be the person stuck doing “Wikipedia duty”.
What You’ll Build: Bright Data → Gemini → Clean Summary Output
This workflow starts with a simple trigger in n8n (manual run, scheduled run, or an incoming webhook if you prefer). It takes a Wikipedia URL, then uses Bright Data’s Web Unlocker to fetch the page HTML reliably, even when scraping protections or rate limits get annoying. Next, Gemini cleans the page content into readable text, removing the clutter that makes summaries feel rough. From there, a summarization step creates a brief you can actually drop into a report, a client deck, or a knowledge base entry. Finally, the workflow sends the structured output to a webhook endpoint so you can route it to Google Sheets, a database, or whatever system your team already uses.
The workflow begins when you set the target Wikipedia URL and Bright Data zone. Bright Data retrieves the raw page content, then Gemini formats it into clean text and generates a short, consistent summary. Last, n8n dispatches the result to your webhook for storage, sharing, or downstream automation.
What You’re Building
| What Gets Automated | What You’ll Achieve |
|---|---|
| Fetching Wikipedia page HTML through Bright Data Web Unlocker | Reliable page content, even past scraping protections and rate limits |
| Cleaning and summarizing the content with Gemini | Consistent, brief-ready summaries instead of raw copy-paste |
| Dispatching the structured output to a webhook | Results routed into Google Sheets, a database, or your existing tools |
Expected Results
Say you need 10 Wikipedia briefs each week for a market research roundup. Manually, you might spend about 15 minutes per page between cleaning, summarizing, and pasting into a sheet, so that’s roughly 2.5 hours weekly. With this workflow, you set the URL (or send it in), wait for processing, and the clean summary is dispatched automatically within a couple of minutes. You still review the output, but you’re reviewing, not rewriting.
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Bright Data Web Unlocker for reliable Wikipedia HTML fetching
- Google Gemini API to clean and summarize the content
- Bright Data Web Unlocker token (get it from your Bright Data zone settings)
Skill level: Beginner. You’ll connect credentials, paste a token, and edit a few fields like the target URL and webhook endpoint.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
Set the Wikipedia target and Bright Data zone. The workflow kicks off from a manual trigger (or you can later swap it to scheduled/webhook). A “set fields” step stores the Wikipedia URL you want and the Bright Data Web Unlocker zone details so the rest of the flow stays consistent.
Fetch the page content through Bright Data. n8n sends an HTTP request to the Web Unlocker endpoint and retrieves the page HTML. This is the part that saves you from random blocks, inconsistent results, and “works on my laptop” scraping headaches.
Clean the text and build the summary with Gemini. The LLM text formatter turns messy HTML into human-readable text, then the summarization chain produces the brief. You can keep it short, or expand it into bullets, sections, and entity lists (people, companies, dates) depending on what your team needs.
Send the final output to your systems. The workflow dispatches the summary to a webhook endpoint. That webhook can write to Google Sheets, trigger a Slack post, store in a database, or fan out into multiple steps.
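If you’d rather catch that payload yourself instead of relying on a hosted endpoint, here’s a minimal receiver sketch, assuming Flask; the route and port are placeholders, and it reads the summary field the workflow sends in Step 4.

```python
# A minimal receiver sketch, assuming Flask (pip install flask). The route
# and port are placeholders; point the workflow's webhook URL at this server.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/wiki-summary", methods=["POST"])
def receive_summary():
    # The workflow sends the brief as a "summary" body parameter (Step 4);
    # accept either form-encoded or JSON bodies to be safe.
    summary = request.form.get("summary") or (request.get_json(silent=True) or {}).get("summary")
    if not summary:
        return jsonify({"error": "missing summary"}), 400
    print(summary)  # fan out here: append to Sheets, post to Slack, write to a DB
    return jsonify({"status": "received"}), 200

if __name__ == "__main__":
    app.run(port=5000)
```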
You can easily modify the summary format to match your reporting template, or change the destination from a webhook to Google Sheets based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts manually and sets the Wikipedia target before fetching content.
- Add the Manual Execution Start node as the trigger.
- Connect Manual Execution Start to Assign Wiki Target & Zone.
- In Assign Wiki Target & Zone, set url to https://en.wikipedia.org/wiki/Cloud_computing?product=unlocker&method=api.
- Set zone to web_unlocker1.
Step 2: Connect Bright Data and Fetch the Page
Fetch the Wikipedia page content through Bright Data using the zone and URL from the previous node.
- Add the Bright Data Fetch Request node and connect it after Assign Wiki Target & Zone.
- Set URL to https://api.brightdata.com/request.
- Set Method to POST and enable Send Body and Send Headers.
- In Body Parameters, set zone to {{ $json.zone }}, url to {{ $json.url }}, and format to raw.
- Credential Required: Connect your httpHeaderAuth credentials in Bright Data Fetch Request.
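For reference, here’s what that node is doing under the hood, as a standalone sketch assuming Python’s requests library; BRIGHT_DATA_TOKEN is a placeholder for your Web Unlocker token.

```python
# A standalone sketch of the same call the node makes, assuming Python's
# requests library; BRIGHT_DATA_TOKEN stands in for your Web Unlocker token.
import os

import requests

resp = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {os.environ['BRIGHT_DATA_TOKEN']}"},
    json={
        "zone": "web_unlocker1",  # your Web Unlocker zone name
        "url": "https://en.wikipedia.org/wiki/Cloud_computing?product=unlocker&method=api",
        "format": "raw",  # return the raw page HTML
    },
    timeout=60,
)
resp.raise_for_status()
html = resp.text  # this is what the workflow passes downstream as $json.data
print(html[:500])
```

Note the Bearer prefix on the Authorization header; as covered in the troubleshooting section, leaving it off fails in a way that looks like a network issue.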
Step 3: Set Up the LLM Formatting and Summarization Chain
The content is formatted and summarized through the LLM chain nodes.
- Add LLM Text Formatter and connect it after Bright Data Fetch Request.
- Set Text in LLM Text Formatter to {{ $json.data }}.
- Ensure Prompt Type is set to define and Has Output Parser is enabled.
- Connect Gemini Pro Chat Model to LLM Text Formatter as the language model.
- Credential Required: Connect your googlePalmApi credentials in Gemini Pro Chat Model.
- Add Brief Summary Builder and connect it after LLM Text Formatter.
- Set Chunking Mode to advanced and keep the prompt as configured.
- Connect Gemini Flash Summarizer to Brief Summary Builder as the language model.
- Credential Required: Connect your googlePalmApi credentials in Gemini Flash Summarizer.
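If you want to prototype the formatting and summarization outside n8n first, here’s a rough equivalent, assuming the google-generativeai SDK; the model names and prompts are illustrative stand-ins, not the exact configuration of the chain nodes.

```python
# A rough standalone equivalent of the two chain nodes, assuming the
# google-generativeai SDK (pip install google-generativeai). Model names
# and prompts here are illustrative stand-ins, not the nodes' exact config.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def clean_html(raw_html: str) -> str:
    # Mirrors "LLM Text Formatter": turn messy page HTML into readable text.
    model = genai.GenerativeModel("gemini-1.5-pro")
    prompt = (
        "Convert this Wikipedia page HTML into clean, readable plain text. "
        "Drop navigation, footnotes, and citation brackets like [12].\n\n"
    )
    return model.generate_content(prompt + raw_html).text

def summarize(clean_text: str) -> str:
    # Mirrors "Brief Summary Builder": produce a short, consistent brief.
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = "Summarize the following article as a concise brief:\n\n"
    return model.generate_content(prompt + clean_text).text

if __name__ == "__main__":
    raw_html = "<html>...</html>"  # page HTML from the Bright Data step
    print(summarize(clean_html(raw_html)))
```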
Step 4: Configure the Output Webhook
Send the summarized text to the destination webhook.
- Add Dispatch Summary Webhook and connect it after Brief Summary Builder.
- Set URL to https://webhook.site/ce41e056-c097-48c8-a096-9b876d3abbf7.
- Enable Send Body and set summary to {{ $json.response.text }} in Body Parameters.
Step 5: Test and Activate Your Workflow
Run a manual test to verify the end-to-end execution and then activate for production use.
- Click Execute Workflow from Manual Execution Start to run a test.
- Confirm that Bright Data Fetch Request returns content and LLM Text Formatter receives {{ $json.data }}.
- Verify Brief Summary Builder produces a summarized output and Dispatch Summary Webhook receives {{ $json.response.text }}.
- When successful, toggle the workflow to Active for production use.
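If the webhook leg is the part you’re unsure about, you can sanity-check it without running the whole workflow; here’s a quick test, assuming Python’s requests library (swap in your own endpoint first).

```python
# A quick, standalone check of the webhook leg, assuming Python's requests
# library; swap in your own endpoint before running.
import requests

resp = requests.post(
    "https://webhook.site/ce41e056-c097-48c8-a096-9b876d3abbf7",
    data={"summary": "Test brief: if you can read this, the endpoint works."},
    timeout=30,
)
print(resp.status_code)  # expect 200 from a healthy endpoint
```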
Troubleshooting Tips
- Bright Data credentials can expire or the token can be for the wrong zone. If things break, check your Web Unlocker token and zone settings in Bright Data first, then update the Header Auth credential in n8n.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Quick Answers
How long does setup take?
About 30 minutes if you already have your Bright Data and Gemini keys.
Do I need to know how to code?
No. You’ll mostly paste credentials and edit a few fields like the URL and webhook destination.
Is n8n free to use?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Bright Data usage and Gemini API costs.
Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the summary format and output?
Yes, and you should. Swap the Wikipedia URL in the “Assign Wiki Target & Zone” step, then adjust the prompts in “LLM Text Formatter” and “Brief Summary Builder” to output bullets, key entities, or a longer executive brief. A common tweak is forcing a consistent structure (Overview, Key facts, Timeline, Sources) so Google Sheets rows stay uniform. You can also replace the “Dispatch Summary Webhook” destination with a Google Sheets insert if Sheets is your main home base.
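As a starting point, here’s one way to phrase that structured prompt, as a sketch you could paste into “Brief Summary Builder”; the section names follow the suggestion above, and {article_text} is a placeholder for the cleaned page text.

```python
# One way to pin the structure, as a sketch; the section names follow the
# suggestion above, and {article_text} is a placeholder for the cleaned text.
BRIEF_PROMPT = """Summarize the article below using exactly these sections:

Overview: two or three sentences of plain-language context.
Key facts: three to five bullet points, one fact each.
Timeline: notable dates in order, or "None" if not applicable.
Sources: the source Wikipedia URL.

Article:
{article_text}
"""
```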
What should I check if the Bright Data request fails?
Usually it’s a token or zone mismatch. Regenerate your Bright Data Web Unlocker token (or confirm you’re using the right zone), then update the Header Authentication credential in n8n. Also check the HTTP Request node headers, because a missing “Bearer” prefix will fail silently in a way that looks like a network issue. If it only fails sometimes, you may be hitting rate limits, so slow down runs or stagger requests.
How many pages can I process per day?
If you self-host n8n, there’s no execution cap (it mainly depends on your server) and most teams comfortably run dozens to hundreds of summaries a day. On n8n Cloud, your monthly execution limit depends on the plan. Bright Data and Gemini will usually be the real bottlenecks because they add per-request cost and occasional throttling. Practically, start with a batch of 20 pages, confirm quality, then scale up with scheduling.
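If you go that route, a small driver script keeps batches polite; here’s a sketch, assuming you’ve swapped the manual trigger for an n8n Webhook trigger (the n8n URL and payload shape are placeholders).

```python
# A batch-run sketch, assuming you have swapped the manual trigger for an
# n8n Webhook trigger; the n8n URL and payload shape are placeholders.
import time

import requests

N8N_WEBHOOK = "https://your-n8n-host/webhook/wiki-brief"  # placeholder URL

urls = [
    "https://en.wikipedia.org/wiki/Cloud_computing",
    "https://en.wikipedia.org/wiki/Machine_learning",
]

for url in urls:
    requests.post(N8N_WEBHOOK, json={"url": url}, timeout=30)
    time.sleep(5)  # stagger requests to stay under Bright Data/Gemini limits
```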
Is n8n a better fit than Zapier or Make for this?
Often, yes, because this kind of flow benefits from multi-step processing: fetch HTML, clean it, summarize it, then dispatch structured output. n8n handles branching and prompt iterations without feeling like you’re fighting the platform. It’s also easier to self-host, which matters if you run this frequently. Zapier or Make can still work if your needs are tiny and you prefer a simpler UI, but costs and limitations show up fast when you start processing lots of pages. If you want help picking the right approach, Talk to an automation expert.
Once this is running, Wikipedia stops being a time sink and becomes an input you can trust. The workflow handles the repetitive cleanup so you can spend your attention on decisions, not formatting.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.