Bright Data + Google Gemini for smarter web research
Research falls apart the moment the “important” page won’t load, blocks your scraper, or changes layout again. Then you’re stuck copying chunks into docs, trying to summarize messy text, and hoping you didn’t miss the one quote you needed.
This is where Bright Data research automation pays off. Marketing leads chasing competitor moves feel it first, but agency strategists and ops-minded founders get dragged into the same manual grind. This workflow turns a single URL into reusable topics, trends, and sentiment without you babysitting the process.
You’ll see how it pulls content through Bright Data’s Web Unlocker, has Google Gemini structure it, and then ships the results to your webhook endpoints while saving clean JSON files for later.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Bright Data + Google Gemini for smarter web research
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n1@{ icon: "mdi:robot", form: "rounded", label: "Markdown to Textual Data Ext..", pos: "b", h: 48 }
n2@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set URL and Bright Data Zone", pos: "b", h: 48 }
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n5@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model for..", pos: "b", h: 48 }
n6@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model for..", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Perform Bright Data Web Requ.."]
n8@{ icon: "mdi:robot", form: "rounded", label: "Topic Extractor with the str..", pos: "b", h: 48 }
n9@{ icon: "mdi:robot", form: "rounded", label: "Trends by location and categ..", pos: "b", h: 48 }
n10@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n12@{ icon: "mdi:code-braces", form: "rounded", label: "Create a binary file for top..", pos: "b", h: 48 }
n13@{ icon: "mdi:cog", form: "rounded", label: "Write the topics file to disk", pos: "b", h: 48 }
n14@{ icon: "mdi:cog", form: "rounded", label: "Write the trends file to disk", pos: "b", h: 48 }
n15@{ icon: "mdi:code-braces", form: "rounded", label: "Create binary data for trends", pos: "b", h: 48 }
n10 -.-> n9
n2 --> n7
n15 --> n14
n12 --> n13
n7 --> n1
n0 --> n2
n1 --> n8
n1 --> n3
n1 --> n9
n5 -.-> n1
n8 --> n4
n8 --> n12
n6 -.-> n8
n9 --> n11
n9 --> n15
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n1,n8,n9 ai
class n5,n6,n10 aiModel
class n3,n4,n7,n11 api
class n12,n15 code
classDef customIcon fill:none,stroke:none
class n3,n4,n7,n11 customIcon
Why This Matters: Web research that doesn’t collapse on contact
Most “web research” looks simple until you do it at scale. The page you need is geo-gated, rate-limited, or protected by bot defenses. So you try again, change networks, grab screenshots, paste chunks into a doc, and still end up with an unstructured wall of text that no one wants to read. Even when you finally capture the content, turning it into something useful (topics, trend angles, sentiment, and a shareable output) becomes another half-day of busywork. Honestly, the worst part is the mental load: you’re doing fragile, repetitive steps while also trying to think strategically.
It adds up fast. Here’s where it usually breaks down.
- You waste about 1–2 hours per research round just getting the page content into a usable form.
- When a page blocks you, your research schedule slips, and the “insight” arrives after it’s useful.
- Manual summaries drift in quality, which means two people can read the same page and report different “takeaways.”
- Sharing results is messy because there’s no consistent format your team can reuse across reports and dashboards.
What You’ll Build: Bright Data capture + Gemini analysis pipeline
This workflow starts with a URL you want to research, then uses Bright Data’s Web Unlocker to retrieve the page reliably, even when the site is “difficult.” Once the page comes back, the workflow converts the content into markdown and hands it to Google Gemini inside n8n. From there, Gemini turns the page into plain text you can actually work with, then runs structured extraction to produce clean JSON outputs for topics and for trend clusters (by location and category). A sentiment-focused analysis is also triggered via webhook, so you can feed that result into whatever tool you already use for reporting. Finally, the workflow saves both the topics and trends outputs as local JSON files, so you have a durable artifact you can reuse later.
The flow is simple: provide a target URL and Bright Data zone, fetch the page through Bright Data, then let Gemini extract and structure what matters. At the end, your webhook endpoints get the outputs and your disk gets two tidy JSON files for archiving or downstream processing.
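What does one of those artifacts look like? The exact shape depends on the JSON schema you configure in the extractor nodes, so the following is only a hypothetical sketch of a topics file, not the template's actual output:

```javascript
// Hypothetical topics artifact. Field names here are illustrative only; the
// real shape is set by the JSON schema you paste into the topic extractor node.
const topicsArtifact = {
  source_url: "https://www.bbc.com/news/world",
  topics: [
    { name: "elections", sentiment: "neutral" },
    { name: "energy prices", sentiment: "negative" },
  ],
};

// Because the workflow saves plain JSON, downstream tools can consume it
// without any scraping logic of their own:
const names = topicsArtifact.topics.map((t) => t.name);
console.log(names.join(", ")); // prints "elections, energy prices"
```

The point of the structured step is exactly this: every run yields the same fields, so reports and dashboards can rely on them.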
What You’re Building
| What Gets Automated | What You’ll Achieve |
|---|---|
| Page capture through Bright Data’s Web Unlocker | Reliable access to pages that block scrapers or are geo-gated |
| Gemini structured extraction, webhook delivery, and JSON file output | Consistent topics, trends, and sentiment outputs your team can reuse |
Expected Results
Let’s say you do competitor research on 10 pages each week. Manually, it’s usually about 20 minutes to capture the content (especially when pages fight you), plus another 30 minutes to summarize and format, so you’re looking at roughly 8 hours weekly. With this workflow, you spend about 2 minutes setting the URL and starting the run, then wait roughly 10–20 minutes for capture and analysis per page. That’s closer to 2–3 hours total, and the output is already structured for reuse.
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Bright Data for Web Unlocker page retrieval
- Google Gemini (PaLM) API to extract topics, trends, sentiment
- Bright Data zone + credentials (create a Web Unlocker zone in Bright Data)
Skill level: Intermediate. You’ll connect APIs, paste credentials, and adjust a few node settings like URLs, webhooks, and file paths.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
You provide the target URL and Bright Data zone. The workflow is triggered manually, then a Set node defines which page to capture and which Bright Data Web Unlocker zone should be used for access.
Bright Data fetches the page content reliably. An HTTP Request node runs against Bright Data’s API, which helps you pull content from sites that normally block scraping or require more resilient access.
Google Gemini turns the page into structured insight. The markdown content is converted into plain text, then two structured extraction steps generate (1) a topics output and (2) trend clusters by location and category. A separate webhook call dispatches sentiment-related output so you can route it wherever you like.
Results get delivered and archived. The workflow sends outputs to your webhook endpoints, and it also writes two JSON files to disk (topics and trends) after converting the data to a binary payload.
You can easily modify the target URL and the AI prompts to fit different research goals, like product messaging analysis or category monitoring. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts on demand so you can test the extraction pipeline manually.
- Add the Manual Start Trigger node as the entry point.
- Connect Manual Start Trigger to Define Target URL and Zone.
Step 2: Connect the Bright Data Request
Set the target URL and scraping zone, then submit the request to Bright Data.
- In Define Target URL and Zone, set url to `https://www.bbc.com/news/world`.
- In Define Target URL and Zone, set zone to `web_unlocker1`.
- Open Execute Bright Data Request and set URL to `https://api.brightdata.com/request`.
- Set Method to `POST`, enable Send Body and Send Headers.
- In Body Parameters, set zone to `{{ $json.zone }}`, url to `{{ $json.url }}?product=unlocker&method=api`, format to `raw`, and data_format to `markdown`.
- Credential Required: Connect your `httpHeaderAuth` credentials in Execute Bright Data Request.
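Outside n8n, the same call can be sketched as a plain HTTP request. This mirrors the Body Parameters above; the `Authorization` header is an assumption standing in for the Header Auth credential you configure in n8n:

```javascript
// Builds the JSON body the HTTP Request node sends to Bright Data's
// /request endpoint, mirroring the Step 2 Body Parameters.
function buildUnlockerPayload(targetUrl, zone) {
  return {
    zone, // e.g. "web_unlocker1"
    url: `${targetUrl}?product=unlocker&method=api`,
    format: "raw",
    data_format: "markdown", // ask Web Unlocker to return markdown
  };
}

// Sends the request (Node 18+ has global fetch). The token argument is a
// placeholder for your own Bright Data credential.
async function fetchPageAsMarkdown(targetUrl, zone, token) {
  const response = await fetch("https://api.brightdata.com/request", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildUnlockerPayload(targetUrl, zone)),
  });
  if (!response.ok) throw new Error(`Bright Data request failed: ${response.status}`);
  return response.text(); // raw markdown body of the target page
}
```

Seeing the payload spelled out makes it easier to debug the n8n node: if the request fails, compare each body parameter against this shape first.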
Step 3: Set Up Markdown Parsing with Gemini
Convert the scraped markdown to plain text before downstream analysis.
- In Markdown to Plain Text, set Text to `=You need to analyze the below markdown and convert to textual data. Please do not output with your own thoughts. Make sure to output with textual data only with no links, scripts, css etc. {{ $json.data }}`.
- Keep Prompt Type set to `define` and ensure the message includes “You are a markdown expert”.
- Open Gemini Chat Model for Parsing and confirm Model Name is `models/gemini-2.0-flash-exp`.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Model for Parsing.
Step 4: Configure Parallel Topic and Trend Analysis
After parsing, the workflow branches into parallel analysis paths.
- Confirm that Markdown to Plain Text outputs to Structured Topic Analyzer, Send Markdown Extraction Webhook, and Cluster Trends by Region in parallel.
- In Structured Topic Analyzer, set Text to `=Perform the topic analysis on the below content and output with the structured information. Here's the content: {{ $('Execute Bright Data Request').item.json.data }}`.
- Keep Schema Type set to `manual` and paste the provided JSON schema into Input Schema.
- In Cluster Trends by Region, set Text to `=Perform the data analysis on the below content and output with the structured information by clustering the emerging trends by location and category Here's the content: {{ $('Execute Bright Data Request').item.json.data }}`.
- Keep Schema Type set to `manual` and paste the provided JSON schema into Input Schema.
- Open Gemini Chat Model for Sentiment and Gemini Chat Model for Trends and confirm Model Name is `models/gemini-2.0-flash-exp`.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Model for Sentiment.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Model for Trends.
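The template ships its own JSON schemas, which aren't reproduced here. If you end up writing your own for the manual Schema Type, a minimal sketch might look like this (field names are illustrative, not the template's actual schema):

```javascript
// Hypothetical minimal schema for a structured extractor's "Input Schema"
// field. The template provides its own schema; this only illustrates the
// manual-schema format such nodes expect.
const topicsSchema = {
  type: "object",
  properties: {
    topics: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },      // short topic label
          sentiment: { type: "string" }, // e.g. "positive" / "negative"
        },
      },
    },
  },
};

// In the node you paste the schema as text:
const schemaText = JSON.stringify(topicsSchema, null, 2);
```

The tighter the schema, the less post-processing you do later; vague schemas tend to produce fields you then have to normalize by hand.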
Step 5: Configure Webhook Outputs and File Writes
Send summaries to webhooks and save topic/trend JSON outputs to disk.
- In Send Markdown Extraction Webhook, set URL to `https://webhook.site/3c36d7d1-de1b-4171-9fd3-643ea2e4dd76` (replace with your own endpoint) and enable Send Body with content set to `{{ $json.text }}`.
- Verify Structured Topic Analyzer outputs to both Dispatch Sentiment Webhook and Build Topics Binary Payload in parallel.
- In Dispatch Sentiment Webhook, set URL to `https://webhook.site/3c36d7d1-de1b-4171-9fd3-643ea2e4dd76` and set summary to `{{ $json.output }}`.
- Verify Cluster Trends by Region outputs to both Send Trends Webhook and Build Trends Binary Payload in parallel.
- In Send Trends Webhook, set URL to `https://webhook.site/3c36d7d1-de1b-4171-9fd3-643ea2e4dd76` and set summary to `{{ $json.output }}`.
- In Build Topics Binary Payload and Build Trends Binary Payload, keep the provided Function Code that base64-encodes the JSON output.
- Set Save Topics File to Disk to write with File Name `d:\topics.json` and Operation `write`.
- Set Save Trends File to Disk to write with File Name `d:\trends.json` and Operation `write`.
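The two Build Binary Payload steps are n8n Function nodes. The pattern looks roughly like this (a sketch of the base64-encoding approach; the exact property names in the template's Function Code may differ):

```javascript
// Sketch of an n8n Function-node pattern that turns a JSON result into a
// binary file payload the "Write Binary File" node can save to disk.
// n8n stores binary data as a base64 string under item.binary.<property>.
function toBinaryItem(jsonOutput, fileName) {
  const text = JSON.stringify(jsonOutput, null, 2);
  return {
    json: {},
    binary: {
      data: {
        data: Buffer.from(text, "utf8").toString("base64"),
        mimeType: "application/json",
        fileName,
      },
    },
  };
}

// Inside the Function node you would return something like:
// return [toBinaryItem($json.output, "topics.json")];
```

If the saved files come out empty, this encoding step is the first place to look: the downstream Write node expects base64 in the binary property, not a raw object.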
Note: `d:\topics.json` and `d:\trends.json` require a Windows host. Update the paths if you run n8n on Linux or Docker.
Step 6: Test and Activate Your Workflow
Run a manual test to confirm the full extraction and analysis flow, then enable it for production use.
- Click Execute Workflow on Manual Start Trigger to run a test.
- Verify Execute Bright Data Request returns markdown in `data` and that Markdown to Plain Text outputs clean text.
- Confirm that Structured Topic Analyzer and Cluster Trends by Region both run and that webhook requests succeed.
- Check that `d:\topics.json` and `d:\trends.json` are created with structured JSON.
- When satisfied, toggle the workflow Active to enable it for ongoing use.
Troubleshooting Tips
- Bright Data credentials can expire or require the right zone permissions. If things break, check your Bright Data Web Unlocker zone settings and the Header Auth credentials in the “Execute Bright Data Request” node first.
- Processing times vary when the target site responds slowly or downstream systems add delays. Bump up timeouts in the HTTP Request nodes if webhook calls fail or return empty payloads.
- Default prompts in Gemini nodes are generic. Add your brand voice and strict output formatting in the structured extractor prompts early, or you will end up cleaning JSON by hand later.
Quick Answers
How long does setup take?
About 30 minutes if your Bright Data and Gemini credentials are ready.
Do I need to know how to code?
No. You’ll mostly connect accounts, paste API keys, and edit a URL and a few webhook settings.
Is n8n free to use?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Bright Data usage and Google Gemini API costs, which vary based on how much content you process.
Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this workflow for my own research?
Yes, and you should. Swap the target site by changing the URL in “Define Target URL and Zone,” then tweak the prompts inside “Structured Topic Analyzer” and “Cluster Trends by Region” to match what you want extracted (pricing mentions, feature comparisons, brand claims, or regulatory language are common picks). If your team uses a spreadsheet or database instead of local files, replace the “Save Topics File to Disk” and “Save Trends File to Disk” steps with Google Sheets or a database node. You can also point the webhook nodes to Slack, a reporting tool, or your internal API.
What should I check if the Bright Data request fails?
Usually it’s the Web Unlocker zone name, missing permissions on the Bright Data account, or an auth header that’s out of date. Confirm the zone configured in “Define Target URL and Zone” actually exists, then re-check the credentials used in “Execute Bright Data Request.” If the target site is especially aggressive, you may also need to adjust Bright Data settings on their side (that’s not an n8n issue).
How many pages can this handle?
On n8n Cloud Starter, expect a few thousand executions per month, and higher tiers handle more; self-hosting has no execution cap (it mostly depends on your server). In practice, this workflow is usually run “per page,” and most teams batch 10–50 URLs at a time once they’re confident in the prompts and webhook handling.
Is n8n better than Zapier or Make for this?
Often, yes. This workflow benefits from n8n’s ability to handle multi-step branching (capture, parse, extract, webhook, file output) without turning it into a fragile chain of separate zaps or scenarios. n8n also makes self-hosting practical, which matters when you run lots of research jobs and don’t want to pay per tiny step. Zapier or Make can still be fine for simple “URL in, summary out” needs, but structured extraction plus file handling gets clunky fast. If you’re unsure, Talk to an automation expert and describe your volume and outputs.
Once this is running, web research stops being a recurring fire drill. You get clean outputs you can reuse, forward, store, and build on.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.