Bright Data + Google Gemini for smarter web research
Research falls apart the moment the “important” page won’t load, blocks your scraper, or changes layout again. Then you’re stuck copying chunks into docs, trying to summarize messy text, and hoping you didn’t miss the one quote you needed.
This is where Bright Data research automation pays off. Marketing leads chasing competitor moves feel it first, but agency strategists and ops-minded founders get dragged into the same manual grind. This workflow turns a single URL into reusable topics, trends, and sentiment without you babysitting the process.
You’ll see how it pulls content through Bright Data’s Web Unlocker, has Google Gemini structure it, and then ships the results to your webhook endpoints while saving clean JSON files for later.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Bright Data + Google Gemini for smarter web research
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n1@{ icon: "mdi:robot", form: "rounded", label: "Markdown to Textual Data Ext..", pos: "b", h: 48 }
n2@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set URL and Bright Data Zone", pos: "b", h: 48 }
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n5@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model for..", pos: "b", h: 48 }
n6@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model for..", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Perform Bright Data Web Requ.."]
n8@{ icon: "mdi:robot", form: "rounded", label: "Topic Extractor with the str..", pos: "b", h: 48 }
n9@{ icon: "mdi:robot", form: "rounded", label: "Trends by location and categ..", pos: "b", h: 48 }
n10@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n12@{ icon: "mdi:code-braces", form: "rounded", label: "Create a binary file for top..", pos: "b", h: 48 }
n13@{ icon: "mdi:cog", form: "rounded", label: "Write the topics file to disk", pos: "b", h: 48 }
n14@{ icon: "mdi:cog", form: "rounded", label: "Write the trends file to disk", pos: "b", h: 48 }
n15@{ icon: "mdi:code-braces", form: "rounded", label: "Create binary data for trends", pos: "b", h: 48 }
n10 -.-> n9
n2 --> n7
n15 --> n14
n12 --> n13
n7 --> n1
n0 --> n2
n1 --> n8
n1 --> n3
n1 --> n9
n5 -.-> n1
n8 --> n4
n8 --> n12
n6 -.-> n8
n9 --> n11
n9 --> n15
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n1,n8,n9 ai
class n5,n6,n10 aiModel
class n3,n4,n7,n11 api
class n12,n15 code
classDef customIcon fill:none,stroke:none
class n3,n4,n7,n11 customIcon
Why This Matters: Web research that doesn’t collapse on contact
Most “web research” looks simple until you do it at scale. The page you need is geo-gated, rate-limited, or protected by bot defenses. So you try again, change networks, grab screenshots, paste chunks into a doc, and still end up with an unstructured wall of text that no one wants to read. Even when you finally capture the content, turning it into something useful (topics, trend angles, sentiment, and a shareable output) becomes another half-day of busywork. Honestly, the worst part is the mental load: you’re doing fragile, repetitive steps while also trying to think strategically.
It adds up fast. Here’s where it usually breaks down.
- You waste about 1–2 hours per research round just getting the page content into a usable form.
- When a page blocks you, your research schedule slips, and the “insight” arrives after it’s useful.
- Manual summaries drift in quality, which means two people can read the same page and report different “takeaways.”
- Sharing results is messy because there’s no consistent format your team can reuse across reports and dashboards.
What You’ll Build: Bright Data capture + Gemini analysis pipeline
This workflow starts with a URL you want to research, then uses Bright Data’s Web Unlocker to retrieve the page reliably, even when the site is “difficult.” Once the page comes back, the workflow converts the content into markdown and hands it to Google Gemini inside n8n. From there, Gemini turns the page into plain text you can actually work with, then runs structured extraction to produce clean JSON outputs for topics and for trend clusters (by location and category). A sentiment-focused analysis is also triggered via webhook, so you can feed that result into whatever tool you already use for reporting. Finally, the workflow saves both the topics and trends outputs as local JSON files, so you have a durable artifact you can reuse later.
The flow is simple: provide a target URL and Bright Data zone, fetch the page through Bright Data, then let Gemini extract and structure what matters. At the end, your webhook endpoints get the outputs and your disk gets two tidy JSON files for archiving or downstream processing.
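What does one of those artifacts look like? The exact shape depends on the JSON schema you configure in the extractor nodes, so the following is only a hypothetical sketch of a topics file, not the template's actual output:

```javascript
// Hypothetical topics artifact. Field names here are illustrative only; the
// real shape is set by the JSON schema you paste into the topic extractor node.
const topicsArtifact = {
  source_url: "https://www.bbc.com/news/world",
  topics: [
    { name: "elections", sentiment: "neutral" },
    { name: "energy prices", sentiment: "negative" },
  ],
};

// Because the workflow saves plain JSON, downstream tools can consume it
// without any scraping logic of their own:
const names = topicsArtifact.topics.map((t) => t.name);
console.log(names.join(", ")); // prints "elections, energy prices"
```

The point of the structured step is exactly this: every run yields the same fields, so reports and dashboards can rely on them.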
What You’re Building
| What Gets Automated | What You’ll Achieve |
|---|---|
| Page capture through Bright Data’s Web Unlocker | Reliable access to pages that block scrapers or are geo-gated |
| Gemini structured extraction, webhook delivery, and JSON file output | Consistent topics, trends, and sentiment outputs your team can reuse |
Expected Results
Let’s say you do competitor research on 10 pages each week. Manually, it’s usually about 20 minutes to capture the content (especially when pages fight you), plus another 30 minutes to summarize and format, so you’re looking at roughly 8 hours weekly. With this workflow, you spend about 2 minutes setting the URL and starting the run, then wait roughly 10–20 minutes for capture and analysis per page. That’s closer to 2–3 hours total, and the output is already structured for reuse.
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Bright Data for Web Unlocker page retrieval
- Google Gemini (PaLM) API to extract topics, trends, sentiment
- Bright Data zone + credentials (create a Web Unlocker zone in Bright Data)
Skill level: Intermediate. You’ll connect APIs, paste credentials, and adjust a few node settings like URLs, webhooks, and file paths.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
You provide the target URL and Bright Data zone. The workflow is triggered manually, then a Set node defines which page to capture and which Bright Data Web Unlocker zone should be used for access.
Bright Data fetches the page content reliably. An HTTP Request node runs against Bright Data’s API, which helps you pull content from sites that normally block scraping or require more resilient access.
Google Gemini turns the page into structured insight. The markdown content is converted into plain text, then two structured extraction steps generate (1) a topics output and (2) trend clusters by location and category. A separate webhook call dispatches sentiment-related output so you can route it wherever you like.
Results get delivered and archived. The workflow sends outputs to your webhook endpoints, and it also writes two JSON files to disk (topics and trends) after converting the data to a binary payload.
You can easily modify the target URL and the AI prompts to fit different research goals, like product messaging analysis or category monitoring. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts on demand so you can test the extraction pipeline manually.
- Add the Manual Start Trigger node as the entry point.
- Connect Manual Start Trigger to Define Target URL and Zone.
Step 2: Connect the Bright Data Request
Set the target URL and scraping zone, then submit the request to Bright Data.
- In Define Target URL and Zone, set url to `https://www.bbc.com/news/world`.
- In Define Target URL and Zone, set zone to `web_unlocker1`.
- Open Execute Bright Data Request and set URL to `https://api.brightdata.com/request`.
- Set Method to `POST`, enable Send Body and Send Headers.
- In Body Parameters, set zone to `{{ $json.zone }}`, url to `{{ $json.url }}?product=unlocker&method=api`, format to `raw`, and data_format to `markdown`.
- Credential Required: Connect your `httpHeaderAuth` credentials in Execute Bright Data Request.
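Outside n8n, the same call can be sketched as a plain HTTP request. This mirrors the Body Parameters above; the `Authorization` header is an assumption standing in for the Header Auth credential you configure in n8n:

```javascript
// Builds the JSON body the HTTP Request node sends to Bright Data's
// /request endpoint, mirroring the Step 2 Body Parameters.
function buildUnlockerPayload(targetUrl, zone) {
  return {
    zone, // e.g. "web_unlocker1"
    url: `${targetUrl}?product=unlocker&method=api`,
    format: "raw",
    data_format: "markdown", // ask Web Unlocker to return markdown
  };
}

// Sends the request (Node 18+ has global fetch). The token argument is a
// placeholder for your own Bright Data credential.
async function fetchPageAsMarkdown(targetUrl, zone, token) {
  const response = await fetch("https://api.brightdata.com/request", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildUnlockerPayload(targetUrl, zone)),
  });
  if (!response.ok) throw new Error(`Bright Data request failed: ${response.status}`);
  return response.text(); // raw markdown body of the target page
}
```

Seeing the payload spelled out makes it easier to debug the n8n node: if the request fails, compare each body parameter against this shape first.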
Step 3: Set Up Markdown Parsing with Gemini
Convert the scraped markdown to plain text before downstream analysis.
- In Markdown to Plain Text, set Text to `=You need to analyze the below markdown and convert to textual data. Please do not output with your own thoughts. Make sure to output with textual data only with no links, scripts, css etc. {{ $json.data }}`.
- Keep Prompt Type set to `define` and ensure the message includes “You are a markdown expert”.
- Open Gemini Chat Model for Parsing and confirm Model Name is `models/gemini-2.0-flash-exp`.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Model for Parsing.
Step 4: Configure Parallel Topic and Trend Analysis
After parsing, the workflow branches into parallel analysis paths.
- Confirm that Markdown to Plain Text outputs to Structured Topic Analyzer, Send Markdown Extraction Webhook, and Cluster Trends by Region in parallel.
- In Structured Topic Analyzer, set Text to `=Perform the topic analysis on the below content and output with the structured information. Here's the content: {{ $('Execute Bright Data Request').item.json.data }}`.
- Keep Schema Type set to `manual` and paste the provided JSON schema into Input Schema.
- In Cluster Trends by Region, set Text to `=Perform the data analysis on the below content and output with the structured information by clustering the emerging trends by location and category Here's the content: {{ $('Execute Bright Data Request').item.json.data }}`.
- Keep Schema Type set to `manual` and paste the provided JSON schema into Input Schema.
- Open Gemini Chat Model for Sentiment and Gemini Chat Model for Trends and confirm Model Name is `models/gemini-2.0-flash-exp`.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Model for Sentiment.
- Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Model for Trends.
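The template ships its own JSON schemas, which aren't reproduced here. If you end up writing your own for the manual Schema Type, a minimal sketch might look like this (field names are illustrative, not the template's actual schema):

```javascript
// Hypothetical minimal schema for a structured extractor's "Input Schema"
// field. The template provides its own schema; this only illustrates the
// manual-schema format such nodes expect.
const topicsSchema = {
  type: "object",
  properties: {
    topics: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },      // short topic label
          sentiment: { type: "string" }, // e.g. "positive" / "negative"
        },
      },
    },
  },
};

// In the node you paste the schema as text:
const schemaText = JSON.stringify(topicsSchema, null, 2);
```

The tighter the schema, the less post-processing you do later; vague schemas tend to produce fields you then have to normalize by hand.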
Step 5: Configure Webhook Outputs and File Writes
Send summaries to webhooks and save topic/trend JSON outputs to disk.
- In Send Markdown Extraction Webhook, set URL to `https://webhook.site/3c36d7d1-de1b-4171-9fd3-643ea2e4dd76` (replace with your own endpoint) and enable Send Body with content set to `{{ $json.text }}`.
- Verify Structured Topic Analyzer outputs to both Dispatch Sentiment Webhook and Build Topics Binary Payload in parallel.
- In Dispatch Sentiment Webhook, set URL to `https://webhook.site/3c36d7d1-de1b-4171-9fd3-643ea2e4dd76` and set summary to `{{ $json.output }}`.
- Verify Cluster Trends by Region outputs to both Send Trends Webhook and Build Trends Binary Payload in parallel.
- In Send Trends Webhook, set URL to `https://webhook.site/3c36d7d1-de1b-4171-9fd3-643ea2e4dd76` and set summary to `{{ $json.output }}`.
- In Build Topics Binary Payload and Build Trends Binary Payload, keep the provided Function Code that base64-encodes the JSON output.
- Set Save Topics File to Disk to write with File Name `d:\topics.json` and Operation `write`.
- Set Save Trends File to Disk to write with File Name `d:\trends.json` and Operation `write`.
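The two Build Binary Payload steps are n8n Function nodes. The pattern looks roughly like this (a sketch of the base64-encoding approach; the exact property names in the template's Function Code may differ):

```javascript
// Sketch of an n8n Function-node pattern that turns a JSON result into a
// binary file payload the "Write Binary File" node can save to disk.
// n8n stores binary data as a base64 string under item.binary.<property>.
function toBinaryItem(jsonOutput, fileName) {
  const text = JSON.stringify(jsonOutput, null, 2);
  return {
    json: {},
    binary: {
      data: {
        data: Buffer.from(text, "utf8").toString("base64"),
        mimeType: "application/json",
        fileName,
      },
    },
  };
}

// Inside the Function node you would return something like:
// return [toBinaryItem($json.output, "topics.json")];
```

If the saved files come out empty, this encoding step is the first place to look: the downstream Write node expects base64 in the binary property, not a raw object.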
Note: `d:\topics.json` and `d:\trends.json` require a Windows host. Update the paths if you run n8n on Linux or Docker.
Step 6: Test and Activate Your Workflow
Run a manual test to confirm the full extraction and analysis flow, then enable it for production use.
- Click Execute Workflow on Manual Start Trigger to run a test.
- Verify Execute Bright Data Request returns markdown in `data` and that Markdown to Plain Text outputs clean text.
- Confirm that Structured Topic Analyzer and Cluster Trends by Region both run and that webhook requests succeed.
- Check that `d:\topics.json` and `d:\trends.json` are created with structured JSON.
- When satisfied, toggle the workflow Active to enable it for ongoing use.
Troubleshooting Tips
- Bright Data credentials can expire or require the right zone permissions. If things break, check your Bright Data Web Unlocker zone settings and the Header Auth credentials in the “Execute Bright Data Request” node first.
- Processing times vary when the target site responds slowly or downstream systems add delays. Bump up timeouts in the HTTP Request nodes if webhook calls fail or return empty payloads.
- Default prompts in Gemini nodes are generic. Add your brand voice and strict output formatting in the structured extractor prompts early, or you will end up cleaning JSON by hand later.
Quick Answers
How long does setup take?
About 30 minutes if your Bright Data and Gemini credentials are ready.
Do I need to know how to code?
No. You’ll mostly connect accounts, paste API keys, and edit a URL and a few webhook settings.
Is n8n free to use?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Bright Data usage and Google Gemini API costs, which vary based on how much content you process.
Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this workflow for my own research?
Yes, and you should. Swap the target site by changing the URL in “Define Target URL and Zone,” then tweak the prompts inside “Structured Topic Analyzer” and “Cluster Trends by Region” to match what you want extracted (pricing mentions, feature comparisons, brand claims, or regulatory language are common picks). If your team uses a spreadsheet or database instead of local files, replace the “Save Topics File to Disk” and “Save Trends File to Disk” steps with Google Sheets or a database node. You can also point the webhook nodes to Slack, a reporting tool, or your internal API.
What should I check if the Bright Data request fails?
Usually it’s the Web Unlocker zone name, missing permissions on the Bright Data account, or an auth header that’s out of date. Confirm the zone configured in “Define Target URL and Zone” actually exists, then re-check the credentials used in “Execute Bright Data Request.” If the target site is especially aggressive, you may also need to adjust Bright Data settings on their side (that’s not an n8n issue).
How many pages can this handle?
On n8n Cloud Starter, expect a few thousand executions per month, and higher tiers handle more; self-hosting has no execution cap (it mostly depends on your server). In practice, this workflow is usually run “per page,” and most teams batch 10–50 URLs at a time once they’re confident in the prompts and webhook handling.
Is n8n better than Zapier or Make for this?
Often, yes. This workflow benefits from n8n’s ability to handle multi-step branching (capture, parse, extract, webhook, file output) without turning it into a fragile chain of separate zaps or scenarios. n8n also makes self-hosting practical, which matters when you run lots of research jobs and don’t want to pay per tiny step. Zapier or Make can still be fine for simple “URL in, summary out” needs, but structured extraction plus file handling gets clunky fast. If you’re unsure, Talk to an automation expert and describe your volume and outputs.
Once this is running, web research stops being a recurring fire drill. You get clean outputs you can reuse, forward, store, and build on.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.