Crunchbase to Google Sheets, leads enriched and ready
Pulling company details out of Crunchbase sounds simple until you’re on your fifth tab, copying funding info into a sheet, and realizing half your “notes” are inconsistent. It’s slow. It’s error-prone. And it quietly wrecks your follow-up because the context never makes it into the spreadsheet.
This Crunchbase Sheets automation hits SDRs first (because you’re the one doing the research). But marketing analysts building segmented lists and RevOps trying to standardize lead intel feel the same pain. The outcome is straightforward: clean Google Sheets rows plus an executive summary you can actually use for outreach.
You’ll see exactly how the workflow turns a Crunchbase company page into structured fields, a quick summary, and a webhook payload your other tools can act on.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Crunchbase to Google Sheets, leads enriched and ready
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n1@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set URL and Bright Data Zone", pos: "b", h: 48 }
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Perform Bright Data Web Requ.."]
n3@{ icon: "mdi:robot", form: "rounded", label: "Markdown to Textual Data Ext..", pos: "b", h: 48 }
n4@{ icon: "mdi:database", form: "rounded", label: "Google Sheets", pos: "b", h: 48 }
n5@{ icon: "mdi:robot", form: "rounded", label: "Structured Output Parser", pos: "b", h: 48 }
n6@{ icon: "mdi:robot", form: "rounded", label: "Structured Data Extractor", pos: "b", h: 48 }
n7@{ icon: "mdi:code-braces", form: "rounded", label: "Create a binary data for Sum..", pos: "b", h: 48 }
n8@{ icon: "mdi:code-braces", form: "rounded", label: "Create a binary data for Str..", pos: "b", h: 48 }
n9@{ icon: "mdi:cog", form: "rounded", label: "Write the summarized content..", pos: "b", h: 48 }
n10@{ icon: "mdi:cog", form: "rounded", label: "Write the structured content..", pos: "b", h: 48 }
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Initiate a Webhook Notificat.."]
n12@{ icon: "mdi:robot", form: "rounded", label: "Summarize the content", pos: "b", h: 48 }
n13@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model", pos: "b", h: 48 }
n14@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model1", pos: "b", h: 48 }
n15@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model2", pos: "b", h: 48 }
n13 -.-> n3
n14 -.-> n12
n15 -.-> n6
n12 --> n7
n5 -.-> n6
n6 --> n4
n6 --> n8
n6 --> n11
n1 --> n2
n2 --> n3
n0 --> n1
n3 --> n12
n3 --> n6
n7 --> n9
n8 --> n10
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n3,n5,n6,n12 ai
class n13,n14,n15 aiModel
class n4 database
class n2,n11 api
class n7,n8 code
classDef customIcon fill:none,stroke:none
class n2,n11 customIcon
The Problem: Crunchbase research doesn’t scale
Crunchbase is great for finding companies, but the “last mile” is brutal. You open a profile, copy the basics, grab funding details, try to interpret industry tags, and then paste everything into Google Sheets (plus a few notes so future-you remembers why this company mattered). After a few dozen companies, your spreadsheet becomes a museum of different formats and half-finished fields. Worse, the context lives in someone’s head or a Slack message, not where the team can use it. Honestly, the time sink is bad, but the inconsistency is what really breaks outreach.
It adds up fast. Here’s where it breaks down.
- You spend about 10 minutes per company just copying and cleaning fields like location, industry tags, and funding details.
- Two people can research the same company and log completely different notes, which means your “lead list” isn’t reliably comparable.
- Raw text from a page is hard to use, so the executive context gets skipped when you’re in a hurry.
- Even when the sheet is filled in, nothing downstream happens automatically, so leads sit there until someone remembers.
The Solution: Crunchbase pages transformed into structured lead intel
This n8n workflow turns a single Crunchbase company URL into a complete “lead record” your team can actually work with. It starts by taking the target Crunchbase URL you provide and pulling the page content through Bright Data’s Web Unlocker, which is built for scraping and access issues that make normal requests flaky. That raw content often comes back as messy markdown, so the workflow converts it into readable text. Then OpenAI extracts the important fields (company name, location, funding rounds, founding year, industry tags, and more) and generates an executive summary that’s usable for sales or market research. Finally, it writes the structured fields and summary into Google Sheets, saves a copy of the outputs to disk for audit/debug, and sends a webhook payload so another system can react right away.
The workflow starts when you run it in n8n with a Crunchbase company URL and your Bright Data zone set. From there, it scrapes the page, turns it into clean text, and runs two AI passes: one for structured fields, one for the summary. The end result lands in Google Sheets, plus optional file storage and a webhook notification.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
|
|
Example: What This Looks Like
Say you research 20 companies for a new outbound segment. Manually, it’s usually about 10 minutes per company to copy fields, clean formatting, and write a quick note, so you’re at roughly 3 hours (and that’s if nothing gets missed). With this workflow, you paste the URL once, let Bright Data pull the page, and wait for the AI extraction and summary to finish. The “hands-on” time drops to a few minutes total, and the sheet is ready for outreach without another cleanup pass.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Bright Data for Crunchbase scraping via Web Unlocker
- Google Sheets to store structured lead rows
- OpenAI API key (get it from your OpenAI dashboard)
Skill level: Intermediate. You’ll connect credentials, set a URL/zone value, and test a run end-to-end.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
Provide a Crunchbase URL. You run the workflow and it assigns the target company page URL along with your Bright Data Web Unlocker zone settings.
Scrape the page content reliably. Bright Data pulls the content even when Crunchbase blocks basic scrapers, and n8n receives the response for processing.
Turn raw content into usable inputs. The workflow converts scraped markdown into plain text, then sends that text through two OpenAI paths: one that extracts structured fields and one that creates a readable executive summary.
Deliver the outputs where your team works. Structured fields are appended to Google Sheets, files are saved locally for traceability, and a webhook request sends the same lead payload to whatever system should react next (Slack, a CRM, or an internal endpoint).
You can easily modify the fields you extract to match your sheet columns and outreach needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
Start the workflow with a manual trigger so you can validate the data extraction and AI processing before scheduling.
- Add the Manual Run Trigger node as the starting point.
- Leave default settings, since this node has no parameters in the workflow.
- Connect Manual Run Trigger to Assign Target URL & Zone.
Step 2: Connect Bright Data and Fetch the Target Page
Configure the target URL and zone, then call Bright Data to retrieve markdown content.
- In Assign Target URL & Zone, add two fields: url set to
https://www.crunchbase.com/organization/stripeand zone set toweb_unlocker1. - Open Bright Data API Call and set URL to
https://api.brightdata.com/request. - Set Method to
POST, enable Send Body, and enable Send Headers. - In Body Parameters, set zone to
={{ $json.zone }}and url to={{ $json.url }}, plus format torawand data_format tomarkdown. - Credential Required: Connect your httpHeaderAuth credentials in Bright Data API Call.
- Connect Assign Target URL & Zone to Bright Data API Call.
Step 3: Clean Markdown and Set Up AI Models
Convert markdown into clean text, then prepare the LLMs used for summarization and structuring.
- In Convert Markdown to Text, set Text to
=You need to analyze the below markdown and convert to textual data. Please do not output with your own thoughts. Make sure to output with textual data only with no links, scripts, css etc.\n\n{{ $json.data }}. - In Convert Markdown to Text, add a message value with Message set to
You are a markdown expert. - Credential Required: Connect your openAiApi credentials in Primary Chat Model and select model
gpt-4o-mini. - Credential Required: Connect your openAiApi credentials in Summary Chat Model and select model
gpt-4o-mini. - Credential Required: Connect your openAiApi credentials in Structuring Chat Model and select model
gpt-4o-mini. - Connect Bright Data API Call to Convert Markdown to Text.
Convert Markdown to Text outputs to both Summarize Content and Extract Structured Fields in parallel.
Step 4: Summarize and Structure the Content
Run parallel AI chains: one produces a summary, the other extracts structured fields using a JSON schema.
- In Summarize Content, set Chunking Mode to
advanced. - Connect Summary Chat Model as the language model for Summarize Content (credentials are added on Summary Chat Model).
- In Extract Structured Fields, set Text to
=Extract the structured info from the below content.\n\nHere's the Content: {{ $json.text }}and enable Has Output Parser. - Open Structured Output Reader and paste the JSON schema into JSON Schema Example exactly as provided in the workflow.
- Connect Structured Output Reader as the output parser for Extract Structured Fields. This sub-node does not take credentials; add OpenAI credentials to Structuring Chat Model.
Extract Structured Fields outputs to Update Spreadsheet, Build Binary for Struct, and Send Webhook for Results in parallel.
Step 5: Configure Spreadsheet and File Outputs
Store structured data in Google Sheets and persist both summary and structured outputs as local JSON files.
- In Update Spreadsheet, set Operation to
appendOrUpdate, Document to[YOUR_ID], and Sheet toSheet1. - Credential Required: Connect your googleSheetsOAuth2Api credentials in Update Spreadsheet.
- In Build Binary for Summary, keep the function code exactly as provided to convert JSON to binary.
- In Save Summary to File, set Operation to
writeand File Name to=d:\Crunchbase-Summary.json. - In Build Binary for Struct, keep the same function code to convert structured JSON to binary.
- In Save Structured File, set Operation to
writeand File Name to=d:\Crunchbase-Summary.json.
⚠️ Common Pitfall: Both Save Summary to File and Save Structured File write to the same path (d:\Crunchbase-Summary.json). Use different filenames if you want to keep both outputs.
Step 6: Configure Webhook Delivery for Results
Send the structured output to an external webhook endpoint.
- In Send Webhook for Results, set URL to
https://webhook.site/[YOUR_ID]. - Enable Send Body and set body parameter summary to
={{ $json.output }}.
Step 7: Test and Activate Your Workflow
Run an end-to-end test to confirm the parallel branches complete and outputs are written successfully.
- Click Execute Workflow to run Manual Run Trigger and verify data flows into Bright Data API Call.
- Confirm that Convert Markdown to Text splits into both Summarize Content and Extract Structured Fields in parallel.
- Check that Update Spreadsheet appends a row and Send Webhook for Results receives the JSON payload.
- Verify files are created by Save Summary to File and Save Structured File.
- When satisfied, switch the workflow to Active for production use.
Common Gotchas
- Bright Data credentials can expire or need specific permissions. If things break, check your Bright Data Web Unlocker token and zone settings in the n8n credential and the “Assign Target URL & Zone” values first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
About 30 minutes if your Bright Data, OpenAI, and Google Sheets accounts are ready.
No. You’ll mostly be pasting API keys and matching extracted fields to your Google Sheets columns.
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API usage plus Bright Data scraping costs, which depend on your volume.
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Yes, and you should. You can adjust the prompt used in “Extract Structured Fields” to include fields like revenue, social links, or leadership details, and you can tweak “Summarize Content” to produce a sales-ready snapshot instead of a generic overview. If you want the sheet to match your CRM, align the output parser fields to your exact column names so the “Update Spreadsheet” step stays clean. You can also route the webhook to HubSpot or Salesforce so leads are created automatically.
Usually it’s an expired token or the wrong zone name. Confirm the Bearer token in your Header Auth credential, then re-check the zone value set in the “Assign Target URL & Zone” node. If it works sometimes and fails other times, you may be hitting usage limits or getting blocked patterns that require a different Bright Data configuration.
On n8n Cloud Starter, you can run a healthy volume of executions each month, and higher tiers handle more. If you self-host, there’s no execution cap from n8n; you’re limited by your server and the time each scrape plus AI run takes. In practice, most teams run this in batches (like 20–100 companies at a time) and keep an eye on Bright Data and OpenAI usage.
For this use case, usually yes. You need multi-step processing (scrape, clean, extract fields, summarize, write to Sheets, save files, then send a webhook), and n8n handles that kind of branching without turning every extra step into a separate paid task. The self-hosting option is also a big deal if you plan to run lots of lead research. Zapier or Make can still work if you only want a very simple “URL in, row out” flow and you’re okay with less control. If you’re torn, Talk to an automation expert and we’ll point you to the simplest setup.
Once this is running, Crunchbase research turns into a repeatable pipeline instead of a weekly scramble. The workflow handles the tedious parts so your team can focus on targeting and messaging.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.