Crunchbase to Google Sheets, startup research cleaned
Startup research sounds simple until you’re three tabs deep, copying Crunchbase fields into a sheet, and realizing half the rows don’t match your column structure. Then the “quick scan” turns into a messy cleanup session.
This problem hits growth marketers hardest, but founders doing DIY market research and agency strategists building weekly landscape updates feel it too. This Crunchbase-to-Sheets automation gives you a consistent Google Sheet with the 10 most recently founded relevant startups, plus a share-ready Gemini comparison summary.
Below is how the workflow runs, what it replaces, and the practical numbers behind why it’s worth automating.
How This Automation Works
See how this solves the problem:
n8n Workflow Template: Crunchbase to Google Sheets, startup research cleaned
flowchart LR
subgraph sg0["When User Completes Form Flow"]
direction LR
n0["Snapshot Progress"]
n1["HTTP Request- Post API call .."]
n2@{ icon: "mdi:cog", form: "rounded", label: "Wait - Polling Bright Data", pos: "b", h: 48 }
n3@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If - Checking status of Snap..", pos: "b", h: 48 }
n4["HTTP Request - Getting data .."]
n5["Code - Parse and Clean JSON .."]
n6@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
n7@{ icon: "mdi:robot", form: "rounded", label: "Google Gemini - Comparative ..", pos: "b", h: 48 }
n8["When User Completes Form"]
n9["Merge"]
n10["Code - Combining JSON and AI.."]
n11@{ icon: "mdi:database", form: "rounded", label: "Google Sheets - Export Results", pos: "b", h: 48 }
n9 --> n10
n0 --> n3
n6 -.-> n7
n8 --> n1
n2 --> n0
n5 --> n7
n5 --> n9
n10 --> n11
n7 --> n9
n1 --> n2
n4 --> n5
n3 --> n2
n3 --> n4
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n8 trigger
class n7 ai
class n6 aiModel
class n3 decision
class n11 database
class n0,n1,n4 api
class n5,n10 code
classDef customIcon fill:none,stroke:none
class n0,n1,n4,n5,n8,n9,n10 customIcon
The Challenge: Crunchbase research that doesn’t fall apart in a spreadsheet
When you’re scouting startups by a theme (like “AI in healthcare” or “carbon capture”), the hard part is not finding companies. It’s turning scattered profiles into something you can actually use. One row has founders and investor names. Another row is missing employee count. A third has a long description pasted into the wrong column because the formatting is different. And once you finally have 10 companies, you still have to write the “so what” summary for a teammate or client.
It adds up fast. Here’s where it breaks down.
- Copying Crunchbase fields into Google Sheets takes long enough that you start skipping fields, which makes the dataset less useful.
- Inconsistent formatting (dates, employee ranges, social links) makes sorting and filtering unreliable.
- Picking “the most recent” startups becomes a manual judgment call instead of a clean, repeatable rule.
- After the data work, you still have to synthesize a comparison summary, which often gets rushed or postponed.
The Fix: Fresh Crunchbase startups pulled, cleaned, and summarized
This workflow starts with one simple input: a keyword that describes what you’re researching. n8n sends that keyword to Bright Data’s Crunchbase snapshot API, kicks off a dataset job, then checks status until the snapshot is ready. Once it’s available, the workflow fetches the JSON, normalizes the company records (so every row has the same structure), and sorts by founded date to surface the most recently founded startups. Then it selects the top 10 companies and sends them to Google Gemini for a comparative analysis that reads like a quick briefing, not a raw dump of facts. Finally, it merges the AI summary back into the export and appends everything to Google Sheets for tracking.
The flow begins when you submit your research keyword through a form. Bright Data pulls matching Crunchbase records, then a cleanup step standardizes fields and selects the 10 newest companies. Gemini produces a single “compare these 10” summary, and Google Sheets receives clean rows plus the one-batch analysis.
Real-World Impact
Say you do one research refresh per week for a new campaign angle. Manually pulling 10 startups often means about 10 minutes per company to open profiles, copy fields, and fix formatting, plus another 30 minutes to write a comparison. That’s 130 minutes, or a little over 2 hours each week. With this workflow, you spend about 5 minutes entering the keyword and choosing the sheet, then wait for the snapshot and AI summary to complete. The output is already structured and ready to share.
Requirements
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Bright Data to access Crunchbase snapshot API.
- Google Sheets for storing and tracking results.
- Google Gemini API key (get it from Google AI Studio / Gemini API console).
Skill level: Intermediate. You’ll connect accounts, add API keys, and may tweak a code node’s field mapping.
Need help implementing this? Talk to an automation expert (free 15-minute consultation).
The Workflow Flow
A keyword gets submitted through a form. You enter something like “edtech” or “AI in healthcare,” and that single phrase becomes the filter for the Crunchbase pull.
Bright Data runs a Crunchbase snapshot job and n8n waits for it. The workflow triggers the job, pauses briefly, then polls status until the snapshot is ready so you’re not guessing when data will arrive.
The raw JSON is cleaned and shaped into “spreadsheet-ready” records. A code step normalizes key fields (name, founded date, website, funding_total, founders, and more), sorts by founded_date, and selects the 10 most recent companies.
Gemini writes a comparison summary, and Google Sheets stores everything. The AI summary is merged into the final export (attached once per batch to avoid repetition), then the workflow appends all 10 companies to your target sheet.
You can easily modify the fields you export to match your sheet columns based on your needs. See the full implementation guide below for customization options.
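As a rough illustration of what "modifying the fields" can mean in the cleanup step, here is a minimal Python sketch. It is not the template's actual script. The field names come from the list above, but the `COLUMNS` list, the helper names, the raw record shape, and the `YYYY-MM-DD` date prefix are all assumptions.

```python
from datetime import datetime

# Hypothetical column order; edit this list to match your sheet's header row.
COLUMNS = ["name", "founded_date", "website", "funding_total", "founders"]


def parse_founded(value):
    """Return a date for values starting with 'YYYY-MM-DD', else None (assumed format)."""
    if not value:
        return None
    try:
        return datetime.strptime(str(value)[:10], "%Y-%m-%d").date()
    except ValueError:
        return None


def normalize_company(raw):
    """Map one raw record onto COLUMNS so every row has the same shape."""
    row = {}
    for column in COLUMNS:
        value = raw.get(column)
        if column == "founded_date":
            parsed = parse_founded(value)
            value = parsed.isoformat() if parsed else ""
        elif isinstance(value, list):
            # Flatten lists (e.g. founders) into one comma-separated cell.
            value = ", ".join(str(item) for item in value)
        elif value is None:
            # Missing fields become empty cells instead of shifting columns.
            value = ""
        row[column] = value
    return row
```

Adding or removing an entry in `COLUMNS` changes what every exported row contains, and fields absent from a record come out as empty cells rather than misaligned ones.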
Step-by-Step Implementation Guide
Step 1: Configure the Form Trigger
Set up the intake form so users can submit a keyword that drives the Crunchbase search and analysis.
- Add the User Form Trigger node and set Form Title to `Search from Crunchbase by keyword`.
- Under Form Fields, add a field labeled Keyword with placeholder `e.g. "AI in healthcare"` and mark it as required.
- Set Form Description to `Please provide a keyword to search and compare relevant startups` and keep Response Mode as `lastNode`.
Step 2: Connect Bright Data API and Poll for Results
Trigger the Bright Data dataset job, then poll until the snapshot is ready.
- In Trigger Bright Data Job, set URL to `https://api.brightdata.com/datasets/v3/trigger` and Method to `POST`.
- In Trigger Bright Data Job → Body Parameters, set keyword to `={{ $json["Keyword"] }}`.
- In Trigger Bright Data Job → Query Parameters, set dataset_id to `[YOUR_ID]`, type to `discover_new`, discover_by to `keyword`, and include_errors to `true`.
- Add your Bright Data token in Trigger Bright Data Job and Check Snapshot Status headers: Authorization = `Bearer [CONFIGURE_YOUR_TOKEN]`.
- Configure Polling Delay with Amount set to `15` and connect Trigger Bright Data Job → Polling Delay → Check Snapshot Status.
- In Check Snapshot Status, set URL to `=https://api.brightdata.com/datasets/v3/progress/{{ $('Trigger Bright Data Job').item.json.snapshot_id }}`.
- In Snapshot Ready Branch, set the condition Left Value to `={{ $json.status }}` and Right Value to `running`. Connect the true path back to Polling Delay and the false path to Fetch Snapshot Data.
- In Fetch Snapshot Data, set URL to `=https://api.brightdata.com/datasets/v3/snapshot/{{ $json.snapshot_id }}` and set the query format to `json`.
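For readers who think in code, the trigger-then-poll wiring above can be sketched in plain Python. This is a hedged outline, not what n8n runs internally. The endpoint and query parameters are the ones listed in this step. `get_status`, `max_attempts`, and the injectable `sleep` are hypothetical additions, the attempt cap does not exist in the workflow as described, and treating the Polling Delay amount of 15 as seconds is an assumption.

```python
import time
from urllib.parse import urlencode

API_BASE = "https://api.brightdata.com/datasets/v3"


def build_trigger_url(dataset_id):
    """Compose the trigger URL with the query parameters listed in Step 2."""
    query = urlencode({
        "dataset_id": dataset_id,
        "type": "discover_new",
        "discover_by": "keyword",
        "include_errors": "true",
    })
    return f"{API_BASE}/trigger?{query}"


def poll_until_ready(get_status, delay_seconds=15, max_attempts=40, sleep=time.sleep):
    """Wait, check the status, and repeat while the job reports 'running'.

    get_status is a caller-supplied function (hypothetical) that performs the
    GET on /progress/<snapshot_id> and returns the 'status' string.
    """
    for _ in range(max_attempts):
        sleep(delay_seconds)      # Polling Delay
        status = get_status()     # Check Snapshot Status
        if status != "running":   # Snapshot Ready Branch: false path exits the loop
            return status
    # Assumption: give up after max_attempts instead of looping forever.
    raise TimeoutError(f"snapshot still running after {max_attempts} checks")
```

Passing the status check in as a function keeps the loop testable without network access. In a real script, `get_status` would send the same Authorization header as the trigger call.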
While the status is `running`, the workflow keeps polling until the status changes. Ensure your dataset completes and returns a non-running status so Fetch Snapshot Data can proceed.
Step 3: Set Up Data Normalization and AI Analysis
Normalize the dataset into structured company data, then run the AI comparison in parallel before merging.
- In Normalize Company JSON, keep Language as `python` and use the provided script to sort by `founded_date` and return the top 10 companies.
- Ensure Normalize Company JSON outputs to both Company Comparison Draft and Merge Analysis Streams in parallel.
- In Company Comparison Draft, set Text to `={{$json}}` and keep Prompt Type as `define`.
- Verify the message includes the user keyword expression `{{ $('User Form Trigger').first().json['Keyword']}}` to ground the analysis.
- Connect Gemini Chat Engine as the language model for Company Comparison Draft and keep Model Name as `models/gemini-2.0-flash`.
- Connect Company Comparison Draft and Normalize Company JSON into Merge Analysis Streams, then connect Merge Analysis Streams to Assemble Final Report.
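To make the "ground the analysis with the keyword" idea concrete, here is one possible way to assemble a single prompt for the whole batch. The wording is illustrative only and is not the template's prompt. `build_comparison_prompt` is a hypothetical helper, and the company list is assumed to be plain dictionaries.

```python
import json


def build_comparison_prompt(keyword, companies):
    """Compose one prompt covering the whole batch (wording is illustrative)."""
    # Serialize the cleaned records so the model sees every field verbatim.
    payload = json.dumps(companies, indent=2, ensure_ascii=False)
    return (
        f'Compare the following startups related to "{keyword}".\n'
        "Summarize how they differ in focus, stage, and funding, "
        "using only the data provided.\n\n"
        f"{payload}"
    )
```

Sending the batch in one prompt is what makes the summary a comparison rather than ten separate blurbs. Including the keyword tells the model which theme the comparison should be framed around.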
Step 4: Configure the Output Destination
Assemble the final report and append results to your Google Sheet.
- In Assemble Final Report, keep Language as `python` and use the provided script to merge AI analysis into the first company record.
- Connect Assemble Final Report to Append Results Sheet.
- In Append Results Sheet, set Operation to `append` and select the target spreadsheet in Document ID.
- Choose the output sheet in Sheet Name (example: `gid=0`), and keep Mapping Mode as `autoMapInputData`.
- Verify column mapping includes the expression for name: `={{ $json.companies[0].name }}`.
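The "merge AI analysis into the first company record" step can be pictured with a small sketch like the one below. It assumes flat row dictionaries and a column named `ai_analysis`, both of which are hypothetical. The provided script may structure its output differently, for example under a `companies` key as the mapping expression above suggests.

```python
def attach_analysis(companies, analysis):
    """Return copies of the rows with the AI text on the first row only."""
    rows = []
    for index, company in enumerate(companies):
        row = dict(company)  # copy so the input list is left untouched
        # Only the first row carries the batch summary; the rest stay blank.
        row["ai_analysis"] = analysis if index == 0 else ""
        rows.append(row)
    return rows
```

Writing the summary once per batch keeps the sheet readable, because a long analysis is not repeated on all 10 rows.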
Step 5: Test and Activate Your Workflow
Validate the end-to-end flow with a manual run and then turn on production execution.
- Click Execute Workflow and submit the User Form Trigger with a sample keyword (e.g., `AI in healthcare`).
- Watch the run: Trigger Bright Data Job should return a `snapshot_id`, Polling Delay should loop until Snapshot Ready Branch exits, and Fetch Snapshot Data should return JSON.
- Confirm that Normalize Company JSON outputs structured companies and Company Comparison Draft produces a comparison text that merges in Assemble Final Report.
- Verify Append Results Sheet appends rows to your spreadsheet with company data and AI analysis.
- When results look correct, toggle the workflow to Active for production use.
Watch Out For
- Bright Data credentials can expire or need specific permissions. If things break, first check the three HTTP Request nodes where the API key is set.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Common Questions
How long does setup take?
About 30 minutes if you already have your API keys.
Can a non-technical team run this?
Yes, but plan for one person to handle the initial API setup. After that, running it is just submitting a keyword and checking the sheet.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Bright Data usage plus Gemini API costs (usually a few cents per run, depending on prompt size).
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the exported fields and the selection logic?
You can. Most customization happens in the “Normalize Company JSON” code step and the final “Assemble Final Report” step, because that’s where fields are selected, renamed, and cleaned. Add or remove columns like lead_investors, monthly_visits, or products_and_services based on what your team actually reviews. You can also change the “top 10” logic to top 20, or switch sorting from founded_date to funding_total if you’re doing a different kind of scan.
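The "top 20" and "sort by funding_total" variations could look something like this minimal sketch. `select_top` is a hypothetical helper, not the name used in the template. It assumes `founded_date` values are ISO-style strings (which sort chronologically as text) and that `funding_total` values are numbers.

```python
def select_top(companies, n=10, sort_key="founded_date"):
    """Return the n records with the largest sort_key, skipping missing values."""
    # Records without a usable value cannot be ranked, so leave them out.
    usable = [c for c in companies if c.get(sort_key) not in (None, "")]
    return sorted(usable, key=lambda c: c[sort_key], reverse=True)[:n]
```

Calling `select_top(rows)` gives the 10 most recently founded companies, while `select_top(rows, n=20, sort_key="funding_total")` switches to the 20 best-funded ones.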
Why is my Bright Data connection failing?
Most of the time it’s an API key issue in one of the three HTTP Request nodes. Regenerate the Bright Data key, update it everywhere it appears, and confirm the snapshot endpoint you’re calling matches your plan. If it starts working and then fails again, it can also be rate limiting or the snapshot job taking longer than expected, so the status check returns “not ready” for longer than your wait loop allows.
How often can I run this workflow?
Practically, it’s “as often as you want,” as long as your Bright Data and Gemini quotas can handle it. Each run appends 10 rows plus one shared AI analysis field, so weekly tracking stays lightweight. On self-hosted n8n there’s no execution limit, and on n8n Cloud it depends on your plan’s monthly execution allowance.
Is n8n a better fit than Zapier or Make for this?
For this use case, n8n is usually the better fit because you need polling logic (wait + status checks), JSON cleanup, and a clean merge between “company rows” and “one-per-batch” AI analysis. Zapier and Make can do parts of this, but multi-request snapshot control plus code-based normalization gets clunky fast, and costs tend to climb when you re-run often. If you just want a simple “form to sheet” workflow with no cleanup, those tools can be fine. This one is closer to a mini data pipeline, honestly. Talk to an automation expert if you want a quick recommendation based on volume.
Once this is set up, your “startup landscape” stops being a one-off task and becomes a repeatable system. The workflow handles the tedious parts, and you get to focus on what the list actually means.