Crunchbase to Google Sheets, startup research cleaned

Startup research sounds simple until you’re three tabs deep, copying Crunchbase fields into a sheet, and realizing half the rows don’t match your column structure. Then the “quick scan” turns into a messy cleanup session.

This problem hits growth marketers hardest, but founders doing DIY market research and agency strategists building weekly landscape updates feel it too. This Crunchbase Sheets automation gives you a consistent Google Sheet with the 10 most recently founded relevant startups, plus a share-ready Gemini comparison summary.

Below is how the workflow runs, what it replaces, and the practical numbers behind why it’s worth automating.

How This Automation Works

See how this solves the problem:

n8n Workflow Template: Crunchbase to Google Sheets, startup research cleaned


flowchart LR

    subgraph sg0["When User Completes Form Flow"]
        direction LR
        n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Snapshot Progress"]
        n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>HTTP Request- Post API call .."]
        n2@{ icon: "mdi:cog", form: "rounded", label: "Wait - Polling Bright Data", pos: "b", h: 48 }
        n3@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If - Checking status of Snap..", pos: "b", h: 48 }
        n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>HTTP Request - Getting data .."]
        n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Code - Parse and Clean JSON .."]
        n6@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
        n7@{ icon: "mdi:robot", form: "rounded", label: "Google Gemini - Comparative ..", pos: "b", h: 48 }
        n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/form.svg' width='40' height='40' /></div><br/>When User Completes Form"]
        n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/merge.svg' width='40' height='40' /></div><br/>Merge"]
        n10["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Code - Combining JSON and AI.."]
        n11@{ icon: "mdi:database", form: "rounded", label: "Google Sheets - Export Results", pos: "b", h: 48 }
        n9 --> n10
        n0 --> n3
        n6 -.-> n7
        n8 --> n1
        n2 --> n0
        n5 --> n7
        n5 --> n9
        n10 --> n11
        n7 --> n9
        n1 --> n2
        n4 --> n5
        n3 --> n2
        n3 --> n4
    end

    %% Styling
    classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
    classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef disabled stroke-dasharray: 5 5,opacity: 0.5
    class n8 trigger
    class n7 ai
    class n6 aiModel
    class n3 decision
    class n11 database
    class n0,n1,n4 api
    class n5,n10 code
    classDef customIcon fill:none,stroke:none
    class n0,n1,n4,n5,n8,n9,n10 customIcon

The Challenge: Crunchbase research that doesn’t fall apart in a spreadsheet

When you’re scouting startups by a theme (like “AI in healthcare” or “carbon capture”), the hard part is not finding companies. It’s turning scattered profiles into something you can actually use. One row has founders and investor names. Another row is missing employee count. A third has a long description pasted into the wrong column because the formatting is different. And once you finally have 10 companies, you still have to write the “so what” summary for a teammate or client.

It adds up fast. Here’s where it breaks down.

Copying Crunchbase fields into Google Sheets takes long enough that you start skipping fields, which makes the dataset less useful.
Inconsistent formatting (dates, employee ranges, social links) makes sorting and filtering unreliable.
Picking “the most recent” startups becomes a manual judgment call instead of a clean, repeatable rule.
After the data work, you still have to synthesize a comparison summary, which often gets rushed or postponed.

The Fix: Fresh Crunchbase startups pulled, cleaned, and summarized

This workflow starts with one simple input: a keyword that describes what you’re researching. n8n sends that keyword to Bright Data’s Crunchbase snapshot API, kicks off a dataset job, then checks status until the snapshot is ready. Once it’s available, the workflow fetches the JSON, normalizes the company records (so every row has the same structure), and sorts by founded date to surface the most recently founded startups. Then it selects the top 10 companies and sends them to Google Gemini for a comparative analysis that reads like a quick briefing, not a raw dump of facts. Finally, it merges the AI summary back into the export and appends everything to Google Sheets for tracking.

The flow begins when you submit your research keyword through a form. Bright Data pulls matching Crunchbase records, then a cleanup step standardizes fields and selects the 10 newest companies. Gemini produces a single “compare these 10” summary, and Google Sheets receives clean rows plus the one-batch analysis.

What Changes: Before vs. After

What This Eliminates

Manually copying Crunchbase profile details into spreadsheet columns.
Cleaning mismatched fields after the paste (dates, links, descriptions, and missing values).
Hand-sorting companies to find the most recent startups for a theme.
Writing a comparison summary from scratch every time you refresh the list.

Impact You’ll See

Most teams get a usable sheet in under an hour instead of spending about 3 hours.
Each batch lands with consistent columns like funding_total, founders, lead_investors, and monthly_visits.
The “top 10 most recent” rule is repeatable, so weekly updates don’t drift.
A Gemini-generated comparison is ready to paste into an email, deck, or Slack update.
You build a running research log in Google Sheets, which makes trends easier to spot.

Real-World Impact

Say you do one research refresh per week for a new campaign angle. Manually pulling 10 startups often means about 10 minutes per company to open profiles, copy fields, and fix formatting, plus another 30 minutes to write a comparison. That’s 130 minutes, or a little over 2 hours, each week. With this workflow, you spend about 5 minutes entering the keyword and choosing the sheet, then wait for the snapshot and AI summary to complete. The output is already structured and ready to share.

Requirements

n8n instance (try n8n Cloud free)
Self-hosting option if you prefer (Hostinger works well)
Bright Data account with access to the Crunchbase snapshot API.
Google Sheets for storing and tracking results.
Google Gemini API key (get it from Google AI Studio / Gemini API console).

Skill level: Intermediate. You’ll connect accounts, add API keys, and may tweak a code node’s field mapping.


The Workflow Flow

A keyword gets submitted through a form. You enter something like “edtech” or “AI in healthcare,” and that single phrase becomes the filter for the Crunchbase pull.

Bright Data runs a Crunchbase snapshot job and n8n waits for it. The workflow triggers the job, pauses briefly, then polls status until the snapshot is ready so you’re not guessing when data will arrive.

The raw JSON is cleaned and shaped into “spreadsheet-ready” records. A code step normalizes key fields (name, founded date, website, funding_total, founders, and more), sorts by founded_date, and selects the 10 most recent companies.

Gemini writes a comparison summary, and Google Sheets stores everything. The AI summary is merged into the final export (attached once per batch to avoid repetition), then the workflow appends all 10 companies to your target sheet.

You can change which fields are exported so they match your sheet columns. See the implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the Form Trigger

Set up the intake form so users can submit a keyword that drives the Crunchbase search and analysis.

Add the User Form Trigger node and set Form Title to "Search from Crunchbase by keyword".
Under Form Fields, add a field labeled "Keyword" with the placeholder "e.g. AI in healthcare" and mark it as required.
Set Form Description to "Please provide a keyword to search and compare relevant startups" and keep Response Mode as lastNode.

Step 2: Connect Bright Data API and Poll for Results

Trigger the Bright Data dataset job, then poll until the snapshot is ready.

In Trigger Bright Data Job, set URL to https://api.brightdata.com/datasets/v3/trigger and Method to POST.
In Trigger Bright Data Job → Body Parameters, set keyword to ={{ $json["Keyword"] }}.
In Trigger Bright Data Job → Query Parameters, set dataset_id to [YOUR_ID], type to discover_new, discover_by to keyword, and include_errors to true.
Add your Bright Data token in Trigger Bright Data Job and Check Snapshot Status headers: Authorization = Bearer [CONFIGURE_YOUR_TOKEN].
Configure Polling Delay with Amount set to 15 and connect Trigger Bright Data Job → Polling Delay → Check Snapshot Status.
In Check Snapshot Status, set URL to =https://api.brightdata.com/datasets/v3/progress/{{ $('Trigger Bright Data Job').item.json.snapshot_id }}.
In Snapshot Ready Branch, set the condition Left Value to ={{ $json.status }} and Right Value to running. Connect the true path back to Polling Delay and the false path to Fetch Snapshot Data.
In Fetch Snapshot Data, set URL to =https://api.brightdata.com/datasets/v3/snapshot/{{ $json.snapshot_id }} and set the query format to json.
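
For readers who want to see the three HTTP calls outside n8n, here is a minimal sketch of how the requests above could be assembled in Python. The URLs, query parameters, and Authorization header come from the steps in this section. The function names, the JSON body shape, and the example dataset ID and token are assumptions for illustration only, so check them against your own Bright Data dataset before relying on them. The code builds the request but does not send it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.brightdata.com/datasets/v3"


def build_trigger_request(keyword: str, dataset_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a snapshot job."""
    query = urllib.parse.urlencode({
        "dataset_id": dataset_id,
        "type": "discover_new",
        "discover_by": "keyword",
        "include_errors": "true",
    })
    # Assumption: the keyword is sent as a JSON object in the request body.
    body = json.dumps({"keyword": keyword}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/trigger?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def progress_url(snapshot_id: str) -> str:
    """URL polled by the status-check step."""
    return f"{API_BASE}/progress/{snapshot_id}"


def snapshot_url(snapshot_id: str) -> str:
    """URL used to download the finished snapshot as JSON."""
    return f"{API_BASE}/snapshot/{snapshot_id}?format=json"
```

Passing the result of build_trigger_request to urllib.request.urlopen would send it. Keeping construction separate from sending makes it easy to inspect the URL and headers before spending API credits.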

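⚠️ Common Pitfall: As configured above, Snapshot Ready Branch loops back to Polling Delay for as long as the status is running, and nothing in the loop caps the number of checks. Any other status, including a failure status, takes the false path to Fetch Snapshot Data, so confirm the job actually finished before trusting the fetched JSON.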
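
The polling loop in this step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's actual code: the delay of 15 is treated as seconds, the max_attempts cap is a hypothetical addition (the n8n loop as described has none), and get_status stands in for whatever function calls the progress endpoint and returns its status field.

```python
import time
from typing import Callable


def wait_for_snapshot(
    get_status: Callable[[], str],
    delay_seconds: float = 15.0,
    max_attempts: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is no longer "running", then return that status.

    Raises TimeoutError if the job is still running after max_attempts checks.
    """
    for _ in range(max_attempts):
        sleep(delay_seconds)      # mirrors Polling Delay
        status = get_status()     # mirrors Check Snapshot Status
        if status != "running":   # mirrors the false path of Snapshot Ready Branch
            return status
    raise TimeoutError(f"snapshot still running after {max_attempts} checks")
```

The caller should still inspect the returned status, because the loop exits on any value other than running. The sleep parameter is injectable so the loop can be exercised without real waiting.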

Step 3: Set Up Data Normalization and AI Analysis

Normalize the dataset into structured company data, then run the AI comparison in parallel before merging.

In Normalize Company JSON, keep Language as python and use the provided script to sort by founded_date and return the top 10 companies.
Ensure Normalize Company JSON outputs to both Company Comparison Draft and Merge Analysis Streams in parallel.
In Company Comparison Draft, set Text to ={{$json}} and keep Prompt Type as define.
Verify the message includes the user keyword expression {{ $('User Form Trigger').first().json['Keyword']}} to ground the analysis.
Connect Gemini Chat Engine as the language model for Company Comparison Draft and keep Model Name as models/gemini-2.0-flash.
Connect Company Comparison Draft and Normalize Company JSON into Merge Analysis Streams, then connect Merge Analysis Streams to Assemble Final Report.
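
The template ships its own script for Normalize Company JSON, which is not reproduced here. As a rough sketch of what the normalize, sort, and select logic could look like, the example below assumes the raw records already use keys such as name, founded_date, and founders, and that founded_date starts with an ISO date (YYYY-MM-DD). The field list and function names are illustrative, and dropping records without a usable date is a choice made for this sketch.

```python
from datetime import date
from typing import Dict, List, Optional

# Illustrative subset of columns; extend to match your sheet.
FIELDS = ("name", "founded_date", "website", "funding_total", "founders")


def parse_founded(value) -> Optional[date]:
    """Parse the leading YYYY-MM-DD of a value, or return None if unusable."""
    try:
        return date.fromisoformat(str(value)[:10])
    except ValueError:
        return None


def normalize(record: Dict) -> Dict:
    """Return a row with every expected field present (missing ones become None)."""
    row = {field: record.get(field) for field in FIELDS}
    if isinstance(row["founders"], list):
        # Flatten lists so the value fits in a single spreadsheet cell.
        row["founders"] = ", ".join(str(f) for f in row["founders"])
    return row


def newest_companies(records: List[Dict], limit: int = 10) -> List[Dict]:
    """Normalize records, drop those without a usable date, return the newest."""
    dated = []
    for record in records:
        row = normalize(record)
        founded = parse_founded(row["founded_date"])
        if founded is not None:
            dated.append((founded, row))
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in dated[:limit]]
```

Because every row is built from the same FIELDS tuple, each row has identical columns even when the source record is missing values.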


Credential Required: Connect your Google Gemini credentials in Gemini Chat Engine. Company Comparison Draft uses this language model, so credentials must be added to Gemini Chat Engine, not the chain node.

Step 4: Configure the Output Destination

Assemble the final report and append results to your Google Sheet.

In Assemble Final Report, keep Language as python and use the provided script to merge AI analysis into the first company record.
Connect Assemble Final Report to Append Results Sheet.
In Append Results Sheet, set Operation to append and select the target spreadsheet in Document ID.
Choose the output sheet in Sheet Name (example: gid=0), and keep Mapping Mode as autoMapInputData.
Verify column mapping includes the expression for name: ={{ $json.companies[0].name }}.
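
The idea of attaching the AI analysis to the first company record only can be sketched as follows. This is not the template's script. The column name ai_analysis and the function name are hypothetical, and the sketch assumes the companies arrive as a list of dictionaries alongside a single analysis string.

```python
from typing import Dict, List


def assemble_report(companies: List[Dict], analysis: str) -> List[Dict]:
    """Return one row per company, with the analysis text on the first row only."""
    rows = []
    for index, company in enumerate(companies):
        row = dict(company)  # copy so the input records are not mutated
        # Hypothetical column name; an empty string keeps the column aligned.
        row["ai_analysis"] = analysis if index == 0 else ""
        rows.append(row)
    return rows
```

Writing an empty string on the remaining rows keeps every row the same shape, which matters when the sheet maps columns automatically from the incoming data.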

Credential Required: Connect your Google Sheets credentials in Append Results Sheet.

Step 5: Test and Activate Your Workflow

Validate the end-to-end flow with a manual run and then turn on production execution.

Click Execute Workflow and submit the User Form Trigger with a sample keyword (e.g., AI in healthcare).
Watch the run: Trigger Bright Data Job should return a snapshot_id, Polling Delay should loop until Snapshot Ready Branch exits, and Fetch Snapshot Data should return JSON.
Confirm that Normalize Company JSON outputs structured companies and Company Comparison Draft produces a comparison text that merges in Assemble Final Report.
Verify Append Results Sheet appends rows to your spreadsheet with company data and AI analysis.
When results look correct, toggle the workflow to Active for production use.


Watch Out For

Bright Data credentials can expire or need specific permissions. If things break, first check the three HTTP Request nodes where the API key is set.
If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.

Common Questions

How quickly can I implement this Crunchbase Sheets automation?

About 30 minutes if you already have your API keys.

Can non-technical teams implement this startup research automation?

Yes, but plan for one person to handle the initial API setup. After that, running it is just submitting a keyword and checking the sheet.

Is n8n free to use for this Crunchbase Sheets automation workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Bright Data usage plus Gemini API costs (usually a few cents per run, depending on prompt size).

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

How do I adapt this Crunchbase Sheets automation solution to my specific challenges?

Most customization happens in the “Normalize Company JSON” code step and the final “Assemble Final Report” step, because that’s where fields are selected, renamed, and cleaned. Add or remove columns like lead_investors, monthly_visits, or products_and_services based on what your team actually reviews. You can also change the “top 10” logic to top 20, or switch sorting from founded_date to funding_total if you’re doing a different kind of scan.
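
As an example of that last change, a selection helper with a configurable sort key and limit might look like the sketch below. It assumes the chosen field holds a number or a numeric string, so it suits funding_total or monthly_visits but not dates. The function name is hypothetical.

```python
from typing import Dict, List


def top_by(rows: List[Dict], key: str, limit: int = 10) -> List[Dict]:
    """Return the `limit` rows with the largest numeric value under `key`."""

    def as_number(row: Dict) -> float:
        try:
            return float(row.get(key))
        except (TypeError, ValueError):
            # Missing or non-numeric values sort last.
            return float("-inf")

    return sorted(rows, key=as_number, reverse=True)[:limit]
```

Calling top_by(rows, "funding_total", limit=20) would give a top 20 by funding instead of the default top 10 by recency.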

Why is my Bright Data connection failing in this workflow?

Most of the time it’s an API key issue in one of the three HTTP Request nodes. Regenerate the Bright Data key, update it everywhere it appears, and confirm the snapshot endpoint you’re calling matches your plan. If it starts working and then fails again, it can also be rate limiting or the snapshot job taking longer than expected, so the status check returns “not ready” for longer than your wait loop allows.

What’s the capacity of this Crunchbase Sheets automation solution?

Practically, it’s “as often as you want,” as long as your Bright Data and Gemini quotas can handle it. Each run appends 10 rows plus one shared AI analysis field, so weekly tracking stays lightweight. On self-hosted n8n there’s no execution limit, and on n8n Cloud it depends on your plan’s monthly execution allowance.

Is this Crunchbase Sheets automation better than using Zapier or Make?

For this use case, n8n is usually the better fit because you need polling logic (wait + status checks), JSON cleanup, and a clean merge between “company rows” and “one-per-batch” AI analysis. Zapier and Make can do parts of this, but multi-request snapshot control plus code-based normalization gets clunky fast, and costs tend to climb when you re-run often. If you just want a simple “form to sheet” workflow with no cleanup, those tools can be fine. This one is closer to a mini data pipeline.

Once this is set up, your “startup landscape” stops being a one-off task and becomes a repeatable system. The workflow handles the tedious parts, and you get to focus on what the list actually means.
