Apify to Google Sheets, YC leads ready to use
You find a great Y Combinator search page, then the slow part starts. Tabs everywhere, copied links pasted into the wrong row, and “I’ll clean it later” turning into a messy sheet you don’t want to touch.
SDRs feel this first, honestly. But VC analysts building sourcing lists and founders doing partnership outreach feel the same drag. You want a prospect list you can trust, without spending your best hours on admin work.
This workflow pulls YC company and founder data through Apify, then drops it into Google Sheets in a clean, usable format. You’ll see what it automates, what results to expect, and what you need to run it reliably.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Apify to Google Sheets, YC leads ready to use
Workflow diagram: Start Workflow → Run an Actor → Get dataset items → Add data to Google Sheet
The Problem: YC lead research turns into spreadsheet busywork
YC is a goldmine, but turning it into a working lead list is where momentum dies. You open a filtered directory page, click into profiles, copy a website, grab LinkedIn, try to find founder names, then paste it all into a sheet that slowly becomes inconsistent. One row has “YC S21,” another has “Summer 2021,” and half the LinkedIn fields are blank because you got interrupted. After a couple runs, you don’t even trust your own list, so you re-check everything. That’s the worst part.
None of this feels hard in the moment. It just keeps happening, and the friction compounds.
- You burn about 2 hours turning “interesting companies” into a real sheet.
- Manual copy-paste creates silent errors, like the wrong founder attached to the wrong company.
- Inconsistent formatting makes outreach personalization harder, because you’re always cleaning before you start.
- When someone asks for “the same list, but for a different batch,” you start from scratch.
The Solution: Scrape YC with Apify and keep a live Google Sheet
This n8n workflow gives you a repeatable way to turn any YC directory search into structured data you can actually use. You trigger it manually when you want fresh leads. n8n tells Apify to run a Y Combinator Directory Scraper actor against your exact search URL (batch, industry, region, whatever filters you picked). When Apify finishes, the workflow pulls the dataset records back into n8n, maps the fields into the columns you care about, and updates your Google Sheet with new rows. The output is not a “data dump.” It’s a prospecting sheet that looks like you built it on purpose.
The workflow starts with a manual launch in n8n. Apify does the heavy lifting by scraping and structuring the YC results. Google Sheets becomes the final destination, so your list is easy to sort, dedupe, and hand off for outreach.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
Example: What This Looks Like
Say you want a list of 100 YC companies from a specific batch and industry. Manually, if you spend maybe 2 minutes per company to copy the basics (site, location, description) and another minute chasing founder details, that’s about 5 hours of tedious work. With this workflow, you paste your filtered YC search URL into the Apify actor input, click execute, and wait for the run to finish. The actual “human time” is closer to 10 minutes, and the sheet fills in automatically.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Apify to run the YC directory scraper actor
- Google Sheets to store and share the lead list
- Apify API key (get it from Apify Console → Integrations)
Skill level: Intermediate. You’ll mainly connect accounts and match fields to the right sheet columns.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
Manual run from n8n. You click “Execute workflow” when you want a fresh pull of YC companies (weekly, daily, whenever your pipeline needs it).
Apify scrapes the YC directory page you chose. In the Apify “Run an Actor” step, you provide a YC search URL with your filters already applied. Apify visits each listing and returns structured fields instead of raw HTML.
n8n retrieves the dataset records. Once the actor run completes, the workflow fetches all dataset items, which typically include company info plus founder details when available.
Google Sheets gets updated. The final step writes the mapped values into your spreadsheet, so your columns stay consistent and your list is ready for enrichment, outreach, or import.
You can easily modify the YC search URL to target a new batch or industry based on your needs. See the full implementation guide below for customization options.
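If you change filters often, it can help to build the search URL programmatically instead of hand-encoding it. The sketch below is one way to do that in Python. The parameter names (`industry`, `regions`, `team_size`) come from the example URL used later in this guide; `build_yc_search_url` is a hypothetical helper, and any other filter names are an assumption you should confirm by applying the filter in your browser and reading the address bar.

```python
from urllib.parse import quote, urlencode

YC_DIRECTORY = "https://www.ycombinator.com/companies"


def build_yc_search_url(filters: dict) -> str:
    """Return a YC directory URL with the given filters percent-encoded."""
    # quote_via=quote encodes spaces as %20 (not "+") and also encodes "/",
    # which matches the encoding of the example URL in this guide.
    query = urlencode(filters, quote_via=quote)
    return f"{YC_DIRECTORY}?{query}"


url = build_yc_search_url({
    "industry": "Fintech",
    "regions": "America / Canada",
    "team_size": '["1","25"]',
})
print(url)
```

Paste the printed URL into the actor input in place of the existing search URL.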
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
Start the workflow manually so you can validate the Apify scrape and Google Sheets updates before automating further.
- Add the Manual Launch Trigger node as the workflow trigger.
- Keep default settings for Manual Launch Trigger since no parameters are required.
Step 2: Connect Apify and Launch the Actor
Configure the Apify actor run that scrapes the Y Combinator directory based on your filters.
- Add the Execute Apify Actor node and connect it to Manual Launch Trigger.
- Credential Required: Connect your apifyApi credentials.
- Select your actor in Actor (currently set to `[YOUR_ID]`).
- Set Custom Body to `{ "maxCompanies": 5, "startUrls": "{https://www.ycombinator.com/companies?industry=Fintech&regions=America%20%2F%20Canada&team_size=%5B%221%22%2C%2225%22%5D}", "proxyConfiguration": { "useApifyProxy": true } }`.
Replace `[YOUR_ID]` with your actual Apify Actor ID, or the run will fail.
Step 3: Retrieve the Apify Dataset Records
Pull the dataset output from the actor run so it can be written to Google Sheets.
- Add the Retrieve Dataset Records node and connect it to Execute Apify Actor.
- Credential Required: Connect your apifyApi credentials.
- Set Resource to `Datasets`.
- Set Dataset ID to `{{ $json.defaultDatasetId }}` so it uses the dataset created by the actor run.
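To make the Dataset ID expression less abstract: the actor run returns an object that includes a `defaultDatasetId`, and the dataset node uses that ID to fetch the items. The Python sketch below mirrors that lookup outside n8n by building the URL for Apify's dataset items endpoint. The sample `run` dictionary and the helper name `dataset_items_url` are made up for illustration, and the n8n node may set different query parameters internally.

```python
from typing import Optional
from urllib.parse import urlencode

APIFY_API = "https://api.apify.com/v2"


def dataset_items_url(run: dict, limit: Optional[int] = None) -> str:
    """Build the dataset items URL for an actor run object."""
    dataset_id = run.get("defaultDatasetId")
    if not dataset_id:
        # Same failure you would see in n8n if the previous node returned nothing.
        raise ValueError("run has no defaultDatasetId; did the actor start?")
    params = {"format": "json", "clean": "true"}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{APIFY_API}/datasets/{dataset_id}/items?{urlencode(params)}"


# Hypothetical run object; real ones carry many more fields.
run = {"id": "run123", "defaultDatasetId": "abc123"}
items_url = dataset_items_url(run, limit=5)
print(items_url)
```

A real request to this URL also needs your Apify token, which n8n supplies from the apifyApi credential.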
Step 4: Configure the Google Sheets Output
Append or update startup records in your spreadsheet using the dataset fields.
- Add the Update Spreadsheet Rows node and connect it to Retrieve Dataset Records.
- Credential Required: Connect your googleSheetsOAuth2Api credentials.
- Set Operation to `appendOrUpdate`.
- Select your Document (currently `[YOUR_ID]`) and Sheet (currently `gid=0` / `Sheet1`).
- Map the column values as defined:
  - Company → `{{ $json.company_name }}`
  - Founded → `{{ $json.year_founded }}`
  - Website → `{{ $json.website }}`
  - LinkedIn → `{{ $json.company_linkedin }}`
  - Location → `{{ $json.company_location }}`
  - Description → `{{ $json.long_description }}`
  - Industry Tags → `{{ $json['tags/0'] }} {{ $json['tags/1'] }} {{ $json['tags/2'] }} {{ $json['tags/3'] }}`
  - Founder 1 Name → `{{ $json['founders/0/name'] }}`
  - Founder 2 Name → `{{ $json['founders/1/name'] }}`
  - Founder 1 LinkedIn → `{{ $json['founders/0/linkedin'] }}`
  - Founder 2 LinkedIn → `{{ $json['founders/1/linkedin'] }}`
- Ensure Matching Columns includes `Company` to update existing rows by company name.
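The mapping and the match-on-Company behavior can be sketched in plain Python to show what ends up in each row. This is an in-memory model, not how n8n or Google Sheets implement it: `to_sheet_row` and `append_or_update` are hypothetical helpers, the sample item is invented, and the model replaces the whole matched row. The field names are the ones used in the mapping above.

```python
def to_sheet_row(item: dict) -> dict:
    """Map one flattened dataset item to the sheet columns from Step 4."""
    tags = [item.get(f"tags/{i}") for i in range(4)]
    return {
        "Company": item.get("company_name", ""),
        "Founded": item.get("year_founded", ""),
        "Website": item.get("website", ""),
        "LinkedIn": item.get("company_linkedin", ""),
        "Location": item.get("company_location", ""),
        "Description": item.get("long_description", ""),
        # Missing tags are dropped so the cell has no stray spaces.
        "Industry Tags": " ".join(t for t in tags if t),
        "Founder 1 Name": item.get("founders/0/name", ""),
        "Founder 2 Name": item.get("founders/1/name", ""),
        "Founder 1 LinkedIn": item.get("founders/0/linkedin", ""),
        "Founder 2 LinkedIn": item.get("founders/1/linkedin", ""),
    }


def append_or_update(rows: list, new_row: dict, key: str = "Company") -> None:
    """Replace the row whose key column matches, otherwise append (in place)."""
    for i, existing in enumerate(rows):
        if existing.get(key) == new_row.get(key):
            rows[i] = new_row
            return
    rows.append(new_row)


# Invented sample item, scraped twice with a changed website.
first = {"company_name": "Acme Pay", "website": "https://acme.example",
         "tags/0": "Fintech", "tags/1": "Payments", "founders/0/name": "Ada"}
second = {**first, "website": "https://acmepay.example"}

sheet = []
append_or_update(sheet, to_sheet_row(first))
append_or_update(sheet, to_sheet_row(second))
print(len(sheet), sheet[0]["Website"], sheet[0]["Industry Tags"])
```

Because the match is on company name alone, two different companies that share a name would overwrite each other. If that matters for your list, matching on Website may be a safer key.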
Step 5: Test and Activate Your Workflow
Run a manual execution to verify the scrape and spreadsheet update, then activate for production use.
- Click Execute Workflow to run Manual Launch Trigger and start the flow.
- Confirm Execute Apify Actor completes and that Retrieve Dataset Records outputs a list of startup objects.
- Check your Google Sheet to verify new or updated rows in Update Spreadsheet Rows.
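- When satisfied, swap the manual trigger for a Schedule Trigger (or add one alongside it) and toggle the workflow Active. A workflow with only a manual trigger has nothing to run on its own, so activating it as-is does nothing.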
Common Gotchas
- Apify credentials can expire or need specific permissions. If things break, check your Apify token in n8n’s Credentials panel first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
Frequently Asked Questions
How long does setup take?
About 30 minutes if your sheet columns are already created.
Do I need to know how to code?
No coding required. You’ll connect Apify and Google Sheets, then paste your YC search URL and map fields once.
Is n8n free to use?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Apify usage, since the YC scraper runs on Apify credits.
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this for a different batch or industry?
Yes, and it’s the whole point. You change the YC directory search URL inside the Apify “Run an Actor” node, then adjust maxCompanies if you want a smaller or bigger pull. If your spreadsheet has extra columns, update the Google Sheets mapping to fill them. Some teams also add a “Batch” or “Source URL” column so they can trace exactly where each row came from later.
Why is my Apify connection failing?
Usually it’s an expired or wrong Apify API token saved in n8n credentials. Regenerate the token in Apify, update the n8n credential, and try again. If the actor starts but returns empty data, double-check the YC search URL you pasted and confirm the actor still supports that page structure. Rate limits and low Apify credits can also cause runs to fail halfway through.
How many companies can this handle?
A few hundred per run is typical, and you can control it with the actor’s maxCompanies setting.
Is n8n a better fit than Zapier or Make for this?
For scraping-driven workflows, n8n is usually a better fit because it handles multi-step logic cleanly and you can self-host to avoid per-task pricing. Zapier and Make can work, but scraping often needs “run job → wait → fetch dataset → loop items → write rows,” and those platforms can get expensive or fiddly with that pattern. Another practical difference is control. In n8n you can add checks (skip blank websites, tag rows by batch, stop if the dataset is empty) without fighting the tool. If you only need a simple two-step sync, Zapier might be faster. If you’re not sure, talk to an automation expert.
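The checks mentioned above (skip blank websites, tag rows by batch, stop if the dataset is empty) are small enough to sketch. The Python below shows the logic only. In n8n you would typically express it with a Filter or IF node, or a Code node. `prepare_rows`, the sample items, and the batch label are hypothetical; the `website` field and the “Batch” column follow the names used earlier in this guide.

```python
def prepare_rows(items: list, batch: str) -> list:
    """Validate scraped items before they are written to the sheet."""
    if not items:
        # Stop early instead of silently writing nothing.
        raise ValueError("dataset is empty; check the YC search URL and the actor run")
    rows = []
    for item in items:
        website = (item.get("website") or "").strip()
        if not website:
            continue  # skip companies with a blank website
        # Tag every kept row so its origin is traceable later.
        rows.append({**item, "website": website, "Batch": batch})
    return rows


# Invented sample data: one usable item, one with a blank website.
items = [
    {"company_name": "Acme Pay", "website": " https://acme.example "},
    {"company_name": "No Site Inc", "website": ""},
]
rows = prepare_rows(items, batch="S21")
print(rows)
```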
Once this is in place, YC sourcing stops being a “research day” and becomes a button you click. The workflow handles the repetitive stuff, and your sheet stays clean enough to actually use.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.