Indeed to Airtable, clean company profiles fast
Company research sounds simple until you do it at scale. You open 20 Indeed company pages, copy bits into a spreadsheet, lose the tab with the “good notes,” and end up with a messy list you don’t trust.
Manual company research hits market researchers hardest, but recruiters building target lists and consultants doing quick due diligence feel it too. The goal of this Indeed-to-Airtable automation is straightforward: turn raw Indeed company URLs into clean, searchable Airtable records with consistent summaries.
You’ll see how the workflow pulls URLs from Airtable, scrapes the page reliably (even when Indeed tries to block it), and uses AI to extract and summarize the company profile so you can tag, sort, and reuse the data.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Indeed to Airtable, clean company profiles fast
[Workflow diagram]
Main path: When clicking ‘Test workflow’ → Set Bright Data Zone → Airtable → Loop Over Items → Wait → If Link field is not empty → Perform Indeed Web Request.
Branch 1: Perform Indeed Web Request → Markdown to Textual Data Ext.. → Indeed Summarizer → Indeed Expert AI Agent → back to Loop Over Items.
Branch 2: Perform Indeed Web Request → Convert Markdown to HTML → Initiate a Webhook Notificat..
Attached sub-nodes: a separate Google Gemini Chat Model for each of the three AI nodes, plus Webhook HTTP Request as a tool on Indeed Expert AI Agent.
Why This Matters: Reliable company research without the busywork
If you’ve ever tried to turn “a few quick company checks” into a real list, you know what happens. One Indeed page has a clean description, another is mostly reviews, another hides the details behind dynamic sections, and suddenly your notes are inconsistent. You spend more time formatting than learning. Then someone asks, “Can we filter this by industry and size?” and you realize your “research” is basically unsearchable text blobs. The worst part is the mental load: you can’t tell what you’ve already captured, what’s missing, and what’s outdated.
It adds up fast. Here’s where it breaks down.
- You end up copying and pasting the same fields over and over, which turns a 30-company list into an afternoon project.
- Indeed pages don’t follow one tidy template, so manual extraction becomes a judgment call that varies by person and day.
- When scraping fails (blocks, timeouts, weird HTML), you either skip the company or waste time troubleshooting.
- Your final dataset is hard to reuse because it isn’t normalized, tagged, or summarized in a consistent voice.
What You’ll Build: Indeed company profiles that land in Airtable already cleaned
This workflow starts with a simple input: a table in Airtable containing Indeed company profile URLs. When you run it, n8n pulls those records in batches, checks each one has a usable link, and then requests the company page through Bright Data’s Web Unlocker (so you get consistent access instead of random blocks). Next, AI steps in. The workflow takes the raw page content, extracts the meaningful text, and asks Google Gemini to summarize and structure it into something you can actually work with. Finally, it posts the clean output to a webhook (and can render it as HTML too), which means you can send it back to Airtable, a sheet, a CRM, or an internal dashboard without rewriting the workflow.
The workflow begins by reading URLs from Airtable and pacing the requests with a short wait. After scraping the Indeed page, AI generates a consistent company summary and structured fields. Then the result is pushed to your chosen webhook endpoint for storage, alerts, or downstream automations.
Expected Results
Say you’re researching 30 companies for a competitor list. Manually, even a “quick” pass is maybe 8 minutes per company between reading, copying, cleaning, and writing a short summary, which is about 4 hours. With this workflow, you spend about 15 minutes getting your Airtable list ready and kicking off the run, then it processes in the background with waits and batching. You review the finished Airtable rows when it’s done instead of doing 30 mini projects.
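The arithmetic behind that estimate is easy to rerun with your own numbers. This is a rough sketch: the function names are made up for illustration, and the pacing figure assumes one record per batch with the 10-second pause used in the Wait step of this guide. It ignores scraping and AI latency, so treat the automated figure as a floor rather than a prediction.

```python
def manual_hours(companies: int, minutes_each: float) -> float:
    """Total hands-on time for manual research, in hours."""
    return companies * minutes_each / 60


def pacing_minutes(companies: int, wait_seconds: float) -> float:
    """Minimum time the run spends in Wait steps, in minutes."""
    return companies * wait_seconds / 60


print(manual_hours(30, 8))     # 4.0
print(pacing_minutes(30, 10))  # 5.0
```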
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Airtable for the URL list and saved company records.
- Bright Data Web Unlocker to fetch Indeed pages reliably.
- Google Gemini API key (get it from Google AI Studio or Vertex AI).
- Airtable Personal Access Token (create it in Airtable account settings).
Skill level: Intermediate. You’ll be connecting credentials, editing a few fields, and testing with a small batch before scaling up.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
You start it on demand. The workflow uses a Manual Start Trigger, so you run it when you have a fresh batch of companies to research (or when you want to refresh older records).
Airtable provides the queue. n8n retrieves your Airtable records, iterates through them in batches, and adds a short wait so you don’t hammer requests or hit limits too quickly.
Bright Data fetches the Indeed page. The workflow validates the link first, then pulls the company profile HTML through Web Unlocker, which is designed to handle the blocking that ruins basic scraping attempts.
Gemini extracts and summarizes. The raw content is turned into readable text, then Google Gemini (plus an agent step) generates a clean company summary and structured details you can store and reuse.
The cleaned result goes where you want. The workflow posts the summary to your webhook endpoint (and can render HTML), so you can write back to Airtable, populate Google Sheets, or trigger follow-up automations.
You can easily modify the summary prompt to extract different fields (like hiring signals or customer sentiment) based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
This workflow starts manually so you can test the scraping and summarization flow on demand.
- Add and place Manual Start Trigger at the start of the workflow.
- Connect Manual Start Trigger to Configure Bright Data Zone to match the execution flow.
Step 2: Connect Airtable
Pull the company list from Airtable for batch processing.
- Add Retrieve Airtable Records and set Operation to search.
- Select your Airtable Base and Table (e.g., Indeed → Table 1).
- Credential Required: Connect your airtableTokenApi credentials.
- Connect Configure Bright Data Zone → Retrieve Airtable Records → Iterate Through Batches.
Make sure each record has a Link field, since Validate Link Presence checks {{ $json.Link }} before scraping.
Step 3: Set Up Batch Control, Delay, and Link Validation
These nodes throttle requests and prevent invalid URLs from being scraped.
- In Configure Bright Data Zone, set the zone assignment to web_unlocker1.
- Place Iterate Through Batches to control how many records are processed per cycle.
- Configure Pause Execution with Amount set to 10 seconds.
- In Validate Link Presence, keep the condition set to String → notEmpty with Left Value {{ $json.Link }}.
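If it helps to see the control flow outside n8n, the batching, pause, and link check can be approximated in a few lines of Python. This is only a sketch of the logic, not how n8n implements these nodes: the function names are invented, records are plain dictionaries, and "empty" here means a missing Link or an empty string.

```python
import time
from typing import Callable, Iterator


def batches(records: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def has_link(record: dict) -> bool:
    """Approximate the String -> notEmpty check on the Link field."""
    link = record.get("Link")
    return isinstance(link, str) and link != ""


def run(records: list, size: int = 1, pause_seconds: float = 10.0,
        sleep: Callable[[float], None] = time.sleep) -> list:
    """Return the records that would move on to the scraping step."""
    accepted = []
    for batch in batches(records, size):
        sleep(pause_seconds)  # the pause runs before the link check
        accepted.extend(r for r in batch if has_link(r))
    return accepted


rows = [{"Link": "Acme"}, {"Link": ""}, {"Name": "No link"}]
print(run(rows, pause_seconds=0))  # [{'Link': 'Acme'}]
```

Passing `sleep` as a parameter keeps the pause easy to stub out when you test the filter on a small sample.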
Step 4: Configure Indeed Page Request and Parallel Processing
Scrape the Indeed company page and process the response in parallel as both raw text and HTML.
- In Request Indeed Page, set URL to https://api.brightdata.com/request and Method to POST.
- Enable Send Body and Send Headers.
- Set body parameters:
  - zone = {{ $('Configure Bright Data Zone').item.json.zone }}
  - url = https://www.indeed.com/cmp/{{ encodeURI($('Retrieve Airtable Records').item.json.Link) }}?product=unlocker&method=api
  - format = raw
  - data_format = markdown
- Credential Required: Connect your httpHeaderAuth credentials in Request Indeed Page.
- Ensure Request Indeed Page outputs to both Extract Text from Markdown and Render Markdown to HTML in parallel.
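To sanity-check the request before running it in n8n, you can build the same body in Python. Treat this as a sketch under a few assumptions: the body is sent as JSON, the header credential is a bearer token in an Authorization header, and the helper names are invented here. The url expression prepends https://www.indeed.com/cmp/ to the Link value, so the example assumes Link holds the part of the address after /cmp/ rather than a full URL.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BRIGHTDATA_ENDPOINT = "https://api.brightdata.com/request"
# Characters that JavaScript's encodeURI leaves unescaped (besides letters and digits).
ENCODE_URI_SAFE = ";,/?:@&=+$#-_.!~*'()"


def build_body(zone: str, link: str) -> dict:
    """Mirror the four body parameters set in the request node."""
    encoded = quote(link, safe=ENCODE_URI_SAFE)
    target = f"https://www.indeed.com/cmp/{encoded}?product=unlocker&method=api"
    return {"zone": zone, "url": target, "format": "raw", "data_format": "markdown"}


def build_request(zone: str, link: str, token: str) -> Request:
    """Prepare the POST request without sending it."""
    return Request(
        BRIGHTDATA_ENDPOINT,
        data=json.dumps(build_body(zone, link)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed header scheme
        },
        method="POST",
    )


print(build_body("web_unlocker1", "Acme Corp")["url"])
# https://www.indeed.com/cmp/Acme%20Corp?product=unlocker&method=api
```

Printing the final url for a few records is a quick way to catch Link values that would produce a malformed address.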
Step 5: Set Up AI Extraction, Summarization, and Agent Analysis
These nodes convert markdown to clean text, summarize it, and structure the results for webhook delivery.
- In Extract Text from Markdown, set Text to: You need to analyze the below markdown and convert to textual data. {{ $json.data }}
- Gemini Chat Engine is connected as the language model for Extract Text from Markdown, so make sure credentials are added to Gemini Chat Engine. Credential Required: Connect your googlePalmApi credentials.
- Summarize Company Details receives the extracted text, and Gemini Summary Model powers this summarization. Credential Required: Connect your googlePalmApi credentials.
- In Indeed Analysis Agent, keep Text set to You are an Indeed Expert... {{ $('Extract Text from Markdown').item.json.text }} so it formats the summary for delivery.
- Gemini Agent Model is connected as the language model for Indeed Analysis Agent. Credential Required: Connect your googlePalmApi credentials.
- Send Summary Webhook is an AI tool connected to Indeed Analysis Agent; configure tool behavior here, and add any needed auth on the parent agent if required by your endpoint.
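The order of the three AI steps can be sketched with a stand-in model function. Only the extraction prompt is quoted from this guide; `llm`, `summary_instruction`, and `agent_instruction` are hypothetical placeholders for the model call and prompts you supply. Note that the agent's Text expression points at the extracted text, so this sketch feeds the agent that text rather than the summary.

```python
from typing import Callable

# Prompt text quoted from the Extract Text from Markdown step.
EXTRACT_PROMPT = "You need to analyze the below markdown and convert to textual data. {data}"


def extraction_prompt(markdown: str) -> str:
    """Fill the scraped markdown into the extraction prompt."""
    return EXTRACT_PROMPT.format(data=markdown)


def run_ai_steps(
    markdown: str,
    llm: Callable[[str], str],
    summary_instruction: str,
    agent_instruction: str,
) -> dict:
    """Run extraction, summarization, and the agent prompt in order.

    `llm` is any function mapping a prompt to a completion; the two
    instruction strings stand in for prompts you write yourself.
    """
    text = llm(extraction_prompt(markdown))
    summary = llm(f"{summary_instruction}\n\n{text}")
    # The agent prompt is built from the extracted text, not the summary.
    agent_output = llm(f"{agent_instruction}\n\n{text}")
    return {"text": text, "summary": summary, "agent_output": agent_output}
```

Swapping `llm` for a function that simply echoes its input is a cheap way to inspect the exact prompts before spending API calls.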
Step 6: Configure HTML Output Webhook
This branch converts the markdown into HTML and posts it to a webhook endpoint.
- In Render Markdown to HTML, set Mode to markdownToHtml and Markdown to {{ $json.data }}.
- Configure Post HTML Webhook with URL set to https://webhook.site/daf9d591-a130-4010-b1d3-0c66f8fcf467 and Send Body enabled.
- Set the body parameter html_response to {{ $json.data }}.
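If you want to exercise your own endpoint before wiring it into the workflow, a small script can post the same html_response field. This sketch assumes the endpoint accepts a JSON body; the function name and the `opener` parameter are illustrative, with `opener` there so the call can be stubbed in tests.

```python
import json
from urllib.request import Request, urlopen


def post_html(url: str, html: str, opener=urlopen, timeout: float = 30.0) -> int:
    """POST {"html_response": html} as JSON and return the HTTP status code."""
    request = Request(
        url,
        data=json.dumps({"html_response": html}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=timeout) as response:
        return response.status
```

Calling `post_html(your_url, "<h1>Acme</h1>")` against a test endpoint should show a single html_response field in the received payload.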
Step 7: Test and Activate Your Workflow
Run a full test to validate scraping, AI summarization, and webhook outputs before going live.
- Click Manual Start Trigger → Execute Workflow to run a test.
- Confirm that Request Indeed Page returns markdown data and that Validate Link Presence passes for valid records.
- Verify that Extract Text from Markdown and Summarize Company Details produce AI output, and that Indeed Analysis Agent sends data via Send Summary Webhook.
- Check your webhook endpoint to see both the HTML payload from Post HTML Webhook and the structured JSON from Send Summary Webhook.
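- Once successful, the workflow is ready for production runs. A Manual Start Trigger only fires when you execute it yourself, so for unattended runs replace it with a schedule or webhook trigger before toggling the workflow Active.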
Troubleshooting Tips
- Airtable credentials can expire or need specific permissions. If things break, check your Personal Access Token scopes and the base access in Airtable first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
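When empty responses are the recurring problem, the "wait longer and try again" idea can be expressed as a small retry loop. This is a generic pattern, not something the workflow template includes: the function name, the three-attempt default, and the doubling schedule are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    base_wait: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch` until it returns non-empty text, doubling the wait each time."""
    wait = base_wait
    for attempt in range(attempts):
        text = fetch()
        if text and text.strip():
            return text
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(wait)
            wait *= 2
    return None
```

A `None` result means every attempt came back empty, which is a reasonable signal to skip the record and log it for a later run.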
Quick Answers
How long does setup take?
About 30 minutes if your Airtable base and API keys are ready.
Do I need coding skills?
No coding required. You’ll connect credentials and tweak a couple of fields and prompts.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Bright Data usage and Gemini API costs, which depend on how many pages you process.
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this workflow?
Yes, and you probably should. You can adjust what gets extracted by editing the “Summarize Company Details” prompt, then change where it goes by swapping the “Send Summary Webhook” destination. Common tweaks include extracting hiring trends, pulling job listings or salary signals, and writing the final output back into Airtable fields instead of posting to another system.
Why is my Airtable connection failing?
Usually it’s the Personal Access Token. Make sure it has permission to read the base and the specific table, then reselect the correct base/table in the Airtable node so n8n refreshes the schema. If the workflow used to work and suddenly doesn’t, rotate the token in Airtable and update the credential in n8n. Also double-check you didn’t rename key fields the workflow maps to.
How many companies can this workflow handle?
It can handle hundreds of companies per run, but the practical limit is your Bright Data and Gemini usage plus how aggressively you pace requests with the Wait and batching steps.
Is n8n a better fit than Zapier or Make for this?
Often, yes. This workflow isn’t just “move data from A to B”; it scrapes a page, cleans it, runs AI extraction, and then routes the result. n8n is simply more comfortable with that kind of multi-step logic, branching (like the “Validate Link Presence” check), and batching without turning every extra step into a separate paid task. Zapier or Make can still work if you keep it very small, but you’ll usually feel the edges once you add scraping and AI. If you want help choosing the fastest path, talk to an automation expert.
Once this is running, “company research” becomes a refreshable dataset, not a one-off chore. Set it up, feed it URLs, and let Airtable become the place your team actually trusts.