OpenAI + NocoDB, news summaries logged for you
You find an important update on a company’s news page, then lose it two days later because it was never logged anywhere. No RSS feed. No clean export. Just tabs, copy-paste, and “I’ll do it later” notes that you honestly never revisit.
This OpenAI NocoDB news automation hits marketers and market research folks first, but agency leads tracking competitors feel it too. You get a tidy archive of the newest posts, with summaries and keywords you can actually search.
Below, you’ll see how the workflow pulls fresh links, extracts the newest items, summarizes them with OpenAI, then stores everything in NocoDB with dates, URLs, and key terms.
How This Automation Works
See how this solves the problem:
n8n Workflow Template: OpenAI + NocoDB, news summaries logged for you
flowchart LR
subgraph sg0["Weekly Schedule Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "Weekly Schedule Trigger", pos: "b", h: 48 }
n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Retrieve News Index Page"]
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/html.dark.svg' width='40' height='40' /></div><br/>Pull Links HTML Snippet"]
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/html.dark.svg' width='40' height='40' /></div><br/>Pull Publication Dates"]
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/itemLists.svg' width='40' height='40' /></div><br/>Split Date Items"]
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/itemLists.svg' width='40' height='40' /></div><br/>Split Link Items"]
n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/merge.svg' width='40' height='40' /></div><br/>Combine Dates and Links"]
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Filter Recent Posts"]
n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Fetch Post Pages"]
n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/html.dark.svg' width='40' height='40' /></div><br/>Parse Post Details"]
n10["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/merge.svg' width='40' height='40' /></div><br/>Join Content with Metadata"]
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/openAi.dark.svg' width='40' height='40' /></div><br/>Generate Brief Summary"]
n12["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/openAi.dark.svg' width='40' height='40' /></div><br/>Derive Key Terms"]
n13@{ icon: "mdi:swap-vertical", form: "rounded", label: "Map Summary Field", pos: "b", h: 48 }
n14@{ icon: "mdi:swap-vertical", form: "rounded", label: "Map Keyword Field", pos: "b", h: 48 }
n15["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/merge.svg' width='40' height='40' /></div><br/>Combine AI Outputs"]
n16["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/merge.svg' width='40' height='40' /></div><br/>Merge AI with Metadata"]
n17["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/nocodb.svg' width='40' height='40' /></div><br/>Store in NocoDB Table"]
n15 --> n16
n11 --> n13
n12 --> n14
n3 --> n4
n8 --> n9
n13 --> n15
n14 --> n15
n6 --> n7
n4 --> n6
n5 --> n6
n9 --> n10
n0 --> n1
n7 --> n10
n7 --> n8
n10 --> n11
n10 --> n12
n10 --> n16
n16 --> n17
n2 --> n5
n1 --> n2
n1 --> n3
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n1,n8 api
class n7 code
classDef customIcon fill:none,stroke:none
class n1,n2,n3,n4,n5,n6,n7,n8,n9,n10,n11,n12,n15,n16,n17 customIcon
The Challenge: Tracking news with no RSS feed
Some news pages make monitoring simple. Others are a slow drip of frustration. When a site has no RSS feed (like Colt’s news site in this workflow), you’re stuck manually checking for updates, opening posts one-by-one, and trying to keep a “latest news” spreadsheet accurate. And the worst part is the mental overhead: remembering to check, remembering what you already saw, and remembering where you saved it. Miss a week and you’re backscrolling, guessing dates, and inevitably skipping items that mattered.
It adds up fast. Here’s where it breaks down in real teams.
- Someone checks the site “when they remember,” so updates get noticed late.
- Links get saved without context, which means you re-read the same announcement twice.
- Summaries are inconsistent because they depend on who had time that day.
- Keywords never get added, so searching your own archive becomes pointless.
The Fix: Weekly scraping, OpenAI summaries, NocoDB logging
This workflow runs on a weekly schedule and does the unglamorous work for you. It starts by pulling the news index page, then scrapes two separate pieces of data from that page: the links to each post and the publication dates (because many sites show dates outside the article content). Next, it pairs each link with its date, filters down to only the newest posts, and fetches each post page to extract the actual content. Once the workflow has clean text plus the metadata, OpenAI generates a brief summary and a set of technical keywords. Finally, everything is merged into one structured record and saved into a NocoDB table so you can search, sort, and reference it later.
The workflow starts with a weekly cron-style trigger. From there, HTTP requests and HTML parsing pull links, dates, and article content, and OpenAI produces summaries plus key terms. NocoDB becomes the final home for a clean archive that stays readable months from now.
What Changes: Before vs. After
| What This Eliminates | Impact You'll See |
|---|---|
| Checking the site "when you remember" | Updates are captured on a weekly schedule, never noticed late |
| Links saved without context | Every record carries its date, source URL, and a summary |
| Summaries that depend on who had time that day | OpenAI writes a brief summary in the same format every run |
| Keywords that never get added | Technical key terms make your NocoDB archive genuinely searchable |
Real-World Impact
Say you monitor one competitor site weekly and it typically has about 10 posts visible on the index page. Manually, you’ll spend maybe 5 minutes checking what’s new, then another 10 minutes opening a few posts, plus 10 minutes writing notes and adding keywords. That’s roughly 30 minutes a week, and it’s easy to miss something when you’re rushed. With this workflow, the weekly trigger runs automatically, and your “human time” drops to a quick review inside NocoDB, maybe 5 minutes, because the summary and keywords are already there.
Requirements
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- OpenAI API for summaries and keyword extraction.
- NocoDB to store a searchable news table.
- OpenAI API key (get it from the OpenAI platform dashboard)
Skill level: Intermediate. You'll paste credentials and adjust a few fields, and you should be comfortable finding CSS selectors with your browser's Inspect tool.
Need help implementing this? Talk to an automation expert (free 15-minute consultation).
The Workflow Flow
Weekly schedule trigger. Once a week, n8n kicks off the run automatically, so you’re not relying on someone’s calendar reminder.
Scrape links and publication dates. The workflow requests the news index page, then uses HTML extraction with CSS selectors to pull link snippets and date elements. It splits both lists, then merges them back together so each post has the right date attached.
Filter to only recent posts. A small code step compares dates and keeps only what’s new since the last run (or within your chosen window). This is what stops the database from filling with duplicates.
Fetch each post and generate AI output. For each new link, the workflow fetches the article page, parses the main content, then sends that text to OpenAI twice: once for a brief summary, and once to derive technical keywords. The outputs are mapped into clean fields and merged with the original metadata.
Store to NocoDB. The final merged record is saved into your NocoDB table with the link, date, summary, and keywords ready for searching and reporting.
You can easily modify the "recent posts" window to match your cadence. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Weekly Schedule Trigger
This workflow runs on a weekly cadence to scrape and summarize the latest news posts.
- Open Weekly Schedule Trigger and set the schedule rule to weekly timing.
- Set Weeks to trigger on day `3`, Hour to `4`, and Minute to `32`.
- Ensure the trigger is connected to Retrieve News Index Page.
Step 2: Connect the News Index Fetch and HTML Extraction
These nodes fetch the news index page and extract publication dates and links in parallel.
- In Retrieve News Index Page, set URL to `https://www.colt.net/resources/type/news/` and keep the response format as text.
- Confirm Retrieve News Index Page outputs to both Pull Links HTML Snippet and Pull Publication Dates in parallel.
- In Pull Links HTML Snippet, set Operation to `extractHtmlContent` and keep the CSS selector `div:nth-child(9) > div:nth-child(3) > a:nth-child(2)` with Return Value as `attribute` and Attribute as `href`.
- In Pull Publication Dates, set Operation to `extractHtmlContent` and keep the CSS selector `div:nth-child(9) > div:nth-child(2) > span:nth-child(1)`.
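Before trusting those selectors, it can help to sanity-check what the index page actually returns. A minimal local sketch in plain Node.js: the sample markup is hypothetical, and the regex is only for eyeballing link counts, not a substitute for the node's CSS-selector extraction.

```javascript
// Quick local sanity check: pull href values out of fetched index HTML.
// sampleHtml is a made-up stand-in; paste in the real page body to test.
const sampleHtml = `
  <div><span>01/03/2024</span><a href="/resources/post-a/">Post A</a></div>
  <div><span>15/02/2024</span><a href="/resources/post-b/">Post B</a></div>
`;

// Naive href extraction -- good enough to confirm the page exposes links,
// not a replacement for a proper HTML parser or the n8n HTML node.
const links = [...sampleHtml.matchAll(/href="([^"]+)"/g)].map(m => m[1]);

console.log(links); // ["/resources/post-a/", "/resources/post-b/"]
```

If the count here doesn't match what the n8n node returns, the page structure has likely changed and the `nth-child` selectors need updating.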
Step 3: Set Up List Processing and Date Filtering
This stage splits the extracted arrays, merges dates and links by position, and filters recent posts.
- In Split Date Items, set Field to Split Out to `data` and Destination Field Name to `Date`.
- In Split Link Items, set Field to Split Out to `data` and Destination Field Name to `Link`.
- In Combine Dates and Links, set Mode to `combine` and Combination Mode to `mergeByPosition`.
- In Filter Recent Posts, keep the JavaScript filter logic and adjust the window if needed. The current line uses `currentDate.getDate() - 70`.
- Confirm Filter Recent Posts outputs to both Join Content with Metadata and Fetch Post Pages in parallel.
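The filtering idea can be sketched outside n8n. A minimal version of the same logic, assuming dates the JavaScript `Date` constructor can parse (inside the Code node you would read items from `$input` rather than a local array):

```javascript
// Keep only items whose Date falls inside the lookback window.
// Mirrors the workflow's approach: setDate(getDate() - 70) rolls the
// cutoff back 70 days, with month boundaries handled automatically.
function filterRecent(items, windowDays = 70, now = new Date()) {
  const cutoff = new Date(now);
  cutoff.setDate(cutoff.getDate() - windowDays);
  return items.filter(item => new Date(item.Date) >= cutoff);
}

// Hypothetical merged items, as produced by Combine Dates and Links.
const items = [
  { Date: new Date(Date.now() - 5 * 86400000).toISOString(), Link: "/post-new/" },
  { Date: new Date(Date.now() - 200 * 86400000).toISOString(), Link: "/post-old/" },
];

const recent = filterRecent(items);
console.log(recent.map(i => i.Link)); // only the 5-day-old post survives
```

Shrinking `windowDays` to 7 or 14 is the usual tweak if you run the workflow weekly and want to avoid re-processing older posts.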
Step 4: Configure Post Retrieval and Content Parsing
This branch fetches each filtered post and extracts the title and content for AI processing.
- In Fetch Post Pages, set URL to the expression `={{ $json["Link"] }}`.
- In Parse Post Details, set Operation to `extractHtmlContent` and ensure the selectors are `h1.fl-heading > span:nth-child(1)` for title and `.fl-node-5c7574ae7d5c6 > div:nth-child(1)` for content.
- In Join Content with Metadata, set Mode to `combine` and Combination Mode to `mergeByPosition`.
Step 5: Set Up AI Summarization and Keyword Extraction
The workflow generates summaries and key terms in parallel, then merges these AI outputs with the metadata.
- In Generate Brief Summary, set Model to `gpt-4-1106-preview` and keep the prompt content `=Create a summary in less than 70 words {{ $json["content"] }}`. Credential Required: Connect your openAiApi credentials.
- In Derive Key Terms, set Model to `gpt-4-1106-preview` and keep the prompt content `=name the 3 most important technical keywords in {{ $json["content"] }} ? just name them without any explanations or other sentences`. Credential Required: Connect your openAiApi credentials.
- In Map Summary Field, set the field name to `=summary` and set the value to `={{ $json["message"]["content"] }}`, with Include set to `none`.
- In Map Keyword Field, set the field name to `keywords` and set the value to `={{ $json["message"]["content"] }}`, with Include set to `none`.
- Confirm Join Content with Metadata outputs to Generate Brief Summary, Derive Key Terms, and Merge AI with Metadata in parallel.
- In Combine AI Outputs and Merge AI with Metadata, set Mode to `combine` and Combination Mode to `mergeByPosition`.
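Under the hood, the two OpenAI nodes send ordinary chat-completion requests. A sketch of the payloads they assemble, using the workflow's own prompts (the field names follow the OpenAI Chat Completions API; `articleText` is a stand-in for `$json["content"]`):

```javascript
// Build the two request bodies the workflow sends to OpenAI.
// The node posts these to the Chat Completions endpoint with your API key.
function buildPrompts(articleText) {
  const base = { model: "gpt-4-1106-preview" };
  return {
    summary: {
      ...base,
      messages: [{
        role: "user",
        content: `Create a summary in less than 70 words ${articleText}`,
      }],
    },
    keywords: {
      ...base,
      messages: [{
        role: "user",
        content: `name the 3 most important technical keywords in ${articleText} ? just name them without any explanations or other sentences`,
      }],
    },
  };
}

const payloads = buildPrompts("Colt announces a new network expansion...");
console.log(payloads.summary.model); // "gpt-4-1106-preview"
```

Because both branches run from the same input item, the responses can be merged back together by position, which is exactly what Combine AI Outputs does.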
Note: both Map nodes read the model output from `$json["message"]["content"]`. If the response format changes, update these mappings.
Step 6: Configure the NocoDB Storage Output
This step stores the enriched news records in your NocoDB table.
- Open Store in NocoDB Table and set Project ID to `[YOUR_ID]` and Table to `[YOUR_ID]`.
- Map fields as follows: News_Source to `=Colt`, Title to `={{ $json["title"] }}`, Date to `={{ $json["Date"] }}`, Link to `={{ $json["Link"] }}`, Summary to `={{ $json["summary"] }}`, and Keywords to `={{ $json["keywords"] }}`.
- Credential Required: Connect your nocoDbApiToken credentials.
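If you ever need to write the same record outside n8n, NocoDB exposes a REST API. A sketch that assembles the insert request: the `/api/v2/tables/{tableId}/records` path and `xc-token` header follow NocoDB's v2 API, but verify against your instance's version; `NOCODB_URL`, `TABLE_ID`, and `XC_TOKEN` are placeholders you would fill in.

```javascript
// Assemble a NocoDB insert request for one enriched news record.
// NOCODB_URL, TABLE_ID, and XC_TOKEN are placeholder values, not real ones.
function buildNocoRequest(record) {
  return {
    url: "https://NOCODB_URL/api/v2/tables/TABLE_ID/records",
    method: "POST",
    headers: { "xc-token": "XC_TOKEN", "Content-Type": "application/json" },
    body: JSON.stringify(record),
  };
}

const req = buildNocoRequest({
  News_Source: "Colt",
  Title: "Example announcement",
  Date: "2024-03-01",
  Link: "https://www.colt.net/resources/example/",
  Summary: "Short AI-generated summary.",
  Keywords: "networking, fibre, expansion",
});
// Send with: fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

The field names here match the mapping in Step 6, so a row created this way looks identical to one written by the workflow.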
Step 7: Test & Activate Your Workflow
Verify the workflow end-to-end before enabling weekly execution.
- Click Execute Workflow to run a manual test from Weekly Schedule Trigger.
- Confirm that Filter Recent Posts outputs only items within your date window, and that Fetch Post Pages and Join Content with Metadata both receive data in parallel.
- Check that Generate Brief Summary and Derive Key Terms return content and that Store in NocoDB Table creates new rows.
- When successful, toggle the workflow to Active for weekly automation.
Watch Out For
- NocoDB credentials can expire or need specific permissions. If things break, check the API token and table permissions in NocoDB first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Common Questions
How long does setup take?
About 30 minutes if you already have NocoDB and an OpenAI API key.
Can a non-developer set this up?
Yes, but you’ll want someone comfortable using browser Inspect to grab CSS selectors. The rest is mostly pasting credentials and testing a run.
Is it free to run?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs, which are usually a few cents per batch of articles depending on length.
Should I use n8n Cloud or self-host?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I adapt this to a different news site?
You can swap the “Retrieve News Index Page” and the HTML parsing nodes to match your target site’s structure, then update the CSS selectors used in “Pull Links HTML Snippet,” “Pull Publication Dates,” and “Parse Post Details.” Common tweaks include changing the “Filter Recent Posts” logic to your preferred time window, expanding the OpenAI prompts to produce a longer executive summary, and adjusting the NocoDB fields so keywords become tags you can filter on.
Why would the OpenAI step fail?
Usually it’s the wrong credential type (you need an API key, not a normal ChatGPT login) or a key that was rotated and never updated in n8n. It can also be a quota issue if you’re processing a lot of long pages at once, so check your OpenAI usage dashboard and try shortening the extracted content.
How many posts can this handle per run?
On a typical n8n setup, handling a few dozen new posts in a run is fine, and the limiter is usually website response time and OpenAI throughput. If you self-host, there’s no execution limit, but your server resources will decide how many pages you can fetch in parallel. On n8n Cloud, capacity depends on your plan’s monthly executions, so a weekly workflow like this is generally lightweight unless you expand it to many sites.
Is n8n better than Zapier or Make for this?
Often, yes. Web scraping with multi-step parsing (links, dates, merges, then per-link fetching) tends to get awkward in simple “trigger-action” tools, and you’ll feel it as soon as the site structure changes. n8n also makes it easier to add branching logic, deduping, and code-based filtering without paying extra per step. The bigger factor is control: self-hosting means you can run as many executions as your server can handle, which is handy if you later monitor 10 sites instead of one. Zapier or Make can still be a fine choice for simpler sources like RSS or newsletters. If you’re torn, Talk to an automation expert and we’ll map the cheapest reliable option.
Set this up once, and your weekly “did anything change?” routine turns into a clean table you can search in seconds. The workflow handles the repetition. You keep the insight.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.