arXiv to Notion, with Telegram digests you trust
Checking arXiv “quickly” turns into 40 open tabs, half-read PDFs, and that nagging feeling you still missed the one paper everyone will talk about tomorrow. It’s not the reading that breaks you. It’s the hunting, sorting, and saving.
Founders tracking what might shift their roadmap feel this pain first. A research lead trying to keep a small team aligned gets it too. And if you do marketing for an AI product, an arXiv-to-Notion automation is an easy way to stay sharp without living in feeds.
This workflow pulls new AI papers from arXiv, files them into Notion with clean metadata and PDF links, generates Gemini “deep summaries,” then sends a daily Telegram digest you can actually trust. You’ll see how it works, what you need, and where people usually get stuck.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: arXiv to Notion, with Telegram digests you trust
Workflow diagram: Scheduled Daily Flow
- Start: Scheduled Daily Trigger or When clicking ‘Execute workf.. → HTTP Request
- Fetch and prepare: HTTP Request → Format Conversor → Split Results → Filter recent papers (last 2.. → Dedupe with Static Data → Prepare Notion payload (JSON) → Edit Fields for Notion (incl.. → Register to Notion Database
- Telegram branch: Register to Notion Database → Send a text message → End Telegram branch (no furt..
- Summary setup: Register to Notion Database → Freeze page_id → Append a block (adding 'bloc.. → Process each paper (Gemini +..
- Summary loop: Process each paper (Gemini +.. → Clean page_id → Analyze doc (Prompt Ultra-Pro) → Merge page_id & Summary → Code (Parse Gemini JSON in c.. → Debug before append → Append chunks as blocks → Delay between paper summaries → Return to paper loop (next p.. → Process each paper (Gemini +..
- Clean page_id also feeds Merge page_id & Summary directly, so the page ID and the Gemini output arrive at the merge together.
The Problem: arXiv overload turns into missed papers
arXiv is an incredible resource, but it’s also a firehose. You open the AI feed intending to scan a handful of abstracts, and suddenly it’s a mini project: filter out duplicates, figure out what’s new since yesterday, click into PDFs, then save “the good ones” somewhere you will remember later. Even if you have a Notion database, getting papers in there usually means copy-pasting titles, IDs, authors, and links (and inevitably messing up one of them). The cost isn’t just time. It’s context switching, and it makes “keeping up” feel heavier than it should.
It adds up fast. Here’s where it breaks down in real life.
- You keep re-finding the same papers because there’s no reliable dedupe across days.
- Saving papers to Notion is manual, which means metadata gets inconsistent and searching later becomes annoying.
- Even when you grab the PDFs, summarizing them takes long enough that you postpone it, then forget why you saved them.
- Your “daily check” depends on willpower, so busy mornings quietly erase your research habit.
The Solution: arXiv → Notion pages + Gemini summaries + Telegram digest
This n8n workflow turns arXiv into a daily research assistant that runs at 08:00. It pulls the latest Artificial Intelligence papers from arXiv’s API, converts the feed into clean JSON, then filters to a recent time window so you’re not reprocessing yesterday’s list. Next, it removes duplicate records and builds a structured Notion page for each paper, including the core metadata and a direct PDF URL. After the Notion page exists, Gemini reads the PDF and produces a “deep research” summary in chunks, which are appended back into that same Notion page as readable blocks. Finally, the workflow posts a Telegram update with the title, a short abstract, and links to both the PDF and your Notion entry.
The workflow starts on a schedule (or manually when you’re testing). arXiv data gets cleaned, filtered, and deduped before anything is saved. Then Notion becomes your system of record, and Telegram becomes your daily reminder to actually look.
Example: What This Looks Like
Say your team wants to track 10 new AI papers a day. Manually, it’s maybe 6 minutes per paper to open arXiv, grab the PDF link, copy the title/authors, paste into Notion, and write even a rough 3–4 sentence takeaway. That’s about an hour, every weekday. With this workflow, you spend a couple minutes skimming the Telegram digest, then open only the 1–2 Notion pages that look relevant. The rest is already filed and summarized for you.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Notion for your paper database and summaries.
- Telegram to post a daily digest to a channel.
- Google Gemini API key (get it from Google AI Studio / Google Cloud console).
Skill level: Intermediate. You’ll mostly map fields and add credentials, plus light tweaking if your Notion schema is custom.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A daily scheduled trigger starts the run. The workflow is set to fire at 08:00, and there’s also a manual trigger for testing so you’re not waiting until tomorrow morning.
arXiv data is fetched and cleaned. n8n pulls the AI feed via HTTP Request, converts the XML into JSON, and splits entries into individual paper items that are easier to process.
Recent-only filtering and deduping keeps things sane. Code nodes apply a time window and remove duplicates, so your Notion database doesn’t slowly fill with repeats or stale results.
Notion becomes the archive, then Gemini fills in the insight. For each paper, the workflow creates a Notion page, inserts a “summary” heading, asks Gemini to produce a deep summary from the PDF, parses the summary into chunks, and appends them as rich-text blocks. A Wait node pauses between papers so you don’t overwhelm downstream calls.
You can easily modify the arXiv query to focus on specific topics (like agents, retrieval, or diffusion) based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Trigger Type
Set up both manual and scheduled triggers so you can run on-demand or daily.
- Open Manual Execution Trigger to enable manual runs (no configuration required).
- Open Scheduled Automation Start and confirm the rule interval is set to run at `8` (hour of day).
- Verify both triggers connect to External API Request so either trigger can start the workflow.
Step 2: Connect the Research Feed Source
Configure the arXiv API request and convert XML to JSON for downstream parsing.
- Open External API Request and set URL to `https://export.arxiv.org/api/query`.
- Under Query Parameters, set search_query to `abs:"artificial intelligence"`, sortBy to `submittedDate`, sortOrder to `descending`, start to `0`, and max_results to `100`.
- Open XML to JSON Mapper and enable Merge Attributes so XML attributes are preserved.
- Open Split Feed Entries and set Field to Split Out to `feed.entry`.
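To see what the HTTP Request node ends up calling, here is a minimal sketch that assembles the same URL from the Step 2 parameters. `buildArxivUrl` is a hypothetical helper for illustration only; the workflow itself sets these values in the node's Query Parameters.

```javascript
// Sketch: build the arXiv API URL from the Step 2 query parameters.
// buildArxivUrl is a hypothetical helper, not a node in the workflow.
function buildArxivUrl(overrides = {}) {
  const params = new URLSearchParams({
    search_query: 'abs:"artificial intelligence"',
    sortBy: 'submittedDate',
    sortOrder: 'descending',
    start: '0',
    max_results: '100',
    ...overrides, // e.g. a narrower search_query or a smaller max_results
  });
  return `https://export.arxiv.org/api/query?${params.toString()}`;
}

const arxivUrl = buildArxivUrl();
console.log(arxivUrl);
```

Passing something like `{ search_query: 'cat:cs.AI AND abs:agents' }` as an override is one way to try a narrower topic before changing the node itself.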
Step 3: Filter and Deduplicate Papers
Keep only recent entries and remove duplicates across runs.
- Open Filter Recent Papers and confirm the time window is `const HOURS = 24;` in the JS code.
- Review the output fields built in Filter Recent Papers (e.g., `title`, `summary`, `authors`, `published`).
- Open Remove Duplicate Records and keep `$getWorkflowStaticData('global')` for persistent deduplication.
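The two stages in this step can be sketched as plain functions. This is not the template's actual code: `filterRecent`, `dedupe`, and the `seenIds` key are assumed names, and a plain object stands in for what `$getWorkflowStaticData('global')` would return inside an n8n Code node.

```javascript
// Sketch of Step 3: keep entries from the last HOURS hours, then drop
// anything whose id was already recorded in a persistent store.
const HOURS = 24;

function filterRecent(entries, now = Date.now(), hours = HOURS) {
  const cutoff = now - hours * 60 * 60 * 1000;
  return entries.filter((entry) => {
    const publishedAt = Date.parse(entry.published);
    return Number.isFinite(publishedAt) && publishedAt >= cutoff;
  });
}

function dedupe(entries, store) {
  // `seenIds` is an assumed key; the store must stay JSON-serializable,
  // so a plain object is used instead of a Set.
  store.seenIds = store.seenIds || {};
  const fresh = [];
  for (const entry of entries) {
    if (Object.prototype.hasOwnProperty.call(store.seenIds, entry.id)) continue;
    store.seenIds[entry.id] = true;
    fresh.push(entry);
  }
  return fresh;
}

// Usage with made-up entries and a fixed "now" so the result is repeatable.
const now = Date.parse('2025-01-02T08:00:00Z');
const sample = [
  { id: 'paper-a', published: '2025-01-02T01:00:00Z' },
  { id: 'paper-b', published: '2024-12-30T01:00:00Z' }, // outside the window
  { id: 'paper-a', published: '2025-01-02T01:00:00Z' }, // duplicate
];
const store = {};
const firstRun = dedupe(filterRecent(sample, now), store);
const secondRun = dedupe(filterRecent(sample, now), store);
console.log(firstRun.length, secondRun.length); // 1 0
```

One thing to check in your own instance: n8n's documentation describes workflow static data as persisting for active (production) executions, so manual test runs may not remember earlier IDs.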
Step 4: Set Up Notion Payload and Field Mapping
Build the Notion-ready payload and map fields before creating records.
- Open Compose Notion Payload and replace `TU_DATABASE_ID` with your Notion database ID.
- Confirm Map Fields for Notion assignments such as title set to `{{ $json.properties.title.title[0].text.content }}` and published set to `{{ $json.properties.published.date.start }}`.
- Keep the abstract_clean cleaning expression exactly as configured: `{{ String($json.properties.abstract.rich_text[0].text.content || $json.summary).replace(/\\n/g, ' ').replace(/\n/g, ' ').replace(/\s+/g, ' ').trim() }}`.
- Confirm url_pdf builds the PDF URL with the expression `{{ (() => { const src = ($json.link || $json.properties.arxiv_id.rich_text[0].text.content || '').trim(); if (!src) return ''; let u = src.replace('/abs/', '/pdf/'); if (!/^https?:\/\//i.test(u)) u = 'https://' + u.replace(/^\/+/, ''); if (!u.toLowerCase().endsWith('.pdf')) u += '.pdf'; return u;})() }}`.
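If you want to test the two longer expressions outside n8n, the same logic can be written as standalone functions. `cleanAbstract` and `toPdfUrl` are hypothetical names, and the scheme check is written as `/^https?:\/\//i`; treat this as a sketch of the mapping, not the template's exact code.

```javascript
// Sketch: the abstract_clean and url_pdf logic as plain functions.
function cleanAbstract(text) {
  return String(text ?? '')
    .replace(/\\n/g, ' ') // literal backslash-n sequences left in the feed text
    .replace(/\n/g, ' ')  // real newlines
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim();
}

function toPdfUrl(link) {
  const src = String(link ?? '').trim();
  if (!src) return '';
  let url = src.replace('/abs/', '/pdf/');
  // Add a scheme only when the link does not already start with http(s)://
  if (!/^https?:\/\//i.test(url)) url = 'https://' + url.replace(/^\/+/, '');
  if (!url.toLowerCase().endsWith('.pdf')) url += '.pdf';
  return url;
}

console.log(cleanAbstract('We propose\\n a  model\nthat works. '));
console.log(toPdfUrl('http://arxiv.org/abs/2401.00001v1'));
```

Note that the conversion expects a link containing `/abs/`; a bare arXiv ID passed in as the fallback would only get a scheme and `.pdf` added, so it is worth checking a few real items in the node output.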
Step 5: Configure Notion Record Creation and Iteration
Create database pages, capture page IDs, and prepare the loop for per-paper processing.
- Open Create Notion Record and set Title to `{{ $json.title }}` with Resource set to `databasePage`.
- In Create Notion Record properties, confirm each field uses the mapped expressions, such as `{{ $json.abstract_clean }}` for the abstract.
- Open Store Page Identifier and keep Include Other Fields enabled, mapping `=page_id` to `{{ $json.id }}`.
- Open Insert Summary Heading and confirm the block type is heading_2 with text `Deep Research Summary`.
- Open Iterate Paper Processing to ensure batch processing is enabled for looped summary insertion.
Step 6: Configure AI Summarization and Notion Block Appending
Generate deep summaries, parse them into chunks, and append to the Notion page while pacing requests.
- Open Normalize Page Identifier and keep the expression `{{ $json.page_id || $json.parent?.page_id || $json.page_id || $json.results?.[0]?.parent?.page_id || $json.id }}` to standardize page IDs.
- In Generate Deep Summary, confirm Resource is `document` and Document URLs is `{{ $('Map Fields for Notion').item.json.url_pdf }}`.
- Credential Required: Connect your googlePalmApi credentials in Generate Deep Summary and verify the model is `models/gemini-2.5-pro`.
- Confirm Normalize Page Identifier outputs to both Generate Deep Summary and Combine Page and Summary in parallel, and that Combine Page and Summary uses Mode `combine` with Combine By `combineByPosition`.
- Open Parse Summary Chunks and leave the JSON parsing code intact to extract `chunks` into `slice` items.
- Open Prepare Chunk Debug and confirm mappings like `{{ $json.slice.length }}` for dbg_slice_len.
- Open Append Summary Blocks and set block Text Content to `{{ $json.slice || '—' }}` with Resource `block`.
- Open Pause Between Summaries and set Amount to `0.4` to throttle Notion updates.
- Ensure Loop Back to Next Paper connects to Iterate Paper Processing to continue the loop.
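The parsing code in Parse Summary Chunks is not reproduced in this guide, so the following is only a sketch of what a chunk parser of this kind might do. It assumes Gemini returns a JSON object with a `chunks` array of strings, possibly wrapped in a Markdown code fence. `parseSummaryChunks` and `MAX_BLOCK_CHARS` are hypothetical names, and the 2000-character split reflects Notion's documented limit on a single rich-text content string.

```javascript
// Sketch: turn a model response into { slice } items for Notion blocks.
// Assumed input shape: {"chunks": ["...", "..."]}, optionally fenced.
const FENCE = '`'.repeat(3);  // a Markdown code fence marker
const MAX_BLOCK_CHARS = 2000; // Notion's per-text-object content limit

function parseSummaryChunks(raw) {
  // Strip a leading/trailing code fence if the model added one.
  const text = String(raw ?? '')
    .trim()
    .replace(new RegExp('^' + FENCE + '(?:json)?\\s*', 'i'), '')
    .replace(new RegExp('\\s*' + FENCE + '$'), '');

  let chunks;
  try {
    const parsed = JSON.parse(text);
    chunks = Array.isArray(parsed?.chunks) ? parsed.chunks : [];
  } catch (err) {
    // Not valid JSON: fall back to treating the whole response as one chunk.
    chunks = text ? [text] : [];
  }

  const slices = [];
  for (const chunk of chunks) {
    const value = String(chunk);
    // Split oversized chunks so each block stays under the limit.
    for (let i = 0; i < value.length; i += MAX_BLOCK_CHARS) {
      slices.push({ slice: value.slice(i, i + MAX_BLOCK_CHARS) });
    }
  }
  return slices;
}

const fenced = FENCE + 'json\n{"chunks":["alpha","beta"]}\n' + FENCE;
console.log(parseSummaryChunks(fenced)); // [ { slice: 'alpha' }, { slice: 'beta' } ]
```

The plain-text fallback is a design choice for this sketch: a partial or malformed response still lands in Notion as one block instead of failing the run, which makes flaky summaries easier to spot later.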
Step 7: Configure Telegram Notifications
Send a Telegram update when each Notion page is created.
- Open Dispatch Telegram Update and set Chat ID to your target value, e.g., `[YOUR_ID]`.
- Keep the message template as configured, including expressions like `{{ $('Create Notion Record').item.json.property_title }}` and `{{ $('Map Fields for Notion').item.json.abstract_clean }}`.
- Credential Required: Connect your telegramApi credentials in Dispatch Telegram Update.
- Confirm Dispatch Telegram Update outputs to End Telegram Path.
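If you decide to rebuild the message template, it may help to prototype it as a function first. This sketch assumes the message carries a title, a shortened abstract, and the two links; `buildTelegramMessage`, its parameter names, and the 400-character abstract cap are illustrative choices. The 4096-character ceiling is Telegram's documented limit for a text message.

```javascript
// Sketch: compose one Telegram update per paper.
const TELEGRAM_MAX_CHARS = 4096; // Telegram Bot API text-message limit

function buildTelegramMessage({ title, abstract, pdfUrl, notionUrl }, abstractMax = 400) {
  const full = String(abstract ?? '');
  // Shorten long abstracts and mark the cut with an ellipsis.
  const short = full.length > abstractMax
    ? full.slice(0, abstractMax - 1).trimEnd() + '…'
    : full;
  const message = [
    String(title ?? ''),
    '',
    short,
    '',
    `PDF: ${pdfUrl}`,
    `Notion: ${notionUrl}`,
  ].join('\n');
  return message.slice(0, TELEGRAM_MAX_CHARS);
}

// Usage with placeholder values.
const telegramText = buildTelegramMessage({
  title: 'Example paper title',
  abstract: 'a'.repeat(1000),
  pdfUrl: 'https://arxiv.org/pdf/2401.00001v1.pdf',
  notionUrl: 'https://www.notion.so/example-page',
});
console.log(telegramText.length);
```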
Step 8: Test and Activate Your Workflow
Run a manual test, verify outputs in Notion and Telegram, then activate the schedule.
- Click Execute Workflow using Manual Execution Trigger and watch data pass through External API Request, XML to JSON Mapper, and Split Feed Entries.
- Confirm new pages appear in Notion from Create Notion Record and that blocks are appended by Append Summary Blocks.
- Verify Telegram messages arrive from Dispatch Telegram Update with valid links and summary text.
- Once successful, activate the workflow and rely on Scheduled Automation Start for daily execution.
Common Gotchas
- Notion credentials can expire or need specific permissions. If things break, check your Notion integration connection inside n8n and confirm the database is shared with that integration.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Gemini API limits and PDF accessibility can cause flaky summaries. If the PDF link is blocked or too slow, the “Generate Deep Summary” step may return partial output, so review the run history for that node first.
Frequently Asked Questions
How long does it take to set up this workflow?
Plan on about 45 minutes if your Notion database and Telegram bot are ready.
Do I need coding skills to set it up?
No. You’ll connect accounts, paste in API keys, and map a few Notion properties.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Gemini API usage, which depends on how many PDFs you summarize per day.
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I change which arXiv topics the workflow tracks?
Yes, and it’s one of the best tweaks to make. Update the arXiv query in the “External API Request” HTTP Request node to target your preferred categories or search terms. Common customizations include narrowing to subtopics (like agents or RAG), increasing max_results on busy days, and changing the time-window logic in the “Filter Recent Papers” code node so you catch weekend releases.
Why is my Notion connection failing?
Usually it’s permissions, not n8n. Make sure the Notion database is shared with your Notion integration, then re-check the credentials in n8n and re-select the database in the “Create Notion Record” node. If it fails only sometimes, look at the property mapping in “Map Fields for Notion,” because a mismatched select/tag value can cause the create call to error out.
How many papers can this workflow handle?
Practically, it handles “as many as you’re willing to summarize,” because the bottleneck is the PDF summarization step. On n8n Cloud Starter you’re limited by monthly executions, while self-hosting has no execution limit (your server and API quotas matter more). If you expect 50+ papers a day, add a stricter filter (keywords, categories, or max_results) and keep the Wait node so you don’t spike API errors. Most people start with 10–20 papers daily and adjust from there. Honestly, you want fewer, better papers anyway.
Why use n8n instead of Zapier or Make?
For this workflow, n8n has a few advantages: more complex logic with unlimited branching at no extra cost, a self-hosting option for unlimited executions, and native code/looping support for chunked summaries that Zapier tends to make awkward. Zapier or Make can work if you only want “RSS in, message out,” but the Notion + PDF summarization + append-in-chunks part is where you’ll feel the limits. If you want to tune deduping, time windows, or the summary format, n8n stays flexible. Talk to an automation expert if you’re not sure which fits.
Once this is running, your research “system” stops depending on motivation. The workflow does the collecting and summarizing, and you just show up for the few papers worth your attention.