Telegram + Gemini: paper summaries delivered in chat
Your “quick paper check” turns into 12 open tabs, three half-read abstracts, and a note you’ll never find again. Then someone pings you for the takeaway. And you’re back to copy-pasting links, skimming PDFs, and trying to sound confident.
Marketing leads doing competitive research feel this. So do founders tracking new tech, and consultants who live on “send me the summary” requests. This Telegram + Gemini automation delivers a clean, readable paper summary right inside the chat, without the tab circus.
You’ll see how the workflow handles links, voice notes, and screenshots, how it pulls the paper details, and how it delivers the result as a message or a file when the output gets long.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Telegram + Gemini: paper summaries delivered in chat
flowchart LR
subgraph sg0["Start Telegram Bot Flow"]
direction LR
n0@{ icon: "mdi:cog", form: "rounded", label: "Decodo", pos: "b", h: 48 }
n1@{ icon: "mdi:memory", form: "rounded", label: "Simple Memory", pos: "b", h: 48 }
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Send Fallback Text"]
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Start Telegram Bot"]
n4@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Detect Message Type", pos: "b", h: 48 }
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Download Telegram Photo"]
n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Download Telegram Voice"]
n7@{ icon: "mdi:robot", form: "rounded", label: "Analyze Image Content", pos: "b", h: 48 }
n8@{ icon: "mdi:robot", form: "rounded", label: "Transcribe Voice Message", pos: "b", h: 48 }
n9@{ icon: "mdi:swap-vertical", form: "rounded", label: "Format Image Text", pos: "b", h: 48 }
n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Format Voice Text", pos: "b", h: 48 }
n11@{ icon: "mdi:swap-vertical", form: "rounded", label: "Prepare Chat Data", pos: "b", h: 48 }
n12@{ icon: "mdi:robot", form: "rounded", label: "Research Summary Agent", pos: "b", h: 48 }
n14@{ icon: "mdi:brain", form: "rounded", label: "Gemini Research Model", pos: "b", h: 48 }
n15@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Check Telegram Message Length", pos: "b", h: 48 }
n16@{ icon: "mdi:cog", form: "rounded", label: "Convert Output to Text File", pos: "b", h: 48 }
n17["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Send Research Summary File"]
n18["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Send Research Summary Message"]
n0 -.-> n12
n1 -.-> n12
n9 --> n11
n10 --> n11
n11 --> n12
n3 --> n4
n4 --> n5
n4 --> n11
n4 --> n6
n4 --> n2
n7 --> n9
n14 -.-> n12
n12 --> n15
n5 --> n7
n6 --> n8
n8 --> n10
n16 --> n17
n15 --> n16
n15 --> n18
end
subgraph sg1["Generate Search URL Flow"]
direction LR
n13@{ icon: "mdi:brain", form: "rounded", label: "Gemini URL Interpreter", pos: "b", h: 48 }
n19@{ icon: "mdi:robot", form: "rounded", label: "Generate Search URL Insights", pos: "b", h: 48 }
n20@{ icon: "mdi:swap-vertical", form: "rounded", label: "Define Search URLs", pos: "b", h: 48 }
n20 --> n19
n13 -.-> n19
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n3 trigger
class n7,n8,n12,n19 ai
class n14,n13 aiModel
class n1 ai
class n4,n15 decision
classDef customIcon fill:none,stroke:none
class n2,n3,n5,n6,n17,n18 customIcon
The Problem: Paper reading gets scattered fast
Academic reading usually starts simple: “Just skim this one paper.” Then it spreads. You open arXiv, follow citations, jump to Google Scholar, and suddenly you’re juggling a dozen sources with no consistent note format. If someone sends a screenshot of a figure or a voice note like “can you check this claim?”, the friction is worse because the information isn’t even searchable. The cost isn’t only time. It’s context-switching, broken focus, and summaries that change tone depending on who wrote them that day.
Small frictions stack up. Here’s where it usually breaks down in real teams.
- You end up rewriting the same “TL;DR + key methods + limitations” structure over and over.
- Copy-pasting paper links into notes gets messy, so retrieval later is basically luck.
- Voice notes and screenshots turn into “I’ll handle it later,” which means they disappear.
- Manual summarizing invites mistakes, especially when you skim under pressure.
The Solution: A Telegram research assistant that summarizes for you
This workflow turns a Telegram bot into a research assistant that can understand what you send and respond with a useful summary. It starts when a message hits your bot (text, an image, or a voice note). If it’s a screenshot, Gemini reads the content and extracts the relevant text. If it’s voice, Gemini transcribes it into a clean prompt. Then the workflow builds a structured “chat payload” and hands it to a Research Synthesis Agent, which uses Gemini to interpret URLs and intent, and Decodo to scrape paper details like title, abstract, and publication metadata. Finally, you get the summary back in Telegram, either as a normal message or as a file if the response is too long for chat.
The workflow starts in Telegram and immediately routes the message by type. From there it converts whatever you sent into usable text, runs research + synthesis, then checks length and delivers the result in the right format. No extra apps to open.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
Example: What This Looks Like
Say you review 10 papers a week and you usually spend about 25 minutes each: open the link, skim the abstract, grab a few details, write a short takeaway, then paste it into wherever you keep notes. That’s about 4 hours a week. With this workflow, you forward the arXiv link (or drop a screenshot/voice note) in Telegram in under a minute, wait a couple minutes for the summary, and you’re done. Call it 30 minutes of your time for the week instead of half a day.
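Here is the arithmetic behind that estimate. It uses the paragraph’s own figures and assumes about one minute to forward each paper plus two minutes of waiting:

$$
\begin{aligned}
\text{Manual:}\quad & 10 \times 25\ \text{min} = 250\ \text{min} \approx 4.2\ \text{h per week} \\
\text{With the bot:}\quad & 10 \times (1 + 2)\ \text{min} = 30\ \text{min per week}
\end{aligned}
$$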
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram for receiving messages and delivering summaries.
- Google Gemini to parse, transcribe, extract, and summarize.
- Decodo to scrape paper details from scholarly sources.
Skill level: Intermediate. You’ll connect a Telegram bot, add API credentials, and adjust one system prompt placeholder.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A Telegram message kicks it off. You send an arXiv link, a typed question, a screenshot of a paper, or a voice note to your bot. The workflow’s trigger listens for new messages and immediately hands them to a routing step.
The workflow turns “whatever you sent” into clean text. Screenshots go through an image-inspection step (Gemini extracts the readable content). Voice notes go through transcription. Plain text goes straight through. At the end of this stage, everything looks the same to the summarizer, which is the whole point.
Research + synthesis happens in one place. A Gemini URL parser interprets academic URLs and intent. The Research Synthesis Agent then uses Decodo scraping to pull details like titles, abstracts, and publication info. Finally, Gemini composes a concise summary that’s actually usable, not a vague paragraph.
Delivery adapts to message length. If the response is 4,000 characters or fewer, it comes back as a normal Telegram message (Telegram caps a single message at 4,096 characters). If it runs longer, n8n converts it to a text file and sends the file instead, so you don’t lose content to chat limits.
You can easily modify the summary structure to match your note template, or change what sources get scraped based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
This workflow starts when a user sends a message to your Telegram bot. Configure the trigger to listen for incoming messages.
- Add the Telegram Bot Trigger node and set Updates to `message`.
- Credential Required: Connect your Telegram API credentials to Telegram Bot Trigger.
- Confirm the trigger is connected to Route Message Type as the next node.
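This is roughly what a `message` update looks like, which is useful when pinning test data. The values are trimmed and made up. Field names follow the Telegram Bot API `Update`, `Message`, and `PhotoSize` objects. Every `{{ $json.message... }}` expression in this workflow reads from that `message` object. `extractMessage` is a hypothetical helper that only approximates the effect of the Updates filter.

```javascript
// Trimmed Telegram updates with made-up values. Field names follow the Bot API
// Update / Message / PhotoSize objects; n8n expressions read them as
// $json.message.text, $json.message.photo, $json.message.chat.id, and so on.
const textUpdate = {
  update_id: 100000001,
  message: {
    message_id: 17,
    date: 1735689600,
    chat: { id: 42, type: "private" },
    text: "Can you summarize this paper? <arXiv link>",
  },
};

// Note: this photo carries only two sizes, so photo[3] would be undefined.
const photoUpdate = {
  update_id: 100000002,
  message: {
    message_id: 18,
    date: 1735689660,
    chat: { id: 42, type: "private" },
    caption: "Figure 3 from the paper",
    photo: [
      { file_id: "small-id", width: 90, height: 60 },
      { file_id: "large-id", width: 1280, height: 853 },
    ],
  },
};

// Hypothetical helper approximating "Updates: message": only updates that carry
// a `message` field are kept; edits, callbacks, etc. yield null.
function extractMessage(update) {
  return update.message ?? null;
}

console.log(extractMessage(textUpdate).chat.id); // → 42
console.log(extractMessage({ update_id: 3, edited_message: { text: "fixed" } })); // → null
```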
Step 2: Route and Build Chat Input from Telegram Messages
Incoming messages are routed by type (photo, text, voice), then transformed into a unified chat payload for the AI agent.
- In Route Message Type, add three rules whose left values are `{{ $json.message.photo }}`, `{{ $json.message.text }}`, and `{{ $json.message.voice }}`, each with a notEmpty condition.
- Connect the Photo output to Fetch Telegram Photo, set Resource to `file`, and set File ID to `{{ $json.message.photo[3].file_id }}`.
- Connect the Voice output to Fetch Telegram Voice, set Resource to `file`, and set File ID to `{{ $json.message.voice.file_id }}`.
- Send unsupported types to Send Fallback Notice, with Text set to `Sorry, I can only support text, photo, and voice messages` and Chat ID set to `{{ $json.message.chat.id }}`.
- In Compose Image Text, set message.text to `Caption: {{ $('Telegram Bot Trigger').item.json.message.caption ?? "[none]" }} --- Image: {{ $json.content.parts[0].text }}`.
- In Compose Voice Text, set message.text to `{{ $json.content.parts[0].text }}`.
- In Assemble Chat Payload, set chatInput to `{{ $json.message.text }}` and chatId to `{{ $('Telegram Bot Trigger').item.json.message.chat.id }}`. Both Compose nodes write to message.text, so text, photo, and voice messages all reach this node in the same shape.
- Credential Required: Connect your Telegram API credentials to all Telegram action nodes (Fetch Telegram Photo, Fetch Telegram Voice, Send Fallback Notice, Deliver Summary File, and Deliver Summary Message).
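The routing and normalization in this step can be sketched in plain JavaScript, the language n8n expressions are written in. This is a minimal illustration that runs outside n8n, not the template’s code. The `analyze` callback is a hypothetical stand-in for the download + Gemini steps. Falling back to the last photo size is an assumption, not template behavior.

```javascript
// Sketch of Route Message Type -> Compose * Text -> Assemble Chat Payload.
// Message fields (photo, text, voice, caption, chat.id, file_id) follow the
// Telegram Bot API Message object. `analyze` is a hypothetical stand-in for
// Fetch Telegram Photo/Voice followed by Gemini Inspect/Transcribe.

// Mirror the Switch rules in order: the first non-empty field wins.
function routeMessageType(message) {
  const notEmpty = (v) =>
    v !== undefined && v !== null && v !== "" &&
    !(Array.isArray(v) && v.length === 0);
  if (notEmpty(message.photo)) return "photo";
  if (notEmpty(message.text)) return "text";
  if (notEmpty(message.voice)) return "voice";
  return "fallback"; // stickers, documents, locations, ...
}

// Telegram sends each photo in several sizes, usually smallest first.
// Prefer index 3 like the template, but fall back to the last (largest) entry.
function pickPhotoFileId(photoSizes) {
  const size = photoSizes[3] ?? photoSizes[photoSizes.length - 1];
  return size.file_id;
}

// Same text the Compose Image Text expression produces.
function composeImageText(message, geminiResult) {
  const caption = message.caption ?? "[none]";
  return `Caption: ${caption} --- Image: ${geminiResult.content.parts[0].text}`;
}

// Returns the unified { chatInput, chatId } payload, or null for the fallback route.
function toChatPayload(message, analyze) {
  const chatId = message.chat.id;
  switch (routeMessageType(message)) {
    case "photo":
      return {
        chatInput: composeImageText(message, analyze("image", pickPhotoFileId(message.photo))),
        chatId,
      };
    case "voice":
      return {
        chatInput: analyze("audio", message.voice.file_id).content.parts[0].text,
        chatId,
      };
    case "text":
      return { chatInput: message.text, chatId };
    default:
      return null; // the workflow sends the fallback notice instead
  }
}

// Usage with a fake analyzer that returns Gemini-shaped results.
const fakeAnalyze = (kind, fileId) => ({
  content: { parts: [{ text: `${kind} content of ${fileId}` }] },
});
const photoMsg = {
  chat: { id: 42 },
  caption: "Figure 2",
  photo: [{ file_id: "small" }, { file_id: "medium" }, { file_id: "large" }],
};
console.log(toChatPayload(photoMsg, fakeAnalyze).chatInput);
// → Caption: Figure 2 --- Image: image content of large
```

The text route skips any conversion, which is why the payload node can read `message.text` for every type. The workflow’s design relies on that single shared field.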
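Note: Fetch Telegram Photo reads `$json.message.photo[3]`, the fourth size Telegram provides. If the sender’s photo has fewer than four sizes, that index is undefined and the node has no file ID to fetch. Adjust the index or add a fallback, for example `{{ ($json.message.photo[3] ?? $json.message.photo[$json.message.photo.length - 1]).file_id }}`.
Step 3: Configure URL Insight Generation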
Before research starts, the workflow defines academic URLs and parses their parameters for the AI agent to use.
- In Set Search URLs, set urls to an array of four search URLs: `{{ [ "https://scholar.google.com/scholar?q=artificial+intelligence&hl=en&as_sdt=0,5", "https://scholar.google.com/scholar?as_ylo=2025&q=artificial+intelligence&hl=en&as_sdt=0,5", "https://arxiv.org/search/?query=artificial+intelligence&searchtype=all&abstracts=show&order=-announced_date_first", "https://arxiv.org/search/?query=artificial+intelligence&searchtype=all&source=header" ] }}`.
- Connect Set Search URLs to Generate URL Insights and keep Prompt Type as `define`.
- Ensure Gemini URL Parser is connected as the AI language model for Generate URL Insights.
- Credential Required: Connect your Google Gemini credentials to Gemini URL Parser.
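Generate URL Insights relies on Gemini to interpret these URLs. For comparison, here is a deterministic sketch that uses the standard `URL` and `URLSearchParams` APIs to show what the parameters encode:

- `q` and `query` hold the search term.
- `as_ylo=2025` limits Scholar results to 2025 onward.
- `order=-announced_date_first` sorts arXiv results newest first.

`describeSearchUrl` and `withSearchTerm` are hypothetical helpers. The Scholar and arXiv parameter names come only from the four URLs above.

```javascript
// Deterministic sketch of what the Set Search URLs entries encode.
// In the template, Gemini (Generate URL Insights) does this interpretation.
const SEARCH_URLS = [
  "https://scholar.google.com/scholar?q=artificial+intelligence&hl=en&as_sdt=0,5",
  "https://scholar.google.com/scholar?as_ylo=2025&q=artificial+intelligence&hl=en&as_sdt=0,5",
  "https://arxiv.org/search/?query=artificial+intelligence&searchtype=all&abstracts=show&order=-announced_date_first",
  "https://arxiv.org/search/?query=artificial+intelligence&searchtype=all&source=header",
];

// Break a search URL into host, path, and decoded query parameters.
function describeSearchUrl(raw) {
  const url = new URL(raw);
  // URLSearchParams decodes "+" as a space: q -> "artificial intelligence".
  return {
    host: url.hostname,
    path: url.pathname,
    params: Object.fromEntries(url.searchParams.entries()),
  };
}

// Hypothetical helper: reuse a URL with a different search term.
// Assumes Scholar's term is in "q" and arXiv's in "query", as in the URLs above.
function withSearchTerm(raw, term) {
  const url = new URL(raw);
  url.searchParams.set(url.hostname === "arxiv.org" ? "query" : "q", term);
  // set() re-serializes the whole query, so "0,5" is written as "0%2C5";
  // it decodes back to the same value.
  return url.toString();
}

for (const raw of SEARCH_URLS) {
  const { host, params } = describeSearchUrl(raw);
  console.log(host, params);
}
console.log(describeSearchUrl(SEARCH_URLS[1]).params.as_ylo); // → 2025
console.log(withSearchTerm(SEARCH_URLS[0], "graph neural networks"));
// → https://scholar.google.com/scholar?q=graph+neural+networks&hl=en&as_sdt=0%2C5
```

Swapping the search term this way keeps the filters (year, sort order) intact. This is one option if you want the agent to search a different topic than "artificial intelligence".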
Step 4: Set Up AI Research Processing
The AI agent receives the user’s query and the URL insights, scrapes academic sources, and returns a structured summary.
- In Inspect Image Content, set Resource to `image`, Input Type to `binary`, and Operation to `analyze`.
- In Transcribe Voice Audio, set Resource to `audio` and Input Type to `binary`.
- Connect Gemini Research Chat as the language model for Research Synthesis Agent.
- Credential Required: Connect your Google Gemini credentials to Inspect Image Content, Transcribe Voice Audio, and Gemini Research Chat.
- Confirm Data Scraper Tool is connected as an AI tool for Research Synthesis Agent and Session Memory Buffer is connected as AI memory.
- Credential Required: Connect your Decodo credentials to Data Scraper Tool. Session Memory Buffer does not need credentials.
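This hedged sketch makes the agent’s inputs concrete. It covers two ideas from this step:

- Filling the `{{INPUT_SEARCH_URL_INSIGHTS}}` placeholder in the system prompt.
- Keeping a per-chat sliding window of recent turns. Conceptually, that is what a window buffer memory like Session Memory Buffer does.

Several details are assumptions for illustration, not n8n internals: keying memory by Telegram chat ID, the window size, and the `WindowMemory`/`buildAgentMessages` names.

```javascript
// Hypothetical sketch, not n8n internals: the context the Research Synthesis
// Agent works with on each message.
const PLACEHOLDER = "{{INPUT_SEARCH_URL_INSIGHTS}}";

// Replace every occurrence of the placeholder with the URL insights text.
function fillSystemPrompt(template, insights) {
  if (!template.includes(PLACEHOLDER)) {
    throw new Error("System prompt has no URL-insights placeholder");
  }
  return template.split(PLACEHOLDER).join(insights);
}

// Sliding-window memory keyed by chat ID (an assumption about the session key).
class WindowMemory {
  constructor(windowSize) {
    if (!Number.isInteger(windowSize) || windowSize < 1) {
      throw new Error("windowSize must be a positive integer");
    }
    this.windowSize = windowSize; // exchanges (user + assistant) to keep
    this.sessions = new Map(); // chatId -> [{ role, text }]
  }

  add(chatId, role, text) {
    const turns = this.sessions.get(chatId) ?? [];
    turns.push({ role, text });
    // Drop the oldest messages once the window is full (2 messages per exchange).
    this.sessions.set(chatId, turns.slice(-this.windowSize * 2));
  }

  history(chatId) {
    return this.sessions.get(chatId) ?? [];
  }
}

// Assemble what one agent call could see: system prompt, recent turns, new input.
function buildAgentMessages(systemPrompt, memory, payload) {
  return [
    { role: "system", text: systemPrompt },
    ...memory.history(payload.chatId),
    { role: "user", text: payload.chatInput },
  ];
}

// Usage
const memory = new WindowMemory(2);
const systemPrompt = fillSystemPrompt(
  "Summarize papers. URL notes:\n{{INPUT_SEARCH_URL_INSIGHTS}}",
  "arXiv search: query = search term, order = newest first"
);
memory.add(42, "user", "Summarize this paper: <arXiv link>");
memory.add(42, "assistant", "TL;DR: ...");
const messages = buildAgentMessages(systemPrompt, memory, {
  chatId: 42,
  chatInput: "What are its limitations?",
});
console.log(messages.length); // → 4
```

Keying by chat ID keeps one user’s follow-up questions ("what are its limitations?") tied to the paper they just asked about. Other chats stay separate.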
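Note: The Research Synthesis Agent’s system prompt contains the placeholder `{{INPUT_SEARCH_URL_INSIGHTS}}`, which is supplied by Generate URL Insights. That flow is not wired to the agent, so run it and paste its output in place of the placeholder.
Step 5: Configure Output Delivery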
The workflow checks response length and sends either a message or a text file back to Telegram.
- In Check Message Length, add a Number condition: `{{ $json.output.length }}` is greater than `4000`.
- On the true branch, configure Convert Output to File with Operation `toText` and Source Property `output`.
- In Deliver Summary File, set Operation to `sendDocument` and Chat ID to `{{ $('Assemble Chat Payload').item.json.chatId }}`.
- On the false branch, configure Deliver Summary Message with Text `{{ $json.output }}` and Chat ID `{{ $('Assemble Chat Payload').item.json.chatId }}`.
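The length check and both delivery branches reduce to a small decision. This sketch assumes the agent output is a plain string. `planDelivery`, the file name, and the returned object shape are hypothetical. It also includes `splitForTelegram`, an alternative the template does not use, which sends several messages instead of a file and breaks at newlines where possible.

```javascript
// Sketch of Check Message Length plus the two delivery branches.
const TELEGRAM_TEXT_LIMIT = 4096; // max characters in one sendMessage text
const FILE_THRESHOLD = 4000; // the template's cut-over, leaving some headroom

function planDelivery(output, chatId) {
  if (output.length > FILE_THRESHOLD) {
    // True branch: Convert Output to File (toText), then sendDocument.
    return {
      method: "sendDocument",
      chatId,
      fileName: "research-summary.txt", // hypothetical file name
      content: output,
    };
  }
  // False branch: sendMessage with the agent output as the text.
  return { method: "sendMessage", chatId, text: output };
}

// Alternative the template does NOT use: split into several messages,
// preferring to break at a newline so paragraphs stay intact.
function splitForTelegram(text, limit = TELEGRAM_TEXT_LIMIT) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) {
      cut = limit; // no usable newline: hard cut at the limit
      // Avoid splitting a UTF-16 surrogate pair (e.g. an emoji) in half.
      const code = rest.charCodeAt(cut - 1);
      if (code >= 0xd800 && code <= 0xdbff && cut > 1) cut -= 1;
    }
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\n/, ""); // drop the newline we split on
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

console.log(planDelivery("x".repeat(4500), 42).method); // → sendDocument
console.log(splitForTelegram("x".repeat(9000)).map((c) => c.length)); // → [ 4096, 4096, 808 ]
```

Sending a file keeps a long summary in one piece, which is easier to forward or archive. Splitting keeps everything readable inline, at the cost of several messages in a row.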
Step 6: Test & Activate Your Workflow
Validate the trigger, routing, AI processing, and output delivery before enabling production use.
- Click Execute Workflow, then send your bot a text message, a photo, and a voice note (one per test run), plus an unsupported type such as a sticker, to verify every output of Route Message Type.
- Confirm Inspect Image Content and Transcribe Voice Audio return text in Compose Image Text and Compose Voice Text.
- Verify Research Synthesis Agent produces an `output` field and that Check Message Length routes correctly.
- Check Telegram for either a message from Deliver Summary Message or a file from Deliver Summary File.
- Once tests succeed, toggle the workflow to Active to run continuously.
Common Gotchas
- Telegram bot tokens can be revoked or regenerated in BotFather, or simply pasted incorrectly. If messages never trigger the workflow, check the token in the Telegram Bot Trigger node’s credential first.
- Transcription, image analysis, and scraping take variable amounts of time. If you add Wait nodes around these steps, increase the wait duration when downstream nodes fail on empty responses.
- Default prompts in Gemini/AI Agent nodes are generic. Add your preferred format (bullets, “key claims,” “limitations,” citations) early or you will be editing outputs forever.
Frequently Asked Questions
How long does setup take?
About an hour if your Telegram bot and Gemini credentials are ready.
Do I need coding skills?
No. You’ll connect accounts, paste API keys, and edit a couple of text fields in n8n.
Is there a free way to run this?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Gemini and Decodo usage costs, which depend on how many messages and papers you process.
Where should I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the summary format?
Yes, and you should. Update the instructions inside the Research Synthesis Agent to enforce your structure (for example: 5 bullets, then “methods,” then “limitations,” then “why it matters”). You can also tweak the Gemini URL parser prompt so it prioritizes arXiv vs. Google Scholar links depending on what your team uses most. If you want every summary stored, add Airtable or Google Sheets after the delivery step.
Why isn’t my bot responding, or why do media downloads fail?
Usually it’s a bad or rotated bot token in the Telegram Bot Trigger or Send Message nodes. Double-check the credential selected in n8n, then message the bot directly to confirm it still responds. If photos and voice notes fail but text works, the issue is often missing permissions or an expired file fetch request when pulling media from Telegram. Also note that bots can only download files up to 20 MB through the Telegram Bot API.
How many papers can it handle?
There is no fixed cap in the workflow itself. Throughput depends on your n8n execution limits and your Gemini and Decodo API quotas.
Is n8n a better fit than Zapier or Make here?
For this use case, n8n is usually the better fit because the logic branches (text vs. image vs. voice) are easy to manage, and AI + file handling workflows get expensive fast on Zapier-style task pricing. Self-hosting also matters if you expect lots of messages, since you are not paying per tiny step. That said, if you only want “Telegram link in, summary out” with no media support, Zapier or Make can be quicker to click together. The real difference is control: prompts, parsing, and length-based file delivery are simpler to fine-tune in n8n. Talk to an automation expert if you want help choosing.
This is the kind of automation that quietly fixes your week. Fewer tabs, better notes, and summaries you can actually trust when you need to act fast.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.