Telegram to Gmail, voice notes turned into action items
You record a quick Telegram voice note, fully intending to “deal with it later.” Later turns into never, or worse, a vague memory and a half-written to-do list. Important details slip. Action items get missed. And the feedback you actually needed stays trapped in audio.
This problem hits marketers collecting campaign feedback first. But agency owners and busy operators get pulled into the same mess when voice notes become the default for updates and decisions. The goal is simple: turn voice into a clean email recap you can forward, search, and act on.
This workflow takes Telegram voice messages, transcribes them, analyzes sentiment and key points, then sends a structured Gmail summary with action items. You’ll see how it works, what you need, and where teams usually trip up.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Telegram to Gmail, voice notes turned into action items
flowchart LR
subgraph sg0["Voice Message Flow"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Get a file"]
n1@{ icon: "mdi:cog", form: "rounded", label: "Wait", pos: "b", h: 48 }
n2@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If", pos: "b", h: 48 }
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Get Transcript"]
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Upload Audio to Assembly AI"]
n5@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Route: Text vs Audio", pos: "b", h: 48 }
n6@{ icon: "mdi:robot", form: "rounded", label: "Transcript Analysis (AI)", pos: "b", h: 48 }
n7@{ icon: "mdi:message-outline", form: "rounded", label: "Send Analysis Email", pos: "b", h: 48 }
n8@{ icon: "mdi:cog", form: "rounded", label: "The End", pos: "b", h: 48 }
n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Request Transcript from Asse.."]
n10["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Voice Message"]
n2 --> n6
n2 --> n1
n1 --> n3
n0 --> n4
n10 --> n5
n3 --> n2
n7 --> n8
n5 --> n0
n6 --> n7
n4 --> n9
n9 --> n1
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n10 trigger
class n6 ai
class n2,n5 decision
class n3,n4,n9 api
classDef customIcon fill:none,stroke:none
class n0,n3,n4,n9,n10 customIcon
The Problem: Voice Notes Don’t Turn Into Action
Voice notes feel efficient in the moment. You talk for 45 seconds, send it, and move on. Then someone has to replay it, take notes, pull out decisions, and translate that into tasks. If you’re juggling multiple clients, campaigns, or internal projects, those replays pile up fast. And because audio isn’t searchable, the “What did we decide?” question turns into a scavenger hunt through chat history and half-remembered context.
Here’s where it breaks down in real life.
- People listen to the same note two or three times just to capture the details accurately.
- Action items end up scattered across DMs, sticky notes, and someone’s personal to-do app.
- Tone gets misread, so a “minor issue” turns into a priority fire drill (or the opposite).
- You can’t easily forward audio to stakeholders, which means decisions don’t travel.
The Solution: Telegram Voice → Transcript → Gmail Recap
This workflow turns every Telegram voice message into a structured email you can actually use. A message hits your Telegram bot, n8n grabs the audio file, and sends it to a speech-to-text service (AssemblyAI) for transcription. Once the transcript is ready, OpenAI analyzes it and produces a polished “exec-ready” recap: a short summary, sentiment label and score, key points, action items, notable quotes, and topics. Finally, Gmail delivers that report to the inboxes you choose, so the information becomes searchable, shareable, and easy to route to the right person.
The workflow starts when a voice note arrives in Telegram. It uploads audio, waits for transcription, and checks status until the transcript is complete. Then AI extracts the insights and Gmail sends the final report in a consistent format.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
Example: What This Looks Like
Say you collect five Telegram voice notes a week from a client and your team listens, rewinds, and summarizes each one. If that takes about 15 minutes per note, you’re spending roughly 75 minutes weekly just converting audio into something actionable. With this workflow, you send the note once, then wait for transcription and analysis (often around 5–15 minutes in the background) and get a Gmail recap automatically. Your manual time drops to basically zero beyond recording the note.
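The arithmetic behind that estimate, as a small sketch. The figures are the example's own (five notes a week, about 15 minutes each); the function name is purely illustrative.

```python
def weekly_manual_minutes(notes_per_week: int, minutes_per_note: float) -> float:
    """Minutes spent each week turning voice notes into written notes by hand."""
    return notes_per_week * minutes_per_note


# Figures from the example above: five notes a week, about 15 minutes each.
manual = weekly_manual_minutes(5, 15)
print(f"Manual effort: {manual:.0f} minutes per week")
```

Swap in your own note volume and handling time to see whether the setup effort pays off for your team.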
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram Bot to receive voice messages.
- AssemblyAI for speech-to-text transcription.
- OpenAI API key (get it from the OpenAI dashboard).
- Gmail account (OAuth2) to send the summary emails.
Skill level: Intermediate. You’ll paste API keys, connect Gmail OAuth, and adjust a couple of fields like the destination email address.
How It Works
A Telegram voice note arrives. The workflow triggers from your Telegram bot and routes the message based on whether it contains audio (the template is built around voice, but routing leaves room for text too).
The audio gets uploaded for transcription. n8n fetches the Telegram file, then uses HTTP requests to upload it to AssemblyAI and start a transcript job. No downloading to your laptop. No manual uploads.
Status checks prevent half-finished outputs. A wait step gives the transcript time to process, then the workflow retrieves the transcript and checks if it’s complete. If it’s not ready yet, it waits and tries again, so the AI analysis runs on the final text.
AI turns raw text into a recap you can forward. OpenAI analyzes the transcript and returns a structured bundle: executive summary (about 120–180 words), sentiment label and score, key points, action items, notable quotes, and topics. Gmail sends that to your chosen recipient(s), then the workflow ends cleanly.
You can easily modify the email format to match your internal templates, or change the “who gets notified” logic based on keywords in the transcript. See the full implementation guide below for customization options.
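As a rough illustration of the “who gets notified” idea, here is a minimal sketch of keyword-based recipient selection. Everything in it is hypothetical: the keywords, the addresses, and the function name are placeholders, and in n8n this logic would more likely sit in a Switch or Code node placed before the Gmail step.

```python
# Hypothetical keyword-to-recipient rules; replace with your own.
ROUTING_RULES = {
    "invoice": "finance@example.com",
    "budget": "finance@example.com",
    "bug": "support@example.com",
}
DEFAULT_RECIPIENT = "team@example.com"


def pick_recipients(transcript: str) -> list[str]:
    """Return the recipients whose keywords appear in the transcript."""
    text = transcript.lower()
    matched = []
    for keyword, address in ROUTING_RULES.items():
        # Simple substring match, so "bug" would also match "debug".
        if keyword in text and address not in matched:
            matched.append(address)
    # Fall back to a default inbox when no keyword matches.
    return matched or [DEFAULT_RECIPIENT]
```

Plain substring matching keeps the sketch short. If false positives become a problem, whole-word matching or using the topics returned by the AI step may be a better signal.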
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
This workflow starts when a Telegram message is received, then routes voice messages for transcription.
- Add the Telegram Voice Trigger node to your canvas.
- Credential Required: Connect your telegramApi credentials in Telegram Voice Trigger.
- Set Updates to `message` to capture incoming messages.
- Connect Telegram Voice Trigger to Route Audio or Text.
Step 2: Connect Telegram File Retrieval and Audio Upload
This step fetches the voice file from Telegram and uploads it to AssemblyAI for transcription.
- In Route Audio or Text, confirm the Voice rule uses `{{ $json.message.voice.file_id }}` with the Exists operator, and the Text message rule uses `{{ $json.message.text }}`.
- Configure Fetch Telegram File with Resource set to `file` and File ID set to `{{ $json.message.voice.file_id }}`.
- Credential Required: Connect your telegramApi credentials in Fetch Telegram File.
- Set Upload Audio to Speech API with URL `https://api.assemblyai.com/v2/upload`, Method `POST`, Content Type `binaryData`, and Input Data Field Name `data`.
- In Upload Audio to Speech API, add header parameters: authorization `[CONFIGURE_YOUR_API_KEY]` and Content-Type `application/json`.
- Connect Fetch Telegram File → Upload Audio to Speech API.
⚠️ Common Pitfall: If the AssemblyAI API key is missing or incorrect in Upload Audio to Speech API, the upload will fail and no transcript will be created.
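For readers who think better in code, the routing rule and the upload call in this step can be sketched outside n8n roughly as follows. This is an illustration under assumptions, not the template's implementation: `route_message` and `build_upload_request` are hypothetical names, the request is only built and never sent, and the `application/octet-stream` content type is an assumption about what a raw-bytes upload would normally declare (the template sets its own header in the node).

```python
import urllib.request

UPLOAD_URL = "https://api.assemblyai.com/v2/upload"


def route_message(update: dict) -> str:
    """Mirror the routing rules: voice if a file_id exists, else text."""
    message = update.get("message", {})
    if message.get("voice", {}).get("file_id"):
        return "voice"
    if message.get("text"):
        return "text"
    return "ignored"


def build_upload_request(audio_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that uploads the raw audio bytes."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=audio_bytes,
        method="POST",
        headers={
            "authorization": api_key,  # a missing or wrong key makes the upload fail
            "Content-Type": "application/octet-stream",  # assumption, see note above
        },
    )
```

The JSON returned by a successful upload carries the `upload_url` that the next step passes along as `audio_url`.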
Step 3: Create and Monitor the Transcript Job
This section creates the transcription job, waits, and polls AssemblyAI until the transcript is completed.
- Configure Request Transcript Job with URL `https://api.assemblyai.com/v2/transcript` and Method `POST`.
- In Request Transcript Job, set the body parameter audio_url to `{{ $json.upload_url }}` and add headers authorization `[CONFIGURE_YOUR_API_KEY]` and Content-Type `application/json`.
- Set Delay Processing Amount to `10` to wait before polling.
- Configure Retrieve Transcript with URL `=https://api.assemblyai.com/v2/transcript/{{ $json.id }}` and include the same authorization headers.
- In Check Transcript Status, set the condition to compare `{{ $json.status }}` equals `completed`.
- Ensure the flow is connected: Upload Audio to Speech API → Request Transcript Job → Delay Processing → Retrieve Transcript → Check Transcript Status.
Tip: If your audio files are long, increase Delay Processing or add a loop for additional polling.
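The wait, retrieve, and check cycle in this step is a plain polling loop. Here is a minimal sketch of that pattern, with some assumptions made explicit: `fetch_status` stands in for the Retrieve Transcript call, the delay of 10 is treated as seconds, and both the attempt cap and the early exit on an `error` status are additions that the template does not necessarily include.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch_status: Callable[[], dict],
    delay_seconds: float = 10,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports "completed"; stop early on "error"."""
    for _ in range(max_attempts):
        sleep(delay_seconds)       # Delay Processing
        job = fetch_status()       # Retrieve Transcript
        status = job.get("status")
        if status == "completed":  # Check Transcript Status, true path
            return job
        if status == "error":
            raise RuntimeError(job.get("error", "transcription failed"))
        # Any other status (for example queued or processing): loop and wait again.
    raise TimeoutError(f"transcript not ready after {max_attempts} attempts")
```

Passing `sleep` and `fetch_status` in as arguments keeps the loop easy to test without real waiting or network calls. For long recordings, a larger `delay_seconds` means fewer polls for the same job.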
Step 4: Set Up AI Transcript Insights and Email Output
When the transcript is ready, the AI model summarizes it and formats a detailed HTML email report.
- In AI Transcript Insights, set Model to `{{ "gpt-4.1-mini" }}` and enable JSON Output.
- Credential Required: Connect your openAiApi credentials in AI Transcript Insights.
- Verify the system prompt references the transcript: `{{ $node["Retrieve Transcript"].json["text"] }}`.
- Configure Dispatch Summary Email with Send To set to `=[YOUR_EMAIL]`.
- Set the Subject to `=Transcript summary & sentiment – {{ $now.toISO() }}`.
- Paste the provided HTML/JS message template into Message (keeps formatting, action items, quotes, and topics).
- Credential Required: Connect your gmailOAuth2 credentials in Dispatch Summary Email.
- Connect Check Transcript Status (true path) → AI Transcript Insights → Dispatch Summary Email → Finish Workflow.
⚠️ Common Pitfall: If AI Transcript Insights does not return valid JSON, the email template in Dispatch Summary Email may fail to render correctly.
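One way to guard against that pitfall is to validate the model output before it reaches the email template. The sketch below shows the idea. The key names (`summary`, `sentiment`, `key_points`, `action_items`) are assumptions made for illustration, since the template's exact JSON schema is not shown here, and both function names are hypothetical.

```python
import json
from html import escape

# Assumed field names; adjust to whatever your prompt asks the model to return.
REQUIRED_KEYS = ("summary", "sentiment", "key_points", "action_items")


def parse_analysis(raw: str) -> dict:
    """Parse the model output and fail loudly if expected keys are missing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"AI output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("AI output must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"AI output is missing keys: {missing}")
    return data


def render_action_items(data: dict) -> str:
    """Render action items as an HTML list, escaping any markup in the text."""
    items = "".join(f"<li>{escape(str(item))}</li>" for item in data["action_items"])
    return f"<ul>{items}</ul>"
```

Failing early with a clear message tends to be easier to debug than a half-rendered email, and escaping the text keeps stray angle brackets in a transcript from breaking the HTML.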
Step 5: Test and Activate Your Workflow
Verify the end-to-end flow with a real Telegram voice message before enabling the workflow.
- Click Execute Workflow and send a voice message to your Telegram bot.
- Confirm the flow progresses through Fetch Telegram File, Upload Audio to Speech API, and Request Transcript Job.
- Wait for Retrieve Transcript and ensure Check Transcript Status passes when `{{ $json.status }}` equals `completed`.
- Check your inbox for the HTML summary from Dispatch Summary Email.
- When successful, toggle the workflow to Active for production use.
Common Gotchas
- A Telegram bot token stops working if it is revoked or regenerated in BotFather, and the bot can lose access to the chat. If things break, check BotFather settings, update the token in n8n, and confirm the bot is still present in the chat.
- If you’re using Wait nodes or external transcription, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
How long does it take to set up this Telegram voice automation?
About 30 minutes if you already have your API keys.
Do I need coding skills to set it up?
No. You’ll connect accounts, paste API keys, and edit a few fields like the recipient email.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in AssemblyAI transcription costs and OpenAI API usage (usually small per message).
Where can I host n8n to run this workflow?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I also store the transcripts in Google Sheets?
Yes, and it’s a common upgrade. After the transcript is retrieved (right before the AI analysis, or right after it), add a Google Sheets node to append a row with the transcript, sentiment, and action items. You can also swap the Gmail step so some summaries go to a shared inbox while the raw transcript goes to a sheet for search. If you want something prebuilt, the “store it in Sheets” pattern is very similar to how the workflow routes data already.
Why is my Telegram connection failing?
Usually it’s the bot token, chat permissions, or the bot simply not being allowed to read messages in that chat anymore. Recheck the Telegram credentials in n8n, then verify the bot is still present and receiving voice messages. If the workflow can trigger but can’t fetch the file, it’s often a file URL/access issue, so look closely at the “Fetch Telegram File” node configuration.
How many voice notes can this workflow handle?
Plenty for most small teams: dozens per day is normal, and the real limits come from your n8n plan and transcription throughput. On n8n Cloud Starter you get a monthly execution cap, while higher plans handle more. If you self-host, you’re mostly limited by your server and API rate limits. Practically, this workflow processes one note end-to-end per run, so if you expect spikes (like 200 notes after an event), you’ll want a bigger server or a queueing approach.
Is n8n a better fit than Zapier or Make for this?
Often, yes, because this workflow needs waiting, polling for transcript status, and conditional branching when a transcript isn’t ready yet. Zapier and Make can do it, but it tends to get awkward (and sometimes expensive) once you add retries and formatting. n8n is also easier to self-host, which matters if you’re processing a lot of voice notes. If you only want “voice in, email out” with no retries, the simpler tools can be fine. Talk to an automation expert if you want a quick recommendation based on your volume and workflow.
Once this is running, your Telegram voice notes stop being “stuff you’ll remember later” and start turning into decisions you can act on. The workflow handles the repetitive cleanup. You keep moving.