Telegram + OpenAI: voice notes into searchable text
Your Telegram chats are full of decisions, client details, and “don’t forget this” moments. And then they disappear into voice notes that nobody can search, skim, or copy into docs.
Marketing managers end up replaying audio to pull quotes. Agency owners lose action items across client threads. Even ops leads feel it when approvals live inside 45 seconds of mumbled context. This Telegram transcription automation turns voice into clean text you can actually use.
You’ll set up an n8n workflow that listens for voice notes, transcribes them with OpenAI, falls back to Gemini if needed, and posts readable text back to the same chat (even when messages are long).
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Telegram + OpenAI: voice notes into searchable text
```mermaid
flowchart LR
subgraph sg0["Incoming Telegram Flow"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Incoming Telegram Trigger"]
n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Notify Start Transcription"]
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Send Access Denied"]
n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Alert Missing File"]
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Retrieve File for GPT"]
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Fetch File for Gemini"]
n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Warn Unrecognized File"]
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Send Transcript Output"]
n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Send Chunked Output"]
n9@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Validate Sender Access", pos: "b", h: 48 }
n10@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Detect Message Type", pos: "b", h: 48 }
n11@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set Voice File ID", pos: "b", h: 48 }
n12@{ icon: "mdi:swap-vertical", form: "rounded", label: "Set Audio File ID", pos: "b", h: 48 }
n13@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Validate Audio Format", pos: "b", h: 48 }
n14@{ icon: "mdi:robot", form: "rounded", label: "OpenAI Transcription", pos: "b", h: 48 }
n15@{ icon: "mdi:robot", form: "rounded", label: "Gemini Transcription", pos: "b", h: 48 }
n16@{ icon: "mdi:swap-vertical", form: "rounded", label: "Map Text Variable", pos: "b", h: 48 }
n17@{ icon: "mdi:swap-vertical", form: "rounded", label: "Map Text Variable 2", pos: "b", h: 48 }
n18@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Check Text Length", pos: "b", h: 48 }
n19["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Split Text Chunks"]
n9 --> n10
n9 --> n2
n4 --> n14
n4 --> n1
n19 --> n8
n5 --> n15
n0 --> n9
n15 --> n17
n14 --> n16
n14 --> n5
n12 --> n13
n11 --> n13
n16 --> n18
n13 --> n4
n13 --> n6
n17 --> n18
n18 --> n7
n18 --> n19
n10 --> n11
n10 --> n12
n10 --> n3
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n14,n15 ai
class n9,n10,n13,n18 decision
class n19 code
classDef customIcon fill:none,stroke:none
class n0,n1,n2,n3,n4,n5,n6,n7,n8,n19 customIcon
```
Why This Matters: Voice Notes Aren’t Searchable
Voice notes feel fast in the moment. But later, they’re friction. Someone asks, “What did the client say about pricing?” and now you’re scrubbing through audio, turning volume up, rewinding, and hoping you caught the important part. Multiply that by a busy week of team chats and you get a quiet tax on your time. The worst part is the context loss: decisions don’t make it into your docs, and follow-ups get missed because the “real info” lived in audio.
It adds up fast. Here’s where it usually breaks down.
- People stop documenting because replaying audio is annoying, so knowledge stays trapped in Telegram.
- Manual transcription is slow and error-prone, especially when multiple people talk or accents vary.
- One long voice message can exceed Telegram’s 4,096-character message limit when transcribed, so the text gets cut off or never sent.
- Without access control, anyone can trigger transcriptions and burn through AI credits (sometimes accidentally).
What You’ll Build: Secure Voice-to-Text in Telegram
This workflow turns Telegram voice messages into readable text replies, automatically. It starts when someone posts a voice note or audio file in your Telegram group, then verifies the sender is allowed to use the transcription service. If the message contains supported audio, n8n downloads the file and sends it to OpenAI for transcription (with a quick “transcription started” notice so people aren’t left guessing). If OpenAI errors out, the workflow routes the same file to Gemini as a backup. Finally, the transcribed text is posted back into the chat, and long transcripts are split into multiple messages so nothing gets chopped.
The flow is simple in practice. Telegram triggers the run, access control keeps usage clean, and AI handles the transcription. Then n8n formats the result for Telegram’s limits and delivers it right where your team already works.
What You’re Building
| What Gets Automated | What You’ll Achieve |
|---|---|
| Transcribing voice notes and audio files with OpenAI (Gemini as fallback) | Searchable, copyable text instead of audio you have to replay |
| Checking sender authorization before any AI call | No AI credits burned on unauthorized requests |
| Splitting long transcripts to fit Telegram’s message limit | Complete transcripts delivered, even for long recordings |
Expected Results
Say your team gets 10 voice notes a day and each one takes about 3 minutes to replay, pause, and type into something usable. That’s roughly 30 minutes daily, and it’s usually the worst 30 minutes because it interrupts real work. With this workflow, you drop the voice note as usual, get a “started” notification, then receive the transcript back in chat. Your manual time becomes close to zero, and long messages still arrive as multiple chunks instead of failing.
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram for receiving voice notes and posting transcripts.
- OpenAI to transcribe audio with Whisper.
- Google Gemini as backup transcription if OpenAI fails.
- OpenAI API key (get it from your OpenAI dashboard).
Skill level: Intermediate. You’ll connect Telegram + AI credentials, then adjust a few rules (authorized users, formats, and output behavior).
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
A Telegram message comes in. The workflow triggers on new messages in your group and captures sender details plus the message type (voice note, audio file, or just text).
Access gets checked immediately. An “if” rule verifies the sender against an approved list. If they’re not authorized, n8n replies with an access denied message and stops, which means you don’t spend AI credits on random requests.
Audio is detected and validated. The workflow figures out whether there’s a file to transcribe, pulls the correct Telegram file ID, and checks the audio format (OGG voice messages, MP3, M4A/MP4, and other supported types). If there’s no audio or an unknown format, it sends a clear warning back to the chat.
Transcription runs, with a fallback. n8n downloads the file and sends it to OpenAI for transcription while also posting a quick “started” notification. If OpenAI errors, the workflow automatically routes the same file to Gemini, then maps the resulting text into a single output variable so the rest of the workflow behaves the same.
The transcript is delivered safely. If the transcription fits in one Telegram message (the workflow checks against a 4,000-character threshold, safely under Telegram’s 4,096-character cap), it posts once. If it’s longer, a code step splits it into readable chunks and sends multiple messages in sequence.
You can easily modify the authorized user list to match your team, or adjust how chunking behaves based on your chat style. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
Set up the entry point that listens for incoming Telegram messages and starts the workflow.
- Add and open Incoming Telegram Trigger.
- Set Updates to `message`.
- Credential Required: Connect your telegramApi credentials.
Tip: Ensure your Telegram bot is already created and allowed to receive messages from the users you plan to authorize.
Step 2: Validate Sender and Detect Message Type
Gate access to the bot and route messages based on whether they contain voice or audio files.
- Open Validate Sender Access and set the conditions to allow only authorized users, using `={{ $json.message.from.username }}` with allowed values `User 2` and `User 1`.
- Confirm Validate Sender Access routes to Detect Message Type on the true branch and to Send Access Denied on the false branch.
- In Detect Message Type, verify the rules check for voice and audio objects using `={{ $json.message.voice }}` and `={{ $json.message.audio }}`, and keep Fallback Output set to `extra`.
- Confirm Detect Message Type routes to Set Voice File ID, Set Audio File ID, or Alert Missing File depending on the message content.
⚠️ Common Pitfall: If the username in Validate Sender Access does not exactly match the sender’s Telegram username (case-sensitive), the workflow will always route to Send Access Denied.
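If exact, case-sensitive matching keeps biting you, one option is to replace the IF condition with a small code node that normalizes case before comparing. A minimal sketch of that check (the username list below is the template’s placeholder — swap in your team’s real Telegram usernames):

```javascript
// Case-insensitive variant of the Validate Sender Access check.
// Placeholder usernames from the template; replace with your own.
const ALLOWED_USERS = ["User 1", "User 2"].map((u) => u.toLowerCase());

function isAuthorized(username) {
  // Telegram users can hide or lack a username, so guard before lowercasing.
  return typeof username === "string" && ALLOWED_USERS.includes(username.toLowerCase());
}
```

In n8n you would feed `$json.message.from.username` into this function and route on the boolean result.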
Step 3: Prepare File IDs and Validate Audio Format
Extract the file ID from the message and enforce accepted audio MIME types before downloading the file.
- In Set Voice File ID, set the assignment file_id to `={{ $json.message.voice.file_id }}` and enable Include Other Fields.
- In Set Audio File ID, set the assignment file_id to `={{ $json.message.audio.file_id }}` and enable Include Other Fields.
- In Validate Audio Format, keep the MIME checks for `audio/ogg`, `audio/mpeg`, `audio/mp4`, and `audio/m4a` using the expressions referencing Incoming Telegram Trigger.
- Confirm the false branch of Validate Audio Format routes to Warn Unrecognized File.
Credential Required: Connect your telegramApi credentials to all Telegram action nodes (8 total) including Send Access Denied, Alert Missing File, Warn Unrecognized File, Notify Start Transcription, Retrieve File for GPT, Fetch File for Gemini, Send Transcript Output, and Send Chunked Output.
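For reference, the format gate in Validate Audio Format boils down to a set-membership check on the incoming file’s MIME type. In n8n this lives in the IF node’s expressions rather than code, but a standalone sketch of the logic looks like this:

```javascript
// MIME types accepted by the workflow, as listed in Validate Audio Format.
const SUPPORTED_AUDIO = new Set(["audio/ogg", "audio/mpeg", "audio/mp4", "audio/m4a"]);

// True when the Telegram file's mime_type is one the transcription
// providers can handle; anything else routes to Warn Unrecognized File.
function isSupportedAudio(mimeType) {
  return SUPPORTED_AUDIO.has(mimeType);
}
```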
Step 4: Retrieve the File and Run Parallel Transcription
Download the media file and launch both the transcription process and a user notification in parallel.
- In Retrieve File for GPT, set Resource to `file` and File ID to `={{ $json.file_id }}`.
- Retrieve File for GPT outputs to both OpenAI Transcription and Notify Start Transcription in parallel.
- In Notify Start Transcription, set Text to `Starting transcription. Please wait.` and Chat ID to `={{ $('Incoming Telegram Trigger').item.json.message.chat.id }}`.
- Credential Required: Connect your openAiApi credentials to OpenAI Transcription.
Tip: Parallel execution ensures the user is notified immediately while transcription starts in the background.
Step 5: Configure Gemini Fallback and Map Transcription Text
Send the file to Gemini and normalize both transcription outputs into a consistent text field.
- In Fetch File for Gemini, set Resource to `file` and File ID to `={{ $('Retrieve File for GPT').item.json.result.file_id }}`.
- In Gemini Transcription, set Resource to `audio`, Input Type to `binary`, and Binary Property Name to `=data`.
- Credential Required: Connect your googlePalmApi credentials to Gemini Transcription.
- In Map Text Variable, set text to `={{ $json.text }}`.
- In Map Text Variable 2, set text to `={{ $json.content.parts[0].text }}`.
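The two Map Text Variable nodes exist because OpenAI and Gemini return differently shaped responses. A sketch of the normalization they perform, assuming the response shapes referenced in the expressions above:

```javascript
// Normalize either provider's response into a single text field, mirroring
// Map Text Variable ($json.text, OpenAI) and Map Text Variable 2
// ($json.content.parts[0].text, Gemini).
function extractTranscript(response) {
  if (typeof response.text === "string") {
    return response.text; // OpenAI-style response
  }
  const parts = response.content && response.content.parts; // Gemini-style response
  return parts && parts.length > 0 ? parts[0].text : "";
}
```

After this step, the rest of the workflow only ever sees `text`, regardless of which provider ran.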
Step 6: Route by Text Length and Send Output
Send short transcripts as a single message and split long transcripts into chunks.
- In Check Text Length, keep the condition `={{ $json["text"].length }}` with operator `lt` (less than) and value `4000`, which keeps each message safely under Telegram’s 4,096-character cap.
- On the true branch, send output via Send Transcript Output with Text set to `={{ $json.text }}` and Chat ID set to `={{ $('Incoming Telegram Trigger').item.json.message.chat.id }}`.
- On the false branch, use Split Text Chunks with the provided JavaScript to split into 4,000-character chunks.
- In Send Chunked Output, set Text to `={{ $json.body }}` and Chat ID to `={{ $('Incoming Telegram Trigger').item.json.message.chat.id }}`.
⚠️ Common Pitfall: If Split Text Chunks is modified to return a different field than body, Send Chunked Output will send empty messages.
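The chunking code ships with the template, but if you want to adapt it (or verify the `body` field the pitfall mentions), the core logic is roughly this. In the real code node you would wrap each chunk as an n8n item (`{ json: { body: ... } }`); this standalone sketch keeps just the splitting:

```javascript
// Split a long transcript into Telegram-safe chunks. A 4,000-character
// chunk size stays safely under Telegram's 4,096-character message cap.
function splitTextChunks(text, maxLen = 4000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += maxLen) {
    // Each chunk carries a `body` field, which Send Chunked Output reads.
    chunks.push({ body: text.slice(i, i + maxLen) });
  }
  return chunks;
}
```

If you rename `body` here, rename it in Send Chunked Output too, or the bot will post empty messages.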
Step 7: Test & Activate Your Workflow
Verify end-to-end behavior with real Telegram messages before going live.
- Click Execute Workflow and send a voice or audio file to your Telegram bot.
- Confirm that Notify Start Transcription sends the “Starting transcription. Please wait.” message immediately.
- Check that transcription output appears via Send Transcript Output for short text or via multiple messages from Send Chunked Output for long text.
- Once verified, toggle the workflow to Active for production use.
Troubleshooting Tips
- Telegram credentials can expire or require the right bot permissions. If messages aren’t being received or sent, check your Telegram bot access in n8n’s Credentials first.
- Transcription time varies with audio length and provider load. If downstream nodes fail on empty responses, increase any timeouts you rely on, or add a short Wait before the delivery steps.
- Default prompts and settings in AI nodes can be generic. If you want consistent formatting (bullets, action items, speaker labels), define that output style early or you will be cleaning transcripts by hand.
Quick Answers
How long does setup take?
About 30 minutes if your Telegram bot and API keys are ready.
Do I need to know how to code?
No. You’ll mostly connect accounts and edit a few simple rules. The only “code” piece is already included for splitting long text.
Can I run this for free?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI and Gemini API usage (most teams find it cheap for voice notes).
Where should I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the workflow?
Yes, and you should. You can change the authorized user check in the “Validate Sender Access” step, swap transcription providers by adjusting the OpenAI and Gemini nodes, and tweak the “Split Text Chunks” logic if you want smaller replies or different formatting (like action items first).
What if the bot doesn’t respond?
Usually it’s the bot permissions or an expired credential in n8n. Confirm your bot is in the group, can read messages, and can post replies, then re-save the Telegram credential. If it triggers but can’t download files, the file ID mapping may not match the message type (voice vs audio), so check the message detection and “Set Voice File ID / Set Audio File ID” steps.
How many voice notes can it handle?
On n8n Cloud, it depends on your plan’s monthly executions, and each transcription is typically one execution. If you self-host, there’s no execution cap, so the real limit becomes your server and API rate limits. Practically, most small teams can handle dozens of voice notes per day without thinking about it. If you’re transcribing long audio all day, you’ll want queueing and stronger monitoring.
Is n8n better than Zapier or Make for this?
Often, yes. n8n handles branching logic (access control, message type detection, fallback to Gemini, and chunking) without turning your automation into a spaghetti monster. You also get a self-hosting option, which is a big deal if you expect high volume or want predictable costs. Zapier and Make can still work if your needs are simple, but this workflow has enough “ifs” that the pricing and complexity can creep up. If you’re torn, Talk to an automation expert and describe your volume plus your security needs.
Once this is running, voice notes stop being “lost context” and start becoming usable documentation. Set it up once, then let the workflow do the boring part.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.