Telegram + OpenAI Whisper: readable voice note replies
Voice notes are convenient… until you need to find that one detail later. Then it turns into replaying audio in public, missing context, and losing decisions inside a messy chat history.
The problem hits support leads first, because “what did they say?” slows down every handoff. Agency owners and ops managers feel it too when clients send updates by voice and nobody wants to misquote them.
This workflow turns Telegram voice notes into readable text and posts the transcription right back into the same thread. You’ll see how it works, what you need, and where teams usually trip up.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Telegram + OpenAI Whisper: readable voice note replies
[Workflow diagram: Telegram Message Hook Flow. Telegram Message Hook feeds Route Incoming Chat. Text messages go from Route Incoming Chat directly to Dispatch Transcript Reply. Voice messages go from Route Incoming Chat to Fetch Voice File, then to Convert Audio to Text, then to Dispatch Transcript Reply.]
The Problem: Voice Notes Create Hidden Work
In theory, voice notes save time. In reality, they push the work downstream. Someone has to listen, pull out the key facts, and then restate them in writing so the rest of the team can act. If you’re running support, operations, or client work, that “someone” is often you. It’s also fragile: one missed number, one misunderstood date, one background-noise moment, and you’re replying with the wrong details. Multiply that by a few conversations a day and it becomes a quiet tax on focus.
It adds up fast. Here’s where it breaks down in real life.
- You end up replaying the same voice note two or three times just to catch names, dates, and the actual ask.
- Chats stop being searchable, so “we already answered this” turns into more time spent hunting.
- Hand-offs get messy because teammates can’t quickly skim audio while juggling other tickets.
- Important requests sit idle because nobody wants to listen right now, which means slower responses and frustrated customers.
The Solution: Auto-Transcribe Telegram Voice Notes and Reply in-Thread
This n8n workflow listens for every new Telegram message your bot receives. When the message is plain text, it passes the text straight through to the reply step. When the message is a voice note, the workflow fetches the audio file from Telegram, sends it to OpenAI Whisper (the whisper-1 speech-to-text model), and converts it into a text transcript. Then it posts that transcription back into the same Telegram chat, so the conversation becomes readable and searchable immediately. No copy-paste. No extra apps. Just a clear, written version of what was said, exactly where your team already works.
The workflow starts with a Telegram message trigger. A routing step splits voice from text, then the voice branch downloads the file and transcribes it in OpenAI. Finally, one send-message step posts the final text back to Telegram.
Example: What This Looks Like
Say your team receives 10 voice notes a day in Telegram from customers or clients. Manually, you might spend about 10 minutes per note to listen, re-listen, and type a clean reply, so that’s roughly 100 minutes daily. With this workflow, the “work” becomes sending the voice note as usual, then waiting about a minute for the transcription to appear in-thread. That’s about an hour and a half back each day, and the transcript is there for anyone to search later.
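The arithmetic behind that estimate can be written out as a quick sketch. The figures are the example's own assumptions (10 notes a day, about 10 minutes each by hand, about 1 minute of waiting per transcript), not measurements, and `daily_minutes_saved` is a hypothetical helper name.

```python
# Illustrative figures taken from the example above; adjust for your own team.
NOTES_PER_DAY = 10
MANUAL_MINUTES_PER_NOTE = 10     # listen, re-listen, type a clean reply
AUTOMATED_MINUTES_PER_NOTE = 1   # wait for the transcript to appear in-thread


def daily_minutes_saved(notes: int, manual: float, automated: float) -> float:
    """Minutes saved per day when each note costs `automated` instead of `manual`."""
    return notes * (manual - automated)


saved = daily_minutes_saved(
    NOTES_PER_DAY, MANUAL_MINUTES_PER_NOTE, AUTOMATED_MINUTES_PER_NOTE
)
print(f"Manual: {NOTES_PER_DAY * MANUAL_MINUTES_PER_NOTE} min/day")
print(f"Saved:  {saved} min/day ({saved / 60:.1f} hours)")
```

Swapping in your own note volume and handling time gives a rough sense of whether the setup effort pays off.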
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram for receiving messages via a bot
- OpenAI to transcribe voice notes with Whisper
- Telegram Bot Token (get it from BotFather in Telegram)
- OpenAI API Key (get it from your OpenAI dashboard)
Skill level: Beginner. You’ll mostly paste API keys, connect accounts, and test with a real voice note.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A Telegram message arrives. The workflow triggers the moment your bot receives a new message in Telegram.
The workflow routes voice vs. text. A router checks the message type. Text goes straight through as-is, while voice notes take the audio path.
Voice notes get downloaded and transcribed. n8n fetches the voice file from Telegram, then sends it to OpenAI Whisper to convert speech into clean, readable text.
The transcript is posted back in the same chat. One send-message action replies with the final text, so the thread becomes easy to scan and search.
You can easily modify the reply format to include timestamps, speaker labels, or a short summary based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
Set up the workflow to listen for incoming Telegram messages and route them into the logic flow.
- Add the Telegram Message Hook node as your trigger.
- Set Updates to `message`.
- Credential Required: Connect your telegramApi credentials in Telegram Message Hook.
- Confirm the node connects to Route Incoming Chat.
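For orientation, here is roughly what a voice-note update looks like when it leaves the trigger. The field names follow the Telegram Bot API's Update, Message, and Voice objects; every value below is a made-up placeholder, and the exact set of fields your bot receives may differ.

```python
# A trimmed Telegram update for a voice note. Field names follow the Bot API;
# all IDs and values are placeholders invented for illustration.
voice_update = {
    "update_id": 100000001,
    "message": {
        "message_id": 42,
        "date": 1700000000,  # Unix timestamp
        "chat": {"id": 123456789, "type": "private"},
        "voice": {
            "file_id": "PLACEHOLDER_FILE_ID",
            "duration": 7,  # seconds
            "mime_type": "audio/ogg",
        },
    },
}

# The same lookups the later steps perform with n8n expressions such as
# $json.message.voice.file_id and message.chat.id.
chat_id = voice_update["message"]["chat"]["id"]
file_id = voice_update["message"]["voice"]["file_id"]
has_text = "text" in voice_update["message"]
print(chat_id, file_id, has_text)
```

A plain text message carries a `text` key in place of `voice`, which is what the routing step keys on.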
Step 2: Route Incoming Text vs Voice
Use a switch to split text and voice messages into their appropriate paths.
- Open Route Incoming Chat and add two rules.
- For the text branch, set the Left Value to `={{ $json.message.text }}` and use the exists operator.
- For the voice branch, set the Left Value to `={{ $json.message.voice }}` and use the exists operator.
- Ensure Route Incoming Chat connects to Dispatch Transcript Reply for text and to Fetch Voice File for voice messages.
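The two "exists" rules amount to a small decision function. This is a sketch of the logic only, not of how the n8n switch is implemented; `route_incoming_chat` is a hypothetical name, and checking the text rule before the voice rule is an assumption based on the order the rules are listed in.

```python
from typing import Optional


def route_incoming_chat(message: dict) -> Optional[str]:
    """Return "text" or "voice" depending on which key the message carries.

    Mirrors the two "exists" rules: text is checked first, then voice.
    Anything else (photos, stickers, ...) matches neither rule.
    """
    if message.get("text") is not None:
        return "text"
    if message.get("voice") is not None:
        return "voice"
    return None


print(route_incoming_chat({"text": "hello"}))
print(route_incoming_chat({"voice": {"file_id": "PLACEHOLDER"}}))
print(route_incoming_chat({"photo": []}))
```

Note the third case: a message that is neither text nor voice matches no rule, so nothing is sent back for it unless you add a fallback branch.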
Step 3: Set Up Audio Transcription
Download the voice file from Telegram and send it to the transcription node.
- Open Fetch Voice File and set Resource to `file`.
- Set File ID to `={{ $json.message.voice.file_id }}`.
- Credential Required: Connect your telegramApi credentials in Fetch Voice File.
- Open Convert Audio to Text, set Resource to `audio`, and Operation to `transcribe`.
- Credential Required: Connect your openAiApi credentials in Convert Audio to Text.
- Confirm Fetch Voice File outputs to Convert Audio to Text, and Convert Audio to Text outputs to Dispatch Transcript Reply.
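If you want to see what these two nodes stand in for, here is a minimal sketch of the equivalent HTTP requests outside n8n. It assumes the public Telegram Bot API (`getFile` plus the file download URL) and the OpenAI `/v1/audio/transcriptions` endpoint with the `whisper-1` model. The helper names, the `voice.ogg` filename, and the hand-built multipart body are illustrative choices, not something taken from the workflow; how the n8n nodes issue these calls internally is not covered here.

```python
import json
import urllib.parse
import urllib.request
import uuid

TELEGRAM_API = "https://api.telegram.org"
OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL for the Bot API getFile call, which resolves a file_id to a file_path."""
    query = urllib.parse.urlencode({"file_id": file_id})
    return f"{TELEGRAM_API}/bot{bot_token}/getFile?{query}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL from which the actual audio bytes can be downloaded."""
    return f"{TELEGRAM_API}/file/bot{bot_token}/{file_path}"


def fetch_voice_bytes(bot_token: str, file_id: str) -> bytes:
    """Two-step download: getFile for the path, then fetch the file itself."""
    with urllib.request.urlopen(get_file_url(bot_token, file_id)) as resp:
        file_path = json.load(resp)["result"]["file_path"]
    with urllib.request.urlopen(download_url(bot_token, file_path)) as resp:
        return resp.read()


def build_transcription_request(
    api_key: str, audio: bytes, filename: str = "voice.ogg"
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        b"whisper-1\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        OPENAI_TRANSCRIPTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Placeholder token and IDs; nothing is sent over the network here.
print(get_file_url("BOT_TOKEN", "PLACEHOLDER_FILE_ID"))
print(download_url("BOT_TOKEN", "voice/file_1.oga"))
```

To actually transcribe, you would pass the result of `fetch_voice_bytes(...)` to `build_transcription_request(...)`, send it with `urllib.request.urlopen`, and read the `text` field of the JSON response, which is the same `text` value the reply step references.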
Step 4: Configure the Reply to Telegram
Send a response back to the user with the text or transcript.
- Open Dispatch Transcript Reply and set Text to `={{ $json.message.text }} {{ $json.text }}`.
- Set Chat ID to `={{ $('Telegram Message Hook').item.json.message.chat.id }}`.
- In Additional Fields, set Append Attribution to `false`.
- Credential Required: Connect your telegramApi credentials in Dispatch Transcript Reply.
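The Text expression works for both branches because only one of its two parts is normally filled in: the original text on the text branch, or the transcript on the voice branch. The sketch below illustrates that idea under one assumption, namely that a missing value contributes an empty string; how n8n itself renders a missing value is not stated in this guide. `compose_reply_text` and `send_message_payload` are hypothetical helper names, and the payload shape follows the Bot API's `sendMessage` method.

```python
def compose_reply_text(item: dict) -> str:
    """Join the original text (text branch) and the transcript (voice branch).

    Only one of the two is normally present, so missing values become ""
    (an assumption of this sketch) and the stray space is stripped.
    """
    original = (item.get("message") or {}).get("text") or ""
    transcript = item.get("text") or ""
    return f"{original} {transcript}".strip()


def send_message_payload(chat_id: int, text: str) -> dict:
    """Body for the Bot API sendMessage method."""
    return {"chat_id": chat_id, "text": text}


# Text branch: the item still carries the Telegram message.
print(compose_reply_text({"message": {"text": "hi"}}))
# Voice branch: the item is the transcription result with a "text" field.
print(compose_reply_text({"text": "call me at noon"}))
print(send_message_payload(123456789, "call me at noon"))
```

This is also why Chat ID reaches back to the trigger node by name: on the voice branch the current item is the transcription output and no longer contains the chat details.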
Step 5: Test and Activate Your Workflow
Validate that text and voice messages flow through the correct paths before enabling the automation.
- Click Execute Workflow in n8n to start a manual test.
- Send a text message and a voice message to your Telegram bot.
- Confirm Route Incoming Chat routes text to Dispatch Transcript Reply and voice to Fetch Voice File then Convert Audio to Text.
- Verify the bot replies with the original text and/or transcription in Telegram.
- Toggle the workflow to Active to run it in production.
Common Gotchas
- Telegram bot credentials can be revoked or misconfigured. If messages stop triggering, check the bot token in the Telegram credentials used by the trigger node (Telegram Message Hook) first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- OpenAI API access can fail due to billing limits or missing permissions. If transcription errors appear, confirm your OpenAI API key is active and your account has available quota.
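A quick way to rule out the first gotcha is to check the bot token outside n8n. The sketch below assumes the Bot API's `getMe` method, which returns basic information about the bot for a valid token. The helper names are hypothetical, and the error handling assumes Telegram answers a rejected token with an HTTP error carrying a JSON body; network failures are deliberately left to raise.

```python
import json
import urllib.error
import urllib.request


def get_me_url(bot_token: str) -> str:
    """URL for the Bot API getMe call."""
    return f"https://api.telegram.org/bot{bot_token}/getMe"


def describe_get_me(payload: dict) -> str:
    """Turn a getMe response body into a one-line verdict."""
    if payload.get("ok"):
        return f"token OK, bot username: @{payload['result'].get('username')}"
    return f"token rejected: {payload.get('description', 'no description')}"


def check_bot_token(bot_token: str) -> str:
    """Call getMe and describe the outcome (performs a real network request)."""
    try:
        with urllib.request.urlopen(get_me_url(bot_token), timeout=10) as resp:
            return describe_get_me(json.load(resp))
    except urllib.error.HTTPError as err:
        # Assumed: the error response body is JSON with "ok": false.
        return describe_get_me(json.load(err))


# Offline illustration with hand-written response bodies.
print(describe_get_me({"ok": True, "result": {"username": "my_bot"}}))
print(describe_get_me({"ok": False, "description": "Unauthorized"}))
```

Running `check_bot_token("<your token>")` from a terminal tells you whether the token itself is the problem before you start changing node settings.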
Frequently Asked Questions
How long does it take to set up?
About 20 minutes if you already have your Telegram bot token and OpenAI API key.
Do I need coding skills?
No. You will connect Telegram and OpenAI, then test the voice-note branch.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs, which are usually small for short voice notes.
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I save the transcripts to Google Sheets?
Yes, and it’s a common upgrade. After the transcription is created, add a Google Sheets “append row” action and write the transcript, sender name, and date into a new row. You can also include the Telegram chat ID so you can trace it back later.
Why is my Telegram connection failing?
Usually it’s the bot token. Regenerate or re-copy the token from BotFather, then update the Telegram credentials in n8n. Also confirm the bot is actually in the chat and has permission to read messages, because private/group settings can block delivery. If it fails only sometimes, Telegram API rate limits or temporary network issues can be the culprit.
How many voice notes can this workflow handle?
It depends more on your n8n plan and your server than the workflow itself. On n8n Cloud, your monthly execution limit caps volume, so high-traffic support inboxes may need a higher plan. If you self-host, there’s no hard execution limit, but you’ll want enough CPU and memory to process bursts. Practically, most small teams can run this all day without thinking about it unless they’re receiving nonstop voice notes.
Is n8n a better fit than Zapier or Make for this?
Often, yes. Whisper transcription usually needs a few moving parts (download file, send to transcription, then post a clean reply), and n8n handles that kind of branching without feeling cramped. Self-hosting is also a big deal if you expect a lot of messages and don’t want every extra step to increase your bill. Zapier or Make can still work if you prefer their UI, but file handling gets fiddly fast. If you want help choosing, talk to an automation expert.
Once this is running, voice notes stop being a bottleneck and start being usable information. The workflow handles the repetitive listening and typing so you can focus on the response.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.