Telegram + OpenAI Whisper: readable voice note replies

Voice notes are convenient… until you need to find that one detail later. Then it turns into replaying audio in public, missing context, and losing decisions inside a messy chat history.

The problem hits support leads first, because “what did they say?” slows down every handoff. Agency owners and ops managers feel it too when clients send updates by voice and nobody wants to misquote them.

This workflow turns Telegram voice notes into readable text and posts the transcription right back into the same thread. You’ll see how it works, what you need, and where teams usually trip up.

How This Automation Works

The full n8n workflow, from trigger to final output:

n8n Workflow Template: Telegram + OpenAI Whisper: readable voice note replies

flowchart LR

    subgraph sg0["Telegram Message Hook Flow"]
        direction LR
        n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Fetch Voice File"]
        n1@{ icon: "mdi:robot", form: "rounded", label: "Convert Audio to Text", pos: "b", h: 48 }
        n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Telegram Message Hook"]
        n3["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Dispatch Transcript Reply"]
        n4@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Route Incoming Chat", pos: "b", h: 48 }
        n0 --> n1
        n2 --> n4
        n4 --> n3
        n4 --> n0
        n1 --> n3
    end

    %% Styling
    classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
    classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef disabled stroke-dasharray: 5 5,opacity: 0.5
    class n2 trigger
    class n1 ai
    class n4 decision
    classDef customIcon fill:none,stroke:none
    class n0,n2,n3 customIcon

The Problem: Voice Notes Create Hidden Work

In theory, voice notes save time. In reality, they push the work downstream. Someone has to listen, pull out the key facts, and then restate them in writing so the rest of the team can act. If you’re running support, operations, or client work, that “someone” is often you. It’s also fragile: one missed number, one misunderstood date, one background-noise moment, and you’re replying with the wrong details. Multiply that by a few conversations a day and it becomes a quiet tax on focus.

It adds up fast. Here’s where it breaks down in real life.

You end up replaying the same voice note two or three times just to catch names, dates, and the actual ask.
Chats stop being searchable, so “we already answered this” turns into more time spent hunting.
Hand-offs get messy because teammates can’t quickly skim audio while juggling other tickets.
Important requests sit idle because nobody wants to listen right now, which means slower responses and frustrated customers.

The Solution: Auto-Transcribe Telegram Voice Notes and Reply in-Thread

This n8n workflow listens for every new Telegram message your bot receives. When the message is plain text, the workflow passes it straight to the reply step unchanged. When the message is a voice note, the workflow fetches the audio file from Telegram, sends it to OpenAI Whisper (the whisper-1 speech-to-text model), and gets back a text transcript. It then posts that transcript into the same Telegram chat, so the conversation becomes readable and searchable immediately. No copy-paste. No extra apps. Just a clear, written version of what was said, exactly where your team already works.

The workflow starts with a Telegram message trigger. A routing step splits voice from text, then the voice branch downloads the file and transcribes it in OpenAI. Finally, one send-message step posts the final text back to Telegram.

What You Get: Automation vs. Results

What This Workflow Automates

Detects whether an incoming Telegram message is text or voice.
Downloads the Telegram voice file automatically when needed.
Sends audio to OpenAI Whisper and receives a transcript.
Replies back to the same Telegram chat with the readable text.

Results You’ll Get

Save about 10 minutes per voice note you no longer replay.
Make chat history searchable, which reduces repeated questions.
Speed up handoffs because teammates can skim instead of listening.
Reduce “I heard it differently” mistakes in client and support threads.
Build a clean base for follow-up automation (summaries, tasks, logging).

Example: What This Looks Like

Say your team receives 10 voice notes a day in Telegram from customers or clients. Manually, you might spend about 10 minutes per note to listen, re-listen, and type a clean reply, so that’s roughly 100 minutes daily. With this workflow, the “work” becomes sending the voice note as usual, then waiting about a minute for the transcription to appear in-thread. That’s about an hour and a half back each day, and the transcript is there for anyone to search later.
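
The arithmetic behind that estimate can be sketched in a few lines of Python. The figures are the illustrative ones from the example (10 notes a day, about 10 minutes each, about a minute of waiting per transcript), not measurements, and the function name is hypothetical.

```python
# Illustrative figures from the example above; adjust them to your own team.
NOTES_PER_DAY = 10
MANUAL_MINUTES_PER_NOTE = 10   # listen, re-listen, type a clean reply
WAIT_MINUTES_PER_NOTE = 1      # rough wait for the transcript to appear


def daily_minutes_saved(notes: int, manual: float, wait: float) -> float:
    """Return minutes saved per day: manual effort minus transcript wait time."""
    return notes * manual - notes * wait


manual_total = NOTES_PER_DAY * MANUAL_MINUTES_PER_NOTE
saved = daily_minutes_saved(NOTES_PER_DAY, MANUAL_MINUTES_PER_NOTE, WAIT_MINUTES_PER_NOTE)
print(f"Manual: {manual_total} min/day, saved: {saved} min/day")
```

With the example's numbers this comes to 100 manual minutes and 90 minutes saved, which is the "hour and a half" quoted above.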

What You’ll Need

n8n instance (try n8n Cloud free)
Self-hosting option if you prefer (Hostinger works well)
Telegram for receiving messages via a bot
OpenAI to transcribe voice notes with Whisper
Telegram Bot Token (get it from BotFather in Telegram)
OpenAI API Key (get it from your OpenAI dashboard)

Skill level: Beginner. You’ll mostly paste API keys, connect accounts, and test with a real voice note.

How It Works

A Telegram message arrives. The workflow triggers the moment your bot receives a new message in Telegram.

The workflow routes voice vs. text. A router checks the message type. Text goes straight through as-is, while voice notes take the audio path.

Voice notes get downloaded and transcribed. n8n fetches the voice file from Telegram, then sends it to OpenAI Whisper to convert speech into clean, readable text.

The transcript is posted back in the same chat. One send-message action replies with the final text, so the thread becomes easy to scan and search.
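
For readers who think in code, the four steps above can be sketched as a single Python function. This is only an illustration of the data flow, not how n8n runs the workflow internally; handle_update and the three callables passed into it are hypothetical stand-ins for the Fetch Voice File, Convert Audio to Text, and Dispatch Transcript Reply nodes.

```python
from typing import Callable, Optional


def handle_update(
    update: dict,
    fetch_voice: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    send: Callable[[int, str], None],
) -> Optional[str]:
    """Walk one Telegram update through the trigger, route, transcribe, reply steps."""
    message = update.get("message") or {}

    if "text" in message:                  # text goes straight through as-is
        reply = message["text"]
    elif "voice" in message:               # voice notes take the audio path
        audio = fetch_voice(message["voice"]["file_id"])
        reply = transcribe(audio)
    else:
        return None                        # neither branch applies

    send(message["chat"]["id"], reply)     # one send step serves both branches
    return reply


# Usage with stand-in callables, so no network access is needed.
sent = []
handle_update(
    {"message": {"chat": {"id": 42}, "voice": {"file_id": "abc"}}},
    fetch_voice=lambda file_id: b"fake-audio",
    transcribe=lambda audio: "hello from a voice note",
    send=lambda chat_id, text: sent.append((chat_id, text)),
)
print(sent)
```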

You can modify the reply format to include timestamps or a short summary based on your needs. Speaker labels need an extra step, because whisper-1 returns a plain transcript without speaker identification. See the full implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the Telegram Trigger

Set up the workflow to listen for incoming Telegram messages and route them into the logic flow.

Add the Telegram Message Hook node as your trigger.
Set Updates to message.
Credential Required: Connect your telegramApi credentials in Telegram Message Hook.
Confirm the node connects to Route Incoming Chat.

Tip: Make sure a user has started a conversation with your bot (for example, by sending /start); otherwise, no messages will reach Telegram Message Hook.

Step 2: Route Incoming Text vs Voice

Use a switch to split text and voice messages into their appropriate paths.

Open Route Incoming Chat and add two rules.
For the text branch, set the Left Value to ={{ $json.message.text }} and use the exists operator.
For the voice branch, set the Left Value to ={{ $json.message.voice }} and use the exists operator.
Ensure Route Incoming Chat connects to Dispatch Transcript Reply for text and to Fetch Voice File for voice messages.

⚠️ Common Pitfall: If a switch condition uses the wrong data path, voice messages will match the wrong rule or no rule at all, and transcription will be skipped.
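
To see why the data path matters, here is a minimal Python sketch of the two rules. It only approximates the exists operator (a value counts as existing when every key on the path is present and not null); the exists and route helpers are hypothetical and not part of n8n.

```python
def exists(item: dict, path: str) -> bool:
    """Return True if every key in a dotted path is present and not None."""
    current = item
    for key in path.split("."):
        if not isinstance(current, dict) or current.get(key) is None:
            return False
        current = current[key]
    return True


def route(item: dict) -> str:
    """Apply the two rules in order, like the Route Incoming Chat switch."""
    if exists(item, "message.text"):
        return "text"
    if exists(item, "message.voice"):
        return "voice"
    return "unmatched"


voice_item = {"message": {"chat": {"id": 42}, "voice": {"file_id": "abc", "duration": 3}}}
print(route(voice_item))             # voice
print(exists(voice_item, "voice"))   # False: the path skips the "message" level
```

The second print shows the pitfall: a path that drops the message level never matches, so the voice rule would not fire.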

Step 3: Set Up Audio Transcription

Download the voice file from Telegram and send it to the transcription node.

Open Fetch Voice File and set Resource to file.
Set File ID to ={{ $json.message.voice.file_id }}.
Credential Required: Connect your telegramApi credentials in Fetch Voice File.
Open Convert Audio to Text, set Resource to audio, and Operation to transcribe.
Credential Required: Connect your openAiApi credentials in Convert Audio to Text.
Confirm Fetch Voice File outputs to Convert Audio to Text, and Convert Audio to Text outputs to Dispatch Transcript Reply.
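
If you want to debug this step outside n8n, the sketch below shows roughly equivalent raw HTTP calls using only Python's standard library. It assumes the Telegram Bot API getFile method and the OpenAI audio transcriptions endpoint; the function names, the voice.ogg filename, and the audio/ogg content type are illustrative choices, and nothing here claims to mirror what the n8n nodes do internally.

```python
import json
import urllib.parse
import urllib.request
import uuid

TELEGRAM_API = "https://api.telegram.org"
OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL of the Bot API getFile call, which returns the file's file_path."""
    query = urllib.parse.urlencode({"file_id": file_id})
    return f"{TELEGRAM_API}/bot{bot_token}/getFile?{query}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL from which the file contents can be downloaded."""
    return f"{TELEGRAM_API}/file/bot{bot_token}/{file_path}"


def fetch_voice_file(bot_token: str, file_id: str) -> bytes:
    """Resolve a file_id to a file_path, then download the audio bytes."""
    with urllib.request.urlopen(get_file_url(bot_token, file_id)) as resp:
        file_path = json.load(resp)["result"]["file_path"]
    with urllib.request.urlopen(download_url(bot_token, file_path)) as resp:
        return resp.read()


def build_transcription_request(
    api_key: str, audio: bytes, filename: str = "voice.ogg"
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the model name and the audio."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        b"whisper-1\r\n",
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: audio/ogg\r\n\r\n",
        audio,
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        OPENAI_TRANSCRIPTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(api_key: str, audio: bytes) -> str:
    """Send the audio for transcription and return the "text" field of the reply."""
    with urllib.request.urlopen(build_transcription_request(api_key, audio)) as resp:
        return json.load(resp)["text"]
```

Calling transcribe(api_key, fetch_voice_file(bot_token, file_id)) with real credentials would chain the two steps. Both calls need network access and valid keys, so treat this as a debugging aid rather than a replacement for the nodes.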

Step 4: Configure the Reply to Telegram

Send a response back to the user with the text or transcript.

Open Dispatch Transcript Reply and set Text to ={{ $json.message.text }} {{ $json.text }}.
Set Chat ID to ={{ $('Telegram Message Hook').item.json.message.chat.id }}.
In Additional Fields, set Append Attribution to false.
Credential Required: Connect your telegramApi credentials in Dispatch Transcript Reply.

Tip: The reply merges both plain text and transcription results, so text messages still return immediately while voice messages include the transcript.
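
A small Python sketch of how that merged Text expression behaves on each branch. It assumes a missing value renders as an empty string, and it trims the leftover space, which the expression itself does not do; compose_reply is a hypothetical helper, not an n8n function.

```python
def compose_reply(item: dict) -> str:
    """Join the plain text (text branch) and the transcript (voice branch)."""
    plain_text = (item.get("message") or {}).get("text") or ""
    transcript = item.get("text") or ""
    # Same two slots as "{{ $json.message.text }} {{ $json.text }}".
    # Assumption: the empty slot renders as "", so strip() removes the stray space.
    return f"{plain_text} {transcript}".strip()


# Text branch: the item is the original Telegram update.
print(compose_reply({"message": {"text": "Invoice sent"}}))

# Voice branch: the item is the transcription result.
print(compose_reply({"text": "Call me back at noon"}))
```

Only one of the two slots is filled on any given run, which is why a single reply node can serve both branches.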

Step 5: Test and Activate Your Workflow

Validate that text and voice messages flow through the correct paths before enabling the automation.

Click Execute Workflow in n8n to start a manual test.
Send a text message and a voice message to your Telegram bot.
Confirm Route Incoming Chat routes text to Dispatch Transcript Reply and voice to Fetch Voice File then Convert Audio to Text.
Verify the bot replies with the original text and/or transcription in Telegram.
Toggle the workflow to Active to run it in production.

Common Gotchas

Telegram bot tokens can be revoked or misconfigured. If messages stop triggering, check the telegramApi credential used by the Telegram Message Hook node first.
If you extend this workflow with Wait nodes or external services, processing times will vary. Increase the wait duration if downstream nodes fail on empty responses.
OpenAI API access can fail due to billing limits or missing permissions. If transcription errors appear, confirm your OpenAI API key is active and your account has available quota.

Frequently Asked Questions

How long does it take to set up this Telegram Whisper replies automation?

About 20 minutes if you already have your Telegram bot token and OpenAI API key.

Do I need coding skills to automate Telegram Whisper replies?

No. You will connect Telegram and OpenAI, then test the voice-note branch.

Is n8n free to use for this Telegram Whisper replies workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs, which are usually small for short voice notes.

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I customize this Telegram Whisper replies workflow for saving transcripts to Google Sheets?

Yes, and it’s a common upgrade. After the transcription is created, add a Google Sheets “append row” action and write the transcript, sender name, and date into a new row. You can also include the Telegram chat ID so you can trace it back later.
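
As a rough sketch of the row you would append, the Python function below builds it from the original Telegram update and the transcript. The column names and build_sheet_row are hypothetical; the message fields used (from.first_name, date as a Unix timestamp, chat.id) come from the Telegram Bot API message object.

```python
from datetime import datetime, timezone


def build_sheet_row(trigger_item: dict, transcript: str) -> dict:
    """Map a Telegram update plus its transcript to one spreadsheet row."""
    message = trigger_item["message"]
    sender = message.get("from", {})
    # Telegram sends "date" as Unix time; convert it to an ISO 8601 UTC string.
    sent_at = datetime.fromtimestamp(message["date"], tz=timezone.utc)
    return {
        "transcript": transcript,
        "sender_name": sender.get("first_name", ""),
        "date": sent_at.isoformat(),
        "chat_id": message["chat"]["id"],   # lets you trace the row back later
    }


row = build_sheet_row(
    {"message": {"from": {"first_name": "Dana"}, "date": 0, "chat": {"id": 42}}},
    "Please move the call to Friday",
)
print(row)
```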

Why is my Telegram connection failing in this workflow?

Usually it’s the bot token. Regenerate or re-copy the token from BotFather, then update the Telegram credentials in n8n. Also confirm the bot is actually in the chat and can read messages: in group chats, the bot’s privacy mode (configured through BotFather) can keep ordinary messages from reaching it. If it fails only sometimes, Telegram API rate limits or temporary network issues can be the culprit.
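
One quick way to rule out a bad token is to call the Bot API getMe method directly, outside n8n. The sketch below uses only Python's standard library; get_me_url and token_is_valid are hypothetical helper names, and the check needs network access.

```python
import json
import urllib.error
import urllib.request


def get_me_url(bot_token: str) -> str:
    """URL of the Bot API getMe call, which returns basic info about the bot."""
    return f"https://api.telegram.org/bot{bot_token}/getMe"


def token_is_valid(bot_token: str) -> bool:
    """Return True if Telegram accepts the token, False if it rejects it."""
    try:
        with urllib.request.urlopen(get_me_url(bot_token), timeout=10) as resp:
            return bool(json.load(resp).get("ok"))
    except urllib.error.HTTPError:
        # Telegram answers with an HTTP error status for a rejected token.
        return False
```

If token_is_valid returns True but n8n still fails, the token is fine and the problem is more likely in the n8n credential entry or the chat settings. Network failures other than an HTTP error status are not caught here and will raise.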

How many voice notes can this Telegram Whisper replies automation handle?

It depends more on your n8n plan and your server than the workflow itself. On n8n Cloud, your monthly execution limit caps volume, so high-traffic support inboxes may need a higher plan. If you self-host, there’s no hard execution limit, but you’ll want enough CPU and memory to process bursts. Practically, most small teams can run this all day without thinking about it unless they’re receiving nonstop voice notes.

Is this Telegram Whisper replies automation better than using Zapier or Make?

Often, yes. Whisper transcription usually needs a few moving parts (download file, send to transcription, then post a clean reply), and n8n handles that kind of branching without feeling cramped. Self-hosting is also a big deal if you expect a lot of messages and don’t want every extra step to increase your bill. Zapier or Make can still work if you prefer their UI, but file handling gets fiddly fast. If you want help choosing, talk to an automation expert.

Once this is running, voice notes stop being a bottleneck and start being usable information. The workflow handles the repetitive listening and typing so you can focus on the response.
