Telegram + Google Speech-to-Text, faster support

Voice notes in Telegram feel convenient right up until you have to act on them. You can’t search them, you can’t skim them, and forwarding “that one message from yesterday” turns into a time sink.

This hits support teams first, but operations leads and client-facing agency owners deal with it too. With Telegram voice transcription automation in n8n, every voice message becomes clean text, matched to the right member, then routed to the right flow.

Below you’ll see how the workflow works, what you get out of it, and what to watch for when you turn it on in a real support environment.

How This Automation Works

The full n8n workflow, from trigger to final output:

n8n Workflow Template: Telegram + Google Speech-to-Text, faster support

flowchart LR

    subgraph sg0["Telegram Flow"]
        direction LR
        n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Telegram Trigger"]
        n1@{ icon: "mdi:play-circle", form: "rounded", label: "Test Trigger", pos: "b", h: 48 }
        n2@{ icon: "mdi:swap-vertical", form: "rounded", label: "Test Input", pos: "b", h: 48 }
        n3@{ icon: "mdi:swap-vertical", form: "rounded", label: "Telegram Input", pos: "b", h: 48 }
        n4@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Type Switch", pos: "b", h: 48 }
        n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Download Audio"]
        n6@{ icon: "mdi:cog", form: "rounded", label: "Extract from File", pos: "b", h: 48 }
        n7@{ icon: "mdi:cog", form: "rounded", label: "Google STT", pos: "b", h: 48 }
        n8@{ icon: "mdi:swap-vertical", form: "rounded", label: "Telegram Voice Input", pos: "b", h: 48 }
        n9@{ icon: "mdi:swap-vertical", form: "rounded", label: "Input", pos: "b", h: 48 }
        n10@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If Telegram", pos: "b", h: 48 }
        n11@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If Active", pos: "b", h: 48 }
        n12["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Parse Service"]
        n13["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/redis.svg' width='40' height='40' /></div><br/>Member Cache"]
        n14@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If Member Cache", pos: "b", h: 48 }
        n15["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/postgres.svg' width='40' height='40' /></div><br/>Load Member Data"]
        n16["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/redis.svg' width='40' height='40' /></div><br/>Save Member Cache"]
        n17@{ icon: "mdi:swap-vertical", form: "rounded", label: "Member", pos: "b", h: 48 }
        n18@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Switch", pos: "b", h: 48 }
        n19@{ icon: "mdi:swap-vertical", form: "rounded", label: "English", pos: "b", h: 48 }
        n20@{ icon: "mdi:swap-vertical", form: "rounded", label: "yue-Hant-HK", pos: "b", h: 48 }
        n21@{ icon: "mdi:swap-vertical", form: "rounded", label: "cmn-Hant-TW", pos: "b", h: 48 }
        n22@{ icon: "mdi:swap-vertical", form: "rounded", label: "cmn-Hans-CN", pos: "b", h: 48 }
        n23@{ icon: "mdi:swap-vertical", form: "rounded", label: "ja-JP", pos: "b", h: 48 }
        n24@{ icon: "mdi:swap-horizontal", form: "rounded", label: "If Transcript", pos: "b", h: 48 }
        n25@{ icon: "mdi:swap-vertical", form: "rounded", label: "No Transcript Input", pos: "b", h: 48 }
        n26@{ icon: "mdi:cog", form: "rounded", label: "Demo Call Back", pos: "b", h: 48 }
        n27@{ icon: "mdi:cog", form: "rounded", label: "Demo Call Center", pos: "b", h: 48 }
        n28["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Telegram Test Output"]
        n9 --> n27
        n9 --> n10
        n23 --> n7
        n17 --> n4
        n18 --> n20
        n18 --> n22
        n18 --> n21
        n18 --> n23
        n18 --> n19
        n19 --> n7
        n11 --> n16
        n7 --> n24
        n2 --> n9
        n10 --> n28
        n4 --> n3
        n4 --> n5
        n21 --> n7
        n20 --> n7
        n13 --> n14
        n1 --> n2
        n24 --> n8
        n24 --> n25
        n12 --> n17
        n5 --> n6
        n3 --> n9
        n14 --> n12
        n14 --> n15
        n15 --> n11
        n0 --> n13
        n6 --> n18
        n16 --> n17
        n25 --> n26
        n8 --> n9
    end

    %% Styling
    classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
    classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef disabled stroke-dasharray: 5 5,opacity: 0.5
    class n0,n1 trigger
    class n4,n10,n11,n14,n18,n24 decision
    class n13,n15,n16 database
    class n12 code
    classDef customIcon fill:none,stroke:none
    class n0,n5,n12,n13,n15,n16,n28 customIcon

The Problem: Voice notes slow down support

Telegram voice messages create a weird kind of support debt. Someone has to stop what they’re doing, put on headphones (or find a quiet moment), listen to a rambling explanation, then retype the important parts into a ticket or internal chat. Multiply that by dozens of messages and you end up spending real time just translating “audio” into “something the team can work with.” And if you can’t quickly identify who the sender is (active member, past customer, wrong contact), you waste even more cycles asking basic questions you already have in a database.

The friction compounds. Here’s where things usually break.

Agents replay voice notes to catch details, which turns simple requests into a long back-and-forth.
Nothing is searchable, so “What did they say last week?” becomes manual detective work.
Requests arrive without member context, so routing is guesswork and handoffs get sloppy.
Audio attachments are easy to miss during busy hours, which means delayed replies and unhappy customers.

The Solution: Transcribe, enrich, and route every Telegram message

This workflow turns Telegram into a reliable “call-in” channel that your team can actually operate. It starts when your Telegram bot receives a message. The workflow looks up the sender in PostgreSQL (with Redis caching so you’re not hammering the database), checks if the member is active, then normalizes everything into one clean payload. If the message is voice, it fetches the audio file from Telegram, extracts the audio data, and sends it to Google Speech-to-Text for transcription. From there, the workflow routes the now-text request into the right downstream process using sub-workflows (a call center flow, a callback flow, or your own ticketing/CRM logic), and finally sends a Telegram reply back to the user.

The workflow begins at a Telegram trigger, then branches by content type (text vs voice). It enriches the request with member context and language settings, runs the correct sub-workflow, and responds in Telegram so the user knows you received it.

What You Get: Automation vs. Results

What This Workflow Automates

Detects Telegram text versus voice messages and handles both paths automatically.
Fetches voice note audio files from Telegram and extracts usable audio data.
Transcribes voice notes via Google Speech-to-Text with multi-language routing.
Looks up the sender in PostgreSQL and caches member context in Redis for faster future requests.

Results You’ll Get

Save about 5 minutes per voice note because nobody has to retype it.
Searchable, copy-pastable request text that your team can route and escalate cleanly.
Fewer “who is this?” messages since member status and profile are attached to the request.
More consistent handoffs because every request arrives in the same structured format.
Better coverage during peak hours since audio doesn’t get skipped or forgotten.

Example: What This Looks Like

Say your inbox gets 20 voice notes a day. Manually, if an agent spends roughly 5 minutes listening, replaying, and typing a summary, that’s about 100 minutes daily (1 hour 40 minutes) just to convert audio into text. With this workflow, the agent’s transcription “work” drops to basically zero: the trigger fires as soon as the message arrives, transcription runs in the background, and the request hits your call-center sub-workflow already labeled with member context. You still reply like a human, but you’re not doing the busywork first.
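To rerun that estimate with your own numbers, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the 20 notes and 5 minutes are the example figures from the paragraph above, not measurements.

```python
def daily_minutes_saved(voice_notes_per_day: int, minutes_per_note: float) -> float:
    """Minutes of manual listening and retyping avoided per day."""
    return voice_notes_per_day * minutes_per_note


# Example figures from the text: 20 voice notes a day, about 5 minutes each.
minutes = daily_minutes_saved(20, 5)
hours, remainder = divmod(minutes, 60)
print(f"{minutes:.0f} minutes per day, i.e. {hours:.0f} h {remainder:.0f} min")
```

Swap in your own daily volume and handling time to see whether the setup effort pays off for your team.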

What You’ll Need

n8n instance (try n8n Cloud free)
Self-hosting option if you prefer (Hostinger works well)
Telegram Bot to receive incoming messages.
Google Speech-to-Text to transcribe voice notes into text.
PostgreSQL database for member records and status checks.
Redis to cache member lookups and reduce database load.
Google Cloud API key (get it from Google Cloud Console credentials).

Skill level: Advanced. You will connect multiple services, install a community node, and map fields carefully to match your member database.

How It Works

Telegram receives the message. The workflow starts with a Telegram trigger. An optional chat test trigger sits alongside it so you can test without waiting for real users.

Member context gets attached early. The sender is checked in Redis first, then in PostgreSQL if needed, and the workflow validates things like “active member” before routing anything onward.

Voice notes become text. If the inbound content is audio, the workflow fetches the file from Telegram, extracts the audio, selects the right language path (English, Cantonese, Traditional/Simplified Chinese, Japanese), then runs Google Speech-to-Text and verifies a transcript is actually present.

The request is routed to your real support logic. A unified payload is passed into one of the execute sub-workflow nodes (the demo call center/callback flows), and a Telegram reply is sent back so the user gets a clear next step.

You can easily modify the member table and routing rules to match your own support operations. See the full implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the Telegram Trigger

Set up the inbound entry points for Telegram messages and the chat-based test trigger used for validation.

Add and open Telegram Inbound Trigger to receive voice and text messages.
Credential Required: Connect your Telegram Bot API credentials in Telegram Inbound Trigger.
Enable Chat Test Trigger if you want a built-in test webhook for local validation.
Connect Chat Test Trigger to Assign Test Payload to simulate message data.

Tip: Use Chat Test Trigger for quick tests without messaging your bot, and keep Telegram Inbound Trigger for production input.

Step 2: Normalize and Route Incoming Content

Standardize message payloads and send them down the correct path for voice or text processing.

Configure Assign Test Payload to output a structured object that mirrors Telegram payload fields.
In Map Telegram Payload, map required fields into a consistent format for downstream nodes.
Use Route by Content Type to branch voice messages to Fetch Audio File and non-voice content to Map Telegram Payload.
Confirm Map Telegram Payload outputs to Unified Input Hub for unified processing.

⚠️ Common Pitfall: If Route by Content Type conditions are not defined, all messages may go down the wrong path and skip voice transcription.
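If it helps to see the branching condition spelled out, the sketch below mirrors what Route by Content Type and Map Telegram Payload do conceptually. It assumes the standard Telegram Bot API update shape (`message.voice`, `message.text`, `message.chat.id`, `message.from.id`). The unified field names (`source`, `chat_id`, `sender_id`, `text`) are hypothetical and may not match the template's own.

```python
from typing import Any, Dict


def classify_message(update: Dict[str, Any]) -> str:
    """Return "voice" when the Telegram message carries a voice note, else "text"."""
    message = update.get("message", {})
    return "voice" if "voice" in message else "text"


def map_telegram_payload(update: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a text update into one unified shape (field names are hypothetical)."""
    message = update.get("message", {})
    return {
        "source": "telegram",
        "chat_id": message.get("chat", {}).get("id"),
        "sender_id": message.get("from", {}).get("id"),
        "text": message.get("text", ""),
    }


# Sample updates shaped like Telegram Bot API messages.
text_update = {"message": {"chat": {"id": 42}, "from": {"id": 7}, "text": "Where is my order?"}}
voice_update = {"message": {"chat": {"id": 42}, "from": {"id": 7},
                            "voice": {"file_id": "abc123", "duration": 4}}}

print(classify_message(text_update), classify_message(voice_update))
print(map_telegram_payload(text_update))
```

Checking for the presence of the `voice` object, rather than the absence of `text`, keeps photos, stickers, and other non-voice content out of the transcription path.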

Step 3: Configure Voice File Retrieval and Transcription

Fetch the audio file, extract binary data, select the language, and send it to the speech service.

Open Fetch Audio File and set it to download the Telegram voice file using the incoming file ID.
Credential Required: Connect your Telegram Bot API credentials in Fetch Audio File.
Ensure Extract Audio Data receives the binary from Fetch Audio File to prepare it for transcription.
Route extracted audio into Language Router and define branches for Set Cantonese Locale, Set Simplified CN, Set Traditional CN, Set Japanese Locale, and Set English Locale.
Connect each locale node to Speech-to-Text Service so the correct language code is used.
Credential Required: Connect your Google Cloud Speech credentials in Speech-to-Text Service.
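For readers who want to see what these nodes amount to, here is a rough sketch of the equivalent calls if you were talking to the APIs directly instead of through the n8n nodes. It builds the Telegram file download URL and a Speech-to-Text v1 `speech:recognize` request body without sending anything. The locale codes come from the workflow's locale nodes. The member-language keys, `en-US` for English, and the 48000 Hz sample rate are assumptions to adjust for your setup.

```python
import base64
from typing import Dict

# Locale codes from the workflow's locale nodes. The keys on the left are
# hypothetical member-language values, and "en-US" for English is an assumption.
LOCALES: Dict[str, str] = {
    "english": "en-US",
    "cantonese": "yue-Hant-HK",
    "traditional_cn": "cmn-Hant-TW",
    "simplified_cn": "cmn-Hans-CN",
    "japanese": "ja-JP",
}


def telegram_download_url(bot_token: str, file_path: str) -> str:
    """Build the Bot API download URL from the file_path that getFile returns."""
    return f"https://api.telegram.org/file/bot{bot_token}/{file_path}"


def build_stt_request(audio_bytes: bytes, member_language: str) -> dict:
    """Build a Speech-to-Text v1 speech:recognize request body."""
    # Unknown languages fall back to English in this sketch.
    language_code = LOCALES.get(member_language, LOCALES["english"])
    return {
        "config": {
            "encoding": "OGG_OPUS",    # Telegram voice notes are OGG/Opus audio
            "sampleRateHertz": 48000,  # assumption: change if your audio differs
            "languageCode": language_code,
        },
        # The REST API expects the audio bytes base64-encoded.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


body = build_stt_request(b"fake-ogg-bytes", "cantonese")
print(body["config"])
print(telegram_download_url("TOKEN", "voice/file_1.oga"))
```

Adding a language then comes down to one more entry in the mapping, which corresponds to adding one more branch and locale node in the Language Router.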

Step 4: Validate Transcript and Merge Input

Check transcription results and unify text from voice or message content for downstream processing.

In Transcript Present?, verify that the speech response contains a transcript.
Send successful transcriptions to Map Voice Transcript and map the transcript to a standard field.
Send missing transcripts to Handle Missing Transcript, then to Run Sub-Workflow (Config Required) for a fallback flow.
Route Map Voice Transcript into Unified Input Hub so voice and text inputs are processed uniformly.
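The check in Transcript Present? can be pictured as the sketch below. It assumes the response follows the Speech-to-Text v1 shape (`results[].alternatives[].transcript`). The community node may wrap or rename these fields, so inspect a real execution before relying on it. The function names are illustrative.

```python
from typing import Any, Dict, List


def extract_transcript(response: Dict[str, Any]) -> str:
    """Join the top alternative of each result; return "" when nothing was recognized."""
    parts: List[str] = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives and alternatives[0].get("transcript"):
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


def transcript_present(response: Dict[str, Any]) -> bool:
    """True when the speech response contains usable text."""
    return bool(extract_transcript(response))


# A response with two recognized segments, and an empty one (silence or wrong locale).
ok_response = {"results": [
    {"alternatives": [{"transcript": "I need help ", "confidence": 0.92}]},
    {"alternatives": [{"transcript": " with my booking"}]},
]}
empty_response: Dict[str, Any] = {}

print(extract_transcript(ok_response))
print(transcript_present(empty_response))
```

Treating an empty result as its own branch, rather than as an error, is what lets the fallback sub-workflow ask the user to resend or type the request.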

Step 5: Configure Member Cache and Profile Logic

Load or cache member profiles, then build a unified profile used for routing and processing.

Set up Redis Member Cache to query cached member data from Redis.
Credential Required: Connect your Redis credentials in Redis Member Cache and Store Member Cache.
In Cache Hit Check, route cache misses to Load Member Records and hits to Parse Service Mode.
Credential Required: Connect your Postgres credentials in Load Member Records.
Send Load Member Records to Validate Member Active, then to Store Member Cache to persist updates.
Build the unified profile in Build Member Profile and connect it to Route by Content Type to continue processing.
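The cache logic in this step is a cache-aside lookup. The sketch below shows the idea with a plain dict standing in for Redis and a callable standing in for the PostgreSQL query, so it runs without either service. The `member:<id>` key format and the `active` field are hypothetical. Match them to your own member table.

```python
import json
from typing import Callable, Dict, Optional


def load_member(
    telegram_id: int,
    cache: Dict[str, str],
    query_db: Callable[[int], Optional[dict]],
) -> Optional[dict]:
    """Cache-aside lookup: try the cache, fall back to the database, then store."""
    key = f"member:{telegram_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database entirely

    member = query_db(telegram_id)  # cache miss: load the member record
    if member is None or not member.get("active"):
        return None  # unknown or inactive members are not cached
    cache[key] = json.dumps(member)  # persist for future requests
    return member


# Stand-ins for Postgres and Redis so the sketch runs on its own.
db_calls = []


def fake_query(telegram_id: int) -> Optional[dict]:
    db_calls.append(telegram_id)
    return {"id": telegram_id, "active": True, "language": "japanese"}


cache: Dict[str, str] = {}
first = load_member(7, cache, fake_query)
second = load_member(7, cache, fake_query)
print(first == second, len(db_calls))
```

In a real deployment you would also set an expiry on the cached entry so that changes to a member's status eventually reach the workflow. How long that should be depends on how often your member data changes.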

Step 6: Set Up Parallel Routing and Telegram Reply

Split unified input to sub-workflows and Telegram reply logic, then configure the outgoing message.

Unified Input Hub outputs to both Run Sub-Workflow (Config Required) 2 and Check Telegram Source in parallel.
Configure Run Sub-Workflow (Config Required) 2 to call the appropriate workflow ID.
Use Check Telegram Source to determine when to send a response.
In Send Telegram Reply, set your message text and target chat fields using the incoming payload.
Credential Required: Connect your Telegram Bot API credentials in Send Telegram Reply.

Tip: If Run Sub-Workflow (Config Required) and Run Sub-Workflow (Config Required) 2 are not configured with valid workflow IDs, executions will appear to run but no downstream actions will occur.
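As an illustration of what Check Telegram Source and Send Telegram Reply do together, the sketch below builds the parameters for the Bot API `sendMessage` method (`chat_id` and `text`) only when the request came from Telegram. The `source` field and the reply wording are assumptions, not the template's actual values.

```python
from typing import Any, Dict, Optional


def build_reply(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return sendMessage parameters, or None when the request is not from Telegram."""
    if payload.get("source") != "telegram":  # hypothetical source marker
        return None
    return {
        "chat_id": payload["chat_id"],
        "text": "We received your message: " + payload["text"],
    }


telegram_payload = {"source": "telegram", "chat_id": 42, "text": "Where is my order?"}
test_payload = {"source": "chat_test", "text": "hello"}

print(build_reply(telegram_payload))
print(build_reply(test_payload))
```

Skipping the reply for non-Telegram sources keeps test runs from the chat trigger from failing on a missing chat ID.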

Step 7: Test & Activate Your Workflow

Validate the workflow with test data, then activate it for production use.

Click Execute Workflow and trigger Chat Test Trigger to confirm the end-to-end path runs through Assign Test Payload and Unified Input Hub.
Send a real Telegram voice message to the bot and confirm Fetch Audio File → Speech-to-Text Service → Map Voice Transcript executes successfully.
Verify that a successful run produces a unified input and sends a message from Send Telegram Reply.
Once validated, toggle the workflow to Active to enable production runs.

Common Gotchas

Telegram credentials can expire or the bot can lose access to the chat. If things break, check your Telegram bot token and the bot’s chat permissions first.
If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
Google Speech-to-Text can return empty transcripts when audio is too quiet or the language route is wrong. Double-check the language router settings and log the raw response before assuming the workflow “failed.”

Frequently Asked Questions

How long does it take to set up this Telegram voice transcription automation?

Plan on about 1-2 hours if you already have Telegram, Google, Postgres, and Redis ready.

Do I need coding skills to automate Telegram voice transcription?

No. You will mainly connect credentials, install the Google Speech community node, and map a few fields carefully.

Is n8n free to use for this Telegram voice transcription workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Google Speech-to-Text API usage costs, which depend on audio minutes and language.

Where can I host n8n to run this Telegram voice transcription automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I customize this Telegram voice transcription workflow for different languages and routing rules?

Yes, but expect a little fiddling. You can change the language behavior in the Language Router switch and its locale set nodes (English, Cantonese, Traditional/Simplified Chinese, Japanese). You can also replace the “Run Sub-Workflow (Config Required)” nodes with your own flows, like “Create Ticket,” “Assign Agent,” or “Send to Slack.” Many teams also swap the default member table (sys_member) by updating the PostgreSQL query used to load member records.

Why is my Telegram connection failing in this workflow?

Usually it’s an invalid bot token or the bot doesn’t have permission in the chat where messages are coming from. Regenerate the token in BotFather if needed, then reselect the credentials in the Telegram Trigger and Telegram reply nodes. Also confirm you’re testing in the same chat type you’ll use in production, because group and private chat setups behave differently.

How many messages can this Telegram voice transcription automation handle?

It depends on your n8n plan and your server, but most small teams can handle a few hundred messages a day without thinking too hard about it.

Is this Telegram voice transcription automation better than using Zapier or Make?

Often, yes. This workflow does more than “transcribe then send”: it branches on content type, checks member status, caches lookups in Redis, routes by language, and can hand off into multiple sub-workflows. In Zapier or Make you can build something similar, but you’ll usually end up with several separate scenarios/zaps, more paid operations, and less control over error handling. n8n is also easier to run in queue mode for production scaling if volume grows. If your needs are truly simple (one voice note to one transcription to one email), Zapier/Make can be quicker. Talk to an automation expert if you want help choosing.

Once voice notes become structured text with member context, support stops feeling chaotic. Set it up once, and your team gets those hours back every week.
