Telegram + HeyGen: voice notes into video links
Your best content ideas show up as voice notes. And then they die in chat because turning “quick audio” into “shareable video” means downloading files, transcribing, copying into tools, waiting on renders, and finally sending links back.
This pain hits marketing managers first, but agency leads and content ops folks feel it too. The outcome of this Telegram-to-HeyGen automation is simple: you speak a brief in Telegram, and you get a HeyGen video link back in the same chat.
Below you’ll see how the workflow runs, what it automates end-to-end, and what to watch for when you turn it on in your own account.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: Telegram + HeyGen: voice notes into video links
flowchart LR
subgraph sg0["Message Flow"]
direction LR
n0@{ icon: "mdi:swap-horizontal", form: "rounded", label: "is Completed", pos: "b", h: 48 }
n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Message Trigger"]
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Downloading File"]
n3@{ icon: "mdi:robot", form: "rounded", label: "Transcribing voice memo", pos: "b", h: 48 }
n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Setting ID Fields", pos: "b", h: 48 }
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Generating Video"]
n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Video Status Update"]
n7@{ icon: "mdi:cog", form: "rounded", label: "10s Buffer", pos: "b", h: 48 }
n8@{ icon: "mdi:swap-vertical", form: "rounded", label: "Setting Output", pos: "b", h: 48 }
n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Sending Video URL"]
n7 --> n6
n0 --> n8
n0 --> n7
n8 --> n9
n1 --> n2
n2 --> n3
n5 --> n6
n4 --> n5
n6 --> n0
n3 --> n4
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n1 trigger
class n3 ai
class n0 decision
class n5,n6 api
classDef customIcon fill:none,stroke:none
class n1,n2,n5,n6,n9 customIcon
Why This Matters: Turning voice briefs into UGC is painfully manual
Voice notes are the fastest way to capture direction. But the moment you try to convert them into a usable UGC-style (user-generated content) video, the work multiplies. You download a Telegram file, rename it (because “voice_19384.ogg” tells you nothing), run a transcription, paste the text into another tool, pick settings, hit generate, then keep checking if it’s done. If you’re juggling multiple clients or campaigns, you’ll also lose track of which brief became which video link. Honestly, it’s not hard work. It’s just endless work.
The friction compounds. Here’s where it usually breaks down.
- Someone forgets to send the final link back to the original Telegram thread, so approvals stall for hours.
- Audio files get downloaded to personal laptops, which creates a messy trail and inconsistent naming.
- Transcriptions drift from the original intent because people “clean them up” differently every time.
- HeyGen renders finish at different times, and manual status-checking turns into constant tab switching.
What You’ll Build: Telegram voice notes that return a HeyGen video link
This workflow sits inside Telegram and acts like a quiet production assistant. When a new voice message arrives, n8n grabs the attached audio file and runs it through a transcription step (an OpenAI-based node is already in the template). That transcript becomes the script for HeyGen. n8n sends a video generation request to HeyGen’s API, then checks the render status, waiting between checks (10 seconds in the template), until the video is ready. The workflow then formats the result and posts the final HeyGen URL back into the same Telegram chat where the brief started. No manual file downloads, no “where’s the link?”, no chasing renders.
The workflow starts with a Telegram message trigger. Then it moves through download → transcription → HeyGen generate → status polling. Finally, it sends a clean video link back to Telegram so the conversation can continue where it belongs.
Expected Results
Say you produce 5 UGC videos a week from Telegram briefs. Manually, you’ll spend maybe 10 minutes per video just downloading, uploading, transcribing, and sending links, plus another 5 minutes checking render status a few times. That’s around 75 minutes of pure admin weekly, not counting rework. With this workflow, your “work” is sending the voice note (about a minute) and waiting for the link to appear, because n8n handles the transcription and HeyGen status checks automatically.
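The estimate above is plain arithmetic, sketched here so you can plug in your own numbers. The function name and parameters are illustrative and not part of the template.

```python
def weekly_admin_minutes(videos_per_week: int,
                         handling_minutes: float,
                         status_check_minutes: float) -> float:
    """Manual admin time per week, in minutes."""
    return videos_per_week * (handling_minutes + status_check_minutes)


# The example from the text: 5 videos, 10 minutes of handling each,
# plus 5 minutes of status checking each.
print(weekly_admin_minutes(5, 10, 5))  # 75
```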
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram bot for receiving voice notes and replying
- OpenAI API key for the transcription node
- HeyGen account to generate the avatar video from text
- HeyGen API key (get it from your HeyGen account settings)
Skill level: Beginner. You’ll connect accounts, paste an API key, and test the workflow once.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
A Telegram voice note arrives. The workflow triggers on new inbound Telegram messages, so you don’t need forms, uploads, or separate requests.
The audio file gets pulled in and transcribed. n8n retrieves the attached file from Telegram, then runs it through the transcription node to produce clean text you can pass downstream.
HeyGen generates the video and n8n keeps checking on it. The workflow submits a create-video request via HTTP, then polls HeyGen’s status endpoint with a short wait in between so you don’t hammer the API or return too early.
The final link goes back to the same chat. Once the video is ready, n8n formats the output and replies in Telegram with the URL, which keeps review and approvals tidy.
You can easily modify the HeyGen request to use a different avatar or template based on your needs. See the full implementation guide below for customization options.
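To make "a different avatar" concrete, here is a rough sketch of building the request body in code. The template's actual payload is not shown in this article, so the endpoint, header name, and field names below are assumptions based on the general shape of HeyGen's v2 video generation API; check them against HeyGen's current API reference. The avatar and voice IDs are placeholders.

```python
import json

# Assumed endpoint; verify against HeyGen's API reference before use.
HEYGEN_GENERATE_URL = "https://api.heygen.com/v2/video/generate"


def build_headers(api_key: str) -> dict:
    """Headers for the request (assumed header name: X-Api-Key)."""
    return {"X-Api-Key": api_key, "Content-Type": "application/json"}


def build_video_request(script: str, avatar_id: str, voice_id: str,
                        width: int = 1280, height: int = 720) -> dict:
    """Build a JSON-serializable body for a text-to-avatar video request."""
    if not script.strip():
        raise ValueError("script is empty; nothing to render")
    return {
        "video_inputs": [
            {
                # Swap avatar_id here to use a different avatar.
                "character": {"type": "avatar", "avatar_id": avatar_id},
                "voice": {"type": "text", "input_text": script,
                          "voice_id": voice_id},
            }
        ],
        "dimension": {"width": width, "height": height},
    }


body = build_video_request("Hello from Telegram",
                           "AVATAR_ID_PLACEHOLDER", "VOICE_ID_PLACEHOLDER")
print(json.dumps(body, indent=2))
```

In n8n, the same structure would live in the HTTP Request node's JSON body, with the script and IDs filled in from earlier nodes.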
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
This workflow begins when a Telegram message arrives, so you’ll first configure the incoming trigger.
- Add the Telegram Inbound Trigger node to your canvas.
- Open Telegram Inbound Trigger and connect your Telegram bot credentials.
- Verify that the node is ready to receive updates from your bot.
Credential Required: Connect your Telegram credentials.
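Because the trigger fires on any inbound message, it can help to check that the message actually carries a voice note before going further. The sketch below assumes the trigger output follows the Telegram Bot API update shape (a `message` object with `chat.id` and an optional `voice.file_id`); the function name is hypothetical.

```python
from typing import Optional


def extract_voice(update: dict) -> Optional[dict]:
    """Return chat_id and file_id for a voice message, or None otherwise."""
    message = update.get("message") or {}
    voice = message.get("voice")
    chat = message.get("chat") or {}
    if not voice or "file_id" not in voice or "id" not in chat:
        return None  # plain text, photo, etc.: nothing to transcribe
    return {"chat_id": chat["id"], "file_id": voice["file_id"]}


voice_update = {"message": {"chat": {"id": 42}, "voice": {"file_id": "abc"}}}
text_update = {"message": {"chat": {"id": 42}, "text": "hi"}}

print(extract_voice(voice_update))  # {'chat_id': 42, 'file_id': 'abc'}
print(extract_voice(text_update))   # None
```

In n8n itself, the equivalent is usually an IF or Filter node placed right after the trigger.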
Step 2: Connect Telegram File Retrieval
Once a message arrives, the workflow fetches the audio file from Telegram for transcription.
- Connect Telegram Inbound Trigger to Retrieve Telegram File.
- In Retrieve Telegram File, select your Telegram credentials.
- Confirm the node is configured to pull the file data from the trigger output.
Credential Required: Connect your Telegram credentials.
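For context on what the file retrieval node is doing for you: the Telegram Bot API serves files in two steps. A `getFile` call returns a `file_path`, and the file is then downloaded from a separate `/file/` URL. The sketch below only builds the URLs and makes no network calls; the function names and the sample token are illustrative.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def get_file_url(token: str, file_id: str) -> str:
    """URL for step 1: ask Telegram where the file lives."""
    return f"{TELEGRAM_API}/bot{token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(token: str, get_file_response: dict) -> str:
    """URL for step 2: build the download link from the getFile response."""
    if not get_file_response.get("ok"):
        raise RuntimeError(
            f"getFile failed: {get_file_response.get('description')}")
    file_path = get_file_response["result"]["file_path"]
    return f"{TELEGRAM_API}/file/bot{token}/{file_path}"


# Made-up token and response, for illustration only.
token = "123:ABC"
response = {"ok": True, "result": {"file_path": "voice/file_0.oga"}}
print(get_file_url(token, "abc"))
print(download_url(token, response))
```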
Step 3: Set Up Transcription and Identifier Assignment
This step transcribes the audio note and tags the request with identifiers used for the video creation API.
- Connect Retrieve Telegram File to Transcribe Audio Note.
- Open Transcribe Audio Note and add your OpenAI credentials.
- Connect Transcribe Audio Note to Assign Identifier Fields.
- In Assign Identifier Fields, map any IDs or metadata you want passed into the API request.
Credential Required: Connect your OpenAI credentials.
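As a sketch of what the identifier step might assemble, the function below bundles the transcript with the chat ID and the IDs the video request needs. All field names, the chat-to-avatar mapping, and the placeholder IDs are hypothetical; the template's real field list is not shown in this article.

```python
DEFAULT_AVATAR = "AVATAR_ID_PLACEHOLDER"          # hypothetical placeholder
AVATAR_BY_CHAT = {1001: "AVATAR_FOR_CLIENT_A"}    # hypothetical per-chat routing


def assign_identifier_fields(chat_id: int, transcript: str,
                             voice_id: str = "VOICE_ID_PLACEHOLDER",
                             intro: str = "") -> dict:
    """Bundle the script and IDs that later steps will need."""
    script = " ".join(transcript.split())  # collapse stray whitespace
    if intro:
        script = f"{intro} {script}"       # optional fixed intro line
    return {
        "chat_id": chat_id,                # needed later to reply in the same chat
        "script": script,
        "avatar_id": AVATAR_BY_CHAT.get(chat_id, DEFAULT_AVATAR),
        "voice_id": voice_id,
    }


fields = assign_identifier_fields(1001, "  Launch   the spring\n sale ",
                                  intro="Hi team.")
print(fields["script"])     # Hi team. Launch the spring sale
print(fields["avatar_id"])  # AVATAR_FOR_CLIENT_A
```

Keeping the chat ID alongside the script is the important part: it is what lets the final message land in the chat where the brief started.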
Step 4: Configure Video Creation, Status Checks, and Output
This section submits the video request, checks completion, waits if needed, and then sends the final link back to Telegram.
- Connect Assign Identifier Fields to Create Video Request.
- Configure Create Video Request with your API endpoint, method, and payload values.
- Connect Create Video Request to Check Video Status and set the API call to poll the job status.
- Connect Check Video Status to Completion Gate for conditional routing.
- From Completion Gate, route successful results to Prepare Output Data and incomplete results to Delay 10 Seconds.
- Connect Delay 10 Seconds back to Check Video Status to create a polling loop.
- In Prepare Output Data, format the final message and video URL for Telegram delivery.
- Connect Prepare Output Data to Send Video Link and configure the message fields.
Credential Required: Connect your Telegram credentials for Send Video Link.
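The check, gate, and delay loop in this step can be sketched in a few lines of code. This is a minimal illustration and not the template's implementation: the status strings (`completed`, `failed`) and the `video_url` field are assumptions about what the status endpoint returns, and the attempt cap is an addition of this sketch so the loop cannot run forever.

```python
import time
from typing import Callable


def wait_for_video(check_status: Callable[[], dict],
                   delay_seconds: float = 10.0,
                   max_attempts: int = 60,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the render completes, then return the video URL."""
    for _ in range(max_attempts):
        status = check_status()                  # "Check Video Status"
        state = status.get("status")
        if state == "completed":                 # "Completion Gate": done
            return status["video_url"]
        if state == "failed":
            raise RuntimeError(f"render failed: {status.get('error')}")
        sleep(delay_seconds)                     # "Delay 10 Seconds", then loop
    raise TimeoutError(f"video not ready after {max_attempts} checks")


# Simulated status responses so the sketch runs without calling any API.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "video_url": "https://example.com/video.mp4"},
])
url = wait_for_video(lambda: next(responses), sleep=lambda seconds: None)
print(url)  # https://example.com/video.mp4
```

If you add a cap like this in n8n, route the timeout branch to a Telegram message so the sender knows the render did not finish.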
Step 5: Test and Activate Your Workflow
Run a manual test to validate the entire flow before enabling it for production use.
- Click Execute Workflow and send a voice note to your Telegram bot to trigger Telegram Inbound Trigger.
- Confirm that Transcribe Audio Note produces text and Create Video Request returns a job identifier.
- Verify that Check Video Status loops through Delay 10 Seconds until completion.
- Check that Send Video Link delivers the final URL to Telegram.
- Toggle the workflow to Active to run it automatically for all incoming voice notes.
Troubleshooting Tips
- Telegram bot tokens stop working once they are revoked or regenerated in BotFather, and a bot in a group chat only sees every message when privacy mode is off or the bot is an admin. If things break, check the bot token in n8n Credentials and confirm the bot can read messages in that chat.
- Render times vary from job to job. If downstream nodes fail on empty responses, increase the wait duration and confirm the completion check only passes finished jobs.
- HeyGen API calls can fail due to missing headers, an invalid key, or account-level limits. If you get errors, review the HTTP Request node’s response body first, then regenerate the HeyGen API key if needed.
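When reviewing a failed HTTP Request, the status code usually narrows things down before you read the body. The mapping below uses standard HTTP status semantics only; the exact codes and messages HeyGen returns are not confirmed here, and the function name is hypothetical.

```python
def diagnose_http_error(status_code: int) -> str:
    """Map an HTTP status code to a first troubleshooting step."""
    if status_code in (401, 403):
        return "Check the API key and the auth header on the HTTP Request node."
    if status_code == 429:
        return "Rate or account limit reached; slow down or check your plan."
    if 400 <= status_code < 500:
        return "Request rejected; inspect the response body for the field at fault."
    if status_code >= 500:
        return "Server-side error; retry after a short wait."
    return "No error indicated by the status code."


for code in (401, 429, 400, 503, 200):
    print(code, "->", diagnose_http_error(code))
```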
Quick Answers
How long does setup take?
About 30 minutes if your Telegram bot and HeyGen account are ready.
Do I need coding skills?
No. You’ll mostly connect credentials and map a few fields between nodes.
Is n8n free to use?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in HeyGen API usage and transcription costs (for many teams, this is just a few dollars a month unless you’re processing lots of long audio).
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this workflow?
Yes, and it’s the main reason this template is useful. You can change the “Create Video Request” HTTP node to point at a different HeyGen template or avatar, and you can adjust what gets sent into the request in “Assign Identifier Fields.” Common tweaks include adding a fixed intro line to every transcript, routing different chats to different HeyGen avatars, or swapping transcription settings so noisy audio still produces clean scripts.
Why is my Telegram connection failing?
Most of the time it’s the bot token or chat permissions. Regenerate the Telegram bot token if needed, reselect the correct chat in the Telegram Trigger node, and confirm the bot can actually read messages there. Also check that the incoming message really contains a voice file, because a plain text message will not have a downloadable audio attachment.
How many videos can this workflow handle?
On a typical n8n Cloud plan, you can handle hundreds to thousands of runs a month, and self-hosting mainly depends on your server. In practice, HeyGen rendering time becomes the bottleneck, so the workflow is fine for day-to-day team usage but you’ll want batching and logging if you’re generating lots of videos daily.
Is n8n better than Zapier or Make for this?
Sometimes. If all you want is “Telegram message → call HeyGen → reply with link,” Zapier or Make can work, but polling render status and handling file downloads can get awkward fast. n8n is better when you need logic like “only run for voice notes,” retry behavior, or a wait/poll loop without paying extra for every tiny step. It also gives you a self-host option, which matters when volume grows. If you’re unsure, talk to an automation expert and we’ll point you to the simplest setup.
Once this is running, your team can stay in Telegram and still ship videos fast. The workflow handles the repetitive parts, so you can focus on the message and the creative.