Telegram + HeyGen: voice notes into video links

Your best content ideas show up as voice notes. And then they die in chat because turning “quick audio” into “shareable video” means downloading files, transcribing, copying into tools, waiting on renders, and finally sending links back.

This problem hits marketing managers first, but agency leads and content ops folks feel it too. With this Telegram and HeyGen automation, the outcome is simple: you speak a brief in Telegram, and you get a HeyGen video link back in the same chat.

Below you’ll see how the workflow runs, what it automates end-to-end, and what to watch for when you turn it on in your own account.

How This Automation Works

Here’s the complete workflow you’ll be setting up:

n8n Workflow Template: Telegram + HeyGen: voice notes into video links

flowchart LR

    subgraph sg0["Message Flow"]
        direction LR
        n0@{ icon: "mdi:swap-horizontal", form: "rounded", label: "is Completed", pos: "b", h: 48 }
        n1["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Message Trigger"]
        n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Downloading File"]
        n3@{ icon: "mdi:robot", form: "rounded", label: "Transcribing voice memo", pos: "b", h: 48 }
        n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Setting ID Fields", pos: "b", h: 48 }
        n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Generating Video"]
        n6["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Video Status Update"]
        n7@{ icon: "mdi:cog", form: "rounded", label: "10s Buffer", pos: "b", h: 48 }
        n8@{ icon: "mdi:swap-vertical", form: "rounded", label: "Setting Output", pos: "b", h: 48 }
        n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Sending Video URL"]
        n7 --> n6
        n0 --> n8
        n0 --> n7
        n8 --> n9
        n1 --> n2
        n2 --> n3
        n5 --> n6
        n4 --> n5
        n6 --> n0
        n3 --> n4
    end

    %% Styling
    classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
    classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef disabled stroke-dasharray: 5 5,opacity: 0.5
    class n1 trigger
    class n3 ai
    class n0 decision
    class n5,n6 api
    classDef customIcon fill:none,stroke:none
    class n1,n2,n5,n6,n9 customIcon

Why This Matters: Turning voice briefs into UGC is painfully manual

Voice notes are the fastest way to capture direction. But the moment you try to convert them into a usable UGC-style video, the work multiplies. You download a Telegram file, rename it (because “voice_19384.ogg” tells you nothing), run a transcription, paste the text into another tool, pick settings, hit generate, then keep checking if it’s done. If you’re juggling multiple clients or campaigns, you’ll also lose track of which brief became which video link. Honestly, it’s not hard work. It’s just endless work.

The friction compounds. Here’s where it usually breaks down.

Someone forgets to send the final link back to the original Telegram thread, so approvals stall for hours.
Audio files get downloaded to personal laptops, which creates a messy trail and inconsistent naming.
Transcriptions drift from the original intent because people “clean them up” differently every time.
HeyGen renders finish at different times, and manual status-checking turns into constant tab switching.

What You’ll Build: Telegram voice notes that return a HeyGen video link

This workflow sits inside Telegram and acts like a quiet production assistant. When a new voice message arrives, n8n grabs the attached audio file and runs it through a transcription step (an OpenAI-based node is already in the template). That transcript becomes the script for HeyGen. n8n sends a video generation request to HeyGen’s API, then checks the render status, waiting a short buffer between checks (10 seconds in the template) until the video is ready. The workflow then formats the result and posts the final HeyGen URL back into the same Telegram chat where the brief started. No manual file downloads, no “where’s the link?”, no chasing renders.

The workflow starts with a Telegram message trigger. Then it moves through download → transcription → HeyGen generate → status polling. Finally, it sends a clean video link back to Telegram so the conversation can continue where it belongs.

What You’re Building

What Gets Automated

Detect a new Telegram voice note the moment it’s posted.
Download the audio file automatically for processing.
Transcribe speech to text and map it into a HeyGen request.
Poll HeyGen for completion and post the finished URL back to chat.

What You’ll Achieve

Turn a 1-minute brief into a shareable link in about 10–20 minutes (mostly render time).
Keep approvals in one thread instead of spread across DMs and tools.
Reduce “lost request” moments because every video replies to its source message.
Standardize how scripts get generated, which means fewer rewrites.
Ship more UGC variations without adding another coordinator.

Expected Results

Say you produce 5 UGC videos a week from Telegram briefs. Manually, you’ll spend maybe 10 minutes per video just downloading, uploading, transcribing, and sending links, plus another 5 minutes checking render status a few times. That’s around 75 minutes of pure admin weekly, not counting rework. With this workflow, your “work” is sending the voice note (about a minute) and waiting for the link to appear, because n8n handles the transcription and HeyGen status checks automatically.
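The estimate above can be checked with a few lines of arithmetic. The numbers are the same rough figures used in the paragraph, so adjust them to your own volume; the constant names are just labels chosen for this sketch.

```python
# Weekly admin time, using the rough figures from the paragraph above.
VIDEOS_PER_WEEK = 5
HANDLING_MIN = 10      # downloading, uploading, transcribing, sending links
STATUS_CHECK_MIN = 5   # checking render status a few times
VOICE_NOTE_MIN = 1     # recording the brief in the automated case

manual_minutes = VIDEOS_PER_WEEK * (HANDLING_MIN + STATUS_CHECK_MIN)
automated_minutes = VIDEOS_PER_WEEK * VOICE_NOTE_MIN

print(f"Manual: {manual_minutes} min/week")        # Manual: 75 min/week
print(f"Automated: {automated_minutes} min/week")  # Automated: 5 min/week
```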

Before You Start

n8n instance (n8n Cloud offers a free trial)
Self-hosting option if you prefer (a VPS such as Hostinger works well)
Telegram bot for receiving voice notes and replying
OpenAI API key for the transcription step
HeyGen account to generate the avatar video from text
HeyGen API key (get it from your HeyGen account settings)

Skill level: Beginner. You’ll connect accounts, paste an API key, and test the workflow once.

Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).

Step by Step

A Telegram voice note arrives. The workflow triggers on new inbound Telegram messages, so you don’t need forms, uploads, or separate requests.

The audio file gets pulled in and transcribed. n8n retrieves the attached file from Telegram, then runs it through the transcription node to produce clean text you can pass downstream.

HeyGen generates the video and n8n keeps checking on it. The workflow submits a create-video request via HTTP, then polls HeyGen’s status endpoint with a short wait in between so you don’t hammer the API or return too early.

The final link goes back to the same chat. Once the video is ready, n8n formats the output and replies in Telegram with the URL, which keeps review and approvals tidy.
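To make the last step concrete, here is a minimal sketch of the reply that goes back to the chat. It uses the Telegram Bot API's `sendMessage` method with `chat_id` and `text`; the function name, the message wording, and the sample values are illustrative and not taken from the template, where the n8n Telegram node does this for you.

```python
import json


def build_send_message(bot_token: str, chat_id: int, video_url: str) -> dict:
    """Return the URL and JSON body for a Bot API sendMessage call."""
    if not video_url.startswith("https://"):
        raise ValueError(f"Unexpected video URL: {video_url!r}")
    return {
        "url": f"https://api.telegram.org/bot{bot_token}/sendMessage",
        "body": json.dumps(
            {"chat_id": chat_id, "text": f"Your video is ready: {video_url}"}
        ),
    }
```

Using the `chat_id` from the original voice note is what keeps the link in the thread where the brief started.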

You can easily modify the HeyGen request to use a different avatar or template based on your needs. See the full implementation guide below for customization options.
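As a rough idea of what that request might carry, the sketch below builds a create-video call. The payload shape is modelled on HeyGen's v2 video generation API, but the URL, the `X-Api-Key` header, and the field names are assumptions here (the template's HTTP node settings are not shown in this article), so verify them against HeyGen's current API reference. The avatar and voice IDs are placeholders.

```python
import json

# Assumed endpoint; confirm it in HeyGen's API reference before relying on it.
HEYGEN_GENERATE_URL = "https://api.heygen.com/v2/video/generate"


def build_generate_request(script: str, avatar_id: str, voice_id: str, api_key: str) -> dict:
    """Return the method, URL, headers, and body an HTTP request step would need."""
    text = script.strip()
    if not text:
        raise ValueError("Transcript is empty; nothing to send to HeyGen.")
    body = {
        "video_inputs": [
            {
                "character": {"type": "avatar", "avatar_id": avatar_id},
                "voice": {"type": "text", "input_text": text, "voice_id": voice_id},
            }
        ],
        "dimension": {"width": 1280, "height": 720},
    }
    return {
        "method": "POST",
        "url": HEYGEN_GENERATE_URL,
        "headers": {"X-Api-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Swapping the avatar or voice then comes down to changing the two ID arguments, which is why those values are worth setting in one place upstream.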

Step-by-Step Implementation Guide

Step 1: Configure the Telegram Trigger

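This workflow begins when a Telegram message arrives, so you’ll first configure the incoming trigger. The node names in this guide are descriptive; in the diagram above the same nodes appear, in order, as Message Trigger, Downloading File, Transcribing voice memo, Setting ID Fields, Generating Video, Video Status Update, is Completed, 10s Buffer, Setting Output, and Sending Video URL.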

Add the Telegram Inbound Trigger node to your canvas.
Open Telegram Inbound Trigger and connect your Telegram bot credentials.
Verify that the node is ready to receive updates from your bot.

Credential Required: Connect your Telegram credentials.

Step 2: Connect Telegram File Retrieval

Once a message arrives, the workflow fetches the audio file from Telegram for transcription.

Connect Telegram Inbound Trigger to Retrieve Telegram File.
In Retrieve Telegram File, select your Telegram credentials.
Confirm the node is configured to pull the file data from the trigger output.

Credential Required: Connect your Telegram credentials.
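For context, here is roughly what file retrieval involves behind the scenes. Telegram's Bot API exposes a `getFile` method that resolves a `file_id` to a `file_path`, and that path is combined with the bot token to form a download URL. The n8n node handles this for you; the helper names and sample values below are illustrative.

```python
from urllib.parse import quote, urlencode

TELEGRAM_API = "https://api.telegram.org"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL for the Bot API getFile call, which resolves a file_id to a file_path."""
    return f"{TELEGRAM_API}/bot{bot_token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(bot_token: str, get_file_response: dict) -> str:
    """Build the download URL from a parsed getFile response."""
    if not get_file_response.get("ok"):
        raise RuntimeError(f"getFile failed: {get_file_response.get('description')}")
    file_path = get_file_response["result"]["file_path"]
    return f"{TELEGRAM_API}/file/bot{bot_token}/{quote(file_path)}"
```

Because the download URL embeds the bot token, treat it like a secret and avoid pasting it into chats or logs.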

Step 3: Set Up Transcription and Identifier Assignment

This step transcribes the audio note and tags the request with identifiers used for the video creation API.

Connect Retrieve Telegram File to Transcribe Audio Note.
Open Transcribe Audio Note and add your OpenAI credentials.
Connect Transcribe Audio Note to Assign Identifier Fields.
In Assign Identifier Fields, map any IDs or metadata you want passed into the API request.

Credential Required: Connect your OpenAI credentials.

⚠️ Common Pitfall: If the audio file is not passed correctly from Retrieve Telegram File, transcription will fail. Ensure the file output is accessible to Transcribe Audio Note.
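One way to picture the identifier step is as a function that gathers everything the later nodes need into a single flat record. This is a sketch under assumptions: the `text` key on the transcription output and the record's field names are hypothetical, while `chat.id` and `message_id` come from Telegram's standard message object.

```python
def assign_identifier_fields(transcription: dict, message: dict,
                             avatar_id: str, voice_id: str) -> dict:
    """Collect the script and the IDs the later steps depend on."""
    script = (transcription.get("text") or "").strip()
    if not script:
        # Mirrors the pitfall above: no audio in means no text out.
        raise ValueError("Transcription returned no text; check the file was passed through.")
    return {
        "script": script,
        "avatar_id": avatar_id,
        "voice_id": voice_id,
        "chat_id": message["chat"]["id"],           # where the link is sent back
        "source_message_id": message["message_id"],  # which brief this video belongs to
    }
```

Failing early on an empty transcript avoids spending a HeyGen render on a blank script.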

Step 4: Configure Video Creation, Status Checks, and Output

This section submits the video request, checks completion, waits if needed, and then sends the final link back to Telegram.

Connect Assign Identifier Fields to Create Video Request.
Configure Create Video Request with your API endpoint, method, and payload values.
Connect Create Video Request to Check Video Status and set the API call to poll the job status.
Connect Check Video Status to Completion Gate for conditional routing.
From Completion Gate, route successful results to Prepare Output Data and incomplete results to Delay 10 Seconds.
Connect Delay 10 Seconds back to Check Video Status to create a polling loop.
In Prepare Output Data, format the final message and video URL for Telegram delivery.
Connect Prepare Output Data to Send Video Link and configure the message fields.

Tip: Completion Gate outputs to both Prepare Output Data and Delay 10 Seconds based on the condition, enabling a loop until the video is ready.

Credential Required: Connect your Telegram credentials for Send Video Link.
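The check, gate, and delay nodes together form a polling loop, which can be sketched as below. The status values `completed` and `failed` and the `video_url` field are assumptions about the status response, and the cap on attempts is an addition of this sketch rather than something the template is described as having.

```python
import time
from typing import Callable


def wait_for_video(check_status: Callable[[], dict],
                   delay_seconds: float = 10.0,
                   max_attempts: int = 90,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the render completes, then return the video URL."""
    for attempt in range(1, max_attempts + 1):
        result = check_status()
        status = result.get("status")
        if status == "completed":
            return result["video_url"]
        if status == "failed":
            raise RuntimeError(f"Render failed: {result.get('error')}")
        if attempt < max_attempts:
            sleep(delay_seconds)  # the 10-second buffer between checks
    raise TimeoutError(f"Video not ready after {max_attempts} status checks")
```

A cap like this is worth considering in the real workflow too, since a loop with no exit condition other than success keeps running if a render never finishes.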

Step 5: Test and Activate Your Workflow

Run a manual test to validate the entire flow before enabling it for production use.

Click Execute Workflow and send a voice note to your Telegram bot to trigger Telegram Inbound Trigger.
Confirm that Transcribe Audio Note produces text and Create Video Request returns a job identifier.
Verify that Check Video Status loops through Delay 10 Seconds until completion.
Check that Send Video Link delivers the final URL to Telegram.
Toggle the workflow to Active to run it automatically for all incoming voice notes.

Troubleshooting Tips

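Telegram bot tokens don’t expire on their own, but a token stops working once it is revoked or regenerated in BotFather, and the bot still needs permission to see messages. If things break, check the bot token in n8n Credentials and confirm the bot can read messages in that chat (in group chats, privacy mode can hide ordinary messages from the bot).
HeyGen render times vary. The template waits 10 seconds between status checks; if downstream nodes fail on empty responses, increase the wait duration and confirm the completion check runs before the output step.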
HeyGen API calls can fail due to missing headers, an invalid key, or account-level limits. If you get errors, review the HTTP Request node’s response body first, then regenerate the HeyGen API key if needed.

Quick Answers

What’s the setup time for this Telegram HeyGen video automation?

About 30 minutes if your Telegram bot and HeyGen account are ready.

Is coding required for this Telegram HeyGen video automation?

No. You’ll mostly connect credentials and map a few fields between nodes.

Is n8n free to use for this Telegram HeyGen video workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in HeyGen API usage and transcription costs (for many teams, this is just a few dollars a month unless you’re processing lots of long audio).

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I modify this Telegram HeyGen video workflow for different use cases?

Yes, and it’s the main reason this template is useful. You can change the “Create Video Request” HTTP node to point at a different HeyGen template or avatar, and you can adjust what gets sent into the request in “Assign Identifier Fields.” Common tweaks include adding a fixed intro line to every transcript, routing different chats to different HeyGen avatars, or swapping transcription settings so noisy audio still produces clean scripts.
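Two of those tweaks, a fixed intro line and per-chat avatar routing, can be sketched in a few lines. Every chat ID, avatar ID, and the intro text below is a made-up placeholder; in n8n the same logic could live in the identifier step or a small Code node.

```python
DEFAULT_AVATAR = "AVATAR_DEFAULT"  # placeholder ID
AVATAR_BY_CHAT = {                 # hypothetical chat IDs mapped to avatar IDs
    -1001111111111: "AVATAR_CLIENT_A",
    -1002222222222: "AVATAR_CLIENT_B",
}
INTRO_LINE = "Hey, quick update from the team."


def customize(chat_id: int, transcript: str) -> dict:
    """Pick an avatar for the chat and prepend the fixed intro to the script."""
    avatar_id = AVATAR_BY_CHAT.get(chat_id, DEFAULT_AVATAR)
    script = f"{INTRO_LINE} {transcript.strip()}"
    return {"avatar_id": avatar_id, "script": script}
```

Falling back to a default avatar means a brief from an unmapped chat still produces a video instead of failing.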

Why is my Telegram connection failing in this workflow?

Most of the time it’s the bot token or chat permissions. Regenerate the Telegram bot token if needed, reselect the correct chat in the Telegram Trigger node, and confirm the bot can actually read messages there. Also check that the incoming message really contains a voice file, because a plain text message will not have a downloadable audio attachment.
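The voice-file check can be expressed as a small guard. In Telegram's Bot API a voice note arrives as a `voice` object on the message, with a `file_id` that the download step needs; the function name is an illustrative choice, and in n8n an IF node on the same field would do the job.

```python
from typing import Optional


def voice_file_id(update: dict) -> Optional[str]:
    """Return the voice note's file_id, or None if the update has no voice note."""
    message = update.get("message") or {}
    voice = message.get("voice")
    return voice["file_id"] if voice else None
```

Placing a guard like this right after the trigger lets plain text messages exit cleanly instead of failing at the download step.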

What volume can this Telegram HeyGen video workflow process?

On a typical n8n Cloud plan, you can handle hundreds to thousands of runs a month, and self-hosting mainly depends on your server. In practice, HeyGen rendering time becomes the bottleneck, so the workflow is fine for day-to-day team usage but you’ll want batching and logging if you’re generating lots of videos daily.

Is this Telegram HeyGen video automation better than using Zapier or Make?

Sometimes. If all you want is “Telegram message → call HeyGen → reply with link,” Zapier or Make can work, but polling render status and handling file downloads can get awkward fast. n8n is better when you need logic like “only run for voice notes,” retry behavior, or a wait/poll loop without paying extra for every tiny step. It also gives you a self-host option, which matters when volume grows. If you’re unsure, talk to an automation expert and we’ll point you to the simplest setup.

Once this is running, your team can stay in Telegram and still ship videos fast. The workflow handles the repetitive parts, so you can focus on the message and the creative.
