Telegram + OpenAI: researched posts from voice notes
Creating “one quick post” shouldn’t turn into 12 tabs, three half-finished drafts, and a note to yourself to “add sources later.” But that’s what happens when research, ideation, and writing all live in different places.
Social media managers feel this every day. A marketing lead trying to keep a weekly cadence feels it too. And honestly, so does the founder who just wants to turn Telegram messages into OpenAI-drafted posts without learning a new tool. This workflow turns a voice note into researched drafts and image prompts in one pass.
Below you’ll see exactly how the automation runs, what you get out the other end, and where teams usually customize it to match their brand voice.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Telegram + OpenAI: researched posts from voice notes
flowchart LR
subgraph sg0["Receive Telegram Messages Flow"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Receive Telegram Messages"]
n1@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Voice or Text?", pos: "b", h: 48 }
n2["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/telegram.svg' width='40' height='40' /></div><br/>Fetch Voice Message"]
n3@{ icon: "mdi:robot", form: "rounded", label: "Transcribe Voice to Text", pos: "b", h: 48 }
n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Prepare for LLM", pos: "b", h: 48 }
n5@{ icon: "mdi:robot", form: "rounded", label: "AI Agent", pos: "b", h: 48 }
n6@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model", pos: "b", h: 48 }
n7@{ icon: "mdi:wrench", form: "rounded", label: "SerpAPI", pos: "b", h: 48 }
n8@{ icon: "mdi:robot", form: "rounded", label: "Structured Output Parser", pos: "b", h: 48 }
n9@{ icon: "mdi:cog", form: "rounded", label: "Extract from File", pos: "b", h: 48 }
n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Prepare Final Output", pos: "b", h: 48 }
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Generate Image"]
n7 -.-> n5
n5 --> n11
n11 --> n9
n1 --> n2
n1 --> n4
n4 --> n5
n9 --> n10
n6 -.-> n5
n2 --> n3
n8 -.-> n5
n3 --> n5
n0 --> n1
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n3,n5,n8 ai
class n6 aiModel
class n7 ai
class n1 decision
class n11 api
classDef customIcon fill:none,stroke:none
class n0,n2,n11 customIcon
The Problem: social post drafting still takes too long
You already know the pattern. You have a solid topic idea, but then you need “just a little research,” which becomes a rabbit hole. Next, you open a doc, start drafting, second-guess the hook, and realize you still need an image concept. Then it’s back to Google, back to your notes app, back to your prompt library. By the time you’ve got something decent, you’ve spent about an hour and your energy is gone. The worst part is the inconsistency: some posts are great, others feel rushed because the process is hard to repeat.
The friction compounds. Here’s where it usually breaks down.
- Research and writing happen in separate tools, so you keep copying snippets back and forth.
- Voice memos are fast to record, but turning them into usable drafts still becomes a manual chore.
- Image direction is an afterthought, which means visuals look generic or you skip them entirely.
- When you’re busy, quality drops because the process depends on your focus and time that day.
The Solution: turn Telegram messages into researched drafts
This workflow lets you message a Telegram bot with either text or a voice note, then it does the heavy lifting for you. If you send audio, it grabs the voice clip and transcribes it with OpenAI Whisper. Next, the workflow packages your request into a clean prompt and hands it to an AI agent that can do real online research using SerpAPI. Based on what it finds, the agent produces structured, ready-to-edit social post drafts plus a detailed image prompt you can use in your preferred art tool. If you enable the optional image step, it can even send a request out via HTTP to generate or fetch an image and attach it to the final output.
It starts with a simple Telegram message. AI handles transcription (if needed), research, and drafting. Finally, you receive a tidy result bundle you can paste into your scheduler, send for approval, or store for later.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
Example: What This Looks Like
Say you publish 5 posts a week. Manually, a “quick” research + draft pass is often about 1 hour per post, once you include reading sources, writing, and coming up with an image concept (so roughly 5 hours weekly). With this workflow, you send a 30-second voice note in Telegram, wait a few minutes for transcription, research, and drafting, and you’re done. Call it about 5 minutes of your time per post. That’s roughly 4 hours back every week, without lowering the bar.
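The arithmetic behind that estimate can be written out as a quick check. The inputs are the round numbers from the example above (5 posts, about 60 minutes manually, about 5 minutes with the workflow); they are illustrative, not measurements.

```python
# Illustrative figures taken from the example above, not measured data.
posts_per_week = 5
manual_minutes_per_post = 60     # "about 1 hour per post"
automated_minutes_per_post = 5   # "about 5 minutes of your time per post"

manual_total = posts_per_week * manual_minutes_per_post        # minutes per week
automated_total = posts_per_week * automated_minutes_per_post  # minutes per week

saved_minutes = manual_total - automated_total
saved_hours = saved_minutes / 60

print(f"Manual: {manual_total} min, automated: {automated_total} min")
print(f"Saved: {saved_minutes} min (about {saved_hours:.1f} hours) per week")
```

With these inputs the saving comes to 275 minutes, a bit over four hours a week. Swap in your own posting cadence and per-post times to see what it means for you.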
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram for capturing voice notes and text prompts.
- OpenAI to transcribe audio and generate drafts.
- SerpAPI API key (get it from your SerpAPI dashboard).
Skill level: Intermediate. You’ll connect credentials, paste API keys, and test a few runs in Telegram.
How It Works
Telegram message kicks it off. You send a text prompt or a voice note to your Telegram bot, and n8n instantly triggers the workflow.
Voice messages become text. If the incoming message is audio, the workflow fetches the voice clip from Telegram and sends it to OpenAI Whisper for transcription, so the rest of the automation can treat voice and text the same way.
Research + drafting happen together. Your request is packaged into a clean input, then the AI agent uses an OpenAI chat model plus a SerpAPI research tool to pull context and produce structured outputs (draft posts and an image prompt).
Final output is assembled for reuse. n8n formats everything into a consistent response. If you enable the image step, an HTTP request can generate or retrieve an image and pass the binary file into the final result.
You can easily modify the research sources and output format to match each platform. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
Start by setting up the incoming Telegram message trigger so the workflow can receive voice or text messages.
- Add and configure Telegram Intake Trigger as your entry node.
- Set Updates to include `message`.
- Credential Required: Connect your `telegramApi` credentials in Telegram Intake Trigger.
- Leave Additional Fields empty unless you need advanced Telegram options.
Step 2: Route Voice vs Text Inputs
Use a switch to separate audio messages from text messages before processing.
- Configure Route Voice vs Text with three outputs: Audio, Text, and Error.
- Set the Audio rule to check existence of `={{ $json.message.voice.file_id }}`.
- Set the Text rule to check existence of `={{ $json.message.text || "" }}`.
- Leave the Error output as a placeholder for future error handling.
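The routing rules above can be sketched outside n8n as a small function. This is a rough approximation, not the Switch node's actual implementation: the function name `route_update` is made up for illustration, and it assumes an empty text field counts as "not present".

```python
from typing import Any


def route_update(update: dict[str, Any]) -> str:
    """Return "audio", "text", or "error" for a Telegram update payload."""
    message = update.get("message") or {}
    voice = message.get("voice") or {}

    # Audio rule: a voice note carries message.voice.file_id.
    if voice.get("file_id"):
        return "audio"

    # Text rule: assumed here that a missing or empty text field does not match.
    if message.get("text"):
        return "text"

    # Anything else (stickers, photos, empty updates) goes to the error branch.
    return "error"
```

Checking the audio rule first mirrors the order of the outputs in the step above, so a voice note is never mistaken for an empty text message.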
Step 3: Process Voice and Text Inputs
Convert audio messages to text and normalize plain text input for the AI agent.
- On the Audio path, configure Retrieve Voice Clip with Resource set to `file` and File ID set to `={{ $json.message.voice.file_id }}`.
- Credential Required: Connect your `telegramApi` credentials in Retrieve Voice Clip.
- Connect Retrieve Voice Clip to Convert Audio to Text and set Resource to `audio` and Operation to `translate`.
- Credential Required: Connect your `openAiApi` credentials in Convert Audio to Text.
- On the Text path, configure Assemble LLM Input to assign text to `={{$json.message.text}}`.
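To see what the Retrieve Voice Clip step amounts to, here is a minimal sketch of the two Telegram Bot API URLs involved in fetching a voice file, plus the text-path normalization. The helper names and the placeholder token are hypothetical; the n8n node handles these calls for you, and the sketch only builds the URLs rather than making network requests.

```python
from urllib.parse import quote, urlencode

TELEGRAM_API = "https://api.telegram.org"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL of the Bot API getFile call, which returns a file_path for the clip."""
    query = urlencode({"file_id": file_id})
    return f"{TELEGRAM_API}/bot{bot_token}/getFile?{query}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL for downloading the audio once getFile has returned file_path."""
    return f"{TELEGRAM_API}/file/bot{bot_token}/{quote(file_path)}"


def normalize_text(message: dict) -> dict:
    """Text path: give the agent the same {"text": ...} shape as the audio path."""
    return {"text": message.get("text", "")}
```

Both paths end with a single `text` field, which is why the agent in the next step can read one expression regardless of how the message arrived.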
Step 4: Set Up the AI Agent and Tools
Configure the AI agent, its model, research tool, and structured output parser.
- Configure Content Strategy Agent with Text set to `={{$json.text}}` and Prompt Type set to `define`.
- Ensure Content Strategy Agent has Has Output Parser enabled.
- Connect OpenAI Chat Engine as the language model for Content Strategy Agent and set Model to `gpt-4o-mini`.
- Credential Required: Connect your `openAiApi` credentials in OpenAI Chat Engine.
- Connect Serp Research Tool as the tool for Content Strategy Agent.
- Credential Required: Connect your `serpApi` credentials in Serp Research Tool (this is a tool sub-node for Content Strategy Agent).
- Connect Structured JSON Parser as the output parser for Content Strategy Agent and keep the schema example:
{ "content": "[SOCIAL_MEDIA_CONTENT]", "image_prompt": "[IMAGE_PROMPT]" }
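If you want to sanity-check agent output against that schema outside n8n (for example, when debugging a failed run), a check along these lines may help. It is a sketch under the assumption that the agent returns a JSON object with two non-empty string fields; `parse_agent_output` is a hypothetical helper, not part of n8n.

```python
import json


def parse_agent_output(raw: str) -> dict[str, str]:
    """Parse the agent's JSON and confirm both expected fields are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    result: dict[str, str] = {}
    for key in ("content", "image_prompt"):
        value = data.get(key)
        # Reject missing, non-string, or blank values so bad drafts fail early.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        result[key] = value
    return result
```

For example, `parse_agent_output('{"content": "Post text", "image_prompt": "A desk at dawn"}')` returns both fields, while a payload without `image_prompt` raises `ValueError`.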
Step 5: Generate Image and Compose the Final Output
Send the image prompt to Hugging Face, extract the binary response, and assemble the final payload.
- Configure Create Image Request with URL set to `https://router.huggingface.co/hf-inference/models/stabilityai/stable-diffusion-3.5-large` and Method set to `POST`.
- Enable Send Body and set inputs to `={{ $json.output.image_prompt }}`.
- Credential Required: Connect your `huggingFaceApi` credentials in Create Image Request.
- If required by your endpoint, also connect `httpHeaderAuth` credentials in Create Image Request.
- Configure Extract Binary Content with Operation set to `binaryToPropery` to transform the image output.
- Configure Compose Final Result to assign content to `={{ $('Content Strategy Agent').item.json.output.content }}` and image to `={{ $json.data }}`.
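The image step boils down to one POST request and one repackaging step. The sketch below shows the rough shape of both without making a network call. Several details are assumptions: the Bearer authorization header, the idea that the extracted `data` property holds a base64 string, and the helper names themselves.

```python
import base64

HF_URL = (
    "https://router.huggingface.co/hf-inference/models/"
    "stabilityai/stable-diffusion-3.5-large"
)


def build_image_request(image_prompt: str, hf_token: str) -> dict:
    """Describe the POST that Create Image Request sends (no network call here)."""
    return {
        "method": "POST",
        "url": HF_URL,
        # Assumption: the endpoint accepts a Bearer token in this header.
        "headers": {"Authorization": f"Bearer {hf_token}"},
        "json": {"inputs": image_prompt},
    }


def compose_final_result(content: str, image_bytes: bytes) -> dict:
    """Combine the draft text with the image, encoded as a base64 string."""
    # Assumption: the binary image ends up as base64 text in the final payload.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"content": content, "image": encoded}
```

Keeping the image as a base64 string means the final payload is plain JSON, which is easier to pass to a scheduler or store than a raw binary attachment.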
Validate the model endpoint and authentication before production use.
Step 6: Test and Activate Your Workflow
Verify the end-to-end flow from Telegram input to final output and then activate the workflow.
- Click Execute Workflow and send a test message (text or voice) to your Telegram bot.
- Confirm Route Voice vs Text routes correctly and Convert Audio to Text creates transcribed text for voice inputs.
- Verify Content Strategy Agent returns JSON with `content` and `image_prompt`, and that Create Image Request returns image data.
- Check Compose Final Result for a combined payload containing the generated text and image data.
- Once validated, toggle the workflow to Active for production use.
Common Gotchas
- Telegram bot tokens can be revoked or you may be messaging the wrong bot. If intake suddenly stops, check the Telegram Trigger node settings and the bot token first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
About 30 minutes if your Telegram bot and API keys are ready.
No. You’ll mainly connect accounts and paste API keys into n8n. The “logic” is already built into the workflow.
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI and SerpAPI usage; for light drafting, many teams spend a few dollars a month.
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Yes, and it’s worth doing. Update the instructions you send into the Content Strategy Agent so it writes in a LinkedIn voice (tighter hook, fewer hashtags, clearer takeaway). Many teams also adjust the Structured JSON Parser output so it returns multiple variations: one short, one story-driven, and one “contrarian” angle. If you keep the structure consistent, approvals get easier because everyone knows what they’re reviewing.
Usually it’s a bad bot token or the bot was removed from the chat you’re testing in. Regenerate the token in BotFather, then update the Telegram credentials in n8n. If voice notes fail but text works, check the “Retrieve Voice Clip” node permissions and confirm the workflow is receiving audio files from your Telegram client.
On a small n8n Cloud plan, it comfortably handles dozens of requests a day for most teams.
It depends on how “smart” you need the middle part to be. This workflow uses an AI agent with research tooling and structured outputs, which is much easier to control in n8n when you want branching and richer logic. n8n also gives you a self-hosted path, so you’re not paying per tiny step once volume ramps up. Zapier or Make can still be fine for simple “message in, message out” flows, and they may feel simpler on day one. If you’re unsure, talk to an automation expert and we’ll map the cheapest option for your usage.
Set it up once, then create drafts the moment an idea hits. The workflow handles the repetitive parts so you can focus on publishing and improving what works.