WhatsApp + OpenAI: faster support replies with context
Your WhatsApp support inbox moves fast. Too fast. By the time you’ve listened to a voice note, opened a blurry screenshot, and asked “Can you repeat that?”, the customer is already annoyed.
This WhatsApp + OpenAI reply automation is aimed at support leads first, but ops managers and agency teams handling client DMs benefit too. You get faster replies that stay on-brand, and the assistant remembers the conversation, so customers don’t have to keep re-explaining themselves.
Below you’ll see how the workflow handles text, voice, images, and PDFs, how it keeps context, and what you need to run it reliably.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: WhatsApp + OpenAI: faster support replies with context
flowchart LR
subgraph sg0["Audio Transcription Flow"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/webhook.dark.svg' width='40' height='40' /></div><br/>Incoming WhatsApp Hook"]
n1@{ icon: "mdi:swap-vertical", form: "rounded", label: "Map Incoming Payload", pos: "b", h: 48 }
n2@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Validate Sender List", pos: "b", h: 48 }
n3@{ icon: "mdi:cog", form: "rounded", label: "Initial Short Delay", pos: "b", h: 48 }
n4@{ icon: "mdi:web", form: "rounded", label: "Flag Message Read", pos: "b", h: 48 }
n5@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Route by Content Type", pos: "b", h: 48 }
n6@{ icon: "mdi:swap-vertical", form: "rounded", label: "Capture Text Input", pos: "b", h: 48 }
n7@{ icon: "mdi:swap-vertical", form: "rounded", label: "Capture Audio Base64", pos: "b", h: 48 }
n8@{ icon: "mdi:cog", form: "rounded", label: "Convert Audio File", pos: "b", h: 48 }
n9@{ icon: "mdi:robot", form: "rounded", label: "Audio Transcription", pos: "b", h: 48 }
n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Wrap Audio Text", pos: "b", h: 48 }
n11@{ icon: "mdi:swap-vertical", form: "rounded", label: "Capture Image Base64", pos: "b", h: 48 }
n12@{ icon: "mdi:cog", form: "rounded", label: "Convert Image File", pos: "b", h: 48 }
n13@{ icon: "mdi:robot", form: "rounded", label: "Image Description", pos: "b", h: 48 }
n14@{ icon: "mdi:swap-vertical", form: "rounded", label: "Wrap Image Text", pos: "b", h: 48 }
n15@{ icon: "mdi:swap-vertical", form: "rounded", label: "Capture Document Base64", pos: "b", h: 48 }
n16@{ icon: "mdi:cog", form: "rounded", label: "Convert Document File", pos: "b", h: 48 }
n17@{ icon: "mdi:robot", form: "rounded", label: "Document Extraction", pos: "b", h: 48 }
n18@{ icon: "mdi:swap-vertical", form: "rounded", label: "Wrap Document Text", pos: "b", h: 48 }
n19@{ icon: "mdi:web", form: "rounded", label: "Notify Unsupported Type", pos: "b", h: 48 }
n20@{ icon: "mdi:cog", form: "rounded", label: "Stop Unsupported Flow", pos: "b", h: 48 }
n21@{ icon: "mdi:swap-vertical", form: "rounded", label: "Normalize Input Field", pos: "b", h: 48 }
n22["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/redis.svg' width='40' height='40' /></div><br/>Enqueue Message"]
n23@{ icon: "mdi:cog", form: "rounded", label: "Delay for More Input", pos: "b", h: 48 }
n24["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/redis.svg' width='40' height='40' /></div><br/>Fetch Queued Items"]
n25@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Check Latest Message", pos: "b", h: 48 }
n26@{ icon: "mdi:cog", form: "rounded", label: "Halt Nonfinal", pos: "b", h: 48 }
n27@{ icon: "mdi:swap-vertical", form: "rounded", label: "Merge Message List", pos: "b", h: 48 }
n28@{ icon: "mdi:robot", form: "rounded", label: "Assistant Agent", pos: "b", h: 48 }
n29@{ icon: "mdi:brain", form: "rounded", label: "Primary Chat Model", pos: "b", h: 48 }
n30@{ icon: "mdi:memory", form: "rounded", label: "Postgres Memory Store", pos: "b", h: 48 }
n31@{ icon: "mdi:robot", form: "rounded", label: "Format Reply Blocks", pos: "b", h: 48 }
n32@{ icon: "mdi:brain", form: "rounded", label: "Secondary Chat Model", pos: "b", h: 48 }
n33@{ icon: "mdi:robot", form: "rounded", label: "Structured JSON Parser", pos: "b", h: 48 }
n34@{ icon: "mdi:swap-vertical", form: "rounded", label: "Split Reply Items", pos: "b", h: 48 }
n35@{ icon: "mdi:swap-vertical", form: "rounded", label: "Iterate Reply Batch", pos: "b", h: 48 }
n36@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Detect Image Link", pos: "b", h: 48 }
n37@{ icon: "mdi:web", form: "rounded", label: "Dispatch Image", pos: "b", h: 48 }
n38@{ icon: "mdi:web", form: "rounded", label: "Dispatch Text", pos: "b", h: 48 }
n39@{ icon: "mdi:cog", form: "rounded", label: "Pause Between Sends", pos: "b", h: 48 }
n40@{ icon: "mdi:cog", form: "rounded", label: "Aggregate Results", pos: "b", h: 48 }
n41["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/redis.svg' width='40' height='40' /></div><br/>Purge Message Queue"]
n42@{ icon: "mdi:cog", form: "rounded", label: "Completion Marker", pos: "b", h: 48 }
n3 --> n4
n28 --> n31
n40 --> n41
n38 --> n39
n37 --> n39
n41 --> n42
n21 --> n22
n6 --> n21
n4 --> n5
n13 --> n14
n36 --> n37
n36 --> n38
n35 --> n40
n35 --> n36
n22 --> n23
n34 --> n35
n31 --> n34
n8 --> n9
n12 --> n13
n25 --> n27
n25 --> n26
n0 --> n1
n9 --> n10
n29 -.-> n28
n10 --> n21
n14 --> n21
n32 -.-> n31
n1 --> n2
n16 --> n17
n24 --> n25
n17 --> n18
n27 --> n28
n7 --> n8
n11 --> n12
n30 -.-> n28
n18 --> n21
n5 --> n6
n5 --> n7
n5 --> n11
n5 --> n15
n5 --> n19
n39 --> n35
n2 --> n3
n23 --> n24
n15 --> n16
n19 --> n20
n33 -.-> n31
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n9,n13,n17,n28,n31,n33 ai
class n29,n32 aiModel
class n30 ai
class n2,n5,n25,n36 decision
class n22,n24,n41 database
class n0,n4,n19,n37,n38 api
classDef customIcon fill:none,stroke:none
class n0,n22,n24,n41 customIcon
The Problem: WhatsApp Support Gets Messy Fast
WhatsApp support sounds simple until you’re doing it all day. Messages come in as paragraphs, half-finished thoughts, voice notes, and screenshots with tiny text. One customer sends three follow-ups before you even reply to the first. Then your teammate jumps in, doesn’t see the earlier context, and asks the same question again. It’s not just slow. It feels chaotic, and the back-and-forth burns time you should be using to actually solve issues.
None of this is “hard” work. It’s just constant. And that’s why it drags.
- Voice messages force someone to stop what they’re doing, put on headphones, and listen in real time.
- Images and PDFs add friction because the customer expects you to “just know” what’s inside them.
- Context disappears across shifts, which means customers repeat themselves and you look uncoordinated.
- Manual replies drift off-brand over time, especially when different people are trying to be “helpful” in different ways.
The Solution: Context-Aware WhatsApp Replies, Automatically
This workflow turns incoming WhatsApp messages into clean, consistent replies using OpenAI, while keeping conversation history so responses don’t reset every time. A customer message hits your n8n webhook through the Evolution API node. The workflow marks the message as read, figures out what type of content it is (text, audio, image, or document), then extracts the useful information: audio gets transcribed with Whisper, images get described with a vision-capable model, and PDFs/documents get parsed into text. That normalized input is queued briefly in Redis so multiple back-to-back messages get handled together, not as a messy trickle. Finally, an AI Agent generates the response using memory, formats the reply into one or more sendable blocks, and sends it back through WhatsApp with short pauses between messages so delivery looks natural.
The workflow starts with an incoming WhatsApp webhook and a quick validation step. It then converts whatever the customer sent into plain text, merges recent messages, and generates a contextual reply. Last, it dispatches text (and images if needed), then clears the queue so the next thread starts clean.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| Transcribing voice notes with Whisper | No more real-time listening; audio becomes readable text |
| Describing screenshots and extracting text from PDFs | “Weird format” messages get answered in seconds, not minutes |
| Batching rapid-fire messages and drafting replies with conversation memory | One coherent, on-brand answer per thread, and customers stop re-explaining |
Example: What This Looks Like
Say your inbox gets about 40 support messages a day, and around 15 of them are voice notes, screenshots, or PDFs. Manually, it’s easy to spend 5 minutes per “weird format” message between listening, opening files, and typing a careful reply, which is about 75 minutes daily. With this workflow, the customer still messages normally, but the assistant handles transcription, extraction, and drafting automatically. You typically spend a minute skimming and approving, so you get roughly an hour back on an average day.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Evolution API to receive/send WhatsApp messages
- OpenAI for GPT responses, Whisper, and vision
- Redis connection (get it from your Redis host dashboard)
Skill level: Intermediate. You should be comfortable self-hosting n8n and pasting API credentials into the right places.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A WhatsApp message triggers the workflow. Evolution API sends the incoming event to your n8n webhook, then the workflow maps the payload and checks the sender against an allowed list.
The message gets converted into usable text. A switch routes the content type: text is captured directly, audio is converted into a file and transcribed, images are analyzed into a description, and documents are extracted into readable text.
Context is built before the assistant replies. The normalized message is queued in Redis, the workflow waits briefly for more input (because people send “one thought” as three messages), then pulls the queued items and keeps only the latest batch for the response.
OpenAI generates and formats the reply. An AI Agent uses a chat model plus memory to draft a helpful response, then a formatting step splits it into sendable blocks and chooses text vs. image dispatch when needed.
You can easily modify the system prompt to match your brand voice and adjust how long the workflow waits and remembers context based on your support style. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Webhook Trigger
Start by setting up the incoming WhatsApp webhook that receives all messages for the assistant.
- Add the Incoming WhatsApp Hook node as the workflow trigger.
- Set Path to `whatsapp-multimodal`.
- Set HTTP Method to `POST`.
- Copy the production webhook URL and configure it in your WhatsApp provider.
Step 2: Map and Validate Incoming Messages
Normalize the inbound payload and restrict which senders can access the assistant.
- In Map Incoming Payload, keep the assignments that map key fields such as event, instance, remoteJid, messageId, pushName, messageText, and messageType using expressions like `{{ $json.body.data.messageType }}`.
- In Validate Sender List, replace the placeholder `[YOUR_EMAIL]` with the approved sender ID you want to allow.
- In Initial Short Delay, set Amount to `2` seconds to prevent immediate read confirmation.
- Configure Flag Message Read to use messageId as `{{ $('Map Incoming Payload').item.json.messageId }}`, remoteJid as `{{ $('Validate Sender List').item.json.remoteJid }}`, and instanceName as `{{ $('Map Incoming Payload').item.json.instance }}`.
Credential Required: Connect your Evolution API credentials for Flag Message Read.
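As a mental model, the mapping and allow-list check in this step can be sketched in plain Python. Note the nested field paths (`data.key.remoteJid`, `data.message.conversation`) are assumptions based on typical Evolution API webhook payloads and may differ on your instance, and the allow-list value is a placeholder:

```python
# Hypothetical sketch of "Map Incoming Payload" + "Validate Sender List".
# Field paths are assumptions; verify them against a real payload
# captured from your own Evolution API instance.

ALLOWED_SENDERS = {"5511999999999@s.whatsapp.net"}  # placeholder allow-list


def map_incoming_payload(body: dict) -> dict:
    """Flatten the nested webhook body into the fields later nodes read."""
    data = body.get("data", {})
    key = data.get("key", {})
    return {
        "event": body.get("event"),
        "instance": body.get("instance"),
        "remoteJid": key.get("remoteJid"),
        "messageId": key.get("id"),
        "pushName": data.get("pushName"),
        "messageType": data.get("messageType"),
        # Assumed location of plain-text content for "conversation" messages.
        "messageText": data.get("message", {}).get("conversation"),
    }


def sender_allowed(payload: dict) -> bool:
    """Mirror of the allow-list check in Validate Sender List."""
    return payload.get("remoteJid") in ALLOWED_SENDERS
```

Flattening once up front means every later node reads simple top-level fields instead of repeating deep path expressions.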
Step 3: Route and Capture Multimodal Inputs
Route messages by type and capture text, audio, image, and document payloads for downstream processing.
- In Route by Content Type, keep the rules that map `conversation`, `audioMessage`, `imageMessage`, and `documentMessage` based on `{{ $('Map Incoming Payload').item.json.messageType }}`.
- In Capture Text Input, set input to `{{ $('Map Incoming Payload').item.json.messageText }}`.
- In Capture Audio Base64, Capture Image Base64, and Capture Document Base64, set data to `{{ $('Incoming WhatsApp Hook').item.json.body.data.message.base64 }}`.
- In Notify Unsupported Type, keep messageText as `Sorry, I cannot process this type of message yet.` and route to Stop Unsupported Flow.
Credential Required: Connect your Evolution API credentials for Notify Unsupported Type.
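The switch logic boils down to a lookup on messageType. A minimal sketch, where the branch names are illustrative labels rather than node names:

```python
# Sketch of "Route by Content Type": the four supported messageType
# values each get a processing branch, and anything else falls through
# to the unsupported-type path.

ROUTES = {
    "conversation": "text",
    "audioMessage": "audio",
    "imageMessage": "image",
    "documentMessage": "document",
}


def route_message(message_type: str) -> str:
    """Return the processing branch for an Evolution API messageType."""
    return ROUTES.get(message_type, "unsupported")
```

The explicit fallback matters: without it, a sticker or location message would silently stall the workflow instead of triggering the polite "can't process this yet" reply.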
Step 4: Convert Media and Extract Text
Convert base64 media to files and extract usable text for the AI assistant.
- In Convert Audio File, set Operation to `toBinary` and Source Property to `data`.
- In Audio Transcription, set Resource to `audio` and Operation to `transcribe`, then output to Wrap Audio Text with `<audio>` tags.
- In Convert Image File, set Operation to `toBinary` and Source Property to `data`, then pass to Image Description with Text set to `Describe this image received via WhatsApp. Be as detailed as possible and transcribe any visible text.`
- In Convert Document File, set fileName to `{{ $('Incoming WhatsApp Hook').item.json.body.data.message.documentMessage.fileName }}` and mimeType to `{{ $('Incoming WhatsApp Hook').item.json.body.data.message.documentMessage.mimetype }}`, then pass to Document Extraction.
- In Wrap Image Text and Wrap Document Text, keep the XML-style tags like `<image>` and `<document>` wrapping the extracted text.
Credential Required: Connect your OpenAI credentials for Audio Transcription, Image Description, and Document Extraction.
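The wrapping steps are trivial but worth keeping: tagging each modality lets the assistant distinguish a voice-note transcript or an image description from text the customer literally typed. A sketch:

```python
# Sketch of the Wrap Audio/Image/Document Text steps: extracted text is
# wrapped in XML-style tags so the assistant's prompt makes the source
# of each piece of content explicit.

def wrap_extracted(kind: str, text: str) -> str:
    """kind is one of 'audio', 'image', or 'document'."""
    return f"<{kind}>{text}</{kind}>"
```

Without the tags, a transcribed "send me the refund form" and a typed "send me the refund form" look identical to the model, and it can't tailor its reply (for example, acknowledging that it listened to a voice note).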
Step 5: Queue Messages and Merge Context
Consolidate multiple rapid messages into a single prompt before sending to the assistant.
- In Normalize Input Field, keep input mapped to `{{ $json.input }}` for all inbound types.
- In Enqueue Message, set List to `{{ $('Map Incoming Payload').item.json.remoteJid }}` and Message Data to `{{ $json.input }}`.
- In Delay for More Input, set Amount to `10` seconds to capture multiple messages.
- In Fetch Queued Items, set Key to `{{ $('Map Incoming Payload').item.json.remoteJid }}` and keep Operation as `get`.
- In Check Latest Message, keep the comparison `{{ $json.propertyName.last() }}` equals `{{ $('Normalize Input Field').item.json.input }}` to determine the final message.
- In Merge Message List, set finalInput to `{{ $json.propertyName.join('\n') }}`.
Credential Required: Connect your Redis credentials for Enqueue Message, Fetch Queued Items, and Purge Message Queue.
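The batching pattern is the subtle part of this step: every run enqueues its own message, waits, then proceeds only if that message is still the newest item in the list. Here is an in-memory sketch of the same logic, with a plain dict standing in for Redis:

```python
# Debounce-and-merge sketch of Step 5. Each incoming message is pushed
# onto a per-sender list; after the wait, a run continues only if the
# message that triggered it is still the last item, so exactly one run
# answers a whole burst of messages.

from typing import Optional

queues = {}  # remoteJid -> list of message texts (stands in for Redis)


def enqueue(remote_jid: str, text: str) -> None:
    """Mirror of "Enqueue Message": push onto the sender's list."""
    queues.setdefault(remote_jid, []).append(text)


def merge_if_final(remote_jid: str, my_text: str) -> Optional[str]:
    """Mirror of "Check Latest Message" + "Merge Message List".

    Returns the merged prompt if my_text is still the newest queued
    message, or None to signal that this run should halt.
    """
    items = queues.get(remote_jid, [])
    if not items or items[-1] != my_text:
        return None  # a newer message arrived during the wait
    return "\n".join(items)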
Step 6: Configure the AI Assistant and Memory
Set up the AI agent, attach the language model, and connect memory for conversational context.
- In Assistant Agent, set Text to `User message: {{ $json.finalInput }} Current date: {{ $now }}` and keep the system message for a friendly assistant.
- Open Primary Chat Model and select the model `gpt-4o-mini`, then ensure it is connected as the language model for Assistant Agent.
- In Postgres Memory Store, set tableName to `whatsapp_chat_memory`, sessionKey to `{{ $('Map Incoming Payload').item.json.remoteJid }}`, and contextWindowLength to `20`.
Credential Required: Connect your OpenAI credentials for Primary Chat Model.
Postgres Memory Store is connected as the memory for Assistant Agent. Note that the Postgres credentials are added on the Postgres Memory Store sub-node itself, while the OpenAI credentials belong to the chat model attached to Assistant Agent.
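Conceptually, the memory store is a per-sender sliding window: turns are keyed by remoteJid, and only the most recent contextWindowLength entries are included in the prompt. A toy sketch, with a dict standing in for the Postgres table:

```python
# Sliding-window memory sketch for Step 6. The real node persists turns
# in Postgres; a dict keyed by session (remoteJid) illustrates what
# contextWindowLength = 20 means for the prompt the agent sees.

CONTEXT_WINDOW = 20  # matches contextWindowLength in the workflow

memory = {}  # session key -> list of turns


def remember(session_key: str, turn: str) -> None:
    """Append a conversation turn to the sender's history."""
    memory.setdefault(session_key, []).append(turn)


def context_for(session_key: str) -> list:
    """Only the newest CONTEXT_WINDOW turns reach the model."""
    return memory.get(session_key, [])[-CONTEXT_WINDOW:]
```

Keying by remoteJid is what keeps conversations separate: two customers messaging at the same time each get their own history, and a long-running thread quietly forgets anything older than the last 20 turns.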
Step 7: Format and Dispatch WhatsApp Replies
Split the AI response into WhatsApp-friendly chunks and send text or images accordingly.
- In Format Reply Blocks, keep Text set to `Process the following message: {{ $json.output }}` and enable hasOutputParser.
- Ensure Secondary Chat Model is connected as the language model for Format Reply Blocks and uses `gpt-4o-mini`.
- Verify Structured JSON Parser has autoFix set to `true` and jsonSchemaExample set to `{ "respuesta": [] }`.
- In Split Reply Items, set Field To Split Out to `output.respuesta`.
- In Detect Image Link, keep the three conditions that check whether `{{ $json['output.respuesta'] }}` ends with `.png`, `.jpg`, or `.jpeg`.
- In Dispatch Text, set messageText to `{{ $json['output.respuesta'] }}` and keep the delay expression `{{ $json['output.respuesta'].length * 60 }}`.
- In Dispatch Image, set media to `{{ $('Iterate Reply Batch').item.json['output.respuesta'] }}`.
- In Pause Between Sends, set Amount to `2` seconds before looping back to Iterate Reply Batch.
Structured JSON Parser is connected as the output parser for Format Reply Blocks—ensure any required credentials are added to Format Reply Blocks, not the parser sub-node.
Credential Required: Connect your Evolution API credentials for Dispatch Text and Dispatch Image.
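The dispatch logic above can be summarized in a few lines: split the parsed `respuesta` array into blocks, classify each block by file extension, and give text blocks a typing delay proportional to their length. A sketch (the case-insensitive extension check is a small robustness tweak over the workflow's literal ends-with conditions):

```python
# Sketch of the Step 7 dispatch logic: split the parser's "respuesta"
# array into blocks, treat blocks ending in an image extension as media,
# and give text blocks 60 ms of typing delay per character, matching
# the workflow's delay expression.

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def classify_block(block: str) -> str:
    """Decide whether a reply block is sent as an image or as text."""
    return "image" if block.lower().endswith(IMAGE_EXTENSIONS) else "text"


def typing_delay_ms(block: str) -> int:
    """Mirror of the {{ ...length * 60 }} delay expression."""
    return len(block) * 60


def dispatch_plan(parsed: dict) -> list:
    """Ordered (kind, content) pairs, one send per reply block."""
    return [(classify_block(b), b) for b in parsed.get("respuesta", [])]
```

The length-based delay is what makes multi-block replies feel human: a short "Sure!" lands almost instantly, while a longer paragraph arrives after a plausible typing pause.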
Step 8: Test and Activate Your Workflow
Validate the end-to-end flow and then enable it for production use.
- Click Test workflow and send a WhatsApp text, audio, image, and document message to the webhook.
- Confirm that Normalize Input Field passes content into Enqueue Message and that Merge Message List builds `finalInput`.
- Check that Assistant Agent produces an output and Format Reply Blocks returns a JSON array of `respuesta` items.
- Verify that messages are sent via Dispatch Text or Dispatch Image and that Purge Message Queue clears the Redis list.
- When satisfied, toggle the workflow to Active so the webhook runs continuously.
Common Gotchas
- Evolution API credentials can expire or your instance permissions can be off. If things break, check your Evolution API instance status and API key on the server first.
- If you’re using Wait nodes or external processing, timing varies. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
**How long does setup take?**
About 60–90 minutes once Evolution API and Redis are running.
**Do I need to know how to code?**
No. You’ll connect accounts, paste API keys, and edit a few prompts.
**Can I run this for free?**
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API usage (many teams spend a few dollars a week at moderate support volume).
**Should I use n8n Cloud or self-host?**
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
**Can I customize how the assistant behaves?**
Yes, and you should. Update the system prompt in the Assistant Agent to match your tone, your disclaimers, and what the bot should never do. You can also adjust the “Delay for More Input” wait time so the assistant responds faster or gathers more context first. If you want the bot to recognize VIP customers, add a lookup step (for example, Google Sheets or Monday.com) before the agent runs. Finally, change the memory retention settings (TTL in Redis and the chat memory node) to match how long your support conversations usually last.
**Why aren’t messages sending?**
Usually it’s an incorrect instance name or an expired API key in your Evolution API credentials. Check that your Evolution API server is reachable from your n8n host, then confirm the webhook URL configured in Evolution API matches your n8n webhook exactly. If it fails only during busy periods, you may be hitting rate limits or your server is underpowered. Also confirm your sender validation rules aren’t accidentally blocking real customers.
**How much volume can it handle?**
A typical self-hosted setup can handle hundreds of messages per day comfortably, and more if your server is sized well.
**Is n8n really better than Zapier or Make for this?**
Often, yes, because this flow needs branching, short waits, message batching, and memory across messages. Zapier and Make can do parts of it, but keeping context and handling multimodal inputs tends to get clunky and expensive fast. n8n also lets you self-host, which matters here because the Evolution API community node requires it. If your goal is just a simple auto-reply to one type of message, Zapier or Make may be fine. If you want a support-grade assistant, n8n is the safer bet, and you can talk to an automation expert if you want help choosing.
Support on WhatsApp doesn’t have to mean living in WhatsApp. Set this up once, and the repetitive interpreting, summarizing, and drafting fades into the background.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.