WhatsApp + Google Gemini, replies handled for you
Your WhatsApp inbox never “finishes.” The same questions repeat, voice notes pile up, and your team ends up copy-pasting answers that drift a little more off-brand every day.
This problem hits support leads first, but small business owners and ops managers feel it too. You want faster replies without hiring, and you want them to sound like you (not like a random bot).
This workflow turns incoming WhatsApp text and voice messages into consistent, company-aware responses using Google Gemini and Pinecone. You’ll see what it automates, what results you can expect, and what you need to run it reliably.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: WhatsApp + Google Gemini, replies handled for you
flowchart LR
subgraph sg0["WhatsApp Flow"]
direction LR
n0@{ icon: "mdi:robot", form: "rounded", label: "AI Agent", pos: "b", h: 48 }
n1@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
n2@{ icon: "mdi:memory", form: "rounded", label: "Simple Memory", pos: "b", h: 48 }
n3@{ icon: "mdi:wrench", form: "rounded", label: "Answer questions with a vect..", pos: "b", h: 48 }
n4@{ icon: "mdi:cube-outline", form: "rounded", label: "Pinecone Vector Store", pos: "b", h: 48 }
n5@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model1", pos: "b", h: 48 }
n6@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings Google Gemini", pos: "b", h: 48 }
n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>WhatsApp Trigger"]
n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Get Audio URL"]
n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Audio Download"]
n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Audio Prompt", pos: "b", h: 48 }
n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Send message"]
n12@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Audio/Message", pos: "b", h: 48 }
n13["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Audio Convert"]
n14["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Gemini speech to text"]
n0 --> n11
n10 --> n0
n13 --> n14
n12 --> n8
n12 --> n0
n8 --> n9
n2 -.-> n0
n9 --> n13
n7 --> n12
n14 --> n10
n4 -.-> n3
n6 -.-> n4
n1 -.-> n0
n5 -.-> n3
n3 -.-> n0
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n7 trigger
class n0 ai
class n1,n5 aiModel
class n3 ai
class n2 ai
class n4 ai
class n6 ai
class n12 decision
class n9,n14 api
class n13 code
classDef customIcon fill:none,stroke:none
class n7,n8,n9,n11,n13,n14 customIcon
The Problem: WhatsApp Support Becomes a Time Sink
WhatsApp is great for customers because it’s quick. For you, it’s a constant stream of interruptions. One person asks about pricing, another sends a 40-second voice note, and someone else wants the same shipping policy you answered yesterday. You can’t ignore it, but answering manually means switching context all day, hunting for the “right” response, and hoping a teammate doesn’t promise something that isn’t true. The workload grows quietly until it’s eating a few hours every week.
It’s not one big failure. It’s dozens of tiny ones that stack up.
- Voice notes force you to stop, listen, replay, and summarize before you even start replying.
- Answers vary by agent, so customers get different policies depending on who’s online.
- Searching old chats for “that one message” is slow, and frankly it’s easy to miss details.
- When you’re busy, replies slip, and WhatsApp starts feeling like a fire alarm instead of a channel.
The Solution: An AI WhatsApp Virtual Receptionist That Knows Your Business
This n8n workflow acts like a virtual receptionist inside WhatsApp. When a customer sends a message, it detects whether it’s text or a voice note. If it’s text, it goes straight to the AI agent. If it’s voice, the workflow securely fetches the audio from WhatsApp, converts it, and sends it to Google Gemini for transcription first. From there, the AI agent generates a clean, direct reply using your company knowledge stored in Pinecone (think product catalog, FAQs, policies, and internal “approved” wording). Finally, the response is sent back to the customer in the same WhatsApp thread, fast enough to feel like a real conversation.
The workflow starts with a WhatsApp incoming message trigger. In the middle, Gemini handles transcription and language understanding while Pinecone retrieves relevant company context. At the end, WhatsApp sends a polished reply that follows your communication rules (no unnecessary greetings, approved languages only, and a professional tone).
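To make "Pinecone retrieves relevant company context" a little more concrete, here is a toy sketch of what a vector lookup does in principle. It is not the Pinecone API and not the workflow's code: the three knowledge snippets, their 3-dimensional vectors, and the `top_k` helper are all invented for illustration. Real embeddings come from the Gemini embeddings model and have many more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, records, k=2):
    """Return the texts of the k records most similar to the query vector."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["text"] for r in ranked[:k]]

# Made-up knowledge base entries with made-up embeddings (assumption).
KNOWLEDGE = [
    {"text": "Shipping takes 3-5 business days.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Refunds are accepted within 30 days.", "vector": [0.1, 0.9, 0.0]},
    {"text": "We are open Monday to Friday.", "vector": [0.0, 0.1, 0.9]},
]

# Pretend this is the embedding of "How long does delivery take?"
query = [0.8, 0.2, 0.1]
print(top_k(query, KNOWLEDGE, k=1))
```

The agent receives the best-matching snippets as context, which is why the quality of what you store in the index matters more than the prompt wording.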
Example: What This Looks Like
Say your business gets about 40 WhatsApp questions a day, and roughly 10 are voice notes. Manually, you might spend about 4 minutes per text reply and closer to 8 minutes per voice note (listen, replay, then type). That is 30 × 4 + 10 × 8 = 200 minutes, or a little over 3 hours daily. With this workflow, the “work” is basically just receiving the message; Gemini transcribes voice notes in the background and the agent drafts the response using Pinecone context, so you’re mostly reviewing edge cases. For many teams, that means getting most of those 3 hours back each day while keeping response quality steady.
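If you want to plug in your own numbers, the estimate is simple arithmetic. The sketch below uses only the figures from the example (40 messages, 10 of them voice notes, 4 and 8 minutes each); the variable names are just for illustration.

```python
# Inputs taken from the example above; swap in your own volumes.
total_messages = 40
voice_notes = 10
text_messages = total_messages - voice_notes  # 30 text messages

minutes_per_text = 4
minutes_per_voice = 8

manual_minutes = text_messages * minutes_per_text + voice_notes * minutes_per_voice
manual_hours = manual_minutes / 60

print(f"Manual handling: {manual_minutes} min/day (~{manual_hours:.1f} h)")
# prints "Manual handling: 200 min/day (~3.3 h)"
```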
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- WhatsApp Business Cloud API to receive and send messages.
- Google Gemini for transcription and response generation.
- Pinecone to store and retrieve company knowledge.
- WhatsApp access token (get it from the Meta Developer Dashboard).
- Google Gemini API key (get it from Google AI Studio / Google Cloud).
- Pinecone API key (get it from your Pinecone console).
Skill level: Intermediate. You’ll connect credentials, set a webhook, and paste a few keys, but you won’t be writing an app from scratch.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A WhatsApp message comes in. The WhatsApp trigger listens for new inbound messages so the workflow can respond in real time, not hours later.
Text and voice take different paths. A routing step checks the message type. Text goes straight to the receptionist agent; voice messages first go through audio retrieval and transcription so they can be treated like normal text.
Gemini + Pinecone generate a “company-aware” answer. The AI agent uses a Gemini chat model to draft the response, but it also queries Pinecone for relevant product details, FAQs, and policies. A short-term memory buffer keeps the last 20 messages per session so follow-up questions still make sense.
The reply is sent back to WhatsApp. Once the agent finishes, the workflow dispatches the response using the WhatsApp Business Cloud node, keeping the conversation in one place.
You can easily modify the business rules (tone, allowed languages, how direct the replies are) to match your brand. See the full implementation guide below for customization options.
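The "last 20 messages per session" behaviour mentioned above is essentially a bounded buffer per conversation. Here is a minimal sketch of that idea, assuming an in-process store keyed by a session string; the `remember` and `history` helpers and the sample session key are hypothetical, and n8n's memory node handles this for you in the real workflow.

```python
from collections import defaultdict, deque

CONTEXT_WINDOW = 20  # matches the 20-message window described above

# Hypothetical store: one bounded deque per session key.
_sessions = defaultdict(lambda: deque(maxlen=CONTEXT_WINDOW))

def remember(session_key: str, role: str, text: str) -> None:
    """Append a message; the oldest one is dropped once the window is full."""
    _sessions[session_key].append((role, text))

def history(session_key: str) -> list:
    """Return the messages currently kept for this session, oldest first."""
    return list(_sessions[session_key])

# Simulate 25 inbound messages from one (made-up) sender.
for i in range(25):
    remember("15551234567", "user", f"message {i}")

print(len(history("15551234567")))  # 20
print(history("15551234567")[0])    # ('user', 'message 5')
```

Keeping the window small keeps prompts short and cheap, at the cost of the agent forgetting anything older than the window.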
Step-by-Step Implementation Guide
Step 1: Configure the WhatsApp Trigger
This workflow starts when a new WhatsApp message arrives and routes it by message type.
- Add and configure WhatsApp Incoming Trigger as the workflow trigger.
- Credential Required: Connect your `whatsAppTriggerApi` credentials in WhatsApp Incoming Trigger.
- Verify the trigger listens for updates set to `messages`.
- Connect WhatsApp Incoming Trigger to Route Message Type.
Step 2: Route Messages by Type
Use the switch node to detect whether the incoming message is audio or text and route accordingly.
- Open Route Message Type and keep the two rules named Audio and Text.
- For the Audio rule, ensure the condition checks `{{ $json.messages[0].audio }}` with the exists operator.
- For the Text rule, ensure the condition checks `{{ $json.messages[0].text }}` with the exists operator.
- Connect the Audio output to Retrieve Audio Link, and the Text output to Virtual Receptionist.
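The two "exists" rules above boil down to a key check on the first message. Here is a rough Python equivalent of that routing logic, under the assumption that the trigger item looks like the trimmed-down sample payloads shown (they are illustrative, not captured from a real webhook). The `route_message` name and the `"unrouted"` fallback are also made up for this sketch.

```python
def route_message(payload: dict) -> str:
    """Mimic the switch rules: 'audio' if messages[0].audio exists, 'text' if messages[0].text exists."""
    messages = payload.get("messages") or []
    if not messages:
        return "unrouted"
    first = messages[0]
    if "audio" in first:
        return "audio"
    if "text" in first:
        return "text"
    return "unrouted"  # e.g. images or stickers, which this workflow does not handle

# Illustrative sample items (assumed shape, heavily trimmed).
text_item = {"messages": [{"from": "15551234567", "text": {"body": "Do you ship abroad?"}}]}
audio_item = {"messages": [{"from": "15551234567", "audio": {"id": "MEDIA_ID"}}]}

print(route_message(text_item))   # text
print(route_message(audio_item))  # audio
```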
[…] `messages[0].audio` or `messages[0].text` in the payload.
Step 3: Set Up Audio Retrieval and Transcription
The audio branch fetches the media URL, downloads the file, converts it to base64, and sends it for transcription.
- In Retrieve Audio Link, set Resource to `media`, Operation to `mediaUrlGet`, and Media ID to `{{ $json.messages[0].audio.id }}`.
- Credential Required: Connect your `whatsAppApi` credentials in Retrieve Audio Link.
- In Download Audio File, set URL to `{{ $json.url }}` and Authentication to `genericCredentialType` with Generic Auth Type set to `httpHeaderAuth`.
- Credential Required: Connect your `httpHeaderAuth` credentials in Download Audio File.
- Keep the Convert Audio Base64 code as provided to output `base64Audio` and `mimeType` from the binary input.
- In Gemini Transcription Request, set Method to `POST` and JSON Body to the provided structure using `{{ $json.mimeType }}` and `{{ $json.base64Audio }}`.
- Connect the flow: Retrieve Audio Link → Download Audio File → Convert Audio Base64 → Gemini Transcription Request → Prepare Audio Prompt.
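The workflow ships with its own conversion code and JSON body, which are not reproduced here. As a sketch of what those two steps do, the snippet below base64-encodes some audio bytes and builds a request body in the general shape of a Gemini `generateContent` call with inline data. The helper names, the transcription instruction text, and the fake audio bytes are assumptions; check the body in your copy of the workflow and Google's documentation before relying on the exact field names.

```python
import base64

def to_base64_audio(binary: bytes, mime_type: str) -> dict:
    """Produce the two fields the guide mentions: base64Audio and mimeType."""
    return {
        "base64Audio": base64.b64encode(binary).decode("ascii"),
        "mimeType": mime_type,
    }

def build_transcription_body(item: dict, instruction: str = "Transcribe this audio message.") -> dict:
    """Assumed request shape: one text part with the instruction, one inline audio part."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": instruction},
                    {"inline_data": {"mime_type": item["mimeType"], "data": item["base64Audio"]}},
                ]
            }
        ]
    }

# Stand-in bytes; a real voice note would be the downloaded binary file.
item = to_base64_audio(b"OggS-fake-audio-bytes", "audio/ogg")
body = build_transcription_body(item)
print(body["contents"][0]["parts"][1]["inline_data"]["mime_type"])  # audio/ogg
```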
[…] data as expected by Convert Audio Base64.
Step 4: Prepare AI Input and Configure the Agent
Prepare the final prompt text and wire up the AI components for a knowledge-backed response.
- In Prepare Audio Prompt, ensure the assignment sets `candidates[0].content.parts[0].text` to `{{ $json.candidates[0].content.parts[0].text }}`.
- In Virtual Receptionist, set Text to `{{ $json.messages[0].text.body }} {{ $json.candidates[0].content.parts[0].text }}` so it can handle both text and transcribed audio.
- Keep the Virtual Receptionist System Message content as defined for consistent support behavior and tone.
- Gemini Chat Engine is connected as the language model for Virtual Receptionist. Credential Required: Connect your `googlePalmApi` credentials in Gemini Chat Engine.
- Session Memory Buffer is connected to Virtual Receptionist. Set Session Key to `{{ $('WhatsApp Incoming Trigger').item.json.contacts[0].wa_id }}` and Context Window Length to `20`.
- Vector Knowledge Lookup is connected as a tool for Virtual Receptionist. Keep the description text as provided.
- Pinecone Vector Index powers Vector Knowledge Lookup. Credential Required: Connect your `pineconeApi` credentials and ensure Pinecone Index is `superclean`.
- Gemini Flash Model is connected as the language model for Vector Knowledge Lookup. Credential Required: Connect your `googlePalmApi` credentials and set Model Name to `models/gemini-2.0-flash`.
- Gemini Embedding Builder is connected to Pinecone Vector Index. Credential Required: Connect your `googlePalmApi` credentials.
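The agent's Text field joins two values, only one of which exists on any given run: the text body (text path) or the Gemini transcript (audio path). The sketch below shows that idea in Python, with defensive lookups so a missing branch becomes an empty string. The `agent_input` helper and the sample items are hypothetical; in n8n the expression itself does this work.

```python
def agent_input(item: dict) -> str:
    """Combine messages[0].text.body and candidates[0].content.parts[0].text, whichever is present."""
    messages = item.get("messages") or [{}]
    text_body = (messages[0].get("text") or {}).get("body", "")

    transcript = ""
    candidates = item.get("candidates") or []
    if candidates:
        parts = (candidates[0].get("content") or {}).get("parts") or []
        if parts:
            transcript = parts[0].get("text", "")

    return f"{text_body} {transcript}".strip()

# Illustrative items for each path (assumed, trimmed shapes).
from_text = {"messages": [{"text": {"body": "What are your opening hours?"}}]}
from_audio = {"candidates": [{"content": {"parts": [{"text": "Can I change my delivery address?"}]}}]}

print(agent_input(from_text))   # What are your opening hours?
print(agent_input(from_audio))  # Can I change my delivery address?
```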
Step 5: Configure the WhatsApp Response
Send the AI-generated response back to the user over WhatsApp.
- Open Dispatch WhatsApp Reply and set Operation to `send`.
- Set Text Body to `{{ $json.output }}`.
- Set Phone Number ID to `[YOUR_ID]` and Recipient Phone Number to `{{ $('WhatsApp Incoming Trigger').item.json.messages[0].from }}`.
- Credential Required: Connect your `whatsAppApi` credentials in Dispatch WhatsApp Reply.
- Confirm the execution flow from Virtual Receptionist to Dispatch WhatsApp Reply is connected.
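Under the hood, the send step posts a text message to the WhatsApp Business Cloud API. The n8n node builds the request for you, so the sketch below is only meant to show which pieces of data end up where. It assumes the commonly documented text-message payload shape; `build_text_reply` and the sample values are made up for illustration, and no network call is made.

```python
def build_text_reply(trigger_item: dict, agent_item: dict) -> dict:
    """Build a text-message payload from the sender's number and the agent's output."""
    recipient = trigger_item["messages"][0]["from"]  # who wrote to us
    return {
        "messaging_product": "whatsapp",
        "to": recipient,
        "type": "text",
        "text": {"body": agent_item["output"]},
    }

# Illustrative inputs (assumed shapes).
trigger_item = {"messages": [{"from": "15551234567", "text": {"body": "Do you ship abroad?"}}]}
agent_item = {"output": "Yes, we ship to most countries."}

payload = build_text_reply(trigger_item, agent_item)
print(payload["to"], "<-", payload["text"]["body"])
```

The Phone Number ID is not part of this body; it identifies the sending number in the request URL, which is why the node asks for it separately.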
[…] `[YOUR_ID]` value is replaced with your actual WhatsApp Phone Number ID.
Step 6: Test & Activate Your Workflow
Run an end-to-end test to confirm both text and audio paths work, then activate the workflow.
- Click Execute Workflow and send a WhatsApp text message to trigger WhatsApp Incoming Trigger → Route Message Type → Virtual Receptionist → Dispatch WhatsApp Reply.
- Send a WhatsApp audio message to test the audio branch: Retrieve Audio Link → Download Audio File → Convert Audio Base64 → Gemini Transcription Request → Prepare Audio Prompt → Virtual Receptionist → Dispatch WhatsApp Reply.
- Confirm a successful run by checking that Dispatch WhatsApp Reply sends a response and that AI output appears in the node’s output.
- Once testing is successful, toggle the workflow Active to enable production use.
Common Gotchas
- WhatsApp Business Cloud credentials can expire or lose permissions when something changes on the Meta side, such as a token expiring or app permissions being updated. If replies stop sending, check your Meta Developer Dashboard token status and the n8n credential tied to the WhatsApp nodes first.
- If you’re using Wait nodes or external processing (like transcription), processing times vary. Bump up the wait duration if downstream nodes fail on empty responses, especially right after “Download Audio File” and the Gemini transcription request.
- Default prompts in AI nodes are generic. Add your brand voice and “approved phrasing” early inside the Virtual Receptionist agent rules, or you will be editing outputs forever.
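For the "empty responses" gotcha, the usual pattern is to retry with a growing delay rather than fail on the first blank result. Here is a minimal sketch of that pattern in plain Python. It is not an n8n feature and not part of the workflow: `call_until_nonempty` is a hypothetical helper, and the fake transcription call just replays canned values.

```python
import time

def call_until_nonempty(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty value, doubling the wait between tries."""
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        if attempt < attempts - 1:  # no point waiting after the final try
            sleep(base_delay * (2 ** attempt))
    return None

# Stand-in for the transcription call: empty twice, then a transcript.
replies = iter(["", "", "Hi, do you deliver on Sundays?"])
waits = []  # record the delays instead of actually sleeping
transcript = call_until_nonempty(lambda: next(replies), sleep=waits.append)

print(transcript)  # Hi, do you deliver on Sundays?
print(waits)       # [1.0, 2.0]
```

In n8n itself you would more likely reach for a node's retry settings or a Wait node; the sketch only shows the logic those settings implement.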
Frequently Asked Questions
How long does it take to set up?
About 45 minutes if you already have the API keys.
Do I need coding skills?
No. You’ll mostly connect accounts and paste credentials into n8n. The only “code-ish” part is already in the workflow for converting audio to Base64.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Google Gemini usage and Pinecone storage, which vary based on message volume.
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can the workflow reply in more than one language?
Yes, but keep it controlled. Update the allowed-language rules inside the Virtual Receptionist (AI Agent) node, then test real messages for each language. If you need stricter routing, add language detection and send different languages to different agent paths so your tone and policy wording stay consistent.
Why is my WhatsApp connection failing?
Usually it’s an expired access token or a permission issue on the Meta app that owns your WhatsApp Business account. Regenerate the token in the Meta Developer Dashboard, then update the credential used by the WhatsApp Incoming Trigger and Dispatch WhatsApp Reply nodes. If you recently changed phone numbers or webhooks, double-check the webhook URL and subscribed events. Rate limiting can show up too if you blast replies during testing.
How many messages can this workflow handle?
On self-hosted n8n, it depends on your server, but handling a few thousand messages a month is realistic for a small VPS if your AI usage is sized right. On n8n Cloud, your limit is based on your plan’s monthly executions. Voice notes take longer than text because transcription is an extra call, so plan capacity around your peak hours, not your average day.
Is n8n a better fit than Zapier or Make for this?
For this use case, n8n is usually the better fit because you need branching (text vs. voice), session memory, and a more “agent-like” flow with a vector database in the middle. Zapier and Make can do parts of it, but complex chat logic gets messy fast and can get expensive as volume grows. n8n also gives you the option to self-host, which matters when your WhatsApp traffic spikes. If you want the simplest possible setup and you only handle text, you might prefer Zapier. If you’re unsure, talk to an automation expert and get a straight recommendation.
You set this up once, and your WhatsApp inbox stops running your day. The workflow handles the repetitive questions, and you only jump in when it actually needs a human.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.