WhatsApp + Google Gemini, replies handled for you

Your WhatsApp inbox never “finishes.” The same questions repeat, voice notes pile up, and your team ends up copy-pasting answers that drift a little more off-brand every day.

This problem lands on support leads first, but small business owners and ops managers feel it too. You want faster replies without hiring, and you want them to sound like you (not like a random bot).

This workflow turns incoming WhatsApp text and voice messages into consistent, company-aware responses using Google Gemini and Pinecone. You’ll see what it automates, what results you can expect, and what you need to run it reliably.

How This Automation Works

The full n8n workflow, from trigger to final output:

n8n Workflow Template: WhatsApp + Google Gemini, replies handled for you


flowchart LR

    subgraph sg0["WhatsApp Flow"]
        direction LR
        n0@{ icon: "mdi:robot", form: "rounded", label: "AI Agent", pos: "b", h: 48 }
        n1@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model", pos: "b", h: 48 }
        n2@{ icon: "mdi:memory", form: "rounded", label: "Simple Memory", pos: "b", h: 48 }
        n3@{ icon: "mdi:wrench", form: "rounded", label: "Answer questions with a vect..", pos: "b", h: 48 }
        n4@{ icon: "mdi:cube-outline", form: "rounded", label: "Pinecone Vector Store", pos: "b", h: 48 }
        n5@{ icon: "mdi:brain", form: "rounded", label: "Google Gemini Chat Model1", pos: "b", h: 48 }
        n6@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings Google Gemini", pos: "b", h: 48 }
        n7["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>WhatsApp Trigger"]
        n8["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Get Audio URL"]
        n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Audio Download"]
        n10@{ icon: "mdi:swap-vertical", form: "rounded", label: "Audio Prompt", pos: "b", h: 48 }
        n11["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Send message"]
        n12@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Audio/Message", pos: "b", h: 48 }
        n13["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/code.svg' width='40' height='40' /></div><br/>Audio Convert"]
        n14["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Gemini speech to text"]
        n0 --> n11
        n10 --> n0
        n13 --> n14
        n12 --> n8
        n12 --> n0
        n8 --> n9
        n2 -.-> n0
        n9 --> n13
        n7 --> n12
        n14 --> n10
        n4 -.-> n3
        n6 -.-> n4
        n1 -.-> n0
        n5 -.-> n3
        n3 -.-> n0
    end

    %% Styling
    classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
    classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
    classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    classDef disabled stroke-dasharray: 5 5,opacity: 0.5
    class n7 trigger
    class n0 ai
    class n1,n5 aiModel
    class n3 ai
    class n2 ai
    class n4 ai
    class n6 ai
    class n12 decision
    class n9,n14 api
    class n13 code
    classDef customIcon fill:none,stroke:none
    class n7,n8,n9,n11,n13,n14 customIcon

The Problem: WhatsApp Support Becomes a Time Sink

WhatsApp is great for customers because it’s quick. For you, it’s a constant stream of interruptions. One person asks about pricing, another sends a 40-second voice note, and someone else wants the same shipping policy you answered yesterday. You can’t ignore it, but answering manually means switching context all day, hunting for the “right” response, and hoping a teammate doesn’t promise something that isn’t true. The workload grows quietly until it’s eating a few hours every week.

It’s not one big failure. It’s dozens of tiny ones that stack up.

Voice notes force you to stop, listen, replay, and summarize before you even start replying.
Answers vary by agent, so customers get different policies depending on who’s online.
Searching old chats for “that one message” is slow, and frankly it’s easy to miss details.
When you’re busy, replies slip, and WhatsApp starts feeling like a fire alarm instead of a channel.

The Solution: An AI WhatsApp Virtual Receptionist That Knows Your Business

This n8n workflow acts like a virtual receptionist inside WhatsApp. When a customer sends a message, it detects whether it’s text or a voice note. If it’s text, it goes straight to the AI agent. If it’s voice, the workflow securely fetches the audio from WhatsApp, converts it, and sends it to Google Gemini for transcription first. From there, the AI agent generates a clean, direct reply using your company knowledge stored in Pinecone (think product catalog, FAQs, policies, and internal “approved” wording). Finally, the response is sent back to the customer in the same WhatsApp thread, fast enough to feel like a real conversation.

The workflow starts with a WhatsApp incoming message trigger. In the middle, Gemini handles transcription and language understanding while Pinecone retrieves relevant company context. At the end, WhatsApp sends a polished reply that follows your communication rules (no unnecessary greetings, approved languages only, and a professional tone).

What You Get: Automation vs. Results

What This Workflow Automates

It detects WhatsApp messages and routes text vs. voice automatically.
It downloads voice notes and transcribes them with Google Gemini.
It pulls company-specific answers from Pinecone instead of guessing.
It sends the final response back via the WhatsApp Business Cloud API.

Results You’ll Get

Most teams get back about 5 hours a week from repetitive replies.
Customers receive consistent answers even when staff changes.
Voice notes stop slowing you down because they are transcribed to text automatically.
New hires ramp faster because the “right” answers are built into the system.
You can handle after-hours questions without leaving people on read.

Example: What This Looks Like

Say your business gets about 40 WhatsApp questions a day, and roughly 10 of them are voice notes. Manually, you might spend about 4 minutes on each of the 30 text replies and closer to 8 minutes per voice note (listen, replay, then type). That adds up to about 200 minutes, or roughly 3.3 hours daily. With this workflow, the “work” is basically just receiving the message: Gemini transcribes voice notes in the background and the agent writes and sends the response using Pinecone context, so you only step in on edge cases. For many teams, that means getting roughly 3 hours back each day while keeping response quality steady.
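To sanity-check the estimate, you can compute it directly. This small sketch uses only the example’s own assumptions: 30 text messages at about 4 minutes each and 10 voice notes at about 8 minutes each. Swap in your own volumes.

```python
# Daily manual reply time for the example scenario (all inputs are the example's estimates).
TEXT_MESSAGES = 30      # 40 questions/day minus 10 voice notes
VOICE_NOTES = 10
MINUTES_PER_TEXT = 4
MINUTES_PER_VOICE = 8   # listen, replay, then type


def daily_manual_minutes(texts: int, voices: int) -> int:
    """Minutes per day spent replying by hand."""
    return texts * MINUTES_PER_TEXT + voices * MINUTES_PER_VOICE


minutes = daily_manual_minutes(TEXT_MESSAGES, VOICE_NOTES)
print(minutes, round(minutes / 60, 1))  # 200 3.3
```

Replies that still need a human review eat into this number, so treat the result as an upper bound on time saved.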

What You’ll Need

n8n instance (try n8n Cloud free)
Self-hosting option if you prefer (Hostinger works well)
WhatsApp Business Cloud API to receive and send messages.
Google Gemini for transcription and response generation.
Pinecone to store and retrieve company knowledge.
WhatsApp access token (get it from the Meta Developer Dashboard).
Google Gemini API key (get it from Google AI Studio / Google Cloud).
Pinecone API key (get it from your Pinecone console).

Skill level: Intermediate. You’ll connect credentials, set a webhook, and paste a few keys, but you won’t be writing an app from scratch.

Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).

How It Works

A WhatsApp message comes in. The WhatsApp trigger listens for new inbound messages so the workflow can respond in real time, not hours later.

Text and voice take different paths. A routing step checks the message type. Text goes straight to the receptionist agent; voice messages first go through audio retrieval and transcription so they can be treated like normal text.

Gemini + Pinecone generate a “company-aware” answer. The AI agent uses a Gemini chat model to draft the response, but it also queries Pinecone for relevant product details, FAQs, and policies. A short-term memory buffer, keyed to each contact’s WhatsApp ID, keeps a sliding window of the last 20 interactions so follow-up questions still make sense.

The reply is sent back to WhatsApp. Once the agent finishes, the workflow dispatches the response using the WhatsApp Business Cloud node, keeping the conversation in one place.

You can easily modify the business rules (tone, allowed languages, how direct the replies are) to match your brand. See the full implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the WhatsApp Trigger

This workflow starts when a new WhatsApp message arrives and routes it by message type.

Add and configure WhatsApp Incoming Trigger as the workflow trigger.
Credential Required: Connect your whatsAppTriggerApi credentials in WhatsApp Incoming Trigger.
Verify the trigger is subscribed to the messages updates.
Connect WhatsApp Incoming Trigger to Route Message Type.

Step 2: Route Messages by Type

Use the switch node to detect whether the incoming message is audio or text and route accordingly.

Open Route Message Type and keep the two rules named Audio and Text.
For the Audio rule, ensure the condition checks {{ $json.messages[0].audio }} with the exists operator.
For the Text rule, ensure the condition checks {{ $json.messages[0].text }} with the exists operator.
Connect the Audio output to Retrieve Audio Link, and the Text output to Virtual Receptionist.

Tip: If messages are not routed correctly, confirm WhatsApp sends either messages[0].audio or messages[0].text in the payload.
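The two Switch rules boil down to a field-existence check on the first message. Below is a minimal Python sketch of the same decision. It assumes the trigger output contains the `messages` array referenced above. The function name is hypothetical and not part of n8n.

```python
from typing import Optional


def route_message_type(payload: dict) -> Optional[str]:
    """Return "audio", "text", or None, checking the same fields as the Audio and Text rules."""
    messages = payload.get("messages") or []
    if not messages:
        # Payloads without a messages array (for example, delivery/read status updates) match no rule.
        return None
    first = messages[0]
    if "audio" in first:   # Audio rule: messages[0].audio exists
        return "audio"
    if "text" in first:    # Text rule: messages[0].text exists
        return "text"
    return None            # images, stickers, locations, etc. fall through both rules


print(route_message_type({"messages": [{"text": {"body": "Hi"}}]}))  # text
print(route_message_type({"messages": [{"audio": {"id": "123"}}]}))  # audio
```

Anything that returns None here would leave the Switch node through neither output. That is usually what you want for message types the receptionist can’t handle.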

Step 3: Set Up Audio Retrieval and Transcription

The audio branch fetches the media URL, downloads the file, converts it to base64, and sends it for transcription.

In Retrieve Audio Link, set Resource to media, Operation to mediaUrlGet, and Media ID to {{ $json.messages[0].audio.id }}.
Credential Required: Connect your whatsAppApi credentials in Retrieve Audio Link.
In Download Audio File, set URL to {{ $json.url }} and Authentication to genericCredentialType with Generic Auth Type set to httpHeaderAuth.
Credential Required: Connect your httpHeaderAuth credentials in Download Audio File.
Keep the Convert Audio Base64 code as provided to output base64Audio and mimeType from the binary input.
In Gemini Transcription Request, set Method to POST and JSON Body to the provided structure using {{ $json.mimeType }} and {{ $json.base64Audio }}.
Connect the flow: Retrieve Audio Link → Download Audio File → Convert Audio Base64 → Gemini Transcription Request → Prepare Audio Prompt.

⚠️ Common Pitfall: If transcription fails, make sure the downloaded media is stored in binary field data as expected by Convert Audio Base64.
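The Convert Audio Base64 code and the Gemini Transcription Request body are where the audio branch changes the data’s shape. Below is a minimal Python sketch of both steps. It assumes the request follows the `inline_data` layout of Gemini’s generateContent REST API. The prompt string is a placeholder and the function names are hypothetical, so compare the output against the template’s own JSON Body before relying on it.

```python
import base64

# Placeholder prompt: the template's actual transcription instruction is not shown in this guide.
TRANSCRIBE_PROMPT = "Transcribe this audio message."


def convert_audio_base64(audio_bytes: bytes, mime_type: str) -> dict:
    """Produce the two fields Convert Audio Base64 emits: base64Audio and mimeType."""
    return {
        "base64Audio": base64.b64encode(audio_bytes).decode("ascii"),
        "mimeType": mime_type,
    }


def build_transcription_body(item: dict) -> dict:
    """JSON body shaped like a Gemini generateContent request carrying inline audio."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": TRANSCRIBE_PROMPT},
                    {
                        "inline_data": {
                            "mime_type": item["mimeType"],   # {{ $json.mimeType }}
                            "data": item["base64Audio"],     # {{ $json.base64Audio }}
                        }
                    },
                ]
            }
        ]
    }


item = convert_audio_base64(b"OggS fake audio bytes", "audio/ogg")
body = build_transcription_body(item)
print(body["contents"][0]["parts"][1]["inline_data"]["mime_type"])  # audio/ogg
```

In the real workflow, the bytes come from the binary `data` field that Download Audio File produces. Here, a short byte string stands in for a voice note.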

Step 4: Prepare AI Input and Configure the Agent

Prepare the final prompt text and wire up the AI components for a knowledge-backed response.

In Prepare Audio Prompt, ensure the assignment sets candidates[0].content.parts[0].text to {{ $json.candidates[0].content.parts[0].text }}.
In Virtual Receptionist, set Text to {{ $json.messages[0].text.body }} {{ $json.candidates[0].content.parts[0].text }} so it can handle both text and transcribed audio.
Keep the Virtual Receptionist System Message content as defined for consistent support behavior and tone.
Gemini Chat Engine is connected as the language model for Virtual Receptionist — Credential Required: Connect your googlePalmApi credentials in Gemini Chat Engine.
Session Memory Buffer is connected to Virtual Receptionist — set Session Key to {{ $('WhatsApp Incoming Trigger').item.json.contacts[0].wa_id }} and Context Window Length to 20.
Vector Knowledge Lookup is connected as a tool for Virtual Receptionist — keep the description text as provided.
Pinecone Vector Index powers Vector Knowledge Lookup. Credential Required: Connect your pineconeApi credentials and set Pinecone Index to the index that holds your company knowledge (the template uses superclean).
Gemini Flash Model is connected as the language model for Vector Knowledge Lookup — Credential Required: Connect your googlePalmApi credentials and set Model Name to models/gemini-2.0-flash.
Gemini Embedding Builder is connected to Pinecone Vector Index — Credential Required: Connect your googlePalmApi credentials.
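The Virtual Receptionist Text field joins the text body and the transcript. On any given run, only one of the two is present. Below is a rough Python equivalent of that expression. It assumes a missing field contributes an empty string. The helper names are hypothetical.

```python
def dig(obj, *path):
    """Follow a path of keys/indexes; return "" if any step is missing or not a string at the end."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return ""
    return obj if isinstance(obj, str) else ""


def agent_input(item: dict) -> str:
    """Mirror of {{ $json.messages[0].text.body }} {{ $json.candidates[0].content.parts[0].text }}."""
    text_body = dig(item, "messages", 0, "text", "body")                    # text branch
    transcript = dig(item, "candidates", 0, "content", "parts", 0, "text")  # audio branch
    return f"{text_body} {transcript}".strip()


print(agent_input({"messages": [{"text": {"body": "Do you ship abroad?"}}]}))
print(agent_input({"candidates": [{"content": {"parts": [{"text": "What are your hours?"}]}}]}))
```

The first call prints the typed question and the second prints the transcript. This is why one agent node can serve both branches without extra branching.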

Tip: Session Memory Buffer and Vector Knowledge Lookup don’t take credentials of their own; they work through the nodes connected to them. The model, embedding, and vector store nodes (Gemini Chat Engine, Gemini Flash Model, Gemini Embedding Builder, and Pinecone Vector Index) each need their own credential, as listed above.
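Conceptually, Session Memory Buffer keys chat history on the sender’s wa_id and keeps only a sliding window. The toy class below shows why two customers never share context and why old turns eventually drop out. It is hypothetical and in-process only, not how n8n stores memory internally. It counts the window in user/assistant exchanges, which is an assumption; check the node’s documentation for its exact unit.

```python
from collections import defaultdict, deque

CONTEXT_WINDOW = 20  # mirrors Context Window Length in Session Memory Buffer


class SessionMemory:
    """Toy per-contact window buffer: keeps the last N (user, assistant) exchanges per wa_id."""

    def __init__(self, window: int = CONTEXT_WINDOW):
        # Each wa_id gets its own bounded deque; the oldest exchange is evicted automatically.
        self._sessions = defaultdict(lambda: deque(maxlen=window))

    def add(self, wa_id: str, user_text: str, reply: str) -> None:
        self._sessions[wa_id].append((user_text, reply))

    def history(self, wa_id: str) -> list:
        return list(self._sessions[wa_id])


memory = SessionMemory(window=2)
for n in range(3):
    memory.add("15550001111", f"question {n}", f"answer {n}")
print(memory.history("15550001111"))  # [('question 1', 'answer 1'), ('question 2', 'answer 2')]
print(memory.history("15550002222"))  # []
```

A small window keeps prompts short and cheap. The trade-off is that a customer returning after a long conversation may need to restate earlier details.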

Step 5: Configure the WhatsApp Response

Send the AI-generated response back to the user over WhatsApp.

Open Dispatch WhatsApp Reply and set Operation to send.
Set Text Body to {{ $json.output }}.
Set Phone Number ID to [YOUR_ID] and Recipient Phone Number to {{ $('WhatsApp Incoming Trigger').item.json.messages[0].from }}.
Credential Required: Connect your whatsAppApi credentials in Dispatch WhatsApp Reply.
Confirm the execution flow from Virtual Receptionist to Dispatch WhatsApp Reply is connected.

⚠️ Common Pitfall: If replies are not delivered, verify the [YOUR_ID] value is replaced with your actual WhatsApp Phone Number ID.
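Dispatch WhatsApp Reply corresponds roughly to a POST against the WhatsApp Cloud API’s messages endpoint. The sketch below only builds the request and does not send it. It assumes a Graph API version string (v19.0 is a placeholder) and uses a made-up phone number ID. It also shows why an unreplaced [YOUR_ID] ends up in the request path.

```python
# Placeholder Graph API version; use the version your Meta app targets.
GRAPH_API_VERSION = "v19.0"


def build_send_request(phone_number_id: str, recipient: str, reply_text: str) -> tuple:
    """URL and JSON body for a plain-text WhatsApp Cloud API message."""
    url = f"https://graph.facebook.com/{GRAPH_API_VERSION}/{phone_number_id}/messages"
    body = {
        "messaging_product": "whatsapp",
        "to": recipient,                  # messages[0].from from the trigger
        "type": "text",
        "text": {"body": reply_text},     # {{ $json.output }} from the agent
    }
    return url, body


url, body = build_send_request("123456789012345", "15550001111", "Thanks, we received your message.")
print(url)  # https://graph.facebook.com/v19.0/123456789012345/messages
```

If the Phone Number ID is still the literal [YOUR_ID], Meta has no matching sender for that path, and the reply never goes out.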

Step 6: Test & Activate Your Workflow

Run an end-to-end test to confirm both text and audio paths work, then activate the workflow.

Click Execute Workflow and send a WhatsApp text message to trigger WhatsApp Incoming Trigger → Route Message Type → Virtual Receptionist → Dispatch WhatsApp Reply.
Send a WhatsApp audio message to test the audio branch: Retrieve Audio Link → Download Audio File → Convert Audio Base64 → Gemini Transcription Request → Prepare Audio Prompt → Virtual Receptionist → Dispatch WhatsApp Reply.
Confirm a successful run by checking that Virtual Receptionist produced an output and that Dispatch WhatsApp Reply sent it; the reply should also arrive in the WhatsApp chat you tested from.
Once testing is successful, toggle the workflow Active to enable production use.


Common Gotchas

WhatsApp Business Cloud credentials can expire or lose permissions after Meta changes. If replies stop sending, check your Meta Developer Dashboard token status and the n8n credential tied to the WhatsApp nodes first.
If you’re using Wait nodes or external processing (like transcription), processing times vary. Bump up the wait duration if downstream nodes fail on empty responses, especially right after “Download Audio File” and the Gemini transcription request.
Default prompts in AI nodes are generic. Add your brand voice and “approved phrasing” early inside the Virtual Receptionist agent rules, or you will be editing outputs forever.

Frequently Asked Questions

How long does it take to set up this WhatsApp reply automation?

About 45 minutes if you already have the API keys.

Do I need coding skills to automate WhatsApp replies?

No. You’ll mostly connect accounts and paste credentials into n8n. The only “code-ish” part is already in the workflow for converting audio to Base64.

Is n8n free to use for this WhatsApp reply automation workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Google Gemini usage and Pinecone storage, which vary based on message volume.

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I customize this WhatsApp reply automation workflow for multiple languages?

Yes, but keep it controlled. Update the allowed-language rules inside the Virtual Receptionist (AI Agent) node, then test real messages for each language. If you need stricter routing, add language detection and send different languages to different agent paths so your tone and policy wording stays consistent.
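Once you add language detection, the routing step after it can stay simple. The sketch below assumes some detector, which is not part of this template, already returns a code like "es" or "es-MX". The allowed set and path names are hypothetical; mirror whatever your Virtual Receptionist rules permit.

```python
# Hypothetical allowed-language set; keep it in sync with the agent's language rules.
ALLOWED_LANGUAGES = {"en", "es"}


def pick_agent_path(language_code: str) -> str:
    """Send a message to a per-language agent path, or to a human when the language isn't allowed."""
    code = language_code.lower().split("-")[0]  # "es-MX" -> "es"
    if code in ALLOWED_LANGUAGES:
        return f"agent_{code}"
    return "human_handoff"


print(pick_agent_path("es-MX"))  # agent_es
print(pick_agent_path("fr"))     # human_handoff
```

Normalizing regional variants to a base code keeps the number of agent paths small. You can still split them later if tone differs by region.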

Why is my WhatsApp Business Cloud connection failing in this workflow?

Usually it’s an expired access token or a permission issue on the Meta app that owns your WhatsApp Business account. Regenerate the token in the Meta Developer Dashboard, then update the credential used by the WhatsApp Incoming Trigger and Dispatch WhatsApp Reply nodes. If you recently changed phone numbers or webhooks, double-check the webhook URL and subscribed events. Rate limiting can show up too if you blast replies during testing.

How many messages can this WhatsApp reply automation handle?

On self-hosted n8n, it depends on your server, but handling a few thousand messages a month is realistic for a small VPS if your AI usage is sized right. On n8n Cloud, your limit is based on your plan’s monthly executions. Voice notes take longer than text because transcription is an extra call, so plan capacity around your peak hours, not your average day.

Is this WhatsApp reply automation better than using Zapier or Make?

For this use case, n8n is usually the better fit because you need branching (text vs. voice), session memory, and a more “agent-like” flow with a vector database in the middle. Zapier and Make can do parts of it, but complex chat logic gets messy fast and can get expensive as volume grows. n8n also gives you the option to self-host, which matters when your WhatsApp traffic spikes. If you want the simplest possible setup and you only handle text, you might prefer Zapier. If you’re unsure, Talk to an automation expert and get a straight recommendation.

You set this up once, and your WhatsApp inbox stops running your day. The workflow handles the repetitive questions, and you only jump in when it actually needs a human.
