WhatsApp + Google Docs: instant support replies
Your WhatsApp inbox moves fast. Then a voice note shows up. Someone sends a blurry product photo. Another customer writes in Roman Urdu. Suddenly, “quick replies” turn into a copy-paste marathon and a lot of guesswork.
WhatsApp support automation matters most to Support Leads, because they own response time. But ecommerce operators and service business owners feel the same pain. The outcome is simple: customers get consistent answers in seconds, even when the message isn’t plain text.
This workflow connects WhatsApp to your Google Docs knowledge base, adds AI that can read images and transcribe voice notes, and replies automatically. You’ll see what it fixes, what it produces, and what you need to run it reliably.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: WhatsApp + Google Docs: instant support replies
flowchart LR
subgraph sg0["Incoming WhatsApp Hook Flow"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Incoming WhatsApp Hook"]
n1@{ icon: "mdi:robot", form: "rounded", label: "Support AI Orchestrator", pos: "b", h: 48 }
n2@{ icon: "mdi:memory", form: "rounded", label: "Conversation Memory", pos: "b", h: 48 }
n3@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Classify Input Format", pos: "b", h: 48 }
n4["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Fetch Image Link"]
n5["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Retrieve Image File"]
n6@{ icon: "mdi:robot", form: "rounded", label: "Image Content Review", pos: "b", h: 48 }
n7@{ icon: "mdi:swap-vertical", form: "rounded", label: "Image Text Composer", pos: "b", h: 48 }
n8@{ icon: "mdi:swap-vertical", form: "rounded", label: "Text Message Formatter", pos: "b", h: 48 }
n9["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Fetch Audio Link"]
n10["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/httprequest.dark.svg' width='40' height='40' /></div><br/>Retrieve Audio File"]
n11@{ icon: "mdi:robot", form: "rounded", label: "Audio Transcription", pos: "b", h: 48 }
n12@{ icon: "mdi:swap-vertical", form: "rounded", label: "Audio Text Formatter", pos: "b", h: 48 }
n13["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/whatsapp.svg' width='40' height='40' /></div><br/>Send WhatsApp Reply"]
n14@{ icon: "mdi:cog", form: "rounded", label: "Fetch Docs Reference", pos: "b", h: 48 }
n15@{ icon: "mdi:brain", form: "rounded", label: "OpenRouter Chat Engine", pos: "b", h: 48 }
n1 --> n13
n12 --> n1
n6 --> n7
n9 --> n10
n4 --> n5
n2 -.-> n1
n10 --> n11
n5 --> n6
n3 --> n9
n3 --> n4
n3 --> n8
n8 --> n1
n11 --> n12
n0 --> n3
n7 --> n1
n15 -.-> n1
n14 -.-> n1
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0 trigger
class n1,n6,n11 ai
class n15 aiModel
class n2 ai
class n3 decision
class n5,n10 api
classDef customIcon fill:none,stroke:none
class n0,n4,n5,n9,n10,n13 customIcon
The Problem: WhatsApp Support Becomes Unscalable Fast
WhatsApp is great for customers because it’s effortless. For your team, it’s a constant context switch. One minute it’s “What’s your return policy?”, the next it’s a 40-second voice note with three questions inside it, and then a photo asking “Is this the right size?” If you’re relying on humans to remember your policies, pricing, and edge cases, you get slow replies, inconsistent answers, and a support backlog that grows at the worst time (after hours, weekends, launches). Honestly, the mental load is the real cost.
The friction compounds. Here’s where it breaks down in real day-to-day support.
- Voice notes and images force a manual “decode” step before anyone can even start replying.
- Two agents answer the same question differently, which creates refunds, arguments, and “but your team said…” screenshots.
- Updating canned responses doesn’t work when your policies live in someone’s head instead of one source of truth.
- Multilingual chats slow everything down, because translating and staying polite takes time and attention.
The Solution: Auto-Reply From Your Google Docs Knowledge Base
This workflow turns WhatsApp into an always-on support channel backed by your Google Docs knowledge base. It starts when a customer message hits your WhatsApp webhook in n8n. The workflow figures out what type of message it is (text, voice note, or image), then converts everything into clean text: voice gets transcribed, images get described, and plain text gets formatted. That text is sent to an AI agent that answers like a professional support rep, pulls facts from your Google Docs content, and keeps conversation context per phone number so the customer doesn’t have to repeat themselves. Finally, the reply goes straight back to WhatsApp automatically.
The workflow begins with an incoming WhatsApp message and a quick classification. From there, media is downloaded when needed and processed through AI (transcription for audio, vision analysis for images). The AI agent then uses your Google Docs reference to generate a consistent, on-brand response and sends it back within seconds.
What You Get: Automation vs. Results
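| What This Workflow Automates | Results You’ll Get |
|---|---|
| Classifying each incoming message as text, voice note, or image | No manual “decode” step before someone can reply |
| Transcribing voice notes and describing images with AI | Every message becomes clean text the agent can answer |
| Looking up policies, pricing, and FAQs in your Google Doc | Consistent answers from one source of truth |
| Keeping conversation memory per phone number | Customers don’t have to repeat themselves |
| Sending the reply back to WhatsApp | Answers in seconds, including after hours and on weekends |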
Example: What This Looks Like
Say your inbox gets about 40 WhatsApp messages a day, and roughly 10 of them are voice notes or images. Manually, those 10 messages often take about 5 minutes each (listen, interpret, check the doc, translate, reply), so you burn close to an hour just on the “hard” messages. With this workflow, the customer message triggers instantly, transcription or image reading happens in the background, and the reply is sent back automatically in under a minute. That’s about an hour back most days, and the answers stay consistent.
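The arithmetic behind that estimate is simple enough to rerun with your own numbers. A minimal sketch, using only the figures from the example above (the variable names are just for illustration):

```python
# Figures from the example above; replace them with your own inbox numbers.
hard_messages_per_day = 10   # voice notes and images
manual_minutes_each = 5      # listen, interpret, check the doc, translate, reply

manual_minutes_per_day = hard_messages_per_day * manual_minutes_each
print(f"Manual handling of hard messages: {manual_minutes_per_day} minutes per day")
```

Ten messages at five minutes each comes to 50 minutes, which is where the “close to an hour” figure comes from.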
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- WhatsApp Business API for receiving and sending messages
- Google Docs API to query your knowledge base document
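- OpenAI API key for transcription and image analysis (get it from your OpenAI dashboard)
- OpenRouter API key for the chat model that powers the support agent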
Skill level: Intermediate. You’ll connect APIs, set permissions, and test a few message types end-to-end.
How It Works
A WhatsApp message triggers the workflow. n8n receives the incoming message via your WhatsApp webhook, tied to a verified business number.
The workflow standardizes the message into text. A classifier routes the message by type. If it’s a voice note, the audio file is fetched and transcribed with OpenAI. If it’s an image, the file is retrieved and analyzed with an AI vision step so the question becomes “plain text” the agent can answer.
The AI agent generates the support reply. The agent uses an OpenRouter chat model and a Google Docs tool to look up your latest policies, pricing, and FAQs. Conversation memory is attached per phone number, which means follow-up questions don’t reset the thread.
The response is sent back to WhatsApp. The final message is formatted for WhatsApp and delivered automatically, so the customer gets a fast answer without waiting for a human to free up.
You can easily modify the knowledge base structure to support new product lines or a different tone of voice based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the WhatsApp Trigger
Start by setting up the WhatsApp webhook that receives incoming customer messages.
- Add the Incoming WhatsApp Hook node as your trigger.
- Set Updates to `messages`.
- Credential Required: Connect your `whatsAppTriggerApi` credentials in Incoming WhatsApp Hook.
Execution begins at Incoming WhatsApp Hook and routes into Classify Input Format.
Step 2: Classify the Incoming Message Type
Route incoming WhatsApp messages into text, image, or audio processing paths using the switch logic.
- Add the Classify Input Format node after Incoming WhatsApp Hook.
- Configure the Voice rule to check `{{$json.messages[0].audio}}` with the Exists operator.
- Configure the Image rule to check `{{$json.messages[0].image}}` with the Exists operator.
- Configure the Text rule to check `{{$json.messages[0].text.body}}` with the Exists operator.
The Classify Input Format node routes to Fetch Audio Link, Fetch Image Link, or Text Message Formatter based on the message type.
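If it helps to see the routing outside n8n, here is a rough Python sketch of the same three checks. It assumes the payload shape implied by the expressions above (a `messages` list whose first item carries `audio`, `image`, or `text.body`). The function name and the `unsupported` fallback are illustrative additions, not part of the workflow.

```python
from typing import Any


def classify_message(payload: dict[str, Any]) -> str:
    """Mirror the Voice, Image, and Text rules, in that order."""
    messages = payload.get("messages") or []
    if not messages:
        return "unsupported"
    message = messages[0]
    if "audio" in message:
        return "audio"  # would route to Fetch Audio Link
    if "image" in message:
        return "image"  # would route to Fetch Image Link
    text = message.get("text")
    if isinstance(text, dict) and "body" in text:
        return "text"   # would route to Text Message Formatter
    # Assumption: anything else (stickers, locations, ...) matches no rule.
    return "unsupported"


sample = {"messages": [{"from": "15550001111", "text": {"body": "What is your return policy?"}}]}
print(classify_message(sample))
```

The sketch makes one thing visible: a message that matches none of the three rules has nowhere to go, so it may be worth deciding what should happen to those before going live.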
Step 3: Process Text, Image, and Audio Inputs
Set up the input-specific pipelines that transform each message type into a unified text payload.
- In Text Message Formatter, set the text assignment value to `{{ $('Incoming WhatsApp Hook').item.json.messages[0].text.body }}`.
- For audio: configure Fetch Audio Link with Resource `media`, Operation `mediaUrlGet`, and Media Get ID `{{ $('Incoming WhatsApp Hook').item.json.messages[0].audio.id }}`.
- Connect Fetch Audio Link → Retrieve Audio File and set URL to `{{$json.url}}`, with Authentication `genericCredentialType` and Generic Auth Type `httpHeaderAuth`.
- Connect Retrieve Audio File → Audio Transcription and set Resource to `audio` and Operation to `transcribe`.
- In Audio Text Formatter, set the text assignment value to `{{$json.text}}`.
- For images: configure Fetch Image Link with Resource `media`, Operation `mediaUrlGet`, and Media Get ID `{{ $('Incoming WhatsApp Hook').item.json.messages[0].image.id }}`.
- Connect Fetch Image Link → Retrieve Image File and set URL to `{{$json.url}}`, with Authentication `genericCredentialType` and Generic Auth Type `httpHeaderAuth`.
- Connect Retrieve Image File → Image Content Review and set Resource to `image`, Operation to `analyze`, Input Type to `base64`, and Text to `Describe the image in detail.`.
- In Image Text Composer, set the text assignment value to `# The user provided the following image and text. ## IMAGE CONTENT: {{ $json.content }} ## USER MESSAGE: {{ $('Incoming WhatsApp Hook').item.json.messages[0].image.caption || "Describe the image" }}`.
Credential Required: Connect your whatsAppApi credentials in Fetch Audio Link and Fetch Image Link.
Credential Required: Connect your httpHeaderAuth credentials in Retrieve Audio File and Retrieve Image File.
Credential Required: Connect your openAiApi credentials in Audio Transcription and Image Content Review.
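To make the image branch concrete, here is a small Python sketch of what the Image Text Composer assignment produces. The function name is hypothetical, and the line breaks between sections are an assumption. Only the headings and the “Describe the image” fallback come from the expression above.

```python
from typing import Optional


def compose_image_text(image_description: str, caption: Optional[str]) -> str:
    """Combine the vision output with the customer's caption, if any."""
    # Same fallback as the `||` in the n8n expression: a missing or empty
    # caption becomes "Describe the image".
    user_message = caption or "Describe the image"
    return (
        "# The user provided the following image and text.\n\n"
        f"## IMAGE CONTENT:\n{image_description}\n\n"
        f"## USER MESSAGE:\n{user_message}"
    )


print(compose_image_text("A red sneaker next to a size chart.", "Is this the right size?"))
```

The voice and text branches are simpler: each just passes a single string (the transcript or the message body) to the agent as `text`.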
⚠️ Common Pitfall: WhatsApp media URLs expire quickly, so test the workflow immediately after sending an audio or image message.
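When a media branch fails, it can help to reproduce the two-step download outside n8n. The sketch below assumes the WhatsApp Cloud API behaviour the nodes rely on: a lookup by media ID returns JSON with a `url` field, and the download needs an `Authorization: Bearer` header. The Graph API version and the function names are placeholders, so check them against your own Meta app.

```python
import json
import urllib.request

# Assumption: use the Graph API version your Meta app is configured for.
GRAPH_API_BASE = "https://graph.facebook.com/v20.0"


def build_media_lookup_request(media_id: str, access_token: str) -> urllib.request.Request:
    """Step 1 (Fetch Audio/Image Link): ask for the temporary media URL."""
    return urllib.request.Request(
        f"{GRAPH_API_BASE}/{media_id}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def build_media_download_request(media_url: str, access_token: str) -> urllib.request.Request:
    """Step 2 (Retrieve Audio/Image File): download using the same token."""
    return urllib.request.Request(
        media_url,
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_media(media_id: str, access_token: str) -> bytes:
    """Run both steps back to back, since the URL is short-lived."""
    with urllib.request.urlopen(build_media_lookup_request(media_id, access_token)) as response:
        media_url = json.load(response)["url"]
    with urllib.request.urlopen(build_media_download_request(media_url, access_token)) as response:
        return response.read()
```

Calling `fetch_media` with a real media ID and token makes live requests. If the second step fails while the first succeeds, the usual suspects are a missing auth header or a link that has already expired.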
Step 4: Configure the AI Orchestration Layer
Set up the AI agent, memory, language model, and document tool used to generate accurate responses.
- In Support AI Orchestrator, set Text to `{{$json.text}}` and keep Prompt Type as `define`.
- Ensure the system prompt in Support AI Orchestrator includes dynamic values like `{{ $('Incoming WhatsApp Hook').item.json.contacts[0].profile.name }}` and `{{ $now.toString() }}`.
- Connect Conversation Memory to Support AI Orchestrator via the AI Memory port and set Session Key to `{{ $('Incoming WhatsApp Hook').item.json.messages[0].from }}` with Context Window Length `20`.
- Connect Fetch Docs Reference to Support AI Orchestrator as an AI Tool and set Operation to `get` with Document URL `[YOUR_ID]`.
- Connect OpenRouter Chat Engine to Support AI Orchestrator as the AI Language Model and set Model to `anthropic/claude-sonnet-4`.
Credential Required: Connect your openRouterApi credentials for the OpenRouter Chat Engine connection on Support AI Orchestrator.
Credential Required: Connect your googleDocsOAuth2Api credentials for the Fetch Docs Reference tool connection on Support AI Orchestrator.
Tip: Conversation Memory and Fetch Docs Reference are AI sub-nodes. Manage their credentials and connections from Support AI Orchestrator, not directly on the sub-nodes.
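The idea behind the Session Key is easier to see in a few lines of code. This is only an illustration of per-phone-number windowed memory, not how n8n implements it. The class name is hypothetical, and treating the window as 20 question-and-reply exchanges is an assumption.

```python
from collections import defaultdict, deque

CONTEXT_WINDOW_LENGTH = 20  # matches the setting above


class ConversationMemory:
    """Keep only the most recent exchanges for each sender."""

    def __init__(self, window: int = CONTEXT_WINDOW_LENGTH) -> None:
        # One bounded queue per session key (the sender's phone number).
        self._sessions = defaultdict(lambda: deque(maxlen=window))

    def add_exchange(self, session_key: str, user_text: str, reply_text: str) -> None:
        self._sessions[session_key].append((user_text, reply_text))

    def history(self, session_key: str) -> list:
        return list(self._sessions[session_key])


memory = ConversationMemory()
memory.add_exchange("15550001111", "Do you ship to Lahore?", "Yes, we do.")
memory.add_exchange("15550001111", "How long does it take?", "Usually a few days.")
print(len(memory.history("15550001111")))  # 2
print(len(memory.history("15550002222")))  # 0, a different number starts fresh
```

Because the key is the sender’s number, two customers never share context, and older exchanges fall off once the window is full.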
Step 5: Configure WhatsApp Reply Delivery
Send the AI-generated response back to the WhatsApp user.
- Add Send WhatsApp Reply after Support AI Orchestrator.
- Set Operation to `send` and Text Body to `{{$json.output}}`.
- Set Phone Number ID to `[YOUR_ID]`.
- Set Recipient Phone Number to `{{ $('Incoming WhatsApp Hook').item.json.messages[0].from }}`.
- Credential Required: Connect your `whatsAppApi` credentials in Send WhatsApp Reply.
⚠️ Common Pitfall: Replace [YOUR_ID] in Send WhatsApp Reply and Fetch Docs Reference with your real WhatsApp Business Phone Number ID and Google Doc URL.
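For reference, here is a sketch of roughly what the send step amounts to at the API level, assuming the WhatsApp Cloud API text-message format. The helper name and the API version are placeholders, and the placeholder check is an illustrative guard rather than something the n8n node does for you.

```python
from typing import Any

PLACEHOLDER = "[YOUR_ID]"


def build_text_reply(
    phone_number_id: str,
    recipient: str,
    body: str,
    api_version: str = "v20.0",  # assumption: match your Meta app's version
) -> tuple[str, dict[str, Any]]:
    """Return the endpoint URL and JSON payload for a plain text reply."""
    if not phone_number_id or phone_number_id == PLACEHOLDER:
        raise ValueError("Replace [YOUR_ID] with your WhatsApp Business Phone Number ID")
    url = f"https://graph.facebook.com/{api_version}/{phone_number_id}/messages"
    payload = {
        "messaging_product": "whatsapp",
        "to": recipient,           # messages[0].from of the incoming message
        "type": "text",
        "text": {"body": body},    # the agent's output
    }
    return url, payload


url, payload = build_text_reply("1234567890", "15550001111", "Returns are accepted within 30 days.")
print(url)
print(payload)
```

Failing loudly on a leftover `[YOUR_ID]` is a cheap way to catch the pitfall above before a customer notices a missing reply.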
Step 6: Test and Activate Your Workflow
Validate each branch (text, image, audio) and enable the workflow for production use.
- Click Execute Workflow and send a WhatsApp test message with text to trigger Text Message Formatter → Support AI Orchestrator → Send WhatsApp Reply.
- Send an image and confirm the flow Fetch Image Link → Retrieve Image File → Image Content Review → Image Text Composer → Support AI Orchestrator completes with a reply.
- Send a voice note and confirm the flow Fetch Audio Link → Retrieve Audio File → Audio Transcription → Audio Text Formatter → Support AI Orchestrator completes with a reply.
- Verify successful execution by checking that Send WhatsApp Reply returns a message to the sender.
- Toggle the workflow to Active once all branches respond correctly.
Common Gotchas
- WhatsApp Business API credentials can expire or require specific permissions. If things break, check your Meta developer app settings and webhook subscriptions first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
How long does it take to set up this WhatsApp support automation?
Plan for about 2-3 hours if your API accounts are ready.
Do I need coding skills to set it up?
No. You’ll mostly be connecting accounts and pasting API keys into n8n credentials.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs for transcription and image analysis, plus OpenRouter model usage for the chat agent.
Where can I host n8n to run this automation?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize this workflow for other languages or use cases?
Yes. You’ll adjust the AI agent’s system prompt to add language rules, and you can expand language detection logic in the same place you currently handle English and Roman Urdu. Many teams also customize the Google Docs structure (clear FAQ headings help) and add a “human handoff” rule when the agent isn’t confident.
Why is my WhatsApp connection failing in this workflow?
Usually it’s expired credentials or a webhook/subscription issue in your Meta app. Regenerate the token, confirm the phone number is still connected, and re-check webhook permissions. If it fails only on media messages, the file download request is often the culprit (wrong URL, missing auth header, or the media link already expired).
How many messages can this workflow handle?
A lot, but it depends on where you run n8n and your API limits.
Is n8n a better fit than Zapier or Make for this?
For media-heavy WhatsApp support, n8n is usually the better fit because you can branch logic freely, keep conversation memory, and self-host for high volume without paying per tiny step. Zapier and Make can work, but multi-step AI flows (download media, transcribe, analyze, query knowledge base, respond) get expensive and harder to maintain. n8n also makes it easier to swap models later if you want. The tradeoff is setup complexity, especially around WhatsApp Business API.
Once this is running, your Google Doc becomes the brain and WhatsApp becomes the front desk. Set it up, tune the tone, and let the workflow handle the repetitive questions while you focus on the exceptions.