WhatsApp + Google Gemini: instant promo images
You have an idea for a promo image. Then comes the slog: opening a tool, rewriting prompts three times, exporting the file, and sending it back to the team. It’s not “hard” work. It’s the kind of work that quietly steals an hour.
Marketing managers feel this when they’re trying to move fast on campaigns. A small business owner feels it when they’re doing everything themselves. And if you run a client-heavy agency, you already know the pain of “Can we get one more version?” This WhatsApp and Gemini image automation turns a single WhatsApp message into a polished image you can use.
You’ll see how the workflow takes your rough idea, upgrades it into a strong prompt, generates the image with Google Gemini, and sends the finished file right back to WhatsApp.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: WhatsApp + Google Gemini: instant promo images
Workflow diagram (WhatsApp Flow): WhatsApp Trigger → Generate prompt → Generate Image (HTTP request) → Convert to Image → Send Image. Gemini 2.5 Pro (language model) and Structured Prompt (output parser) attach to Generate prompt as sub-nodes. The implementation guide below appears to use its own labels for the same nodes: Incoming WhatsApp Hook, Compose Prompt Text, Create Image Request, Format Image File, Dispatch Image Reply, Gemini Pro Chat Model, and Structured Prompt Parser.
The Problem: Promo Images Get Stuck in “Tool Switching”
Promo images are supposed to be quick. In reality, they get trapped between chats, creative tools, and “just one more tweak” feedback loops. Someone drops a rough concept in WhatsApp, you copy it into an image generator, the output is close but not on-brand, so you rewrite the prompt. Now you’re downloading files, renaming versions, and trying to remember which one was “the good one.” Multiply that by a few campaigns (or a few clients) and you’ve built a reliable time sink.
It adds up fast. Here’s where it breaks down most often:
- Every new image requires prompt rewriting, which means your “quick request” turns into a mini writing task.
- Brand consistency slips because each person prompts differently, even when they mean well.
- File handling becomes messy, especially when you’re generating multiple variations per idea.
- Momentum dies when you have to leave WhatsApp, open other tools, then come back to deliver the result.
The Solution: WhatsApp Message → Gemini Image → File Returned in Chat
This workflow turns WhatsApp into a simple “request box” for AI images. You send a message describing what you want (even a messy, short one). n8n catches that message instantly, then uses Gemini 2.5 Pro to expand your rough idea into a detailed, image-ready prompt that’s more likely to produce something usable on the first try. Next, the workflow calls the Gemini 2.0 Flash image generation API via an HTTP request, receives the image back as Base64 data, and converts it into a proper file. Finally, it sends that image file back into the same WhatsApp chat, so you can forward it, save it, or ask for another variation without changing apps.
The workflow starts when a WhatsApp message hits your connected number. Gemini upgrades the prompt, then generates the image. The final output is a ready-to-share image file delivered right back in the thread you’re already using.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
Example: What This Looks Like
Say you need 5 promo images for a weekend sale. Manually, a “simple” cycle is usually 10 minutes to rewrite prompts and generate each image, plus another 5 minutes to export, name files, and send them back in chat. That’s about 75 minutes. With this workflow, you can drop 5 WhatsApp messages in a few minutes total, then wait for generation and receive each image back as a file in the thread. Realistically, you’re spending about 10 minutes of active time instead of over an hour.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- WhatsApp Business Cloud to receive and send WhatsApp messages.
- Google Gemini to expand prompts and generate images.
- Google Gemini API key (get it from Google AI Studio).
Skill level: Intermediate. You’ll connect WhatsApp + Gemini credentials and paste an API key into the HTTP request settings.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A WhatsApp message triggers everything. When someone sends a text to your connected WhatsApp number, n8n captures the message and passes it into the workflow.
Your idea gets “translated” into a better prompt. Gemini 2.5 Pro takes the short description and turns it into a detailed, stylistic prompt (the kind that tends to produce higher-quality images with fewer retries). A structured prompt parser keeps the output clean and predictable.
The image is generated through an API call. n8n sends the enhanced prompt to the Gemini 2.0 Flash image endpoint using an HTTP request, then receives the result as Base64 image data.
The finished file returns to WhatsApp. The workflow converts the Base64 output into an actual image file and replies in the same chat, so it’s ready to save, forward, or review.
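To make the Base64 step concrete, here is a minimal Python sketch of what the conversion does. It assumes the image API replies in a `generateContent`-style shape, with the image under `candidates[].content.parts[].inlineData`; that shape, and the helper names `extract_image` and `file_name_for`, are assumptions for illustration, not something taken from the workflow itself.

```python
import base64
from typing import Tuple


def extract_image(response: dict) -> Tuple[bytes, str]:
    """Return (image_bytes, mime_type) from a generateContent-style response.

    Assumed shape: candidates[].content.parts[].inlineData.{data, mimeType}.
    """
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline and inline.get("data"):
                # validate=True rejects stray characters instead of skipping them
                data = base64.b64decode(inline["data"], validate=True)
                return data, inline.get("mimeType", "image/png")
    raise ValueError("response contains no inline image data")


def file_name_for(mime_type: str, stem: str = "promo") -> str:
    """Pick a file name whose extension matches the returned MIME type."""
    extensions = {"image/png": "png", "image/jpeg": "jpg", "image/webp": "webp"}
    return f"{stem}.{extensions.get(mime_type, 'bin')}"


if __name__ == "__main__":
    # Hypothetical response holding only the 8-byte PNG signature.
    fake = {
        "candidates": [{
            "content": {"parts": [{
                "inlineData": {
                    "mimeType": "image/png",
                    "data": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"),
                }
            }]}
        }]
    }
    image_bytes, mime = extract_image(fake)
    print(file_name_for(mime), len(image_bytes), "bytes")
```

Raising an error when no image part is present matters in practice: it stops the run before WhatsApp receives an empty or text-only file.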
You can adjust the prompt style to match your brand voice. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Incoming WhatsApp Hook Trigger
Set up the workflow to start when a WhatsApp message is received.
- Add and select the Incoming WhatsApp Hook node as your trigger.
- Verify the webhook is ready to receive incoming WhatsApp messages (the node uses the webhook ID created by n8n).
- Connect Incoming WhatsApp Hook to Compose Prompt Text to match the execution flow.
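If you want to see what the trigger hands to the next node, the sketch below pulls the sender and text out of a webhook payload. It assumes Meta's WhatsApp Cloud API webhook layout (`entry[].changes[].value.messages[]`); n8n's trigger node may present a flattened version of this, so treat the field paths and the `extract_text_message` helper as illustrative.

```python
from typing import Optional, Tuple


def extract_text_message(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (sender, text) for the first text message, or None.

    Status updates and non-text messages carry no prompt, so they are skipped.
    """
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for message in change.get("value", {}).get("messages", []):
                if message.get("type") == "text":
                    return message["from"], message["text"]["body"].strip()
    return None


# Hypothetical sample payload, trimmed to the fields used above.
sample_payload = {
    "entry": [{
        "changes": [{
            "value": {
                "messages": [{
                    "from": "15551234567",
                    "type": "text",
                    "text": {"body": " Weekend sale banner, 30% off sneakers "},
                }]
            }
        }]
    }]
}

if __name__ == "__main__":
    print(extract_text_message(sample_payload))
```

Keeping the sender alongside the text is the useful part: the reply step needs that number to send the image back to the same chat.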
Step 2: Set Up the AI Prompt Chain
Generate the image prompt text using the AI chain and its connected model and parser.
- Open Compose Prompt Text and define how incoming WhatsApp content is transformed into an image prompt.
- Ensure Gemini Pro Chat Model is connected to Compose Prompt Text as the language model.
- Ensure Structured Prompt Parser is connected to Compose Prompt Text as the output parser.
- Add the Gemini credential on Gemini Pro Chat Model. Structured Prompt Parser has no credential of its own; it only shapes the output of the chain it is attached to.
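The parser's job is easier to picture with a small example. This Python sketch validates a model reply against a tiny schema and fails loudly when the prompt is missing. The field names (`prompt`, `style`, `aspect_ratio`) and the `ImagePrompt` type are hypothetical; the actual schema lives in the Structured Prompt Parser node.

```python
import json
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    prompt: str                 # detailed, image-ready description
    style: str = ""             # hypothetical optional field
    aspect_ratio: str = "1:1"   # hypothetical optional field


def parse_model_output(raw: str) -> ImagePrompt:
    """Parse the model's JSON reply, raising ValueError if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    prompt = data.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("missing or empty 'prompt' field")
    return ImagePrompt(
        prompt=prompt.strip(),
        style=str(data.get("style", "")),
        aspect_ratio=str(data.get("aspect_ratio", "1:1")),
    )


if __name__ == "__main__":
    reply = '{"prompt": "Studio shot of red sneakers on a white plinth", "style": "minimal"}'
    print(parse_model_output(reply))
```

Rejecting a malformed reply here is cheaper than sending a broken prompt to the image API and debugging the result.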
Step 3: Build the Image Generation Request
Send the prompt to your image generation API.
- Open Create Image Request and configure the HTTP request to your image generation endpoint.
- Map the prompt output from Compose Prompt Text into the request body or query parameters.
- Confirm Create Image Request connects to Format Image File to continue the flow.
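As a rough guide to what the Create Image Request node needs, the sketch below builds the request as plain data. The endpoint path, the model ID, the `x-goog-api-key` header, and the `responseModalities` setting reflect the public Gemini API as I understand it; they are assumptions here, so check them against Google AI Studio before copying anything into the node.

```python
import json

# Assumed values; verify the current model name and API version before use.
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL_ID = "gemini-2.0-flash-preview-image-generation"


def build_image_request(prompt: str, api_key: str) -> dict:
    """Describe the HTTP call without sending it."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "method": "POST",
        "url": f"{API_BASE}/{MODEL_ID}:generateContent",
        "headers": {
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        "body": {
            "contents": [{"parts": [{"text": prompt}]}],
            # Ask for an image part in the reply, not text only.
            "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
        },
    }


if __name__ == "__main__":
    request = build_image_request("Studio shot of red sneakers", "YOUR_API_KEY")
    print(request["url"])
    print(json.dumps(request["body"], indent=2))
```

In n8n, the `body` part maps to the JSON body of the HTTP request, with the prompt text supplied by Compose Prompt Text.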
Step 4: Format and Send the Image Back to WhatsApp
Convert the API response into a file and send it as a WhatsApp reply.
- Configure Format Image File to convert the image response into a file object suitable for WhatsApp delivery.
- Open Dispatch Image Reply and set it to send the converted file back to the original WhatsApp sender.
- Verify the connection order matches the execution flow: Create Image Request → Format Image File → Dispatch Image Reply.
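The WhatsApp node handles delivery for you, but it helps to know roughly what goes over the wire. This sketch builds an image message in the shape the WhatsApp Cloud API expects, assuming the file has already been uploaded and you hold its media ID. The `build_image_reply` helper and the sample values are hypothetical.

```python
def build_image_reply(recipient: str, media_id: str, caption: str = "") -> dict:
    """Build the JSON body for an image message to one recipient.

    Assumes the image was uploaded first and media_id refers to that upload.
    """
    if not recipient or not media_id:
        raise ValueError("recipient and media_id are required")
    image = {"id": media_id}
    if caption:
        image["caption"] = caption
    return {
        "messaging_product": "whatsapp",
        "recipient_type": "individual",
        "to": recipient,
        "type": "image",
        "image": image,
    }


if __name__ == "__main__":
    # Hypothetical sender number and media ID.
    print(build_image_reply("15551234567", "MEDIA_ID", caption="Weekend sale, v1"))
```

The `to` value is the sender of the incoming message, which is why the trigger output has to stay available to Dispatch Image Reply.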
Step 5: Review the Flowpast Branding Note (Optional)
The sticky note is informational and does not affect execution.
- Keep Flowpast Branding as a visual reference for documentation and ownership.
Step 6: Test and Activate Your Workflow
Validate the full flow end-to-end, then activate the workflow for production use.
- Click Execute Workflow and send a test WhatsApp message to trigger Incoming WhatsApp Hook.
- Confirm a successful run shows data passing through Compose Prompt Text, Create Image Request, Format Image File, and Dispatch Image Reply.
- When satisfied, toggle the workflow to Active to enable continuous operation.
Common Gotchas
- WhatsApp Business Cloud credentials can expire or need specific permissions. If things break, check your Meta app settings and token status first.
- If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
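For the brand-voice gotcha, one option is to generate the system instructions from a small brand profile so everyone prompts the same way. The sketch below is illustrative only: `BrandProfile`, its fields, and the wording of the instructions are assumptions, and the resulting text would be pasted into the Compose Prompt Text node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrandProfile:
    name: str
    colors: List[str] = field(default_factory=list)
    lighting: str = ""
    avoid: List[str] = field(default_factory=list)


def build_system_instructions(brand: BrandProfile) -> str:
    """Turn a brand profile into instructions for the prompt-writing model."""
    lines = [
        f"You write image-generation prompts for {brand.name}.",
        "Expand the user's short idea into one detailed, image-ready prompt.",
    ]
    if brand.colors:
        lines.append("Brand colors: " + ", ".join(brand.colors) + ".")
    if brand.lighting:
        lines.append(f"Lighting: {brand.lighting}.")
    if brand.avoid:
        lines.append("Never include: " + ", ".join(brand.avoid) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical brand used only for demonstration.
    demo = BrandProfile(
        name="Acme Sneakers",
        colors=["deep red", "off-white"],
        lighting="soft studio light",
        avoid=["competitor logos", "text overlays"],
    )
    print(build_system_instructions(demo))
```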
Frequently Asked Questions
How long does setup take?
About 30 minutes if your WhatsApp Business and Gemini key are ready.
Do I need coding skills?
No. You’ll mostly paste an API key and connect your WhatsApp credential in n8n.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Google Gemini API usage, which is usually small for prompt expansion and image generation but depends on volume.
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the image style for my brand?
Yes, and you should. The easiest win is updating the instructions inside the “Compose Prompt Text” Gemini node so it always includes your brand colors, lighting preferences, composition rules, and “don’t do this” constraints. If you want more control, adjust the structured output parser so the workflow produces consistent fields like style, aspect ratio, and negative prompts. That way you get repeatable results, not lucky ones.
Why is my WhatsApp connection failing?
Usually it’s an expired Meta access token or a permission mismatch on the WhatsApp Business Cloud app. Regenerate the token, confirm the phone number is the one connected to the WhatsApp trigger node (Incoming WhatsApp Hook), and re-save the credential in n8n. If failures happen only during busy periods, rate limits can also be the culprit, so slow down how often you generate images. Finally, double-check that the reply node (Dispatch Image Reply) is sending a file (binary) and not the raw Base64 text.
How many images can this workflow handle?
On n8n Cloud, it depends on your monthly execution limit; if you self-host, there’s no hard cap beyond your server and API limits.
Is n8n a better fit than Zapier or Make for this?
Often, yes. This workflow relies on a few things that are awkward (or pricey) in simpler automation tools: turning AI output into structured data, calling an image API with custom parameters, and converting Base64 into a proper file for WhatsApp delivery. n8n also gives you more control over branching and retries, which matters when an API returns a slow response or a temporary error. Zapier or Make can still work if you only want a basic “message in, image out” proof of concept and you don’t mind less control. If you want help choosing, talk to an automation expert.
Set this up once and your WhatsApp thread becomes the fastest way to turn a rough idea into a usable promo image. Honestly, it’s a relief.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.