January 22, 2026

OpenAI + ElevenLabs, multilingual audio on demand

Lisa Granqvist, Partner & Workflow Automation Expert

Your “simple translation” request turns into a week of back-and-forth. One language sounds too formal, another reads like a robot, and the voice pacing is suddenly off for every market.

This is the kind of multilingual audio pain that marketing leads feel during launches. E-learning producers run into it at scale. And content studios get stuck managing the fixes instead of the creative work.

This n8n workflow takes one source script, localizes it properly (not literally), generates per-language voice settings, then outputs a ready-to-publish set of audio files you can ship.

How This Automation Works

The full n8n workflow, from trigger to final output:

n8n Workflow Template: OpenAI + ElevenLabs, multilingual audio on demand

The Problem: Multilingual audio takes too many rounds

Producing multilingual voice audio sounds straightforward until you do it more than once. You start with a good script, send it out for translation, then spend hours correcting phrasing because it’s “technically right” but culturally wrong. Next comes voice. The same pacing that works in English can sound rushed in Spanish, stiff in German, or oddly emphasized in Japanese. Then you’re naming files, tracking versions, and trying to remember which “final_final_v3” is actually approved. It’s exhausting, and it steals attention from the parts of the project that make money.

It adds up fast. Here’s where it breaks down in the real world.

  • Teams end up doing “translation” and “localization” as two separate projects, which stretches timelines by days.
  • Every language gets a different sound because voice settings are guessed, then corrected, then guessed again.
  • Manual exporting and file labeling creates avoidable mistakes, especially when you’re shipping 10+ languages.
  • Quality checks become subjective because there’s no consistent process for tone, pace, and emphasis.

The Solution: One script in, localized voice packs out

This workflow turns a single source script into a packaged set of multilingual audio assets using OpenAI and ElevenLabs, coordinated inside n8n. It starts when you manually launch the workflow and provide your source content plus the target language list (ISO codes). OpenAI first localizes the text for each market, adapting idioms and phrasing so it sounds natural instead of “translated.” Then specialized AI steps refine how the audio should sound per language, including pacing and emphasis. For each target language, the workflow calls ElevenLabs to generate the audio, then processes the response, attaches useful metadata, and collects every file into one deliverable bundle.

The workflow begins with setup parameters, then a localization pass that produces structured translation outputs. After that, it loops through each language item, generates voice settings, requests the audio from ElevenLabs, and aggregates results so you can publish without assembling everything by hand.


Example: What This Looks Like

Say you’re launching a product update in 8 languages. Manually, you might spend about 30 minutes per language just coordinating text, voice direction, exports, and file labeling, which is roughly 4 hours before anyone even reviews audio. With this workflow, you drop in the script, define the 8 ISO language codes once, and run it. Expect about 10 minutes to kick things off, then some waiting time while ElevenLabs generates audio. You get a consolidated set of files instead of a messy folder of partial attempts.

What You’ll Need

  • n8n instance (try n8n Cloud free)
  • Self-hosting option if you prefer (Hostinger works well)
  • OpenAI API for localization and voice parameter generation
  • ElevenLabs to generate multilingual voice audio files
  • ElevenLabs API key (get it from your ElevenLabs account settings)

Skill level: Intermediate. You’ll connect API keys, edit prompts, and test outputs for a few languages.

Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).

How It Works

Manual launch with your inputs. You start the workflow, provide the source script (or source content), and set your target languages in the configuration step using ISO language codes.

Localization happens first. OpenAI rewrites the script per language with cultural adaptation, then outputs a structured result so later steps don’t have to “guess” what to read.

Voice direction is generated per language. Separate AI steps refine speech parameters (pace, emphasis) and voice characteristics so the audio sounds natural for each language’s phonetics.

ElevenLabs generates audio in a loop. n8n iterates through the language list, calls ElevenLabs via HTTP request, processes the audio response, then collects every output into a final aggregated package.

You can easily modify the target language list to add or remove markets based on your needs. See the full implementation guide below for customization options.

Step-by-Step Implementation Guide

Step 1: Configure the Manual Trigger

This workflow starts manually so you can validate translations and audio generation before enabling production runs.

  1. Add the Manual Launch Trigger node as your start node.
  2. Ensure Manual Launch Trigger is connected to Setup Parameters.

Step 2: Connect Core Inputs for Localization

Define the source text, target languages, and ElevenLabs API settings that drive the localization and audio generation.

  1. Open Setup Parameters and set chineseText to your source text (the template ships with placeholder Chinese text to localize).
  2. Set targetLanguages to a comma-separated list like English,Spanish,French,German,Japanese.
  3. Set elevenLabsApiKey to [CONFIGURE_YOUR_API_KEY] and replace it with your real API key.
  4. Set elevenLabsVoiceId to [YOUR_ID] and replace it with your ElevenLabs voice ID.
  5. Keep Include Other Fields enabled.

⚠️ Common Pitfall: If elevenLabsApiKey or elevenLabsVoiceId are left as placeholders, ElevenLabs Audio Request will fail.
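To make the expected inputs concrete, here is a sketch of the single item the Setup Parameters node would emit, assuming the field names from the steps above (all values are placeholders):

```javascript
// Hypothetical shape of the item emitted by the Setup Parameters node.
// Field names match the steps above; every value here is a placeholder.
function buildSetupParameters() {
  return {
    chineseText: "你好，欢迎使用我们的产品。", // source script to localize
    targetLanguages: "English,Spanish,French,German,Japanese",
    elevenLabsApiKey: "[CONFIGURE_YOUR_API_KEY]", // replace before running
    elevenLabsVoiceId: "[YOUR_ID]",               // replace before running
  };
}

// Later nodes read the language list by splitting the comma-separated string:
const params = buildSetupParameters();
const languages = params.targetLanguages.split(",").map((l) => l.trim());
console.log(languages.length); // 5
```

Leaving either credential field as its bracketed placeholder is exactly the failure mode the pitfall above describes.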

Step 3: Set Up the Localization AI Orchestration

This step configures the AI agent that generates structured translations and cultural notes for each target language.

  1. Open Localization Orchestrator and set Text to {{ $json.chineseText }}.
  2. Ensure Prompt Type is set to define and Has Output Parser is enabled.
  3. Verify OpenAI Localization Model is connected as the language model for Localization Orchestrator.
  4. Credential Required: Connect your openAiApi credentials in OpenAI Localization Model.
  5. Confirm Parsed Translation Output is attached as the output parser for Localization Orchestrator with the provided JSON schema.
  6. Attach the tools Speech Refinement Tool and Voice Settings Tool to Localization Orchestrator.

The AI tool sub-nodes Parsed Translation Output, Speech Refinement Tool, Voice Settings Tool, and Parsed Voice Settings inherit credentials from their parent nodes. Add credentials to OpenAI Localization Model, OpenAI Speech Model, and OpenAI Voice Model rather than to these sub-nodes.
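The guide doesn't reproduce the JSON schema, but a structured translation output along these lines is what the downstream nodes would consume. The field names (translations, language, languageCode, translatedText, culturalNotes) are assumptions for illustration, not the template's exact schema:

```javascript
// Hypothetical example of the structured output the Localization Orchestrator
// might produce through Parsed Translation Output. All field names here are
// assumptions, not the template's actual schema.
const parsedTranslationOutput = {
  translations: [
    {
      language: "English",
      languageCode: "en",
      translatedText: "Hello, and welcome to our product.",
      culturalNotes: "Neutral, friendly register; no idioms needed.",
    },
    {
      language: "Spanish",
      languageCode: "es",
      translatedText: "Hola, te damos la bienvenida a nuestro producto.",
      culturalNotes: "Informal 'tú' form for a consumer audience.",
    },
  ],
};

// Build Language List then iterates over this array, one item per language.
console.log(parsedTranslationOutput.translations.map((t) => t.languageCode));
```

A parsed structure like this is what lets later steps avoid "guessing" what to read aloud.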

Step 4: Build and Iterate the Language List

Translations are normalized into an array and processed one language at a time for audio generation.

  1. In Build Language List, set languages to {{ $json.translations }}.
  2. Ensure Build Language List is connected to Iterate Language Items.
  3. Leave Iterate Language Items as default to process one language per batch.
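The normalization in Build Language List amounts to turning one translations array into one n8n item per language. A minimal sketch, written as a plain testable function rather than an actual n8n Code node (which would read $json.translations instead):

```javascript
// Sketch of the Build Language List step: flatten a translations array into
// one item per language, the shape n8n's batch iterator (Iterate Language
// Items) expects. A plain function stands in for the Code node so the
// logic can run outside n8n.
function buildLanguageList(orchestratorOutput) {
  return orchestratorOutput.translations.map((t) => ({ json: t }));
}

const items = buildLanguageList({
  translations: [
    { language: "English", languageCode: "en", translatedText: "Hello." },
    { language: "Spanish", languageCode: "es", translatedText: "Hola." },
  ],
});
console.log(items.length); // 2
```

With the default batch size of one, each of these items then flows through the voice and audio steps individually.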

Step 5: Configure Voice Optimization and Audio Generation

This section refines speech text, computes voice settings, and sends the request to ElevenLabs for each language.

  1. In Speech Refinement Tool, set Text to {{ $fromAI("text_to_optimize") }} and keep the tool description as provided.
  2. Verify OpenAI Speech Model is connected to Speech Refinement Tool.
    Credential Required: Connect your openAiApi credentials in OpenAI Speech Model.
  3. In Voice Settings Tool, set Text to {{ $fromAI("language") }} and enable Has Output Parser.
  4. Ensure Parsed Voice Settings is connected as the output parser for Voice Settings Tool.
  5. Verify OpenAI Voice Model is connected to Voice Settings Tool.
    Credential Required: Connect your openAiApi credentials in OpenAI Voice Model.
  6. In ElevenLabs Audio Request, set URL to =https://api.elevenlabs.io/v1/text-to-speech/{{ $('Setup Parameters').first().json.elevenLabsVoiceId }}.
  7. Set the request Method to POST, enable Send Body and Send Headers, and confirm Response Format is set to file.
  8. Set body parameters: text to {{ $json.translatedText }}, model_id to eleven_multilingual_v2, and voice_settings to {{ { stability: $json.stability, similarity_boost: $json.similarity_boost, style: $json.style, use_speaker_boost: $json.use_speaker_boost } }}.
  9. Set header parameters: xi-api-key to {{ $('Setup Parameters').first().json.elevenLabsApiKey }} and Content-Type to application/json.

⚠️ Common Pitfall: The ElevenLabs request relies on {{ $json.stability }}, {{ $json.similarity_boost }}, {{ $json.style }}, and {{ $json.use_speaker_boost }} coming from Parsed Voice Settings. If the AI tool output is malformed, the API call may fail.
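For debugging outside n8n, the same ElevenLabs call can be reproduced with fetch. The endpoint, headers, and body fields mirror Step 5 above; buildRequest is kept pure so the payload can be inspected without spending API credits (the API key and voice ID are placeholders):

```javascript
// Standalone sketch of the ElevenLabs text-to-speech request the HTTP Request
// node performs. Endpoint, headers, and body mirror Step 5; credential
// values are placeholders you must supply.
function buildRequest(apiKey, voiceId, translatedText, settings) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    options: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({
        text: translatedText,
        model_id: "eleven_multilingual_v2",
        voice_settings: {
          stability: settings.stability,
          similarity_boost: settings.similarity_boost,
          style: settings.style,
          use_speaker_boost: settings.use_speaker_boost,
        },
      }),
    },
  };
}

async function generateAudio(apiKey, voiceId, text, settings) {
  const { url, options } = buildRequest(apiKey, voiceId, text, settings);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);
  return Buffer.from(await res.arrayBuffer()); // raw audio bytes
}
```

Inspecting buildRequest output is a quick way to spot the malformed voice_settings values the pitfall above warns about before they reach the API.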

Step 6: Process and Collect Audio Outputs

Binary audio output is normalized with metadata for each localized result and then looped for the next language.

  1. In Handle Audio Response, keep Mode set to runOnceForEachItem and use the provided JavaScript to attach metadata and filename.
  2. In Collect Audio Results, map fields to store output: audioFile to {{ $binary.data }}, language to {{ $json.language }}, languageCode to {{ $json.languageCode }}, and filename to {{ $json.filename }}.
  3. Ensure Collect Audio Results connects back to Iterate Language Items to continue the loop.
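The template's "provided JavaScript" for Handle Audio Response isn't reproduced here, but a minimal stand-in that attaches language metadata and a deterministic filename might look like this (the languageCode_language.mp3 naming convention is an assumption, not the template's exact scheme):

```javascript
// Sketch of the Handle Audio Response step: attach language metadata and a
// deterministic filename to each audio item, mirroring n8n's per-item
// (runOnceForEachItem) Code-node shape. The naming convention is assumed.
function handleAudioResponse(item) {
  const { language, languageCode } = item.json;
  const slug = language.toLowerCase().replace(/\s+/g, "-");
  return {
    json: {
      language,
      languageCode,
      filename: `${languageCode}_${slug}.mp3`,
    },
    binary: item.binary, // pass the audio payload through unchanged
  };
}

const out = handleAudioResponse({
  json: { language: "Spanish", languageCode: "es" },
  binary: { data: "<audio bytes>" },
});
console.log(out.json.filename); // "es_spanish.mp3"
```

Deterministic filenames are what spare you the "final_final_v3" version chaos described earlier.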

Step 7: Test and Activate Your Workflow

Run a manual test to validate the translation and audio outputs before activating the workflow.

  1. Click Execute Workflow to run Manual Launch Trigger and process the full flow.
  2. Confirm Localization Orchestrator outputs a structured list of translations in Parsed Translation Output.
  3. Verify that each loop cycle through Iterate Language Items produces a binary audio file in Handle Audio Response and mapped fields in Collect Audio Results.
  4. When results look correct, toggle the workflow to Active for production use.

Common Gotchas

  • OpenAI credentials can expire or need specific permissions. If things break, check your n8n Credentials panel and your OpenAI API key status first.
  • If you’re using Wait nodes or external rendering, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
  • ElevenLabs can reject requests if your API key lacks access to the selected voices. Confirm voice availability in your ElevenLabs dashboard, then retest with one language before running the full batch.

Frequently Asked Questions

How long does it take to set up this multilingual audio automation?

About an hour if you already have your OpenAI and ElevenLabs accounts ready.

Do I need coding skills to automate multilingual audio?

No. You’ll mostly paste API keys and edit a few prompts. The only “technical” part is testing with two languages before you scale it up.

Is n8n free to use for this multilingual audio automation workflow?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API usage and your ElevenLabs plan, which depends on how much audio you generate.

Where can I host n8n to run this automation?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I customize this multilingual audio automation workflow for different brand voices?

Yes, and you should. Update the localization prompts in the OpenAI “Localization” agent nodes to include your tone rules, forbidden phrases, and terminology. You can also tweak the voice settings generation so certain languages use a calmer pace or different emphasis. Many teams add a small validation checkpoint (an If condition) to flag outputs that don’t meet minimum length or style requirements.

Why is my ElevenLabs connection failing in this workflow?

Usually it’s an invalid or expired API key, so regenerate it in ElevenLabs and update the credential used by the HTTP Request node. If the key is fine, check whether the voice you’re requesting is available on your plan and in your workspace. Rate limits can show up too when you run big language batches, so test with two languages, then scale.

How many audio files can this multilingual audio automation handle?

A lot, as long as your OpenAI and ElevenLabs limits support it and your n8n instance has enough capacity.

Is this multilingual audio automation better than using Zapier or Make?

Often, yes. This workflow relies on looping through language arrays, parsing structured AI output, and adjusting logic when a language needs different handling. n8n handles that kind of branching and iteration more comfortably, and you can self-host for unlimited runs. Zapier or Make can still work if you keep it simple, but the moment you add per-language voice settings and packaging, it gets messy and expensive. If you're on the fence, talk to an automation expert and get a recommendation based on your volume.

Set this up once and your next “we need 8 languages by Friday” request stops being a fire drill. The workflow handles the repetitive production work so you can focus on what you’re actually trying to say.

Need Help Setting This Up?

Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.

Lisa Granqvist

Workflow Automation Expert

Expert in workflow automation and no-code tools.
