FileFlows + OpenAI Whisper, transcripts emailed via Gmail
You finally get the audio file. Then you remember the file size limit, the upload failures, and the awkward “part 1 / part 2 / part 3” transcript stitching that always breaks at the worst moment.
This problem hits podcasters hardest, but marketers repurposing interviews and ops folks capturing internal calls feel it too. With this Whisper-to-email automation, the outcome is simple: long recordings turn into one clean transcript in your inbox, without babysitting the process.
Below you’ll see how the workflow handles splitting, transcription, and delivery, plus what you need to run it reliably at “real-world” audio lengths.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: FileFlows + OpenAI Whisper, transcripts emailed via Gmail
[Workflow diagram] GET Form → Configuration → Make 4MiB Chunks → Loop Over Chunks (each chunk: Chunk → Upload Chunk → back to the loop) → Result → Filter temporary files → If succeed → Split audio file → Wait → Split Audio → Loop Over Segments (each segment: Segment → OpenAI → Rate Limit Delay → back to the loop) → Result transcription → Merge transcription → Convert to File → Send Email with Transcription. Error branches: If succeed → Send Error1; OpenAI → Send Error.
The Problem: Long audio breaks “simple” transcription
Whisper is great, until your recording is longer than a quick clip. A one-hour MP3 can easily exceed the 25 MB upload limit, so you end up hunting for tools to split the file, guessing chunk lengths, and hoping nothing drifts out of order. If you’re doing this for clients or a team, it gets worse: people email you huge attachments, you re-upload the wrong version, and suddenly a “fast transcript” task turns into an afternoon of retries and cleanup. It’s not hard work. It’s fragile work.
The friction compounds. Here’s where it breaks down.
- A single recording can require multiple manual splits just to fit the API limit.
- Upload failures force you to restart, and you often don’t notice until much later.
- Transcripts arrive as separate chunks, so you spend time stitching and reformatting.
- Delivery becomes another job: copying text, saving files, and emailing the right person.
The Solution: Split, transcribe, merge, and email automatically
This workflow turns “long audio transcription” into a simple intake and an automatic delivery. It starts when someone uploads an MP3 via a web form and includes the email address where the transcript should go. n8n then prepares the file for safe handling by splitting it into small 4 MiB upload parts and sending them to FileFlows. FileFlows, using FFmpeg, segments the audio into 15-minute chunks so each piece stays under Whisper’s 25 MB size limit at typical MP3 bitrates (about 14 MB at 128 kbps). Each segment is transcribed through the OpenAI Whisper API (French by default, but you can change the language), and then n8n merges the text back into one coherent transcript. Finally, Gmail sends the finished transcript automatically, or sends a clear error email if something fails.
The workflow starts with a form submission in n8n. FileFlows does the heavy lifting for audio splitting, so Whisper only sees safe-sized segments. When transcription is complete, n8n assembles a single text file and emails it out, which means the “last mile” is handled too.
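Whether a 15-minute segment fits under the limit depends on the MP3 bitrate. The sketch below works through that arithmetic. It assumes constant-bitrate audio and reads "25 MB" as 25 × 1024 × 1024 bytes; both are assumptions, and the helper names (`estimateSegmentBytes`, `fitsWhisperLimit`, `maxSegmentMinutes`) are hypothetical, not part of the workflow.

```javascript
// Rough size estimate for a constant-bitrate MP3 segment.
// Assumption: size ≈ bitrate × duration; real files add a little header overhead.
const WHISPER_LIMIT_BYTES = 25 * 1024 * 1024; // assumed reading of "25 MB"

function estimateSegmentBytes(bitrateKbps, minutes) {
  const bytesPerSecond = (bitrateKbps * 1000) / 8;
  return bytesPerSecond * minutes * 60;
}

function fitsWhisperLimit(bitrateKbps, minutes) {
  return estimateSegmentBytes(bitrateKbps, minutes) <= WHISPER_LIMIT_BYTES;
}

// Longest whole-minute segment that still fits at a given bitrate.
function maxSegmentMinutes(bitrateKbps) {
  const bytesPerMinute = estimateSegmentBytes(bitrateKbps, 1);
  return Math.floor(WHISPER_LIMIT_BYTES / bytesPerMinute);
}

console.log(estimateSegmentBytes(128, 15)); // 14400000
console.log(fitsWhisperLimit(128, 15));     // true
console.log(fitsWhisperLimit(320, 15));     // false
console.log(maxSegmentMinutes(320));        // 10
```

By this estimate, 15-minute segments are fine at 128 kbps, but a 320 kbps source would need segments of roughly 10 minutes or less.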
Example: What This Looks Like
Say you transcribe two 60-minute interviews each week for a podcast. Manually, you might spend about 30 minutes per file splitting, uploading parts, waiting, and stitching text back together, so that’s roughly 1 hour of admin work weekly before you even edit the content. With this workflow, the “human time” is closer to 5 minutes per file (upload + email), and processing runs in the background. You get the transcript in about 10–15 minutes per hour of audio, delivered automatically to the right inbox.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- FileFlows for audio splitting and orchestration.
- OpenAI Whisper API to transcribe each audio segment.
- Gmail account to email transcripts and error notices.
- OpenAI API key (get it from your OpenAI dashboard).
Skill level: Intermediate. You’ll connect credentials, set a few URLs/paths, and confirm FileFlows can reach your storage and n8n.
How It Works
A user submits a form with an MP3 and an email address. That form trigger kicks off the run and stores basic settings (like the target language and the callback details for FileFlows).
The MP3 is prepared for upload and sent to FileFlows in safe-sized parts. n8n splits the binary file into 4 MiB chunks, loops over them in batches, and uploads each part via HTTP so huge files don’t fail halfway through.
FileFlows splits the audio, and n8n transcribes each segment with Whisper. After FileFlows creates 15-minute segments, n8n waits for the callback, extracts the segment list, then loops through it. Whisper transcribes each piece, and a throttle wait helps avoid rate spikes when you’re processing a lot.
Everything is merged, turned into a text file, and emailed via Gmail. When all segments are done, n8n combines the transcript text in order, converts it into a downloadable file, and sends the final email. If splitting or transcription fails, the workflow sends an error notice instead of leaving you guessing.
You can easily modify the language and segment duration to match your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Form Trigger
Set up the public form that collects the audio file and recipient email address to start the workflow.
- Add the Incoming Form Capture node as the trigger.
- Set Form Title to `Audio Transcription`.
- Set Form Description to `Select an audio file to transcribe and an email address to receive the result`.
- In Form Fields, add a file field labeled `file` (required) that accepts `.mp3`, and an email field labeled `email` (required).
- Confirm the response message in Respond With Options is set to `Your file has been received; an email will be sent to you upon completion of transcription or in case of error.`
Note: the chunking code reads the upload from the field named `file`. If you rename the field, update the code accordingly.
Step 2: Configure Workflow Constants and File Chunking
Define the chunk size and FileFlows API settings, then split the incoming audio into 4MiB chunks for upload.
- Open Set Workflow Config and set chunk_size to `{{ 4 * 1024 * 1024 }}`.
- Set fileflows_url to your FileFlows base URL, e.g., `http://0.0.0.0:5000`.
- Set flowUid to your FileFlows flow ID, replacing `[YOUR_ID]`.
- In Create 4MiB Segments, keep the provided JavaScript to read the binary upload and generate chunk metadata and binary parts.
- Ensure Iterate Chunk Batches is connected after Create 4MiB Segments to control chunk uploads.
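The workflow's own Code node is not reproduced here. As a rough illustration of the chunking idea, here is a minimal sketch in plain Node.js, assuming the upload is already available as a `Buffer`. The helper name `makeChunks` and the 1-based `chunkNumber` are assumptions; the field names `fileName`, `chunkNumber`, and `totalChunks` mirror the body parameters used in Step 3.

```javascript
// Split a Buffer into fixed-size parts plus the metadata each upload needs.
const CHUNK_SIZE = 4 * 1024 * 1024; // 4 MiB, the same value as chunk_size

function makeChunks(buffer, fileName, chunkSize = CHUNK_SIZE) {
  const totalChunks = Math.ceil(buffer.length / chunkSize);
  const chunks = [];
  for (let i = 0; i < totalChunks; i++) {
    const start = i * chunkSize;
    const end = Math.min(start + chunkSize, buffer.length);
    chunks.push({
      fileName,
      chunkNumber: i + 1, // assumption: numbering starts at 1
      totalChunks,
      data: buffer.subarray(start, end), // a view on the original, no copy
    });
  }
  return chunks;
}

// Example: a 10 MiB buffer becomes 4 MiB + 4 MiB + 2 MiB.
const parts = makeChunks(Buffer.alloc(10 * 1024 * 1024), "episode.mp3");
console.log(parts.map((p) => p.data.length)); // [ 4194304, 4194304, 2097152 ]
```

Only the last part is shorter than the chunk size, and every part carries `totalChunks`, which lets a receiver tell when the upload is complete.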
Step 3: Connect FileFlows Upload and Split Requests
Upload each chunk to FileFlows, validate the response, and trigger the server-side split operation.
- In Upload Segment Part, set URL to `={{ $('Set Workflow Config').item.json.fileflows_url }}/api/library-file/upload` and Method to `POST`.
- Set Content Type to `multipart-form-data` and map body parameters: fileName to `{{$json["fileName"]}}`, chunkNumber to `{{$json["chunkNumber"]}}`, totalChunks to `{{$json["totalChunks"]}}`, and file to binary field `chunk`.
- Use Exclude Temp Records to filter out temporary items with condition `{{ $json.data }}` notEndsWith `.temp`.
- In Success Check, confirm the condition `{{ $json.data }}` exists to route valid uploads to Initiate Audio Split.
- Configure Initiate Audio Split with URL `={{ $('Set Workflow Config').item.json.fileflows_url }}/api/library-file/manually-add` and JSON body `{ "FlowUid": "{{ $('Set Workflow Config').first().json.flowUid }}", "Files": [ "{{ $json.data }}" ], "CustomVariables": { "callbackUrl": "{{$execution.resumeUrl}}" } }`.
- Keep Pause for Callback set to Resume `webhook`, HTTP Method `POST`, and Resume Amount `30` minutes to wait for FileFlows to send back split audio.
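The filter and the split request from the steps above can be sketched in plain JavaScript. This is an illustration under assumptions, not the workflow's actual node configuration: the response shape `{ data: "<path>" }` is inferred from the `{{ $json.data }}` expressions, the helper names are hypothetical, and the sample paths and callback URL are made up.

```javascript
// Keep only finished uploads: responses whose `data` path does not end
// in ".temp" (mirrors the Exclude Temp Records condition).
function finalUploads(results) {
  return results.filter(
    (r) => typeof r.data === "string" && !r.data.endsWith(".temp")
  );
}

// Build the JSON body for /api/library-file/manually-add.
function buildSplitRequest(flowUid, filePath, callbackUrl) {
  return {
    FlowUid: flowUid,
    Files: [filePath],
    CustomVariables: { callbackUrl },
  };
}

// Hypothetical sample data, for illustration only.
const results = [
  { data: "/uploads/episode.mp3.temp" },
  { data: "/uploads/episode.mp3" },
];
const [done] = finalUploads(results);
const body = buildSplitRequest(
  "[YOUR_ID]",
  done.data,
  "https://n8n.example/webhook-waiting/123"
);
console.log(JSON.stringify(body, null, 2));
```

Building the body as an object and serializing it with `JSON.stringify` avoids quoting mistakes that are easy to make when a path or URL is pasted into a hand-written JSON string.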
Step 4: Set Up Audio Extraction and Transcription Loop
Transform the callback binaries into items, loop through each audio part, and send them to OpenAI for transcription.
- In Extract Audio Parts, keep the provided JavaScript to convert all returned binaries into individual items with a binary field named `Audio`.
- Ensure Process Segment Loop follows Extract Audio Parts to iterate each segment.
- Set up OpenAI Transcribe with Resource `audio`, Operation `transcribe`, and Binary Property Name `Audio`. Keep Language set to `fr` under options if you want French transcription.
- Credential Required: Connect your openAiApi credentials in OpenAI Transcribe.
- Use Throttle Pause after OpenAI Transcribe to control request pacing before looping back to Process Segment Loop.
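To make the extraction step concrete, here is a minimal sketch of turning one item that holds several binaries into one item per binary, each exposing its audio under `Audio`. It is not the workflow's actual code: the item shape follows n8n's `json`/`binary` convention, the binary key names are invented, and the numeric sort is a precaution added here so segments stay in order.

```javascript
// Turn one item holding many binaries into one item per binary,
// each exposing its audio under the binary field "Audio".
function extractAudioParts(item) {
  const keys = Object.keys(item.binary || {}).sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true }) // file2 before file10
  );
  return keys.map((key, index) => ({
    json: { segment: index + 1, sourceKey: key },
    binary: { Audio: item.binary[key] },
  }));
}

// Hypothetical callback item; key and file names are made up.
const callbackItem = {
  json: {},
  binary: {
    file10: { fileName: "part_010.mp3" },
    file2: { fileName: "part_002.mp3" },
    file1: { fileName: "part_001.mp3" },
  },
};
const segments = extractAudioParts(callbackItem);
console.log(segments.map((s) => s.binary.Audio.fileName));
// [ 'part_001.mp3', 'part_002.mp3', 'part_010.mp3' ]
```

Sorting before the loop matters because the merged transcript is only coherent if segments are transcribed and joined in their original order.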
Step 5: Combine Transcripts and Deliver Results
Aggregate all transcription pieces into a single text file and email it to the requester.
- In Combine Transcripts, keep the JavaScript that concatenates `item.json.text` into a single `transcription` field.
- Configure Build Text File with Operation `toText` and Source Property `transcription`.
- Set Build Text File options to File Name `transcription.txt` and Encoding `utf8`.
- In Email Transcript Delivery, set Send To to `={{ $('Incoming Form Capture').first().json.email }}`, Subject to `Your transcription is ready`, and include the provided message.
- Credential Required: Connect your gmailOAuth2 credentials in Email Transcript Delivery.
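The merge step amounts to joining each segment's `text` in order. A minimal sketch, assuming items shaped like `{ json: { text } }`; the helper name `combineTranscripts`, the newline separator, and the trimming of empty segments are assumptions, not confirmed details of the workflow's code.

```javascript
// Join per-segment transcripts into one `transcription` string.
function combineTranscripts(items, separator = "\n") {
  const transcription = items
    .map((item) => (item.json && item.json.text ? item.json.text.trim() : ""))
    .filter((text) => text.length > 0) // skip silent or empty segments
    .join(separator);
  return { json: { transcription } };
}

const merged = combineTranscripts([
  { json: { text: "Bonjour et bienvenue. " } },
  { json: { text: "" } },
  { json: { text: " Merci d'avoir écouté." } },
]);
console.log(merged.json.transcription);
// Bonjour et bienvenue.
// Merci d'avoir écouté.
```

The result carries a single `transcription` field, which is the property the text-file conversion reads from.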
Step 6: Add Error Handling for Split and Transcription Failures
Ensure users receive feedback when FileFlows or transcription fails.
- From Success Check, verify the false branch routes to Split Error Email.
- In Split Error Email, keep Send To set to `={{ $('Incoming Form Capture').first().json.email }}` and confirm the error message content.
- From OpenAI Transcribe, ensure the error output connects to Email Error Notice to notify transcription issues.
- Credential Required: Connect your gmailOAuth2 credentials in both Email Error Notice and Split Error Email.
Step 7: Test and Activate Your Workflow
Validate the full pipeline with a test submission before enabling it in production.
- Click Execute Workflow and submit the Incoming Form Capture form with a small `.mp3` file and a valid email.
- Verify that Upload Segment Part, Initiate Audio Split, and Pause for Callback execute without errors.
- Confirm OpenAI Transcribe produces `text` outputs and Combine Transcripts creates the `transcription` field.
- Check that Email Transcript Delivery sends an email with `transcription.txt` attached to the provided address.
- When testing is successful, toggle the workflow to Active to accept live form submissions.
Common Gotchas
- Gmail credentials can expire or need specific permissions. If things break, check your connected Google account status in n8n credentials first.
- FileFlows processing time varies with file length and server load, so the Wait node can resume before the callback arrives. If downstream nodes fail on empty responses, raise the wait duration (30 minutes in this setup).
- OpenAI prompts and defaults matter more than people expect. The Whisper node is configured for French by default here, so confirm language settings early or you’ll be correcting mistakes after the fact.
Frequently Asked Questions
How long does setup take?
About 45 minutes if FileFlows and Gmail are already working.
Do I need coding skills?
No. You’ll mainly paste credentials and adjust a few settings in the form, FileFlows endpoint, and the Whisper node.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI Whisper API costs at $0.006 per minute (so a 1-hour recording is about $0.36).
Where can I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I change the language or segment length?
Yes, but change two places. Update the language setting in the OpenAI Transcribe node, then adjust the FileFlows split settings so your segments stay under the 25 MB limit. Common tweaks include switching French to English, shortening segments for noisy audio, and changing the email template to include speaker labels or a summary link.
Why is the FileFlows connection failing?
Usually it’s network reachability or a wrong endpoint URL between n8n and FileFlows. Confirm n8n can reach the FileFlows host from its network, then re-check the HTTP Request node settings and any required headers. If FileFlows is running in Docker, port mapping and internal DNS names are common culprits. Also make sure the storage path FileFlows uses is writable, or the split job can “succeed” but produce nothing.
How many files can this workflow handle?
On n8n Cloud Starter, you’re limited by monthly executions, while self-hosting is mainly limited by your server and how fast FileFlows can process jobs. A practical approach is to start with a few files per day, then increase concurrency once you’re confident in your queueing and wait times. If you expect bursts (like 20 uploads after an event), consider adding longer waits and slightly more throttling so Whisper and FileFlows don’t get overwhelmed.
Is n8n a better fit than Zapier or Make for this?
For long-audio transcription, yes. Zapier and Make struggle once you need chunked uploads, callbacks, looping over segments, and reliable error emails in one flow. n8n handles branching and loops cleanly, and you can self-host to avoid per-task pricing surprises when volume grows. The tradeoff is setup: you’ll spend a bit more time connecting FileFlows and testing the wait/callback logic. If you want someone to sanity-check it, talk to an automation expert.
Once this is running, long-audio transcription stops being a recurring chore. You upload, you wait a bit, you get the transcript, and you move on.