YouTube to OpenAI, clean video summaries on demand
You open a YouTube link for “research” and suddenly it’s 47 minutes later. Now you’re stuck rewinding to find that one point you needed, and your notes are a mess. This is exactly where YouTube summary automation earns its keep.
Content marketers feel it when they’re pulling quotes for posts. Founders doing competitor research feel it too. And if you run client strategy calls as a consultant, you’ve probably wasted afternoons rewatching videos just to extract a few bullet points.
This workflow turns any YouTube URL into a clean, consistent summary using Apify + OpenAI. You’ll see how it works, what you need, and where teams usually trip up when setting it up.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: YouTube to OpenAI, clean video summaries on demand
```mermaid
flowchart LR
  subgraph sg0["On form submission Flow"]
    direction LR
    n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/form.svg' width='40' height='40' /></div><br/>On form submission"]
    n1@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Apify", pos: "b", h: 48 }
    n2@{ icon: "mdi:robot", form: "rounded", label: "Summarization Chain", pos: "b", h: 48 }
    n3@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model", pos: "b", h: 48 }
    n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Payload", pos: "b", h: 48 }
    n5@{ icon: "mdi:swap-vertical", form: "rounded", label: "Caption", pos: "b", h: 48 }
    n1 --> n5
    n5 --> n2
    n4 --> n1
    n3 -.-> n2
    n0 --> n4
  end
  %% Styling
  classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
  classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
  classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
  classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
  classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
  classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
  classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
  classDef disabled stroke-dasharray: 5 5,opacity: 0.5
  class n0 trigger
  class n2 ai
  class n3 aiModel
  class n1 decision
  classDef customIcon fill:none,stroke:none
  class n0 customIcon
```
The Problem: YouTube “research” creates messy, unusable notes
Most people don’t really “take notes” from YouTube. They pause, type half a sentence, forget the timestamp, then tell themselves they’ll come back later. Later never happens. The real cost isn’t just the time spent watching. It’s the second round of time spent rewatching because your notes aren’t clean enough to use in a doc, a Slack update, or a client deliverable. And when the same video gets shared internally, the whole cycle repeats. Same content. New wasted hours.
It adds up fast. Here’s where it breaks down in real life.
- One person watches a 30–60 minute video just to extract five takeaways.
- Your notes come out inconsistent, so you still end up editing and rewriting.
- Important details get missed because you’re skimming captions or jumping around.
- Sharing knowledge is painful, since nobody wants to read a wall of unstructured text.
The Solution: Submit a YouTube URL, get a clean summary back
This n8n workflow does one thing extremely well: it converts a YouTube link into a readable summary without you hunting for transcripts or copying chunks into ChatGPT by hand. It starts when you submit a form with the YouTube URL. n8n formats that link into the exact JSON payload Apify expects, then runs Apify’s YouTube Transcript actor to fetch captions (and related metadata). Once captions are pulled out, the workflow feeds them into a summarization chain backed by an OpenAI chat model (gpt-4o-mini). The result is a concise, consistent summary you can paste into a doc, an email, a task, or a content brief.
The workflow begins with a simple form submission. Apify retrieves the transcript so you don’t have to. Then OpenAI turns that transcript into structured takeaways that actually read like a human wrote them.
Example: What This Looks Like
Say you review five YouTube videos a week for content ideas. Manually, you might spend about 45 minutes watching each, plus another 10 minutes capturing and cleaning notes: 275 minutes, or roughly 4.5 hours a week. With this workflow, you drop each URL into the form (maybe 1 minute per video), wait for Apify and OpenAI to process it, then copy the summary where you need it. That’s closer to 10 minutes of hands-on time total, not an entire afternoon.
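The arithmetic behind those estimates, as a quick sketch. The one-minute copy step per video is an assumption added to account for the 10-minute figure; processing time while Apify and OpenAI run is not counted as hands-on time.

```python
# Weekly time estimate from the example above.
VIDEOS_PER_WEEK = 5

# Manual: watch each video, then capture and clean notes.
WATCH_MIN = 45
NOTES_MIN = 10
manual_minutes = VIDEOS_PER_WEEK * (WATCH_MIN + NOTES_MIN)

# Automated: ~1 minute to submit each URL, plus an assumed ~1 minute
# to paste each summary where it's needed.
SUBMIT_MIN = 1
COPY_MIN = 1  # assumption, not stated in the original example
hands_on_minutes = VIDEOS_PER_WEEK * (SUBMIT_MIN + COPY_MIN)

print(f"Manual: {manual_minutes} min ({manual_minutes / 60:.2f} h)")
print(f"Hands-on with the workflow: {hands_on_minutes} min")
print(f"Saved per week: {manual_minutes - hands_on_minutes} min")
```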
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Apify account and API token, for pulling the YouTube transcript.
- OpenAI to generate the summary from captions.
- OpenAI API key (get it from the OpenAI API dashboard).
Skill level: Intermediate. You’ll connect credentials, confirm the Apify actor ID, and tweak a prompt if you want a specific summary style.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A form submission kicks it off. You paste a YouTube URL into the form, submit, and n8n captures that input as the source for the rest of the run.
The workflow prepares an Apify-ready payload. A “set/edit fields” step formats the URL plus any options into the JSON structure Apify’s YouTube Transcript actor expects, so you don’t touch raw request bodies.
Apify retrieves captions, then n8n extracts only what matters. The Apify node runs the actor (ID 1s7eXiaukVuOr4Ueg), and the next step isolates the captions field for cleaner downstream summarization.
OpenAI produces a consistent summary. The summarization chain feeds the transcript to the OpenAI chat model (gpt-4o-mini), and you end up with a concise output you can paste into your workflow, docs, or a knowledge base.
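To make the data flow concrete, here is a minimal Python sketch of the four stages with stubs in place of the network calls. Every function name here is hypothetical (they mirror the node roles, not any n8n API), the payload is trimmed to two keys, and the assumption that Apify returns `captions` as a list of strings is ours; check the actual actor output in your run.

```python
import json

# Hypothetical stand-ins for the n8n nodes; none of these names come from n8n.

def form_trigger(url: str) -> dict:
    # The form trigger emits one item keyed by the field label.
    return {"Youtube URL": url}

def build_request_payload(item: dict) -> dict:
    # Mirrors the set/edit fields step (trimmed to two options for brevity).
    return {"youtube": {"urls": [item["Youtube URL"]], "outputFormat": "captions"}}

def run_apify_actor(body: str) -> dict:
    # Stub: the real node runs Apify's YouTube Transcript actor over the network.
    request = json.loads(body)
    return {"url": request["urls"][0], "captions": ["Welcome back.", "Today we cover pricing."]}

def extract_captions(item: dict) -> dict:
    # Keep only the field the summarizer needs.
    return {"captions": item["captions"]}

def summarize_transcript(item: dict) -> str:
    # Stub: the real chain sends the text to gpt-4o-mini.
    text = " ".join(item["captions"])
    return f"Summary of {len(text.split())} words of captions."

item = form_trigger("https://www.youtube.com/watch?v=VIDEO_ID")
payload = build_request_payload(item)
apify_item = run_apify_actor(json.dumps(payload["youtube"]))
summary = summarize_transcript(extract_captions(apify_item))
print(summary)  # Summary of 6 words of captions.
```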
You can easily modify the summary format to match your voice (short bullets, longer narrative, action items) based on your needs. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Form Trigger
Set up the form that collects the YouTube URL and starts the workflow.
- Add and open Form Entry Trigger.
- Set Form Title to `Youtube Summary`.
- Set Form Description to `Give youtube url and get summary`.
- In Form Fields, add a required field labeled `Youtube URL` with placeholder `Enter youtube url`.

Keep the field label exactly `Youtube URL` because it is referenced later in an expression (`$json['Youtube URL']` in Build Request Payload).
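Why the exact label matters, sketched in Python: the form's output item is keyed by the field label, and the payload step looks the URL up by that name. The `read_url` helper is hypothetical; it fails loudly on a renamed field so the mismatch is obvious, whereas inside n8n the expression may simply resolve to an empty value and pass a blank URL to Apify.

```python
# Hypothetical illustration: form output keys are the field labels, and the
# payload step reads one by name, like $json['Youtube URL'] in n8n.
FIELD_LABEL = "Youtube URL"

def read_url(form_item: dict) -> str:
    try:
        return form_item[FIELD_LABEL]
    except KeyError:
        raise ValueError(
            f"Form output has no {FIELD_LABEL!r} field; got {sorted(form_item)}. "
            "Rename the form field back or update the expression."
        ) from None

ok = read_url({"Youtube URL": "https://youtu.be/VIDEO_ID"})

try:
    read_url({"YouTube link": "https://youtu.be/VIDEO_ID"})  # label was renamed
except ValueError as err:
    print(err)
```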
Step 2: Connect Apify and Build the Request
Prepare the payload for Apify and connect the actor that retrieves captions.
- Open Build Request Payload and set Mode to `raw`.
- Set JSON Output to:

  ```
  ={
    "youtube": {
      "urls": [ "{{ $json['Youtube URL'] }}" ],
      "maxRetries": 8,
      "proxyOptions": {
        "useApifyProxy": true,
        "apifyProxyGroups": [ "[YOUR_ID]" ]
      },
      "outputFormat": "captions",
      "channelNameBoolean": true,
      "channelIDBoolean": true,
      "dateTextBoolean": false,
      "relativeDateTextBoolean": false,
      "datePublishedBoolean": true,
      "viewCountBoolean": false,
      "likesBoolean": false,
      "commentsBoolean": false,
      "keywordsBoolean": false,
      "thumbnailBoolean": false,
      "descriptionBoolean": false
    }
  }
  ```

- Open Run Apify Actor, set Operation to `Run actor`, and confirm the actor ID is `1s7eXiaukVuOr4Ueg`.
- Set Custom Body to `={{ $json.youtube.toJsonString() }}`.
- Credential Required: Connect your apifyApi credentials in Run Apify Actor.

Replace `[YOUR_ID]` in the proxy group list with your Apify proxy group ID.
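A Python mirror of the Build Request Payload output can help when you want to inspect or tweak the actor input outside n8n. This sketch copies the keys and values from the JSON Output above (the `build_request_payload` function itself is hypothetical) and shows that Run Apify Actor sends only the inner `youtube` object as its body, which is what `toJsonString()` produces in the n8n expression.

```python
import json

def build_request_payload(youtube_url: str, proxy_group: str) -> dict:
    """Python mirror of the Build Request Payload JSON Output (same keys and values)."""
    return {
        "youtube": {
            "urls": [youtube_url],
            "maxRetries": 8,
            "proxyOptions": {
                "useApifyProxy": True,
                "apifyProxyGroups": [proxy_group],
            },
            "outputFormat": "captions",
            "channelNameBoolean": True,
            "channelIDBoolean": True,
            "dateTextBoolean": False,
            "relativeDateTextBoolean": False,
            "datePublishedBoolean": True,
            "viewCountBoolean": False,
            "likesBoolean": False,
            "commentsBoolean": False,
            "keywordsBoolean": False,
            "thumbnailBoolean": False,
            "descriptionBoolean": False,
        }
    }

payload = build_request_payload("https://www.youtube.com/watch?v=VIDEO_ID", "[YOUR_ID]")

# Run Apify Actor sends only the inner object, like {{ $json.youtube.toJsonString() }}.
# json.dumps turns Python True/False into JSON true/false.
custom_body = json.dumps(payload["youtube"])
print(custom_body)
```

To request more metadata (for example view counts), flip the matching `...Boolean` flag to `true` in both the n8n JSON Output and, if you use it for testing, this mirror.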
Step 3: Extract Captions and Summarize with AI
Parse captions from Apify and generate the summary using the LLM.
- Open Extract Captions and set Mode to `raw`.
- Set JSON Output to `={ "captions": {{ $json.captions }} }`.
- Open Summarize Transcript and keep its default options unless you need a custom summarization style.
- Open OpenAI Chat Engine and set Model to `gpt-4o-mini`.
- Credential Required: Connect your openAiApi credentials in OpenAI Chat Engine.
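Long transcripts may not fit in a single model call. n8n’s Summarization Chain offers a map-reduce method for this: split the text into chunks, summarize each chunk, then summarize the partial summaries. We believe it is the node’s default method, but check the options in your n8n version. The sketch below shows the pattern with a stub in place of the gpt-4o-mini call; the word-based chunk size and `summarize_stub` are hypothetical (the real node chunks by its own settings).

```python
from typing import Callable, List

def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def map_reduce_summarize(text: str, summarize: Callable[[str], str], max_words: int = 500) -> str:
    # Map: summarize each chunk independently.
    partials = [summarize(chunk) for chunk in chunk_words(text, max_words)]
    # Reduce: one chunk needs no second pass; otherwise summarize the joined partials.
    # (A production version would re-chunk if the joined partials are still too long.)
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))

def summarize_stub(text: str) -> str:
    # Hypothetical stand-in for the model call: keep the first five words.
    return " ".join(text.split()[:5])

transcript = " ".join(f"word{i}" for i in range(1200))
print(len(chunk_words(transcript, 500)))  # 3 chunks: 500 + 500 + 200 words
print(map_reduce_summarize(transcript, summarize_stub))
```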
Step 4: Verify Node Connections and Execution Flow
Confirm the workflow paths so each node receives the correct data in sequence.
- Ensure Form Entry Trigger outputs to Build Request Payload.
- Ensure Build Request Payload outputs to Run Apify Actor.
- Ensure Run Apify Actor outputs to Extract Captions.
- Ensure Extract Captions outputs to Summarize Transcript.
- Verify OpenAI Chat Engine is connected to Summarize Transcript via the AI language model connector.
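The checks above describe a simple linear chain plus one side connection for the language model. Here is a sketch of that wiring as data, with a helper that walks the main path so you can compare it against the editor. The connection type names `main` and `ai_languageModel` follow n8n’s naming as we understand it; treat them, and the `main_path` helper, as assumptions.

```python
# Step 4 wiring as (source, target, connection type) triples.
CONNECTIONS = [
    ("Form Entry Trigger", "Build Request Payload", "main"),
    ("Build Request Payload", "Run Apify Actor", "main"),
    ("Run Apify Actor", "Extract Captions", "main"),
    ("Extract Captions", "Summarize Transcript", "main"),
    ("OpenAI Chat Engine", "Summarize Transcript", "ai_languageModel"),
]

def main_path(connections, start):
    """Follow 'main' links from start and return the node order."""
    nxt = {src: dst for src, dst, kind in connections if kind == "main"}
    path = [start]
    # Stop at the end of the chain, or if a node would repeat (a cycle).
    while path[-1] in nxt and nxt[path[-1]] not in path:
        path.append(nxt[path[-1]])
    return path

path = main_path(CONNECTIONS, "Form Entry Trigger")
llm_links = [(src, dst) for src, dst, kind in CONNECTIONS if kind == "ai_languageModel"]
print(" -> ".join(path))
print(llm_links)
```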
Step 5: Test and Activate Your Workflow
Run a manual test to validate the end-to-end summarization, then activate for production use.
- Click Execute Workflow and submit a sample YouTube URL in Form Entry Trigger.
- Confirm Run Apify Actor returns caption data and Extract Captions outputs a `captions` field.
- Verify Summarize Transcript returns a concise summary from the captions.
- When the test is successful, toggle the workflow to Active for live use.
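If you export node outputs from a test run, the three checks can be written down as a small helper. This is a hypothetical sketch (`check_run` is not part of n8n), assuming each output is available as a dict or string.

```python
def check_run(apify_item: dict, extracted: dict, summary: str) -> list:
    """Return the problems found in one test run; an empty list means it passed."""
    problems = []
    if not apify_item.get("captions"):
        problems.append("Run Apify Actor returned no caption data")
    if "captions" not in extracted:
        problems.append("Extract Captions output has no 'captions' field")
    if not summary.strip():
        problems.append("Summarize Transcript returned an empty summary")
    return problems

print(check_run({"captions": ["Hello"]}, {"captions": ["Hello"]}, "A short summary."))  # []
print(check_run({}, {}, " "))
```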
Common Gotchas
- Apify credentials can expire or lack actor permissions. If things break, check the Apify token and actor access in your Apify account first.
- If you add Wait nodes or call external services that take a while to finish, processing times vary. Increase the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
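For the wait-duration gotcha, one common alternative to guessing a single fixed wait is to retry with a growing delay. This swaps in exponential backoff as a technique (n8n’s Wait node itself uses the duration you set); `poll_until_ready` and its parameters are hypothetical, and the injectable `sleep` exists only so the demo runs instantly.

```python
import time
from typing import Callable, Optional

def poll_until_ready(fetch: Callable[[], Optional[dict]], attempts: int = 5,
                     first_wait_s: float = 2.0, factor: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until it returns a non-empty result, waiting longer each time."""
    wait = first_wait_s
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result:
            return result
        if attempt < attempts:
            sleep(wait)
            wait *= factor
    raise TimeoutError(f"No result after {attempts} attempts")

# Demo: a fake fetch that is empty twice, and a sleep that just records the waits.
responses = iter([None, {}, {"captions": ["ready"]}])
waits = []
result = poll_until_ready(lambda: next(responses), sleep=waits.append)
print(result, waits)  # {'captions': ['ready']} [2.0, 4.0]
```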
Frequently Asked Questions
How long does it take to set up?
About 30 minutes if you already have Apify and OpenAI accounts.
Do I need coding skills?
No. You’ll mostly connect credentials and paste in an API key. The only “technical” part is verifying the Apify actor ID and adjusting the summary prompt if you want a specific style.
Can I use n8n for free?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs (for gpt-4o-mini, summaries are usually just a few cents each) and Apify usage.
Where should I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the summary format?
Yes, and you should. You can adjust the summarization chain settings to change length (tight bullets vs. fuller recap), and you can rewrite the instructions going into the OpenAI Chat Model so it outputs “key takeaways + action items” or “tweet-sized points.” If you want extra metadata, modify the Build Request Payload step to ask Apify’s actor for more fields, then include those in the prompt.
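One way to keep several summary styles on hand is a small set of prompt presets with optional metadata prepended. This is a hypothetical sketch: the preset wording is illustrative rather than the chain’s default prompt, and the metadata keys (`channelName`, `datePublished`) are placeholders, since the exact field names in the Apify output may differ.

```python
from typing import Optional

# Hypothetical prompt presets; wording is illustrative, not n8n's default prompt.
STYLES = {
    "takeaways_actions": "List the 5 key takeaways as bullets, then a short 'Action items' list.",
    "tweets": "Rewrite the main points as 3 to 5 tweet-sized lines (under 280 characters each).",
    "recap": "Write a 150-200 word narrative recap in plain language.",
}

def build_prompt(style: str, transcript: str, metadata: Optional[dict] = None) -> str:
    instructions = STYLES[style]
    header = ""
    if metadata:
        # e.g. channel name and publish date, if you kept those flags on in the payload
        header = "\n".join(f"{key}: {value}" for key, value in metadata.items()) + "\n\n"
    return f"{instructions}\n\n{header}Transcript:\n{transcript}"

prompt = build_prompt("tweets", "Welcome back...", {"channelName": "Acme", "datePublished": "2024-05-01"})
print(prompt)
```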
Why does the Apify step fail?
Usually it’s an API token issue or missing permissions for the YouTube Transcript actor. Regenerate your Apify token, update it in n8n, and confirm the actor ID is still set to 1s7eXiaukVuOr4Ueg. If it only fails on some videos, the transcript may be unavailable (disabled captions, region limits), so the Apify run returns empty captions and the summarizer has nothing to work with.
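The same triage as a hypothetical helper. It assumes you can see an HTTP-style status code and the returned items; the 401/403/404 mapping follows standard HTTP meanings and is an assumption about how the failure surfaces, so check the actual error message the Apify node reports.

```python
from typing import Optional

def diagnose(status_code: int, items: Optional[list]) -> str:
    """Rough triage for a failed run (hypothetical helper, standard HTTP meanings assumed)."""
    if status_code in (401, 403):
        return "token: regenerate the Apify token, update it in n8n, and check actor access"
    if status_code == 404:
        return "actor: confirm the actor ID is 1s7eXiaukVuOr4Ueg"
    if not items or not any(item.get("captions") for item in items):
        return "no captions: the video may have captions disabled or be region-limited"
    return "ok"

print(diagnose(401, None))
print(diagnose(200, [{"captions": []}]))
print(diagnose(200, [{"captions": ["Hello"]}]))  # ok
```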
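How many videos can this handle?
A lot. Self-hosted n8n has no execution cap, and on n8n Cloud the limit depends on your plan. In practice, your Apify and OpenAI usage budgets are the main constraint.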
Is n8n a better fit than Zapier or Make for this?
Often, yes, because this is more than a simple “URL in, text out” zap. n8n handles multi-step logic cleanly, works well with community nodes (especially if you self-host), and doesn’t punish you for branching or running slightly heavier workflows. Zapier and Make can still work if you’re doing something lightweight, but transcript retrieval plus summarization tends to get fiddly fast. If you want the quickest path to “set it and forget it,” n8n is usually the calmer option. Talk to an automation expert if you want help choosing.
Once this is running, YouTube stops being a time sink and starts acting like a searchable knowledge source. Set it up, use it weekly, and keep your brain for the work that actually needs it.