OpenAI + Pinecone: answer PDF questions in chat
Your PDFs are full of answers. They’re just trapped. Someone asks a simple question, and you end up digging through folders, skimming a 40-page doc, then pasting a “best guess” into chat.
This is where an OpenAI + Pinecone RAG automation earns its keep. The pain hits ops leads and support managers first, but marketing teams maintaining a “living” playbook feel it too. You get grounded answers sourced from your own documents, without turning every question into a mini research project.
Below, you’ll see exactly how this n8n workflow turns uploaded PDFs into a searchable knowledge base, then uses that knowledge to answer questions in chat. Practical outcomes. Clear setup requirements. The parts you’ll probably want to tweak.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: OpenAI + Pinecone: answer PDF questions in chat
flowchart LR
subgraph sg0["On form submission Flow"]
direction LR
n0["<div style='background:#f5f5f5;padding:10px;border-radius:8px;display:inline-block;border:1px solid #e0e0e0'><img src='https://flowpast.com/wp-content/uploads/n8n-workflow-icons/form.svg' width='40' height='40' /></div><br/>On form submission"]
n1@{ icon: "mdi:cube-outline", form: "rounded", label: "Pinecone Vector Store", pos: "b", h: 48 }
n2@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings OpenAI", pos: "b", h: 48 }
n3@{ icon: "mdi:robot", form: "rounded", label: "Default Data Loader", pos: "b", h: 48 }
n4@{ icon: "mdi:robot", form: "rounded", label: "Recursive Character Text Spl..", pos: "b", h: 48 }
n5@{ icon: "mdi:robot", form: "rounded", label: "AI Agent", pos: "b", h: 48 }
n6@{ icon: "mdi:play-circle", form: "rounded", label: "When chat message received", pos: "b", h: 48 }
n7@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model", pos: "b", h: 48 }
n8@{ icon: "mdi:memory", form: "rounded", label: "Simple Memory", pos: "b", h: 48 }
n9@{ icon: "mdi:cube-outline", form: "rounded", label: "VectorDB", pos: "b", h: 48 }
n10@{ icon: "mdi:robot", form: "rounded", label: "Reranker Cohere", pos: "b", h: 48 }
n9 -.-> n5
n8 -.-> n5
n10 --> n9
n2 -.-> n1
n2 -.-> n9
n7 -.-> n5
n0 --> n1
n3 -.-> n1
n6 --> n5
n4 -.-> n3
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0,n6 trigger
class n1,n2,n3,n4,n5,n8,n9,n10 ai
class n7 aiModel
classDef customIcon fill:none,stroke:none
class n0 customIcon
The Problem: PDF Knowledge Gets Lost in Chat
Most teams already have the documentation. It’s just scattered across Google Drive folders, old onboarding packets, and “final_v7” PDFs that nobody wants to open. So when a question comes in (“What’s our refund policy for annual plans?”), you either interrupt the one person who knows, or you answer from memory and hope it’s right. That’s risky. It also creates a weird kind of drift, where the same question gets different answers depending on who’s online and how rushed they are.
It adds up fast. Here’s where it breaks down in real life.
- People spend about 15 minutes per question hunting for the right paragraph, then rewriting it for chat.
- Your “source of truth” might be accurate, but nobody reads it when answers are faster to improvise.
- Small wording differences create policy confusion, which means rework, escalations, and awkward backtracking later.
- New hires ask the same basics repeatedly because the docs are not searchable in the moment they need them.
The Solution: Upload PDFs Once, Get Grounded Chat Answers
This workflow gives you a ready-to-use Retrieval-Augmented Generation (RAG) system inside n8n. First, you upload a PDF (using an n8n form trigger in this template). n8n reads the file, breaks the text into smaller chunks, then turns those chunks into embeddings using OpenAI. Those embeddings are stored in Pinecone, which becomes your searchable knowledge base. Later, when someone asks a question via the chat trigger, an AI Agent searches Pinecone for the most relevant passages, refines them with Cohere’s reranker, and then uses an OpenAI chat model (gpt-4.1 in the template) to write an answer grounded in your documents.
The workflow starts with ingestion (PDF in, vectors stored). Then it switches to retrieval (question in, best context pulled). Finally, it generates a clear reply you can use in chat, with conversation history preserved so follow-ups still make sense.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| PDF ingestion: every upload is chunked, embedded with OpenAI, and indexed in Pinecone | A searchable knowledge base built from documents you already have |
| Retrieval, Cohere reranking, and answer generation for every chat question | Grounded, consistent answers in under a minute instead of 10–15 minute manual lookups |
| Conversation memory across follow-up questions | Replies that stay coherent without anyone re-explaining context |
Example: What This Looks Like
Say your team answers 25 internal questions a week from PDF docs (pricing rules, SOPs, partner terms). Manually, even a “quick” lookup is maybe 10 minutes between searching Drive, opening the right file, and turning it into a chat-friendly reply, which is about 4 hours weekly. With this workflow, the “work” is uploading new PDFs when they change (often a few minutes), then asking in chat and getting an answer back in under a minute. That’s a chunk of time back, every week, and the replies stop depending on who happens to remember what.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- OpenAI for embeddings and final answers
- Pinecone to store and search your vectors
- Cohere API key (get it from your Cohere dashboard)
Skill level: Intermediate. You’ll connect a few accounts, add API keys, and be comfortable testing runs in n8n.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A PDF gets added. In the template, that happens through an n8n Form Trigger where you upload a .pdf. The workflow is built so you can swap this later for a Google Drive Trigger if you want true “drop it in a folder and forget it” ingestion.
The document is prepared for search. n8n loads the PDF content, then splits it into smaller segments using a recursive text splitter. This matters because LLMs answer better with tight, relevant excerpts instead of a whole document dump.
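The splitting idea can be sketched in plain Python. This is a simplified, hypothetical stand-in for n8n’s Recursive Character Text Splitter node (not its actual implementation): try progressively finer separators until every chunk fits a size budget.

```python
def recursive_split(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks no longer than max_len, preferring natural
    boundaries: paragraphs first, then lines, sentences, and words."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            # Greedily merge adjacent parts back together up to max_len
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= max_len:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Recurse on any chunk that is still too long
            result = []
            for chunk in chunks:
                result.extend(recursive_split(chunk, max_len, separators))
            return result
    # No separator helped: hard-split by character count
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

The “recursive” part is what keeps chunks coherent: a paragraph boundary is tried before a mid-sentence cut, so retrieved excerpts read like complete thoughts.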
Embeddings are created and indexed. OpenAI converts each chunk into an embedding, and Pinecone stores those vectors in your chosen index. Once that’s done, your PDF is effectively “queryable” even though it started as a static file.
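“Queryable” here just means nearest-neighbor search over vectors. A toy illustration with hand-made 3-dimensional vectors (real OpenAI embeddings have 1,536+ dimensions, and Pinecone does this lookup at scale; the vectors and texts below are invented for the example):

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "index": chunk text -> embedding vector.
store = {
    "Annual plans are refundable within 30 days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.": [0.0, 0.2, 0.9],
}

# Pretend this vector is the embedding of "What is the refund policy?"
query_vec = [0.8, 0.2, 0.1]

# Retrieval = find the stored vector nearest the query vector.
best_chunk = max(store, key=lambda chunk: cosine(query_vec, store[chunk]))
```

Because similar meanings land near each other in embedding space, the refund sentence wins even though the query shares almost no exact words with it in a real embedding model.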
Questions get answered in chat. The Chat Trigger receives a message, the AI Agent searches Pinecone for relevant context, Cohere reranks it, and the OpenAI chat model generates the response. A memory window keeps the conversation coherent when someone asks follow-ups.
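The memory window is conceptually simple: keep only the last few exchanges and replay them as context. A minimal sketch of the idea behind n8n’s Simple Memory node (a hypothetical class, not the node’s actual code):

```python
from collections import deque

class BufferWindowMemory:
    """Keep only the last `window` user/assistant exchanges, so follow-up
    questions get context without the prompt growing without bound."""

    def __init__(self, window=5):
        self.turns = deque(maxlen=window)  # oldest turns fall off automatically

    def add(self, user_msg, ai_msg):
        self.turns.append((user_msg, ai_msg))

    def as_messages(self):
        """Flatten remembered turns into chat-style role/content messages."""
        messages = []
        for user_msg, ai_msg in self.turns:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": ai_msg})
        return messages
```

The window size is the trade-off knob: larger windows keep longer conversations coherent but spend more tokens on every request.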
You can easily modify the ingestion trigger to watch Google Drive instead of using a form, so your knowledge base updates automatically when new PDFs land. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Form Upload Trigger
Set up the workflow entry point that accepts PDF uploads for indexing.
- Add the Form Upload Trigger node and keep it as the workflow trigger.
- Set Form Title to `Upload RAG PDF`.
- Set Form Description to `Upload RAG PDF`.
- In Form Fields, add a file field labeled `File`, enable Required, and set Accept File Types to `.pdf`.
- Confirm the execution flow shows Form Upload Trigger → Pinecone Index Writer.

Tip: if uploads are not restricted to `.pdf`, your loader may fail to parse binary content correctly.

Step 2: Connect Pinecone for Indexing
Configure the vector database insert path for uploaded documents.
- Add Pinecone Index Writer and set Mode to `insert`.
- Select the Pinecone Index value `n8n`.
- Credential Required: Connect your `pineconeApi` credentials.
- Ensure Standard Data Loader connects to Pinecone Index Writer via the ai_document connection.
Step 3: Set Up Document Loading, Splitting, and Embeddings
Prepare the uploaded PDF content for vectorization before indexing.
- Add Standard Data Loader and set Data Type to `binary`.
- Set Text Splitting Mode in Standard Data Loader to `custom`.
- Add Recursive Text Segmenter and connect it to Standard Data Loader via the ai_textSplitter link.
- Add OpenAI Embedding Engine and connect it to both Pinecone Index Writer and Vector Search Tool via ai_embedding.
- Credential Required: Connect your `openAiApi` credentials. This embedding model is used by both Pinecone Index Writer and Vector Search Tool.
Step 4: Configure the Conversational Agent and Chat Trigger
Set up the live chat interface and the AI agent that will answer user queries.
- Add Chat Message Trigger and connect it to Conversational AI Agent.
- In Conversational AI Agent, set the System Message to: “Only answer based on the data available in the ‘VectorDB’ tool. If the data isn’t there, just say you don’t know.” (The template ships this prompt in Indonesian: “Hanya jawab berdasarkan data yang ada di tools ‘VectorDB’. Kalau data di situ gak ada, jawab saja kamu tidak tahu.”)
- Add OpenAI Dialogue Model as the language model for Conversational AI Agent.
- Set the Model in OpenAI Dialogue Model to `gpt-4.1`.
- Credential Required: Connect your `openAiApi` credentials for OpenAI Dialogue Model.
- Add Buffer Memory Window as the memory tool for Conversational AI Agent via ai_memory. This sub-node uses the agent’s configuration.
Step 5: Configure Retrieval and Reranking Tools
Enable the agent to query the vector store and improve relevance with reranking.
- Add Vector Search Tool and set Mode to `retrieve-as-tool`.
- Set Top K to `20` and enable Use Reranker.
- Set Tool Description to: “Retrieve data from the vector database for the knowledge base.” (In the template this is in Indonesian: “Ambil data dari vector database untuk knowledgebase.”)
- Select the Pinecone Index value `n8n`.
- Credential Required: Connect your `pineconeApi` credentials for Vector Search Tool.
- Add Cohere Relevance Reranker and connect it to Vector Search Tool via ai_reranker.
- Credential Required: Connect your `cohereApi` credentials for Cohere Relevance Reranker.
- Connect Vector Search Tool to Conversational AI Agent via ai_tool so the agent can query the knowledge base.
Step 6: Test and Activate Your Workflow
Validate both the ingestion and chat retrieval paths before turning the workflow on.
- Click Execute Workflow and submit the Form Upload Trigger with a test PDF.
- Confirm the run reaches Pinecone Index Writer without errors and the PDF is indexed.
- Open the chat UI tied to Chat Message Trigger and send a query that should match the uploaded content.
- Verify Conversational AI Agent responds using data from Vector Search Tool, and that reranking from Cohere Relevance Reranker improves relevance.
- When satisfied, toggle the workflow Active for production use.
Common Gotchas
- Pinecone credentials can expire or your API key may not have access to the right project. If retrieval suddenly returns nothing, check your Pinecone index name and environment settings first.
- If you add Wait nodes or call external services, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
How long does setup take?
About an hour if you already have API keys and a Pinecone index ready.

Do I need coding skills?
No. You’ll mostly be connecting accounts and pasting API keys into the right nodes.

Is it free to run?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI, Pinecone, and Cohere usage costs, which can be a few dollars a month at small volumes and more as your document library and chat volume grow.

Where should I host n8n?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

Can I ingest PDFs from Google Drive instead of the form?
Yes, and it’s a common upgrade. Replace the Form Upload Trigger with a Google Drive Trigger that watches a folder, then pass the PDF file into the Standard Data Loader (Binary Input Loader) before the Recursive Text Segmenter. You can also tune retrieval by changing the “Top K” setting in the Vector Search Tool to pull more or fewer chunks per question.

Why is my Pinecone connection failing or returning nothing?
Usually it’s a wrong environment, index name mismatch, or an API key tied to a different Pinecone project. Regenerate the key in Pinecone, then update it in both the Pinecone Index Writer and the Vector Search Tool nodes. If ingestion works but chat retrieval fails, check that both stages are pointing at the same index and namespace. Also, confirm your index has data by running a quick query from Pinecone’s console.

How many PDFs can this handle?
A lot, as long as your Pinecone index and budget scale with it. On n8n Cloud, you’re mainly limited by your execution quota; on self-hosting, it’s more about your server resources and how fast you want ingestion to run. Practically, teams start with a handful of core PDFs, then expand once retrieval quality looks good.

Is n8n a better fit than Zapier or Make for this?
Often, yes, because this is not a simple two-step integration. You’re doing ingestion, chunking, vector indexing, retrieval, reranking, and multi-turn memory, which is where n8n (and its LangChain nodes) feels more natural. Zapier or Make can work for basic “send a prompt, get a response” flows, but RAG gets expensive and brittle there pretty quickly. If you’re unsure, talk to an automation expert and describe your doc volume and where you want the answers delivered.
Once this is running, your PDFs stop being a graveyard of “useful someday” documents. They become something your team can actually use, right when the question shows up.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.