AWS S3 + Slack: instant answers from your PDFs
Your team keeps asking the same questions, and the answers are hiding in PDFs. Someone downloads a file from S3, searches for the right page, screenshots a paragraph, then tries to explain it in Slack. It’s slow, and it’s never consistent.
Marketing managers feel it when proposals and case studies live in “final_v7.pdf”. Ops leads get pulled into repeat questions. And client-facing agency owners lose momentum when the Slack thread turns into a scavenger hunt. This S3 Slack automation turns your PDFs into a searchable knowledge base, so answers show up with sources.
You’ll set up a two-part workflow: first it ingests PDFs from AWS S3 into Qdrant (with OpenAI embeddings), then it adds a chat entry point so Slack questions get grounded answers pulled from those documents.
How This Automation Works
Here’s the complete workflow you’ll be setting up:
n8n Workflow Template: AWS S3 + Slack: instant answers from your PDFs
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n1@{ icon: "mdi:swap-vertical", form: "rounded", label: "Loop Over Items", pos: "b", h: 48 }
n2@{ icon: "mdi:cog", form: "rounded", label: "Extract from File", pos: "b", h: 48 }
n3@{ icon: "mdi:cube-outline", form: "rounded", label: "Qdrant Vector Store", pos: "b", h: 48 }
n4@{ icon: "mdi:cog", form: "rounded", label: "Download Files from AWS", pos: "b", h: 48 }
n5@{ icon: "mdi:cog", form: "rounded", label: "Get Files from S3", pos: "b", h: 48 }
n6@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings OpenAI", pos: "b", h: 48 }
n7@{ icon: "mdi:robot", form: "rounded", label: "Default Data Loader", pos: "b", h: 48 }
n8@{ icon: "mdi:robot", form: "rounded", label: "Recursive Character Text Spl..", pos: "b", h: 48 }
n1 --> n4
n6 -.-> n3
n2 --> n3
n5 --> n1
n7 -.-> n3
n3 --> n1
n4 --> n2
n8 -.-> n7
n0 --> n5
end
subgraph sg1["When chat message received Flow"]
direction LR
n9@{ icon: "mdi:play-circle", form: "rounded", label: "When chat message received", pos: "b", h: 48 }
n10@{ icon: "mdi:robot", form: "rounded", label: "AI Agent", pos: "b", h: 48 }
n11@{ icon: "mdi:brain", form: "rounded", label: "OpenAI Chat Model", pos: "b", h: 48 }
n12@{ icon: "mdi:cube-outline", form: "rounded", label: "Qdrant Vector Store1", pos: "b", h: 48 }
n13@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings OpenAI1", pos: "b", h: 48 }
n11 -.-> n10
n13 -.-> n12
n12 -.-> n10
n9 --> n10
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n0,n9 trigger
class n7,n8,n10 ai
class n11 aiModel
class n3,n12 database
class n6,n13 ai
Why This Matters: Slack Questions Keep Turning Into PDF Archaeology
PDFs are great for sharing, terrible for speed. A simple question like “Do we support SOC 2?” can turn into 20 minutes of searching across proposals, security docs, and old statements of work. Then you answer from memory, which is risky, or you paste a chunk with no context, which creates more questions. The worst part is the repeat cycle. The same three people become human search engines, and their real work gets interrupted all day.
It adds up fast. Here’s where the friction usually shows up.
- Teams waste about 1–2 hours a day collectively just locating “the right PDF” and the right section inside it.
- Answers drift over time because people paraphrase instead of quoting, and it slowly changes what the company “thinks” is true.
- New hires ask more questions (normal), but they can’t self-serve because the knowledge isn’t searchable in the tools they live in.
- Even when someone finds the answer, there’s no consistent way to show sources, so trust stays low and rework stays high.
What You’ll Build: A PDF Knowledge Hub That Answers in Slack
This workflow gives you an end-to-end “ingest and answer” system using n8n. It starts by pulling PDFs from your AWS S3 bucket, downloading each file, extracting the text, then splitting that text into smaller chunks that are easier to search. Each chunk gets embedded with OpenAI (turned into a numeric representation) and saved into a Qdrant collection alongside metadata, so retrieval can point back to the right document. Then you get a chat entry point: a message trigger feeds an AI Agent using an OpenAI chat model, and that Agent can call Qdrant search as a tool. The result is a Slack-friendly answer that’s grounded in the PDFs you already maintain.
The workflow begins with ingestion (S3 → text → chunks → embeddings → Qdrant). After that, your chat trigger routes questions into an Agent that searches Qdrant and responds with what it found. Same indexed library, two different “modes”: build the knowledge base, then use it.
Expected Results
Say your team handles about 15 “quick questions” a day in Slack, and each one takes roughly 10 minutes to find and quote from an S3 PDF. That’s about 2.5 hours of context switching daily. With this workflow, the question comes in through chat, retrieval runs against Qdrant, and you usually get an answer in under a minute (plus a short wait for the model). Even if only half the questions get resolved instantly, you’ve still bought back about an hour a day.
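For a quick sanity check on that estimate, here is the arithmetic; every number is the scenario's assumption, not a measurement:

```python
# All inputs are the article's example assumptions; swap in your own.
questions_per_day = 15
minutes_per_question = 10      # manual: find the PDF, quote the right part
minutes_with_bot = 1           # chat answer, roughly
instant_resolution_rate = 0.5  # assume only half resolve instantly

time_today = questions_per_day * minutes_per_question
time_saved = (questions_per_day * instant_resolution_rate
              * (minutes_per_question - minutes_with_bot))

print(f"Searching today: {time_today / 60:.1f} h, saved: ~{time_saved / 60:.1f} h/day")
```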
Before You Start
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- AWS S3 for storing and listing your PDFs.
- Qdrant to store embeddings and run semantic search.
- OpenAI API key (get it from the OpenAI API dashboard)
Skill level: Intermediate. You won’t code, but you will set credentials, bucket/collection names, and test with real PDFs.
Want someone to build this for you? Talk to an automation expert (free 15-minute consultation).
Step by Step
Kick off ingestion from S3. A manual trigger starts the indexing run. n8n lists objects in your S3 bucket, then loops through the file keys in batches so you don’t overload downstream steps.
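The list-then-loop pattern of this step can be sketched in plain Python. This is a simplified stand-in for n8n's Loop Over Items node, with made-up object keys:

```python
from itertools import islice

def batched(items, size):
    """Yield successive fixed-size batches (a stand-in for Loop Over Items)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical object keys, as an S3 listing might return them.
keys = [f"docs/proposal_{i}.pdf" for i in range(7)]
for batch in batched(keys, 3):
    print(batch)  # batches of 3, 3, then 1
```

Batching keeps the download and parsing nodes from receiving hundreds of items at once, which is exactly why the workflow loops instead of fanning out.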
Download and extract PDF text. For each S3 object, the workflow fetches the file and parses the content to pull out text. This is where your “static PDFs” become usable knowledge.
Chunk, embed, and index into Qdrant. The extracted text gets split into smaller sections (chunks), then OpenAI embeddings are generated for each chunk. Those chunks, embeddings, and helpful metadata are inserted into a Qdrant collection for retrieval later.
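If you're curious what the Recursive Character Text Splitter does under the hood, here is a simplified Python sketch of the idea (not n8n's actual implementation): split on the coarsest separator first, merge pieces up to the chunk size, and recurse on anything still too long.

```python
def recursive_split(text, chunk_size=400, separators=("\n\n", "\n", ". ", " ")):
    """Simplified sketch of recursive character splitting."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        if sep in text:
            chunks, buf = [], ""
            for piece in text.split(sep):
                candidate = buf + sep + piece if buf else piece
                if len(candidate) <= chunk_size:
                    buf = candidate
                else:
                    if buf:
                        chunks.append(buf)
                    buf = piece
            if buf:
                chunks.append(buf)
            # A chunk can still be oversized if a single piece was huge.
            return [c for chunk in chunks for c in recursive_split(chunk, chunk_size, separators)]
    # No separator applies: hard cut at the chunk boundary.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

text = ("word " * 50).strip()
chunks = recursive_split(text, chunk_size=100)
print(len(chunks), max(len(c) for c in chunks))
```

Splitting on paragraph boundaries before word boundaries is what keeps retrieved chunks readable instead of cutting mid-sentence.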
Answer questions through chat. A chat message trigger feeds an AI Agent powered by an OpenAI chat model. When someone asks a question, the Agent uses Qdrant search as a tool, pulls relevant passages, and responds with an answer based on your PDFs.
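Retrieval itself boils down to ranking stored vectors by similarity to the question's embedding. A toy illustration with made-up 3-d vectors (real embeddings come from OpenAI and live in the Qdrant collection):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "index": chunk ids mapped to made-up embedding vectors.
index = {
    "pricing.pdf#chunk3":    [0.9, 0.10, 0.0],
    "security.pdf#chunk1":   [0.1, 0.90, 0.2],
    "onboarding.pdf#chunk7": [0.0, 0.10, 0.9],
}

def top_k(query_vec, k=2):
    """Return the k chunk ids most similar to the query vector."""
    return sorted(index, key=lambda doc: cosine(query_vec, index[doc]), reverse=True)[:k]

print(top_k([0.2, 0.95, 0.1]))  # the security chunk ranks first
```

Qdrant does this ranking at scale; the Agent then stitches the top chunks into a grounded answer.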
You can easily modify the S3 bucket, Qdrant collection, and chunking rules to match your document library and how your team asks questions. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Manual Trigger
Start the workflow manually to ingest documents from S3 into Qdrant and enable the chat assistant.
- Add or select the Manual Start Trigger node as the entry point.
- Confirm the connection flow: Manual Start Trigger → Retrieve S3 Objects.
- Optionally keep Flowpast Branding as a reference note on the canvas.
Step 2: Connect Amazon S3
List all files in your S3 bucket and loop through them for processing.
- Open Retrieve S3 Objects and set Operation to `getAll`.
- Set Bucket Name to `YOUR_S3_BUCKET`.
- Credential Required: Connect your AWS credentials.
- Open Iterate Batch Items and set Batch Size to `={{ $json.Key.length }}`.
- Open Fetch S3 Files and set File Key to `={{ $json.Key }}`.
- Set Bucket Name to `YOUR_S3_BUCKET` in Fetch S3 Files.
- Credential Required: Connect your AWS credentials for Fetch S3 Files.
⚠️ Common Pitfall: If your S3 object keys are empty or missing, Iterate Batch Items will not process any files. Validate that Retrieve S3 Objects returns a Key field.
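That validation can be expressed as a small filter over the listing response. This is a hypothetical helper mirroring the shape of an S3 ListObjectsV2 result, not a node in the workflow itself:

```python
def pdf_keys(listing):
    """Pull usable object keys from an S3 ListObjectsV2-style response,
    skipping 'folder' placeholder keys and non-PDF files."""
    return [
        obj["Key"]
        for obj in listing.get("Contents", [])
        if obj.get("Key")
        and not obj["Key"].endswith("/")
        and obj["Key"].lower().endswith(".pdf")
    ]

# Example response shape, trimmed to the fields the filter uses.
listing = {"Contents": [
    {"Key": "proposals/acme.pdf"},
    {"Key": "proposals/"},        # folder placeholder, no content
    {"Key": "notes/readme.txt"},  # wrong format for the PDF parser
]}
print(pdf_keys(listing))  # ['proposals/acme.pdf']
```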
Step 3: Set Up File Parsing and Vector Indexing
Parse PDFs from S3, split content, generate embeddings, and insert vectors into Qdrant.
- Open Parse File Content and set Operation to `pdf`.
- Set Binary Property Name to `=data` in Parse File Content.
- Open Recursive Text Split and keep default options for chunking.
- Open Standard Data Loader and keep default options.
- Open Qdrant Index Insert and set Mode to `insert`.
- Set Qdrant Collection to `YOUR_QDRANT_COLLECTION` in Qdrant Index Insert.
- Credential Required: Connect your Qdrant credentials for Qdrant Index Insert.
- Open OpenAI Embedding Gen and connect it as the embedding model for Qdrant Index Insert.
- Credential Required: Connect your OpenAI credentials for OpenAI Embedding Gen.
⚠️ Common Pitfall: Make sure your S3 files are PDFs. The Parse File Content node is set to pdf and will fail on other formats.
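A cheap way to catch this before the parser fails: genuine PDF files begin with the magic bytes `%PDF-`. A minimal sketch of that pre-check:

```python
def looks_like_pdf(data: bytes) -> bool:
    """PDF files start with the magic bytes '%PDF-'; a cheap format
    check before handing the binary to a PDF parser."""
    return data[:5] == b"%PDF-"

print(looks_like_pdf(b"%PDF-1.7 ..."))  # True
print(looks_like_pdf(b"PK\x03\x04"))    # False (a zip/docx, not a PDF)
```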
Step 4: Set Up the AI Retrieval Assistant
Configure the chat assistant to answer questions using the Qdrant vector store.
- Open Chat Message Trigger to allow chat-driven queries.
- Connect Chat Message Trigger to AI Assistant Agent.
- Open OpenAI Chat Engine and set Model to `gpt-4o-mini`.
- Credential Required: Connect your OpenAI credentials for OpenAI Chat Engine.
- Open Qdrant Search Tool and set Mode to `retrieve-as-tool`.
- Set Tool Name to `proposal_knowledge_base` and Tool Description to `Call this tool to search the vector store knowledge base for proposal-related data. If context is empty, say you don't know the answer.`
- Set Qdrant Collection to `YOUR_QDRANT_COLLECTION` in Qdrant Search Tool.
- Credential Required: Connect your Qdrant credentials for Qdrant Search Tool.
- Open OpenAI Embedding Gen 2 and connect it as the embedding model for Qdrant Search Tool.
- Credential Required: Connect your OpenAI credentials for OpenAI Embedding Gen 2.
⚠️ Common Pitfall: Ensure the same Qdrant collection is used in both Qdrant Index Insert and Qdrant Search Tool to avoid empty search results.
Step 5: Test and Activate Your Workflow
Run a manual test to verify ingestion and retrieval, then enable the workflow for ongoing use.
- Click Execute Workflow on Manual Start Trigger to ingest files from S3.
- Confirm that Qdrant Index Insert receives parsed text and inserts vectors.
- Send a message via Chat Message Trigger and verify AI Assistant Agent responds using Qdrant Search Tool.
- When successful, toggle the workflow to Active for production use.
Troubleshooting Tips
- AWS S3 credentials can expire or lack permissions. If indexing fails, check IAM access to ListBucket and GetObject for your bucket first.
- Processing time varies with PDF size and batch size. If downstream nodes fail on large runs or empty responses, reduce the batch size or split the ingestion into smaller runs.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
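On the IAM point, the credentials you attach usually need two statements, one bucket-level (for listing) and one object-level (for downloads). A minimal example policy, with `YOUR_S3_BUCKET` as the same placeholder used in the steps above:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_S3_BUCKET"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::YOUR_S3_BUCKET/*"
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the object path (`/*`); mixing those up is a common cause of AccessDenied errors.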
Quick Answers
**How long does this take to set up?**
About an hour if your AWS, Qdrant, and OpenAI accounts are ready.

**Do I need to know how to code?**
No. You’ll mostly be adding credentials, picking bucket/collection names, and testing with a couple of PDFs.

**Is n8n free?**
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs (often a few cents per batch of chunks, depending on document size and usage).

**Should I use n8n Cloud or self-host?**
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.

**Can I customize the workflow for my own stack?**
Yes, and you probably should. You can swap the AWS S3 listing/downloading nodes for Google Drive if that’s where your PDFs live, and keep the same “split → embed → Qdrant” core. Common tweaks include changing the text splitting size (so answers quote cleaner), storing extra metadata like document type or client name, and pointing the Qdrant nodes at separate collections for “sales” vs “internal” docs.

**Why does the S3 step keep failing?**
Usually it’s missing IAM permissions for ListBucket or GetObject on the bucket you configured. Double-check the bucket name in both S3 nodes, then verify the access key/secret pair is still active. If it works for small tests but fails on bigger runs, you may also be hitting rate limits or trying to parse a file that isn’t actually a PDF. Start by running the ingestion with just one known-good document.

**How many documents can this handle?**
A lot, as long as you batch it. On n8n Cloud, your practical limit is tied to monthly executions and how you split batches; self-hosting has no execution cap, but your server resources will matter. Most teams index a few hundred PDFs overnight without drama, then run chat queries all day. If you plan to ingest thousands of large PDFs regularly, tune batch size and keep an eye on OpenAI embedding costs.

**Do I really need n8n for this, or would Zapier/Make work?**
Often, yes. This is a retrieval-augmented generation setup, which means you need multi-step ingestion, looping/batching, chunking, embeddings, and a tool-using Agent. n8n handles that kind of “real workflow” logic cleanly, and you can self-host to avoid execution limits. Zapier and Make can work for simpler pipelines, but RAG flows tend to get expensive and awkward once you add branching and retries. If you’re unsure, talk to an automation expert and describe your document volume and where questions come from.
Once this is running, your PDFs stop being dead weight in S3 and start acting like a real knowledge base in Slack. Set it up, index your docs, and let the workflow handle the repeat questions.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.