Google Drive + Postgres: searchable docs, always ready
Your docs are “somewhere in Drive,” but finding the right paragraph at the right moment turns into a mini scavenger hunt. Someone downloads a file, searches manually, copies a quote into Slack, then does it again next week because nothing is indexed.
This is the kind of mess marketing leads feel when they need approved messaging fast. Ops managers run into it when SOPs are buried. And agency owners hate it because every client question becomes a time sink. With Drive Postgres search automation, new files become “askable” without you lifting a finger.
This workflow watches a Google Drive folder, turns new documents into OpenAI embeddings, stores them in Postgres (PGVector), then moves the file so you don’t process it twice. You’ll see what it fixes, what it produces, and what you need to run it reliably.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Google Drive + Postgres: searchable docs, always ready
flowchart LR
subgraph sg0["When clicking ‘Test workflow’ Flow"]
direction LR
n0@{ icon: "mdi:robot", form: "rounded", label: "Default Data Loader", pos: "b", h: 48 }
n1@{ icon: "mdi:robot", form: "rounded", label: "Recursive Character Text Spl..", pos: "b", h: 48 }
n2@{ icon: "mdi:cube-outline", form: "rounded", label: "Postgres PGVector Store", pos: "b", h: 48 }
n3@{ icon: "mdi:play-circle", form: "rounded", label: "When clicking ‘Test workflow’", pos: "b", h: 48 }
n4@{ icon: "mdi:swap-vertical", form: "rounded", label: "Loop Over Items", pos: "b", h: 48 }
n5@{ icon: "mdi:cog", form: "rounded", label: "Move File", pos: "b", h: 48 }
n6@{ icon: "mdi:cog", form: "rounded", label: "Download File", pos: "b", h: 48 }
n7@{ icon: "mdi:cog", form: "rounded", label: "Search Folder", pos: "b", h: 48 }
n8@{ icon: "mdi:play-circle", form: "rounded", label: "Schedule Trigger", pos: "b", h: 48 }
n9@{ icon: "mdi:swap-horizontal", form: "rounded", label: "Switch", pos: "b", h: 48 }
n10@{ icon: "mdi:cog", form: "rounded", label: "Extract from PDF", pos: "b", h: 48 }
n11@{ icon: "mdi:cog", form: "rounded", label: "Extract from Text", pos: "b", h: 48 }
n12@{ icon: "mdi:cog", form: "rounded", label: "Extract from JSON", pos: "b", h: 48 }
n13@{ icon: "mdi:vector-polygon", form: "rounded", label: "Embeddings OpenAI", pos: "b", h: 48 }
n9 --> n10
n9 --> n11
n9 --> n12
n5 --> n4
n6 --> n9
n7 --> n4
n4 --> n6
n10 --> n2
n8 --> n7
n13 -.-> n2
n12 --> n2
n11 --> n2
n0 -.-> n2
n2 --> n5
n1 -.-> n0
n3 --> n7
end
%% Styling
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef ai fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef aiModel fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef database fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef disabled stroke-dasharray: 5 5,opacity: 0.5
class n3,n8 trigger
class n0,n1 ai
class n2 database
class n13 aiModel
class n9 decision
The Problem: Drive Documents Aren’t Actually Searchable
Google Drive is great at storing files. It’s not great at answering questions. Sure, you can search titles and maybe a few keywords, but real work happens inside long PDFs, messy JSON exports, and “final_v7” text notes that never got cleaned up. So people fall back to the worst system: asking whoever “might remember.” That costs time, interrupts deep work, and quietly creates inconsistencies because everyone quotes a slightly different version of the truth.
It adds up fast. Here’s where it breaks down once your folder becomes a living library instead of a neat archive.
- Teams re-open the same PDF over and over just to find one sentence they used last month.
- Important context gets lost because Drive search can’t find “similar meaning,” only matching words.
- Manual indexing never happens, and when it does, it becomes stale within a week.
- Duplicates creep in because nobody can tell what’s already been processed and stored elsewhere.
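That "similar meaning" gap is exactly what embeddings close: documents are compared as vectors, so phrasing no longer has to match. A toy sketch of the idea, using made-up 3-dimensional vectors (real embeddings from a model like text-embedding-3-small have 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means same direction (same meaning), ~0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative stand-ins for embedded sentences (values invented for the demo).
refund_policy = [0.9, 0.1, 0.0]   # "Our refund policy lasts 30 days"
money_back    = [0.8, 0.2, 0.1]   # "Customers can get their money back" -- zero shared keywords
pricing_page  = [0.1, 0.9, 0.3]   # "See our pricing tiers for details"

# Different wording but similar meaning ranks higher than a merely adjacent topic.
print(cosine_similarity(refund_policy, money_back) >
      cosine_similarity(refund_policy, pricing_page))   # True
```

Keyword search can only find `refund` where the word `refund` appears; vector search finds the money-back paragraph too.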
The Solution: Auto-Vectorize Drive Files Into Postgres
This n8n workflow turns your Google Drive folder into a steady ingestion pipeline for semantic search. It runs on a schedule (every 4 hours by default) or manually when you hit “Test workflow,” then looks inside a chosen Drive folder for new files. Each file is downloaded, routed by type (PDF, TXT, or JSON), and parsed into clean text. From there, the workflow splits the content into chunks, generates OpenAI embeddings using the text-embedding-3-small model, and inserts everything into Postgres with PGVector so it’s immediately query-ready for RAG, internal search, or an AI agent. After a successful insert, the file gets moved to a “vectorized” folder so the pipeline stays tidy and deduplicated.
The workflow starts with a Drive folder scan, then processes files in batches so it doesn’t choke on a big upload. It extracts text based on MIME type, generates embeddings, stores them in your PGVector table, and finally relocates the source file to mark it as done.
What You Get: Automation vs. Results
| What This Workflow Automates | Results You’ll Get |
|---|---|
| Scanning a Drive folder on a schedule, downloading new files, and extracting text from PDFs, TXT, and JSON | New documents become semantically searchable without anyone indexing them by hand |
| Chunking content, generating OpenAI embeddings, and inserting them into Postgres (PGVector) | A query-ready vector store for RAG, internal search, or an AI agent |
| Moving processed files to a “vectorized” folder | No duplicate processing, and a source folder that only shows what’s left to ingest |
Example: What This Looks Like
Say your team drops 20 new docs a week into a shared Drive folder (a mix of PDFs, meeting notes, and JSON exports). Manually, someone usually spends about 10 minutes per doc downloading, skimming, and pulling the right excerpt when questions pop up, which is roughly 3 hours weekly. With this workflow, it’s closer to 5 minutes to drop the doc in the right folder and forget it; the next scheduled run ingests everything automatically. After that, answers are a database query away.
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Google Drive to store files and monitor folders.
- Postgres with PGVector to store embeddings for search.
- OpenAI API key (get it from the OpenAI API dashboard).
Skill level: Intermediate. You’ll connect credentials, paste folder IDs, and confirm your Postgres/PGVector table settings.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A schedule (or manual run) kicks things off. You can let the built-in schedule run it automatically (the template is configured for every 4 hours) or click “Test workflow” when you want to ingest files right now.
The workflow finds files in your chosen Drive folder. It looks up the folder, then iterates through items in batches, which keeps runs stable even when someone dumps in a big backlog.
Each file is downloaded and converted to text. A file-type router sends PDFs to the PDF parser, plain text to a text parser, and JSON to a JSON parser so you get consistent text out the other end.
Embeddings are generated and stored in Postgres (PGVector). The content is chunked, vectors are created with OpenAI’s embedding model, and then inserted into your configured PGVector collection for semantic search.
You can easily modify supported file types to include things like DOCX or Markdown based on your needs. See the full implementation guide below for customization options.
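“Query-ready” means a nearest-neighbour SQL query is all it takes downstream. Here’s a minimal sketch of building such a query, assuming the `collection_vectors` table from the setup below, a pgvector `embedding` column, and a `document` text column (n8n’s default PGVector layout; adjust names if yours differ). You’d embed the question with the same OpenAI model, then pass that vector as the parameter:

```python
def build_similarity_query(table: str = "collection_vectors", top_k: int = 5) -> str:
    # pgvector's `<=>` operator computes cosine distance: lower = more similar.
    # The %s placeholder takes the query embedding (e.g. via psycopg).
    return (
        f"SELECT document, metadata, embedding <=> %s::vector AS distance "
        f"FROM {table} "
        f"ORDER BY distance "
        f"LIMIT {top_k}"
    )

sql = build_similarity_query()
print(sql)
```

Run it with any Postgres client; the five rows with the smallest distance are the passages closest in meaning to the question.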
Step-by-Step Implementation Guide
Step 1: Configure the Manual and Schedule Triggers
Set up the two triggers that can start the workflow manually or on a schedule.
- Open Manual Execution Start and keep it as the on-demand trigger for testing and manual runs.
- Open Scheduled Automation Trigger and set the schedule rule to run every 4 hours (Interval → hours → Hours Interval = `4`).
- Verify both triggers connect to Find Drive Folder so either start point flows into the same processing path.
Step 2: Connect Google Drive and Locate Files
Configure Google Drive access and define where the workflow will search for files to process.
- Open Find Drive Folder and set Resource to `fileFolder`, Return All to `true`, and Options → Fields to `name` and `id`.
- Set Filter → Folder ID to the source folder using `[YOUR_ID]`, and What to Search to `files`.
- In Iterate Item Batches, keep the default settings to process files one by one from the folder results.
- Open Retrieve Drive File and set Operation to `download` with File ID set to `{{ $json.id }}`.
- Credential Required: Connect your googleDriveOAuth2Api credentials in Find Drive Folder, Iterate Item Batches (if prompted), and Retrieve Drive File.
Replace `[YOUR_ID]` in Find Drive Folder with the actual Google Drive folder ID, or no files will be returned.

Step 3: Route Files and Extract Content
Use a file-type router to send each file to the correct parser before vectorization.
- Open Route by File Type and confirm the three rules match MIME types using `{{ $binary["data"].mimeType }}` against the values `application/pdf`, `text/plain`, and `application/json`.
- Ensure Route by File Type outputs to Parse PDF Content, Parse Text Content, and Parse JSON Content based on the matching MIME type.
- Set Parse PDF Content to Operation `pdf`, Parse Text Content to Operation `text`, and Parse JSON Content to Operation `fromJson`.
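The Switch node’s routing boils down to a MIME-type lookup. A Python sketch of the same decision (the mapping mirrors the three rules above; extend it with more entries, e.g. for DOCX, to support new types):

```python
# Mirrors the Route by File Type switch: MIME type -> parser operation.
PARSER_BY_MIME = {
    "application/pdf": "pdf",        # -> Parse PDF Content
    "text/plain": "text",            # -> Parse Text Content
    "application/json": "fromJson",  # -> Parse JSON Content
}

def route_file(mime_type: str) -> str:
    try:
        return PARSER_BY_MIME[mime_type]
    except KeyError:
        # Unmatched types fall through in the workflow; add entries to extend support.
        raise ValueError(f"No parser configured for MIME type: {mime_type}")

print(route_file("application/pdf"))  # pdf
```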
Step 4: Set Up AI Processing and Vector Storage
Chunk, enrich, embed, and store the extracted content in PGVector.
- Configure Recursive Text Chunker with Chunk Overlap set to `50`.
- Open Standard Data Loader and verify the metadata mapping values are set to `{{ $('Retrieve Drive File').item.json.name }}` for filename and `{{ $('Retrieve Drive File').item.json.id }}` for id.
- Open PGVector Storage Insert and set Mode to `insert`, Table Name to `collection_vectors`, and Collection Name to `workflow_generator` with Collection Table Name `embedding_collections`.
- Confirm OpenAI Embedding Generator is connected as the embedding model for PGVector Storage Insert.
- Credential Required: Connect your postgres credentials in PGVector Storage Insert.
- Credential Required: Connect your openAiApi credentials in OpenAI Embedding Generator (this node provides embeddings to PGVector Storage Insert).
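For intuition, here is a simplified sliding-window version of what the chunker does with a 50-character overlap. (The real Recursive Character Text Splitter also prefers to break on separators like paragraphs and sentences, which this sketch skips; the chunk size of 500 is an illustrative assumption, not the node’s setting.)

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Each chunk starts `chunk_size - overlap` characters after the previous
    # one, so neighbouring chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("x" * 1200)
print(len(chunks))                        # 3 chunks for 1200 characters
print(chunks[0][-50:] == chunks[1][:50])  # True: 50-char overlap preserved
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from either side.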
Step 5: Configure File Relocation After Processing
Move files to a destination folder once they are inserted into PGVector.
- Open Relocate Drive File and set Operation to `move`.
- Set File ID to `{{ $('Iterate Item Batches').item.json.id }}` and choose Drive ID `My Drive`.
- Set Folder ID to the target folder using `[YOUR_ID]` (cached name `vectorized`).
- Credential Required: Connect your googleDriveOAuth2Api credentials in Relocate Drive File.
If `[YOUR_ID]` in Relocate Drive File is not updated, processed files will fail to move and remain in the source folder.

Step 6: Test and Activate Your Workflow
Validate the full flow and then enable automation for production use.
- Click Manual Execution Start and run the workflow to test with a sample file from the source folder.
- Confirm a successful run shows Route by File Type selecting the correct parser and PGVector Storage Insert inserting into `collection_vectors`.
- Verify the file is moved by Relocate Drive File into the destination folder.
- When satisfied, activate the workflow so Scheduled Automation Trigger runs every 4 hours automatically.
Common Gotchas
- Google Drive credentials can expire or need specific permissions. If things break, check the Google Drive OAuth connection inside n8n’s Credentials page first.
- If you add Wait nodes or call external services, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- OpenAI requests can fail on quota or billing issues, and it’s easy to miss. If embeddings suddenly stop, check your OpenAI API usage limits and make sure the key in the Embeddings node matches the active project.
Frequently Asked Questions
How long does this take to set up?

About an hour if your Drive and Postgres access is ready.
Do I need to know how to code?

No. You will mostly paste IDs, connect accounts, and test a run.
Is n8n free to use?

Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in OpenAI API costs, which are usually a few cents per document depending on length.
Where should I host n8n?

Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can this handle DOCX files or scanned PDFs?

Yes, but you’ll add one or two extraction steps. You can route DOCX by extending the “Route by File Type” logic and adding another “Extract from File” parser for that MIME type. For scanned PDFs, you typically insert an OCR step before “Parse PDF Content,” then pass the OCR text into the same chunking and embedding path. Common tweaks also include changing chunk size in the text splitter and writing extra metadata fields (like department, client, or doc type) into the PGVector records.
What should I check if the Google Drive connection stops working?

Usually it’s expired OAuth consent or the wrong Google account connected. Reconnect the Google Drive credential in n8n, then confirm the account can access both the source folder and the “vectorized” folder. If it still fails, check that the folder IDs in the Find/Search/Move Drive nodes are correct and that the Drive API isn’t restricted by your workspace admin.
How many files can this workflow handle?

A lot, as long as your server and database can keep up. On n8n Cloud, execution volume depends on your plan, while self-hosting removes hard execution limits and shifts the bottleneck to CPU/RAM and Postgres performance. In practical terms, most teams start with a small batch size, ingest a backlog overnight, then let the regular schedule handle new files going forward. If you plan to ingest thousands of large PDFs, you’ll want to tune batching and make sure PGVector indexes are set up properly.
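As a sketch of that index tuning: pgvector supports approximate-nearest-neighbour indexes, and the DDL below shows the two common options. The table and column names assume the `collection_vectors` setup from Step 4, HNSW assumes pgvector ≥ 0.5, and `vector_cosine_ops` matches cosine-distance queries; run either statement with any Postgres client.

```python
# DDL sketches for speeding up similarity search at scale; names assume the
# table configured in Step 4 -- adjust them to match your schema.
CREATE_HNSW_INDEX = """
CREATE INDEX IF NOT EXISTS idx_collection_vectors_embedding
ON collection_vectors
USING hnsw (embedding vector_cosine_ops);
"""

# IVFFlat works on older pgvector versions; `lists` should scale with row
# count (a common starting point is roughly rows / 1000).
CREATE_IVFFLAT_INDEX = """
CREATE INDEX IF NOT EXISTS idx_collection_vectors_embedding_ivf
ON collection_vectors
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
"""

print(CREATE_HNSW_INDEX.strip())
```

Without an index, every query scans every vector; with one, lookups stay fast as the backlog grows into the thousands.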
Why use n8n instead of Zapier or Make?

For this workflow, n8n has a few advantages: more complex logic with unlimited branching at no extra cost, a self-hosting option for unlimited executions, and native LangChain/PGVector-style building blocks that many no-code tools don’t handle cleanly. The batching is also a big deal when you ingest backlogs, because you can control throughput instead of timing out. Zapier or Make can still work if you only need “new file → send notification,” but embeddings plus a database insert is where they start to feel awkward. If you’re unsure, talk to an automation expert and sanity-check the best path. Choosing the wrong tool here gets expensive later.
Once this is running, new Drive files stop being “stuff you uploaded” and start being a searchable knowledge source your tools can actually use. Set it up once, then let the folder stay ready.
Need Help Setting This Up?
Our automation experts can build and customize this workflow for your specific needs. Free 15-minute consultation—no commitment required.