Telegram to Google Docs, Arabic PDF text ready
You get an Arabic PDF in Telegram, and you already know what’s coming. Someone has to “just pull the text,” fix weird line breaks, add page numbers, and then paste it into something the team can search.
This problem hits ops teams and admins first, but marketers dealing with Arabic press clips and reports feel it too. The payoff of this Telegram PDF OCR workflow is simple: send a PDF, get a clean Google Docs link back, and stop burning about 1–2 hours a week on copy-paste cleanup.
Below, you’ll see how the automation runs, what you need to connect, and the real-world time savings you can expect once it’s live.
How This Automation Works
The full n8n workflow, from trigger to final output:
n8n Workflow Template: Telegram to Google Docs, Arabic PDF text ready
flowchart LR
subgraph sg0["Telegram Bot Flow"]
direction LR
n0["Download Document from Teleg.."]
n1["Telegram Bot Trigger"]
n2["Check If Document Attached"]
n3["Upload PDF to Mistral AI"]
n4["Get Mistral Signed URL"]
n5["Process OCR with Mistral"]
n6["Parse OCR Results by Page"]
n7["Update Google Doc with Content"]
n8["Create New Google Doc"]
n9["Process Document Updates"]
n10["Aggregate OCR Results"]
n11["Send Document Link to User"]
n12["Request PDF File Format"]
n13["Status: File Received (1/5)"]
n14["Status: Sent to Processor (2.."]
n15["Status: File Signed (3/5)"]
n16["Status: Results Received (4/5)"]
n17["Status: Creating Document (5.."]
n1 --> n2
n10 --> n8
n10 --> n17
n8 --> n9
n4 --> n5
n4 --> n15
n9 --> n7
n5 --> n6
n5 --> n16
n3 --> n4
n3 --> n14
n6 --> n10
n2 --> n0
n2 --> n12
n7 --> n9
n7 --> n11
n0 --> n3
n0 --> n13
end
classDef trigger fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef decision fill:#fff8e1,stroke:#f9a825,stroke-width:2px
classDef api fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef code fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
class n1 trigger
class n2 decision
class n3,n4,n5 api
class n6 code
The Problem: Arabic PDFs Are Hard to Reuse
Arabic PDFs are often “readable” to humans but useless to your tools. You can’t search them well, you can’t quote them quickly, and copying text usually comes out scrambled (wrong order, broken lines, missing punctuation). Then someone ends up retyping sections or stitching together chunks from different pages. It’s slow, and frankly it’s mentally exhausting work that nobody wants to own. Even worse, the final output varies depending on who did it, so your team spends extra time double-checking and reformatting.
The friction compounds. Here’s where it breaks down in day-to-day work:
- People copy text from PDFs into WhatsApp or email, and it loses context and page references.
- Arabic OCR is inconsistent across generic tools, so you waste time fixing obvious errors.
- Files get saved in random places, which means the “final version” is always a mystery.
- Someone has to manually tell the requester it’s done, usually after they follow up twice.
The Solution: Send a Telegram PDF, Receive a Google Doc
This n8n workflow turns Telegram into a simple intake box for Arabic PDFs. A user sends a PDF to your bot, the workflow validates it, downloads the file, and sends it to Mistral’s OCR service to extract Arabic text page by page. The output is then cleaned and organized, including page numbering so the text is easy to reference later. Next, n8n creates a Google Doc in your Google Drive and inserts the OCR text in batches (so long documents don’t time out). Finally, the bot replies in Telegram with a clickable Google Docs link, plus progress messages along the way so users don’t wonder if it’s stuck.
The workflow starts with a Telegram message containing a PDF attachment. From there, it pushes the document to Mistral for Arabic OCR, merges the results into one formatted text body, then creates and fills a Google Doc. The last step is the one you care about: a shareable link back in Telegram.
What You Get: Automation vs. Results
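| What This Workflow Automates | Results You’ll Get |
|---|---|
| PDF validation and download from Telegram | No manual file handling or status follow-ups |
| Arabic OCR through Mistral, page by page | Searchable text with page numbers for reference |
| Google Doc creation and batched text insertion | A shareable Google Docs link delivered in Telegram |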
Example: What This Looks Like
Say you handle 10 Arabic PDFs a week, and each one takes about 15 minutes to copy, clean, reformat, and upload somewhere shareable. That’s roughly 2.5 hours weekly, and it’s the kind of work that steals attention in the middle of your day. With this workflow, each request becomes: 1 minute to forward the PDF to Telegram, then you wait for OCR and Google Docs creation (often about 5–15 minutes depending on the PDF). You still review if the document is critical, but the busywork is gone.
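The arithmetic behind that estimate can be sketched in a few lines. The helper name is made up for illustration, and the sketch counts only hands-on minutes, on the assumption that you do something else while OCR runs.

```javascript
// Hypothetical helper: hands-on minutes per week for a given volume.
function weeklyHandsOnMinutes(pdfsPerWeek, minutesPerPdf) {
  return pdfsPerWeek * minutesPerPdf;
}

const manual = weeklyHandsOnMinutes(10, 15);   // copy, clean, reformat, upload
const automated = weeklyHandsOnMinutes(10, 1); // forward the PDF to Telegram
const savedHours = (manual - automated) / 60;  // OCR wait time is not counted

console.log(`Manual: ${manual / 60} hours per week`);
console.log(`Hands-on time saved: about ${savedHours.toFixed(1)} hours per week`);
```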
What You’ll Need
- n8n instance (try n8n Cloud free)
- Self-hosting option if you prefer (Hostinger works well)
- Telegram for receiving PDF files from users.
- Mistral AI OCR to extract Arabic text from pages.
- Google Docs + Google Drive to create and store the final document.
- Telegram bot token (get it from @BotFather in Telegram).
Skill level: Intermediate. You’ll connect accounts, paste API keys, and confirm Google Drive permissions for the output folder.
Don’t want to set this up yourself? Talk to an automation expert (free 15-minute consultation).
How It Works
A Telegram message triggers everything. When someone sends your bot a document, n8n receives the update in real time through the Telegram trigger.
The file gets validated and downloaded. If there’s no attachment, or it isn’t a PDF, the workflow replies asking for the correct format. When it is a PDF, it fetches the actual document binary so it can be processed.
OCR runs and the text gets organized. n8n uploads the PDF to Mistral, runs Arabic OCR, splits the output by page, and then merges it back into one coherent result. Page numbering is included so references don’t get lost.
A Google Doc is created and filled. The workflow generates a new Google Doc in your Drive folder, then inserts the text in batches. That batching is what keeps large files from failing midway.
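To make the batching idea concrete, here is a generic sketch of splitting work into batches and processing them one after another. It is not the workflow's own node; `toBatches`, `insertSequentially`, and the caller-supplied `insertBatch` function are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: split an array of items into fixed-size batches.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Process batches strictly in order, so a failure stops at a known point.
// insertBatch is a stand-in for whatever performs the document update.
async function insertSequentially(items, batchSize, insertBatch) {
  let inserted = 0;
  for (const batch of toBatches(items, batchSize)) {
    await insertBatch(batch);
    inserted += batch.length;
  }
  return inserted;
}

console.log(toBatches(["page 1", "page 2", "page 3"], 2));
```

Smaller batches mean more requests but less work lost if one of them fails.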
You can easily modify the destination folder and naming convention to match your client, project, or department. See the full implementation guide below for customization options.
Step-by-Step Implementation Guide
Step 1: Configure the Telegram Trigger
This workflow starts when a Telegram message arrives. Configure the trigger and validate that a PDF attachment is present.
- Add and open Telegram Update Trigger and keep Updates set to `message`.
- Credential Required: Connect your telegramApi credentials in Telegram Update Trigger.
- Open Validate Attachment Presence and confirm the condition checks `{{ $json.message.document.file_name }}` with the exists operator.
- Verify the error branch connects to Ask for PDF Format for non-PDF messages.
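As configured, the condition only confirms that a document with a file name is attached. If you want a stricter check, something like the sketch below could be adapted for a Code node or an extra condition. The function name is hypothetical. `mime_type` and `file_name` are optional fields on Telegram's document object, so the sketch accepts either signal.

```javascript
// Hypothetical helper: decide whether a Telegram message carries a PDF.
// `message` is the `message` object from a Telegram update.
function isPdfMessage(message) {
  const doc = message && message.document;
  if (!doc) return false; // no document attached (plain text, photo, etc.)

  // Both fields are optional, so fall back to empty strings.
  const mime = (doc.mime_type || "").toLowerCase();
  const name = (doc.file_name || "").toLowerCase();
  return mime === "application/pdf" || name.endsWith(".pdf");
}

console.log(isPdfMessage({ document: { file_name: "report.pdf", mime_type: "application/pdf" } })); // true
console.log(isPdfMessage({ document: { file_name: "notes.docx" } })); // false
console.log(isPdfMessage({ text: "hello" })); // false
```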
Step 2: Connect Telegram File Retrieval and Status Updates
The workflow retrieves the uploaded file from Telegram and sends progress messages to the user.
- In Fetch Telegram Document, set Resource to `file` and File ID to `{{ $json.message.document.file_id }}`.
- Credential Required: Connect your telegramApi credentials in Fetch Telegram Document.
- Open Ask for PDF Format and set Text to `Please send the file in PDF format` and Chat ID to `{{ $('Telegram Update Trigger').item.json.message.chat.id }}`.
- Credential Required: Connect your telegramApi credentials in Ask for PDF Format.
- Connect Telegram credentials to all status nodes: Progress: File Received, Progress: Sent to OCR, Progress: File Signed, Progress: Results Ready, and Progress: Creating Doc.
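The Telegram node handles the download for you, but it helps to know what happens underneath when you debug. With the Bot API, file retrieval takes two steps: `getFile` returns a `file_path`, and the binary is then fetched from the file endpoint. The sketch below only builds those two URLs. The function names, the token, and the file values are placeholders.

```javascript
const TELEGRAM_API = "https://api.telegram.org";

// Step 1: ask Telegram where the file lives (the response includes result.file_path).
function getFileUrl(botToken, fileId) {
  return `${TELEGRAM_API}/bot${botToken}/getFile?file_id=${encodeURIComponent(fileId)}`;
}

// Step 2: download the binary using the file_path returned by step 1.
function downloadUrl(botToken, filePath) {
  return `${TELEGRAM_API}/file/bot${botToken}/${filePath}`;
}

// Placeholder values for illustration only.
console.log(getFileUrl("123:ABC", "example-file-id"));
console.log(downloadUrl("123:ABC", "documents/file_1.pdf"));
```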
Step 3: Set Up Mistral OCR Processing
These nodes upload the PDF to Mistral, get a signed link, and run OCR. Several steps run in parallel to send progress updates.
- In Upload PDF to Mistral, set URL to `https://api.mistral.ai/v1/files`, Method to `POST`, and Content Type to `multipart-form-data`.
- Configure Body Parameters in Upload PDF to Mistral with `purpose=ocr` and `file` as formBinaryData with input field `data`.
- Credential Required: Connect your httpHeaderAuth credentials in Upload PDF to Mistral.
- In Retrieve Mistral File Link, set URL to `=https://api.mistral.ai/v1/files/{{ $json.id }}/url` and add query parameter `expiry=24`.
- Credential Required: Connect your httpHeaderAuth credentials in Retrieve Mistral File Link and Run Mistral OCR.
- In Run Mistral OCR, set JSON Body to `{ "model": "mistral-ocr-latest", "document": { "type": "document_url", "document_url": "{{ $json.url }}" }, "include_image_base64": true }`.
Upload PDF to Mistral outputs to both Retrieve Mistral File Link and Progress: Sent to OCR in parallel. Retrieve Mistral File Link outputs to both Run Mistral OCR and Progress: File Signed in parallel. Run Mistral OCR outputs to both Split OCR Pages and Progress: Results Ready in parallel.
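The three HTTP calls chain together: the upload returns an `id`, the signed-link call returns a `url`, and the OCR call consumes that `url`. A rough sketch of the request shapes follows. It makes a few assumptions that this guide does not state: the OCR endpoint path (`/ocr`), the GET method on the signed-link call, and a Bearer token in the `Authorization` header. The function names are illustrative, and the multipart file body is left out.

```javascript
const MISTRAL_BASE = "https://api.mistral.ai/v1";

// Assumption: the API key is sent as a Bearer token.
function authHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}` };
}

// 1. Upload: multipart form with purpose=ocr plus the PDF in the `file` field.
function uploadRequest(apiKey) {
  return {
    method: "POST",
    url: `${MISTRAL_BASE}/files`,
    headers: authHeaders(apiKey),
    form: { purpose: "ocr" }, // the binary `file` part is attached by the HTTP client
  };
}

// 2. Signed link: uses the `id` returned by the upload (GET is an assumption).
function signedUrlRequest(apiKey, fileId) {
  return {
    method: "GET",
    url: `${MISTRAL_BASE}/files/${fileId}/url?expiry=24`,
    headers: authHeaders(apiKey),
  };
}

// 3. OCR: uses the `url` returned by step 2 (the /ocr path is an assumption).
function ocrRequest(apiKey, signedUrl) {
  return {
    method: "POST",
    url: `${MISTRAL_BASE}/ocr`,
    headers: { ...authHeaders(apiKey), "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mistral-ocr-latest",
      document: { type: "document_url", document_url: signedUrl },
      include_image_base64: true,
    }),
  };
}

console.log(signedUrlRequest("YOUR_KEY", "file-123").url);
```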
Step 4: Process OCR Pages and Aggregate Output
After OCR, the results are split into pages and aggregated to prepare the final document content.
- In Split OCR Pages, keep the JavaScript code as provided to output `pageNumber` and `content` for each page.
- In Combine OCR Output, confirm the aggregation fields include `=pageNumber` and `content`.
Combine OCR Output outputs to both Generate Google Doc and Progress: Creating Doc in parallel.
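The Code node's script ships with the template and is not reproduced here. If you want a mental model of what the split and aggregate steps do, the sketch below is one plausible version. It assumes the OCR response contains a `pages` array whose entries have a zero-based `index` and a `markdown` string. Treat those field names and the 1-based numbering as assumptions, and check them against your actual OCR output.

```javascript
// Assumed response shape: { pages: [{ index: 0, markdown: "..." }, ...] }
function splitPages(ocrResponse) {
  const pages = Array.isArray(ocrResponse && ocrResponse.pages) ? ocrResponse.pages : [];
  return pages.map((page, i) => ({
    // Assumption: index is zero-based, so add 1 for a human-readable page number.
    pageNumber: (Number.isInteger(page.index) ? page.index : i) + 1,
    content: (page.markdown || "").trim(),
  }));
}

// Mirror of the aggregation step: collect each field into a parallel array.
function aggregate(items) {
  return {
    pageNumber: items.map((item) => item.pageNumber),
    content: items.map((item) => item.content),
  };
}

const sample = { pages: [{ index: 0, markdown: "مرحبا" }, { index: 1, markdown: " نص " }] };
console.log(aggregate(splitPages(sample)));
```

Keeping `pageNumber` and `content` as parallel arrays is what lets a later step pair them up by position.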
Step 5: Configure Google Docs Output and User Delivery
The workflow creates a Google Doc, inserts the OCR text, and sends the link back to the Telegram user.
- In Generate Google Doc, set Title to `=OCR Result from {{ $('Telegram Update Trigger').item.json.message.document.file_name }}` and Folder ID to your target folder ID (replace `[YOUR_ID]`).
- Credential Required: Connect your googleDocsOAuth2Api credentials in Generate Google Doc and Insert Text into Doc.
- In Insert Text into Doc, set Operation to `update` and Document URL to `{{ $json.id }}`.
- Ensure the Insert action text in Insert Text into Doc uses the full expression: `` {{ $('Combine OCR Output').item.json.content.map((c, i) => `${c}\n\n(Page Number: ${$('Combine OCR Output').item.json.pageNumber[i]})` ).join('\n\n--------\n\n') }} ``.
- In Batch Doc Updates, keep Batch Size set to `1` for sequential updates.
- In Send Doc Link to User, set Text to `=<a href="https://docs.google.com/document/d/{{ $json.documentId }}">{{ $('Generate Google Doc').item.json.name }}</a>` and Chat ID to `{{ $('Telegram Update Trigger').item.json.message.chat.id }}`.
- Credential Required: Connect your telegramApi credentials in Send Doc Link to User.
If Folder ID is still set to `[YOUR_ID]`, Generate Google Doc will fail. Replace it with an actual Google Drive folder ID.
Insert Text into Doc outputs to both Batch Doc Updates and Send Doc Link to User in parallel.
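The Insert action expression is dense, so here it is unpacked as a plain function you can run outside n8n to preview the document layout. The page label and divider are copied from the expression. The function names and sample values are made up for illustration.

```javascript
// Same logic as the n8n expression: label each page, then join with a divider.
function buildDocText(content, pageNumber) {
  return content
    .map((c, i) => `${c}\n\n(Page Number: ${pageNumber[i]})`)
    .join("\n\n--------\n\n");
}

// Same shape as the Telegram reply text: an HTML link to the new document.
function buildDocLink(documentId, name) {
  return `<a href="https://docs.google.com/document/d/${documentId}">${name}</a>`;
}

console.log(buildDocText(["First page", "Second page"], [1, 2]));
console.log(buildDocLink("abc123", "OCR Result from report.pdf"));
```

Telegram renders the anchor tag only when the message is sent with HTML parse mode, so check that setting if the reply shows raw markup.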
Step 6: Test and Activate Your Workflow
Run a full test using a PDF in Telegram and verify each stage completes successfully.
- Click Execute Workflow and send a PDF to your Telegram bot to trigger Telegram Update Trigger.
- Confirm you receive progress messages from Progress: File Received through Progress: Creating Doc.
- Verify that a Google Doc is created with the OCR text and that Send Doc Link to User posts the document link to Telegram.
- When everything works, toggle the workflow to Active to enable production use.
Common Gotchas
- Google Docs/Drive credentials can expire or need specific permissions. If things break, check the connected Google account in n8n credentials and confirm the Drive folder sharing settings first.
- If you’re using Wait nodes or external OCR processing, processing times vary. Bump up the wait duration if downstream nodes fail on empty responses.
- Default prompts in AI nodes are generic. Add your brand voice early or you’ll be editing outputs forever.
Frequently Asked Questions
How long does it take to set up this workflow?
About 20 minutes if your keys are ready.
Do I need coding skills to set it up?
No. You’ll mostly paste credentials and choose the target Google Drive folder.
Is n8n free to use for this workflow?
Yes. n8n has a free self-hosted option and a free trial on n8n Cloud. Cloud plans start at $20/month for higher volume. You’ll also need to factor in Mistral OCR API usage costs for each PDF processed.
Where can I host n8n to run this automation?
Two options: n8n Cloud (managed, easiest setup) or self-hosting on a VPS. For self-hosting, Hostinger VPS is affordable and handles n8n well. Self-hosting gives you unlimited executions but requires basic server management.
Can I customize the document name and destination folder?
Yes, and it’s one of the best tweaks to do early. Change the document name in the Google Docs creation step, and adjust the “set/edit fields” mapping that builds the title from Telegram metadata. Common customizations include adding the sender name, inserting the date, using a project code prefix, and saving into different Google Drive folders by chat or user.
Why is my Telegram connection failing?
Usually it’s the bot token or webhook setup. Regenerate the token in @BotFather if needed, then update the Telegram credentials in n8n and re-check the webhook is active. Also confirm the bot is actually in the chat you’re testing (group permissions trip people up), and that the message contains a real document attachment rather than a forwarded “preview.”
How many PDFs can this workflow handle?
Plenty for normal team usage. On n8n Cloud Starter, you’re limited by monthly executions, while self-hosting has no execution cap (it depends on your server). In practice, OCR time is the real bottleneck, so larger PDFs simply queue up and take longer.
Is this better than building it in Zapier or Make?
Often, yes, because this flow has branching, batching, and multi-step OCR processing that gets expensive or awkward elsewhere. n8n also gives you the self-host option, which matters if you process lots of documents and want predictable costs. Zapier or Make can still win for very small, simple flows with two steps. If you’re unsure, talk to an automation expert and you’ll get a straight recommendation based on volume and risk.
Once this is running, Arabic PDFs stop being a dead end. You’ll get searchable text, a shareable Google Doc, and fewer interruptions during the week.