🚀 Executive Summary
TL;DR: AI agents often process PDFs by converting them to text, losing the binary file handle needed for database attachments. The core problem is an API-layer disconnect where the database expects a URL or multipart/form-data, not the agent’s internal memory. Solutions involve using public URL relays or a dedicated middleware service to handle the binary file transfer.
🎯 Key Takeaways
- AI agents typically convert PDF files to text for LLM processing, which means they do not retain a persistent binary handle necessary for direct file attachments to database properties.
- Database APIs for file properties (e.g., Notion, Airtable) commonly require either a publicly accessible URL from which they can fetch the file or a `multipart/form-data` upload.
- Implementing a middleware relay, such as a Python-based Lambda function, provides the most stable and permanent solution by fetching the binary data and performing a proper `POST` request with `multipart/form-data` to the database.
When your AI agent can process PDF data but fails to save the file to a database property, it’s usually an API-layer disconnect; here is how to bridge that gap using URL relays and middleware.
The Read-Only Wall: Why Your AI Agent Won’t “Attach” That PDF
I remember a frantic 2:00 AM session on dev-workflow-04 where we had built this beautiful legal-tech agent. It could parse a 50-page PDF, extract every clause, and summarize it perfectly. But the second we asked it to save that same PDF into the “Original Document” field in our Notion database? Radio silence. It was like watching a world-class chef describe a meal in exquisite detail but then refuse to actually put the food on a plate. It’s a classic frustration in the “agentic” world: the agent has the context of the file, but it doesn’t have a persistent handle on the binary data to move it from A to B.
The root cause is simple but annoying. Most AI agents (and the platforms they run on, like Zapier or Make) treat files as temporary buffers. When an agent “reads” a PDF, it often converts it to text for the LLM. The database, however, expects a File Property to be populated by a multipart/form-data upload or a public URL. The agent simply doesn’t “hold” the file in its hand once the reading is done. It has the memories, but it lost the object.
Pro Tip: Always check if your database API (Notion, Airtable, etc.) actually allows file uploads via API. Many only allow you to pass a URL that they then go and fetch themselves.
The Fixes
| Strategy | Effort | Stability |
| --- | --- | --- |
| The Public Link Shuffle | Low | Moderate |
| The Middleware Relay | High | High |
| The "Hacky" Base64 Blob | Medium | Low |
1. The Quick Fix: The Public Link Shuffle
If your database property (like in Airtable or Notion) requires a URL to “attach” a file, you can’t just pass the agent’s internal memory. You need to upload the file to a temporary storage bucket (AWS S3, Google Cloud Storage, or even a public Dropbox folder) first. The agent then passes that publicly accessible URL to the database property. The database then “sucks” the file from that URL into its own storage.
Example: instead of sending the file itself, send a signed URL in the property payload (Notion-style shape shown):

```json
{
  "properties": {
    "Project File": {
      "files": [
        {
          "name": "Report.pdf",
          "external": { "url": "https://s3.amazonaws.com/temp-bucket/report-123.pdf" }
        }
      ]
    }
  }
}
```
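If you assemble that payload in code, a small helper keeps the shape consistent across calls. A minimal sketch, assuming a Notion-style schema; the helper name and the "Project File" property name are illustrative, not part of any official SDK:

```python
def build_external_file_payload(display_name: str, public_url: str) -> dict:
    """Build a Notion-style 'files' property payload pointing at an external URL.

    The "Project File" property name mirrors the JSON example above;
    adjust it to match your own database schema.
    """
    return {
        "properties": {
            "Project File": {
                "files": [
                    {
                        "name": display_name,
                        "external": {"url": public_url},
                    }
                ]
            }
        }
    }
```

The agent only ever hands this function a URL, never the binary, which is exactly the division of labor the Public Link Shuffle relies on.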
2. The Permanent Fix: The Middleware Relay
In our prod-db-01 environment, we stopped relying on the agent to handle the “move” entirely. We built a small Python-based middleware (running on a Lambda function). The agent sends the file ID to the Lambda, the Lambda fetches the binary from the source, and then uses a proper POST request with multipart/form-data to push it into the database. This is the only way to ensure files aren’t lost in transit or blocked by permissions.
```python
import requests

def upload_to_db(file_url, db_api_key):
    # Fetch the binary from the source URL
    r = requests.get(file_url, timeout=30)
    r.raise_for_status()

    # Post to the DB as multipart/form-data
    files = {"file": ("contract.pdf", r.content, "application/pdf")}
    headers = {"Authorization": f"Bearer {db_api_key}"}
    response = requests.post(
        "https://api.your-db.com/v1/files",
        headers=headers,
        files=files,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```
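On Lambda, that upload function needs a thin entry point around it. A minimal sketch, with loud assumptions: the event shape (`{"file_url": ...}`), the `DB_API_KEY` environment variable, and the injectable `uploader` parameter are all illustrative choices, not AWS requirements:

```python
import json
import os

def lambda_handler(event, context, uploader=None):
    """Hypothetical Lambda entry point for the middleware relay.

    Assumes the caller sends {"file_url": "..."} (directly or as a JSON
    string in event["body"]) and that the DB key lives in the DB_API_KEY
    environment variable. `uploader` is injectable for testing; in
    production you would pass a function like upload_to_db().
    """
    # Accept both API Gateway-style events (JSON string body) and direct invokes
    body = event.get("body")
    payload = json.loads(body) if isinstance(body, str) else (body or event)

    file_url = payload.get("file_url")
    if not file_url:
        return {"statusCode": 400, "body": json.dumps({"error": "file_url is required"})}

    api_key = os.environ.get("DB_API_KEY", "")
    result = uploader(file_url, api_key) if uploader else {"received": file_url}
    return {"statusCode": 200, "body": json.dumps(result)}
```

Keeping the uploader injectable also lets you unit-test the relay without touching the network, which is worth it the first time a 2:00 AM deploy goes sideways.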
3. The “Nuclear” Option: The Base64 Payload
This is “hacky,” and I only recommend it if you’re desperate. If your database doesn’t support file properties via API easily, but it does support long text strings, you can have the agent convert the PDF to a Base64 string and dump it into a text field. You’ll need a script on the other end to decode it. Warning: Large PDFs will hit character limits and crash your api-gateway-01 faster than you can say “out of memory.”
Warning: Base64 encoding increases file size by about 33%. If your database has a 2MB limit on text fields, your 1.6MB PDF will fail to save.
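You can check that overhead up front instead of discovering it in production. A stdlib-only sketch; the 2 MB cap is the hypothetical text-field limit from the warning above, not a constant from any real database:

```python
import base64

TEXT_FIELD_LIMIT = 2 * 1024 * 1024  # hypothetical 2 MB text-field cap

def pdf_to_base64_field(pdf_bytes: bytes) -> str:
    """Encode PDF bytes to a Base64 string for a text field.

    Refuses payloads whose encoded form would overflow the field:
    Base64 inflates size by 4/3 (~33%), so a 1.6 MB PDF already
    busts a 2 MB limit.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    if len(encoded) > TEXT_FIELD_LIMIT:
        raise ValueError(
            f"Encoded size {len(encoded)} exceeds field limit {TEXT_FIELD_LIMIT}"
        )
    return encoded
```

The script on the receiving end just runs `base64.b64decode()` on the stored string to recover the original binary.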
At the end of the day, stop thinking of the AI agent as a file system. Treat it as a logic engine. Let it handle the “thinking,” but let a dedicated storage service or a bit of glue code handle the “heavy lifting” of binary files. Your production logs will thank you.
🤖 Frequently Asked Questions
❓ Why can’t my AI agent attach a PDF it just read to a database file property?
Your AI agent typically converts the PDF to text for processing, losing the binary data handle. Database file properties usually expect a public URL or a `multipart/form-data` upload, which the agent’s internal memory cannot provide directly.
❓ How do the ‘Public Link Shuffle’ and ‘Middleware Relay’ strategies compare for attaching files?
The ‘Public Link Shuffle’ is low effort and moderate stability, involving uploading to temporary storage and passing a public URL. The ‘Middleware Relay’ is high effort but offers high stability, using a dedicated service to fetch and upload the binary data via `multipart/form-data`, ensuring robust handling.
❓ What is a common implementation pitfall when attempting to attach files using Base64 encoding?
A common pitfall is exceeding database text field character limits, as Base64 encoding increases file size by approximately 33%, causing large PDFs to fail when stored as text strings.