🚀 Executive Summary

TL;DR: GPTBot’s aggressive crawling can significantly inflate Vercel bills for open-source projects by triggering serverless functions. Developers can mitigate this by implementing a `robots.txt` disallow rule, a more robust `vercel.json` edge configuration to block requests by User-Agent, or, as a last resort, IP blocking.

🎯 Key Takeaways

The `robots.txt` file offers a simple, honor-system way to ask GPTBot and ChatGPT-User not to crawl a site, but it is a request, not an enforced block, and not every crawler honors it.
Vercel’s `vercel.json` configuration provides a highly effective edge-level solution to block GPTBot by its User-Agent, returning a `403 Forbidden` status and preventing serverless function invocation costs.
IP blocking, using OpenAI’s published ranges (retrieved via `dig TXT oai.openai.com`), is a ‘nuclear option’ for persistent, non-compliant scrapers, but requires high maintenance and carries risks of inadvertently blocking legitimate services.

GPTBot 164k request a day to my open-source project? Now have to pay for Vercel pro

OpenAI’s GPTBot hammering your open-source project and inflating your Vercel bill? I’ve seen it happen to the best of us. Here are three battle-tested ways to block it, from the polite request to the firewall lockdown.

So, GPTBot Found Your Project. Here’s How to Stop Paying for OpenAI’s Scraper.

It was 3 AM, and of course, PagerDuty was screaming about latency on one of our core APIs. I dove in, expecting a bad deploy or a failing `prod-db-01` replica. Instead, I found our load balancers were getting absolutely hammered by a single, relentless user agent I’d never seen before. A new “AI” company had decided our internal pricing API was prime training data. The feeling of watching your carefully architected system get treated like a free-for-all buffet by a mindless scraper is infuriating. So when I saw that Reddit thread about GPTBot running up a developer’s Vercel bill, I felt that all too familiar frustration.

First, What’s Going On Here?

Let’s get one thing straight: GPTBot isn’t malicious, it’s just… voracious. It’s OpenAI’s web crawler, and its job is to scrape the public internet to train future models like GPT-5. The problem is, it’s incredibly aggressive. For a static site, this might just be annoying. But on a platform like Vercel, especially if you have serverless functions firing for each request, every one of those 164,000 daily hits can translate directly into a line item on your invoice. You’re essentially paying for OpenAI to train their next billion-dollar product on your work. We can do better.

Here are three ways to deal with this, from the polite suggestion to the bouncer at the door.

Solution 1: The “Polite Request” using robots.txt

This is the classic, standard way to manage bots. You create a file named robots.txt in the root of your public directory and tell “good” bots what they are and aren’t allowed to do. It’s the honor system of the internet.

The Fix:

In your project’s public folder (the one that gets deployed to the root of your site), create a file called robots.txt with the following content:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

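This tells GPTBot (OpenAI’s training crawler) and ChatGPT-User (the agent that fetches pages on behalf of ChatGPT users) that they are not allowed to crawl any part of your site. OpenAI documents other agents as well, such as OAI-SearchBot, so add a block for each one you want to exclude. Most reputable crawlers, including GPTBot, will respect this. It’s the easiest and fastest thing to implement.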

My Take: This is step one. Always do this. But don’t rely on it as your only line of defense. A polite note only works on polite visitors, and not all bots are polite.
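Before relying on the file, it helps to confirm that the rules say what you think they say. Here is a minimal sketch of such a check. The `isFullyDisallowed` helper is hypothetical and deliberately simplified: it only looks for an exact `Disallow: /` in a group that names the agent, and it ignores `Allow` rules, wildcards, and the `*` fallback group.

```typescript
// Returns true when robotsTxt contains a group naming `agent` with "Disallow: /".
// Simplified on purpose: no Allow rules, no wildcards, no "*" fallback group.
function isFullyDisallowed(robotsTxt: string, agent: string): boolean {
  const target = agent.toLowerCase();
  let groupAgents: string[] = [];
  let readingAgents = false; // true while consecutive User-agent lines are collected

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      if (!readingAgents) groupAgents = []; // a new group starts here
      groupAgents.push(value.toLowerCase());
      readingAgents = true;
    } else {
      readingAgents = false;
      if (field === "disallow" && value === "/" && groupAgents.includes(target)) {
        return true;
      }
    }
  }
  return false;
}

// The same rules shown above, as a string.
const sample = [
  "User-agent: GPTBot",
  "Disallow: /",
  "",
  "User-agent: ChatGPT-User",
  "Disallow: /",
].join("\n");

console.log(isFullyDisallowed(sample, "GPTBot"));    // true
console.log(isFullyDisallowed(sample, "Googlebot")); // false
```

To run it against a live deployment, fetch `/robots.txt` from your own domain and pass the response text in as `robotsTxt`.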

Solution 2: The “Firm No” at the Edge (Vercel Config)

If the polite request doesn’t work, or you just want to be certain, you need to block the bot before it ever hits your functions. On Vercel, the best way to do this is with the vercel.json configuration file. This is a platform-level rule that intercepts requests at the edge based on their User-Agent header.

The Fix:

In the root of your project, create or edit your vercel.json file to include a route rule that blocks the bot. The rule checks the `user-agent` header for the string “GPTBot” and, if it matches, returns a 403 Forbidden status immediately. Note that `routes` is Vercel’s legacy routing property and cannot be combined with `rewrites`, `redirects`, or `headers` in the same file, so check your existing config before adding it.

{
  "routes": [
    {
      "src": "/(.*)",
      "has": [
        {
          "type": "header",
          "key": "user-agent",
          "value": ".*GPTBot.*"
        }
      ],
      "status": 403
    }
  ]
}

This is far more robust. The bot gets an access denied error without ever executing your code, saving you the invocation cost. This is my recommended approach for anyone on Vercel.
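If your vercel.json already uses `rewrites` or `redirects`, the same User-Agent check can be done in a middleware file instead. The following is a rough sketch, not the article's own config: it assumes a `middleware.ts` file in the project root, assumes that returning `undefined` lets the request continue (Next.js projects would return `NextResponse.next()` instead), and uses a hypothetical `BLOCKED_AGENTS` list. Middleware invocations are metered too, so check your plan's pricing before treating this as free.

```typescript
// middleware.ts (project root). Hypothetical sketch under the assumptions above.

// Lowercase substrings to match against the User-Agent header; extend as needed.
const BLOCKED_AGENTS = ["gptbot"];

// True when the User-Agent header contains one of the blocked substrings.
function isBlockedAgent(userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return BLOCKED_AGENTS.some((needle) => ua.includes(needle));
}

export default function middleware(request: Request): Response | undefined {
  if (isBlockedAgent(request.headers.get("user-agent"))) {
    // Answer immediately so the request never reaches a serverless function.
    return new Response("Forbidden", { status: 403 });
  }
  // Assumption: returning undefined passes the request through unchanged.
  return undefined;
}
```

Matching on a lowercase substring keeps the check tolerant of version bumps in the bot's User-Agent string, at the cost of also blocking anything else that happens to contain "gptbot".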

Solution 3: The “Nuclear Option” (IP Blocking)

Sometimes, a bot will spoof its User-Agent, rendering the previous methods useless. In that case, the last resort is blocking the known IP address ranges of the service you want to restrict. OpenAI publishes its IP ranges for this very reason.

The Fix:

This isn’t something you’d typically do in vercel.json. You’d implement this at a higher level, like a Cloudflare WAF (Web Application Firewall) rule, or with AWS WAF or network ACLs if you were self-hosting (security groups only support allow rules, so they can’t express a deny). You would create a rule that denies all traffic from OpenAI’s published IP blocks.

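You can look up the published list with a DNS TXT query (if it returns nothing, OpenAI’s crawler documentation is the authoritative place to find the current ranges):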

$ dig +short TXT oai.openai.com

Warning: I call this the ‘nuclear’ option for a reason. IP ranges can change, requiring you to maintain your block list. More importantly, you could inadvertently block legitimate OpenAI services or APIs that your application might actually need to use in the future. Use this with extreme caution and only if you’re facing a truly persistent, non-compliant scraper.
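Whatever firewall you use, the underlying test is the same: does the client IP fall inside one of the blocked CIDR ranges? Here is a minimal IPv4-only sketch of that check. The `ipv4ToInt` and `isInCidr` helpers are hypothetical, and the ranges in `BLOCKED_RANGES` are reserved documentation addresses used as placeholders, not OpenAI’s real ranges.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`Invalid IPv4 address: ${ip}`);
  let result = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error(`Invalid IPv4 address: ${ip}`);
    }
    result = result * 256 + Number(part);
  }
  return result;
}

// True when `ip` lies inside the IPv4 block written as "a.b.c.d/prefix".
function isInCidr(ip: string, cidr: string): boolean {
  const match = /^(\d{1,3}(?:\.\d{1,3}){3})\/(\d{1,2})$/.exec(cidr);
  if (!match) throw new Error(`Invalid CIDR block: ${cidr}`);
  const prefix = Number(match[2]);
  if (prefix > 32) throw new Error(`Invalid CIDR prefix: ${cidr}`);

  const blockSize = 2 ** (32 - prefix); // number of addresses in the block
  const start = Math.floor(ipv4ToInt(match[1]) / blockSize) * blockSize;
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + blockSize;
}

// Placeholder ranges (reserved for documentation); replace with the published list.
const BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/28"];

function isBlockedIp(ip: string): boolean {
  return BLOCKED_RANGES.some((range) => isInCidr(ip, range));
}

console.log(isBlockedIp("192.0.2.77"));    // true
console.log(isBlockedIp("198.51.100.16")); // false (just past the /28)
```

The arithmetic uses plain multiplication and division rather than bitwise operators, because JavaScript bitwise operations work on signed 32-bit integers and would mishandle addresses above 127.255.255.255.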

Quick Comparison

Here’s a quick cheat sheet for choosing your strategy.

| Solution | Ease of Implementation | Effectiveness | Maintenance |
| --- | --- | --- | --- |
| robots.txt | Very Easy | Low (Relies on trust) | None |
| Vercel Config Block | Easy | High | Low (Update if agent changes) |
| IP Blocking | Moderate / Complex | Very High | High (IPs can change) |

Your passion project shouldn’t be a free lunch for a trillion-dollar company. Start with robots.txt, but have the vercel.json block ready to go. It’s the most effective and pragmatic solution for protecting your project and your wallet. Now go ship something cool.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.

🤖 Frequently Asked Questions

❓ What is GPTBot and why is it problematic for Vercel users?

GPTBot is OpenAI’s web crawler designed to scrape public internet data for training future models. It becomes problematic on Vercel, especially for projects utilizing serverless functions, because its high volume of requests directly translates into invocation costs, forcing developers to pay for OpenAI’s data collection.

❓ How do `robots.txt`, Vercel config, and IP blocking compare for stopping GPTBot?

`robots.txt` is very easy but relies on the bot’s compliance. Vercel config (using `vercel.json`) is easy, highly effective for blocking by User-Agent at the edge, and prevents function costs. IP blocking is moderate/complex, very high in effectiveness against spoofing, but high maintenance due to changing IP ranges and risks blocking legitimate services.

❓ What is a common implementation pitfall when trying to block GPTBot and how can it be avoided?

A common pitfall is relying solely on `robots.txt`, as it’s an honor system and not all bots comply, leading to continued billing. This can be avoided by implementing a more robust edge-level block using `vercel.json` to check the `User-Agent` header and return a `403 Forbidden` status immediately, ensuring the bot never triggers serverless functions.
