Solved: Monitor OpenAI API Token Usage and Cost Limit Alerts

🚀 Executive Summary

TL;DR: This guide provides a Python script that automatically monitors OpenAI API token usage and estimated cost, so unexpected bills are caught early. It checks usage on a schedule and sends proactive alerts to services like Slack when a predefined cost limit is exceeded, eliminating the need for manual dashboard checks.

🎯 Key Takeaways

The solution leverages the OpenAI `v1/usage` API endpoint, requiring `Authorization` (API key) and `OpenAI-Organization` headers, to fetch daily token usage data in UTC for accurate cost calculation.
Configuration values, including the OpenAI API key, organization ID, Slack webhook URL, and cost limit, are kept out of the source code in a `config.env` file and loaded via `python-dotenv`.
Automated alerts are triggered to a specified notification service (e.g., Slack) when the estimated daily cost, calculated from prompt and completion tokens, surpasses a user-defined `COST_LIMIT_USD`, with scheduling typically handled by cron or serverless functions.

Monitor OpenAI API Token Usage and Cost Limit Alerts

Hey team, Darian here. Look, we’re all busy, and the last thing anyone needs is a surprise bill because a test script was left running over the weekend. I used to check the OpenAI usage dashboard manually every morning, and frankly, it was a waste of time. After one too many “close calls” with our budget, I built a simple automated monitor. This script has saved my team from more than one headache, and it gives us up-to-date visibility without the manual-check grind. Today, I’m going to walk you through how to set it up.

Prerequisites

Before we dive in, make sure you have the following ready:

A Python 3 environment.
Your OpenAI Organization ID and API Key.
A Slack Incoming Webhook URL (or any other notification service endpoint).
Familiarity with scheduling a script (like with cron).

The Guide: Building Your Usage Monitor

I’ll skip the standard virtual environment setup since you likely have your own workflow for that. Let’s jump straight to the logic. Just make sure you’ve installed the two libraries the script needs: `requests` for making HTTP calls and `python-dotenv` for loading our secrets.

Step 1: Set Up Your Configuration

First, let’s get our sensitive data out of the script. I always use a config.env file to store secrets. It’s just good practice.

Create a file named config.env:


# config.env
OPENAI_API_KEY="sk-YourSecretKey"
OPENAI_ORG_ID="org-YourOrgId"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/Your/Webhook/URL"
COST_LIMIT_USD=100.00
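
Before these values reach the monitor, it can help to validate them in one place. Here is a minimal sketch of that idea. The `load_settings` helper and its error messages are illustrative and not part of the script below; it accepts any mapping, so you can try it without touching real environment variables.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_ORG_ID", "SLACK_WEBHOOK_URL")

def load_settings(env=os.environ):
    """Return validated settings from a mapping of environment variables.

    Hypothetical helper: raises ValueError naming every missing key,
    or if COST_LIMIT_USD is not a positive number.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing configuration: {', '.join(missing)}")

    raw_limit = env.get("COST_LIMIT_USD", "100.0")
    try:
        cost_limit = float(raw_limit)
    except ValueError:
        raise ValueError(f"COST_LIMIT_USD is not a number: {raw_limit!r}") from None
    if cost_limit <= 0:
        raise ValueError("COST_LIMIT_USD must be greater than zero")

    settings = {key: env[key] for key in REQUIRED_KEYS}
    settings["COST_LIMIT_USD"] = cost_limit
    return settings
```

In the monitor you would call `load_settings()` right after `load_dotenv('config.env')`. Naming the missing keys makes a broken config.env much quicker to diagnose than a generic error.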

Step 2: The Python Script

Now for the main event. We’ll create a Python script that fetches usage data from the OpenAI API, calculates the cost, and sends an alert if it exceeds our defined limit. I’ve broken it down into logical functions.

Here’s the complete script, which I’ll call monitor_openai.py. We’ll walk through it piece by piece below.


import os
import requests
from datetime import datetime, timezone
from dotenv import load_dotenv

# --- Constants: OpenAI Pricing (as of late 2023 for GPT-4) ---
# It's crucial to update these if OpenAI changes their pricing!
GPT4_PROMPT_PRICE_PER_1K = 0.03
GPT4_COMPLETION_PRICE_PER_1K = 0.06

def main():
    """Main function to run the monitoring process."""
    load_dotenv('config.env')

    api_key = os.getenv("OPENAI_API_KEY")
    org_id = os.getenv("OPENAI_ORG_ID")
    slack_webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    cost_limit = float(os.getenv("COST_LIMIT_USD", 100.0))

    if not all([api_key, org_id, slack_webhook_url]):
        print("Error: Configuration variables are missing.")
        return

    # Get today's UTC date in YYYY-MM-DD format, as required by the API.
    # datetime.now(timezone.utc) replaces datetime.utcnow(), which is
    # deprecated since Python 3.12.
    today_str = datetime.now(timezone.utc).strftime('%Y-%m-%d')
    
    usage_data = get_openai_usage(api_key, org_id, today_str)
    
    if not usage_data or not usage_data.get("data"):
        print(f"Could not retrieve usage data for {today_str}.")
        return

    total_prompt_tokens = 0
    total_completion_tokens = 0

    # The API returns usage per model, so we sum them up
    for entry in usage_data["data"]:
        total_prompt_tokens += entry["n_prompt_tokens_total"]
        total_completion_tokens += entry["n_completion_tokens_total"]
    
    # Calculate the cost (every token is priced at the GPT-4 rate,
    # whichever model actually produced it)
    estimated_cost = (total_prompt_tokens / 1000 * GPT4_PROMPT_PRICE_PER_1K) + \
                     (total_completion_tokens / 1000 * GPT4_COMPLETION_PRICE_PER_1K)

    print(f"Usage for {today_str}:")
    print(f"  - Prompt Tokens: {total_prompt_tokens}")
    print(f"  - Completion Tokens: {total_completion_tokens}")
    print(f"  - Estimated Cost: ${estimated_cost:.2f}")

    # Check if we need to send an alert
    if estimated_cost > cost_limit:
        message = (
            f":warning: OpenAI API Cost Alert! :warning:\n"
            f"Estimated cost for {today_str} is ${estimated_cost:.2f}, "
            f"which exceeds the limit of ${cost_limit:.2f}.\n"
            f"Prompt Tokens: {total_prompt_tokens}\n"
            f"Completion Tokens: {total_completion_tokens}"
        )
        send_slack_alert(slack_webhook_url, message)

def get_openai_usage(api_key, org_id, date_str):
    """Fetches usage data from the OpenAI API for a specific date."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Organization": org_id,
    }
    params = {"date": date_str}
    try:
        # A timeout keeps a hung connection from blocking the scheduled job forever
        response = requests.get("https://api.openai.com/v1/usage", headers=headers, params=params, timeout=30)
        response.raise_for_status() # Raises an HTTPError for bad responses (4xx or 5xx)
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error fetching OpenAI usage: {e}")
        return None

def send_slack_alert(webhook_url, message):
    """Sends a notification to a Slack webhook."""
    payload = {"text": message}
    try:
        response = requests.post(webhook_url, json=payload, timeout=10)
        response.raise_for_status()
        print("Slack alert sent successfully!")
    except requests.exceptions.RequestException as e:
        print(f"Error sending Slack alert: {e}")

if __name__ == "__main__":
    main()

Code Breakdown:

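Constants: I’ve hardcoded the pricing for GPT-4, and the script applies those rates to every token it counts, whichever model generated it. If you use multiple models with different pricing, the estimate will be off (too high when the other models are cheaper), so in a more complex setup you would keep a per-model price table in a config file. This is a key point of failure if prices change!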
main(): This is our orchestrator. It loads the environment variables, calls the function to get usage, calculates the cost, and then decides whether to fire an alert.
get_openai_usage(): This function does the actual API call. Notice the headers—you need both the `Authorization` bearer token (your API key) and the `OpenAI-Organization` ID. The `date` parameter is crucial for fetching a specific day’s usage.
send_slack_alert(): A simple helper function that POSTs a JSON payload to our Slack webhook. It’s decoupled so you could easily swap this out for email, PagerDuty, or another service.
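
To make the multi-model point from the Constants note concrete, here is a rough sketch of a per-model price table. Treat it as an illustration under assumptions: I’m assuming each usage entry carries a model identifier under a key such as `snapshot_id` (check this against your own response payload), the token field names are the ones used in the script, and `PRICING_PER_1K`, `price_for`, and `estimate_cost` are hypothetical names. Only the GPT-4 row comes from this guide; the second row is a made-up placeholder, not a real OpenAI price.

```python
# Prices in USD per 1,000 tokens: (prompt, completion).
# The "gpt-4" row copies the constants used in the script; the other row
# is a made-up placeholder to show the shape, NOT a real OpenAI price.
PRICING_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "example-cheaper-model": (0.001, 0.002),
}
# Fall back to the GPT-4 rate for models missing from the table.
DEFAULT_PRICE = PRICING_PER_1K["gpt-4"]

def price_for(model_id):
    """Return the price pair whose table key is the longest prefix of model_id."""
    matches = [name for name in PRICING_PER_1K if model_id.startswith(name)]
    if not matches:
        return DEFAULT_PRICE
    return PRICING_PER_1K[max(matches, key=len)]

def estimate_cost(entries, model_key="snapshot_id"):
    """Sum the estimated cost of all usage entries, priced per model."""
    total = 0.0
    for entry in entries:
        prompt_price, completion_price = price_for(entry.get(model_key, ""))
        total += entry["n_prompt_tokens_total"] / 1000 * prompt_price
        total += entry["n_completion_tokens_total"] / 1000 * completion_price
    return total

# Worked example: 1,000,000 prompt tokens and 200,000 completion tokens on GPT-4
# cost 1000 * 0.03 + 200 * 0.06 = 30 + 12 = 42 USD.
sample = [
    {
        "snapshot_id": "gpt-4-0613",
        "n_prompt_tokens_total": 1_000_000,
        "n_completion_tokens_total": 200_000,
    },
]
print(f"${estimate_cost(sample):.2f}")  # prints $42.00
```

Prefix matching is a design choice here so that dated model snapshots map onto one table row; an exact-match lookup would be stricter but needs a row per snapshot.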

Pro Tip: In my production setups, I route these alerts to a dedicated Slack channel like #devops-alerts. This keeps the noise out of general channels but ensures the on-call engineer sees it immediately. It prevents alert fatigue for the rest of the team.

Step 3: Schedule the Script

The final step is to run this script automatically. I use cron on our Linux runners. You could also use a systemd timer, a GitHub Action on a schedule, or a serverless function (like AWS Lambda) for a more robust setup.
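
If you go the serverless route, the entry point changes shape slightly. Below is a rough sketch of an AWS Lambda handler. It assumes the four settings are defined as Lambda environment variables instead of being read from config.env, and `run_monitor` is only a stand-in so the sketch runs on its own; in practice you would import `main` from monitor_openai.py and call that.

```python
def run_monitor():
    """Stand-in for monitor_openai.main(); replace with the real import."""
    print("monitor ran")

def lambda_handler(event, context):
    """Entry point invoked by Lambda on each scheduled trigger.

    The event and context arguments are supplied by Lambda and are not
    needed here. Exceptions are re-raised so the invocation is recorded
    as failed instead of passing silently.
    """
    try:
        run_monitor()
    except Exception as exc:
        print(f"Monitor failed: {exc}")
        raise
    return {"status": "ok"}
```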

To run this script every morning at 2 AM, the cron entry would look like this. On most Linux systems you add it with `crontab -e`; use whatever your environment provides for editing cron tasks.


# Run the OpenAI monitor script at 2:00 AM every day.
0 2 * * * python3 monitor_openai.py

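This simple entry assumes the script and config.env are in a directory the cron job can access. Cron typically starts jobs in the user’s home directory with a minimal PATH, so the relative names monitor_openai.py and config.env only resolve if both files live there; otherwise, give the full path to the Python interpreter and the script, and make sure config.env can be found from that working directory. Also keep in mind that the script queries the current UTC date, so a single 2 AM run only sees the usage accrued since midnight UTC. Run it more frequently (hourly, for example) if you want the alert to fire while spend is still accumulating.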
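
One way to make the script independent of cron’s working directory is to resolve config.env relative to the script file itself. A minimal sketch, assuming config.env sits next to monitor_openai.py; `config_path` is a hypothetical helper name, not something the script already defines.

```python
from pathlib import Path

def config_path(script_file, name="config.env"):
    """Return the absolute path of `name` in the same directory as script_file."""
    return Path(script_file).resolve().parent / name

# Inside monitor_openai.py this would replace load_dotenv('config.env'):
#     load_dotenv(config_path(__file__))
```

With that change, the cron entry only needs an absolute path to the script, and the config file is found wherever the job happens to start.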

Common Pitfalls (Where I Usually Mess Up)

Timezone Issues: The OpenAI API reports usage in UTC. If you derive the query date from your server’s local time and the server is not on UTC, your “daily” query can point at the wrong day or pull incomplete data. I made this mistake early on. Deriving the date from a UTC timestamp, as the script does, is the safest bet. Note that `datetime.utcnow()` is deprecated as of Python 3.12 in favor of `datetime.now(timezone.utc)`.
Outdated Pricing: OpenAI model prices change. If you don’t update the pricing constants in the script, your cost calculations will be wrong. This is the most brittle part of this setup. A good habit is to review these constants quarterly.
API Key Rotation: If your organization rotates API keys and you forget to update the config.env file, the script will fail quietly: `get_openai_usage()` prints the error and returns None, `main()` prints “Could not retrieve usage data” and exits, and no alert is sent. Sending a “Failed to fetch data” alert from that branch of `main()` can save you from being blind to this.
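
The timezone and key-rotation pitfalls can both be guarded against in a few lines. This is a sketch, not a drop-in patch: `utc_date_str` and `check_fetch_result` are hypothetical helpers, and `alert` stands for any callable that delivers a message (for example, a small wrapper around `send_slack_alert`).

```python
from datetime import datetime, timezone

def utc_date_str(now=None):
    """Return the UTC date as YYYY-MM-DD, whatever timezone the server uses.

    `now` is an optional timezone-aware datetime, mainly useful for testing.
    """
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")

def check_fetch_result(usage_data, date_str, alert):
    """Alert when the usage fetch failed; return True when data is usable.

    usage_data is None when the request itself failed (for example after
    an API key was rotated). An empty "data" list is treated as "no usage"
    and does not trigger an alert.
    """
    if usage_data is None:
        alert(f"OpenAI monitor: failed to fetch usage for {date_str}. Check the API key.")
        return False
    return bool(usage_data.get("data"))
```

In `main()`, this would stand in for the existing early-return check: call `check_fetch_result(usage_data, today_str, ...)` and return when it is False. Keeping the alert for outright failures only avoids a daily false alarm on days with no usage.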

Conclusion

And that’s it. A straightforward, fire-and-forget script that provides a crucial safety net for your OpenAI API costs. It takes about 20 minutes to set up and gives you peace of mind, freeing you up to focus on building cool things, not worrying about the bill. If you have any questions, you know where to find me.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.

🤖 Frequently Asked Questions

❓ How can I monitor my OpenAI API token usage and costs automatically?

You can implement a Python script that fetches daily token usage from the OpenAI `v1/usage` API, calculates estimated costs based on model pricing constants (e.g., GPT-4), and sends alerts to a Slack webhook if a predefined `COST_LIMIT_USD` is exceeded. Securely manage credentials using a `config.env` file and schedule the script to run periodically (e.g., with cron).

❓ How does this automated monitoring solution compare to alternative methods or scheduling options?

This custom Python script offers proactive, automated cost monitoring and external alerting, which is more efficient than manual checks of the OpenAI usage dashboard. For scheduling, while cron provides a simple setup, more robust alternatives like systemd timers, GitHub Actions on a schedule, or serverless functions (e.g., AWS Lambda) offer greater scalability, reliability, and error handling for production environments.

❓ What is a common pitfall when calculating OpenAI API costs, and how can it be mitigated?

A common pitfall is using outdated pricing constants for OpenAI models, which leads to inaccurate cost calculations. This can be mitigated by regularly reviewing and updating the `GPT4_PROMPT_PRICE_PER_1K` and `GPT4_COMPLETION_PRICE_PER_1K` (or other model-specific prices) within the script, ideally on a quarterly basis, as OpenAI pricing models can change.
