Solved: Monitor AWS Lambda Error Rates and Throttle Alerts via SES

🚀 Executive Summary

TL;DR: This guide provides a Python script to automate monitoring of AWS Lambda functions for errors and throttles over a 24-hour period using CloudWatch metrics. The script then compiles a summary report and sends it to your inbox daily via AWS Simple Email Service (SES), offering a low-maintenance solution for serverless application health checks.

🎯 Key Takeaways

Secure the monitoring script with a least privilege IAM policy, granting only `lambda:ListFunctions`, `cloudwatch:GetMetricData`, and `ses:SendEmail` permissions.
The Python script leverages `boto3` to paginate through all Lambda functions, dynamically build `GetMetricData` queries for `Errors` and `Throttles` metrics, and process results into a structured report.
Automate the daily execution of the monitoring script by deploying it as an AWS Lambda function triggered by an EventBridge (CloudWatch Events) rule, or via a cron job on an EC2 instance.


Alright, let’s talk about something that used to drive me crazy: manual monitoring. Early in my career, I’d spend the first hour of my day spot-checking CloudWatch logs for our critical Lambda functions. It was tedious, and I only ever caught problems *after* they had been happening for a while. I finally automated the whole process, and that simple script probably saves me 3-4 hours a week and gives me peace of mind.

This guide is my refined, production-ready version of that script. We’re going to build a simple, effective monitoring tool that scans all our Lambda functions for errors and throttles over the last 24 hours and sends a neat summary report to our inbox using the Simple Email Service (SES). It’s a set-it-and-forget-it solution that delivers real value.

Prerequisites

Before we dive in, make sure you have the following ready to go:

AWS IAM User/Role: You’ll need programmatic access with permissions for CloudWatch, Lambda, and SES. We’ll cover the specific policies below.
Python 3.x: Most systems have it, but it’s good to check.
AWS Boto3 Library: This is the official AWS SDK for Python. If you don’t have it, you can install it in your project environment with a simple `pip install boto3`.
A Verified SES Identity: You need an email address or a whole domain verified in SES to send emails from. This is a crucial anti-spam measure from AWS.

The Guide: Step-by-Step

Step 1: Lock Down IAM Permissions

First things first, security. In my production setups, I always create a dedicated IAM role or user for automation scripts like this. We’ll stick to the principle of least privilege. Create an IAM policy with the following permissions and attach it to the user or role that will run the script:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LambdaAndCloudWatchMonitoring",
            "Effect": "Allow",
            "Action": [
                "lambda:ListFunctions",
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        },
        {
            "Sid": "SendEmailViaSES",
            "Effect": "Allow",
            "Action": "ses:SendEmail",
            "Resource": "*"
        }
    ]
}

This policy allows our script to list all Lambda functions, fetch their metrics from CloudWatch, and send an email via SES. It can’t modify or delete anything, which is exactly what we want.
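If you want to tighten the SES statement further, `ses:SendEmail` can be limited to a single verified identity, while `lambda:ListFunctions` and `cloudwatch:GetMetricData` have to stay on `"Resource": "*"` because they do not support resource-level permissions. Here is a minimal sketch that generates such a policy. The helper name `build_monitor_policy`, the account ID, and the identity are placeholders I made up for illustration, so swap in your own values.

```python
import json


def build_monitor_policy(region, account_id, identity):
    """Return the monitoring policy with SES limited to one verified identity."""
    # Hypothetical values are passed in by the caller; nothing here talks to AWS.
    identity_arn = f"arn:aws:ses:{region}:{account_id}:identity/{identity}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LambdaAndCloudWatchMonitoring",
                "Effect": "Allow",
                "Action": ["lambda:ListFunctions", "cloudwatch:GetMetricData"],
                # These two actions cannot be scoped to individual resources.
                "Resource": "*",
            },
            {
                "Sid": "SendEmailViaSES",
                "Effect": "Allow",
                "Action": "ses:SendEmail",
                # Only allow sending from this one verified identity.
                "Resource": identity_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and domain, for illustration only.
    policy = build_monitor_policy("us-east-1", "123456789012", "example.com")
    print(json.dumps(policy, indent=4))
```

If SES then reports an authorization error that names a different identity ARN (this can happen while the account is still in the sandbox), add that ARN to the `Resource` list as well.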

Step 2: The Python Script – Where the Magic Happens

Now for the main event. I’ll skip the standard `virtualenv` setup since you likely have your own workflow for that. Let’s jump straight to the Python logic. Create a file named `lambda_monitor.py`.

The script does four things:
1. Fetches a list of all Lambda functions in the region.
2. Queries CloudWatch’s `GetMetricData` API for the `Errors` and `Throttles` sum for each function over the last 24 hours.
3. Formats the findings into a clean HTML email.
4. Sends that email using SES.

Here is the complete script. I’ve added comments to explain each part.


import boto3
import os
from datetime import datetime, timedelta

# --- Configuration ---
# Best practice is to set these as environment variables
SES_SENDER_EMAIL = os.environ.get('SES_SENDER')
SES_RECIPIENT_EMAIL = os.environ.get('SES_RECIPIENT')
AWS_REGION = os.environ.get('AWS_REGION', 'us-east-1')

# --- Initialize AWS Clients ---
cloudwatch_client = boto3.client('cloudwatch', region_name=AWS_REGION)
lambda_client = boto3.client('lambda', region_name=AWS_REGION)
ses_client = boto3.client('ses', region_name=AWS_REGION)

def get_all_lambda_functions():
    """Paginates through all Lambda functions and returns their names."""
    functions = []
    paginator = lambda_client.get_paginator('list_functions')
    page_iterator = paginator.paginate()
    for page in page_iterator:
        for function in page['Functions']:
            functions.append(function['FunctionName'])
    print(f"Found {len(functions)} Lambda functions.")
    return functions

def get_function_metrics(function_names):
    """
    Queries CloudWatch for Error and Throttle metrics for a list of functions.
    """
    if not function_names:
        return {}

    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=1)
    
    # Build the metric data queries dynamically
    metric_queries = []
    for i, name in enumerate(function_names):
        metric_queries.append({
            'Id': f'errors_{i}',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/Lambda',
                    'MetricName': 'Errors',
                    'Dimensions': [{'Name': 'FunctionName', 'Value': name}]
                },
                'Period': 86400, # 24 hours in seconds
                'Stat': 'Sum',
            },
            'Label': f'{name} Errors',
            'ReturnData': True
        })
        metric_queries.append({
            'Id': f'throttles_{i}',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'AWS/Lambda',
                    'MetricName': 'Throttles',
                    'Dimensions': [{'Name': 'FunctionName', 'Value': name}]
                },
                'Period': 86400,
                'Stat': 'Sum',
            },
            'Label': f'{name} Throttles',
            'ReturnData': True
        })
        
    response = cloudwatch_client.get_metric_data(
        MetricDataQueries=metric_queries,
        StartTime=start_time,
        EndTime=end_time
    )

    # Process results into a more usable dictionary
    results = {}
    for metric in response['MetricDataResults']:
        if metric['Values']:
            # Sum every datapoint: CloudWatch may return more than one
            # value if the 24-hour window spans two period buckets.
            count = int(sum(metric['Values']))
            if count > 0:
                # Label is formatted like "FunctionName MetricName"
                function_name, metric_type = metric['Label'].rsplit(' ', 1)

                if function_name not in results:
                    results[function_name] = {}
                results[function_name][metric_type] = count
                
    return results

def format_email_body(metric_results):
    """Builds an HTML email body from the metric results."""
    if not metric_results:
        return "<h3>Lambda Monitoring Report</h3><p>No errors or throttles detected in the last 24 hours. Great job!</p>"

    html_body = "<h3>Lambda Monitoring Alert</h3>"
    html_body += "<p>The following functions reported errors or throttles in the last 24 hours:</p>"
    html_body += "<table border='1' cellpadding='5' cellspacing='0'>"
    html_body += "<tr><th>Function Name</th><th>Errors</th><th>Throttles</th></tr>"

    for func_name, metrics in metric_results.items():
        errors = metrics.get('Errors', 0)
        throttles = metrics.get('Throttles', 0)
        html_body += f"<tr><td>{func_name}</td><td>{errors}</td><td>{throttles}</td></tr>"

    html_body += "</table>"
    return html_body

def send_report_email(html_content):
    """Sends the report using AWS SES."""
    try:
        ses_client.send_email(
            Source=SES_SENDER_EMAIL,
            Destination={'ToAddresses': [SES_RECIPIENT_EMAIL]},
            Message={
                'Subject': {'Data': 'AWS Lambda Daily Health Report'},
                'Body': {'Html': {'Data': html_content}}
            }
        )
        print("Successfully sent email report.")
    except Exception as e:
        print(f"Error sending email: {e}")
        # In a real-world scenario, you'd want better error handling here.
        # Maybe log to CloudWatch Logs or another monitoring service.

def main():
    """Main execution function."""
    print("Starting Lambda monitoring check...")
    if not all([SES_SENDER_EMAIL, SES_RECIPIENT_EMAIL]):
        print("Error: SES_SENDER and SES_RECIPIENT environment variables must be set.")
        return

    all_functions = get_all_lambda_functions()
    
    # GetMetricData has a limit on queries, so we batch them.
    # The limit is 500 metrics, and we query 2 metrics per function.
    batch_size = 250 
    problematic_functions = {}

    for i in range(0, len(all_functions), batch_size):
        batch = all_functions[i:i + batch_size]
        print(f"Processing batch {i // batch_size + 1}...")
        batch_results = get_function_metrics(batch)
        problematic_functions.update(batch_results)

    if not problematic_functions:
        print("No issues found.")
        # You could optionally send an "All Clear" email here if you want.
        return

    print(f"Found issues in {len(problematic_functions)} functions.")
    email_body = format_email_body(problematic_functions)
    send_report_email(email_body)
    print("Monitoring check complete.")

if __name__ == "__main__":
    main()
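
The result handling in `get_function_metrics` reads a single `get_metric_data` response and parses each label. As a hedged alternative, the sketch below follows `NextToken` in case CloudWatch splits the results across pages, sums every returned datapoint, and maps results back by query `Id` instead of by label. `fetch_metric_sums` and `group_by_function` are hypothetical helper names, not part of the script above; the `Id` format (`errors_<i>`, `throttles_<i>`) is the one the script already uses.

```python
def fetch_metric_sums(cloudwatch_client, queries, start_time, end_time):
    """Return {query Id: summed value}, following NextToken until exhausted."""
    totals = {}
    kwargs = {
        "MetricDataQueries": queries,
        "StartTime": start_time,
        "EndTime": end_time,
    }
    while True:
        response = cloudwatch_client.get_metric_data(**kwargs)
        for result in response["MetricDataResults"]:
            # Add up all datapoints in case more than one is returned per query.
            totals[result["Id"]] = totals.get(result["Id"], 0) + sum(result["Values"])
        token = response.get("NextToken")
        if not token:
            return totals
        # Ask for the next page of the same query set.
        kwargs["NextToken"] = token


def group_by_function(function_names, totals):
    """Map totals keyed by errors_<i> / throttles_<i> back to function names."""
    report = {}
    for i, name in enumerate(function_names):
        errors = int(totals.get(f"errors_{i}", 0))
        throttles = int(totals.get(f"throttles_{i}", 0))
        # Keep only functions that actually had a problem.
        if errors or throttles:
            report[name] = {"Errors": errors, "Throttles": throttles}
    return report
```

Because the client is passed in as an argument, you can exercise both helpers with a small fake object before pointing them at a real `boto3` CloudWatch client.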

Pro Tip: This script is perfect for a daily health check, but it’s not a replacement for real-time alerting. For business-critical functions, I strongly recommend setting up CloudWatch Alarms on the `Errors` metric with an SNS topic as the action. That way, you get notified within minutes of a critical failure, while this script gives you the broader daily overview.
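To make that tip concrete, here is a minimal sketch of an `Errors` alarm that notifies an SNS topic. The helper names, the alarm name format, the five-minute period, and the threshold of one error are my own assumptions, so tune them to your traffic. Creating alarms also needs `cloudwatch:PutMetricAlarm`, which the read-only monitoring policy deliberately does not grant, so run this with separate credentials.

```python
def build_error_alarm(function_name, sns_topic_arn, threshold=1):
    """Build put_metric_alarm arguments for a per-function Errors alarm."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # evaluate in five-minute buckets (assumed)
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # A function that was not invoked emits no data; do not alarm on that.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_error_alarm(function_name, sns_topic_arn):
    """Create the alarm in the default region of the current credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_error_alarm(function_name, sns_topic_arn))
```

Calling `create_error_alarm("my-function", "<your SNS topic ARN>")` once per critical function is enough; the daily report then covers everything else.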

Step 3: Configuration and Execution

We used environment variables in the script, which is a good practice. To run it locally, I usually create a `config.env` file (I avoid using hidden files like `.env` as they can sometimes be missed by tools).

Your `config.env` file would look like this:


export SES_SENDER="your-verified-sender@example.com"
export SES_RECIPIENT="your-inbox@example.com"
export AWS_REGION="us-east-1"

To run it, load these variables into your session and then execute the script. On Linux or macOS, that means `source config.env` followed by `python3 lambda_monitor.py`.

Step 4: Automation

A script is only useful if it runs automatically. You have a few great options here:
1. **AWS Lambda Function:** My favorite method. Package this script as a Lambda function (the Lambda Python runtimes already include `boto3`), give it a handler that calls `main()`, and use an EventBridge (formerly CloudWatch Events) rule to trigger it on a schedule (e.g., once a day). It’s the ultimate serverless-monitoring-serverless solution.
2. **EC2 Instance or On-Prem Server:** If you already have a utility server, a simple cron job will do the trick. Just make sure the AWS credentials and environment variables are available to the cron user, since cron does not load your shell profile. A typical cron entry, running at 8 AM server time every day, would look like this: `0 8 * * * python3 /path/to/your/lambda_monitor.py`
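
For the Lambda option, the script needs an entry point that Lambda can call, because `main()` takes no arguments. Here is a minimal handler sketch, assuming `lambda_monitor.py` sits next to it in the deployment package. The file layout, the `run` parameter (there only to make the handler testable), and the schedule expression are my assumptions, not something Lambda requires.

```python
import importlib

# Assumed EventBridge schedule expression: every day at 08:00 UTC.
DAILY_SCHEDULE = "cron(0 8 * * ? *)"


def lambda_handler(event, context, run=None):
    """Lambda entry point; Lambda itself only passes event and context."""
    if run is None:
        # Import lazily so this file can be loaded without AWS credentials.
        run = importlib.import_module("lambda_monitor").main
    run()
    return {"status": "completed"}
```

Set `SES_SENDER` and `SES_RECIPIENT` in the function's environment variables. `AWS_REGION` is a reserved variable that Lambda sets for you, so the script picks up the function's own region automatically.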

Where I Usually Mess Up (Common Pitfalls)

IAM is Everything: The first time I set this up, I spent an hour debugging before realizing the execution role was missing the `lambda:ListFunctions` permission. The script failed silently. Always check the policy first.
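SES Sandbox Mode: I’ve also forgotten to request production access for SES on a new AWS account. In the sandbox, SES rejects mail to any recipient that isn’t a verified identity, and because the script only prints the exception, the failure is easy to miss and the report never arrives. Verify the recipient address while testing, and remember to move your SES account out of the sandbox.
CloudWatch Timezones: CloudWatch metric timestamps are always in UTC. If you’re not careful with your time calculations, you can easily pull data for the wrong 24-hour period. I always build the window from UTC, as the script does with `datetime.utcnow()`. Note that `utcnow()` returns a naive datetime and is deprecated as of Python 3.12; `datetime.now(timezone.utc)` is the timezone-aware replacement.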
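
On the timezone pitfall, here is a small sketch of how I would build the 24-hour window with timezone-aware datetimes. The helper name `last_24h_window` is hypothetical, and rounding the end time down to the hour is my own choice, based on the CloudWatch guidance that aligned start and end times perform better.

```python
from datetime import datetime, timedelta, timezone


def last_24h_window(now=None):
    """Return (start, end) as timezone-aware UTC datetimes, aligned to the hour."""
    # Pass a timezone-aware datetime if you override `now`.
    now = now or datetime.now(timezone.utc)
    # Convert to UTC first, then drop minutes and seconds.
    end = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return end - timedelta(days=1), end


if __name__ == "__main__":
    start, end = last_24h_window()
    print(f"Querying metrics from {start.isoformat()} to {end.isoformat()}")
```

The returned values can be passed straight to `StartTime` and `EndTime` in `get_metric_data`.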

Conclusion

And there you have it. This is a robust, low-maintenance way to keep a pulse on your serverless applications. It’s not flashy, but it’s one of those foundational DevOps automations that prevents small issues from becoming major incidents. From here, you could easily extend it to monitor other metrics like duration, invocations, or even pull cost data. Happy monitoring!

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.

🤖 Frequently Asked Questions

❓ How can I automate AWS Lambda error and throttle monitoring?

Automate monitoring by deploying a Python script that uses `boto3` to query CloudWatch for `Errors` and `Throttles` metrics across all Lambda functions and sends daily summary reports via AWS SES.

❓ How does this daily report compare to real-time alerting?

This solution provides a daily health overview, complementing real-time CloudWatch Alarms which are recommended for immediate notification of critical failures via SNS topics for business-critical functions.

❓ What is a common implementation pitfall when setting up SES for these alerts?

A common pitfall is forgetting to request production access for SES. While the account is in sandbox mode, SES rejects mail to unverified recipients, and because the script only prints that error, the missing report is easy to overlook.
