🚀 Executive Summary
TL;DR: Manual Google Analytics report generation is a significant time-sink for engineers. This guide provides a Python-based automation solution using the Google Analytics Data API and ReportLab to generate scheduled PDF reports, reclaiming valuable engineering time.
🎯 Key Takeaways
- Successfully accessing GA4 data requires two things: a GCP Service Account in a project with the Google Analytics Data API enabled, and “Viewer” permissions for that service account granted directly within the GA4 property.
- The automation relies on core Python libraries: `google-analytics-data` for API interaction, `reportlab` for programmatic PDF generation, and `python-dotenv` for secure environment variable management.
- Linux `cron` jobs are used to schedule the Python script, ensuring reports are generated automatically at specified intervals (e.g., weekly), with a strong recommendation for using absolute file paths for reliability.
Automate Client Report Generation (Google Analytics to PDF)
Hey team, Darian here. Let’s talk about a major time-sink: manual reporting. I used to spend the first couple of hours every Monday pulling Google Analytics data for our key clients, pasting it into a document, formatting it, and emailing it out. It was tedious, error-prone, and frankly, a waste of engineering time. After I built this simple automation, I got that time back for actual infrastructure work. This guide will walk you through the exact setup I use to generate these PDF reports automatically.
Prerequisites
Before we dive in, make sure you have the following ready to go. Getting these sorted out first will make the rest of the process smooth.
- Python 3 installed on the machine where this script will run.
- Access to a Google Cloud Platform (GCP) project.
- The “Google Analytics Data API” enabled within that GCP project.
- A Google Analytics 4 (GA4) Property ID you want to pull data from.
- A Service Account JSON key file from GCP. If you don’t have one, you’ll need to create a service account and download its key.
The Guide: Step-by-Step
Step 1: The Google API Dance (Permissions)
First things first, Google needs to know our script is allowed to ask for data. This is all handled through a Service Account.
- In your GCP project, create a Service Account. Give it a descriptive name like ‘analytics-report-generator’.
- Once created, generate a JSON key for it and download the file. Treat this file like a password. Store it securely on the server where your script will run.
- Copy the service account’s email address (it looks like `…gserviceaccount.com`).
- Go to your Google Analytics 4 property. Navigate to Admin > Property Access Management and add the service account’s email as a new user with the “Viewer” role. This is a critical step; without it, you’ll get permission errors.
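If you have the key file but not the email handy, note that a service account key file is plain JSON and carries the address in its `client_email` field. A minimal sketch that reads it, assuming the standard key layout (with `type` set to `service_account`); `service_account_email` is a hypothetical helper name:

```python
import json
from pathlib import Path


def service_account_email(key_path):
    """Return the client_email from a GCP service account key file (hypothetical helper)."""
    info = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Standard service account keys are tagged with type "service_account".
    if info.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service account key")
    return info["client_email"]
```

For example, `print(service_account_email("/path/to/your/keyfile.json"))` prints the address to paste into GA4’s Property Access Management.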
Pro Tip: In my production setups, I use a secret management tool like HashiCorp Vault or AWS Secrets Manager to store the contents of the JSON key file. For this tutorial, we’ll just reference its file path, but avoid committing the key file itself to your Git repository.
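If your secret manager injects the key’s JSON contents into an environment variable instead of a file, you can skip the file path entirely. A minimal sketch, assuming a hypothetical variable named `GA_SERVICE_ACCOUNT_JSON`: the parsed dict is what `google.oauth2.service_account.Credentials.from_service_account_info()` accepts, and the resulting credentials can be passed to `BetaAnalyticsDataClient(credentials=...)`.

```python
import json
import os


def load_service_account_info(env_var="GA_SERVICE_ACCOUNT_JSON", environ=None):
    """Parse a service account key supplied as JSON text in an environment variable.

    GA_SERVICE_ACCOUNT_JSON is a hypothetical name; use whatever variable
    your secret manager populates.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var)
    if not raw:
        raise KeyError(f"{env_var} is not set")
    info = json.loads(raw)
    # Fail early on anything that is not a usable service account key.
    for field in ("type", "client_email", "private_key"):
        if field not in info:
            raise ValueError(f"service account JSON is missing '{field}'")
    return info


def build_client_from_env():
    """Build a GA4 Data API client without writing the key to disk."""
    # Imported here so load_service_account_info() has no third-party dependencies.
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_info(
        load_service_account_info(),
        scopes=["https://www.googleapis.com/auth/analytics.readonly"],
    )
    return BetaAnalyticsDataClient(credentials=credentials)
```

The read-only Analytics scope is enough here because the script only runs reports.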
Step 2: Setting Up the Python Environment
I’ll skip the standard virtualenv setup since you likely have your own workflow for that. Let’s get straight to the libraries you’ll need. You’ll want to install the following Python packages using pip:
- `google-analytics-data`: The official Google client library for the GA4 Data API.
- `reportlab`: A powerful library for creating PDFs programmatically.
- `python-dotenv`: For managing environment variables, which helps keep our credentials out of the code.
You would typically run a command like `pip install google-analytics-data reportlab python-dotenv` in your activated virtual environment to get these installed.
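To confirm the packages landed in the interpreter that will actually run the report (cron may pick up a different `python3` than your shell), a small preflight check can help. A sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list holds the import names of the three packages above:

```python
import importlib.util

# Import names provided by google-analytics-data, reportlab and python-dotenv.
REQUIRED_MODULES = ("google.analytics.data_v1beta", "reportlab", "dotenv")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found by the current interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Run it with the same interpreter your scheduled job will use; `print(missing_modules())` should print an empty list.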
Step 3: The Python Script – Fetching GA Data
Now, let’s write the code to connect to the API. We’ll start by setting up our configuration and then making the request. Create a file named `config.env` in the same directory as your script to hold our secrets.
Your `config.env` file:
```
GA_PROPERTY_ID="your_property_id_here"
SERVICE_ACCOUNT_KEY_PATH="/path/to/your/keyfile.json"
```
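A missing or misspelled entry in `config.env` otherwise surfaces later as a confusing error, so it can help to validate the settings up front. A minimal sketch; `load_settings` and its return keys are hypothetical names, not part of python-dotenv. It also resolves a relative key path against a directory you choose instead of the current working directory:

```python
from pathlib import Path

REQUIRED_SETTINGS = ("GA_PROPERTY_ID", "SERVICE_ACCOUNT_KEY_PATH")


def load_settings(environ, base_dir):
    """Validate required settings and make the key path absolute.

    A relative SERVICE_ACCOUNT_KEY_PATH is resolved against base_dir (for
    example the script's own directory) rather than the current working
    directory, which is different when the script runs under cron.
    """
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings in config.env: {', '.join(missing)}")
    key_path = Path(environ["SERVICE_ACCOUNT_KEY_PATH"]).expanduser()
    if not key_path.is_absolute():
        key_path = Path(base_dir) / key_path
    return {
        "property_id": environ["GA_PROPERTY_ID"],
        "key_path": key_path.resolve(),
    }
```

In `report_generator.py` you could call it right after `load_dotenv()`, e.g. `load_settings(os.environ, Path(__file__).resolve().parent)`.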
And here’s the Python script; let’s call it `report_generator.py`:
```python
import os
from pathlib import Path

from dotenv import load_dotenv
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Metric,
    OrderBy,
    RunReportRequest,
)

# Look for config.env next to this script so the lookup does not depend on
# the current working directory (important when running from cron).
CONFIG_PATH = Path(__file__).resolve().parent / "config.env"


def fetch_analytics_data():
    """Fetches GA4 page-level active users for the last 7 full days."""
    load_dotenv(CONFIG_PATH)
    property_id = os.getenv("GA_PROPERTY_ID")
    key_path = os.getenv("SERVICE_ACCOUNT_KEY_PATH")
    if not property_id or not key_path:
        print("GA_PROPERTY_ID and SERVICE_ACCOUNT_KEY_PATH must be set in config.env.")
        return None

    # This uses the Application Default Credentials logic.
    # Pointing GOOGLE_APPLICATION_CREDENTIALS at the key file tells the
    # client which service account to authenticate as.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[Dimension(name="pageTitle")],
        metrics=[Metric(name="activeUsers")],
        # 7daysAgo..yesterday covers seven complete days; ending at "today"
        # would add an eighth, partial day.
        date_ranges=[DateRange(start_date="7daysAgo", end_date="yesterday")],
        # Sort by active users, highest first, so the "top pages" really are the top.
        order_bys=[
            OrderBy(metric=OrderBy.MetricOrderBy(metric_name="activeUsers"), desc=True)
        ],
    )

    try:
        response = client.run_report(request)
        print("Report data fetched successfully.")
        processed_data = []
        for row in response.rows:
            processed_data.append({
                'pageTitle': row.dimension_values[0].value,
                'activeUsers': row.metric_values[0].value,
            })
        return processed_data
    except Exception as e:
        print(f"Failed to fetch GA data: {e}")
        return None


# We'll add more to this file later.
if __name__ == '__main__':
    report_data = fetch_analytics_data()
    if report_data:
        print("Top 5 pages by active users:")
        for item in report_data[:5]:
            print(f"- {item['pageTitle']}: {item['activeUsers']} users")
```
This script initializes the client using your service account key, defines a simple request (top pages by active users in the last 7 days), and then processes the response into a clean list of dictionaries. The logic is straightforward: define your dimensions (what you’re grouping by) and metrics (the numbers you want to see), and the API does the rest. Note that the API returns metric values as strings, so convert them with `int()` or `float()` before sorting or summing them.
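The parsing loop in the script hard-codes index `0` for one dimension and one metric. If you add more, a version keyed on the response’s `dimension_headers` and `metric_headers` keeps the parsing in sync with the request. A sketch; `rows_to_dicts` is a hypothetical helper, and converting metrics to numbers is an assumption that suits integer and decimal metrics such as `activeUsers` or `bounceRate`:

```python
def rows_to_dicts(response):
    """Convert a RunReportResponse into a list of dicts keyed by header name.

    Works for any number of dimensions and metrics, so adding, say,
    Dimension(name="country") to the request needs no parsing changes.
    Metric values arrive as strings and are converted to int or float.
    """
    dim_names = [h.name for h in response.dimension_headers]
    metric_names = [h.name for h in response.metric_headers]

    def to_number(text):
        try:
            return int(text)
        except ValueError:
            return float(text)

    rows = []
    for row in response.rows:
        record = {name: v.value for name, v in zip(dim_names, row.dimension_values)}
        record.update(
            {name: to_number(v.value) for name, v in zip(metric_names, row.metric_values)}
        )
        rows.append(record)
    return rows
```

Because the keys come from the headers, a request for `pageTitle` and `activeUsers` still yields the same `'pageTitle'` and `'activeUsers'` keys the rest of the script expects.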
Step 4: The Python Script – Generating the PDF
Now that we have the data, let’s turn it into a presentable PDF. We’ll use the `reportlab` library for this. Let’s add a new function to our `report_generator.py` file.
```python
from datetime import date

from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas


def generate_pdf_report(data, filename="Weekly_Analytics_Report.pdf"):
    """Generates a PDF report from the analytics data."""
    if not data:
        print("No data to generate report.")
        return

    c = canvas.Canvas(filename, pagesize=letter)
    width, height = letter

    # Title
    c.setFont("Helvetica-Bold", 16)
    c.drawString(inch, height - inch, "Weekly Google Analytics Report")
    c.setFont("Helvetica", 10)
    c.drawString(inch, height - 1.2 * inch, f"Generated on: {date.today().strftime('%Y-%m-%d')}")

    # Report Body
    c.setFont("Helvetica", 12)
    text_object = c.beginText(inch, height - 2 * inch)
    text_object.textLine("Top Pages by Active Users (Last 7 Days):")
    text_object.textLine("")  # Add a blank line
    # Switch to a monospaced font inside the text object, after the heading
    # lines, so the padded columns line up and the heading keeps Helvetica.
    text_object.setFont("Courier", 10)
    for item in data:
        # Start a new page before the text runs into the bottom margin.
        if text_object.getY() < inch:
            c.drawText(text_object)
            c.showPage()
            text_object = c.beginText(inch, height - inch)
            text_object.setFont("Courier", 10)
        line = f"{item['pageTitle'][:50]:<50} | Users: {item['activeUsers']}"
        text_object.textLine(line)
    c.drawText(text_object)
    c.save()
    print(f"PDF report saved as {filename}")


# Replace the earlier __main__ block with this one.
if __name__ == '__main__':
    report_data = fetch_analytics_data()
    if report_data:
        generate_pdf_report(report_data)
```
This new function, `generate_pdf_report`, takes our data and uses `reportlab`’s canvas object to “draw” text onto a PDF document. It sets a title, a date, and then loops through our data to list the pages and user counts. It’s a simple layout, but it’s a fantastic, automated starting point.
Pro Tip: ReportLab is incredibly powerful. I recommend exploring its table-drawing features (`reportlab.platypus.Table`) for more structured, grid-like data layouts. It makes reports with many columns much cleaner.
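Following that tip, here is a sketch of the same report built with Platypus: `SimpleDocTemplate` handles page breaks, and `repeatRows=1` repeats the header row on each page. The function names, column titles, and styling are illustrative choices, not the layout used above; the ReportLab imports sit inside the function so the row-building helper has no dependencies.

```python
def table_rows(data, limit=None):
    """Turn the fetched records into the list-of-lists that Table() expects."""
    rows = [["Page title", "Active users"]]
    for item in data[:limit]:
        rows.append([item["pageTitle"][:60], str(item["activeUsers"])])
    return rows


def generate_table_pdf(data, filename="Weekly_Analytics_Report.pdf"):
    """Render the records as a grid with the header row repeated on each page."""
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import letter
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    table = Table(table_rows(data), repeatRows=1)
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),  # header row
        ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
        ("ALIGN", (1, 1), (1, -1), "RIGHT"),  # right-align the user counts
    ]))
    SimpleDocTemplate(filename, pagesize=letter).build([table])
```

Adding a column is then a matter of extending the header list and each row in `table_rows`.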
Step 5: Automating with a Scheduler
The final piece is to run this script on a schedule without manual intervention. On a Linux-based system, cron is the perfect tool for this.
You would add a line to your cron configuration that specifies when to run the script. For example, to run it every Monday at 2 AM, the line would look like this:
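```
0 2 * * 1 cd /path/to/your && /path/to/your/venv/bin/python3 report_generator.py >> report.log 2>&1
```
This tells the system: at minute 0 of hour 2 (server local time), on any day of the month, in any month, but only on day-of-week 1 (Monday; cron counts Sunday as 0), change into the script’s directory so relative paths such as `config.env` and the PDF filename resolve there, run the script with the virtual environment’s interpreter so the packages from Step 2 are available, and append its output to a log file. The PDF will be waiting for you when you start your week.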
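One side effect of a weekly schedule: with the fixed default filename, each run overwrites last week’s PDF. A small, hypothetical helper that builds a dated path you could pass as `generate_pdf_report(report_data, filename=str(...))`; `output_dir` is an absolute directory of your choosing:

```python
from datetime import date
from pathlib import Path


def dated_report_path(output_dir, run_date=None, stem="Weekly_Analytics_Report"):
    """Build a per-run PDF path like <output_dir>/Weekly_Analytics_Report_2024-06-03.pdf."""
    run_date = run_date or date.today()
    return Path(output_dir) / f"{stem}_{run_date.isoformat()}.pdf"
```

ISO dates also sort chronologically, so a plain directory listing shows the reports in order.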
Common Pitfalls (Where I Usually Mess Up)
- Permissions Errors: The number one issue is the service account not having “Viewer” permissions in GA4. If you see a `PermissionDenied` error in your logs, this is the first place to check. Remember, GCP IAM permissions and GA user permissions are separate things!
- API Quotas: The GA Data API has usage quotas. My first version of a similar script ran hourly, and I hit the daily quota by noon. For weekly reports, a once-a-week cron job is perfect and stays well within the standard property quotas.
- Incorrect Paths: Cron runs jobs with a minimal environment and a working directory that is usually your home directory, not the script’s folder. Relative paths, such as `config.env`, the key path, or the output PDF name, then resolve against the wrong location. Use absolute paths for your script and your key file (e.g., `/home/darian/reporting/key.json`), or resolve them relative to the script’s own directory.
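For the quota pitfall, the Data API can report quota usage if you set `return_property_quota=True` on the `RunReportRequest`; the response then carries a `property_quota` field. A sketch that pulls out the remaining counts for logging; `quota_summary` and its dictionary keys are hypothetical names:

```python
def quota_summary(response):
    """Summarise remaining GA4 Data API quota from a RunReportResponse.

    Assumes the request was built with return_property_quota=True;
    otherwise the quota counts are not filled in.
    """
    quota = response.property_quota
    return {
        "tokens_per_day_remaining": quota.tokens_per_day.remaining,
        "tokens_per_hour_remaining": quota.tokens_per_hour.remaining,
        "concurrent_requests_remaining": quota.concurrent_requests.remaining,
    }
```

Printing this after each run puts the numbers in the cron log, so a script drifting toward its daily limit shows up before it fails.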
Conclusion
And that’s the core of it. This framework is a solid foundation for automating a tedious but necessary task. From here, you can easily expand it to include more complex metrics, generate charts, or even use a library like `smtplib` to email the generated PDF to stakeholders automatically. The goal is to let the machines handle the repetitive work so you can focus on the more complex engineering challenges. Hope this helps you reclaim some of your time.
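For the email step, here is a sketch using the standard library’s `smtplib` and `email.message.EmailMessage`. The SMTP host, port 587 with STARTTLS, and the addresses are assumptions; substitute your own mail relay and keep its credentials out of the code, for example in `config.env`.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report_email(pdf_path, sender, recipients, subject="Weekly Google Analytics Report"):
    """Create an email with the PDF report attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content("Hi,\n\nThis week's analytics report is attached.\n")
    pdf = Path(pdf_path)
    msg.add_attachment(
        pdf.read_bytes(), maintype="application", subtype="pdf", filename=pdf.name
    )
    return msg


def send_report_email(msg, host, port=587, username=None, password=None):
    """Send the message through an SMTP server that supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if username:
            server.login(username, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it easy to inspect the email locally before pointing it at a real server.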
🤖 Frequently Asked Questions
❓ What are the essential prerequisites for automating GA4 report generation to PDF?
You need Python 3, a Google Cloud Platform (GCP) project with the Google Analytics Data API enabled, a Google Analytics 4 (GA4) Property ID, and a service account (with its JSON key file) that has “Viewer” access in your GA4 property.
❓ How does this custom Python solution compare to using built-in GA4 reporting or third-party tools?
This Python solution offers superior customization and control over report content, format, and scheduling compared to standard GA4 exports, and avoids the recurring costs and template limitations of many third-party reporting tools.
❓ What is a common cause for `PermissionDenied` errors when fetching GA4 data with a service account?
`PermissionDenied` errors typically occur because the service account’s email address has not been added under the GA4 property’s “Property Access Management” with at least the “Viewer” role, or because the Google Analytics Data API is not enabled in the GCP project that owns the service account. Check both the GA4 property access and the API’s status in GCP.