🚀 Executive Summary
TL;DR: The article provides a solution for migrating historical Google Analytics (GA4) data to Plausible, a privacy-focused, open-source analytics platform. It addresses the challenges of GA’s complexity, privacy concerns, and data lock-in by detailing a Python script that fetches, transforms, and imports pageview data, ensuring a seamless transition without losing valuable historical context.
🎯 Key Takeaways
- Plausible requires a Site ID and an API Key for authentication, which are generated from your Plausible account settings.
- The migration script utilizes the Google Analytics Data API (GA4) and needs a Google Cloud Project with the API enabled and service account credentials.
- Data is fetched from Google Analytics in daily increments and transformed, then sent to Plausible’s bulk import API in batches of 500-1000 events to prevent API rate limiting.
- Crucial data transformations include converting GA’s YYYYMMDD date format to Plausible’s ISO 8601 timestamp and generating a fallback `user_agent` string from GA’s browser dimension.
- A `config.env` file is used to manage sensitive credentials (Plausible API key, Site ID, GA Property ID, Google service account path) securely, keeping them out of source control.
Switching from Google Analytics to Plausible (Privacy-focus)
Hey team, Darian here. Let’s talk about analytics. For years, I was a die-hard Google Analytics user. But between the increasingly complex GA4 interface and the constant privacy tightrope walk (cookie banners, GDPR, you name it), I found myself spending more time managing the tool than getting insights from it. The final straw was realizing how much of my historical data felt locked in, making a switch seem impossible.
This guide is the solution I built for my own projects and now use at TechResolve. We’re going to migrate from Google Analytics to Plausible, a lightweight, open-source, and privacy-first alternative. And the best part? We’re taking our historical data with us. This is how you reclaim your analytics and respect your users’ privacy without starting from zero.
Prerequisites
Before we dive in, make sure you have the following ready to go. This will save you a ton of time.
- An active Plausible Analytics account with the target website already added.
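- Admin access to your Google Analytics property (we’ll focus on GA4; the same fetch, transform, and send logic applies to UA, but UA data is served by a different reporting API, so the fetch step would need rewriting).
- A Google Cloud project with the Google Analytics Data API enabled. You’ll need service account credentials (a JSON key file), and the service account’s email address must be granted at least Viewer access on the GA4 property.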
- Python 3.8+ installed on your machine.
- A way to install Python packages. We’ll need `requests` and `google-analytics-data`.
The Guide: From Google to Plausible, Step-by-Step
Step 1: Get Your Plausible Credentials
This one’s easy. Log into your Plausible account. Navigate to your site settings (click the gear icon). You’ll need two things:
- Your Site ID (e.g., `yourdomain.com`). This is found in the main settings page.
- Your API Key. You can generate one by going to your user settings (click your avatar in the top right > “API Keys”).
We’ll use these to authenticate our import script later. Keep them somewhere safe.
Step 2: Prepare Your Environment & Configuration
Alright, let’s get our workspace ready. I’ll skip the standard virtual environment setup since you likely have your own workflow for that. Just make sure you’ve installed the necessary Python libraries. You can do this by running a command like `python3 -m pip install requests google-analytics-data python-dotenv` in your terminal.
Next, let’s handle our secrets. In your project directory, create a file named `config.env`. Never commit this file to source control! Add your credentials to it like this:
# config.env
PLAUSIBLE_API_KEY="your_plausible_api_key_here"
PLAUSIBLE_SITE_ID="yourdomain.com"
GA_PROPERTY_ID="your_ga4_property_id"
GOOGLE_APPLICATION_CREDENTIALS="path/to/your/google-service-account.json"
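Before running anything against the APIs, it can help to confirm that all four values actually made it into the environment. Here is a minimal sketch of such a pre-flight check, assuming the variable names from the `config.env` example above. `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names of my own, not part of the import script.

```python
import os

# Variable names taken from the config.env example above.
REQUIRED_SETTINGS = (
    "PLAUSIBLE_API_KEY",
    "PLAUSIBLE_SITE_ID",
    "GA_PROPERTY_ID",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def missing_settings(env=None):
    """Return the required setting names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_SETTINGS
        if not str(env.get(name, "")).strip()
    ]


# Example with an explicit dict instead of the real environment.
example = {"PLAUSIBLE_API_KEY": "abc", "PLAUSIBLE_SITE_ID": "yourdomain.com"}
print(missing_settings(example))
# prints ['GA_PROPERTY_ID', 'GOOGLE_APPLICATION_CREDENTIALS']
```

Calling `missing_settings()` with no argument right after `load_dotenv('config.env')` would check the real environment, so you could stop early with a clear message instead of a confusing API error.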
Step 3: The Python Import Script
Here’s the heart of the operation. This script will connect to the Google Analytics Data API, pull your pageview history, reformat it for Plausible, and send it over to their bulk import API. The logic is key: we fetch data in chunks from Google, transform each record into a Plausible-compatible event, and then send those events in batches.
Pro Tip: Don’t try to import everything in one giant request. Plausible’s API is robust, but sending millions of events at once is asking for a timeout or rate-limiting. I’ve found that batching events into chunks of 500-1000 provides a good balance of speed and reliability.
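The batching idea from the tip can be shown in isolation. This is a small sketch, assuming the events are already in a list; `chunked` is a hypothetical helper name and the default of 500 is simply the lower end of the range mentioned above.

```python
from typing import Iterator, List, Sequence


def chunked(events: Sequence[dict], size: int = 500) -> Iterator[List[dict]]:
    """Yield consecutive slices of `events`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(events), size):
        yield list(events[start:start + size])


# 1200 placeholder events split into batches of 500.
placeholder_events = [{"name": "pageview"}] * 1200
print([len(batch) for batch in chunked(placeholder_events, 500)])
# prints [500, 500, 200]
```

Each yielded batch could then be handed to whatever function performs the HTTP request, so no single request ever carries more than `size` events.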
Here is the Python script. Let’s call it `import_ga_data.py`:
import os
from datetime import datetime, timedelta

import requests
from dotenv import load_dotenv
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Metric,
    RunReportRequest,
)

# Load credentials from config.env. This also sets
# GOOGLE_APPLICATION_CREDENTIALS, which the GA client reads.
load_dotenv('config.env')

# --- Configuration ---
PLAUSIBLE_API_KEY = os.getenv('PLAUSIBLE_API_KEY')
PLAUSIBLE_SITE_ID = os.getenv('PLAUSIBLE_SITE_ID')
GA_PROPERTY_ID = os.getenv('GA_PROPERTY_ID')
PLAUSIBLE_API_URL = f"https://plausible.io/api/v1/sites/{PLAUSIBLE_SITE_ID}/events"


# --- Main Logic ---
def fetch_ga_data(client, start_date, end_date, page_size=10000):
    """Fetches paginated data from Google Analytics Data API."""
    print(f"Fetching data from {start_date} to {end_date}...")
    rows = []
    offset = 0
    try:
        # Page through the report with limit/offset until every row is read.
        while True:
            request = RunReportRequest(
                property=f"properties/{GA_PROPERTY_ID}",
                dimensions=[
                    Dimension(name="date"),
                    Dimension(name="pagePath"),
                    Dimension(name="fullPageUrl"),
                    Dimension(name="source"),
                    Dimension(name="medium"),
                    Dimension(name="browser"),
                ],
                metrics=[Metric(name="screenPageViews")],
                date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
                limit=page_size,
                offset=offset,
            )
            response = client.run_report(request)
            rows.extend(response.rows)
            offset += page_size
            # row_count is the total number of rows in the full report.
            if offset >= response.row_count:
                break
    except Exception as e:
        print(f"Error fetching from GA API: {e}")
        return []
    return rows


def transform_to_plausible_events(ga_rows):
    """Transforms GA report rows into Plausible event format."""
    events = []
    for row in ga_rows:
        browser = row.dimension_values[5].value
        if "bot" in browser.lower() or "spider" in browser.lower():
            continue  # Skip common bots

        # GA date is YYYYMMDD, we need ISO 8601
        event_date_str = row.dimension_values[0].value
        event_date = datetime.strptime(event_date_str, '%Y%m%d').isoformat() + "Z"

        # A basic fallback for User-Agent
        user_agent = f"Mozilla/5.0 ({browser}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

        # GA's fullPageUrl has no scheme (e.g. "yourdomain.com/page").
        # Assumption: the site is served over HTTPS.
        url = row.dimension_values[2].value
        if not url.startswith(("http://", "https://")):
            url = f"https://{url}"

        # Number of views for this combination
        try:
            views = int(row.metric_values[0].value)
        except (ValueError, IndexError):
            views = 1

        # Emit one pageview event per recorded view
        for _ in range(views):
            event = {
                "name": "pageview",
                "url": url,
                "domain": PLAUSIBLE_SITE_ID,
                "timestamp": event_date,
                "referrer": f"{row.dimension_values[3].value} / {row.dimension_values[4].value}",
                "user_agent": user_agent,
            }
            events.append(event)
    return events


def send_to_plausible(events_batch):
    """Sends a batch of events to the Plausible import API."""
    if not events_batch:
        print("No events in batch to send.")
        return False
    headers = {
        'Authorization': f'Bearer {PLAUSIBLE_API_KEY}',
        'Content-Type': 'application/json'
    }
    try:
        # The timeout keeps a stalled connection from hanging the import.
        response = requests.post(PLAUSIBLE_API_URL, headers=headers, json=events_batch, timeout=30)
        response.raise_for_status()  # Raises an HTTPError for bad responses (4xx or 5xx)
        print(f"Successfully sent {len(events_batch)} events. Status: {response.status_code}")
        return True
    except requests.exceptions.RequestException as e:
        print(f"Failed to send events to Plausible: {e}")
        # A 4xx/5xx Response object is falsy, so compare against None explicitly.
        body = e.response.text if e.response is not None else 'No response'
        print(f"Response body: {body}")
        return False


BATCH_SIZE = 500  # Events per request to Plausible


def main():
    """Main execution function."""
    ga_client = BetaAnalyticsDataClient()
    # Let's import the last 90 days as an example
    end_date = datetime.now()
    start_date = end_date - timedelta(days=90)
    current_date = start_date
    all_plausible_events = []
    # Iterate day by day to avoid huge GA API responses
    while current_date <= end_date:
        date_str = current_date.strftime('%Y-%m-%d')
        ga_rows = fetch_ga_data(ga_client, date_str, date_str)
        if ga_rows:
            all_plausible_events.extend(transform_to_plausible_events(ga_rows))
        # Send full batches of BATCH_SIZE so a busy day never becomes one huge request
        while len(all_plausible_events) >= BATCH_SIZE:
            send_to_plausible(all_plausible_events[:BATCH_SIZE])
            all_plausible_events = all_plausible_events[BATCH_SIZE:]
        current_date += timedelta(days=1)
    # Send any remaining events
    if all_plausible_events:
        send_to_plausible(all_plausible_events)
    print("Import process finished.")


if __name__ == "__main__":
    main()
Step 4: Run the Import & Flip the Switch
You can now run the script from your terminal with `python3 import_ga_data.py`. Watch the output to see the progress. Depending on how much data you have, this could take a while. I recommend running it for a small date range first (e.g., one week) to validate everything looks correct in your Plausible dashboard.
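For that one-week trial run, the date window can be computed instead of hard-coded. The following is a rough sketch under my own assumptions: `trial_window` and `daily_dates` are hypothetical helpers, and the window ends yesterday because today's data is still incomplete.

```python
from datetime import date, timedelta


def trial_window(days=7, today=None):
    """Return (start, end) covering the `days` full days before `today`."""
    today = today or date.today()
    end = today - timedelta(days=1)          # yesterday, the last full day
    start = end - timedelta(days=days - 1)   # inclusive range of `days` days
    return start, end


def daily_dates(start, end):
    """Yield each day from start to end (inclusive) as a YYYY-MM-DD string."""
    current = start
    while current <= end:
        yield current.strftime("%Y-%m-%d")
        current += timedelta(days=1)


start, end = trial_window(7, today=date(2024, 3, 15))
print(list(daily_dates(start, end)))
# prints the seven dates from '2024-03-08' through '2024-03-14'
```

The strings produced by `daily_dates` have the same `YYYY-MM-DD` shape that the script passes to the GA date range, so they could stand in for the 90-day loop while you validate the dashboard.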
For a seamless transition, you can run both GA and Plausible tracking scripts on your site for a week or two. Once you’ve backfilled your history and are confident in the new data, it’s time to make it official: remove the Google Analytics script from your site’s code. Congratulations, you’re now running a faster, more private website!
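Pro Tip for Automation: In my production setups, I’ll often run a migration script like this nightly for a week leading up to the switch. This ensures any data lag is captured. A simple cron job does the trick. Two things to watch. First, cron starts jobs in your home directory, so change into the project directory before running the script (the path below is a placeholder, relative to your home directory); otherwise the script won’t find `config.env`. Second, narrow the script’s date range to days you haven’t imported yet, because re-running the full 90-day window every night would send the same pageviews again.
0 2 * * * cd path/to/project && python3 import_ga_data.py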
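If you do schedule nightly runs, one way to avoid importing the same day twice is to remember the last day that was imported. This is only a sketch of that idea and not something the script above does: the state file, its JSON layout, and the helper names are all hypothetical.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def load_last_imported(state_path):
    """Return the last imported day, or None if no state file exists yet."""
    path = Path(state_path)
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return date.fromisoformat(data["last_imported"])


def save_last_imported(state_path, day):
    """Record `day` as the most recent day that was fully imported."""
    Path(state_path).write_text(json.dumps({"last_imported": day.isoformat()}))


def next_window(last_imported, today, default_days=90):
    """Return the (start, end) days still to import, or None if up to date."""
    end = today - timedelta(days=1)  # skip today, its data is incomplete
    if last_imported is None:
        start = end - timedelta(days=default_days - 1)
    else:
        start = last_imported + timedelta(days=1)
    return (start, end) if start <= end else None


print(next_window(date(2024, 3, 10), today=date(2024, 3, 15)))
# prints (datetime.date(2024, 3, 11), datetime.date(2024, 3, 14))
```

A nightly run would call `load_last_imported`, import only the window returned by `next_window`, and call `save_last_imported` once that window has been sent successfully.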
Common Pitfalls (Where I Usually Mess Up)
- Mismatched Date Formats: This is the number one issue. Google’s API returns dates in `YYYYMMDD` format, but Plausible’s API needs a full ISO 8601 timestamp (`YYYY-MM-DDTHH:MM:SSZ`). The script handles this, but if you modify it, be careful here.
- Forgetting the User-Agent: Plausible’s import API requires a `user_agent` string. Google Analytics doesn’t always provide a clean one per pageview. My script uses the browser dimension as a fallback, which is usually good enough for historical trends.
- API Rate Limits: Hitting the Google or Plausible API too hard and fast will get you temporarily blocked. The daily iteration and batching approach in the script is designed specifically to avoid this. Be patient.
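On the rate-limit pitfall: if a batch still gets rejected, retrying with a growing pause is a common way to recover. Below is a minimal sketch of exponential backoff, assuming a send function that returns `True` on success and `False` on failure, the way `send_to_plausible` does. `send_with_backoff` and its parameters are hypothetical, and the delays are illustrative rather than tuned to any documented limit.

```python
import time


def send_with_backoff(send, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(batch) until it returns True, doubling the wait between tries.

    Returns True on the first success, or False after max_attempts failures.
    """
    for attempt in range(max_attempts):
        if send(batch):
            return True
        # Wait 1s, 2s, 4s, ... but not after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False
```

In the import script this would wrap the existing call, for example `send_with_backoff(send_to_plausible, batch)`. The `sleep` parameter exists only so the waiting can be replaced in tests.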
Conclusion
Making the switch from Google Analytics to Plausible is more than just a technical task—it’s a step towards building a better, more trustworthy web. You’re not only simplifying your own workflow and getting clearer insights, but you’re also respecting your users’ digital privacy. By taking your historical data with you, you lose none of the long-term context while gaining all the benefits. Go give it a shot; your users (and your performance metrics) will thank you.
– Darian Vance
🤖 Frequently Asked Questions
❓ How do I migrate historical Google Analytics data to Plausible?
You migrate historical Google Analytics data to Plausible using a Python script. This script connects to the Google Analytics Data API to fetch pageview history, transforms the data into Plausible’s event format (including ISO 8601 timestamps and a user_agent), and then sends these events in batches to Plausible’s bulk import API using your Plausible API key and site ID.
❓ How does Plausible compare to Google Analytics for privacy and ease of use?
Plausible is a lightweight, open-source, and privacy-first alternative that simplifies analytics by eliminating the need for complex cookie banners and extensive GDPR compliance efforts. It offers clearer insights with a less complex interface compared to Google Analytics, which, while feature-rich, comes with significant privacy and operational overhead.
❓ What are common implementation pitfalls when importing GA data to Plausible?
Common pitfalls include mismatched date formats (GA’s YYYYMMDD vs. Plausible’s ISO 8601 timestamp), the absence of a required `user_agent` string for Plausible’s API (requiring a fallback), and hitting API rate limits from both Google and Plausible due to attempting to import too much data in a single, unbatched request.