🚀 Executive Summary
TL;DR: High AWS S3 costs for infrequently accessed archival data can be significantly reduced by migrating to Backblaze B2. This guide provides a Python-based solution to transfer S3 bucket data to B2, enabling up to 75% cost savings for long-term storage with a simple, automatable script.
🎯 Key Takeaways
- Securely manage AWS and Backblaze credentials using environment files or a secrets manager, employing dedicated IAM users and B2 Application Keys with minimum required permissions (S3 read-only, B2 write-only).
- The migration script utilizes an efficient in-memory approach, downloading objects from S3 into a BytesIO buffer and immediately uploading them to B2, avoiding temporary disk storage.
- Automate periodic data offloading from S3 to B2 using cron jobs to continuously optimize costs for archival data, transforming S3 into a hot/warm tier and B2 into a cold storage tier.
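- Be mindful of S3 egress costs during migration; running the Python script from an EC2 instance in the same AWS region as the S3 bucket keeps the S3 read fast and free of transfer charges, but the upload from AWS to B2 is still billed as data transfer out to the internet.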
- For very large files (multi-gigabyte), modify the script to download to a temporary disk location first, then use the B2 large file upload API to prevent network timeouts, followed by deleting the temporary file.
Migrate S3 Bucket Data to Backblaze B2 (Cost saving)
Hey everyone, Darian Vance here. Let’s talk about something that hits us all: the monthly cloud bill. I was recently reviewing our AWS statement and noticed our S3 costs for archival data—logs, old backups, you name it—were slowly creeping up. It’s data we need to keep but rarely touch. After a bit of research, I set up a simple migration pipeline to Backblaze B2 and cut our long-term storage bill by nearly 75%. It’s a set-and-forget script that saves us real money every month. Thought I’d share the playbook.
Prerequisites
- An AWS account with IAM credentials (Access Key ID and Secret Access Key) that have read access to the source S3 bucket.
- A Backblaze B2 Cloud Storage account with an Application Key that has write access to the destination B2 bucket.
- Python 3 installed on your machine or a server.
- Familiarity with creating and managing S3 and B2 buckets.
The Guide: Step-by-Step
Step 1: Secure Your Credentials
First things first: never hardcode credentials. In my production setups, I use a proper secrets manager, but for a straightforward script like this, a local environment file is perfectly fine. Create a file named config.env in your project directory. This is where we’ll store all our keys.
# config.env
AWS_ACCESS_KEY_ID='YOUR_AWS_ACCESS_KEY'
AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_KEY'
AWS_S3_BUCKET_NAME='your-source-s3-bucket'
AWS_REGION='us-east-1' # The region your S3 bucket is in
B2_APPLICATION_KEY_ID='YOUR_B2_KEY_ID'
B2_APPLICATION_KEY='YOUR_B2_APPLICATION_KEY'
B2_BUCKET_NAME='your-destination-b2-bucket'
Pro Tip: Create a dedicated IAM user in AWS and a specific Application Key in Backblaze with the minimum required permissions for this task (S3 read-only, B2 write-only). It’s a security best practice that limits the blast radius if a key is ever compromised.
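As a rough illustration of the AWS half of that least-privilege setup, the sketch below builds a read-only policy document for the migration user. The two actions are the ones the script relies on (listing the bucket and reading objects). The helper name build_read_only_policy is made up for this example, and the bucket name is the placeholder from config.env, so treat it as a starting point rather than a finished policy.

```python
import json


def build_read_only_policy(bucket_name):
    """Return an IAM policy dict granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # s3:GetObject is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name taken from config.env above.
    print(json.dumps(build_read_only_policy("your-source-s3-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the dedicated user.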
Step 2: Set Up Your Python Environment
I’m going to assume you have your own workflow for managing Python projects, so I’ll skip the standard virtualenv setup and jump straight to the libraries you’ll need: boto3 for AWS, b2sdk for Backblaze, and python-dotenv to load our config file. In your terminal, install all three with pip install boto3 b2sdk python-dotenv.
Step 3: The Migration Script
Now for the core logic. Below is the Python script that handles the migration. I’ve added comments to explain what each part does. The basic flow is: list all objects in the S3 bucket, then loop through them, downloading each one into memory and immediately uploading it to B2. This avoids saving files to disk, which is cleaner and more efficient.
# s3_to_b2_migrator.py
import io
import os

import boto3
from b2sdk.v2 import B2Api, InMemoryAccountInfo
from dotenv import load_dotenv


def main():
    """Migrate every object from an S3 bucket to a Backblaze B2 bucket."""
    load_dotenv('config.env')

    # --- Load credentials from environment ---
    aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
    aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')
    s3_bucket_name = os.getenv('AWS_S3_BUCKET_NAME')
    aws_region = os.getenv('AWS_REGION')
    b2_key_id = os.getenv('B2_APPLICATION_KEY_ID')
    b2_app_key = os.getenv('B2_APPLICATION_KEY')
    b2_bucket_name = os.getenv('B2_BUCKET_NAME')

    # --- Initialize B2 API ---
    print("Initializing Backblaze B2 connection...")
    info = InMemoryAccountInfo()
    b2_api = B2Api(info)
    b2_api.authorize_account("production", b2_key_id, b2_app_key)
    b2_bucket = b2_api.get_bucket_by_name(b2_bucket_name)
    print("B2 connection successful.")

    # --- Initialize S3 client ---
    print("Initializing AWS S3 client...")
    s3_client = boto3.client(
        's3',
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=aws_region
    )
    print("S3 client initialized.")

    # --- Get list of objects in S3 bucket ---
    # list_objects_v2 returns at most 1,000 keys per call, so page through
    # the results instead of relying on a single response.
    try:
        paginator = s3_client.get_paginator('list_objects_v2')
        files_to_migrate = [
            obj['Key']
            for page in paginator.paginate(Bucket=s3_bucket_name)
            for obj in page.get('Contents', [])
        ]
    except Exception as e:
        print(f"Error listing S3 objects: {e}")
        return

    if not files_to_migrate:
        print(f"Source S3 bucket '{s3_bucket_name}' contains no objects.")
        return
    print(f"Found {len(files_to_migrate)} files to migrate from S3.")

    # --- Migration loop ---
    for file_key in files_to_migrate:
        try:
            print(f"Migrating '{file_key}'...")

            # 1. Download from S3 into memory
            s3_object_data = io.BytesIO()
            s3_client.download_fileobj(s3_bucket_name, file_key, s3_object_data)
            data = s3_object_data.getvalue()  # Full contents of the buffer
            file_size = len(data)

            # 2. Upload from memory to B2
            b2_bucket.upload_bytes(data_bytes=data, file_name=file_key)
            print(f" -> Successfully uploaded {file_key} ({file_size} bytes) to B2.")
        except Exception as e:
            # Log the failure and continue with the next object
            print(f" -> FAILED to migrate {file_key}. Error: {e}")

    print("\nMigration process completed.")


if __name__ == "__main__":
    main()
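The script reads seven settings from config.env, and a missing one tends to surface later as a confusing authentication or bucket error. One way to fail fast is a small check like the sketch below. The helper name find_missing_settings is hypothetical (it is not part of the script above); the setting names are the ones defined in config.env.

```python
import os

# Setting names as defined in config.env.
REQUIRED_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET_NAME",
    "AWS_REGION",
    "B2_APPLICATION_KEY_ID",
    "B2_APPLICATION_KEY",
    "B2_BUCKET_NAME",
)


def find_missing_settings(environ=None):
    """Return the names of required settings that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_SETTINGS if not environ.get(name)]


if __name__ == "__main__":
    missing = find_missing_settings()
    if missing:
        raise SystemExit(f"Missing settings in config.env: {', '.join(missing)}")
    print("All required settings are present.")
```

Calling it right after load_dotenv('config.env') in main() would stop the run before any network connection is attempted.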
Step 4: Run It and Automate It
To run the migration, execute the script from your terminal with python3 s3_to_b2_migrator.py. For a one-time migration, that’s all you need. But if you’re using S3 as a hot or warm storage tier and want to periodically offload older data to B2 for archival, you can automate this with a cron job.
A simple weekly cron job might look like this:
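0 2 * * 1 cd /path/to/project && python3 s3_to_b2_migrator.py
This would run the script every Monday at 2 AM. Replace /path/to/project with the directory that holds the script and config.env. The cd matters because cron does not start in your project directory, and the script loads config.env by relative path.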
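One thing to keep in mind for the scheduled case: the script in Step 3 copies every object it finds on each run. If the goal is to offload only older data, a filter on each object's last-modified time is one option. The sketch below assumes the entries come from the Contents list of an S3 list_objects_v2 response, where each entry carries a Key and a timezone-aware LastModified. The function name select_old_keys and the 90-day threshold are illustrative choices, not part of the original script.

```python
from datetime import datetime, timedelta, timezone


def select_old_keys(objects, min_age_days, now=None):
    """Return keys of objects last modified at least min_age_days ago."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [obj["Key"] for obj in objects if obj["LastModified"] <= cutoff]


if __name__ == "__main__":
    # Stand-in for entries from a list_objects_v2 response.
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    listing = [
        {"Key": "logs/2023-01.gz", "LastModified": datetime(2023, 2, 1, tzinfo=timezone.utc)},
        {"Key": "logs/2024-05.gz", "LastModified": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    ]
    print(select_old_keys(listing, min_age_days=90, now=now))
```

In the script, this would mean collecting the full object entries rather than just the keys while listing, then passing them through the filter before the migration loop.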
Common Pitfalls (Where I Usually Mess Up)
- IAM Permissions: The number one issue I run into is forgetting to give my AWS user the s3:ListBucket and s3:GetObject permissions. Without these, the script will fail immediately. Always check your policies first.
- Large File Timeouts: The in-memory approach is great for most files, but for multi-gigabyte files, you might encounter network timeouts. For those scenarios, I modify the script to download the file to a temporary disk location first, then use the B2 large file upload API, and finally, delete the temp file.
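- Ignoring S3 Egress Costs: Remember, downloading data from S3 costs money (egress fees). This migration makes sense for reducing *storage* costs. Be mindful of the one-time egress cost to move the data out. Running the script from an EC2 instance in the same region as the S3 bucket makes the S3-to-EC2 hop fast and free of transfer charges, but the upload from EC2 to B2 still leaves AWS and is billed as data transfer out to the internet, so check current AWS pricing and budget for the full dataset size.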
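The temp-file variant from the Large File Timeouts item can be sketched roughly as follows. It assumes s3_client is a boto3 S3 client and b2_bucket is the b2sdk bucket object from the main script. The function name migrate_large_file is made up for this example. As far as I know, b2sdk's upload_local_file switches to the large file API on its own for big uploads, but verify that against the b2sdk documentation for your version.

```python
import os
import tempfile


def migrate_large_file(s3_client, b2_bucket, s3_bucket_name, file_key):
    """Copy one object via a temporary file instead of an in-memory buffer."""
    # Reserve a temp file path and close the handle so it can be written by name.
    fd, temp_path = tempfile.mkstemp(prefix="s3_to_b2_")
    os.close(fd)
    try:
        # 1. Download from S3 to disk
        s3_client.download_file(s3_bucket_name, file_key, temp_path)
        # 2. Upload from disk to B2
        b2_bucket.upload_local_file(local_file=temp_path, file_name=file_key)
    finally:
        # 3. Always delete the temp file, even if a step above failed
        os.remove(temp_path)
```

Inside the migration loop, one could call this instead of the in-memory path whenever the object's Size from the S3 listing exceeds a threshold of your choosing.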
Conclusion
And that’s it. This script is a simple but powerful tool for optimizing your cloud storage costs. By moving cold data from S3 to a more affordable service like Backblaze B2, you can achieve significant savings with minimal effort. It’s a classic DevOps win: automate a process, save money, and free up your time to focus on bigger problems. Hope this helps you out!
🤖 Frequently Asked Questions
❓ How can I migrate data from AWS S3 to Backblaze B2 for cost savings?
Migrate S3 data to Backblaze B2 using a Python script that leverages `boto3` for S3 interaction and `b2sdk` for Backblaze B2. The script lists S3 objects, downloads them into memory, and uploads them directly to a specified B2 bucket, significantly reducing long-term storage costs for archival data.
❓ How does Backblaze B2 compare to AWS S3 for archival storage?
Backblaze B2 offers significantly lower storage costs (up to 75% savings) compared to AWS S3 for archival or infrequently accessed data. While S3 provides a comprehensive suite for various storage tiers, B2 is a more cost-effective solution specifically for cold storage, though one-time S3 egress fees apply during the initial migration.
❓ What is a common implementation pitfall when migrating S3 data to B2 and how is it resolved?
A common pitfall is insufficient IAM permissions for the AWS user. The script requires the IAM user to have `s3:ListBucket` and `s3:GetObject` permissions on the source S3 bucket. This is resolved by ensuring the AWS IAM policy grants these specific read permissions to the user associated with the provided credentials.