🚀 Executive Summary

TL;DR: This guide addresses the challenge of manually migrating assets between cloud storage providers like DigitalOcean Spaces and AWS S3. It provides a professional, automated solution using Boto3, leveraging its S3-compatible API to efficiently transfer data programmatically.

🎯 Key Takeaways

  • DigitalOcean Spaces are S3-compatible, allowing Boto3’s S3 client to interact with them by specifying a custom `endpoint_url` and region.
  • Using `get_paginator('list_objects_v2')` is crucial for efficiently listing all objects in large DigitalOcean Spaces, preventing memory issues and ensuring comprehensive migration.
  • When uploading to AWS S3, it’s important to preserve `ContentType` from the source object and explicitly set `ACL` (e.g., `'private'`) and potentially `StorageClass` for the destination object.
  • Common pitfalls include authentication errors due to incorrect credentials or insufficient IAM permissions, API rate limiting for high throughput, and memory exhaustion when handling very large files.
  • For large files, avoid reading the entire object into memory; instead, stream `do_response['Body']` to the destination with the S3 client’s `upload_fileobj` method, which uses boto3’s managed transfer machinery (`boto3.s3.transfer`) for memory-efficient uploads.
  • Post-migration best practices include securely deleting source objects, implementing S3 Lifecycle policies for cost optimization, enabling S3 bucket versioning, and integrating with other AWS services.

Moving Assets from DigitalOcean Spaces to AWS S3 using Boto3

As a DevOps Engineer, you often find yourself navigating the complex landscape of cloud infrastructure,
seeking efficiency, cost optimization, and robust data management solutions. Migrating assets
between cloud storage providers is a common, yet often daunting, task. Manually downloading
terabytes of data from one service only to re-upload it to another is not only
time-consuming and error-prone but also a significant drain on productivity. This process
becomes even more challenging when dealing with a multitude of objects, diverse metadata,
and the need for minimal downtime.

Whether you’re consolidating your infrastructure under a single cloud provider, optimizing
for cost and performance, or simply transitioning away from a legacy setup, automating
this migration is paramount. This tutorial from TechResolve will guide you through a
professional and efficient method to move your valuable assets from DigitalOcean Spaces
to AWS S3 using Boto3, the AWS SDK for Python. Boto3’s versatility allows us
to interact with DigitalOcean Spaces (thanks to its S3-compatible API) and seamlessly
transfer data to AWS S3, all with a robust and programmatic approach.

Prerequisites

Before we dive into the migration process, ensure you have the following in place:

  • Python 3.x: Installed on your local machine or server.
  • pip: Python’s package installer, usually bundled with Python 3.x.
  • DigitalOcean Spaces Access Key and Secret:
    These credentials are required to authenticate and access your DigitalOcean Space.
    You can generate them in your DigitalOcean account settings under “API” > “Spaces access keys”.
  • DigitalOcean Space Name and Endpoint URL:
    For example, your-space-name and nyc3.digitaloceanspaces.com.
  • AWS Access Key ID and Secret Access Key:
    An IAM user with programmatic access and appropriate permissions (at least s3:PutObject,
    s3:GetObject, s3:ListBucket) for your target S3 bucket. Best practice is to
    use an IAM role with least privilege (a sample policy follows this list).
  • AWS S3 Bucket Name and Region: Your destination bucket, e.g.,
    your-aws-bucket in us-east-1.
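
For reference, here is a minimal sketch of a least-privilege IAM policy for the
migration; the bucket name is a placeholder you would replace with your own:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::your-aws-bucket/*"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::your-aws-bucket"
        }
    ]
}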

Step-by-Step Guide

Step 1: Set Up Your Python Environment and Install Boto3

It’s always a good practice to work within a virtual environment to manage your project’s
dependencies. Open your terminal or command prompt and execute the following commands:


python3 -m venv env
source env/bin/activate  # On Windows, use `env\Scripts\activate`
pip install boto3

Once Boto3 is installed, you have the necessary library to interact with both
AWS S3 and DigitalOcean Spaces, as DigitalOcean Spaces are S3-compatible, allowing
us to use Boto3’s S3 client with a custom endpoint.

Step 2: Configure Credentials for DigitalOcean Spaces and AWS S3

For security, it’s recommended to use environment variables for your credentials rather
than hardcoding them in your script. The script below follows that pattern, reading each
value from the environment and falling back to a placeholder for clarity. In a production
environment, consider a configuration management tool or a dedicated secrets manager.

Boto3 automatically looks for AWS credentials in several locations, including environment
variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) and the
~/.aws/credentials file. For DigitalOcean, we’ll explicitly pass the
credentials and endpoint URL to the Boto3 client.
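
For example, you can export these values in your shell before running the script; the
variable names below match what the script in Step 3 reads via os.getenv (the values are
placeholders):

export DO_SPACES_KEY='YOUR_DO_SPACES_ACCESS_KEY'
export DO_SPACES_SECRET='YOUR_DO_SPACES_SECRET_KEY'
export AWS_ACCESS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
export AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'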

Step 3: List Objects from DigitalOcean Spaces

First, we need to connect to your DigitalOcean Space and retrieve a list of all objects
you intend to migrate. We will use boto3.client('s3', ...) for this.


import boto3
import os

# DigitalOcean Spaces Configuration
DO_SPACES_KEY = os.getenv('DO_SPACES_KEY', 'YOUR_DO_SPACES_ACCESS_KEY')
DO_SPACES_SECRET = os.getenv('DO_SPACES_SECRET', 'YOUR_DO_SPACES_SECRET_KEY')
DO_SPACES_ENDPOINT = os.getenv('DO_SPACES_ENDPOINT', 'nyc3.digitaloceanspaces.com') # e.g., nyc3.digitaloceanspaces.com
DO_SPACES_BUCKET_NAME = os.getenv('DO_SPACES_BUCKET_NAME', 'your-do-space-name')

# AWS S3 Configuration
AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID', 'YOUR_AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY', 'YOUR_AWS_SECRET_ACCESS_KEY')
AWS_S3_REGION = os.getenv('AWS_S3_REGION', 'us-east-1') # e.g., us-east-1
AWS_S3_BUCKET_NAME = os.getenv('AWS_S3_BUCKET_NAME', 'your-aws-s3-bucket-name')

# Initialize S3 client for DigitalOcean Spaces
do_spaces_client = boto3.client(
    's3',
    region_name='nyc3', # boto3 requires a region_name; for DigitalOcean the value can be
                        # arbitrary as long as endpoint_url is correct, though matching
                        # the Space's region slug is clearest.
    endpoint_url=f'https://{DO_SPACES_ENDPOINT}',
    aws_access_key_id=DO_SPACES_KEY,
    aws_secret_access_key=DO_SPACES_SECRET
)

print(f"Listing objects in DigitalOcean Space: {DO_SPACES_BUCKET_NAME}...")

do_objects_to_migrate = []
paginator = do_spaces_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=DO_SPACES_BUCKET_NAME)

for page in pages:
    # page.get("Contents", []) is empty when a page contains no objects (e.g., an empty Space)
    for obj in page.get("Contents", []):
        do_objects_to_migrate.append(obj["Key"])

if not do_objects_to_migrate:
    print("DigitalOcean Space is empty.")

print(f"Found {len(do_objects_to_migrate)} objects in DigitalOcean Space.")

The code above initializes an S3 client configured for DigitalOcean Spaces. It then uses
a paginator to efficiently list all objects in your specified Space, which is crucial for
Spaces containing a large number of assets. Each object’s Key (its path) is
stored for subsequent download and upload.

Step 4: Download Objects from DigitalOcean Spaces and Upload to AWS S3

Now, we’ll iterate through the list of object keys, download each object from DigitalOcean
Spaces, and then immediately upload it to your AWS S3 bucket. We’ll use a separate
Boto3 client for AWS S3.


# Initialize S3 client for AWS S3
aws_s3_client = boto3.client(
    's3',
    region_name=AWS_S3_REGION,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)

print(f"Starting migration to AWS S3 bucket: {AWS_S3_BUCKET_NAME} in region {AWS_S3_REGION}...")

for object_key in do_objects_to_migrate:
    try:
        print(f"Migrating object: {object_key}")

        # 1. Download object from DigitalOcean Spaces
        do_response = do_spaces_client.get_object(Bucket=DO_SPACES_BUCKET_NAME, Key=object_key)
        object_body = do_response['Body'].read()
        content_type = do_response.get('ContentType', 'binary/octet-stream')

        # Optional: preserve custom metadata. put_object does not carry anything over
        # automatically; pass do_response.get('Metadata', {}) via the Metadata parameter if needed.

        # 2. Upload object to AWS S3
        aws_s3_client.put_object(
            Bucket=AWS_S3_BUCKET_NAME,
            Key=object_key,
            Body=object_body,
            ContentType=content_type,
            ACL='private' # Or 'public-read' if your objects need public access.
            # Note: omit the ACL parameter entirely if the destination bucket has ACLs
            # disabled (the default for new S3 buckets); otherwise put_object fails with
            # an AccessControlListNotSupported error.
            # You can also set StorageClass='STANDARD_IA' for Infrequent Access, etc.
        )
        print(f"Successfully migrated {object_key}")

    except Exception as e:
        print(f"Error migrating {object_key}: {e}")

print("Migration complete!")
    

This script connects to both DigitalOcean Spaces and AWS S3. For each object found
in your DigitalOcean Space, it performs a get_object call to retrieve
its content and then uses put_object to upload it to the specified
AWS S3 bucket. We’re also attempting to preserve the ContentType and
setting a default ACL (Access Control List) for the uploaded objects.
You might need to adjust the ACL and consider other parameters like
StorageClass, Metadata, and ServerSideEncryption
based on your specific requirements.
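
As an illustrative variation of the upload call above, here is a hedged sketch that also
carries over custom metadata and requests server-side encryption; the parameter values
are examples, not requirements:

# Variation of the put_object call from the loop above: it also carries over
# custom metadata and requests SSE-S3 encryption (values are illustrative).
aws_s3_client.put_object(
    Bucket=AWS_S3_BUCKET_NAME,
    Key=object_key,
    Body=object_body,
    ContentType=content_type,
    Metadata=do_response.get('Metadata', {}),  # custom x-amz-meta-* values from the source
    ServerSideEncryption='AES256',             # or 'aws:kms' together with SSEKMSKeyId
    StorageClass='STANDARD_IA'                 # example: Infrequent Access tier
)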

Step 5: Verification

After the script completes, it’s crucial to verify that all assets have been
successfully migrated. You can do this by:

  • Checking the AWS S3 Console: Navigate to your target bucket
    in the AWS Management Console and visually inspect the uploaded objects.
  • Running a verification script: Write a small Python script
    using Boto3 to list objects in your AWS S3 bucket and compare the count
    and some sample keys against your DigitalOcean Space’s contents (a minimal
    sketch follows this list).
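
Here is a minimal sketch of such a verification script. It reuses the two clients and
bucket names from the steps above and compares key sets, which assumes object keys were
not renamed during migration:

# Collect every key in a bucket using the same paginator pattern as Step 3.
def list_keys(client, bucket):
    keys = set()
    for page in client.get_paginator('list_objects_v2').paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            keys.add(obj['Key'])
    return keys

source_keys = list_keys(do_spaces_client, DO_SPACES_BUCKET_NAME)
dest_keys = list_keys(aws_s3_client, AWS_S3_BUCKET_NAME)

missing = source_keys - dest_keys
print(f"Source: {len(source_keys)} objects, destination: {len(dest_keys)} objects.")
if missing:
    print(f"{len(missing)} objects missing from S3, e.g. {sorted(missing)[:5]}")
else:
    print("All source keys are present in the destination bucket.")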

Common Pitfalls

  • Authentication and Authorization Errors:

    Ensure your DigitalOcean Spaces access keys and AWS access keys are correct and
    have the necessary permissions. For DigitalOcean, this means read access to the Space.
    For AWS, the IAM user or role must have s3:PutObject,
    s3:GetObject, and s3:ListBucket permissions on the target
    bucket. Look for errors like “Access Denied” or “InvalidAccessKeyId”.

  • Rate Limiting:

    For very large numbers of small objects or for very large objects, you might hit API
    rate limits from either DigitalOcean or AWS. Boto3 includes automatic retry mechanisms
    with exponential backoff for transient errors, but for sustained high throughput you
    might need to add custom delays or parallelize the transfers with careful rate control
    (see the sketches after this list).

  • Memory and Large Files:

    The current script downloads the entire object into memory
    (object_body = do_response['Body'].read()) before uploading.
    For extremely large files (gigabytes or more), this can lead to memory exhaustion.
    A more robust approach is to stream the data between the two services without fully
    loading it into memory, by passing do_response['Body'] to the S3 client's
    upload_fileobj method, which reads the stream in chunks and performs managed
    (multipart) uploads via boto3.s3.transfer (see the sketch after this list).
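
As a concrete illustration of the streaming approach, the per-object transfer from Step 4
can be rewritten as follows. This is a sketch that assumes the clients and bucket names
defined in the earlier steps; upload_fileobj accepts a non-seekable stream and handles
multipart uploads through boto3’s managed transfer layer:

from boto3.s3.transfer import TransferConfig

# Multipart uploads kick in above the threshold, keeping memory usage bounded.
transfer_config = TransferConfig(multipart_threshold=100 * 1024 * 1024)

def migrate_streaming(object_key):
    """Stream one object from DigitalOcean Spaces to S3 without buffering it in memory."""
    do_response = do_spaces_client.get_object(Bucket=DO_SPACES_BUCKET_NAME, Key=object_key)
    aws_s3_client.upload_fileobj(
        do_response['Body'],  # StreamingBody is read in chunks, never fully loaded
        AWS_S3_BUCKET_NAME,
        object_key,
        ExtraArgs={'ContentType': do_response.get('ContentType', 'binary/octet-stream')},
        Config=transfer_config
    )

For throughput, the same function can be fanned out over a bounded thread pool; threads
suit this I/O-bound workload (boto3 clients are generally thread-safe), and the worker
cap doubles as a crude rate control:

from concurrent.futures import ThreadPoolExecutor, as_completed

# max_workers bounds the number of concurrent API calls against both providers.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(migrate_streaming, key): key for key in do_objects_to_migrate}
    for future in as_completed(futures):
        key = futures[future]
        try:
            future.result()
            print(f"Migrated {key}")
        except Exception as e:
            print(f"Error migrating {key}: {e}")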

Conclusion

You have successfully orchestrated a data migration from DigitalOcean Spaces to AWS S3 using
Boto3. This programmatic approach not only saves countless hours of manual effort but also
provides a repeatable, auditable, and scalable solution for your cloud storage migration
needs. Automating such tasks is a cornerstone of modern DevOps practices, ensuring consistency
and reducing human error.

Now that your assets reside in AWS S3, you can leverage the full power of the AWS ecosystem.
Consider these next steps:

  • Delete Source Objects: Once verified, securely delete the original
    objects from your DigitalOcean Space to avoid duplicate storage costs.
  • Implement Lifecycle Policies: Configure S3 Lifecycle policies
    to automatically transition objects to different storage classes (e.g., S3 Glacier)
    or delete them after a certain period, further optimizing costs (a Boto3 sketch
    covering this and versioning follows the list).
  • Set up Versioning: Enable S3 bucket versioning to protect against
    accidental deletions and overwrites.
  • Integrate with AWS Services: Explore how your newly migrated assets
    can integrate with other AWS services like CloudFront for CDN, Lambda for event-driven
    processing, or Athena for analytics.
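
As an illustrative sketch of the lifecycle and versioning steps (the 90-day transition
is a placeholder; tune the rules to your data):

# Enable versioning on the destination bucket.
aws_s3_client.put_bucket_versioning(
    Bucket=AWS_S3_BUCKET_NAME,
    VersioningConfiguration={'Status': 'Enabled'}
)

# Transition all objects to S3 Glacier after 90 days (placeholder rule).
aws_s3_client.put_bucket_lifecycle_configuration(
    Bucket=AWS_S3_BUCKET_NAME,
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-after-90-days',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'Transitions': [{'Days': 90, 'StorageClass': 'GLACIER'}]
        }]
    }
)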

At TechResolve, we believe in empowering engineers with the tools and knowledge
to build resilient and efficient cloud infrastructures. Happy migrating!

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ How can I programmatically migrate assets from DigitalOcean Spaces to AWS S3?

You can programmatically migrate assets using Boto3, the AWS SDK for Python. Configure a Boto3 S3 client with DigitalOcean’s endpoint URL and credentials to list and retrieve objects, then use a separate Boto3 S3 client configured for AWS to upload those objects to your target S3 bucket.

❓ How does this Boto3 migration method compare to manual asset transfer or other tools?

This Boto3-based method offers a highly automated, repeatable, and scalable solution, significantly reducing the time, effort, and potential for human error associated with manual download/re-upload processes. It provides granular control over object properties like `ContentType` and `ACL` during transfer, which might be less flexible with simpler sync tools.

❓ What is a common pitfall when migrating large files with this script, and how can it be mitigated?

A common pitfall is memory exhaustion when the script attempts to load an entire large file into memory using `do_response['Body'].read()` before uploading. This can be mitigated by streaming the `do_response['Body']` through the S3 client’s `upload_fileobj` method, which relies on `boto3.s3.transfer` for managed, memory-efficient uploads.
