🚀 Executive Summary
TL;DR: This guide provides a Python script to automate the migration and continuous mirroring of GitHub repositories to a self-hosted Gitea instance. It transforms a tedious manual process into an efficient, scheduled “set it and forget it” solution for internal CI and archival needs.
🎯 Key Takeaways
- Automating GitHub to Gitea migration requires a Python script leveraging `requests` and `python-dotenv`, with specific Gitea (admin, `write:repository`, `write:admin`) and GitHub (`repo` scope) access tokens.
- The Gitea API’s `/api/v1/repos/migrate` endpoint is key, using a payload that includes the GitHub `clone_addr`, the Gitea owner’s numeric `uid`, and setting `mirror: True` for continuous synchronization.
- Securely manage API tokens using `config.env` and `.gitignore`, and ensure the script handles GitHub API pagination and dynamically fetches the Gitea user’s numeric `uid`.
Migrate GitHub Repos to Gitea (Self-hosted Git)
Hey everyone, Darian here. Let’s talk about a common headache: keeping our internal tools in sync with external ones. For a while, I was manually mirroring our critical GitHub repositories over to our self-hosted Gitea instance for internal CI builds and archival. It was a tedious, manual process I had to remember to do every week. I eventually realized this was costing me a couple of hours a month, and it was ripe for automation.
So, I wrote a simple Python script to handle it. Now, it runs on a schedule, and our Gitea instance is always a perfect, up-to-date mirror without me lifting a finger. Today, I’m going to walk you through how to set this up yourself. It’s a “set it and forget it” solution that will save you a ton of time.
Prerequisites
Before we dive in, make sure you have the following ready to go:
- A running Gitea instance (version 1.12+ recommended for the migration API).
- An administrator account on your Gitea instance.
- A Gitea Access Token with `write:repository` and `write:admin` permissions.
- A GitHub Personal Access Token (classic) with `repo` scope to read your repositories (including private ones).
- Python 3 installed on a machine that can reach both GitHub and your Gitea instance.
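Before going further, it can help to confirm that both tokens actually work. The sketch below is not part of the original script: the function names and the injected `http_get` parameter are illustrative, and it assumes that Gitea's `GET /api/v1/user` response includes an `is_admin` flag and that GitHub reports a classic token's scopes in the `X-OAuth-Scopes` response header.

```python
def gitea_token_is_admin(http_get, gitea_url, token):
    """Return True if the Gitea token authenticates as an admin user."""
    resp = http_get(
        f"{gitea_url}/api/v1/user",
        headers={"Authorization": f"token {token}"},
        timeout=30,
    )
    if resp.status_code != 200:
        return False
    # Assumption: the returned user object carries an `is_admin` boolean.
    return bool(resp.json().get("is_admin", False))


def github_token_scopes(http_get, api_url, token):
    """Return the set of scopes GitHub reports for a classic token."""
    resp = http_get(
        f"{api_url}/user",
        headers={"Authorization": f"token {token}"},
        timeout=30,
    )
    if resp.status_code != 200:
        return set()
    # Assumption: scopes arrive as a comma-separated header value.
    raw = resp.headers.get("X-OAuth-Scopes", "")
    return {scope.strip() for scope in raw.split(",") if scope.strip()}
```

In practice you would pass `requests.get` as `http_get` and check that `"repo"` is in the returned scope set. Injecting the HTTP function keeps the helpers easy to test without touching the network.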
The Guide: Step-by-Step
Step 1: Project Setup and Configuration
First, let’s get our environment ready. I’ll skip the standard virtualenv setup since you likely have your own workflow for that. The main thing is to get a clean environment and install the necessary Python libraries. You’ll need `requests` to make API calls and `python-dotenv` to handle our credentials securely. You can install both with `pip install requests python-dotenv`.
Next, in your project directory, create a file named `config.env`. This is where we’ll store our secrets so they aren’t hardcoded in the script. It should look like this:
# Gitea Configuration
GITEA_URL="http://your-gitea-instance.com"
GITEA_TOKEN="your_gitea_api_token_here"
GITEA_USER="your_gitea_admin_username"
# GitHub Configuration
GITHUB_API_URL="https://api.github.com"
GITHUB_USER="your_github_username_or_org"
GITHUB_TOKEN="your_github_personal_access_token_here"
Pro Tip: In my production setups, I consider this non-negotiable: never hardcode API tokens or other secrets directly in your code. Always use environment variables or a secure configuration file like this one. And please, add `config.env` to your `.gitignore` file immediately.
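A missing or misspelled key in `config.env` otherwise only shows up later as a confusing API error. As a minimal sketch (the `missing_settings` helper and `REQUIRED_KEYS` tuple are illustrative names, not part of the original script), the settings can be checked up front:

```python
import os

# The six keys used by the config.env file shown above.
REQUIRED_KEYS = (
    "GITEA_URL",
    "GITEA_TOKEN",
    "GITEA_USER",
    "GITHUB_API_URL",
    "GITHUB_USER",
    "GITHUB_TOKEN",
)


def missing_settings(env=None, required=REQUIRED_KEYS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]
```

Calling `missing_settings()` right after the configuration is loaded, and exiting with a clear message if the list is not empty, makes a broken setup fail fast.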
Step 2: The Python Migration Script
Now for the fun part. Create a Python file, let’s call it `migrate_repos.py`. We’ll build this out logically, piece by piece.
First, let’s handle the imports and load our configuration from the `config.env` file.
import os
import requests
from dotenv import load_dotenv

# Load environment variables from config.env
load_dotenv('config.env')

# Gitea settings
GITEA_URL = os.getenv("GITEA_URL")
GITEA_TOKEN = os.getenv("GITEA_TOKEN")
GITEA_USER = os.getenv("GITEA_USER")

# GitHub settings
GITHUB_API_URL = os.getenv("GITHUB_API_URL")
GITHUB_USER = os.getenv("GITHUB_USER")
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")

# API Headers
GITEA_HEADERS = {
    "Authorization": f"token {GITEA_TOKEN}",
    "Content-Type": "application/json",
}
GITHUB_HEADERS = {
    "Authorization": f"token {GITHUB_TOKEN}",
    "Accept": "application/vnd.github.v3+json",
}
The logic here is simple: we’re pulling the secrets we just defined and preparing the authorization headers we’ll need for our API calls.
Next, we need a function to fetch the list of repositories from your GitHub account or organization.
def get_github_repos():
    """Fetches a list of all repos for the configured GitHub user/org."""
    print("Fetching repositories from GitHub...")
    repos = []
    page = 1
    while True:
        url = f"{GITHUB_API_URL}/users/{GITHUB_USER}/repos?per_page=100&page={page}"
        response = requests.get(url, headers=GITHUB_HEADERS)
        if response.status_code != 200:
            print(f"Error fetching GitHub repos: {response.text}")
            return []
        data = response.json()
        if not data:
            break
        repos.extend(data)
        page += 1
    print(f"Found {len(repos)} repositories on GitHub.")
    return repos
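This function handles pagination, which is crucial if you have more than 100 repositories. It loops through pages of results from the GitHub API until a page comes back empty, then returns the complete list. One caveat: the `/users/{username}/repos` endpoint lists only public repositories, so to pick up private ones you would query `/user/repos` (for the token’s own account) or `/orgs/{org}/repos` (for an organization) instead.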
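An alternative to counting pages is to follow the `Link` header that GitHub returns with paginated responses. The sketch below is a hypothetical variant, not the author's function: `iter_github_repos` and its parameters are illustrative, and `session` is assumed to be a `requests.Session` (or anything with a compatible `get` method), whose responses expose the parsed header as `response.links`.

```python
def iter_github_repos(session, first_url, headers):
    """Yield repositories page by page, following rel="next" links."""
    url = first_url
    params = {"per_page": 100}
    while url:
        response = session.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        yield from response.json()
        # requests parses the Link header into the `links` dict;
        # there is no "next" entry on the last page, which ends the loop.
        url = response.links.get("next", {}).get("url")
        # The "next" URL already carries its own query string.
        params = None
```

Because this is a generator, repositories can be processed as each page arrives, and an HTTP error raises an exception instead of silently returning an empty list.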
Now, here’s the core logic. We’ll loop through the GitHub repos and trigger a migration for each one that doesn’t already exist in Gitea.
def main():
    """Main function to orchestrate the migration."""
    github_repos = get_github_repos()
    if not github_repos:
        print("No repositories found or an error occurred. Exiting.")
        return

    # Look up the owner's numeric ID once instead of once per repository
    user_url = f"{GITEA_URL}/api/v1/users/{GITEA_USER}"
    user_response = requests.get(user_url, headers=GITEA_HEADERS)
    if user_response.status_code != 200:
        print(f"Could not fetch Gitea user '{GITEA_USER}': {user_response.text}")
        return
    gitea_uid = int(user_response.json()["id"])

    migrated_count = 0
    for repo in github_repos:
        repo_name = repo["name"]
        clone_url = repo["clone_url"]
        is_private = repo["private"]
        description = repo["description"] or ""

        # Check if repo already exists in Gitea
        check_url = f"{GITEA_URL}/api/v1/repos/{GITEA_USER}/{repo_name}"
        if requests.get(check_url, headers=GITEA_HEADERS).status_code == 200:
            print(f"Repository '{repo_name}' already exists in Gitea. Skipping.")
            continue

        # If not, trigger the migration
        print(f"Migrating '{repo_name}' to Gitea...")
        migration_payload = {
            "clone_addr": clone_url,
            "uid": gitea_uid,
            "repo_name": repo_name,
            "mirror": True,
            "private": is_private,
            "description": description,
            "auth_token": GITHUB_TOKEN
        }
        migrate_url = f"{GITEA_URL}/api/v1/repos/migrate"
        response = requests.post(migrate_url, headers=GITEA_HEADERS, json=migration_payload)
        if response.status_code == 201:
            print(f"Successfully started migration for '{repo_name}'.")
            migrated_count += 1
        else:
            print(f"Failed to migrate '{repo_name}'. Status: {response.status_code}, Response: {response.text}")

    print(f"\nMigration complete. Initiated migration for {migrated_count} new repositories.")


if __name__ == "__main__":
    main()
The key here is the `migration_payload`. We tell Gitea the source URL (`clone_addr`), who should own it (`uid`), what to name it, and most importantly, that it should be a `mirror`. This keeps it in sync automatically after the initial import. We also pass along the GitHub token so Gitea can access private repos.
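To make the payload easier to inspect and test, it can be built by a small pure function. This is a sketch under assumptions rather than the script's own code: `build_migration_payload` is a hypothetical helper, and sending `auth_token` only for private repositories is a design choice of this example (the script above always sends it).

```python
def build_migration_payload(repo, owner_uid, github_token):
    """Build the JSON body for POST /api/v1/repos/migrate from a GitHub repo dict."""
    payload = {
        "clone_addr": repo["clone_url"],
        "uid": int(owner_uid),
        "repo_name": repo["name"],
        "mirror": True,
        "private": bool(repo["private"]),
        # GitHub returns null for an empty description.
        "description": repo.get("description") or "",
    }
    # Assumption of this sketch: public repos can be cloned anonymously,
    # so the GitHub token is only attached when the source is private.
    if repo["private"]:
        payload["auth_token"] = github_token
    return payload
```

The `repo` argument is one entry from the GitHub repository list, so the result can be passed directly as the `json=` argument of the migration request.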
Step 3: Schedule the Sync
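To make this truly automated, you can run it on a schedule using a cron job. Because the script loads `config.env` by a relative path, the job should change into the script’s directory first. For example, to run the script every Monday at 2 AM, you’d set up a cron job like this:
0 2 * * 1 cd /path/to/your/script && python3 migrate_repos.py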
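This way, any new repo you create on GitHub is picked up on the next scheduled run and mirrored to your Gitea instance without any manual intervention. Repositories that are already mirrored are skipped by the script; Gitea keeps those in sync on its own mirror schedule.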
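Cron usually starts jobs from the user’s home directory rather than the script’s folder, so a relative `config.env` may not be found. One way to make the script independent of the working directory is to resolve the file next to the script itself. The `config_path` helper below is a hypothetical name used for illustration:

```python
from pathlib import Path


def config_path(script_file, name="config.env"):
    """Return the path of `name` in the same directory as `script_file`."""
    return Path(script_file).resolve().parent / name
```

In the script this would be used as `load_dotenv(config_path(__file__))` in place of the relative `load_dotenv('config.env')` call.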
Common Pitfalls
Here are a few things that tripped me up the first time I did this:
- Token Permissions: If migrations are failing, double-check your tokens. The Gitea token must belong to an admin user with permissions to create repos for other users. The GitHub token needs full `repo` scope to see private repositories.
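- User ID (uid): The `uid` field in the migration payload must be the repository owner’s numeric user ID, not the username. My script handles this by making a quick API call to fetch the ID based on the username, but it’s a common point of failure if you build your own payload. Newer Gitea releases also appear to accept the owner’s name in a `repo_owner` field and mark `uid` as deprecated, so check the API documentation for your version.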
- API Rate Limiting: If you have a massive organization with thousands of repos, you might hit GitHub’s API rate limit. The script is fairly efficient, but for very large-scale migrations, you may need to add some error handling and back-off logic.
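For the rate-limiting case, a back-off can be derived from the response itself. The sketch below is an assumption-laden example, not part of the original script: `rate_limit_wait` is a hypothetical helper, and it relies on GitHub signalling limits with a 403 or 429 status plus the `Retry-After`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers.

```python
import time


def rate_limit_wait(status_code, headers, now=None):
    """Return how many seconds to sleep before retrying, or 0 if no wait is needed."""
    if status_code not in (403, 429):
        return 0
    now = time.time() if now is None else now
    # Retry-After, when present, gives the wait directly in seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0, int(retry_after))
    # Otherwise wait until the primary limit resets (epoch seconds),
    # plus one second of slack.
    if headers.get("X-RateLimit-Remaining") == "0":
        reset_at = int(headers.get("X-RateLimit-Reset", now))
        return max(0, int(reset_at - now) + 1)
    # A 403 without rate-limit headers is a real permission error.
    return 0
```

A caller would pass `response.status_code` and `response.headers`, sleep for the returned number of seconds when it is greater than zero, and then retry the same request. The header lookups here are case-sensitive on a plain dict; `requests` response headers are case-insensitive.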
Conclusion
And that’s it. With one script and a simple cron job, you’ve created a robust pipeline to keep your self-hosted Gitea instance synchronized with your GitHub repositories. This not only provides a reliable backup but also seamlessly integrates with any internal tooling that relies on your Gitea instance. It’s a small investment of time that pays off by making your infrastructure more resilient and your workflows more efficient. Let me know if you have any questions!
🤖 Frequently Asked Questions
❓ How can I automate the migration of GitHub repositories to a self-hosted Gitea instance?
You can automate this using a Python script that fetches repositories from GitHub via its API and then utilizes the Gitea API’s `/api/v1/repos/migrate` endpoint to create mirrored repositories. Scheduling this script with a cron job ensures continuous synchronization.
❓ How does this automated migration approach compare to manual methods?
This automated approach, using a Python script and cron job, provides a “set it and forget it” solution, eliminating the manual, tedious, and error-prone process of weekly mirroring. It ensures Gitea instances are always up-to-date with GitHub repositories without human intervention.
❓ What is a common pitfall when setting up Gitea repository migration, and how is it resolved?
A common pitfall is incorrect API token permissions. The Gitea token requires `write:repository` and `write:admin` scopes for an admin user, and the GitHub Personal Access Token needs full `repo` scope to access private repositories. Verifying these permissions is crucial for successful migrations.