🚀 Executive Summary
TL;DR: Migrating repositories between Azure DevOps (ADO) and GitHub Enterprise can be a productivity drain due to constant context-switching. This guide provides a Python script to automate the core migration process, efficiently moving code, history, and all branches from ADO to GitHub Enterprise.
🎯 Key Takeaways
- Successful migration requires Azure DevOps and GitHub Enterprise Personal Access Tokens (PATs) with the `Code (Read)` and `repo` scopes, respectively, along with organization/project names and a Python 3 environment.
- The migration is automated using a Python script that leverages the `requests` library for API interactions (fetching ADO repos, creating GHE repos) and the `subprocess` module for executing Git commands.
- Repositories are mirrored using a `git clone --bare` command from ADO, followed by a `git push --mirror` command to transfer all branches, tags, and the full commit history to the newly created GitHub Enterprise repository.
- Configuration details and sensitive credentials are managed with a `config.env` file, with a strong recommendation to fetch these secrets from secure vaults like Azure Key Vault or HashiCorp Vault in production setups.
- Common pitfalls include incorrect PAT scopes, URL mismatches in the `config.env` file, repository name conflicts, and potential timeouts for very large repositories during `git clone` or `git push` operations.
Migrate Azure DevOps Repos to GitHub Enterprise
Hey there, Darian Vance here. If you’re juggling repositories between Azure DevOps (ADO) and GitHub Enterprise, you know the pain. For a while, my team was split between the two, and the constant context-switching was a hidden productivity killer. We’d have pull requests in one place, pipelines in another, and security scans all over. Consolidating everything into GitHub Enterprise was a game-changer for our workflow, and I want to show you exactly how we automated the core migration process. This guide will help you move the code, the history, and all the branches, saving you hours of manual work.
Prerequisites
Before we dive in, let’s make sure you have the necessary keys to the kingdom. You’re going to need:
- An Azure DevOps Personal Access Token (PAT): With `Code (Read)` permissions at a minimum. I recommend giving it `Code (Read & Write)` if you plan to archive the old repos later.
- A GitHub Enterprise Personal Access Token (PAT): This needs the `repo` scope to create and write to repositories.
- Your Organization & Project Names: The ADO organization and project name, as well as your GitHub Enterprise organization name.
- Python 3 Environment: A working Python 3 installation. We'll be using the `requests` and `python-dotenv` libraries.
The Step-by-Step Migration Guide
Step 1: Project Setup and Configuration
First things first, let’s get our workspace ready. I’ll skip the standard virtualenv setup since you likely have your own workflow for that. The important part is to create a project directory and install the required Python libraries. You can do this with a simple `pip install requests python-dotenv` command.
Next, create a file named `config.env` in your project root. This is where we'll store our credentials and configuration details, keeping them out of the script itself. Your file should look like this:

```shell
# Azure DevOps Configuration
ADO_ORG="your-ado-organization-name"
ADO_PROJECT="YourADOProjectName"
ADO_PAT="your_personal_access_token_for_ado"
ADO_USER="your_ado_username"

# GitHub Enterprise Configuration
GHE_API_URL="https://your-github-enterprise-url/api/v3"
GHE_ORG="your-github-enterprise-organization"
GHE_PAT="your_personal_access_token_for_github"
```
Pro Tip: In my production setups, I always fetch these secrets from a secure vault like Azure Key Vault or HashiCorp Vault at runtime. For this tutorial, the `config.env` file is perfectly fine, but avoid committing it to a public repository!
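A missing or empty value in `config.env` tends to surface later as a cryptic 401 or a `None` embedded in a URL, so a small fail-fast check up front is worth its weight. This is just a sketch; the key names match the `config.env` above:

```python
import os

# The keys the migration script expects to find in config.env.
REQUIRED_KEYS = ("ADO_ORG", "ADO_PROJECT", "ADO_PAT", "ADO_USER",
                 "GHE_API_URL", "GHE_ORG", "GHE_PAT")

def missing_config(env=os.environ):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_config()` right after loading the env file and aborting if it returns anything will save you a round of head-scratching mid-migration.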
Step 2: The Python Script – Initialization
Now for the fun part. Let’s create our Python script, which I’ll call `migrate_repos.py`. We’ll start by importing the necessary libraries and loading our configuration from the `config.env` file.
The logic here is straightforward: we load the environment variables, then immediately set up the authentication headers we’ll need for our API calls to both ADO and GitHub. This keeps our main logic clean.
```python
import os
import requests
import subprocess
import base64
from dotenv import load_dotenv

# Load environment variables from config.env
load_dotenv('config.env')

# --- Azure DevOps Configuration ---
ADO_ORG = os.getenv("ADO_ORG")
ADO_PROJECT = os.getenv("ADO_PROJECT")
ADO_PAT = os.getenv("ADO_PAT")
ADO_USER = os.getenv("ADO_USER")
ADO_API_URL = f"https://dev.azure.com/{ADO_ORG}/{ADO_PROJECT}/_apis/git/repositories?api-version=6.0"

# --- GitHub Enterprise Configuration ---
GHE_API_URL = os.getenv("GHE_API_URL")
GHE_ORG = os.getenv("GHE_ORG")
GHE_PAT = os.getenv("GHE_PAT")

# --- API Headers ---
# For ADO, we need to use Basic Auth with the PAT (empty username).
ado_pat_b64 = base64.b64encode(f":{ADO_PAT}".encode('utf-8')).decode('utf-8')
ADO_HEADERS = {
    'Authorization': f'Basic {ado_pat_b64}',
    'Content-Type': 'application/json'
}

# For GitHub, the PAT goes directly in the Authorization header.
GHE_HEADERS = {
    'Authorization': f'token {GHE_PAT}',
    'Accept': 'application/vnd.github.v3+json'
}

def get_ado_repos():
    """Fetches a list of repositories from the specified ADO project."""
    print("Fetching repositories from Azure DevOps...")
    response = requests.get(ADO_API_URL, headers=ADO_HEADERS)
    if response.status_code != 200:
        print(f"Error fetching ADO repos: {response.status_code} - {response.text}")
        return []
    repositories = response.json().get('value', [])
    print(f"Found {len(repositories)} repositories in project '{ADO_PROJECT}'.")
    return repositories
```
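The one non-obvious piece above is ADO's Basic auth format: the username is left empty and the PAT acts as the password, so the string being base64-encoded starts with a colon. You can convince yourself of the round trip in isolation (the token below is a placeholder, not a real PAT):

```python
import base64

# Hypothetical PAT for illustration only.
pat = "my-ado-pat"

# Empty username, PAT as password -- the same encoding the script uses.
encoded = base64.b64encode(f":{pat}".encode('utf-8')).decode('utf-8')
header_value = f"Basic {encoded}"

# Decoding recovers the original ":<pat>" pair.
assert base64.b64decode(encoded).decode('utf-8') == f":{pat}"
```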
Step 3: Creating and Mirroring Repositories
Next, we need two key functions. The first, `create_ghe_repo`, will take a repository name and create a new, private repository in our GitHub Enterprise organization. The second, `mirror_repo`, is the workhorse. It will perform the actual Git operations to clone the repository from ADO and push it, with all its history and branches, to the newly created GitHub repo.
Here’s the code for that. I’m using Python’s `subprocess` module to run the Git commands. This is how the script automates what you would otherwise do manually in your terminal.
```python
def create_ghe_repo(repo_name):
    """Creates a new private repository in GitHub Enterprise."""
    url = f"{GHE_API_URL}/orgs/{GHE_ORG}/repos"
    payload = {
        'name': repo_name,
        'private': True,
        'description': f"Migrated from ADO project: {ADO_PROJECT}"
    }
    print(f"Creating repository '{repo_name}' in GitHub Enterprise...")
    response = requests.post(url, headers=GHE_HEADERS, json=payload)
    if response.status_code == 201:
        print(f"Successfully created GHE repo: {repo_name}")
        return True
    elif response.status_code == 422:  # Repository already exists
        print(f"Warning: GHE repo '{repo_name}' already exists. Skipping creation.")
        return True
    else:
        print(f"Error creating GHE repo '{repo_name}': {response.status_code} - {response.text}")
        return False

def mirror_repo(ado_repo_url, ghe_repo_url, repo_name):
    """Mirrors a repository from ADO to GHE using git commands."""
    temp_dir = f"./{repo_name}.git"
    # Authenticated URLs
    ado_clone_url = f"https://{ADO_USER}:{ADO_PAT}@dev.azure.com/{ADO_ORG}/{ADO_PROJECT}/_git/{repo_name}"
    ghe_push_url = f"https://x-access-token:{GHE_PAT}@{ghe_repo_url.split('//')[1]}"
    print(f"Starting mirror for '{repo_name}'...")
    try:
        # Step 1: Bare clone the repository from ADO
        print("  - Cloning (bare) from ADO...")
        subprocess.run(['git', 'clone', '--bare', ado_clone_url, temp_dir], check=True, capture_output=True)
        # Step 2: Mirror push to the new GitHub Enterprise repository
        print("  - Pushing (mirror) to GHE...")
        subprocess.run(['git', 'push', '--mirror', ghe_push_url], cwd=temp_dir, check=True, capture_output=True)
        # Step 3: Clean up the local temporary clone
        print("  - Cleaning up temporary directory...")
        subprocess.run(['rm', '-rf', temp_dir], check=True)
        print(f"Successfully mirrored '{repo_name}' to GHE.")
        return True
    except subprocess.CalledProcessError as e:
        print(f"  -!! GIT COMMAND FAILED for '{repo_name}' !!")
        print(f"  - STDERR: {e.stderr.decode()}")
        return False
```
Pro Tip: We use `git clone --bare` followed by `git push --mirror`. This is the most reliable way to copy everything (all branches, all tags, and the full commit history) without creating a working copy on disk. It's efficient and clean.
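If you want to see the clone/push pair in action without touching ADO or GHE at all, the same two commands work against purely local repositories. This sketch fabricates a tiny source repo with an extra branch and a tag, mirrors it into an empty bare repo, and lists the refs that arrived (all paths are throwaway temp directories):

```shell
set -e
tmp=$(mktemp -d)

# A stand-in for the ADO repository: one commit, one extra branch, one tag.
git init -q "$tmp/source"
git -C "$tmp/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/source" branch feature-x
git -C "$tmp/source" tag v1.0

# A stand-in for the freshly created (empty) GHE repository.
git init -q --bare "$tmp/target.git"

# The exact pattern the script automates:
git clone -q --bare "$tmp/source" "$tmp/mirror.git"
git -C "$tmp/mirror.git" push -q --mirror "$tmp/target.git"

# Every ref (branches and tags alike) is now in the target.
git -C "$tmp/target.git" show-ref
```

A plain `git push` would only move the currently checked-out branch; `--mirror` is what guarantees `feature-x` and `v1.0` make the trip.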
Step 4: Putting It All Together
Finally, let’s write the main execution block. This part of the script will call our functions in sequence: get the list of ADO repos, loop through them, create a corresponding repo in GitHub, and then trigger the mirror process.
```python
def main():
    """Main function to orchestrate the migration."""
    ado_repos = get_ado_repos()
    if not ado_repos:
        print("No repositories found or an error occurred. Exiting.")
        return
    succeeded = 0
    failed = []
    for repo in ado_repos:
        repo_name = repo['name']
        # We need to construct the GHE repo URL for the push command
        ghe_domain = GHE_API_URL.split('/api/v3')[0].replace('https://', '')
        ghe_repo_url = f"https://{ghe_domain}/{GHE_ORG}/{repo_name}.git"
        print(f"\n--- Processing: {repo_name} ---")
        # 1. Create the repo in GHE
        if create_ghe_repo(repo_name):
            # 2. If creation is successful, mirror the contents
            if mirror_repo(repo['remoteUrl'], ghe_repo_url, repo_name):
                succeeded += 1
            else:
                failed.append(repo_name)
        else:
            failed.append(repo_name)
    print("\n--- Migration Summary ---")
    print(f"Total repositories processed: {len(ado_repos)}")
    print(f"Successfully migrated: {succeeded}")
    if failed:
        print(f"Failed to migrate: {len(failed)}")
        print("Failed repositories:", ", ".join(failed))
    print("-------------------------")

if __name__ == "__main__":
    main()
```
And that’s it! Run this script from your terminal (`python3 migrate_repos.py`), and it will methodically work through your ADO project, replicating each repository in GitHub Enterprise.
Common Pitfalls (Where I Usually Mess Up)
Even with a script, things can go wrong. Here are a few traps I’ve fallen into myself:
- Incorrect PAT Scopes: The number one issue is always PAT permissions. If your GitHub token doesn't have the full `repo` scope, the script will fail on repo creation. If the ADO token can't read code, it will fail on the fetch. Double-check them!
- URL Mismatches: A typo in your ADO organization name or GHE API URL in the `config.env` file can cause all requests to fail. Make sure they are copied exactly.
- Repository Name Conflicts: The script includes a basic check for existing repos in GitHub, but if you have complex naming conventions or forks, you might need to add more sophisticated logic to handle name clashes.
- Large Repositories & Timeouts: For massive repositories (multiple gigabytes), the `git clone` or `git push` operations might time out. You may need to run the script on a machine with a fast, stable internet connection or handle these large repos manually as a separate step.
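For the URL-mismatch pitfall in particular, the derivation of the push URL from `GHE_API_URL` (done inside `main()`) is easy to check in isolation. Here it is with hypothetical values:

```python
# Hypothetical configuration values for illustration.
GHE_API_URL = "https://github.example.com/api/v3"
GHE_ORG = "my-org"
repo_name = "payments-service"

# The same derivation main() performs for each repository.
ghe_domain = GHE_API_URL.split('/api/v3')[0].replace('https://', '')
ghe_repo_url = f"https://{ghe_domain}/{GHE_ORG}/{repo_name}.git"

assert ghe_domain == "github.example.com"
assert ghe_repo_url == "https://github.example.com/my-org/payments-service.git"
```

Note that a trailing slash or an `http://` scheme in `GHE_API_URL` would quietly break this split, which is exactly the kind of mismatch worth catching before a full run.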
Conclusion
Automating this migration saves a ton of time and, more importantly, reduces the risk of human error. This script provides a solid foundation. From here, you can expand it to migrate branch policies, user permissions, or even integrate it into a larger orchestration tool. By consolidating your source code, you’re paving the way for a more streamlined, secure, and efficient development lifecycle. Good luck!
🤖 Frequently Asked Questions
❓ How can I automate the migration of Git repositories from Azure DevOps to GitHub Enterprise?
You can automate this using a Python script that fetches ADO repositories via API, creates corresponding repositories in GitHub Enterprise, and then uses `git clone --bare` and `git push --mirror` commands to transfer all history, branches, and tags.
❓ How does this automated migration approach compare to alternatives?
This Python script-based automation significantly reduces manual effort and the risk of human error compared to manually cloning and pushing each repository. It provides a robust, scriptable solution for bulk migrations, ensuring full history transfer without creating a working copy on disk, which is more efficient than individual manual transfers.
❓ What are the most common implementation pitfalls during this migration process?
The most common pitfalls include incorrect Personal Access Token (PAT) scopes for both ADO and GitHub, typos in organization or API URLs in the `config.env` file, repository name conflicts, and potential timeouts when migrating exceptionally large repositories due to network or size constraints.