🚀 Executive Summary

TL;DR: Senior DevOps Engineer Darian Vance built a Python automation script that uses the ChatGPT API to generate concise summaries of Pull Request diffs. It cut the time he spends catching up on PRs each morning from an hour to about 10 minutes, because he no longer has to piece together context from commit messages before starting the high-level review.

🎯 Key Takeaways

  • Automating PR summaries requires Python 3, a GitHub Personal Access Token (PAT) with ‘repo’ scope, and an OpenAI API key.
  • Securely store API keys and repository details using `python-dotenv` and a `config.env` file to prevent hardcoding credentials.
  • Fetch the raw PR diff from the GitHub API by setting the `Accept: application/vnd.github.v3.diff` header, which returns a unified diff instead of the default JSON representation.
  • Craft effective AI prompts by assigning a persona (e.g., Senior Software Engineer) and structuring the desired output with sections like ‘High-Level Summary’ and ‘Potential Risks’.
  • Utilize the `openai.chat.completions.create` method with models like `gpt-4-turbo-preview` or `gpt-3.5-turbo` and a moderate `temperature` (e.g., 0.5) to keep summaries focused and consistent.
  • Integrate the summary posting directly into GitHub PRs using the GitHub Issues API to add comments, enhancing team collaboration and feedback loops.
  • Implement robust error handling: retry with backoff on API rate limits, check for `None` return values, and make sure your GitHub PAT has the correct scopes to avoid common 403/404 API errors.

Automate Pull Request Summaries using ChatGPT API

Hey there, Darian Vance here. As a Senior DevOps Engineer at TechResolve, my calendar is a battlefield. Between managing infrastructure, optimizing CI/CD pipelines, and mentoring the team, context-switching is my biggest enemy. I used to spend the first hour of my day just catching up on overnight Pull Requests, trying to piece together the “why” from a dozen commit messages. I realized I was wasting close to five hours a week on this. That’s when I built this little automation. This script now gives me a clear, concise summary of every PR, so I can get straight to the high-level review. It brought my morning PR routine down from an hour to about 10 minutes. Let’s build it.

Prerequisites

Before we dive in, make sure you have the following ready:

  • Python 3 installed on your machine.
  • A GitHub account and a Personal Access Token (PAT) with ‘repo’ scope.
  • An OpenAI account and an API Key.
  • A basic understanding of how REST APIs work.

The Step-by-Step Guide

Step 1: Setting Up Your Project

First, you’ll want to create a new directory for our project. I’ll skip the standard virtualenv setup commands since you likely have your own workflow for that; the key is to have an isolated environment. Once that’s ready, install a few Python libraries with pip: requests (for making HTTP calls to the GitHub API), python-dotenv (for managing our secret keys), and openai (the official client library). Use version 1.0 or later of openai, because the code in Step 5 relies on its `chat.completions` interface.
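
Since the Step 5 code depends on the 1.x `openai` package, it can be worth confirming what is actually installed before going further. Here is a minimal sketch using only the standard library’s `importlib.metadata`; the helper names (`installed_versions`, `openai_is_v1`) are hypothetical and not part of the original script.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (python-dotenv is imported as `dotenv`).
REQUIRED_PACKAGES = ["requests", "python-dotenv", "openai"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def openai_is_v1(versions):
    """Return True if the installed openai package is 1.x or newer."""
    installed = versions.get("openai")
    if installed is None:
        return False
    return int(installed.split(".")[0]) >= 1


if __name__ == "__main__":
    versions = installed_versions(REQUIRED_PACKAGES)
    missing = [name for name, v in versions.items() if v is None]
    if missing:
        print("Missing packages:", ", ".join(missing))
    elif not openai_is_v1(versions):
        print("The openai package is older than 1.0; please upgrade it.")
    else:
        print("Dependencies look good:", versions)
```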

Step 2: Securely Storing Your Keys

Never, ever hardcode your API keys in your script. It’s a massive security risk. Instead, we’ll store them in a `config.env` file and keep that file out of version control (for example, by listing it in `.gitignore`). Create a file named `config.env` in your project root and add your keys like this:

# config.env
GITHUB_TOKEN="your_github_personal_access_token_here"
OPENAI_API_KEY="your_openai_api_key_here"
GITHUB_REPO="owner/repository_name" # e.g., "TechResolve/ProjectPhoenix"
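
As a complement to the `all([...])` check in `main()` (Step 7), here is a minimal sketch of a stricter validator that names exactly which variable is missing and catches a malformed `GITHUB_REPO`. The helper `load_settings` and its regex are illustrative assumptions; the pattern only approximates GitHub’s naming rules and is not an official specification.

```python
import os
import re

# Variable names match the config.env file above.
REQUIRED_VARS = ("GITHUB_TOKEN", "OPENAI_API_KEY", "GITHUB_REPO")
# Rough approximation of "owner/repository_name"; not GitHub's official naming rules.
REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def load_settings(env=os.environ):
    """Return the required settings, raising a descriptive error if any are missing or malformed."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    repo = env["GITHUB_REPO"]
    if not REPO_PATTERN.fullmatch(repo):
        raise RuntimeError(f"GITHUB_REPO should look like 'owner/repository_name', got {repo!r}")
    return {name: env[name] for name in REQUIRED_VARS}
```

You would call it right after `load_dotenv('config.env')`, for example `settings = load_settings()`, so a typo in `config.env` fails loudly before any API call is made.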

Step 3: Fetching the PR Diff from GitHub

The first piece of the puzzle is getting the raw changes from the pull request. The GitHub REST API can return any PR as a unified diff when you request the `application/vnd.github.v3.diff` media type, which is perfect for our use case: a plain-text representation of every code change in the PR.

Let’s write a Python function to grab it. The imports below cover the whole script (including `load_dotenv`, which `main()` uses later to load our environment variables); the function itself just makes an authenticated API call.

# pr_summarizer.py
import os
import requests
from dotenv import load_dotenv

def get_pr_diff(repo, pr_number, token):
    """Fetches the diff of a specific pull request from the GitHub API."""
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    headers = {
        'Authorization': f'token {token}',
        'Accept': 'application/vnd.github.v3.diff'
    }
    try:
        response = requests.get(url, headers=headers, timeout=30)  # fail fast instead of hanging on a stalled connection
        response.raise_for_status()  # This will raise an exception for HTTP errors
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching PR diff: {e}")
        return None

# Example usage will be in our main function later

Here, we construct the API URL, add our PAT to the headers for authentication, and specify we want the `diff` format. Simple and effective.

Step 4: Crafting the Perfect AI Prompt

This is where the magic happens. The quality of your summary depends heavily on the quality of your prompt. A vague prompt like “summarize this” will give you a vague summary. We need to be specific.

I’ve found that giving the AI a persona and a structured set of instructions works best.

def create_summary_prompt(diff):
    """Creates a detailed prompt for the OpenAI API to summarize a PR diff."""
    
    prompt_template = """
    As a Senior Software Engineer, please provide a summary of the following pull request diff. 
    Analyze the changes and structure your response in the following format:

    **1. High-Level Summary:** 
    Provide a concise, one-paragraph overview of the purpose and key changes in this PR.

    **2. Key Changes by File:**
    List the most significant files that were changed and briefly describe what was altered in each.

    **3. Potential Risks & Areas for Review:**
    Based on the changes, identify any potential risks, logical flaws, or areas that require close attention from the reviewer. If there are no obvious risks, state "No major risks identified."

    Here is the diff:
    ---
    {diff_content}
    ---
    """
    return prompt_template.format(diff_content=diff)

Pro Tip: Be mindful of the model’s token limit. A massive PR with thousands of lines of changes might generate a diff that’s too large for the API context window. In my production setups, I add logic to truncate the `diff` content if it exceeds a certain character count (e.g., 15,000 characters), adding a note like “…[diff truncated]…” to the prompt so the AI is aware.
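
Here is one way to implement that truncation. It is a minimal sketch: the 15,000-character default mirrors the figure above, and cutting at the last complete line (so the model never sees half a line) is an assumption made here, not necessarily how the production setup does it. `truncate_diff` is a hypothetical name.

```python
TRUNCATION_NOTE = "\n…[diff truncated]…\n"


def truncate_diff(diff, max_chars=15000):
    """Trim a diff to roughly max_chars characters, cutting at the last complete line.

    The returned string can exceed max_chars by the length of TRUNCATION_NOTE.
    """
    if len(diff) <= max_chars:
        return diff
    cut = diff.rfind("\n", 0, max_chars)
    if cut == -1:
        cut = max_chars  # no line break within the limit: fall back to a hard cut
    return diff[:cut] + TRUNCATION_NOTE
```

In `main()`, you would wrap the diff before building the prompt: `prompt = create_summary_prompt(truncate_diff(diff))`. Characters are only a rough proxy for tokens, so adjust the limit to the model you use.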

Step 5: Generating the Summary with ChatGPT

Now we’ll send the prompt, which already contains the PR diff, to the OpenAI API. The `openai` library makes this very straightforward.

# Add this to pr_summarizer.py
import openai

def get_ai_summary(prompt, api_key):
    """Sends the prompt to the OpenAI API and returns the summary."""
    openai.api_key = api_key
    try:
        response = openai.chat.completions.create(
            model="gpt-4-turbo-preview",  # Or "gpt-3.5-turbo" for a faster, cheaper option
            messages=[
                {"role": "system", "content": "You are a helpful assistant for code review."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.5,
            max_tokens=1024
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error generating AI summary: {e}")
        return None

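I’m using `gpt-4-turbo-preview` here because it’s great at understanding code context, but `gpt-3.5-turbo` is a solid, cost-effective alternative. The `temperature` is set to 0.5 to keep the output relatively focused and consistent. Lower values (toward 0) make responses more repeatable, although they are still not guaranteed to be identical, and no temperature setting guarantees factual accuracy, so the summary remains a starting point for the human review.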
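
If you would rather not set the module-level `openai.api_key`, the `openai` package’s documentation recommends creating a client instance (`openai.OpenAI(api_key=...)`) and calling `client.chat.completions.create` on it. Below is a sketch of the same request with the client passed in as an argument, which also makes the function easy to test with a fake object. `get_ai_summary_with_client` is a hypothetical name, and the defaults simply mirror the values used above.

```python
def get_ai_summary_with_client(client, prompt, model="gpt-4-turbo-preview",
                               temperature=0.5, max_tokens=1024):
    """Send the prompt through an injected client and return the summary text.

    In production, pass openai.OpenAI(api_key=...); in tests, any object that
    exposes chat.completions.create(...) with the same response shape works.
    """
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant for code review."},
                {"role": "user", "content": prompt},
            ],
            temperature=temperature,
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content
    except Exception as e:  # keep the script's return-None-on-failure convention
        print(f"Error generating AI summary: {e}")
        return None
```

In the real script that would look like `from openai import OpenAI` followed by `get_ai_summary_with_client(OpenAI(api_key=openai_api_key), prompt)`.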

Step 6: Posting the Summary as a GitHub Comment

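Getting the summary is great, but the real value comes from sharing it with the team. Let’s write one more function to post the summary as a comment directly on the pull request. GitHub treats every pull request as an issue, so the Issues API’s comments endpoint works for PRs too.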

# Add this to pr_summarizer.py
def post_github_comment(repo, pr_number, token, comment_body):
    """Posts a comment on a GitHub pull request."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    headers = {
        'Authorization': f'token {token}',
        'Accept': 'application/vnd.github.v3+json'
    }
    data = {'body': comment_body}
    try:
        response = requests.post(url, headers=headers, json=data, timeout=30)  # fail fast instead of hanging
        response.raise_for_status()
        print("Successfully posted comment to PR.")
        return True
    except requests.exceptions.RequestException as e:
        print(f"Error posting GitHub comment: {e}")
        return False
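
If the script runs every time a PR is updated, `post_github_comment` will add a fresh comment on each run. One common pattern, sketched here as an assumption rather than part of the original script, is to tag the comment with a hidden HTML marker, look for it among the PR’s existing comments (`GET /repos/{owner}/{repo}/issues/{pr_number}/comments`), and, if found, edit that comment with `PATCH /repos/{owner}/{repo}/issues/comments/{comment_id}` instead of posting a new one. The marker text and helper names are hypothetical; the sketch covers only the pure logic, not the HTTP calls.

```python
# Hidden HTML comment: GitHub does not render it, but the script can search for it later.
SUMMARY_MARKER = "<!-- pr-summarizer -->"


def build_comment_body(summary):
    """Prefix the AI summary with the marker and a short heading."""
    return f"{SUMMARY_MARKER}\n### AI-Generated PR Summary\n\n{summary}"


def find_summary_comment_id(comments):
    """Return the id of the first comment that carries the marker, or None.

    `comments` is the parsed JSON list returned by
    GET /repos/{owner}/{repo}/issues/{pr_number}/comments.
    """
    for comment in comments:
        if SUMMARY_MARKER in (comment.get("body") or ""):
            return comment["id"]
    return None
```

The comments endpoint is paginated, so on busy PRs you may need to fetch more than the first page before concluding there is no existing summary.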

Step 7: Putting It All Together

Finally, let’s create a `main` function to orchestrate the whole process. This will be the entry point of our script.

# Add this to the end of pr_summarizer.py
def main():
    load_dotenv('config.env')
    
    github_token = os.getenv("GITHUB_TOKEN")
    openai_api_key = os.getenv("OPENAI_API_KEY")
    github_repo = os.getenv("GITHUB_REPO")
    
    # You would typically get this from a CI/CD environment variable or a command-line argument
    pr_number_to_summarize = 101 

    if not all([github_token, openai_api_key, github_repo]):
        print("One or more environment variables are not set. Check your config.env file.")
        return

    print(f"Fetching diff for PR #{pr_number_to_summarize} in repo {github_repo}...")
    diff = get_pr_diff(github_repo, pr_number_to_summarize, github_token)

    if diff:
        print("Generating AI summary...")
        prompt = create_summary_prompt(diff)
        summary = get_ai_summary(prompt, openai_api_key)
        
        if summary:
            print("--- PR SUMMARY ---")
            print(summary)
            print("------------------")
            
            # Post the summary back to GitHub
            post_github_comment(github_repo, pr_number_to_summarize, github_token, summary)
        else:
            print("Failed to generate summary.")
    else:
        print("Failed to fetch PR diff.")

if __name__ == "__main__":
    main()

Pro Tip: The best way to run this is within your CI/CD pipeline. For instance, you can set up a GitHub Action that triggers whenever a pull request is opened or updated. The action would run this script, using the event payload to get the correct PR number automatically. This is far more efficient than running it on a fixed schedule.
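
Here is a minimal sketch of how the script could pick up the PR number automatically instead of the hard-coded `101`. It checks a command-line argument first, then the webhook payload that GitHub Actions writes to the file named by the `GITHUB_EVENT_PATH` environment variable; for `pull_request` events, that payload includes `pull_request.number`. The function name `resolve_pr_number` is hypothetical.

```python
import json
import os
import sys


def resolve_pr_number(argv=None, env=None):
    """Return the PR number from the first CLI argument, else from the GitHub Actions event payload.

    Returns None if neither source provides one.
    """
    argv = sys.argv[1:] if argv is None else argv
    env = os.environ if env is None else env
    if argv:
        return int(argv[0])
    event_path = env.get("GITHUB_EVENT_PATH")
    if event_path:
        with open(event_path, encoding="utf-8") as f:
            event = json.load(f)
        pull_request = event.get("pull_request") or {}
        if "number" in pull_request:
            return int(pull_request["number"])
    return None
```

In `main()`, `pr_number_to_summarize = resolve_pr_number()` would replace the hard-coded value, followed by a check for `None`. In the workflow itself, triggering on the `pull_request` event’s `opened` and `synchronize` activity types covers both new PRs and new pushes to them.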

Common Pitfalls (Where I Usually Mess Up)

I’ve stumbled a few times setting this up. Here are the things to watch out for:

  • Incorrect GitHub PAT Scopes: Your Personal Access Token absolutely needs the `repo` scope. Without it, you’ll get 404 Not Found or 403 Forbidden errors when trying to read the diff or post a comment.
  • API Rate Limiting: If you’re running this across a huge organization with hundreds of PRs, you might hit API rate limits for either GitHub (which responds with a 403 or 429 when you exceed them) or OpenAI (which responds with a 429). Add retry logic with exponential backoff if this becomes an issue.
  • Forgetting to Handle `None`: My functions return `None` on failure. It’s tempting to just chain the function calls together, but if `get_pr_diff` fails, `create_summary_prompt` will happily format the literal string “None” into the prompt. You’ll pay for an API call that summarizes nothing and may even post a meaningless comment on the PR. Always check the return values.
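
For the rate-limiting pitfall, here is a minimal retry sketch with exponential backoff and random jitter. It follows this script’s convention that helpers return `None` on failure, so it fits `get_pr_diff` and `get_ai_summary` (but not `post_github_comment`, which returns `False`). The helper name and default delays are assumptions. Because `None` does not say why a call failed, this will also retry non-transient errors such as a 404; checking status codes would require the helpers to expose them.

```python
import random
import time


def call_with_backoff(func, *args, max_attempts=4, base_delay=2.0, sleep=time.sleep, **kwargs):
    """Call func until it returns something other than None, backing off exponentially.

    Between attempts it waits base_delay * 2**attempt seconds plus up to 1 second
    of random jitter. Returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        result = func(*args, **kwargs)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return None
```

For example: `diff = call_with_backoff(get_pr_diff, github_repo, pr_number_to_summarize, github_token)`. The injectable `sleep` parameter exists mainly so the helper can be tested without real waiting.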

Conclusion

And there you have it. A complete, automated workflow that takes a raw pull request and turns it into a structured, human-readable summary. This little script has become an indispensable part of my daily routine at TechResolve. It saves me time, helps me prioritize my reviews, and gives junior developers instant, high-level feedback. Feel free to adapt the prompt, experiment with different models, and integrate it into whatever workflow serves you best. Happy automating!

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ How do I automate Pull Request summaries using the ChatGPT API?

To automate PR summaries, you need to fetch the PR diff from GitHub using its API, craft a detailed prompt with the diff content, send it to the OpenAI API (e.g., `gpt-4-turbo-preview`), and then post the generated summary as a comment back on the GitHub PR using the GitHub Issues API.

❓ How does this automated PR summary solution compare to traditional manual reviews?

This automation significantly reduces the time spent on initial PR context-switching and deciphering commit messages, cutting review time from an hour to minutes. It provides a structured, high-level overview, allowing reviewers to focus on deeper technical aspects rather than basic comprehension, thus improving efficiency and consistency.

❓ What are common pitfalls when implementing this GitHub PR summary automation?

Common pitfalls include using incorrect GitHub Personal Access Token scopes (the ‘repo’ scope is required for read/write access), hitting API rate limits for either GitHub or OpenAI, and failing to handle `None` return values from the helper functions, which can send a meaningless prompt to OpenAI or post a bogus comment. Always check return values and implement backoff logic for rate limits.
