🚀 Executive Summary

TL;DR: Silent goroutine leaks in Golang applications lead to memory degradation and require time-consuming manual `pprof` analysis. This guide automates detection by exposing the `go_goroutines` metric via `promhttp`, then monitoring it with Prometheus and Grafana for proactive identification.

🎯 Key Takeaways

  • Instrumenting a Go application for Prometheus involves using `github.com/prometheus/client_golang/prometheus/promhttp` to expose a `/metrics` endpoint, ideally on a separate HTTP server.
  • The `go_goroutines` metric comes from the Go runtime collector that the client library registers on its default registry, which `promhttp.Handler()` serves. The `_ "net/http/pprof"` import is optional: it only adds the `/debug/pprof/` profiling handlers to `http.DefaultServeMux`.
  • Running the metrics server on a separate port (e.g., 9091) is a best practice for isolation, security, and distinct firewall rules from the main application.
  • Prometheus is configured to scrape the Go application’s metrics endpoint by adding a `job_name` and `targets` to `prometheus.yml`, followed by a configuration reload.
  • Goroutine leak visualization in Grafana uses the PromQL query `go_goroutines{job="my-go-app"}` on a time series graph to identify steady climbs over time.
  • Using `deriv(go_goroutines{job="my-go-app"}[5m])` in Grafana gives a more immediate indicator of a leak by showing the per-second rate of change. `deriv()` fits here because `go_goroutines` is a gauge, whereas `rate()` is meant for counters.
  • Common pitfalls include assuming the `_ "net/http/pprof"` import is what produces `go_goroutines` (a custom registry without the Go collector is the more likely cause of a missing metric), firewall/network issues blocking Prometheus, and mismatched `job` labels in Grafana queries.

Track Golang Goroutine Leaks using Prometheus

Hey team, Darian here. I want to talk about something that used to be a real thorn in my side: silent goroutine leaks. In the early days, I’d find our services slowly degrading, memory climbing for no obvious reason. I’d spend hours SSH’ing into boxes, pulling pprof dumps, and manually comparing snapshots. It was a huge time sink. Then I realized we could automate this entire process with a tool we were already using: Prometheus. Setting this up turned a multi-hour diagnostic task into a 30-second glance at a Grafana dashboard. Let’s get this running for you.

Prerequisites

Before we dive in, make sure you have the following ready:

  • A running Golang application that you can modify.
  • A Prometheus server that is already installed and running.
  • Access to a Grafana instance (optional, but this is where the magic happens).
  • A basic understanding of Go’s net/http package.

The Step-by-Step Guide

Step 1: Instrument Your Go Application

First things first, we need our Go application to expose its internal metrics, including the current goroutine count. The official Prometheus client library for Go makes this incredibly simple. Fetch it with `go get github.com/prometheus/client_golang`; the primary package we’ll use is `github.com/prometheus/client_golang/prometheus/promhttp`.

The key is to expose an HTTP endpoint, typically /metrics, that Prometheus can periodically scrape.

Here’s a minimal example of how to add this to your application’s main.go. We’ll create a separate HTTP server for the metrics endpoint, which is a best practice.


package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // Optional: adds /debug/pprof/ handlers to http.DefaultServeMux. Not needed for go_goroutines.

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Your main application logic would go here.
	// For this example, we'll just simulate it being up.
	log.Println("Main application running...")

	// --- Metrics Server Setup ---
	// Use a dedicated ServeMux so this port serves only /metrics. The pprof handlers
	// are registered on http.DefaultServeMux, which this server does not use.
	metricsMux := http.NewServeMux()
	metricsMux.Handle("/metrics", promhttp.Handler())

	// Start the metrics server on a different port.
	go func() {
		log.Println("Metrics server starting on port 9091")
		if err := http.ListenAndServe(":9091", metricsMux); err != nil {
			log.Fatalf("Failed to start metrics server: %v", err)
		}
	}()

	// Keep the main application alive
	select {}
}

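Why this works: The promhttp.Handler() function creates an HTTP handler that serves every metric in the client library’s default registry. That registry ships with a Go runtime collector already registered, and this collector is what produces `go_goroutines` (the value of `runtime.NumGoroutine()`). The blank `net/http/pprof` import is not what makes the metric appear; it only registers the `/debug/pprof/` profiling handlers on `http.DefaultServeMux`, which is handy when you want to pull a goroutine dump after the graph points to a leak. `go_goroutines` is the metric we’re after.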
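For intuition about what the metric measures, here is a minimal sketch of a leak using only the standard library. `go_goroutines` reports `runtime.NumGoroutine()`, so the sketch reads that value directly. The names `leakyFetch` and `leakedAfter` are made up for illustration, and the pattern shown (a send on an unbuffered channel that nobody receives from) is just one common way to leak.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakyFetch starts a worker that sends its result on an unbuffered channel.
// The caller never receives, so the worker blocks on the send forever.
func leakyFetch() {
	results := make(chan int) // unbuffered: a send needs a waiting receiver
	go func() {
		results <- 42 // blocks forever; this goroutine is leaked
	}()
	// The caller returns (for example after a timeout) without reading results.
}

// leakedAfter reports how much runtime.NumGoroutine grows after n leaky calls.
func leakedAfter(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		leakyFetch()
	}
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines leaked:", leakedAfter(100))
}
```

Each call leaves one goroutine behind, which is exactly the kind of slow, steady growth the dashboard is meant to surface. Giving the channel a buffer of one, or selecting on a cancellation signal, would let the worker exit.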

Pro Tip: In my production setups, I always run the metrics server on a separate port from the main application server (like port 9091 in the example). This isolates it from application traffic, prevents accidental exposure of debugging endpoints to the public, and allows for different firewall rules.
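To sanity-check that isolation, here is a small sketch using `net/http/httptest`. It assumes a stub handler in place of `promhttp.Handler()` so that it runs with the standard library alone; `newMetricsMux`, `stubMetrics`, and `statusFor` are hypothetical helper names. It shows that the pprof handlers attach to `http.DefaultServeMux` and are not reachable through a dedicated mux.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/ on http.DefaultServeMux only
)

// newMetricsMux builds a dedicated mux that serves only /metrics.
// In the real application, metricsHandler would be promhttp.Handler().
func newMetricsMux(metricsHandler http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", metricsHandler)
	return mux
}

// stubMetrics is a hypothetical stand-in for promhttp.Handler().
func stubMetrics() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "go_goroutines 7")
	})
}

// statusFor returns the HTTP status that handler h gives for a GET on path.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMetricsMux(stubMetrics())
	fmt.Println("metrics mux /metrics:", statusFor(mux, "/metrics"))
	fmt.Println("metrics mux /debug/pprof/:", statusFor(mux, "/debug/pprof/"))
	fmt.Println("default mux /debug/pprof/:", statusFor(http.DefaultServeMux, "/debug/pprof/"))
}
```

If you do want profiling endpoints, serving `http.DefaultServeMux` on a loopback-only address is one option; that choice is outside what this guide sets up.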

Step 2: Configure Prometheus to Scrape Your App

Now that our application is exposing metrics, we need to tell Prometheus where to find them. This is done by adding a new job to your prometheus.yml configuration file.

Open your Prometheus config and add the following block under the scrape_configs section:


# prometheus.yml
scrape_configs:
  # ... other jobs you might have ...

  - job_name: 'my-go-app'
    scrape_interval: 15s
    static_configs:
      - targets: ['<YOUR_APP_IP_OR_HOSTNAME>:9091']

After adding this, reload your Prometheus configuration, for example by sending the Prometheus process a SIGHUP or, if it was started with `--web.enable-lifecycle`, by sending an HTTP POST to `/-/reload`. Once reloaded, Prometheus will start hitting the /metrics endpoint on your application every 15 seconds and storing the data. You can verify it’s working by going to your Prometheus UI, navigating to the “Targets” page, and checking that the state of your ‘my-go-app’ job is ‘UP’.
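If the target shows as UP but you want to confirm the metric itself is present, a quick check of the endpoint’s response body can help. The sketch below assumes the standard Prometheus text format, where the sample appears as a line like `go_goroutines 12`; `parseGoroutines` is a hypothetical helper, and the sample body in `main` is illustrative rather than real output from your service.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseGoroutines scans a /metrics response body in the Prometheus text
// format and returns the value of the unlabelled go_goroutines sample.
// The boolean is false when the metric is missing or cannot be parsed.
func parseGoroutines(body string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "go_goroutines" {
			v, err := strconv.ParseFloat(fields[1], 64)
			return v, err == nil
		}
	}
	return 0, false
}

func main() {
	// Illustrative body; in practice this would be the response from
	// http://<YOUR_APP_IP_OR_HOSTNAME>:9091/metrics.
	body := "# HELP go_goroutines Number of goroutines that currently exist.\n" +
		"# TYPE go_goroutines gauge\n" +
		"go_goroutines 12\n"

	if v, ok := parseGoroutines(body); ok {
		fmt.Println("go_goroutines =", v)
	} else {
		fmt.Println("go_goroutines not found")
	}
}
```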

Step 3: Visualize in Grafana (The Payoff)

Staring at raw numbers in Prometheus isn’t very intuitive. Let’s build a simple graph in Grafana to visualize the goroutine count over time.

  1. Log in to Grafana and create a new dashboard or add a panel to an existing one.
  2. Choose your Prometheus data source.
  3. In the query editor, enter the following PromQL query:
    go_goroutines{job="my-go-app"}
  4. Go to the “Visualization” settings and choose a “Time series” graph.
  5. Save the panel.

That’s it! You now have a graph that tracks the number of running goroutines in your application. A healthy application will show a relatively stable line. If you see it steadily climbing over hours or days without ever coming down, you’ve almost certainly got a goroutine leak.

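Pro Tip: Don’t just track the total number. Use a query like deriv(go_goroutines{job="my-go-app"}[5m]) to see the rate of change. Because go_goroutines is a gauge, deriv() (or delta()) is the right tool here; rate() is designed for counters and will give misleading results on a value that can go down. A slope that stays positive over time, even a small one, is a red flag for a leak that needs investigation.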
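To make the rate-of-change idea concrete: PromQL’s `deriv()` estimates the per-second slope of a gauge using simple linear regression over the window. The sketch below does the same least-squares fit by hand on made-up scrape data. The `sample` type, `slopePerSecond`, and the values are all assumptions for illustration, not Prometheus internals.

```go
package main

import "fmt"

// sample is one scraped value of go_goroutines at a given time.
type sample struct {
	t float64 // scrape time in seconds
	v float64 // goroutine count at that time
}

// slopePerSecond fits a least-squares line through the samples and returns
// its slope, i.e. the average change in goroutines per second.
// It returns 0 when there are fewer than two distinct timestamps.
func slopePerSecond(samples []sample) float64 {
	n := float64(len(samples))
	var sumT, sumV, sumTV, sumTT float64
	for _, s := range samples {
		sumT += s.t
		sumV += s.v
		sumTV += s.t * s.v
		sumTT += s.t * s.t
	}
	den := n*sumTT - sumT*sumT
	if den == 0 {
		return 0
	}
	return (n*sumTV - sumT*sumV) / den
}

func main() {
	// Hypothetical scrapes, 15 seconds apart, gaining 3 goroutines each time.
	leaking := []sample{{0, 100}, {15, 103}, {30, 106}, {45, 109}}
	// Hypothetical healthy service: the count wobbles but has no trend.
	steady := []sample{{0, 102}, {15, 98}, {30, 98}, {45, 102}}

	fmt.Printf("leaking: %.3f goroutines/s\n", slopePerSecond(leaking))
	fmt.Printf("steady:  %.3f goroutines/s\n", slopePerSecond(steady))
}
```

A regression-based slope is less sensitive to a single noisy scrape than comparing only the first and last points, which is why it suits a gauge that naturally fluctuates.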

Here’s Where I Usually Mess Up (Common Pitfalls)

  • Blaming the `pprof` import: A common misconception (one I held myself) is that `_ "net/http/pprof"` is what makes `go_goroutines` appear. It isn’t. `promhttp.Handler()` serves the client library’s default registry, which already includes the Go runtime collector. If `go_goroutines` is missing, the usual cause is a custom registry: when you use `prometheus.NewRegistry()` with `promhttp.HandlerFor()`, you have to register `collectors.NewGoCollector()` yourself.
  • Firewall/Network Issues: I’ve spent more time than I’d like to admit debugging why Prometheus can’t connect, only to find a firewall rule on the host or a network security group blocking the metrics port. Always check connectivity from your Prometheus server to your application’s metrics endpoint first. A simple curl from the Prometheus box can save you a lot of time.
  • Mismatched Job Labels: When you write your Grafana query, make sure the `job` label (`job="my-go-app"`) exactly matches the `job_name` you defined in your `prometheus.yml`. It’s a simple typo that can leave you with an empty graph.

Conclusion

And that’s really all there is to it. You’ve now turned a manual, reactive debugging process into an automated, proactive monitoring system. This simple setup will save you from performance degradation and late-night head-scratching. By adding this one graph to your dashboard, you gain immediate insight into the runtime health of your Go services. Keep an eye on that graph, set up an alert for when it exceeds a reasonable threshold, and you’ll catch leaks before they ever become production incidents. Happy monitoring!

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ How do I track goroutine leaks in Go applications?

Track goroutine leaks by instrumenting your Go application to expose the `go_goroutines` metric via `promhttp.Handler()`, which serves the Go runtime collector from the client library’s default registry. Configure Prometheus to scrape this `/metrics` endpoint, and visualize the `go_goroutines` count over time in Grafana to detect steady climbs.

❓ How does this Prometheus-based monitoring compare to manual pprof analysis for goroutine leaks?

This Prometheus-based approach automates goroutine leak detection, turning a multi-hour manual `pprof` snapshot comparison into a quick glance at a Grafana dashboard. It provides proactive, continuous monitoring, unlike reactive, on-demand manual `pprof` analysis.

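❓ What’s a common implementation pitfall when setting up goroutine leak monitoring with Prometheus?

A common pitfall is serving a custom registry that does not include the Go runtime collector. `promhttp.Handler()` uses the default registry, where `go_goroutines` is registered automatically; with `promhttp.HandlerFor()` and a registry created by `prometheus.NewRegistry()`, you need to register `collectors.NewGoCollector()` yourself. The `_ "net/http/pprof"` import does not affect this metric; it only adds the `/debug/pprof/` handlers to `http.DefaultServeMux`.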
