🚀 Executive Summary
TL;DR: Prometheus’s `azure_sd_configs` fails to discover Azure App Services because it’s designed for IaaS Virtual Machines, not PaaS Web Apps. The recommended solution involves implementing file-based service discovery, where a script queries Azure APIs for tagged Web Apps and generates a Prometheus-readable JSON target file.
🎯 Key Takeaways
- `azure_sd_configs` is specifically designed to discover IaaS resources like Virtual Machines, not PaaS offerings such as Azure App Services.
- Azure App Services lack dedicated NICs or static private IPs like VMs, instead being exposed via FQDNs or private endpoints, which `azure_sd_configs` does not query.
- File-based service discovery, where a script uses the Azure CLI to find tagged Web Apps and writes a JSON target file for `file_sd_configs`, is the ‘right’ and scalable solution for most use cases.
Struggling to make Prometheus discover your Azure App Services using `azure_sd_configs`? This guide cuts through the confusion, explaining why it fails and providing three real-world, battle-tested solutions to get your metrics flowing.
Prometheus Can’t See My Azure Web App: A Field Guide to Fixing `azure_sd_configs`
I still remember the 2 AM alert. It was a Tuesday. It’s always a Tuesday. PagerDuty was screaming that our primary e-commerce checkout service, `prod-checkout-webapp`, had vanished from our dashboards. Grafana was a sea of “No Data” panels. My first thought was that the app was down, but a quick check showed it was serving traffic just fine. The problem? Prometheus, our all-seeing eye, had gone blind to it. A junior engineer had just migrated the service from a VM to an Azure App Service, assuming our standard Prometheus `azure_sd_configs` would pick it up automatically. It didn’t. That night, I learned a valuable, sleep-deprived lesson about the assumptions we make with cloud tooling.
The “Why”: A Tale of Two Azure APIs
This is one of those problems that feels like it should be simple. You point a discovery tool at Azure, and it should find… well, Azure things. Right? The root of the problem is that `azure_sd_configs` is designed to discover IaaS resources, specifically Virtual Machines.
Under the hood, it queries Azure APIs that list VMs and their associated network interfaces to find IP addresses. Azure App Services, being a PaaS (Platform-as-a-Service) offering, don’t live in that world. They don’t have a dedicated NIC or a static private IP in the same way a VM does. They exist in a managed “App Service Plan” and are exposed via a public FQDN or through private endpoints, which are different resource types altogether. Simply put, `azure_sd_configs` is knocking on the front door asking for the VM list, while your Web App lives in a penthouse apartment with a separate entrance.
Pro Tip: Never assume a cloud provider’s discovery mechanism for one service type (like VMs) will magically work for another (like PaaS Web Apps). Always check the documentation for what resource types are explicitly supported.
The Fixes: From Duct Tape to a New Engine
So, how do we get Prometheus to see our Web App? We have a few options, ranging from a quick fix to get you through the night to a proper, scalable solution. I’ve used all three in different situations.
1. The Quick Fix: Static Configs to the Rescue
This is the duct tape solution. It’s fast, it’s ugly, but it will stop the bleeding at 2 AM. You explicitly tell Prometheus where to find the target by adding its address to a `static_configs` block in your `prometheus.yml`.
Let’s say your web app is `https://prod-checkout-webapp.azurewebsites.net` and it exposes a `/metrics` endpoint. You’d add this job under `scrape_configs`:
- job_name: 'azure-webapps-static'
  metrics_path: /metrics
  scheme: https
  static_configs:
    - targets: ['prod-checkout-webapp.azurewebsites.net']
      labels:
        app: 'checkout-service'
        env: 'production'
Warning: This is a brittle solution! If the URL changes, or if you spin up new environments, you have to manually edit this file. Use this to restore monitoring immediately, but plan to replace it with Solution 2 as soon as you’ve had some coffee.
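Before trusting the static target, it can help to confirm from the Prometheus host that the endpoint really serves metrics. Here is a minimal sketch, assuming the `/metrics` path is reachable over HTTPS without authentication; `check_metrics_endpoint` is a made-up helper name, not part of any tool.

```shell
#!/bin/bash
# Sketch: check that a target serves something that looks like Prometheus metrics.
# check_metrics_endpoint is a hypothetical helper name, not part of any tool.

check_metrics_endpoint() {
  local host="$1" body

  # -f fails on HTTP errors, -sS hides progress but keeps error messages,
  # --max-time stops a hung request from blocking forever.
  body=$(curl -fsS --max-time 10 "https://${host}/metrics") || {
    echo "FAIL: could not fetch https://${host}/metrics" >&2
    return 1
  }

  # Exposition-format output normally contains "# HELP" or "# TYPE" comment lines.
  if printf '%s\n' "$body" | grep -qE '^# (HELP|TYPE) '; then
    echo "OK: ${host} exposes Prometheus metrics"
  else
    echo "FAIL: ${host} responded, but not with Prometheus metrics" >&2
    return 1
  fi
}

# Usage:
# check_metrics_endpoint "prod-checkout-webapp.azurewebsites.net"
```

If this fails from the Prometheus host but works from your laptop, the problem is likely network access restrictions on the App Service rather than the scrape config.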
2. The ‘Right’ Way: File-Based Service Discovery
This is my preferred method for 90% of use cases. It combines the flexibility of dynamic discovery with the simplicity of a standard Prometheus feature: `file_sd_configs`. The idea is to have a separate process that queries the Azure API for your Web Apps and writes the targets to a JSON file that Prometheus reads.
Step 1: Tag your resources.
In Azure, add a tag to the App Services you want to monitor, for example `prometheus-scrape: "true"`.
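Tagging can be done in the portal, but it is easy to script as well. The sketch below assumes you are already logged in with the Azure CLI; the function name and the resource group in the usage line are placeholders, not names from a real environment. It uses `az tag update` with `--operation Merge` so that existing tags on the app are left alone.

```shell
#!/bin/bash
# Sketch: add the scrape tag to one App Service without touching its other tags.
# tag_webapp_for_scrape is a hypothetical helper name.

tag_webapp_for_scrape() {
  local resource_group="$1" app_name="$2" app_id

  # Look up the full resource ID of the web app.
  app_id=$(az webapp show \
    --resource-group "$resource_group" \
    --name "$app_name" \
    --query id --output tsv) || return 1

  # Merge keeps existing tags and adds (or updates) prometheus-scrape.
  az tag update \
    --resource-id "$app_id" \
    --operation Merge \
    --tags prometheus-scrape=true
}

# Usage (the resource group name is a placeholder):
# tag_webapp_for_scrape "rg-checkout-prod" "prod-checkout-webapp"
```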
Step 2: Create a discovery script.
This can be a simple shell script using the Azure CLI, running on a cron job every 5-10 minutes. This script finds all App Services with your tag and formats them into a JSON file.
Here’s a bare-bones `generate-webapp-targets.sh` script:
#!/bin/bash
# Generate a file_sd target file for Azure Web Apps tagged prometheus-scrape=true.
# Requires the Azure CLI (already logged in) and jq.
set -euo pipefail
TARGET_FILE="/etc/prometheus/targets/azure_webapps.json"
# Create the temp file next to the target so the final mv is an atomic rename.
TEMP_FILE=$(mktemp "${TARGET_FILE}.XXXXXX")
trap 'rm -f "$TEMP_FILE"' EXIT
# Query Azure for all web apps with the 'prometheus-scrape' tag
# and format the output as a JSON array of objects for file_sd.
az webapp list --output json --query "[?tags.\"prometheus-scrape\"=='true'].{host:defaultHostName, name:name, rg:resourceGroup}" | \
  jq '[.[] | .host as $target | { "targets": [$target], "labels": { "instance": .name, "job": "azure-webapp", "resource_group": .rg } }]' > "$TEMP_FILE"
# mktemp creates the file with mode 0600; let the Prometheus user read it.
chmod 644 "$TEMP_FILE"
# Atomically replace the old file with the new one
mv "$TEMP_FILE" "$TARGET_FILE"
Step 3: Configure Prometheus.
Now, just point a Prometheus job at that generated file.
- job_name: 'azure-webapps-file-sd'
  metrics_path: /metrics
  scheme: https
  file_sd_configs:
    - files:
        - '/etc/prometheus/targets/azure_webapps.json'
      refresh_interval: 2m
This is a robust solution. It automatically adds and removes targets as App Services are created or deleted, as long as you’re consistent with your tagging.
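Prometheus picks up changes to the target file on its own, but adding the new job to `prometheus.yml` needs a config reload. Here is a minimal sketch of a safe reload, assuming the config lives at `/etc/prometheus/prometheus.yml`, Prometheus listens on `localhost:9090`, and it was started with `--web.enable-lifecycle`. The function name and both defaults are my assumptions.

```shell
#!/bin/bash
# Sketch: validate prometheus.yml, then ask a running Prometheus to reload it.
# reload_prometheus is a hypothetical helper; the default path and URL are assumptions.

reload_prometheus() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  local url="${2:-http://localhost:9090}"

  # promtool ships with Prometheus; a non-zero exit means the config is invalid,
  # in which case we stop before touching the running server.
  promtool check config "$config" || return 1

  # The reload endpoint only works if Prometheus was started with
  # --web.enable-lifecycle; otherwise send SIGHUP to the process instead.
  curl -fsS -X POST "${url}/-/reload"
}

# Usage:
# reload_prometheus
```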
3. The ‘Enterprise’ Option: Custom SD Adapters
If you’re running a massive environment, probably on Kubernetes, and using the Prometheus Operator, you might eventually outgrow the simple file-based approach. At this scale, you might consider a more integrated solution.
This involves building or using a “service discovery adapter”: a small application that runs alongside Prometheus and acts as a bridge. It talks to the Azure API on one side and serves targets to Prometheus on the other, for example over an HTTP endpoint that Prometheus polls through `http_sd_configs`, so no intermediate files are needed.
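| Pros | Cons |
|---|---|
| Targets are provided dynamically, with no intermediate files | High complexity and ongoing maintenance overhead |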
Frankly, I’ve only seen this implemented once or twice. It’s the “nuclear option” for when you have dedicated observability engineers and a scale that justifies the maintenance overhead. For everyone else, Solution 2 is the sweet spot.
🤖 Frequently Asked Questions
❓ Why doesn’t `azure_sd_configs` find my Azure Web Apps?
`azure_sd_configs` is built to discover IaaS Virtual Machines by querying Azure APIs for network interfaces. Azure Web Apps are PaaS services that do not have dedicated NICs or static private IPs in the same way, making them undiscoverable by this method.
❓ How does file-based service discovery compare to static configs or custom SD adapters for Azure Web Apps?
File-based service discovery offers dynamic target management and scalability without the brittleness of manual static configs or the high complexity and maintenance overhead of custom SD adapters, positioning it as the optimal balance for most environments.
❓ What is a common implementation pitfall when trying to monitor Azure Web Apps with Prometheus?
A common pitfall is assuming that a discovery mechanism for one cloud service type (e.g., `azure_sd_configs` for VMs) will automatically work for another (e.g., PaaS Web Apps). The solution is to always verify supported resource types and use specific discovery methods like file-based service discovery for App Services.