🚀 Executive Summary
TL;DR: Unattached Azure disks and lingering Public IPs are a common source of unexpected cloud bills, as Azure intentionally decouples their lifecycle from deleted VMs. To combat this, organizations can employ manual CLI cleanups, implement automated governance policies with Azure Policy and Automation Runbooks, or adopt the highly effective strategy of treating entire Resource Groups as ephemeral units of deployment.
🎯 Key Takeaways
- Azure’s default VM deletion behavior retains associated disks, network interfaces, and Public IPs, which can lead to significant unmanaged costs if not explicitly cleaned up.
- Proactive cost governance can be established using Azure Policy for mandatory resource tagging and stale resource auditing, complemented by Azure Automation Runbooks for scheduled cleanup of orphaned resources.
- The ‘nuclear’ option of treating non-production Resource Groups as ephemeral units of deployment, where the entire group is deleted upon project completion, is the most foolproof method to prevent lingering resources and enforce Infrastructure as Code practices.
Unattached Azure disks and lingering Public IPs are a common source of surprise cloud bills. Learn three battle-tested methods, from manual CLI cleanups to automated governance policies, to eliminate these costly “ghost” resources for good.
My Biggest Azure Cost Headache? The Ghosts in the Machine.
I still remember the Monday morning stand-up where our cloud bill report flashed red. We’d somehow blown an extra $3,000 over the weekend on… nothing. No new deployments, no major traffic spikes. It was a ghost charge. After an hour of frantic digging, we found the culprit. A junior engineer, let’s call him Alex, had diligently torn down a hefty UAT environment on Friday afternoon. He deleted all the VMs for the `project-phoenix-uat` deployment. What he *didn’t* delete were the two dozen P30 Premium SSDs and static Public IPs that were attached to them. They just sat there, unattached and invisible to a casual glance, racking up charges every single hour. Alex did his job, but the platform’s “safety features” left us with a financial landmine. That day, I made it my mission to exorcise these ghosts for good.
So, Why Does This Even Happen?
First, let’s get one thing straight: this isn’t a bug, it’s a feature. When you delete a Virtual Machine in Azure, the platform intentionally does not delete the OS disk, data disks, or the network interface (and its associated Public IP) by default. The logic is simple: what if you just wanted to replace the “compute” part of the VM but keep the data? What if that disk contained critical state you needed to re-attach to a new instance, `prod-db-02`?
By decoupling the lifecycle of the VM from its dependencies, Azure gives you flexibility and a safety net against accidental data loss. The problem is, in most dev/test scenarios, when the VM is gone, you want everything else gone too. This “safety net” becomes a cost headache, silently draining your budget with resources you’ve long forgotten about.
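If you create VMs from the CLI, you can opt out of that default at creation time. Here is a minimal sketch, assuming a throwaway dev/test VM; the function name, the resource names, and the `Ubuntu2204` image alias are placeholders I picked for illustration. Note that the three flags in this sketch cover the OS disk, data disks, and NIC only, so the Public IP still needs its own check.

```shell
# Sketch: create a VM whose OS disk, data disks, and NIC are deleted with it.
# Function name, resource names, and image alias are illustrative placeholders.
create_disposable_vm() {
  local rg="$1"
  local name="$2"
  local image="${3:-Ubuntu2204}"

  az vm create \
    --resource-group "$rg" \
    --name "$name" \
    --image "$image" \
    --generate-ssh-keys \
    --os-disk-delete-option Delete \
    --data-disk-delete-option Delete \
    --nic-delete-option Delete
}

# Example call (not run automatically):
# create_disposable_vm rg-login-feature-dev vm-login-dev-01
```

I would only use this for environments where nothing on the disks needs to survive the VM.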
The Exorcism: Three Ways to Banish Ghost Resources
Over the years, my team and I have developed a few key strategies to handle this. Depending on your team’s maturity and urgency, you can pick the one that fits best.
1. The Quick Fix: The Manual Hunt & Destroy
This is the reactive, down-and-dirty approach you use when the bill is already high. It’s about finding the immediate offenders and stopping the bleeding. The Azure Portal has gotten better at this, but I still trust the command line for a definitive list.
For disks, you can find them in the ‘Disks’ service view and filter by ‘Disk state’ for ‘Unattached’. For a more scriptable approach, the Azure CLI is your best friend.
# Login to Azure first: az login
# Find all unattached managed disks in your subscription
az disk list --query '[?managedBy==`null`].[name, resourceGroup, diskSizeGB, sku.name]' -o tsv
This command will spit out a clean, tab-separated list of every managed disk that isn’t attached to any VM, running or stopped. You can then use this list to investigate each disk and delete the confirmed orphans with `az disk delete`. A similar approach works for Public IPs that are no longer associated with a NIC.
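For the Public IP side, a rough equivalent might look like the sketch below. It assumes that an orphaned IP shows up with both `ipConfiguration` and `natGateway` set to null in the CLI output; the function name is my own, and you should spot-check the results against the Portal before trusting them.

```shell
# Sketch: list Public IPs that are not bound to any IP configuration
# (NIC, load balancer, etc.) and not used by a NAT gateway.
list_unassociated_public_ips() {
  az network public-ip list \
    --query '[?ipConfiguration==`null` && natGateway==`null`].[name, resourceGroup, sku.name, ipAddress]' \
    -o tsv
}

# Example call (not run automatically):
# list_unassociated_public_ips
```

Confirmed orphans can then be removed one at a time with `az network public-ip delete --resource-group <rg> --name <name>`.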
Warning: The Manual Hunt is effective but risky. Before you run `delete`, double-check that the disk isn’t temporarily detached for maintenance. A disk named `prod-sql-data-vol1-2024-snap` is probably not something you want to blindly erase. When in doubt, ask.
2. The Permanent Fix: Automation and Governance with Azure Policy
After the ‘Project Phoenix’ incident, we moved to a proactive model. You can’t rely on people remembering to clean up; you have to build a system that either prevents the mess or cleans it up automatically.
Our primary tool here is Azure Policy. We implemented two key policies:
- Mandatory Tagging: We created a policy that denies the creation of any VM, Disk, or Public IP without a `costCenter` and `owner` tag. No exceptions. This immediately assigns accountability.
- Stale Resource Audits: We use a more advanced policy that audits for resources with a `creationDate` tag older than 90 days in non-production environments. This doesn’t delete them, but it flags them on our compliance dashboard for review.
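To give a feel for the mandatory-tagging policy, here is a minimal sketch of a custom definition created from the CLI. The definition name, display name, and function names are placeholders, and the rule is my approximation of the idea rather than our exact production policy.

```shell
# Sketch only: names below are illustrative placeholders.
write_tag_policy_rule() {
  # $1: path where the policy rule JSON is written
  cat > "$1" <<'JSON'
{
  "if": {
    "allOf": [
      {
        "field": "type",
        "in": [
          "Microsoft.Compute/virtualMachines",
          "Microsoft.Compute/disks",
          "Microsoft.Network/publicIPAddresses"
        ]
      },
      {
        "anyOf": [
          { "field": "tags['costCenter']", "exists": "false" },
          { "field": "tags['owner']", "exists": "false" }
        ]
      }
    ]
  },
  "then": { "effect": "deny" }
}
JSON
}

create_tag_policy() {
  local rule_file
  rule_file="$(mktemp)"
  write_tag_policy_rule "$rule_file"
  # Registers the definition; it has no effect until it is assigned to a scope.
  az policy definition create \
    --name require-cost-tags \
    --display-name "Require costCenter and owner tags" \
    --mode Indexed \
    --rules "$rule_file"
  rm -f "$rule_file"
}
```

The definition still has to be assigned to a subscription or management group with `az policy assignment create`. It is probably worth running it with an `audit` effect first to see what it would have blocked.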
To handle the actual cleanup, we have an Azure Automation Runbook that runs on a schedule. It queries for resources in dev/test subscriptions that are unattached and have a certain tag pattern (e.g., `temp-project-*`), and then safely deletes them.
# A sample KQL query you might use in Azure Resource Graph to find these orphans
Resources
| where type =~ 'microsoft.compute/disks'
| where properties.diskState =~ 'Unattached'
| project name, resourceGroup, location, tags
This is the real “DevOps” solution. It takes time to set up, but it scales and removes human error from the equation.
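Automation Runbooks are normally written in PowerShell or Python, but the cleanup logic is easy to sketch with the CLI. The version below assumes the `resource-graph` CLI extension is installed and that temporary resources carry a hypothetical `project` tag starting with `temp-project-`. The function name and the dry-run default are my own choices, not a copy of our runbook.

```shell
# KQL: unattached disks whose (hypothetical) 'project' tag starts with 'temp-project-'.
ORPHAN_DISK_QUERY="Resources
| where type =~ 'microsoft.compute/disks'
| where properties.diskState =~ 'Unattached'
| where tostring(tags['project']) startswith 'temp-project-'
| project id"

# Usage: cleanup_orphan_disks            -> dry run, only prints candidates
#        cleanup_orphan_disks delete     -> actually deletes them
cleanup_orphan_disks() {
  local mode="${1:-dry-run}"

  az graph query -q "$ORPHAN_DISK_QUERY" --first 200 --query "data[].id" -o tsv |
  while IFS= read -r disk_id; do
    [ -n "$disk_id" ] || continue
    if [ "$mode" = "delete" ]; then
      # stdin is redirected so the delete call cannot consume the ID list
      az disk delete --ids "$disk_id" --yes --no-wait < /dev/null
    else
      echo "would delete: $disk_id"
    fi
  done
}
```

Defaulting to a dry run means a scheduling mistake prints a list instead of deleting disks, which goes some way toward the “bad logic” risk that comes with automation.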
3. The ‘Nuclear’ Option: Treat Resource Groups as Ephemeral
This is less of a technical fix and more of a philosophical one, and frankly, my favorite. For anything that isn’t a long-lived production or shared services environment, the Resource Group is the unit of deployment.
When a developer starts work on a new feature, `feature-new-login-flow`, they get a whole new resource group: `rg-login-feature-dev`. They deploy their VMs, databases, and network components inside it. When the feature is merged and tested, we don’t delete individual resources.
We delete the entire resource group.
# The simplest, most effective cleanup command in Azure.
az group delete --name rg-login-feature-dev --yes --no-wait
This approach is foolproof. There are no orphans. No stray disks, no forgotten NICs, no lingering IPs. Everything lives and dies together. It forces good IaC (Infrastructure as Code) practices because you need to be able to redeploy the environment from scratch at a moment’s notice. It’s the ultimate cleanup strategy.
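One way to keep this safe is to mark ephemeral groups at creation and refuse to delete anything without the marker. The sketch below assumes a hypothetical `ephemeral=true` tag and reuses the `owner` tag from the tagging policy; the function names are illustrative.

```shell
# Usage: create_ephemeral_rg <name> <location> <owner>
create_ephemeral_rg() {
  az group create --name "$1" --location "$2" --tags ephemeral=true owner="$3"
}

# Usage: delete_ephemeral_rg <name>
# Deletes the group only if it carries the ephemeral=true tag.
delete_ephemeral_rg() {
  local rg="$1"
  local marker
  marker="$(az group show --name "$rg" --query "tags.ephemeral" -o tsv)"

  if [ "$marker" != "true" ]; then
    echo "refusing to delete $rg: not tagged ephemeral=true" >&2
    return 1
  fi
  az group delete --name "$rg" --yes --no-wait
}

# Example calls (not run automatically):
# create_ephemeral_rg rg-login-feature-dev westeurope alex
# delete_ephemeral_rg rg-login-feature-dev
```

The tag check is a cheap guard against pointing the delete command at a production or shared-services group by accident.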
Comparing The Methods
| Method | Effort to Implement | Risk Level | Effectiveness |
|---|---|---|---|
| 1. Manual Hunt | Low | High (Accidental Deletion) | Low (Reactive) |
| 2. Automation/Policy | High | Medium (Bad Logic) | High (Proactive) |
| 3. ‘Nuclear’ Option (RG) | Medium (Requires IaC) | Low (If scoped correctly) | Very High |
Ultimately, that surprise $3,000 bill was a cheap lesson. It forced us to stop playing whack-a-mole with costs and start building a mature, automated cloud governance model. Don’t wait for your own “Monday morning surprise”—go check for some ghosts in your subscription right now.
🤖 Frequently Asked Questions
❓ How can I identify unattached Azure disks and Public IPs that are incurring costs?
You can identify unattached managed disks using the Azure CLI command `` az disk list --query '[?managedBy==`null`].[name, resourceGroup, diskSizeGB, sku.name]' -o tsv ``. A similar approach applies to Public IPs that are no longer associated with a Network Interface Card (NIC).
❓ How do manual cleanup, automated policies, and ephemeral resource groups compare for Azure cost management?
Manual cleanup is reactive: it takes little effort but carries a high risk of accidental deletion. Automated policies (Azure Policy, Automation Runbooks) are proactive: they take a lot of effort to implement but are highly effective. Treating Resource Groups as ephemeral is a medium-effort strategy that requires IaC; it offers low risk and very high effectiveness because all resources within a group are deleted together.
❓ What is a common implementation pitfall when cleaning up Azure ‘ghost’ resources?
A common pitfall, especially with manual cleanup, is accidentally deleting disks or IPs that are temporarily detached for maintenance or intended for re-attachment. Always verify the resource’s purpose and state before deletion to prevent unintended data loss or service disruption.