🚀 Executive Summary

TL;DR: Implementing Infrastructure as Code (IaC) in brownfield Azure environments is challenging due to “state drift,” where manual changes cause discrepancies with the IaC state file, leading to destructive deployments. The article presents three strategies: manual import, reverse-engineering with tools like aztfy followed by refactoring, or a complete blue/green rebuild to establish a pristine, IaC-managed environment.

🎯 Key Takeaways

  • “State drift” is the primary challenge in brownfield IaC adoption, where manual changes in the cloud environment diverge from the IaC state file, leading to unexpected and destructive deployments.
  • Tools like Azure Terrafy (aztfy) can reverse-engineer existing Azure resources into IaC code, providing a baseline that still requires significant refactoring to be maintainable and adhere to best practices.
  • Preventing future state drift is crucial, achieved by locking down IaC-managed resource groups using Azure Policy, RBAC, and deny assignments to ensure the IaC pipeline is the sole source of truth for changes.

Bringing IaC to brownfield Azure is hell... or am I doing it wrong?

Tired of Terraform drift and import errors when managing existing Azure infrastructure? Learn three real-world strategies, from quick fixes to a full rebuild, to tame your brownfield environment and finally get your IaC under control.

I still remember the 2 AM call. A “simple” Terraform apply to add a new app setting to our production App Service, `app-svc-prod-api-01`, had somehow triggered a full replacement of the associated VNet integration. The entire API went dark. The root cause? Someone, months before I joined, had manually changed the subnet delegation in the Azure portal to fix a “temporary” issue. My Terraform state file, the supposed source of truth, was a lie. It was at that moment, staring at a sea of red in the pipeline logs, that I realized bringing Infrastructure as Code (IaC) to a “brownfield” environment isn’t just a technical challenge; it’s an archaeological dig through years of undocumented manual clicks.

So, Why Is This So Painful? The Sin of “State Drift”

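Let’s get one thing straight: the problem isn’t usually the tool. Terraform and Pulumi both rely on a core concept: a state file. This file is the tool’s map of your world. It records what resources it thinks it manages and their expected configuration. Bicep is the exception: it keeps no separate state file and deploys against whatever Azure Resource Manager currently holds, but code that doesn’t match manually changed resources causes the same kind of surprises.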

In a “greenfield” project, this is beautiful. The state file and the actual cloud environment are born together and live in perfect harmony. But in a brownfield environment (an existing setup built manually over time) you have no state file. Your “state” is scattered across the Azure portal, sticky notes on monitors, and the memory of an engineer who left the company six months ago. When you try to apply a new IaC configuration, the tool sees a massive discrepancy between its empty map and the real world. With Terraform, the usual first result is a plan that tries to create resources that already exist, which fails with an “already exists” error until you import them. Once a resource is in state, the danger shifts: the tool will try to “correct” reality to match a half-written configuration, and that can mean destructive changes such as a destroy-and-recreate. Every manual change made after that point widens the gap again. This is state drift, and it’s the monster under the bed for any brownfield IaC adoption.

Taming the Beast: Three Strategies from the Trenches

I’ve seen teams bang their heads against this wall for months. The good news is, you can get through it. Here are the three main paths we take at TechResolve, ranging from a quick patch-up to a full-scale rebuild.

1. The Quick Fix: “Import and Pray”

This is the most common starting point. You write the IaC code to match the existing resource, and then you use your tool’s import function to tell the state file, “Hey, that `prod-db-01` SQL server? It already exists. Please adopt it, don’t create it.”

For Terraform, it looks something like this. First, you write the resource block in your `.tf` file:


resource "azurerm_resource_group" "rg_core" {
  name     = "rg-prod-we-core"
  location = "West Europe"
}

Then, you run the import command, feeding it the address in your code and the Azure Resource ID:


terraform import azurerm_resource_group.rg_core /subscriptions/your-sub-id/resourceGroups/rg-prod-we-core
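
If you are on Terraform 1.5 or later, the same adoption can be declared in configuration instead of typed as a one-off command. A minimal sketch, assuming the azurerm provider is already configured and reusing the placeholder subscription ID from the command above. The resource block is repeated only so the fragment reads on its own:

```hcl
# Declarative import (Terraform 1.5+): adopt the existing resource group
# on the next plan/apply instead of running `terraform import` by hand.
import {
  to = azurerm_resource_group.rg_core
  id = "/subscriptions/your-sub-id/resourceGroups/rg-prod-we-core"
}

# The resource block the import maps onto. Its values must match what exists in Azure.
resource "azurerm_resource_group" "rg_core" {
  name     = "rg-prod-we-core"
  location = "West Europe"
}
```

Because the import is part of the plan, it shows up in review before anything touches state. The `import` block can be deleted once the apply has succeeded.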

The Reality: This is a tedious, resource-by-resource process. You will miss things. You will get resource IDs wrong. After importing, your first `terraform plan` will likely show a hundred changes because your code doesn’t perfectly match the manually configured resource (e.g., tags, minor settings, differences in letter case). It’s a necessary, but painful, first step to establish a baseline. It’s a hack, but it’s a start.
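
One way to shrink that first noisy plan is to copy the real values into code and, as a temporary measure, tell Terraform to leave a noisy attribute alone. This is a sketch only: the tag keys and values below are hypothetical, and this block would replace the earlier `rg_core` definition rather than sit beside it.

```hcl
resource "azurerm_resource_group" "rg_core" {
  name     = "rg-prod-we-core"
  location = "West Europe"

  # Hypothetical tags, copied from what the portal shows for the live resource.
  tags = {
    environment = "prod"
    owner       = "platform-team"
  }

  lifecycle {
    # Temporary: stop plans from reverting tags that people still edit by hand.
    # Remove this once tagging is owned by the pipeline.
    ignore_changes = [tags]
  }
}
```

Treat `ignore_changes` as scaffolding. Every attribute listed there is one the pipeline no longer controls, so the list should get shorter over time, not longer.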

Pro Tip: Do this for one resource group at a time. Trying to import an entire subscription at once is a recipe for insanity. Start with something non-critical, like a development environment, to get the hang of the workflow.

2. The Permanent Fix: “Reverse-Engineer and Refactor”

Instead of manually writing code and importing, you use a tool to do the heavy lifting. This is the more robust and sane approach for complex environments. The idea is to generate the IaC code *from* your existing Azure resources.

For Terraform, the go-to tool is Azure Terrafy (aztfy), which Microsoft has since renamed Azure Export for Terraform (aztfexport). You point it at a resource group, and it generates the `.tf` files and imports the resources into state for you. It’s a lifesaver.


# Example: exporting a resource group (aztfexport is the current name for aztfy)
aztfexport resource-group rg-prod-we-core

The Reality: This isn’t a one-shot magic bullet. The code it generates is… verbose. It will hardcode everything and won’t use variables, loops, or modules. The real work begins *after* the generation.

  1. Generate: Run the tool against a small, contained set of resources.
  2. Review: Commit the generated code to a new branch immediately. This is your “source of truth” baseline.
  3. Refactor: This is critical. Start pulling out hardcoded names into variables. Group related resources into modules. Remove settings that can be left as default. Your goal is to turn machine-generated configuration into human-maintainable code.
  4. Verify: Run a `plan`. It should show no changes. If it does, your refactoring broke something. Fix it. Repeat until the plan is clean.
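
To make the Refactor and Verify steps concrete, here is a minimal sketch. It assumes the generator emitted the resource group under a machine-made label such as `res-0`; that label, the variable names, and the defaults are illustrative, not taken from a real run. The `moved` block (Terraform 1.1+) tells Terraform the renamed address is the same object, so the plan stays clean instead of proposing a destroy and create.

```hcl
# Hardcoded values pulled out into variables (names are hypothetical).
variable "location" {
  type    = string
  default = "West Europe"
}

variable "core_rg_name" {
  type    = string
  default = "rg-prod-we-core"
}

# The generated block, renamed to something a human would choose.
resource "azurerm_resource_group" "rg_core" {
  name     = var.core_rg_name
  location = var.location
}

# Records the rename so state follows the resource to its new address.
moved {
  from = azurerm_resource_group.res-0
  to   = azurerm_resource_group.rg_core
}
```

Once `terraform plan` reports no changes and the rename has been applied against every copy of the state, the `moved` block can be dropped.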

This approach takes discipline but results in a clean, maintainable, and accurate IaC codebase that truly reflects your environment.

3. The ‘Nuclear’ Option: “The Rebuild”

Sometimes, the existing environment is such a tangled mess of undocumented dependencies, inconsistent naming conventions, and manual security exceptions that trying to import it is more work than it’s worth. The technical debt is simply too high. In these cases, we propose the “Rebuild,” modeled on a Blue/Green deployment.

The concept is simple: you leave the old, “blue” environment running. In parallel, you build a brand new, pristine, 100% IaC-managed “green” environment. You write the code the *right* way from the start. Once the new environment is ready, you methodically migrate applications and data, switch the traffic over, and finally, ceremoniously decommission the old environment. It’s the most expensive and time-consuming option, but it’s also the only one that guarantees a perfectly clean slate.

| Pros | Cons |
| --- | --- |
| Clean slate with zero technical debt. | Highest cost (running two environments). |
| Enforce best practices from day one. | Requires a complex data migration strategy. |
| Zero-downtime cutover is possible. | High engineering effort and planning. |
| Forces you to fully understand and document your application dependencies. | Can be politically difficult to get buy-in for. |

Warning: Don’t even think about this option without a solid plan for data migration. For stateful applications like `prod-db-01`, this is the hardest part. How will you sync the data and cut over without significant downtime or data loss?

Final Thoughts: Lock It Down

Bringing IaC to a brownfield environment is a journey, not a sprint. It’s about methodically wrestling control back from the chaos of manual changes. Whichever path you choose, remember the final, critical step: preventing new drift. Once a resource group is under IaC control, lock it down. Use Azure Policy to enforce tagging, restrict what services can be deployed, and most importantly, use RBAC and deny assignments to make the Azure Portal read-only for most users. Your IaC pipeline should be the *only* thing that can make changes. Otherwise, you’ll be right back here in six months, wondering why another 2 AM “simple” deployment took down the site.
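
As a starting point for that lock-down, the sketch below grants a group read-only access and adds a delete lock on the resource group. It assumes the azurerm provider is configured, and `engineers_group_object_id` is a hypothetical variable holding the object ID of your engineers’ directory group. Deny assignments are left out because Azure does not let you author them directly; they are created on your behalf by features such as deployment stacks.

```hcl
variable "engineers_group_object_id" {
  type        = string
  description = "Object ID of the group that should only be able to read (hypothetical)."
}

# Look up the resource group that is already under IaC control.
data "azurerm_resource_group" "rg_core" {
  name = "rg-prod-we-core"
}

# Read-only access for humans at resource group scope.
resource "azurerm_role_assignment" "engineers_read_only" {
  scope                = data.azurerm_resource_group.rg_core.id
  role_definition_name = "Reader"
  principal_id         = var.engineers_group_object_id
}

# Guard against accidental deletion. Locks apply to every caller, the pipeline included.
resource "azurerm_management_lock" "rg_core_no_delete" {
  name       = "lock-rg-core-no-delete"
  scope      = data.azurerm_resource_group.rg_core.id
  lock_level = "CanNotDelete"
  notes      = "Managed by Terraform. Change through the IaC pipeline."
}
```

A Reader assignment only has teeth if broader roles such as Contributor or Owner are removed from those users at higher scopes. Because the lock also binds the pipeline, an intentional teardown has to remove the lock first.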

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What is “state drift” in the context of brownfield IaC, and why is it problematic?

“State drift” occurs when the actual configuration of cloud resources in a brownfield environment deviates from what the IaC tool’s state file believes it manages, often due to manual changes. This discrepancy can lead to IaC tools attempting destructive “corrections” or failing deployments.

❓ How do the “Import and Pray” and “Reverse-Engineer and Refactor” strategies compare for brownfield IaC adoption?

“Import and Pray” is a tedious, resource-by-resource manual process for adopting existing resources into IaC, often resulting in immediate drift. “Reverse-Engineer and Refactor” uses tools like aztfy to automate code generation and import, providing a more comprehensive baseline that still requires significant refactoring for maintainability, but is generally more robust for complex environments.

❓ What is a common implementation pitfall when adopting IaC in brownfield Azure, and how can it be avoided?

A common pitfall is failing to prevent new manual changes after IaC adoption, leading to recurring state drift. This can be avoided by locking down IaC-managed resource groups using Azure Policy, RBAC, and deny assignments to ensure only the IaC pipeline can make modifications.
