Executive Summary
TL;DR: Messy Terraform repositories often lead to production accidents due to state file chaos and environment bleed, rather than tool choice. This guide presents three battle-tested strategies (Terraform Workspaces, directory-based environment separation, and Terragrunt) to structure Infrastructure as Code for improved isolation, reduced blast radius, and enhanced reusability.
Key Takeaways
- Flat Terraform repo structures cause state file chaos (monolithic, fragile) and environment bleed, making it easy to apply changes to the wrong environment and increasing blast radius.
- Terraform Workspaces provide a quick fix for state separation within a single directory but are a band-aid, not a long-term solution, as they share module versions and provider configurations.
- Directory-based environment separation is a scalable standard that physically isolates state files and configurations by environment, forcing reusable module creation and providing clear CI/CD boundaries.
- Terragrunt is an enterprise option for managing complex, multi-account infrastructures by providing DRY configurations, remote state management, and dependency handling through a hierarchical structure.
- Avoid premature optimization; adopt advanced tools like Terragrunt only when experiencing significant scaling pains (e.g., extensive code duplication across 10+ components), as they introduce additional abstraction and a learning curve.
Tired of messy Terraform repos causing production accidents? This guide cuts through the noise and explores three practical, battle-tested strategies for structuring your Infrastructure as Code, from quick fixes to long-term scalable solutions.
Terraform vs. Tofu is a Distraction. Let’s Fix Your Awful Repo Structure.
I’ll never forget the 2 AM page. A junior engineer, bless his heart, had just applied a staging database schema change… to the production RDS instance, `prod-db-cluster-01`. The root cause wasn’t a typo or a lack of skill; it was our own damn fault. Our entire AWS infrastructure was defined in a single, flat Terraform directory. The only thing separating a safe staging deployment from a catastrophic production event was a `cd ../prod` and a prayer. That’s when I knew we had a structural problem, not a people problem. The endless debates about Terraform vs. OpenTofu are interesting, but they won’t save you from a poorly organized repository that invites disaster.
The “Why”: State Files and Blast Radius
So why does a simple, flat repo structure fall apart so spectacularly? It comes down to two things: state file management and blast radius.
- State File Chaos: In a flat structure, you’re either juggling state files with `-state=` flags (a nightmare) or you have one giant state file for everything. This monolithic state file becomes slow, fragile, and a huge single point of failure.
- Environment Bleed: When your `dev`, `staging`, and `prod` configurations live side-by-side, it’s terrifyingly easy to apply a change to the wrong environment. Your blast radius is the entire company infrastructure.
The goal is to structure your code to isolate environments, minimize the potential damage of any single change, and make your code reusable. Let’s walk through three ways to do that, from a quick fix to a full-blown enterprise setup.
Solution 1: The Quick Fix – The Monorepo with Workspaces
If you’re in a small team or just starting out, this is the fastest way to get some separation. Terraform Workspaces let you keep multiple, separately named state files for the same directory of code, managed by Terraform itself. Think of them as separate contexts for the same set of `.tf` files.
How it Works:
You keep all your code in one directory, but switch between environments using the CLI.
```shell
# In your main infra directory
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

# To work on dev
terraform workspace select dev
terraform apply -var-file="dev.tfvars"

# To work on prod
terraform workspace select prod
terraform apply -var-file="prod.tfvars"
```
You can then use the `terraform.workspace` value in your code to change resource names or instance sizes based on the environment.
```hcl
resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = terraform.workspace == "prod" ? "t3.medium" : "t3.micro"

  tags = {
    Name        = "web-server-${terraform.workspace}"
    Environment = terraform.workspace
  }
}
```
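Once there are more than two sizes to choose from, chained ternaries get hard to read. One alternative is a lookup map keyed by workspace name. This is only a sketch: the `instance_type_by_env` local and the fallback size are illustrative assumptions, and it presumes your workspace names match the environment names used above.

```hcl
locals {
  # Hypothetical map from workspace name to instance size.
  instance_type_by_env = {
    dev     = "t3.micro"
    staging = "t3.micro"
    prod    = "t3.medium"
  }

  # Fall back to the smallest size for any workspace that is not listed,
  # including the built-in "default" workspace.
  instance_type = lookup(local.instance_type_by_env, terraform.workspace, "t3.micro")
}

# In the resource, the ternary would then become:
#   instance_type = local.instance_type
```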
Darian’s Take: This is a band-aid, not a long-term solution. It’s great for getting started, but you’re still sharing the same module versions and provider configurations across all environments. It reduces the risk of state file crossover but doesn’t eliminate the “I thought I was in staging” human error.
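If you stay on workspaces for a while, you can at least make that human error fail loudly. The sketch below assumes each `.tfvars` file sets a hypothetical `environment` variable (for example `environment = "prod"` in `prod.tfvars`) and that you are on a Terraform version that ships the built-in `terraform_data` resource (1.4 or later). The names `environment` and `workspace_guard` are mine, not part of any standard.

```hcl
# Assumed variable: each env's tfvars file sets this explicitly.
variable "environment" {
  type        = string
  description = "Environment this run is intended for."

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

# A resource that manages nothing real; it exists only to carry the check.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      # Fails the plan when the selected workspace and the var-file disagree.
      condition     = terraform.workspace == var.environment
      error_message = "Workspace '${terraform.workspace}' does not match environment '${var.environment}'. Select the right workspace or var-file."
    }
  }
}
```

With this in place, running `terraform apply -var-file="prod.tfvars"` while the `staging` workspace is selected should stop at plan time instead of touching anything. It does not help if someone picks both the wrong workspace and the wrong var-file, so treat it as a seatbelt rather than a fix.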
Solution 2: The Scalable Standard – Directory-Based Environment Separation
This is my preferred approach for most projects. It’s the point where you get serious about Infrastructure as Code. The principle is simple: isolate environments by directory. This physically separates the state files and configurations, making it much harder to make a mistake.
A Common Structure:
Here’s a structure I’ve used successfully on multiple projects. It separates reusable modules from the environment-specific configurations that use them.
```
infra-repo/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   └── variables.tf
│   └── ec2_instance/
│       ├── main.tf
│       └── variables.tf
│
└── envs/
    ├── dev/
    │   ├── main.tf            # Calls modules for dev
    │   ├── terraform.tfvars
    │   └── backend.tf         # S3 backend config for dev state
    │
    ├── staging/
    │   ├── main.tf            # Calls modules for staging
    │   ├── terraform.tfvars
    │   └── backend.tf         # S3 backend config for staging state
    │
    └── prod/
        ├── main.tf            # Calls modules for prod
        ├── terraform.tfvars
        └── backend.tf         # S3 backend config for prod state
```
In this model, to apply to production, a developer must explicitly `cd envs/prod`. The `backend.tf` in each directory points to a completely different S3 bucket for the state file (e.g., `my-company-tfstate-prod`). As long as every environment keeps its own backend configuration, one environment's state cannot collide with another's.
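To make that concrete, here is a rough sketch of what the two key files in `envs/prod/` might contain. Only the bucket name comes from the example above; the state `key`, the region, the `vpc_cidr` variable, and the `cidr_block` module input are assumptions for illustration, since the real inputs depend on what your `modules/vpc/variables.tf` declares.

```hcl
# envs/prod/backend.tf
terraform {
  backend "s3" {
    bucket  = "my-company-tfstate-prod"
    key     = "infra/terraform.tfstate" # assumed path inside the bucket
    region  = "us-east-1"               # assumed region
    encrypt = true
  }
}

# envs/prod/main.tf
variable "vpc_cidr" {
  type        = string
  description = "CIDR range for the prod VPC (set in terraform.tfvars)."
}

module "vpc" {
  # Relative path from envs/prod/ to the shared module.
  source = "../../modules/vpc"

  # Hypothetical input; must match a variable the module actually declares.
  cidr_block = var.vpc_cidr
}
```

Terraform does not allow input variables inside a `backend` block, which is one reason each environment carries its own small `backend.tf` rather than sharing a parameterized one.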
| Pro | Con |
| --- | --- |
| Total isolation of state files and failure domains. | Can lead to some boilerplate code duplication between environments. |
| Forces creation of reusable, generic modules. | Requires more discipline to keep environment configs in sync. |
| Clear and unambiguous for CI/CD pipelines. | Can be verbose if you have many micro-services or components. |
Solution 3: The “Enterprise” Option – Terragrunt
Okay, so what happens when you have dozens of AWS accounts, hundreds of micro-services, and complex dependencies between components? That’s when the directory separation model can become cumbersome. You find yourself copying and pasting the same `backend.tf` and `provider.tf` blocks everywhere. This is the problem Terragrunt was built to solve.
Terragrunt is a thin wrapper for Terraform that provides extra tools for keeping your configurations DRY (Don’t Repeat Yourself), managing remote state, and handling dependencies.
How it Works:
You create a hierarchical structure and define your backend and provider configurations once in a root `terragrunt.hcl` file. Each component then just inherits that configuration.
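As a minimal sketch of such a root file, assuming an S3 backend and the AWS provider: the bucket name and region are placeholders I made up, and your real root config will likely carry more (locking, provider version pins, account-specific settings).

```hcl
# Root terragrunt.hcl (sketch)

# Generates a backend.tf in every component that includes this file.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "my-company-tfstate" # hypothetical bucket name
    region  = "us-east-1"          # assumed region
    encrypt = true

    # Each component gets its own state key based on its folder path,
    # e.g. prod/us-east-1/mysql/terraform.tfstate
    key = "${path_relative_to_include()}/terraform.tfstate"
  }
}

# Generates a provider.tf in every component so the block is written once.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```

The `path_relative_to_include()` call is what keeps state isolated per component without anyone hand-writing a key for each one.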
A Terragrunt structure might look like this:
```
live-infra/
├── terragrunt.hcl             # Root config with remote state, provider versions
├── prod/
│   └── us-east-1/
│       ├── app/
│       │   └── terragrunt.hcl
│       └── mysql/
│           └── terragrunt.hcl
└── staging/
    └── us-east-1/
        ├── app/
        │   └── terragrunt.hcl
        └── mysql/
            └── terragrunt.hcl
```
A component’s `terragrunt.hcl` can be incredibly simple:
```hcl
# live-infra/prod/us-east-1/mysql/terragrunt.hcl
include {
  path = find_in_parent_folders()
}

terraform {
  source = "git::ssh://git@github.com/my-company/terraform-modules.git//mysql?ref=v1.2.3"
}

inputs = {
  instance_class    = "db.r5.large"
  allocated_storage = 200
}
```
You run `terragrunt apply` instead of `terraform apply`, and it handles all the backend setup and variable passing for you.
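The dependency handling mentioned earlier is expressed with a `dependency` block. Here is a sketch of how the `app` component might consume an output from `mysql`. The `//app` module path, the `db_endpoint` input, and the `endpoint` output name are all hypothetical; they depend on what your modules actually expose.

```hcl
# live-infra/prod/us-east-1/app/terragrunt.hcl
include {
  path = find_in_parent_folders()
}

terraform {
  # Assumed module path within the same modules repo.
  source = "git::ssh://git@github.com/my-company/terraform-modules.git//app?ref=v1.2.3"
}

# Declares that this component reads outputs from the sibling mysql component.
dependency "mysql" {
  config_path = "../mysql"
}

inputs = {
  # Hypothetical output name; must match an output defined by the mysql module.
  db_endpoint = dependency.mysql.outputs.endpoint
}
```

Terragrunt uses these declarations to work out ordering when you operate on several components at once, so the database is created before the app that needs its endpoint.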
Warning from the Trenches: Don’t jump to Terragrunt just because it’s powerful. It adds another layer of abstraction and a learning curve for your team. If you’re not feeling the pain of duplicated code across many components (10+), the directory-based approach (Solution 2) is probably the right call. Solve the problem you actually have, not the one you think you might have someday.
Conclusion
The tool flame wars are fun, but they often distract from the foundational principles that actually prevent outages. A logical, scalable, and predictable repository structure is one of the most important choices you’ll make. It makes your code easier to reason about, safer for junior engineers to contribute to, and simpler for your CI/CD pipelines to execute. Start with workspaces, graduate to directory separation, and only reach for the heavy machinery like Terragrunt when you feel the scaling pains. Your on-call self will thank you at 2 AM.
Frequently Asked Questions
What are the main problems with a flat Terraform repository structure?
A flat structure leads to state file chaos (monolithic, slow, fragile state) and environment bleed, where configurations for different environments (dev, staging, prod) are easily confused, resulting in a high blast radius.
How do Terraform Workspaces, directory-based separation, and Terragrunt compare for repo structure?
Terraform Workspaces offer internal state separation within a single directory, suitable for small teams. Directory-based separation physically isolates environments for robust state and configuration management. Terragrunt is an enterprise wrapper for DRY configurations, remote state, and dependency handling across complex, multi-account infrastructures.
What is a common implementation pitfall when adopting advanced Terraform repo structures like Terragrunt?
A common pitfall is adopting Terragrunt prematurely without experiencing the scaling pains it addresses, such as extensive code duplication across many components. This adds unnecessary abstraction and a learning curve without proportional benefit.