🚀 Executive Summary
TL;DR: Directly accessing Terraform remote state via `terraform_remote_state` creates brittle, implicit dependencies that can lead to catastrophic failures when upstream infrastructure changes. A robust solution involves using AWS SSM Parameter Store as a decoupled message bus, allowing producer configurations to explicitly publish outputs and consumer configurations to safely read them, ensuring stability and separation of concerns.
🎯 Key Takeaways
- Direct `terraform_remote_state` access introduces high coupling and implicit dependencies, making infrastructure brittle and prone to failure with internal structure changes in remote states.
- AWS SSM Parameter Store provides a decoupled mechanism for sharing Terraform outputs, acting as a stable, explicit contract between producer and consumer configurations, enhancing deployment stability.
- For large-scale Infrastructure-as-Code, frameworks like Terragrunt can automate dependency management by declaring cross-module dependencies explicitly, which provides consistency and guardrails for complex environments.
Tired of Terraform’s tangled state dependencies? We break down why direct terraform_remote_state access is a high-risk gamble and show you how to use AWS SSM Parameter Store for a truly decoupled and scalable infrastructure.
Terraform’s Tightrope: Ditching Remote State for the Sanity of SSM
I still remember the 2 A.M. PagerDuty alert. A routine deployment for our main application, `prod-checkout-api`, was failing catastrophically. The error was cryptic, something about an index out of bounds in a `for_each` loop. After an hour of frantic debugging, we found the culprit. The networking team, on a completely separate schedule, had added a new private subnet to the main VPC. Our application’s Terraform configuration was using a `data "terraform_remote_state"` block to pull in the list of subnet IDs directly from the networking state file. It assumed there would always be three private subnets. The fourth one broke everything. That night, I swore off direct remote state access for anything that mattered. It creates a hidden, brittle contract between teams that is guaranteed to break when you least expect it.
The Root of the Problem: Tight Coupling
So, you’re building out your infrastructure. You have one Terraform configuration for your network (VPCs, subnets) and another for your application (EC2 instances, databases). The app needs to know the VPC ID and subnet IDs. The seemingly obvious answer is the terraform_remote_state data source. It lets you peek into the state file of another configuration and pull out its outputs. Simple, right?
Wrong. What you’ve actually done is create a hard, implicit dependency. Your application’s infrastructure is now intimately tied to the *internal structure* of your networking infrastructure’s outputs. If the networking team renames an output, changes a data type from a list to a map, or, like in my story, changes the number of items in a list, your application’s deployment will fail. It breaks the “separation of concerns” principle and turns your infrastructure into a house of cards.
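For context, `terraform_remote_state` can only see the root-module `output` blocks of the other configuration, so those outputs become the de facto API between teams whether anyone intended it or not. A minimal sketch of what the networking side might declare (the file name and the `aws_subnet.private` resource are assumptions for illustration):

```hcl
# networking/outputs.tf (hypothetical file; aws_subnet.private is assumed to use count)

# Any consumer using terraform_remote_state reaches directly into this value.
output "private_subnet_ids" {
  description = "IDs of the private subnets in the main VPC"
  value       = aws_subnet.private[*].id
}

# Changing this value to a map, for example
#   value = { for s in aws_subnet.private : s.availability_zone => s.id }
# or renaming the output, is a perfectly valid change for this configuration,
# yet it breaks every consumer that reads
#   data.terraform_remote_state.network.outputs.private_subnet_ids[0]
# and nothing in this configuration warns the networking team about it.
```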
Let’s walk through the common ways teams handle this, from the quick and dirty to the way we do it at TechResolve.
Solution 1: The ‘Get It Done’ Method (With a Warning Label)
This is the direct remote state access we just talked about. I’m including it because you’ll see it everywhere, and frankly, for a small, single-team project, it can feel expedient. But you need to know the risks you’re taking on.
Here’s what it looks like in your application’s Terraform code:
```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "techresolve-tf-state-prod"
    key    = "networking/vpc-main/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app_server" {
  # ... other config ...
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  # DANGEROUS: Directly coupling to the remote state output
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
}
```
When to use it: Honestly? Almost never in a production, multi-team environment. Maybe for a quick proof-of-concept or a project where the same person manages both states and understands the tight coupling. It’s a landmine waiting for a future you (or a new team member) to step on.
Solution 2: The Decoupled Dream (Using SSM Parameter Store)
This is the gold standard and my strong recommendation. The idea is simple: instead of having one configuration “peek” into another’s state, you use a neutral third party as a message bus. The “producer” configuration (e.g., networking) explicitly publishes its key outputs to a known location, and the “consumer” configuration (e.g., the application) explicitly consumes them. In AWS, the perfect tool for this is the SSM Parameter Store.
Step 1: The Producer (networking/vpc) writes its outputs to SSM.
```hcl
# In your networking/main.tf
resource "aws_vpc" "main" {
  # ... vpc config ...
}

# This is the "contract". We are publishing the VPC ID.
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/infra/prod/network/vpc_id"
  type  = "String"
  value = aws_vpc.main.id
}

# Lists can be published as a StringList, which SSM stores as a comma-delimited string.
# (aws_subnet.private is defined elsewhere in this configuration and uses count.)
resource "aws_ssm_parameter" "private_subnet_ids" {
  name  = "/infra/prod/network/private_subnet_ids"
  type  = "StringList"
  value = join(",", aws_subnet.private[*].id)
}
```
Step 2: The Consumer (application/ec2) reads the values from SSM.
```hcl
# In your application/main.tf

# Read the "contract" values from SSM
data "aws_ssm_parameter" "vpc_id" {
  name = "/infra/prod/network/vpc_id"
}

data "aws_ssm_parameter" "private_subnet_ids" {
  name = "/infra/prod/network/private_subnet_ids"
}

resource "aws_instance" "app_server" {
  # ... other config ...
  vpc_security_group_ids = [aws_security_group.app_sg.id]

  # Now we are safely consuming the decoupled value.
  # A StringList comes back as one comma-delimited string, so split it first.
  subnet_id = split(",", data.aws_ssm_parameter.private_subnet_ids.value)[0]
}
```
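This approach decouples your deployments. The networking team can now restructure its code and add or remove subnets all day long. As long as the SSM parameter `/infra/prod/network/private_subnet_ids` is kept up-to-date, the application deployment doesn’t care: it has a stable, well-defined interface to query. One caveat remains: picking element `[0]` still depends on the order in which the IDs are published, so the producer should publish them in a consistent order.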
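The consumer can make its side of the contract explicit as well, by parsing the parameter once and failing early with a readable message if the value is not what it expects. A minimal sketch, assuming Terraform 1.2 or later (for `precondition`) and reusing the `data "aws_ssm_parameter" "private_subnet_ids"` source from Step 2; the `local.private_subnet_ids` name is illustrative:

```hcl
# application/main.tf (sketch; reuses the data source from Step 2, Terraform >= 1.2)

locals {
  # Parse the StringList contract once, in one place.
  private_subnet_ids = split(",", data.aws_ssm_parameter.private_subnet_ids.value)
}

resource "aws_instance" "app_server" {
  # ... other config ...
  ami                    = "ami-0c55b159cbfafe1f0"
  instance_type          = "t3.medium"
  vpc_security_group_ids = [aws_security_group.app_sg.id]
  subnet_id              = local.private_subnet_ids[0]

  lifecycle {
    # Fail with a clear message before the instance is created or updated
    # if the published value does not look like a subnet ID.
    precondition {
      condition     = can(regex("^subnet-", local.private_subnet_ids[0]))
      error_message = "The first value in /infra/prod/network/private_subnet_ids must be a subnet ID."
    }
  }
}
```

The design choice here is to treat the SSM value like any external input: validate it at the boundary so a bad publish surfaces as a named contract violation rather than an obscure provider error.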
Darian’s Pro Tip: Establish a clear and consistent naming convention for your SSM parameters from day one. We use a hierarchy like `/{context}/{environment}/{component}/{name}`. For example: `/infra/prod/network/private_subnet_ids`. It makes discovery and IAM policies so much easier.
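The IAM benefit comes from the hierarchy: a consumer can be granted read access to one component’s subtree and nothing else. A minimal sketch of such a policy in Terraform; the policy name and the `network_contract_prefix` local are hypothetical, and you would attach the policy to whichever role runs the application’s Terraform:

```hcl
# Sketch: read-only access to the networking component's parameters only.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  # Hypothetical: the subtree this consumer is allowed to read.
  network_contract_prefix = "/infra/prod/network"
}

data "aws_iam_policy_document" "read_network_contract" {
  statement {
    sid     = "ReadNetworkContract"
    effect  = "Allow"
    actions = ["ssm:GetParameter", "ssm:GetParameters"]
    resources = [
      # Expands to arn:aws:ssm:<region>:<account>:parameter/infra/prod/network/*
      "arn:aws:ssm:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:parameter${local.network_contract_prefix}/*",
    ]
  }
}

resource "aws_iam_policy" "read_network_contract" {
  name   = "read-infra-prod-network-params"
  policy = data.aws_iam_policy_document.read_network_contract.json
}
```

Compare this with remote state, where the equivalent grant is read access to the whole state object in S3, including every attribute the networking configuration happens to store.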
Solution 3: The ‘Let’s Get Serious’ Option (Tooling & Frameworks)
When your infrastructure-as-code grows to a certain scale, managing these dependencies manually, even with SSM, can become a chore. This is where tools like Terragrunt come in. Terragrunt is a thin wrapper for Terraform that provides extra tools for working with multiple modules, remote state, and locking.
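It has a first-class concept of dependencies. You can declare that your `app` module depends on the `vpc` module’s outputs, and Terragrunt will read those outputs and pass them to `app` as ordinary input variables. The `app` code never references another state file, and the dependency is declared explicitly in one place. It’s essentially the producer/consumer contract from the SSM pattern, managed by the tool instead of by hand.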
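```hcl
# In your app/terragrunt.hcl file
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder values for local planning, used when ../vpc has no outputs yet
  mock_outputs = {
    vpc_id = "vpc-12345"
  }
  # Only allow the mocks for commands that don't change infrastructure
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
```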
This is a more advanced approach that requires buying into a specific framework, but for large organizations, the consistency and guardrails it provides can be a lifesaver. It solves the same coupling problem but at a higher level of abstraction.
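Terragrunt passes `inputs` to Terraform as input variables, so the `app` module itself only needs an ordinary variable declaration. A minimal sketch of the module side; the validation rule and the security group usage are illustrative assumptions:

```hcl
# app/variables.tf: the module's side of the contract
variable "vpc_id" {
  description = "ID of the VPC to deploy into, supplied by Terragrunt from the vpc dependency"
  type        = string

  validation {
    condition     = can(regex("^vpc-", var.vpc_id))
    error_message = "The vpc_id value must be a VPC ID starting with \"vpc-\"."
  }
}

# app/main.tf: the value is used like any other variable
resource "aws_security_group" "app_sg" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id
}
```

Because the module only knows about `var.vpc_id`, the same code works unchanged whether the value comes from Terragrunt, from an SSM lookup in a wrapper configuration, or from a `.tfvars` file.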
Comparison at a Glance
| Criteria | Remote State Data | SSM Parameter Store | Terragrunt/Frameworks |
|---|---|---|---|
| Coupling | Very High. Brittle and implicit. | Low. Decoupled via a defined contract. | Low. Managed and explicit. |
| Complexity | Low initial effort, high long-term risk. | Medium. Requires defining and managing parameters. | High. Requires learning a new tool/framework. |
| Security | Poor. Requires broad read access to entire state files. | Excellent. Fine-grained IAM control per parameter. | Good. Manages state access under the hood. |
| Best For | Quick PoCs, single-person projects. (Use with caution!) | Most production use cases. The reliable workhorse. | Large-scale, multi-team, complex IaC environments. |
So, the next time you’re tempted to reach for terraform_remote_state, take a moment. Think about that 2 A.M. phone call. Do your future self a favor and set up a proper contract with SSM. It’s a few extra lines of code that will save you countless hours of headaches down the road.
🤖 Frequently Asked Questions
❓ Why should `terraform_remote_state` be avoided in production environments?
`terraform_remote_state` creates hard, implicit dependencies on the internal structure of another configuration’s outputs. This tight coupling means changes like renaming outputs, altering data types, or modifying item counts in the remote state can cause catastrophic failures in dependent deployments.
❓ How does using SSM Parameter Store compare to `terraform_remote_state` for managing infrastructure dependencies?
SSM Parameter Store offers low coupling, excellent security with fine-grained IAM control, and acts as an explicit contract for sharing values. Conversely, `terraform_remote_state` results in very high, brittle coupling, poor security due to broad state file access, and is generally recommended only for quick proofs-of-concept.
❓ What is a recommended naming convention for SSM parameters when sharing Terraform outputs?
A clear and consistent hierarchical naming convention, such as `/{context}/{environment}/{component}/{name}` (e.g., `/infra/prod/network/vpc_id`), is recommended. This approach simplifies parameter discovery and makes IAM policy management much easier.