🚀 Executive Summary

TL;DR: Circular dependencies, or ‘backlinks,’ in infrastructure code like Terraform cause deployment deadlocks by violating the Directed Acyclic Graph (DAG) principle. Resolve these issues by separating resource rules or refactoring modules to ensure clear, one-way data flows and prevent costly outages.

🎯 Key Takeaways

  • Infrastructure-as-code tools like Terraform operate on a Directed Acyclic Graph (DAG), meaning resources must be created in a specific, one-way order.
  • Circular dependencies, such as mutually dependent security group rules, create deadlocks because the orchestrator cannot determine a valid build order.
  • The most robust solution involves splitting resource definitions from their rules (e.g., `aws_security_group` from `aws_security_group_rule`) or refactoring into distinct modules to enforce clear component boundaries and one-way data flows.

Backlinks are the saviour. Is that true?

Circular dependencies, or “backlinks,” in infrastructure code can seem like a clever solution but often lead to deployment deadlocks. Learn why they’re a problem and discover three practical methods—from quick hacks to robust architectural changes—to resolve them for good.

Untangling the Terraform Knot: When Your ‘Clever’ Backlinks Bite Back

I still remember the 3 AM PagerDuty alert. A routine deployment for our main application stack had completely stalled. The `terraform apply` was just… hanging. For forty-five minutes. No errors, no timeout, just a blinking cursor mocking our entire on-call rotation. The junior engineer who pushed the change was frantic, swearing his change was “tiny.” He was right. It was a single security group rule. He’d created a perfect, elegant, and catastrophic circular dependency—a ‘backlink’ from the database to the app server, which already had a link back to the database. It felt like a genius move to keep the rules in sync, but in reality, he’d just handed Terraform a logic puzzle with no solution, and it decided to take our production deployment down with it.

The ‘Why’: Your Infrastructure Isn’t a Circle, It’s a One-Way Street

Let’s get one thing straight. Tools like Terraform, CloudFormation, and Pulumi operate on a simple principle: the Directed Acyclic Graph (DAG). It’s a fancy term for a to-do list where some tasks must finish before others can start. You have to build the VPC before you can create a subnet inside it. You need the database instance to exist before the application can connect to it. It’s a one-way flow of logic.
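
As a minimal sketch of how that ordering gets expressed, a reference from one resource to another is all Terraform needs to draw an edge in the graph. The resource names and CIDR ranges below are illustrative assumptions, not taken from our stack.

```hcl
# Hypothetical names and CIDR ranges, for illustration only.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Referencing aws_vpc.main.id creates an implicit dependency:
# Terraform creates the VPC first, then the subnet.
resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}
```

Running `terraform graph` prints the resulting dependency graph in DOT format, which is a handy way to see the edges Terraform has inferred.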

A “backlink” or circular dependency is when you tell the system that Task A depends on Task B, but Task B also depends on Task A. The orchestrator has no valid place to start. It’s a digital chicken-and-egg problem. It can’t build the VPC because it’s waiting for the subnet, which it can’t build because it’s waiting for the VPC. Terraform will normally catch this while building the graph and fail with an `Error: Cycle:` message listing the resources involved, but whether it errors out or stalls, nothing gets deployed. That’s what happened to us. The app’s security group needed the DB’s security group ID, and the DB’s security group needed the app’s. Deadlock.

Darian’s Warning: Don’t ever trust a deployment that “worked once by accident.” These circular dependencies can sometimes slip through a `plan` if the resources already exist, luring you into a false sense of security. The trap is sprung the next time you need to create the environment from scratch.

Fixing the Mess: From Band-Aids to Brain Surgery

So, you’ve found yourself in this mess. Your pipeline is stuck, and your manager is asking for an ETA. Here’s how we get out of it, from the quick-and-dirty to the architecturally sound.

Solution 1: The Quick Fix (The “Data Source Shuffle”)

This is the emergency “get it working *now*” approach. You manually break the cycle in your code by turning one of your managed resources into a simple data lookup. You’re telling Terraform, “Don’t manage this resource, just go read its properties for me.”

Let’s say your web app security group (`prod-webapp-sg`) and database security group (`prod-rds-sg`) depend on each other.

The Problem Code:

# Web App Security Group
resource "aws_security_group" "webapp_sg" {
  name = "prod-webapp-sg"
  # ... other config ...
  egress {
    # This rule creates the dependency on the DB SG
    security_groups = [aws_security_group.db_sg.id]
    # ...
  }
}

# Database Security Group
resource "aws_security_group" "db_sg" {
  name = "prod-rds-sg"
  # ... other config ...
  ingress {
    # This rule creates the BACKLINK to the Web App SG
    security_groups = [aws_security_group.webapp_sg.id]
    # ...
  }
}

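The Fix: We’ll stop managing the `db_sg` directly and just look it up. You might need to create it manually in the AWS console first if it doesn’t exist. If Terraform already tracks the group, remove it from state first (`terraform state rm aws_security_group.db_sg`); otherwise, deleting the resource block makes Terraform plan to destroy the group.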

# Web App Security Group (no longer references the DB SG directly)
resource "aws_security_group" "webapp_sg" {
  name = "prod-webapp-sg"
  # ... other config ...
}

# The "Backlink" - an ingress rule on the DB SG
resource "aws_security_group_rule" "db_ingress_from_webapp" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = data.aws_security_group.db_sg.id # Use the data source ID
  source_security_group_id = aws_security_group.webapp_sg.id
}

# Use a data source to LOOK UP the DB security group instead of managing it.
# This breaks the cycle.
data "aws_security_group" "db_sg" {
  name = "prod-rds-sg"
}

This works, but it’s a hack. You’ve introduced a piece of manually-managed infrastructure. It’s technical debt, but sometimes, it’s the debt you need to incur to end an outage.

Solution 2: The Permanent Fix (The “Rule Split”)

A much cleaner way is to separate the resources from their rules. Instead of defining ingress/egress rules inside the security group resources, define them as standalone `aws_security_group_rule` resources. This allows Terraform to create both security groups first (with no rules, so no dependencies) and then add the rules afterward, creating a clean, linear dependency chain.

# 1. Create the Web App SG (no rules, no dependencies)
resource "aws_security_group" "webapp_sg" {
  name        = "prod-webapp-sg"
  description = "Controls access for the web application"
  vpc_id      = var.vpc_id
}

# 2. Create the Database SG (no rules, no dependencies)
resource "aws_security_group" "db_sg" {
  name        = "prod-rds-sg"
  description = "Controls access for the RDS instance"
  vpc_id      = var.vpc_id
}

# 3. Now, create the rules. They depend on the SGs, but the SGs don't depend on each other.
resource "aws_security_group_rule" "db_ingress_from_webapp" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db_sg.id
  source_security_group_id = aws_security_group.webapp_sg.id
}

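resource "aws_security_group_rule" "webapp_egress_to_db" {
  type              = "egress"
  from_port         = 5432
  to_port           = 5432
  protocol          = "tcp"
  security_group_id = aws_security_group.webapp_sg.id
  # aws_security_group_rule has no destination_security_group_id argument.
  # For an egress rule, source_security_group_id names the destination group.
  source_security_group_id = aws_security_group.db_sg.id
}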

See? No more cycle. Both groups are created, and then the rules are applied. This is almost always the right answer for this specific problem.
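
If your AWS provider version is recent enough to include them, the same split can be sketched with the per-rule resources `aws_vpc_security_group_ingress_rule` and `aws_vpc_security_group_egress_rule`. Treat this as an assumption about your provider version rather than a drop-in replacement. It refers to the `webapp_sg` and `db_sg` groups from the example above.

```hcl
# Sketch only: assumes an AWS provider version that ships these resources.
# Allow the web app SG to reach the database SG on PostgreSQL's port.
resource "aws_vpc_security_group_ingress_rule" "db_ingress_from_webapp" {
  security_group_id            = aws_security_group.db_sg.id
  referenced_security_group_id = aws_security_group.webapp_sg.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}

# Allow the web app SG to send traffic to the database SG.
resource "aws_vpc_security_group_egress_rule" "webapp_egress_to_db" {
  security_group_id            = aws_security_group.webapp_sg.id
  referenced_security_group_id = aws_security_group.db_sg.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}
```

Whichever style you pick, stick to one per security group. Mixing these per-rule resources with `aws_security_group_rule` or inline `ingress`/`egress` blocks on the same group tends to produce conflicting plans.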

Solution 3: The ‘Nuclear’ Option (Module Refactor)

Sometimes, a circular dependency isn’t just a mistake; it’s a symptom of a deeper architectural flaw. If your database, application, and networking are all crammed into one giant Terraform module, you’re asking for trouble. The ‘backlink’ is your codebase screaming that your components are too tightly coupled.

The fix is to do what we should have done from the start: break it down.

  1. Networking Module: A dedicated module to manage the VPC, subnets, and maybe a “shared services” security group. It has outputs like `vpc_id` and `private_subnet_ids`.
  2. Database Module: Manages the RDS instance. It takes the VPC and subnet IDs as input variables and outputs the `db_security_group_id`.
  3. Application Module: Manages the EC2 instances or ECS service. It also takes VPC/subnet IDs as inputs, and it can use a `terraform_remote_state` data source to read the `db_security_group_id` output from your database module’s state.

This is a major refactor, not a quick fix. But it’s the correct path for any system that’s growing in complexity. It forces you to think about clear boundaries and one-way data flows between your infrastructure components—just like the DAG wants you to.
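
A minimal sketch of that one-way flow, seen from the application stack, might look like the following. The S3 backend, bucket name, state key, region, and the `aws_security_group.webapp_sg` label are all assumptions for illustration; only the `db_security_group_id` output name comes from the layout above.

```hcl
# Read-only view of the database stack's outputs.
data "terraform_remote_state" "database" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"    # hypothetical bucket
    key    = "database/terraform.tfstate" # hypothetical state key
    region = "us-east-1"                  # hypothetical region
  }
}

# The application stack owns the rule that lets it reach the database.
# Data flows one way: database state -> application config.
resource "aws_security_group_rule" "db_ingress_from_webapp" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = data.terraform_remote_state.database.outputs.db_security_group_id
  source_security_group_id = aws_security_group.webapp_sg.id
}
```

The database stack never needs to know the application exists, so there is no edge pointing back and nothing for a cycle to form around.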

| Solution | Effort | Long-Term Viability | When to Use |
| --- | --- | --- | --- |
| 1. Data Source Shuffle | Low | Poor (Tech Debt) | During an active incident to restore service quickly. |
| 2. The Rule Split | Medium | Excellent | The standard, go-to fix for security group cycles. |
| 3. Module Refactor | High | The Best | When you repeatedly hit dependency issues and your codebase is a monolith. |

So, is a “backlink” a saviour? In infrastructure code, absolutely not. It’s a landmine disguised as a shortcut. The best systems are the ones you can reason about, with clear, predictable, one-way flows. Next time you feel the urge to create a clever two-way dependency, take a step back, think about the graph, and save your future on-call self a 3 AM headache.

Darian Vance - Lead Cloud Architect

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What is a ‘backlink’ in the context of infrastructure code?

In infrastructure code, a ‘backlink’ refers to a circular dependency where Resource A depends on Resource B, and Resource B simultaneously depends on Resource A, leading to deployment deadlocks in tools like Terraform.

❓ How does the ‘Rule Split’ method compare to the ‘Data Source Shuffle’ for resolving circular dependencies?

The ‘Rule Split’ is a permanent, architecturally sound solution that defines rules as standalone resources, allowing the core resources to be created first. The ‘Data Source Shuffle’ is a quick, temporary hack for active incidents, introducing technical debt by manually looking up one resource’s properties.

❓ What is a common pitfall when encountering circular dependencies in Terraform?

A common pitfall is a false sense of security when `terraform plan` or `apply` appears to work because resources already exist. The true issue of the circular dependency only manifests when attempting to create the environment from scratch, leading to unexpected failures.
