🚀 Executive Summary

TL;DR: The ‘Chicken and Egg’ security group problem in Terraform occurs when an ALB and ECS Task security group have a circular dependency, preventing Terraform from creating them. The recommended solution involves defining security groups and their rules as separate resources to break this dependency.

🎯 Key Takeaways

  • The ‘Chicken and Egg’ problem stems from a circular dependency where an ALB’s security group needs the ECS Task’s security group ID for egress, and the ECS Task’s security group needs the ALB’s security group ID for ingress.
  • The canonical and idempotent solution is to use separate `aws_security_group_rule` resources for each ingress and egress rule, rather than defining them inline within the `aws_security_group` resource.
  • The ‘Two-Apply Tango’ (manual multiple `terraform apply` steps) is an anti-pattern that breaks Infrastructure as Code principles and is unsuitable for production or automated CI/CD.
  • A ‘Self-Referencing Egress Rule’ (`self = true`) on the ALB’s security group is a valid, but less intuitive, alternative to break the cycle by allowing egress to other resources within the same security group.
  • Breaking out rules into separate resources allows Terraform to create the security groups first, then add the interdependent rules, resolving the `Error: Cycle:` failure.

Tackle Terraform’s infamous ‘Chicken and Egg’ security group cycle between an ALB and ECS Task. This guide provides three real-world solutions, from quick hacks to the permanent, idempotent fix for production environments.

Solving the ALB-to-ECS “Chicken and Egg” Security Group Problem in Terraform

I still remember the 3 AM PagerDuty alert. A critical deployment for our `checkout-service` was failing. The pipeline was red, the on-call junior was panicking, and the error message was the one that still gives me a nervous twitch: `Error: Cycle: aws_security_group.ecs_sg, aws_security_group.alb_sg`. A “simple” security group change had created a circular dependency from hell, and our entire release was blocked. This isn’t just a theoretical problem; it’s a rite of passage for anyone building AWS infrastructure with Terraform, and understanding how to fix it properly separates the pros from the people who get paged at 3 AM.

So, What’s Actually Happening? The Root of the Cycle

Before we dive into the fixes, let’s get on the same page about why this happens. It’s a classic catch-22, a true “chicken and egg” problem that Terraform’s dependency graph can’t solve on its own.

Here’s the logical loop you’re trying to create:

  • The Application Load Balancer (ALB): You want its Security Group (SG) to allow egress (outbound traffic) only to the ECS Task’s SG. To do this, it needs the ID of the ECS Task’s SG.
  • The ECS Task: You want its Security Group to allow ingress (inbound traffic) only from the ALB’s SG. To do this, it needs the ID of the ALB’s SG.

You see the problem? The ALB SG needs the ECS SG to exist first, but the ECS SG needs the ALB SG to exist first. When Terraform builds its graph of what to create, it sees this loop and stops with an `Error: Cycle:` message that lists the resources involved. It literally cannot proceed.

Three Ways to Break the Cycle

I’ve seen teams handle this in a few different ways, ranging from “please don’t do this” to “this is the gold standard.” Let’s walk through them.

Solution 1: The “Two-Apply Tango” (And Why You Shouldn’t Do It)

When you’re new to Terraform and hit this wall, the first instinct is often to force it through manually. It looks something like this:

  1. You write your code with the circular dependency.
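  2. You run `terraform apply` and it fails with the cycle error.
  3. You comment out the egress block on the ALB security group.
  4. You run `terraform apply` again. It works! The SGs are created, but they aren’t configured correctly.
  5. You restore the egress block, typically with the ECS group’s now-known ID pasted in as a literal, because restoring the original resource reference would put the same cycle back into the configuration.
  6. You run `terraform apply` a final time. It works, adding the final rule.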

Why it’s a bad idea: This “fix” completely breaks the promise of Infrastructure as Code. It’s not repeatable, it’s not idempotent, and it will absolutely fail in any automated CI/CD pipeline. It’s a useful exercise to understand the problem, but it has no place in a real environment.

A Word From The Trenches: If you find yourself needing to run `apply` multiple times with code changes in between to get to a stable state, you’re fighting the tool, not using it. It’s a signal that your resource definitions are flawed.

Solution 2: The “Right Way” – The Separate Resource Pattern

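This is the canonical, idempotent, and correct way to solve this problem. Instead of defining your rules inside the `aws_security_group` resource, you break them out into separate `aws_security_group_rule` resources. Pick one style per group: the AWS provider documentation warns that inline rules and standalone rule resources on the same security group conflict and overwrite each other.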

This breaks the cycle because Terraform’s graph now looks like this:

  1. Create the ALB Security Group (with no rules that reference the ECS SG).
  2. Create the ECS Task Security Group (with no rules that reference the ALB SG).
  3. Once both groups exist and have IDs, create the rule that connects them.

Here’s what that looks like in practice.

The “Wrong Way” (Inline Rules):

# THIS CODE CREATES A CYCLE AND WILL FAIL
resource "aws_security_group" "alb_sg" {
  name   = "prod-alb-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # THIS IS THE PROBLEM
  egress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs_sg.id] # Needs ecs_sg to exist
  }
}

resource "aws_security_group" "ecs_sg" {
  name   = "prod-app-task-sg"
  vpc_id = var.vpc_id

  # AND THIS IS THE OTHER HALF OF THE PROBLEM
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id] # Needs alb_sg to exist
  }
}

The “Right Way” (Separate Rules):

# Step 1: Define the security groups themselves, with no cyclical rules.
resource "aws_security_group" "alb_sg" {
  name        = "prod-alb-sg"
  description = "Controls access to the production ALB"
  vpc_id      = var.vpc_id
  
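  # No inline ingress/egress blocks: every rule for this group is a separate
  # aws_security_group_rule below. The AWS provider does not support mixing
  # inline rules and standalone rule resources on the same group, and a
  # catch-all egress here would defeat the ALB-to-ECS restriction anyway.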
}

resource "aws_security_group" "ecs_sg" {
  name        = "prod-app-task-sg"
  description = "Controls access to the production app tasks"
  vpc_id      = var.vpc_id
}

# Step 2: Define the rules separately, referencing the created groups.
# This breaks the cycle!

# Rule: Allow the ALB to receive traffic from the internet on 443
resource "aws_security_group_rule" "alb_ingress_world" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.alb_sg.id
}

# Rule: Allow the ECS Task to receive traffic from the ALB
resource "aws_security_group_rule" "ecs_ingress_from_alb" {
  type                     = "ingress"
  from_port                = 8080 # The port your container is listening on
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.alb_sg.id
  security_group_id        = aws_security_group.ecs_sg.id
}

# Rule: Allow the ALB to send traffic to the ECS Task.
# For an egress rule, source_security_group_id names the destination group.
resource "aws_security_group_rule" "alb_egress_to_ecs" {
  type                     = "egress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.ecs_sg.id
  security_group_id        = aws_security_group.alb_sg.id
}

This is the pattern we enforce at TechResolve. It’s clean, declarative, and works perfectly with automation.
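Recent versions of the AWS provider also ship one-rule-per-resource types, `aws_vpc_security_group_ingress_rule` and `aws_vpc_security_group_egress_rule`, which break the cycle in the same way. The sketch below is an alternative to the `aws_security_group_rule` blocks above, not an addition to them, and it assumes that your provider version includes these resources and that `alb_sg` and `ecs_sg` are declared without inline rules. The resource labels (`ecs_from_alb`, `alb_to_ecs`, `alb_from_world`) are illustrative names only.

```hcl
# Sketch only: assumes a provider version that includes the
# aws_vpc_security_group_*_rule resources, and that alb_sg / ecs_sg
# are defined with no inline ingress or egress blocks.

# The ALB accepts HTTPS from the internet.
resource "aws_vpc_security_group_ingress_rule" "alb_from_world" {
  security_group_id = aws_security_group.alb_sg.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

# The ECS tasks accept traffic on 8080 only from the ALB's group.
resource "aws_vpc_security_group_ingress_rule" "ecs_from_alb" {
  security_group_id            = aws_security_group.ecs_sg.id
  referenced_security_group_id = aws_security_group.alb_sg.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}

# The ALB may send traffic on 8080 only to the ECS tasks' group.
resource "aws_vpc_security_group_egress_rule" "alb_to_ecs" {
  security_group_id            = aws_security_group.alb_sg.id
  referenced_security_group_id = aws_security_group.ecs_sg.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```

Here the peer group is always named by `referenced_security_group_id`, whichever direction the rule runs, which some readers find easier to follow than the `source_security_group_id` argument on an egress rule. Avoid managing the same group with both rule resource families at once.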

Solution 3: The “Clever Hack” – The Self-Referencing Egress Rule

There is another pattern that works, and while it’s less common, it’s worth knowing. You can break the cycle by having the ALB’s security group allow egress traffic *to itself*.

It sounds strange, and it comes with a catch. A rule that references a security group only matches network interfaces that are members of that group, so a self-referencing egress rule lets the ALB reach only targets that are also attached to the ALB’s security group. A target group has no security group of its own; for this pattern to carry traffic, the ECS tasks must be attached to the ALB’s security group in addition to their own.

resource "aws_security_group" "alb_sg" {
  name   = "prod-alb-sg-self-ref"
  vpc_id = var.vpc_id

  # Allow egress to other resources within this same security group
  egress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }
}

resource "aws_security_group" "ecs_sg" {
  name   = "prod-app-task-sg"
  vpc_id = var.vpc_id

  # Ingress is simple: just allow from the ALB SG
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }
}

This works because the aws_security_group.alb_sg resource no longer depends on aws_security_group.ecs_sg. Its egress rule only depends on itself, which Terraform can resolve. Then, the ECS security group can safely depend on the ALB security group.
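Removing the cycle from the graph is only half of the story, because a self-referencing rule matches only network interfaces that belong to the same group. One way to make the pattern actually carry traffic is to attach both groups to the tasks. The sketch below assumes a Fargate service; `var.ecs_cluster_id`, `var.task_definition_arn`, `var.private_subnet_ids`, and the service name are hypothetical placeholders that do not appear in the original configuration.

```hcl
# Sketch only: the variables and the service name are hypothetical.
resource "aws_ecs_service" "app" {
  name            = "prod-app"
  cluster         = var.ecs_cluster_id
  task_definition = var.task_definition_arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets = var.private_subnet_ids

    # The tasks join alb_sg as well as ecs_sg, so the ALB's
    # "egress to self" rule matches their network interfaces.
    security_groups = [
      aws_security_group.ecs_sg.id,
      aws_security_group.alb_sg.id,
    ]
  }
}
```

Keep in mind that tasks attached to `alb_sg` also pick up every other rule on that group, including any ingress you later add for the load balancer. That side effect is part of why the explicit SG-to-SG rules in Solution 2 are easier to reason about.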

Warning: While this pattern is valid and more concise, it can be less intuitive for other engineers reading your code. A rule allowing egress to “self” isn’t as immediately obvious as an explicit rule pointing from SG-A to SG-B. I generally prefer the clarity of Solution 2.

Comparison and Final Recommendation

Let’s put it all in a table to make the choice clear.

| Solution | Idempotent? | Clarity | Recommendation |
|---|---|---|---|
| 1. Two-Apply Tango | No | Very Low | Never in production. A learning tool only. |
| 2. Separate Rules | Yes | High | The recommended production-ready standard. |
| 3. Self-Referencing Rule | Yes | Medium | A valid but less common alternative. Use if your team understands the pattern. |

My advice is simple: use the separate aws_security_group_rule pattern (Solution 2). It’s the cleanest, most explicit, and most maintainable way to solve this problem. It makes your infrastructure’s network policy crystal clear to anyone who reads the code, and it will save you from those 3 AM pages. Don’t be clever; be clear. Your future self will thank you.

Darian Vance, Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What causes the ‘Chicken and Egg’ security group problem in Terraform?

The problem arises from a circular dependency where an Application Load Balancer (ALB) security group requires the ID of an ECS Task security group for its egress rule, and simultaneously, the ECS Task security group requires the ID of the ALB security group for its ingress rule. Terraform’s dependency graph cannot resolve this loop.

❓ How does the ‘Separate Resource Pattern’ solve the security group cycle?

The ‘Separate Resource Pattern’ solves the cycle by defining `aws_security_group` resources without interdependent rules, and then creating `aws_security_group_rule` resources separately. This allows Terraform to first create both security groups, obtaining their IDs, and then apply the rules that reference those IDs, breaking the circular dependency.

❓ What are the drawbacks of using the ‘Two-Apply Tango’ solution?

The ‘Two-Apply Tango’ is not repeatable, not idempotent, and completely breaks the principles of Infrastructure as Code. It requires manual intervention (commenting/uncommenting code and multiple `terraform apply` commands) and will fail in any automated CI/CD pipeline, making it unsuitable for production environments.
