🚀 Executive Summary

TL;DR: Terraform’s stateful nature, with its `.tfstate` file creating a second source of truth, challenges pure GitOps’ single source of truth principle. Effective integration requires bridging this gap through structured CI/CD pipelines, dedicated GitOps orchestrators like Atlantis for PR-driven workflows, or by decomposing monolithic Terraform state files for enhanced safety and scalability.

🎯 Key Takeaways

  • Terraform’s `terraform.tfstate` file introduces a “two sources of truth” problem, where the desired state in Git (`.tf` files) and the last known state (`.tfstate`) must be reconciled, unlike pure declarative GitOps tools.
  • Dedicated GitOps orchestrators like Atlantis are the “gold standard” for Terraform GitOps, integrating the `terraform plan` and `apply` lifecycle directly into pull requests for collaborative review and audited execution.
  • Radically decomposing monolithic Terraform state files into smaller, domain-specific states significantly reduces the blast radius, enables granular permissions, and accelerates plan/apply cycles, enhancing safety and scalability.

Does Terraform fit into the GitOps story?

Terraform’s stateful nature creates a unique challenge for pure GitOps, but it’s a solvable problem. This guide explores practical solutions, from simple CI/CD wrappers to dedicated tools like Atlantis, to bridge the gap and make Terraform a first-class citizen in your GitOps workflow.

Does Terraform Even *Do* GitOps? A Field Guide for the Perplexed.

I still remember the day a junior engineer, let’s call him Alex, took down half our staging environment. It was 4:45 PM on a Friday, of course. He’d been asked to add a single egress rule to a security group. He cloned the repo, made the change, and ran `terraform apply` from his laptop. What he didn’t know was that another engineer had manually deleted a “temporary” S3 bucket an hour earlier, a bucket that was still defined in the Terraform code. His `apply` didn’t just add the rule; it helpfully tried to “fix” the drift by recreating the bucket and replacing a bunch of other resources that depended on it. It was a complete mess, born from two well-intentioned people working outside of a predictable process. That’s the moment the “Terraform on your laptop” rule died forever at our company and we got serious about this exact question.

The Elephant in the Room: The Terraform State File

Before we dive into solutions, let’s get real about why this is even a discussion. The core of the GitOps philosophy is that your Git repository is the single source of truth for the desired state of your infrastructure. Tools like Argo CD or Flux are brilliant at this for Kubernetes: they see a YAML file in Git, compare it to what’s running in the cluster, and make it so.

Terraform throws a wrench in this beautiful, simple model. Why? Because it has two sources of truth:

  1. Your `.tf` files in Git: This is your desired state. “I want three EC2 instances of this type.”
  2. The `terraform.tfstate` file: This is Terraform’s record of the state it *last observed*. “I last saw three EC2 instances with these specific IDs and IP addresses.”

Terraform uses the state file to generate a plan. It compares your code (desired state) to its state file (last known state) and the real world (actual state) to figure out what needs to be created, updated, or destroyed. A pure GitOps controller doesn’t understand this three-way reconciliation or the plan/apply lifecycle. It just wants to slap a declarative config onto an API. This mismatch is the source of all our pain.

The Fixes: From Duct Tape to Dedicated Machinery

So, how do we solve it? I’ve seen teams try everything, but the solutions generally fall into three categories. We’ll go from the quick-and-dirty to the robust and scalable.

Solution 1: The “Good Enough for Now” CI/CD Wrapper

This is the most common starting point. You treat Terraform execution as just another step in your existing CI/CD pipeline (e.g., GitHub Actions, GitLab CI, Jenkins).

The flow looks like this:

  • A developer opens a Pull Request with a change to a `.tf` file.
  • The pipeline automatically triggers, checks out the code, and runs `terraform plan`.
  • The plan output is posted as a comment on the PR for review.
  • Once the PR is approved and merged into the main branch, a separate pipeline job triggers.
  • This job runs `terraform apply -auto-approve`.
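
As a rough sketch of the plan half of this flow, the workflow below runs `terraform plan` on pull requests and posts the result as a PR comment. It assumes a GitHub-hosted runner (where the `gh` CLI is preinstalled), AWS credentials stored as the repository secrets shown, and Terraform code at the repository root. The workflow and step names are illustrative, not a prescribed setup.

```yaml
name: 'Terraform Plan on PR'

on:
  pull_request:
    paths:
      - '**.tf'

permissions:
  contents: read
  pull-requests: write   # needed to post the plan as a PR comment

jobs:
  plan:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false   # plain CLI output, easier to redirect

      - name: 'Terraform Init'
        run: terraform init -input=false

      # Save the plan, then render it as text for the comment
      - name: 'Terraform Plan'
        run: |
          terraform plan -input=false -no-color -out=tfplan
          terraform show -no-color tfplan > plan.txt

      - name: 'Post plan to PR'
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh pr comment ${{ github.event.pull_request.number }} --body-file plan.txt
```

Very large plans may exceed GitHub’s comment size limit, so a real pipeline would likely truncate the output or link to the job log instead.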

Here’s a simplified GitHub Actions example for the `apply` step:


```yaml
name: 'Terraform Apply on Main'

on:
  push:
    branches:
      - main

jobs:
  terraform:
    name: 'Terraform'
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      TF_VAR_some_variable: "value"
    steps:
    - name: 'Checkout'
      uses: actions/checkout@v4

    # hashicorp/setup-terraform replaces the archived
    # hashicorp/terraform-github-actions action
    - name: 'Setup Terraform'
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.2.0

    # The backend and providers must be initialised before applying
    - name: 'Terraform Init'
      run: terraform init -input=false

    - name: 'Terraform Apply'
      run: terraform apply -auto-approve -input=false
```

Darian’s Warning: This is a “push-based” approach, not a “pull-based” one like true GitOps. The pipeline pushes the change; nothing is pulling from the repo to reconcile state. If someone makes a manual change in the cloud console, this pipeline won’t notice until the next commit triggers a run. It stops the “apply from my laptop” problem, but it doesn’t fully solve for state drift.
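
One way to narrow that gap, sketched here under assumptions, is a scheduled job that runs `terraform plan -detailed-exitcode`. With that flag, Terraform exits with code 2 when the plan contains changes, so the step fails and the drift shows up as a failed run. The cron cadence and secret names are assumptions, and this only detects drift; it does not reconcile it.

```yaml
name: 'Terraform Drift Check'

on:
  schedule:
    - cron: '0 6 * * *'   # daily at 06:00 UTC; pick any cadence
  workflow_dispatch: {}    # also allow manual runs

jobs:
  drift:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false   # keep Terraform's own exit codes

      - name: 'Terraform Init'
        run: terraform init -input=false

      # Exit code 0 = no changes, 1 = error, 2 = changes (drift) detected
      - name: 'Detect drift'
        run: terraform plan -input=false -no-color -detailed-exitcode
```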

Solution 2: The “Do It Right” GitOps Orchestrator

This is where we bring in tools designed specifically for this problem. The goal is to bring the plan/apply lifecycle directly into your Git workflow, making it transparent and collaborative.

My go-to recommendation here is Atlantis. It’s an open-source application you host yourself that listens for pull request events (via webhooks) on your Terraform repositories. It’s a game-changer.

The Atlantis flow:

  1. A developer opens a Pull Request.
  2. Atlantis automatically picks it up and runs `terraform plan` in the background.
  3. It then posts the entire plan output as a clean, formatted comment right inside the PR. The whole team can see *exactly* what will change.
  4. To proceed, a team member with permissions comments back: `atlantis apply`.
  5. Atlantis runs the apply, captures the output, and posts that back to the PR as well.
  6. Once merged, the change is complete and fully audited within the PR history.

This is GitOps for Terraform done right. The conversation, the plan, and the execution all live in the pull request. It becomes the unit of infrastructure change. Terraform Cloud (since renamed HCP Terraform) and Spacelift offer a more polished, SaaS version of this same core workflow, with added features like policy-as-code (OPA/Sentinel) and cost estimation.
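
For a sense of what the repository side of this looks like, here is a minimal sketch of a repo-level `atlantis.yaml`. The project name and directory are hypothetical. Note also that `apply_requirements` only takes effect at this level if the Atlantis server’s repo config allows that key to be overridden; otherwise it has to be set server-side.

```yaml
# atlantis.yaml, placed at the repository root
version: 3
projects:
  - name: staging                # hypothetical project name
    dir: environments/staging    # hypothetical directory holding the .tf files
    autoplan:
      enabled: true
      # Re-plan when Terraform files in this directory change
      when_modified: ["*.tf", "*.tfvars"]
    # Require an approved, mergeable PR before `atlantis apply` will run
    apply_requirements: [approved, mergeable]
```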

Solution 3: The “Rethink Your State” Structural Fix

Sometimes the tool isn’t the problem; the scope is. I’ve seen teams trying to manage their entire company’s infrastructure in a single, monolithic Terraform state file. A plan in that repo could take 15 minutes to run and propose changing everything from a DNS record to a production Kubernetes cluster. It’s terrifying and fragile.

The “nuclear” option here isn’t to ditch Terraform, but to radically decompose your state. Don’t have one giant state; have dozens of small ones, each with a clear, limited blast radius.

  • networking-prod state: Manages the core VPC, subnets, and routing. Changes rarely.
  • platform-services-prod state: Manages the EKS cluster, node groups, and core IAM roles.
  • app-billing-prod state: Manages the S3 buckets, DynamoDB table, and Lambda function for just the billing application.

When you break things down this way, applying GitOps principles becomes much safer. An automated `apply` on the `app-billing-prod` state is low-risk. You can grant the pipeline credentials that *only* have access to manage those specific resources. This approach, combined with a tool like Atlantis (Solution 2), is the sweet spot for mature teams.
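
To make the scoped-apply idea concrete, here is a sketch of a pipeline that only touches the `app-billing-prod` state. The `states/app-billing-prod` directory layout and the `BILLING_*` secret names are assumptions made for illustration; the point is the path filter, the per-state working directory, and the narrowly scoped credentials.

```yaml
name: 'Apply app-billing-prod'

on:
  push:
    branches:
      - main
    paths:
      - 'states/app-billing-prod/**'   # hypothetical directory layout

# Never run two applies against the same state at once
concurrency:
  group: app-billing-prod
  cancel-in-progress: false

jobs:
  apply:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: states/app-billing-prod
    env:
      # Hypothetical secrets for an IAM identity limited to billing resources
      AWS_ACCESS_KEY_ID: ${{ secrets.BILLING_AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.BILLING_AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - run: terraform apply -auto-approve -input=false
```

A change under `states/networking-prod` would not trigger this workflow at all, which is where the smaller blast radius comes from.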

Final Verdict: A Side-by-Side Comparison

| Approach | Pros | Cons | Best For |
|---|---|---|---|
| 1. CI/CD Wrapper | Easy to set up with existing tools; stops local applies | Not true GitOps (push-based); risk of `-auto-approve`; poor visibility on the PR itself | Small teams just getting started with IaC automation. |
| 2. GitOps Orchestrator (Atlantis) | Brings plan/apply into the PR; excellent visibility & collaboration; full audit trail | Requires hosting and maintaining another tool; learning curve for the team | Most teams serious about IaC. This is the gold standard. |
| 3. Decomposed State | Massively reduces blast radius; enables granular permissions; faster plan/apply cycles | Can be complex to manage dependencies between states (e.g., using `terraform_remote_state`); requires significant architectural discipline | Large, mature organizations with complex infrastructure. |

So, does Terraform fit into the GitOps story? Absolutely, but not out of the box. You have to be intentional. You have to bridge the gap between its stateful, explicitly triggered plan/apply model and the declarative, self-healing world of GitOps. Start with a simple pipeline, but have a plan to graduate to a real orchestrator like Atlantis. And for the love of all that is holy, stop running `terraform apply` from your laptop.

Darian Vance, Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What is the fundamental challenge of integrating Terraform into a pure GitOps workflow?

The fundamental challenge stems from Terraform’s `terraform.tfstate` file, which acts as a second source of truth alongside the `.tf` files in Git, conflicting with the GitOps principle of Git being the *sole* source of truth for desired state.

❓ How do CI/CD wrappers compare to dedicated GitOps orchestrators for managing Terraform?

CI/CD wrappers are a “push-based” approach: they automate `terraform plan` and `apply` on commits, but nothing continuously reconciles the live infrastructure against Git. Dedicated orchestrators like Atlantis are also event-driven rather than continuously reconciling; what they add is a PR-centric workflow, listening for PRs, running plans, and executing applies on explicit team approval within the Git workflow, which gives superior visibility and auditability.

❓ What is a critical pitfall to avoid when implementing Terraform GitOps, and what’s the solution?

A critical pitfall is allowing manual `terraform apply` commands from local machines, which can lead to state drift and unintended resource changes. The solution is to enforce all Terraform executions through automated, centralized processes like CI/CD pipelines or dedicated GitOps orchestrators, ensuring changes are reviewed and applied predictably.
