🚀 Executive Summary

TL;DR: Legacy ISC DHCP servers are brittle and prone to manual errors, leading to cascading failures in modern infrastructure. Modernizing with Kea/Stork requires embracing a paradigm shift to an API-first, automation-driven approach, offering solutions from high-availability database backends to full Infrastructure as Code for reliable, scalable DHCP.

🎯 Key Takeaways

  • Kea is a paradigm shift from ISC DHCP, moving from monolithic config files to JSON configuration and an API-first design, requiring new architectural considerations.
  • For production, a High-Availability (HA) Kea pair with a shared SQL database backend (e.g., PostgreSQL) is the minimum standard to prevent single points of failure and ensure continuous operation.
  • The most robust Kea deployment involves treating DHCP configuration as code, using CI/CD pipelines to generate and deploy configurations, and leveraging the Kea Control Agent API for dynamic, auditable changes.

Anyone using Stork/Kea DHCP in production?

Struggling to modernize your DHCP infrastructure with Kea/Stork? This senior engineer’s guide cuts through the noise, offering three real-world solutions—from a quick-and-dirty fix to a fully automated, production-grade setup—to solve common deployment headaches.

So You Want to Run Kea DHCP in Production? An Engineer’s Unfiltered Guide

It was 2:37 AM. The alert on my phone was screaming about a cascading failure in our primary application cluster. The symptoms made no sense—pods were failing to get network interfaces, and new VMs wouldn’t provision. After a frantic 20 minutes, we found the culprit. Not a database, not a Kubernetes scheduler, but our ancient, rickety ISC DHCP server. Someone had manually added a static reservation for a new SAN controller a week ago and, in the process, created an overlapping IP range. The server didn’t crash; it just started handing out duplicate IPs, causing absolute chaos. That was the night I decided we were done with fragile, manually edited config files. We needed something built for the modern era. That journey led us to Kea, and let me tell you, it was a journey.

The Real Problem: It’s a Paradigm Shift, Not a Drop-In Replacement

A lot of engineers I talk to get frustrated with Kea because they treat it like a 1:1 replacement for the old isc-dhcp-server. It’s not. The core reason people stumble is that they’re trading one problem set for another. You’re leaving behind the world of a single, monolithic dhcpd.conf file that you manually edit with Vim. That world is familiar, but it’s brittle, hard to audit, and impossible to automate safely.

Kea, with its JSON configuration and API-first design, moves you into a world of structured data and automation. This is incredibly powerful, but it also introduces new potential failure points: database backends, API authentication, and the Stork agent/server relationship. The key is to understand this trade-off and build your architecture to handle it. You can’t just install Kea, copy-paste some config, and expect it to be magically better. You have to embrace the new model.
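To make the shift concrete: every interaction with Kea happens through its Control Agent, which accepts JSON commands POSTed over HTTP. Here’s a minimal Python sketch of that envelope (the URL and port are assumptions; adjust them for your own deployment, and note the commented-out call requires a reachable Control Agent):

```python
import json
from urllib import request

def kea_command(command: str, service: str = "dhcp4") -> dict:
    """Build the JSON envelope the Kea Control Agent expects."""
    return {"command": command, "service": [service]}

def send(payload: dict, url: str = "http://127.0.0.1:8000/") -> dict:
    """POST a command to the Control Agent and return the parsed reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example: fetch the running DHCPv4 configuration.
payload = kea_command("config-get")
# send(payload)  # uncomment when a Control Agent is actually listening
```

Once you internalize that everything is a structured command like this, auditing and automating DHCP stops being scary.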

The Solutions: From Bleeding Triage to Bulletproof Automation

Over the years, we’ve deployed Kea in various stages of maturity. Here are the three main patterns we’ve used, from the “we need this working yesterday” fix to our current “set it and forget it” architecture.

Solution 1: The Quick Fix (Standalone with File-Based Leases)

This is the simplest way to get off the ground. You run a single Kea instance and use its default `memfile` lease backend. All your configuration lives in one JSON file, `kea-dhcp4.conf`, which you edit by hand. It’s basically a modern version of the old ISC setup.

When to use it: Lab environments, small/non-critical networks, or as a temporary measure during a migration.

The Guts: Your config is straightforward. You define your interfaces, subnets, and lease database location directly in the JSON.


{
    "Dhcp4": {
        "interfaces-config": {
            "interfaces": [ "eth0" ]
        },
        "lease-database": {
            "type": "memfile",
            "lfc-interval": 3600
        },
        "subnet4": [
            {
                "subnet": "192.168.1.0/24",
                "pools": [ { "pool": "192.168.1.100 - 192.168.1.200" } ],
                "option-data": [
                    {
                        "name": "routers",
                        "data": "192.168.1.1"
                    },
                    {
                        "name": "domain-name-servers",
                        "data": "8.8.8.8, 8.8.4.4"
                    }
                ]
            }
        ]
    }
}
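Even in this quick-fix mode, don’t deploy a hand-edited file blind. Kea can validate its own syntax, but it won’t stop you from every logical mistake like the overlapping-range incident above. A small pre-deploy sanity check (a sketch of one useful rule, not a replacement for Kea’s own validation) using only Python’s standard library:

```python
import ipaddress
import json

def check_pools(config: dict) -> list[str]:
    """Return any pool ranges that fall outside their declared subnet."""
    errors = []
    for subnet in config["Dhcp4"].get("subnet4", []):
        net = ipaddress.ip_network(subnet["subnet"])
        for pool in subnet.get("pools", []):
            # Kea pool syntax: "first-address - last-address"
            lo, hi = (ipaddress.ip_address(p.strip())
                      for p in pool["pool"].split("-"))
            if lo not in net or hi not in net:
                errors.append(pool["pool"])
    return errors

# Check the config shown above (abbreviated to the relevant keys).
conf = json.loads("""
{"Dhcp4": {"subnet4": [{"subnet": "192.168.1.0/24",
  "pools": [{"pool": "192.168.1.100 - 192.168.1.200"}]}]}}
""")
assert check_pools(conf) == []
```

A check like this catches exactly the class of mistake that caused my 2:37 AM wake-up call.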

Warning: This approach is a single point of failure. If the server `prod-dhcp-01` goes down, your network is blind. Use this to get familiar with the syntax, but don’t bet your production environment on it.

Solution 2: The Permanent Fix (High-Availability with a Database Backend)

This is the minimum standard for any real production environment. Here, you run at least two Kea servers in a high-availability (HA) pair. They share lease information via a common database backend (we use a dedicated PostgreSQL cluster, but MySQL/MariaDB works too). This ensures that if one DHCP server fails, the other can immediately take over without interruption.

When to use it: Any production network where DHCP downtime is unacceptable.

The Guts: On both servers, you’ll load the HA hook library (`libdhcp_ha.so`) and configure it with a `high-availability` parameter block, then point the `lease-database` at your SQL server. This example shows a load-balancing setup.


{
    "Dhcp4": {
        // ... other configs like interfaces ...
        "lease-database": {
            "type": "postgresql",
            "name": "kea_leases",
            "host": "prod-db-cluster.internal",
            "user": "kea_user",
            "password": "super_secret_password"
        },
        // HA is implemented as a hook library, and it requires lease_cmds.
        // Adjust the library paths for your distribution.
        "hooks-libraries": [
            {
                "library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_lease_cmds.so"
            },
            {
                "library": "/usr/lib/x86_64-linux-gnu/kea/hooks/libdhcp_ha.so",
                "parameters": {
                    "high-availability": [
                        {
                            "this-server-name": "dhcp-primary-01.techresolve.io",
                            "mode": "load-balancing",
                            "peers": [
                                {
                                    "name": "dhcp-primary-01.techresolve.io",
                                    "url": "http://10.0.1.10:8080/",
                                    "role": "primary"
                                },
                                {
                                    "name": "dhcp-secondary-01.techresolve.io",
                                    "url": "http://10.0.1.11:8080/",
                                    "role": "secondary"
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "subnet4": [
            // ... your subnet definitions ...
        ]
    }
}
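Two details in the HA parameter block trip people up constantly: `this-server-name` must exactly match one of the peer `name` values, and a load-balancing pair needs exactly one primary. A small pre-deploy sanity check (my own sketch, not part of Kea) given the HA parameter block as a Python dict:

```python
def check_ha(ha: dict) -> list[str]:
    """Sanity-check a Kea HA parameter block before deploying it."""
    errors = []
    names = [p["name"] for p in ha.get("peers", [])]
    if ha.get("this-server-name") not in names:
        errors.append("this-server-name must match one of the peer names")
    roles = [p.get("role") for p in ha.get("peers", [])]
    if ha.get("mode") == "load-balancing" and roles.count("primary") != 1:
        errors.append("load-balancing mode needs exactly one primary peer")
    return errors

ha_block = {
    "this-server-name": "dhcp-primary-01.techresolve.io",
    "mode": "load-balancing",
    "peers": [
        {"name": "dhcp-primary-01.techresolve.io",
         "url": "http://10.0.1.10:8080/", "role": "primary"},
        {"name": "dhcp-secondary-01.techresolve.io",
         "url": "http://10.0.1.11:8080/", "role": "secondary"},
    ],
}
assert check_ha(ha_block) == []
```

A typo in `this-server-name` is one of the most common reasons a fresh HA pair refuses to sync.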

At this stage, you absolutely should be running Stork. The Stork dashboard gives you visibility into lease usage, HA status, and server health. You can’t manage what you can’t see, and Stork is your eyes and ears.

Solution 3: The ‘Nuclear’ Option (Infrastructure as Code & API-Driven)

This is where we are now. We treat our DHCP configuration as code. The JSON files on the servers are considered ephemeral artifacts, not the source of truth. Our actual source of truth is a combination of YAML files in a Git repository and our IPAM system (we use NetBox).

When to use it: When you have a mature DevOps practice and need to manage DHCP at scale across multiple environments with full auditability.

The Workflow:

  1. A network engineer needs a new DHCP reservation. They don’t SSH into a server. They open a pull request in Git to add an entry to a `reservations.yaml` file.
  2. The PR is peer-reviewed and merged.
  3. A CI/CD pipeline (Jenkins/GitLab CI) triggers. A script (we use Python, but Ansible works great too) reads all our sources of truth (Git files, NetBox API for subnets) and generates a complete, validated `kea-dhcp4.conf` file.
  4. The pipeline pushes this new configuration file to our DHCP servers (`dhcp-primary-01` and `dhcp-secondary-01`) and runs a command to gracefully reload the Kea service.
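The generation step in that pipeline can be sketched in a few lines. In our real pipeline the reservations come from `reservations.yaml` and the subnets from the NetBox API; they’re inlined here (with made-up addresses) so the sketch is self-contained and runnable:

```python
import json

# Stand-ins for data that would really come from Git + NetBox.
reservations = [
    {"hw-address": "aa:bb:cc:dd:ee:01", "ip-address": "192.168.1.50"},
    {"hw-address": "aa:bb:cc:dd:ee:02", "ip-address": "192.168.1.51"},
]

def render_config(reservations: list[dict]) -> str:
    """Generate a complete kea-dhcp4.conf from source-of-truth data."""
    config = {
        "Dhcp4": {
            "interfaces-config": {"interfaces": ["eth0"]},
            "lease-database": {"type": "memfile", "lfc-interval": 3600},
            "subnet4": [{
                "subnet": "192.168.1.0/24",
                "pools": [{"pool": "192.168.1.100 - 192.168.1.200"}],
                "reservations": reservations,
            }],
        }
    }
    return json.dumps(config, indent=4)

rendered = render_config(reservations)
```

Because the whole file is regenerated from scratch on every run, the servers can never drift from what’s in Git: the generated JSON is an artifact, not something anyone edits.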

For dynamic, on-the-fly changes, our internal provisioning tools talk directly to the Kea Control Agent API to add or remove host reservations without ever touching a config file.
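A reservation change through the API boils down to building one JSON command and POSTing it to the Control Agent. This sketch builds a `reservation-add` command (note this command is provided by Kea’s `host_cmds` hook library, which must be loaded; the subnet ID and addresses are illustrative):

```python
def reservation_add(subnet_id: int, hw_address: str, ip_address: str) -> dict:
    """Build a reservation-add command (requires the host_cmds hook)."""
    return {
        "command": "reservation-add",
        "service": ["dhcp4"],
        "arguments": {
            "reservation": {
                "subnet-id": subnet_id,
                "hw-address": hw_address,
                "ip-address": ip_address,
            }
        },
    }

payload = reservation_add(1, "aa:bb:cc:dd:ee:ff", "192.168.1.60")
# POST this JSON to the Control Agent over HTTP to apply it live.
```

The change takes effect immediately, and because every call goes through our provisioning tools, every reservation is logged and attributable.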

Pro Tip: Kea’s API is powerful but can be a security risk. Make sure your Control Agent is firewalled off and only accessible from trusted management hosts or services. Do not expose it to the open network.

| Approach | Pros | Cons |
| --- | --- | --- |
| 1. Quick Fix | Simple, fast to deploy, minimal dependencies. | Single point of failure, manual process, not scalable. |
| 2. Permanent Fix | Resilient (HA), scalable, visible with Stork. | Adds DB dependency, more complex setup. |
| 3. ‘Nuclear’ Option | Fully automated, auditable, repeatable, elite scalability. | High initial setup complexity, requires mature IaC/CI-CD. |

My Final Take

Is Kea/Stork ready for production? Absolutely. But it demands that you level up your infrastructure management practices. You can’t just wing it like you might have with the old tools. Start with the HA database model—it’s the sweet spot of reliability and manageable complexity. Once you’re comfortable there, start looking at automating your configuration management. The peace of mind that comes from knowing your DHCP infrastructure is as reliable and auditable as the rest of your modern stack is worth every bit of the effort.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ Is Kea/Stork ready for production use?

Yes, Kea/Stork is absolutely ready for production, but it requires leveling up infrastructure management practices to embrace its structured data, API-first design, and automation capabilities.

❓ How does Kea/Stork compare to traditional ISC DHCP?

Kea/Stork replaces the brittle, manually edited `dhcpd.conf` files of ISC DHCP with JSON configuration and an API-first design, enabling structured data, automation, high availability, and better auditability, moving beyond the single point of failure inherent in older systems.

❓ What is a common implementation pitfall when deploying Kea?

A common pitfall is deploying Kea as a standalone instance with a file-based lease backend, creating a single point of failure. The solution is to implement a High-Availability (HA) pair with a shared database backend (like PostgreSQL) for lease information to ensure resilience.
