🚀 Executive Summary
TL;DR: systemd-resolved often causes mysterious DNS failures on Linux servers, particularly with internal corporate domains, due to its strict DNSSEC validation for unsigned zones. The core problem is `systemd-resolved` returning `SERVFAIL` instead of a record when an upstream server doesn’t support DNSSEC correctly for a specific zone. Solutions involve configuring `DNSSEC=allow-downgrade`, disabling the local stub listener, or completely disabling `systemd-resolved` to use a static `/etc/resolv.conf`.
🎯 Key Takeaways
- systemd-resolved acts as a local DNS stub resolver on `127.0.0.53`, handling DNS requests, caching, and critically, DNSSEC validation.
- Strict DNSSEC enforcement by `systemd-resolved` can lead to `SERVFAIL` errors for internal corporate domains that are not DNSSEC-signed, even if public DNS resolution works.
- Common fixes include setting `DNSSEC=allow-downgrade` in `/etc/systemd/resolved.conf`, setting `DNSStubListener=no` and symlinking `/etc/resolv.conf` to `/run/systemd/resolve/resolv.conf`, or completely disabling `systemd-resolved` and creating a static `/etc/resolv.conf`.
Is systemd-resolved causing mysterious DNS failures on your Linux servers? A Senior DevOps Engineer breaks down why it happens, explains the DNSSEC confusion, and provides three practical, real-world fixes to get your services back online.
Wrestling the Hydra: Taming systemd-resolved and DNSSEC in the Real World
I still remember the 3 AM page. A new fleet of Kubernetes nodes, `kube-worker-prod-us-east-1c-01` through `20`, had just been provisioned with our standard Ubuntu 22.04 image. Everything looked green, until our CI/CD pipeline started failing with the most infuriating error: “Could not resolve host: internal.artifacts.corp”. The junior engineer on call was pulling his hair out. He could SSH into the box, `ping 8.8.8.8` worked, `curl google.com` worked, but anything on our internal corporate domain was dead in the water. After 30 minutes of frantic debugging, we found the culprit. His `dig` command was getting a `SERVFAIL` status, and `/etc/resolv.conf` pointed to a weird localhost address: `127.0.0.53`. Welcome, my friends, to the wonderful world of `systemd-resolved`.
So, What’s Actually Going On Here?
In modern Linux distributions (like Ubuntu 18.04+), `systemd-resolved` has taken over DNS management. It’s not just a simple config file anymore. It acts as a local DNS stub resolver. This means that instead of your applications talking directly to your DNS server (like your router or a public one), they talk to a little service running on the machine itself at `127.0.0.53`. This little service then forwards the request, caches the result, and—this is the critical part—can perform DNSSEC validation.
DNSSEC is a security feature that cryptographically verifies DNS responses are authentic. It’s a great idea, but it’s the source of so many headaches. If `systemd-resolved` is configured to enforce DNSSEC and it tries to resolve a domain against an upstream DNS server that is misconfigured or doesn’t support DNSSEC correctly for a specific zone, it won’t just pass the record along. It will return a `SERVFAIL` error. It chooses to fail securely rather than give you a potentially untrustworthy answer. This is what was happening to our new nodes: our internal corporate DNS wasn’t signed, but a change in the cloud image had defaulted `DNSSEC` to a stricter setting.
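One way to confirm this failure mode is to ask the same question twice: once through the stub at `127.0.0.53` and once directly against the upstream server. The sketch below assumes `dig` is installed; the helper names `dns_status` and `compare_stub_vs_upstream` are made up for illustration, and the hostname and `10.0.1.2` are simply the ones from this story.

```shell
#!/bin/sh
# Hypothetical helper: pull the status field (NOERROR, SERVFAIL, ...)
# out of dig output supplied on stdin.
dns_status() {
    sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: ask the local stub and an upstream server the same
# question and print both statuses side by side.
# Usage: compare_stub_vs_upstream internal.artifacts.corp 10.0.1.2
compare_stub_vs_upstream() {
    name="$1"
    upstream="$2"
    stub_status=$(dig +time=2 +tries=1 "$name" @127.0.0.53 | dns_status)
    up_status=$(dig +time=2 +tries=1 "$name" "@$upstream" | dns_status)
    echo "stub=$stub_status upstream=$up_status"
}
```

If the stub reports `SERVFAIL` while the upstream reports `NOERROR`, that points at `systemd-resolved` rather than at the upstream server, which is the pattern we were seeing.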
Let’s get this fixed. I’ve got three approaches for you, ranging from a gentle tweak to a scorched-earth policy.
Solution 1: The “Tame the Beast” Approach
This is my preferred method. We’re not fighting `systemd-resolved`; we’re just telling it how to behave in our environment. It’s the most “correct” way to handle it if you want to keep the benefits like caching.
First, see what `resolved` is actually doing. Use `resolvectl` to get a status report:
$ resolvectl status
You’ll see a firehose of information, but look for the “Global” and per-interface sections to see which DNS servers are actually being used. Now, let’s configure it. Open up its configuration file:
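To cut the firehose down, a small filter can help. This is only a sketch: `resolved_summary` is a hypothetical helper name, and the exact layout of `resolvectl status` differs between systemd versions, so treat the pattern as a starting point rather than a parser.

```shell
#!/bin/sh
# Hypothetical filter: keep the section headers ("Global", "Link N (...)")
# plus any line that mentions DNS servers or DNSSEC.
resolved_summary() {
    grep -E '^(Global|Link [0-9]+)|DNS Server|DNSSEC'
}

# Usage on a real host:
#   resolvectl status | resolved_summary
```

Reading from stdin keeps the filter easy to try against saved output from a broken node.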
$ sudo nano /etc/systemd/resolved.conf
You’ll see a lot of commented-out lines. We’re interested in a few key ones. Uncomment them and set them like this:
[Resolve]
DNS=1.1.1.1 8.8.8.8 10.0.1.2
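# ^ These are your DNS servers. I've put in Cloudflare, Google, and our internal one.
#   Note: resolved treats them as interchangeable, so every server listed here must be
#   able to answer for your internal zones. If the public ones can't, list only the internal one.
FallbackDNS=1.0.0.1 8.8.4.4
# ^ These are only used when no other DNS servers are known (none set here, none from the network).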
DNSSEC=allow-downgrade
# ^ This is the magic bullet for most problems. It means "Try to validate with DNSSEC, but if the upstream server doesn't support it, that's okay, just give me the record anyway."
# Other options are 'yes' (strict) or 'no' (disabled).
Save the file and then restart the service to apply the changes:
$ sudo systemctl restart systemd-resolved
Your DNS should now be working as expected, balancing security with real-world practicality.
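If you would rather not edit the main file, the same setting can live in a drop-in under `/etc/systemd/resolved.conf.d/`. The sketch below is a minimal version of that idea: the function name `write_dnssec_dropin` and the file name `10-dnssec.conf` are my own choices, and the directory is a parameter so you can try it in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets DNSSEC=allow-downgrade.
# With no argument it targets /etc/systemd/resolved.conf.d (needs root).
write_dnssec_dropin() {
    dir="${1:-/etc/systemd/resolved.conf.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/10-dnssec.conf" <<'EOF'
[Resolve]
DNSSEC=allow-downgrade
EOF
}

# After writing the drop-in on a real host:
#   sudo systemctl restart systemd-resolved
#   resolvectl query internal.artifacts.corp
```

A drop-in keeps the change separate from the packaged `resolved.conf`, which tends to make it easier to bake into an image or remove later.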
Solution 2: The “Put It on a Leash” Fix
Okay, maybe you don’t trust the local stub resolver. You want the benefits of `systemd-resolved` managing the configuration (e.g., pulling DNS servers from DHCP), but you want your applications to talk directly to the DNS servers, just like the old days.
In this scenario, we disable the local stub listener. This stops `resolved` from listening on `127.0.0.53`.
Step 1: Edit the configuration file again.
$ sudo nano /etc/systemd/resolved.conf
Step 2: Set `DNSStubListener` to `no`.
[Resolve]
DNSStubListener=no
Step 3: Here’s the crucial part. By default, `/etc/resolv.conf` is a symlink to `/run/systemd/resolve/stub-resolv.conf`, which contains `nameserver 127.0.0.53`. Since we just disabled that, we need to point it to the *real* resolv.conf file that `systemd-resolved` generates.
# Remove the old symlink pointing to the stub resolver
$ sudo rm /etc/resolv.conf
# Create a new symlink to the generated file that lists the real upstream servers
$ sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
# Restart the service
$ sudo systemctl restart systemd-resolved
Now, if you `cat /etc/resolv.conf`, you’ll see the actual DNS server IPs. Your system is back to a more traditional DNS setup, but `systemd` is still managing the configuration file behind the scenes.
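Before and after a change like this, it helps to know which mode a box is actually in. Here is a rough check, assuming the standard paths mentioned above; `resolv_conf_mode` is a hypothetical helper, and it only looks at the immediate symlink target.

```shell
#!/bin/sh
# Hypothetical helper: classify what a resolv.conf path points at.
# Prints one of: stub, upstream, other-symlink, static-file, missing.
resolv_conf_mode() {
    path="${1:-/etc/resolv.conf}"
    if [ -L "$path" ]; then
        case "$(readlink "$path")" in
            */stub-resolv.conf) echo "stub" ;;
            */systemd/resolve/resolv.conf) echo "upstream" ;;
            *) echo "other-symlink" ;;
        esac
    elif [ -e "$path" ]; then
        echo "static-file"
    else
        echo "missing"
    fi
}

# Usage: resolv_conf_mode            # checks /etc/resolv.conf
```

After the steps in this section you would expect `upstream`; a freshly provisioned Ubuntu node would normally report `stub`.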
Solution 3: The “Nuke it From Orbit” Option
I’m not gonna lie, I’ve done this on a Friday afternoon when I just wanted to go home. This is the “I’m done with this magic, give me back my static config file” approach. This completely removes `systemd-resolved` from the equation.
Warning: This is a sledgehammer. You will lose all dynamic DNS configuration. If your server gets its DNS servers via DHCP and they change, your server’s DNS will break until you manually update the file again. Proceed with caution.
Step 1: Stop and permanently disable the service.
$ sudo systemctl stop systemd-resolved
$ sudo systemctl disable systemd-resolved
Step 2: Get rid of the symlink at `/etc/resolv.conf`.
$ sudo rm /etc/resolv.conf
Step 3: Create a new, old-fashioned, static `/etc/resolv.conf` file.
$ sudo nano /etc/resolv.conf
Add your nameservers directly into this file (swap in your internal resolver, such as our `10.0.1.2`, if internal names need to resolve):
nameserver 1.1.1.1
nameserver 8.8.8.8
search mycompany.corp ec2.internal
And that’s it. No more service, no more magic. Just a plain text file. It’s hacky, it’s not “correct” by modern standards, but on a server with a static IP and a critical job to do, sometimes simple and predictable is best.
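Since nothing is managing the file for you anymore, a quick sanity check after editing does not hurt. The sketch below uses two hypothetical helpers, `list_nameservers` and `check_nameserver_count`; the limit of three reflects how the glibc resolver reads `nameserver` lines.

```shell
#!/bin/sh
# Hypothetical helper: print the addresses from the nameserver lines
# of a resolv.conf-style file (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical check: complain if there are no nameservers, and warn when
# there are more than three, since glibc only consults the first three.
check_nameserver_count() {
    count=$(list_nameservers "$1" | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "no nameservers"
        return 1
    fi
    if [ "$count" -gt 3 ]; then
        echo "warning: $count nameservers, only the first 3 are used"
        return 0
    fi
    echo "ok: $count nameservers"
}
```

Running `check_nameserver_count` with no argument checks `/etc/resolv.conf` itself.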
Ultimately, `systemd-resolved` isn’t a bad tool, it’s just a complex one with defaults that can bite you in a hybrid environment. Knowing how to diagnose and configure it is a key skill for any of us working on modern Linux systems. Hopefully, this saves you from your own 3 AM DNS mystery.
🤖 Frequently Asked Questions
❓ Why is systemd-resolved causing DNS issues on my Linux server, especially for internal domains?
systemd-resolved acts as a local DNS stub resolver and can enforce strict DNSSEC validation. If an upstream DNS server does not support DNSSEC correctly, or an internal zone is not DNSSEC-signed, `systemd-resolved` may return a `SERVFAIL` error instead of the record, leading to resolution failures for those domains.
❓ How does systemd-resolved compare to traditional DNS configurations?
systemd-resolved provides a local caching stub resolver, supports DNSSEC, and dynamically manages DNS settings (e.g., from DHCP). Traditional configurations involve applications directly querying upstream DNS servers defined in a static `/etc/resolv.conf`, lacking features like local caching or host-level DNSSEC validation.
❓ What is a common implementation pitfall when configuring systemd-resolved for internal domains?
A common pitfall is running with a strict `DNSSEC=yes` setting (whether it comes from an image default or an explicit config), which causes `SERVFAIL` for internal corporate domains that are not DNSSEC-signed. The solution is to set `DNSSEC=allow-downgrade` in `/etc/systemd/resolved.conf` so that `systemd-resolved` still attempts validation but falls back to returning the record when the upstream server doesn’t support DNSSEC.