🚀 Executive Summary

TL;DR: A Cloudflare outage exposed a critical single point of failure in modern Linux servers: `systemd-resolved`’s default configuration. The solution involves configuring `systemd-resolved` with multiple upstream DNS providers for resilience or, as a last resort, disabling it to manage `/etc/resolv.conf` directly.

🎯 Key Takeaways

  • Many modern Linux distributions (for example Ubuntu 18.04+) use `systemd-resolved` as a local DNS stub resolver on `127.0.0.53`. Debian ships it but does not enable it by default.
  • Default `systemd-resolved` configurations often rely on a single upstream DNS provider, creating a catastrophic single point of failure during outages.
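  • Editing `/etc/resolv.conf` in place is only a temporary fix. The file is usually a symlink to a file that `systemd-resolved` regenerates under `/run`, so your changes are lost on reboot or service restart.
  • The recommended permanent solution is to list multiple upstream servers in `DNS=` in `/etc/systemd/resolved.conf`, with `FallbackDNS=` as a last-resort safety net (it is only used when no other DNS servers are known).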
  • Disabling `systemd-resolved` provides direct, static control over `/etc/resolv.conf` but sacrifices caching and other advanced features.

Cloudflare down again: twice in a single year

Summary: Cloudflare’s outage breaking your Linux servers? Learn why systemd-resolved is the real culprit and get three battle-tested fixes from a senior DevOps engineer—from the emergency hack to the permanent, resilient solution.

Your Servers Can’t Find The Internet? A DevOps War Story on DNS, Systemd, and Why That Big Outage Broke Your Stuff

It was 3:17 AM. My phone was buzzing itself off the nightstand with PagerDuty alerts. Every single one of our core services was reporting down. I jumped on Slack, and our #war-room channel was chaos. A junior engineer, bless his heart, was already SSH’d into prod-api-gateway-03 and frantically checking network connectivity. “The instance is up! I can ping IPs, but I can’t `curl google.com`! It’s not resolving anything!” he typed. The error? Temporary failure in name resolution. I’d seen this ghost before. It wasn’t our network. It was bigger. It was a single point of failure that a lot of modern Linux distros have baked right in.

The Real Culprit: Why Your Server Suddenly Went Blind

You see a major provider like Cloudflare go down, and you assume the problem is “out there.” But why can’t your server just use a different DNS service? The answer, for anyone running a recent version of Ubuntu (18.04+) or another systemd-based distro that enables it, lies in a service called systemd-resolved. (Debian ships it too, but leaves it disabled by default.)

In the old days, you’d just edit /etc/resolv.conf and list your nameservers. Simple. Now, if you look at that file, you’ll probably see this:

# This file is managed by man:systemd-resolved(8). Do not edit.
...
nameserver 127.0.0.53
options edns0 trust-ad

Your server isn’t talking to Cloudflare (1.1.1.1) or Google (8.8.8.8) directly. It’s talking to a local DNS stub resolver listening on a loopback address (127.0.0.53). That service, systemd-resolved, is the one that talks to the outside world. If it’s configured with only one upstream provider, which is often the case with cloud images, it becomes a catastrophic single point of failure. When that provider goes down, systemd-resolved has nothing to fail over to, every lookup fails, and your server is effectively blind to the entire internet.
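If you want to check which situation a host is in without reading the file by eye, here is a minimal sketch. `check_resolv_conf` is a hypothetical helper, not a standard tool; it assumes the usual `nameserver <address>` line format and only reports what the file lists. A `stub-only` result means the real upstream servers live in systemd-resolved’s own configuration, which you can inspect with `resolvectl status` (or `systemd-resolve --status` on older releases such as Ubuntu 18.04).

```shell
# Minimal sketch; check_resolv_conf is a hypothetical helper, not a standard tool.
# It classifies the "nameserver" entries in a resolv.conf-style file.
check_resolv_conf() {
    file="${1:-/etc/resolv.conf}"
    # Print the address from each "nameserver" line. Comment lines start with
    # "#" or ";", so their first field never equals "nameserver".
    servers=$(awk '$1 == "nameserver" { print $2 }' "$file")
    if [ -z "$servers" ]; then
        echo "none"
        return 1
    fi
    if [ "$servers" = "127.0.0.53" ]; then
        # Only the local systemd-resolved stub is listed; the real upstream
        # servers are configured inside systemd-resolved, not in this file.
        echo "stub-only"
        return 0
    fi
    # Otherwise list the servers on one line ($servers is left unquoted on
    # purpose so that each address becomes a separate printf argument).
    printf 'servers:'
    printf ' %s' $servers
    echo
}
```

Run without arguments, it inspects `/etc/resolv.conf`; pass another path to check a copy pulled from a different host.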

Three Levels of Fixing This Mess

Alright, enough theory. You’re in the middle of an outage and you need to get your services back online. Here are three ways to tackle this, from the battlefield triage to the long-term architectural fix.

Solution 1: The “3 AM and I Need It Working NOW” Fix

This is the dirty, temporary fix to stop the bleeding. We’re going to bypass systemd-resolved entirely by manually creating a new resolv.conf file. This will get you back online in 60 seconds.

First, confirm that /etc/resolv.conf is actually a symlink:

$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Jun 15 2022 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
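If you’d rather script that check (for a fleet-wide audit, say), something like the following sketch works on Linux. `resolv_conf_mode` is a hypothetical helper; it assumes the stock systemd file names under `/run/systemd/resolve/`: `stub-resolv.conf` (pointing at the 127.0.0.53 stub) and `resolv.conf` (listing the upstream servers directly).

```shell
# Minimal sketch; resolv_conf_mode is a hypothetical helper. It assumes the
# stock systemd-resolved file names under /run/systemd/resolve/.
resolv_conf_mode() {
    f="${1:-/etc/resolv.conf}"
    if [ -L "$f" ]; then
        # readlink prints the link text exactly as stored (relative or absolute).
        target=$(readlink "$f")
        case "$target" in
            */systemd/resolve/stub-resolv.conf) echo "systemd-stub" ;;      # 127.0.0.53 stub
            */systemd/resolve/resolv.conf)      echo "systemd-upstream" ;;  # real upstream list
            *)                                  echo "symlink: $target" ;;
        esac
    elif [ -f "$f" ]; then
        echo "static"   # a plain file that you manage yourself
    else
        echo "missing"
    fi
}
```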

We’re going to remove that link and create a real file in its place with known-good, redundant DNS servers.

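# Use sudo tee as shown: "sudo echo ... > /etc/resolv.conf" would fail,
# because the > redirect runs in your unprivileged shell, not under sudo
sudo rm /etc/resolv.conf
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 9.9.9.9" | sudo tee -a /etc/resolv.conf
echo "nameserver 1.1.1.1" | sudo tee -a /etc/resolv.conf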

Your server should immediately be able to resolve DNS again. Go run that `curl google.com` and breathe a sigh of relief.
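The `rm` followed by writes leaves a short window with no resolv.conf at all. A slightly safer variant, sketched below assuming GNU coreutils, writes a temporary file next to the target and renames it into place: `rename(2)` replaces the symlink itself instead of writing through it into `/run`. `write_static_resolv_conf` is a hypothetical name. It also warns past three servers, because glibc’s resolver only reads the first three `nameserver` lines.

```shell
# Minimal sketch; write_static_resolv_conf is a hypothetical helper that
# replaces a resolv.conf path with a static file in one atomic rename.
write_static_resolv_conf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: write_static_resolv_conf TARGET NAMESERVER..." >&2
        return 2
    fi
    target="$1"
    shift
    # glibc's resolver reads at most three "nameserver" lines (MAXNS = 3).
    if [ "$#" -gt 3 ]; then
        echo "warning: only the first 3 nameservers will be used" >&2
    fi
    # Create the temp file next to the target so mv is a same-filesystem rename.
    tmp=$(mktemp "$(dirname "$target")/.resolv.conf.XXXXXX") || return 1
    printf 'nameserver %s\n' "$@" > "$tmp"
    # mktemp creates files as 0600; resolv.conf must be world-readable.
    chmod 644 "$tmp"
    # The rename replaces the symlink entry rather than writing through it.
    mv -f "$tmp" "$target"
}
```

Sourced into a root shell, a call would look like `write_static_resolv_conf /etc/resolv.conf 8.8.8.8 9.9.9.9 1.1.1.1`.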

Warning: Treat this as a temporary fix! You have bypassed systemd-resolved and left the host in a non-standard state, and anything that manages /etc/resolv.conf (a package upgrade or your configuration-management run, for example) may put the symlink back and leave you at square one. Use this to get out of the immediate crisis, then implement a permanent solution.

Solution 2: The “Let’s Do This Properly” Fix

This is the solution I recommend for 99% of teams. We’re not going to fight systemd; we’re going to configure it correctly to be resilient. The goal is to tell systemd-resolved to use multiple upstream DNS providers so that if one fails, it automatically uses the next one in the list.

Edit the configuration file at /etc/systemd/resolved.conf. Most of the options will be commented out with a ‘#’.

# As root or with sudo
nano /etc/systemd/resolved.conf

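Find the [Resolve] section and make the following changes. Put several space-separated addresses in DNS=; those are the servers systemd-resolved actually uses. FallbackDNS= is only consulted when no other DNS server is known (neither from DNS= nor from per-interface settings such as DHCP), so treat it as a last-resort safety net rather than a second tier.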

[Resolve]
DNS=1.1.1.1 8.8.8.8 9.9.9.9
FallbackDNS=1.0.0.1 8.8.4.4
# ... other options

Save the file and then restart the service to apply the changes:

sudo systemctl restart systemd-resolved
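If you manage hosts with configuration management, a drop-in file is often easier to template than the shipped resolved.conf: systemd-resolved also reads `*.conf` files from `/etc/systemd/resolved.conf.d/`, after the main file. The sketch below is a hypothetical helper; the file name `50-upstream-dns.conf` is an arbitrary choice, and the directory is a parameter so you can try it outside `/etc`. Keep `DNS=` commented out in resolved.conf itself so the two lists don’t get mixed.

```shell
# Minimal sketch; write_resolved_dropin and the file name 50-upstream-dns.conf
# are hypothetical choices. CONFDIR is normally /etc/systemd/resolved.conf.d.
write_resolved_dropin() {
    if [ "$#" -lt 2 ] || [ -z "$2" ]; then
        echo "usage: write_resolved_dropin CONFDIR 'DNS LIST' ['FALLBACK LIST']" >&2
        return 2
    fi
    confdir="$1"
    dns="$2"
    fallback="${3:-}"
    mkdir -p "$confdir" || return 1
    dropin="$confdir/50-upstream-dns.conf"
    {
        echo "# Generated; keep DNS= commented out in resolved.conf itself"
        echo "[Resolve]"
        echo "DNS=$dns"
        # Only emit FallbackDNS= when a fallback list was supplied.
        if [ -n "$fallback" ]; then
            echo "FallbackDNS=$fallback"
        fi
    } > "$dropin" || return 1
    chmod 644 "$dropin"
}
```

Run it as root against `/etc/systemd/resolved.conf.d` with the same server lists as above, then restart systemd-resolved the same way.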

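Now your system is configured to use Cloudflare, Google, and Quad9 as its global resolvers. If the server currently in use stops answering, systemd-resolved switches to the next one in the list, so a single provider’s outage costs you at worst a brief delay instead of all name resolution. This is the robust, modern way to handle DNS on your hosts.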
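One catch if you used Solution 1 first: `/etc/resolv.conf` is now a plain file, so on a stock Ubuntu setup glibc keeps going straight to the servers you hard-coded and bypasses the newly configured systemd-resolved. Assuming the standard layout (check with `ls -l /etc/resolv.conf`), pointing the file back at the stub is a one-liner; `restore_stub_symlink` below is a hypothetical wrapper around it. It uses the absolute path, which resolves to the same file as the relative link Ubuntu ships.

```shell
# Minimal sketch; restore_stub_symlink is a hypothetical wrapper. It assumes the
# standard stub file location used by systemd-resolved.
restore_stub_symlink() {
    target="${1:-/etc/resolv.conf}"
    # -s: symbolic link, -f: replace whatever is at $target (file or link),
    # -n: treat an existing link as a plain file instead of following it.
    ln -sfn /run/systemd/resolve/stub-resolv.conf "$target"
}
```

Afterwards, `resolvectl query example.com` (systemd-resolved’s own lookup; `resolvectl` needs systemd 239 or later) and `getent hosts example.com` (the glibc path most applications take) should both succeed.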

Solution 3: The “I Don’t Trust It” Nuclear Option

I get it. Some of us old-school sysadmins just don’t like services like systemd-resolved abstracting away core functionality. If you want direct, file-based control over your DNS configuration and want to banish systemd-resolved for good, this is for you. It’s a bit more involved, but it puts you back in the driver’s seat.

  1. Stop and disable the service for good:
    sudo systemctl stop systemd-resolved
    sudo systemctl disable systemd-resolved
  2. Unlink the managed resolv.conf file:
    sudo rm /etc/resolv.conf
  3. Create your own static resolv.conf file (pipe into sudo tee, since a plain > redirect would run without root):
    echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf
    echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf
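If you want the three steps as one reviewable script, here is a minimal sketch. `run` and `disable_resolved` are hypothetical helper names, and `DRY_RUN=1` is a convention of this sketch, not a systemd feature: with it set, the function prints each action instead of performing it. `systemctl disable --now` stops and disables the unit in one command.

```shell
# Minimal sketch; run and disable_resolved are hypothetical helpers.
# With DRY_RUN=1 they only print what would be done.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_resolved() {
    target="${1:-/etc/resolv.conf}"
    [ "$#" -gt 0 ] && shift
    # Default to the two servers used in step 3 if none are given.
    [ "$#" -gt 0 ] || set -- 1.1.1.1 8.8.8.8
    run systemctl disable --now systemd-resolved || return 1
    # Remove the symlink first so the write below can't go through it.
    run rm -f "$target" || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ write %s:' "$target"
        printf ' %s' "$@"
        echo
    else
        printf 'nameserver %s\n' "$@" > "$target" && chmod 644 "$target"
    fi
}
```

Run it once with `DRY_RUN=1` to review the plan, then as root without it.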

This approach works, but you lose the caching and other advanced features of systemd-resolved. Your server will now behave like it did in the days of Ubuntu 16.04. For some, that’s a feature, not a bug.

Which Fix is Right for You?

Here’s a quick cheat sheet to help you decide.

| Solution | Pros | Cons |
|---|---|---|
| 1. Quick Fix | Fastest way to restore service. | Temporary and easily undone. High risk of recurrence. |
| 2. Permanent Fix | Resilient, idempotent, works with the modern OS architecture. | Requires understanding systemd. |
| 3. Nuclear Option | Direct, simple, predictable file-based control. | Goes against the grain of the OS. Loses caching benefits. |

At TechResolve, we mandate Solution 2 in all our infrastructure-as-code templates. It’s the only way to build resilient systems that don’t wake you up at 3 AM. Redundancy isn’t just for your load balancers and databases; it starts right here, with DNS on every single host.
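To enforce a rule like that in CI, a small lint over your templated resolved.conf (or drop-in) can fail the build when too few upstream servers are configured. This is a minimal sketch with hypothetical helper names and an arbitrary default minimum of two; it only counts entries on uncommented `DNS=` lines inside the `[Resolve]` section, ignores `FallbackDNS=` on purpose, and does not merge drop-in directories the way systemd-resolved does.

```shell
# Minimal sketch; count_dns_servers and lint_resolved_conf are hypothetical helpers.
# Count the addresses on uncommented DNS= lines inside [Resolve].
count_dns_servers() {
    awk '
        /^\[/ { in_resolve = ($0 == "[Resolve]") }
        in_resolve && /^DNS=/ { sub(/^DNS=/, ""); n += split($0, parts) }
        END { print n + 0 }
    ' "$1"
}

# Fail when fewer than MIN (default 2) upstream servers are configured.
lint_resolved_conf() {
    file="$1"
    min="${2:-2}"
    if [ ! -r "$file" ]; then
        echo "FAIL: cannot read $file"
        return 1
    fi
    count=$(count_dns_servers "$file")
    if [ "$count" -lt "$min" ]; then
        echo "FAIL: $file sets $count DNS= server(s), want at least $min"
        return 1
    fi
    echo "OK: $file sets $count DNS= server(s)"
}
```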

Darian Vance - Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ Why did a Cloudflare outage cause my Linux servers to experience ‘Temporary failure in name resolution’?

The outage caused issues because `systemd-resolved`, acting as your server’s local DNS stub resolver (127.0.0.53), was likely configured with Cloudflare as its sole upstream DNS provider. When Cloudflare went down, `systemd-resolved` had no other provider to fail over to, so every lookup failed and your server could not resolve domain names.

❓ How does `systemd-resolved` differ from the traditional `/etc/resolv.conf` method for DNS configuration?

`systemd-resolved` acts as a local DNS stub resolver on `127.0.0.53`, managing upstream DNS servers and providing caching. The traditional method involved directly listing nameservers in `/etc/resolv.conf`, which is now usually a symlink to a file generated by `systemd-resolved`, making direct edits temporary.

❓ What is a common pitfall when trying to fix DNS resolution issues on modern Linux systems?

A common pitfall is editing `/etc/resolv.conf` in place. The file is usually a symlink into `/run/systemd/resolve/`, so the edit lands in a file that `systemd-resolved` regenerates, and it is lost on reboot or service restart, reverting the system to its previous, vulnerable state. The proper solution is to configure `systemd-resolved` via `/etc/systemd/resolved.conf`.
