🚀 Executive Summary

TL;DR: The JVM’s DNS cache can keep Java applications connecting to stale IP addresses long after a record has changed, especially during database failovers or in dynamic cloud environments. The permanent fix is to set the JVM’s DNS cache Time To Live (TTL) explicitly to a reasonable value, such as 60 seconds. Restarting the application clears the cache as an immediate stopgap.

🎯 Key Takeaways

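  • The JVM caches positive DNS lookups forever when networkaddress.cache.ttl is -1 or when a security manager is installed, preventing applications from seeing DNS updates. Without a security manager the default is implementation-specific (30 seconds in OpenJDK), so set the TTL explicitly rather than relying on whatever your runtime inherits.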
  • The recommended permanent fix is to set the JVM’s DNS cache TTL to a sane value (e.g., 60 seconds) using either the java.security file (networkaddress.cache.ttl) or a JVM argument (-Dsun.net.inetaddr.ttl).
  • Restarting the Java application is an immediate, albeit temporary, fix for clearing the JVM’s in-memory DNS cache during an outage.
  • Setting the DNS cache TTL to 0 disables caching entirely. This guarantees the freshest DNS record, but it can severely impact performance and overload DNS infrastructure, which makes it unsuitable for most production use.

A stubborn JVM DNS cache can make a simple database failover a nightmare. This guide explains why your Java app isn’t seeing DNS updates and provides three real-world solutions for DevOps engineers.

“Are you sure you’ve been a network engineer before?” – A DevOps Guide to the Infamous JVM DNS Cache

It was 2 AM. A routine RDS failover for our primary PostgreSQL database, `prod-db-01`, had just completed. The DBA team updated the CNAME `postgres.techresolve.internal` to point to the new primary, `prod-db-02`. The network team confirmed the change propagated; a quick `dig postgres.techresolve.internal` from the bastion host showed the new IP address. Yet, our main Java API service was throwing a fit, flooding the logs with `PSQLException: Connection to host refused`. It was still trying to connect to the old, now-offline database. The dreaded question came through on the incident call: “Darian, are you sure your service is resolving DNS correctly? Are you sure you’ve been a network engineer before?” I gritted my teeth. I knew exactly what it was. It wasn’t the network. It was Java.

So, What’s Actually Going On? The “Why”

This isn’t a bug; it’s a feature, albeit a feature from a different era of the internet. The Java Virtual Machine (JVM) keeps its own DNS cache, separate from the operating system’s resolver, and it can cache positive DNS lookups forever. I’m not exaggerating. In that mode, once the JVM resolves a hostname to an IP address, it holds onto that mapping for the entire life of the process unless you explicitly tell it otherwise.

This behavior is controlled by a Java security property called `networkaddress.cache.ttl` (Time To Live). A value of `-1` means “cache forever”. If the property isn’t set, the JVM caches forever when a security manager is installed and for an implementation-specific period otherwise (30 seconds in OpenJDK). In a modern, dynamic cloud environment where IPs change due to auto-scaling, failovers, and blue-green deployments, a “forever” cache is a recipe for disaster, and an inherited default you never checked is not much better.
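
To check which setting actually governs a given JVM, a small diagnostic along these lines may help. It is a minimal sketch, not tooling from the incident above: the class name `DnsCacheTtlReport` and its `describe` helper are hypothetical, and it assumes OpenJDK’s lookup order, where the `networkaddress.cache.ttl` security property is consulted first and the `sun.net.inetaddr.ttl` system property only as a fallback.

```java
import java.security.Security;

// Hypothetical diagnostic: reports which DNS cache TTL setting, if any, is configured.
class DnsCacheTtlReport {

    // Pure helper so the decision logic can be exercised without real JVM settings.
    // Assumption: the security property takes precedence over the system property.
    static String describe(String securityProp, String systemProp) {
        if (securityProp != null && !securityProp.trim().isEmpty()) {
            return interpret("networkaddress.cache.ttl", securityProp.trim());
        }
        if (systemProp != null && !systemProp.trim().isEmpty()) {
            return interpret("sun.net.inetaddr.ttl", systemProp.trim());
        }
        return "no explicit TTL configured; the JVM default applies";
    }

    private static String interpret(String name, String raw) {
        try {
            int ttl = Integer.parseInt(raw);
            if (ttl < 0) {
                return name + "=" + ttl + " (cache forever)";
            }
            if (ttl == 0) {
                return name + "=0 (caching disabled)";
            }
            return name + "=" + ttl + " (cache for " + ttl + " seconds)";
        } catch (NumberFormatException e) {
            return name + "=" + raw + " (not a number)";
        }
    }

    public static void main(String[] args) {
        // Read the two real settings from the running JVM and describe them.
        System.out.println(describe(
                Security.getProperty("networkaddress.cache.ttl"),
                System.getProperty("sun.net.inetaddr.ttl")));
    }
}
```

Run it with the same JDK, `java.security` file, and `-D` flags as the service you are debugging; otherwise it describes a different configuration from the one that matters.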

The Fixes: From Duct Tape to Solid Engineering

You’re in the middle of an outage and you need to get the service back online. Let’s walk through the options, from the immediate fix to the permanent solution.

Solution 1: The “It’s 3 AM and I Just Want to Go to Bed” Fix

The quickest, dirtiest way to solve this is to force the JVM to clear its cache. And the most effective way to do that? Restart the application.

$ kubectl rollout restart deployment/my-java-api

When the JVM process restarts, its in-memory DNS cache is wiped. On the next connection attempt, it will perform a fresh DNS lookup, get the new IP for `postgres.techresolve.internal`, and connect to the correct database. It’s ugly, it causes a brief service interruption, but in a production-down scenario, it works. It’s the classic “turn it off and on again,” and it’s a valid first step in an emergency.

Solution 2: The Permanent, Grown-Up Fix

Restarting services during every failover is not a sustainable strategy. The correct, permanent solution is to configure the JVM’s DNS cache TTL to a sane value. A value of 60 seconds is a common and safe starting point for most cloud-native applications. You have two primary ways to do this:

1. Set it globally in the java.security file:

You can edit the `java.security` file in your Java installation’s security directory: `$JAVA_HOME/jre/lib/security` on Java 8 and earlier, `$JAVA_HOME/conf/security` on Java 9 and later. This is a good option for controlling behavior across all applications on a golden AMI or base container image.

# Java 8 and earlier: $JAVA_HOME/jre/lib/security/java.security
# Java 9 and later:   $JAVA_HOME/conf/security/java.security

# A negative value such as -1 means "cache forever"
# We'll set it to 60 seconds.
networkaddress.cache.ttl=60

2. Set it as a JVM argument on startup (Recommended for containerized apps):

This is my preferred method for services running in Docker or Kubernetes. It’s explicit, lives with your application’s deployment configuration, and doesn’t require modifying the base image.

# Add this flag to your java command
java -Dsun.net.inetaddr.ttl=60 -jar my-application.jar

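Pro Tip: These are two different properties, not one property with two spellings. `networkaddress.cache.ttl` is a security property, so it belongs in the `java.security` file and has no effect if you pass it as `-Dnetworkaddress.cache.ttl=60`. `sun.net.inetaddr.ttl` is a JDK-specific system property (`-D` flag) that the JVM consults only when `networkaddress.cache.ttl` isn’t set. If both are set, the security property wins. Don’t get tripped up by this inconsistency.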
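
A third route, not covered above, is to set the security property from code before the application performs its first lookup. The sketch below assumes you control the application’s entry point; `DnsTtlBootstrap` and `applyTtl` are hypothetical names. The JVM reads this property when its address cache policy is first initialized, so the call has to run early in `main`, before anything resolves a hostname.

```java
import java.security.Security;

// Hypothetical bootstrap helper: sets the DNS cache TTL programmatically.
class DnsTtlBootstrap {

    // Sets networkaddress.cache.ttl for this JVM. Negative values ("cache
    // forever") are rejected here on purpose, since avoiding them is the goal.
    static void applyTtl(int seconds) {
        if (seconds < 0) {
            throw new IllegalArgumentException("TTL must be 0 or greater, got " + seconds);
        }
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // Must run before the first hostname lookup in this process.
        applyTtl(60);
        System.out.println("networkaddress.cache.ttl="
                + Security.getProperty("networkaddress.cache.ttl"));
        // ... start the real application here ...
    }
}
```

This keeps the setting next to the code that depends on it, at the cost of being easy to call too late. The `java.security` file and the `-D` flag do not have that ordering risk.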

Solution 3: The “I Don’t Trust Caching” (Nuclear) Option

In some rare debugging scenarios, or if you have an application that needs to resolve DNS on every single request, you can disable the cache entirely. You do this by setting the TTL to 0.

# This will force a new DNS lookup for every connection
java -Dsun.net.inetaddr.ttl=0 -jar my-application.jar

Be very careful with this approach. Disabling the cache means your application will be hitting your DNS resolver (like CoreDNS in Kubernetes or Amazon Route 53) for every single outbound connection. This can add latency and put unnecessary load on your DNS infrastructure. It’s a powerful diagnostic tool, but rarely the right long-term production solution.
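
To put a rough number on that load, here is a back-of-the-envelope estimate. It is a sketch under simplifying assumptions that are mine, not measurements: a single hostname, one resolution per new outbound connection (no connection pool reusing sockets), and a steady connection rate. `DnsLookupEstimate` and the 200 connections per second figure are hypothetical.

```java
// Hypothetical estimate of DNS lookups per minute for one hostname from one JVM.
class DnsLookupEstimate {

    // ttlSeconds == 0: every connection triggers a lookup.
    // ttlSeconds  > 0: about one lookup each time the cached entry expires.
    // ttlSeconds  < 0: cached forever, so no lookups after the first one.
    static long approxLookupsPerMinute(long connectionsPerSecond, int ttlSeconds) {
        long connectionsPerMinute = connectionsPerSecond * 60;
        if (ttlSeconds == 0) {
            return connectionsPerMinute;
        }
        if (ttlSeconds < 0) {
            return 0;
        }
        long expiriesPerMinute = (60 + ttlSeconds - 1) / ttlSeconds; // ceiling of 60 / ttl
        return Math.min(connectionsPerMinute, expiriesPerMinute);
    }

    public static void main(String[] args) {
        long rate = 200; // hypothetical new connections per second
        for (int ttl : new int[] {0, 30, 60}) {
            System.out.println("ttl=" + ttl + "s -> ~"
                    + approxLookupsPerMinute(rate, ttl) + " lookups/minute");
        }
    }
}
```

Under these assumptions, a TTL of 0 means about 12,000 lookups per minute for that one hostname, against roughly one per minute with a 60-second TTL. Multiply by the number of pods and hostnames to see why the resolver notices.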

Summary: Choosing Your Weapon

Here’s a quick breakdown to help you decide which path to take.

| Solution | Pros | Cons |
| --- | --- | --- |
| 1. Restart Service | Fastest immediate fix during an incident. No code/config changes needed. | Causes downtime. Doesn’t prevent the problem from recurring. |
| 2. Set TTL to 60s | The recommended permanent fix. Balances performance and responsiveness to DNS changes. | Requires a configuration change and redeploy. There’s still a potential 60-second delay. |
| 3. Set TTL to 0 | Guarantees the freshest DNS record is always used. Useful for debugging. | Poor performance. Can add significant load to DNS servers. Not recommended for production. |

Final Thoughts

The JVM’s DNS caching behavior is a classic “gotcha” that every engineer working with Java in the cloud will eventually run into. It’s a reminder that we can’t just operate at the application layer; we have to understand the quirks of our runtime and the platform it runs on. So next time someone questions your networking skills during a failover, you can calmly point them to the `sun.net.inetaddr.ttl` property and be the hero of the 2 AM incident call.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What causes Java applications to fail after a DNS change like a database failover?

The Java Virtual Machine (JVM) has its own DNS cache. When it is configured to cache positive lookups indefinitely (networkaddress.cache.ttl=-1, which is also the default when a security manager is installed), the application never re-resolves the hostname, so it keeps connecting to the old, stale IP even after DNS records have been updated.

❓ How does setting -Dsun.net.inetaddr.ttl=60 compare to -Dsun.net.inetaddr.ttl=0?

Setting -Dsun.net.inetaddr.ttl=60 configures the JVM to cache DNS lookups for 60 seconds, balancing performance with responsiveness to DNS changes. In contrast, -Dsun.net.inetaddr.ttl=0 disables the cache entirely, forcing a new DNS lookup for every connection, which significantly increases DNS query load and latency, making it generally unsuitable for production environments.

❓ What is a common implementation pitfall when configuring JVM DNS cache TTL?

A common pitfall is using the incorrect property name depending on the configuration method. When setting it in the java.security file, use networkaddress.cache.ttl, but when setting it as a JVM argument, use -Dsun.net.inetaddr.ttl. Mismatched property names will result in the configuration not being applied.
