🚀 Executive Summary

TL;DR: Spoke VPCs in a hybrid AWS hub-and-spoke network often fail to resolve on-premises DNS names because each VPC’s resolver is isolated. The recommended solution combines Route 53 Resolver Rules (shared via RAM) for resolution between the cloud and on-prem, and Centralized Private Hosted Zones for efficient inter-VPC name resolution.

🎯 Key Takeaways

  • AWS VPCs use an isolated private DNS resolver (at the `.2` address of the VPC CIDR) that cannot inherently resolve on-premises names or other VPCs’ private DNS records.
  • Route 53 Resolver Rules (Outbound Endpoints to forward queries to on-prem DNS, Inbound Endpoints to accept queries from on-prem), coupled with AWS Resource Access Manager (RAM) sharing, provide the modern, AWS-native solution for hybrid DNS resolution.
  • Centralized Private Hosted Zones (PHZs) are the ideal approach for managing and resolving private DNS records between different VPCs, even across multiple AWS accounts, complementing Resolver Rules for cloud-to-cloud communication.

[Diagram: Correct DNS architecture with a hybrid hub and spoke]

Unlock seamless DNS resolution in your AWS hub-and-spoke hybrid network with three practical, battle-tested solutions for connecting on-prem and cloud resources.

Navigating the DNS Maze: A Real-World Guide to Hybrid Hub-and-Spoke Architecture

I still remember the 3 AM PagerDuty alert. A critical batch processing job, running in one of our new “spoke” VPCs, was failing. The error? “Cannot resolve hostname `prod-sql-cluster.corp.local`.” My heart sank. I could `nslookup` that name just fine from our “hub” VPC, where the Direct Connect to our on-prem datacenter lived. But from the spoke? Dead air. For two hours, we chased our tails, convinced it was a security group or a NACL issue. It wasn’t. It was DNS, the silent killer of network connectivity, and a classic symptom of a misconfigured hub-and-spoke architecture. We’ve all been there, and it’s why getting this right from day one is non-negotiable.

The Root of the Problem: Why Your Spoke VPC Can’t See On-Prem

Before we dive into fixes, let’s understand the “why.” When you create a VPC in AWS, it gets its own private DNS resolver at the `.2` address of your VPC CIDR (e.g., `10.20.0.2`). This resolver is brilliant, but it’s isolated by design. It only knows about resources within its own VPC and public DNS records. It has absolutely no clue about your on-prem DNS servers (`onprem-dc-01.corp.local`) or the Private Hosted Zones associated with other VPCs. So, when your EC2 instance in a spoke VPC asks “Who is `prod-sql-cluster.corp.local`?”, the local resolver just shrugs and says, “Never heard of it.” The query never even makes it to the hub VPC where the real magic is supposed to happen.

Three Paths Out of the DNS Woods

I’ve built and fixed this pattern more times than I can count. Over the years, I’ve found there are really three ways to solve this, each with its own trade-offs. Let’s call them The Scalpel, The Weaver, and The Sledgehammer.

Solution 1: The Scalpel – Route 53 Resolver Rules

This is the modern, AWS-native, and usually the best approach for hybrid resolution. You treat DNS routing like network routing. You create forwarding rules that tell the Route 53 Resolver where to send specific queries.

How it works:

  1. In the Hub VPC: You deploy a Route 53 Resolver Outbound Endpoint. This gives you an ENI (Elastic Network Interface) in your hub that can reach your on-prem network.
  2. Create a Rule: You create a “Forwarding Rule” that says “Any query for the domain `corp.local` should be forwarded to the IP addresses of my on-prem DNS servers (e.g., 172.16.10.5, 172.16.10.6).”
  3. Associate the Rule: You associate this rule with your Hub VPC. Now, any instance in the hub can resolve on-prem names.
  4. The Critical Step – SHARE: This is the part everyone misses. You use AWS Resource Access Manager (RAM) to share that Resolver Rule with your spoke accounts or organization. Then, in each spoke account, you accept the share and associate the rule with your spoke VPCs.
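
The four steps above can be sketched in Terraform. This is a sketch only: the subnet, security group, and VPC references, the organization ARN, and the provider alias `aws.spoke` are placeholders for your environment; the on-prem DNS IPs are reused from step 2.

```hcl
# Step 1: Outbound Endpoint in the hub (two subnets in different AZs for resilience)
resource "aws_route53_resolver_endpoint" "outbound" {
  name               = "hub-outbound"
  direction          = "OUTBOUND"
  security_group_ids = [aws_security_group.resolver.id]

  ip_address { subnet_id = aws_subnet.hub_a.id }
  ip_address { subnet_id = aws_subnet.hub_b.id }
}

# Step 2: forward corp.local queries to the on-prem DNS servers
resource "aws_route53_resolver_rule" "corp_local" {
  name                 = "forward-corp-local"
  domain_name          = "corp.local"
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id

  target_ip { ip = "172.16.10.5" }
  target_ip { ip = "172.16.10.6" }
}

# Step 3: associate the rule with the hub VPC
resource "aws_route53_resolver_rule_association" "hub" {
  resolver_rule_id = aws_route53_resolver_rule.corp_local.id
  vpc_id           = aws_vpc.hub.id
}

# Step 4: share the rule with the organization via RAM
resource "aws_ram_resource_share" "dns_rules" {
  name                      = "shared-resolver-rules"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "corp_local" {
  resource_share_arn = aws_ram_resource_share.dns_rules.arn
  resource_arn       = aws_route53_resolver_rule.corp_local.arn
}

resource "aws_ram_principal_association" "org" {
  resource_share_arn = aws_ram_resource_share.dns_rules.arn
  principal          = "arn:aws:organizations::111111111111:organization/o-example"
}

# In each spoke account (separate state, after the share is accepted):
resource "aws_route53_resolver_rule_association" "spoke" {
  provider         = aws.spoke
  resolver_rule_id = var.shared_resolver_rule_id # ID of the shared rule
  vpc_id           = var.spoke_vpc_id
}
```

With organization-wide sharing enabled in RAM, the share is auto-accepted and only the spoke-side rule association is left to do per VPC.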

Pro Tip: Don’t forget the other direction! You’ll also need an Inbound Endpoint in the hub so your on-prem servers can forward queries to resolve AWS hostnames (like EC2 private DNS names) back into the cloud.
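
The inbound side is a single extra resource. Again a sketch; the subnet and security group references are placeholders:

```hcl
# Inbound Endpoint: on-prem DNS servers forward AWS zones to these ENI IPs
resource "aws_route53_resolver_endpoint" "inbound" {
  name               = "hub-inbound"
  direction          = "INBOUND"
  security_group_ids = [aws_security_group.resolver.id]

  # Two subnets in different AZs for resilience
  ip_address { subnet_id = aws_subnet.hub_a.id }
  ip_address { subnet_id = aws_subnet.hub_b.id }
}
```

Then point your on-prem conditional forwarders for your AWS domains at the ENI IPs this endpoint creates (and make sure the security group allows DNS, TCP and UDP 53, from your on-prem ranges).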

This is my go-to solution. It’s clean, managed by AWS, and scales beautifully without needing to manage any servers.

Solution 2: The Weaver – Centralized Private Hosted Zones

This solution is less about hybrid on-prem connectivity and more about resolving private DNS records between your spoke VPCs. Let’s say one spoke runs your application servers (`app.prod.cloud`) and another runs shared tools (`jenkins.tools.cloud`). You don’t want to create complex peering for DNS.

How it works:

  1. In the Hub/Shared Services Account: Create a Private Hosted Zone (PHZ) for your domain, for example, `prod.cloud`.
  2. Associate VPCs: This is the magic. You can then associate this single PHZ with VPCs across your entire organization, even if they are in different AWS accounts.
  3. Central Management: Now, when you create a record like `api-gateway.prod.cloud` in that central PHZ, any VPC associated with it can resolve that name instantly. No forwarding rules, no custom servers.
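
A cross-account association takes three pieces: the zone in the owner account, an authorization from the owner, and the association from the spoke account. A sketch, assuming a shared services account owns the zone and a spoke account uses a placeholder provider alias `aws.spoke_a`:

```hcl
# In the shared services account: the central PHZ
resource "aws_route53_zone" "prod_cloud" {
  name = "prod.cloud"

  vpc {
    vpc_id = aws_vpc.shared_services.id
  }

  # Cross-account VPC associations are managed outside this resource
  lifecycle {
    ignore_changes = [vpc]
  }
}

# Still in the zone-owner account: authorize a spoke VPC in another account
resource "aws_route53_vpc_association_authorization" "spoke_a" {
  zone_id = aws_route53_zone.prod_cloud.zone_id
  vpc_id  = var.spoke_a_vpc_id
}

# In the spoke account (separate state): complete the association
resource "aws_route53_zone_association" "spoke_a" {
  provider = aws.spoke_a
  zone_id  = var.shared_phz_zone_id
  vpc_id   = aws_vpc.spoke_a.id
}
```

Once the association exists, the authorization can be deleted; it is only needed at association time.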

This approach complements Solution 1 perfectly. Use Resolver Rules for on-prem traffic and Centralized PHZs for inter-VPC cloud traffic.

| Scenario | Recommended Solution |
| --- | --- |
| Spoke VPC needs to resolve `prod-sql-cluster.corp.local` (on-prem) | Route 53 Resolver Rules (Solution 1) |
| App spoke VPC needs to resolve `grafana.monitoring-spoke.internal` (another spoke) | Centralized Private Hosted Zones (Solution 2) |

Solution 3: The Sledgehammer – Rolling Your Own DNS Forwarders

Before Route 53 Resolver existed, this was the only way. I call it the sledgehammer because it’s heavy and requires manual work, but it will absolutely smash the problem into submission. I only recommend it now if you have complex, non-standard DNS requirements or are dealing with a vendor that demands it.

How it works:

  1. Deploy Servers: Spin up two small EC2 instances in your Hub VPC. Install a DNS server like BIND or Unbound on them.
  2. Configure Forwarding: Configure these servers to be conditional forwarders. Set them up so that queries for `corp.local` go to your on-prem DNS, queries for `prod.cloud` go to the AWS VPC `.2` resolver, and so on.
  3. Update DHCP Options: This is the big, disruptive step. For every single spoke VPC, you have to create a new DHCP Option Set. In this set, you change the `domain-name-servers` from “AmazonProvidedDNS” to the private IP addresses of your two new EC2 DNS servers.
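
For step 2, the conditional forwarding on those EC2 servers might look like this in Unbound. A sketch only: the zone names and on-prem IPs reuse the examples above, and `10.100.0.2` stands in for the hub VPC’s `.2` resolver address:

```
# unbound.conf snippet: conditional forwarders

# Queries for the on-prem domain go to the on-prem DNS servers
forward-zone:
    name: "corp.local."
    forward-addr: 172.16.10.5
    forward-addr: 172.16.10.6

# Queries for the cloud domain go back to the VPC .2 resolver
forward-zone:
    name: "prod.cloud."
    forward-addr: 10.100.0.2
```

Everything else falls through to whatever root or upstream resolution you configure, which is one more thing you now own.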

```hcl
# Example Terraform for a DHCP Option Set
resource "aws_vpc_dhcp_options" "dns_forwarders" {
  domain_name_servers = [
    "10.100.10.53", # Your custom DNS forwarder 1
    "10.100.11.53"  # Your custom DNS forwarder 2
  ]

  tags = {
    Name = "custom-dns-forwarder-set"
  }
}

resource "aws_vpc_dhcp_options_association" "spoke_a" {
  vpc_id          = aws_vpc.spoke_a.id
  dhcp_options_id = aws_vpc_dhcp_options.dns_forwarders.id
}
```

Warning: This solution puts you on the hook for everything. Patching the DNS servers, ensuring their high availability, and troubleshooting their configs is now your problem, not AWS’s. It’s a significant increase in operational overhead.

My Final Take

Don’t overcomplicate it. For 99% of the hybrid cloud setups I see today, a combination of Solution 1 (Route 53 Resolver Rules) for north-south (on-prem to cloud) traffic and Solution 2 (Centralized PHZs) for east-west (cloud to cloud) traffic is the gold standard. It’s scalable, resilient, and lets AWS manage the underlying infrastructure. Start there. Only pull out the sledgehammer if you have a truly bizarre edge case. Your on-call self at 3 AM will thank you for it.

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ Why do instances in an AWS spoke VPC struggle to resolve on-premises hostnames?

Instances in a spoke VPC use their local VPC DNS resolver, which is isolated by design and only knows about its own VPC resources and public DNS, lacking knowledge of on-premises DNS servers or other private zones.

❓ How do Route 53 Resolver Rules compare to rolling your own DNS forwarders in AWS?

Route 53 Resolver Rules are an AWS-managed, scalable, and resilient service requiring no server management, making them the preferred modern solution. Rolling your own DNS forwarders (e.g., BIND on EC2) incurs significant operational overhead for patching, high availability, and configuration, and is generally considered a legacy approach.

❓ What is a critical step often missed when implementing Route 53 Resolver Rules for a hybrid hub-and-spoke?

A critical step often missed is using AWS Resource Access Manager (RAM) to share the Resolver Rule from the hub account with the spoke accounts or organization, and then associating the shared rule with the spoke VPCs.
