How, and why, we scaled up to a Multi-DNS architecture (Part 2)


This is the second of a 3-part series and deals with the technical background of Multi-DNS and its strategies. If you want to read more about the “why”, start with part 1, or skip forward to part 3 for a deep dive into actually making the transition.

Daniel Mittelman


Before we jump right in, let’s understand what Multi-DNS actually means.

A Multi-DNS architecture is one that utilizes authoritative DNS nameservers from two or more providers.

Most, if not all, DNS service providers use multiple nameservers for high availability and redundancy, allowing them to mitigate most issues without clients even being aware. Still, the added value of Multi-DNS is that it works even when an issue affects an entire provider’s network, such as a targeted attack against the provider as a whole, a faulty software patch that affects all the provider’s nodes, or a botched infrastructure upgrade that causes a provider-wide outage.

It is important to note that there are no master/slave roles when using multiple DNS providers. More precisely, there is no failover pattern: clients and downstream DNS servers see all the nameservers as a single DNS source, and it is up to us to keep them in sync.

Multi-DNS architecture is commonly achieved with one of three strategies:

In the first strategy, Primary/Secondary, we configure the domain’s authoritative nameservers as a mix of DNS servers from our providers (the primary and secondary ones).

Nameserver configuration on GoDaddy

Both providers we defined act as the authoritative source for DNS queries for this domain, and traffic is split across the servers in a random fashion, like so:

When a user asks for the IP address of mysite.com, their ISP tracks down the domain’s authoritative nameservers and asks a random one to resolve the record
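To make the traffic split concrete, here is a minimal Python sketch of a resolver choosing among a mixed set of authoritative nameservers. The hostnames and provider labels are made up for illustration, and the purely random choice is a simplification that follows the description above; real resolvers apply their own selection logic.

```python
import random

# Hypothetical nameserver hostnames for two providers (not real servers).
NAMESERVERS = {
    "ns1.provider-a.example": "provider-a",
    "ns2.provider-a.example": "provider-a",
    "ns1.provider-b.example": "provider-b",
    "ns2.provider-b.example": "provider-b",
}


def pick_nameserver(down_providers=frozenset(), rng=random):
    """Pick a random authoritative nameserver, skipping unreachable ones.

    Returns the hostname of a reachable nameserver, or None when every
    provider in the set is down.
    """
    candidates = list(NAMESERVERS)
    rng.shuffle(candidates)  # random order stands in for resolver selection
    for host in candidates:
        if NAMESERVERS[host] not in down_providers:
            return host
    return None
```

Calling `pick_nameserver({"provider-a"})` always returns one of provider B’s servers, which is the whole point: a provider-wide outage still leaves the domain resolvable.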

While both providers share the same role as authoritative DNS providers, we choose one to act as the primary and the other as the secondary. This distinction determines which one we update directly and which one receives updates through zone transfers:

The zone administrator sends updates to the primary provider. The secondary provider is configured to transfer the zone from the primary, and pulls the zone changes whenever they’re available
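The update flow can be sketched in a few lines of Python. This is an in-memory model, not a real zone transfer: the class names are hypothetical, and the only piece borrowed from actual DNS is the idea that the secondary compares the zone’s serial number with its own copy to decide whether to pull.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    serial: int = 0
    records: dict = field(default_factory=dict)


class PrimaryProvider:
    """Hypothetical primary: the only place the administrator edits."""

    def __init__(self):
        self.zone = Zone()

    def update(self, name, value):
        # Every change bumps the serial so the secondary can detect it.
        self.zone.records[name] = value
        self.zone.serial += 1


class SecondaryProvider:
    """Hypothetical secondary: read-only copy refreshed from the primary."""

    def __init__(self):
        self.zone = Zone()

    def refresh(self, primary):
        """Pull the zone if the primary's serial is newer; return True if pulled."""
        if primary.zone.serial > self.zone.serial:
            self.zone = Zone(primary.zone.serial, dict(primary.zone.records))
            return True
        return False
```

Note that `SecondaryProvider` has no `update` method of its own, which mirrors the limitation of this strategy: all changes have to go through the primary.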

This strategy is pretty straightforward and easy to maintain, as there’s a single source of truth to update. However, what happens if the primary provider is down? While a few providers technically allow manual updates to a zone they host as secondary, most do not, so in most cases we cannot change records while the primary is unavailable.

This is a good approach, but not good enough.

The second strategy, Hidden Primary, is useful in specific use cases, usually for DNS that is managed on on-premises infrastructure and exposed to the world through another DNS provider. The primary provider, usually on-premises, is designated as an internal DNS that receives direct updates, and the secondary is the authoritative DNS provider that is published to the world.

When an Internet user asks for mysite.com’s IP address, the response will eventually originate from the public provider

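Updating works similarly to the Primary/Secondary approach: the administrator updates the hidden primary, and the public provider pulls the changes through zone transfers.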

This strategy allows one to protect their internal (“hidden”) DNS infrastructure, which also serves as the source of truth. However, this approach does not provide extra redundancy for Internet users.

Moving on to strategy #3.

The third strategy, Primary/Primary, is similar to the first one in the sense that two or more DNS providers share the role of being the authoritative DNS source for our domain. In this configuration, however, both providers act as the primary source.

DNS resolution is identical to the first strategy; the main difference lies in the way DNS updates are propagated to both providers:

When using Primary/Primary, the admin is responsible for updating both providers
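As a rough illustration of what “updating both providers” means in practice, the sketch below applies a single change to every provider and reports which ones failed. `InMemoryProvider` and its `upsert` method are hypothetical stand-ins for whatever API client each provider offers; this is not the approach described in part 3.

```python
class InMemoryProvider:
    """Hypothetical stand-in for a DNS provider's API client."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def upsert(self, record_name, value):
        self.records[record_name] = value


def update_all(providers, record_name, value):
    """Apply one change to every provider.

    Returns the names of providers whose update failed, so the caller
    knows the providers may now be out of sync.
    """
    failed = []
    for provider in providers:
        try:
            provider.upsert(record_name, value)
        except Exception:
            # Keep going: the other providers should still get the change.
            failed.append(provider.name)
    return failed
```

A non-empty return value is the interesting case: one provider has the new record and another does not, and nothing in DNS itself will tell us so.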

This strategy offers everything that we set out to find in terms of redundancy: customers are served by multiple providers, and there is no dependency between the two. No DNS provider knows about the other, and no single provider can affect the availability of the other.

For all the goodies this strategy provides (which is why we chose it), it introduces a new challenge: without a single source of truth, how do we reliably manage DNS records? This question will, of course, be answered in the next part.

Having DNS servers that return different responses for the same domain can wreak havoc and may go undetected for a long time during an incident, as DNS is never the “usual suspect”. This can also cause intermittent issues, which are the hardest to diagnose.
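One way to catch this kind of divergence early is to compare what each provider serves. The Python sketch below assumes the records from each provider have already been fetched into plain dictionaries (how they are fetched is out of scope here), and the function name is made up for illustration.

```python
def find_drift(zone_a, zone_b):
    """Compare two {record_name: iterable_of_values} mappings.

    Returns {record_name: (values_in_a, values_in_b)} for every record
    whose value sets differ, including records missing on one side.
    """
    drift = {}
    for name in sorted(set(zone_a) | set(zone_b)):
        values_a = set(zone_a.get(name, ()))
        values_b = set(zone_b.get(name, ()))
        if values_a != values_b:
            drift[name] = (values_a, values_b)
    return drift
```

An empty result means both providers agree; anything else is worth alerting on before users start seeing intermittent failures.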


Continue to part 3, where we’ll talk about actually executing the migration process.