
A Better Way to a Dual, Redundant DNS


+1-855-GET-NSONE (6766) | www.ns1.com | @nsoneinc

Managed DNS services not only relieve enterprises of the burden of running their own DNS, they often provide value added capabilities not available in purely RFC compliant (typically open source) implementations. With advanced traffic routing features, managed DNS providers are addressing technical needs in the marketplace that the original designers of DNS did not anticipate, and that current contributors to the DNS standard have chosen not to address. In short, vanilla standards compliant DNS does not provide the capabilities needed by online enterprises.

The basic “table stakes” characteristics of an enterprise-class managed DNS service are high reliability, high availability, high performance and traffic management. However, even the most robust DNS infrastructure is not immune to outages. Outages may be localized, with certain DNS servers in the network not responding, or, less commonly, system-wide. A system-wide DNS failure can take an entire business offline: the equivalent of a power failure in every one of its data centers.

While there is a great deal of redundancy built into the architectures of top tier DNS providers, there remains the exposure that they are a single point of failure in the infrastructure of enterprises that rely solely on one provider.

Given that delivering high scale, high performance online services is a core business requirement of many enterprises, it is not surprising that reliability and infrastructure engineers are taking a closer look at their DNS. This white paper discusses the approaches engineers can take to reduce their exposure to DNS outages and in doing so, also gain valuable application performance improvements.


Managed DNS Architectures

A robust managed DNS has to be highly available (uptime of 99.999% or better), highly reliable and high performing (response times under 50 ms). The foundational elements for achieving this are:

• Multiple points of presence, with strong global geographic coverage
• Globally anycasted network
• Well provisioned (servers, storage, bandwidth)
• Uses top tier networks and colocation providers
• Overprovisioned to absorb traffic spikes and DDoS attacks
• Round the clock monitoring and incident response

Finally, the value of a DNS system architected and run by experts is not to be underestimated. Few enterprises have the technical depth and resources to do this really well.

Single Point of Failure

Top tier managed DNS systems have a great deal of built in redundancy and fault tolerance, but all managed DNS providers have experienced problems to some degree with resulting impact on their customers. While it rarely happens, providers can experience a complete loss of service. Often it is the case that enterprises that have experienced a loss of DNS then decide to bring on a second source. In short, no system is failure proof, so from the point of view of a subscribing enterprise, their managed DNS does represent a single point of failure. The question every enterprise should address is whether bringing in a second DNS service is worth the effort and cost.

Managed DNS providers do publish availability statistics and the industry norm exceeds 5 nines (99.999% uptime) – about 5 minutes per year downtime. However, this top line number does not provide the detail needed to properly assess the business risk associated with relying on a sole source provider. It is not clear for example what the probabilities and impact are of degraded performance in certain regions or probabilities of a system wide outage of various duration. What enterprises can do is look at the issue from the perspective of their business. What would a 30-minute loss of DNS cost the business in terms of revenue, reputation damage, support costs and recovery? Compare that with the cost of a second source DNS. Enterprises for whom online services are mission critical will generally conclude that the cost ratios are in the range of 10:1 – one order of magnitude. Put another way, the cost of one outage is roughly ten times the annual cost of a second service. That would put the breakeven point at about 1 major DNS outage every 10 years.
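The arithmetic behind these figures can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 10:1 cost ratio is simply the figure quoted above expressed in arbitrary cost units.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year, in minutes, for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def breakeven_outages_per_year(outage_cost: float, annual_second_dns_cost: float) -> float:
    """Outage rate at which the expected yearly loss equals the yearly
    cost of a second DNS service."""
    return annual_second_dns_cost / outage_cost


# "Five nines" availability.
downtime = annual_downtime_minutes(0.99999)

# Assumed 10:1 ratio: one outage costs ten times a year of second-source DNS.
rate = breakeven_outages_per_year(outage_cost=10.0, annual_second_dns_cost=1.0)

print(f"{downtime:.2f} minutes of downtime per year")
print(f"break-even at one major outage every {1 / rate:.0f} years")
```

An enterprise can substitute its own estimate of outage cost and second-source pricing to see where its break-even point falls.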

DNS Basics

The architecture of a single source DNS is shown in Figure 1. The subscribing enterprise (the Registrant) contacts a domain Registrar to have their DNS information entered into the public domain name system. The core of that information comprises the name of their domain (e.g. globalenterprise.com) and the internet names and IP addresses of the authoritative domain name servers that provide the IP addresses of their websites (e.g. www.globalenterprise.com).

The registrar ensures this information is entered into the top level domain (TLD) servers for the domain of the registrant (.com in this case). Thus, when a client types www.globalenterprise.com into their browser, their request goes to a recursive server that contacts the .com TLD server to find the IP address of the authoritative server for globalenterprise.com. The recursive server then contacts the authoritative server to find the IP address of the globalenterprise.com website, which it then returns to the client.
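The lookup sequence just described can be mimicked with a few in-memory tables. This is a toy model rather than a real resolver: it skips caching and the root servers, and every name, address and function in it is illustrative (the addresses are borrowed from Figure 1).

```python
# Delegation data held by the .com TLD servers (illustrative values).
TLD_DELEGATIONS = {
    "globalenterprise.com": ["ns1.provider1", "ns2.provider1"],
}

# Addresses of the authoritative name servers (illustrative values).
NAME_SERVER_ADDRESSES = {"ns1.provider1": "1.1.1.1", "ns2.provider1": "2.2.2.2"}

# Records each authoritative server would answer with (illustrative values).
AUTHORITATIVE_RECORDS = {
    "1.1.1.1": {"www.globalenterprise.com": "8.8.8.8"},
    "2.2.2.2": {"www.globalenterprise.com": "8.8.8.8"},
}


def registered_domain(hostname: str) -> str:
    """Return the last two labels, e.g. www.globalenterprise.com -> globalenterprise.com."""
    return ".".join(hostname.split(".")[-2:])


def resolve(hostname: str) -> str:
    """Mimic a recursive server: get the name servers from the TLD,
    then ask each authoritative server in turn for the address."""
    name_servers = TLD_DELEGATIONS[registered_domain(hostname)]
    for name_server in name_servers:
        address = NAME_SERVER_ADDRESSES[name_server]
        records = AUTHORITATIVE_RECORDS.get(address, {})
        if hostname in records:
            return records[hostname]
    raise LookupError(f"no authoritative answer for {hostname}")


print(resolve("www.globalenterprise.com"))
```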

If the authoritative server does not respond, the recursive server retries after waiting 1.5 seconds. If there is no response to that second request, it tries again after 3 seconds. It then tries another authoritative server in the list. Managed DNS providers typically set up four authoritative name server addresses (each actually served by many servers) that are independently globally anycasted. Thus it is highly likely that the client will get a response even if the initial request times out. That is why managed providers can correctly claim 5 nines availability.
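The retry behavior described above can be turned into a rough estimate of the delay an end user sees. The sketch below is one reading of that schedule, not a description of any particular resolver: it assumes the recursive server waits 1.5 seconds and then 3 seconds on one authoritative server before moving to the next, and it ignores normal network round-trip time. The function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def seconds_until_answer(
    server_responds: Sequence[bool],
    timeouts: Sequence[float] = (1.5, 3.0),
) -> Optional[float]:
    """Time spent waiting on unresponsive servers before the first
    responsive one is queried, or None if no server responds.

    server_responds lists the authoritative servers in the order the
    recursive server tries them; True means the server answers.
    """
    elapsed = 0.0
    for responds in server_responds:
        if responds:
            return elapsed
        # Assumed: every timeout in the schedule is used up before failing over.
        elapsed += sum(timeouts)
    return None


# First server down, second server up.
print(seconds_until_answer([False, True, True, True]))
```

Under these assumptions each unresponsive server in the list adds several seconds of delay, which is why a localized outage shows up to users as slow lookups rather than outright failures.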

However, if there is a localized outage the result can be degraded service for users located in the affected region. This is a more common occurrence in regions where the internet infrastructure is less robust. Network connections to the in region DNS servers may go down, resulting in timeouts for the end users and longer response times as their requests get forwarded to servers in other regions.

In general, the sources of DNS problems are:

• Network problems (link failures, router misconfiguration)
• DDoS attacks
• DNS software errors
• DNS configuration errors

DNS configuration errors are perhaps the most common but are not failures in the DNS itself. Enterprises that either run their own DNS or use a managed service are responsible for their own records. One advantage of managed services is the advice and expertise the providers offer, thus helping their customers avoid mistakes.

Figure 1: Domain administrator registers the name servers into the .com TLD. The domain admin submits example.com with name servers ns1.provider1 (1.1.1.1) and ns2.provider1 (2.2.2.2) to the domain registrar, which enters them into the .com TLD. The example.com records (www.example.com 8.8.8.8, mail.example.com 9.9.9.9) are synchronized between ns1.provider1 and ns2.provider1 via AXFR.

Establishing a redundant DNS is straightforward in some respects, but complex in others. It is straightforward to add one or more DNS servers to the list submitted to the Registrar. This enters the additional authoritative servers in the global domain name system. Thus recursive servers receive an expanded list of authoritative servers when they query the TLD for the enterprise domain. This provides redundancy with respect to the main causes of outages – network infrastructure issues, DDoS and software errors.

However, DNS record management is a major source of dual DNS complexity. If the records are not managed correctly, the redundant DNS system will be less reliable than the original single source solution.

Record Management in a Dual DNS

Dual DNS record management is not complex when both DNS systems use RFC-compliant records. Both systems are registered in the global DNS. One is designated as primary (master) and the other as secondary (slave). This designation defines the source and destination of record changes; it does not affect the order or priority in which the registered servers receive queries. A record transfer via the AXFR protocol is triggered when records change in the primary DNS, so that the same changes take effect in the secondary DNS. Figure 1 shows the AXFR between primary and secondary in a single-provider configuration. The process is essentially the same with two providers, as long as neither implements non-standard features in the records. A similar approach can be used by defining both DNS systems as secondary to an unregistered primary DNS (a “hidden master”).
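The primary/secondary relationship can be illustrated with a small in-memory model. This is a simplified sketch and not an AXFR implementation: it assumes the primary bumps a zone serial number on each change and notifies its secondaries, which then copy the whole zone if the serial is newer. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Zone:
    serial: int = 1
    records: Dict[str, str] = field(default_factory=dict)


class Secondary:
    def __init__(self) -> None:
        self.zone = Zone(serial=0)

    def on_notify(self, primary_zone: Zone) -> bool:
        """Copy the full zone if the primary's serial is newer (AXFR-like)."""
        if primary_zone.serial <= self.zone.serial:
            return False  # already up to date, no transfer needed
        self.zone = Zone(primary_zone.serial, dict(primary_zone.records))
        return True


class Primary:
    def __init__(self, secondaries: List[Secondary]) -> None:
        self.zone = Zone()
        self.secondaries = secondaries

    def set_record(self, name: str, address: str) -> None:
        """Change a record, bump the serial, and notify every secondary."""
        self.zone.records[name] = address
        self.zone.serial += 1
        for secondary in self.secondaries:
            secondary.on_notify(self.zone)


secondary = Secondary()
primary = Primary([secondary])
primary.set_record("www.example.com", "8.8.8.8")
primary.set_record("mail.example.com", "9.9.9.9")
print(secondary.zone)
```

Because all changes flow in one direction, the records only need to be edited in one place, which is what keeps this arrangement simple.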

Enterprises considering a dual DNS are almost certainly using the advanced traffic management capabilities offered by top tier managed providers. These capabilities include geo based routing, server monitoring, load management and telemetry based traffic management. These capabilities are not provided for in the DNS standards, so the managed providers have implemented proprietary extensions that enable these high value attributes. The result is the standard AXFR protocol cannot be used to synchronize records across two advanced DNS providers.

There are two methods for keeping records in sync in such a dual DNS setup (Figure 2). One is simply to manage both systems independently. This effectively doubles the administrative overhead. Worse, it is prone to configuration errors that result in lookup failures: the very scenario dual DNS is supposed to prevent.
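When two systems are managed independently, a periodic comparison of their record sets is one way to catch mistakes early. The sketch below assumes each provider's records have already been exported into a set of (name, type, value) tuples; how that export is done depends on the provider and is not shown. The function name and sample data are hypothetical.

```python
from typing import Dict, Set, Tuple

Record = Tuple[str, str, str]  # (name, record type, value)


def find_drift(records_a: Set[Record], records_b: Set[Record]) -> Dict[str, Set[Record]]:
    """Return the records that appear in one provider's set but not the other's."""
    return {
        "only_in_a": records_a - records_b,
        "only_in_b": records_b - records_a,
    }


provider1 = {
    ("www.example.com", "A", "8.8.8.8"),
    ("mail.example.com", "A", "9.9.9.9"),
}
provider2 = {
    ("www.example.com", "A", "8.8.8.8"),
    ("mail.example.com", "A", "9.9.9.8"),  # deliberate typo made in the second system
}

drift = find_drift(provider1, provider2)
print(drift)
```

A check like this only covers standard records. Proprietary traffic management rules have no common representation and would need provider-specific comparison.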

The other approach entails using middleware that interfaces with the APIs of each DNS provider, thus providing a single interface the domain admin uses to update the records in both systems. This becomes progressively more complex if the middleware needs to support the more advanced intelligent routing features. It is generally left up to the customer to develop this middleware but some dual provider middleware implementations have been made generally available.
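In outline, such middleware might look like the following. This is a minimal sketch under stated assumptions: `DnsProvider`, `InMemoryProvider` and `DualDnsMiddleware` are hypothetical names, the in-memory class stands in for a real provider API client, and only plain records are handled, not advanced routing features.

```python
from typing import Dict, List, Protocol, Tuple


class DnsProvider(Protocol):
    """Interface each provider adapter is assumed to expose."""

    name: str

    def upsert_record(self, fqdn: str, rtype: str, value: str) -> None: ...


class InMemoryProvider:
    """Stand-in for a real provider API client (hypothetical)."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.records: Dict[Tuple[str, str], str] = {}

    def upsert_record(self, fqdn: str, rtype: str, value: str) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name}: API unavailable")
        self.records[(fqdn, rtype)] = value


class DualDnsMiddleware:
    """Single entry point that pushes each change to every provider."""

    def __init__(self, providers: List[DnsProvider]) -> None:
        self.providers = providers

    def upsert_record(self, fqdn: str, rtype: str, value: str) -> Dict[str, str]:
        """Apply one change everywhere and report a status per provider."""
        status: Dict[str, str] = {}
        for provider in self.providers:
            try:
                provider.upsert_record(fqdn, rtype, value)
                status[provider.name] = "ok"
            except Exception as exc:  # report the failure instead of hiding it
                status[provider.name] = f"failed: {exc}"
        return status


first = InMemoryProvider("provider1")
second = InMemoryProvider("provider2")
middleware = DualDnsMiddleware([first, second])
print(middleware.upsert_record("www.example.com", "A", "8.8.8.8"))
```

Reporting a status per provider matters here: a change that succeeds on one system and fails on the other is exactly the out-of-sync condition the middleware exists to prevent, so the caller needs to know when to retry.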

Figure 2: Synchronizing DNS records with two independent advanced DNS providers

Dual DNS with NS1

NS1 can be implemented in conjunction with another managed DNS. Because the NS1 APIs are comprehensive and easy to use, it is relatively straightforward to create a middleware bridge to NS1. NS1 has worked with a number of its customers to help them do just that. The AXFR process on NS1 also makes it easier to transfer records to another system by removing the non-standard elements from the records as they are transferred. This preserves the advanced routing capabilities in the NS1 DNS while enabling synchronization with a second DNS.

One advantage of dual DNS is a reduction in average latency on queries from recursive servers to authoritative servers. Many recursive servers implement a feature called smoothed round-trip time (SRTT). SRTT is a mechanism by which the recursive server keeps track of the response times it gets from the authoritative servers listed in its cache and sends more queries to the faster-responding servers. NS1 customers have recorded measurable and significant latency reductions after implementing dual DNS.
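The effect of SRTT on query distribution can be seen in a toy simulation. Real resolvers differ in the details, so this is only a sketch: it assumes an exponentially weighted average of response times, always queries the server with the lowest average, and slowly ages the other servers' scores so they are retried occasionally. The class name, smoothing constants and latencies are all hypothetical.

```python
from typing import Dict, List


class SrttSelector:
    """Toy SRTT-based selection among authoritative servers."""

    def __init__(self, servers: List[str], alpha: float = 0.3, decay: float = 0.98) -> None:
        self.alpha = alpha  # weight given to the newest measurement
        self.decay = decay  # aging applied to servers that were not queried
        # A score of 0.0 marks a server as untried, so it is queried first.
        self.srtt: Dict[str, float] = {server: 0.0 for server in servers}

    def pick(self) -> str:
        """Choose the server with the lowest smoothed round-trip time."""
        return min(self.srtt, key=lambda server: self.srtt[server])

    def record(self, server: str, rtt_ms: float) -> None:
        """Fold a new measurement into the queried server's score."""
        old = self.srtt[server]
        if old == 0.0:
            self.srtt[server] = rtt_ms
        else:
            self.srtt[server] = (1 - self.alpha) * old + self.alpha * rtt_ms
        # Age the other servers so a slower one is eventually tried again.
        for other in self.srtt:
            if other != server:
                self.srtt[other] *= self.decay


def simulate(latency_ms: Dict[str, float], queries: int = 1000) -> Dict[str, int]:
    """Count how many queries each server receives under SRTT selection."""
    selector = SrttSelector(list(latency_ms))
    counts = {server: 0 for server in latency_ms}
    for _ in range(queries):
        server = selector.pick()
        counts[server] += 1
        selector.record(server, latency_ms[server])
    return counts


# Assumed fixed latencies as seen from one recursive server.
counts = simulate({"ns1.provider1": 40.0, "ns1.provider2": 15.0})
print(counts)
```

In this model the faster server receives the large majority of queries while the slower one is still probed from time to time, so adding a second provider can lower average lookup latency wherever it happens to be the closer of the two.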

Using NS1 Dedicated DNS for a Dual, Redundant Solution

Dedicated DNS is a fully managed DNS service, just like NS1’s Managed DNS and other top tier managed services. The difference is NS1 Dedicated DNS is deployed on infrastructure controlled by the subscribing enterprise.

The enterprise decides where to locate the DNS servers in their Dedicated DNS solution. The servers can be deployed in the enterprise data centers, on public cloud, private cloud or all of the above. The enterprise decides how many DNS servers to deploy, on what networks and on what hardware platforms (real or virtualized).

Finally, the Dedicated DNS infrastructure is not shared with any other customer. NS1’s exclusive Dedicated DNS service provides a unique approach that solves the synchronization issues of disparate DNS solutions. It also can provide an extra level of DDoS protection because the Dedicated DNS servers are not shared by any other subscribing customer.

[Figure 2 panels: "Do Everything Twice" (the domain admin sends record updates separately to ns1.provider1 and ns1.provider2) and "Write Some Middleware" (the domain admin sends updates to the middleware, which forwards record updates to ns1.provider1 and ns1.provider2)]

Figure 3. Single management interface for NS1 Dedicated and Managed DNS servers.

About NS1

NS1 is defining the future of application delivery and performance by converging real-time user, infrastructure and network data, enabling organizations to control their applications at the extreme edge. Our intelligent DNS + traffic management platform delivers the speed, performance and reliability needed to drive digital transformation and enhance customer experience, all through an elegant, integrated and unified platform. With ground-up, next-generation architecture, the NS1 Platform is purpose-built to maximize the potential of elastic, scalable and distributed applications & infrastructure all while simplifying the management of complex, mission critical pathways to your digital estate. Launched in 2013 in New York City, NS1 counts well known brands including Imgur, Algolia, Collective Media, OneLogin and other top-tier organizations as customers. NS1 is backed by leading venture capital firms including Flybridge Capital Partners, Sigma Prime Ventures, Founder Collective and Center Electric.


Dedicated DNS is managed using the same interface as Managed DNS (Figure 3). Whether the subscribing enterprise uses the NS1 APIs or the management GUI, record management is a one-step, one-place process with a single-pane-of-glass view into all records and reporting. Records cannot drift out of synchronization, and all updates are reflected across all servers within seconds.

Dedicated DNS gives enterprises the flexibility to deploy their DNS in a manner that not only provides the dual redundancy they seek; it also allows them to use those added DNS resources to improve the overall performance of their web services.

NS1 Services

Regardless of what approach is taken to implement a dual DNS solution, a critical success factor is the quality of support and consultation the managed DNS provider can offer. The primary cause of DNS failures is in fact configuration errors. NS1 is staffed by experts with an excellent track record and a reputation for providing the best consultative support in the industry. NS1 works in close consultation with its customers, helping them make informed decisions about architectural approaches and offering detailed advice on specific record management issues.

Summary

A well architected dual DNS reduces the risk of business losses due to DNS failure. It can also improve day to day end user quality of experience by reducing latency in DNS queries. When selecting a solution, enterprises should evaluate the following:

• Quality and track record of the managed service (latency, availability, global coverage)
• Completeness and ease of use of the management system and APIs
• Advanced traffic management capabilities
• Quality of routine support, and willingness and ability to provide consultative guidance

For more information on DNS services from NS1, please visit www.ns1.com.