Tuesday, December 18, 2012

TTL in DNS part 3

When we last talked in part 2, we were discussing what TTL is not (it is not how long the name server takes to update) and what it is (it is how long a computer keeps a record in its own local cache before re-requesting it).

We ended with an object lesson where www.TheEmailAuthority.com has a 1 hour TTL on its A record, and I change the A record and shut down the server at my old IP promptly at 3 PM.

User1 was reading my site at 2:30, so their own locally cached record doesn't expire until 3:30.  Any page they try to load between 3:00 and 3:30 will fail, because their computer is still using the old IP.

However, User2 was lucky enough to first request my A record at 3:10, after the change, so they've got twenty full minutes of Email Authority goodness while User1 is still just reading an error screen.
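The arithmetic behind the two users can be sketched in a few lines of Python. This is only an illustration: the function name `stale_window` is made up for this post, the calendar date is arbitrary, and it assumes a single cache per user that honors the TTL exactly.

```python
from datetime import datetime, timedelta

def stale_window(cached_at, changed_at, ttl):
    """How long after the DNS change a resolver keeps using the old record."""
    if cached_at >= changed_at:
        # The record was fetched after the change, so it is already the new one.
        return timedelta(0)
    expires_at = cached_at + ttl
    return max(expires_at - changed_at, timedelta(0))

ttl = timedelta(hours=1)                    # the A record's 1 hour TTL
day = datetime(2012, 12, 18)                # arbitrary date, only the times matter
changed_at = day.replace(hour=15)           # A record changed at 3 PM
user1_cached = day.replace(hour=14, minute=30)
user2_cached = day.replace(hour=15, minute=10)

user1_stale = stale_window(user1_cached, changed_at, ttl)  # 30 minutes of errors
user2_stale = stale_window(user2_cached, changed_at, ttl)  # no stale period at all
head_start = (user1_cached + ttl) - user2_cached           # 20 minutes ahead of User1

print(user1_stale, user2_stale, head_start)
```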

We've now seen how two users each with their own cache and their own TTL respond to DNS changes.  This simple example only used one record type, and only one "hop" between the client and the server.  Let's take this to the next level and try a more complex, email related, change.

Domain1.com is changing their email server, and their online ordering system sends them emails every time someone places an order.  Because of this, they cannot afford to have any down time.  We'll presume that they're open 24/7/365 so late-night and weekend migrations are out of the question.  Take these three examples into consideration, presuming the following zone file:

domain1.com. 86400 IN NS ns1.domain1s-ISP.com. 
domain1.com. 86400 IN NS ns2.domain1s-ISP.com. 

domain1.com. 14400 IN MX mail.domain1.com. 

mail.domain1.com. 3600  IN A  10.11.12.13

Example 1: Change the MX record

Domain1.com's email administrator sets up mail2.domain1.com and places it into the MX records.  He sends a test email from his gmail account, and it routes correctly.  However, the online ordering emails are not coming through to mail2.domain1.com.

This is because the online ordering server still has the old MX records cached, and keeps sending the order emails to the old server until it updates its cache sometime in the next 4 hours.  It could be less than that depending on when the online ordering server last cached the records, but there is still an outage and the admin loses his job.

Moral of this story: MX record changes aren't good enough.

Example 2: Set the TTL to a low value and change the IP of the A record

The next admin understands TTL.  On the morning of the migration he logs in to his DNS interface and lowers the A record's TTL to 1 minute.  That evening, he changes the A record's IP address to point to the new server.

This way the MX record still points to the same host name, so cached MX records stay valid, and the short TTL means that the connecting mail servers *should* pick up the new IP within a minute.
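Why lower the TTL in the morning rather than right before the change?  A resolver that cached the record just before the TTL was lowered still holds it for the full old TTL.  The sketch below works out the worst case; `worst_case_stale` is a hypothetical helper, the 8 AM and 6 PM times are made-up stand-ins for "morning" and "evening", and it assumes every resolver honors the TTL.

```python
from datetime import datetime, timedelta

def worst_case_stale(lowered_at, cutover_at, old_ttl, new_ttl):
    """Longest time any TTL-honoring resolver can keep the old IP after cutover."""
    # A resolver may have cached the record (with the old TTL) just before
    # the TTL was lowered, so that copy lives until lowered_at + old_ttl.
    old_copies_gone_at = lowered_at + old_ttl
    if cutover_at >= old_copies_gone_at:
        # Every cached copy now carries the new, short TTL.
        return new_ttl
    return max(new_ttl, old_copies_gone_at - cutover_at)

old_ttl = timedelta(hours=1)      # 3600 seconds, from the zone file
new_ttl = timedelta(minutes=1)
lowered_at = datetime(2012, 12, 18, 8, 0)    # assumed "morning"

patient = worst_case_stale(lowered_at, datetime(2012, 12, 18, 18, 0), old_ttl, new_ttl)
hasty = worst_case_stale(lowered_at, datetime(2012, 12, 18, 8, 20), old_ttl, new_ttl)

print(patient)  # 0:01:00
print(hasty)    # 0:40:00
```

Waiting at least one old TTL between lowering the TTL and changing the IP is what gets the worst case down to a minute.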

Unfortunately, the owner's wife uses an AOL address, and AOL caches records for its own pre-set period (24+ hours) regardless of the TTL.  Her email wasn't coming through, so the new admin got fired too.  He at least got a severance package...


Example 3: Set the original server to relay mail to the new server

The final admin understands that things don't always go according to plan, and leaves the old server online for several days after the migration.  He sets the old server to act as a mail relay, so it forwards everything to the new server instead of delivering locally to the users' old mailboxes (which would interrupt mail flow).  He waits until no legitimate email is seen in his logs for a day or so and then takes the server offline.
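The "wait until the logs go quiet" rule can be written down as a tiny check.  This is a sketch, not a real log parser: `safe_to_decommission` is a made-up name, the timestamps are invented, and the one-day quiet period is just the "day or so" from above.

```python
from datetime import datetime, timedelta

def safe_to_decommission(last_legit_mail_at, now, quiet_period=timedelta(days=1)):
    """True once the old relay has seen no legitimate mail for quiet_period."""
    return now - last_legit_mail_at >= quiet_period

# Invented timestamp: the last legitimate message the old server relayed.
last_seen = datetime(2012, 12, 20, 9, 15)

too_soon = safe_to_decommission(last_seen, datetime(2012, 12, 20, 18, 0))
quiet_enough = safe_to_decommission(last_seen, datetime(2012, 12, 21, 10, 0))

print(too_soon, quiet_enough)  # False True
```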

This admin also could have used a mail relay service such as a hosted anti-spam service to act as the relay for extra points :-)


Well, that was a much longer look into TTL than anyone would ever want.  Hopefully you enjoyed it, and can now impress your friends (or put them to sleep) with your new-found TTL skills.



-TEA

Friday, December 7, 2012

TTL in DNS part 2

In part 1 of this series, we discussed what TTL is, and provided a few exercises to help strengthen your understanding of TTL.  We'll pick up where we left off with our same DNS zone for domain1.com:

domain1.com. 86400 IN NS ns1.domain1s-ISP.com. 
domain1.com. 86400 IN NS ns2.domain1s-ISP.com. 

domain1.com. 14400 IN MX mail.domain1.com. 

mail.domain1.com. 3600  IN A  10.11.12.13

As we discussed, the NS records will expire 1 day after they were last cached, the MX records will expire 4 hours after they were last cached, and the A record will expire 60 minutes after it was last cached.  What does this mean for you and your planned migration?
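If you'd rather not reach for the calculator app, the seconds-to-human conversion is easy to script.  `humanize_ttl` is just an illustrative helper name, not part of any DNS tool.

```python
def humanize_ttl(seconds):
    """Turn a TTL in seconds into a readable string such as '4 hours'."""
    units = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]
    parts = []
    for name, size in units:
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count} {name}{'s' if count != 1 else ''}")
    return ", ".join(parts) or "0 seconds"

# The three TTLs from the domain1.com zone.
for record_type, ttl in [("NS", 86400), ("MX", 14400), ("A", 3600)]:
    print(record_type, humanize_ttl(ttl))
# NS 1 day
# MX 4 hours
# A 1 hour
```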

A common misconception about TTL is that it's the amount of time the Name Server requires to update the record.  This is absolutely not the case.  Do you remember what RFC 1034 said a TTL is?  A TTL is:

...primarily used by resolvers when they cache RRs.  The TTL describes how long a RR can be cached before it should be* discarded.

Where in there does it say that a TTL is how long a name server takes to update?  What does it say?  It says that a DNS resolver (which is anything that resolves DNS; your computer has a DNS resolver in it) caches the records, and the TTL describes how long the DNS resolver should cache the record before discarding it (and then re-requesting it).

What this means is that (for the sake of simplicity) every computer connected to the internet has its own local cache of websites it's visited, mail servers it's talked to, etc. and will not re-request those records until its own, local, cache expires.  TTL is how long it should* keep the records until it discards them and re-requests them.
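To make the caching behaviour concrete, here is a toy resolver cache.  It is a simplified model, not how any real resolver is implemented: `TtlCache`, the fake clock, and the new IP 10.11.12.14 are all invented for the illustration, and time is counted in seconds from an arbitrary starting point.

```python
class TtlCache:
    """Toy resolver cache: keep each answer until its TTL runs out."""

    def __init__(self, lookup, clock):
        self._lookup = lookup    # callable: name -> (value, ttl_seconds)
        self._clock = clock      # callable: () -> current time in seconds
        self._entries = {}       # name -> (value, expires_at)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]      # still cached, the name server is not asked
        value, ttl = self._lookup(name)
        self._entries[name] = (value, now + ttl)
        return value

# Pretend authoritative data and a clock we can move by hand.
zone = {"mail.domain1.com.": ("10.11.12.13", 3600)}
now = [0]
cache = TtlCache(lambda name: zone[name], lambda: now[0])

first = cache.resolve("mail.domain1.com.")         # cached at t=0
zone["mail.domain1.com."] = ("10.11.12.14", 3600)  # record changes on the name server
now[0] = 1800
during_ttl = cache.resolve("mail.domain1.com.")    # cache hit, still the old IP
now[0] = 3600
after_ttl = cache.resolve("mail.domain1.com.")     # TTL expired, new IP fetched

print(first, during_ttl, after_ttl)
```

The name server updated instantly; it was the cache that kept handing out the old answer until the TTL ran out.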

Object lesson time...

UserPC1.ValuedReader.com visits www.TheEmailAuthority.com at 2:30 PM.  www.TheEmailAuthority.com has a 1 hour TTL for its A record.  If I move the website at 3 PM, and the old IP stops serving the site, how long would it be until UserPC1.ValuedReader.com can read my site again?

UserPC2.ValuedReader.com visits www.TheEmailAuthority.com at 3:10 PM.  How long will it be until they can read the site?

The answers and explanations will be in part three.

TTL in DNS part 1

TTL, or Time To Live, is a confusing subject for many people. Due to DNS's distributed nature, there are many different scenarios to consider. In this post we will explain what TTL is, and what it is not, using a real world example.

First, I suppose we'll answer why TTL is important. TTL can impact your email infrastructure any time you make changes to a Name Server that hosts your DNS, the A record that your mail server resolves to (for instance, when you change to a new hosting provider), or when you change your MX records (for a new hosting provider, or a new mail relay). By no means is this a comprehensive list, but it will suffice for this example.  Since DNS records can sometimes be cached for days, it's important to understand how caching works when planning a migration so you can account for several days' worth of traffic.

How it works:
According to RFC 1034, "DOMAIN NAMES - CONCEPTS AND FACILITIES", each DNS record (or "Resource Record", "RR" for short) has a TTL or Time To Live.   It describes TTL as:

TTL which is the time to live of the RR. This field is a 32 bit integer in units of seconds, and is primarily used by resolvers when they cache RRs. The TTL describes how long a RR can be cached before it should be* discarded.

A "Resolver" is anything that resolves a DNS record. This could be your ISP's name servers, or your local computer. Anything that queries and resolves DNS records is a "DNS Resolver".

Let's take the below zone into consideration:

domain1.com. 86400 IN NS ns1.domain1s-ISP.com. 
domain1.com. 86400 IN NS ns2.domain1s-ISP.com. 

domain1.com. 14400 IN MX mail.domain1.com. 

mail.domain1.com. 3600  IN A  10.11.12.13

So, let's take the three record types above and do some "real world" exercises.  Remember, TTL is expressed in seconds.  You might want to open your calculator app now ;-)

If domain1.com wanted to switch their Name Server records to a different ISP, how many days would it be until the NS TTL expired?

If domain1.com wanted to add a hosted anti-spam service (which would require them to change the MX records), how many hours would it take for the MX TTL to expire?

If domain1.com wanted to upgrade their mail server and point the A record to a new IP, how many minutes would it take for the A record's TTL to expire?

Ok, so we've gotten through the math portion of this post.  Pencils on the table please. Now we know that records can take DAYS to update depending on their TTL.  How does this impact our migration?  I'll answer in part two...