How CDN makes your website faster. Part 5. Domains


May 23, 2022

by Igor Adamenko

This is the fifth article in our series on the technologies behind Content Delivery Networks.

The only thing left to understand is how your computer finds out which IP address to send packets to when you type a domain in your browser.

Domain resolution

When you type a domain, say app.uploadcare.com, your browser initiates a DNS request. DNS stands for Domain Name System. The system is a set of servers spread around the world that store “domain to address” associations. To explain the idea, let’s try to find the IP of app.uploadcare.com.

The first question that immediately pops up is “Well, where are those DNS servers located? How do we find their IP addresses?” It turns out that there’s a public list of well-known servers called “root servers.” This list is published by the Internet Assigned Numbers Authority (IANA), and it looks like this:

List of root servers

Currently, there are 13 root servers. Let’s pick one of them and make a request! For example, “NASA’s server” sounds cool!

To make a request, we will use dig. This tool is available on most Unix-like systems, though on some you may need to install it separately. If you use Windows without WSL, you may try Resolve-DnsName.

Alright, let’s ask NASA’s server what it knows about app.uploadcare.com:

$ dig @192.203.230.10 app.uploadcare.com

; <<>> DiG 9.16.1-Ubuntu <<>> @192.203.230.10 app.uploadcare.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63891
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;app.uploadcare.com.            IN      A

;; AUTHORITY SECTION:
com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
com.                    172800  IN      NS      c.gtld-servers.net.
com.                    172800  IN      NS      d.gtld-servers.net.
com.                    172800  IN      NS      e.gtld-servers.net.
com.                    172800  IN      NS      f.gtld-servers.net.
com.                    172800  IN      NS      g.gtld-servers.net.
com.                    172800  IN      NS      h.gtld-servers.net.
com.                    172800  IN      NS      i.gtld-servers.net.
com.                    172800  IN      NS      j.gtld-servers.net.
com.                    172800  IN      NS      k.gtld-servers.net.
com.                    172800  IN      NS      l.gtld-servers.net.
com.                    172800  IN      NS      m.gtld-servers.net.

;; ADDITIONAL SECTION:
a.gtld-servers.net.     172800  IN      A       192.5.6.30
a.gtld-servers.net.     172800  IN      AAAA    2001:503:a83e::2:30
b.gtld-servers.net.     172800  IN      A       192.33.14.30
b.gtld-servers.net.     172800  IN      AAAA    2001:503:231d::2:30
c.gtld-servers.net.     172800  IN      A       192.26.92.30
c.gtld-servers.net.     172800  IN      AAAA    2001:503:83eb::30
d.gtld-servers.net.     172800  IN      A       192.31.80.30
d.gtld-servers.net.     172800  IN      AAAA    2001:500:856e::30
e.gtld-servers.net.     172800  IN      A       192.12.94.30
e.gtld-servers.net.     172800  IN      AAAA    2001:502:1ca1::30
f.gtld-servers.net.     172800  IN      A       192.35.51.30
f.gtld-servers.net.     172800  IN      AAAA    2001:503:d414::30
g.gtld-servers.net.     172800  IN      A       192.42.93.30
g.gtld-servers.net.     172800  IN      AAAA    2001:503:eea3::30
h.gtld-servers.net.     172800  IN      A       192.54.112.30
h.gtld-servers.net.     172800  IN      AAAA    2001:502:8cc::30
i.gtld-servers.net.     172800  IN      A       192.43.172.30
i.gtld-servers.net.     172800  IN      AAAA    2001:503:39c1::30
j.gtld-servers.net.     172800  IN      A       192.48.79.30
j.gtld-servers.net.     172800  IN      AAAA    2001:502:7094::30
k.gtld-servers.net.     172800  IN      A       192.52.178.30
k.gtld-servers.net.     172800  IN      AAAA    2001:503:d2d::30
l.gtld-servers.net.     172800  IN      A       192.41.162.30
l.gtld-servers.net.     172800  IN      AAAA    2001:500:d937::30
m.gtld-servers.net.     172800  IN      A       192.55.83.30
m.gtld-servers.net.     172800  IN      AAAA    2001:501:b1f9::30

;; Query time: 30 msec
;; SERVER: 192.203.230.10#53(192.203.230.10)
;; WHEN: Tue Feb 01 17:03:55 EET 2022
;; MSG SIZE  rcvd: 843

We’ve run dig, passing it the domain we want to resolve and the server it should ask. The result may look overwhelming, but we actually haven’t got the answer! Check the header section:

;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27

The header says that we’ve sent one query, but there’s no answer. Instead, the server returned 13 more servers that we can ask about this domain. It turns out that root servers do not know about every domain in the world. They only hold information about first-level (top-level) domains, such as .com, .org, etc. When you ask a root server about a domain of a deeper level, the server checks which top-level domain it belongs to and “redirects” you to the servers responsible for that top-level domain.

That’s exactly what happened in our case! The authority section of the response contains the list of servers that know about .com domains:
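To make the header check above concrete, here is a minimal Python sketch that reads dig's flags line and decides whether a response is a final answer or just a pointer to other servers. The function names are ours (hypothetical), and the rule is deliberately simplified: real responses with an empty answer section may also be "no such record" replies, which this sketch does not distinguish.

```python
import re

def parse_header(header_line):
    """Extract flags and section counters from dig's ';; flags:' line."""
    rest = header_line.partition(";; flags:")[2]
    flags_part, _, counts_part = rest.partition(";")
    header = {name.lower(): int(value)
              for name, value in re.findall(r"([A-Z]+): (\d+)", counts_part)}
    header["flags"] = flags_part.split()
    return header

def classify(header):
    """Simplified rule: answers present -> 'answer'; only authority -> 'referral'."""
    if header.get("answer", 0) > 0:
        return "answer"
    if header.get("authority", 0) > 0:
        return "referral"
    return "empty"

root_reply = parse_header(
    ";; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27")
print(classify(root_reply))
```

For the root server's reply this reports a referral, which matches what we saw: zero answers and 13 servers to ask next.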

;; AUTHORITY SECTION:
com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
# and so on

NASA's root server even attached the IP addresses of those servers to the response! They’re listed in the additional section:

;; ADDITIONAL SECTION:
a.gtld-servers.net.     172800  IN      A       192.5.6.30
a.gtld-servers.net.     172800  IN      AAAA    2001:503:a83e::2:30
b.gtld-servers.net.     172800  IN      A       192.33.14.30
b.gtld-servers.net.     172800  IN      AAAA    2001:503:231d::2:30
# and so on
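If you wanted to follow such a referral in code rather than by eye, a rough Python sketch could pair each name server from the authority section with its IPv4 address from the additional section. The helper names are hypothetical, and the parser only handles the simple five-column lines that dig printed above.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int, str, str]  # (name, ttl, type, value)

# Two name servers and their addresses, copied from the response above.
AUTHORITY = """
com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
"""
ADDITIONAL = """
a.gtld-servers.net.     172800  IN      A       192.5.6.30
a.gtld-servers.net.     172800  IN      AAAA    2001:503:a83e::2:30
b.gtld-servers.net.     172800  IN      A       192.33.14.30
b.gtld-servers.net.     172800  IN      AAAA    2001:503:231d::2:30
"""

def parse_records(text: str) -> List[Record]:
    """Parse 'name ttl class type value' lines, skipping blanks and comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith((";", "#")):
            continue
        name, ttl, _cls, rtype, value = line.split(None, 4)
        records.append((name, int(ttl), rtype, value))
    return records

def next_servers(authority: List[Record],
                 additional: List[Record]) -> List[Tuple[str, Optional[str]]]:
    """Pair every NS name with its IPv4 address, or None if none was attached."""
    ipv4 = {name: value for name, _ttl, rtype, value in additional if rtype == "A"}
    return [(ns, ipv4.get(ns)) for _zone, _ttl, rtype, ns in authority if rtype == "NS"]

for server, ip in next_servers(parse_records(AUTHORITY), parse_records(ADDITIONAL)):
    print(server, ip)
```

A server that comes back paired with None has no attached address, so its own name would have to be resolved first.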

Alright! Let’s ask a.gtld-servers.net what it knows about app.uploadcare.com:

$ dig @192.5.6.30 app.uploadcare.com

; <<>> DiG 9.16.1-Ubuntu <<>> @192.5.6.30 app.uploadcare.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54781
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;app.uploadcare.com.            IN      A

;; AUTHORITY SECTION:
uploadcare.com.         172800  IN      NS      ns-657.awsdns-18.net.
uploadcare.com.         172800  IN      NS      ns-411.awsdns-51.com.
uploadcare.com.         172800  IN      NS      ns-1625.awsdns-11.co.uk.
uploadcare.com.         172800  IN      NS      ns-1371.awsdns-43.org.

;; ADDITIONAL SECTION:
ns-411.awsdns-51.com.   172800  IN      A       205.251.193.155

;; Query time: 130 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Tue Feb 01 17:38:12 EET 2022
;; MSG SIZE  rcvd: 200

Well, this server also doesn’t know about app.uploadcare.com. But it gives us the servers that know something! For one of them (ns-411.awsdns-51.com) it even attached the IP. So, let’s ask this ns-411.awsdns-51.com, whatever it is:

$ dig @205.251.193.155 app.uploadcare.com

; <<>> DiG 9.16.1-Ubuntu <<>> @205.251.193.155 app.uploadcare.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21912
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;app.uploadcare.com.            IN      A

;; ANSWER SECTION:
app.uploadcare.com.     60      IN      A       52.4.167.167
app.uploadcare.com.     60      IN      A       3.86.127.247
app.uploadcare.com.     60      IN      A       18.213.93.146

;; AUTHORITY SECTION:
uploadcare.com.         172800  IN      NS      ns-1371.awsdns-43.org.
uploadcare.com.         172800  IN      NS      ns-1625.awsdns-11.co.uk.
uploadcare.com.         172800  IN      NS      ns-411.awsdns-51.com.
uploadcare.com.         172800  IN      NS      ns-657.awsdns-18.net.

;; Query time: 80 msec
;; SERVER: 205.251.193.155#53(205.251.193.155)
;; WHEN: Tue Feb 01 17:40:10 EET 2022
;; MSG SIZE  rcvd: 232

Yaay! As you see in the header, we have 3 answers. Finally, the response contains an answer section, which means we now know the IP addresses of the machines serving app.uploadcare.com. There are three of them: 52.4.167.167, 3.86.127.247, and 18.213.93.146.
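The three manual steps we have just made can be summed up in a small Python simulation. It does not touch the network: FAKE_SERVERS is a toy, in-memory stand-in for the three servers we queried, filled with the values from the dig output above, and all names in it are ours.

```python
# Toy model: each "server" either refers us to another server or answers.
FAKE_SERVERS = {
    "192.203.230.10": {"referrals": {"com.": "192.5.6.30"}},
    "192.5.6.30": {"referrals": {"uploadcare.com.": "205.251.193.155"}},
    "205.251.193.155": {"answers": {
        "app.uploadcare.com.": ["52.4.167.167", "3.86.127.247", "18.213.93.146"],
    }},
}

def query(server_ip, name):
    """Ask one fake server; return ('answer', ips), ('referral', ip) or ('unknown', None)."""
    server = FAKE_SERVERS[server_ip]
    if name in server.get("answers", {}):
        return "answer", server["answers"][name]
    for zone, next_ip in server.get("referrals", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_ip
    return "unknown", None

def resolve_iteratively(name, root_ip="192.203.230.10", max_hops=10):
    """Start at the root and follow referrals until some server answers."""
    server_ip, path = root_ip, []
    for _ in range(max_hops):
        path.append(server_ip)
        kind, payload = query(server_ip, name)
        if kind == "answer":
            return payload, path
        if kind != "referral":
            raise LookupError("%s knows nothing about %s" % (server_ip, name))
        server_ip = payload
    raise LookupError("too many referrals")

addresses, path = resolve_iteratively("app.uploadcare.com.")
print(path)
print(addresses)
```

The printed path lists the same three servers we asked by hand, in the same order: the root server, the .com server, and the uploadcare.com server.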

If you open any of these IP addresses in your browser, you will probably get a 502 Bad Gateway error. Why?

A small feature of web servers

The error happens because of how the web servers are set up: they do not serve the site when you connect “directly” by IP address.

Such web servers usually serve several domains at a time, and they need to know which one you are asking for. They get this information from the “Host” header, which the browser does not set to “app.uploadcare.com” when we open an IP. But we can send this header manually using curl!

First, let’s ensure that curl requests fail with 502 too. To do this, let’s request the first IP we’ve got—52.4.167.167:

$ curl -v 52.4.167.167
*   Trying 52.4.167.167:80...
* TCP_NODELAY set
* Connected to 52.4.167.167 (52.4.167.167) port 80 (#0)
> GET / HTTP/1.1
> Host: 52.4.167.167
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 502 Bad Gateway
< Server: awselb/2.0
< Date: Tue, 01 Feb 2022 15:48:56 GMT
< Content-Type: text/html
< Content-Length: 122
< Connection: keep-alive
<
# useless HTML here
* Connection #0 to host 52.4.167.167 left intact

Yeah, it fails. You can also see the “Host” header value that curl sent: it is the IP we requested.

Now, let’s pass the correct “Host” header ourselves:

$ curl -v --header "Host: app.uploadcare.com" 52.4.167.167
*   Trying 52.4.167.167:80...
* TCP_NODELAY set
* Connected to 52.4.167.167 (52.4.167.167) port 80 (#0)
> GET / HTTP/1.1
> Host: app.uploadcare.com
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Moved Temporarily
< Date: Tue, 01 Feb 2022 15:51:18 GMT
< Content-Type: text/html
< Content-Length: 154
< Connection: keep-alive
< Server: nginx
< Location: https://app.uploadcare.com/
<
# useless HTML here
* Connection #0 to host 52.4.167.167 left intact

Cool! This time we’ve got a response rather than an error: a redirect. It means that the server hosting app.uploadcare.com does not serve any content over plain HTTP, but, according to the “Location” header, it should return the page if we ask over HTTPS.
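The “Host”-based choice we have just observed can be pictured with a tiny Python sketch. This is only an illustration under our own assumptions: we do not know how these servers are really configured, the SITES table and the function name are hypothetical, and the status codes simply mirror the two curl transcripts above.

```python
# Hypothetical table: which domains this server serves over plain HTTP,
# and where it redirects each of them.
SITES = {"app.uploadcare.com": "https://app.uploadcare.com/"}

def handle_http_request(headers):
    """Pick a site by the Host header; return (status_code, response_headers)."""
    host = headers.get("Host", "").split(":")[0].lower()
    if host not in SITES:
        # No site is registered under this name (for example, a bare IP).
        return 502, {}
    return 302, {"Location": SITES[host]}

print(handle_http_request({"Host": "52.4.167.167"}))
print(handle_http_request({"Host": "app.uploadcare.com"}))
```

The same TCP connection to the same IP gives different results, and the only thing that changed is the value of one request header.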

Sending HTTPS requests to an IP address with curl is a bit tricky, because by default curl checks whether the server’s certificate was issued for the name we request. We request an IP address, the certificate does not match it, and curl closes the connection. One way to make the request is the --insecure flag, which disables that check and is not something you should do at home without your parents:

$ curl -v --header "Host: app.uploadcare.com" --insecure https://52.4.167.167
*   Trying 52.4.167.167:443...
* TCP_NODELAY set
# a lot of TLS related things
> GET / HTTP/2
> Host: app.uploadcare.com
> user-agent: curl/7.68.0
> accept: */*
>
< HTTP/2 200
< date: Tue, 01 Feb 2022 16:20:41 GMT
< content-type: text/html; charset=UTF-8
< server: nginx
< cache-control: private, max-age=0, stale-if-error=3600
< x-frame-options: SAMEORIGIN
< referrer-policy: ORIGIN
<
# huge HTML with app.uploadcare.com main page!
* Connection #0 to host 52.4.167.167 left intact

Another way, which is safer and more reliable, is the --resolve option. It tells curl which IP address to use for the requested domain, so the certificate is still checked against app.uploadcare.com:

$ curl -v https://app.uploadcare.com --resolve app.uploadcare.com:443:52.4.167.167
# the same result

Regardless of the option chosen, it works!
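Conceptually, --resolve just puts a hand-written entry in front of the normal DNS lookup. Here is a rough Python sketch of that idea; it is not how curl is implemented, the function names are ours, and it only understands the simple host:port:address form we used above.

```python
def parse_resolve(entry):
    """Split 'host:port:address' into ((host, port), address)."""
    host, port, address = entry.split(":", 2)
    return (host.lower(), int(port)), address

def pick_address(host, port, overrides, dns_lookup):
    """Use a pinned address if there is one, otherwise fall back to DNS."""
    pinned = overrides.get((host.lower(), port))
    return pinned if pinned is not None else dns_lookup(host)

key, address = parse_resolve("app.uploadcare.com:443:52.4.167.167")
overrides = {key: address}

def fake_dns(host):
    # Stand-in for a real lookup so that the sketch stays offline.
    return "203.0.113.1"

print(pick_address("app.uploadcare.com", 443, overrides, fake_dns))
print(pick_address("example.com", 443, overrides, fake_dns))
```

Only the address changes: the request itself is still made for app.uploadcare.com, which is why the certificate check keeps working.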

We’ve done everything that our browser does when we ask it to open app.uploadcare.com:

  1. We’ve resolved the domain and got the IP addresses it points to.
  2. We’ve sent our HTTP/S requests to this IP.

Actually, we’ve done even more than the browser does…

Recursive and caching name server

Sure, the system of “leveled” DNS servers works fine. You start with the root server, it tells you where to go next, and so on. But this way of resolving generates a huge amount of traffic for the root servers. So, we need some cache here.

In fact, the responses to our requests already carry cache-related information. Here’s a line from the authority section of the response to our first request:

;; AUTHORITY SECTION:
com.                    172800  IN      NS      a.gtld-servers.net.

Here the first column means “for the domain .com”, the last one means “ask a.gtld-servers.net”, and the second column means “and you may ask it directly for the next 172 800 seconds”. This 172 800 is the time in seconds (48 hours, btw) for which we can cache the response. So, when we decide to resolve one more .com domain, we can go directly to a.gtld-servers.net (or any other server listed in the response).

The same works for other responses, e.g.:
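The arithmetic behind the TTL is easy to check. A short Python sketch (helper names are ours) converts the 172 800 seconds to hours and computes when the record from our first response stops being cacheable, using the timestamp dig printed for that response (EET is UTC+2).

```python
from datetime import datetime, timedelta, timezone

EET = timezone(timedelta(hours=2))

def ttl_to_hours(ttl_seconds):
    """Convert a TTL in seconds to hours."""
    return ttl_seconds / 3600

def expires_at(received, ttl_seconds):
    """The moment after which the cached record should be thrown away."""
    return received + timedelta(seconds=ttl_seconds)

received = datetime(2022, 2, 1, 17, 3, 55, tzinfo=EET)  # ";; WHEN:" of the first response
print(ttl_to_hours(172800))          # 48.0
print(expires_at(received, 172800))  # 2022-02-03 17:03:55+02:00
```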

uploadcare.com.         172800  IN      NS      ns-411.awsdns-51.com.

We may assume that ns-411.awsdns-51.com is in charge of uploadcare.com and its subdomains for the next 48 hours, so we can start resolving any uploadcare.com name straight from this DNS server.
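Picking the best starting point from what we have already cached boils down to finding the longest known suffix of the name. A minimal Python sketch of that lookup follows; the function names and the cache layout are our own assumptions, not a description of any particular resolver.

```python
def enclosing_zones(name):
    """'app.uploadcare.com.' -> the name itself, then each parent, then the root '.'."""
    labels = name.rstrip(".").split(".")
    zones = [".".join(labels[i:]) + "." for i in range(len(labels))]
    return zones + ["."]

def closest_cached_zone(name, cached_ns):
    """Return the most specific zone we already know a name server for, or None."""
    for zone in enclosing_zones(name):
        if zone in cached_ns:
            return zone
    return None

# What we have learned so far from the two referrals above.
cached_ns = {
    "com.": "a.gtld-servers.net.",
    "uploadcare.com.": "ns-411.awsdns-51.com.",
}

print(closest_cached_zone("app.uploadcare.com.", cached_ns))
print(closest_cached_zone("example.com.", cached_ns))
print(closest_cached_zone("example.org.", cached_ns))
```

The first name can start from the uploadcare.com server, the second from the .com server, and the third has nothing cached, so its resolution would start from a root server.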

(If you want to understand the rest of the answer, check Ch. 15. DNS Messages of “Pro DNS and BIND” book.)

Because responses carry such TTLs, we can build a caching server that does all this resolving for us. And that’s exactly the kind of server our computers use.

When you connect your laptop to a router, the router usually sends you, via DHCP, your local IP, the subnet mask, the default gateway, and the address of the DNS server you should use for domain resolution.

When you ask this server to resolve a domain, it first checks its own cache. If the cache contains the IP for the domain you asked for, the server simply returns it. If not, the server breaks your domain down by levels and goes to the DNS server responsible for the first part that has not been resolved yet. In other words, the server recursively resolves the domain by doing exactly what we’ve done by hand in this article.

Finally, when this recursive server resolves the domain, it stores the resulting IP in the cache and returns this IP to you.
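The cache-then-resolve behavior described above fits in a few lines of Python. This is a simplified sketch, not how any real resolver is written: the class and its parameters are hypothetical, the "upstream" function stands in for the whole walk from the root servers down, and the clock is injectable so that the example does not need to wait.

```python
import time

class CachingResolver:
    """Answers from cache while the TTL lasts, otherwise asks upstream."""

    def __init__(self, upstream, clock=time.monotonic):
        self._upstream = upstream   # callable: name -> (addresses, ttl_seconds)
        self._clock = clock
        self._cache = {}            # name -> (addresses, expiry_time)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]        # still fresh: no upstream traffic at all
        addresses, ttl = self._upstream(name)
        self._cache[name] = (addresses, now + ttl)
        return addresses

upstream_calls = []

def fake_upstream(name):
    # Stand-in for the full resolution; returns the TTL of 60 seen above.
    upstream_calls.append(name)
    return ["52.4.167.167"], 60

fake_now = [0.0]
resolver = CachingResolver(fake_upstream, clock=lambda: fake_now[0])
resolver.resolve("app.uploadcare.com.")
resolver.resolve("app.uploadcare.com.")   # served from the cache
fake_now[0] = 61.0                        # the TTL has expired
resolver.resolve("app.uploadcare.com.")   # asks upstream again
print(len(upstream_calls))
```

Three lookups cost only two upstream resolutions here, and with a 48-hour TTL the savings for the root and .com servers are far larger.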

You may even know public DNS resolvers such as Google DNS (8.8.8.8) or Cloudflare DNS (1.1.1.1). You can configure your computer to always use them, no matter what the router says. Sometimes they work faster than your provider’s DNS server.

You can see for yourself that those are recursive resolvers, because they give you the straight answer, without any redirections. E.g.:

$ dig @1.1.1.1 app.uploadcare.com
# ...
;; ANSWER SECTION:
app.uploadcare.com.     60      IN      A       18.213.93.146
app.uploadcare.com.     60      IN      A       3.86.127.247
app.uploadcare.com.     60      IN      A       52.4.167.167
# ...

Summary

In this article, we’ve explained how DNS servers and resolvers work.

Now it looks like we’ve laid the foundation properly, and you know all the things that are related to CDN: how the Internet works, what the main protocols are, what routing is and why BGP is important. Finally, you know how DNS works.

In the next article we will see how CDN is built on top of all these technologies and how it makes your website faster!


© 2011–2022 Uploadcare Inc.