Thursday, September 19, 2013

Prefetching DNS lookups

Since I've been working hard on latency and client-side performance at my company, I've been analyzing several pages a day of our site and of other big sites on the web, mainly with WebPageTest, looking for ways to optimize their performance. After viewing hundreds of waterfall charts, your eyes get used to seeing the same kinds of patterns and the same kinds of requests.

The DNS resolution, or 'DNS lookup', phase of a request was something I always thought should just be ignored. I mean, it pissed the hell out of me that it was there, but I honestly thought there was nothing I could do about it...

A while ago I thought about simply hard-coding the IP addresses of our CDN domains and other sub-domains directly in the code to solve this. This is bad for 2 main reasons:
1. If your IP changes for some reason, you have to change your code accordingly. (This shouldn't happen often, or even at all, but it still might.)
2. (and this is much more important!) When using a CDN service like Akamai, the DNS lookup gives different results depending on where you are in the world. Since they have servers strategically placed in different geographical locations, a user in the USA will probably get a different IP than a user in Europe or Asia, and a hard-coded IP would send everyone to the same server.

Well, recently that all changed - I realized that you can tell the browser to prefetch the DNS lookup at the very beginning of the page load, so that when it later runs into the new domain it won't have to look it up again.

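To do this, all you need to add is a link tag with rel="dns-prefetch" at the beginning of your page, for example <link rel="dns-prefetch" href="//cdn.example.com"> (the host name here is just a placeholder).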

Doing this for the domain you're currently on has no effect since the browser has already done that DNS lookup, but it helps when you know your page source has calls to multiple sub-domains (for CDNs), to 3rd party libraries, or ajax calls to other domains. Even if you know of a call that will only happen on the next page the user lands on, you should still prefetch the DNS lookup, since the browser caches the results for a couple of minutes at least, and this should have no effect on the current page's performance.

The most common response I get when telling people about this, or when reading about it on the internet, is that the DNS lookup alone doesn't take that long. From my tests, the average DNS lookup takes under 100ms, though usually more than 20ms, and sometimes it goes over 100ms. Even though slow lookups aren't the common case, you can still make sure time is saved for those 'unlucky' users.
...and besides, this is one of the easiest performance wins you have - it requires almost no work to implement!

Just while writing this article I happened to run a test, and check out how long the DNS lookup took on those last 3 requests!
(You can view the full results of this test here : )
Yep, you better believe your eyes - the DNS lookup on those last requests seemed to take 2 seconds!!
Now, I don't know why they took 2 seconds in that case, and I bet this is really rare, but it still happens sometimes, you can't argue with that.
But hey, if they had prefetched that last domain, the lookup would still have taken that long! That's right, but it would have started much earlier, and could still have saved hundreds of valuable milliseconds.
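
To put rough numbers on that argument, here is a minimal sketch in Python. The timings are made up for illustration (a 2000ms lookup, a resource discovered 1500ms into the page load, a prefetch hint read after 100ms), and the function name is hypothetical; it only models when the browser can start connecting to the new domain.

```python
def connection_start_ms(first_request_ms, lookup_ms, prefetch_start_ms=None):
    """Time (ms into the page load) at which the browser can start connecting."""
    if prefetch_start_ms is None:
        # Without a hint, the lookup only starts when the resource is discovered.
        return first_request_ms + lookup_ms
    # With a hint, the lookup runs in the background; the request waits only
    # if the lookup has not finished by the time the resource is discovered.
    return max(first_request_ms, prefetch_start_ms + lookup_ms)


# Illustrative numbers only, not measurements.
without_hint = connection_start_ms(first_request_ms=1500, lookup_ms=2000)
with_hint = connection_start_ms(first_request_ms=1500, lookup_ms=2000,
                                prefetch_start_ms=100)
print(without_hint - with_hint)  # prints 1400
```

With these assumed numbers the slow lookup still costs something, but most of it overlaps with work the browser was doing anyway.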

So, my suggestion to you is this: let's say you have 4 sub-domains for CDNs and you know you're going to call Facebook's API at some point. You should put something like this in the head tag of your source :
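
As a hedged sketch, the tags for that scenario could be generated like this in Python. The host names are placeholders (four made-up CDN sub-domains, and graph.facebook.com standing in for whichever Facebook host you actually call), and the helper name is hypothetical; the output is one link tag with rel="dns-prefetch" per host, ready to paste into the head.

```python
from html import escape


def dns_prefetch_tags(hosts):
    """Return one <link rel="dns-prefetch"> tag per host, as a list of strings."""
    tags = []
    for host in hosts:
        # A protocol-relative URL ("//host") is enough: only the host name is resolved.
        href = escape("//" + host, quote=True)
        tags.append('<link rel="dns-prefetch" href="%s">' % href)
    return tags


# Placeholder hosts, not taken from any real site configuration.
HOSTS = [
    "cdn1.example.com",
    "cdn2.example.com",
    "cdn3.example.com",
    "cdn4.example.com",
    "graph.facebook.com",
]

print("\n".join(dns_prefetch_tags(HOSTS)))
```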

This tells the browser to start the DNS lookups immediately, so that when it reaches those domains it will already have the IPs stored in its cache.

If you want to see what it looks like when you're prefetching the DNS lookup properly, take a look at these WebPageTest results from Amazon :
You can clearly see that the DNS lookup part of the request for some of the domains happens long before the browser reaches the actual resource on the timeline, and when it does, it doesn't need to wait for the DNS lookup.
As usual, great work Amazon! :)

Some more resources on the subject :
- MDN - Controlling DNS prefetching
- Chromium Blog - DNS prefetching
- Performance Calendar - Speed up your site using DNS prefetching

Wednesday, September 4, 2013

All about HTTP chunked responses

A short background on HTTP and the 'Content-Length' header :
When communicating over HTTP (i.e. 'the web'), every message, whether a request or a response, consists of two main parts - the headers and the body. The headers define various details of the message (e.g.: encoding type, cookies, request method, etc.). One of these details is 'Content-Length', which specifies the size of the body in bytes. If you're building a website and aren't setting this explicitly, then chances are the framework you're using is doing it for you: when you send the response to the client, the framework measures the size of the response body and puts it in this header.
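
To make the role of 'Content-Length' concrete, here is a minimal sketch in Python of what a framework does for you behind the scenes. The function name and the sample markup are made up for illustration; the point is that the header counts the bytes of the encoded body, not its characters.

```python
def build_response(body_text):
    """Build a minimal HTTP/1.1 response with an explicit Content-Length."""
    body = body_text.encode("utf-8")
    headers = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html; charset=utf-8",
        # Content-Length is the number of bytes in the encoded body.
        "Content-Length: %d" % len(body),
    ]
    head = "\r\n".join(headers).encode("ascii") + b"\r\n\r\n"
    return head + body


# 12 characters, but 13 bytes once the accented letter is UTF-8 encoded.
print(build_response("<p>héllo</p>").decode("utf-8"))
```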

In a normal request, looking at the headers with FireBug or Chrome developer tools, it should look like this (looking at :

So, what is a 'chunked response' ?
A 'chunked' response means that instead of processing the whole page, generating all of the html and sending it to the client, we can split the html into 'chunks' and send one after the other, without telling the browser how big the response will be ahead of time.

Why would anyone want to do this ?
Well, some pages on the site can take a long time to process. While the server is working hard to generate the output, the browser has nothing to do, and all the user sees is a boring white screen.
The work the server is doing might only be needed for a specific part of the page, and we might already have a lot ready that we can give the client to work with. If you have scripts & stylesheets in the <head/> of your page, you can send a first chunk with the 'head' tag html to the user's machine. The browser then has something to work with, meaning it will start downloading the scripts and resources it needs, and during this time your servers can continue crunching numbers to generate the content to be displayed.
You are actually gaining parallelism by sending the client this first chunk without waiting for the rest of the page to be ready!

Taking this further, you can split the page into several chunks. In practice, you can send one chunk with the 'head' of the page. The browser can then start downloading scripts and stylesheets while your server is fetching, let's say, the categories from your db to display in your header menu/navigation. Then you can send this as a chunk to the browser so it has something to start rendering on the screen, and your server can continue processing the rest of the page.

Even if the user only sees part of the content, and it isn't enough to work with, the user still gets a 'sense' of better performance - something we call 'perceived performance' which has almost the same impact.

Many big sites are doing this, since this will most definitely improve the client side performance of your site. Even if it's only by a few milliseconds, in the ecommerce world we know that time is money!

How does this work ?
Since the response is chunked, you cannot send the 'Content-Length' response header: usually you won't know how big the whole response will be when the first chunk goes out, and even if you do, the browser doesn't need it at this point.
So, to notify the browser about the chunked response, you need to omit the 'Content-Length' header and add the header 'Transfer-Encoding: chunked'. Given this information, the browser will expect to receive the chunks in a very specific format.
Each chunk starts with the length of the current chunk in hexadecimal format, followed by '\r\n', then the chunk itself, followed by another '\r\n'. The end of the response is signaled by a final chunk of length zero ('0\r\n\r\n').
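
The framing can be sketched in a few lines of Python. This is only an illustration of the wire format (hex length, '\r\n', data, '\r\n', then a terminating zero-length chunk); the function names and the sample html pieces are made up, and chunk extensions and trailers are ignored.

```python
def encode_chunk(data):
    """Frame one chunk: hex length, CRLF, the data itself, CRLF."""
    size_line = format(len(data), "x").encode("ascii")
    return size_line + b"\r\n" + data + b"\r\n"


def encode_chunked(parts):
    """Frame every non-empty part, then add the terminating zero-length chunk."""
    # Empty parts are skipped: a zero-length chunk would end the response early.
    framed = b"".join(encode_chunk(p) for p in parts if p)
    return framed + b"0\r\n\r\n"


# 13 bytes -> "d", 15 bytes -> "f"
print(encode_chunked([b"<head></head>", b"<body>hi</body>"]))
```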

FireBug and Chrome dev tools both combine the chunks for you, so you won't be able to see them as they are really received by the browser. To see this properly you need a more low-level tool like Fiddler.

This is what the raw response looks like using Fiddler :

Note : I marked the required 'Transfer-Encoding: chunked' header, and the first line with the size of the chunk. In this case the first chunk is 0xd7c bytes long, which in decimal is 3452 bytes.
Also, it's interesting to note that you cannot really read the first chunk since it's compressed with gzip (which browser dev tools also decode automatically). In Fiddler, you can see the message at the top telling you this, and you can click it to have the response decoded, but then the chunk markers are removed and you'll see the whole html output.
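
Roughly what a tool does when it 'decodes' such a response can be sketched in Python: first strip the chunk framing, then gunzip the reassembled bytes. This is a simplified illustration that assumes no chunk extensions or trailers; the function name and the sample html are made up.

```python
import gzip


def decode_chunked(raw):
    """Reassemble the body from a chunked payload (no extensions or trailers)."""
    body = b""
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)  # the chunk size is hexadecimal
        if size == 0:
            break  # the zero-length chunk marks the end
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that closes the chunk
    return body


# The size line from the note above: 0xd7c is 3452 in decimal.
print(int("d7c", 16))

# Round trip: gzip a page, split the compressed bytes into two chunks, decode.
html = b"<html><head></head><body>hello</body></html>"
compressed = gzip.compress(html)
half = len(compressed) // 2
wire = b""
for piece in (compressed[:half], compressed[half:]):
    wire += format(len(piece), "x").encode("ascii") + b"\r\n" + piece + b"\r\n"
wire += b"0\r\n\r\n"
print(gzip.decompress(decode_chunked(wire)) == html)
```

Note that the chunks carry the compressed bytes, which is why a single raw chunk is unreadable until the whole body has been put back together and decompressed.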

How can we achieve this with ASP.NET ?
When you want to flush the content of your page, all you need to do in the middle of a view is call 'HttpContext.Current.Response.Flush()'.
It's that easy! Without you having to worry about it, the .NET framework takes care of the details and sends the response to the browser in the correct format.

Some things that might interfere with this working properly :
- You might have to configure 'Response.BufferOutput = false;' at the beginning of your request so the output won't be buffered and will be flushed as you call it.
- If you specifically add the 'Content-Length' header yourself then this won't work.
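
To show what the framework is doing for you when you flush, here is a rough analogue in Python using only the standard library's http.server. This is not the .NET API and not how ASP.NET is implemented; the handler name, the html and the 0.2 second delay are all made up for illustration. It sends the 'head' as a first chunk, pretends to do slow work, then sends the rest.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer encoding needs HTTP/1.1

    def write_chunk(self, data):
        size_line = format(len(data), "x").encode("ascii")
        self.wfile.write(size_line + b"\r\n" + data + b"\r\n")
        self.wfile.flush()  # push this chunk to the client right away

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")  # and no Content-Length
        self.end_headers()
        # First chunk: the browser can start fetching scripts and stylesheets.
        self.write_chunk(b"<html><head><title>demo</title></head>")
        time.sleep(0.2)  # stand-in for slow server-side work
        self.write_chunk(b"<body>content</body></html>")
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the response

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ChunkedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_server() and requesting the page with a raw tool like Fiddler should show the two chunks arriving separately, with the second one delayed.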

For more helpful resources on chunked responses :
Wikipedia, and the spec details :
How to write chunked responses in .net (but not -
Implementing chunked with IHttpListener -