Saturday, April 12, 2014

Debugging and solving the 'Forced Synchronous Layout' problem

If you're using the Chrome Developer Tools to profile your website's performance, you might have noticed that Chrome warns you about 'forced synchronous layouts'.
It looks something like this:
In this screenshot, I marked all the warning signs Chrome gives you so you can spot this problem.

So, what does this mean?
When the browser constructs a model of the page in memory, it builds two trees. One is the DOM structure itself, and the other is the render tree, which represents how the elements should be laid out on the screen.
The render tree needs to stay up to date, so when you change an element's CSS properties, for example, the browser may need to update these trees in memory to make sure that the next time you request a CSS property, it can return accurate information.

Why should you care about this?
Updating both of these trees in memory can take some time. Although they are in memory, most pages these days have quite a large DOM, so the trees will be pretty big. It also depends on which element you change: depending on the case, the browser may only need to update part of a tree, or all of it.

Can we avoid this?
The browser can detect that you're updating many elements at once, and will optimize itself so that a full tree update doesn't happen after every change, but only when it knows it needs up-to-date data. For this to work, though, we need to help it out a little.
A very simple example of this scenario is setting and getting two different properties, one after the other, like so:
var a = document.getElementById('element-a');
var b = document.getElementById('element-b'); = '100px';
var aWidth = a.clientWidth; = '200px';
var bWidth = b.clientWidth;

In this simple example, the browser performs a full layout twice. After setting the first element's width, you ask to read an element's width back. When reading that property, the browser knows it needs up-to-date data, so it goes and updates the layout tree in memory. Only then does it continue to the next lines, which soon cause another update for the same reason.

This can be fixed simply by reordering the code, like so:
var a = document.getElementById('element-a');
var b = document.getElementById('element-b'); = '100px'; = '200px';

var aWidth = a.clientWidth;
var bWidth = b.clientWidth;

Now the browser applies both style changes one after the other without updating the tree. Only when we read the first width does it update the layout tree in memory, and it keeps it updated for the next line as well. We easily saved one update.
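The same principle generalizes beyond two elements: batch all the reads together, then all the writes together, and the browser lays out at most once per batch. A small sketch - the element list and target widths here are made up for illustration:

var readThenWrite = function(elements, widths) {
    var measured = [];
    // phase 1: reads only - forces at most one layout for the whole batch
    for (var i = 0; i < elements.length; i++) {
        measured.push(elements[i].clientWidth);
    }
    // phase 2: writes only - no reads in between, so no forced layouts
    for (var j = 0; j < elements.length; j++) {
        elements[j].style.width = widths[j] + 'px';
    }
    return measured;
};
<test>
var assert = require('assert');
var els = [{ clientWidth: 10, style: {} }, { clientWidth: 20, style: {} }];
assert.deepStrictEqual(readThenWrite(els, [100, 200]), [10, 20]);
assert.strictEqual(els[0], '100px');
assert.strictEqual(els[1], '200px');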

Is this a 'real' problem?
There are a few blogs out there talking about this problem, and they all seem to use textbook examples of it. When I first read about it, I too thought it was a little far-fetched and not really practical.
Recently, though, I actually ran into it on a site I'm working on...

Looking at the profiling timeline, I recognized the same pattern - a bunch of rows alternating between 'Layout' and 'Recalculate Style'.
Clicking on the marker showed that this was actually taking around 300ms.

I could see that evaluating the script was taking ~70ms, which I could live with, but over 200ms was being wasted on... what?!

Luckily, clicking on the script in that dialog displays a JS stack trace of the problematic call. This was really helpful, and pointed me to the exact spot.

It turned out I had a piece of code that looped over a list of elements, checked each element's height, and set the container's height to the aggregated total. The height was being read and written on every loop iteration, causing a performance hit.

The problematic code looked something like this:
for (var i = 0; i < containerItems.length; i++) {
   var item = containerItems[i];

var appendItemToContainer = function(item) {
   // a layout read and a layout write in the same statement = (container.clientHeight + item.clientHeight) + 'px';

You can see that the 'for' loop calls the method 'appendItemToContainer', which sets the container's height based on its previous height - which means a get and a set in the same line.

I fixed this by looping over all the items in the container and summing their heights, then setting the container's height once. This saved many DOM tree updates, leaving only the one that is actually necessary.

The fixed code looked something like this:
// collect the height of all the elements
var totalHeight = 0;
for (var i = 0; i < containerItems.length; i++) {
   totalHeight += containerItems[i].clientHeight;

// set the container's height once = totalHeight + 'px';

After fixing the code, I saw that the time spent was actually much less now -

As you can see, I managed to save a little over 150ms, which is great for such a simple fix!

Friday, February 21, 2014

Chrome developer tools profiling flame charts

I just recently, and totally coincidentally, found out that the Chrome developer tools can generate flame charts while profiling JS code!
Recently it seems that generating flame charts from profiling data has become popular in languages like Ruby, Python and PHP, so I'm excited to see that Chrome has this option for JS code as well.

The default view for profiling data in the dev tools is the 'tree view', but you can easily change it to 'flame chart' by selecting it in the drop-down at the bottom of the window.

Like here:

Then you will be able to see the profiling results in a way that is sometimes easier to read.
You can use the mouse scroll wheel to zoom in on a specific area of the flame chart and see what's going on there.

In case you're not familiar with reading flame charts, here's a simple explanation -
  • Each colored line is a method call
  • Method calls stacked above one another represent the call stack
  • The width of a line represents how long that call took

And here you can see an example of a flame chart. I marked a few sections that the flame chart points out for us: non-optimized TryCatchBlocks. In this case the flame chart view is convenient, because you can see clearly how many method calls each try/catch block surrounds.

Wednesday, February 19, 2014

Preloading resources - the right way (for me)

Looking through my 'client-side performance glasses' when browsing the web, I see that many sites spend too much time downloading resources - mostly on the homepage, but sometimes the main bulk is on subsequent pages as well.

Starting to optimize
When trying to optimize your page, you might think it's most important that your landing page is the fastest, since it defines your users' first impression. So what do you do? You cut down on all the JS and CSS resources you can, leaving only what's strictly required for your landing page. You minify those, and then you're left with one file of each. You might even put the JS at the end of the body so it doesn't block the browser from rendering the page, and you're set!

But there's still a problem
Now your users go on to the next page, probably an inner page of your site, and this one is filled with much more content. On this page you use some jQuery plugins and other frameworks that you found useful and that probably saved you hours of JavaScript coding - but your users are paying the price...

My suggestion
I ran into this exact problem a few times in the past, and the best way I found to solve it was to preload the resources on the homepage. I do this after 'page load' so it doesn't block the homepage from rendering, and while the user is looking at the homepage, a little extra time is spent in the background downloading resources they'll probably need on the next pages they browse.

How do we do this?
Well, there are several techniques, but before choosing the right one, let's look at the requirements/constraints we have -
  • We want to download JS/CSS files in a non-blocking way
  • We want to trigger the download ourselves, so we can defer it until after 'page load'
  • Download the resources in a way that won't execute them (CSS and JS). This is really important, and the reason we can't just dynamically create a '<script/>' tag and append it to the '<head/>' tag!
  • Make sure they stay in the browser's cache (this is the whole point!)
  • Work with resources stored on secure servers (https). This is important since I would like to preload resources from my secured registration/login pages too, if I can.
  • Work with resources on a different domain. This is very important since all of my resources are hosted on an external CDN with a different subdomain.
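Before looking at the individual techniques, here's a minimal sketch of the wrapper I'd put around whichever one wins - the function names, URLs and the Image-based loader below are just placeholders for illustration:

// fire off every preload through one pluggable technique,
// e.g. loadResource = function (url) { new Image().src = url; }
var preloadAll = function(urls, loadResource) {
    for (var i = 0; i < urls.length; i++) {
        loadResource(urls[i]);
    }
};

// in the page, defer until after the 'load' event so the preloads
// never compete with rendering the current page:
// window.addEventListener('load', function () {
//     preloadAll(['/js/inner-pages.js', '/css/inner-pages.css'],
//                function (url) { new Image().src = url; });
// });
<test>
var assert = require('assert');
var seen = [];
preloadAll(['a.js', 'b.css'], function (u) { seen.push(u); });
assert.deepStrictEqual(seen, ['a.js', 'b.css']);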

The different techniques are (I have tested all of these; these are my notes) -
1. Creating an iframe and appending the script/stylesheet files inside it
var iframe = document.createElement('iframe');
iframe.setAttribute("width", "0");
iframe.setAttribute("height", "0");
iframe.setAttribute("frameborder", "0");
iframe.setAttribute("name", "preload"); = "preload";
iframe.src = "about:blank";

// gymnastics to get a reference to the iframe's document
iframe = document.all ? document.all.preload.contentWindow : window.frames.preload;
var doc = iframe.document;;

var iFrameAddFile = function(filename) {
    var css = doc.createElement('link');
    css.type = 'text/css';
    css.rel = 'stylesheet';
    css.href = filename;
This works on Chrome and FF, but on some versions of IE it wouldn't cache the secure (https) resources.
So, close, but no cigar (at least, not fully).

2. Creating a JavaScript Image object
new Image().src = 'http://myResourceFile.js';
This only works properly on Chrome. On Firefox and IE it would either not download the secure resources, or download them but without caching them.

3. Building an <object/> tag with the file in its 'data' attribute
var createObjectTag = function(filename) {
    var o = document.createElement('object'); = filename;

    // IE stuff, otherwise 0x0 is OK
    if (isIE) {
        o.width = 1;
        o.height = 1; = "hidden";
        o.type = "text/plain";
    } else {
        o.width  = 0;
        o.height = 0;

This worked nicely on Chrome and FF, but not on some versions of IE.

4. XMLHttpRequest, a.k.a. ajax
var ajaxRequest = function(filename) {
    var xhr = new XMLHttpRequest();'GET', filename);

This technique won't work with files on a different domain, so I dropped it immediately.

5. Creating a 'prefetch' link tag
var prefetchTag = function(filename) {
    var link = document.createElement('link');
    link.href = filename;
    link.rel = "prefetch";


6. A 'script' tag with an invalid 'type' attribute
// creates a script tag with an invalid type, like 'script/cache'
// I realized this technique is used by LabJS for some browsers
var invalidScript = function(filename) {
    var s = document.createElement('script');
    s.src = filename;
    s.type = 'script/cache';

This barely worked properly in any browser. It would download the resources, but wouldn't cache them for the next request.

So, first I must say that, given all my constraints, this turned out to be more complicated than I initially thought it would be.
Some of the techniques worked well in all browsers for non-secured (non-SSL) resources, but only in some browsers for secured resources. In my specific case I decided to go with one of those, accepting that some users won't have the SSL-page resources cached (those pages are a minority in my case).
But, given your circumstances, you might choose a different technique. I had quite a few constraints that I'm sure not everyone has.
Another thing worth mentioning is that I didn't test any of these techniques on Safari. Again, this was less interesting in my case.
I also haven't thought about solving this problem on mobile devices yet. Since mobile bandwidth is usually much slower, I might tackle this problem differently for mobile devices...

Friday, December 20, 2013

Prebrowsing - Not all that...

Six weeks ago, Steve Souders, an amazing performance expert, published a post called "Prebrowsing".
In the post he talks about some really simple techniques you can use to make your site quicker. These techniques rely on knowing which page the user will browse to next: you 'hint' this to the browser, and the browser starts downloading the needed resources earlier. This makes the next navigation appear much quicker to the user.

There are three ways presented to do this. They all use the 'link' tag, with a different 'rel' value.
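For reference, all three hints share the same markup shape; this tiny helper (hypothetical, just to show the pattern) builds it as a string:

// builds the markup for one of the three prebrowsing hints;
// rel is 'dns-prefetch', 'prefetch' or 'prerender'
var prebrowsingHint = function(rel, href) {
    return '<link rel="' + rel + '" href="' + href + '">';
};

// e.g. prebrowsingHint('prerender', '/sign-in')
//      gives '<link rel="prerender" href="/sign-in">'
<test>
var assert = require('assert');
assert.strictEqual(prebrowsingHint('prerender', '/sign-in'), '<link rel="prerender" href="/sign-in">');
assert.strictEqual(prebrowsingHint('dns-prefetch', '//'), '<link rel="dns-prefetch" href="//">');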

The first technique is 'dns-prefetch'. It's really easy to add and can improve your site's performance. Don't expect a major improvement though; in my experience, the DNS resolution itself usually doesn't take more than 150ms.
I wrote about this too, in this blog post: Prefetching dns lookups

The other two techniques shown are 'prefetch' and 'prerender'.
Since these are really easy to add, once I read about them I immediately added them to my site.
A little background on the site I'm working on: the anonymous homepage doesn't use SSL. From this page, most users sign in or register, and both of these actions redirect the user to an SSL-secured page. Since those pages are served over https, the browser doesn't benefit from the resources it has already cached, because it treats them as belonging to a different domain (as it should). This makes the user wait a long time on these pages, just so their client can download the same resources again, this time over a secure connection.

So I thought it would be perfect to have the browser prerender (or prefetch) the sign-in or register page. I have a WebPageTest script that measures the performance of this page after the user has visited the anonymous homepage, and this test improved by a LOT. This was great! It was only a day later that I realized the anonymous homepage itself had become much slower... :/
I guess this is because while the browser spends some of its resources prerendering the next page, it hurts the performance of the current one. Looking at multiple tests of the same page, I couldn't find any single point of failure, except that each resource on the page was taking just a little longer to download. Another annoyance is that you can't see what's happening with the prerendered page in utilities like WebPageTest; you only see the effect on the current page.

After reading a little more on the subject, I found more cons to this technique. First, it's still not supported in all browsers - not even FF or Opera. Also, Chrome can only prerender one page across all of its processes. This means I can't do this for two pages, and I don't know how the browser reacts if another open site also requests to prerender some pages. You also can't see the progress the browser makes on the prerendered page. And what happens if the user navigates to the next page before the prerendered page has finished? Will some of the resources be cached already? I don't know, and honestly I don't think it's worth testing how every browser behaves in all of these scenarios yet.
I think we need to wait a little longer for these two techniques to mature...

What is the best solution?
Well, like every performance improvement - I don't believe there is a 'best solution', as there are no silver bullets.
However, the best solution so far for the site I'm working on is to preload the resources we know the user will need ourselves. This means we use JavaScript to have the browser download resources the user will need throughout the site on the first page they land on, so on subsequent pages the client has much less to download.

What are the pros of this technique?
1. I have much more control over it - I can detect which browser the user has and use the appropriate technique, so it works for all users.
2. I can trigger it after the 'page load' event. This way I know it won't block or slow down any other work the client is doing for the current page.
3. I can do this for as many resources as I need: CSS, JS, images and even fonts if I want to. Basically anything goes.
4. I'm not limited to guessing the one page the user will visit next. On most sites many resources are shared between pages, so this gives me a bigger win.
5. I don't care about other tabs the user has open that aren't my site. :)

Of course the drawback is that, as opposed to the 'prerender' technique, the browser will still have to download the HTML, parse and execute the JS/CSS files, and finally render the page.

Unfortunately, doing this correctly isn't that easy. I will write about how to do this in detail in the next post (I promise!).

I want to sum up for now so this post won't be too long -
In conclusion, there are many techniques out there, and different ones fit different scenarios. Don't implement a technique just because it's easy or because someone else told you it works. Some of them might not be a good fit for your site, and some might even cause damage. Steve Souders' blog is an amazing fountain of information on performance, but I learned the hard way that every performance improvement needs to be properly analyzed and tested before being implemented.

Some great resources on the subject:
- Prebrowsing by Steve Souders
- High performance networking in Google Chrome by Ilya Grigorik
- Controlling DNS prefetching by MDN

Monday, November 11, 2013

Some jQuery getters are setters as well

A couple of days ago I ran into an interesting characteristic of jQuery -
Some methods which are 'getters' are also 'setters' behind the scenes.

I know this sounds weird, and you might even be wondering why the hell this matters... Just keep reading and I hope you'll understand... :)

If you call one of jQuery's element dimension methods (height(), innerHeight(), outerHeight(), width(), innerWidth() & outerWidth()), you'd probably expect it to just check the JavaScript object's properties and return the result.
In reality, it sometimes needs to do more complicated work in the background...

The problem:
If you have an element that is defined as 'display:none', calling 'element.clientHeight' in JavaScript, which should return the element's height, will return 0. This is because a 'hidden' element using 'display:none' isn't rendered on the screen, so the client never knows how much space it would visually take, leading it to think its dimensions are 0x0 (which is right, in some sense).

How jQuery solves the problem for you:
When you ask jQuery for the height of a 'display:none' element (by calling $(element).height()), it's more clever than that.
It identifies that the element is defined as 'display:none', and takes some steps to get the element's actual height:
- It copies the element's styles to a temporary object
- Defines the element as position:absolute
- Defines the element as visibility:hidden
- Removes 'display:none' from the element. After this, the browser is forced to 'render' the element, although it doesn't actually display it on the screen because it is still defined as 'visibility:hidden'
- Now jQuery knows the actual height of your element
- Swaps the original styles back and returns the value
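A minimal sketch of that swap trick in plain JavaScript - my own simplification for illustration, not jQuery's actual code (jQuery also restores the element's correct display value, where this sketch just uses 'block'):

var measureHiddenHeight = function(el) {
    var style =;
    // remember the styles we're about to clobber
    var old = {
        display: style.display,
        position: style.position,
        visibility: style.visibility
    };
    // force the element to render, without showing it or shifting the layout
    style.position = 'absolute';
    style.visibility = 'hidden';
    style.display = 'block';
    var height = el.clientHeight; // now non-zero
    // swap the original styles back
    style.display = old.display;
    style.position = old.position;
    style.visibility = old.visibility;
    return height;
};
<test>
var assert = require('assert');
var el = { style: { display: 'none', position: 'static', visibility: 'visible' }, clientHeight: 42 };
assert.strictEqual(measureHiddenHeight(el), 42);
assert.strictEqual(, 'none');
assert.strictEqual(, 'static');
assert.strictEqual(, 'visible');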

Okay, so now that you know this, why should you even care?
The step where jQuery changes your element's styles without you knowing, forcing the browser to 'render' the element in the background, can take time. Not a lot of time, but some - probably a few milliseconds. Doing this once wouldn't matter to anyone, but doing it many times, say in a loop, might cause performance issues.

Real life!
I recently found a performance issue on our site that was caused by this exact behavior. The 'outerHeight()' method was being called in a loop many times, and fixing this brought an improvement of ~200ms. (Why saving 200ms can save millions of dollars!)
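The shape of the fix is worth spelling out: collect all the heights in one pass of reads, then write the container's height once at the end. A sketch with made-up names ('readHeight' stands in for whatever getter you use, e.g. $(el).outerHeight()):

var sumOuterHeights = function(items, readHeight) {
    var total = 0;
    for (var i = 0; i < items.length; i++) {
        total += readHeight(items[i]); // reads only - no interleaved writes
    }
    return total; // the caller applies this with ONE height write
};
<test>
var assert = require('assert');
assert.strictEqual(sumOuterHeights([{ h: 10 }, { h: 20 }, { h: 5 }], function (x) { return x.h; }), 35);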

I will soon write a fully detailed post about how I discovered this performance issue, how I tracked it down, and how I fixed it.

Always a good tip!
Learn how your libraries are working under the hood. This will give you great power and a great understanding of how to efficiently use them.

Saturday, November 9, 2013

Humanity wasted 14,526 years watching Gangnam Style

"Humanity wasted 14,526 years watching Gangnam Style"...
This was the title of a link I posted on Hacker News about a week ago which linked to a website I created with a friend of mine (Gil Cohen) -

It seems like I managed to really annoy some people, and some even claim to hate me!
(The whole discussion can be seen here :

Well, I just wanted to say about this whole thing a few words -
The whole idea of this site was only a joke. My friend and I were sitting in the living room one boring Friday, watching some silly YouTube videos ourselves, when I started thinking about how many times these videos had been watched. It amazed me, so I started calculating it in my head. The programmer in me wouldn't let me calculate this data manually, so I started building a site that would do it for me. When we saw the numbers we were amazed, and started joking about the things we could have done instead with all this 'wasted time'...

I didn't mean to laugh at how you decide to spend your time or make fun of anyone in any way. I myself 'waste' a lot of time on YouTube, sometimes on silly videos while doing nothing, and sometimes countless hours listening to music as I work. I added at least a few views myself to each of the videos on the site, and many more to videos that aren't on it. I don't see that time as 'wasted'.

I also know the calculation isn't entirely accurate, and that each (or at least most) of the feats on that site wasn't accomplished by a single person, so in reality they took much longer than stated.

So, sorry if I hurt you. I know I made a lot of people laugh in the process, so it was totally worth it! :)

Monday, October 21, 2013

A tale of, IIS 7.5, chunked responses and keep-alive

A while ago I posted about chunked responses - what they are and the importance of them. It turns out that we (where I work) were getting it all wrong.

We implemented chunked responses (or at least thought we had) quite a while ago, and it WAS working in the beginning, but then all of a sudden it stopped.

How did I come to realize this?
While analyzing waterfall charts of our site, which I've been doing regularly for quite a while now, I realized that the response didn't look chunked.
It's not trivial to spot this on a waterfall chart, but if you look closely and are familiar with your site's performance, you should notice it. Since the first chunk we send the client is just the html 'head' tag, it requires almost no processing and can be sent to the client immediately, and it immediately causes the browser to start downloading the resources requested in the 'head' tag. If a response is chunked, you should see in the waterfall that resources start downloading before the client has even finished downloading the html response itself.

A proper chunked response should look like this:
If you look closely, you'll see that the response took a long time to download, which doesn't match the connection speed we chose for this test. This means the download itself didn't actually take that long; rather, the server sent part of the response, processed some more of it, and then sent the rest.

Here's an image of a response that isn't chunked:

You can see that the client only starts downloading the resources required in the 'head' after the whole page has been downloaded. We could have saved some precious time here, and had our server work in parallel with the client downloading resources from our CDN.

What happened?
Like I said, this used to work and now it doesn't. We looked back at what had been done lately and realized that we had switched load balancers recently. Since we weren't sending the chunks properly, the new load balancer didn't know how to deal with them, and just passed the response on to the client without chunks.
To investigate this properly, I started working directly against the IIS server...

What was happening?
I looked at the response with Fiddler and WireShark and realized it was coming in chunks, but not 'properly': the 'Transfer-Encoding' header wasn't set, and the chunks weren't in the correct format. The response was just being streamed, and each part we had was passed on to the client. Before we switched load balancers it was being passed to the client like this, and luckily most clients were dealing with it gracefully. :)
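For reference, a correctly framed HTTP/1.1 chunk is the payload size in hex, CRLF, the payload, CRLF, with a zero-sized chunk terminating the body. A small sketch (it counts characters, so assume single-byte data; real code must count bytes):

// frames one chunk of a 'Transfer-Encoding: chunked' body
var formatChunk = function(data) {
    return data.length.toString(16) + '\r\n' + data + '\r\n';
};

// the terminating chunk: size 0 and no payload
var lastChunk = function() {
    return '0\r\n\r\n';
};

// a whole body would be: formatChunk('<head>...') + ... + lastChunk()
<test>
var assert = require('assert');
assert.strictEqual(formatChunk('hello'), '5\r\nhello\r\n');
assert.strictEqual(formatChunk('aaaaaaaaaaaaaaaa'), '10\r\naaaaaaaaaaaaaaaa\r\n');
assert.strictEqual(lastChunk(), '0\r\n\r\n');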

So why weren't our chunks being formatted properly?
When using ASP.NET MVC on IIS 7.5, you shouldn't have to worry about the format of the chunks. All you need to do is call 'HttpContext.Response.Flush()' and the response should be formatted correctly for you. For some reason this wasn't happening...
Since we're not using the classic Microsoft MVC framework, but something we custom-built here, I started digging into our framework. I realized it had nothing to do with the framework and was lower down, in Microsoft's web assemblies, so I started digging deeper into Microsoft's code.

Using dotPeek, I looked into the code of 'Response.Flush()'...
This is what I saw:

As you can see, the code for the IIS 6 worker is exposed, but with IIS 7 and above it goes into an unmanaged dll, and that's where I stopped going down that path.

I started looking for other headers that might interfere, and searched the internet for help... I couldn't find anything useful (which is why I'm writing this...), so I just dug into our settings.
All of a sudden I realized my IIS settings had the 'Enable HTTP keep-alive' setting disabled. This was adding the 'Connection: close' header, which was interfering.

I read the whole HTTP 1.1 spec on the 'Transfer-Encoding' and 'Connection' headers, and there is no mention of any connection between the two. Whether it makes sense or not, it seems that IIS 7.5 (I'm guessing IIS 7 too, although I didn't test it) won't format the chunks properly, nor add the 'Transfer-Encoding' header, unless the 'Connection' header is set to 'keep-alive'.

Jesus! @Microsoft - Couldn't you state that somewhere, in some documentation, or at least as an error message or a warning to the output when running into those colliding settings?!!

Well, what does this all mean?
The 'Connection' header indicates to the client what type of connection it's dealing with. If it's set to 'close', the connection is not persistent and will be closed as soon as the response has been sent. 'keep-alive' means the connection will stay open, and the client may need to close it itself.
In the case of a chunked response, you indicate the last chunk by sending a chunk of size '0', telling the client the response has ended and the connection can be closed. This should be tested properly to make sure you're not leaving connections hanging and wasting precious resources on your servers.
(btw - if you don't specify a connection type, the default in HTTP/1.1 is 'keep-alive').

If you want to take extra precautions, and I suggest you do, you can add the 'Keep-Alive' header, which indicates that the connection will be closed after a certain amount of idle time.
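For example, a response advertising a persistent connection that the server will close after a few idle seconds might carry headers like these (the values are just illustrative):

```
Connection: keep-alive
Keep-Alive: timeout=5, max=100
```

'timeout' is the idle time in seconds, and 'max' caps the number of requests served on the connection; both are advisory hints to the client.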

Whatever you do, make sure to run proper tests under stress/load to make sure your servers are managing their resources correctly.

Additional helpful resources :
- 'Keep-Alive' header protocol
- HTTP/1.1 headers spec