Monday, June 23, 2014

Drastically improving 'First Byte' and 'Page Load' (for SEO)


Improving your 'first byte' time, and your 'page load' time in general, can be crucial for SEO. Google favors pages that render faster for the user and, in some cases, will rank them higher than slower pages in search results.

If you're not familiar with this, here are some articles on the subject:
http://googlewebmastercentral.blogspot.co.il/2010/04/using-site-speed-in-web-search-ranking.html
http://blog.kissmetrics.com/speed-is-a-killer/
http://www.quicksprout.com/2012/12/10/how-load-time-affects-google-rankings/

Improving your site's performance can be a daunting task. There are probably a few easy wins that will improve your speed a little, but you'll quickly realize that bigger gains take much longer. Some improvements can take days, weeks or even months of infrastructure changes.

But why should your SEO suffer in the meantime? Why not stay a step ahead of Google?
Your site doesn't actually need to be fast for you to get good SEO scores, you just need Google to think your site is fast!

But how do you do that?
Google will crawl your site once every few days or weeks and cache the results for indexing. So let's beat Google at its own game.
Why don't we crawl our site first, cache the results (even to plain text files), and when Google comes around, serve it the static pages we cached, without any server-side computation?

You can easily build a crawler using Selenium, PhantomJS, ZombieJS or pure Node.js. You don't even need to implement all the logic of a general-purpose crawler, since you already know your own site's structure.
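As a minimal sketch, using nothing but Node's built-in http and fs modules (the cachePage name and the example URL are mine, and this only grabs the raw server HTML - if your pages need JavaScript to render, do the same thing with PhantomJS instead):

var http = require('http');
var fs = require('fs');

// Fetch a page and cache its HTML to a file on disk
function cachePage(pageUrl, filePath, callback) {
  http.get(pageUrl, function (res) {
    var html = '';
    res.on('data', function (chunk) { html += chunk; });
    res.on('end', function () {
      fs.writeFile(filePath, html, callback);
    });
  }).on('error', callback);
}

cachePage('http://www.YourCommerceSite.com/', 'cache/home.txt', function (err) {
  if (err) console.error(err);
});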

For a real-world example:
If your site is a big e-commerce site, then you know the structure of all your product pages. They probably look something like this:
http://www.YourCommerceSite.com/product/Product-Name/:Product-ID:

You can invoke this endpoint for every product ID you read from your DB.
Then you can save each response to a text file named after the product, like this:
Product_<Product-ID>.txt
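A rough sketch of that batch job (getAllProducts is a hypothetical stand-in for a real query against your products table, and the cache/ folder is assumed to exist):

var http = require('http');
var fs = require('fs');

// Hypothetical helper - replace with a real query against your products table
function getAllProducts(callback) {
  callback(null, [{ id: 1001, slug: 'Product-Name' }]);
}

getAllProducts(function (err, products) {
  if (err) throw err;
  products.forEach(function (p) {
    var pageUrl = 'http://www.YourCommerceSite.com/product/' + p.slug + '/' + p.id;
    http.get(pageUrl, function (res) {
      var html = '';
      res.on('data', function (chunk) { html += chunk; });
      res.on('end', function () {
        // One cached file per product: Product_<Product-ID>.txt
        fs.writeFile('cache/Product_' + p.id + '.txt', html, function (err) {
          if (err) console.error(err);
        });
      });
    });
  });
});

In practice you'd also want to throttle this loop, so you don't hammer your own servers with thousands of parallel requests.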

When the Google bot comes around (which you can easily detect by its 'User-Agent' header) and requests a product page, quickly serve it the cached product page you stored on disk.
This copy might be stale by a few hours or days (depending on how often you re-crawl), but it will still be good enough for indexing (Google's indexing isn't realtime anyway) and should be super fast!
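A minimal sketch of that detection, with a plain Node http server (the 'Googlebot' substring check, the cache/ layout and the renderNormally fallback are all assumptions to adapt to your own stack):

var http = require('http');
var fs = require('fs');
var url = require('url');

// Hypothetical placeholder for your existing dynamic rendering
function renderNormally(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end('<html>...dynamic product page...</html>');
}

http.createServer(function (req, res) {
  var userAgent = req.headers['user-agent'] || '';
  var pathname = url.parse(req.url).pathname; // e.g. /product/Product-Name/1001

  if (userAgent.indexOf('Googlebot') !== -1) {
    // Serve the pre-rendered copy straight from disk - no server calculations
    var id = pathname.split('/').pop();
    fs.readFile('cache/Product_' + id + '.txt', function (err, html) {
      if (err) return renderNormally(req, res); // no cached copy? render live
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end(html);
    });
  } else {
    renderNormally(req, res); // regular visitors get the dynamic page
  }
}).listen(8080);

If you want stricter detection than the User-Agent string, Google also documents how to verify Googlebot with a reverse DNS lookup.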