Friday, April 17, 2009

Make your site faster and cheaper to operate in one easy step

Is your web server using gzip encoding? Surprisingly, many are not. I just wrote a little script to fetch the 30 external links off news.yc and check whether they are using gzip encoding. Only 18 were, which means that the other 12 sites are needlessly slow and also wasting money on bandwidth.
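
The check itself is simple. Here's a minimal sketch (in modern Python; my actual script isn't shown here) of the idea: request a page with "Accept-Encoding: gzip" and see whether the server answers with "Content-Encoding: gzip". The URL in the list is just a placeholder.

import urllib.request

def uses_gzip(url):
    # Tell the server we accept gzip, then check what encoding comes back.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Encoding", "").lower() == "gzip"

for url in ["http://example.com/"]:  # placeholder; use your own links
    print(url, "gzip" if uses_gzip(url) else "NOT gzipped")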

Check your site here.

Some people think gzip is "too slow". It's not. Here's an example (run on my laptop) using data from one of the links on news.ycombinator.com:
$ cat < /tmp/sd.html | wc -c
146117
$ gzip < /tmp/sd.html | wc -c
35481
$ time gzip < /tmp/sd.html >/dev/null
real    0m0.009s
user    0m0.004s
sys     0m0.004s

It took 9ms to compress 146,117 bytes of html (and that includes process creation time, etc.), and the compressed data was only about 24% the size of the input. At that rate, compressing 1GB of data would require about 66 seconds of cpu time. Repeating the test with a much larger file yields about 42 sec/GB, so 66 sec is not an unreasonable estimate.
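
(Spelled out: 1GB is about 7,350 copies of that 146,117-byte page, and 7,350 × 9ms ≈ 66 seconds.)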

Inevitably, someone will argue that they can't spare a few ms per page to compress the data, even though it will make their site much more responsive. However, it occurred to me today that thanks to Amazon, it's very easy to compare CPU vs. bandwidth. According to their pricing page, a "small" (single core) instance costs $0.10 / hour, and data transfer out costs $0.17 / GB (though it goes down to $0.10 / GB if you use over 150 TB / month, which you probably don't).

Using these numbers, we can estimate that it would cost $1.88 to gzip 1TB of data on Amazon EC2, and $174 to transfer 1TB of data. If you instead compress your data (and get 4-to-1 compression, which is not unusual for html), the bandwidth will only cost $43.52.
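
The arithmetic, spelled out: 1TB × 66 sec/GB ≈ 67,600 seconds ≈ 18.8 hours of cpu, and 18.8 hours × $0.10/hour ≈ $1.88. For bandwidth, 1,024 GB × $0.17/GB ≈ $174.08, and with 4-to-1 compression, $174.08 / 4 ≈ $43.52.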

Summary:
with gzip: $1.88 for cpu + $43.52 for bandwidth = $45.40 + happier users

without gzip: $174.00 for bandwidth = $128.60 wasted + less happy users

The other excuse for not gzipping content is that your web server doesn't support it for some reason. Fortunately, there's a simple solution: put nginx in front of your servers. That's what we do at FriendFeed, and it works very well (we use a custom, epoll-based python server). Nginx acts as a proxy: outside requests connect to nginx, and nginx connects to whatever web server you are already using (and along the way it will compress your response and do other good stuff).
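
If you want to try this, here's a rough sketch of an nginx config that does it. The backend address (127.0.0.1:8080) is a placeholder for wherever your existing server listens; adjust the types and limits to fit your setup.

events {
    worker_connections 1024;   # required boilerplate for a standalone config
}

http {
    gzip on;                   # compress responses (text/html by default)
    gzip_types text/plain text/css application/x-javascript;  # add other types
    gzip_min_length 1000;      # skip tiny responses, not worth compressing

    server {
        listen 80;
        location / {
            # hand every request to the web server you already run
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
        }
    }
}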