All Magento pages from Varnish

I just realised I haven’t updated the site since our last big development. We’re now serving almost all of our pages from Varnish. Crude research suggests around 90% of our pageviews are now coming from Varnish. In simple terms, we did the following:

  • Render the cart / account links from a cookie with javascript
  • Ajax all pages, so everything can be cached (with a few exceptions we manually exclude)
  • Cache page variants for currencies and tax rate

We’re also warming / refreshing the cache with a bash script that parses the sitemap and hits every URL with a PURGE followed by a GET.
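
For illustration, here’s a minimal sketch of that kind of warmer (the sitemap URL is a placeholder, GNU grep is assumed, and Varnish must be configured to accept PURGE from this host):

#!/bin/bash
# Minimal sitemap-driven cache warmer: purge each URL, then fetch it again
SITEMAP="http://www.example.com/sitemap.xml"

curl -s "$SITEMAP" | grep -oP '(?<=<loc>)[^<]+' | while read -r url; do
    curl -s -o /dev/null -X PURGE "$url"   # evict any stale copy from Varnish
    curl -s -o /dev/null "$url"            # fetch again so the fresh copy is cached
done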

The hardest part of this whole performance effort has been measuring the impact of our changes. But our TTFB was previously in the 300-500ms range for most pages, and now it’s in the 20-30ms range for pages served from Varnish. I’m very confident it’s improving our bottom line.

All category pages from Varnish

It’s a glorious day in the pursuit of ultra high performance on Magento. Today, we serve all our category pages from Varnish. Plus, we artificially warm the cache to ensure that all category pages across all sites are already cached before a real user hits them.

Varnish typically takes our time to first byte from around 300ms – 400ms down to 20ms – 30ms. We were previously serving 80% of landing pages from Varnish, but this change should improve overall performance by a noticeable margin. Happy days. 🙂

The implementation is fairly custom. Essentially, we add a header to every page which tells Varnish whether the page can be cached or not. On category pages that header says yes; on product pages it says no. We also did some custom coding to dynamically render the header links (My Cart, Login, Logout, etc.) from a cookie, which we set on login, add to cart, and so on.
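
As a rough illustration (the header name and URLs are placeholders, not necessarily what we use in production), you can check the behaviour from the command line:

# Category pages should declare themselves cacheable, product pages should not
curl -sI http://www.example.com/some-category.html | grep -i 'X-Cacheable'   # expect: X-Cacheable: yes
curl -sI http://www.example.com/some-product.html | grep -i 'X-Cacheable'    # expect: X-Cacheable: no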

varnishd: Child start failed: could not open sockets

I was banging my head against a wall trying to figure out this error:

varnishd: Child start failed: could not open sockets

I checked netstat -tlnp but nothing was listening on the target port or IP. Turns out the IP simply wasn’t there: ifconfig didn’t show it configured on any interface. DOH! Simple solution once I found the actual problem. Posting here because I couldn’t find much on this one online.
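
If you hit the same error, check that the address varnishd is told to bind to actually exists before digging any deeper (the IP and port below are placeholders):

# Is the listen IP configured on any interface?
ifconfig | grep '10.0.0.5' || echo "listen IP not configured"
# Is something else already holding the port?
netstat -tlnp | grep ':6081'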

SSD partition realignment

If you want a great value host in the UK, OVH is pretty good. Their SSD based machines are hard to beat on price. Sure, the service sucks, but you get what you pay for.

There’s a bug in their auto installer: it partitions the whole disk even if you ask it to leave a chunk free to reduce write amplification, and it leaves the partitions out of alignment with the SSD’s 4k pages. Bad. By default we’re set up with software RAID, which allows on-the-fly repartitioning of the disks to fix the alignment.

Here’s a step-by-step guide. Please be sure you understand each command before running it, or you could easily destroy all your data.

Assume you have a few logical volumes (say 80G worth) on a 120G disk with a 10G root partition, leaving in theory 30G free. First, reduce the physical volume to 83G (just a little over 80G to be on the safe side), then reduce the RAID partition to 85G (again, a little over; better safe than sorry). Next, take one drive out of the RAID array, delete the partition, create a new properly aligned partition, and add it back to the array. Let it resync, then repeat for the second drive. Finally, resize the RAID partition and the physical volume back up.

This assumes /dev/sda3 (and /dev/sdb3) is an extended partition containing one logical partition, /dev/sda5 (/dev/sdb5), which is a member of the RAID array /dev/md5.

Make a copy of your partition tables, RAID layout, etc. before you start. Back up.

# Resize the physical volume and raid array
pvresize --setphysicalvolumesize 83G /dev/md5
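# --size is in KiB, so 89216000 KiB is roughly the 85G mentioned above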
mdadm --grow /dev/md5 --size 89216000

# Take this partition out of the RAID array
mdadm --manage /dev/md5 --fail /dev/sda5
mdadm --manage /dev/md5 --remove /dev/sda5
mdadm --zero-superblock /dev/sda5

# Remove the partition
parted /dev/sda rm 5
parted /dev/sda rm 3

# Move everything down the drive to align properly on 512k boundaries
parted /dev/sda mkpart extended 23068672s 209715200s
parted /dev/sda mkpart logical 23072768s 209715200s
parted /dev/sda set 5 raid on

# Add the partition back into the RAID array
mdadm -a /dev/md5 /dev/sda5

# Let the resync finish, then repeat for the other drive

# Resize the raid and lvm back to the full size of the new partition
mdadm --grow /dev/md5 --size=max
pvresize /dev/md5
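
As a sanity check afterwards, parted can confirm the alignment (assuming parted 2.1+ and partition 5 as above):

# Verify the recreated partition is optimally aligned on both drives
parted /dev/sda align-check opt 5
parted /dev/sdb align-check opt 5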

WordPress and memcached

Taking Web Performance Optimisation into my personal life, and partly egged on by my bro, I’ve been looking at my site’s performance over the last few weeks.

As with any performance optimisation, the starting point is the traffic profile. The site sees between 300 and 1,000 pageviews a day; the top 8 pages account for 50% of traffic, and the rest is one or two pageviews per page per day.

Given this spread of traffic, full page caching will take a long time to warm up, and will only improve the second hit to any given page, if the cached page still exists. I’d like to boost performance across the board, so I looked at using memcache. Several plugins exist which leverage WordPress’s in-built caching and store the data in memcache so it persists between requests.
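
For context, these plugins hook in through WordPress’s object-cache.php drop-in; on a Debian/Ubuntu box of that era the setup looks roughly like this (package names and paths are typical, not verified against my exact install):

# Install memcached plus a PHP client (the 'memcached' plugin uses the older
# Memcache extension; memcached-redux needs the newer Memcached one)
apt-get install -y memcached php5-memcache php5-memcached
# WordPress picks up the cache backend from a wp-content/object-cache.php drop-in
cp wp-content/plugins/memcached/object-cache.php wp-content/object-cache.php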

Couchbase

The folks behind Membase and CouchDB merged into Couchbase. They produce Couchbase Server, which is memcache-compatible out of the box (via a proxy known as moxi, I learned), with the added benefit of persistence to disk. One of my long-term goals is to store Magento sessions for a long time in a mostly persistent cache, so I was keen to experiment with Couchbase.

At first, I installed Couchbase, fired up the memcached-redux plugin, and my load times went from ~200ms to >5s. Turns out Couchbase doesn’t work out of the box; it needs to be configured via the web interface on localhost:8091. Done. Now load times were in the ~400ms range. Slower. I learned that the proxy is slower at getMulti() requests, so I installed the memcached plugin, which implements its own getMulti() in PHP. Load times improved slightly, to ~350ms.

Memcached

I then uninstalled Couchbase, installed memcached, and tried again. The memcached-redux plugin showed load times of ~350ms; the memcached plugin ~300ms, with a couple of 4s responses thrown in for good measure.

Site was slower

Bottom line, using memcache was slower, whichever backend or plugin I tried.

On this server, we have plenty of spare memory / CPU and mysql has been given a generous amount of memory to play with. My guess is that for reasonably simple queries, when serving from memory, mysql performs about the same as memcache. Some old reading suggests mysql might even perform slightly better under the right circumstances.

Here, mysql is connecting via a unix socket while memcache is over TCP/IP. That alone might account for the performance difference.
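
A quick way to see the two transports side by side (default socket path and port assumed; memcached answers the plain-text stats command):

# 'localhost' tells the MySQL client to use the unix socket rather than TCP
mysql -h localhost -e "SHOW VARIABLES LIKE 'socket'"
# memcached, by contrast, is listening on TCP port 11211
printf 'stats\nquit\n' | nc 127.0.0.1 11211 | head -n 5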

Memcache has its place

Memcache has a whole lot of properties that make it useful in a wide range of circumstances. In fact, WordPress.com serve their cached pages from memcache via batcache. In an environment without a shared filesystem, memcache provides a distributed cache, which is the key to its success with WordPress.com. Indeed, the batcache documentation specifically says that file-based caching is faster on a single node.

Conclusion

On a single server with plenty of capacity, memcache is the wrong tool. I’m seriously considering varnish for sidebar and/or full page caching; it could really help with the busiest pages, and I have some experience with it. But I think the next step will be to test APC. It’s a single-machine, in-memory cache, so it could work well in this situation. Plus, the bytecode caching might have a positive impact.

Papertrail

A couple of weeks ago I trialled papertrail. Simply put, these guys rock. The application itself is great, simple and functional. But what sets them apart, above and beyond, is the service. Simply outstanding. I’ve been impressed with every interaction on their live chat, and more impressed by their seemingly non-stop presence in there, night or day.

The pitch is simple: pipe all your logs into one place. Amalgamate the logs from multiple servers and applications, then provide a simple, easy-to-use web interface to monitor and search them.
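
The setup itself is about as simple as the pitch; with rsyslog it’s roughly one line (the hostname and port below are placeholders for the values Papertrail assigns to your account):

# Forward all syslog traffic to Papertrail over UDP (placeholder destination)
echo '*.* @logsN.papertrailapp.com:12345' > /etc/rsyslog.d/90-papertrail.conf
service rsyslog restart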

It seems like a “so what” kinda product, but the effect is like going from dial up to broadband. The difference from dial up to broadband was not actually the speed, it was the always-on nature of it. Suddenly, when the net is only a click away, it becomes exponentially more useful. Papertrail is the same.

Before I had tried the service, I had difficulty seeing the value beyond my natural desire to be organised and prepared. But once I had all that data in one easy interface, subtle changes happened. I no longer use `tail -f` to keep an eye on things. I can watch the logs for the same service on multiple servers, colour coded (required a little CSS greasemonkey on my part), in a single flow.

Our logs are now closer. As in more close, not more closing! That has a profound impact.

Loggly

There’s another seemingly similar service called loggly. At our current estimated usage (2.5G/month) they’re free, while papertrail would be $18 or $35 depending on how you price it. To get around that, I’ll likely exclude our static assets from the data we push to papertrail; it’s probably noise anyway. That brings us down to $7/month.

Just to see what it’s like, I created an account on loggly today. I don’t quite get it. The logs are not real time as far as I can tell. The focus seems to be on analysis: trending, graphing, reporting. The interface is complicated, heavy. Maybe I’m missing something. If you’re a fan of loggly and can extol its virtues, please do so in the comments.

I might experiment further with loggly. It’s possible we’ll leverage the graphing capabilities of loggly at some point, maybe even in addition to papertrail. But for now, even at $7 instead of free, I like papertrail. I feel good about being a customer. That’s precious.

Meetup Presentation June 2012

Here are the slides (PDF) from the presentation last night. I’ve pulled out a couple of the more useful code sections below. Any questions, or if you want any links not in the slides, let me know in the comments.

The first line rewrites static resources onto a custom-origin CDN; the second shards those resources between two or more CDN hostnames. We have a CloudFront distribution set up on cdn1|2.dmn.tld that uses domain.tld as the origin. No other config is required.

ModPagespeedMapRewriteDomain static.dmn.tld domain.tld
ModPagespeedShardDomain static.dmn.tld cdn1.dmn.tld,cdn2.dmn.tld

We use these mod_pagespeed filters:
combine_javascript,remove_comments,collapse_whitespace,outline_javascript

I’d recommend using the ModPagespeedLoadFromFile directive:

ModPagespeedLoadFromFile "http://static.dmn.tld/js/" "/var/www/path/to/htdocs/js/"
ModPagespeedLoadFromFile "http://static.dmn.tld/skin/" "/var/www/path/to/htdocs/skin/"
ModPagespeedLoadFromFile "http://static.dmn.tld/media/" "/var/www/path/to/htdocs/media/"

A few links for the truly lazy who don’t want to search!

First useful boomerang graph

Today we’ve produced our first useful graphs from the 770k boomerang data points we’ve collected. This is one of them, posted here only because it’s the first one I personally produced. At last, after 4 months, we’re actually seeing data.

What does it tell us? That page load time is not very uniform. Next step: a linear regression comparing page load time against the user’s available bandwidth.

Google’s mod_pagespeed

We’ve been running mod_pagespeed for a few weeks now. The results appear to be very positive in our tests. However, Google Analytics hasn’t shown any conclusive improvement in performance. When we compare mod_pagespeed against bare Apache, page load times improve by around 1 second, but GA shows erratic results.

I think our traffic is too low for GA to get consistent results. We have set the page speed sample rate to the maximum of 10%, but even then we’re not getting enough data to provide an accurate picture. That’s my guess. We’re running boomerang on all page loads; however, we’re still working out how to analyse the data to get useful statistics from it.

The stock mod_pagespeed install leaves out a couple of big improvements that are low risk. We tweaked the config by enabling the following filters:

ModPagespeedEnableFilters combine_javascript,remove_comments,collapse_whitespace,outline_javascript

We don’t have any data, but my feeling is that enabling collapse_whitespace and remove_comments made a big difference. It reduced our HTML page size considerably, although the effect was offset by the inlining of smaller images. We also enabled the ModPagespeedLoadFromFile option to load our static assets directly from disk. Not sure how much difference it makes in practice, but it seemed like a low-risk, sensible option.

We also used the ModPagespeedMapRewriteDomain and ModPagespeedShardDomain options to rewrite static requests onto a CDN. More about that in a later post.

Sysadmin friendly

The biggest advantage of mod_pagespeed, from my perspective as the sysadmin, was that it required zero changes to the application. That was a big win for us because it meant no code changes and no developer time. For example, being able to combine, minify and CDN-enable all our static assets with a couple of changes to the Apache vhost config, without touching a single line of code, was superb.

At first I thought mod_pagespeed had increased our time to first byte by around 200ms, but that turned out to be a myth. It’s hard to tell exactly, but it looks like it adds maybe 20ms to our time to first byte. That’s a very reasonable trade-off for the advantages.

Conclusion

From a sysadmin perspective, mod_pagespeed is great. It applies across all the pieces of our application (we had to disable it for some with ModPagespeedDisallow). If I have time I’ll look into writing our own custom filters to insert things like boomerang, but that’s a longer-term nice-to-have, not a core requirement.

Bottom line, after my initial scepticism, I can recommend mod_pagespeed if you want to easily reduce your roundtrips and CDNify your static assets at the server level rather than the application level. Big thanks to the mod_pagespeed developers who appear to be ultra responsive on the mailing list.

Ultra High Performance Magento