
All Magento pages from Varnish

I just realised I haven’t updated the site since our last big development. We’re now serving almost all of our pages from Varnish. Crude research suggests around 90% of our pageviews are now coming from Varnish. In simple terms, we did the following:

  • Render the cart / account links from a cookie with JavaScript (see the sketch after this list)
  • Ajax all pages, so everything can be cached (with a few exceptions we manually exclude)
  • Cache page variants for currencies and tax rate
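As a sketch of the first point, something along these lines runs on every cached page. The cookie name (frontend_state), its JSON shape, and the element IDs are all invented for the example, not our actual implementation:

// Render the cart / account links client-side, so the surrounding
// HTML can be cached once and served to everyone.
function readCookie(name) {
  var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
  return match ? decodeURIComponent(match[1]) : null;
}

var raw = readCookie('frontend_state'); // hypothetical cookie name
var state = raw ? JSON.parse(raw) : { loggedIn: false, cartCount: 0 };

document.getElementById('cart-link').textContent =
  'My Cart (' + state.cartCount + ')';
document.getElementById('account-link').textContent =
  state.loggedIn ? 'Log Out' : 'Log In';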

We’re also warming / refreshing the cache with a bash script that parses the sitemap and hits every URL with a PURGE then a GET.
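A minimal version of that warmer, assuming a standard sitemap.xml, GNU grep, and a VCL that accepts the PURGE method (the hostname is a placeholder):

#!/bin/bash
# Evict then re-fetch every URL listed in the sitemap.
SITEMAP="http://domain.tld/sitemap.xml"

curl -s "$SITEMAP" | grep -oP '(?<=<loc>)[^<]+' | while read -r url; do
  curl -s -o /dev/null -X PURGE "$url"   # drop the stale copy from Varnish
  curl -s -o /dev/null "$url"            # GET so Varnish recaches it fresh
done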

The hardest part of this whole performance effort has been measuring the impact of our changes. But our TTFB was previously in the 300-500ms range for most pages, and now it’s in the 20-30ms range for pages served from Varnish. I’m very confident that it’s improving our bottom line.


All category pages from Varnish

It’s a glorious day in the pursuit of ultra high performance on Magento. Today, we serve all our category pages from Varnish. Plus, we artificially warm the cache to ensure that all category pages across all sites are already in the cache when a real user hits the sites.

Varnish typically takes our time to first byte from around 300ms – 400ms down to 20ms – 30ms. We were previously serving 80% of landing pages from Varnish, but this change should improve overall performance by a noticeable margin. Happy days. 🙂

The implementation is fairly custom. Essentially, we add a header to every page which tells Varnish whether the page can be cached or not: on category pages that header says yes, on product pages it says no. We also did some custom coding to dynamically render the header links (My Cart, Login, Logout, etc.) from a cookie, which we set on login, add to cart, and so on.
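On the Varnish side, honouring such a header looks something like this (a sketch in Varnish 3 VCL; the header name X-Cache-Allowed and the TTL are invented for the example):

sub vcl_fetch {
    # The application marks cacheable pages (e.g. categories) with a header.
    if (beresp.http.X-Cache-Allowed == "yes") {
        unset beresp.http.Set-Cookie;   # a Set-Cookie would block caching
        set beresp.ttl = 1h;
        return (deliver);
    }
    # Product pages, cart, checkout, etc. stay uncached.
    return (hit_for_pass);
}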

Meetup Presentation June 2012

Here are the slides (pdf) from the presentation last night. I’ve pulled out a couple of the more useful code sections below. If you have any questions, or want any links that aren’t in the slides, let me know in the comments.

The first line rewrites static resource URLs onto a dedicated static hostname served by a custom-origin CDN; the second shards those resources across two or more CDN hostnames. We have a CloudFront distribution set up on cdn1|2.dmn.tld that uses domain.tld as the origin. No other config is required.

ModPagespeedMapRewriteDomain static.dmn.tld domain.tld
ModPagespeedShardDomain static.dmn.tld cdn1.dmn.tld,cdn2.dmn.tld

We use these mod_pagespeed filters:
combine_javascript,remove_comments,collapse_whitespace,outline_javascript
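
In the Apache config, those are enabled with a single directive:

ModPagespeedEnableFilters combine_javascript,remove_comments,collapse_whitespace,outline_javascript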

I’d recommend using the ModPagespeedLoadFromFile directive, which lets mod_pagespeed load static resources straight from disk rather than fetching them over HTTP:

ModPagespeedLoadFromFile "http://static.dmn.tld/js/" "/var/www/path/to/htdocs/js/"
ModPagespeedLoadFromFile "http://static.dmn.tld/skin/" "/var/www/path/to/htdocs/skin/"
ModPagespeedLoadFromFile "http://static.dmn.tld/media/" "/var/www/path/to/htdocs/media/"

A few links for the truly lazy who don’t want to search!

First useful boomerang graph

Today we’ve produced our first useful graphs from the 770k boomerang data points we have collected. This is one of them, and I’ll post it here only because it’s the first one I personally produced. At last, after 4 months, we’re actually seeing data.

What does it tell us? That page load time is not very uniform. Next step: a linear regression comparing page load time against the user’s available bandwidth.

Traffic profile by page

I’ve been looking at the profile of our traffic. On our biggest site, the top 50 pages account for more than 50% of our traffic. Across all four sites, our top 100 pages account for 50% of total pageviews.

Getting this data out of Google Analytics was easy. To start with I went to the Content > Site Content > Pages report, switched the view to percentage, and added the percentages for our top 10 then top 50 pages with a calculator.

To get data for all our sites at once, I used this trick to export it, which meant using the old interface (Content > Top Content). I combined all the data into a single sheet in LibreOffice Calc, added a row for the site domain and one for that site’s total pageviews. That allowed me to graph our most popular pages as a percentage of total traffic.

I graphed our traffic with a log scale, and it looks a lot like the graphs from the Long Tail.

Our pages are around 10KiB each. So a quick calculation suggests that (allowing 100% extra for Varnish overhead, i.e. 20KiB per cached page) we could serve 50% of our total traffic from about 2MiB of cached data (our top 100 pages), or 80% of our traffic from 12.5MiB (640 pages at 20KiB per page). That puts an interesting spin on the idea of using a Varnish cache.

I wonder how our traffic profile compares to that of other sites. I’d imagine some sites are more niche, with less traffic in the “head” and more in the “tail”. In those cases, caching might provide less of a performance boost.

This also got me thinking about benchmarking cache performance. I’ll think about it and write more in a later post.

Update: Another couple of perspectives. We have fewer than 500 pages, out of a total of 12k, which get more than 100 pageviews per month (3 per day). If we can serve 80% of traffic from 12.5MiB of cached data, in economic terms that’s £7/month!