I’ve been looking at the profile of our traffic. On our biggest site, the top 50 pages account for more than 50% of our traffic. Across all four sites, our top 100 pages account for 50% of total pageviews.
Getting this data out of Google Analytics was easy: I went to the Content > Site Content > Pages report, switched the view to Percentage, and added up the percentages for our top 10 and then top 50 pages with a calculator.
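The calculator step is easy to script if you have the raw per-page counts. A minimal sketch (the pageview numbers here are illustrative, not our real figures):

```python
# Cumulative share of traffic captured by the top N pages.
# These counts are made up for illustration, not our real data.
pageviews = sorted([9500, 7200, 4100, 2500, 1800, 900, 400, 150, 80, 30],
                   reverse=True)

def top_n_share(counts, n):
    """Fraction of total pageviews accounted for by the n most popular pages."""
    total = sum(counts)
    return sum(counts[:n]) / total

print(f"top 3 pages: {top_n_share(pageviews, 3):.0%} of traffic")
print(f"top 5 pages: {top_n_share(pageviews, 5):.0%} of traffic")
```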
To get data across all our sites, I used this trick to export the full dataset for each one; I had to use the old interface (Content > Top Content) to do it. I combined the exports into a single sheet in LibreOffice Calc, adding a row for the site domain and one for the site's total pageviews. That let me graph our most popular pages as a percentage of total traffic.
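The spreadsheet step (tag each page with its site, then express its pageviews as a share of that site's total) can also be sketched in a few lines of Python. The export format here is hypothetical; real Google Analytics exports have more columns, but "page" and "pageviews" are all this calculation needs:

```python
import csv
from io import StringIO

# Hypothetical per-site exports standing in for the real GA CSV files.
exports = {
    "example-a.org": "page,pageviews\n/home,5000\n/about,1200\n/faq,300\n",
    "example-b.org": "page,pageviews\n/home,2000\n/news,800\n",
}

rows = []
for domain, data in exports.items():
    site_rows = list(csv.DictReader(StringIO(data)))
    site_total = sum(int(r["pageviews"]) for r in site_rows)
    for r in site_rows:
        rows.append({
            "domain": domain,
            "page": r["page"],
            "pageviews": int(r["pageviews"]),
            "share_of_site": int(r["pageviews"]) / site_total,
        })

# Sort every page across all sites by its share of its own site's traffic.
rows.sort(key=lambda r: r["share_of_site"], reverse=True)
for r in rows[:3]:
    print(f'{r["domain"]}{r["page"]}: {r["share_of_site"]:.1%}')
```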
I graphed our traffic on a log scale, and it looks a lot like the graphs from the Long Tail.
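The signature of a long-tail (power-law) profile is that pageviews against rank form a roughly straight line on log-log axes. A small self-contained check of that idea, using synthetic power-law data rather than our real numbers:

```python
import math

# Synthetic long-tail data: pageviews ~ C * rank^(-s), here with s = 1.
s = 1.0
data = [(rank, 10000 * rank ** -s) for rank in range(1, 101)]

# Least-squares slope of log(pageviews) against log(rank);
# a constant slope is the straight-line-on-log-log signature.
xs = [math.log(r) for r, _ in data]
ys = [math.log(v) for _, v in data]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
print(f"fitted log-log slope: {slope:.2f}")  # ≈ -1.00 for this synthetic data
```

Running the same fit on real exported pageview counts would show how closely the traffic actually follows a power law.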
Our pages are around 10KiB each. So, allowing 100% extra for Varnish overhead (20KiB per cached page), a quick calculation suggests we could serve 50% of our total traffic (the top 100 pages) from about 2MiB of cached data, or 80% of our traffic from 12.5MiB (640 pages at 20KiB per page). That puts an interesting spin on the idea of using a Varnish cache.
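The back-of-envelope sizing, written out:

```python
# Cache sizing: ~10KiB per page, doubled to 20KiB to allow
# 100% extra for Varnish overhead.
PAGE_KIB = 10
CACHED_KIB = PAGE_KIB * 2  # 100% overhead allowance

def cache_mib(pages):
    """MiB of cache needed to hold this many pages."""
    return pages * CACHED_KIB / 1024

print(f"top 100 pages (50% of traffic): {cache_mib(100):.2f} MiB")
print(f"top 640 pages (80% of traffic): {cache_mib(640):.2f} MiB")
```

The top-100 figure comes out at 1.95MiB, which rounds to the 2MiB quoted above; 640 pages is exactly 12.5MiB.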
I wonder how our traffic profile compares to that of other sites. I’d imagine some sites are more niche, with less traffic in the “head” and more in the “tail”. In those cases, caching might provide less of a performance boost.
This also got me thinking about benchmarking cache performance. I’ll think about it and write more in a later post.
Update: a couple more perspectives. Fewer than 500 of our 12k pages get more than 100 pageviews per month (about 3 per day). And if we can serve 80% of traffic from 12.5MiB of cached data, in economic terms that’s £7/month!
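The "fewer than 500 pages above 100 pageviews/month" figure is just a threshold filter over the per-page counts. A sketch with made-up numbers (on the real data, most of the 12k pages sit far down the tail):

```python
# Illustrative monthly pageview counts, not our real data.
monthly_pageviews = [5000, 3000, 800, 400, 120, 101, 100, 40, 12, 3, 1]

THRESHOLD = 100  # pageviews/month, i.e. just over 3 per day (100 / 30 ≈ 3.3)
busy_pages = [v for v in monthly_pageviews if v > THRESHOLD]
print(f"{len(busy_pages)} of {len(monthly_pageviews)} pages "
      f"exceed {THRESHOLD} pageviews/month")
```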