I’ve moved on from the world of Magento performance optimisation and am now working with Meteor on ultra-fast building of mobile and web apps with our new agency, superlumen. Check out the website if you’re a startup looking for an interim CTO, or if you have a business idea you want to validate, either with a landing page campaign or an MVP.
Yet another incident with Linode today. We’re all about getting the most bang for buck in hosting, but apparently choosing Linode was a bad decision. We’ve had more outages with Linode in the last year than across all our other machines ever. Time to find another supplier for US nodes. 😦
It doesn’t help that at some point they halved their pricing and didn’t bother letting us know, so we spent several months with half as much RAM as we were paying for!
While investigating a serious performance issue over the last 24 hours, I discovered that some of our CSS files were being combined but not rewritten by mod_pagespeed. After much hair pulling and far too many hours spent in front of less and friends, I finally tracked down the solution: I added a ModPagespeedLoadFromFile directive for every CDN domain, and now all our resources are being properly rewritten.
I think, but I’m not sure, that mod_pagespeed tries to retrieve resources from the same domain the request arrives on. So if you’re rewriting resources from cdn.xmpl.tld, when a request arrives at mod_pagespeed with a Host header of cdn.xmpl.tld, mod_pagespeed tries to look up every resource on the cdn.xmpl.tld hostname, instead of doing a reverse lookup through the
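ModPagespeedLoadFromFile maps a URL prefix to a directory on disk, so mod_pagespeed reads matching resources straight from the filesystem instead of fetching them over HTTP, which would explain why it sidesteps the hostname lookup problem. With several CDN domains, a tiny generator keeps the directives consistent. This is a minimal sketch only; the second CDN domain and all the directory paths are hypothetical placeholders.

```python
def load_from_file_directives(cdn_map):
    """Build one ModPagespeedLoadFromFile line per CDN URL prefix.

    cdn_map maps a URL prefix (e.g. "http://cdn.xmpl.tld/") to the local
    directory holding the same files. Both sides are given a trailing slash
    so the URL prefix lines up with the directory.
    """
    lines = []
    for url_prefix, directory in sorted(cdn_map.items()):
        if not url_prefix.endswith("/"):
            url_prefix += "/"
        if not directory.endswith("/"):
            directory += "/"
        lines.append(f'ModPagespeedLoadFromFile "{url_prefix}" "{directory}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical second domain and paths, for illustration only.
    print(load_from_file_directives({
        "http://cdn.xmpl.tld": "/var/www/xmpl/static",
        "http://cdn2.xmpl.tld/": "/var/www/xmpl/static/",
    }))
```

The printed lines would go into the Apache config alongside the other ModPagespeed directives.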
I’ve spent a couple of days researching options for graphing performance metrics. We’re trying to get all the metrics from all our services into a single interface. Here are the pieces I found that seemed to fit best:
- InfluxDB + Grafana – Appears to be the best in class for metric storage + display
- Graphite (+ Carbon) – Looks to be more complex to set up and a little less powerful than InfluxDB
- StatsD – Pre-processes high-volume metrics before feeding them to InfluxDB / Carbon
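To make the StatsD role concrete: rather than writing every single event to the database, a StatsD-style daemon accumulates counters and timing samples in memory and flushes one summary point per metric per interval. The toy aggregator below sketches that idea; it is not StatsD's actual code or wire format, the metric names are made up, and it emits its summaries as InfluxDB line protocol (measurement, fields, nanosecond timestamp).

```python
import statistics
from collections import defaultdict


class MiniAggregator:
    """Toy StatsD-style aggregator: buffer raw events, flush summaries."""

    def __init__(self):
        self.counters = defaultdict(float)  # metric name -> running total
        self.timers = defaultdict(list)     # metric name -> raw samples (ms)

    def incr(self, name, value=1.0):
        self.counters[name] += value

    def timing(self, name, ms):
        self.timers[name].append(ms)

    def flush(self, timestamp_ns):
        """Return one line-protocol point per metric, then reset the buffers."""
        lines = []
        for name in sorted(self.counters):
            lines.append(f"{name} count={self.counters[name]} {timestamp_ns}")
        for name in sorted(self.timers):
            samples = self.timers[name]
            lines.append(
                f"{name} count={float(len(samples))},"
                f"mean={statistics.fmean(samples)},"
                f"upper={float(max(samples))} {timestamp_ns}"
            )
        self.counters.clear()
        self.timers.clear()
        return lines
```

However many events arrive between flushes, the database only sees one point per metric per interval, which is the whole point of putting StatsD in front of InfluxDB or Carbon.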
Next step is to play with Influx + Grafana, either their hosted service, or our own, and see what we can push into it. More to follow…
We switched to using StartSSL some years ago for our SSL certificates. Their pricing model is attractive: pay $60 for personal identity validation and issue as many certificates as you like, in whatever configuration you like. Pay another $140 for EV validation. It all went great in the beginning. However, in recent interactions they’ve become increasingly antagonistic.
At first they refused to issue a replacement for an expired certificate, saying that we needed organisation validation (an extra $60) first, even though we were able to issue the same certificate two years ago without it. Then they became downright obnoxious by email, deciding they were no longer willing to discuss our account with me.
Woke up to mdadm RAID failure alerts. Looks like a spinning disk failed overnight. Opened a ticket, now waiting for Hetzner to replace it…
We’re using mod_rpaf and trying to use it for SSL offloading so we can cache all HTTPS requests with Varnish. This worked well in testing, but in production we’re seeing intermittent port errors: if we make 1,000 requests to the same URL over HTTP, several hundred of them show up in the Apache logs as port 443.
This causes all sorts of unexpected side effects, particularly with mod_pagespeed, which serves a 404 when the port has been set incorrectly. Nightmare.
Bottom line, we’ve taken this out of our architecture until we can find a solution. The hardest part is that we can’t replicate the issue on staging. I’ve opened an issue.
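One way to quantify the port errors is to log the port Apache believes it served on and tally it over a batch of plain-HTTP test requests. The sketch below assumes a hypothetical LogFormat whose first field is %p (the canonical server port), e.g. LogFormat "%p %h %t \"%r\" %>s" portlog; that is an illustration, not our actual log format.

```python
from collections import Counter


def port_distribution(log_lines):
    """Tally the server port recorded at the start of each access-log line.

    Assumes a (hypothetical) LogFormat whose first field is %p. Lines that
    don't start with a numeric port are skipped.
    """
    counts = Counter()
    for line in log_lines:
        first, _, _ = line.strip().partition(" ")
        if first.isdigit():
            counts[int(first)] += 1
    return counts


def wrong_port_ratio(log_lines, expected_port=80):
    """Fraction of parsed requests whose logged port differs from expected_port."""
    counts = port_distribution(log_lines)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[expected_port]) / total
```

Run against the log from 1,000 plain-HTTP requests, anything other than 0.0 would mean some requests had their port rewritten to 443.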
We deployed SPDY on our production sites on Monday. It’s hard to tell precisely, but it looks like our pages are being removed from Google search results.
Some background. We used the Alternate-Protocol: 443:npn-spdy/2 header. Our http pages are INDEX,FOLLOW and our https / SPDY pages carry noindex tags.
Could it be that Googlebot follows the Alternate-Protocol header, loads the https version of the page, and then doesn’t index it because of the noindex tags?
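To check a given page for the suspected combination, you need two facts: does the http response advertise SPDY via Alternate-Protocol, and does the https version carry a robots noindex? This sketch does only the parsing (fetching the two responses is left out). Treating the header value as a comma-separated list of port:protocol entries is an assumption, as is passing headers as a plain dict.

```python
from html.parser import HTMLParser


def parse_alternate_protocol(value):
    """Split an Alternate-Protocol value into (port, protocol) pairs.

    Assumes comma-separated "port:protocol" entries, e.g. "443:npn-spdy/2".
    """
    entries = []
    for item in value.split(","):
        port, _, protocol = item.strip().partition(":")
        if port.isdigit() and protocol:
            entries.append((int(port), protocol))
    return entries


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            for directive in (attributes.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def spdy_noindex_conflict(http_headers, https_html):
    """True if the http response advertises SPDY and the https page is noindex.

    http_headers is a plain dict keyed exactly "Alternate-Protocol" (a
    simplification; real HTTP header names are case-insensitive).
    """
    advertised = parse_alternate_protocol(http_headers.get("Alternate-Protocol", ""))
    offers_spdy = any(protocol.startswith("npn-spdy") for _, protocol in advertised)
    parser = RobotsMetaParser()
    parser.feed(https_html)
    parser.close()
    return offers_spdy and "noindex" in parser.directives
```

A True result only shows the configuration matches the theory; it says nothing about what Googlebot actually does with it.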
Can’t find anything in Google about this issue. Anyone else have experience? I’ll try to post back here if we find anything more definitive than pure speculation…
I couldn’t find a definitive answer to this question, so I’m posting this in the hope of saving others the search time. If you have information to the contrary, or if the situation changes, please let me know in the comments and I’ll update this post.
This raises the question: how do we deploy SPDY for Firefox users? Do we redirect all traffic to SSL anyway? Redirect only the Firefox browsers that we think support SPDY? Only use SPDY for Chrome users? I’ll post more once we make a decision…
Thanks to the awesome folks at SOASTA, we’re now using their mPulse system instead of our own boomerang install. This gives us two major wins. First, we’re including the tracking code using the non-blocking asynchronous iframe method, which gives the best possible performance at this point. Second, we can actually see the data. Previously we just weren’t getting visibility into our boomerang data: we had the data but weren’t using it, which was a total waste.
Looking at our stats today, I noticed that mPulse tracks the median page load time, which got me wondering what the data looks like per user. For example, I wonder if users with faster connections typically hit more pages. If they do, their fast pageviews are over-represented, which means our median per-user load time is actually higher than our median page load time.
Take two users, Alice and Bob. Alice is on her desktop in London with a 100Mbps line. She visits 8 pages, and the average load time for her 8 pageviews is 1.2s. Bob is on his iPad over 3G in Alabama (we’re in the UK, so London is closer!). Bob visits 4 pages, with an average load time of 2.3s. Our arithmetic mean is somewhere between the two, but our median, in this case, is one of Alice’s pageviews.
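To put numbers on it, assume (purely for illustration) that every one of Alice’s pageviews took exactly 1.2s and every one of Bob’s took exactly 2.3s. Writing t_(k) for the k-th fastest of the 12 pageviews:

```latex
\begin{aligned}
\text{mean per pageview}   &= \frac{8 \times 1.2 + 4 \times 2.3}{12} = \frac{18.8}{12} \approx 1.57\,\mathrm{s} \\
\text{median per pageview} &= \tfrac{1}{2}\left(t_{(6)} + t_{(7)}\right) = \tfrac{1}{2}(1.2 + 1.2) = 1.2\,\mathrm{s} \\
\text{median per user}     &= \tfrac{1}{2}(1.2 + 2.3) = 1.75\,\mathrm{s}
\end{aligned}
```

Under these assumptions, the per-pageview median understates the typical user’s experience by more than half a second.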
What would be really interesting is to group pageviews by user: count up all the Alices and Bobs, then calculate the median (and the 95th and 99th percentiles) of their averages. That would tell me something like “50% of our users saw a page load of <1.4s, 95% <8s.”
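The grouping itself is simple once each beacon carries some per-user identifier. A minimal sketch, assuming hypothetical (user_id, load_seconds) pairs rather than any real mPulse or boomerang export format, and using the nearest-rank definition of a percentile (interpolating definitions give slightly different answers on small samples):

```python
import math
import statistics
from collections import defaultdict


def per_user_means(beacons):
    """Average load time per user from (user_id, load_seconds) pairs."""
    by_user = defaultdict(list)
    for user_id, load_seconds in beacons:
        by_user[user_id].append(load_seconds)
    return {user: statistics.fmean(times) for user, times in by_user.items()}


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical data mirroring the Alice/Bob example.
    beacons = [("alice", 1.2)] * 8 + [("bob", 2.3)] * 4
    means = list(per_user_means(beacons).values())
    print("median per pageview:", statistics.median(t for _, t in beacons))
    print("median per user:", statistics.median(means))
    print("p95 per user:", nearest_rank(means, 95))
```

With real data the same functions apply unchanged; only the source of the (user_id, load_seconds) pairs differs.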
Having said all that, the data might actually look very similar to what we’re currently seeing. I’ll try to dig out some of our archived boomerang data, do some analysis, and post an update once I have more info.