How removing caching improved mobile performance by 25%

A colleague and I were looking at how our flagship product, Klarna Checkout, loads in the browser and contemplating ways to improve the performance. We decided that we wanted to defer some of the work the browser was doing until after the user could start interacting. In the process of changing this, we stumbled across a much bigger performance problem that had been lurking in our code.

First, some background. When a merchant loads Klarna Checkout, we end up creating two iframes — the ‘main’ iframe which handles most of the user input and a ‘fullscreen’ iframe which is used whenever we need a fullscreen modal popup. For most users, the fullscreen iframe isn’t used until later in the process so we figured deferring the loading of it could give us a nice gain.

I wanted to find a way for us to get a reasonable idea of how much of an improvement this would make. I’ve personally become a bit obsessed with the idea that if developers (myself included) can easily see the performance effects of their changes, they will incorporate performance concerns into their day-to-day work; showing them the data is most of what it takes. Unfortunately, definitively tying an improvement to a specific change is often hard in web development because there are so many variables: network speed, device speed, browser efficiency, caching, and so on. If only I could remove as many of those variables as possible, maybe we’d have something meaningful.

To do this, I needed mock responses for dynamic requests and consistent network latency for the static assets. I decided to make a sort of infinite-bandwidth, low-latency environment using PhantomJS. Although PhantomJS is not a browser used by any of our users, it’s reasonable to expect that improvements seen in PhantomJS should translate to improvements in a normal browser, which is ultimately all I wanted to know.

I wanted a script that I could run that would tell me how many milliseconds it takes for the site to be usable by a consumer. To enable that, I created the following setup using Docker:

I created a bootstrap.html page which loads Klarna Checkout in a similar way to production. I then created an NGINX Docker image containing a locally compiled branch of code and updated bootstrap.html to point to this container (named nginx.service).

I then wrote a script for PhantomJS to execute, using onResourceRequested() to rewrite any CDN URLs to instead load from the local NGINX. To calculate the loading time, I took advantage of events we send during the loading phase. I needed only two events:

  • bootstrap — signifies the beginning of the user’s load.
  • ready — signifies that the user can start typing in details.
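A minimal sketch of what such a script could look like, not the script actually used. The CDN host name, the event shape, and the idea that the page forwards its loading events through window.callPhantom are all assumptions made for illustration; only onResourceRequested, changeUrl, and the nginx.service container name come from PhantomJS and the setup described above.

```javascript
// Hypothetical values: CDN_HOST is a placeholder, not the production CDN.
var CDN_HOST = "cdn.example.com";
var LOCAL_HOST = "nginx.service";

// Point a CDN URL at the local NGINX container; leave other URLs untouched.
function rewriteCdnUrl(url) {
  var prefix = "https://" + CDN_HOST + "/";
  if (url.indexOf(prefix) !== 0) {
    return url;
  }
  return "http://" + LOCAL_HOST + "/" + url.slice(prefix.length);
}

// Milliseconds between the first 'bootstrap' and the first 'ready' event,
// or null if either event has not been seen yet.
function timeToReady(events) {
  var start = null;
  var end = null;
  events.forEach(function (e) {
    if (e.name === "bootstrap" && start === null) start = e.at;
    if (e.name === "ready" && end === null) end = e.at;
  });
  return start === null || end === null ? null : end - start;
}

// PhantomJS wiring; skipped when the file is loaded outside PhantomJS.
if (typeof phantom !== "undefined") {
  var page = require("webpage").create();
  var events = [];

  // Redirect CDN requests to the local container before they are sent.
  page.onResourceRequested = function (requestData, networkRequest) {
    var target = rewriteCdnUrl(requestData.url);
    if (target !== requestData.url) {
      networkRequest.changeUrl(target);
    }
  };

  // Assumption: the page calls window.callPhantom({ name: "bootstrap" })
  // and window.callPhantom({ name: "ready" }) as the events fire.
  page.onCallback = function (data) {
    events.push({ name: data.name, at: Date.now() });
    var elapsed = timeToReady(events);
    if (elapsed !== null) {
      console.log("time to ready: " + elapsed + " ms");
      phantom.exit(0);
    }
  };

  page.open("http://" + LOCAL_HOST + "/bootstrap.html", function (status) {
    if (status !== "success") {
      console.log("failed to load bootstrap.html");
      phantom.exit(1);
    }
  });
}
```

Keeping the URL rewriting and the timing calculation as plain functions means they can be checked without starting PhantomJS at all.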

With all that set up, we were ready to compare our released code with code that deferred the loading of the fullscreen iframe. I ran my benchmark and saw a modest improvement, but the numbers were a bit inconsistent and it was unclear if I was seeing any statistically significant change. However, when my colleague ran it on his machine, he saw a quite dramatic improvement — over 50%! While trying to figure out why we were getting different results, we started echoing out all the files that were being downloaded and noticed that, even though we were deferring the fullscreen, some of the fullscreen-specific assets were still being downloaded. Curious.
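Inconsistent numbers like these are easier to judge when each variant is run several times and summarised. The following is a rough sketch of that idea, not a formal significance test: it compares medians and flags whether the two ranges of timings overlap. The timings in the example are made up for illustration.

```javascript
// Median of a list of numbers (does not modify the input).
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Summarise repeated timings: median plus the fastest and slowest run.
function summarise(timingsMs) {
  return {
    median: median(timingsMs),
    min: Math.min.apply(null, timingsMs),
    max: Math.max.apply(null, timingsMs)
  };
}

// Compare two sets of runs. 'overlap' is true when the ranges intersect,
// which suggests the difference may just be noise.
function compareRuns(baselineMs, candidateMs) {
  var base = summarise(baselineMs);
  var cand = summarise(candidateMs);
  return {
    baseline: base,
    candidate: cand,
    improvementPct: ((base.median - cand.median) / base.median) * 100,
    overlap: cand.max >= base.min && base.max >= cand.min
  };
}

// Hypothetical timings in milliseconds, for illustration only.
var result = compareRuns(
  [1000, 1100, 1050, 980, 1200],
  [800, 820, 790, 850, 810]
);
console.log(
  result.improvementPct.toFixed(1) + "% faster, overlap: " + result.overlap
);
```

The median is used rather than the mean so that a single slow outlier run does not dominate the comparison.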

If you don’t look around, you’ll miss the hidden treasure or… “Cache! Why did it have to be cache?” (credit: Yatir Keren)

We looked a bit further up and saw manifest.appcache in the log of files being downloaded, and noticed that its file size differed between our two machines. I had heard that we had a ‘manifest file’ but didn’t really know much about it and figured it was largely optional, so I just said “let’s see what happens if we remove it.” Boom, I started getting the same results as my colleague. What is this application cache manifest thing anyway?

The application cache is used for offline apps. You tell the browser what your app needs in order to work offline via a manifest, the browser downloads everything in the manifest, and the next time a user hits your page, the browser will first load the cached content before checking over the network whether the manifest file has changed. My first question was: why were we using it? My next questions were: what sort of priority does the manifest have, and how did we not notice all these files being downloaded before?

So, why were we using it? Turns out the application cache was added a while ago to improve the performance of Klarna Checkout for native app integrations. However, it wasn’t actually necessary since for our use case standard HTTP caching techniques already got us what we wanted — cached assets being loaded without going to the network. (Shameless plug — watch my talk on HTTP caching for a refresher.)
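As an illustration of the kind of standard HTTP caching meant here, the sketch below picks a Cache-Control value per request path. It is not the configuration we actually used: it assumes static assets carry a content hash in their file names (for example app.3f9a1c2b.js), and the file names and the regular expression are hypothetical.

```javascript
// Pick a Cache-Control value for a request path.
// Assumption: static assets carry a content hash in the file name,
// so a given URL never changes content.
function cacheControlFor(path) {
  var fingerprinted = /\.[0-9a-f]{8,}\.(js|css|png|svg|woff2?)$/.test(path);
  if (fingerprinted) {
    // Safe to cache for a year; a new build produces a new URL.
    return "public, max-age=31536000, immutable";
  }
  // HTML and other entry points: stored, but revalidated before reuse.
  return "no-cache";
}

console.log(cacheControlFor("/static/app.3f9a1c2b.js"));
console.log(cacheControlFor("/index.html"));
```

With headers like these, a returning visitor gets the fingerprinted assets straight from the browser cache without a network round trip, which is the effect the application cache was originally added to achieve.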

Next — how was this not noticed? Surely one of us would have noticed certain files being loaded a bit too opportunistically. My PhantomJS benchmarking tool had shown us that there was a problem and it was time to open up Chrome to understand more.

Turns out that Chrome doesn’t really show you much about offline downloads. You can find which offline apps Chrome has stored by looking at chrome://appcache-internals. However, I still had questions about exactly how and when these files were downloaded and how the downloads affected potentially concurrent requests for the same assets from the HTML itself. This is when my new best friend chrome://net-internals came in handy.

Since application cache network traffic is missing from the Network tab, chrome://net-internals#events is the only place to see the traffic. Based largely on an old bug report in Chrome, I discovered that I could:

  1. Close all Chrome tabs to limit noise in the net-internals event stream
  2. Open an incognito window
  3. Go to chrome://net-internals/#capture, hit reset
  4. Open a new tab and go to my site
  5. Look at the network events in chrome://net-internals/#events

Sure enough, there I saw two URL_REQUESTs for every asset that was both in the manifest and part of the initial page download. I also saw a URL_REQUEST for assets which were only in the manifest. This confirmed that the browser was downloading assets for the application cache while also loading the normal page. It also confirmed that we were unknowingly downloading much more content than was necessary.
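The pattern in the event stream can be expressed as a small classification. The sketch below takes the list of URLs in the manifest and the list of URLs the page itself requests, and splits the manifest entries into those fetched twice and those fetched only because of the manifest. The file names are hypothetical, and extracting the URL lists from net-internals is left out.

```javascript
// Split manifest URLs into those requested twice (by the page and by the
// application cache) and those fetched only because of the manifest.
function classifyAssets(manifestUrls, pageUrls) {
  var requestedTwice = [];
  var manifestOnly = [];
  manifestUrls.forEach(function (url) {
    if (pageUrls.indexOf(url) !== -1) {
      requestedTwice.push(url);
    } else {
      manifestOnly.push(url);
    }
  });
  return { requestedTwice: requestedTwice, manifestOnly: manifestOnly };
}

// Hypothetical asset names, for illustration only.
var report = classifyAssets(
  ["/main.js", "/main.css", "/fullscreen.js"],
  ["/main.js", "/main.css"]
);
console.log("requested twice:", report.requestedTwice.join(", "));
console.log("manifest only:", report.manifestOnly.join(", "));
```

In this made-up example the fullscreen script lands in the manifest-only group, mirroring how the fullscreen-specific assets kept being downloaded even after the iframe was deferred.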

This was all pretty exciting! Our benchmarking effort had proven itself on the first try! We checked whether the performance impact of our changes was solely due to the application cache removal or if the fullscreen deferral also improved things. On my machine, the application cache removal improved response time by half while the fullscreen deferral improved it by another 15%. w00t!

When we put the changes into production, they had a dramatic effect on Chrome Mobile especially, cutting load times by 25%. The other browsers also had an improvement, but it was more modest.

So, what did I learn?

  • Testing your assumptions through benchmarking can lead to surprising results and is worth the effort.
  • Never use the application cache (or service workers) unless you actually need offline features. Instead, believe in HTTP cache headers. (Even if you do need offline features, you should read this article first.)
  • chrome://net-internals is an extremely powerful tab that can help answer lots of questions you may have about how browser networking works, at a level a bit deeper than the Network tab allows. You can find some explanations about it here.

I hope to be adding this benchmarking to our automatic testing suite — wouldn’t it be nice to see whether you’ve made things better or worse right in your pull request? This sounds like a great future blog post :)

How about you? Have you been burned by the application cache? How do you benchmark your applications to get an idea of performance before releasing?

Author’s note 1: I’m still not entirely sure why my machine saw such a huge difference with and without the application cache while my colleague’s did not. In his case, he sometimes had the awful performance I saw, so I’m going to guess that it had something to do with my machine being newer and thus being able to get farther along in the load process before the ‘ready’ event came in.

Author’s note 2: in the process of writing this blog post, people pointed me to phantomas, which looks like something we should be adding to our development process.
