Rigorous Performance Testing - But, Wait: There’s More (Data)
This is the third and final blog post on how to best evaluate performance on the Web. If you haven’t already read parts one and two, then please do so first for some vital context.
In short, the Internet has changed substantially over the past few decades, and particularly over the past few years. Infrastructure has evolved, delivery media have evolved, software has evolved, browsers have evolved, and web sites have evolved. These trends have yielded new paradigms in development, deployment, delivery, and testing. There are new tools out there designed to address all of these changes and give you great data that truly represents user experience.
So, supposing you’ve read my prior blog posts, you’ve chosen the best performance metrics and you’ve collected a boatload of data with the testing tool of your choice. Now what? How do you interpret those results?
Averages, medians and percentiles — oh my!
Generally speaking, performance results are expressed as either an average or a median. Here’s the problem: an average can be skewed horribly by just a handful of severe outliers. A median expresses only the performance of your “middle of the road” users and discards the data from your fastest and slowest users. As a service provider, you should care about all of your users, but you should have particular concern for your slowest users.
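To see how far apart the two figures can drift, here is a minimal Python sketch. The load times are made up for illustration (they are not measurements from this series), and the last two values play the role of severe outliers.

```python
from statistics import mean, median

# Hypothetical page load times in seconds; the last two are severe outliers.
load_times = [1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 45.0, 60.0]

avg = mean(load_times)
mid = median(load_times)

# How many users actually loaded the page faster than the average?
faster_than_avg = sum(1 for t in load_times if t < avg)

print(f"mean:   {avg:.2f} s")
print(f"median: {mid:.2f} s")
print(f"{faster_than_avg} of {len(load_times)} loads beat the mean")
```

With this sample, the mean lands above 12 seconds even though eight of the ten loads finished in under 3 seconds, while the median says nothing at all about the two users who waited 45 seconds or more.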
One possible workaround to the above problems is to use a percentile. To compute one, sort the performance data in ascending order; the nth percentile is the value at or below which n% of the data points fall. So, for example, if the 95th percentile of a population is 5.3 seconds, then you know that 95% of page loads were completed in 5.3 seconds or less. Percentiles, however, have problems of their own:
- There are still outliers, no matter what percentile you choose (except for the 100th, otherwise known as the max).
- All data points except for the designated percentile are essentially discarded. For example, if you request a 95th percentile, you have no idea what load times the 94th percentile or the 96th percentile saw — let alone the 1st, 15th, or 50th.
- The outliers are, by definition, your slowest users. Don’t you care about them the most? Why would you discard them?
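The points above can be sketched in a few lines of Python. This example uses the nearest-rank method, which is only one of several common percentile definitions; your testing tool may interpolate differently. The `percentile` helper and the sample data are hypothetical.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical page load times in seconds.
load_times = [1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 45.0, 60.0]

p50 = percentile(load_times, 50)
p80 = percentile(load_times, 80)
p90 = percentile(load_times, 90)
p100 = percentile(load_times, 100)  # the max

for name, value in [("p50", p50), ("p80", p80), ("p90", p90), ("p100", p100)]:
    print(f"{name}: {value} s")
```

Notice that, for this sample, the 80th percentile is 2.6 seconds and the 90th is 45 seconds. Whichever one you report, the other stays invisible.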
But wait: there’s more (...data)!
The bottom line is that, if you went to the effort and expense of collecting hundreds or thousands of data points, then those data points capture hundreds of permutations, with different:
- internet connection types
- wireless connection qualities
- computing power
And, aside from all of the above variables, there is the natural variability inherent to the Internet.
Why should you express all that data as a cheap sound bite — especially a misleading one? There is much more data. Let’s figure out how to use it all.
Enter: the histogram
If we have hundreds or thousands of data points, it is just plain wrong to distill them all down to a single figure. So, we should try a new visualization. The histogram expresses how many users experienced a page load time within each interval:
- Taller bars mean that more users saw a load time within that interval.
- Conversely, shorter bars mean that fewer users saw a load time within that interval.
- Faster load times are on the left of the histogram, and slower load times are on the right side.
So, the crux of the histogram is this: if there are more, taller bars on the left side of a histogram, then your users had a fast experience.
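If you want to build one by hand, the bucketing is simple. Below is a minimal Python sketch that counts load times into fixed-width buckets and draws a rough text chart. The `histogram` helper, the one-second bucket width, and the data are all assumptions for illustration; the sketch also assumes a non-empty list of non-negative load times.

```python
def histogram(samples, bucket_width):
    """Return a list of counts; bucket i covers the interval
    [i * bucket_width, (i + 1) * bucket_width)."""
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")
    counts = [0] * (int(max(samples) // bucket_width) + 1)
    for t in samples:
        counts[int(t // bucket_width)] += 1
    return counts

# Hypothetical page load times in seconds.
load_times = [0.8, 1.2, 1.4, 1.9, 2.1, 2.2, 2.7, 3.5, 4.1, 7.6]
bucket_width = 1.0

counts = histogram(load_times, bucket_width)

# One row per bucket, fastest at the top; a longer row of '#' is a "taller" bar.
for i, n in enumerate(counts):
    lo = i * bucket_width
    print(f"{lo:4.1f}-{lo + bucket_width:4.1f} s | {'#' * n}")
```

Every data point lands in exactly one bucket, so nothing is discarded: the counts always add up to the number of page loads you measured.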
Histograms really shine if you are comparing multiple data sets. You can simply put both data sets on the same chart. See this example histogram:
The red bars are more concentrated on the left, and the blue bars are more concentrated on the right. It is obvious that, in this example, red users had a faster experience overall.
High octane: the cumulative distribution function (CDF)
Everyone loves histograms. They represent all the data. They are easy for everyone to consume. There are no misleading statistical “cuts.” But, despite all these great points, histograms have a shortcoming: finite granularity.
Every histogram has discrete buckets, and choosing a good bucket size is tricky. Too many buckets and you end up with just a long tail that is hard to understand. Too few buckets, and the load times get lumped together and become meaningless.
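Here is a small Python sketch of that trade-off, bucketing the same made-up sample three ways. Load times are kept in whole milliseconds to sidestep floating-point edge cases; the `bucket_counts` helper and the three widths are assumptions chosen to exaggerate the effect.

```python
from collections import Counter

# Hypothetical page load times in milliseconds.
load_times_ms = [800, 1200, 1400, 1900, 2100, 2200, 2700, 3500, 4100, 7600]

def bucket_counts(samples, width):
    """Map each occupied bucket's lower edge to its sample count."""
    return Counter(t // width * width for t in samples)

coarse = bucket_counts(load_times_ms, 10_000)  # too few buckets
medium = bucket_counts(load_times_ms, 1_000)
fine = bucket_counts(load_times_ms, 100)       # too many buckets

for label, counts in [("10 s", coarse), ("1 s", medium), ("0.1 s", fine)]:
    print(f"width {label}: {len(counts)} occupied buckets, "
          f"tallest bar = {max(counts.values())}")
```

At a 10-second width, everything collapses into one bar. At a 0.1-second width, every bar has a height of one, so the chart has no shape. Only the middle setting shows where the loads cluster.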
There is a (much) lesser-known chart that represents all performance data with incredibly high granularity (roughly, as sharp as your monitor!): the cumulative distribution function (CDF). The CDF expresses the percentage of page loads completed within a given amount of elapsed time:
So, in this example CDF, approximately 20% of page loads are done in 5 seconds or less. Slightly less than 70% of page loads are done in 10 seconds or less.
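Computing an empirical CDF from raw samples takes very little code. The following Python sketch is one simple way to do it, assuming a non-empty list of load times; `make_cdf` is a hypothetical helper and the data is invented (it is not the data behind the chart above).

```python
from bisect import bisect_right

def make_cdf(samples):
    """Return a function cdf(t) giving the fraction of samples
    that completed in t seconds or less."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(t):
        # bisect_right counts how many sorted samples are <= t.
        return bisect_right(ordered, t) / n

    return cdf

# Hypothetical page load times in seconds.
load_times = [0.8, 1.2, 1.4, 1.9, 2.1, 2.2, 2.7, 3.5, 4.1, 7.6]
cdf = make_cdf(load_times)

for t in (0.5, 2.0, 5.0, 10.0):
    print(f"{cdf(t):.0%} of page loads were done within {t} s")
```

Because there are no buckets, there is no bucket size to argue about: you can ask the function about any load time you like, and every sample contributes to the answer.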
As with histograms, better CDFs sit further to the left, because the curve rises sooner. If you plot two separate data sets on the same chart, it is easy to see which one is truly faster:
In this comparative CDF, the speed difference is expressed as the gap between the two lines. You could summarize a part of this chart in one statement by saying something like “when 80% of blue users are done loading the page, over 90% of red users are done loading the page.” More importantly, since the red line is above and to the left of the blue line for the entire length, we know that red is faster for all users, not merely a subset of those users.
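The “above and to the left for the entire length” test can also be checked numerically. The Python sketch below is one way to do it, under the assumption that both data sets are non-empty; the function names and the three sample data sets are hypothetical. Since an empirical CDF only changes at observed load times, it is enough to compare the two curves at those points.

```python
from bisect import bisect_right

def fraction_done(ordered, t):
    """Fraction of the sorted samples that completed in t or less."""
    return bisect_right(ordered, t) / len(ordered)

def faster_for_all_users(a, b):
    """True if a's CDF is never below b's at any observed load time,
    and is strictly above it at least once."""
    a, b = sorted(a), sorted(b)
    checkpoints = sorted(set(a) | set(b))
    gaps = [fraction_done(a, t) - fraction_done(b, t) for t in checkpoints]
    return min(gaps) >= 0 and max(gaps) > 0

# Hypothetical page load times in seconds.
red = [1.0, 1.5, 2.0, 2.5, 3.0]
blue = [1.5, 2.5, 3.0, 4.0, 6.0]
mixed = [0.5, 0.6, 5.0, 8.0, 9.0]  # fast for some users, slow for others

print(faster_for_all_users(red, blue))    # red's curve never dips below blue's
print(faster_for_all_users(blue, red))
print(faster_for_all_users(mixed, blue))  # the curves cross
```

When the curves cross, as with `mixed` and `blue` here, neither data set is faster for everyone, and that is precisely the kind of nuance a single aggregate figure would hide.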
Tie it all together...
Performance on the Internet is a black box for many, and I hope that my blog series has shed light on some dark corners. Here are some conclusions to chew on until next time:
- The Internet is a jungle.
- Methodology matters more than results.
- Pick your tool wisely.
- Irrelevant metrics mislead.
- Performance should never be expressed as a single number.
- Powerful visualizations always trump aggregate figures.
- Spreadsheets are amazing. Use them.
Positive or negative, I always want feedback. Please email me at email@example.com.
Thanks for reading!