Archive for April, 2010

The significance of significant figures

These days, the use of significant figures is almost exclusively confined to scientific disciplines. That's a pity, because so many businesses now rely on analytics to provide the basis for serious decisions, and the figures they have are rarely as accurate as they might appear.

The significant figures (or significant digits) of any number are those that contribute meaningfully once uncertainty is taken into account. At DrJess we like to get stuck in and really investigate uncertainty, but the first step is to realise it's there.

Say Google Analytics reports that the drjess.com/blog site has seen 1621483 unique viewers in the last month. A touch on the generous side, but in thought experiments we are allowed to be optimistic. Neither the Goog nor almost any other free web analytics suite has much to say on the subject of accuracy, but in any halfway scientific context, presenting a number with that many significant figures implies very high confidence in its accuracy, percentage-wise.

Web analytics pros know that Google Analytics and its tag-based friends tend to undercount traffic, often by rather more than 10%. More on the reasons why in another post, but let's be generous and assume a maximum 10% error for the sake of a simple thought experiment. Applying that to my imaginary count gives a true number of unique users somewhere between 1621483 and 1783631.
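The arithmetic behind that range can be sketched in a few lines, assuming (as above, purely for illustration) that the tool only ever undercounts and the undercount tops out at 10%:

```python
# Sketch: adjusting a reported analytics count for an assumed
# maximum 10% undercount. Numbers are the thought-experiment figures
# from the post, not real traffic data.
reported = 1621483        # unique viewers reported by the tool
max_undercount = 0.10     # assumed upper bound on the undercount

low = reported                           # the tool never over counts in this model
high = reported * (1 + max_undercount)   # worst-case true count

print(low, round(high))  # 1621483 1783631
```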

If I had to pick a single number to represent that range, it would be 1700000, not 1621483. Not only is it probably more accurate, but it also gives anyone looking at it a much better indication of what the uncertainty is.
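Picking that single number amounts to taking the midpoint of the range and rounding it to two significant figures. A minimal sketch (the `round_sig` helper is my own illustrative function, not part of any analytics tool):

```python
import math

def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    # Digits to keep after the decimal point; negative means
    # rounding to tens, hundreds, thousands, and so on.
    digits = sig - int(math.floor(math.log10(abs(x)))) - 1
    rounded = round(x, digits)
    return int(rounded) if digits <= 0 else rounded

midpoint = (1621483 + 1783631) / 2   # centre of the estimated range
print(round_sig(midpoint, 2))        # 1700000
```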

Now, if I were going to base some serious and potentially very expensive business decisions on analytics numbers, I'd want to know just how good my figures were. So would most people who want analytics done, if it occurred to them to wonder whether precision might be lacking in the first place. With the over-precise numbers spat out by most analytics programs, the question is rarely posed unless inconsistencies crop up.

I’m not blaming Google. The simple truth is that given the choice between a web analytics program that spits out 1700000 and one that spits out 1621483, the overwhelming majority of users will perceive the latter as more accurate.

Legend states that the first surveyor of Mount Everest measured the height at 29000ft exactly. It was reported as 29002ft, supposedly because the surveyor didn’t think anyone would believe his rounder figure had a reasonable degree of precision.

That was back in 1856. Unfortunately, it seems we're still having this kind of problem. The only solution is to state clearly what the uncertainty is. Bring it out of hiding and discuss error, accuracy, and precision before making decisions.
