Posts Tagged uncertainty

Measures and metrics

Traffic data is difficult to collect. Tag-based tracking systems like Google Analytics inevitably miss some visitors, and server log analysis isn’t perfectly accurate either. Let’s assume your tracking is JavaScript-based and perfectly optimised. It will still miss the 10 to 15% of users who routinely browse the wonders of the internet with JavaScript disabled. However, that doesn’t mean you’ve got a total visit count with a 15% error on it.

What you have is a figure that is usually labelled ‘Total Visitors’, but is in fact not that at all. More correctly, it’s a lower bound on total visitors. It’s a measure of total visitor numbers, sure, but it is not the total number of visitors. When the measure goes up, you know visitors have gone up (assuming the fraction of users without JavaScript stays the same, which is not unreasonable). When the measure goes down, it’s fair to say that traffic has dropped.

The lower bound on total visitors is a very useful thing to know, but it’s also useful to acknowledge that it is not a true and perfect total visitor count. For a start, presenting the real state of affairs to potential investors or advertisers lets you use a bigger best-estimate traffic figure than the one presented by your JavaScript-based tracking system. Your website looks more popular. In fact, it probably is more popular than you think if all you are relying on right now is JavaScript-based tracking like that used by Google Analytics.
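To make that concrete, here’s a minimal sketch of the correction in Python. The function name and the miss rates are ours, purely for illustration; the real missed fraction is something you’d want to estimate for your own audience:

```python
def estimate_true_visitors(measured, miss_fraction):
    """Scale a tracker's lower-bound count up to a best estimate.

    measured:      visitors the JavaScript tracker actually saw
    miss_fraction: assumed share of visitors it never sees
                   (e.g. those browsing with JavaScript disabled)
    """
    return measured / (1.0 - miss_fraction)

# With an assumed 10-15% of users browsing without JavaScript:
low_estimate = estimate_true_visitors(100_000, 0.10)   # ~111,111
high_estimate = estimate_true_visitors(100_000, 0.15)  # ~117,647
```

Note that dividing by the tracked fraction, rather than multiplying by the miss rate, is the right way round: the tracker saw 85–90% of the real total, so the real total is the measurement scaled back up.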

We believe you should always try to get an idea of how accurate all your figures are. Knowing that protects you from making poor decisions based on poor data, and gives you the confidence to move forward from a fully justifiable position. In cases like the one discussed above, acknowledging the inaccuracy in your stats will actually do you a pretty big favour.


The significance of significant figures

At this point in time, the use of significant figures is almost exclusively confined to scientific disciplines. It’s a pity, because so many businesses are now relying on analytics to provide the basis for some serious decisions, and the figures they have are rarely as accurate as they might appear.

The significant figures (or significant digits) of any number are those that still carry meaning once uncertainty is taken into account. At DrJess we like to get stuck in and really investigate uncertainty, but the first step is to realise it’s there.
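As a quick sketch of what rounding to significant figures means in code (this helper is our own, not part of any analytics suite):

```python
import math

def round_to_sig_figs(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    # Position of the leading digit: 1621483 has magnitude 6 (10^6)
    magnitude = math.floor(math.log10(abs(x)))
    factor = 10 ** (magnitude - sig + 1)
    return round(x / factor) * factor

print(round_to_sig_figs(1621483, 2))  # 1600000
```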

Say Google Analytics reports that the drjess.com/blog site has seen 1621483 unique visitors in the last month. A touch on the generous side, but in thought experiments we are allowed to be optimistic. Neither the Goog nor most other free web analytics suites have much to say on the subject of accuracy, but in any halfway scientific context, presenting a number with that many significant figures implies a very high confidence in its accuracy, percentage-wise.

Web analytics pros know that Google Analytics and its tag-based friends tend to undercount traffic by rather more than 10%. More on the reasons why in another post, but let’s be generous and assume a maximum 10% error for the sake of a simple thought experiment. Applying that to my imaginary count puts the true number of unique visitors somewhere between 1621483 and 1783631 (the upper bound being 1621483 × 1.1, with the fractional visitor discarded).

If I had to pick a single number to represent that range, it would be 1700000, not 1621483. Not only is it probably more accurate, it gives anyone looking at it a much better indication of what the uncertainty is.
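Reusing the round_to_sig_figs helper from the sketch above, the whole thought experiment fits in a few lines (the 10% error is still our assumed figure, not a measured one):

```python
measured = 1_621_483
max_error = 0.10

low = measured                      # the tracker's lower bound
high = measured * (1 + max_error)   # 1783631.3
midpoint = (low + high) / 2         # 1702557.15

# Rounding the midpoint to two significant figures gives a single
# number that honestly reflects the ~5% spread either side of it:
print(round_to_sig_figs(midpoint, 2))  # 1700000
```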

Now, if I were going to base some serious and potentially very expensive business decisions on analytics numbers, I’d want to know just how good my figures were. So would most people who commission analytics, if it occurred to them to wonder whether precision might be lacking in the first place. With the over-precise numbers spat out by most analytics programs, the question is rarely posed unless inconsistencies crop up.

I’m not blaming Google. The simple truth is that given the choice between a web analytics program that spits out 1700000 and one that spits out 1621483, the overwhelming majority of users will perceive the latter as more accurate.

Legend states that the first survey of Mount Everest measured the height at exactly 29000ft. It was reported as 29002ft, supposedly because the surveyor didn’t think anyone would believe such a round figure had been measured with any real precision.

That was back in 1856. Unfortunately, it seems we’re still having this kind of problem. The only solution is to state clearly what the uncertainty is. Bring it out of hiding and discuss error, accuracy, and precision before making decisions.
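For what ‘stating the uncertainty’ might look like next to a reported figure, here’s one hedged sketch, again reusing round_to_sig_figs and the numbers from the thought experiment above:

```python
# Assumes round_to_sig_figs from the earlier sketch is in scope.
def format_with_uncertainty(value, spread, sig=2):
    """Render a figure as 'value ± spread', both rounded consistently."""
    return f"{round_to_sig_figs(value, sig):,} ± {round_to_sig_figs(spread, sig):,}"

# Midpoint 1702557 with roughly 81074 visitors either side:
print(format_with_uncertainty(1702557, 81074))  # 1,700,000 ± 81,000
```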
