My stats guru colleague Dr Andrew Pratley and I are on a mission to tackle Quantifornication, the plucking of numbers out of thin air. Here is the first in a series we are writing together.
We’re far better at identifying good and bad writing than we are at identifying good and bad numbers. The premise almost doesn’t seem logical. How can it be easier to separate the good from the bad in a language like English, with all of its oddities?
Our education in English devotes a substantial amount of time to comparing different types of writing and methods to improve readability and comprehension. We accept and understand there is no ‘right way’ and don’t become fixated on this idea. By contrast, almost all of our number education focuses on calculations to get the right answer. We rarely discuss where the numbers come from, or their validity.
When we talk about numbers we tend to assume we’re discussing clear and agreeable ideas such as the temperature in a room (19°C), our height without shoes on (1,820mm) or the number of customers that visit the store in a day (65). These are fairly easy to check. Just as each of these three scenarios generates a specific value, we can generate specific values for the predicted maximum temperature tomorrow, the predicted height of a child when they turn 18 or the number of customers we think will make a purchase tomorrow.
While we inherently know the “weatherman” is not always right, and that the height of a child and the number of buyers are only predictions, it is how we treat this information that is important. In some instances, we will treat the information with caution. Some of us will take something warm in case the temperature is a few degrees cooler than predicted. For something like the height of a child, the timeline for realisation of that prediction is so far out that we give it little consideration, as many have with climate change. And for the number of customers we think will make a purchase tomorrow, we make decisions on staffing and inventory.
This last example shows how we tend to intermix reliable numbers (the measured number of customers in the store) with what are normally unreliable numbers (the estimated number of customers purchasing). While temperature estimates are based on sophisticated modelling with estimated margins of error, the estimate of customer purchases is usually an educated guess, unless you work for a company that has invested in analytics and runs the numbers with statistical validity.
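To make the contrast concrete, here is a minimal sketch of what "running the numbers" might look like for the store example. The history (650 visitors and 130 purchases over ten days) is entirely hypothetical, as is the function name; only the 65 visitors a day comes from the example above. It uses a simple normal-approximation interval for the purchase rate, which is one of several possible approaches, and it captures only the uncertainty in that rate, not ordinary day-to-day variation.

```python
import math

def purchase_estimate(visitors_observed, purchases_observed,
                      visitors_tomorrow, z=1.96):
    """Return (point, low, high) for the expected number of buyers.

    Hypothetical helper: applies a normal-approximation interval
    (about 95% when z=1.96) to the observed purchase rate, then
    scales it to tomorrow's expected visitor count.
    """
    rate = purchases_observed / visitors_observed
    std_error = math.sqrt(rate * (1 - rate) / visitors_observed)
    low_rate = max(0.0, rate - z * std_error)
    high_rate = min(1.0, rate + z * std_error)
    return (rate * visitors_tomorrow,
            low_rate * visitors_tomorrow,
            high_rate * visitors_tomorrow)

# Hypothetical history: 650 visitors over ten days, 130 of whom bought.
point, low, high = purchase_estimate(650, 130, visitors_tomorrow=65)
print(f"Expected buyers tomorrow: {point:.0f} (roughly {low:.0f} to {high:.0f})")
```

The point is not the particular formula but the shape of the answer: a number with a stated range and a stated source, rather than a single figure offered with no indication of how far off it might be.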
The reality is that many of the important decisions we make every day are based on guesstimates that we like to believe are more accurate than they are. We think they are “point estimates” in statistician-speak.
Statisticians are always talking about point estimates. Our conversations with each other involve qualifiers like ‘do we have a representative sample’, ‘have we accounted for a particular bias’. Statisticians know the danger of playing the guessing game when making important decisions about investing, resourcing and prioritising projects.
The old saying ‘there are lies, damned lies and statistics’ should really be rephrased as ‘there are lies, damned lies and guesstimates’.
Now we’re set up for next week, when we will write about how to move from guesstimates to point estimates for improved decision making.
Stay safe and adapt – with better measurement!