We’re constantly bombarded with “data-backed” claims, whether it’s a colleague claiming a 100% revenue increase
from their project or a detergent promising to make your sheets 10x whiter. The claims are data-backed, so they must
be true, right? WRONG!
While data is quantifiable, it’s only quantifying what a human asks it to. This opens the door to human error,
misinterpretation and bias. So, how do you channel your inner data-cynic and put these data-backed claims to the test?
This checklist of questions is designed to do just that - helping you see the truth within the data. Whether it
comes from a teammate, a client, vendor, or perhaps a political figure, it’s critical to fact check the numbers.
Quick note: As you ask these questions, be mindful of your own preconceptions and
biases. We are more likely to dismiss things that don’t conform to our existing view of the world.
Is the headline misleading? Has the data been over-simplified, over-inflated, or otherwise dramatized to become a catchier headline?
Has a generalization been made that doesn’t accurately reflect the data?
What’s the small print? Headlines often omit key details.
Real life example: A 2013 Times article claimed “More people have cell phones than toilets.” However, by looking at the actual data
(see here and here), we find that more people have access to mobile phones than toilets.
‘Access’ is a tricky word because it could mean that dozens of people share a single mobile phone, but the Times
headline makes it sound like the number of cell phones exceeds the number of toilets.
Just because someone quotes you a statistic or shows you a graph, it doesn’t mean it’s relevant to the point they
are trying to make.
Sanity check the claim. Do some quick back-of-the-envelope math or use your own prior knowledge. Are there other,
more plausible explanations for the effect? Could they have made a mistake?
Can you verify the claim in any other way? Perhaps you have access to other data or can pull a report from another
The less plausible the claim, the more heavily you’ll want to scrutinize everything else.
Real life example: You’re a customer support rep and your boss claims that “our best customer
support rep can resolve 800 tickets via phone a day.” Let’s do some quick math to see if this is plausible.
5 seconds (answer the phone and get the customer’s name)
5 seconds (pull up customer’s account and ask what the problem is)
10 seconds (customer explains problem)
30 seconds (verify the problem or find the source)
40 seconds (fix the problem)
= 90 seconds per ticket or 40 tickets per hour
Even if every call were this efficient and no breaks were taken during eight hours of solid calls, the customer
support rep would only resolve 320 tickets (via phone) a day - nowhere near the 800 tickets that were claimed.
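The back-of-the-envelope math above is simple enough to sketch in a few lines. The step durations are the illustrative assumptions from the example, not real measurements:

```python
# Back-of-the-envelope check of the "800 tickets a day" claim.
# Every duration below is an illustrative assumption from the example.
steps_seconds = {
    "answer phone, get customer's name": 5,
    "pull up account, ask the problem": 5,
    "customer explains the problem": 10,
    "verify the problem / find the source": 30,
    "fix the problem": 40,
}

seconds_per_ticket = sum(steps_seconds.values())   # 90 seconds per ticket
tickets_per_hour = 3600 // seconds_per_ticket      # 40 tickets per hour
tickets_per_day = tickets_per_hour * 8             # 8 solid hours, no breaks

print(seconds_per_ticket, tickets_per_hour, tickets_per_day)  # 90 40 320
```

Even under these best-case assumptions, the ceiling is 320 tickets - less than half the claimed 800.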
Data is all about ‘compared to what?’ Last week? Last year? Competitor(s)? Revenue?
Real life example: Several years ago Colgate ran an advertising campaign claiming that “80% of
dentists recommend Colgate.” The implied comparison is that dentists recommend Colgate over and above other brands.
However, the Advertising Standards Authority discovered that in the survey, dentists could recommend more
than one toothpaste. In fact, another competitor was recommended almost as often as Colgate was.
Are they an expert?
What is their agenda? Combined with the plausibility of the claim, this will affect how heavily you’ll need to
scrutinize the data.
Where did the data come from in the first place?
Real life example: In 1998, a research paper published in The Lancet claimed there was a
link between certain vaccines and autism. Several subsequent studies by independent organizations showed that the author
of the paper, Andrew Wakefield, had manipulated the evidence to
create the appearance of a link in his research.
Although he was a gastroenterologist and medical researcher, he wasn’t an expert in toxicology, genetics,
neurology, or other disciplines necessary to be an expert on autism. Additionally, he failed to disclose a conflict
of interest as he received significant money to prove the vaccine was dangerous.
In 2010, the “utterly false” article
was fully retracted by the editor-in-chief, Richard Horton. (Read more here and here
about the relevant research findings.)
How did they arrive at their conclusion/claim?
Often it’s not that easy to gather the exact data you need/want. What was their methodology? Have any
approximations been made? Were these done sensibly?
Is there too much extrapolation? Were best practices followed (such as significance tests, sampling biases avoided, etc.)?
Example: Suppose you want to know how long it takes a cup of coffee (at 140 degrees Fahrenheit)
to cool to room temperature. After observing for three minutes, you find the coffee cools by 5 degrees every minute.
If you then extrapolate that data (extending the trend of 5 degrees cooler per minute), you could end up with the
ridiculous conclusion that after 30 minutes, the coffee would freeze.
This extrapolation fails to consider physical limits (coffee cannot become colder than room temperature) and that
the rate of cooling slows as it gets closer to room temperature.
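The two models are easy to contrast directly. The linear model below is the naive trend from the example; the second uses Newton's law of cooling, with an assumed 70 °F room and an illustrative cooling constant k chosen so the first minute also drops about 5 degrees:

```python
import math

ROOM_F = 70.0    # assumed room temperature (not stated in the example)
START_F = 140.0  # starting coffee temperature, from the example

def linear_extrapolation(minutes, rate=5.0):
    """Naive trend: 5 degrees cooler every minute, forever."""
    return START_F - rate * minutes

def newtons_cooling(minutes, k=0.074):
    """Newton's law of cooling: the gap to room temperature shrinks
    exponentially. k = 0.074 is an illustrative constant chosen so the
    first minute drops roughly 5 degrees, matching the observed trend."""
    return ROOM_F + (START_F - ROOM_F) * math.exp(-k * minutes)

print(linear_extrapolation(30))  # -10.0: "frozen coffee", clearly absurd
print(newtons_cooling(30))       # stays above 70, approaching room temp
```

Both models agree during the three observed minutes; they only diverge when extrapolated - which is exactly why extrapolation deserves scrutiny.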
Was their sample representative of the whole?
Has the data been ‘cherry picked’ (i.e. only using the information that they want)?
Do you have other data that would help put the claim in context?
Real life example: Global warming is an often debated topic where both ‘sides’ have trends to
back their claims. This is achieved by cherry picking only the data that supports their position while omitting the rest.
The graphs below from Skeptical
Science visualize how cherry-picked windows of global temperature data can appear to contradict the long-term warming trend (a trend affirmed by 18 scientific associations).
A companion to the cherry-picking bias is selective windowing. This occurs when the information you have access
to is unrepresentative of the whole.
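Selective windowing is easy to demonstrate with synthetic data: a series with a genuine upward trend plus a cyclical wobble will show a "downward trend" if you fit only a short, carefully chosen window. The series and window below are entirely made up for illustration - this is not real temperature data:

```python
import math

def slope(ys):
    """Least-squares slope of ys fitted against 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Synthetic series: a steady long-term rise (+0.02 per step)
# plus a cyclical wobble (amplitude 0.3).
series = [0.02 * t + 0.3 * math.sin(t / 3) for t in range(120)]

print(slope(series) > 0)         # full record: a rising trend
print(slope(series[62:70]) < 0)  # cherry-picked window: a "falling" trend
```

The same underlying data supports opposite conclusions depending on the window - which is why asking "compared to what, over what period?" matters.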
In addition to cherry picking, other tactics might be employed. For example, the line chart axis might be
cropped or a misleading average might be shown.
Real life example: In 2012, Fox Business showed a chart visualizing the impact if the Bush tax cuts
were to expire. The top tax rate would change from 35% to 39.6%. However, the axis was cropped - beginning just below
the current 35% rate instead of at 0% - which made the tax increase appear far larger than it actually was.
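A quick calculation shows how much a cropped axis exaggerates a change. The two tax rates come from the example; the cropped baseline of 34% is an illustrative assumption:

```python
# How a cropped y-axis exaggerates a change. The rates come from the
# example above; the 34% cropped baseline is an illustrative assumption.
old_rate, new_rate = 35.0, 39.6

def bar_height(value, axis_start):
    """Visual height of a bar when the axis begins at axis_start."""
    return value - axis_start

# Honest axis (starts at 0%): the new bar is ~13% taller than the old one.
honest = bar_height(new_rate, 0) / bar_height(old_rate, 0)

# Cropped axis (starts at 34%): the new bar towers ~5.6x over the old one.
cropped = bar_height(new_rate, 34) / bar_height(old_rate, 34)

print(round(honest, 2), round(cropped, 1))  # 1.13 5.6
```

The numbers in the chart are technically correct either way - only the visual impression changes.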
Curiosity is bad for cats, but good for stats. Curiosity is a cardinal virtue because it encourages us to dig
a little harder to understand what we are being told, and to enjoy the surprises along the way.
Want to learn more about data and how to evaluate it? Check out our basic data analysis guide, along with these books and resources: