Recommended Books on Data

Want to learn more about data? There are over 200,000 results when you search for “data books” on Amazon. We’ve separated the wheat from the chaff, and reviewed a handful of our favorite books that will help improve your data skills.

How Not to Be Wrong - Jordan Ellenberg

A really entertaining popular science book. And you don’t have to be a data geek to enjoy it!

The book showcases the role maths plays in everyday decision making and how a basic grasp of it enables us to make smarter choices. It’s jam-packed with interesting anecdotes and phenomena. One of our particular favourites explains how a group of students managed to make a fortune by gaming the Massachusetts state lottery!

While the book touches on complex mathematics, it does so in an accessible way - and makes you feel smarter for reading it. There’s an excellent explanation of Survivorship Bias in the first few pages that you can read here.

Highly recommended!

The Truthful Art: Data, Charts, and Maps for Communication - Alberto Cairo

One review on the cover describes The Truthful Art as part manifesto and part manual, which is a great characterisation of the book. It’s particularly aimed at people who need to communicate data.

The first part highlights many of the tricks and traps to be wary of and explores why they occur. Cairo urges us to think more like journalists and to always strive for the truth.

“The purpose of infographics and data visualization is to enlighten people – not to entertain them, not to sell them products, services, or ideas, but to inform them.”

Alberto Cairo, The Truthful Art: Data, Charts, and Maps for Communication

The second part starts with a chapter on the basic principles of visualization before launching into detailed practical advice: which projection to use for maps, the importance of confidence intervals, how to choose appropriate bucket sizes, when to use a logarithmic scale, and more.

We haven’t found a better book when it comes to detail. Each chapter also ends with a useful list of further reading so you can go on to learn more.

At various stages, the book touches on some of the fallacies we’ve covered, such as Simpson’s Paradox and Regression Toward the Mean, but it’s the manifesto and in-depth visualization advice that really make it shine.
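
If Simpson’s Paradox is new to you, here is a minimal Python sketch of the effect. The counts are invented purely for illustration (they are not taken from the book): option A has the better success rate within each group, yet option B looks better once the groups are pooled.

```python
# Hypothetical counts, invented for illustration: (successes, trials)
results = {
    "small": {"A": (90, 100), "B": (800, 1000)},
    "large": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


def overall_rate(option):
    """Success rate for one option after pooling every group together."""
    successes = sum(group[option][0] for group in results.values())
    trials = sum(group[option][1] for group in results.values())
    return rate(successes, trials)


# Per-group rates: A beats B in both groups
for name, group in results.items():
    print(name, {opt: round(rate(*counts), 2) for opt, counts in group.items()})

# Pooled rates: B beats A, because each option was mostly tried on a different group
print("overall", {opt: round(overall_rate(opt), 2) for opt in ("A", "B")})
```

The reversal comes from the uneven group sizes: A was mostly tried on the harder “large” cases, which drags its pooled rate down.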

How to Lie with Statistics - Darrell Huff

First published in 1954, How to Lie with Statistics has become a classic. It’s referred to by almost every other book on the subject! Accessible and short, it takes only a couple of hours to read. By the time you’ve finished, you’ll have a great grounding in the most common ways statistics can mislead us.

It’s very much a product of the era in which it was written, so the language and examples are a little dated in places, but this doesn’t detract from the message.

The final chapter gives a great summary of practical advice for detecting and avoiding being lied to. There’s still no quicker book to read to get up to speed with the basics.

Bad Science - Ben Goldacre

Another fantastic popular science book that everyone would benefit from reading. Ben Goldacre is a leading figure in the fight against scientific ineptitude.

As a doctor, he focusses on how poor practices affect lives. He struggles to hide his frustration and disdain, but his passion shines through and he’s incredibly engaging.

Bad Science takes particular aim at the press for misreporting studies, at those who use pseudo-science and peddle health fads, at the cynical practices employed by big pharmaceutical companies, and at the general incompetence of some scientists. It’s eye-opening, frustrating, terrifying, and humorous all at the same time.

Goldacre is particularly outspoken on the problem of Publication Bias. You can get a good flavour of him and the book from his excellent TED talk on the subject.

If you have a relative or friend who pays too much attention to medical headlines in the press, you should definitely hand them a copy!

Statistics Done Wrong - Alex Reinhart

It’s primarily aimed at scientists, but also highly relevant to anyone who works with data. There aren’t many equations or formulae; instead, it goes into greater depth on common statistical mistakes than most of the other books on this list.

In its own words, ‘it explains how to think about p values, significance, insignificance, confidence intervals, and regression’.

By the time you’ve finished, you’ll be able to spot a dodgy A/B test from a mile off! Since it’s geared towards a more academic audience, it’s full of references to scientific studies on the subject.
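
To give a flavour of what a dodgy A/B test looks like, here is a rough Python sketch using a pooled two-proportion z-test. This is one common textbook approach, not necessarily the one the book recommends, and the conversion counts are made up for illustration. It relies on a normal approximation, so treat it as a sanity check rather than a definitive analysis.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - cdf)


# Hypothetical results: the same 20% vs 30% conversion rates at two sample sizes
p_small = two_proportion_p_value(20, 100, 30, 100)
p_large = two_proportion_p_value(200, 1000, 300, 1000)

print(f"100 visitors per variant:  p = {p_small:.3f}")
print(f"1000 visitors per variant: p = {p_large:.2g}")
```

The headline “50% uplift” is identical in both cases, but with only 100 visitors per variant the result does not clear the conventional 0.05 threshold.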

In the final chapter, it explores how the scientific community in general can avoid doing statistics wrong. As well as helping you improve the quality of your own research, it emphasizes the importance of scrutinizing other studies properly when the author’s ‘grasp of statistics is entirely unknown’ to you. Something to remember next time you’re confronted with spurious data in a meeting!

Naked Statistics - Charles Wheelan

An excellent introduction to statistics. It covers the basics of probability, correlation, the central limit theorem, regression, and more. All the principles are tied back to interesting, relatable real-life examples.

While it doesn’t shy away from formulae in a few places, don’t let that put you off – it’s incredibly accessible and they’re not central to the message.

Again, it covers many of the fallacies we’ve highlighted as well as a few others like the Prosecutor’s Fallacy and Recall Bias. It also talks through some of the difficulties in boiling trends down to a single statistic. How do you reliably report on the performance of a school or a baseball player? By the time you’ve finished, you’ll also have learned why Hollywood always seems to be reporting new records for the highest-grossing films.
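
One well-known reason for those Hollywood records (our assumption about the point being made here) is that box office takings are usually quoted in nominal dollars, without adjusting for inflation. The Python sketch below shows the idea. The film names, grosses, and price index values are all invented for illustration.

```python
# Invented figures: (nominal gross in $ millions, price index in the release year)
films = {
    "Old Classic": (200.0, 20.0),
    "Recent Hit": (900.0, 100.0),
}
CURRENT_INDEX = 100.0  # hypothetical price index for today


def real_gross(nominal, index_then, index_now=CURRENT_INDEX):
    """Convert a nominal gross into today's money using a price index ratio."""
    return nominal * index_now / index_then


# Ranking by the raw figures favours the newer film...
nominal_winner = max(films, key=lambda name: films[name][0])
# ...while ranking by inflation-adjusted figures can favour the older one
real_winner = max(films, key=lambda name: real_gross(*films[name]))

print("Highest nominal gross:", nominal_winner)
print("Highest inflation-adjusted gross:", real_winner)
```

With prices rising over time, newer releases can keep “breaking records” on nominal figures even when fewer people actually bought tickets.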

If you want a basic introduction to statistics, this is the book to read. However, the examples make it interesting for everyone.