What is an outlier?

An outlier is a data point that lies far from the rest of the data. Think of it as a straggler. Outliers can show up in your data for several reasons: they may be caused by a measurement error, they may be evidence of an unusual distribution of data, or they may indicate a distinct subset within the data.

Outliers can be identified with several statistical methods, including standard deviation thresholds, Tukey’s fences, Peirce’s criterion, and more advanced techniques. Many of these can be carried out with spreadsheet formulas or online calculators (see Additional Resources below for links).
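As a rough illustration, here is a minimal Python sketch (3.8 or later, for statistics.quantiles) of two of the methods named above, applied to the ten reply times from the customer service example below. The function names sd_outliers, tukey_fences, and tukey_outliers are hypothetical. The cutoffs are assumptions: k = 2 standard deviations is one common choice for the standard deviation rule, and k = 1.5 times the interquartile range is Tukey's conventional multiplier.

```python
import statistics

def sd_outliers(values, k=2.0):
    """Flag values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [x for x in values if abs(x - mean) > k * sd]

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles uses its default 'exclusive' method here;
    # other tools may compute quartiles slightly differently.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def tukey_outliers(values, k=1.5):
    """Flag values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [x for x in values if x < lower or x > upper]

reply_times = [22, 18, 21, 27, 26, 23, 25, 134, 22, 23]  # minutes

print(sd_outliers(reply_times))     # [134]
print(tukey_fences(reply_times))    # (15.0, 33.0)
print(tukey_outliers(reply_times))  # [134]
```

Both methods flag 134 here, but the choice of cutoff matters. With only ten points, no value's distance from the mean can exceed (n − 1)/√n ≈ 2.85 sample standard deviations, so sd_outliers(reply_times, k=3) returns an empty list even though 134 is clearly unusual. Tukey's fences are built from quartiles, which a single extreme value barely moves, so they are less prone to this problem.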

Why is it important to identify outliers in statistics?

It’s important to be aware of outliers in your data because they can skew your analysis, leading to inaccurate or misleading reporting and potentially poor decisions. One of the most pronounced distortions occurs when an outlier pulls the mean (average) of the data away from the typical values.

For example, if the reply times for your last 10 customer service tickets were 22, 18, 21, 27, 26, 23, 25, 134, 22, and 23 minutes, your average reply time works out to about 34 minutes (341 ÷ 10 = 34.1). Removing the outlier (134) drops the average to 23 minutes (207 ÷ 9 = 23). This 11-minute difference might stem from a reporting error, or from an unusual circumstance that kept a member of the customer service team from responding within the typical timeframe.
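The averages in this example can be checked with a short Python sketch using the standard library's statistics module. It simply recomputes the mean with and without the 134-minute ticket; the variable names are illustrative.

```python
import statistics

reply_times = [22, 18, 21, 27, 26, 23, 25, 134, 22, 23]  # minutes

# Mean of all ten tickets, including the 134-minute outlier
with_outlier = statistics.mean(reply_times)
# Mean of the remaining nine tickets after dropping the outlier
without_outlier = statistics.mean([t for t in reply_times if t != 134])

print(f"Mean with the outlier:    {with_outlier:.1f} min")     # 34.1
print(f"Mean without the outlier: {without_outlier:.1f} min")  # 23.0
print(f"Difference:               {with_outlier - without_outlier:.1f} min")  # 11.1
```

A single ticket out of ten is enough to shift the average by roughly half again its typical value.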

It’s worth noting that outliers shouldn’t automatically be discarded. Taking a second look at the data may help uncover a deeper or different issue causing the outlier.

Additional resources for learning more about outliers