Bias

What is bias in data analytics?

In general, bias is an inclination toward a thing, group, or person, often in a way that is inaccurate or unfair. In statistics, bias can distort your insights, leading to poor and potentially costly business decisions.

While there are many types of bias, here are some common biases in data analysis you’ll want to look out for.

  • Anchoring Bias: Relying heavily on the first piece of data encountered to make a decision (treating subsequent information as less significant).
  • Publication Bias: How interesting a research finding is affects how likely it is to be published, which distorts our impression of reality.
  • Sampling Bias: Drawing conclusions from a set of data that isn’t representative of the population you’re trying to understand. This is a type of Selection Bias.
  • Confirmation Bias: Looking for information that confirms what you already think or believe. This is similar to another common data fallacy, Cherry Picking.
  • Survivorship Bias: Drawing conclusions from an incomplete set of data, because that data has ‘survived’ some selection criteria.
  • Funding Bias (sometimes called Sponsorship Bias): Favoring the interests of the person or group funding the research or analysis. Data might be selected or ignored to validate a predetermined conclusion.
  • Observation Bias (also known as the Hawthorne Effect): The act of monitoring someone can affect that person’s behavior.

Biased data examples

Understanding the biases that can affect data analysis is a great first step toward recognizing them when they pop up in your own work. Here are some common examples of statistical bias you may notice as you work with data.

In salary negotiations, applicants and hiring managers alike might rely on the first salary rate mentioned as the basis for a reasonable range. This anchoring ignores other rates that might also be “reasonable” based on location, experience, job description, etc.

A customer success manager might want to understand why churn has been increasing month over month, and has a hunch that the cause is a product feature that adds frustration to the user experience. When reviewing exit surveys, the customer success manager notices this feature is mentioned many times, but ignores that average ticket response rate is mentioned just as often. This example of confirmation bias shows how a preconceived idea can distort our reading of the data.
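One way to guard against this kind of confirmation bias is to tally every reason mentioned in the surveys before checking the hunch. The sketch below is only an illustration: the survey responses, the reason labels, and the variable names are all made up for the example, not taken from real data.

```python
from collections import Counter

# Hypothetical exit-survey responses; each set holds the reasons one customer mentioned.
exit_surveys = [
    {"feature frustration", "slow ticket response"},
    {"feature frustration"},
    {"slow ticket response", "price"},
    {"feature frustration", "price"},
    {"slow ticket response"},
    {"slow ticket response", "feature frustration"},
]

hunch = "feature frustration"

# Confirmation-biased check: count only the mentions of the suspected cause.
hunch_mentions = sum(hunch in reasons for reasons in exit_surveys)

# Broader check: tally every reason before drawing a conclusion.
all_mentions = Counter(reason for reasons in exit_surveys for reason in reasons)

print(f"Mentions of the hunch: {hunch_mentions}")
for reason, count in all_mentions.most_common():
    print(f"{reason}: {count}")
```

In this made-up data the suspected feature and slow ticket responses are mentioned equally often, which the first count alone would never reveal.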

A marketing team wants to know which channel their audience prefers for consuming content. They conduct a Twitter poll since that’s their most responsive channel, not realizing that a significant portion of their audience (on Quora and LinkedIn) never saw the poll. While this example of sampling bias might seem obvious, it’s easy to focus on gathering data quickly and overlook this common pitfall.
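The effect of polling only one channel can be sketched with a small, fully hypothetical audience. The counts below are invented for illustration, and the `saw_poll` flag is an assumption standing in for "was reachable through the Twitter poll".

```python
from collections import Counter

# Hypothetical audience: (preferred_channel, saw_poll) pairs with invented counts.
audience = (
    [("twitter", True)] * 300
    + [("linkedin", True)] * 60
    + [("quora", True)] * 40
    + [("linkedin", False)] * 350
    + [("quora", False)] * 250
)

def channel_share(people):
    """Return each preferred channel's share of the given group."""
    counts = Counter(channel for channel, _ in people)
    total = sum(counts.values())
    return {channel: count / total for channel, count in counts.items()}

# Biased sample: only the people who saw the Twitter poll can answer it.
respondents = [person for person in audience if person[1]]

poll_share = channel_share(respondents)
true_share = channel_share(audience)

print("Poll result:  ", poll_share)
print("Full audience:", true_share)
```

With these invented numbers, the poll suggests that three quarters of the audience prefers Twitter, while across the whole audience LinkedIn is the most common preference. The gap comes entirely from who was able to respond.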

Additional resources to learn more about bias in data analysis