Sampling error

What is sampling error?

Sampling error is the difference between a value measured on a sample (such as a mean or a percentage) and the true value for the entire population of data. This difference arises simply because the sample doesn’t (and can’t) perfectly reflect the whole.

The name can be confusing because ‘error’ is typically understood as ‘mistake.’ In data science and statistics, however, sampling error is not a mistake: it is simply the difference between the subset (sample) and the whole.
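To make the definition concrete, here is a minimal sketch in Python. The population is made up for illustration (100,000 random values standing in for something like flight delays in minutes), and the mean is just one of many statistics you could compare.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 100,000 made-up values (e.g. flight delays in minutes).
population = [random.gauss(30, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Draw a simple random sample (without replacement) and compute the same statistic.
sample = random.sample(population, 200)
sample_mean = statistics.fmean(sample)

# The sampling error is the gap between the sample statistic and the population value.
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
```

Nothing went wrong in this sketch: the sample was drawn fairly and measured correctly, yet its mean still differs slightly from the population mean.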

How to reduce sampling error

The only way to completely eliminate sampling error is to measure the entire population. This is often not feasible (e.g. polling the entire U.S. population or measuring the efficiency of all flights worldwide), so in practice sampling error is reduced by enlarging the sample size.
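The effect of sample size can be sketched with a small simulation. The population below is invented for illustration, and the helper name mean_abs_sampling_error is hypothetical; it averages the absolute sampling error of the mean over many repeated samples of a given size.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population of 20,000 made-up measurements.
population = [random.gauss(100, 15) for _ in range(20_000)]
population_mean = statistics.fmean(population)


def mean_abs_sampling_error(n, trials=200):
    """Average absolute gap between sample mean and population mean for samples of size n."""
    gaps = []
    for _ in range(trials):
        sample = random.sample(population, n)  # simple random sample, no replacement
        gaps.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(gaps)


errors = {n: mean_abs_sampling_error(n) for n in (10, 100, 1_000)}
for n, err in errors.items():
    print(f"n = {n:>5}: typical sampling error = {err:.3f}")

# Measuring the whole population leaves no sampling error at all.
census = random.sample(population, len(population))
census_error = abs(statistics.fmean(census) - population_mean)
print(f"census:    sampling error = {census_error:.3f}")
```

For a mean, the typical error shrinks roughly with the square root of the sample size, so a sample 100 times larger cuts the error by about a factor of 10 rather than 100.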

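You can also estimate the size of the sampling error by using a specific sampling model. For a simple random sample, for example, the standard error of a mean is the sample’s standard deviation divided by the square root of the sample size, so if you want to dive even deeper, it helps to understand standard deviation.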
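The standard deviation matters here because it drives the usual estimate of how large the sampling error is likely to be. The sketch below assumes a simple random sample and uses the standard error of the mean, s / sqrt(n); the source does not name a particular sampling model, so this choice is an assumption, and the sample values are made up.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the illustration is repeatable

# Hypothetical sample of 400 made-up measurements (assumed to be a simple random sample).
sample = [random.gauss(50, 12) for _ in range(400)]

n = len(sample)
s = statistics.stdev(sample)        # sample standard deviation
standard_error = s / math.sqrt(n)   # estimated typical sampling error of the mean

# Approximate 95% margin of error, assuming the sample mean is roughly normally distributed.
margin_95 = 1.96 * standard_error

print(f"sample mean:    {statistics.fmean(sample):.2f}")
print(f"standard error: {standard_error:.2f}")
print(f"95% margin:     +/- {margin_95:.2f}")
```

Other sampling designs (stratified or clustered samples, for instance) call for different formulas, so treat this as one illustration rather than a general recipe.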

What is the difference between sampling error and non-sampling error?

Non-sampling error is a catch-all term for mistakes made when designing, collecting, or reporting a sample, or when analyzing data (sampled or whole). Examples of non-sampling errors include bias, inconsistent or missing data, measurement errors, poor sampling or questionnaire design, nonresponse, and mistakes in recording data.

While sampling error is the inherent variation from the whole, non-sampling error refers to any extrinsic variation or mistake that distorts the perception of the whole.
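One practical consequence of this distinction is that a larger sample shrinks sampling error but does nothing for a non-sampling error. The sketch below uses a made-up population and a hypothetical measurement error (an instrument that reads 2 units too high, the BIAS constant) to show the two side by side.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population of 50,000 made-up measurements.
population = [random.gauss(70, 10) for _ in range(50_000)]
population_mean = statistics.fmean(population)

BIAS = 2.0  # hypothetical miscalibrated instrument: every reading is 2 units too high

results = {}
for n in (100, 10_000):
    sample = random.sample(population, n)
    # Sampling error only: the sample is measured correctly.
    clean_error = statistics.fmean(sample) - population_mean
    # Sampling error plus a non-sampling (measurement) error on the same sample.
    biased_error = statistics.fmean([x + BIAS for x in sample]) - population_mean
    results[n] = (clean_error, biased_error)
    print(f"n = {n:>6}: sampling only = {clean_error:+.3f}, with measurement bias = {biased_error:+.3f}")
```

With the larger sample the first figure moves toward zero, while the second stays close to the 2-unit bias; fixing it requires correcting the measurement, not collecting more data.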

Additional resources to learn more about sampling error