Data Science Terms Explained | Geckoboard

Data Science Terms Explained

Does complex terminology put you off using data? Here’s a glossary of common data science terms with straightforward definitions, helpful examples, and additional resources if you want to dive deeper.


Algorithm

An algorithm is a specified, step-by-step process for solving a problem, often written by a human and performed by a computer.


Analytics

A broad definition of analytics is the review of data to discover, understand, and communicate meaningful patterns. Or more simply, analytics is the practice of drawing useful insights from raw data.


Bias

In general, bias can be defined as an inclination towards a thing, group, or person, often in a way that is inaccurate or unfair. Bias in statistics or data analytics can skew your insights, leading to poor and potentially costly business decisions.


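Correlation

A simple definition of correlation is the relationship between two or more variables (or data sets): how closely they tend to move together, whether in the same direction or in opposite directions.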
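
As a rough illustration, the sketch below computes Pearson's correlation coefficient, one common way of putting a number on correlation. The function name `pearson_r` and the ad spend and sign-up figures are made up for this example. A result near 1 means the two lists rise together, a result near -1 means one rises as the other falls, and a result near 0 means there is little linear relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: weekly ad spend vs. weekly sign-ups
ad_spend = [10, 20, 30, 40, 50]
signups = [12, 24, 33, 46, 55]

r = pearson_r(ad_spend, signups)
print(f"correlation: {r:.3f}")
```

A high value here only says the two series move together. It does not show that one causes the other.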

Data Processing

Data processing is simply a sequence of steps for collecting raw data and turning it into meaningful information.

Data Sets

A data set is a technical term for a collection of data. Typically, a data set refers to the content of a single table or graph.


Hypothesis

A hypothesis is an educated guess (often about the cause of a problem) that hasn’t been confirmed yet.

Margin of Error

The margin of error is the amount by which the results from a random sample may differ from the results you would get by surveying the whole population.
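
To make this concrete, here is a minimal sketch of one widely used formula for the margin of error of a survey percentage. It assumes a simple random sample and a 95% confidence level (a z-score of about 1.96). The function name `margin_of_error` and the survey numbers are illustrative only.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion.

    p: proportion observed in the sample (0 to 1)
    n: number of people sampled
    z: z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * sqrt(p * (1 - p) / n)

# Hypothetical survey: 50% of 1,000 respondents answered "yes"
moe = margin_of_error(0.5, 1000)
print(f"margin of error: +/- {moe * 100:.1f} percentage points")
```

With these numbers the margin works out to roughly 3 percentage points, so the true figure for the whole population is likely to sit somewhere between about 47% and 53%.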

Multivariate Test

A multivariate test is a process where variations of several different elements (e.g. text, video, and images on a website) are evaluated simultaneously.


Outlier

An outlier is a piece of data that is distant from the rest of the data set. Think of it as a straggler.
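
There is no single rule for how distant a value must be to count as an outlier. The sketch below uses one common convention, the 1.5 × IQR rule, which flags anything far outside the middle half of the data. The response-time figures and variable names are made up for illustration.

```python
import statistics

# Hypothetical data: support response times in minutes
response_times = [10, 12, 11, 13, 12, 11, 95]

# Quartiles split the sorted data into four equal parts
q1, _median, q3 = statistics.quantiles(response_times, n=4)
iqr = q3 - q1  # spread of the middle 50% of values

# Convention (an assumption, not a law): flag values beyond 1.5 * IQR
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [t for t in response_times if t < lower_fence or t > upper_fence]
print(outliers)
```

Here the 95-minute response is the straggler. Whether to exclude it or investigate it depends on what you are trying to learn from the data.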


Probability

Probability is simply the likelihood that an event will happen, usually expressed as a number between 0 (impossible) and 1 (certain) or as a percentage.
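
For a quick worked example, when every outcome is equally likely, probability is the number of outcomes you care about divided by the total number of possible outcomes. The sketch below applies that to rolling an even number on a fair six-sided die. The variable names are illustrative.

```python
from fractions import Fraction

# All equally likely outcomes of rolling a fair six-sided die
outcomes = [1, 2, 3, 4, 5, 6]

# The event we care about: rolling an even number
even_rolls = [o for o in outcomes if o % 2 == 0]

# Probability = favourable outcomes / all possible outcomes
p_even = Fraction(len(even_rolls), len(outcomes))
print(p_even, float(p_even))
```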

Qualitative Data

Qualitative data is information that describes or categorizes something, such as the color of the sky, the smell of perfume, music genres, or coffee bean flavors.

Quantitative Data

Quantitative data is anything that can be measured or counted, such as monthly revenue, the distance of a race and the winner's time, calories in a meal, temperature, or salary.


Regression

Regression estimates the most likely outcome based on the trend in one or more known variables (predictors).
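
As a minimal sketch, the simplest form of regression fits a straight line through past data and extends it to estimate the next value. The code below uses ordinary least squares with a single predictor. The function name `fit_line` and the revenue figures are made up for illustration, and real data is rarely this tidy.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: month number (the predictor) vs. revenue in $k
months = [1, 2, 3, 4, 5]
revenue = [100, 120, 140, 160, 180]

slope, intercept = fit_line(months, revenue)

# Use the fitted trend to estimate month 6
forecast = slope * 6 + intercept
print(f"estimated revenue for month 6: {forecast:.0f}k")
```

The estimate is only as good as the assumption that the past trend continues.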

Sampling Error

Sampling error is the difference between a result measured on a sample and the result you would get from the entire population (of data). This difference arises simply because the sample doesn’t (and can’t) perfectly reflect the whole.
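
To see sampling error in action, the sketch below builds a made-up population, draws a random sample from it, and compares the two averages. The population, the sample size of 50, and the fixed seed are all assumptions chosen so the example is repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example gives the same sample each run

# Hypothetical population: order values from 1 to 1,000
population = list(range(1, 1001))
population_mean = statistics.mean(population)

# Survey only 50 of them, chosen at random
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

# The gap between the two averages is the sampling error for this sample
sampling_error = sample_mean - population_mean
print(f"population mean: {population_mean}")
print(f"sample mean:     {sample_mean}")
print(f"sampling error:  {sampling_error:+.2f}")
```

Drawing a different sample gives a different error, and larger samples tend to produce smaller ones.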

Summary Statistics

Summary statistics (or summary metrics) describe a complicated set of data with a few simple metrics, such as the average, the median, and the range.
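
For instance, a handful of numbers can stand in for a whole list of values. The sketch below uses Python's built-in `statistics` module on a made-up list of daily ticket counts. The choice of which metrics to include is illustrative.

```python
import statistics

# Hypothetical data: support tickets received each day
daily_tickets = [3, 5, 7, 7, 8]

summary = {
    "count": len(daily_tickets),
    "min": min(daily_tickets),
    "max": max(daily_tickets),
    "mean": statistics.mean(daily_tickets),      # the average
    "median": statistics.median(daily_tickets),  # the middle value
    "stdev": statistics.stdev(daily_tickets),    # how spread out the values are
}

for name, value in summary.items():
    print(f"{name}: {value}")
```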


Variable

A variable in statistics is something that can be measured, counted, or described. It could be a number or a characteristic.

Venn Diagram

A Venn diagram shows the similarities and differences of two or more data sets by using overlapping circles.
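
The regions of a Venn diagram map neatly onto set operations. As a small sketch, the code below works out what would go in each region of a two-circle diagram. The group names and members are made up for illustration.

```python
# Hypothetical data sets: two groups of users
email_subscribers = {"ana", "ben", "cho", "dev"}
paying_customers = {"cho", "dev", "eli"}

# The overlap of the two circles: members of both groups
both = email_subscribers & paying_customers

# The non-overlapping part of each circle
only_email = email_subscribers - paying_customers
only_paying = paying_customers - email_subscribers

print("both:", sorted(both))
print("only email:", sorted(only_email))
print("only paying:", sorted(only_paying))
```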