# Data Science Terms Explained

Does complex terminology put you off using data? Here’s a glossary of common data science terms with straightforward definitions, helpful examples, and additional resources if you want to dive deeper.

### Algorithm

An algorithm is a defined, step-by-step process for solving a problem - often written by a human and performed by a computer.

### Analytics

A broad definition of analytics is the review of data to discover, understand, and communicate meaningful patterns. Or more simply, analytics is the practice of drawing useful insights from raw data.

### Bias

In general, bias can be defined as an inclination towards a thing, group, or person, often in a way that is inaccurate or unfair. Bias in statistics or data analytics can skew your insights, leading to poor, and potentially costly, business decisions.

### Correlation

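A simple definition of correlation is the relationship between two or more variables (or data sets): how closely they tend to move together. Correlation on its own does not show that one variable causes the other.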
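One common way to put a number on such a relationship is Pearson's correlation coefficient, which ranges from -1 (the variables move in opposite directions) to 1 (they move together). The sketch below is illustrative only: the function name `pearson_correlation` and the ad spend and sales figures are made up for the example.

```python
import math

def pearson_correlation(xs, ys):
    """Return Pearson's r for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together, and on their own.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariation / math.sqrt(variation_x * variation_y)

# Hypothetical data: sales rise in step with ad spend.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]
r = pearson_correlation(ad_spend, sales)
print(f"correlation: {r:.2f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.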

### Data Processing

Data processing is simply a sequence of steps for collecting raw data and turning it into meaningful information.

### Data Sets

A data set is a technical term for a collection of data. Typically, a data set refers to the content of a single table or graph.

### Hypothesis

A hypothesis is an educated guess (often about the cause of a problem) that hasn’t been confirmed yet.

### Margin of Error

The margin of error is the amount by which the results from a random sample may differ from the results of surveying the whole population.
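For a survey that reports a percentage, a widely used approximation for the margin of error is the z-score for the chosen confidence level multiplied by the square root of p(1 - p) / n. The sketch below assumes a simple random sample and a 95% confidence level (z of about 1.96); the function name `margin_of_error` and the survey numbers are hypothetical.

```python
import math

def margin_of_error(sample_proportion, sample_size, z_score=1.96):
    """Approximate margin of error for a proportion (z_score=1.96 is about 95% confidence)."""
    standard_error = math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)
    return z_score * standard_error

# Hypothetical survey: 1,000 people asked, 50% answered "yes".
moe = margin_of_error(0.5, 1000)
print(f"margin of error: +/- {moe * 100:.1f} percentage points")
```

With these numbers the result is roughly 3 percentage points either way. Quadrupling the sample size only halves the margin, because the sample size sits under a square root.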

### Multivariate Test

A multivariate test is a process where variations of different elements (e.g., text, video, and images on a website) can be evaluated simultaneously.

### Outlier

An outlier is a piece of data that sits far from the rest of the data set. Think of it as a straggler.
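There is no single rule for how distant a value has to be before it counts as an outlier. One common convention, used here purely as an illustration, flags anything more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The function name `find_outliers` and the delivery times are made up for the example.

```python
import statistics

def find_outliers(values, multiplier=1.5):
    """Return values outside the IQR fences (one common convention, not the only one)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical delivery times in minutes; one order took far longer than the rest.
delivery_minutes = [12, 13, 14, 15, 15, 16, 17, 95]
outliers = find_outliers(delivery_minutes)
print(outliers)
```

Here only the 95-minute delivery falls outside the fences. Whether to remove, correct, or keep such a value depends on why it is there.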

### Probability

The definition of probability is simply the likelihood that an event will happen, expressed as a number between 0 (impossible) and 1 (certain).
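When every outcome is equally likely, a probability can be worked out by counting: favorable outcomes divided by all possible outcomes. As a small illustration (the dice scenario is just an example), this sketch counts how often two six-sided dice add up to 7.

```python
from fractions import Fraction
from itertools import product

# Every possible result of rolling two fair six-sided dice (36 in total).
outcomes = list(product(range(1, 7), repeat=2))

# The rolls we care about: those that add up to 7.
favorable = [roll for roll in outcomes if sum(roll) == 7]

probability = Fraction(len(favorable), len(outcomes))
print(f"{probability} (about {float(probability):.3f})")
```

Six of the 36 rolls add up to 7, so the probability is 1/6.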

### Qualitative Data

Qualitative data is information that describes or categorizes something - the color of the sky, the smell of perfume, music genres, or coffee bean flavors.

### Quantitative Data

Quantitative data is anything that can be measured or counted - monthly revenue, distance of a race and time of the winner, calories in a meal, temperature, or salary.

### Regression

Regression estimates the most likely outcome based on the trend in one or more known variables (predictors).
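The simplest form is linear regression with a single predictor, which draws the best-fitting straight line through the data and extends it to estimate a new value. The sketch below uses ordinary least squares; the function name `fit_line` and the revenue figures are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly revenue (in thousands) for months 1 to 5.
months = [1, 2, 3, 4, 5]
revenue = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, revenue)
predicted_month_6 = slope * 6 + intercept
print(f"predicted revenue for month 6: {predicted_month_6:.0f}")
```

Because this made-up revenue grows by exactly 10 each month, the fitted line predicts 160 for month 6. Real data is noisier, so a prediction is an estimate rather than a guarantee.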

### Sampling Error

Sampling error is the difference between a result measured on a sample and the true result for the entire population (of data). This difference arises simply because the sample doesn’t (and can’t) perfectly reflect the whole.
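A small simulation can make this concrete. The sketch below builds a made-up population of 10,000 heights, draws a random sample of 50, and compares the two averages. All the numbers are invented for illustration, and the fixed seed is only there so the run is repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 10,000 heights in cm, centered around 170.
population = [random.gauss(170, 10) for _ in range(10_000)]

# A random sample of 50 people drawn from that population.
sample = random.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
```

Drawing a different sample gives a different error, and larger samples tend to give smaller ones.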

### Summary Statistics

Summary statistics (or summary metrics) describe a complicated set of data with a few simple metrics, such as the mean, median, and range.
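As an illustration, Python's built-in `statistics` module can boil a list of numbers down to a handful of such metrics. The order values below are made up for the example.

```python
import statistics

# Hypothetical order values for one day.
order_values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "count": len(order_values),
    "mean": statistics.mean(order_values),      # the average
    "median": statistics.median(order_values),  # the middle value
    "min": min(order_values),
    "max": max(order_values),
    "stdev": statistics.stdev(order_values),    # sample standard deviation (spread)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Eight numbers become six metrics here, and the same six metrics would describe eight million numbers just as compactly.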

### Variable

A variable in statistics is something that can be measured, counted, or described. It could be a number or characteristic.

### Venn Diagram

A Venn diagram shows the similarities and differences of two or more data sets by using overlapping circles.
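The regions of a two-circle Venn diagram map directly onto set operations: the overlap is the intersection, and each non-overlapping part is a difference. A minimal sketch with Python sets, using invented customer names:

```python
# Hypothetical customer lists from two channels.
email_subscribers = {"ana", "ben", "cara", "dev"}
app_users = {"cara", "dev", "eli"}

both = email_subscribers & app_users        # the overlapping region
only_email = email_subscribers - app_users  # left circle only
only_app = app_users - email_subscribers    # right circle only

print("both:", sorted(both))
print("email only:", sorted(only_email))
print("app only:", sorted(only_app))
```

Here `cara` and `dev` sit in the overlap, `ana` and `ben` appear only in the email circle, and `eli` appears only in the app circle.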