What is a data set?
Data set (also written dataset) is a technical term that simply means a collection of data. Typically, a data set refers to the content of a single table or graph. More specifically, a data set contains a single time series, such as the number of customer service tickets resolved daily.
This term can be a little confusing, since some people use it more generally to refer to a group of related tables. A more precise term for a group of related tables is a data collection (see below).
Data set examples
A data set can be anything from the trend of trial signups over the last month to the geographic location of customers to the value of bitcoin year-over-year. Usually, datasets contain a single time series. For example, you might have a dataset with the number of sales for every day this week or a dataset with monthly revenue churn.
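As a rough illustration of a dataset that holds a single time series, the sketch below maps each day of one week to a sales count. The numbers, the start date, and the names `daily_sales` and `sales_dataset` are made up for the example and do not come from any real source.

```python
from datetime import date, timedelta

# Hypothetical daily sales counts for one week (illustrative values only).
start = date(2024, 1, 1)
daily_sales = [12, 15, 9, 20, 18, 7, 11]

# The dataset: a single time series with one value per date.
sales_dataset = {
    start + timedelta(days=offset): count
    for offset, count in enumerate(daily_sales)
}

# Simple questions you can answer from one time series.
total_sales = sum(sales_dataset.values())
busiest_day = max(sales_dataset, key=sales_dataset.get)

print(f"Total sales this week: {total_sales}")
print(f"Busiest day: {busiest_day.isoformat()}")
```

Because every value is tied to exactly one date, the same structure would work for monthly revenue churn or trial signups by swapping the dates and values.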
What’s the difference between datasets, databases, and data collections?
Datasets refer to data with a single time series.
Databases are made of data on a particular topic from a single publisher and may contain many datasets. (Some people may use databases more loosely to refer to a group of datasets in one location, even if the datasets are compiled from different sources.)
Data collections are made up of related datasets or databases on a single topic.
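One way to picture how the three terms nest is the small sketch below. The class names, fields, and sample values are hypothetical and only mirror the definitions above; they are not a standard API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Dataset:
    """A single time series: period label -> value."""
    name: str
    series: Dict[str, float]


@dataclass
class Database:
    """Data on one topic from one publisher; may hold many datasets."""
    topic: str
    publisher: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class DataCollection:
    """Related datasets or databases grouped under a single topic."""
    topic: str
    members: List[Union[Dataset, Database]] = field(default_factory=list)

    def dataset_count(self) -> int:
        # A database contributes all of its datasets; a bare dataset counts once.
        return sum(
            len(m.datasets) if isinstance(m, Database) else 1
            for m in self.members
        )


# Illustrative values only.
churn = Dataset("monthly revenue churn", {"2024-01": 2.1, "2024-02": 1.8})
sales = Dataset("daily sales", {"2024-01-01": 12.0, "2024-01-02": 15.0})
signups = Dataset("trial signups", {"2024-01": 340.0})

finance_db = Database(topic="finance", publisher="Example Co", datasets=[churn, sales])
collection = DataCollection(topic="business metrics", members=[finance_db, signups])

print(collection.dataset_count())
```

Here the collection contains one database (with two datasets) plus one standalone dataset, which matches the idea that a data collection can mix both.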
Big data and open data
For additional context, when datasets become too large or complex for conventional data processing applications to handle, they are called big data.
Datasets that are aggregated and then shared in a public repository are referred to as open data.