What is correlation?
A simple definition of correlation is the relationship between two or more variables (or data sets). This relationship is defined more specifically by its strength and direction.
What is strong correlation in statistics?
A strong correlation (sometimes referred to as high correlation) is when two groups of data are very closely related. Conversely, a weak (or low) correlation means the two groups of data are only loosely related.
For example, increasing ice cream sales have a strong (high) correlation with rising temperatures. The hotter it is, the more ice cream people eat.
What is positive and negative correlation?
The direction of related variables can be either positive or negative. Positive correlation means both values move in the same direction: they increase together or decrease together. Negative correlation means one value increases while the other value decreases.
Continuing with the ice cream example, higher ice cream sales have a strong positive correlation with warmer temperatures because as the weather gets hotter (increasing value) more ice cream is sold (increasing value).
One negative correlation might be that less hot chocolate is sold (decreasing value) when the temperature gets warmer (increasing value). Or consider another example of negative correlation - the more someone pays on their mortgage (increasing value), the less they owe (decreasing value).
(The technical term for a single number that describes both the strength and direction of a correlation is the correlation coefficient, which ranges from -1 for a perfect negative correlation to +1 for a perfect positive correlation.)
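To make the idea of a correlation coefficient concrete, here is a minimal sketch that computes Pearson's r, one common correlation coefficient, for the ice cream and hot chocolate examples. The function name pearson_r and all of the sales and temperature figures are hypothetical, invented purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two variables vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: how much each variable varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily figures, made up for this example (not real data).
temperature_c = [18, 21, 24, 27, 30, 33]
ice_cream_sold = [110, 135, 160, 190, 220, 240]
hot_chocolate_sold = [95, 80, 62, 50, 33, 20]

r_ice_cream = pearson_r(temperature_c, ice_cream_sold)
r_hot_chocolate = pearson_r(temperature_c, hot_chocolate_sold)

# The sign gives the direction; the distance from 0 gives the strength.
print(f"temperature vs ice cream:     r = {r_ice_cream:+.3f}")
print(f"temperature vs hot chocolate: r = {r_hot_chocolate:+.3f}")
```

With these made-up numbers, the first value comes out close to +1 (strong positive) and the second close to -1 (strong negative). On Python 3.10 or later, the standard library's statistics.correlation computes the same quantity.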
Correlation vs causation: what’s the difference?
If you’ve heard the mantra “Correlation does not imply causation,” you might have wondered: what’s the difference between correlation and causation, and why does it matter?
As described above, correlation indicates a relationship between two variables and is particularly helpful for making predictions. For example, if we know that SAT (Scholastic Assessment Test) scores have a strong positive correlation with students’ grade point averages (GPA) in college, we can reasonably expect that relationship to hold for future students. So based on a student’s SAT scores in high school, we can predict roughly what their GPA might be in college.
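One simple way to turn a strong correlation into a prediction is to fit a straight line through past data and read a value off it. The sketch below does this with ordinary least squares. The helper fit_line and the SAT and GPA pairs are hypothetical, invented for illustration, and a real analysis would need far more data and some check of how well the line fits.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (SAT score, college GPA) pairs, made up for this example.
sat_scores = [1000, 1100, 1200, 1300, 1400, 1500]
college_gpas = [2.6, 2.9, 3.0, 3.3, 3.5, 3.8]

slope, intercept = fit_line(sat_scores, college_gpas)

# Predict the GPA for a new student from their SAT score alone.
new_sat = 1250
predicted_gpa = slope * new_sat + intercept
print(f"predicted GPA for SAT {new_sat}: {predicted_gpa:.2f}")
```

The prediction relies only on the observed relationship. It says nothing about whether a higher SAT score causes a higher GPA.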
In contrast, causation refers to cause and effect, where a change in one variable brings about a change in the other. Two variables can be strongly correlated without either one causing the other. To use an obvious example, we might notice that global temperatures have steadily risen over the past 150 years and the number of pirates has declined over the same period (negative correlation). No one would reasonably claim that the reduction in pirates caused global warming or that more pirates would reverse it. But if we look at other contributing factors, a more plausible explanation emerges: a third factor, industrialization, influenced both.
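The effect of a shared underlying factor can be illustrated with a small simulation. In the sketch below, a hypothetical "industrialization" value drives both a temperature series and a pirate count, and the two series never influence each other directly. All numbers are simulated, not real measurements, and the helper names are assumptions made for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, zs):
    """What is left of ys after subtracting its least-squares fit on zs."""
    mean_y = sum(ys) / len(ys)
    mean_z = sum(zs) / len(zs)
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [y - (mean_y + slope * (z - mean_z)) for y, z in zip(ys, zs)]

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated common cause, plus two variables that each depend only on it.
industrialization = [random.uniform(0, 100) for _ in range(500)]
temperature = [0.02 * z + random.gauss(0, 0.1) for z in industrialization]
pirates = [1000 - 8 * z + random.gauss(0, 40) for z in industrialization]

# Raw correlation between the two effects: strongly negative.
raw_r = pearson_r(temperature, pirates)

# Correlation after removing the part of each variable explained by the
# common cause (a simple form of partial correlation): close to zero.
partial_r = pearson_r(residuals(temperature, industrialization),
                      residuals(pirates, industrialization))

print(f"temperature vs pirates:               r = {raw_r:+.3f}")
print(f"after accounting for industrialization: r = {partial_r:+.3f}")
```

In this simulation the raw correlation is strongly negative, yet it largely disappears once the common factor is accounted for. That pattern is what a correlation produced by a shared cause, and not by direct causation, tends to look like.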
It’s easy to assume that because two events happen together (correlate), one must cause the other. This data fallacy is called False Causality. Remember that correlation alone does not prove a cause-and-effect relationship.