Pearson Correlation Coefficient (2024)

The Pearson correlation coefficient (also known as the “product-moment correlation coefficient”) is a measure of the linear association between two variables X and Y. It has a value between -1 and 1, where:

  • -1 indicates a perfectly negative linear correlation between two variables
  • 0 indicates no linear correlation between two variables
  • 1 indicates a perfectly positive linear correlation between two variables

The Formula to Find the Pearson Correlation Coefficient

The formula to find the Pearson correlation coefficient, denoted as r, for a sample of data is (via Wikipedia):

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

You will likely never have to compute this formula by hand since you can use software to do this for you, but it’s helpful to have an understanding of what exactly this formula is doing by walking through an example.
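As a minimal sketch, the formula can be translated line by line into Python. The function name `pearson_r` is hypothetical (it is not part of any library), and the sketch assumes two equal-length sequences with at least two points and non-zero variance in each.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient, computed directly from the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of the products of paired deviations from the means.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return num / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfectly positive)
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfectly negative)
```

The two calls check the endpoints of the range: points lying exactly on a rising line give 1, and points on a falling line give -1.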

Suppose we have the following dataset:

[Table: example dataset of (X, Y) pairs]

If we plotted these (X, Y) pairs on a scatterplot, it would look like this:

Just from looking at this scatterplot, we can tell that there is a positive association between variables X and Y: when X increases, Y tends to increase as well. But to quantify exactly how positively associated these two variables are, we need to find the Pearson correlation coefficient.

Let’s focus on just the numerator of the formula:

$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

For each (X, Y) pair in our dataset, we need to find the difference between the x value and the mean x value, the difference between the y value and the mean y value, then multiply these two numbers together.

For example, our first (X, Y) pair is (2, 2). The mean x value in this dataset is 5 and the mean y value in this dataset is 7. So, the difference between the x value in this pair and the mean x value is 2 – 5 = -3. The difference between the y value in this pair and the mean y value is 2 – 7 = -5. Then, when we multiply these two numbers together we get -3 * -5 = 15.

Here’s a visual look at what we just did:

Next, we just need to do this for every single pair:

The last step to get the numerator of the formula is to simply add up all of these values:

15 + 3 + 3 + 15 = 36
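The original data table isn't reproduced here, so the sketch below uses an assumed dataset, (2, 2), (4, 4), (6, 10), (8, 12), chosen only because it matches every number quoted in this walkthrough (first pair (2, 2), means 5 and 7, products 15, 3, 3, 15). Treat it as an illustration of the numerator step, not as the original data.

```python
# Assumed dataset: consistent with the walkthrough's numbers, not necessarily the original table.
xs = [2, 4, 6, 8]
ys = [2, 4, 10, 12]

x_bar = sum(xs) / len(xs)  # mean x value: 5.0
y_bar = sum(ys) / len(ys)  # mean y value: 7.0

# For each pair, multiply the x deviation by the y deviation.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
numerator = sum(products)

print(products)   # [15.0, 3.0, 3.0, 15.0]
print(numerator)  # 36.0
```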

Next, the denominator of the formula tells us to find the sum of all the squared differences for both x and y, then multiply these two numbers together, then take the square root:

$$\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}$$

So, first we’ll find the sum of the squared differences for both x and y:

Then we’ll multiply these two numbers together: 20 * 68 = 1,360.

Lastly, we’ll take the square root: √1,360 ≈ 36.88
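The denominator steps can be sketched the same way. This again assumes the dataset (2, 2), (4, 4), (6, 10), (8, 12), which is consistent with the numbers in the walkthrough but is not necessarily the original table.

```python
import math

# Assumed dataset: consistent with the walkthrough's numbers, not necessarily the original table.
xs = [2, 4, 6, 8]
ys = [2, 4, 10, 12]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

ss_x = sum((x - x_bar) ** 2 for x in xs)  # sum of squared x deviations
ss_y = sum((y - y_bar) ** 2 for y in ys)  # sum of squared y deviations
denominator = math.sqrt(ss_x * ss_y)      # square root of their product

print(ss_x, ss_y, ss_x * ss_y)  # 20.0 68.0 1360.0
print(round(denominator, 2))    # 36.88
```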

So, we found the numerator of the formula to be 36 and the denominator to be 36.88. This means that our Pearson correlation coefficient is r = 36 / 36.88 ≈ 0.976.

This number is close to 1, which indicates that there is a strong positive linear relationship between our variables X and Y. This confirms the relationship that we saw in the scatterplot.

Visualizing Correlations

Recall that a Pearson correlation coefficient tells us the type of linear relationship (positive, negative, none) between two variables as well as the strength of that relationship (weak, moderate, strong).

When we make a scatterplot of two variables, we can see the actual relationship between them. Here are several types of linear relationships we might see:

Strong, positive relationship: As the variable on the x-axis increases, the variable on the y-axis increases as well. The dots are packed together tightly, which indicates a strong relationship.

Pearson correlation coefficient: 0.94

Weak, positive relationship: As the variable on the x-axis increases, the variable on the y-axis increases as well. The dots are fairly spread out, which indicates a weak relationship.

Pearson correlation coefficient: 0.44

No relationship: There is no clear relationship (positive or negative) between the variables.

Pearson correlation coefficient: 0.03

Strong, negative relationship: As the variable on the x-axis increases, the variable on the y-axis decreases. The dots are packed tightly together, which indicates a strong relationship.

Pearson correlation coefficient: -0.87

Weak, negative relationship: As the variable on the x-axis increases, the variable on the y-axis decreases. The dots are fairly spread out, which indicates a weak relationship.

Pearson correlation coefficient: -0.46

Testing for Significance of a Pearson Correlation Coefficient

When we find the Pearson correlation coefficient for a set of data, we’re often working with a sample of data that comes from a larger population. This means that it’s possible to find a non-zero correlation for two variables even if they’re actually not correlated in the overall population.

For example, suppose we make a scatterplot for variables X and Y for every data point in the entire population and it looks like this:

Clearly these two variables are not correlated. However, it’s possible that when we take a sample of 10 points from the population, we choose the following points:

We may find that the Pearson correlation coefficient for this sample of points is 0.93, which indicates a strong positive correlation despite the population correlation being zero.

To test whether a correlation between two variables is statistically significant, we can compute the following test statistic:

$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where n is the number of pairs in our sample and r is the Pearson correlation coefficient. Under the null hypothesis that the population correlation is zero, the test statistic T follows a t distribution with n-2 degrees of freedom.

Let’s walk through an example of how to test for the significance of a Pearson correlation coefficient.

Example

The following dataset shows the height and weight of 12 individuals:

[Table: height and weight of 12 individuals]

The scatterplot below shows the values of these two variables:

The Pearson correlation coefficient for these two variables is r = 0.836.

The test statistic is T = 0.836 × √(12 - 2) / √(1 - 0.836²) ≈ 4.82.

According to a t distribution calculator, a t score of about 4.8 with 10 degrees of freedom has a two-sided p-value of about .0007. Since .0007 < .05, we can conclude that the correlation between weight and height in this example is statistically significant at alpha = .05.
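The significance test can be sketched with Python's standard library alone. The helper names below are hypothetical, and instead of a t distribution calculator or a statistics package, this sketch obtains the two-sided p-value by numerically integrating the t density with Simpson's rule; that numerical approach is an assumption of the sketch, not how any particular calculator works. With r = 0.836 and n = 12 it gives T ≈ 4.82 and a p-value of about .0007.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), for H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t: float, df: int, steps: int = 10_000) -> float:
    """P(|T| >= |t|), using Simpson's rule for the area between 0 and |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|); the density is symmetric about 0
    return 1 - 2 * area


# Values from the height/weight example: r = 0.836 with n = 12 pairs.
t = t_statistic(0.836, 12)
p = two_sided_p_value(t, df=12 - 2)
print(f"T = {t:.2f}, p = {p:.4f}")  # T = 4.82, p = 0.0007
```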

Cautions

While a Pearson correlation coefficient can be useful in telling us whether or not two variables have a linear association, we must keep three things in mind when interpreting a Pearson correlation coefficient:

1. Correlation does not imply causation. Just because two variables are correlated does not mean that one is necessarily causing the other to occur more or less often. A classic example of this is the positive correlation between ice cream sales and shark attacks. When ice cream sales increase during certain times of the year, shark attacks also tend to increase.

Does this mean ice cream consumption is causing shark attacks? Of course not! It just means that during the summer, both ice cream consumption and shark attacks tend to increase since ice cream is more popular during the summer and more people go in the ocean during the summer.

2. Correlations are sensitive to outliers. One extreme outlier can dramatically change a Pearson correlation coefficient. Consider the example below:

Variables X and Y have a Pearson correlation coefficient of 0.00. But imagine that we have one outlier in the dataset:

Now the Pearson correlation coefficient for these two variables is 0.878. This one outlier changes everything. This is why, when you calculate the correlation for two variables, it’s a good idea to visualize the variables using a scatterplot to check for outliers.

3. A Pearson correlation coefficient does not capture nonlinear relationships between two variables. Imagine that we have two variables with the following relationship:

The Pearson correlation coefficient for these two variables is 0.00 because they have no linear relationship. However, these two variables do have a nonlinear relationship: The y values are simply the x values squared.

When using the Pearson correlation coefficient, keep in mind that you’re merely testing to see if two variables are linearly related. Even if a Pearson correlation coefficient tells us that two variables are uncorrelated, they could still have some type of nonlinear relationship. This is another reason that it’s helpful to create a scatterplot when analyzing the relationship between two variables – it may help you detect a nonlinear relationship.


FAQs


What is an acceptable Pearson correlation?

  • High degree: values between ±0.50 and ±1 suggest a strong correlation.
  • Moderate degree: values between ±0.30 and ±0.49 indicate a moderate correlation.
  • Low degree: values below ±0.29 are considered a weak correlation.
  • No correlation: a value of zero implies no linear relationship.

What is a good enough correlation coefficient?

Correlation coefficients whose magnitudes are between 0.7 and 0.9 indicate variables which can be considered highly correlated. Correlation coefficients whose magnitudes are between 0.5 and 0.7 indicate variables which can be considered moderately correlated.

How do you know if a Pearson correlation is strong or weak?

The sign of the linear correlation coefficient indicates the direction of the linear relationship between x and y. When r (the correlation coefficient) is near 1 or −1, the linear relationship is strong; when it is near 0, the linear relationship is weak.

What does a Pearson correlation of 0.2 mean?

Its interpretation depends on the scale being used. For example, under one scale a correlation coefficient of 0.2 is considered a negligible correlation, while a correlation coefficient of 0.3 is considered a low positive correlation (Table 1), so it is important to use the most appropriate scale.

What is the minimum value a Pearson correlation can take?

The Pearson correlation measures the strength of the linear relationship between two variables. It takes a value between -1 and 1, with -1 meaning a total negative linear correlation, 0 meaning no linear correlation, and +1 meaning a total positive linear correlation. The minimum value is therefore -1.

What is a good score for Pearson correlation?

A Pearson correlation coefficient (PCC) from 0.2 to 0.39 indicates a low correlation, a PCC from 0.4 to 0.59 represents a moderate correlation, a PCC from 0.6 to 0.79 represents a high correlation, and a PCC higher than 0.79 represents a very high correlation [51].

What is the rule of thumb for the correlation coefficient?

63) introduces the following rule of thumb to help students decide if the observed value of the correlation coefficient is significant: Rule of Thumb No. 1: If |r_xy| ≥ 2/√n, then a linear relationship exists. This paper provides statistical justification for the rule's use.

What is the most appropriate correlation coefficient?

As a general rule, you should use Pearson's r for continuous variables that have a linear relationship and meet the assumptions, Spearman's rho or Kendall's tau for continuous or ordinal variables that have a monotonic relationship and do not meet the assumptions, point-biserial r for one continuous and one binary ...

What is an acceptable value of correlation coefficient?

Values between 0.7 and 1.0 (−0.7 and −1.0) indicate a strong positive (negative) linear relationship through a firm linear rule. In regression, the related multiple correlation coefficient R is the correlation between the observed and modelled (predicted) data values; it can increase, but never decreases, as predictor variables are added to the model.

How do you know if Pearson's r is significant?

If r is below the negative critical value or above the positive critical value, then r is significant. For example, with n = 10 data points (8 degrees of freedom), the critical values at the 5% significance level are -0.632 and +0.632. Since r = 0.801 and 0.801 > 0.632, r is significant and the regression line may be used for prediction. Picturing this on a number line helps: r is not significant anywhere between -0.632 and +0.632.
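The critical value of r and the t statistic from the significance test are two views of the same test: solving T = r√df / √(1 - r²) for r gives a critical r = t / √(t² + df). The sketch below assumes the standard two-sided 5% critical t value for 8 degrees of freedom, 2.306, taken from a t table, and compares the resulting critical r with the 2/√n rule of thumb for n = 10. The function name `critical_r` is hypothetical.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Invert T = r * sqrt(df) / sqrt(1 - r**2) to get the critical |r|."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 10
df = n - 2
t_crit = 2.306  # assumed: two-sided 5% critical t value for df = 8, from a t table

print(round(critical_r(t_crit, df), 3))  # 0.632
print(round(2 / math.sqrt(n), 3))        # 0.632 (rule of thumb)

r = 0.801
print("significant" if abs(r) > critical_r(t_crit, df) else "not significant")
```

At n = 10 the exact critical value and the rule of thumb happen to agree closely; for other sample sizes the rule is only an approximation.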

How do you interpret a Pearson correlation table?

Pearson Correlation – These numbers measure the strength and direction of the linear relationship between the two variables. The correlation coefficient can range from -1 to +1, with -1 indicating a perfect negative correlation, +1 indicating a perfect positive correlation, and 0 indicating no linear correlation at all.

What is a weakness of Pearson correlation?

A weakness of Pearson correlation analysis is that its standard inference (significance tests and confidence intervals) assumes the variables are bivariate normal, an assumption that is often not met in real-world applications. Departures from normality, such as outliers or heavy tails, can have a significant effect on the magnitude of the correlation coefficient.

What does a Pearson coefficient of 0.5 mean?

A Pearson correlation coefficient of 0.5 indicates a moderate positive correlation. More generally, a correlation coefficient between 0.4 and 0.7 is usually considered a moderate correlation.

Is 0.4 a strong Pearson correlation?

A correlation of 0.4 indicates a moderate positive correlation. For example, a researcher might find that students' SAT scores and GPA have a moderate positive correlation. This means that a student's GPA can be used as a moderate indicator of that student's SAT score and vice versa.

How do you explain the Pearson correlation coefficient?

Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this line).

Is a Pearson correlation of 0.3 significant?

Values between 0 and 0.3 (0 and −0.3) indicate a weak positive (negative) linear relationship through a shaky linear rule. Values between 0.3 and 0.7 (−0.3 and −0.7) indicate a moderate positive (negative) linear relationship through a fuzzy-firm linear rule. Whether a given value is statistically significant also depends on the sample size.

Is 0.7 a high correlation?

The relationship between two variables is generally considered strong when their r value is larger than 0.7. The correlation r measures the strength of the linear relationship between two quantitative variables. Pearson r: r is always a number between -1 and 1.

Is 0.75 a good correlation?

r values ranging from 0.50 to 0.75 or -0.50 to -0.75 indicate moderate to good correlation, and r values from 0.75 to 1 or from -0.75 to -1 point to very good to excellent correlation between the variables (1).


Author: Carmelo Roob
