Pearson Correlation Calculator
Enter two equal-length data series (X and Y) to calculate the Pearson correlation coefficient (r) and coefficient of determination (r²).
r = Σ[(xi - x̄)(yi - ȳ)] / √(Σ(xi - x̄)² · Σ(yi - ȳ)²)
What is the Pearson coefficient?
The Pearson correlation coefficient (usually written as r) measures the strength and direction of a linear relationship between two numeric variables. It ranges from -1 to +1:
- +1: perfect positive linear relationship
- 0: no linear relationship
- -1: perfect negative linear relationship
If you are doing data analysis, statistics homework, research reporting, or business analytics, Pearson’s r is one of the most common first-pass tools for understanding how two variables move together.
How this Pearson coefficient calculator works
Step 1: Mean-center each variable
The calculator computes the average of X and the average of Y. It then measures how each point deviates from its variable’s mean.
Step 2: Compute the covariance-like numerator
It multiplies each pair of deviations and sums them. If high values of X tend to occur with high values of Y, this sum is positive. If high X tends to occur with low Y, it becomes negative.
Step 3: Standardize by total spread
The numerator is divided by the square root of the product of the two sums of squared deviations (one for X, one for Y), which is the denominator of the formula above. This makes the result unitless and bounded between -1 and +1.
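The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the formula, not the calculator's actual source; the function name pearson_r and the sample values are assumptions made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson r for two equal-length numeric sequences (hypothetical helper)."""
    n = len(xs)
    # Step 1: mean-center each variable.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Step 2: sum the products of paired deviations (the numerator).
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Step 3: divide by the square root of the product of the
    # two sums of squared deviations.
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return numerator / math.sqrt(sxx * syy)

# Illustrative values only.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(r * r, 4))  # 0.7746 0.6
```

The sketch does no input checking, so a constant series would raise ZeroDivisionError at the final division.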
How to interpret the result
In practice, interpretation depends on your field, sample size, and context. A common rule of thumb for the absolute value |r| is:
- 0.00–0.19: very weak
- 0.20–0.39: weak
- 0.40–0.59: moderate
- 0.60–0.79: strong
- 0.80–1.00: very strong
This calculator also reports r², the coefficient of determination, which is often read as the proportion of variance in Y linearly associated with X.
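As a rough sketch, the rule-of-thumb bands can be turned into a small lookup. The function name describe_strength is hypothetical, and treating each band's upper bound as exclusive (so that 0.195 falls in "very weak") is an assumption, since the bands as listed leave small gaps.

```python
def describe_strength(r):
    """Return the rule-of-thumb label for |r| (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    # Assumption: upper bounds are exclusive, closing the gaps between bands.
    bands = [(0.20, "very weak"), (0.40, "weak"),
             (0.60, "moderate"), (0.80, "strong")]
    for upper, label in bands:
        if magnitude < upper:
            return label
    return "very strong"

r = -0.65
print(describe_strength(r), round(r * r, 4))  # strong 0.4225
```

The sign of r is ignored for the label because the bands describe strength only; direction is read from the sign separately.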
Important assumptions and caveats
1) Linear relationship
Pearson correlation captures linear association. Two variables can have a strong non-linear relationship and still show a low Pearson r.
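A small sketch makes this concrete. With made-up data where Y is exactly X squared over a range symmetric around zero, Y is fully determined by X, yet Pearson r comes out as zero. The helper pearson_r is a hypothetical stand-in for the calculator's computation.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson r (hypothetical helper, no input checking)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

# Illustrative data: a perfect U-shaped (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_curved = pearson_r(xs, ys)
print(abs(r_curved) < 1e-12)  # True: no linear association detected
```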
2) Outlier sensitivity
A single extreme point can dramatically shift the coefficient. Always inspect your data visually (scatter plot) when possible.
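The following sketch shows the effect on made-up data: five points lying exactly on a rising line, then the same points plus one extreme pair. The helper pearson_r and all values are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson r (hypothetical helper, no input checking)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

# Illustrative data: a perfect rising line.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
r_clean = pearson_r(xs, ys)

# Same data with a single extreme pair appended.
r_outlier = pearson_r(xs + [6], ys + [-10])

print(r_clean > 0.99, r_outlier < 0)  # True True
```

One added point is enough here to flip the sign of the coefficient, which is why a scatter plot is worth checking before trusting the number.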
3) Correlation is not causation
Even a high correlation does not prove that one variable causes the other. Confounding factors and reverse causality are common.
4) Variability is required
If all X values are identical (or all Y values are identical), that variable's sum of squared deviations is zero, so the formula divides by zero and the coefficient is undefined.
Quick practical example
Suppose you track study time (X) and exam score (Y) for a small group. If students who study more tend to score higher in a roughly straight-line trend, Pearson r will be positive and likely moderate to high. Click Load Example to test the calculator with sample values.
Pearson vs. Spearman: when to use each
- Pearson: best for linear relationships on interval/ratio numeric data.
- Spearman: rank-based; useful for monotonic but non-linear relationships or ordinal data.
If your data is skewed, contains outliers, or follows a curved but still monotonic trend, Spearman’s rank correlation may be more robust.
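To illustrate the difference, Spearman's coefficient can be computed as Pearson r applied to the ranks of the data. The sketch below uses made-up values where Y is the cube of X: the relationship is monotonic but curved, so Spearman gives 1 while Pearson stays below 1. The helper names are hypothetical, and ties are handled with average ranks, which is a common convention assumed here.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson r (hypothetical helper, no input checking)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's coefficient as Pearson r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Illustrative data: monotonic but strongly curved.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(round(pearson_r(xs, ys), 3), round(spearman_rho(xs, ys), 3))
```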
Common input mistakes
- Different number of X and Y values
- Including text labels instead of numbers
- Using only one pair of values (you need at least two; with exactly two pairs, r can only be +1 or -1, so use more for a meaningful result)
- Entering a constant list for one variable
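The checks in this list can be sketched as a small validation routine. This is an assumption about how such checks might be written, not the calculator's own code; the function name find_input_problems and the message wording are hypothetical.

```python
def find_input_problems(x_tokens, y_tokens):
    """Return a list of problem messages; an empty list means the input is usable."""
    problems = []
    series = {}
    # Text labels or other non-numeric tokens fail float() conversion.
    for name, tokens in (("X", x_tokens), ("Y", y_tokens)):
        values = []
        for token in tokens:
            try:
                values.append(float(token))
            except ValueError:
                problems.append(f"{name}: '{token}' is not a number")
        series[name] = values
    if problems:
        return problems
    if len(series["X"]) != len(series["Y"]):
        problems.append("X and Y have different numbers of values")
    if min(len(series["X"]), len(series["Y"])) < 2:
        problems.append("at least two pairs are required")
    # A constant series has zero spread, which makes r undefined.
    for name, values in series.items():
        if len(values) >= 2 and len(set(values)) == 1:
            problems.append(f"{name} is constant, so r is undefined")
    return problems

print(find_input_problems(["1", "2", "3"], ["4", "5", "six"]))
# ["Y: 'six' is not a number"]
```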
FAQ
Can I use negative numbers and decimals?
Yes. The calculator supports positive/negative values, decimals, and scientific notation.
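As an illustration of how such input might be read, the sketch below splits a line on commas or whitespace and converts each token with Python's built-in float(), which accepts signs, decimals, and scientific notation. The function name parse_series and the separator rules are assumptions, not a description of the calculator's actual parser.

```python
import math
import re

def parse_series(text):
    """Split on commas/whitespace and convert each token to float (hypothetical helper)."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    values = [float(t) for t in tokens]
    # float() also accepts "nan" and "inf"; reject those for correlation input.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("values must be finite numbers")
    return values

print(parse_series("3, -1.5, 2e3, 4.0E-2"))  # [3.0, -1.5, 2000.0, 0.04]
```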
Do I need normally distributed data?
Normality matters more for significance testing and confidence intervals. For descriptive correlation alone, Pearson r can still be computed, but interpretation should be cautious if assumptions are violated.
What does r = 0 mean?
It means there is no linear association. A curved relationship may still exist.