R² Calculator (Simple Linear Regression)
Enter matching X and Y values separated by commas, spaces, or line breaks. This tool fits a best-fit line and computes R².
What is R² in plain language?
R² (read as “R-squared”) is the coefficient of determination. It tells you how much of the variation in your outcome variable (Y) is explained by your model. In simple terms, R² answers this question:
“How well does my regression line explain the ups and downs in the data?”
- R² = 0: the model explains none of the variation.
- R² = 1: the model explains all of the variation perfectly.
- R² = 0.72: about 72% of variation is explained by the model.
For most real-world data, R² is somewhere between 0 and 1, and usually not very close to 1 unless the relationship is very tight.
How do you calculate R²?
The most common formula is:
R² = 1 − (SSE / SST)
Where:
- SSE = Sum of Squared Errors (also called residual sum of squares): Σ(yi − ŷi)²
- SST = Total Sum of Squares: Σ(yi − ȳ)²
- yi = actual observed values
- ŷi = predicted values from the model
- ȳ = mean of observed Y values
If your model’s predictions are much better than just using the mean, SSE becomes small, so R² gets larger.
Step-by-step process
- Fit your regression model and get predicted values (ŷ).
- Compute SST = Σ(yi − ȳ)².
- Compute SSE = Σ(yi − ŷi)².
- Plug into R² = 1 − (SSE / SST).
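The steps above can be sketched as a small helper function; it takes the observed values and the model's predictions (from any fitted model) and returns R²:

```python
def r_squared(y_actual, y_predicted):
    """Coefficient of determination: R^2 = 1 - SSE/SST."""
    y_bar = sum(y_actual) / len(y_actual)
    # SST: total sum of squares around the mean of Y
    sst = sum((y - y_bar) ** 2 for y in y_actual)
    # SSE: residual sum of squares around the model's predictions
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_actual, y_predicted))
    return 1 - sse / sst
```

If the predictions are no better than the mean, SSE equals SST and the function returns 0; perfect predictions give SSE = 0 and a return value of 1.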
Alternative method (simple linear regression only)
If you only have one predictor (simple linear regression), R² is also the square of the Pearson correlation between X and Y:
R² = r²
This shortcut is convenient, but once you have multiple predictors, you should use the sums-of-squares definition from regression output.
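As a minimal sketch of the shortcut, the Pearson correlation can be computed from the same sums of squares, and squaring it matches the R² of the one-predictor fit:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation of X and Y
    sxx = sum((xi - mx) ** 2 for xi in x)                     # variation of X
    syy = sum((yi - my) ** 2 for yi in y)                     # variation of Y
    return sxy / (sxx * syy) ** 0.5

# For simple linear regression, r squared equals R squared:
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r ** 2, 4))  # 0.6
```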
Worked mini example
Suppose we have 5 points:
- X: 1, 2, 3, 4, 5
- Y: 2, 4, 5, 4, 5
After fitting the best-fit line (ŷ = 2.2 + 0.6x), we get:
SST = 6
SSE = 2.4
Then:
R² = 1 − (2.4 / 6) = 0.6
Interpretation: about 60% of the variation in Y is explained by the linear trend with X.
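As a minimal check, this sketch refits the least-squares line for the five points and recomputes SST, SSE, and R² from scratch:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for the best-fit line
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                       # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))       # residual sum of squares
r2 = 1 - sse / sst

print(round(slope, 4), round(intercept, 4))  # 0.6 2.2
print(round(sst, 4), round(sse, 4), round(r2, 4))  # 6.0 2.4 0.6
```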
How to interpret R² correctly
R² is useful, but easy to misuse. Keep these points in mind:
- High R² does not prove causation. It only shows fit, not cause-and-effect.
- Low R² is not always bad. In noisy domains (human behavior, markets, medicine), even modest R² can be meaningful.
- R² can increase when you add variables, even weak ones. That is why adjusted R² matters.
Adjusted R²: when you have multiple predictors
When you add more independent variables, regular R² never decreases. That can reward overfitting. Adjusted R² penalizes unnecessary complexity.
Formula:
Adjusted R² = 1 − (1 − R²) * ((n − 1) / (n − p − 1))
- n = number of observations
- p = number of predictors
Use adjusted R² when comparing models with different numbers of predictors.
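The adjustment is a one-line formula; here is a sketch, with the worked example above (n = 5 observations, p = 1 predictor) as the usage:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n: number of observations, p: number of predictors.
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With R^2 = 0.6, n = 5, p = 1 the penalty is noticeable on so few points:
print(round(adjusted_r_squared(0.6, 5, 1), 4))  # 0.4667
```

Note how adjusted R² drops below the raw R² whenever the sample is small relative to the number of predictors.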
Common mistakes to avoid
1) Treating R² as the only performance metric
Always check residual plots, RMSE/MAE, and out-of-sample performance. A model can have decent R² and still be practically poor.
2) Ignoring nonlinearity
If the true relationship is curved, a straight line may produce a weak R² even though a nonlinear model would fit well.
3) Using R² with no domain context
What is “good” depends on your field. In some physical systems, 0.95 may be expected. In many social systems, 0.20 can still be informative.
Quick recap
- Compute R² with 1 − SSE/SST.
- In simple linear regression, R² = r².
- Interpret as proportion of explained variation.
- Use adjusted R² for multi-variable model comparison.
- Pair R² with residual diagnostics and validation.
If you want a fast answer, use the calculator above. If you want a trustworthy model, pair that number with good statistical judgment.