Regression statistics calculator - Aaron Graves, PhDude Replica

What this regression statistics calculator does

This tool runs a simple linear regression using one X variable and one Y variable. It computes the fitted line, measures model fit, and reports key diagnostics that help you understand whether your predictor has a meaningful linear relationship with your outcome.

In plain terms: if you have paired data points like study hours and exam score, ad spend and sales, or temperature and electricity demand, this calculator gives you the line of best fit and several useful summary statistics.

How to use the calculator

1) Enter matched X and Y values

Paste your numbers into each box. You can separate values with commas, spaces, or new lines. Each X must have a corresponding Y in the same position.
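As a rough illustration of this input step, the sketch below splits pasted text on commas, spaces, or line breaks and checks that X and Y have the same length. The function names parse_values and parse_pairs are hypothetical; the page does not show how the calculator actually parses its input.

```python
import re

def parse_values(text):
    """Split raw text on commas and/or whitespace and convert each token to float."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def parse_pairs(x_text, y_text):
    """Parse both boxes and check that X and Y contain the same number of values."""
    xs, ys = parse_values(x_text), parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    return xs, ys

xs, ys = parse_pairs("1, 2 3\n4", "3,5,7,9")
print(xs, ys)  # [1.0, 2.0, 3.0, 4.0] [3.0, 5.0, 7.0, 9.0]
```

A token that is not a number (for example "abc") makes float() raise a ValueError, which a real form would likely turn into a friendlier message.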

2) Click calculate

The tool computes the slope, intercept, correlation, R-squared and adjusted R-squared, error metrics (SSE, MSE, RMSE), the slope's standard error and t-statistic, and a residual preview table.

3) Add optional prediction

If you enter a value in the “predict Y for X” box, the calculator also returns the model’s predicted Y.
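A prediction is just the fitted line evaluated at the new X. The sketch below uses a hypothetical predict function and made-up coefficients. The range check is an addition for illustration, not something this page says the calculator performs.

```python
def predict(b0, b1, x_new, xs):
    """Return the fitted value at x_new and whether x_new lies inside the observed X range."""
    y_hat = b0 + b1 * x_new
    in_range = min(xs) <= x_new <= max(xs)
    return y_hat, in_range

# Coefficients here are illustrative, not output from the calculator.
y_hat, in_range = predict(b0=1.0, b1=2.0, x_new=6, xs=[1, 2, 3, 4])
print(y_hat, in_range)  # 13.0 False
```

A False flag means the prediction extrapolates beyond the data used to fit the line.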

Regression outputs explained

Equation of the line

The model is: ŷ = b₀ + b₁x, where:

b₀ (intercept) is the expected Y when X = 0.
b₁ (slope) is the expected change in Y for each 1-unit increase in X.
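The two coefficients come from the ordinary least-squares formulas: b₁ = Sxy / Sxx and b₀ = mean(Y) - b₁ · mean(X). The sketch below shows that calculation in plain Python. The function name fit_line and the sample data are assumptions for illustration, not the calculator's own code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: sum of squared deviations of X; Sxy: sum of cross-deviations of X and Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

The sample points lie exactly on y = 1 + 2x, so the fit recovers that line.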

Correlation (r)

Pearson’s r tells you the direction and strength of linear association:

Near +1: strong positive linear relationship
Near -1: strong negative linear relationship
Near 0: weak or no linear relationship (a nonlinear relationship may still be present)
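To make the three cases concrete, here is a minimal sketch of Pearson's r computed from sums of deviations. The function name pearson_r and the data are illustrative assumptions. The last example is a perfect U-shaped pattern (y = x²) whose r is 0, a reminder that r only measures linear association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))          # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))          # perfectly decreasing line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x squared: no linear trend
```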

R² and adjusted R²

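R² is the proportion of variance in Y explained by X. For example, R² = 0.72 means 72% of the variability in Y is explained by the fitted line. In simple linear regression, R² equals the square of Pearson's r. Adjusted R² penalizes R² for the number of predictors relative to the sample size: with one predictor it is 1 - (1 - R²)(n - 1)/(n - 2). It is never higher than R², and the gap is small in large samples but can be substantial when n is small.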
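A short worked example may help here. The sketch below computes R² as 1 - SSE/SST and adjusted R² using the standard one-predictor form. The function name r2_and_adjusted and the five data points are made up for illustration.

```python
def r2_and_adjusted(xs, ys):
    """Return (R squared, adjusted R squared) for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)                      # total variation
    r2 = 1 - sse / sst
    # One predictor: residual degrees of freedom are n - 2, so this needs n >= 3.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return r2, adj_r2

r2, adj_r2 = r2_and_adjusted([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 4), round(adj_r2, 4))  # 0.6 0.4667
```

With only five points the adjustment is sizeable (0.60 versus about 0.47); it shrinks as the sample grows.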

Error metrics

SSE: Sum of Squared Errors (unexplained variation)
MSE: Mean Squared Error (SSE divided by the residual degrees of freedom, which is n - 2 in simple linear regression)
RMSE: Root Mean Squared Error (the square root of MSE), in the original units of Y

Lower error values generally indicate a better fit, but they depend on the scale of Y, so compare them only between models fit to the same outcome.
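The three metrics build on one another, which a small sketch makes easy to see. The function name error_metrics and the data are illustrative assumptions; MSE here divides by n - 2, the residual degrees of freedom for a line with an intercept and one slope.

```python
import math

def error_metrics(xs, ys):
    """Return (SSE, MSE, RMSE) for the least-squares line through the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)    # undefined when n == 2 (zero residual degrees of freedom)
    rmse = math.sqrt(mse)  # back in the units of Y
    return sse, mse, rmse

sse, mse, rmse = error_metrics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sse, 4), round(mse, 4), round(rmse, 4))  # 2.4 0.8 0.8944
```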

Standard error and t-statistic for slope

The standard error of the slope quantifies uncertainty in b₁. The t-statistic (b₁ / SE) indicates how far the slope estimate is from zero in standard-error units; under the usual regression assumptions it is compared against a t distribution with n - 2 degrees of freedom. Larger absolute values suggest stronger evidence of a non-zero linear effect.
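As a sketch of how these two numbers are typically obtained, the code below uses the textbook formula SE(b₁) = sqrt(MSE / Sxx) and then divides the slope by it. The function name slope_inference and the data are assumptions for illustration; the page does not show the calculator's internals.

```python
import math

def slope_inference(xs, ys):
    """Return (b1, standard error of b1, t-statistic) for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)            # residual variance estimate, needs n >= 3
    se_b1 = math.sqrt(mse / sxx)   # textbook standard error of the slope
    t_stat = b1 / se_b1            # undefined if the fit is perfect (se_b1 == 0)
    return b1, se_b1, t_stat

b1, se_b1, t_stat = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 4), round(se_b1, 4), round(t_stat, 4))  # 0.6 0.2828 2.1213
```

Turning the t-statistic into a p-value requires the t distribution with n - 2 degrees of freedom, which is left out here to keep the example dependency-free.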

Important assumptions for linear regression

Before trusting the results, verify core assumptions:

Linearity: relationship between X and Y is approximately linear.
Independence: observations are independent of each other.
Constant variance: residual spread is roughly stable across X values.
Residual normality: mainly important for inference in small samples.
No severe outliers or influential points driving the model.
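Most of these checks start from the residuals (observed Y minus fitted Y). The sketch below builds a small residual table you could scan for curvature, changing spread, or unusually large values. The function name residual_table, the column layout, and the data are hypothetical and not necessarily what the calculator's residual preview shows.

```python
def residual_table(xs, ys):
    """Fit the least-squares line and return a list of (x, y, fitted, residual) rows."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rows = []
    for x, y in zip(xs, ys):
        fitted = b0 + b1 * x
        rows.append((x, y, fitted, y - fitted))
    return rows

print(f"{'x':>4} {'y':>4} {'fitted':>8} {'resid':>8}")
for x, y, fitted, resid in residual_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):
    print(f"{x:>4} {y:>4} {fitted:8.3f} {resid:8.3f}")
```

With an intercept in the model, least-squares residuals always sum to zero, so look at their pattern across X rather than their total.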

Practical interpretation tips

Always inspect raw data and residuals, not just one metric.
High R² does not prove causation.
Extrapolation outside your observed X range can be risky.
In very small samples, the standard error and t-statistic are imprecise, so treat them with caution.

Example use case

Suppose a manager tracks weekly training hours (X) and customer satisfaction score (Y). Running this calculator may reveal a positive slope and moderate R². That suggests more training is associated with higher satisfaction, but the manager should still test for confounders and validate on future weeks.

Frequently asked questions

Can I use this for multiple regression?

No. This page handles one predictor and one outcome (simple linear regression).

What if all X values are the same?

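Regression cannot be fit because there is no variation in X: the slope formula divides by the sum of squared deviations of X from its mean, which is zero in this case. The calculator will return an error.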

How many data points do I need?

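At least two points are needed to fit a line, but three or more are needed for full error statistics: with exactly two points the line passes through both, the residual degrees of freedom (n - 2) are zero, and MSE, RMSE, adjusted R², and the slope's standard error are undefined. In practice, use as many high-quality observations as possible.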
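The last two answers translate naturally into input checks. The sketch below is one hypothetical way to express them; the function name check_inputs, the messages, and the return values are assumptions, and the calculator's actual errors may be worded differently.

```python
def check_inputs(xs, ys):
    """Return which statistics are available, or raise ValueError if no line can be fit."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data points are needed to fit a line")
    if max(xs) == min(xs):
        raise ValueError("all X values are identical, so the slope is undefined")
    if n == 2:
        # The line fits both points exactly; n - 2 = 0 leaves no degrees of freedom for MSE.
        return "line only"
    return "full statistics"

print(check_inputs([1, 2], [3, 5]))        # line only
print(check_inputs([1, 2, 3], [3, 5, 8]))  # full statistics
```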