Linear Regression (Least Squares)
Enter your x and y data as comma-separated values. The calculator returns the best-fit line, correlation, and optional prediction.
What is a line of best fit?
A line of best fit is a straight line that summarizes the relationship between two variables in a scatter plot. If your data points are noisy, the line gives you a clean trend. In statistics, this is commonly found using simple linear regression.
The model is written as y = mx + b, where:
- m is the slope (how much y changes when x increases by 1)
- b is the intercept (the value of y when x = 0)
How this calculator works
This page uses the least-squares method. It chooses the line that minimizes the sum of the squared vertical distances (residuals) between your observed points and the corresponding predicted points on the line.
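As a rough sketch of the least-squares method described above, the slope and intercept can be computed from the means of x and y. This is an illustration only, not necessarily how this calculator is implemented, and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b.

    Hypothetical helper for illustration; not the calculator's actual code.
    """
    if len(xs) != len(ys):
        raise ValueError("x and y lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope}x + {intercept}")
```

For points that fall exactly on a line, as in this example, the fitted slope and intercept reproduce that line.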
Inputs
- Two equal-length lists of numbers: x values and y values
- At least two points, with at least two different x values
- Optional x value for prediction
Outputs
- Best-fit equation in the form y = mx + b
- Slope and intercept
- Pearson correlation coefficient (r)
- Coefficient of determination (R²)
- Predicted y value for a chosen x (if provided)
Why the slope and R² matter
The slope tells you the direction and rate of change. A positive slope means y tends to increase as x increases; a negative slope means y tends to decrease. How closely the points follow the line is measured by r and R², not by the size of the slope.
R² shows how much of y’s variation is explained by x using this linear model. For example, an R² of 0.81 means roughly 81% of the variation in y is captured by the line.
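To make r and R² concrete, here is a minimal sketch of how they could be computed for a small data set. The function name pearson_r and the sample data are made up for illustration. It relies on the fact that, for a straight-line fit with an intercept, R² equals r squared.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x or all y values are identical")
    return sxy / math.sqrt(sxx * syy)


# Made-up sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # valid for simple linear regression with an intercept

print(round(r, 3), round(r_squared, 3))  # roughly 0.775 and 0.6
```

Here about 60% of the variation in y is captured by the line, which matches the way R² is read above.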
Step-by-step interpretation guide
1) Check data quality first
Make sure both lists are aligned by position. The first x should pair with the first y, the second x with the second y, and so on.
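The pairing by position can be sketched as follows. This assumes comma-separated input like the calculator accepts. The function name parse_values and the sample strings are hypothetical, and the calculator's own parsing may differ.

```python
def parse_values(text):
    """Split a comma-separated string into floats, skipping blank entries.

    Hypothetical helper for illustration only.
    """
    return [float(part) for part in text.split(",") if part.strip()]


xs = parse_values("1, 2, 3.5, -4")
ys = parse_values("2.0, 4.1, 6.9, -8")

# The lists must be the same length so every x has exactly one y.
if len(xs) != len(ys):
    raise ValueError(f"got {len(xs)} x values but {len(ys)} y values")

# zip pairs the first x with the first y, the second with the second, and so on.
pairs = list(zip(xs, ys))
print(pairs)
```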
2) Read the equation
If the calculator returns y = 1.8x + 2.1, each 1-unit increase in x is associated with an average 1.8-unit increase in y.
3) Evaluate fit
Use r and R² to judge how strong the linear relationship is. Values of r near 1 or -1, and values of R² near 1, indicate a stronger linear relationship; values near 0 indicate a weak one.
4) Use predictions carefully
Predictions are most reliable inside the observed x range. Extrapolating far beyond your data is risky because the linear trend may not continue outside that range.
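A small sketch of this check, using the example equation y = 1.8x + 2.1 from step 2. The function name predict and the observed x values are assumptions for illustration. The calculator itself may not flag extrapolation.

```python
def predict(slope, intercept, x, observed_xs):
    """Return (predicted_y, is_extrapolated) for a chosen x.

    Hypothetical helper: flags x values outside the observed range.
    """
    y = slope * x + intercept
    is_extrapolated = x < min(observed_xs) or x > max(observed_xs)
    return y, is_extrapolated


observed_xs = [1, 2, 4, 5]  # made-up x values used to fit the line

# x = 3 lies inside the observed range 1..5.
y_inside, flag_inside = predict(1.8, 2.1, 3, observed_xs)

# x = 10 lies outside the observed range, so treat the result with caution.
y_outside, flag_outside = predict(1.8, 2.1, 10, observed_xs)

print(round(y_inside, 2), flag_inside)
print(round(y_outside, 2), flag_outside)
```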
Real-world use cases
- Estimating sales growth over time
- Modeling study hours vs test scores
- Tracking ad spend vs conversions
- Analyzing temperature vs energy usage
- Understanding training volume vs performance
Common mistakes to avoid
- Using mismatched list lengths
- Forgetting that correlation does not prove causation
- Assuming a linear model is always the best model
- Ignoring outliers that can heavily affect the slope
- Over-trusting predictions outside known data ranges
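To illustrate how much a single outlier can move the slope, the sketch below fits the same made-up data twice, once with the last y value replaced by an extreme one. The function name slope_of is hypothetical.

```python
def slope_of(xs, ys):
    """Least-squares slope of y on x (hypothetical helper for illustration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]    # exactly y = 2x
outlier_ys = [2, 4, 6, 8, 30]  # last point replaced by an outlier

clean_slope = slope_of(xs, clean_ys)
outlier_slope = slope_of(xs, outlier_ys)

print(clean_slope, outlier_slope)  # the slope jumps from 2.0 to 6.0
```

One changed point out of five triples the slope here, which is why it is worth plotting the data before trusting the fitted line.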
FAQ
Does this support nonlinear regression?
No. This tool is for straight-line (linear) best fit only.
Can I use decimals and negative numbers?
Yes. Decimals and negatives are supported in both x and y inputs.
What if all x values are the same?
A unique line cannot be computed in that case because the denominator in the slope formula becomes zero. The calculator will show an error message.
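The zero denominator can be seen directly in a short sketch. The function name check_x_values and the error text are hypothetical. The calculator's actual message may be worded differently.

```python
def check_x_values(xs):
    """Return an error message if no unique line exists, otherwise None.

    Hypothetical helper for illustration only.
    """
    if len(set(xs)) < 2:
        return "Error: all x values are the same, so the slope is undefined."
    return None


xs = [3, 3, 3]
mean_x = sum(xs) / len(xs)

# The slope's denominator is the sum of squared deviations of x from its mean.
denominator = sum((x - mean_x) ** 2 for x in xs)

print(denominator)         # 0.0, so dividing by it is impossible
print(check_x_values(xs))  # prints the error message
```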