2 sample t test calculator

Calculate a Two-Sample t-Test from Summary Statistics

Use this tool to compare means from two independent groups. Enter sample size, mean, and standard deviation for each group.

Note: This calculator uses summary data, not raw observations. Results are for independent samples only.

What is a 2 sample t test?

A two-sample t-test (also called an independent-samples t-test) checks whether the means of two independent groups differ by more than sampling variability alone would explain. It is one of the most common hypothesis tests in statistics, data science, and research.

Example use cases include:

  • Comparing average test scores between two classes.
  • Comparing average conversion rates (after transformation) across two campaigns.
  • Comparing average blood pressure between treatment and control groups.

Inputs required for the calculator

This calculator uses summary statistics, so you only need:

  • Sample size for each group: n1 and n2
  • Sample means: x̄1 and x̄2
  • Sample standard deviations: s1 and s2
  • Significance level α (usually 0.05)
  • Hypothesis direction (two-tailed or one-tailed)

Welch vs Student (pooled) t-test

Welch's t-test (recommended)

Welch's test does not assume equal population variances and is generally more robust. In practical work, this is usually the safest default.

Student's pooled t-test

The pooled version assumes both populations have equal variances. If that assumption is wrong, especially when the group sizes also differ, p-values and confidence intervals can be misleading.
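To make the difference concrete, here is a minimal Python sketch that computes both standard errors from summary statistics. The pooled formulas (pooled variance weighted by n − 1, df = n1 + n2 − 2) are the textbook ones and are not spelled out on this page; the function names and the example numbers are hypothetical.

```python
import math

def pooled_se_df(n1, sd1, n2, sd2):
    """Student's pooled standard error and degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its own n - 1.
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    return math.sqrt(sp2 * (1 / n1 + 1 / n2)), df

def welch_se(n1, sd1, n2, sd2):
    """Welch standard error: the variances are not pooled."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Hypothetical summaries: the small group is much noisier than the large one.
pooled, pooled_df = pooled_se_df(10, 20.0, 50, 5.0)
welch = welch_se(10, 20.0, 50, 5.0)
print(f"pooled SE = {pooled:.3f} (df = {pooled_df}), Welch SE = {welch:.3f}")
```

With these made-up numbers the pooled standard error is roughly half the Welch one, so the pooled t statistic would be about twice as large. When the sample sizes and standard deviations are equal, the two standard errors coincide.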

How to interpret the results

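  • t statistic: difference between the sample means divided by its standard error.
  • Degrees of freedom (df): determines which t-distribution is used as the reference. For Welch's test, df is usually not a whole number.
  • p-value: probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one obtained.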
  • Confidence interval (CI): plausible range for the true mean difference (μ1 − μ2).
  • Cohen's d / Hedges' g: effect size estimates (magnitude, not just significance).

If p-value < α, reject the null hypothesis. If p-value ≥ α, fail to reject the null.
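The decision rule can be sketched in Python as follows. This is an illustration only, not the calculator's actual code: it integrates the t density numerically with Simpson's rule so that it needs nothing beyond the standard library, whereas in practice a statistics library (for example `scipy.stats.t.sf`) would normally be used. The function names and the tail labels ("two-sided", "greater", "less") are assumptions made for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t); steps must be even (Simpson's rule)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # 0.5 minus P(0 < T < |t|)
    return upper if t >= 0 else 1 - upper

def p_value(t, df, tail="two-sided"):
    """p-value for the chosen hypothesis direction."""
    if tail == "two-sided":
        return 2 * t_sf(abs(t), df)
    if tail == "greater":   # H1: mu1 > mu2
        return t_sf(t, df)
    if tail == "less":      # H1: mu1 < mu2
        return 1 - t_sf(t, df)
    raise ValueError(f"unknown tail: {tail!r}")

alpha = 0.05
p = p_value(2.5, 40.0)  # hypothetical t statistic and df
print("reject H0" if p < alpha else "fail to reject H0", f"(p = {p:.4f})")
```

Because the density is integrated rather than looked up, the function also accepts the non-integer df that Welch's test produces.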

Assumptions to keep in mind

Independence

Observations in each group should be independent, and groups should be independent of each other.

Approximate normality (or adequate sample size)

The test is most reliable with roughly normal distributions, especially for small samples. Larger samples are usually more forgiving due to the central limit theorem.

Scale of measurement

Your outcome variable should be continuous or near-continuous.

Practical tips

  • Report both p-values and effect sizes.
  • Prefer confidence intervals over binary “significant / not significant” language.
  • Use domain context: statistical significance does not always imply practical importance.
  • When in doubt about variances, choose Welch's test.
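For the effect sizes mentioned above, a minimal Python sketch from summary statistics might look like this. It assumes Cohen's d is standardized by the pooled standard deviation and that Hedges' g uses the common approximate small-sample correction 1 − 3 / (4(n1 + n2) − 9); the calculator may use a different variant. The function names and example values are hypothetical.

```python
import math

def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / sp

def hedges_g(n1, mean1, sd1, n2, mean2, sd2):
    """Hedges' g: Cohen's d shrunk by an approximate small-sample correction."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * cohens_d(n1, mean1, sd1, n2, mean2, sd2)

# Hypothetical summaries for two classes' test scores.
d = cohens_d(30, 82.0, 10.0, 35, 77.0, 12.0)
g = hedges_g(30, 82.0, 10.0, 35, 77.0, 12.0)
print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")
```

The correction factor is always below 1, so g is slightly smaller in magnitude than d, and the two converge as the samples grow.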

Quick formula summary

Welch standard error

SE = √(s1²/n1 + s2²/n2)

Test statistic

t = (x̄1 − x̄2) / SE

Welch degrees of freedom

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1−1) + (s2²/n2)²/(n2−1) ]
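The three formulas above translate directly into a few lines of Python. This is a sketch under the stated formulas only; the function name `welch_t` and the example inputs are hypothetical.

```python
import math

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, se, df) for Welch's test from summary statistics."""
    v1 = sd1**2 / n1  # s1²/n1
    v2 = sd2**2 / n2  # s2²/n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, df

# Hypothetical inputs: n, mean, and standard deviation for each group.
t, se, df = welch_t(30, 82.0, 10.0, 35, 77.0, 12.0)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df:.2f}")
```

The resulting df always lies between min(n1, n2) − 1 and n1 + n2 − 2, which makes a convenient sanity check on any implementation.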

Final note

This tool is designed for quick, transparent calculations when you have group summaries. For production-grade analysis, pair this with exploratory plots, assumption checks, and reproducible scripts.
