linkage disequilibrium calculator - Aaron Graves, PhDude Replica

Biallelic Two-Locus LD Calculator

Enter haplotype counts (or estimated frequencies scaled to any total) for two loci: A/a and B/b. The calculator returns allele frequencies, D, D′, and r².

AB haplotype count

Ab haplotype count

aB haplotype count

ab haplotype count

What is linkage disequilibrium?

Linkage disequilibrium (LD) describes the non-random association of alleles at different loci. If allele A at one locus and allele B at another locus are observed together more often (or less often) than expected from their individual frequencies, the loci are in LD.

LD is central to population genetics, genome-wide association studies (GWAS), fine mapping, haplotype block analysis, and marker selection. In practical terms, LD helps determine whether one variant can act as a proxy for another nearby variant.

How this calculator works

This tool assumes two biallelic loci: A/a and B/b. You provide counts for the four haplotypes: AB, Ab, aB, and ab.

The calculator converts these counts to frequencies, estimates allele frequencies at each locus, and then computes common LD metrics used in genetics workflows.
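The conversion from counts to frequencies can be sketched in a few lines of Python. This is an illustration only, not the page's actual implementation; the function name frequencies_from_counts and its argument names are assumptions made for the example.

```python
def frequencies_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """Convert four haplotype counts to haplotype and allele frequencies.

    Hypothetical helper for illustration; counts may be any non-negative
    numbers (raw counts or frequencies scaled to an arbitrary total).
    """
    counts = {"AB": n_AB, "Ab": n_Ab, "aB": n_aB, "ab": n_ab}
    if any(c < 0 for c in counts.values()):
        raise ValueError("haplotype counts must be non-negative")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one haplotype count must be positive")

    # Haplotype frequencies: each count divided by the total N.
    hap = {name: c / total for name, c in counts.items()}

    # Allele frequencies are the marginal sums over haplotypes.
    p_A = hap["AB"] + hap["Ab"]
    p_B = hap["AB"] + hap["aB"]
    return hap, p_A, p_B


hap, p_A, p_B = frequencies_from_counts(40, 10, 20, 30)
print(hap)
print(round(p_A, 3), round(p_B, 3))
```

Because only the proportions matter, passing 0.4, 0.1, 0.2, 0.3 instead of 40, 10, 20, 30 gives the same frequencies.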

Formulas used

p_AB = AB / N, where N = AB + Ab + aB + ab (p_Ab, p_aB, and p_ab are defined the same way)

p_A = p_AB + p_Ab, p_B = p_AB + p_aB

D = p_AB − p_A·p_B

D′ = D / D_max, where D_max = min(p_A(1−p_B), (1−p_A)p_B) for D ≥ 0, and D_max = min(p_A·p_B, (1−p_A)(1−p_B)) for D < 0.

r² = D² / [p_A(1−p_A)p_B(1−p_B)]
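As a minimal sketch, the formulas above translate to Python roughly as follows. The function name ld_metrics is an assumption for this example, and returning None when a locus is monomorphic (so that D′ and r² are undefined) is a choice made here, not necessarily what the calculator does.

```python
def ld_metrics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    Hypothetical helper; inputs are the four haplotype counts.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("haplotype counts must sum to a positive number")

    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # Raw disequilibrium: observed AB frequency minus its expectation.
    d = p_AB - p_A * p_B

    # D_max depends on the sign of D, as in the formula above.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)

    # Assumption of this sketch: if either locus is monomorphic, the
    # denominators are zero and the normalized metrics are reported as None.
    d_prime = d / d_max if d_max > 0 else None
    r_squared = d * d / denom if denom > 0 else None
    return d, d_prime, r_squared


d, d_prime, r2 = ld_metrics(40, 10, 20, 30)
print(f"D={d:.3f} D'={d_prime:.3f} r2={r2:.3f}")  # D=0.100 D'=0.500 r2=0.167
```

With this sign convention D′ is negative whenever D is negative; some tools report |D′| instead.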

Interpreting the results

D (raw disequilibrium)

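D gives the direction and magnitude of the association, but its attainable range is set by the allele frequencies: |D| can never exceed 0.25, and it reaches that bound only when both loci have allele frequencies of 0.5. The same D can therefore mean very different things at different locus pairs, which makes cross-locus comparisons difficult.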

D′ (normalized disequilibrium)

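D′ scales D to its theoretical maximum given the observed allele frequencies, so it ranges from −1 to 1. Values close to 1 (or −1) indicate strong historical linkage with limited recombination; |D′| = 1 means at least one of the four haplotypes is absent from the sample. Because a rare allele or a small sample can easily leave one haplotype unobserved, D′ can be high even when one marker predicts the other poorly.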
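The gap between D′ and r² when one allele is rare is easiest to see numerically. The sketch below uses a hypothetical helper, d_prime_and_r2, and made-up haplotype counts; it assumes both loci are polymorphic so that no denominator is zero.

```python
def d_prime_and_r2(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) from haplotype counts.

    Hypothetical helper; assumes both loci are polymorphic.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = n_AB / total - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d / d_max, r2


# Illustrative counts only: allele A is rare (5%) and always travels with B,
# so the Ab haplotype is never observed.
rare_dp, rare_r2 = d_prime_and_r2(5, 0, 45, 50)

# Both loci at 50% with only AB and ab present: perfect LD on both metrics.
common_dp, common_r2 = d_prime_and_r2(50, 0, 0, 50)

print(f"rare allele:   D'={rare_dp:.2f}  r2={rare_r2:.3f}")
print(f"common allele: D'={common_dp:.2f}  r2={common_r2:.3f}")
```

In the rare-allele case D′ is 1 while r² is only about 0.05: knowing the allele at locus B says little about whether a haplotype carries A.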

r² (correlation between loci)

r² is often preferred in association studies because it reflects predictive strength between markers. In tag SNP selection, higher r² means one marker better predicts another.

r² ~ 0.8–1.0: very strong correlation
r² ~ 0.5–0.8: moderate to strong
r² ~ 0.2–0.5: weak to moderate
r² < 0.2: weak LD

Worked example

Suppose your phased data gives haplotype counts AB=40, Ab=10, aB=20, ab=30 (the calculator default). These counts give p_A = 0.5, p_B = 0.6, D = 0.10, D′ = 0.50, and r² ≈ 0.17. D is positive, so A and B co-occur more often than expected under independence, but r² falls in the weak range, so neither marker would be a reliable proxy for the other.
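Working the default counts through the formulas step by step (written in LaTeX for clarity):

```latex
\begin{align*}
N &= 40 + 10 + 20 + 30 = 100 \\
p_{AB} &= 40/100 = 0.40, \quad p_A = 0.40 + 0.10 = 0.50, \quad p_B = 0.40 + 0.20 = 0.60 \\
D &= p_{AB} - p_A p_B = 0.40 - 0.30 = 0.10 \\
D_{\max} &= \min\bigl(p_A(1-p_B),\ (1-p_A)p_B\bigr) = \min(0.20,\ 0.30) = 0.20 \\
D' &= D / D_{\max} = 0.10 / 0.20 = 0.50 \\
r^2 &= \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)} = \frac{0.01}{0.25 \times 0.24} = \frac{0.01}{0.06} \approx 0.17
\end{align*}
```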

If you adjust counts so AB and ab dominate while Ab and aB shrink, LD usually increases. If all four haplotypes approach proportions expected from independent allele frequencies, LD decreases.
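A small sweep makes this trend concrete. The sketch below is illustrative only: the helper r_squared and the count tables are assumptions chosen so that both allele frequencies stay at 0.5 while haplotypes are moved from Ab and aB into AB and ab.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 from haplotype counts (hypothetical helper; loci must be polymorphic)."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = n_AB / total - p_A * p_B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Start at the independence expectation (25 of each haplotype) and shift
# counts toward AB and ab in steps of 5 until Ab and aB disappear.
sweep = []
for shift in range(0, 30, 5):  # 0, 5, ..., 25
    counts = (25 + shift, 25 - shift, 25 - shift, 25 + shift)
    sweep.append((counts, r_squared(*counts)))

for counts, r2 in sweep:
    print(counts, round(r2, 3))
```

In this symmetric setup r² rises steadily from 0 at independence to 1 when only AB and ab remain.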

Common pitfalls

Using unphased genotype counts as if they were haplotypes without haplotype inference.
Comparing D values across loci with very different minor allele frequencies.
Interpreting very high D′ from sparse data as strong predictive power (check r² too).
Ignoring population structure, admixture, and sample size effects.
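On the first pitfall: unphased genotypes are ambiguous only for double heterozygotes (Aa at one locus, Bb at the other), which could be AB/ab or Ab/aB. One common approach to haplotype inference is an expectation-maximization (EM) estimate of the haplotype frequencies. The sketch below is a minimal version under stated assumptions: random mating (Hardy-Weinberg proportions), a hypothetical 3×3 input table indexed by copies of A and copies of B, a fixed number of iterations, and a uniform starting point. It is not part of this calculator, and EM can settle on a local optimum.

```python
def em_haplotype_frequencies(genotypes, iterations=100):
    """Estimate (p_AB, p_Ab, p_aB, p_ab) from unphased genotype counts.

    genotypes[i][j] = number of individuals with i copies of allele A
    and j copies of allele B (hypothetical input layout for this sketch).
    """
    g = genotypes
    # Haplotypes that are unambiguous for every genotype except (1, 1).
    known = [
        2 * g[2][2] + g[2][1] + g[1][2],  # AB
        2 * g[2][0] + g[2][1] + g[1][0],  # Ab
        2 * g[0][2] + g[1][2] + g[0][1],  # aB
        2 * g[0][0] + g[1][0] + g[0][1],  # ab
    ]
    double_het = g[1][1]
    total = sum(known) + 2 * double_het
    if total == 0:
        raise ValueError("genotype table is empty")

    freqs = [0.25, 0.25, 0.25, 0.25]  # uniform start (an assumption)
    for _ in range(iterations):
        p_AB, p_Ab, p_aB, p_ab = freqs
        cis = p_AB * p_ab    # weight of the AB/ab phase
        trans = p_Ab * p_aB  # weight of the Ab/aB phase
        # E-step: expected number of double heterozygotes that are AB/ab.
        share = cis / (cis + trans) if cis + trans > 0 else 0.5
        x = double_het * share
        # M-step: re-estimate frequencies from the expected haplotype counts.
        expected = [
            known[0] + x,
            known[1] + double_het - x,
            known[2] + double_het - x,
            known[3] + x,
        ]
        freqs = [c / total for c in expected]
    return tuple(freqs)


# Made-up data: 10 AABB, 10 aabb, and 20 AaBb individuals.
table = [[10, 0, 0], [0, 20, 0], [0, 0, 10]]
print([round(p, 4) for p in em_haplotype_frequencies(table)])
```

The estimated frequencies can then be entered into the calculator in place of haplotype counts, since it accepts frequencies scaled to any total.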

Best practices for reliable LD analysis

Apply quality-control filters: call rate, Hardy-Weinberg equilibrium (HWE) checks, and minor allele frequency (MAF) thresholds.
Use sufficiently large and ancestry-matched cohorts.
Report both D′ and r² for a complete picture.
When possible, validate findings in an external dataset.

Conclusion

A linkage disequilibrium calculator is a quick way to quantify haplotype structure between two loci. For teaching, exploratory analysis, and sanity checks, this page gives immediate LD estimates from four haplotype counts. For large-scale studies, treat it as a conceptual companion to full genetics pipelines and dedicated bioinformatics tools.