PMI Calculator (from counts)

Enter your corpus counts to compute pointwise mutual information, normalized PMI, and lift.

Total number of tokens, documents, windows, or events in your dataset.
The joint count of X and Y must be less than or equal to both Count of X and Count of Y.

What is pointwise mutual information?

Pointwise mutual information (PMI) measures how strongly two outcomes are associated, compared with what you would expect if they were independent. In practical terms, it tells you whether a pair appears together more often (or less often) than chance.

PMI is common in NLP, information theory, recommendation systems, and exploratory data analysis. You can apply it to words, products, user actions, medical events, biological signals, or any paired observations.

PMI formula

The standard formula (with logarithms most often taken in base 2, giving values in bits) is:

PMI(x, y) = log( P(x, y) / (P(x) × P(y)) )

  • P(x): probability of X
  • P(y): probability of Y
  • P(x, y): joint probability of X and Y together

If PMI is positive, X and Y co-occur more often than expected by chance. If PMI is near zero, they behave close to independent. If PMI is negative, they co-occur less often than chance.
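The definition above translates directly into code. A minimal sketch, assuming base-2 logarithms and probabilities that were already estimated from the same dataset (the function name `pmi` is illustrative, not part of this calculator):

```python
import math

def pmi(p_x: float, p_y: float, p_xy: float) -> float:
    """Pointwise mutual information with base-2 logs (result in bits)."""
    if p_xy == 0:
        return float("-inf")  # zero co-occurrence: PMI diverges to -infinity
    return math.log2(p_xy / (p_x * p_y))
```

A positive return value means the pair co-occurs more than chance, zero means independence, and a negative value means it co-occurs less than chance.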

How this calculator works

Input assumptions

This calculator expects simple counts from the same dataset:

  • N: total observations
  • Count(X): number of observations containing X
  • Count(Y): number of observations containing Y
  • Count(X,Y): number containing both X and Y

It converts counts to probabilities and then computes:

  • PMI
  • Lift = P(x,y)/(P(x)P(y))
  • Normalized PMI: NPMI = PMI / (−log P(x,y)), which rescales PMI to the bounded range [−1, 1] (computed when Count(X,Y) > 0)
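The counts-to-metrics pipeline can be sketched in a few lines. This is an illustrative reimplementation under the stated assumptions (base-2 logs, conventional limiting values when the joint count is zero), not the calculator's actual source:

```python
import math

def pmi_from_counts(n: int, c_x: int, c_y: int, c_xy: int):
    """Return (PMI, lift, NPMI) from raw counts, using base-2 logs.

    NPMI = PMI / (-log2 P(x,y)), bounded in [-1, 1].
    """
    p_x, p_y, p_xy = c_x / n, c_y / n, c_xy / n
    lift = p_xy / (p_x * p_y)
    if c_xy == 0:
        # Limiting values: PMI -> -inf, NPMI -> -1, lift is exactly 0
        return float("-inf"), lift, -1.0
    pmi = math.log2(lift)
    # NPMI is 0/0 when the pair always co-occurs; use the conventional 1.0
    npmi = pmi / -math.log2(p_xy) if p_xy < 1 else 1.0
    return pmi, lift, npmi
```

Converting to probabilities first keeps the three metrics consistent with each other: lift is the ratio inside the log, and PMI is simply the log of lift.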

Interpreting results quickly

  • PMI > 0: positive association
  • PMI ≈ 0: near independence
  • PMI < 0: negative association
  • Lift > 1: appears together more than chance
  • Lift < 1: appears together less than chance

Example

Suppose you have 10,000 documents. Word X appears in 800 docs, word Y in 500 docs, and both together in 120 docs.

  • P(X) = 800 / 10,000 = 0.08
  • P(Y) = 500 / 10,000 = 0.05
  • P(X,Y) = 120 / 10,000 = 0.012
  • Expected under independence: 0.08 × 0.05 = 0.004

Since 0.012 is larger than 0.004, PMI is positive and lift is 3. That means this pair appears together three times as often as independence would predict (with base-2 logs, PMI = log₂ 3 ≈ 1.58 bits).
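The arithmetic in this example can be checked directly (base-2 logs assumed):

```python
import math

# Worked example: N = 10,000 docs, X in 800, Y in 500, both in 120.
p_x, p_y, p_xy = 800 / 10_000, 500 / 10_000, 120 / 10_000
lift = p_xy / (p_x * p_y)  # 0.012 / 0.004 = 3.0
pmi = math.log2(lift)      # log2(3), about 1.585 bits
```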

Where PMI is useful

Natural language processing

  • Keyword extraction and collocation detection
  • Finding strongly related word pairs
  • Building semantic features for classification

Market basket and recommender systems

  • Product affinity analysis
  • Cross-sell opportunities
  • Association rule feature engineering

Behavior analytics

  • Action-to-action relationships in funnels
  • High-signal event combinations
  • Anomaly detection via unexpected pairs

Common pitfalls

  • Rare event inflation: Very low-frequency pairs can have high PMI by chance.
  • Zero co-occurrence: If Count(X,Y) = 0, then P(x,y) = 0 and PMI is undefined (−∞ in the limit).
  • Mismatched counting windows: Counts must use the same unit (document, sentence, session, etc.).
  • Over-interpretation: PMI signals association, not causality.

Best practices

  • Set minimum frequency thresholds before ranking pairs.
  • Compare PMI with lift and raw counts, not PMI alone.
  • Use normalized PMI when you need values on a bounded scale.
  • Validate findings on out-of-sample data if decisions are high impact.
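The first two practices can be combined in a small ranking helper. A hedged sketch: the function name `rank_pairs` and the Counter-based inputs are assumptions for illustration, not a prescribed interface.

```python
import math
from collections import Counter

def rank_pairs(pair_counts: Counter, unigram_counts: Counter,
               n: int, min_count: int = 5):
    """Rank co-occurring pairs by PMI (base-2), skipping rare pairs
    whose high PMI is likely noise (rare-event inflation)."""
    scored = []
    for (x, y), c_xy in pair_counts.items():
        if c_xy < min_count:
            continue  # frequency threshold guards against spurious high PMI
        p_xy = c_xy / n
        p_x, p_y = unigram_counts[x] / n, unigram_counts[y] / n
        scored.append(((x, y), math.log2(p_xy / (p_x * p_y))))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```

Keeping the raw counts alongside each score (rather than discarding them after ranking) also makes it easy to sanity-check the top pairs against lift, as suggested above.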

FAQ

Is PMI symmetric?

Yes. PMI(x, y) = PMI(y, x).

Can PMI be negative?

Yes. Negative PMI means the pair appears together less than expected under independence.

What if co-occurrence is zero?

Then P(x,y)=0 and PMI is undefined in finite terms (often treated as negative infinity). In real pipelines, people often apply smoothing or minimum-count filters.
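One common smoothing pattern is to add a small pseudo-count to every count before converting to probabilities. This is a minimal sketch of that idea (the function name and the exact add-k scheme are illustrative; real pipelines vary):

```python
import math

def smoothed_pmi(n: int, c_x: int, c_y: int, c_xy: int, k: float = 0.5) -> float:
    """Add-k smoothed PMI (base-2): adding the pseudo-count k keeps
    the estimate finite even when the raw co-occurrence count is zero."""
    p_x = (c_x + k) / (n + k)
    p_y = (c_y + k) / (n + k)
    p_xy = (c_xy + k) / (n + k)
    return math.log2(p_xy / (p_x * p_y))
```

With k = 0 this reduces to the unsmoothed PMI; larger k pulls extreme scores toward zero, which also dampens the rare-event inflation noted above.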

Final takeaway

PMI is a compact, interpretable way to quantify pairwise association. Use it with count thresholds and context-aware validation, and it becomes a powerful tool for feature discovery and insight generation.
