PMI Calculator (from counts)
Enter your corpus counts to compute pointwise mutual information, normalized PMI, and lift.
What is pointwise mutual information?
Pointwise mutual information (PMI) measures how strongly two outcomes are associated, compared with what you would expect if they were independent. In practical terms, it tells you whether a pair appears together more often (or less often) than chance.
PMI is common in NLP, information theory, recommendation systems, and exploratory data analysis. You can apply it to words, products, user actions, medical events, biological signals, or any paired observations.
PMI formula
The standard formula is:
PMI(x, y) = log( P(x, y) / (P(x) × P(y)) )
- P(x): marginal probability that an observation contains X
- P(y): marginal probability that an observation contains Y
- P(x, y): joint probability that an observation contains both X and Y
If PMI is positive, X and Y co-occur more often than expected by chance. If PMI is near zero, they behave close to independent. If PMI is negative, they co-occur less often than chance.
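The formula above is a one-liner in code. A minimal sketch (the function name `pmi` is our own; natural log is used here, though base 2 is also common and only changes the scale):

```python
import math

def pmi(p_x, p_y, p_xy):
    """Pointwise mutual information: log(P(x,y) / (P(x) * P(y)))."""
    return math.log(p_xy / (p_x * p_y))

# Independent outcomes give PMI = 0: log(0.25 / (0.5 * 0.5)) = log(1)
pmi(0.5, 0.5, 0.25)
```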
How this calculator works
Input assumptions
This calculator expects simple counts from the same dataset:
- N: total observations
- Count(X): number of observations containing X
- Count(Y): number of observations containing Y
- Count(X,Y): number containing both X and Y
It converts counts to probabilities and then computes:
- PMI
- Lift = P(x,y)/(P(x)P(y))
- Normalized PMI (NPMI) = PMI / (−log P(x,y)), which rescales PMI to the range [−1, 1]; it is defined only when Count(X,Y) > 0
Interpreting results quickly
- PMI > 0: positive association
- PMI ≈ 0: near independence
- PMI < 0: negative association
- Lift > 1: appears together more than chance
- Lift < 1: appears together less than chance
Example
Suppose you have 10,000 documents. Word X appears in 800 docs, word Y in 500 docs, and both together in 120 docs.
- P(X) = 800 / 10,000 = 0.08
- P(Y) = 500 / 10,000 = 0.05
- P(X,Y) = 120 / 10,000 = 0.012
- Expected under independence: 0.08 × 0.05 = 0.004
Since 0.012 is three times 0.004, lift is 3 and PMI is positive (log 3, about 1.10 with natural log). This pair appears together three times as often as independence would predict.
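The arithmetic in this example can be checked in a few lines (variable names are our own):

```python
import math

# Worked example: 10,000 docs, X in 800, Y in 500, both in 120.
n, cx, cy, cxy = 10_000, 800, 500, 120
p_x, p_y, p_xy = cx / n, cy / n, cxy / n
expected = p_x * p_y      # 0.004 under independence
lift = p_xy / expected    # 3.0
pmi = math.log(lift)      # log(3), positive
```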
Where PMI is useful
Natural language processing
- Keyword extraction and collocation detection
- Finding strongly related word pairs
- Building semantic features for classification
Market basket and recommender systems
- Product affinity analysis
- Cross-sell opportunities
- Association rule feature engineering
Behavior analytics
- Action-to-action relationships in funnels
- High-signal event combinations
- Anomaly detection via unexpected pairs
Common pitfalls
- Rare event inflation: Very low-frequency pairs can have high PMI by chance.
- Zero co-occurrence: If Count(X,Y) = 0, PMI diverges to negative infinity (the log of zero), so such pairs must be filtered out or smoothed.
- Mismatched counting windows: Counts must use the same unit (document, sentence, session, etc.).
- Over-interpretation: PMI signals association, not causality.
Best practices
- Set minimum frequency thresholds before ranking pairs.
- Compare PMI with lift and raw counts, not PMI alone.
- Use normalized PMI when you need values on a bounded scale.
- Validate findings on out-of-sample data if decisions are high impact.
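The first two practices above can be combined into a simple ranking routine. A sketch under stated assumptions: `ranked_pairs`, the dictionary shapes, and the `min_count=5` threshold are all illustrative choices, not a prescribed recipe:

```python
import math

def ranked_pairs(pair_counts, item_counts, n, min_count=5):
    """Rank co-occurring pairs by PMI, dropping rare pairs first.

    pair_counts: {(x, y): co-occurrence count}
    item_counts: {item: count}
    n: total observations
    """
    results = []
    for (x, y), c_xy in pair_counts.items():
        if c_xy < min_count:
            continue  # rare pairs can have inflated PMI by chance
        p_xy = c_xy / n
        p_x = item_counts[x] / n
        p_y = item_counts[y] / n
        results.append(((x, y), math.log(p_xy / (p_x * p_y))))
    return sorted(results, key=lambda kv: kv[1], reverse=True)
```

Keeping the raw counts alongside each PMI score (as the second best practice suggests) makes it easy to spot pairs that rank high only because they are rare.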
FAQ
Is PMI symmetric?
Yes. PMI(x, y) = PMI(y, x).
Can PMI be negative?
Yes. Negative PMI means the pair appears together less than expected under independence.
What if co-occurrence is zero?
Then P(x,y)=0 and PMI is undefined in finite terms (often treated as negative infinity). In real pipelines, people often apply smoothing or minimum-count filters.
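One common workaround mentioned above is additive (Laplace) smoothing, which keeps PMI finite even at zero co-occurrence. A minimal sketch, assuming a simple add-alpha scheme (alpha = 1 is one conventional but arbitrary choice, and the exact normalization varies between implementations):

```python
import math

def smoothed_pmi(n, count_x, count_y, count_xy, alpha=1.0):
    """PMI with additive smoothing so Count(X,Y) = 0 stays finite."""
    p_x = (count_x + alpha) / (n + alpha)
    p_y = (count_y + alpha) / (n + alpha)
    p_xy = (count_xy + alpha) / (n + alpha)
    return math.log(p_xy / (p_x * p_y))
```

With zero co-occurrence the result is a large negative number instead of negative infinity, so the pair can still be ranked and filtered.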
Final takeaway
PMI is a compact, interpretable way to quantify pairwise association. Use it with count thresholds and context-aware validation, and it becomes a powerful tool for feature discovery and insight generation.