PMI Calculator (from counts)

Enter your corpus counts to compute pointwise mutual information, normalized PMI, and lift.

Total number of tokens, documents, windows, or events in your dataset.
The joint count of X and Y must be less than or equal to both Count of X and Count of Y.

What is pointwise mutual information?

Pointwise mutual information (PMI) measures how strongly two outcomes are associated, compared with what you would expect if they were independent. In practical terms, it tells you whether a pair appears together more often (or less often) than chance.

PMI is common in NLP, information theory, recommendation systems, and exploratory data analysis. You can apply it to words, products, user actions, medical events, biological signals, or any paired observations.

PMI formula

The standard formula (with logarithms most often taken in base 2, giving values in bits) is:

PMI(x, y) = log( P(x, y) / (P(x) × P(y)) )

  • P(x): probability of X
  • P(y): probability of Y
  • P(x, y): joint probability of X and Y together

If PMI is positive, X and Y co-occur more often than expected by chance. If PMI is near zero, they behave close to independent. If PMI is negative, they co-occur less often than chance.
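The definition above translates directly into code. A minimal sketch, assuming base-2 logarithms and probabilities that were already estimated from the same dataset (the function name `pmi` is illustrative, not part of this calculator):

```python
import math

def pmi(p_x: float, p_y: float, p_xy: float) -> float:
    """Pointwise mutual information with base-2 logs (result in bits)."""
    if p_xy == 0:
        return float("-inf")  # zero co-occurrence: PMI diverges to -infinity
    return math.log2(p_xy / (p_x * p_y))
```

A positive return value means the pair co-occurs more than chance, zero means independence, and a negative value means it co-occurs less than chance.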

How this calculator works

Input assumptions

This calculator expects simple counts from the same dataset:

  • N: total observations
  • Count(X): number of observations containing X
  • Count(Y): number of observations containing Y
  • Count(X,Y): number containing both X and Y

It converts counts to probabilities and then computes:

  • PMI
  • Lift = P(x,y)/(P(x)P(y))
  • Normalized PMI: NPMI = PMI / (−log P(x,y)), which rescales PMI to the bounded range [−1, 1] (computed when Count(X,Y) > 0)
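The counts-to-metrics pipeline can be sketched in a few lines. This is an illustrative reimplementation under the stated assumptions (base-2 logs, conventional limiting values when the joint count is zero), not the calculator's actual source:

```python
import math

def pmi_from_counts(n: int, c_x: int, c_y: int, c_xy: int):
    """Return (PMI, lift, NPMI) from raw counts, using base-2 logs.

    NPMI = PMI / (-log2 P(x,y)), bounded in [-1, 1].
    """
    p_x, p_y, p_xy = c_x / n, c_y / n, c_xy / n
    lift = p_xy / (p_x * p_y)
    if c_xy == 0:
        # Limiting values: PMI -> -inf, NPMI -> -1, lift is exactly 0
        return float("-inf"), lift, -1.0
    pmi = math.log2(lift)
    # NPMI is 0/0 when the pair always co-occurs; use the conventional 1.0
    npmi = pmi / -math.log2(p_xy) if p_xy < 1 else 1.0
    return pmi, lift, npmi
```

Converting to probabilities first keeps the three metrics consistent with each other: lift is the ratio inside the log, and PMI is simply the log of lift.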

Interpreting results quickly

  • PMI > 0: positive association
  • PMI ≈ 0: near independence
  • PMI < 0: negative association
  • Lift > 1: appears together more than chance
  • Lift < 1: appears together less than chance

Example

Suppose you have 10,000 documents. Word X appears in 800 docs, word Y in 500 docs, and both together in 120 docs.

  • P(X) = 800 / 10,000 = 0.08
  • P(Y) = 500 / 10,000 = 0.05
  • P(X,Y) = 120 / 10,000 = 0.012
  • Expected under independence: 0.08 × 0.05 = 0.004

Since 0.012 is larger than 0.004, PMI is positive and lift is 3. That means this pair appears together three times as often as independence would predict (with base-2 logs, PMI = log₂ 3 ≈ 1.58 bits).
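The arithmetic in this example can be checked directly (base-2 logs assumed):

```python
import math

# Worked example: N = 10,000 docs, X in 800, Y in 500, both in 120.
p_x, p_y, p_xy = 800 / 10_000, 500 / 10_000, 120 / 10_000
lift = p_xy / (p_x * p_y)  # 0.012 / 0.004 = 3.0
pmi = math.log2(lift)      # log2(3), about 1.585 bits
```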

Where PMI is useful

Natural language processing

  • Keyword extraction and collocation detection
  • Finding strongly related word pairs
  • Building semantic features for classification

Market basket and recommender systems

  • Product affinity analysis
  • Cross-sell opportunities
  • Association rule feature engineering

Behavior analytics

  • Action-to-action relationships in funnels
  • High-signal event combinations
  • Anomaly detection via unexpected pairs

Common pitfalls

  • Rare event inflation: Very low-frequency pairs can have high PMI by chance.
  • Zero co-occurrence: If Count(X,Y) = 0, then P(x,y) = 0 and PMI is undefined (−∞ in the limit).
  • Mismatched counting windows: Counts must use the same unit (document, sentence, session, etc.).
  • Over-interpretation: PMI signals association, not causality.

Best practices

  • Set minimum frequency thresholds before ranking pairs.
  • Compare PMI with lift and raw counts, not PMI alone.
  • Use normalized PMI when you need values on a bounded scale.
  • Validate findings on out-of-sample data if decisions are high impact.
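The first two practices can be combined in a small ranking helper. A hedged sketch: the function name `rank_pairs` and the Counter-based inputs are assumptions for illustration, not a prescribed interface.

```python
import math
from collections import Counter

def rank_pairs(pair_counts: Counter, unigram_counts: Counter,
               n: int, min_count: int = 5):
    """Rank co-occurring pairs by PMI (base-2), skipping rare pairs
    whose high PMI is likely noise (rare-event inflation)."""
    scored = []
    for (x, y), c_xy in pair_counts.items():
        if c_xy < min_count:
            continue  # frequency threshold guards against spurious high PMI
        p_xy = c_xy / n
        p_x, p_y = unigram_counts[x] / n, unigram_counts[y] / n
        scored.append(((x, y), math.log2(p_xy / (p_x * p_y))))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```

Keeping the raw counts alongside each score (rather than discarding them after ranking) also makes it easy to sanity-check the top pairs against lift, as suggested above.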

FAQ

Is PMI symmetric?

Yes. PMI(x, y) = PMI(y, x).

Can PMI be negative?

Yes. Negative PMI means the pair appears together less than expected under independence.

What if co-occurrence is zero?

Then P(x,y)=0 and PMI is undefined in finite terms (often treated as negative infinity). In real pipelines, people often apply smoothing or minimum-count filters.
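One common smoothing pattern is to add a small pseudo-count to every count before converting to probabilities. This is a minimal sketch of that idea (the function name and the exact add-k scheme are illustrative; real pipelines vary):

```python
import math

def smoothed_pmi(n: int, c_x: int, c_y: int, c_xy: int, k: float = 0.5) -> float:
    """Add-k smoothed PMI (base-2): adding the pseudo-count k keeps
    the estimate finite even when the raw co-occurrence count is zero."""
    p_x = (c_x + k) / (n + k)
    p_y = (c_y + k) / (n + k)
    p_xy = (c_xy + k) / (n + k)
    return math.log2(p_xy / (p_x * p_y))
```

With k = 0 this reduces to the unsmoothed PMI; larger k pulls extreme scores toward zero, which also dampens the rare-event inflation noted above.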

Final takeaway

PMI is a compact, interpretable way to quantify pairwise association. Use it with count thresholds and context-aware validation, and it becomes a powerful tool for feature discovery and insight generation.
