Propensity Score Matching (PSM)
A principled way to simulate randomization when treatment is self-selected.
What PSM does
Propensity Score Matching approximates the conditions of a randomized experiment by pairing treated and control units that have similar probabilities of receiving the treatment.
This removes confounding caused by selection on observed variables. It works only if the variables driving selection are observed in your dataset.
Why matching works
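In observational data, people choose treatments for reasons correlated with the outcome. PSM addresses this by estimating the propensity score, the probability of receiving treatment given the observed covariates:
$$p(X) = \Pr(T = 1 \mid X)$$
where T is the treatment indicator (1 for treated, 0 for control) and X represents the observed covariates influencing treatment choice. Treated users are then matched to control users with similar p(X).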
After matching, treated and control groups are comparable in terms of X, so the remaining differences in outcomes can be interpreted as causal, provided the assumptions below hold.
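To make the estimation step concrete, here is a minimal sketch that fits a logistic regression for p(X) using only the Python standard library. The function names `fit_propensity` and `propensity`, the toy data, and the learning-rate settings are illustrative assumptions, not part of any documented interface; in practice a library model such as scikit-learn's `LogisticRegression` would usually be used instead.

```python
import math
import random

def propensity(x, w, b):
    """Estimated p(X) for one unit under a logistic model."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(X, t, lr=0.1, epochs=2000):
    """Fit P(T=1 | X) by batch gradient descent on the log-loss.

    X: list of covariate rows (lists of floats); t: list of 0/1 treatment flags.
    Returns (weights, bias). Hypothetical helper for illustration only.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, ti in zip(X, t):
            err = propensity(xi, w, b) - ti  # predicted minus observed
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Toy data: a single covariate, where higher values make treatment more likely.
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(300)]
t = [1 if random.random() < 1 / (1 + math.exp(-1.5 * x[0])) else 0 for x in X]

w, b = fit_propensity(X, t)
scores = [propensity(x, w, b) for x in X]  # one estimated p(X) per unit
```

The resulting `scores` list is what the matching step would consume: each treated unit is compared to controls by the distance between their estimated scores.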
Key assumptions
1. Conditional Independence (CIA)
All drivers of treatment selection that also affect the outcome must be in X. If something important is missing (motivation, preference, intent, skill, etc.), matching cannot fix it.
2. Overlap
Every treated unit must have at least one similar control. If no comparable control exists, PSM cannot create a valid match.
3. Stable Unit Treatment Value Assumption (SUTVA)
Treatment of one unit must not affect the outcome of another.
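The overlap assumption is the one that can be checked most directly from the data. The sketch below, which assumes propensity scores have already been estimated, computes the score range covered by both groups and flags treated units outside it. The helper names and the min/max rule are illustrative assumptions; this is a crude check, and inspecting the full score distributions is usually more informative.

```python
def common_support(scores, treated):
    """Return the (low, high) propensity range covered by both groups."""
    t_scores = [s for s, t in zip(scores, treated) if t == 1]
    c_scores = [s for s, t in zip(scores, treated) if t == 0]
    low = max(min(t_scores), min(c_scores))
    high = min(max(t_scores), max(c_scores))
    return low, high

def off_support(scores, treated):
    """Indices of treated units whose score lies outside the common range."""
    low, high = common_support(scores, treated)
    return [i for i, (s, t) in enumerate(zip(scores, treated))
            if t == 1 and not (low <= s <= high)]

# Toy example: four controls followed by four treated units.
scores  = [0.10, 0.30, 0.45, 0.60, 0.35, 0.55, 0.80, 0.97]
treated = [0,    0,    0,    0,    1,    1,    1,    1]

low, high = common_support(scores, treated)
print(low, high)                     # 0.35 0.6
print(off_support(scores, treated))  # [6, 7]
```

Here the treated units with scores 0.80 and 0.97 have no control with a comparable score, so any match found for them would be poor.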
How matching is done in
implements only the matching settings that data scientists actually need:
Matching ratio
- 1:1 (default, best balance)
- 1:2 or 1:3 (lower variance)
- 1:many (all controls within caliper)
Replacement
- No replacement (default)
- With replacement (useful when treated >> control)
Caliper
- None (default)
- Strict (0.1)
- Very strict (0.05)
Distance metric
- Propensity score distance (default)
- Mahalanobis distance (advanced)
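To show how the ratio, replacement, and caliper settings interact, here is a minimal sketch of greedy 1:1 nearest-neighbor matching on the propensity score. It is not the product's implementation: the function name `match_nearest` is hypothetical, and the caliper is assumed to be expressed in raw propensity-score units, which the settings above do not specify.

```python
def match_nearest(scores, treated, caliper=None, replace=False):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Returns a list of (treated_index, control_index) pairs. Treated units
    whose nearest available control is outside the caliper stay unmatched.
    """
    controls = [i for i, t in enumerate(treated) if t == 0]
    available = set(controls)
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        pool = controls if replace else [c for c in controls if c in available]
        if not pool:
            break  # no controls left to match
        best = min(pool, key=lambda c: abs(scores[c] - scores[i]))
        if caliper is not None and abs(scores[best] - scores[i]) > caliper:
            continue  # nearest control is too far away
        pairs.append((i, best))
        if not replace:
            available.discard(best)  # each control is used at most once
    return pairs

# Toy example: four controls followed by three treated units.
scores  = [0.20, 0.42, 0.58, 0.90, 0.40, 0.60, 0.65]
treated = [0,    0,    0,    0,    1,    1,    1]

print(match_nearest(scores, treated))
# [(4, 1), (5, 2), (6, 3)]
print(match_nearest(scores, treated, caliper=0.1))
# [(4, 1), (5, 2)]
print(match_nearest(scores, treated, caliper=0.1, replace=True))
# [(4, 1), (5, 2), (6, 2)]
```

Without a caliper the last treated unit is paired with a control 0.25 away. Adding a caliper drops that pair, and allowing replacement lets the unit reuse a closer control instead. Greedy matching depends on the order in which treated units are processed; optimal matching avoids that at higher computational cost.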
After matching
automatically computes:
- Covariate balance plots
- Standardized mean differences
- Matched sample diagnostics
- Causal treatment effect estimation
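As an illustration of the balance diagnostics listed above, the following sketch computes a standardized mean difference (SMD) for one covariate before and after matching. The function name and the toy ages are assumptions made for the example. A common rule of thumb treats an absolute SMD below 0.1 as acceptable balance, though the threshold is a convention rather than a guarantee.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(x_treated, x_control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Toy covariate (age) for treated units and for controls before/after matching.
age_treated        = [34, 41, 29, 38, 45]
age_control_before = [22, 25, 31, 27, 24]
age_control_after  = [33, 40, 30, 39, 44]

smd_before = standardized_mean_difference(age_treated, age_control_before)
smd_after  = standardized_mean_difference(age_treated, age_control_after)
print(round(smd_before, 2), round(smd_after, 2))  # 2.32 0.03
```

In this toy case the matched controls bring the SMD for age from about 2.3 down to about 0.03, which is the kind of change a balance plot would display per covariate.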
Estimating the treatment effect
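The estimate is unbiased only if the observed covariates capture every confounder of treatment and outcome (the conditional independence assumption) and the matched groups are well balanced.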
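One common way to compute the effect from 1:1 matched pairs is the average outcome difference within pairs, which estimates the average treatment effect on the treated (ATT). The sketch below assumes that estimand and a simple paired standard error; the function name `att_from_pairs` and the toy outcomes are hypothetical, and the source does not state which estimator is actually used.

```python
import math
from statistics import mean, stdev

def att_from_pairs(outcomes, pairs):
    """Mean within-pair outcome difference and its naive paired standard error.

    outcomes: outcome value per unit; pairs: (treated_index, control_index).
    """
    diffs = [outcomes[t] - outcomes[c] for t, c in pairs]
    estimate = mean(diffs)
    # The paired SE ignores uncertainty from estimating the propensity score.
    std_err = stdev(diffs) / math.sqrt(len(diffs)) if len(diffs) > 1 else float("nan")
    return estimate, std_err

# Toy example: units 0-3 are controls, units 4-6 are treated.
outcomes = [10.0, 12.0, 11.0, 9.0, 15.0, 14.0, 13.0]
pairs = [(4, 1), (5, 2), (6, 3)]

att, se = att_from_pairs(outcomes, pairs)
print(round(att, 3), round(se, 3))  # 3.333 0.333
```

Because the standard error treats the matched pairs as fixed, it tends to understate uncertainty; bootstrap or other matching-aware variance estimators are often preferred.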
When PSM is ideal
- You can't randomize
- Treatment is self-selected (e.g., upgrades, feature adoption)
- You observe the key drivers of selection
- You want interpretable, matched pairs
When PSM struggles
- Missing important covariates
- Poor overlap
- Extreme propensities (near 0 or 1)
- Very high-dimensional covariate sets
PSM inside
provides a streamlined interface for dataset mapping, matching setup, diagnostics, and effect estimation. The workflow is designed to produce interpretable causal estimates and to guard against the common pitfalls of PSM, such as poor overlap and unchecked covariate imbalance.