Synthetic Control
A data-driven way to construct a synthetic counterfactual for a treated unit by optimally weighting control units.
What the Synthetic Control method does
Synthetic Control helps data scientists estimate the effect of an intervention applied to a single unit (or a small number of units) over time, using panel data. Instead of picking one control unit, the method builds a synthetic control—a weighted combination of untreated units—so that its pre-treatment trajectory closely tracks the treated unit.
After treatment starts, the gap between the treated unit and its synthetic counterpart is interpreted as the estimated treatment effect over time.
When to use Synthetic Control
Synthetic Control is a strong choice when:
- There is a clearly defined treated unit (e.g., a specific region, platform, or business line).
- You have a donor pool of similar but untreated units.
- You observe outcomes and predictors over a reasonably long pre-intervention period.
- Traditional difference-in-differences is hard to justify because no single control unit, or simple average of control units, is a good match for the treated unit.
Common examples:
- Evaluating a policy change in one region using other regions as potential donors.
- Launching a major feature on a single platform or market while other markets remain unchanged.
- Assessing the impact of large, structural changes (e.g., pricing overhauls, regulation shifts).
How Synthetic Control works
At a high level, Synthetic Control chooses weights for control units so that a weighted average of their pre-treatment outcomes (and possibly covariates) best reproduces the treated unit's pre-treatment path.
Let $Y_{1t}$ be the outcome for the treated unit in period $t$, and $Y_{jt}$ the outcome for control units $j = 2, \dots, J$. Synthetic Control finds weights $w_j \ge 0$ that sum to 1 such that the weighted sum $\sum_{j=2}^{J} w_j Y_{jt}$ closely matches $Y_{1t}$ in the pre-treatment periods.
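Written out, one common way to formalize "closely matches" is a least-squares problem over the pre-treatment periods. The block below is a sketch under two assumptions that are not stated in the text above: the pre-treatment periods are indexed $t = 1, \dots, T_0$ (the symbol $T_0$ is introduced here), and only outcomes are matched, with no covariates or predictor weighting.

```latex
\min_{w_2, \dots, w_J} \; \sum_{t=1}^{T_0} \Bigl( Y_{1t} - \sum_{j=2}^{J} w_j Y_{jt} \Bigr)^{2}
\quad \text{subject to} \quad w_j \ge 0 \;\; (j = 2, \dots, J), \qquad \sum_{j=2}^{J} w_j = 1 .

% Estimated effect in a post-treatment period t > T_0, using the fitted weights \hat{w}_j:
\hat{\tau}_t = Y_{1t} - \sum_{j=2}^{J} \hat{w}_j Y_{jt}
```

The non-negativity and sum-to-one constraints keep the synthetic control inside the range spanned by the donor units, so it interpolates between donors rather than extrapolating beyond them.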
Once those weights are chosen, they are applied to the donor units in the post-treatment period to construct the synthetic counterfactual path for the treated unit.
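The weight-fitting step can be sketched in Python as a constrained least-squares problem. This is a minimal illustration, not a reference implementation: it matches on pre-treatment outcomes only, uses SciPy's general-purpose SLSQP solver, and runs on simulated data. The function name `fit_synthetic_weights`, the array shapes, and the simulated effect of 5.0 are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def fit_synthetic_weights(y_treated_pre, y_donors_pre):
    """Find donor weights (w >= 0, sum to 1) that minimize the
    pre-treatment squared error between treated and weighted donors.

    y_treated_pre: shape (T0,)            pre-treatment outcomes, treated unit
    y_donors_pre:  shape (T0, n_donors)   pre-treatment outcomes, donor units
    """
    y = np.asarray(y_treated_pre, dtype=float)
    X = np.asarray(y_donors_pre, dtype=float)
    n_donors = X.shape[1]

    def loss(w):
        resid = y - X @ w
        return resid @ resid

    def grad(w):
        return -2.0 * X.T @ (y - X @ w)

    result = minimize(
        loss,
        np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-10, "maxiter": 1000},
    )
    if not result.success:
        raise RuntimeError(f"Weight optimization failed: {result.message}")
    w = np.clip(result.x, 0.0, None)  # remove tiny negative values from the solver
    return w / w.sum()


# Simulated panel (illustrative only): 30 pre-treatment and 10 post-treatment periods.
rng = np.random.default_rng(0)
T0, T1, n_donors = 30, 10, 5
donors = rng.normal(size=(T0 + T1, n_donors)).cumsum(axis=0) + 50.0
true_w = np.array([0.6, 0.3, 0.1, 0.0, 0.0])
treated = donors @ true_w
treated[T0:] += 5.0  # simulated treatment effect after period T0

weights = fit_synthetic_weights(treated[:T0], donors[:T0])
synthetic = donors @ weights   # weights applied to all periods, pre and post
gap = treated - synthetic      # post-treatment gap = estimated effect
print(np.round(weights, 3))
print(round(gap[T0:].mean(), 2))
```

The weights are fitted on the pre-treatment rows only and then reused unchanged for the post-treatment rows, which is what makes the post-treatment gap interpretable as an effect estimate.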
Visual interpretation
A common way to present Synthetic Control results is with a time series plot:
- Line 1: outcome for the treated unit over time.
- Line 2: outcome for the synthetic control over time.
Before treatment, the lines should track closely if the synthetic control is well constructed. After treatment, the divergence between the two lines shows the estimated effect over time.
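A plot like this can be sketched with Matplotlib. The function name `plot_treated_vs_synthetic`, the labels, and the simulated series below are assumptions for illustration; in practice the `synthetic` series would come from the fitted donor weights.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_treated_vs_synthetic(periods, treated, synthetic, treatment_start):
    """Plot the treated unit and its synthetic control on shared axes."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(periods, treated, label="Treated unit")
    ax.plot(periods, synthetic, linestyle="--", label="Synthetic control")
    # Mark where treatment begins so pre-fit and post-gap are easy to tell apart.
    ax.axvline(treatment_start, color="gray", linestyle=":", label="Treatment start")
    ax.set_xlabel("Period")
    ax.set_ylabel("Outcome")
    ax.legend()
    return fig, ax


# Simulated series (illustrative only): identical before period 30, a gap of 5 after.
periods = np.arange(40)
synthetic = 50.0 + 0.5 * periods
treated = synthetic + np.where(periods >= 30, 5.0, 0.0)

fig, ax = plot_treated_vs_synthetic(periods, treated, synthetic, treatment_start=30)
```

Calling `fig.savefig("synthetic_control.png")` afterwards writes the figure to disk. A second panel showing `treated - synthetic` is a common companion plot, since it makes the size of the gap easier to read than two overlapping lines.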
Core assumptions
For Synthetic Control to support causal interpretation, data scientists typically rely on:
- Good pre-treatment fit: the synthetic control reproduces the treated unit’s pre-intervention outcome path and key predictors well.
- No major shocks unique to the treated unit: aside from the treatment, there are no other large, treated-unit-specific shocks that are not shared with or represented in the donor pool.
- Stable relationships: the relationship between outcomes and predictors that held in the pre-treatment period continues to hold post-treatment in the absence of treatment.
A simple Synthetic Control example
Suppose you launch a new subscription tier in one country. You have monthly revenue data for that country and a group of similar countries where the subscription was not launched.
Synthetic Control chooses weights on the control countries so that their weighted average revenue closely matches the treated country’s revenue before launch. After launch, you compare the treated country’s revenue to the synthetic combination.
If the treated country’s revenue rises well above the synthetic control after launch and the pre-treatment fit was close, the gap can be interpreted as the estimated effect of the new subscription tier.
Placebo and robustness checks
To strengthen credibility, Synthetic Control analyses often include:
- Placebo tests: re-assigning the treatment to control units to see whether similar gaps appear by chance.
- Sensitivity checks: varying the donor pool or predictor set to test how sensitive the results are.
- Fit diagnostics: inspecting pre-treatment errors and predictor balance.
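The placebo test and the pre-treatment fit diagnostic can be combined in one sketch: fit a synthetic control for every unit as if it had been treated, then compare each unit's post-treatment error to its pre-treatment error. The ratio of post- to pre-treatment RMSPE (root mean squared prediction error) is one commonly used summary. Everything below is an assumption made for illustration: the helper names, the outcome-only least-squares fit, the choice to drop the truly treated unit from placebo donor pools, and the simulated data.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(y_pre, X_pre):
    """Simplex-constrained least squares on pre-treatment outcomes."""
    n = X_pre.shape[1]
    res = minimize(
        lambda w: np.sum((y_pre - X_pre @ w) ** 2),
        np.full(n, 1.0 / n),
        jac=lambda w: -2.0 * X_pre.T @ (y_pre - X_pre @ w),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    w = np.clip(res.x, 0.0, None)
    return w / w.sum()


def rmspe_ratio(Y, unit, donor_idx, T0):
    """Post/pre RMSPE ratio when `unit` is handled as if treated at T0."""
    w = fit_weights(Y[:T0, unit], Y[:T0, donor_idx])
    gap = Y[:, unit] - Y[:, donor_idx] @ w
    pre_rmspe = np.sqrt(np.mean(gap[:T0] ** 2))   # fit diagnostic
    post_rmspe = np.sqrt(np.mean(gap[T0:] ** 2))
    return post_rmspe / pre_rmspe


def placebo_test(Y, treated_idx, T0):
    """Run the method on every unit; return all ratios and a permutation-style p-value."""
    n_units = Y.shape[1]
    ratios = np.empty(n_units)
    for unit in range(n_units):
        # Design choice in this sketch: the truly treated unit never serves as a donor.
        donor_idx = [j for j in range(n_units) if j not in (unit, treated_idx)]
        ratios[unit] = rmspe_ratio(Y, unit, donor_idx, T0)
    # Share of units whose ratio is at least as large as the treated unit's.
    p_value = np.mean(ratios >= ratios[treated_idx])
    return ratios, p_value


# Simulated panel (illustrative only): column 0 is treated after period T0.
rng = np.random.default_rng(1)
T0, T1, n_units = 30, 10, 10
trend = np.linspace(0.0, 10.0, T0 + T1)[:, None]
loadings = rng.uniform(0.5, 1.5, size=n_units)
loadings[0] = 1.0  # keep the treated unit inside the range of the donors
Y = 50.0 + trend * loadings + rng.normal(scale=0.3, size=(T0 + T1, n_units))
Y[T0:, 0] += 5.0   # simulated treatment effect

ratios, p_value = placebo_test(Y, treated_idx=0, T0=T0)
print(np.round(ratios, 2), p_value)
```

With only 10 units the smallest attainable p-value is 0.1, so the size of the donor pool limits how strong the placebo evidence can be. Rerunning `placebo_test` after dropping individual columns from `Y` is a simple way to carry out the donor-pool sensitivity check.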