Choosing a Causal Inference Method

A practical comparison of A/B tests, Difference-in-Differences, Propensity Score Matching, and Synthetic Control, written for data scientists who need to choose the right tool for real-world questions.

Use this as a decision guide; let the tool handle the implementation details.

Start with the question, then pick the method

There is no single "best" causal inference method. Each design makes different assumptions and works best with specific data structures. A good workflow starts from the question and dataset, then narrows down which methods are reasonable.

The table below summarizes the core methods supported today:

| Method | Best for | Key assumptions |
| --- | --- | --- |
| A/B test (randomized) | Online experiments where randomization is feasible | Random assignment, no interference, consistent measurement |
| Difference-in-Differences – Two Point | Pre/post changes with treated vs control groups and two periods | Parallel trends, no other shocks differentially affecting groups |
| Difference-in-Differences – TWFE | Panel data with multiple periods and staggered rollout | Parallel trends (conditional on fixed effects), no anticipation, careful with heterogeneity |
| Propensity Score Matching | Observational data with selection on observables | Unconfoundedness given covariates, overlap (common support) |
| Synthetic Control | Single treated unit with many controls and rich pre-treatment history | Good pre-treatment fit, no unique shocks, stable relationships over time |

A simple decision flow

1. Can you randomize?

If you can randomly assign treatment at the user, session, or geo level without breaking the product, an A/B test is usually the cleanest choice. Random assignment balances both observed and unobserved confounders in expectation, which removes most identification headaches.
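As a rough illustration of the analysis behind a randomized conversion experiment, the sketch below compares two conversion rates with a two-sided z-test for proportions. The function name `ab_test_proportions`, the counts, and the 5% significance level are assumptions made for this example; they do not come from a real experiment or from any particular tool's API.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_proportions(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    lift = p_b - p_a

    # Pooled standard error: used for the test statistic under H0 (p_a == p_b).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = lift / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the confidence interval on the lift.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (lift - z_crit * se, lift + z_crit * se)
    return {"lift": lift, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical counts: 10,000 users per arm.
result = ab_test_proportions(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(result)
```

The z-test relies on a normal approximation, so it is only appropriate when each arm has a reasonably large number of conversions and non-conversions.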

Use the A/B flow when:

2. Is this a pre/post change with a natural control group?

When a feature, policy, or price changes at a specific time for some units but not others, Difference-in-Differences is often a good fit.

DiD works best when you believe that, without the treatment, treated and control units would have followed similar trends over time.
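To make the two-period logic concrete, here is a minimal sketch that computes the DiD estimate from four group means: the change in the treated group minus the change in the control group. The function name `did_two_period` and the data are hypothetical, and the result is only a causal estimate if the parallel-trends assumption described above holds.

```python
from statistics import mean


def did_two_period(rows):
    """rows: iterable of (treated: bool, post: bool, outcome: float)."""
    cells = {}
    for treated, post, y in rows:
        cells.setdefault((treated, post), []).append(y)
    m = {key: mean(values) for key, values in cells.items()}

    # Change over time within each group.
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]

    # The control group's change stands in for the treated group's
    # counterfactual trend; the remainder is attributed to treatment.
    return treated_change - control_change


# Hypothetical outcomes for two treated and two control units.
rows = [
    (True, False, 10.0), (True, False, 12.0),   # treated, pre
    (True, True, 15.0), (True, True, 17.0),     # treated, post
    (False, False, 8.0), (False, False, 10.0),  # control, pre
    (False, True, 10.0), (False, True, 12.0),   # control, post
]
estimate = did_two_period(rows)
print(estimate)
```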

3. Is treatment self-selected in observational data?

When users or units opt into a feature, campaign, or behavior, randomization is gone and selection bias is a real concern. Propensity Score Matching is useful when you believe that, after conditioning on covariates, treatment is as good as random.
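The sketch below walks through the two steps of PSM on a toy dataset: fit a propensity model, then match each treated unit to the control with the closest score and average the outcome differences (an estimate of the effect on the treated). Everything here is an assumption for illustration: the single covariate, the hand-rolled logistic regression (used only to keep the example dependency-free), the function names, and the data, which are constructed so the true effect is 2.

```python
from math import exp
from statistics import mean


def fit_propensity(xs, ts, lr=0.1, steps=2000):
    """Fit P(treated | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            g0 += t - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (t - p) * x      # gradient of the log-likelihood w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return lambda x: 1.0 / (1.0 + exp(-(b0 + b1 * x)))


def att_nearest_neighbor(xs, ts, ys):
    """1-nearest-neighbor matching on the propensity score, with replacement."""
    score = fit_propensity(xs, ts)
    ps = [score(x) for x in xs]
    controls = [i for i, t in enumerate(ts) if t == 0]
    diffs = []
    for i, t in enumerate(ts):
        if t == 1:
            j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
            diffs.append(ys[i] - ys[j])
    return mean(diffs)


# Hypothetical data: x drives both opt-in and the outcome (a confounder).
xs = [0, 0, 1, 1, 2, 3, 1, 2, 2, 3, 3]
ts = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ys = [x + 2 * t for x, t in zip(xs, ts)]  # true effect is 2 by construction

naive = mean(y for y, t in zip(ys, ts) if t) - mean(y for y, t in zip(ys, ts) if not t)
att = att_nearest_neighbor(xs, ts, ys)
print(naive, att)
```

In this toy data the naive treated-minus-control comparison is inflated by the confounder, while the matched comparison recovers the constructed effect. Real analyses also need overlap checks and covariate balance diagnostics after matching.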

Use PSM when:

4. Do you have one treated unit and many potential controls?

When a single country, platform, or business line is treated, there may be no perfect "twin" to use as a control. Synthetic Control builds a weighted combination of donor units whose pre-treatment path matches the treated unit, then compares their trajectories after the intervention.
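As a minimal sketch of the weighting idea, the code below searches for non-negative donor weights that sum to one and minimize the pre-treatment squared error, then uses them to build the post-treatment counterfactual. The optimizer is a simple exponentiated-gradient update chosen to avoid dependencies; it is a stand-in, not the solver any specific library uses. The function name, the donor series, and the constructed effect of 0.3 are all hypothetical.

```python
from math import exp


def fit_weights(treated_pre, donors_pre, lr=0.1, steps=20000):
    """Find simplex weights so the weighted donors track the treated unit pre-treatment."""
    k, t = len(donors_pre), len(treated_pre)
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        synth = [sum(w[j] * donors_pre[j][s] for j in range(k)) for s in range(t)]
        resid = [synth[s] - treated_pre[s] for s in range(t)]
        # Gradient of the mean squared pre-treatment error w.r.t. each weight.
        grad = [2.0 / t * sum(resid[s] * donors_pre[j][s] for s in range(t))
                for j in range(k)]
        # Multiplicative update keeps weights positive; renormalizing keeps the sum at 1.
        w = [w[j] * exp(-lr * grad[j]) for j in range(k)]
        total = sum(w)
        w = [x / total for x in w]
    return w


# Hypothetical donor pool: three control units, five pre-periods, two post-periods.
donors_pre = [[1.0, 1.2, 1.1, 1.4, 1.5],
              [0.5, 0.4, 0.7, 0.6, 0.9],
              [2.0, 1.8, 1.9, 1.7, 1.6]]
donors_post = [[1.6, 1.7], [1.0, 1.1], [1.5, 1.4]]

# The treated unit is built from known weights plus a post-period effect of 0.3,
# so the sketch can be checked against a known answer.
true_w = [0.5, 0.3, 0.2]
treated_pre = [sum(true_w[j] * donors_pre[j][s] for j in range(3)) for s in range(5)]
treated_post = [sum(true_w[j] * donors_post[j][s] for j in range(3)) + 0.3 for s in range(2)]

weights = fit_weights(treated_pre, donors_pre)
synth_post = [sum(weights[j] * donors_post[j][s] for j in range(3)) for s in range(2)]
effect = sum(treated_post[s] - synth_post[s] for s in range(2)) / 2
print(weights, effect)
```

With real data the pre-treatment fit is never exact, so the quality of that fit and placebo runs on untreated donors are what make the post-treatment gap credible.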

Use Synthetic Control when:

How methods complement each other

In practice, strong causal work rarely relies on a single method. Instead, data scientists often combine designs to cross-check conclusions:

The tool is built for this multi-method reality: the same dataset can feed multiple causal views without manual plumbing for each.

Mapping methods to typical product questions

"Did this new onboarding flow increase activation?"

If you can randomize: A/B test. If you launched to a subset of geos or cohorts at a specific time: DiD – Two Point or DiD – TWFE, depending on the data structure.

"What is the impact of a new subscription tier in one market?"

If only one country was treated: Synthetic Control using other countries as the donor pool, with DiD as a robustness check across groupings.

"Do users who enable this feature retain better?"

Self-selection is likely. Use PSM to match feature users to non-users with similar histories, then compare retention or run DiD on the matched sample.

"Did this policy change reduce risky behavior?"

If some regions were affected and others were not, and you have multiple time periods: DiD – TWFE, with checks for parallel trends and potential staggered adoption issues.
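For a sense of what the TWFE estimate computes, the sketch below applies the two-way within transformation (subtract unit means and period means, add back the grand mean) to both the treatment indicator and the outcome, then takes the regression slope. It assumes a balanced panel and a single constant treatment effect; the function name and data are hypothetical. With staggered adoption and effects that vary across cohorts or over time, this same estimator can be biased, which is the heterogeneity caveat noted for TWFE above.

```python
from statistics import mean


def twfe_estimate(panel):
    """panel[i][t] = (treated_flag, outcome) for a balanced unit-by-period panel."""
    n_units, n_periods = len(panel), len(panel[0])

    def demean(values):
        # Two-way within transformation: removes unit and period fixed effects.
        unit_means = [mean(row) for row in values]
        time_means = [mean(values[i][t] for i in range(n_units))
                      for t in range(n_periods)]
        grand = mean(unit_means)
        return [[values[i][t] - unit_means[i] - time_means[t] + grand
                 for t in range(n_periods)] for i in range(n_units)]

    d = demean([[float(cell[0]) for cell in row] for row in panel])
    y = demean([[cell[1] for cell in row] for row in panel])
    cells = [(i, t) for i in range(n_units) for t in range(n_periods)]
    num = sum(d[i][t] * y[i][t] for i, t in cells)
    den = sum(d[i][t] ** 2 for i, t in cells)
    return num / den  # OLS slope on the demeaned data


# Hypothetical staggered rollout: region 0 treated from period 2,
# region 1 from period 3, region 2 never treated.
unit_fe = [1.0, 3.0, 5.0]
time_fe = [0.0, 0.5, 1.0, 1.5]
start = [2, 3, None]
panel = []
for i in range(3):
    row = []
    for t in range(4):
        treated = int(start[i] is not None and t >= start[i])
        row.append((treated, unit_fe[i] + time_fe[t] + 2.0 * treated))  # effect = 2
        # Outcome has no noise, so the estimate should match the built-in effect.
    panel.append(row)

beta = twfe_estimate(panel)
print(beta)
```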

Where the tool helps

Choosing methods is only half the battle. Implementing them correctly, checking assumptions, and communicating results are where most of the time and risk sit. The tool helps by: