Running Conversion Experiments That Answer Questions
Many A/B tests fail to produce a useful decision — not because the change had no effect, but because the test design makes results ambiguous.
Common failure modes
1) Changing too many variables at once
If you change the headline, layout, and CTA at the same time, you can’t attribute the impact to any single factor.
2) Looking early without a plan
“Peeking” at results daily inflates the false-positive rate if you didn’t design for sequential analysis. If you want the option to stop early, use a method that supports it and define the stopping rules up front.
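To make the risk concrete, here is a minimal A/A simulation in Python (numpy and scipy assumed) comparing a single fixed-horizon check against daily peeking. The traffic volume, baseline rate, and simulation count are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def two_prop_pvalue(succ_a, n_a, succ_b, n_b):
    """Two-sided two-proportion z-test with pooled variance."""
    p_pool = (succ_a + succ_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (succ_a / n_a - succ_b / n_b) / se
    return 2 * norm.sf(abs(z))

def false_positive_rate(peek_daily, n_days=14, visitors_per_day=500,
                        base_rate=0.05, alpha=0.05, n_sims=1000):
    """A/A setup: both arms share the same true rate, so any 'significant win' is a false positive."""
    wins = 0
    for _ in range(n_sims):
        a = rng.binomial(1, base_rate, n_days * visitors_per_day)
        b = rng.binomial(1, base_rate, n_days * visitors_per_day)
        # Peeking tests at every daily checkpoint; fixed-horizon tests only once at the end.
        checkpoints = (range(visitors_per_day, len(a) + 1, visitors_per_day)
                       if peek_daily else [len(a)])
        if any(two_prop_pvalue(a[:n].sum(), n, b[:n].sum(), n) < alpha
               for n in checkpoints):
            wins += 1
    return wins / n_sims

print("fixed-horizon false-positive rate:", false_positive_rate(peek_daily=False))  # ~0.05
print("daily-peeking false-positive rate:", false_positive_rate(peek_daily=True))   # noticeably higher
```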
3) Measuring the wrong metric
Click-through does not always translate into value. Choose a primary metric that matches the business goal (e.g., checkout completion, qualified leads, subscription activation), and define guard metrics that must not regress.
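As a small illustration, the sketch below (assuming pandas and a hypothetical events table with `variant`, `clicked_cta`, `completed_checkout`, and `order_value` columns) computes click-through alongside a value-aligned primary metric and a guard metric per variant.

```python
import pandas as pd

def variant_metrics(events: pd.DataFrame) -> pd.DataFrame:
    """Per-variant click-through, checkout completion (primary metric), and AOV (guard metric)."""
    by_variant = events.groupby("variant")
    completed = events[events["completed_checkout"] == 1]
    return pd.DataFrame({
        "click_through_rate": by_variant["clicked_cta"].mean(),
        "checkout_completion_rate": by_variant["completed_checkout"].mean(),
        "avg_order_value": completed.groupby("variant")["order_value"].mean(),
    })

# Toy data purely to show the function runs; real inputs come from your event log.
example = pd.DataFrame({
    "variant": ["A", "A", "B", "B"],
    "clicked_cta": [1, 0, 1, 1],
    "completed_checkout": [1, 0, 0, 1],
    "order_value": [40.0, 0.0, 0.0, 55.0],
})
print(variant_metrics(example))
```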
How we run experiments
Pre-test checklist
- One hypothesis, one primary change
- Primary metric (what success means)
- Guard metrics (what must not get worse)
- Minimum detectable effect (what size change matters)
- Required sample size (decided before launch)
- Planned runtime (based on traffic and variance; see the sizing sketch after this list)
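Here is a minimal sketch of how the sample-size and runtime items above might be computed for a two-arm test on a conversion rate. The baseline rate, MDE, significance level, power, and daily traffic are placeholder assumptions.

```python
import math
from scipy.stats import norm

def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Visitors needed per arm to detect a relative lift of `relative_mde`
    with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(numerator / (p2 - p1) ** 2)

# Placeholder inputs: 30% baseline completion rate, 5% relative MDE,
# 4,000 eligible visitors per day split 50/50 between arms.
n_per_arm = sample_size_per_arm(baseline_rate=0.30, relative_mde=0.05)
daily_eligible_visitors = 4_000
runtime_days = math.ceil(2 * n_per_arm / daily_eligible_visitors)
print(f"{n_per_arm} visitors per arm, ~{runtime_days} days at current traffic")
```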
Post-test rules
- Evaluate results against the pre-defined decision criteria (see the sketch after this list)
- If primary metric improves and guard metrics are healthy, ship
- Document results even when negative — negative results are still answers
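A hedged sketch of what applying those criteria can look like for a conversion primary metric with one higher-is-better guard metric. The counts, tolerance, and field names are illustrative assumptions; lower-is-better guards (e.g., refund rate) would flip the comparison.

```python
from scipy.stats import norm

def two_prop_pvalue(succ_a, n_a, succ_b, n_b):
    """Two-sided two-proportion z-test with pooled variance."""
    p_pool = (succ_a + succ_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (succ_b / n_b - succ_a / n_a) / se
    return 2 * norm.sf(abs(z))

def decide(control, treatment, alpha=0.05, max_guard_regression=0.02):
    """Ship only if the primary metric improves significantly and no guard
    metric regresses by more than the agreed tolerance (2% relative here)."""
    lift = treatment["completions"] / treatment["n"] - control["completions"] / control["n"]
    p = two_prop_pvalue(control["completions"], control["n"],
                        treatment["completions"], treatment["n"])
    primary_ok = lift > 0 and p < alpha
    guards_ok = all(
        treatment["guards"][name] >= control["guards"][name] * (1 - max_guard_regression)
        for name in control["guards"]
    )
    return "ship" if primary_ok and guards_ok else "do not ship (document the result)"

# Placeholder results, not real experiment data.
control = {"n": 14853, "completions": 4456, "guards": {"avg_order_value": 58.2}}
treatment = {"n": 14853, "completions": 4710, "guards": {"avg_order_value": 57.9}}
print(decide(control, treatment))
```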
What a good experiment looks like
- Hypothesis: Removing the phone number field from checkout will increase completion rate by ≥5% (relative)
- Primary metric: Checkout completion rate
- Guard metrics: Average order value, refund/return rate
- MDE: 5% relative
- Required sample size: Calculated pre-launch (based on baseline + MDE)
- Planned runtime: Derived from traffic and seasonality
One page, clear exit criteria, and a decision you can defend.