Hey,
Everyone talks about A/B testing like it’s the holy grail of Facebook advertising. “Just test everything!” they say. “Let the data guide you!” they preach.
But here’s what nobody tells you: most A/B tests are completely wrong.
A while back, I spoke with a founder who proudly told me his team was “testing everything.”
New creatives every week. New audiences every few days. New campaigns constantly being launched and paused. He was running dozens of experiments at once, drowning in spreadsheets and dashboards, confident he was being “data-driven.”
The problem was obvious to me. He wasn’t getting insights. He was reacting to noise.
One test told him red buttons converted 23% better than blue, so he changed everything to red. Conversions crashed. Another showed video beating images by 41%, so he killed static ads. ROAS dropped by half.
Nothing was wrong with his effort. The testing itself was flawed.
Once I fixed how he tested, the chaos stopped. Winning tests stayed winners when scaled. Performance became predictable.
Today, I’m breaking down the 23 most common A/B testing mistakes advertisers make.
The 23 A/B Testing Mistakes
Mistake #1: Testing Too Many Variables At Once – You change the headline, image, AND audience. Which one made the difference?
Fix: Test ONE variable at a time.
Mistake #2: Not Running Tests Long Enough – You check results after 24 hours. That’s not data, that’s a coin flip.
Fix: Run tests for a minimum of 7 days and until you reach statistical significance.
Mistake #3: Insufficient Sample Size & Ignoring Statistical Significance – Building on the previous point: more days running an ad doesn’t mean you have enough data. You can let a test run longer and still be staring at noise. If the sample size is weak, a 5% CTR difference means nothing, no matter how many days it’s been live.
Fix: Minimum 100 conversions per variant. For CTR tests, minimum 8,000 impressions. Use a significance calculator and aim for at least 95% confidence.
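If you don’t want to rely on an online calculator, the math is simple enough to sanity-check yourself. Here’s a minimal sketch of a standard two-proportion z-test in Python; the conversion numbers in the example are hypothetical:

```python
import math

def significance(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rate
    between variants A and B (two-proportion z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical example: 120 conversions from 2,000 visitors vs 95 from 2,000
print(f"p-value: {significance(120, 2000, 95, 2000):.3f}")  # ~0.08
```

A p-value around 0.08 means that “obvious” 26% relative lift could easily be noise. Below 0.05 is where you can start trusting it.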
Mistake #4: Testing During Unstable Periods – You launch tests during Black Friday or right after iOS updates.
Fix: Test during normal business periods. Avoid holidays, major sales events, or platform changes.
Mistake #5: Unequal Budget Distribution – Ad A gets 70% of budget, Ad B gets 30%.
Fix: Split budget evenly across all variants.
Mistake #6: Testing Different Audience Sizes – You test one ad to a 50K audience and another to a 500K audience. A 30-40% difference is workable, but a 70%+ difference will skew results. This is especially problematic when testing cold audiences against remarketing audiences; that will definitely screw up your data.
Fix: Keep audience sizes within 30-40% of each other. Never compare cold and remarketing audiences in the same test.
Mistake #7: Confirmation Bias – You like the blue design, so you run the test until blue wins.
Fix: Decide success criteria BEFORE running the test. Stick to it.
Mistake #8: Testing Elements That Don’t Matter – You spend two weeks testing button shapes while your headline is terrible.
Fix: Test high-impact elements first: offer, angle, headline, then minor details.
Mistake #9: No Holdout Control Group – You test five new ads but don’t keep your current winner running.
Fix: Always run your current best-performer alongside new tests.
Mistake #10: Creative Fatigue During Scaling – An ad wins the test with fresh reach. You scale the budget aggressively, but now you’re showing it to the same people multiple times. Performance drops because of saturation, not because the creative stopped working.
Fix: When scaling, monitor frequency closely. If frequency climbs above 3-4, refresh the creative or expand audience size. Your winner needs fresh eyeballs to maintain performance.
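Frequency is just impressions divided by unique reach, so you can monitor it straight from an Ads Manager export. A tiny sketch of the check; the 3.5 ceiling (the middle of that 3-4 range) and the numbers are illustrative assumptions:

```python
def needs_refresh(impressions: int, reach: int, ceiling: float = 3.5) -> bool:
    """Flag a creative for refresh once average frequency passes the ceiling."""
    return impressions / reach > ceiling

# Hypothetical: 420,000 impressions against 110,000 unique people reached
print(needs_refresh(420_000, 110_000))  # frequency ~3.8 -> True, refresh or expand
```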
Mistake #11: Mixing Audience Intent Levels – You’re testing creative performance, but one ad set happens to catch more bottom-funnel traffic while another gets top-funnel browsers. The data gets muddied by intent differences, not creative quality.
Fix: When testing creatives or copy, use Advantage+ audience or broad targeting consistently across all variants. Let Facebook’s algorithm distribute evenly. Don’t manually segment by funnel stage during creative tests.
Mistake #12: Seasonal Bias – You test in January and apply learnings to November.
Fix: Retest seasonal campaigns each season.
Mistake #13: Device Bias – Your winning ad only works on mobile, but 60% of purchases happen on desktop.
Fix: Check performance by device. Consider testing placement-specific variations.
Mistake #14: Budget Changes Mid-Test – You’re impatient and increase the budget halfway through because “it’s working.” Now you’ve changed the delivery dynamics and can’t trust the before/after comparison.
Fix: Set budgets before testing starts. Don’t touch them until the test concludes. Let it run its course.
Mistake #15: Time-of-Day Ignorance – Ad A runs all day. Ad B only gets budget during peak hours.
Fix: Run tests across the same timeframes or use dayparting equally for all variants.
Mistake #16: Testing Incompatible Concepts Without Clear Intent – You test a product-focused ad against a lifestyle ad, then wonder which “won.” They serve different purposes and attract different mindsets. If you’re testing creative approaches, that’s fine, but understand you’re testing positioning strategy, not just creative execution.
Fix: If you want to test product vs lifestyle approaches, acknowledge you’re testing different strategies, not just ad variations. Make sure your success metric aligns with what you’re actually trying to learn.
Mistake #17: Vanity Metric Obsession – You chase CTR when what actually matters is ROAS or CPA.
Fix: Test for your actual business goal.
Mistake #18: No Documentation – You can’t remember what you tested last month.
Fix: Keep a testing log. Document hypothesis, results, and learnings.
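A spreadsheet works fine for this. If you’d rather keep the log in code, here’s a minimal CSV-based sketch; the filename, columns, and sample entry are just my suggestions:

```python
import csv
import datetime
import pathlib

LOG = pathlib.Path("ab_test_log.csv")  # hypothetical filename
FIELDS = ["date", "hypothesis", "variable", "metric", "result", "learning"]

def log_test(**entry):
    """Append one test record, writing the header row on first use."""
    is_new = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"date": datetime.date.today().isoformat(), **entry})

# Hypothetical entry
log_test(hypothesis="Benefit-led headline lifts CTR",
         variable="headline", metric="CTR",
         result="2.4% vs 1.9%, p=0.03", learning="Lead with the benefit")
```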
Mistake #19: Testing Too Frequently Without Sufficient Data Collection – You launch tests every two days because you’re impatient or you have budget to burn. The problem isn’t frequency—it’s that you’re not giving each test enough time to gather meaningful data. Even with high budgets, rushing to conclusions means you’re optimizing based on incomplete information.
Fix: Frequent testing is fine as long as each test reaches statistical significance and a proper sample size. The issue isn’t how often you test; it’s whether each test runs long enough to give you reliable data. High budget doesn’t replace statistical validity.
Mistake #20: External Traffic Contamination – You’re running Google Ads, influencer campaigns, and email blasts simultaneously.
Fix: Minimize other marketing activities during critical tests or use proper attribution.
Mistake #21: Weekend vs Weekday Confusion – You test Ad A Monday-Wednesday and Ad B Thursday-Saturday.
Fix: Run tests for full weeks to capture all days equally.
Mistake #22: Mobile vs Desktop Creative Mismatch – Your desktop ad looks terrible on mobile where 80% of traffic comes from.
Fix: Preview ads on all devices before testing.
Mistake #23: Inconsistent Delivery Across Variants – One ad gets shown heavily to one demographic or placement, while another reaches a different mix. Facebook’s delivery optimization can create unequal exposure patterns that skew your results.
Fix: Monitor the delivery breakdown by demographic and placement. If you see major discrepancies (like one ad only showing to 18-24 year olds while another reaches 35-44), the test isn’t comparing apples to apples; rerun it, ideally with Meta’s built-in A/B test tool, which splits audiences evenly.
The Proper Testing Framework
Step 1: Form A Clear Hypothesis – “I believe [change] will improve [metric] because [reason]”
Step 2: Define Success Metrics – Pick ONE primary metric.
Step 3: Calculate Required Sample Size – Use a calculator. Don’t guess. (See the sketch after this list.)
Step 4: Set Even Budgets – Equal distribution across all variants.
Step 5: Run Until Significant – Minimum 7 days AND statistical significance reached.
Step 6: Validate The Winner – Run the winner again in a new campaign.
Step 7: Document Everything – Build your knowledge base.
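For Step 3, the standard two-proportion formula tells you how many visitors each variant needs before a test can detect a given lift. A minimal sketch at 95% confidence and 80% power; the 2% baseline and 20% lift in the example are assumptions:

```python
import math

def sample_size(base_rate, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Visitors needed per variant to detect a relative lift in
    conversion rate at 95% confidence and 80% power."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 2% baseline conversion rate, detecting a 20% relative lift
print(sample_size(0.02, 0.20))  # ~21,000 visitors per variant
```

Numbers like that are exactly why 24-hour tests are coin flips.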
Real-World Example
I tested two angles for a productivity app:
Angle A: “Get more done in less time”
Angle B: “Stop feeling overwhelmed”
Angle A won for CTR (3.1% vs 2.4%). Angle B won for conversions (8.2% vs 5.1%).
If I’d optimized for clicks, I would’ve chosen the wrong winner. Always test for your actual goal.
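To see why, assume both angles got the same 10,000 impressions. Angle A drives about 310 clicks and, at 5.1%, roughly 16 conversions. Angle B drives about 240 clicks and, at 8.2%, roughly 20 conversions. The “losing” CTR produced more customers.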
The Quick Testing Checklist
Before launching any test:
Am I testing only ONE variable?
Do I have equal budgets across variants?
Are my audiences similar in size (within 30-40%)?
Have I defined success criteria in advance?
Is my sample size large enough?
Am I testing during a normal period?
Do I have a control group?
Will I run this for at least 7 days?
Am I tracking the right metric?
Have I documented my hypothesis?
The Bottom Line
Bad A/B testing is worse than no testing at all. When you make decisions based on flawed tests, you’re guessing with false confidence.
Stop testing everything. Start testing the right things, the right way. One variable at a time. Proper sample sizes. Statistical significance. Documentation.
Here’s to tests that actually tell the truth!
P.S. Which testing mistake hit home the hardest? Hit reply and let me know. If you’re stuck on how to structure a specific test, I’m happy to help.