Stop Guessing: Use a 10% Holdout to Prove Your Journeys Actually Work

This playbook shows Shopify and e‑commerce app teams how to wire a 10% holdout into behavioural journeys in Spreeflo so they can measure true incremental lift, prove which automations earn revenue, and double down on what works.

When Mia at “CartWizard” shipped her new upgrade journey, the numbers looked great on paper.

Merchants who hit 50 recovered carts now received a slick “You’re leaving money on the table, upgrade to Pro” email. Upgrade volume ticked up. Revenue graphs pointed in the right direction. The team high‑fived in Slack.

Then her co‑founder asked the annoying question:

“How many of those upgrades would have happened anyway?”

Silence.

Because without a control group, there’s no baseline. You’re seeing outcomes, not impact. And for a small e‑commerce app team where every email, every journey, and every hour of work has an opportunity cost, that blind spot is expensive.

The pattern in this article fixes that with one idea: every behavioural journey gets a small holdout group that receives nothing. The sequence at the top of this page is the whole journey, end to end. It uses a 10% Random Split so you can finally measure the actual incremental lift of your automations.

This is a playbook for Shopify / e‑commerce app teams who want proof, not vibes, that their journeys are worth the send.

Why “losing” 10% of sends is often your smartest move

For a founder running a $30k–$150k MRR app, the temptation is obvious: if a journey seems to drive upgrades or prevent churn, send it to everyone. Why hold anyone back?

Because without a holdout, you’re flying without instruments:

You don’t know if that shiny new onboarding flow is doubling activation, or just talking to users who were going to activate anyway.
You don’t know whether a discount email is actually lifting revenue, or just pulling forward upgrades your customers would have made at full price anyway.
You can’t compare one journey to another on ROI. Everything “works” but you don’t know what works best.

A small, permanent holdout group turns every behavioural journey into a running experiment:

90% of eligible users get the treatment (emails, nudges, etc.).
10% get nothing related to this journey.
You compare upgrade/churn/conversion between the two groups over a fixed window.

The cost is the theoretical upside from the 10% you hold back. The gain is clarity on where your limited time actually compounds. Founder‑led businesses win on leverage, not headcount.

Once you’ve paid that cost and proven a journey is genuinely accretive, you can always spin up a version with no holdout for full coverage.

A concrete scenario: measuring an upgrade nudge journey

To keep it specific, let’s design around a real e‑commerce app use case.

Back to CartWizard:

They recover abandoned carts for Shopify stores.
Plans: Basic ($29), Pro ($79), Elite ($199).
The team suspects that stores who’ve recovered 50+ carts on Basic are ripe for an upgrade.

They build a behaviour‑based journey: when a merchant crosses 50 recovered carts on Basic, they get a multi‑step upsell sequence over email.

We’ll use that scenario to walk the sequence at the top of this page, node by node, and see how the 10% holdout works in Spreeflo.

Step 1: Make sure you can see the behaviour

Before you build anything in the journey editor, you need the behavioural signal.

For an app, that almost always lives in your backend, not just in the browser. You want to track events like:

cart_recovered
plan_changed
subscription_upgraded
feature_used

Those events should be sent into Spreeflo via the Spreeflo API or, if part of the flow happens on the web, through the Spreeflo SDK.

For this pattern, we’ll assume you’re already sending an event like:

recovery_milestone_reached with properties:

milestone: "50_recoveries"
current_plan: "basic"

or you maintain a contact attribute like total_recoveries and plan.

Either way, the journey trigger just needs a clean way to identify “merchants who just crossed 50 recoveries on Basic.”
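As a rough sketch of the backend side, the function below decides when a merchant has just crossed the milestone and builds the event to send. The payload shape (the event, contact_id, timestamp, and properties keys) and the function name are assumptions made for illustration, not Spreeflo's documented API format; check the API reference for the real field names and endpoint before sending anything.

```python
from datetime import datetime, timezone
from typing import Optional

MILESTONE = 50  # recovered carts that define the upgrade milestone


def build_milestone_event(
    contact_id: str, previous_total: int, new_total: int, current_plan: str
) -> Optional[dict]:
    """Return an event payload only when this update crosses the milestone.

    Comparing the previous and new totals means the event fires once,
    even if several recoveries are counted in a single batch.
    """
    if not (previous_total < MILESTONE <= new_total):
        return None
    return {
        # Hypothetical payload shape, for illustration only.
        "event": "recovery_milestone_reached",
        "contact_id": contact_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "properties": {
            "milestone": "50_recoveries",
            "current_plan": current_plan,
        },
    }


event = build_milestone_event("merchant_123", 49, 50, "basic")
```

Firing on the crossing, rather than on every update where the total is 50 or more, keeps the trigger from seeing the same merchant repeatedly.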

Step 2: Pick your trigger (behavioural, not calendar)

This pattern works with almost any behavioural trigger. In Spreeflo’s journey builder you could start from:

A Custom Event trigger on recovery_milestone_reached.
A Criteria Match trigger: “plan is basic AND total_recoveries is greater than 49”.
A Join Segment trigger if you prefer defining that logic once in a reusable segment.

For CartWizard’s upgrade journey, a Custom Event trigger is a clean fit:

Event name: recovery_milestone_reached
Property conditions: milestone is "50_recoveries" AND current_plan is "basic"
Re-enrollment: typically off here, so each merchant only enters this specific upgrade journey once.

The key idea: whatever behaviour defines “eligible for this treatment,” express it as the trigger. The holdout pattern is the same whether that’s activation, reactivation, or an upgrade nudge.

From this trigger, every eligible merchant flows into the next node.
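If it helps to see the trigger logic spelled out, here is a minimal sketch of what those conditions amount to. It is not how Spreeflo evaluates triggers internally; the event dictionary shape and both function names are assumptions for illustration.

```python
def matches_upgrade_trigger(event: dict) -> bool:
    """True when an event satisfies the trigger's name and property conditions."""
    props = event.get("properties", {})
    return (
        event.get("event") == "recovery_milestone_reached"
        and props.get("milestone") == "50_recoveries"
        and props.get("current_plan") == "basic"
    )


def should_enroll(event: dict, contact_id: str, enrolled: set) -> bool:
    """Mimic re-enrollment being off: a contact enters the journey at most once."""
    if contact_id in enrolled or not matches_upgrade_trigger(event):
        return False
    enrolled.add(contact_id)
    return True


enrolled_contacts = set()
sample_event = {
    "event": "recovery_milestone_reached",
    "properties": {"milestone": "50_recoveries", "current_plan": "basic"},
}
first_time = should_enroll(sample_event, "merchant_123", enrolled_contacts)
second_time = should_enroll(sample_event, "merchant_123", enrolled_contacts)
```

The second call is rejected because the merchant is already enrolled, which is the behaviour you want from a one‑time upgrade journey.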

Step 3: Immediately branch with a Random Split (your 10% holdout)

Right after the trigger comes the core of this pattern: Random Split.

In Spreeflo, Random Split is a Process node that:

Takes in a stream of contacts.
Sends a fixed percentage down Path A.
Sends the rest down Path B.

In our sequence:

Path A = holdout (10%).
Path B = treatment (90%).

Why immediately after the trigger?

You want the split to be as “pure” as possible: no tags, filters, or conditions skewing the population before the randomization.
That keeps holdout and treatment statistically comparable, so the only systematic difference is that one group receives the journey and the other doesn’t.

Configuration:

Percentage: 10% (0.1) to the holdout branch.
The remaining 90% (0.9) automatically goes to the treatment branch.

Every time the trigger fires for a merchant, they’re randomly allocated to one of these arms. Over hundreds of entries, you end up with two comparable cohorts; with only a few dozen, the 10% arm is too small to compare reliably.
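To build intuition for what the node is doing, here is a small simulation of a 10/90 random allocation. It is a sketch of the general technique, not Spreeflo's implementation; the function name and the seed are arbitrary choices for illustration.

```python
import random


def assign_arm(rng: random.Random, holdout_share: float = 0.10) -> str:
    """Assign one contact to 'holdout' with probability holdout_share."""
    return "holdout" if rng.random() < holdout_share else "treatment"


# A seeded generator keeps the simulation repeatable.
rng = random.Random(42)

large_run = [assign_arm(rng) for _ in range(10_000)]
large_holdout_share = large_run.count("holdout") / len(large_run)

small_run = [assign_arm(rng) for _ in range(30)]
small_holdout_count = small_run.count("holdout")

print(f"Holdout share over 10,000 entries: {large_holdout_share:.3f}")
print(f"Holdout contacts among 30 entries: {small_holdout_count}")
```

Over thousands of entries the holdout share settles close to 10%. Over a few dozen it can land noticeably above or below, which is one reason to wait for volume before reading results.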

Step 4: Tag both arms for later analysis

Before you send anything, you want a permanent record of which merchants landed where.

Drop an Add Tag action as the first node on each branch:

On the 10% holdout path: Add Tag → upgrade_journey_holdout.
On the 90% treatment path: Add Tag → upgrade_journey_treatment.

Why tag?

It gives you two easy, stable cohorts for later reporting.
You can use tags and the segment builder to slice any metric by these cohorts: upgrade rate, churn, retention, revenue.

Tags don’t send anything to the customer, so they don’t violate the “holdout receives nothing” rule.

Step 5: Build the treatment sequence (email, spaced correctly)

On the 90% treatment branch, this is where your usual best‑practice journey lives.

A simple, effective upgrade nudge sequence might look like:

Send Email: Immediate Pro‑plan pitch
- Use our email builder to tailor the content:
- Subject: “You’ve already recovered $X — Pro helps you catch the rest”
- Body: reference their milestone (“50+ recoveries”), show before/after numbers, highlight 1–2 Pro‑only features.
- “Send only once” can stay on, because you don’t want someone receiving this same Pro pitch multiple times if they somehow re‑enter later.
Time Delay: Wait 3 days
- Time Delay supports hours or days, not minutes. For upgrade journeys, 2–7 days is a reasonable follow‑up window.
- This spacing also respects inbox fatigue — you never want two emails back‑to‑back in a way that feels spammy.
Send Email: Objection‑handling follow‑up
- Different angle: social proof, quick ROI math, maybe a mini‑case study from another merchant at a similar scale.
- Again, “Send only once.”

Optionally, you could:

Add a Wait Condition node after the first email: wait up to 7 days for a subscription_upgraded event. If they upgrade, skip the second email. If not, send it.
Add a Send Web Push if you’re on the Professional plan and your merchants subscribe to browser notifications. In that case, respect pacing and avoid immediate email + push back‑to‑back unless it’s a deliberate “multi‑channel moment.”

The point: your treatment branch is normal lifecycle automation, just with that Random Split and tagging layered in at the top.

Step 6: Keep the holdout truly “dark”

On the 10% holdout branch, discipline matters.

What you can do:

Add Tag: upgrade_journey_holdout (already covered).
Optionally, add an Update Contact Attribute node to set something like upgrade_journey_version = "v1_holdout" for deeper analysis later.

What you shouldn’t do:

No Send Email nodes related to this upgrade pitch.
No Send Web Push nodes for this specific treatment.

Those contacts should continue receiving your other, unrelated product communications (release notes, generic newsletters, etc.). The holdout is specific to this journey, not your entire marketing program.

Once you’ve tagged them, this branch often just ends.

Step 7: Measure the lift with segments and events

After the journey has been running for a while and you’ve accumulated enough volume (more on that in a moment), it’s time to pull the results.

You’ll use:

The tags you added (upgrade_journey_treatment, upgrade_journey_holdout).
A clear definition of “conversion” — usually the subscription_upgraded event or a plan attribute change.
Spreeflo’s segment builder.

Create two segments:

Upgrade Journey – Treatment
- Contact is tagged with upgrade_journey_treatment.
Upgrade Journey – Holdout
- Contact is tagged with upgrade_journey_holdout.

Then, for each segment, create a sub‑segment or a filter that represents “converted”:

Custom events: subscription_upgraded triggered at least 1 time in the last 30 days (or whatever window makes sense).
Or a plan attribute: plan is "pro" or plan is "elite".

You now have:

Treatment group size.
Treatment group conversions.
Holdout group size.
Holdout group conversions.

Compute:

Treatment conversion rate = conversions_treatment / size_treatment.
Holdout conversion rate = conversions_holdout / size_holdout.
Incremental lift = treatment rate − holdout rate.

Example:

Treatment: 900 merchants, 135 upgrades → 15%.
Holdout: 100 merchants, 10 upgrades → 10%.
Incremental lift = 5 percentage points (a 50% relative lift over baseline).
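The same arithmetic, as a small script you could run on counts exported from your two segments. The class and function names are illustrative, and the numbers are the ones from the example above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Arm:
    size: int         # contacts carrying the arm's tag
    conversions: int  # of those, how many upgraded in the window

    @property
    def rate(self) -> float:
        return self.conversions / self.size


def incremental_lift(treatment: Arm, holdout: Arm) -> Tuple[float, Optional[float]]:
    """Return (absolute lift, relative lift over the holdout baseline)."""
    absolute = treatment.rate - holdout.rate
    # Relative lift is undefined when nobody in the holdout converted.
    relative = absolute / holdout.rate if holdout.rate > 0 else None
    return absolute, relative


treatment = Arm(size=900, conversions=135)
holdout = Arm(size=100, conversions=10)
absolute, relative = incremental_lift(treatment, holdout)

print(f"Treatment rate: {treatment.rate:.1%}")
print(f"Holdout rate:   {holdout.rate:.1%}")
print(f"Absolute lift:  {absolute * 100:.1f} percentage points")
print(f"Relative lift:  {relative:.0%}")
```

Keeping absolute and relative lift separate avoids a common mix‑up: 5 percentage points and 50% describe the same result from two different angles.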

Tie that back to revenue:

If the average upgrade adds $50 MRR, then:
Incremental upgrades from the journey ≈ 0.05 × 900 treated merchants = 45 upgrades.
Incremental MRR ≈ 45 × $50 = $2,250 from this cohort over the window.
With no holdout, all 1,000 eligible merchants would have been treated: ≈ 0.05 × 1,000 = 50 upgrades, or about $2,500 MRR.

Now you know the journey isn’t just “driving upgrades”; it’s worth roughly $2,250 in incremental MRR as run, and about $2.5k at full coverage, not counting repeat months. That’s the kind of evidence that tells a small team where to keep investing.
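One detail worth making explicit: the lift only applies to merchants who actually received the journey. A quick sketch, using the example's numbers and an illustrative function name, shows the difference between what the journey earned as run (900 treated merchants) and what it would earn at full coverage (all 1,000 eligible merchants).

```python
def incremental_mrr(lift: float, population: int, mrr_per_upgrade: float):
    """Return (extra upgrades, extra MRR) for a population that gets the journey."""
    extra_upgrades = lift * population
    return extra_upgrades, extra_upgrades * mrr_per_upgrade


LIFT = 0.05             # 5 percentage points, from the example
MRR_PER_UPGRADE = 50.0  # assumed average MRR added per upgrade

as_run = incremental_mrr(LIFT, 900, MRR_PER_UPGRADE)
full_coverage = incremental_mrr(LIFT, 1_000, MRR_PER_UPGRADE)

print(f"As run (900 treated):     {as_run[0]:.0f} upgrades, ${as_run[1]:,.0f} MRR")
print(f"Full coverage (1,000):    {full_coverage[0]:.0f} upgrades, ${full_coverage[1]:,.0f} MRR")
```

The gap between the two figures is the price of the holdout for this window, which is usually small next to the value of knowing the lift is real.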

How long to run it, and how big the holdout should be

A few practical guidelines:

10% is a solid default. For higher‑volume use cases, you can drop to 5%. Below that, your holdout may be too small to measure anything.
Run until you’ve seen at least a few dozen conversions in each arm. If only 2 holdout users upgraded, random chance dominates.
Avoid holdouts for critical flows like password resets or essential billing notifications. This pattern is for marketing and lifecycle, not operations.
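To check whether a gap between arms is bigger than random noise, one common option is a two‑proportion z‑test. The sketch below uses only the standard library; the function name is illustrative, and the normal approximation it relies on gets shaky when an arm has very few conversions, so treat the output as a sanity check rather than a verdict.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of a standard normal at |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Example numbers: 135 of 900 treated vs 10 of 100 held out.
z, p_value = two_proportion_z_test(135, 900, 10, 100)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With the example's numbers, the p‑value comes out well above the conventional 0.05 threshold. A 15% versus 10% gap looks convincing, but with only 10 holdout conversions it is not yet distinguishable from chance, so keep the journey running until both arms have more volume.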

If you’re nervous about “losing” upgrades from the holdout, think of it as buying data:

You’re trading a small, temporary revenue dip for clarity on which journeys are actually accretive.
Once a pattern is clearly a winner, you can clone the journey, remove the Random Split, and send it to everyone going forward.

Variations you’ll probably want later

Once you’re comfortable with a basic 10/90 split on one journey, there are a couple of higher‑leverage extensions:

Test different upgrade offers
- Swap the holdout path for a second treatment path and use Random Split as a true A/B test (50/50).
- Path A: discount offer. Path B: feature‑based pitch, no discount.
- Same measurement approach with tags and events.
Add holdouts to other key journeys
- Post‑install activation sequence (“first campaign created”).
- Win‑back journey for app_uninstalled customers you re‑engage via email.
- Pre‑churn: users whose app usage drops below a threshold.
Re‑run the experiment when you significantly change the journey
- If you rewrite the upgrade emails or add web push, treat that as “v2” and re‑introduce a holdout tag (upgrade_journey_v2_holdout) so you can compare.

Because Spreeflo makes it easy to build a journey once and let it run indefinitely, these experiments don’t need constant attention. You design them thoughtfully up front, then check in when you have enough volume to make a decision.

Why this pattern is pure leverage for a small app team

The biggest advantage solopreneurs and lean SaaS teams have isn’t headcount. It’s the ability to move fast, instrument what matters, and cut what doesn’t.

A 10% holdout pattern is a textbook example of that:

You add a Random Split and a pair of tags once.
From then on, every merchant who hits the trigger silently becomes part of a live experiment.
A few weeks later, you can say with a straight face whether that journey is creating real incremental revenue or just making you feel productive.

No analyst. No heavyweight experimentation platform. Just a handful of nodes in Spreeflo, some well‑named tags, and the discipline to let your data, not your ego, decide which automations earn their keep.

When you look across your flows — onboarding, upgrade, reactivation — and see which journeys actually move the needle, you’re doing exactly what founder‑led businesses should do: making every hour of marketing work compound. That’s the real point of marketing automation, and it’s the mindset this pattern helps you practice.