Dynamic Optimization - Minimum Recommended Testing Criteria

Our recommendations for testing thresholds

Common questions answered

Dynamic Optimization allows marketers to test up to 10 variants in an experiment, but this flexibility often raises questions.

This article is for those who've found themselves wondering:

  • Should I always test 10 variants?

  • How many variants can I safely test, statistically speaking?

  • Does it differ when optimizing on clicks versus opens?

The information below answers these questions by laying out the minimum testing requirements for our Dynamic Optimization technology.

Optimization events

The answers to most questions regarding Dynamic Optimization testing criteria hinge on what we refer to as optimization events. These are simply occurrences of your chosen optimization metric (i.e. the number of opens or clicks).

For Dynamic Optimization to work properly, a certain number of optimization events must occur within a certain time period. These thresholds differ between triggered and broadcast experiments.

Triggered experiment minimums

Let's start with triggers. For triggers to optimize properly, Phrasee suggests a minimum of 200 optimization events per variant per day on average.

Looking at the table below, you can see how many optimization events are suggested per day to optimize triggers in a meaningful, statistically significant way:

Events per day            Number of variants to test
More than 1,000 events    5 variants
More than 1,200 events    6 variants
More than 1,400 events    7 variants
More than 1,600 events    8 variants
More than 1,800 events    9 variants
More than 2,000 events    10 variants

If you don't have enough optimization events on average to allow for at least five variants, Phrasee does not recommend testing on that particular campaign.
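
To make the arithmetic behind this rule of thumb concrete, here is a small illustrative Python sketch (hypothetical helper names, not Phrasee product code) that takes your average daily optimization events and returns the largest variant count the 200-events-per-variant-per-day guidance would support:

```python
# Illustrative sketch of the 200-events-per-variant-per-day rule of thumb
# described above. Not Phrasee product code; thresholds are treated as
# inclusive for simplicity.
MIN_EVENTS_PER_VARIANT_PER_DAY = 200
MIN_VARIANTS = 5
MAX_VARIANTS = 10

def max_trigger_variants(avg_daily_events: int) -> int:
    """Return the largest variant count supported by the guidance,
    or 0 if the campaign falls below the five-variant minimum."""
    supported = avg_daily_events // MIN_EVENTS_PER_VARIANT_PER_DAY
    if supported < MIN_VARIANTS:
        return 0  # below the recommended threshold for testing at all
    return min(supported, MAX_VARIANTS)

print(max_trigger_variants(1_450))  # 7
print(max_trigger_variants(800))    # 0 -> testing not recommended
```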

Broadcast experiment minimums

Open optimization

For broadcast experiments, things are a bit different. Phrasee suggests a minimum of 200,000 delivered recipients for five variants. Phrasee also recommends a minimum of four hours of optimization time for a broadcast experiment optimizing on opens.

So, when optimizing on opens over four hours:

Delivered audience size         Number of variants to test
More than 200,000 recipients    5 variants
More than 220,000 recipients    6 variants
More than 240,000 recipients    7 variants
More than 260,000 recipients    8 variants
More than 280,000 recipients    9 variants
More than 300,000 recipients    10 variants
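
As a rough illustration of how the table above scales, the thresholds start at 200,000 delivered recipients for five variants and add 20,000 recipients for each additional variant. The sketch below (illustrative only, with hypothetical names; not Phrasee product code) expresses that pattern:

```python
# Illustrative sketch of the open-optimization thresholds in the table above.
BASE_RECIPIENTS_OPENS = 200_000   # suggested minimum audience for 5 variants
STEP_PER_EXTRA_VARIANT = 20_000   # added per variant beyond 5

def min_recipients_for_opens(variants: int) -> int:
    """Suggested minimum delivered audience for 5-10 variants when
    optimizing on opens over a four-hour window."""
    if not 5 <= variants <= 10:
        raise ValueError("Dynamic Optimization tests 5 to 10 variants")
    return BASE_RECIPIENTS_OPENS + STEP_PER_EXTRA_VARIANT * (variants - 5)

print(min_recipients_for_opens(8))  # 260000
```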

Click optimization

For broadcast experiments where you wish to optimize on clicks, Phrasee suggests a minimum of 2,000,000 delivered recipients for five variants. Phrasee also recommends a minimum of six hours of optimization time for a broadcast experiment optimizing on clicks.

So, when optimizing on clicks over six hours:

Delivered audience size           Number of variants to test
More than 2,000,000 recipients    5 variants
More than 2,200,000 recipients    6 variants
More than 2,400,000 recipients    7 variants
More than 2,600,000 recipients    8 variants
More than 2,800,000 recipients    9 variants
More than 3,000,000 recipients    10 variants
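
Putting the two broadcast tables together, you can also work backwards from a delivered audience size. The sketch below (again illustrative arithmetic with hypothetical names, not Phrasee product code) assumes the same pattern as the tables: a base of 200,000 recipients for opens or 2,000,000 for clicks at five variants, plus 20,000 or 200,000 per additional variant respectively, treating the thresholds as inclusive for simplicity:

```python
# Illustrative sketch: derive the largest recommended variant count for a
# broadcast experiment from the tables above. Not Phrasee product code.
THRESHOLDS = {
    # metric: (base recipients at 5 variants, step per extra variant)
    "opens": (200_000, 20_000),
    "clicks": (2_000_000, 200_000),
}

def max_broadcast_variants(delivered: int, metric: str) -> int:
    """Largest variant count (5-10) the guidance supports, or 0 if the
    audience is below the five-variant minimum for that metric."""
    base, step = THRESHOLDS[metric]
    if delivered < base:
        return 0  # below the recommended minimum for testing
    return min(5 + (delivered - base) // step, 10)

print(max_broadcast_variants(250_000, "opens"))     # 7
print(max_broadcast_variants(1_500_000, "clicks"))  # 0 -> not recommended
```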

Minimum tested variants

Unless otherwise specified, we do not generally recommend testing fewer than five variants with Dynamic Optimization.

If you don't have a large enough list to allow for at least five variants, Phrasee does not recommend testing with Dynamic Optimization on that particular campaign type.

Experimenting outside of recommended guidelines

We certainly won't prevent you from testing as many variants as you'd like in a particular experiment, or from choosing clicks instead of opens.

However, you do risk the following outcomes if you choose to experiment outside these minimum guidelines.

False winner identification and subsequent reversal

You may see a particular variant chosen as the winner that, if given more data or time to mature, would not actually have been the winner overall.

Generally, we don't see a full reversal, where the declared winner ends up at the bottom of the pack. But it still amounts to increased opportunity cost, leaving potential uplift on the table.

Statistical guessing

When Dynamic Optimization does not have enough data to reach a proper, statistically significant conclusion, it will still attempt to optimize based on the data it has received.

However, with so little data returned for each variant, or not enough time to make a mature decision, the outcome is essentially a statistical guess: the equivalent of picking a variant at random to send to more of your audience.

You may still see uplift reported when this happens, but that uplift is essentially happenstance. More often, you'll see negative uplift, meaning the opportunity cost of conducting the test outweighed the benefit.


Last reviewed: April 18, 2024
