Phrasee's AI is constantly working in the background to optimize dynamic campaigns using the most innovative technology available today. Therefore, part of understanding how Phrasee is performing is understanding how Phrasee looks at data and calculates success.
Uplift defined
Phrasee displays uplift based on engagement metrics, as that is the type of metric directly attributable to content. To properly assess the impact of content on engagement metrics over time, Phrasee considers opens and clicks both across the campaign as a whole and within individual batches.
People generally judge a campaign by its total performance, so Phrasee displays total performance because that view of the data is often requested.
However, when Phrasee is making its decisions from batch to batch (you can think of a batch as a window of time), it is considering data in the context of that individual period's performance. This is because Phrasee must make real-time decisions on whether to add variants, drop variants, increase a variant's share of the audience or decrease a variant's share of the audience.
Individual performance for a period is essential to good decision-making because volumes and audience engagement can vary wildly batch-to-batch. Phrasee ultimately devises time-adjusted rates to take the batch-to-batch variation in audience size and behavior into account.
Further, Phrasee's incremental uplift calculations always take into account the opportunity cost of conducting optimization by subtracting the incremental events the human control hypothetically would have gotten if the whole of the audience had been shown the human control variant.
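As a rough illustration of the subtraction described above, the sketch below computes incremental events for a single batch. It is a minimal sketch, not Phrasee's implementation: the function name `incremental_events`, the dictionary layout, and the sample numbers are all assumptions made for this example.

```python
def incremental_events(batch, control="human"):
    """Events gained (or lost) versus sending the control to everyone.

    `batch` maps a variant name to an (events, sends) tuple.
    The layout and names are hypothetical, chosen for illustration only.
    """
    total_events = sum(events for events, _ in batch.values())
    total_sends = sum(sends for _, sends in batch.values())
    control_events, control_sends = batch[control]
    control_rate = control_events / control_sends
    # Hypothetical outcome if the whole audience had received the control.
    counterfactual = control_rate * total_sends
    return total_events - counterfactual


# Illustrative batch, not real campaign data.
batch = {"variant_a": (120, 500), "human": (100, 500)}
print(incremental_events(batch))  # 20.0
```

A negative result means the batch lost events relative to sending the human control to everyone, which is the opportunity cost of testing.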
A practical example
The best way to understand how Phrasee makes decisions and calculates its time-adjusted rates is with an example. Consider the following data:
Day 1
Variant | Source | Opens | Sends | Open rate | Uplift |
Variant 1 | Phrasee | 400 | 1000 | 40.0% | 0% |
Variant 2 - champion | Phrasee | 450 | 1000 | 45.0% | 13% |
Variant 3 | Phrasee | 300 | 1000 | 30.0% | -25% |
Variant 4 | Human | 400 | 1000 | 40.0% | |
Daily Total | | 1550 | 4000 | | |
For this batch, 1550 opens were achieved. If the human line had been sent to the full 4000 audience, its 40% open rate would have achieved 1600 opens. Therefore, -50 incremental opens were generated this day. In this case, there is an opportunity cost to testing: 50 more opens would have been achieved if the human control had been sent to the whole audience.
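The Uplift column in the Day 1 table is consistent with a relative difference between each variant's open rate and the human control's open rate. The sketch below reproduces it under that assumption; the helper name `relative_uplift` is hypothetical, and the article does not state the formula explicitly.

```python
# Day 1 figures from the table above: variant -> (opens, sends).
day1 = {
    "Variant 1": (400, 1000),
    "Variant 2": (450, 1000),
    "Variant 3": (300, 1000),
}
human_opens, human_sends = 400, 1000
human_rate = human_opens / human_sends


def relative_uplift(opens, sends, control_rate):
    """Relative difference between a variant's open rate and the control's."""
    return (opens / sends) / control_rate - 1


uplifts = {
    name: relative_uplift(opens, sends, human_rate)
    for name, (opens, sends) in day1.items()
}
for name, uplift in uplifts.items():
    print(f"{name}: {uplift:+.1%}")
# Variant 1: +0.0%
# Variant 2: +12.5%
# Variant 3: -25.0%
```

Variant 2's unrounded uplift is 12.5%, which the table displays as 13%.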
Day 2
Variant | Source | Opens | Sends | Open rate | Uplift |
Variant 1 | Phrasee | 160 | 800 | 20.0% | 0% |
Variant 2 - champion | Phrasee | 1300 | 6000 | 21.7% | 8% |
Variant 3 | Phrasee | 70 | 400 | 17.5% | -13% |
Variant 4 | Human | 160 | 800 | 20.0% | |
Daily Total | | 1690 | 8000 | | |
For this batch, the previous day's uplifts have been analyzed and the proportion of audience being sent each line adjusted accordingly. 1690 opens were achieved. If the human line had been sent to the full 8000 audience, with the 20% open rate it achieved today, it would have generated 1600 opens. Therefore, 90 incremental opens were generated this day.
Day 3
Variant | Source | Opens | Sends | Open rate | Uplift |
Variant 1 | Phrasee | 0 | 0 | | |
Variant 2 - champion | Phrasee | 1000 | 1800 | 55.6% | 11.1% |
Variant 3 | Phrasee | 0 | 0 | | |
Variant 4 | Human | 100 | 200 | 50.0% | |
Daily Total | | 1100 | 2000 | | |
For this batch, all the previous uplifts have been analyzed and the proportion of audience being sent each line adjusted accordingly. 1100 opens were achieved. If the human line had been sent to the full 2000 audience, with the 50% open rate it achieved today, it would have generated 1000 opens. Therefore, 100 incremental opens were generated this day.
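The incremental-open figures for the three batches can be checked with a few lines of arithmetic. This is only a sketch using the daily totals and human-control results from the tables; the dictionary keys and the `incremental_opens` helper are names invented for this example.

```python
# Daily totals and human-control results taken from the three tables above.
days = [
    {"total_opens": 1550, "total_sends": 4000, "human_opens": 400, "human_sends": 1000},
    {"total_opens": 1690, "total_sends": 8000, "human_opens": 160, "human_sends": 800},
    {"total_opens": 1100, "total_sends": 2000, "human_opens": 100, "human_sends": 200},
]


def incremental_opens(day):
    # Opens the human line would have earned across the whole audience that day.
    baseline = day["total_sends"] * day["human_opens"] / day["human_sends"]
    return day["total_opens"] - baseline


daily = [incremental_opens(day) for day in days]
print(daily)       # [-50.0, 90.0, 100.0]
print(sum(daily))  # 140.0
```

Note that each day's baseline uses the human open rate achieved on that same day, not a rate pooled across days.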
Overall performance
Human open rate | 33.0% |
Champion open rate (unadjusted) | 31.3% |
Champion uplift (Variant 2) | 10.65% |
Champion open rate (adjusted) | 36.5% |
Summarizing the overall performance, we find that:
In total, the experiment achieved -50 + 90 + 100 = 140 incremental opens.
Overall, the human line achieved 660 opens from 2000 sends, which is an open rate of 660/2000 = 33.0%.
The best performing variant (#2) achieved 2750 opens from 8800 sends, which is an open rate of 2750/8800 = 31.3%.
There is an interesting paradox here: Variant 2 got a higher open rate than the human control every day. However, the total open rate for Variant 2 is less than the human control. Why is this? It is because the open rates and send volumes vary from day to day. Variant 2 got a high volume of sends on a “bad” day (Day 2). This brought its overall average down. This illustrates the risk of using metrics aggregated over time.
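The paradox can be verified directly from the table data. The following sketch compares Variant 2 with the human control batch by batch and then after pooling all opens and sends; the variable and function names are illustrative.

```python
# (opens, sends) per day, from the tables above.
variant2 = [(450, 1000), (1300, 6000), (1000, 1800)]
human = [(400, 1000), (160, 800), (100, 200)]


def rate(opens, sends):
    return opens / sends


def pooled_rate(batches):
    """Open rate after summing opens and sends across all batches."""
    return sum(o for o, _ in batches) / sum(s for _, s in batches)


# Variant 2 beats the human control in every individual batch...
wins_every_day = all(rate(*v) > rate(*h) for v, h in zip(variant2, human))

# ...yet it has the lower rate once the batches are pooled.
print(wins_every_day)                   # True
print(f"{pooled_rate(variant2):.2%}")   # 31.25%
print(f"{pooled_rate(human):.2%}")      # 33.00%
```

The pooled rate for Variant 2 is dominated by its 6000 sends on Day 2, when every line had a low open rate.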
To correct for this problem, Phrasee looks at batch-wise uplifts. Using the unrounded daily uplifts for Variant 2 (shown rounded in the tables above), the average uplift is (12.5 + 8.33 + 11.11) / 3 = 10.65%.
To represent this, Phrasee presents a time-adjusted open rate by applying this average uplift to the human line performance: 33.0% + (10.65% * 33.0%) = 36.5%
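The time-adjusted rate in this example can be reproduced as follows. This sketch assumes a simple unweighted mean of the batch uplifts, which matches the arithmetic shown in this article; whether Phrasee applies any additional weighting in production is not stated here. All names are illustrative.

```python
# (opens, sends) per day, from the tables above.
variant2 = [(450, 1000), (1300, 6000), (1000, 1800)]
human = [(400, 1000), (160, 800), (100, 200)]

# Relative uplift of the champion over the human control in each batch.
batch_uplifts = [
    (v_opens / v_sends) / (h_opens / h_sends) - 1
    for (v_opens, v_sends), (h_opens, h_sends) in zip(variant2, human)
]

# Assumption: unweighted average across batches, as in the worked example.
mean_uplift = sum(batch_uplifts) / len(batch_uplifts)

# Pooled human open rate across all batches (660 / 2000).
human_rate = sum(o for o, _ in human) / sum(s for _, s in human)

# Apply the average uplift to the human line's overall performance.
adjusted_rate = human_rate * (1 + mean_uplift)

print(f"{mean_uplift:.2%}")    # 10.65%
print(f"{adjusted_rate:.1%}")  # 36.5%
```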
Last reviewed: April 18, 2024