As an agency committed to optimising and making decisions for our clients based on data, the experiments tool is one of our favourites. Google provides a tool which lets us take an existing campaign and duplicate it as a ‘draft’. We can then make changes to the draft and split traffic between the draft and the original, reviewing performance in real time through the reporting interface. After the experiment has run its course, the advertiser can either reject or apply the experiment with one click: rejecting reverts the campaign to how it was, while applying rolls the changes into the original campaign (or creates a new second campaign).
What We Tested
With that out of the way, let’s jump into the experiments we performed. We ran 93 experiments in total over the course of 12 months. Most of the experiments used CPA as the success metric, where the goal was to achieve a significantly lower CPA. Some of the ad copy experiments used CTR as the success metric. The experiments covered:
- Landing Pages: changes to landing page design & copy
- Ad Copy: changing key messages in the ad copy headlines
- Campaign Structure: adjusting keyword groupings within Ad Groups
- Locations: adjusting investment in different locations
- Bid Strategies: testing manual bid changes & new auto-bidding strategies
- Audiences: adjusting bidding based on audience demographics
- Devices: adjusting bidding based on devices
Results Overview
Of the 93 experiments we ran, 51% were successful and 49% were not. By success, we mean the result was both statistically significant and showed a positive effect on performance.
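Calling an experiment “successful” rests on a significance test between the control and draft arms. The article doesn’t specify the exact method used, so as a rough sketch (with hypothetical click and conversion counts), a two-proportion z-test on conversion rates might look like this:

```python
import math

def two_proportion_z(conv_a, clicks_a, conv_b, clicks_b):
    """Two-proportion z-test: is arm B's conversion rate
    significantly different from arm A's?"""
    p_a = conv_a / clicks_a
    p_b = conv_b / clicks_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (clicks_a + clicks_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / clicks_a + 1 / clicks_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: 50/1000 conversions in the original,
# 72/1000 in the draft
z, p = two_proportion_z(50, 1000, 72, 1000)
print(round(z, 2), round(p, 4))  # significant at the 5% level
```

The same logic applies whether the metric is conversion rate, CTR, or (with a different test) CPA; the point is that a draft only “wins” if the difference is unlikely to be noise.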
The results are shown below:
| Experiment | Success | Failure |
| --- | --- | --- |
| Landing Pages | 45% (13) | 55% (16) |
| Ad Copy | 47% (7) | 53% (8) |
| Structure | 50% (2) | 50% (2) |
| Location | 100% (1) | 0% (0) |
| Bid Strategy | 53% (18) | 47% (16) |
| Audience | 63% (5) | 38% (3) |
| Devices | 100% (1) | 0% (0) |
| TOTAL | 51% | 49% |
Note that ‘Location’ and ‘Devices’ each had only one experiment, so the sample is far too small to draw conclusions from.
In general, the most successful experiments came from testing changes to audience demographics and bid strategies.
We ran many experiments on landing page copy and ad copy changes, but often found that our changes performed worse than the original.
Digging Deeper
When it came to audience demographics, most of the experiments focused on adjusting bids for age groups and income levels. Before running an experiment we would review the audience insights and form a hypothesis about which audience to bid down on or exclude. We would then run the experiment. In most cases our hypothesis was proven correct.
For the bid strategy testing we ran a mix of experiments, with decreasing CPA as the success metric in all cases. Many of the successful experiments were manual bid changes, such as decreasing bids by 20% across the whole account. This is intuitive: lower bids mean cheaper clicks, which lowers CPA as long as conversion rates hold up.
When it came to testing auto-bidding strategies against manual bidding, the results were less impressive. We found that auto-bidding strategies such as Target CPA outperformed manual bidding only approximately 25% of the time; the machine failed to achieve a better CPA in the other three quarters of cases. CPA targets were set based on past performance.
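A natural follow-up question is whether a 25% win rate is meaningfully worse than a coin flip, or could just be small-sample noise. As a sketch (the exact number of auto-bidding head-to-heads isn’t broken out in the article, so the 5-of-20 figure below is hypothetical), a one-sided binomial test answers this:

```python
from math import comb

def binom_test_leq(k, n, p=0.5):
    """One-sided exact binomial test: P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical: auto-bidding won 5 of 20 head-to-head tests (25%)
p_value = binom_test_leq(5, 20)
print(round(p_value, 4))  # below 0.05, so unlikely to be a fair coin
```

With 20 trials, winning only 5 would be significant at the 5% level; with far fewer trials, a 25% win rate would not distinguish auto-bidding from chance.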
When testing ad copy changes we found the results were often very close. For example, responsive ads achieved a better result in some campaigns but not in others; the split was roughly 50/50.
When it came to testing landing pages, we ran quite a few different variations. A few themes stood out:
- Removing the menu bar from the top of the landing page consistently increased the conversion rate.
- Taking pricing off the landing page increased lead volume, but obviously might generate lower-quality leads.
- Presenting data in tables rather than text made no significant difference.