Machine learning opens up a whole arena of new strategies for digital marketers. As an agency that has this capability in-house, we are always looking for ways to utilise these new tools. They give us an edge and help us push our clients' performance to new heights. In the following example we used machine learning tools to improve our client's quotation process, Google Ads performance and, in turn, ROAS.
Our client’s objective was to uncover which customer attributes generate the highest revenue. For each customer interaction we looked at all the data inputs we had (feature inputs, or x-values) and asked whether we could determine which combination of features led to the highest revenue (output, or y-value).
Workflow
The first thing we did was work out the average sale value over the last 12 months, which turned out to be $200. We then changed our y-value field into a binary metric: any sale of $200 or more was classified as 1, and any sale below $200 was classified as 0. This turned a regression problem into a classification problem.
In the next step, we split the data into a training set and a test set. We had ~140,000 rows of sales data, where each row represents an individual sale. 80% of this data was used to train the model and the remaining 20% was used to test it. In the training set we feed all the feature inputs to the model, including the y-values (the sales data, i.e. the answers). The model iterates through each row in the data set and then evaluates its own predictions against the actual answers. For more detail on how this process works and the different models we can use, see an earlier article I wrote about fitting a random forest model to Google Ads data.
We perform this training process multiple times for different models. The screenshot below shows one of the decision trees we trained and how it builds different paths to make decisions.
Results accuracy
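To make the training steps above concrete before looking at the numbers, here is a minimal sketch in plain Python. It is not the pipeline or the model we actually used: the data is synthetic, the feature name `items_in_quote` is hypothetical, and a single-split decision tree (a "decision stump") stands in for the full decision trees and random forests. Only the $200 threshold and the 80/20 split come from the workflow described above.

```python
import random

THRESHOLD = 200.0  # average sale value from the article


def binarise(sale_value, threshold=THRESHOLD):
    """Return 1 for a sale at or above the threshold, otherwise 0."""
    return 1 if sale_value >= threshold else 0


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(rows, feature):
    """Fit a one-split decision tree on a single numeric feature.

    Tries every observed value as a cut point and keeps the one whose
    rule "predict 1 when feature >= cut" gets the most training labels right.
    """
    best_cut, best_correct = None, -1
    for cut in sorted({r[feature] for r in rows}):
        correct = sum((r[feature] >= cut) == bool(r["label"]) for r in rows)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def predict(cut, row, feature):
    """Apply the learned rule to one row."""
    return 1 if row[feature] >= cut else 0


# Synthetic stand-in for the sales data (assumption: one numeric feature).
rng = random.Random(42)
rows = []
for _ in range(1000):
    items = rng.randint(1, 10)
    sale_value = 40 * items + rng.uniform(-30, 30)
    rows.append({"items_in_quote": items, "label": binarise(sale_value)})

# 80% of rows train the model; the held-out 20% is only used for scoring.
train, test = train_test_split(rows)
cut = fit_stump(train, "items_in_quote")
accuracy = sum(predict(cut, r, "items_in_quote") == r["label"] for r in test) / len(test)
print(f"train={len(train)} test={len(test)} cut={cut} test accuracy={accuracy:.1%}")
```

A real decision tree repeats this "find the best cut" step recursively across many features, which is how it builds the different decision paths.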
Following this training period we evaluate the models to see how accurate they are on new, unseen data. We run the trained model on the test set, which contains 20% of the data, and we do not include the answers for this test set. After the model makes predictions on the test set, we compare these to the actual answers. With our best model, we were able to predict with 95% accuracy whether someone would be a high value or low value sale (1 or 0). To put this in context, if we had simply assumed everyone was a low value sale (0), we would have predicted correctly 71% of the time (71% accuracy). So 95% accuracy is far better than always predicting the most common class (the mode). Below we see these results in more detail.
- For the 1’s, we predicted 7,280 of the 8,283 correctly as 1’s (87.89% accurate)
- For the 0’s, we predicted 20,066 of the 20,481 correctly as 0’s (97.97% accurate)
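The percentages above can be reproduced from the four counts in the list. The short sketch below does that arithmetic. The counts are the ones reported above, the variable names are ours, and the false positive and false negative figures assume every test row received a prediction of either 0 or 1.

```python
# Counts reported above for the 20% test set.
high_total, high_correct = 8_283, 7_280   # actual 1's, and how many were predicted as 1
low_total, low_correct = 20_481, 20_066   # actual 0's, and how many were predicted as 0

total = high_total + low_total

# Overall accuracy: all correct predictions over all test rows.
accuracy = (high_correct + low_correct) / total

# Baseline: always predict the most common class (the mode).
baseline = max(high_total, low_total) / total

# Per-class accuracy (also known as recall for each class).
recall_high = high_correct / high_total
recall_low = low_correct / low_total

# Misclassified rows, assuming each row is predicted as exactly 0 or 1.
false_negatives = high_total - high_correct  # high value sales predicted as low
false_positives = low_total - low_correct    # low value sales predicted as high

print(f"test rows:          {total}")
print(f"overall accuracy:   {accuracy:.2%}")
print(f"mode baseline:      {baseline:.2%}")
print(f"accuracy on 1's:    {recall_high:.2%}")
print(f"accuracy on 0's:    {recall_low:.2%}")
print(f"missed high value:  {false_negatives}")
print(f"false high value:   {false_positives}")
```

Splitting the result by class matters here because the classes are imbalanced: a headline accuracy figure alone would hide the fact that the model is noticeably better at spotting low value sales than high value ones.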