Drawing up personas is standard practice in marketing. These are the theoretical profiles of customers who purchase a service or product—a small business owner, a working mother, a corporate team leader, an athleisure enthusiast in her 30s, or perhaps a Gen Z young professional.
This project does not conjure compelling characters. Instead, it uses data on purchasing behavior to group the different kinds of customers. We can then inspect each group’s characteristics, use these insights to refine ad targeting, and leave it to lookalike audiences to find new customers who are likely to be interested in the product.
For a health beverage brand in the US, our goal was to attract new customers and increase sales for the client. The result was a 93% increase in conversions.
The first step was to find out what types of people to target and how. We explored the customer data in the client’s Shopify account and plotted each customer by two purchasing behaviors: the total amount spent and the number of purchases made. We then used the K-Means algorithm to group customers according to their level of consumption.
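To make the grouping step concrete, here is a minimal sketch in plain Python. It is not the code used in the project: the order rows, the customer IDs, and the helper names (`standardize`, `assign`, `kmeans`) are all made up for illustration, the starting centroids are picked with a simple deterministic rule rather than k-means++, and the toy data only supports two groups instead of the four found in the real data. In practice a library implementation such as scikit-learn’s `KMeans` would be the usual choice.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical order rows (customer_id, order_total); a real export has more fields.
orders = [
    ("c1", 30.0), ("c2", 35.0), ("c3", 25.0),
    ("c4", 60.0), ("c4", 60.0), ("c4", 60.0), ("c4", 60.0),
    ("c5", 50.0), ("c5", 50.0), ("c5", 50.0), ("c5", 50.0), ("c5", 50.0),
    ("c6", 65.0), ("c6", 65.0), ("c6", 65.0), ("c6", 65.0),
]

# One point per customer: (total spent, number of purchases).
totals = defaultdict(lambda: [0.0, 0])
for customer_id, amount in orders:
    totals[customer_id][0] += amount
    totals[customer_id][1] += 1
customers = sorted(totals)
raw = [tuple(totals[c]) for c in customers]


def standardize(points):
    """Rescale each feature to mean 0 and standard deviation 1,
    so that dollars do not dominate purchase counts."""
    columns = list(zip(*points))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a simple deterministic start (not k-means++)."""
    n = len(points)
    ordered = sorted(points)  # sorted by the first feature (spend)
    # Start from the midpoint of each of k equal slices of the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(mean(col) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return assign(points, centroids), centroids


K = 2  # the toy data only supports two groups; the real data showed four
labels, centroids = kmeans(standardize(raw), K)
labels_by_customer = dict(zip(customers, labels))

for customer_id, (spent, n_orders) in zip(customers, raw):
    print(customer_id, spent, n_orders, "cluster", labels_by_customer[customer_id])
```

Standardizing first is a design choice in this sketch, not something the project is known to have done: without it, the spend axis (tens to hundreds of dollars) would outweigh the purchase count (single digits) in the distance calculation.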

Excluding one outlier who made more than 25 purchases, the data showed four main groups of customers, which I categorized from low to high purchasers: low, mid, mid-high, and high.

The next step was to differentiate and take a closer look at the behavior of each cluster.
Who were the customers with the highest probable lifetime value?
What were their preferences in terms of products and packages?
What enticed a new customer to try the brand for the first time?
Averages, represented by a single number, don’t always show the real picture of a population. Instead, let’s take a look at the spread of each group. We’ll start from the bottom cluster and work our way to the top.
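As a small illustration of why the spread matters, the sketch below summarizes one group with its minimum, quartiles, and maximum next to its mean. The spend figures are invented, and `spread` is a hypothetical helper, not part of the original analysis.

```python
from statistics import mean, quantiles

# Made-up total spend per customer for one cluster; one big spender skews the mean.
cluster_spend = [20, 25, 30, 35, 190]


def spread(values):
    """Five-number summary plus the mean for one group of customers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
    }


summary = spread(cluster_spend)
print(summary)
```

Here the mean is twice the median, so quoting the mean alone would overstate what a typical customer in this group spends.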

Customers in the low bracket were most probably enticed by the brand’s initial special offers with no real intention of incorporating the products into their regular lives. Don’t we all like to try fancy smoothies and health products from time to time? These people account for 85% of the customers.
As we go from low to mid-consumption, there is a huge drop in group size. The middle cluster is roughly one-eighth the size of the low cluster (162 vs. 1,255 customers), but each mid customer spends 4 to 12 times more in total than a low customer.
The mid-high and high clusters are the most loyal, making recurring purchases of the brand’s health drinks. These are people who are serious about investing in their bodies and have the means to keep buying products that they like. In terms of customer lifetime value, this is the ideal target market.
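The group sizes quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the customer counts and the 85% share come from the text, while the revenue comparison assumes, purely for illustration, that every mid customer spends a fixed multiple of a typical low customer.

```python
low_customers = 1255
mid_customers = 162

# How many low customers there are for each mid customer.
size_ratio = low_customers / mid_customers

# If the low cluster is 85% of all customers, this is the implied customer base.
implied_total = low_customers / 0.85

# Mid-cluster revenue relative to low-cluster revenue, under the assumption
# that each mid customer spends exactly `multiple` times a typical low customer.
relative_revenue = {
    multiple: mid_customers * multiple / low_customers for multiple in (4, 12)
}

print(round(size_ratio, 1))
print(round(implied_total))
print({m: round(r, 2) for m, r in relative_revenue.items()})
```

Under that assumption, the much smaller mid cluster brings in somewhere between about half and about one and a half times the revenue of the entire low cluster.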

Now let’s look at what products each cluster likes to purchase. Product names describe both the flavor of the drink and the type of package, from long-term subscriptions to the brand’s sample pack.

(In hindsight, I could have been more descriptive about the SKU labels without compromising privileged information. But do notice how the top products for the higher brackets, arranged alphabetically, appear in mostly reversed order in the lower brackets. The notable exception is SKU B, which is popular across all customer brackets.)
The brand’s most popular flavor was a favorite regardless of purchasing class. The bigger difference was in terms of the package or subscription plan. High spenders signed up for auto-renewing subscriptions (like SKU A), while low spenders were merely sampling the brand’s products through the variety sample pack (SKU H).
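A product ranking like the one described above can be produced by counting purchases per SKU within each cluster. The sketch below assumes each purchased line item has already been tagged with its customer’s cluster; the rows are invented and only loosely echo the pattern in the text.

```python
from collections import Counter, defaultdict

# Made-up line items: (customer cluster, SKU purchased).
line_items = [
    ("high", "SKU A"), ("high", "SKU A"), ("high", "SKU A"),
    ("high", "SKU B"), ("high", "SKU B"),
    ("low", "SKU H"), ("low", "SKU H"), ("low", "SKU H"),
    ("low", "SKU B"), ("low", "SKU B"),
    ("low", "SKU A"),
]

# Count how often each SKU was bought within each cluster.
sku_counts = defaultdict(Counter)
for cluster, sku in line_items:
    sku_counts[cluster][sku] += 1

# Keep the two best sellers per cluster, most popular first.
top_products = {
    cluster: [sku for sku, _ in counts.most_common(2)]
    for cluster, counts in sku_counts.items()
}
print(top_products)
```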
From this analysis, the next actions were simple. We used the e-mail addresses of the higher brackets as seeds to target lookalike audiences for further acquisition through social media ads. For brand awareness, the variety sample pack (SKU H) would be a low-investment entry-point for new leads. SKU B was popular across all brackets, from single purchasers to customers with subscriptions, and could be a great item to attract new customers. We then periodically re-targeted non-subscribers for re-purchases or auto-renewing plans.
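Building the seed list for a lookalike audience comes down to filtering customers by cluster and cleaning up their e-mail addresses. The sketch below is an assumption about how that might look, not the project’s actual export: the records and field names are hypothetical, and the SHA-256 hashing step reflects a common requirement for uploaded customer lists that should be checked against the ad platform’s own documentation.

```python
import hashlib

# Hypothetical customer records with their assigned cluster.
customer_records = [
    {"email": " Ana@Example.com ", "cluster": "high"},
    {"email": "ben@example.com", "cluster": "low"},
    {"email": "ana@example.com", "cluster": "high"},  # duplicate once normalized
    {"email": "cy@example.com", "cluster": "mid-high"},
]

SEED_CLUSTERS = {"mid-high", "high"}  # the higher brackets used as seeds


def normalize(email):
    """Trim whitespace and lowercase so duplicates collapse to one entry."""
    return email.strip().lower()


# A set removes duplicates; sorting keeps the output stable between runs.
seed_emails = sorted(
    {normalize(r["email"]) for r in customer_records if r["cluster"] in SEED_CLUSTERS}
)

# Assumed upload format: one SHA-256 hex digest per normalized address.
hashed_seeds = [hashlib.sha256(e.encode("utf-8")).hexdigest() for e in seed_emails]

print(seed_emails)
```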
The analysis was done using Python to clean and crunch the numbers, as well as derive other customer aspects that weren’t in the data, such as guessing gender from first names (more than 80% of the customer base was found to be female). For visualization, I used Tableau.
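The post does not say how gender was guessed from first names, so the following is only one possible approach: a lookup against a name table, with everything else left as unknown. The three-entry table and the `guess_gender` helper are hypothetical; a real analysis would need a much larger reference list, and any name-based guess remains approximate.

```python
# Tiny hypothetical lookup table; a real one would hold thousands of names.
NAME_TO_GENDER = {"maria": "female", "jessica": "female", "james": "male"}


def guess_gender(full_name):
    """Guess gender from the first word of a name, or return 'unknown'."""
    parts = full_name.strip().split()
    if not parts:
        return "unknown"
    return NAME_TO_GENDER.get(parts[0].lower(), "unknown")


names = ["Maria Lopez", "JESSICA T.", "James Kim", "Sam Lee", ""]
guesses = [guess_gender(name) for name in names]

# Share of female customers among those whose gender could be guessed.
known = [g for g in guesses if g != "unknown"]
female_share = known.count("female") / len(known)

print(guesses)
print(round(female_share, 2))
```

Reporting the share only over customers with a confident guess, as done here, avoids counting ambiguous or missing names as either gender.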