Popularity has become its own justification.
– Jonathan Franzen
by: Zank, CEO of Bennett Data Science
We all know that popular items sell well, because, well, they’re popular! But there’s a downside to recommending the most popular products or content to your users: it skews the data you collect, and every later recommender gets trained and graded on that skewed data. That downside is called ‘popularity bias’. This article explains what popularity bias is and why it’s an important factor to consider when building product personalization. If ignored, it can creep in and undermine your best personalization efforts for years!
I’ll show you why popularity is a good place to start, then identify the problem it creates (the bias) and show you how to avoid most of the resulting mess.
Start with Popularity
When you don’t have much data on your users/customers or items/products, suggesting popular items is a great place to start. Popular items are usually popular for a good reason. This shouldn’t be confused with personalization. It’s really simple stuff: count the number of times each item or product has been purchased in the past. The one with the highest count wins. You might show the top five most popular products to a user in an email, for example. But it’s not personalized, because everyone gets the same email.
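The counting described above fits in a few lines. Here is a minimal sketch, assuming the purchase log is a list of (user_id, item_id) pairs; the function name `top_popular` and the sample data are made up for illustration.

```python
from collections import Counter


def top_popular(purchases, n=5):
    """Return the n most-purchased item IDs, most popular first."""
    # Count how many times each item appears in the purchase log.
    counts = Counter(item for _user, item in purchases)
    return [item for item, _count in counts.most_common(n)]


# Hypothetical purchase log of (user_id, item_id) pairs.
purchases = [
    ("u1", "socks"), ("u2", "socks"), ("u3", "socks"),
    ("u1", "hat"), ("u2", "hat"),
    ("u3", "scarf"),
]

print(top_popular(purchases, n=2))  # prints ['socks', 'hat']
```

Note that the function never looks at the user, which is exactly why every user ends up with the same list.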
And this works. It’s so much better than hand curation (because it scales) and random selection (because that’s just horrible). But eventually (hopefully!) you’ll have collected more data about your users and their consumption patterns and you’ll be able to deliver real one-to-one personalization.
Watch out for Popularity Bias
After recommending popular items for a while, you’ll have a big pile of data that shows…drumroll…your customers like popular products (well, yeah, because that’s all you’ve been showing them!). Popularity paints a color across all of your data, and it’s very difficult to wash off. Imagine looking at the purchase or consumption history of one of your customers. What would you see? Well, if your website or storefront organizes products by what’s most popular, most users will buy those products. So your database will be full of popular items. This begets, you guessed it, even more of the same, as the popular items become even more popular. It’s a vicious cycle that can be really tough to get out of!
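To make the cycle concrete, here is a toy simulation. It is only a sketch under a deliberately extreme assumption: buyers can purchase nothing except the items currently shown. The item names, starting counts, and the functions `simulate_feedback` and `top_share` are all hypothetical.

```python
def simulate_feedback(counts, shown=2, buyers_per_round=10, rounds=5):
    """Each round, show only the current top `shown` items.

    Assumption (extreme on purpose): every buyer purchases one of the
    shown items, and buyers are split evenly across them.
    """
    counts = dict(counts)  # work on a copy, leave the input untouched
    for _ in range(rounds):
        top = sorted(counts, key=counts.get, reverse=True)[:shown]
        for i in range(buyers_per_round):
            counts[top[i % shown]] += 1
    return counts


def top_share(counts, shown=2):
    """Fraction of all purchases that belong to the top `shown` items."""
    top = sorted(counts.values(), reverse=True)[:shown]
    return sum(top) / sum(counts.values())


# Hypothetical starting counts: four items with nearly equal sales.
start = {"A": 6, "B": 5, "C": 4, "D": 4}
after = simulate_feedback(start)

print(after)  # prints {'A': 31, 'B': 30, 'C': 4, 'D': 4}
print(round(top_share(start), 2), round(top_share(after), 2))  # prints 0.58 0.88
```

Items C and D started almost level with A and B, yet after five rounds the data says nobody wants them. The log records what was shown, not what users would have chosen.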
Moving Away from Popularity
Later, when your analytics team wants to transition to ‘smarter’ ways of achieving personalization, the new techniques end up graded against historic (popular!) data. Remember, your database is full of popular items. So a new product recommender will have to beat popularity. But how can it, when all the historic data is biased towards those popular items?
The problem is, the new algorithms probably won’t recommend only the most popular products. But grading them against the only yardstick we have, popularity data, makes them appear less effective than they really are! That’s popularity bias, and it makes it seem like nothing is (much) better than popularity.
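A small offline evaluation shows how this plays out. This is an illustrative sketch, not a real benchmark: the held-out purchases, both recommenders, and the `hit_rate` metric (the share of users whose logged purchase appears in their recommendations) are assumptions chosen to make the effect visible.

```python
def hit_rate(recommend, held_out):
    """Fraction of users whose held-out purchase is in their recommendations."""
    hits = sum(1 for user, item in held_out.items() if item in recommend(user))
    return hits / len(held_out)


# Hypothetical held-out purchases, logged while the store showed
# every user only the two most popular items: "socks" and "hat".
held_out = {"u1": "socks", "u2": "hat", "u3": "socks", "u4": "hat"}


def popular(user):
    """Baseline: the same two popular items for everyone."""
    return ["socks", "hat"]


# Hypothetical output of a personalized model.
personalized_recs = {
    "u1": ["socks", "boots"],
    "u2": ["gloves", "scarf"],
    "u3": ["boots", "belt"],
    "u4": ["hat", "scarf"],
}


def personalized(user):
    return personalized_recs[user]


print(hit_rate(popular, held_out))       # prints 1.0
print(hit_rate(personalized, held_out))  # prints 0.5
```

The personalized model scores half as well, but only because the log cannot reward items users never saw. Whether u3 would have bought boots is unknowable from this data.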
Two Solutions to Popularity Bias
1. Avoid global popularity as much as possible from the beginning
2. Reduce popularity slowly until its effects are largely gone
ONE – Avoiding Popularity Bias from the Beginning
There are a few ways to mitigate popularity bias from the early stages, and they’re straightforward to implement. I’ll share my favorite: segmentation. For a fictitious example, let’s assume you have a big group of young users who are 18 to 25 years old and another group that is over 40 (wouldn’t that be nice?!). The segmentation technique calls for building two lists of popular items, one for each age group. This gets you closer to pure personalization while remaining trivial to implement. It’s possible to segment a few more times and end up with several lists of popular products. As long as the groups have different consumption patterns, this will reduce popularity bias. As a wonderful side effect, it may also lead to much better (more relevant) recommendations!
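Segmented popularity is the same counting exercise, done once per group. Here is a minimal sketch, assuming each user already has a segment label; the `segment_of` mapping, the segment names, and the purchase data are hypothetical.

```python
from collections import Counter, defaultdict


def top_popular_by_segment(purchases, segment_of, n=5):
    """Return {segment: [top n item IDs]} with one popularity list per segment."""
    counts = defaultdict(Counter)
    for user, item in purchases:
        counts[segment_of[user]][item] += 1
    return {
        segment: [item for item, _count in counter.most_common(n)]
        for segment, counter in counts.items()
    }


# Hypothetical segment labels and purchase log.
segment_of = {"u1": "18-25", "u2": "18-25", "u3": "over-40", "u4": "over-40"}
purchases = [
    ("u1", "sneakers"), ("u2", "sneakers"), ("u1", "hoodie"),
    ("u3", "loafers"), ("u4", "loafers"), ("u4", "sneakers"),
]

print(top_popular_by_segment(purchases, segment_of, n=2))
# prints {'18-25': ['sneakers', 'hoodie'], 'over-40': ['loafers', 'sneakers']}
```

A single global list would put sneakers first for everyone. The per-segment lists surface loafers for the over-40 group, which is the different consumption pattern the technique relies on.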
TWO – Reduce Popularity Bias Slowly
The road to reduced popularity bias and increased personalization is paved with a healthy mix of intuition and rigorous testing. Since we can’t fairly assess new personalization techniques on historic (biased) data, we have to rely on intuition to choose which candidates are worth testing live. This is why subject matter expertise is so important to our field. Through good intuition and rigorous online A/B testing, it’s possible to deploy smarter models that push out the old popularity data. In other words, if a new model shows better A/B testing results, users will start seeing new recommendations from that model. Then, over time, the old popularity data will be used less and less to train the new models.
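One common way to read an A/B test result is a two-proportion z-test on conversion rates. The sketch below uses only the standard library. The traffic and conversion numbers are invented, and the choice of this particular test is an assumption for illustration, not something the approach above prescribes.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the conversion-rate difference B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: A = popularity baseline, B = new recommender.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers (4.0% versus 5.2% conversion), the difference is unlikely to be noise, so the new model would earn more traffic. Because the test runs on live behavior, it is not tied to the biased historic log.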
Of course, the logical question to ask is, “Doesn’t the new model bias the data too?” Yes, of course it does, but in a different way that hopefully provides more diversity and personalization. This gets into a few topics for another post, such as recommender diversity or serendipity.
Popular items sell well because they’re popular. But if that’s all we use for recommendations, we can end up in a vicious cycle of selling popular but hardly personalized products to customers. And that doesn’t serve anyone well!
Through a careful consideration of techniques, we can avoid this cycle and achieve much more satisfying and personalized recommendations for our users.
Big thanks to Vanessa from Daizy who kindly motivated me to write this article.
Zank Bennett is CEO of Bennett Data Science, a group that works with companies from early-stage startups to the Fortune 500. BDS specializes in working with large volumes of data to solve complex business problems, finding novel ways for companies to grow their products and revenue using data, and maximizing the effectiveness of existing data science personnel. https://bennettdatascience.com