
You’re Probably Ranking Your Products All Wrong

January 17, 2019

Here’s a very common scenario that surprisingly few companies get right. And the worst part? When you get it wrong, it really hurts user experience. This is something I’m very sensitive to. Let’s look at the problem, and I’ll show you a very quick and easy fix.

Say you have a product with 100 ratings. The ratings are thumbs up or thumbs down. Out of those 100 ratings, 80 were thumbs up and 20 were thumbs down.

Trick question: What’s the rating of the product?

Is it 80%? In other words, do 80% of your users like this product?

The answer, almost always, is that we just don’t know. Why not? Because we haven’t asked everyone yet. So that 80% is an indication of how well the product is liked, but there’s a vital piece of information missing…

Confidence.

Confidence is everything! Let’s see why by using another example. Say we have two products, A and B, and we need to rank them to show the best one at the top. (Of course if we only had two products, order wouldn’t much matter, but this extends easily to millions of products.)

Let’s say product A got 80 of 100 thumbs up. Let’s also say product B got 2 of 2 thumbs up. Which do you show first, A or B?

Well, B was liked 100% of the time, and A was liked only 80% of the time. But how confident are we in each? Of course, we’re not very confident at all, especially with B. So, what can we do?

I suggest we play it safe. Let’s assume the worst plausible case for A and B. Could A have an approval rating of 40%? Sure, but that’s not very probable, since it’s done well over quite a few ratings. Could B have an approval rating of 40%? Absolutely! And that wouldn’t be very shocking at all.

What we need is a way of estimating what the lowest possible score could be, given the number of ratings we have and how confident we want or need to be. Luckily for us, we can use something called the lower bound of Wilson score confidence interval for a Bernoulli parameter. Say that one time fast! Translated it means this: if we want a certain level of confidence that our rating is above some level, we can use our desired confidence along with the number of positive and total votes to receive the minimum rating we would expect to see. An example will make this very clear.
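The formula behind that long name is compact enough to implement directly. Here’s a minimal sketch of the Wilson lower bound, using z = 1.96, the standard normal value corresponding to 95% confidence (the function name is my own):

```python
import math

def wilson_lower_bound(pos, n, z=1.96):
    """Lower bound of the Wilson score interval.

    pos: number of positive (thumbs-up) ratings
    n:   total number of ratings
    z:   normal quantile; 1.96 gives a 95% confidence level
    """
    if n == 0:
        return 0.0  # no ratings yet, no evidence of quality
    phat = pos / n  # observed approval fraction
    denom = 1 + z * z / n
    center = phat + z * z / (2 * n)
    margin = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (center - margin) / denom

print(wilson_lower_bound(2, 2))     # ≈ 0.342
print(wilson_lower_bound(80, 100))  # ≈ 0.711
```

Notice how 2-of-2 collapses to about 34%, while 80-of-100 holds up at about 71%: more evidence means a tighter, higher lower bound.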

Example: What’s the lowest score we would see for 2 of 2 thumbs up, 95% of the time (where 95% is our confidence)?

Here’s how you get the answer in Python:

from statsmodels.stats.proportion import proportion_confint
print(proportion_confint(2, 2, alpha=0.05, method='wilson'))

The answer is 34.2%. That means that 95% of the time, the rating will end up being greater than 34.2%! Wow, that’s a huge difference from 100%. But now we have confidence in the value. And I recommend using that lower bound to rank your products.

Now let’s compare the confidence intervals for the 2/2 and 80/100 cases:

print(proportion_confint(2, 2, alpha=0.05, method='wilson'))
print(proportion_confint(80, 100, alpha=0.05, method='wilson'))

OUTPUT:
(0.342380227506653, 1.0)
(0.7111708344068411, 0.8666330666689676)

This rather cryptic output shows the lower and upper bounds for each case.
For B, we’re 95% confident the ‘real’ value is somewhere between 34.2% and 100%. And for A, we’re 95% confident the score is between 71.1% and 86.7%.

We don’t need to show these values to end users, but we can certainly use them to rank the products by their lower bounds, from highest to lowest.
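Putting it together, here’s one way the ranking might look, reusing the same proportion_confint call as above (the product list and helper name are invented for illustration):

```python
from statsmodels.stats.proportion import proportion_confint

# (name, thumbs up, total ratings) -- toy data for illustration
products = [("B", 2, 2), ("A", 80, 100)]

def wilson_lower(pos, total):
    # Lower bound of the 95% Wilson score interval
    lower, _upper = proportion_confint(pos, total, alpha=0.05, method="wilson")
    return lower

# Rank by lower bound, highest first: A (~0.711) beats B (~0.342)
ranked = sorted(products, key=lambda p: wilson_lower(p[1], p[2]), reverse=True)
print([name for name, _, _ in ranked])  # ['A', 'B']
```

Even though B has a perfect raw score, A ranks first, which is exactly the cautious behavior we wanted.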

I hope this helps you rank your products more effectively. I got the inspiration for this post years ago. Please look here for a wonderful explanation of this work and implementations in Excel and even SQL. A big tip of the hat goes to Evan Miller!

Zank Bennett is CEO of Bennett Data Science, a group that works with companies from early-stage startups to the Fortune 500. BDS specializes in working with large volumes of data to solve complex business problems, finding novel ways for companies to grow their products and revenue using data, and maximizing the effectiveness of existing data science personnel. bennettdatascience.com