by: Zank, CEO of Bennett Data Science
Star ratings were created to indicate how much we like a product or service. They actually do a rather poor job of it.
If a product gets a lot of five-star reviews, it must be good, or so the story goes. Let’s peer into star ratings and look at a few caveats of these little golden constellations.
Star ratings are given explicitly by users to rate a product or service as good or bad. This is in contrast to implicit feedback, which is collected from user behavior, such as watching only the first five minutes of a movie or binge-watching an entire season.
With explicit feedback, the user is in control. And that’s part of the problem.
When we explicitly rate products, we bring a lot of bias. Let’s look at hotels in Tulum, Mexico, for example. There is currently a lot of seaweed in front of many of the beachfront resorts. Naturally, guests rate these hotels lower, since in some cases the beaches are inaccessible and the smell can be quite bad. But that’s not the fault of the hotel. And the hotels a block back from the beach, which serve similar clientele, don’t suffer these lower ratings.
In other words, this seaweed is causing a big ratings dip for the beachfront hotels, through no fault of their own, while the hotels a block back (that advertise the same beaches) are unaffected. Star ratings are not painting an accurate picture of the service, restaurants, cleanliness, etc.
What about the one-star Amazon reviews for products that came a day later than expected or were delivered to the neighbor’s house? These one-star reviews have nothing to do with the actual products, but rather the fulfillment. And as some merchants are quick to remind me on those little paper slips they send with their wares, “Your reviews keep our business alive.”
There’s another issue with explicit feedback: we can’t rank an extraordinary product above one with no flaws. Let me explain. The way the five-star system exists today, we assume everything starts with a five-star review. For example, if I rate my Lyft driver three stars or below, I won’t be matched to that driver again. Even a four-star rating is considered bad. So how can I differentiate an incredible ride from a ride that simply gave me no reason to complain? The answer is that I can’t.
So, as users, we’re forced to suffer through paragraphs of user comments, looking for the reasons people liked or disliked that beachfront hotel or that new vacuum cleaner. But alas, a lot of those reviews are paid for. So what to do?
Implicit feedback is a very powerful way to measure how much customers like your products or services. But it only works when you can measure the “like”. Movies are a perfect example, and Netflix smartly moved from the golden stars to a thumbs up/down scoring system. They get the majority of their information from how much and how often we watch.
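To make the idea of measuring the “like” concrete, here is a minimal sketch of how viewing behavior might be turned into a score. Everything in it is an assumption for illustration: the `WatchEvent` record, its field names, and the choice to average the fraction of each title that was watched. It is not how Netflix or any other service actually computes its signals.

```python
from dataclasses import dataclass


@dataclass
class WatchEvent:
    """One viewing session (hypothetical record, for illustration only)."""
    user: str
    title: str
    minutes_watched: float
    runtime_minutes: float


def completion_ratio(event: WatchEvent) -> float:
    """Fraction of the title that was watched, capped at 1.0."""
    if event.runtime_minutes <= 0:
        return 0.0
    return min(event.minutes_watched / event.runtime_minutes, 1.0)


def implicit_scores(events: list[WatchEvent]) -> dict[str, float]:
    """Average completion ratio per title, used as a rough 'like' score."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for event in events:
        totals[event.title] = totals.get(event.title, 0.0) + completion_ratio(event)
        counts[event.title] = counts.get(event.title, 0) + 1
    return {title: totals[title] / counts[title] for title in totals}


events = [
    WatchEvent("ana", "Movie A", 5, 100),    # gave up after five minutes
    WatchEvent("ben", "Movie A", 100, 100),  # watched to the end
    WatchEvent("ana", "Show B", 120, 100),   # rewatched parts; capped at 1.0
]
scores = implicit_scores(events)
```

With these made-up sessions, “Movie A” lands near the middle of the scale and “Show B” at the top, without anyone having clicked a single star. A real system would likely also weigh how often people return, which this sketch ignores.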
On the other hand, clothing stores have no such feedback. They sell a shirt and never know how much it gets worn. So they rely on reviews, with all the warts that come with them. For better or worse, there are companies springing up to help alleviate this problem by embedding sensors into garments. But the question of how much a consumer likes a garment is critical to manufacturers and will not go away any time soon.
It’s worth paying attention to how user ratings are used and what they really mean in terms of actual satisfaction. Often, they’re covering up a complex underlying story.