In just over one week, I’ll be speaking at the TDWI conference here in San Diego. It’s more of a training event than it is an opportunity to present new work or white papers. And this really excites me. It gives me an opportunity to teach others some of the most important things I’ve learned in over 20 years as a data scientist.
I’ll be giving a talk directed to executives, called Preparing Your Company for Effective Data Science: A Proven Framework.
It’s a four-hour (!) presentation. As I’ve been running through the slides, one topic comes up over and over. It’s “Choosing the right objective”. Here’s why I think it’s so important.
Good objectives answer my favorite question, “What’s the point?” What does your company do that delights users and (hopefully) leads to revenue? The next most important question is, “Can you measure it?”
Objective: What’s the point?
Metric: How to measure impact?
Here’s a thought example: What’s Netflix’s objective?
Does Netflix want you to binge-watch the latest season of their new show? Sure they do, right?
Not so fast. What if all that sitting on your sofa makes you feel guilty about the screen time, and you stop watching Netflix for a few weeks? That probably wouldn’t be a good thing, especially if it happened over and over. Spikes in engagement like this could result in churn. And churn is hardly the right objective for Netflix.
In this case, it sounds like daily engagement might be a better objective. But how would Netflix define engagement? Is it time spent, or time and number of shows watched, or number of shows watched all the way through? Any of these is likely a better indicator of engagement than a one-time binge.
There are a lot of important subtleties to think about here. Objectives are as tough to define as they are essential to get right.
Once the objective is in place, it’s essential to choose the right metric to measure it.
For example, Netflix started with their five-star system for assessing how much we’d all like each of their shows and movies. Their objective was to show us the shows or movies we would rate the highest. Their metric was RMSE (root mean squared error), which measured how close their estimate was to our actual rating (if we left one). For their objective of estimating our star rating of a show or movie, it was an appropriate metric. (I’m ignoring that predicting our star ratings was a very poor objective in the first place!!)
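To make RMSE concrete, here’s a minimal sketch in Python. The ratings are made up for illustration and the `rmse` function is my own helper, not anything from Netflix. It only shows how the metric scores a set of predicted star ratings against the ratings people actually left.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data: predicted stars vs. the stars a viewer actually gave.
predicted_stars = [4.0, 3.0, 5.0, 2.0]
actual_stars = [5, 3, 4, 2]

score = rmse(predicted_stars, actual_stars)
print(f"RMSE: {score:.3f}")
```

Lower is better: an RMSE of 0 would mean every predicted rating matched the actual one. Because the errors are squared before averaging, a few big misses hurt the score more than many small ones.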
Some rules for setting good objectives:
- Start with something lofty (and likely unmeasurable) such as customer delight.
- Think about whether or not it can be measured. If not, find a proxy that can be measured (if it’s not possible to measure delight, it might be possible to measure time spent using the app or product).
- Get sign-off from the CEO. After all, it’s very important: this is what the company will be optimizing for.
- Choose a metric appropriate to measure the objective and use it to assess the machine learning models.
Here’s a recent blog post on that topic.
Travel Recommendations with TRECS:
Last week we presented our new work on TRECS. We received a lot of positive feedback on the updates to our interface and methodology. Recommended destinations are now more relevant and give the user a lot more control over what’s shown. Thanks to everyone who took the time to kick the tires and play around with the tens of thousands of destinations around the world. And special thanks to those who hit reply and told us what they like and dislike (you’re the BEST!).
Just for fun:
Take a second and compute the answer to this seemingly trivial equation: 8 ÷ 2(2+2) = ? What did you get? 1? 16? Let’s see how we might arrive at either answer. Start with parentheses: 2+2 = 4. Then multiply that by 2 to get 8. Then 8/8 = 1. But that’s moving right to left.
What if we handle the parentheses first, then move left to right? Starting with parentheses again, we have 8 ÷ 2(4). Now from left to right it’s: 8 ÷ 2 = 4 and 4(4) = 16. That feels strange, doesn’t it? What’s the “correct” way to do this and why/how is it so confusing? That’s the topic of this article. Have fun!
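If you’d like to check both readings yourself, here’s a small Python sketch. Python has no implied multiplication, so each interpretation has to be written out with an explicit `*` and parentheses. The variable names are just mine for illustration.

```python
# Reading 1: parentheses first, then division and multiplication left to right.
left_to_right = 8 / 2 * (2 + 2)

# Reading 2: treat 2(2+2) as a single unit that gets multiplied out before dividing.
implied_first = 8 / (2 * (2 + 2))

print(left_to_right)   # 16.0
print(implied_first)   # 1.0
```

Once the grouping is spelled out, the computer has no trouble. The confusion comes entirely from how we read the original notation.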
Lost in translation:
A recent McKinsey report proclaims: “Hire as many data scientists as you can find—you’ll still be lost without translators to connect analytics with real business value”. That’s a very interesting and timely observation. I’ve found it to be true. It’s difficult to find data scientists who can proficiently translate between science and product. Never mind that the two move at very different speeds. There are a few different ways to solve this problem, and doing so can result in huge productivity benefits. I’ve seen it firsthand. I like this report because it emphasizes an approach I hadn’t thought about: hiring specifically for translation. Please let me know what you think about this approach.