Reading Time: 4 minutes

by: Zank, CEO of Bennett Data Science

Today, as companies collect more and more data, they’re becoming acutely aware of the value of this data to their business.

That’s great, and the impulse is right. But sadly, most of them are going to be disappointed down the road in the usefulness of all that data, because they missed a fundamental element at the start:

A clear objective.

It’s not enough to collect big piles of data. You must ask: What are you really trying to discover, measure, or understand?

I recognize that this may seem pretty basic. If you’re the Chief Product Officer of a business, or maybe even the CEO or the owner, you’re thinking, “Well, of course we have clear objectives. We want to collect heaps of demographic information about our customers.”

The problem is that in most cases, your data science team will need strategic alignment for the data to be useful. Generally, there are many different stories you can tell with all the data you’ve collected. That’s a fantastic benefit, but it can also be a curse. You need to be thinking from the start about exactly how you will use the data later, and what insights you actually want to end up with, in order to improve your product or process.

What I often see is companies diving into data science head first. They start by hiring data scientists without this up-front strategic work, and figure data scientists can just “make stuff better”. But without specific objectives, teams can easily meander off course and spend months or years iterating on models and producing insights that don’t directly support the products and initiatives that make money for the company.

Unfortunately, without asking the right questions, the ones that go to the objective, large disconnects can open up and grow between product and engineering. I’ve seen this happen, even with senior data scientists.

That’s why, in an ideal world, I’ll spend a lot of time with a client getting the objective right before we even start poring over data. It’s less about technologies we might use than it is about understanding what drives the main KPIs for our clients.

Here’s a story that will show what I mean.

We were called in to help bring a massive-scale recommender system to a local app company that provides thousands of apps to their worldwide customers. Their users could download, install, open and close the apps.

At first, the company thought that they wanted to optimize for more downloads. That would seem at first glance to be the obvious objective. Why wouldn’t an app distributor want more people to download the apps they have to offer?

But as we dove deeper into the business, we discovered that what they really wanted was higher engagement. That happened at a whole different place in the customer journey. So what we wanted to be measuring and maximizing wasn’t downloads. It wasn’t even installs. It was a combination of opens and time spent using the app. How often people opened the app (which meant they were actually using it), combined with how much time they spent in it, was the data we needed to evaluate engagement: to discover who the most engaged customers were, how and when they were using the app, and how we could get them to use it even more.

Had the company only gone after data on downloads and/or installs, but NOT opens or time spent using the app, we could never have provided them with the insights they really needed to improve their product and delight their customers. I’m always surprised how often data science teams get this one wrong.

On closer inspection, we’ve noticed that engagement doesn’t look the same for every product. Take Gmail. Some people open it and keep it open for months! That’s just one open, but in that case it turns out to be a very significant indicator of engagement.

Let’s consider an app that gets a lot less “time spent.” If I’m in Chicago, I might open my weather app a few times a day, but if I’m in San Diego, where it’s always sunny, I wouldn’t need to. So, in this case, opens might not have as much to do with engagement for all users.

So when I build a data science model to show somebody a new app that I predict would receive maximum engagement, what app do I want to show them? Do I want to show them an app that they’ll download or an app that they’ll open all the time, or an app that they’ll keep open all day?

Well, the answer is: it’s not the download or the install. It’s not just a single open and it’s not just keeping it open. It’s a combination of the number of opens with the duration. Eventually we arrived at the right formulation. It can take time to get there; in the case of this client, it took six weeks of working closely with the product team to define this objective. The take-home is simple:
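To make the idea concrete, here’s a minimal sketch of what blending opens and duration into a single engagement score could look like. This is purely illustrative: the function name, the weights, and the normalization caps are my assumptions, not the formulation the client actually shipped.

```python
# Hypothetical engagement score: neither opens nor duration alone tells
# the story, so we blend the two. All weights and caps below are
# illustrative assumptions, not a real production formula.

def engagement_score(opens_per_week, minutes_per_open,
                     w_opens=0.5, w_duration=0.5):
    """Blend open frequency and session duration into one 0-1 score."""
    # Normalize each signal against a rough cap so that a quick-check
    # app (weather) and a long-session app (email) land on the same scale.
    opens_norm = min(opens_per_week / 50.0, 1.0)
    duration_norm = min(minutes_per_open / 60.0, 1.0)
    return w_opens * opens_norm + w_duration * duration_norm

# A weather-style app: opened often, closed quickly.
quick_check = engagement_score(opens_per_week=20, minutes_per_open=1)

# A Gmail-style app: opened rarely, left open for a long time.
long_session = engagement_score(opens_per_week=2, minutes_per_open=60)
```

The point of the sketch is the shape of the problem, not the numbers: choosing the weights and caps is exactly the strategic work that, for this client, took six weeks with the product team.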

AI isn’t effective unless we know what we’re optimizing for.
The time up front to define the objective, and what actually constitutes meeting it, is the best time you’ll spend to maximize your data science investment. Companies that skip this step typically regret it later, because they end up with a lot of data, but not a lot of understanding.

And that understanding, the kind that helps you iterate and continually improve your product, is why you’re doing data science in the first place.