Reading Time: 4 minutes

by: Zank, CEO of Bennett Data Science

In business, as in life, change is a big part of our day-to-day. There’s not much we can, or even should, do to stop it. It’s inevitable. And change affects business. There are longer-term trends like consumer spending, short-term changes such as spikes and troughs around major holidays, and black swans we’re not prepared for, such as interest rate hikes or another Facebook investigation.

With changes to people (your customers) come changes to product consumption. So, if you’re relying on predictive models to personalize service for your customers, it’s imperative to watch out for contamination of your models! As usual, I have some questions:

  • What should be expected when one of these changes occurs?
  • Is the data science model supposed to somehow anticipate change?
  • How about sales on Christmas Day?
  • Or traffic during the Super Bowl?

Some events are rather predictable. But a housing crisis or a major news event can be much more difficult to handle. For most unknown events, we can’t do much to insulate predictive elements from big changes in user behavior. Luckily, most of these types of events are short-lived.

Data science model freshness

Let’s look at what happens when seasons or trends change and what data scientists can do to be sure that predictive models stay accurate. To illustrate exactly how we maintain model “freshness”, I’ll show how most data science models are created.

Let’s take a step back and look at the data that models are built upon and assessed with. From a ten-thousand-foot view, the available data is split into two parts: a large training set and a smaller testing set.

We use the training data to build the model and the testing data to see how well it performs on new, never-before-seen data.
The training portion typically represents somewhere between 70 and 90% of the available data. It’s the data we use to predict what’s in the testing data, and how well we predict the testing data gives us, generally speaking, an idea of how accurate our predictive model is. Data scientists carefully construct training and test sets for nearly all new models we build. These are called offline tests because they’re done on a test set, not on new, real users in production.
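As a minimal sketch of this split-and-evaluate step, here’s roughly how it looks in Python with scikit-learn. The synthetic data stands in for real user interactions, and the 80/20 split is just one point in that 70-90% range:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real user-interaction data.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)

# Hold out 20% for offline testing; the remaining 80% trains the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Offline evaluation: how well do we predict the never-seen test set?
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Offline AUC: {auc:.3f}")
```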
Once we see strong performance in these offline tests, it’s time to train on ALL the data and put our models into a production environment. In that case, the predictive model is trained on the full dataset, training and testing portions combined.
Then we feed that data into the production model that serves up predictive, highly personalized results to users.
As users request personalized results, the model takes in each user’s data and produces a result tailored to that user.
Then, to handle new users or items that may have been introduced recently, the model is rebuilt every night (or more or less often) from a new set of training data. The most common way to do this is to go back, say, two months and use that window as the training data.
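A minimal sketch of that nightly rebuild, assuming the interaction data carries a timestamp column; the two-month window, file name, and column names are illustrative assumptions:

```python
from datetime import datetime, timedelta

import pandas as pd

LOOKBACK_DAYS = 60  # roughly two months; tune per product

def training_window(df: pd.DataFrame, now: datetime) -> pd.DataFrame:
    """Keep only the most recent LOOKBACK_DAYS of interactions for the rebuild."""
    cutoff = now - timedelta(days=LOOKBACK_DAYS)
    return df[df["timestamp"] >= cutoff]

# Nightly job: reload interactions, keep the fresh window, retrain.
df = pd.read_parquet("interactions.parquet")  # hypothetical data source
train_df = training_window(df, now=datetime.utcnow())
# model.fit(train_df[FEATURES], train_df[TARGET])
```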

Retraining

Model retraining is essential to keeping the predictive model fresh as the user base changes. Generally speaking, model retraining is done specifically to keep up with new users and changing tastes.
But what happens during an anomalous event? If the event wasn’t in the training data, there’s virtually no way the predictive model will respond to it in a personalized manner. Here’s what we can do:

  1. Be proactive about expected changes to our training data
    Models built to respond to expected changes like seasonality or holidays generally benefit from having access to the same periods in prior years. These dates and seasons can be built into predictive models and handled in a date-aware manner. This means using time as a predictor in our model and going back a few years with the training data (see the first sketch after this list). I’m making a lot of simplifications here! Many times it’s not feasible to go back years, as there can be too much or too little data, and so on.
  2. Be reactive to recent, unexpected changes

    When unexpected changes occur and show up in the training data, we can retrain the model to account for the change, or incorporate new data sources where necessary.

    For drastic changes, it may become imperative to greatly reduce the look-back of the training data to just a few weeks. This can reduce the bias of “how it used to be” and emphasize “how it is now” (see the second sketch below).
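Here’s a minimal sketch of the date-aware idea from point 1: deriving calendar features from a timestamp so the model can learn seasonal patterns. The function name, column names, and holiday list are illustrative assumptions, not from any particular project:

```python
import pandas as pd

# Hypothetical holiday list; a package like holidays covers this properly.
MAJOR_HOLIDAYS = {(12, 25), (1, 1), (7, 4)}

def add_date_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive calendar predictors from a 'timestamp' column."""
    out = df.copy()
    ts = pd.to_datetime(out["timestamp"])
    out["month"] = ts.dt.month
    out["day_of_week"] = ts.dt.dayofweek
    out["is_holiday"] = [(t.month, t.day) in MAJOR_HOLIDAYS for t in ts]
    return out
```

With a few years of history in the training window, the model can associate these features with, say, the Christmas Day sales spike instead of being surprised by it.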
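And a sketch of the reactive idea from point 2. One way to decide how far to shorten the look-back is to train candidate windows and validate each on only the most recent days, keeping whichever tracks “how it is now” best. The window lengths, column names, and validation approach here are my assumptions, not a prescription:

```python
from datetime import timedelta

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def score_lookback(df, features, target, lookback_days, holdout_days=7):
    """Train on a look-back window, validate on the most recent days."""
    latest = df["timestamp"].max()
    holdout_start = latest - timedelta(days=holdout_days)
    train = df[(df["timestamp"] >= holdout_start - timedelta(days=lookback_days))
               & (df["timestamp"] < holdout_start)]
    holdout = df[df["timestamp"] >= holdout_start]

    model = GradientBoostingClassifier().fit(train[features], train[target])
    probs = model.predict_proba(holdout[features])[:, 1]
    return roc_auc_score(holdout[target], probs)

# After a drastic change, a 21-day window may beat the usual 60 days:
# for days in (60, 21):
#     print(days, score_lookback(df, features, target, lookback_days=days))
```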

Careful construction and monitoring of data science models are essential to their long-term health! Change is a wonderful thing, especially when we can respond to it in a way that shows our customers that we’re paying attention.

Thanks for reading & talk soon!

-Zank

Zank Bennett is CEO of Bennett Data Science, a group that works with companies from early-stage startups to the Fortune 500. BDS specializes in working with large volumes of data to solve complex business problems, finding novel ways for companies to grow their products and revenue using data, and maximizing the effectiveness of existing data science personnel. https://bennettdatascience.com

Sign up for our Newsletter