Last week, I talked about generating real business value from data science projects and shared our proven 10-step data science process. In its simplest form, three main steps are involved:
- Preparing the data
- Doing the machine learning
- Deploying the model we just built
Today, let’s talk about the importance of each of these steps:
Preparing the data:
This is where data scientists spend up to 90 percent (!!) of their time. Have you ever wondered why data science projects don’t run on the “sprints” that most dev teams adhere to? The main reason is that it’s quite tricky to clean and prepare the data for repeatable, sustainable machine learning, which is essential for successful deployment.
Doing the machine learning:
The initial predictive model can (and should) be something simple. So much so that we’ll skip over this step for now. For example, instead of first building a highly complex churn model, how about a simple “model” that only counts the number of times a user visits the “contact us” support page? At a first pass, that might be a good enough indication that a user is ready to churn. We don’t need complex machine learning in the beginning. Start simple. Build that pipeline from data prep to deployment. When everything is working swimmingly, you can move on to building (more) complex models. Not before.
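To make the “count the support page visits” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `page_views` log, the `/contact-us` path, the function names, and the threshold of three visits are all made up, and a real project would pick a threshold by looking at its own data.

```python
from collections import Counter

# Hypothetical page-view log as (user_id, page) pairs.
# The users, paths, and counts are invented for illustration only.
page_views = [
    ("alice", "/home"),
    ("alice", "/contact-us"),
    ("bob", "/pricing"),
    ("alice", "/contact-us"),
    ("carol", "/contact-us"),
    ("alice", "/contact-us"),
]


def contact_visit_counts(views, support_page="/contact-us"):
    """Count how many times each user viewed the support page."""
    return Counter(user for user, page in views if page == support_page)


def churn_risk_users(views, threshold=3):
    """Return users whose support-page visits reach the (assumed) threshold."""
    counts = contact_visit_counts(views)
    return sorted(user for user, n in counts.items() if n >= threshold)


print(churn_risk_users(page_views))  # ['alice']
```

There is no machine learning here at all, which is the point: a rule this small is enough to exercise the whole pipeline from data prep to deployment before anything more complex gets built.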
Deploying the model:
Deployment may be defined as the act of retrieving insights from a model. For example, let’s say we’re using a simple model to predict house prices, and we built the model to predict the house price for a given square footage. In this case, the first goal of the deployment phase is to make that simple calculation available for the world to use. Later goals may include making sure the predicted house prices stay accurate over time.
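As a rough sketch of what “making the calculation available” could look like, the Python below fits a one-variable linear model and wraps the prediction in a function that takes a request and returns JSON. The training numbers, the function names, and the request format are all assumptions made up for illustration; a real deployment would typically sit behind a web service rather than a plain function call.

```python
import json

# Hypothetical training data (square feet, sale price); invented for illustration.
SQUARE_FEET = [1000, 1500, 2000, 2500]
PRICES = [200_000, 275_000, 350_000, 425_000]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# "Training" happens once, ahead of time.
SLOPE, INTERCEPT = fit_line(SQUARE_FEET, PRICES)


def predict_price(square_feet):
    """The model itself: price as a linear function of square footage."""
    return SLOPE * square_feet + INTERCEPT


def handle_request(request_body):
    """Assumed request format: a JSON string like {"square_feet": 1800}."""
    square_feet = json.loads(request_body)["square_feet"]
    prediction = round(predict_price(square_feet), 2)
    return json.dumps({"square_feet": square_feet, "predicted_price": prediction})


print(handle_request('{"square_feet": 1800}'))
```

The split between `predict_price` and `handle_request` mirrors the split between the model and its deployment: the first is the calculation, the second is the part that lets someone else ask for it.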
In next week’s Tech Tuesday, we’ll walk through a simple real-world example using house prices and show exactly what it means to “deploy a model”. This is where we unveil the magic – stay tuned!
Data Science From Home
Read how a data scientist at Walmart Labs handles working from home and his answer to the following question: How will the field change in response to the new normal?
KDD 2020 Opens Registration for 26th Annual Conference With Fully Virtual Program
This is one of the most popular data science conferences and one I look forward to each year. Now, virtual. Read more about it here:
The Need for Speed: Faster A.I. Adoption Requires a new Plan
With so many people suddenly working from home, companies are looking for ways to use A.I. to help with this transition and keep employees connected and safe, while also keeping customers happy and satisfied. It’s essential, however, that in the rush for A.I. adoption, business leaders take the time to consider the human factor.