One of the most enjoyable feelings I experience as a Data Scientist is watching the error rates fall as I work on my modeling projects. Iteratively increasing classification accuracy through feature engineering or decreasing mean squared error of a regression model through residual analysis is intensely satisfying. One can get lost, as I often do, as the hours and days pass in pursuit of the “perfect” model.
As a team leader and entrepreneur, though, I realize that machine learning efforts are only as valuable as the business outcomes they drive. When I step back from my keyboard and close the Jupyter notebook, I force myself to think about whether my instincts to try a new modeling approach or run a new experiment are intended to achieve business goals or to satisfy my ego. But simply asking yourself these questions in the middle of a project is not enough. In order to maximize the business impact of machine learning efforts, you need a process that ensures technical work is aligned with strategic objectives in the project ideation, planning, and implementation stages.
Here are 5 steps you can take to ensure that your machine learning efforts are properly aligned with the business outcomes you wish to achieve.
- Determine the desired business outcomes – Before beginning any technical work, it’s imperative that you have a clear understanding of what the business wishes to achieve. Achieving these results begins with clearly stating the outcomes. Neglecting this step, say by rushing into the model building process, can lead to a great deal of time, effort, and resources put into building solutions that address the wrong questions.
- Define the business success criteria – Once the desired business outcomes are clearly stated, you need to define the business metrics that will track your progress towards those goals. Specific metrics reduce ambiguity and increase focus. And the easier it is to measure your success, the more certain you can be that you’re achieving the results you set out to achieve in the first place.
- Translate the business metrics into a machine learning metric – Machine learning models are trained by optimizing metrics like accuracy or mean squared error. By selecting the metric that relates directly to business outcomes, data scientists can optimize directly towards the goals the business wishes to achieve.
- Establish a baseline – Regardless of the metric you decide to optimize, it’s necessary to establish a baseline measure of performance. The baseline provides a point of comparison that helps track your progress and allows you to judge the rate of return of increasing the complexity of your modeling solution.
- Deploy an MVP quickly and begin iterating immediately – Peter Skomoroch, CEO of SkipFlag and former lead data scientist at LinkedIn and AOL, recently stated that developing machine learning algorithms without user feedback is risky and can lead to unintended consequences. Instead, teams should “ship a complete MVP in production ASAP, benchmark, and iterate”.

Here’s what I’ve been reading recently:
Distributed Time Travel for Feature Generation – After validating machine learning models offline, Netflix runs live A/B tests on them to measure improvements in core metrics like member engagement, satisfaction, and retention. In order to allow data scientists to build models on historical data, Netflix built a time machine to generate features from previous points in time. This enables quick iteration from idea to model to A/B test and ensures feature generation logic only needs to be written once.
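To make the "features from previous points in time" idea concrete, here is a minimal sketch of a point-in-time feature in plain Python. It is not Netflix's system: the event layout, the `hours_watched_as_of` helper, and the sample data are all made up for illustration. The only point it shows is that a feature computed "as of" a past date must ignore every event that happened after that date.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical event layout: (timestamp, member_id, hours_watched).
Event = Tuple[datetime, str, float]


def hours_watched_as_of(events: List[Event], member_id: str, as_of: datetime) -> float:
    """Total hours watched by one member, using only events at or before `as_of`."""
    return sum(
        hours
        for timestamp, member, hours in events
        if member == member_id and timestamp <= as_of
    )


# Made-up sample data, for illustration only.
events = [
    (datetime(2016, 1, 5), "m1", 2.0),
    (datetime(2016, 1, 20), "m1", 1.5),
    (datetime(2016, 2, 3), "m1", 4.0),
    (datetime(2016, 1, 10), "m2", 3.0),
]

# The same function yields the feature value as it stood on two different dates.
jan_feature = hours_watched_as_of(events, "m1", datetime(2016, 1, 31))
feb_feature = hours_watched_as_of(events, "m1", datetime(2016, 2, 28))
```

Because the cutoff is a parameter, the same feature logic can serve both historical training data and a present-day request, which is the "written once" property the post describes.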
How Zendesk Serves TensorFlow Models in Production – Zendesk deploys deep learning models to recommend useful articles to customers’ questions. They chose to deploy their networks using TensorFlow Serving since it provides low latency predictions, horizontal scalability, and a micro-service architecture. Here they provide an overview of their architecture and discuss their experience with model versioning and A/B testing their live models.
d6tflow – This open-source Python library allows you to build data science workflows as DAGs (directed acyclic graphs). Building workflows in this manner lets you simply and explicitly state your task dependencies so that running your workflow automatically runs all upstream dependencies. The library handles persisting intermediate results and provides detailed error logging. Check out this template for integrating d6tflow into your project.
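If the DAG idea is new to you, the sketch below shows the general pattern in plain Python. To be clear, this is not d6tflow's API: the `Workflow` class, the `task` decorator, and the task names are hypothetical, and results are cached in memory rather than persisted. It only illustrates how declaring dependencies lets a single `run` call pull in everything upstream.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set


class Workflow:
    """Toy DAG runner (hypothetical; not d6tflow's API)."""

    def __init__(self) -> None:
        self._funcs: Dict[str, Callable[..., object]] = {}
        self._deps: Dict[str, List[str]] = {}
        self._results: Dict[str, object] = {}  # in-memory cache of task outputs
        self.run_log: List[str] = []           # order in which tasks actually ran

    def task(self, name: str, requires: Sequence[str] = ()):
        """Register a function as a task and declare its upstream tasks."""
        def decorator(func):
            self._funcs[name] = func
            self._deps[name] = list(requires)
            return func
        return decorator

    def run(self, name: str, _visiting: Optional[Set[str]] = None) -> object:
        """Run a task after running (or reusing) all of its upstream tasks."""
        if name in self._results:
            return self._results[name]
        visiting = _visiting if _visiting is not None else set()
        if name in visiting:
            raise ValueError(f"cycle detected at task {name!r}")
        visiting.add(name)
        inputs = [self.run(dep, visiting) for dep in self._deps[name]]
        visiting.discard(name)
        self.run_log.append(name)
        self._results[name] = self._funcs[name](*inputs)
        return self._results[name]


flow = Workflow()


@flow.task("load")
def load():
    return [1.0, 2.0, 3.0, 4.0]


@flow.task("scale", requires=["load"])
def scale(values):
    top = max(values)
    return [v / top for v in values]


@flow.task("summarize", requires=["load", "scale"])
def summarize(raw, scaled):
    return {"n": len(raw), "mean_scaled": sum(scaled) / len(scaled)}


# Asking for the final task runs "load" and "scale" first; "load" runs only once.
summary = flow.run("summarize")
```

After the call, `flow.run_log` records the order in which tasks ran, which is a handy way to confirm that a shared upstream task was not recomputed.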
Machine Learning – Balancing model performance with business goals – Excellent post from a data scientist discussing business considerations for evaluating machine learning models. According to the author, data scientists should consider performance, cost, and time to understand if a model is fit for production. She provides examples of how to consider each of these for classification, regression, and recommendation algorithms.
Three Ways to Identify the Right Metrics in a Deep Learning Strategy – Identifying the right machine learning metrics that map to business goals should be the first step in new machine learning projects, according to the author of this post. Other steps in maximizing business impact with machine learning include regularizing your models to avoid over-fitting and working iteratively.

That does it for this week’s issue. As always, feel free to respond to this email to talk machine learning!