
In our last two articles, “Get Real Business Value from Data” (1) and “Don’t get These Essentials Wrong” (2), I discussed our proven data science process, introduced deployment as the last of three high-level stages, and defined what deployment means.

In today’s Tech Tuesday, part three of our series on deploying predictive models, I’ll show an example of a simple real-world model we may want to deploy for others to use. This will allow me to show exactly what a model is, and that it’s not as nebulous as you might think. Let’s have some fun!

Say we want to build a house price prediction model, and we only have one input to use: square footage. We can build this very simple model. It’s not very accurate, as you can see below, but it will give us an example we can work from.

The red line is the trendline, a least-squares fit to all of the blue points. As you can see, it does a pretty lousy job of fitting the data. For the sake of simplicity, let’s ignore that. Here’s the chart:

[Figure: Demystifying models (Bennett Data Science)]

It’s a plot of some recent home prices in Seattle, WA, showing prices in USD plotted versus square footage of the homes.

The red line is our “predictive model”, showing how we might predict home prices solely based on the square footage of the house. Again, it’s horribly inaccurate, but the focus here is on the model, not the accuracy. Here’s the simple model:

Price = $136 × Square Footage + $200,000

For any value of Square Footage, we can predict the price of a home.

Let’s simplify something that’s generally misunderstood. The “model” is nothing to be scared of. Look at how simple it is:

This house price prediction “model” consists of just two values, $136 (the price per square foot) and $200,000 (the base price), along with the instructions for calculating the price.
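To make that concrete, here is a minimal Python sketch of the model as two stored numbers plus one line of arithmetic. The class and field names (LinearPriceModel, price_per_sqft, base_price) are my own illustrative choices; only the values $136 and $200,000 come from the formula above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearPriceModel:
    """The whole 'model': two numbers and a rule for combining them."""

    price_per_sqft: float  # slope of the trendline, in USD per square foot
    base_price: float      # intercept of the trendline, in USD

    def predict(self, square_footage: float) -> float:
        # Price = price_per_sqft x Square Footage + base_price
        return self.price_per_sqft * square_footage + self.base_price


# Values taken from the formula in this article; names are illustrative.
MODEL = LinearPriceModel(price_per_sqft=136.0, base_price=200_000.0)

if __name__ == "__main__":
    # A 2,000 square foot home: 136 * 2000 + 200000
    print(MODEL.predict(2000))  # 472000.0
```

Retraining the model on new data would only change the two numbers; the prediction rule stays the same.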

Here’s how we can use that model in the real world: Let’s say we’re building a Zillow competitor to help our users understand the market price of their homes. In this context, I hope it is clear what “deployment” needs to do: it needs to quickly calculate a house price for any value of square footage.
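As a rough sketch of what that could look like, the function below takes a JSON request and returns a JSON response with the predicted price. The field names square_footage and predicted_price_usd, and the function name handle_request, are assumptions made for illustration; a real service would sit behind whatever web framework and payload format the teams agree on.

```python
import json

# Model values from the formula in this article.
PRICE_PER_SQFT = 136.0
BASE_PRICE = 200_000.0


def handle_request(body: str) -> str:
    """Turn a JSON request into a JSON response with a predicted price.

    The payload shape is hypothetical: {"square_footage": <number>}.
    """
    payload = json.loads(body)
    square_footage = float(payload["square_footage"])
    price = PRICE_PER_SQFT * square_footage + BASE_PRICE
    return json.dumps({"predicted_price_usd": price})


if __name__ == "__main__":
    print(handle_request('{"square_footage": 1500}'))
```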

But there’s a lot more to it. And here’s where early conversations about deployment pay off hugely!

Data science (or the team responsible for deployment) needs to establish a contract with the team that will actually use the home prices. And this must be done before any data science models are created.

Keeping with our Zillow competitor, here are some important concerns:

  • How fast should the model return a result? In other words, what are the latency requirements?
  • Is square footage always going to be the only input? What are the expected input payloads and output formats?
  • Will the model be responsible for filtering results, such as ignoring houses in areas where it has shown large inaccuracies?
  • How will we judge the performance of the model once it’s deployed and integrated into the site?
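
One way to picture the answers to these questions is to write them down as a small contract object that the serving code checks against. Everything in this sketch is an assumption for illustration: the 100 ms latency budget, the area field, the excluded-areas set, and the choice of mean absolute error as the performance measure are placeholders, not recommendations.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentContract:
    """Hypothetical contract agreed between data science and the consuming team."""

    max_latency_ms: float = 100.0                 # assumed latency budget
    required_inputs: tuple = ("square_footage",)  # expected input payload
    excluded_areas: frozenset = frozenset()       # areas where results are filtered


def predict_price(square_footage: float) -> float:
    # The model from this article: Price = $136 x Square Footage + $200,000
    return 136.0 * square_footage + 200_000.0


def serve(payload: dict, contract: DeploymentContract) -> dict:
    """Validate the payload, apply filtering, predict, and report on latency."""
    start = time.perf_counter()
    missing = [name for name in contract.required_inputs if name not in payload]
    if missing:
        return {"status": "error", "reason": f"missing inputs: {missing}"}
    if payload.get("area") in contract.excluded_areas:
        return {"status": "filtered", "reason": "model is unreliable in this area"}
    price = predict_price(float(payload["square_footage"]))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "status": "ok",
        "predicted_price_usd": price,
        "within_latency_budget": elapsed_ms <= contract.max_latency_ms,
    }


def mean_absolute_error(predicted: list, actual: list) -> float:
    """One possible way to judge the deployed model against actual sale prices."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

The point is not the specific checks but that each question in the list maps to something the code can enforce or measure once both teams have agreed on the answer.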

These are important questions to answer early, well before the modeling phase begins, because the answers inform and constrain the methods that data scientists can use. We have found that raising them early and establishing the deployment contract is essential for success.

This week is all about understanding what a model is and how to ask questions early to set your team up for success. Next week, in part four, I’ll talk about the importance of these early conversations from a data science team perspective and how they impact the types of models data scientists create.

Of Interest

Coronavirus Accelerates A.I. in Health Care
With millions of cases and outbreaks in every corner of the world, speed is of the essence when it comes to diagnosing and treating COVID-19. So it’s no surprise doctors were quick to employ A.I. tools in an effort to get ahead of what could be the worst pandemic in a century.
https://www.axios.com/coronavirus-artificial-intelligence-medicine-5aa9c365-1f98-4413-8ac9-d1e0cb6e8c85.html

Walmart Employees are out to Show its Anti-Theft A.I. Doesn’t Work
This is a fascinating story about how A.I. can sometimes get it wrong, and how frustrating that can be for the humans who have to endure the consequences; in this case, false accusations of theft.
https://www.wired.com/story/walmart-shoplifting-artificial-intelligence-everseen/

Facebook is Working on a Comprehensive Visual Shopping System, and it’s Fascinating
Read about how Facebook built and deployed GrokNet, a universal computer vision system designed for shopping. It can identify fine-grained product attributes across billions of photos — and in different categories such as fashion, auto, and home decor.
https://ai.facebook.com/blog/powered-by-ai-advancing-product-understanding-and-building-new-shopping-experiences/