Our Winning Approach
We’ve used our data science process to help large and small companies achieve big wins with predictive analytics. We start by working closely with you to understand your business goals and your data. From there, we hold stakeholder meetings designed to surface the relevant KPIs and what analytics, if any, are currently in place. From these meetings, we write a project definition and milestones.
The following describes this process in greater detail. Our intent is to demystify the process of working with us to achieve your analytics goals. We look forward to working with you!
Our Proven Data Science Process
Our proven data science process involves eight major steps:
- Business understanding
- Data assessment and understanding
- Choose model candidates
- Data preparation
- Pipeline construction
- Modeling
- Evaluation
- Deployment
1 – Business understanding
We work closely with you to understand your business objectives. Informally: what do you want to leverage your data to achieve? These are things like:
- Personalize product or messaging offers (Netflix, Amazon)
- Maximize efficiency by matching constrained supply to finite demand (Uber, Lyft)
- Identify and describe user clusters (Facebook, Google)
Your KPIs help us define your critical business objective(s) in terms of the data your company collects. This is important because we use your data to build models that drive higher KPIs for your business, leading to the next step…
2 – Data assessment and understanding
A careful assessment of your data informs the type of predictive models we can build. With little or no data, we can look to outside data sources and help you implement a workflow that builds relevant data as we go. With a surplus of data, we verify that we have the right data for our objective.
Insider knowledge is key. That’s why we work closely with your team to understand caveats, availability, breadth and depth of your data. Since your business likely has its own intricacies, we ask lots of questions to make sure we have a strong fundamental grasp of the sources and meaning of your data and any exceptions that may exist. Before we advance further, we become experts in your data.
3 – Choose model candidates
Armed with a strong understanding of your data, we next consider predictive model candidates. Generally this means beginning with a simple heuristic: something we can build quickly, that avoids complexity, and that works well enough in the early stages. Many companies make the mistake of skipping this step and jump right into creating advanced models that take far longer to tune and develop. Starting simple lets us focus on getting the fundamentals right from the start. Building a rock-solid pipeline quickly pays dividends later on, allowing us to iterate and advance much faster as needed. It’s no wonder the best data science companies in the world do it this way (see Google’s Rules of Machine Learning for one example).
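As a sketch of what such a heuristic can look like, consider a recommender baseline that simply suggests the globally most popular items. The function name and toy data below are invented for illustration, not part of any client engagement:

```python
from collections import Counter

def popularity_baseline(interactions, k=3):
    """Recommend the k most-viewed items overall to every user.

    interactions: list of (user_id, item_id) pairs.
    Returns the top-k item ids by view count.
    """
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(k)]

views = [
    ("u1", "A"), ("u2", "A"), ("u3", "A"),
    ("u1", "B"), ("u2", "B"),
    ("u3", "C"),
]
print(popularity_baseline(views, k=2))  # ['A', 'B']
```

A baseline like this takes minutes to build, exercises the whole pipeline end to end, and sets the accuracy bar any fancier model must beat.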
4 – Data preparation
By assessing the data, then choosing model candidates, we understand what is required to make the predictive model(s) work. Our next step is to develop a super-stable procedure to package your data in such a way that our model(s) can ingest it and produce results. This becomes the foundation of our data pipeline…
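As a minimal sketch of what “packaging data so a model can ingest it” means, the snippet below scales a numeric field and one-hot encodes a categorical field into fixed-width feature vectors. All field names and records are hypothetical:

```python
def prepare(rows, categories):
    """Turn raw records into fixed-width numeric feature vectors.

    rows: list of dicts like {"age": 34, "plan": "pro"}.
    categories: ordered list of known values for the "plan" field.
    """
    ages = [r["age"] for r in rows]
    lo, hi = min(ages), max(ages)
    features = []
    for r in rows:
        scaled_age = (r["age"] - lo) / (hi - lo)  # min-max scale to [0, 1]
        one_hot = [1.0 if r["plan"] == c else 0.0 for c in categories]
        features.append([scaled_age] + one_hot)
    return features

rows = [{"age": 20, "plan": "free"}, {"age": 40, "plan": "pro"}]
print(prepare(rows, ["free", "pro"]))
# [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

In practice this step also handles missing values, outliers, and the data caveats surfaced in step 2, but the shape of the job is the same: raw records in, model-ready vectors out.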
5 – Pipeline construction
The data science pipeline is the bedrock of enterprise data science. When done right, the pipeline connects all the essential parts of your predictive models, providing the reliable infrastructure that powers them. This pipeline achieves many goals, such as:
- Collecting data from various sources – such as front ends, APIs, databases, live streams
- Transforming data for use in one or more models
- Building and promoting models to production
“Focus on your system infrastructure for your first pipeline. While it is fun to think about all the imaginative machine learning you are going to do, it will be hard to figure out what is happening if you don’t first trust your pipeline.” – Google, Rules of Machine Learning
We work closely with your tech team to build the data science pipeline the right way, reducing costs, avoiding later tech debt, and saving time.
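As a sketch of the idea, the collect, transform, and model stages above can be wired together as plain composable functions. Everything here (the event records, the feature, the threshold “model”) is a toy stand-in for illustration:

```python
def collect():
    # stand-in for pulling events from an API or database
    return [{"clicks": 2, "converted": 0}, {"clicks": 9, "converted": 1}]

def transform(events):
    # extract the single feature and label the model consumes
    return [e["clicks"] for e in events], [e["converted"] for e in events]

def train(xs, ys):
    # heuristic "model": threshold at the midpoint between class means
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= cut else 0

xs, ys = transform(collect())
model = train(xs, ys)
print(model(8))  # 1
```

Real pipelines add scheduling, storage, and monitoring around these stages, but keeping each stage a clean, swappable unit is what lets a simple model be replaced by a complex one later with no infrastructure changes.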
6 – Modeling
This is the step where we, as data scientists, get most of our street cred. In this step, we build the “model” that takes in your data and outputs your insights.
With a strong data understanding and a rock-solid pipeline, this step is usually fairly straightforward. We use our comprehensive knowledge of classic and modern data science methodologies to select and build a few low-complexity (sometimes heuristic) models. In later iterations, we can swap out a simple, performant model for one of higher complexity and performance, often with no changes to infrastructure. We start with performant, low-complexity models, verify that the pipeline and user experience are solid, and only then look to more complex models, when subsequent modeling shows we can achieve a big win by doing so.
We have found the greatest success by starting simple and building off early wins, rather than starting with something complex and troubleshooting if something isn’t right.
7 – Evaluation
In this stage, we assess how well the modeling achieved your business goals. This “offline” testing is done before deployment, and gives us an idea of how models will perform in the wild. For example:
- Do the new product recommendations resonate with users?
- Did the supply/demand optimization result in the greatest throughput / revenue combination?
- Did the churn model help the retention team keep more users than last month?
To do this, we break the dataset into “train” and “test” sets. We train our model(s) on the “train” data, which includes the answers. Then, to see how well we did, we use the “test” data with the answers hidden from the model. When the model predicts the answers, we assess its accuracy. It’s this accuracy score that determines which model to move forward with. When we feel that our scores are high enough in this offline testing, we’re ready to put the model to use by deploying it…
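The train/test split described above can be sketched in a few lines. The data and the threshold “model” below are toy inventions just to show the mechanics of holding out answers and scoring against them:

```python
import random

random.seed(0)
# toy labeled data: feature x, with label 1 whenever x > 5
data = [(x, int(x > 5)) for x in range(100)]
random.shuffle(data)

split = int(0.8 * len(data))
train_set, test_set = data[:split], data[split:]  # 80/20 split

# "train": learn a decision threshold from the training answers
positives = [x for x, y in train_set if y == 1]
threshold = min(positives)

# "test": score predictions against the held-out answers
correct = sum(1 for x, y in test_set if int(x >= threshold) == y)
accuracy = correct / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

The same pattern scales up: only the held-out score counts, because a model graded on data it has already seen will look better than it really is.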
8 – Deployment
Initial deployment – This is the final stage where we work with your tech team to deploy the predictive model(s) to (usually) your cloud-based infrastructure where it provides predictive insights and value to your customers or products.
An example may help to explain how this works. When you visit YouTube, the site presents you with individually personalized videos. To do this, they have a series of predictive models that live on Google infrastructure that get called each time you visit or refresh the page. YouTube sends your user name and cookie information (for latest browsing history), to a deployed model that quickly returns a list of videos for you to watch. The YouTube front end has to take that information and display it in a way that’s hopefully pleasing to you. Just about every major personalized web service works like this (think Amazon, Facebook, LinkedIn).
Continuous improvement and monitoring – After deployment, we create dashboards to measure model performance as users interact with your service. Model accuracy can change slowly over time, or suddenly due to fundamental changes. These dashboards let us make the necessary adjustments to keep things working smoothly.
To test is to improve. The final stage of deployment involves A/B testing. The offline tests we mentioned above using “train” and “test” data are useful before we have real users interacting with our model(s). In the wild, however, real improvement comes from testing new models and assumptions. Our goal is to always be improving through testing.
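As a sketch of how an A/B result gets judged, a standard two-proportion z-test compares conversion rates between the old and new model. The function and the traffic numbers below are hypothetical illustrations, not client data:

```python
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return z, p_value

# hypothetical traffic: variant A (old model) vs variant B (new model)
z, p = two_proportion_z(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here says the lift is unlikely to be noise, which is what justifies promoting the new model to all traffic.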
Our data science method has been battle tested! We used this methodology to help a Fortune 200 company achieve the lofty goal of same-day model deployment, a process that once took longer than six months.
We also used our method to help a scrappy startup build an ambitious v1 product that resulted in seven-figure Series A funding.
Those are just a few examples, and we’re confident we can do the same for you! Please get in touch if you have any questions or would like to chat about how we can help you leverage your data to build better products.