Reading Time: 3 minutes

This isn’t a convince you to work with our group post. Not at all. In fact, this question came from someone within Bennett Data Science. And the answer lets me talk about how we use modern technology to completely demystify the data science process while making things simple to monitor and track. This isn’t nearly as boring as it sounds! It’s really important stuff. In this post, I’ll show you how we handle most of our clients in terms of IP, data and accountability. Let’s get into it.

Let’s look at some of the pros and cons of hiring in house and using an expert consultancy (because there are all levels!). Both can walk at any moment (at least California is an at-will state).
In-house data scientists:
Pros:

  • You have complete managerial control of your team (hours worked, supervision, etc.)
  • Can be better for long-duration projects in some cases
  • They won’t work on other projects (formally)
  • The IP they contribute belongs to your company

Cons:

  • You have to control (manage) your team – and managing data scientists can be challenging
  • If you want higher capability or skill level you have to train or hire for it
  • Without careful management, there can be a lot of idle/wasted time

Consultants:
Pros:

  • You get specialized talent for as long as you want/need it
  • At the high levels, little to no mentorship/management/supervision is needed. In fact it often flows in the opposite direction
  • The IP contributed belongs to your company
  • You pay for what you actually need (there’s no water-cooler talk)

Cons:

  • Some leaders don’t feel that using consultants contributes to building the company culture and IP
  • Initial costs can seem higher (but with value pricing, that notion goes away)
  • Availability
  • Legal considerations

Something I’m very sensitive to that comes up in both lists of Pros is “IP”. Doesn’t a company retain more IP with in house employees? It really depends, but assuming the contract is worded in such a way that the IP generated from the contract work belongs to the company, there is no down side to using a contractor.


Let’s look at some of the tools we use on a daily basis that will demonstrate how efficient data scientists can be at sharing:

  • Data
  • Results
  • Code
  • Reports

I wrote a bit about this in What Is Your Data Science Team Really Up To? and will add a few things here.

Tools of transparency:Most of the tools we use provide incredible transparency into the process of what we do and our progress in making your products profitable.

  1. Jupyter Notebook – this is a web-based tool for authoring Python (and R, among others) right in a browser window. And the engine can (and I hope does!) live on the client’s infrastructure. In this manner, when we hit “save”, it’s being saved on a remote machine (YOUR machine). In fact the code is never saved to our computers. Ever.
  2. Cloud-based data stores – these are the norm. It should be no surprise that it’s been a long time since we’ve pulled data locally. We just about flat out refuse it. We’d much rather access your data on your infrastructure (that you’re probably renting from Google or Amazon)
  3. GitHub – this is where our code lives. And again, you own it. We “push” it there to keep it safe (and versioned). And why GitHub? Because it plays nicely with Jupyter Notebooks, allowing CEO’s to analysts to see data science work as it gets committed.

Here’s an example of a Jupyter Notebook entry that you might see in one of our GitHub pushes:

Notice the plain-text title, the code that made the chart, and the chart. All simple to understand, and all self-documented. Literally anyone could pick up this code an reproduce the work. The IP belongs with the client!

Using such flexible and internally accessible tools is a cornerstone to our approach. We want our clients to be in the loop every step of the way. We need them to feel like they own the process, the data, the IP, and that at any moment, a new person could walk in and come up to speed very quickly.
There’s nothing hidden here. With those tools we achieve almost everything we need to do in terms of bringing predictive models to fruition and supporting amazing products for our clients.

Zank Bennett is CEO of Bennett Data Science, a group that works with companies from early-stage startups to the Fortune 500. BDS specializes in working with large volumes of data to solve complex business problems, finding novel ways for companies to grow their products and revenue using data, and maximizing the effectiveness of existing data science personnel. https://bennettdatascience.com