
Let’s say you want to start small, from one to four people, as you build out a data science team. This week, I’ll talk about how I’ve done just that, many times over, with great success for big and small companies. The key is to start small, hire strategically, and grow around a strong initial data scientist.

For your first hire, especially if this person will be the sole source of your analytics efforts for an extended period, find a senior generalist with relevant experience, someone who:

  • Can explain data science benefits and complexity to non-technical stakeholders
  • Has worked in your industry or a related one
  • Has worked with IT to get and clean data
  • Has experience building/deploying simple machine learning models

I can’t emphasize enough how important it is to find a good communicator for your first hire. During the interview process, ask the candidate to explain their work. Do you “get” it? You should, and nearly immediately. If the candidate has trouble explaining something on their resume, imagine how that will play out when they’re in a time crunch, trying to explain a last-minute model change to the dev team or a revenue loss to the CEO. There are lots of guides to effective hiring, so I’ll leave it there.

For the next hires, I recommend bringing on two roles to form what I call The Magic Three. At a high level, here are the roles, starting with the first hire (the senior data scientist):

  • Senior Data Scientist – manages and coordinates the efforts of the team, including translating business needs into AI objectives, while handling all the ML modeling
  • Data Engineer – in charge of making data available, working with the senior data scientist to build and administer ETLs for the entire team
  • Data Analyst / Jr. Data Scientist – responsible for informing best methods for ETL, providing insights to the company and supporting the senior data scientist day to day

I’ve found data engineers to be invaluable in this scenario, and often the next hire or two beyond these three is another data engineer. This is a nod to the complexity data scientists face when cleaning and handling large amounts of data. A good data engineer working closely with a data scientist can alleviate a lot of that complexity. The analyst position is largely a support role; it is valuable both as a source of help to the other team members and as a provider of data-driven insights throughout the company.

There are many other titles out there and some may be useful to your company, depending on your situation. Far from exhaustive, here are some other types of analytics professionals with general descriptions:

  • Data Scientist (generalist) – can handle all the tasks done by the specialists below, but generally prefers to specialize in one area
  • Machine Learning Engineer – works with models, objectives and metrics
  • Data Engineer – works with data and pipelines
  • Statistician – understands relationships between data and business logic and how this informs better models
  • Data Analyst – adept at understanding and presenting trends in data

To recap, start with a strong data scientist in a central role and when you’re ready, go for The Magic Three. Build out carefully around this group, emphasizing the needs of your particular organization.

Of Interest

Converting Text to Speech – Try It
If you haven’t seen what Google is doing to convert text to speech, this is worth a look. It uses a technology called WaveNet to synthesize realistic voices. On this page you can type in some text, choose the voice you’d like, including foreign accents, and hear a sample immediately. This goes a long way towards supporting all sorts of automated customer interactions at tremendous scale. Here’s the page:
https://cloud.google.com/text-to-speech/

There’s Nowhere to Hide
A new study shows you can be easily re-identified from almost any database, even when your personal details have been stripped out. Researchers from Imperial College London and the University of Louvain have created a machine-learning model that estimates how easy individuals are to re-identify from an anonymized data set. The article points to a tool where you can check your own score by entering your zip code, gender, and date of birth. Here’s the article:
https://www.technologyreview.com/s/613996/youre-very-easy-to-track-down-even-when-your-data-has-been-anonymized/

Find that Perfect Outfit Using AI
Facebook’s Fashion++ system uses a deep image-generation neural network to recognize garments and offer suggestions on what to remove, add, or swap. It can also recommend ways to adjust a piece of clothing, such as tucking in a shirt or rolling up the sleeves. Previous work in this area has explored ways to recommend an entirely new outfit or to identify garments that are similar to one another. Fashion++ instead aims to suggest subtle alterations to an existing outfit that will make it more stylish.
https://ai.facebook.com/blog/building-ai-to-inform-peoples-fashion-choice/