The Surveillance Economy

By | Tech Tuesdays
Reading Time: 3 minutes

Do you ever think about how you might be uniquely identified in a huge database of other people/customers? For example, in the United States, we have social security numbers. Each person gets a uniquely identifiable number. Personal email addresses are similar, with the rare exceptions where people (usually couples) share an email address. Phone number is another one, albeit not used as commonly as the first two. Device identifiers (in the case of apps) can be another.

These unique numbers help analytics professionals track what people are doing. Hopefully, they use this info to provide useful and time-saving personalization. When you go to Netflix, you expect your content feed to be highly personalized. Netflix generally does a good job of this, and it’s all by leveraging your unique identifier; in this case, your email address.

The trouble arises when customers use their email addresses in other places, leaving an electronic trail of personal actions. Sure, we all expect to use our email address to log into all the various sites we use; think Amazon, Netflix, Gmail, etc.

The average user, however, has used their email address in tens if not hundreds of apps. If the average person has 60-90 apps on their phone, how many required email to sign up? How many “other” services require email? Did you give your email the last time you checked into a hotel? How about for your grocery store rewards card? The places to give up this valuable identifier are growing all the time, because not only is your email address unique to you, it’s a way to contact you!

Companies have become so bold as to ask for an “electronic signature” when attending company events. Imagine this: you walk up to an event and, to gain access, you need to give your name, email address and even an image! This is called signing with your identity. It really is that simple. Your email and photo uniquely identify you. One provider of software that powers this sort of corporate gatekeeping is called Envoy. Here’s the hero statement from their website:

Welcome to the modern workplace
From people to packages, Envoy helps you handle everything that comes through your front door.

Wow! PEOPLE! That’s scary as hell! Here’s why:

Big companies like Pandora, FitBit, Yelp and likely Uber already use this software. Think about what Envoy knows about you. Your latest haircut. Your age. That black eye you got from tripping in your hallway two months ago. It’s all captured if you submitted pictures each time you visited one of their clients.

This is frighteningly similar to what protesters in Hong Kong are fighting against with their lives.

Here’s an article written by Fortune’s Adam Lashinsky that goes into his personal experience using Envoy and the “ick” factor he experienced:

Of Interest

18 Impressive Applications of Generative Adversarial Networks (GANs)
The authors review a large number of interesting applications of GANs. You’ll see the types of problems where GANs can be used effectively. It’s not an exhaustive list, but it does contain many example uses of GANs that have been in the media.

Meet Barbara Liskov. She Invented the Architecture That Underlies Modern Programs
She pioneered the modern approach to writing code. She warns that the challenges facing computer science today can’t be overcome with good design alone. “Designing something just powerful enough is an art.” Good code has both substance and style. It provides all the necessary information, without extraneous details. It bypasses inefficiencies and bugs. It is accurate, succinct and eloquent enough to be read and understood by humans.

America’s Math Curriculum Doesn’t Add Up
Most high-school math classes are still preparing students for the Sputnik era. Steve Levitt wants to get rid of the “geometry sandwich” and instead have kids learn what they really need in the modern era: data fluency. They get at questions like, “Does anyone actually use the math we are teaching in their daily life? Is there any benefit at all to learning this stuff? And are there not more interesting and useful things we could be teaching them?” (Thanks for sending this my way, TH-D!) Listen to the full podcast (or read a transcript) here:


A.I. and the Trillion-Dollar Fashion Industry

A.I. has come a long way in supporting fashion, a trillion-dollar industry that is still on the rise. In this Tech Tuesday, I’ll discuss some of the big advances we’ve seen and developed for our clients, especially where computer vision is concerned. Computer vision is how computers gain understanding of digital images. First, I’ll write a bit about why computer vision works so well with apparel, then I’ll discuss a few applications: incredibly accurate similarity search, color extraction and collaborative filtering without training.

Why Does Computer Vision Work so Well with Fashion?
People buy clothing. And clothing doesn’t come in packages (usually). You know when you buy cereal and it comes in a box that looks nothing like what you’ll actually eat? Well, that’s different than that sweater you’ve been thinking about as the weather turns colder.

You buy the sweater. The exact item in the picture. This is true with a lot of apparel and accessories, such as watches and jewelry.

To compare two different sweaters, we can use huge amounts of information extracted solely from the images. Imagine trying to compare frosted flakes to granola by looking at the box; good luck!

Think about what is typically done. You go to a site and (gasp!) type “henley sweater” into a search box. Do you want a “text description”? No, you want the sweater. The one in the picture. But what you generally get is a list of products that are supposed to “look” like the text you entered. That’s a major disconnect. With apparel, we simply don’t shop for text, we shop for that special look (no pun intended) that speaks to us.

Accurately Find Similar Images
We’ve developed methods to quickly search massive catalogs of apparel and find similar items based solely on attributes extracted from images, and it works shockingly well. And this makes sense. When looking for visually similar items, we search for what a customer wants, not how the customer describes what they want. What a difference!
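
Under the hood, this kind of visual search can be as simple as a nearest-neighbor lookup over image embeddings. Here’s a minimal sketch (the catalog, item names and three-dimensional embeddings are invented for illustration; real embeddings come from a trained vision model and have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query, catalog, top_k=2):
    """Return the top_k catalog item ids ranked by similarity to the query."""
    scored = sorted(catalog.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:top_k]]

# Toy catalog: item id -> image embedding.
catalog = {
    "red_henley":   [0.9, 0.1, 0.0],
    "red_cardigan": [0.8, 0.2, 0.1],
    "blue_jeans":   [0.1, 0.1, 0.9],
}

# A query embedding close to the reddish sweaters ranks them first.
print(most_similar([0.85, 0.15, 0.05], catalog))
```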

Collaborative Filtering Without Training
We have a large pre-trained fashion model. And we can use this model with transfer learning to immediately work with new clients that have little or no interaction data. Collaborative filtering is what’s used when we talk about “people who like what you like, also liked…”. Again, computer vision plays an essential role to help us recommend products based on visual similarity.

Color Extraction
We developed Color IQ to take the biggest pains out of accurately assessing the color of garments while ignoring background colors and skin tones. Color IQ relies heavily on computer vision. Accurate color extraction is currently an expensive, manual, time-consuming process. Color IQ is designed to help companies leverage color for every item in real time, across big catalogs.
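
Color IQ’s internals aren’t shown here, but the core idea of pulling dominant colors out of pixels can be sketched with a tiny k-means pass. Everything below is hypothetical (a production system would first mask out background and skin regions, which is where the hard work lives):

```python
def _dist2(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def _mean(points):
    """Component-wise mean of a list of RGB triples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def dominant_colors(pixels, k=2, iters=10):
    """Tiny k-means over RGB pixels; returns cluster centers, largest cluster first.
    Naive init: the first and last pixels seed the clusters."""
    centers = [pixels[0], pixels[-1]] if k == 2 else list(pixels[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            clusters[min(range(k), key=lambda i: _dist2(p, centers[i]))].append(p)
        centers = [_mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    order = sorted(range(k), key=lambda i: len(clusters[i]), reverse=True)
    return [centers[i] for i in order]

# A toy "garment photo": mostly red pixels (the sweater) plus some white background.
pixels = [(200, 30, 30)] * 60 + [(210, 40, 35)] * 30 + [(250, 250, 250)] * 20
sweater, background = dominant_colors(pixels, k=2)
print(sweater)     # a reddish center
print(background)  # a near-white center
```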

Read a bit more here:

Of Interest

Danger – Statistics and Theorems 😉
Here’s a very simple overview of the Central Limit Theorem. The Central Limit Theorem allows us to make assumptions about large groups of individuals when we don’t have exhaustive data. In fact, this theorem is behind a lot of what data scientists do, since we don’t generally have access to entire populations. Take a look at this very accessible article:
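
As a quick illustration of the theorem, here’s a small simulation (the population and sample sizes are arbitrary): we draw many samples from a decidedly non-normal exponential population and watch the sample means cluster tightly around the population mean.

```python
import random
import statistics

random.seed(42)

# Population: an exponential distribution with mean 1.0 (and sigma 1.0) -- skewed,
# nothing like a bell curve.
n, num_samples = 100, 2000

# Draw many samples of size n and record each sample's mean.
sample_means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(num_samples)]

# The CLT predicts the sample means pile up around the population mean (1.0)
# with standard deviation sigma / sqrt(n) = 1.0 / 10 = 0.1.
print(round(statistics.mean(sample_means), 2))   # close to 1.0
print(round(statistics.stdev(sample_means), 2))  # close to 0.1
```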

Why is There so Much Money in Las Vegas?
It’s simple really; it comes down to the Law of Large Numbers. This law says that if enough people play blackjack, the house’s winnings will converge to its known statistical edge. In general, the house has a 1-8% advantage, depending on how well someone plays (and avoids the free drinks!). But even with a small edge, eventually, the house always wins. This is important. We can all flip five heads in a row on an unbiased coin. But after millions of tosses, we’ll end up with 50/50 to a lot of decimal places. Always.
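
A few lines of Python make the point: short runs of coin flips wander all over, but long runs pin down 50/50.

```python
import random

random.seed(7)

def fraction_heads(flips):
    """Fraction of heads in a run of fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(flips)) / flips

# Short runs wander; long runs converge. That's the Law of Large Numbers,
# and why the house edge reliably shows up over millions of hands.
for flips in (10, 1_000, 1_000_000):
    print(flips, fraction_heads(flips))
```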

How AI in the Exam Room Could Reduce Physician Burnout
A surge of new healthcare products – ranging from wearable consumer health trackers to diagnostic algorithms promising to improve medical outcomes and costs with artificial intelligence (A.I.) – is prompting physicians and hospital executives to ask a fundamental question: “Are these technologies solving the right problems?”

Data Science Isn’t a Technology

Data science enhances product. It’s part of product. But far too often I see data science teams reporting to the CTO or a technical leader and I think that’s wrong, and part of the reason data science isn’t always as effective as it can be (read: why it fails).

Don’t get me wrong, data scientists need technical prowess and must work cross-functionally to build strong data and deployment pipelines, but once those areas are complete, model building and insights should be driven by product teams. Our industry needs a course correction. Some companies get it. Fortune’s excellent newsletter, Eye on A.I., has covered this topic recently, espousing the idea that effective data science comes not only from stakeholder buy-in but from stakeholder understanding.

To drive home this idea that data science supports product (and should report to product leaders), here are a few of the applications of data science that are product driven:

  • Product recommenders – one of my favorite areas of data science. Product recs are there to add personalization to a product catalog; to increase customer engagement. When more customers see what they want, when they want it, everyone wins. After all, that’s why customers come to a site. Product recs should be driven by a product lead. This is a person who understands the product catalog, where profits come from and what’s required to grow revenue. It’s not necessary to understand complex data pipelines to effectively manage a data scientist running product recommendations.
  • Customer Lifetime Value (LTV) calculation – this is a model that predicts the value of a customer at any given time. It’s important for companies of all sizes to understand customer LTV. Think about how much it costs to acquire a new customer versus incremental sales from existing customers. For most industries there’s no comparison; it costs vastly more to acquire new customers. And marketers know this. That’s why customer retention is essential, and this is driven by a product, retention or sometimes a marketing team. Specific offers and reach-outs can reengage old customers, and using data science to increase LTV can help companies identify the appropriate customers and even prescribe appropriate methods. Churn prediction is closely related to this topic.
  • Customer Segmentation – Customer segmentation helps companies understand groupings of their customer (or products) based on myriad metrics. Often, segmentation helps marketing and sales reach the right customers or prospects with more personalized messaging or offers. For segmentation, we almost always see marketing teams driving the objectives for data science. And this makes sense, as the insights are directly tied to a major concern for marketing teams: getting the right message to the right customer or prospect.
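
To make the LTV bullet concrete, here’s one common back-of-envelope formula: yearly margin times a retention-and-discount factor. The customer numbers below are purely hypothetical.

```python
def simple_ltv(avg_order_value, orders_per_year, retention_rate, discount_rate=0.10):
    """Back-of-envelope customer lifetime value.

    Sums the geometric series of yearly value m, retained with probability r
    each year and discounted at rate d: m * r / (1 + d - r).
    """
    yearly_value = avg_order_value * orders_per_year
    return yearly_value * retention_rate / (1 + discount_rate - retention_rate)

# Hypothetical customer: $60 orders, 4 per year, 70% annual retention.
print(round(simple_ltv(60, 4, 0.70), 2))  # 420.0
```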

Those are just a few examples of why I believe end users of data science are not technical teams. The implications here are huge.

Initially, data science must align with tech. But as soon as it’s operationalized and functioning, data science almost always enhances product and should align to support product. In terms of company structure, I prefer to see the head of data science report to a CPO or, optimally, the CEO.

What do you think? I’m taking a strong stance here and I’d appreciate your feedback. Have you seen something different work better for your company? Hit reply and let me know.

Of Interest

A.I. Used to Impersonate CEO
Criminals used artificial intelligence-based software to impersonate a chief executive’s voice and demand a fraudulent transfer of €220,000 ($243,000) in March in what cybercrime experts described as an unusual case of artificial intelligence being used in hacking.

Using Neural Networks to Generate New Song Lyrics
It’s nothing new to generate lyrics in the style of a particular artist. But now, with the release of the full GPT-2 language model, results are better than ever (read: very, very believable). In these two links, the authors describe how to generate text in the genre of some input text, such as song lyrics or even chats.

Google Collects Health Data
Google has confirmed it’s collecting health data on millions of Americans through a new partnership with Ascension, one of the country’s largest nonprofit health systems. The tech company and Ascension confirmed they were working together to analyze patient data to give health care providers new insights and care suggestions for patients. The project, codenamed “Project Nightingale,” was first reported by the Wall Street Journal Monday.

Comments from Our Readers

Over the months I’ve written Tech Tuesdays I’ve received a lot of positive feedback and praise. Thank you all for your thoughtful feedback and comments! This week, I’m reflecting back on some readers’ comments. If you want to reach me directly, please hit the reply button and tell me what’s on your mind. It makes my day.

I’d also like to take this opportunity to make a small ask: if you enjoy reading this newsletter and know a colleague who might also want to read it, please hit the forward button and send it their way. It’s that easy. Thank you!!

Now, some reader comments …

On “The trouble with star ratings”,

KK wrote:
Yes! Even worse are airbnb ratings, where a 4.5/5 stars means something is wrong. Mostly because the vast majority of people who stay in someone’s house, and actually meet and interact with that person, will feel bad to give them anything but 5 stars- even when the place clearly has flaws. And then coffee shops. I want to know how the wifi is and if there are outlets. A shop can have 4.7/5 stars and have non-existent wifi, but everyone was rating the taste of the coffee.

On “Loss and the importance of diversity,” (This was the week I lamented the passing of a dear colleague of mine),

DO wrote:
Life’s ironic juxtaposition between bitter and sweet is never lost on me, and this is certainly a poignant example. Clearly you will keep shining her light through yours, and for that I am thankful, and we are all the better for it. Take care my friend – keep on the good fight, but most importantly, keep on the great dance!!

BK wrote:
I’m very sorry about the loss of your close friend Zank. It’s heartbreaking to hear stories like this. It’s my 14th wedding anniversary today and I have three young healthy boys and a loving wife. Life can get crazy at times, and stories like this make you realize just how fleeting life can be and a reminder of what matters most. May her memory be eternal.

ES wrote:
Thank you for sharing Liz’s story and her tribute. It’s really sad to see someone with such great influence and potential cut short. This is a great post, we too often forget to thank and express our gratitude towards those that positively influence us, I thank you for that.

On “First hires when building a data science team”,

LP wrote:
Lots of good lessons here for building a first team. Might want to follow up with how to find these folks e.g. look for experienced people who’ve worked with data but may not have had the data scientist title.

We’re putting all our TechTuesday newsletters on their own Bennett Data Science webpage soon. If you want to go back and find something or share with your colleagues, we’ll soon have you covered. We’ll let you know when it’s live.

Of Interest

State-of-the-Art Natural Language Processing in Ten Lines
Here’s a natural language processing (NLP) library that’s getting a lot of attention and uses some of the most current and powerful NLP models to date.

Supervised vs. Unsupervised Learning
Supervised and unsupervised learning are two of the three major branches of machine learning (the other is reinforcement learning), but what’s the difference? Why do we need both? Which one is better? Let’s get into it!

Predicting Startup Failures Using Classification
Using data from CrunchBase, this data scientist looks at how to predict startup success or failure. The more money a company raises the more likely it is to succeed. And, for each additional $1 Million a company raises per round (on average) their odds of success will increase by 16%. The longer a company can last between funding rounds, the better it is for them; each additional month between funding rounds increases the odds of success by 5%. Read more here:

Language Processing That’s So Good Humans Can’t Tell It’s a Machine
Remember that scary AI text-generator that was too dangerous to release? It’s out now:

Mentorship is not Optional

by: Zank, CEO of Bennett Data Science

Mentorship is probably the single most important source of outside help I’ve ever received. I spent 10 years at a big consulting company and I worked for a wonderful man, Duane. He was slow.

This is all for good reason; when Duane started programming, he had to punch holes in punch cards and feed them into a computer manually. When something went wrong, he had to recreate the entire stack of cards. So when he started using modern computers, he held onto this careful methodology.

And it drove me crazy. But when he finished slowly finger-pecking a few lines of code into the terminal and the code ran, he always got the result he was after.

What I learned from Duane

I think of him every time I attack a new problem as crazy-quickly as I can and it errors out and I fix it and run the code and there’s another error and so on. Lots of people do it this way.

And when I don’t get an answer I like, I use breakpoints and print statements in my code to let me know how things are progressing. Duane was different. He’d look up at me and say something like, “The answer is roughly one divided by the square root of the number of samples. Please go run the analysis.”

I remember coming back after a week of work only to tell him, “Yep, the answer is about 1 over root n”. I can’t shake that memory. He had intuition, and I wanted it. So I started to ask him questions and spent as much time with him as possible. Sitting with him at lunch? Yep. Sailing trip in San Francisco? You bet! Over the years we worked together, I became more confident and a much better coder and critical thinker. He was the most important mentor I ever had in terms of technical skills.

Mentorship today

Nowadays I spend a lot of time with tremendous business people, learning how to best work with clients and grow our firm. I apply the same approach that I did with Duane. I get close, I ask questions and I listen a lot. And the advice I received helped me tremendously.

Convinced yet?

If so, you may wonder how to find a mentor. Or maybe you’ve been around the block and you’d like to start mentoring others yourself. How can you find people who need your help?

The answer… find someone you’re interested in working with and ask them.

People have asked me for help over the years and I formed wonderful relationships with more than a few budding young scientists and entrepreneurs, as a result. And I’ve asked others to help me. Something like, “Hi Jenifer, I really admire how you XYZ, and I think I could learn so much from you. Would you be willing to have a quick half-hour meeting with me once every two weeks?”

Accountability calls

Darren Hardy recommends a weekly accountability call (based on yearly or quarterly goals) that can go something like this:

  • What are three things that went well this week?
  • What didn’t go well this week?
  • What are you going to do to change that?
  • What was your one a-ha from the week?

I do this each week, and have been doing it for over two years with my mentor, and I get a ton out of it. (Thanks S.C.!!)

Still need convincing? Check out this wonderful book, Your Best Year Ever by Darren Hardy. He talks about mentorship and a whole lot more. I highly recommend it!

Four Ways to Increase Data Science Productivity

Like those in other roles, data scientists have a few fundamental needs: ownership of (and credit for) their work, ongoing education, and access to mentorship. Here are a few ways to achieve these goals:

  1. Make it easy to deploy AI models quickly
  2. Be sure data is stored in such a way that it facilitates team throughput/efficiency
  3. Increase number of group projects/involvement
  4. Hold frequent individual reviews

Here’s each of the four in more detail:

Reduce Deployment Time
When data scientists struggle to deploy their models, their work and hopes go unfulfilled. It’s essential to facilitate quick model deployment. This starts with a good relationship with dev ops. Be sure dev ops understands exactly what they’re deploying and how often it needs to be refreshed. Develop an understanding of boundaries between dev ops and data science so dev ops isn’t trying to recode models and data science isn’t trying to provision cloud resources or tune load balancers. When this is done right, data scientists feel empowered to build more and better models. With long deployment times, data scientists start to wonder how their work helps the company, and sooner rather than later, they’ll go off to another company where they can deploy their work and get credit for it!

Increase Team Throughput
Storing data in the right format can drastically reduce the time a data scientist spends handling data. This means more/better models get built (and deployed) sooner. Imagine the case where a simple task like pulling some user data takes hours of complex joins across 10+ tables. That’s a recipe for a lot more than just errors. It slows everyone down. Fixing this problem by creating a user aggregate table (for example) means all subsequent pulls can be done quickly. Find out what data your team uses or needs daily or weekly and make sure they have straightforward access to it. I’ve seen this technique unlock teams, with the added bonus of all-important data consistency.
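
As a sketch of the aggregate-table idea, here’s what it might look like in SQLite; the table and column names are invented for illustration, and a real warehouse would refresh this on a schedule:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 10.0);

    -- The aggregate table: one row per user, precomputed once,
    -- so later pulls don't repeat the joins and aggregations.
    CREATE TABLE user_aggregate AS
    SELECT user_id,
           COUNT(*)    AS num_orders,
           SUM(amount) AS total_spend
    FROM orders
    GROUP BY user_id;
""")

rows = conn.execute(
    "SELECT user_id, num_orders, total_spend FROM user_aggregate ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 2, 55.0), (2, 1, 10.0)]
```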

Work in Groups
Group engagements are more efficient, more satisfying to the team and achieve better results faster. If you have projects with a single technical resource, consider adding another person. I’ve found it to be only beneficial.

Have Frequent Reviews
This can be quite time-intensive for management. In particular, it requires a real commitment of time as well as an agreement on where each data scientist stands in the company. Data scientists are like any other employee in that they want to know where they sit on the career ladder and what’s required to advance to that “next level”. Going over major accomplishments is easy and is something most leaders do at least weekly with teams. Guidance on upcoming work is more challenging, and positioning can be quite delicate. This one can be lots of work but is well worth it in terms of keeping employees who might otherwise move on.

Periodic (quarterly) one-on-one reviews work well. Topics covered:

  • Major accomplishments
  • Upcoming projects/goals
  • Kudos for good work
  • Guidance for upcoming work
  • Position in company/team

I’ve used these four techniques extensively and have seen very positive results. Hit reply if you’d like to let me know if you think I missed something fundamental. I’d love to hear from you!

Of Interest

Data Science is not a Science Project
Wow, what a title! That’s exactly how I feel. Data science is there to support products, not be an excuse for a room full of smart people to science the heck out of a mountain of data. I know how strange this sounds, but it happens all the time and costs tens of millions of dollars each year in wasted productivity. The report says that at least half of analytics results never make it into production. Wow, we can do better! Here’s the article:

Worth Reading Again and Again
Google published its Rules of ML guide many years ago and it’s still my favorite read on the topic. I highly recommend it!

Watch a robot solve a Rubik’s Cube
I never learned to solve these when I was a kid, but they always fascinated me. This post is well done, with interactive visualizations and fun videos.

The Yes
30 data scientists join the former COO of StitchFix. They’ve raised a ton of capital. This is one to watch!

The Power of Efficient A/B Testing

A/B testing is such an important step in product development because it’s directly tied to revenue. Data-driven companies know this. They run thousands of tests concurrently and even have teams that develop in-house systems for monitoring all these tests. It’s a massive part of their lifeblood. And for good reason: data science models and product enhancements can’t be considered improved unless they’re measured. And A/B testing is how that’s generally done. But like anything else, some companies get it right, and some don’t.

It’s really easy to waste a lot of time and resources on A/B testing. You’ve got to know how to make tests work for, not against, your business.

And if you’re not yet testing, there’s no better time to start than now! After all, what gets measured gets managed, and A/B tests are ultimately just powerful measurement tools.

I’ll assume you’re on board with the importance of testing. I’m glad that’s out of the way! There are two main ways we usually see companies approach tests: frequentist and probabilistic methods. Frequentist methods involve p-values and not looking at the results before they’re done (no peeking!). They also involve lots of impressions, meaning that you’re going to be sending thousands of impressions to options A and B, regardless of how well each performs during the test. Probabilistic methods are much more relaxed about statistical significance and avoid huge numbers of impressions by adapting in real time to the preferences of your customers.

For one recent client, changing from frequentist A/B tests to a probabilistic framework shaved months off their iteration time, reducing testing time from 3-4 months to as little as half a day. That’s because the number of impressions required for probabilistic tests to “pick a winner” is much lower. Drastically lower.

The probabilistic method I’m talking about here is the Multi-Armed Bandit (MAB). And it’s what big companies like Google use. It’s fast. It self-optimizes (meaning that as the A/B test is running, it actually starts to send more traffic to the better option). And the MAB is very simple to use.
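
For the curious, here’s a minimal sketch of one popular MAB algorithm, Thompson sampling. The conversion rates are made up, and a production system would need far more care (logging, guardrails, delayed conversions), but the core loop really is this short:

```python
import random

random.seed(1)

# True (unknown to the algorithm) conversion rates for the two options.
true_rates = {"A": 0.05, "B": 0.15}

wins = {"A": 0, "B": 0}    # successes per arm
losses = {"A": 0, "B": 0}  # failures per arm
pulls = {"A": 0, "B": 0}   # how often each arm was shown

for _ in range(10_000):
    # Thompson sampling: draw one sample from each arm's Beta posterior
    # and show whichever arm drew higher.
    arm = max(true_rates,
              key=lambda a: random.betavariate(wins[a] + 1, losses[a] + 1))
    pulls[arm] += 1
    if random.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

# As evidence accumulates, traffic drifts toward the better option;
# no need to split impressions 50/50 for months.
print(pulls)
```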

If you’re still using regular ole A/B testing and you’re tired of seeing thousands of impressions go by before you’re allowed to pick a winner, have a read through this article and see if a MAB approach might save you a lot of time and money.

Of Interest

When A/B testing doesn’t tell the whole story
Ever wonder if the winner of an A/B test is actually the best long-term option? In other words, what if option A gets more clicks, but those who chose B went on to have higher customer lifetime value? In that case, A/B testing may not be the right tool. And that’s where reinforcement learning comes in. Google’s DeepMind has open-sourced some new libraries to be used in this space. Learn more here:

How should we handle exams when A.I. is available?
If a university has no way of determining whether an assignment was written by a human or an algorithm, existing grading systems lose any semblance of meritocracy or fairness. This article dives into the power of neural networks to complete assignments that we cannot currently tell aren’t authentic and what we can do about it:

Keeping with this week’s theme
Here’s a collection of A/B testing learning resources, from newbie to master:

The Importance of Selling Data Science

by: Zank, CEO of Bennett Data Science

Data science is complex stuff. From problem formulation through implementation and deployment, it requires expert knowledge from (hopefully) highly trained professionals. And for data-driven organizations, data science is felt companywide. But with all that complexity, how are stakeholders meant to wrap their heads around what analytics has to offer? How can we get the head of marketing pumped up about our latest classification model?

When presented as a “classification model”, well, good luck. That’s not what anyone wants to hear about. And it’s the duty of analytics professionals to meet stakeholders way more than 50% of the way there. I call it Data Science Sales, and it’s a heck of a lot of fun to watch it work. Here’s what I mean.

Data Science Sales

Instead of touting a new “classification model” in the next weekly standup, how about showing the effects of reduced churn and increased LTV on a time-compressed simulation of users on a dynamic map of Europe? What about showing the effects of seasonality on fashion trends in the Northeast of America by showing collections of clothing where stakeholders can choose the month and state? These types of visualization are not (generally) done in Tableau or some other BI tool, because this isn’t BI. It’s data scientists building dynamic web pages to sell their latest breakthroughs to stakeholders, and the effects happen quickly and are long-lasting.

The intent is to remove that layer of complexity surrounding data science and replace it with a gut-feeling of how all that tech can affect the business. I urge data scientists to show off. A lot.

It’s so important to make sure that stakeholders understand how advances in analytics could affect their team and products. And the first step to doing this is to show them with visualizations they’ll immediately understand and remember.

In my first job at Abbott Labs, I recreated the somewhat clumsy screen of an Abbott hospital pump as a full-color interface on a Palm Pilot using Flash (yes, Flash, gasp!). It was such a hit that three years later, at a new job, I got a phone call from an executive who wanted to know where he could find that video. In other words, I led with feeling. I sold this visualization, and it got a lot of people talking and thinking.

Instilling Data Science Sales in Your Organization

How do you instill Data Science Sales in your organization? It’s pretty simple, or at least, here’s what I did in the past that’s worked well.

  1. Ask operations to set up a secure sandbox area and give the entire data science team FTP access
  2. Find someone to run a tutorial (2-3 hours) about how to create bespoke visualizations with HTML, Javascript (Ajax) and CSS and provide one example of a finished page as a template
  3. If possible, ask each person on your data science team to put up at least one visualization that week
  4. Send the pages around to relevant stakeholders in your company

Using visualizations as a central pillar of Data Science Sales has worked for me many times and I like the approach because it empowers the data science team, promotes knowledge sharing and increases cross-functional communication.

Have you had success doing something similar? Please drop me a line and let me know.

Zank Bennett is CEO of Bennett Data Science, a group that works with companies from early-stage startups to the Fortune 500. BDS specializes in working with large volumes of data to solve complex business problems, finding novel ways for companies to grow their products and revenue using data, and maximizing the effectiveness of existing data science personnel.


First Hires When Building a Data Science Team

By | Tech Tuesdays
Reading Time: 3 minutes

Let’s say you want to start small, from one to four people, as you build out a data science team. This week, I’ll talk about how I’ve done just that, many times over, with great success for big and small companies. The key is to start small, hire strategically, and grow around a strong initial data scientist.

For your first hire, especially if this person will be the sole source of your analytics efforts for an extended period, find an experienced (senior) generalist; someone who:

  • Can explain data science benefits and complexity to non-technical stakeholders
  • Has worked in your industry, or related
  • Has worked with IT to get and clean data
  • Has experience building/deploying simple machine learning models

I can’t emphasize enough how important it is to find a good communicator for your first hire. During the interview process, ask the candidate to explain their work. Do you “get” it? You should, and nearly immediately. If the candidate has trouble explaining something on their resume, imagine how that will play out when they’re in a time crunch, trying to explain a last-minute model change to the dev team or a revenue loss to the CEO. There are lots of guides to effective hiring, so I’ll leave it there.

For the next round of hiring, I recommend bringing on two roles to form what I call The Magic Three. At a high level, here are the roles, starting with the first hire (Sr. data scientist):

  • Senior Data Scientist – manages and coordinates the efforts of the team, including translating business needs into AI objectives, while handling all the ML modeling
  • Data Engineer – in charge of making data available, working with the senior data scientist to build and administer ETLs for the entire team
  • Data Analyst / Jr. Data Scientist – responsible for advising on ETL methods, providing insights to the company, and supporting the senior data scientist day to day

I’ve found data engineers to be invaluable in this scenario, and oftentimes the next hire or two beyond these three is another data engineer. This is a nod to the complexity data scientists face when cleaning and handling large amounts of data; a good data engineer working closely with a data scientist can alleviate much of it. The analyst position is largely a support role and is very valuable as a source of help to the other team members and of data-driven insights throughout the company.

There are many other titles out there and some may be useful to your company, depending on your situation. Far from exhaustive, here are some other types of analytics professionals with general descriptions:

  • Data Scientist (generalist) – can handle all the tasks done by the specialists below, but generally prefers to specialize in one area
  • Machine Learning Engineer – works with models, objectives and metrics
  • Data Engineer – works with data and pipelines
  • Statistician – understands relationships between data and business logic and how this informs better models
  • Data Analyst – adept at understanding and presenting trends in data

To recap, start with a strong data scientist in a central role and when you’re ready, go for The Magic Three. Build out carefully around this group, emphasizing the needs of your particular organization.

Of Interest

Converting Text to Speech – Try It
If you haven’t seen what Google is doing to convert text to speech, this is worth a look. They use a technology called WaveNet to synthesize realistic voices. On this page you can type in some text, choose the voice you’d like, including foreign accents, and hear a sample immediately. This goes a long way towards supporting all sorts of automated customer interactions at tremendous scale. Here’s the page:

There’s Nowhere to Hide
A new study shows you can be easily re-identified from almost any database, even when your personal details have been stripped out. Researchers from Imperial College London and the University of Louvain have created a machine-learning model that estimates exactly how easy individuals are to re-identify from an anonymized data set. You can check your own score here, by entering your zip code, gender, and date of birth. Here’s the article:
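Simple arithmetic shows why those three fields are so revealing. The counts below are my own rough assumptions for illustration, not figures from the study:

```python
# Back-of-envelope: how many Americans, on average, share a given
# ZIP code + gender + birth date? (All counts are rough assumptions.)
us_population = 330_000_000   # approximate U.S. population
zip_codes = 42_000            # approximate number of ZIP codes in use
genders = 2
birth_dates = 365 * 100       # distinct birth dates over ~100 years

combinations = zip_codes * genders * birth_dates
avg_share = us_population / combinations
print(f"{avg_share:.2f} people per combination")  # well under 1 person
```

With far more combinations than people, the average “anonymity set” is smaller than one person, so for most people that three-field combination is unique; that is exactly why re-identification is so easy.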

Find that Perfect Outfit Using AI
Facebook’s Fashion++ system uses a deep image-generation neural network to recognize garments and offer suggestions on what to remove, add, or swap. It can also recommend ways to adjust a piece of clothing, such as tucking in a shirt or rolling up the sleeves. Previous work in this area has explored ways to recommend an entirely new outfit or to identify garments that are similar to one another. Fashion++ instead aims to suggest subtle alterations to an existing outfit that will make it more stylish.

Resist the Algorithms

By | Business, Tech Tuesdays
Reading Time: 3 minutes

by: Zank, CEO of Bennett Data Science

I took this photo a couple of weeks ago. As a person who has spent years writing algorithms to help people, the back-window writing struck me as a very powerful and interesting statement. “Resist the Algorithms”.

I get it. I really do. At times, we should resist the algorithms. This week, I’m going to take a stance on a controversial subject: maximizing engagement. I’ll explain.

Personalization at Scale Requires Data Science

Often, data science is used to personalize offers at scale. That means that large companies can interact with millions of customers in real time with personalized offers for each customer. Amazon does it. Netflix does it. Many, many companies do, too.

And the goal of all this personalization is to increase engagement. But what if we win? In other words, what if we write the perfect algorithm that makes a product so “sticky” that users can’t get enough? They’re permanently engaged. Picture a user holding her phone, 100% engaged with some app. For hours. As data scientists, that’s actually what we aim for when we set out to maximize engagement.

Sounds as creepy as it does impossible. Yet that is exactly what apps like Facebook and Instagram (among others) have achieved. And they do it through carefully timed messages and close attention to algorithm-driven 1:1 personalization.

When founders start a new venture, they dream of having huge engagement, whether it’s for social good, such as collecting donations after a natural disaster, or for getting users to hurl birds at rocks all day. More often than not, companies seek greater user engagement (because they know revenue will follow) more than anything else.

Our Obligation

As data scientists, we have a moral obligation to understand that if we succeed in creating an app that users have trouble turning away from, we could be damaging society as a whole, taking people from time with friends, family and self.

This is quite serious stuff. We all have an opinion when we see a toddler transfixed at a restaurant by an iPad playing SpongeBob, oblivious to anything in their proximity. But what about the 38-year-old woman in the checkout line who’s reading her Google News feed for the 23rd time that day, also oblivious to her surroundings? Or the throngs of teens who turn to Instagram for attention?

These apps were built to maximize engagement, and they use data to get better at it each day. Knowledge is power. Data scientists have an obligation to discuss these effects with stakeholders. And companies have an obligation to their customers and users to protect them from endless engagement.

What Can We Do?

One solution might be to let users choose the times they can receive hyper-personalized content. In other words, turn the algorithms off between certain hours.
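As a sketch of how such a “quiet hours” setting might work (the function, defaults, and field names below are hypothetical, not from any real product):

```python
from datetime import time

def personalization_allowed(now, start=time(9, 0), end=time(21, 0)):
    """Allow hyper-personalized content only inside the user's chosen window."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 21:00-02:00
    return now >= start or now < end

# Outside the window, fall back to a non-personalized (e.g. chronological) feed.
feed = "personalized" if personalization_allowed(time(23, 30)) else "chronological"
```

The point isn’t the code; it’s that the user, not the algorithm, decides when engagement-maximizing content is allowed to reach them.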

Algorithms are wonderful for so many purposes. Let’s remember that and do our part to listen and respond morally to these and other issues as they arise.
