Is that Data Science Company a Fake?

Reading Time: 4 minutes

Today’s newsletter is a reaction to an article, How to Recognize A.I. snake oil. The author writes about the large and growing number of fake A.I. claims companies are making. Unfortunately, it can sometimes be quite difficult to tell real A.I. from fake.

I love this topic. It’s also quite timely, as a colleague I respect a lot wrote me recently. In his wonderful stream of consciousness style, he asked me the following:

What makes someone’s machine learning or AI efforts in marketing better than others? Everyone has a model and algorithm. Are some trained better? Is the answer “prove it” and the more performant algorithm wins? Or is it tied to the sustainability to ML over time? Curious how customers make that distinction as many I know who buy AI/ML as part of a system are about an inch deep?

DB, thank you very much for initiating this conversation with me!!

My knee-jerk reaction is to say, ask a professional (data scientist) to properly vet the company. But of course that doesn’t scale, and really, that’s not what data scientists are for.

But there’s a lot in his questions and comments. Let’s unpack it a bit.

What about the word “better”? How is one A.I. better than another? The simple/naive answer is, the one that performs best on the metric(s) used to measure maximization of some objective. But that’s (wordy as heck and also) not what he had in mind.

Rather, the questions are part of a growing dilemma: there’s so much confusion around what A.I. is and what it can do that companies are getting away with using phrases like “data-driven personalization” or “cutting-edge A.I.” when in fact they have little or none of either. Snake oil!

I’ve seen my very own words featured on websites of companies that I had been talking with. They were bold enough to take phrases from my initial call with them and paste them on the front page of their websites a few days later. I have to say it was flattering, but even more elucidating.

Here’s another example: I hear companies promise all the time that they use machine learning to do customer segmentation. Then, behind the scenes, they’re using male/female or arbitrarily chosen age brackets to perform segmentation. Are these segments really purchasing/behaving differently? The hope is yes, but there’s zero analysis done behind the scenes to confirm it. But it sort of works. I’m reminded here of the Anchorman quote:

60% of the time, it works every time.
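
The missing analysis behind those arbitrary segments does not have to be elaborate. As a rough sketch, and not a description of what any particular vendor does, a permutation test can ask whether two segments differ in average spend by more than chance relabeling would produce. The function name, the spend figures, and the age split below are all made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the (add-one smoothed) share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled customers and split them into two fake segments.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical monthly spend per customer, for illustration only.
under_35 = [42, 38, 51, 47, 40, 44, 39, 50]
over_35 = [41, 45, 37, 49, 43, 46, 40, 48]

p = permutation_p_value(under_35, over_35)
print(f"p-value: {p:.3f}")
```

A large p-value here would suggest the age split is not separating customers who actually spend differently, at least on this one measure. A real check would look at more than one behavior and far more customers, but even this much is more than the “zero analysis” described above.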

These examples are (almost) amusing. But this can be very frustrating and expensive for companies looking to pay for intelligent services. In some cases, job candidates are overlooked because the A.I. in charge of vetting didn’t pick up the correct number of keywords. What are applicants doing? They’re taught to use words like “Cambridge” or “Oxford” in white text in their digital resumes. Humans can’t see it but computers drink it up! Hardly intelligent on the part of the machine.

This is the result of companies using the confusion around our field and slapping the A.I. label on anything and everything. After all, it helps with fundraising and getting that next client. There’s a lot of ethics involved here. But it gets worse.

Want to feel a little uneasy? Read this list of areas where A.I. could potentially help:

Predicting criminal recidivism
Predicting job performance
Predictive policing
Predicting terrorist risk
Predicting at-risk kids

The question here is, can social outcomes be predicted?

I won’t spoil everything in the article, but the answer is, of course, hardly. And:

“We must resist the enormous commercial interests that aim to obfuscate this fact.”

Please read the fascinating (PDF) article on this topic here: https://www.cs.princeton.edu/~arvindn/talks/MIT-STS-AI-snakeoil.pdf

Of course, there are good companies out there providing fantastic data-driven products and services to their clients. But A.I. practitioners face real challenges as we hope to differentiate our companies from those offering little or no intelligent use of data.

And if you’re wondering whether to get involved with a company or organization offering “cutting edge A.I.”, ask to speak with their head of data science. In many cases, you’ll know right away.

Of Interest

Senators Want Answers About Algorithms That Provide Black Patients Less Healthcare
This makes sense. When data scientists train algorithms on datasets where minorities are underrepresented, results can be life-threatening. This sort of “bias” is the topic of this article. We’re all responsible for paying attention to the ethical issues that come from how we interpret and use our data. Cambridge Analytica ring a bell?
https://arstechnica.com/tech-policy/2019/12/senators-want-answers-about-algorithms-that-provide-black-patients-less-healthcare/

Gender Bias in A.I. and What We Can Do to Fix It
There is a big bias against the pronoun “hers” in the datasets used to train most of the language models we use today. The result: bias in how we address women in A.I. products. The source of the bias is the perfect metaphor for bias in A.I. more broadly. Read more in this fascinating, and quite long, article. The author poses the problem well, then offers an elegant solution.
https://medium.com/@robert.munro/bias-in-ai-3ea569f79d6a

Who Better Than Pinterest to Discuss A.I. for Images
In this technical article, Pinterest scientists talk about how they use image embeddings throughout their search and recommendation systems to help users navigate visual content, powering experiences like browsing related content and searching for exact products to shop. They describe a multi-task deep metric learning system that learns a single unified image embedding, which can be used to power their multiple visual search products.
https://arxiv.org/abs/1908.01707
