This month, I want to shine a light on a common misstep: reaching for machine learning before it’s warranted. The law of the instrument applies here. When the only tool you have is a hammer, everything starts to look like a nail. In other words, many teams use machine learning too soon, and under the wrong assumptions, because that’s the tool they have. After all, a data science team is supposed to produce predictive models, right?
There are many reasons not to use machine learning.
Perhaps there isn’t enough data yet. Maybe the data are not the right type for the proposed method(s), or maybe a simpler method will yield similar results. In these cases, heuristic approaches may serve well long before machine learning is worth considering. Sounds easy enough, but teams often dive in, throwing fancy black-box algorithms (the hammer) at problems before it’s warranted.
I find that this early use of, and reliance on, machine learning is less common among veteran machine learning professionals (a solid argument for hiring data scientists with years of real-world experience) and more common among junior data scientists. Schools can’t really teach students not to use machine learning in an analytics program; that wouldn’t even make sense.
So, in industry, mentoring of junior data scientists must emphasize doing the right thing for the problem at hand, rather than pulling out that fancy algorithm and acting frustrated when it doesn’t work, or worse, tweaking the data and/or model until it appears to work. I did that last one often when I started out.
“Unrealistic expectations encourage the use of these tools before they are ready.”
This quote is from the MIT Technology Review story below on how machine learning was ineffective in helping us manage the global COVID-19 pandemic. They concluded that, “Many hundreds of predictive tools were developed. None of them made a real difference, and some were potentially harmful.” And much of this was linked to poor data quality.
Before rushing in with algorithms, data science teams would be well advised to understand the strengths and caveats in the data and only then apply machine learning, and only where it’s appropriate.
Be well!
-Zank
Of Interest
What to Do When You Don’t Have Enough Data for Your Project
The fuel that powers machine learning algorithms is data; specifically, labeled data. Such data can be difficult to come by. If you want to build a machine learning model that predicts churn, you first need many examples of churn to train it on. Early on, you generally don’t have these examples. What can you do in this case, or in any other case where clean, labeled data simply don’t exist? This article discusses semi-supervised, active, and weakly supervised learning, and shows examples of the latter two approaches from DoorDash, Facebook, Google, and Apple. Read more here.
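To make weak supervision a little more concrete, here is a minimal Python sketch of the core idea: write a few cheap heuristics (“labeling functions”) that each vote churn, active, or abstain, then combine the votes into provisional labels. Everything here is an assumption for illustration, not code from the article or from any of the companies it mentions: the customer fields (`days_since_last_login`, `plan_downgraded`), the thresholds, and the simple majority vote.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

# Label values returned by the hypothetical labeling functions below.
CHURN, ACTIVE, ABSTAIN = 1, 0, -1

Record = Dict[str, Any]
LabelingFunction = Callable[[Record], int]


def lf_inactive(record: Record) -> int:
    # Hypothetical heuristic: 30+ days without a login suggests churn.
    return CHURN if record["days_since_last_login"] >= 30 else ABSTAIN


def lf_downgraded(record: Record) -> int:
    # Hypothetical heuristic: a plan downgrade is a churn warning sign.
    return CHURN if record["plan_downgraded"] else ABSTAIN


def lf_recently_active(record: Record) -> int:
    # Hypothetical heuristic: a login within the past week suggests an active customer.
    return ACTIVE if record["days_since_last_login"] <= 7 else ABSTAIN


def majority_vote(record: Record, lfs: List[LabelingFunction]) -> int:
    """Combine votes, ignoring abstentions; abstain on ties or when nothing fires."""
    counts = Counter(v for v in (lf(record) for lf in lfs) if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics: leave this example unlabeled
    return ranked[0][0]


LFS = [lf_inactive, lf_downgraded, lf_recently_active]

# Toy, made-up customer records.
customers = [
    {"days_since_last_login": 45, "plan_downgraded": True},
    {"days_since_last_login": 2, "plan_downgraded": False},
    {"days_since_last_login": 3, "plan_downgraded": True},
    {"days_since_last_login": 15, "plan_downgraded": False},
]

weak_labels = [majority_vote(c, LFS) for c in customers]
print(weak_labels)  # → [1, 0, -1, -1]
```

The third customer gets conflicting votes and the fourth triggers no heuristic, so both stay unlabeled rather than receiving a guess. Libraries such as Snorkel go further than this equal-weight vote by learning how much to trust each labeling function.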
Hundreds of A.I. Tools Have Been Built to Catch Covid. None of Them Helped
Spoiler alert: it’s largely because of the data. Covid cases were measured differently in different regions, and a lot of data was gathered in haste, without checks in place or a governing body to ensure consistency. When standardization was applied, it was often a one-off effort that was never shared with the global community in a meaningful way. Read more here.
Startup Synack Valued at $500 Million to Boost ‘White Hat’ Hacking From Home
In 2018, Wes Wineberg (previously a senior security software engineer at Microsoft) decided to take the plunge and turn his side hustle, hacking companies for pay, into a full-time gig. Since then, with more people working from home and relying on digital services during the Covid pandemic, the security of websites and apps has become even more important to companies. At the same time, side-gig hackers have more free time to lend their skills to bug bounty platforms. As a result, Wineberg’s bug bounty startup Synack now finds itself valued at an impressive 500 million US dollars. Read more here.
Elon Musk Unveils Tesla Robot After Warning A.I. Will Take Over Humanity
Tesla is building a humanoid robot called the Tesla Bot “sometime next year,” Elon Musk announced at the company’s A.I. Day event. The robot will use the same artificial intelligence that the electric car company uses for its vehicles, will be approximately 173 centimetres (5 ft 8 in) tall, will weigh around 57 kg, and will be built from “lightweight materials,” with a display somewhere on its body to show information. Musk says the robot is “intended to be friendly and navigate through a world built for humans” but will be built at a “mechanical level” so that someone could “run away from it, and most likely overpower it.” Read more here.
MUM: A new A.I. Milestone for Understanding Information
By Pandu Nayak, Google Fellow and Vice President Search: “When I tell people I work on Google Search, I’m sometimes asked, ‘Is there any work left to be done?’ The short answer is an emphatic ‘Yes!’ There are countless challenges we’re trying to solve so Google Search works better for you. Today[, in this article], we’re sharing how we’re addressing one many of us can identify with: having to type out many queries and perform many searches to get the answer you need.” Read more here.
The First Rule of Machine Learning: Start Without Machine Learning
This is such good advice! I’ve been saying and doing this for about a decade. There’s a time to use machine learning, and it’s almost never at the beginning of a project, unless the project is specifically designed that way. To make the case, author Eugene Yan uses charts and rules, and even references Google’s wonderful Rules of Machine Learning. This article is well worth reading and studying.
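As an illustration of “start without machine learning” (a sketch of the general idea, not code from Yan’s article or Google’s guide), here is a rule-based churn baseline in Python, scored with precision and recall so any later model has a concrete number to beat. The 30-day inactivity threshold and the tiny history are made up for the example.

```python
from typing import List, Tuple


def rule_baseline(days_since_last_login: int, threshold: int = 30) -> bool:
    """Hypothetical heuristic: flag a customer as likely to churn after `threshold` idle days."""
    return days_since_last_login >= threshold


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive (churn) class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up history: (days since last login, did the customer actually churn?)
history = [(45, True), (60, True), (35, False), (10, True), (5, False), (2, False)]

preds = [rule_baseline(days) for days, _ in history]
actual = [churned for _, churned in history]
p, r = precision_recall(preds, actual)
print(f"precision={p:.2f} recall={r:.2f}")  # → precision=0.67 recall=0.67
```

If a trained model can’t clearly beat a one-line rule like this on the same data, that’s a strong signal the problem (or the data) isn’t ready for machine learning yet.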
Why You Should Know Erik Bernhardsson
Do you listen to Spotify? If you do, chances are good that you’ve listened to music that was recommended to you, rather than music that you specifically searched for. Maybe you regularly use Discover Weekly to find new music. If so, Erik Bernhardsson is a name you might want to know. He’s responsible for much of that system: he built the first version of Spotify’s recommendation system and managed its machine learning team for years. In this interview, he talks about his approach to technical problems. I met Erik many years ago and found him kind and easy to talk with. He packs a lot of valuable content into his answers. Read more here.