
Building predictive models is tough. Not tough to do, but tough to do right. Data science is a complex subject that generally takes years of education and practical experience to build a proper foundation in.

I’m pretty sure this isn’t new information or shocking to anyone reading this.

What may be shocking, however, is that there’s been a lot of movement by big firms like Google and Amazon towards developing platforms that allow automated machine learning, which does exactly what it sounds like: you upload some data and it learns how to make predictions you desire.

For example, if you want to predict churn, you feed it a bunch of data containing customer attributes, actions, and churn events and, hopefully, the automated machine learning tool provides you with a predictor of churn that you can use in your company – all to let you know if Sarah M. from Kansas is ever going to come back to purchase with you or if she’s gone forever.

I would love to have a tool that made it that easy! And given my decades of experience as an analytics professional, I would know exactly how to use it and what types of data to feed it. I would be able to get heaps of value out of an automated machine learning tool.

But most people wouldn’t. 

And that’s not very apparent, even in the fine print, when looking at the marketing materials for these tools. 

In today’s Tech Tuesday, we’ll take a look at examples of automated machine learning tools, why such tools can’t and shouldn’t be used by just anyone, where they might be most useful, and how to consider them when you’re looking for cost savings. 

Spoiler alert: you can’t simply pay for an automated machine learning tool and let your team of five talented data scientists go.

Automated Machine Learning Tools

Here are two of the “solutions” currently out there for automating machine learning:

Amazon

Google

Amazon has Amazon SageMaker Autopilot, which they claim can:

“Automatically create the best classification and regression machine learning models, while allowing full control and visibility.”

Google has AutoML, which runs atop its Google Cloud Platform (GCP). They say:


“Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs. It relies on Google’s state-of-the-art transfer learning and neural architecture search technology.”

Google claims “state-of-the-art performance” and that you can “get up and running fast”.

But, of course, no tool can get around this fundamental guideline: garbage in, garbage out. And “garbage” is not as black and white as you may think.

Data scientists think of data as having predictive power. That ability of data to predict outcomes is not all on or all off but rather exists on a continuum. Assessing the predictive power of data is essential, but less experienced practitioners are unlikely to be in a good position to make that assessment.
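To make the "continuum" idea concrete, here is a minimal sketch in Python. It is not what any of the tools above do internally; it simply scores a single column by its AUC against churn, where 0.5 means no better than a coin flip and 1.0 means a perfect ranking. The column names and the toy numbers are made up for illustration.

```python
def auc(scores, labels):
    """Chance that a random churner scores higher than a random
    non-churner, counting ties as half a win (the AUC)."""
    churners = [s for s, y in zip(scores, labels) if y == 1]
    stayers = [s for s, y in zip(scores, labels) if y == 0]
    if not churners or not stayers:
        raise ValueError("need at least one churner and one non-churner")
    wins = 0.0
    for c in churners:
        for s in stayers:
            if c > s:
                wins += 1.0
            elif c == s:
                wins += 0.5
    return wins / (len(churners) * len(stayers))


# Hypothetical toy data: 1 = churned, 0 = stayed.
churned = [1, 1, 1, 1, 0, 0, 0, 0]
days_since_last_order = [90, 75, 60, 20, 30, 15, 10, 5]
shoe_size = [7, 10, 8, 9, 8, 9, 7, 10]

auc_recency = auc(days_since_last_order, churned)
auc_shoe = auc(shoe_size, churned)

print(f"days_since_last_order AUC: {auc_recency:.4f}")
print(f"shoe_size AUC:             {auc_shoe:.4f}")
```

In this toy set, recency lands close to the top of the scale while shoe size sits right at coin-flip level. Real columns usually fall somewhere in between, and deciding whether "somewhere in between" is good enough is exactly the judgment call that takes experience.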

Bias is also a huge issue that may be difficult for non-practitioners to spot and has really blown up recently, as algorithms have been found to favor or disfavor a particular segment of society. Here’s more reading on bias in A.I.

I’m not beating up on automated machine learning because I think it has no practical use. Rather, I want to point out that it’s not a tool that’s ready for the masses just yet, and it may never be.

How to Use Automated Machine Learning Tools Effectively

Rather than thinking about automated machine learning as a panacea for the difficulties of hiring and managing a typical data science team, I propose that it be considered an advanced tool for skilled data scientists to save time and increase efficiency. For the right company under the right conditions, it can be a huge timesaver.

Rather than tossing in a bunch of data and hoping for the best, here’s a much better recipe for success: have a mid- or senior-level data scientist collect, understand, and clean some relevant data, and then use automated machine learning to assess the ability of the data to predict the desired outcome, such as churn. 

An experienced data scientist would understand the automated output and be able to work with the data to increase model performance.
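As one illustration of what "understanding the output" can mean, here is a small Python sketch with made-up numbers. It assumes a customer base where only 5% churn and compares a hypothetical model against the laziest possible baseline: predicting that nobody ever churns. None of this comes from a real tool; it only shows why a headline accuracy figure needs context.

```python
def accuracy(predicted, actual):
    """Share of customers where the prediction matches reality."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(predicted, actual):
    """Share of real churners that the predictions caught."""
    caught = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    total_churners = sum(actual)
    return caught / total_churners if total_churners else 0.0


# Hypothetical data: 100 customers, 5 churned (1) and 95 stayed (0).
actual = [1] * 5 + [0] * 95

# Baseline: predict that no one churns.
always_stay = [0] * 100

# Hypothetical model: catches 3 of the 5 churners, raises 4 false alarms.
model = [1, 1, 1, 0, 0] + [1] * 4 + [0] * 91

baseline_acc = accuracy(always_stay, actual)
model_acc = accuracy(model, actual)
baseline_recall = recall(always_stay, actual)
model_recall = recall(model, actual)

print(f"baseline: accuracy={baseline_acc:.2f}, recall={baseline_recall:.2f}")
print(f"model:    accuracy={model_acc:.2f}, recall={model_recall:.2f}")
```

Here the do-nothing baseline actually scores higher on accuracy than the model, yet it never finds a single churner. Someone reading only the accuracy number could easily draw the wrong conclusion, which is the kind of trap an experienced data scientist knows to check for.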

Here’s some more reading on automated machine learning, from both sides:

I’ve been asked to speak on this topic in early December, and in the coming months I intend to revisit it as I dive into more of the intricacies of automated machine learning.

Of Interest

Amazon is Filled With Fake Reviews and it’s Getting Harder to Spot Them
Since Amazon’s early days, reviews have been the one big metric customers have relied on to determine the quality and authenticity of a product. Amazon’s listings often have hundreds or thousands of reviews, instead of the handful found on competing marketplaces. But many of those reviews can’t be trusted anymore. Thousands of fake reviews have flooded Amazon, Walmart, eBay and others, and it’s getting harder to spot them.

Inside TikTok’s Killer Algorithm
In one of our recent Tech Tuesdays, How Is TikTok Controlling its Users, I talked about how TikTok manages to keep people so engaged and why the A.I. behind it works so well. Now, TikTok has revealed some of the new, elusive workings of the prized algorithm that keeps hundreds of millions of users worldwide hooked on the viral video app. TikTok executives said they were revealing details of their algorithm and data practices to dispel myths and rumors about the company. Have a read here.

Something Weird Is Happening on Venus
After the moon, Venus is the brightest object in the night sky, gleaming like a tiny diamond in the darkness. The planet is so radiant because of its proximity to Earth, but also because it reflects most of the light that falls across its atmosphere, more than any other world in the solar system. Last week, scientists revealed that something really weird is happening in those clouds.