Tech Tuesdays: Why Data Labeling is a Billion Dollar Business

Data labeling is a really big deal. In this week’s Tech Tuesday, I’ll discuss what data labeling is, why it’s so important to A.I., and what the biggest players in the data labeling space do. Then I’ll share my thoughts on the future of data labeling.

What is data labeling?

Data labeling is the process of annotating a piece of data (an image, a word, a video, a voice recording, etc.) with its correct label. It’s generally done by humans, and its value to A.I. systems is significant: labeled data is the “truth” an A.I. system learns from.

For example, the video from self-driving cars must be annotated (labeled) frame-by-frame to identify people, road markings, signs, buildings, trees, bicycles…you get the point. All of this must be done by hand at first so that an A.I. system can learn from it later. Here’s what a typical labeled image might look like:

[Image: Bennett Data Science Tech Tuesday image labeling example]

Imagine the work that went into labeling this single image. Now imagine it’s part of a video stream, captured at 15 frames per second. Then imagine that the car drives for two hours. This means that:

15 images/second x 60 seconds/minute x 60 minutes/hour x 2 hours = 108,000 images are captured by this one camera.

Now imagine that there are nine such cameras on that car. This means that a total of 972,000 images are captured during these two hours. That’s a lot of labeling work.
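If you want to play with these numbers yourself, here is a minimal Python sketch of the same arithmetic. The function name frames_captured and its parameters are mine, made up for illustration; the frame rate, drive time, and camera count are the ones from the example above.

```python
def frames_captured(fps: int, hours: int, cameras: int = 1) -> int:
    """Count the images recorded by `cameras` cameras at `fps` frames/second."""
    seconds = hours * 60 * 60  # 60 seconds/minute x 60 minutes/hour
    return fps * seconds * cameras


# Figures from the example: 15 frames per second, a two-hour drive.
one_camera = frames_captured(fps=15, hours=2)
nine_cameras = frames_captured(fps=15, hours=2, cameras=9)

print(f"{one_camera:,} images from one camera")      # 108,000
print(f"{nine_cameras:,} images from nine cameras")  # 972,000
```

Change the frame rate or the number of cameras and you can see how quickly the labeling workload grows.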

How to go about this?

Each of those nearly one million images contains multiple objects, and a human must find, outline (or trace), and then label each of them. There are some shortcuts, but in the beginning, this is how the process works.

With properly labeled data, A.I. systems like those that power Google Voice, Alexa, Siri, and various self-driving cars are extremely valuable. And this value has not gone unnoticed. Recent news shows that a data labeling company called Scale reached a 7.3 billion dollar valuation.

Scale and its competitors provide a huge service to companies that use A.I. and need their A.I. to be as accurate as possible. Having started by labeling video and image data, Scale has since expanded to work with text (natural language processing, or NLP).

At Bennett Data Science, we regularly rely on data labeling services when we need “truth” data for training purposes. We reach out to our preferred labeling partner, provide a few examples of what properly labeled data looks like, and then hand over the hundreds or thousands of unlabeled examples we need labeled.

From there, they quickly scale to tens or hundreds of workers, and usually within a few days, we have the training data that used to take us months to generate.

The future of data labeling

The value that these companies are seeing in the market doesn’t surprise me.

As they grow, I expect to hear more about how these labeling companies use A.I. themselves to get a jump start on the labeling process. If, for example, they’re labeling trees in hundreds of thousands of images, it makes sense for them to train their own A.I. to find the trees. The labelers then only need to verify that the A.I. was correct instead of tracing each tree from scratch, which saves them time.
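To make that idea concrete, here is a rough Python sketch of how such a workflow might route work. It is purely illustrative: the Proposal class, the route_proposals function, and the 0.8 confidence threshold are all my own assumptions, not a description of how Scale or any other labeling company actually works. The idea is that confident A.I. proposals go to a quick verification queue, and the rest go back to a human to label from scratch.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A label suggested by a (hypothetical) pre-labeling model."""
    image_id: str
    label: str
    confidence: float  # model's score between 0 and 1 (assumed to exist)


def route_proposals(proposals, threshold=0.8):
    """Split proposals into a quick-verify queue and a label-from-scratch queue."""
    verify, manual = [], []
    for proposal in proposals:
        if proposal.confidence >= threshold:
            verify.append(proposal)  # a human only confirms or rejects
        else:
            manual.append(proposal)  # a human traces the object by hand
    return verify, manual


# Made-up proposals for three images.
proposals = [
    Proposal("img_001", "tree", 0.95),
    Proposal("img_002", "tree", 0.40),
    Proposal("img_003", "tree", 0.88),
]

verify, manual = route_proposals(proposals)
print("Verify only:", [p.image_id for p in verify])
print("Label from scratch:", [p.image_id for p in manual])
```

In this sketch the threshold is the knob: set it higher and more images go back for manual tracing, set it lower and labelers spend more of their time verifying.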

No doubt these companies are here to stay, as they provide huge value to companies building new A.I. systems. How long they’ll stay valuable remains to be seen. After all, it’s the goal of A.I. to make this sort of work obsolete.

I know we have some readers who work in this space. If you have a take on data labeling, please hit reply and let me know what you think.

Best,

-Zank

Of Interest

5 Stories Data Tell us About Data Scientists
Data scientists tell stories through data. But what stories can data tell about data scientists? It may sound like the revenge of structured data, but it’s actually just a survey conducted by Kaggle. Read this article by Éverton Bin, in which he analyses the data of the 2019 Kaggle Machine Learning and Data Science Survey and looks at the educational background, daily activities, tools used, and salaries of data scientists.

A.I. Unlocks Ancient Dead Sea Scrolls Mystery
Researchers at the University of Groningen in the Netherlands say A.I. has for the first time shown that two scribes wrote part of the mysterious ancient Dead Sea Scrolls. They examined the Isaiah scroll using “cutting edge” pattern recognition and A.I. and analysed a single Hebrew letter, aleph, which appears more than 5,000 times in the scroll. In a paper published by the scholars, they share that they’ve “succeeded at extracting the ancient ink traces as they appear on digital images”. Read more here.

Study Explores Inner Life of A.I. With Robot
Ever wondered what Apple’s virtual assistant is thinking when she says she doesn’t have an answer for that request? Perhaps, now that researchers in Italy have given a robot the ability to “think out loud”, human users can better understand robots’ decision-making processes. “There is a link between inner speech and subconsciousness [in humans], so we wanted to investigate this link in a robot,” said the study’s lead author, Arianna Pipitone from the University of Palermo. Read more here.
