Fake news is difficult to spot, and for good reason: our ability to fact-check hits a dead end when generated content is indistinguishable from real content. This isn’t the future. It’s already here.
Data scientists (and increasingly, anybody) can create pictures, video, and text that humans cannot tell were artificially generated.
Is that Instagram profile you like so much even real? Did that politician really say that? Currently, we don’t have any way of knowing. And even “real video” suffers from measurement bias.
Several months ago I asked a police officer about a small black box strapped across his chest. He replied that it’s a chest-mounted video camera. Why? He explained that police are involved in many situations where bystanders record videos and the on-chest cameras give another source of truth to reconcile, if it comes to that.
Translation – even camera angles and perspectives can tell different stories. And those are genuine, real videos. Imagine what happens when the entire content of the video is fabricated and there’s no way to know. This is why additional sources can be of key importance.
Monitoring Fake News
Because alternative truths can now be fabricated, people – such as presidents – can dismiss anything they were recorded saying by claiming that it was, you guessed it, fake news.
Twitter and Facebook have taken controversial steps lately. For the first time, Facebook removed a post from President Trump, in which he said that children are “virtually immune” to the coronavirus. They are not, and posting such misinformation can endanger real lives.
Twitter has likewise removed posts from the President and others. There is clearly a need to monitor fake news, and companies are stepping up their efforts to do so.
But people like President Trump are high-profile users who get a lot of exposure and, with it, more scrutiny. In general, how can we identify fake news at scale, across millions of posts? This is an interesting and relevant question to ask.
Identifying Fake News at Scale
Typically, data science works by first using lots of labeled training data to build a predictive model. But we don’t have a lot of data (news, stories, images, or videos) labeled as real or fake. This is due in no small part to the fact that human labelers cannot truly know if content is real or fake. So the usual approach is close to a non-starter.
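Additionally, for every fake image out there, there may be ten billion real images. That makes detection very difficult, because machine learning models don’t work well with such heavily imbalanced datasets: a model can score near-perfect accuracy by labeling everything as real while catching no fakes at all.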
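To make the imbalance problem concrete, here is a minimal sketch in plain Python. The 10,000-to-1 ratio, the function names, and the "always real" detector are all assumptions made up for illustration, not anything taken from a real system. It shows why accuracy alone can be misleading when fakes are rare.

```python
# Toy illustration of class imbalance. The 10,000-to-1 ratio is an assumed,
# scaled-down stand-in for the "ten billion to one" figure in the text.
REAL, FAKE = 0, 1

def make_labels(n_real=10_000, n_fake=1):
    """Build ground-truth labels: mostly real items, very few fakes."""
    return [REAL] * n_real + [FAKE] * n_fake

def always_real(_item):
    """A hypothetical 'detector' that never flags anything as fake."""
    return REAL

def accuracy(y_true, y_pred):
    """Fraction of all items labeled correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall_on_fakes(y_true, y_pred):
    """Fraction of the actual fakes that the detector caught."""
    n_fake = sum(t == FAKE for t in y_true)
    caught = sum(t == FAKE and p == FAKE for t, p in zip(y_true, y_pred))
    return caught / n_fake

labels = make_labels()
predictions = [always_real(item) for item in labels]
acc = accuracy(labels, predictions)
rec = recall_on_fakes(labels, predictions)
print(f"accuracy: {acc:.2%}")
print(f"recall on fakes: {rec:.0%}")
```

In this toy setup the do-nothing detector is right about almost every item, yet it finds none of the fakes. That is why a metric like recall on the rare class is usually more informative here than overall accuracy.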
I believe that detection can and will come from a strong understanding of the machines that generate the fakes. That said, only time will tell how well such detectors will perform and how widespread their use will be.
If you want to see something almost magical, here’s a video of a space disaster that never happened. What if the Apollo 11 mission had gone wrong and the astronauts had not been able to return home? A contingency speech for this possibility was prepared, but never delivered by President Nixon – until now: https://youtu.be/LWLadJFI8Pk.
Remark on Last Week’s Tech Tuesday
In last week’s Tech Tuesday I proved that 1 = 0. However, a correction must be noted.
Remember the initial equation, “x = y + 1”?
One set of values that makes it true is:
x = 2 and y = 1, yielding: 2 = 1 + 1.
No problem so far.
Now, look at step 5 where we divide each side by:
(x – y – 1)
But when x = 2 and y = 1, that becomes 2 – 1 – 1 = 0! In fact, any x and y that satisfy x = y + 1 make this expression zero.
And that’s where everything breaks down. I divided by zero, which is not a defined operation, so this step and everything after it is nonsense.
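For readers who like to check such things by machine, here is a small Python sketch of the same point. The helper names (divisor, try_divide) and the sample values are my own choices for illustration. It confirms that the divisor is zero for several pairs satisfying x = y + 1, and that Python refuses to carry out the division.

```python
def divisor(x, y):
    """The expression each side was divided by in step 5."""
    return x - y - 1

# Sample pairs (x, y) chosen so that x = y + 1 holds for each of them.
pairs = [(y + 1, y) for y in range(-2, 4)]
divisors = [divisor(x, y) for x, y in pairs]
print(divisors)  # every entry is zero

def try_divide(numerator, x, y):
    """Attempt the division; return None if the divisor is zero."""
    try:
        return numerator / divisor(x, y)
    except ZeroDivisionError:
        return None

result = try_divide(1, 2, 1)
print(result)
```

Python raises ZeroDivisionError for the division, which the sketch catches and reports as None, so the "proof" cannot get past step 5.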
DG was the first one to email me and catch it — well done!
Have a wonderful week!
Unsplash’s Dataset is now Open Source
Unsplash has released its most complete high-quality open image dataset to date. This is a game-changer! Take a look here.
‘Drawn-on-Skin’ Electronics Offer Breakthrough in Wearable Health Monitors
A team of researchers has developed a new form of electronics known as “drawn-on-skin electronics,” allowing multifunctional sensors and circuits to be drawn on the skin with an ink pen. The advance, the researchers argue, allows for the collection of more precise, motion artifact-free health data, solving the long-standing problem of collecting precise biological data through a wearable device when the subject is in motion. Read about it here.
The Coronavirus Doesn’t Care About Your Policies
“The Coronavirus Doesn’t Care About Your Policies” is a rather controversial article that argues that, based on data, there seems to be no relationship between lockdowns and lives saved. That’s remarkable, given that we know for sure that lockdowns have destroyed economies the world over. Read more about this here.