Tech Tuesdays: Lying With Statistics

Reading Time: 4 minutes

We’re into the new year, and as much as I do my best to stay out of the way of the news, some of it still finds its way to me. This week I want to shed some light on what I’ll call inflammatory statistics and offer a few questions you can ask to understand what’s really going on.

Inflammatory Statistics

An example of an inflammatory statistic is a newscaster “reporting” that Covid cases have doubled in some towns [cue my blood boiling]. Note that twice a small number is still a small number (2×2 is only 4), and twice a big number is still a big number (1 billion × 2 is 2 billion).

For some reason, most people think that doubling is a really big deal, which is why newscasters phrase it that way. Of course, any new Covid case is horrible; that’s not my point. My point is solely that reporting a “doubling” is far more inflammatory than informative, because of how we perceive doubling.

Next time you hear this, ask for more information such as:

What were the before and after counts? For example, going from 10 cases to 20 may not be newsworthy.
What age groups were involved? Age is a major risk factor for Covid-related ailments.
What is the population of the territory? For example, a doubling from a small base in a very populous area might signal a small outbreak rather than something much more dire.
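To make those questions concrete, here is a small Python sketch that turns before/after counts and a population into the numbers a “doubled” headline leaves out: the absolute change, the fold change, and the rate per 100,000 residents. The `CaseChange` class, the `describe` helper, and all of the counts and populations are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseChange:
    """Case counts for one place before and after some period (hypothetical inputs)."""
    before: int
    after: int
    population: int

    @property
    def absolute_change(self) -> int:
        # The extra cases, in people, which a "doubled" headline hides.
        return self.after - self.before

    @property
    def fold_change(self) -> float:
        # "Doubled" means this ratio is 2.0, whatever the starting size.
        if self.before == 0:
            raise ValueError("fold change is undefined when starting from zero cases")
        return self.after / self.before

    def per_100k(self, count: int) -> float:
        # Scale a raw count to cases per 100,000 residents.
        return count * 100_000 / self.population


def describe(place: str, change: CaseChange) -> str:
    """One-line summary with the context a headline usually omits."""
    return (
        f"{place}: x{change.fold_change:.1f} (+{change.absolute_change} cases), "
        f"{change.per_100k(change.before):.1f} -> "
        f"{change.per_100k(change.after):.1f} per 100k"
    )


if __name__ == "__main__":
    # Made-up numbers: both places "doubled", but the context is very different.
    print(describe("Small town", CaseChange(before=10, after=20, population=8_000)))
    print(describe("Big city", CaseChange(before=10, after=20, population=2_000_000)))
```

Both made-up places “doubled,” and in both the absolute change is the same ten cases, yet their per-capita rates differ by a factor of 250. A per-100,000 rate is one common way to compare places of different sizes; it is a presentation choice, not the only sensible one.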

Another confounding factor specific to Covid is the lack of reliable and standardized data. What one territory calls a Covid-related death might be attributed to another cause somewhere else.

Context is Everything

Generally speaking, inflammatory statistics are easy to produce with a simple technique: zoom in really, really far and describe what you see without context. In fact, there’s a whole (and quite old) book on this sort of thing called “How to Lie with Statistics” (I’m not an affiliate). There’s also an article that puts a spin on that title: “How To Lie With COVID-19.”
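As a rough illustration of “zooming in,” the sketch below scans a series of counts for the short window with the largest fold increase, which is the view a sensational headline might pick, and compares it with the change over the whole series. The weekly counts are invented, and `most_alarming_window` is a hypothetical helper, not a standard method.

```python
from typing import List, Tuple


def most_alarming_window(counts: List[int], width: int) -> Tuple[int, float]:
    """Hypothetical helper: return (start index, fold change) for the window of
    `width` steps whose end/start ratio is largest, i.e. the scariest zoom."""
    best_start, best_fold = 0, 0.0
    for start in range(len(counts) - width):
        first, last = counts[start], counts[start + width]
        # Skip windows that start at zero, where a ratio is undefined.
        if first > 0 and last / first > best_fold:
            best_start, best_fold = start, last / first
    return best_start, best_fold


if __name__ == "__main__":
    # Invented weekly case counts: roughly flat, with one unusually low week.
    weekly = [40, 42, 38, 20, 41, 39, 43, 40]
    width = 1
    start, fold = most_alarming_window(weekly, width)
    overall = weekly[-1] / weekly[0]
    print(f"Zoomed in (week {start} to {start + width}): x{fold:.2f}")
    print(f"Whole period (week 0 to {len(weekly) - 1}): x{overall:.2f}")
```

On this invented series the full-period change is flat, while the single most alarming week roughly doubles, mostly because the week before it happened to be unusually low.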

From Darrell Huff’s “How to Lie with Statistics”:

“The death rate in the Navy during the Spanish-American War was nine per thousand. For civilians in New York City during the same period, it was sixteen per thousand.” The American Government used these statistics to entice young people to join the Navy, claiming that, paradoxically, “it was safer to be in the Navy than out of it.”

These are two entirely different populations: one selected for youth and health, the other made up of everyone. The Navy zoomed in and described something without context (read: prevaricated). In this case, it would have been useful to know the death rate among New York City civilians of the same age as those enlisted in the Navy.
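Here is a minimal Python sketch of the comparison that was missing: compute each group’s overall (crude) death rate, then recompute the general population’s rate for the same age band only. Every figure in it is invented for illustration and is NOT historical Navy or New York data; the two groups and the `rate_per_thousand` helper are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Age band -> (deaths, people) over the same period.
# All numbers below are INVENTED for illustration, not historical data.
Population = Dict[str, Tuple[int, int]]


def rate_per_thousand(pop: Population, bands: Optional[Iterable[str]] = None) -> float:
    """Deaths per 1,000 people, over every age band or only the bands listed."""
    chosen = list(pop) if bands is None else list(bands)
    deaths = sum(pop[band][0] for band in chosen)
    people = sum(pop[band][1] for band in chosen)
    return deaths * 1000 / people


# A hypothetical group screened for youth and health (like recruits).
screened: Population = {"18-40": (70, 10_000)}
# A hypothetical general population that includes children and older adults.
everyone: Population = {
    "0-17": (150, 10_000),
    "18-40": (50, 10_000),
    "41+": (250, 10_000),
}

if __name__ == "__main__":
    print(f"Crude: screened {rate_per_thousand(screened):.1f} "
          f"vs everyone {rate_per_thousand(everyone):.1f} per 1,000")
    print(f"Ages 18-40 only: screened {rate_per_thousand(screened, ['18-40']):.1f} "
          f"vs everyone {rate_per_thousand(everyone, ['18-40']):.1f} per 1,000")
```

With these made-up numbers, the screened group looks safer on crude rates (7 vs. 15 per 1,000) but worse once ages are matched (7 vs. 5 per 1,000). Restricting to a shared age band is the simplest form of this adjustment; epidemiologists also use age standardization, which weights age-specific rates by a common reference population.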

The Take-Away

Things aren’t great right now, but they might not be as bad as we’re led to believe.

If you’re feeling down about the news, perhaps some of that is perception-driven. Ask yourself whether the numbers you hear could be a zoomed-in view taken out of context, designed to snag your attention and earn your click. Some headlines rest on inflammatory statistics like these, which deserve a much deeper story and greater context.

Let’s all take the pandemic seriously, but let’s use the news to inform rather than inflame us, by considering context whenever we encounter statistics.

Be well, and happy 2021!

Of Interest

CatalyzeX: A Must-Have Browser Extension for Machine Learning
Implementing complex code from journal articles can be difficult and time-consuming. Here’s something that might help. CatalyzeX is a free browser extension that finds code implementations for ML/AI papers anywhere on the internet (Google, arXiv, Twitter, Google Scholar, and other sites). It’s a must-have browser extension for machine learning engineers and researchers! Read more here.

Neural Network Creates Images From Text
OpenAI trained a neural network, which it calls DALL·E, on a dataset of text and image pairs. Given a text description, the network generates images, including unusual combinations of descriptors and objects such as a purse in the style of a Rubik’s Cube or a teapot imitating Pikachu. Read more here.

Why is A.I. so Power-Hungry?
Training machine-learning models on large data sets takes a lot of energy. By some estimates, training a single large A.I. model generates as much carbon as building and driving five cars over their lifetimes. This article discusses the consequences of building A.I. models, why A.I. is so power-hungry, and what we can do about it.
