When we help our clients tactically, we’re often asked: what are you going to do first?
Of course, we generally start off with a list of high-level company objectives and the KPIs that are used to measure them. But when we roll up our sleeves and dig in, it comes down to two basic actions we perform over and over again.
The two methods we use to understand data are Initial Data Analysis (IDA) and Exploratory Data Analysis (EDA).
In this week’s TechTuesday, I’ll briefly explain these two methods and provide links for further reading.
Initial Data Analysis
IDA is where we dive into the data and inspect it carefully. Think histograms, outlier detection, and several tests to help us understand what we have.
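To make the histogram and outlier checks concrete, here is a minimal sketch using only Python's standard library. The `order_amounts` data, the function names, and the 1.5 x IQR rule are illustrative assumptions, not a description of any client's data or of the only way to flag outliers.

```python
from collections import Counter
from statistics import quantiles


def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical order amounts; 950 is planted as an obvious outlier.
order_amounts = [12, 15, 14, 13, 16, 15, 14, 950, 13, 12]

print(histogram(order_amounts, bin_width=10))
print(iqr_outliers(order_amounts))
```

The histogram shows where most values sit, and the IQR check surfaces the few that sit far away from them. Whether a flagged value is an error or a real but rare event is still a question for someone who knows the business.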
Over the years, we’ve used IDA to uncover all sorts of anomalies. One client had a “20” problem: every so often, a column in their database that should have contained strings (letters) had the number 20. For no good reason. And, of course, the person who set up the database was long gone from the company. So, there it was, 20, a few percent of the time.
What should we do with that? How can we replace it in our analysis and what should it be replaced with?
These are the findings that IDA leads to and the questions we have to ask to be able to work with the data reliably. IDA may lead to a data catalog so that this process need not be repeated later by other teams.
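A check like the one that surfaces a “20” problem can be sketched in a few lines. This is an illustration under stated assumptions: the `region` column and both function names are hypothetical, and marking the bad entries as missing (`None`) is just one possible answer to the replacement question, not necessarily the right one for a given dataset.

```python
def find_numeric_intruders(column):
    """Return indexes of entries that are numbers or digit-only strings."""
    return [
        i
        for i, value in enumerate(column)
        if isinstance(value, (int, float)) or str(value).strip().isdigit()
    ]


def mask_intruders(column, placeholder=None):
    """Replace numeric intruders with a placeholder instead of guessing a value."""
    bad = set(find_numeric_intruders(column))
    return [placeholder if i in bad else value for i, value in enumerate(column)]


# Hypothetical text column with a stray 20 stored both as a number and as text.
region = ["north", "south", 20, "east", "20", "west", "north", "south", "east", "west"]

bad_rows = find_numeric_intruders(region)
share = len(bad_rows) / len(region)
print(f"{len(bad_rows)} suspicious rows ({share:.0%} of the column): {bad_rows}")

cleaned = mask_intruders(region)
print(cleaned)
```

Recording the affected row positions and their share of the column is the kind of finding that belongs in a data catalog, so the next team does not have to rediscover it.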
Exploratory Data Analysis
EDA, on the other hand, generally summarizes the main characteristics of the data. This usually takes the form of graphs and charts, all with the purpose of determining what model might work best for the desired analysis.
In this stage, it’s common for us to look at correlations between variables, detect patterns, test hypotheses, and check assumptions. It’s much less of a “what’s in there?”, and much more of a “how can we assemble all this data to get after the big objectives?”
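As one example of looking at correlations between variables, here is a minimal sketch of the Pearson correlation coefficient written out by hand. The `site_visits` and `monthly_spend` figures are made up for illustration, and Pearson is only one of several correlation measures that could be used at this stage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-customer figures, invented for this example.
site_visits = [3, 5, 8, 10, 12, 15]
monthly_spend = [20, 28, 45, 50, 66, 80]

r = pearson(site_visits, monthly_spend)
print(f"correlation between visits and spend: {r:.2f}")
```

A value near 1 or -1 suggests a strong linear relationship worth a closer look, while a value near 0 suggests there is no linear one. Either way it is a prompt for a chart and a hypothesis, not a conclusion.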
IDA and EDA
Performing comprehensive IDA and EDA is where we get a feeling for how a project will progress and how we’ll be able to use A.I. to do things like personalize offers, detect churn, or otherwise automate business processes.
Depending on the company and how well they take care of their data (that’s a whole newsletter right there!), these stages of IDA and EDA can take anywhere from two or three weeks to just over a month. Generally, the better organized the data are, the faster and more streamlined the process of IDA and EDA is.
That’s a quick overview of our first tactical steps. I hope you’ve found it useful. Feel free to hit reply and let me know. It’s always nice to hear back from you.
Have a wonderful week,
Data Science Pairs With Cancer Research
The next generation of treatments for cancer may be found, not by scientists peering through microscopes, but by computer scientists crunching numbers. Thanks to unprecedented amounts of data, Purdue University scientists are using innovative data science techniques to better understand the genetics and cellular biology of cancer cells and tumors. Read more here.
Lessons From the Pandemic’s Superstar Data Scientist
When pandemic superstar data scientist Youyang Gu noticed the scattershot Covid-19 projections last spring (one model projected 2 million US deaths by the summer, another predicted 60,000), he questioned whether that was as good as the modeling could be. He decided to take a shot at making a Covid-19 model himself. Within a week’s time, he created a machine-learning model, ran it daily on his laptop (it took only an hour), and ended up generating remarkably accurate Covid-19 predictions. Read more here.
This A.I. Disproved Five Mathematical Conjectures by Itself
An artificial intelligence has disproved five mathematical conjectures (statements that had not yet been proven) despite not being equipped with any information about the problems. Read more here.