
Know what I usually think about around the holidays? Models that suddenly don’t work at all.

I should explain.

A lot of what we do day in and day out revolves around repeatable patterns of user or customer behavior. If users/customers browsed, clicked and bought in a completely random way, we would have no way to predict what they want. Personalization would drop off drastically.

Imagine going to Amazon and buying random products that don't seem to go together at all. Normally this rarely happens, but it's exactly what we see around the holidays. You should see the random things we get for our house during Halloween!

Several years ago, we worked with a company that tracked its engagement and related revenue by the hour. We'd see huge drops on Christmas and several other holidays. We fully expected them, but they were still terrifying! What if we shrugged off a drop as "just the holiday" when it was really a system failure, a false negative for our failure detection? In other words, what if some tech snafu just happened to land on a holiday? Improbable, but not impossible.

Of course, the way to handle this is to simply relax a bit over the big holidays. Sit back and have a sip of eggnog as all your customers head off in their cars or open presents or dress up like demons and terrorize the neighborhood. For companies that have been around long enough, it’s possible to look back over the years and predict these periodic events.
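That "look back over the years" idea can also go into monitoring. The goal is that an expected holiday dip doesn't page anyone, while a real outage on a holiday still does. Below is a minimal sketch with hypothetical names and thresholds. `should_alert` compares revenue to a holiday-specific baseline (say, the same holiday in previous years) when the date is a known holiday, and to the normal baseline otherwise. The baselines and the 30% tolerance are placeholders, not figures from the company described above.

```python
from datetime import date


def should_alert(day, revenue, normal_baseline, holiday_baseline, holiday_dates, max_drop=0.3):
    """Return True if revenue is more than `max_drop` below what we expect for `day`.

    On a known holiday, the expectation comes from `holiday_baseline` (e.g. the same
    holiday in earlier years), so the usual holiday dip is not flagged, but an extra
    drop on top of it, such as an outage, still is.
    """
    expected = holiday_baseline if day in holiday_dates else normal_baseline
    return revenue < (1 - max_drop) * expected


# Hypothetical hourly revenue: ordinary-day baseline 1000, past-Christmas baseline 450.
known_holidays = {date(2019, 12, 25)}
print(should_alert(date(2019, 12, 25), 400, 1000, 450, known_holidays))  # False: a normal holiday dip
print(should_alert(date(2019, 12, 25), 150, 1000, 450, known_holidays))  # True: worse than a usual Christmas
print(should_alert(date(2019, 12, 20), 400, 1000, 450, known_holidays))  # True: big drop on an ordinary day
```

In practice, `holiday_dates` could be filled from a holiday calendar rather than typed by hand.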

Holidays are different from seasonality, where weather patterns over month-long timescales affect user behavior. I'm talking about the fast, one-day-and-they're-gone events. We know how to handle seasonality pretty well, but is there anything we can do about holidays, especially across multiple countries where traditions and time zones differ?

To answer this, I went looking for a Python package that might help. A Python package is a bundle of code designed to accomplish a specific task. Many packages are open source, which generally means we're free to use them commercially (subject to each license's terms). With little effort I found one, and it's called, unimaginatively, holidays.

Essentially, the package returns true/false for the question:

Is today (or any date) a holiday in a given location?

It also lists the major holidays, by name, for many countries and regions around the world.

Here’s what that looks like for the U.S. vs. Portugal:

Holidays in the United States for 2019:
2019-01-01 New Year’s Day
2019-01-21 Martin Luther King, Jr. Day
2019-02-15 Susan B. Anthony Day
2019-02-18 Washington’s Birthday
2019-03-31 César Chávez Day
2019-04-01 César Chávez Day (Observed)
2019-05-27 Memorial Day
2019-07-04 Independence Day
2019-09-02 Labor Day
2019-10-14 Columbus Day
2019-11-11 Veterans Day
2019-11-28 Thanksgiving
2019-12-25 Christmas Day

Holidays in Portugal for 2019:
2019-01-01 Ano Novo
2019-03-05 Carnaval
2019-04-19 Sexta-feira Santa
2019-04-21 Páscoa
2019-04-25 Dia da Liberdade
2019-05-01 Dia do Trabalhador
2019-06-10 Dia de Portugal
2019-06-13 Dia de Santo António
2019-06-20 Corpo de Deus
2019-08-15 Assunção de Nossa Senhora
2019-10-05 Implantação da República
2019-11-01 Dia de Todos os Santos
2019-12-01 Restauração da Independência
2019-12-08 Imaculada Conceição
2019-12-24 Vespera de Natal
2019-12-25 Christmas Day
2019-12-26 26 de Dezembro
2019-12-31 Vespera de Ano novo

How any of these holidays affects your business depends entirely on your customers and the type of business you run, but this is a start.

Are you curious about the different holidays in some of your target countries? Here’s a link to the Python notebook I used to generate the output above:
https://colab.research.google.com/drive/1kRL3-pLg0fbgD07gKMpjMy-BRl7D3MC2

Also, here's a five-minute guide that covers the ins and outs of using the holidays package:
https://towardsdatascience.com/5-minute-guide-to-detecting-holidays-in-python-c270f8479387

May your sales be strong and your nerves even stronger this holiday season 🙂

Happy Holidays from all of us at Bennett Data Science!

Of Interest

Data Science Books you should read in 2020
As we head into the new year, here's a head start on a few good reads to help you or your favorite data scientist kick off the '20s!
https://towardsdatascience.com/data-science-books-you-should-read-in-2020-358f70e1d9b2

Optimizing Blackjack Strategy through Monte Carlo Methods
Ever wondered exactly how to play blackjack to maximize your chances of winning? This article gives all the Python code required to simulate thousands of hands, letting you change the way you "play" each hand and view the results. A Monte Carlo approach relies on randomly sampling episodes from a model, observing the rewards it returns, and averaging that experience to estimate the value of each state. Repeated Monte Carlo simulations can estimate the value of every combination of player and dealer hands in blackjack, opening the way to optimized strategies.
https://towardsdatascience.com/optimizing-blackjack-strategy-through-monte-carlo-methods-cbb606e52d1b
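To make the Monte Carlo idea concrete, here is a small sketch of our own, not the article's code. It estimates how often a blackjack dealer busts by simulating many dealer hands and averaging the outcomes. It assumes an infinite deck, so every draw is independent, and a dealer who stands on all 17s. The function names are hypothetical.

```python
import random

# Infinite-deck card values: 2-9, four ten-valued ranks (10, J, Q, K), and an ace counted as 11.
CARDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]


def dealer_total(rng):
    """Play one dealer hand: draw until the total is at least 17 (standing on soft 17)."""
    total, soft_aces = 0, 0
    while total < 17:
        card = rng.choice(CARDS)
        total += card
        if card == 11:
            soft_aces += 1
        # Re-count aces from 11 to 1 while the hand would otherwise bust.
        while total > 21 and soft_aces:
            total -= 10
            soft_aces -= 1
    return total


def estimate_bust_probability(trials=100_000, seed=0):
    """Monte Carlo estimate: the fraction of simulated dealer hands that end above 21."""
    rng = random.Random(seed)
    busts = sum(dealer_total(rng) > 21 for _ in range(trials))
    return busts / trials


print(estimate_bust_probability())
```

More trials shrink the sampling error. The same average-over-random-episodes pattern extends to valuing player decisions, which is what the linked article builds toward.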

Using A.I. to Separate Songs into Their Individual Instruments
While not a broadly known topic, the problem of source separation has interested a large community of music signal researchers for a couple of decades now. It starts from a simple observation: music recordings are usually a mix of several individual instrument tracks (lead vocal, drums, bass, piano, etc.). The task of music source separation is this: given a mix, can we recover these separate tracks?
https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e