Global Covid-19 cases in July were more than twice what the math predicted in May
What people most want to know in 2020 is probably this: when will this end? Normally, math should be able to tell us, especially since this is a question about numbers. So I thought a mathematical approach would help me get an outline of the future, even if it’s not 100% accurate.
Whether it’s a price or a temperature, when a number changes over time, the autoregressive integrated moving average model, often called ARIMA [Wikipedia], is one of the common ways to predict its future values. According to the statistics compiled by Johns Hopkins University [URL], global cases surpassed 1 million on 2nd April. I used the daily new case counts from 2nd April to 13th May to try to predict the next 60 days. Yes, that was 3 months ago.
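To give a feel for what such a model does, here is a minimal sketch in plain Python. It is not the model I actually ran: it assumes the simplest useful variant, ARIMA(1,1,0), which differences the series once and then fits a single autoregressive term by least squares. The function names `fit_ar1` and `arima_110_forecast` and the input numbers are made up for illustration only.

```python
def fit_ar1(diffs):
    """Least-squares fit of d[t] = c + phi * d[t-1] on the differenced series."""
    x = diffs[:-1]  # yesterday's day-over-day change
    y = diffs[1:]   # today's day-over-day change
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # If the changes never vary, there is nothing to regress on: fall back to phi = 0.
    phi = cov_xy / var_x if var_x else 0.0
    c = mean_y - phi * mean_x
    return c, phi


def arima_110_forecast(series, steps):
    """Forecast `steps` future values with a hand-rolled ARIMA(1,1,0)."""
    # "Integrated" part (d=1): model the day-over-day changes, not the levels.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # "Autoregressive" part (p=1): each change depends on the previous change.
    c, phi = fit_ar1(diffs)
    last_level = series[-1]
    last_diff = diffs[-1]
    forecast = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # predict the next change
        last_level += last_diff          # add it back to return to case counts
        forecast.append(last_level)
    return forecast


# Made-up illustrative numbers, NOT the Johns Hopkins data.
toy_daily_cases = [70_000, 72_500, 71_800, 74_900, 76_200,
                   75_600, 78_800, 80_100, 79_700, 82_300]
print(arima_110_forecast(toy_daily_cases, steps=5))
```

A real analysis would more likely use a statistics library that also handles the moving-average term and the choice of model order. The point here is only that the forecast is extrapolated from the recent pattern of changes, so it has no way of knowing about anything that is not already in the past numbers.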
This is exactly the chart I made 3 months ago. The red line WAS the forecast. Sorry, I made it only for personal use, so there isn’t even a legend. The x-axis is the number of days since 2nd April and the y-axis is daily new cases. So, for example, 80k means the world had 80k new Covid-19 cases that day.
The prediction said that on 12th July the daily new cases would be around 89,000. What was it in reality? On 12th July, we had 199,198 new cases globally. Unfortunately, reality was 2.23 times worse than my ARIMA model. By the end of July, the daily number had gone over 290,000.
I’ve been looking at ARIMA predictions by other researchers, from India to Russia, and all of the ones I found underestimated the reality of Covid-19. This model is frequently used to predict stock prices and is known for its convincing accuracy. Why is it so useless when it comes to Covid-19? It’s partly because we are not seeing the actual Covid-19 cases. What we are seeing is only what the tests performed turn up. The number of tests is heavily influenced by social factors such as medical budgets, staffing, weather, holidays, and politics. When the infection is spreading rapidly, the more they test, the more cases they find. Conversely, if a government decides to limit testing, the new cases may look as if they are under control. This artificial part makes the whole situation more unpredictable. Estimating the end is in everybody’s interest now. Only transparency in how the data is collected and processed can make that possible.
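The testing effect can be shown with a toy calculation. This is only a sketch under loud assumptions: the function `reported_new_cases`, the fixed positive rate, and every number below are hypothetical, chosen to show the mechanism rather than to describe any real country.

```python
def reported_new_cases(true_new_infections, tests_performed, positive_rate):
    """Confirmed cases are capped both by what testing finds and by the true infections."""
    found_by_testing = round(tests_performed * positive_rate)
    return min(true_new_infections, found_by_testing)


# Hypothetical values: the real number of new infections is the same every day.
true_infections = 50_000
positive_rate = 0.10  # assumed share of tests that come back positive

for tests in (100_000, 200_000, 400_000, 800_000):
    print(tests, reported_new_cases(true_infections, tests, positive_rate))
```

In this toy setup the epidemic itself never changes, yet the reported number climbs from 10,000 to 50,000 simply because more tests are run. A forecasting model fed only the reported numbers would read that as growth.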