Pandemics have long-lasting, profound impacts on every country and every individual. For instance, COVID-19 has overwhelmed hospital systems worldwide and caused more than 6.3 million cumulative deaths, while H1N1 directly resulted in about 575,400 deaths globally during its first year of circulation [2][3]. However, death and suffering are only the most visible consequences of a circulating epidemic, as its harmful effects permeate almost all aspects of people’s lives. The International Monetary Fund (IMF) estimates that from 2019 to 2020, the global median GDP fell by 3.9%, the worst recession since the Great Depression, demonstrating an imperative need for a system that mitigates the harm of epidemics [1]. The unprecedented impact of COVID-19 can, in part, be attributed to the inability of previous studies to obtain essential information about a newly emerged virus. Without that basic information, policymakers failed to establish practical measures against the virus, and individuals were misinformed or simply not informed, causing severe public concern. The problematic areas are as follows.
[Figure: System Usage Flowchart]
[Figure: Change between each compartment of the EPSEIRV model summarized in a flowchart]
[Figure: Comparison between the SEIRV model equations and the EPSEIRV model equations]
For the transmission forecast, previous work has struggled to produce reliable predictions because of inherent flaws in the SEIRV model [4]. Even authoritative organizations, including the CDC, have been unable to obtain an accurate value of the basic reproduction number (R0) of Omicron [5], let alone predict the growth of the epidemic. In addition, the predictions that Dr. Fauci, one of the most trusted epidemiologists, gave in well-known media outlets such as CNN were inaccurate and uninformative [6].
[Figure: Change between each compartment of the SI3R model summarized in a flowchart]
[Figure: Projected Positive Percentage as a Function of Time in the US]
[Figure: Real-Time Reproduction Number as a Function of Time in the US Produced by the EPSEIRV Model]
For variant prediction, little to no work exists on predicting future variants of a novel disease. For example, when the first wave of Omicron was close to an end, experts still had no clue what future variants would look like or under what conditions they could emerge [7]. Moreover, Dr. David Aronoff, an infectious disease expert and chair of the Department of Medicine at Indiana University School of Medicine, even said, “We really don’t know if there’s going to be another variant that may create a lot of havoc” [8].
[Figure: Simulation of the Competition Between Omicron and a Hypothetical Mutant]
[Figure: Visual Display of the IHOV Model Results]
For hospitalization resources, highly contagious pandemics invariably threaten hospital systems, as the number of patients can grow exponentially in merely a few days. Omicron, for instance, caused substantial staffing and resource shortages in one-fourth of all US hospitals, resulting in delays in elective surgeries [9]. Although most hospitals in the US had excess resources and staff, the rapid, exponential increase in the demand for hospitalization in certain states left officials no choice but to deploy the National Guard to “provide direct support to hospitals, care centers, and other medical facilities”; some hospitals even forced infected employees to keep working, putting patients at risk [10].
Realizing the urgent need to minimize the damage of global epidemics, this study devised PanDict (Pandemic PreDiction), a computer system that uses the EPSEIRV model, the SI3R model, and the IHOV model to produce accurate projections of future epidemic situations. Each model tackles one of the three pressing issues above. The system provides its users with a reliable transmission forecast, an informative variant prediction, and a projection of the demand for hospitalization resources. PanDict allows its users to input some of the most basic parameters of a novel disease and immediately receive the essential information needed to minimize its damage. Detailed explanations of each part are as follows.
First, to develop a reliable transmission forecast, the author formulated the EPSEIRV model. The original SEIRV model contains a parameter, α, that cannot be determined experimentally because it has no physical meaning. It is therefore preset to the average value from past diseases, which discards information in the data and degrades the model's accuracy. The author replaced this guessed parameter with quantifiable variables that introduce population density and time of exposure into the system, allowing the system to conduct simulations in each local community and obtain accurate results. This study tested the model with real-life data from South Africa and the United States, and the predicted infected population fits the actual infection curve closely. This study yielded an R0 value of around 18.9 for Omicron, substantially higher than the “above 7.05” concluded by previous studies. The author published these findings on Medium.com around mid-January.

Second, to answer the question “Will new variants emerge, and how?”, the author created the SI3R model, which simulates the competition between a mutant and a resident strain. After applying it to the case of Omicron, the author concluded that for a new variant to emerge, it has to overcome the public’s immunity against Omicron; otherwise, it would die out almost immediately. The theoretical foundation of the SI3R model regarding super-infection and double-infection draws on an article from The American Naturalist [11]. That article discusses variant evolution only at the conceptual level, whereas the SI3R model allows competitions to be simulated from real-life data.

Finally, the author devised the IHOV model to project the demand for hospitalization, extending an unfinished project initiated by the Brown School of Public Health [12]. That project provides current hospitalization data, including the number of available beds, the occupancy rate, and so on. However, it simply lists some possible cases based on guessed situations, so it cannot forewarn the public. In contrast, the EPSEIRV model produces the infected population of each state as a function of time and feeds those projections into the IHOV model, allowing the system to predict and show exactly where, when, and how many resources (for example, beds) are needed.
To the best of our knowledge, this study is the first to develop an accurate simulation of the spread of Omicron. In a nutshell, our system serves two purposes. First, it informs individuals, corporations, and the government about the future trends of newly emerged viruses. This helps reduce public concern about the potential impact of pandemics and allows the health sector to better prepare for the emergence of new variants. Second, it assists hospitals in allocating resources efficiently, thereby minimizing unnecessary suffering and deaths. Currently, the author has only completed the IHOV model for the New England region. However, future work can expand the system's coverage to encompass the entire United States or even the entire globe.
Almost all mathematics-based disease transmission models are variants of the SIR model. This section explains the SIR model and presents some popular variations of it [4]. The SIR model separates the population into three compartments: Susceptible, Infectious, and Removed. Everyone who has neither obtained immunity against the disease nor been infected with it is considered Susceptible; everyone who is infected and can infect others is regarded as Infectious; and everyone who is immune to the disease is placed in the Removed category.
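The three-compartment structure described above can be illustrated with a minimal Python sketch. It assumes the textbook SIR equations with transmission rate `beta` and removal rate `gamma`, integrated with a simple forward-Euler step; the function name, parameter values, and population sizes are illustrative assumptions, not values from this study.

```python
def simulate_sir(beta, gamma, s0, i0, removed0, days, dt=0.1):
    """Integrate the textbook SIR equations with forward Euler.

    Returns a list with one (S, I, R) tuple per day, starting at day 0.
    """
    s, i, r = float(s0), float(i0), float(removed0)
    n = s + i + r  # total population, assumed constant
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # Susceptible -> Infectious
            new_removals = gamma * i * dt           # Infectious -> Removed
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


# Illustrative (assumed) parameters: beta = 0.5 per day, gamma = 0.1 per day.
trajectory = simulate_sir(beta=0.5, gamma=0.1, s0=9990, i0=10, removed0=0, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("Infectious compartment peaks on day", peak_day)
```

Because every person leaving one compartment enters another, the sum S + I + R stays constant throughout the run, which is a quick sanity check for any variant of the model.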
The SEIRV model adds two compartments to the SIR model: Exposed and Vaccinated. The original Infectious group in the SIR model is split into two to take patients’ latent period into account: the Exposed group contains all who have been infected but are not yet infectious, while the Infectious group includes those who were infected and have become infectious. The Vaccinated compartment comprises people who have been vaccinated but have not yet gained immunity (those who have gained immunity are considered Removed). A more complete review of the SEIRV model is provided in the Method section.
In short, the SEIRV model has an exponent on the infection expression, α, which has no physical meaning and therefore cannot be determined with precision. It is preset to around 1.2, a value that works best for most kinds of diseases. With α predetermined, the model has only one degree of freedom, the contagiousness measure β, which significantly undermines its accuracy. In contrast, the author improved the model by discarding the guessed variables and introducing two additional parameters derived from probability equations of encounters and the fraction of time people spend interacting with others. This improvement allowed us to accurately model the spread of Omicron in January.
Most researchers use mathematical or biological approaches to predict the emergence of future variants of a disease. This section explores the deficiencies of each approach and how our system addresses them. The biological approach requires extensive experimentation and often takes a long time to yield adequate information. In the case of Omicron, the infectious disease expert Dr. David Aronoff, who attempted to use biological methods to predict future variant emergence, said after the Omicron surge that experts still “really don’t know if there’s going to be another variant that may create a lot of havoc” [8]. On the mathematical side, an article by Minus van Baalen and Maurice Sabelis explored the competition between the resident strain and a mutant of a virus [11]. The study also explained the impact of that competition on the evolution of the virulence, as well as the infectiousness, of a disease. However, the study stopped at the theoretical level and did not provide a usable model for predicting variants. Thus, the author devised the SI3R model to simulate the competition between the resident strain and a hypothetical mutant of a virus and to explore the conditions under which new variants could emerge.
Before our project, the Brown School of Public Health was the only group that attempted to calculate the projected demand for hospitalization resources [12]. However, their study did not implement a disease transmission model that could estimate the spread of the disease; instead, they guessed the infected population and timing and then calculated future demand from those guesses. Thus, it is not specific to the disease and is not instructive to policymakers, who need predictions of exactly where and when hospitals will hit their capacity in order to prepare for the exponential growth in the number of patients.
The system aims to help minimize the damage of a newly emerged epidemic. It consists of three essential modules, each addressing one of the previous studies’ problems.
As shown in Figure 1, the system works as follows. The user first inputs the population statistics of the target country and the date of the first case in each local community. The basic parameters of the newly emerged disease are also required; these mainly include the incubation period, the infectious period, and vaccine efficacy. The system then produces and displays an accurate prediction of the spread of the new virus according to those parameters. With the projected infection curve, the second module tests the conditions and probability of the emergence of a new variant using the SI3R model. Lastly, the system uses the projected infection curve to calculate the demand for health resources in each county of the target country and displays the projected data on a map (the underlying numbers can also be downloaded).
The SEIRV model is based on the following assumptions:
The total population remains constant (new births and deaths have trivial impacts and are not taken into account). Measures against the disease remain unchanged. People’s living habits have no drastic changes. The population is relatively homogeneous. An individual can’t catch the disease twice.
The exponent α in (1), (2), and (5) has no physical meaning and thus cannot be obtained in any way other than estimation from infection data. For most kinds of diseases, α = 1.2 works best, so it is set to around 1.2 for novel diseases as well, without any other supporting experiment or evidence.
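To make the role of the exponent concrete, the following Python sketch evaluates a daily-infection term for several values of α. It assumes the common nonlinear-incidence form β·S·I^α/N, which may differ in detail from equations (1), (2), and (5); the function name and all numeric inputs are hypothetical.

```python
def daily_new_infections(beta, susceptible, infectious, population, alpha=1.2):
    """Daily infections under an assumed nonlinear-incidence form: beta * S * I**alpha / N."""
    return beta * susceptible * infectious ** alpha / population


# Hypothetical community: 1,000,000 people, 10,000 currently infectious.
for alpha in (1.0, 1.2, 1.4):
    estimate = daily_new_infections(0.3, 990_000, 10_000, 1_000_000, alpha)
    print(f"alpha = {alpha}: about {estimate:,.0f} new infections per day")
```

Under this assumed form, a small change in a preset α changes the projected daily infections several-fold once the infectious population is large, which is why fixing α without disease-specific evidence limits accuracy.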
With the exponent α predetermined, the model has only one degree of freedom (β); this lack of specificity significantly reduces its accuracy and robustness. In addition, the model does not account for changes in population density and social distancing. Thus, the author made the following changes to the model.
This section summarizes the derivation of the daily infection expression.
Assuming that the probability of one person meeting an infectious individual and capturing the disease is
where
This change in the expression connects daily infection numbers with population density and exposure time. Exposure time enters the model through μ, while population density enters as follows. Suppose r is the radius of a person’s daily activity range, p is the probability of catching the disease upon encounter, N is the population of the target community, and A is its area. Then, the author can express
Thus, the system of differential equations in the EPSEIRV model is as follows:
The following chart elucidates the changing relationship between each compartment:
On day 1,
The author will use the parameter determination process for the Omicron variant of SARS-CoV-2 as an example to demonstrate how each model parameter can be obtained.
According to Centers for Disease Control and Prevention (CDC) investigations, the incubation period of Omicron is approximately three days, and the mean infectious period of Omicron is about 11 days [14]. CDC reports also reveal that the effectiveness of unboosted vaccines is too low, and the British Broadcasting Corporation (BBC) reports that the efficacy of booster shots is roughly 80% [15][16]; therefore, the author decided to include only booster shots as an effective type of vaccination. In the United States, the boosted population has an average daily increase of 600,000 (0.18% of the entire population), while in South Africa, no one had received a booster shot yet, so the author omitted the Vaccinated category when modeling the spread in South Africa [3].
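The figures above can be turned into per-day model inputs with a short Python sketch. It assumes the usual compartmental-model convention that a transition rate is the reciprocal of the mean time spent in a compartment; the variable names and the US population figure are assumptions introduced here, not values stated in the text.

```python
# Values quoted in the text.
incubation_days = 3        # approximate incubation period of Omicron
infectious_days = 11       # mean infectious period of Omicron
booster_efficacy = 0.80    # reported booster efficacy
daily_boosters = 600_000   # average daily increase of the boosted population

# Assumption: approximate US population, used only to check the 0.18% figure.
us_population = 332_000_000

# Assumed convention: rate = 1 / mean duration in the compartment.
exposed_to_infectious_rate = 1 / incubation_days   # per day
infectious_to_removed_rate = 1 / infectious_days   # per day
daily_boosted_fraction = daily_boosters / us_population

print(f"Exposed -> Infectious rate: {exposed_to_infectious_rate:.3f} per day")
print(f"Infectious -> Removed rate: {infectious_to_removed_rate:.3f} per day")
print(f"Daily boosted fraction: {daily_boosted_fraction:.2%}")
```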
According to United States Department of Labor statistics, TABLE I lists the major daily activities of an average American and the time spent on each (in hours per day) [17].
The activities marked red are the ones that potentially involve in-person interactions with others. However, for each activity, only a portion of the time listed involves in-person interactions. Supporting facts are as follows:
As reflected in TABLE I, roughly 80 percent of the population shops online [18]. Forty-two percent of the workforce works from home [17]. About 40 percent of socializing time is online [19]. Private cars dominate the American commute; only 5 percent of US commuters use public transportation [20]. Forty-six percent of students receive only online instruction [21].
TABLE I. THE STATISTICS OF THE UNITED STATES DEPARTMENT OF LABOR
Major Activity | Average Time (hours/day) | Potential In-Person Fraction | Resulting Time (hours/day) |
---|---|---|---|
Personal Care Activities (Sleeping, Grooming...) | 10.74 | 0 | 0 |
Household Activities | 2.01 | 0 | 0 |
Purchasing Goods and Services | 0.38 | 0.2 | 0.076 |
Caring for and Helping Household Members | 0.43 | 0 | 0 |
Caring for and Helping Non-household Members | 0.14 | 1 | 0.14 |
Working and Work-Related Activities | 3.02 | 0.58 | 1.7516 |
Attending Class (Education) | 0.18 | 0.54 | 0.0972 |
Homework and Research | 0.19 | 0 | 0 |
Organizational, Civic, and Religious Activities | 0.18 | 1 | 0.18 |
Socializing and communicating | 0.54 | 0.6 | 0.324 |
Watching Television | 3.05 | 0 | 0 |
Participating in Sports, Exercise, and Recreation | 0.37 | 1 | 0.37 |
Telephone Calls, Email, and Mail | 0.22 | 0 | 0 |
Travel | 0.79 | 0.05 | 0.0395 |
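The last column of the table can be recomputed directly from the first two. The Python sketch below multiplies each activity's average time by its in-person fraction and sums the results; only rows with a nonzero fraction are listed. Dividing the total by 24 to express it as a fraction of the day is an assumption made here for illustration.

```python
# (activity, average hours per day, fraction that is in person), copied from the table.
activities = [
    ("Purchasing Goods and Services", 0.38, 0.2),
    ("Caring for and Helping Non-household Members", 0.14, 1.0),
    ("Working and Work-Related Activities", 3.02, 0.58),
    ("Attending Class (Education)", 0.18, 0.54),
    ("Organizational, Civic, and Religious Activities", 0.18, 1.0),
    ("Socializing and Communicating", 0.54, 0.6),
    ("Participating in Sports, Exercise, and Recreation", 0.37, 1.0),
    ("Travel", 0.79, 0.05),
]

contact_hours = sum(hours * fraction for _, hours, fraction in activities)

# Assumption: express the contact time as a share of a 24-hour day.
contact_fraction_of_day = contact_hours / 24

print(f"In-person contact time: {contact_hours:.4f} hours per day")
print(f"Share of a 24-hour day: {contact_fraction_of_day:.3f}")
```

The sum evaluates to about 2.98 hours per day, or roughly 12 percent of a 24-hour day.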
Therefore, the number of hours that involve potential interactions with others can be calculated as 0.38 × 0.2 + 0.14 + 3.02 × 0.58 + 0.18 × 0.54 + 0.18 + 0.54 × 0.6 + 0.37 + 0.79 × 0.05 ≈ 2.98 hours, which means
Upon the emergence of a novel virus, researchers are concerned about the spread of the resident strain of the virus, and the public is also perturbed by the possible appearance of new variants, particularly when sufficient information cannot be obtained about when and under what conditions new variants may emerge. Since little to no work has been done to investigate the competition between a mutant and a resident, the author devised the SI3R model to simulate the competition and thus determine the conditions under which a new variant might emerge.
The model consists of five compartments: Susceptible, Infectious1, Infectious2, Infectiousa, and Removed. The model’s assumptions include those of the improved SEIRV model, plus the assumption that the mutated strain cannot evade the public’s immunity against the resident variant. The Susceptible compartment represents the portion of the population that has caught neither strain and is thus prone to catching the virus. Infectious1 contains those who have been infected with the resident strain and are capable of spreading it. Individuals in the Infectious2 compartment can spread the mutant, while those in the Infectiousa compartment can transmit both strains. Since neither strain, the resident or the mutant, can by assumption escape the immunity against the other, individuals removed from any of the three infectious compartments are placed in Removed. This system of equations is not extremely detailed but is sufficiently functional to generate general results. Future work could enhance its accuracy, for example by adding Exposed categories.
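The compartment structure described above can be explored with a minimal Python sketch. The equations below are an assumed mass-action formulation written only for illustration and are not the SI3R equations of this study: hosts carrying one strain can be super-infected by the other and move to the both-strain compartment (`ia`), and all infectious hosts are removed at the same rate. The function name, parameter names, and every numeric value are hypothetical.

```python
def simulate_two_strains(beta_res, beta_mut, gamma, s0, i1_0, i2_0, removed0,
                         days, dt=0.05):
    """Forward-Euler run of an assumed two-strain model.

    State per day: (Susceptible, Infectious1, Infectious2, Infectiousa, Removed).
    """
    s, i1, i2, ia, r = float(s0), float(i1_0), float(i2_0), 0.0, float(removed0)
    n = s + i1 + i2 + ia + r
    steps_per_day = round(1 / dt)
    history = [(s, i1, i2, ia, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            force_res = beta_res * (i1 + ia) / n  # pressure from resident carriers
            force_mut = beta_mut * (i2 + ia) / n  # pressure from mutant carriers
            ds = -(force_res + force_mut) * s
            di1 = force_res * s - force_mut * i1 - gamma * i1
            di2 = force_mut * s - force_res * i2 - gamma * i2
            dia = force_mut * i1 + force_res * i2 - gamma * ia
            dr = gamma * (i1 + i2 + ia)
            s, i1, i2, ia, r = (s + ds * dt, i1 + di1 * dt, i2 + di2 * dt,
                                ia + dia * dt, r + dr * dt)
        history.append((s, i1, i2, ia, r))
    return history


# Hypothetical late-wave scenario: about 97% of 1,000,000 people already removed.
gamma = 1 / 11  # 11-day infectious period, as quoted for Omicron
history = simulate_two_strains(beta_res=18.8 * gamma, beta_mut=22 * gamma,
                               gamma=gamma, s0=20_000, i1_0=10_000, i2_0=100,
                               removed0=969_900, days=60)
mutant_start = history[0][2] + history[0][3]
mutant_end = history[-1][2] + history[-1][3]
print(f"Mutant carriers: {mutant_start:.0f} on day 0, {mutant_end:.1f} on day 60")
```

In this assumed setup the mutant declines from the first day even though it is given a higher transmission rate than the resident, because too few hosts remain outside the Removed compartment for it to spread.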
As established above, the expression for daily infection number of one variant can be modeled by
Below is the system of differential equations that describe the change of each compartment.
Using the EPSEIRV model, the author can thus simulate the growth of the new variant in each local community as a function of time and then obtain the number of hospitalization resources needed (n) in each hospital using (18):
In which I represents the infected population of that target community as a function of time yielded by the EPSEIRV model,
Because of time limitations, the author was only able to implement the IHOV model at the state level in the New England region. However, with simple data gathering and entry, future work can implement the IHOV model for each county and even globally. The author has entered all New England states’ parameters into the code, including population, population density, number of beds available, bed occupancy rate, etc. Assuming that 40 percent (adjustable) of all occupied beds will be emptied to accommodate the steep increase in the number of patients, the system then calculates the number of beds.
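The bed calculation described above can be sketched in Python as follows. The 40 percent figure comes from the text; the `hospitalization_rate` parameter, the function names, and the example numbers are hypothetical stand-ins, since the sketch does not reproduce equation (18).

```python
def available_beds(total_beds, occupancy_rate, freed_fraction=0.40):
    """Beds usable for new patients: empty beds plus a freed share of occupied beds."""
    occupied = total_beds * occupancy_rate
    return (total_beds - occupied) + freed_fraction * occupied


def bed_shortfall(infected, hospitalization_rate, total_beds, occupancy_rate,
                  freed_fraction=0.40):
    """Beds still missing after freeing capacity; 0.0 when supply covers demand."""
    demand = infected * hospitalization_rate  # hypothetical demand model
    supply = available_beds(total_beds, occupancy_rate, freed_fraction)
    return max(0.0, demand - supply)


# Hypothetical state: 10,000 beds at 75% occupancy, 400,000 people infected,
# and an assumed 2% of infected people needing a bed.
print(available_beds(10_000, 0.75))
print(bed_shortfall(400_000, 0.02, 10_000, 0.75))
```

Feeding the projected infected population for each day into `bed_shortfall` would give a day-by-day view of where capacity is exceeded, which is the kind of output the map display is meant to convey.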
The results section has three parts, each covering a part of the system. Since the system aims to generate useful guiding information when new diseases emerge, this section will discuss the results yielded from running the system on Omicron data in the United States.
The best-fitting p for the data obtained in the United States equals 3.57 × 10⁻⁸. Please note that the author used positive test percentages instead of new daily cases because the percentage better represents the status of the entire population, given that only a small portion of the population is tested each day. In addition, the author subtracted the percentage of Delta variant positive cases (obtained by applying the model to Delta as well) from the total positive percentage. The author also assumed day 0 to be November 22, 2021, since the first detected case in the US was a traveler who returned from South Africa on November 22 [22].
In Figure 5, the vertical and horizontal axes represent the infected population percentage and time, respectively. The red line is the data the author used to generate our prediction (green line), while the purple line represents the real-life data of the ensuing days. Our prediction was generated on January 6 and is based on the data before that day. Our model accurately predicted the peak of the infected percentage and produced a projected infection curve that is almost identical to the real-life curve. The dotted lines mark the error of our predicted trajectory, which, as shown in the figure, is negligibly small.
Figure 6 shows the real-time reproduction number of Omicron in the US as a function of time, as produced by the EPSEIRV model. The model indicates that the R0 value of Omicron in the US is approximately 18.8 and that its real-time reproduction number, R(t), decreases as a larger portion of the population becomes infected or removed. This R0 value is credible because the infection curve it generates closely matches the actual infection data. (Note: R0 is a concept in epidemiology that estimates an infectious agent’s propensity for epidemic transmission. Simply put, the R0 value of a disease is the average number of secondary infections caused by a single case in a fully susceptible population, while R(t) is the average number of secondary infections caused by one patient at time t [23].) An R0 value of 18.8 is extremely high for a disease, which is consistent with Omicron’s extremely rapid circulation in the US. Before the author published these data on Medium.com, the CDC announced that the R0 value of Omicron was about 75. Now, with more projects examining the R0 of Omicron in the US, an R0 above 13 has become somewhat of a consensus, reaffirming our result.
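The decline of R(t) as the susceptible pool shrinks can be illustrated with a short Python sketch. It assumes the standard homogeneous-mixing relation R(t) = R0 · S(t)/N, which may differ from the exact expression used inside the EPSEIRV model; the function names and sample fractions are illustrative.

```python
def effective_reproduction_number(r0, susceptible, population):
    """R(t) under the assumed homogeneous-mixing relation R(t) = R0 * S(t) / N."""
    return r0 * susceptible / population


def threshold_susceptible_fraction(r0):
    """Susceptible fraction at which R(t) equals 1 under the same assumption."""
    return 1 / r0


r0_omicron = 18.8  # R0 value reported in the text
for susceptible_fraction in (1.0, 0.5, 0.1, 0.05):
    r_t = effective_reproduction_number(r0_omicron, susceptible_fraction, 1.0)
    print(f"S/N = {susceptible_fraction:.2f} -> R(t) = {r_t:.2f}")
print(f"R(t) falls below 1 once S/N drops under {threshold_susceptible_fraction(r0_omicron):.3f}")
```

Under this assumption, an R0 of 18.8 means the epidemic keeps growing until only about 5 percent of the population remains susceptible.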
In contrast, Figure 7 shows the performance of the original SEIRV model when α is set to 1.2, following its original publication. All other parameters were set the same way as in the EPSEIRV model. Although it uses the same amount of data, the SEIRV model performs significantly worse than our proposed EPSEIRV model (the shaded region again represents the prediction error). Apparently, Omicron’s extremely high contagiousness is not well captured by the α value of 1.2 in the original model.
Figure 6 models the increase in cumulative positive Omicron cases in the United States. The increase is slow until late December 2021 and then becomes rapid through March 2022, which shows more directly that most of the population will be infected with Omicron before the end of March 2022. After that, the cumulative number of cases stops increasing and levels off.
Since the existence of a mutant is hypothetical, the author cannot determine its specific parameters. However, the model still produced illustrative results. Assuming that a new mutant of SARS-CoV-2 appears on March 5, 2022, and thus setting day 1 to March 5, 2022, the author obtained Figure 7. The purple line represents the Removed category, the orange line is Infectious1, the minuscule green tip is Infectious2, and the almost straight red line is Infectiousa. This study tested a range of input parameters, from extremely high to low transmission rates and from long to short infectious periods. Nonetheless, these changes had a negligible impact on the growth of the number of patients in each Infectious compartment, for both strains died out very quickly in all scenarios due to the large number of removed individuals. Thus, under the assumptions of the model, it is not likely that a new mutant of Omicron can prevail. In conclusion, assuming that no drastic changes happen to the US population, a new variant has to overcome the immunity against Omicron in order to prevail.
For demonstration and testing purposes, the author ran the system on Omicron data prior to January 6, 2022. Figure 8 shows the need for hospitalization resources in each state between 50 and 60 days after the first case. Users of the system can use a slider to input different day numbers and access the estimate for that day. Again, the author only finished the New England region of the United States, for it is time-consuming to enter the parameters mentioned in the Methods section. Future work may expand the system to the entire US or even the world.
The COVID-19 pandemic has substantially impacted almost all countries in multiple respects, and the virus spreads to populations worldwide because of its high contagiousness. Therefore, quickly addressing some of the most basic problems posed by emerging viruses is a serious challenge.
In this study, the author focused on the many issues in the current field of disease transmission modeling that were exposed by the uncontrollable COVID-19 pandemic. The author built the EPSEIRV model, designed the SI3R model to expand our understanding of mutant competition, programmed the IHOV model to project the demand for inpatient resources, and created the overall PanDict system, which informs the public about infection predictions, variant emergence, and local resource shortages when new viruses emerge. The author modified the original SEIRV model to significantly improve its accuracy by eliminating the inaccurate exponent, α, and introducing population density and exposure time into the system of equations.
The EPSEIRV model can be further improved by moving people from the Removed category back to the Susceptible compartment to address reinfection. The SI3R model supports mutant competition simulation, which can be used to better understand and predict the emergence of new variants. The IHOV model, in turn, uses the generated predictions to estimate the local demand for inpatient resources. Based on this estimate, proper allocation of limited health resources can significantly reduce the staffing and resource shortages that hospitals experienced during the Omicron outbreak.
As a result, our system can minimize inpatient resource shortages and reduce unnecessary economic losses and human casualties when new viruses appear. This will help the public and the health sector better prepare for the emergence of new variants.
The relevant code is publicly available on GitHub at: https://github.com/123Bohan/LBH.git.