Predicting COVID-19 Cases and Subsequent Hospital Burden in Ohio OSU / IDI∗ COVID-19 Response Modeling Team April 7, 2020 1 Introduction Coronavirus 2019 (COVID-19) disease has resulted in 1.13 million confirmed cases and 60,115 deaths reported globally as of 4 April 2020 [31]. As evidenced by epidemics in other countries, particularly in Italy and Spain, COVID-19 patient case loads have the potential to overwhelm healthcare systems [15]. The Ohio State University IDI working in conjunction with CPH1 , OSU Department of Mathematics, and SI2 established a working relationship with ODH3 to act as a service to the State, beginning in 2018. Based on this initial collaborative relationship, the EEPH program4 within IDI took the initiative to offer epidemic modeling and decision analytics support to ODH preparatory planning and response to the COVID-19 pandemic, specifically the Ohio epidemic. Just as in any computational modeling effort, there are many approaches and methods to choose from to model emerging epidemics. Different methods are suited to different situations in terms of data, scenarios, conditions, and urgency. This is no different for modeling the COVID-19 pandemic, either nationally or for the state of Ohio. Modeling approaches include: compartmental models [2,12,30], statistical models [17], and agent-based simulations [14]. For the state of Ohio, the OSU-IDI group has approached the modeling challenge on two fronts by developing: 1) predicted statewide estimates of a times series of COVID-19 incidence, and 2) a geographic component that transforms the output of the statewide model to hospital burden by county or “hospital catchment” area. 2 Methods 2.1 Model Overview The predictive statewide model for Ohio is comprised of a dynamic network model [23], where nodes correspond to individuals and network edges correspond to connection through proximity. We highlight three key distinguishing features of the statewide model: 1. The model considers a dynamic network where the edges can be deactivated over time— supporting social distancing impacts more accurately. ∗ Infectious Disease Institute - The Ohio State University College of Public Health - The Ohio State University 2 Sustainability Institute - The Ohio State University 3 Ohio Department of Health 4 Ecology Epidemiology and Population Health 1 1 2. We use a law of large numbers to derive a set of differential equations that describes the disease process on a large network [19] without requiring simulation methods. 3. The solutions of these differential equations can be used to estimate model parameters using a principled statistical approach based on survival analysis. This approach is based on writing an explicit likelihood for these parameters given data on times of illness onset [29]. Critically, this allows for more accurate quantification of the uncertainty in the model predictions. Because of these features, this approach retains the tractability of an analytical model while incorporating features of complex contact networks to better represent social interactions and distancing. The predictive statewide model takes data on illness onset dates of confirmed cases as input and produces estimates of COVID-19 incidence (i.e., new cases) in Ohio over time. This output is not age-stratified, but the age distribution of the new cases is assumed to match the age distribution of confirmed cases when predicting the number of hospitalizations, which occurs in the next step of the model. Estimates of the number of hospitalized COVID-19 cases are derived in the geographic portion of the model. Because illness severity and the risk of hospitalization for COVID-19 patients varies by age and comorbidities [10, 28], we use age structure and population density to distribute case counts from the statewide epidemic model across smaller geographic areas within the state. We use the age distribution within each geographic unit to predict hospitalization rates over time. Consequently, counties or hospital catchment areas that have a different demographic structure (e.g., older or younger populations) will differ in their COVID-19 hospitalizations over time. 2.2 Detailed Methods of Statewide Model There are challenges to using traditional compartmental models [27] to estimate future COVID-19 incidence. This includes challenges in estimating the effective population size on which the epidemic takes place (directly applying compartmental models using e.g. the entire state’s population size can lead to dramatic over-estimates of illness), and lack of data on mild or asymptomatic infection. In our statewide model, we use an approach called Dynamic Survival Analysis (DSA), which is an extension of survival dynamical systems [6, 29]. Details on this approach and the model development can be found in Appendix A. Three key strengths of the DSA approach for predicting novel virus epidemics such as COVID-19 are: • It does not require knowing the size of the susceptible population, • It does not require information on overall disease prevalence in the population, • It does not require prior knowledge of the shape of the epidemic curve. Since testing has focused on the most symptomatic and severe cases, we have almost no information on the number of asymptomatic infections or those with less severe symptoms who are not tested. Because of the novel nature of this virus and the resulting pandemic, analysis of the epidemic curve cannot be based on previous epidemics caused by other viruses such as SARS-CoV-1 or influenzas. Because it relies on illness onset times rather than counts of new cases, the DSA method can incorporate a partial epidemic curve like the one produced by testing for COVID-19 within Ohio, which is affected by both limited testing capacity and undetected asymptomatic infections. The DSA approach has been developed to handle such limited data in a manner that allows accurate quantification of uncertainty. The method is strengthened by its simplicity; in 2 most practical circumstances (see [19]), it requires only a single differential equation. Parameters for the model are optimized using empirical temporal data on new illnesses, and the model output is a time series of expected future illnesses. The optimization is accomplished using maximum likelihood estimation (MLE). This method provides for a consistent estimator for both small and large data sets, a valuable asset in numerical optimization. This optimization is done using standard quasi-Newtonian algorithms that reach convergence using the negative log-likelihood, which is more numerically stable than the likelihood function itself. Due the availability of reliable and efficient MLE algorithms in Python and other languages, the MLE is a commonly used method of function optimization. Our predictive statewide model provides robust estimates of cases over time given partially observed daily counts of new illnesses. This is ideal in a setting where testing capacity is limited or changing due to constraints on lab capacity or detection limits. The counts of new illnesses are often known as an observed epidemic curve in the literature [32]. Our approach is derived from the general stochastic model of a pathogen spread across a contact network where the nodes represent individuals in a community [19]. As a working model of a contact network we use a type of random graph called a dynamic configuration model (CM) [7], which is sometimes referred to as a pairwise model [18]. Assumptions. All models have specific assumptions used to develop and implement them. First, we assume that each individual in the network (network node) has a number of neighbors (their degree) drawn from a degree distribution. Subsequently, local Markovian infectious pressure changes their status from susceptible (S ) to infective (I) to removed (R). We further assume that the R individuals are no longer able to transmit the infection and cannot be reinfected—an unknown, but readily assumed component of contemporary models. For the dynamics of the network model, individuals transition between the S, I, and R compartments based on the following principles that allow the extraction of a mathematical model of the ongoing epidemic based on observable data: 1. Disease spread occurs over a network of contacts, that is, an infectious individual can infect their immediate neighbors at a fixed rate (the rate must be greater than 0). It is implicitly assumed that the average number of a person’s contacts is also greater than 0. 2. Each infected individual recovers (or dies) from infection at rate that is > 0, or is restricted from contacting their network neighbors through mandatory or voluntary isolation or social distancing of its neighbors at a rate that is greater than 0. 3. Once infected, an infected individual has an infectious period that has an exponential distribution. 4. People who are ill remain infectious, and a partial count of new illnesses is observed over time with a negligible chance of misdiagnosis (i.e., false positives). Estimating Transmission Rates. The complete model description and detailed development can be found in Appendix A. In brief, the statistical model is used to estimate the number of people and timing of transfer between states S, I, and R. Then through substitution, a term St , the number of susceptible people at time t is developed. Embedded within this St is an improper survival function for the time to the onset of illness in a random susceptible person. Using this survival function approach, the time to infection of a randomly selected susceptible person within a large population follows a temporal pattern determined by probability laws. After we estimate the time series of susceptibles, we can estimate the probability of a randomly selected susceptible 3 individual being infected during the lifetime of an epidemic. From this we develop a conditional probability of remaining susceptible past time t. Estimating Dropout and Recovery Rates. The impact of social distancing is accomplished via a generalized approach to a person being dropped from the the network. As has been visualized elsewhere, when a person is removed from the network, then their neighbors and contacts within the network are removed, which limits transmission. This is accomplished by estimating the rate of infectious contact within a network (β̃ in Appendix A), and then considering drop outs as a function of recovery rate (details in Appendix A). 2.3 2.3.1 Geographic Modeling Methods Estimating Hospital Census over Time After the statewide model produces time series estimates of cases across the state, these are then translated into estimates of hospitalizations and subsequent ICU admission. Details for this portion of the model are given in Appendix B. The conversion of statewide illness onsets from the statewide model to hospital census over time in each geographic region involves three steps: 1. Estimation of case onset time series for each age group based on age stratification in each geographic area modeled. 2. Estimation of the number of severe cases that will need hospitalization, using age and other risk factors [10, 11]. 3. Use probability distributions for time from illness onset to hospitalization, and for length of stay in hospital, to estimate hospital census numbers over time. This provides a means of modeling the time lag between symptom onset and when hospitalization would occur based on observed rates of hospitalizations nationally, and also incorporates different lengths of stay for non-ICU vs. ICU patients. Once daily hospital counts were estimated, we compared these to the reported number of COVID available beds pooled across all hospitals in a geographic area. This allowed us to understand when and where hospital bed need might exceed current capacity. 2.3.2 Description of Data Demographic data. Demographic data on the age distribution for Ohio counties and ZIP Codes were obtained from the U.S. Census Bureau’s 5-year American Community Survey 2014-2018 estimates [8]. The ACS data were used because the sample size is large enough for small geographies for reasonable standard errors and stable estimates. The Census Bureau does not develop data products for the USPS ZIP Codes. Rather, ZIP Code Tabulation Areas (ZCTAs) are generalized areal representations of United States Postal Service (USPS) ZIP Code service areas. We mapped the Census ZCTAs to ZIP Codes and used population estimates for ZCTAs. We used table B01001 which breaks out population counts by sex and 5 year age groups. Figure 1 shows the distribution of the high risk population (age 55+) by county and Hospital Catchment Area in Ohio (definition of Hospital Catchment Areas provided in Section 2.3.3). 4 Figure 1: Proportion of the population over the age of 55 in A) counties and B) hospital catchment areas. OHA Hospital and Bed Data. The Ohio Hospital Association (OHA) provided data on the location of all hospitals in the state and the number of registered beds for each facility. Beds were broken out into several categories: Airborne Isolation, Critical Care, General Medical/Surgical Beds, and Extracorporeal Membrane Oxygenation (ECMO) beds. These bed types represent two levels of care required for COVID-19 patients: general hospital care for severe disease, and ICU beds for patients requiring ventilation. The OHA data contains information that constitutes a trade secret under Ohio Revised Code Section 1333.61. Description of OHA hospital market share data. OHA provided data on hospital market share derived from administrative hospital claims from the past year (January 2019-December 2019, inclusive). For each hospital, patient encounters were grouped by patient ZIP code. A hospital’s market area included ZIP codes that represented the top 80% of all encounters at that hospital. 2.3.3 Definition of Hospital Catchment Areas We developed small area estimates from state-level model predictions for counties and hospital catchment areas (HCA). Boundary files for the 88 Ohio counties were obtained from the U.S. Census Cartographic Boundary Files [9]. We developed hospital catchment areas using an approach modified from the Dartmouth Atlas Project [1]. We define a hospital catchment area as a collection of ZIP codes whose residents receive most of their hospital care from the hospital(s) in that region. Figure 1-B shows the HCAs developed and mapped. Note that in figure 1-B there are HCAs that cross over into neighboring states. This is explained in more detail below. HCA definitions depend on the integrated use of geospatial methods that grouped each ZIP code in the state with the most geographically proximate hospital and modified these groupings 5 using data on hospital market share by ZIP code. Only hospitals with acute care beds that could be used for COVID-19 patients were included in the analysis. We excluded facilities such as long-term acute care (LTAC), hospice, orthopedic, rehabilitation or psychiatric/behavioral health hospitals and freestanding ERs. We defined HCAs using three steps: 1. The locations of all hospitals in Ohio, and those in Michigan, Indiana, West Virginia, Kentucky and Pennsylvania located on the border with Ohio, were used to generate a Voronoi diagram with hospitals as generating points. [25] We included hospital in neighboring states in the Voronoi analysis to avoid edge effects which would attribute patients to Ohio hospitals that typically use hospitals in other states. This yielded 233 distinct areas, one for each hospital generator. 28 areas were subsequently deleted because they included no area within OH. 2. Voronoi polygons were overlaid with ZIP codes. ZIP codes were assigned to the Voronoi polygon if their centroid fell within the polygon. This ensured that that each ZIP code was associated with the most geographically proximate hospital and divided the state into groupings of ZIP codes assigned to each hospital. 3. Using the OHA hospital market share file, we examined the level of agreement between each Voronoi polygon and market share ZIP codes for each hospital. In cases where adjacent Voronoi polygons were generated by hospitals which also shared 60% or more of their market share ZIP codes, we aggregated these polygons to create one hospital catchment area. In large metropolitan areas with many hospitals, we aggregated groupings of Voronoi polygons to create regional catchment areas. In rural areas, this typically resulted in aggregating two adjacent polygons. Using this procedure, we generated 96 Hospital Catchment Areas for the state of Ohio. Some HCAs included ZIP codes from neighboring states, and some Ohio ZIP codes were included in HCAs for non-Ohio hospitals. HCAs are shown in Figure 2. 3 3.1 Discussion Comparison with Other Approaches Although, due to the significant differences in approach, it is difficult to directly compare our predictive model to other contemporary methods, some indirect comparisons may be offered. The susceptible-infectious-recovered (SIR) framework is the basis for many COVID-19 epidemic models [2, 12, 22, 30]. Several websites provide implementations of the SIR framework and present scenarios with different interventions such as social distancing affecting the transmission parameter β [2, 12], thus allowing for different hypothetical scenarios to be explored. Note that the I variable in the SIR model corresponds to infections, rather than symptomatic cases. Consequently, knowledge of asymptomatic to symptomatic infections is needed to translate output from these models to cases and hospitalizations. Other models such as [12] extend the basic SIR model to include additional compartments corresponding to incubation, presymptomatic infectious, stages of symptomatic infectious, and hospitalization. Parameterizing these models is a challenge, given the lack of clear clinical information and detailed data about these compartments. The current online SIR tools [2,12] are suited for scenario exploration, like illustrating flattening of the curve under different levels of social distancing compliance. Unfortunately, they are less 6 Figure 2: Estimates of hospital burden mapped to hospital catchment areas for 6 days during the epidemic. These estimates are based upon ODH case onsets through March 19, 2020. suited for forecasts as model parameters are not calibrated based upon case or outcome data. Pei et al [22] use a metapopulation approach with SEIR dynamics in each patch and estimate model parameters (including proportion of asymptomatic infection) based upon case counts from China. This approach gives a spatially explicit model without age-structure. Conversely, [30] gives an example of a model with age structure but without space: Weitz extends the SEIR framework to include age groups, and estimates model parameters based upon hospitalization and death counts from Georgia. An alternative approach [17] also uses mortality data as model inputs, but takes a purely statistical approach in fitting a sigmoidal curve to cumulative COVID-19 deaths using a mixed-effects model. Another alternative is an agent-based model such as that used in [14]. Similar to [22], [14] forecasts a very large number of infections with COVID-19. Software The software to perform model fit along with a data example is available from [21]. 7 Additional Materials The video presentation describing the methodology used for epidemic size predictions is available from [26]. 4 Appendix A - Predicting Statewide Cases of COVID-19 4.1 Detailed Description of Statewide Model Development This derivation is taken in whole from the manuscript in preparation [5]. We will outline the derivation of a simple but powerful general modeling framework that provides robust estimates of the quantities relevant to monitoring local outbreaks where only limited amount of information is available through partially observed daily counts of new (symptomatic) infections - e.g. illnesses. Our approach is derived from the general stochastic model of a contagion spread across a contact network of nodes representing individuals in a community [19]. As a working model of a contact network we use a dynamic configuration model (CM)-type random graph (e.g., [7]). Such a model is often also referred to as a pairwise model [18]. Briefly, we assume that each node has their degree and that the nodes may change their status from the initial “Susceptible” (S) to “Infective” (I) (or “Infectious”) and, finally, to “Removed” (R), according to their local Markovian infectious pressure (hazard). We assume that the “Removed” individuals are no longer able to pass infection and cannot be reinfected. 4.2 Network Dynamics Assumptions The dynamic model of individuals transition between the states S,I and R is based on several simple principles that allow to extract a mathematical model of the ongoing epidemic and relate it to observable data: 1. the spread occurs over a network of contacts, that is, an infectious individual may only infect his/her immediate neighbors at fixed rate β > 0; it is assumed that the average number of node’s contacts is µ > 0 2. the infected individual may recover at rate γ > 0 or be restricted from contacting his/her network neighbors either through mandatory or voluntary quarantine at rate δ > 0 3. the infected individuals stay infected for a random amount of time according to an exponential distribution 4. the symptomatic infectives are infectious and the partial count of new infectives is observed over time with a negligible chance of misdiagnosis (false positives). We note that the model as described here extends previous work studying epidemics on CMrandom networks (e.g., [24]) and that this formulation above can also account for extensions of the basic SIR compartments to include a latent period (up to 14 days for COVID-19, see [27]), as well as more general staged progression models [16]. Instead of directly analyzing the stochastic CM model described above, which is challenging due to the heterogeneity in the number of contacts and the connectivity structure evolution (e.g., [4,20]), we will make use of the general results on the mean field approximation [3,13], and the convergence of the random infection hazard in large networks. Specifically, as shown in [19] under the assumption of the Poisson-type degree distribution (for instance the node degree is Poisson or negative-binomial 8 distributed), the mean field approximation of the dynamics is given by the set of equations (dots denote time derivatives) (4.1) ẋS = −βxD xS ẋI = βxD xS − γxI ẋD = β(1 − κ)x2D + κµβxS2κ−1 − γ̃ xD where: the pair (xS , xI ) describes the relative number of susceptibles and infected, xD = xSI /xS is the relative density of infectious connections, κ is the average contact network density5 , and γ̃ = β + γ + δ. The usual initial conditions are xS (0) = 1; xI (0) = ρ > 0; xD (0) = µρ. 4.3 Estimating Reproduction Rate For the purpose of statistical analysis of the system (4.1) we make an assumption that only the empirical counts of the new infected are available in practice. Dividing last equation in (4.1) by the first one, solving for xD in terms of xS and substituting back into the first equation we obtain the reduced system with only one equation describing the decay of susceptibles. To simplify notation, denote St := xS (t) to obtain (4.2) − Ṡt = β̃(1 − Stκ )Stκ + γ̃ St (1 − Stκ−1 ) + ρ̃Stκ 1−κ where: S0 = 1 and ρ̃ = βµρ, γ̃ = β + γ + δ, and β̃ = µβ. Note that the equation (4.2) is defined for κ = 1 by taking the limit κ → 1. The value of the basic reproduction number R0 for both reduced (4.2) and full (4.1) system is R0 = κβ̃/γ̃. The condition κ = 1 implies the Poisson degree assumption for the pairwise model and reduces (4.2) to (4.3) − Ṡt = β̃ St − St2 + γ̃ St log (St ) + ρ̃St and S0 = 1. Instead of thinking about St as a proportion of susceptibles it is convenient to think about St as an improper survival function of the time to infection of aR single random susceptible. Then St ∞ has an improper density −Ṡt (see [29]). It is improper, since 0 −Ṡt = 1 − S∞ = τ < 1 where τ is defined below (see also [6], Example 2). Under this survival function interpretation, the infection time for a randomly selected initially susceptible individual (in an infinite population) follows a temporal patterns according to the probability law St given by (4.2) or (4.3). When we observe only a partial epidemic trajectory, say until time T , then the observed infection time is conditional on the infection occurring by time T , that is, on an event that has probability τT = 1 − ST . It is easy to show that as T → ∞ then τT → τ∞ = τ the probability of a randomly selected susceptible individual being infected during the lifetime of an epidemic. (One may think about τ also as the final proportion of infected in the epidemic in infinite population). Then the conditional density of symptom onset is fT (t) = −Ṡt /τT 5 The network parameter κ > 0 is defined as the ratio of network mean excess degree and mean degree, see [19] for details. It is known that κ = 1 corresponds to the Poisson degree network whereas κ > 1 corresponds to the negative binomial one. 9 which is simply the scaled derivative of the probability of staying susceptible past time t (denoted St ). Accordingly, setting θ = (κ, β, γ, ρ) the approximate likelihood of the joint symptom times (epidemic curve) of n observed new cases by current time T in an infinite population [29]) is given by n Y (4.4) L(θ t1 , . . . , tn , T ) = fT (ti ) i=1 Although the expression above looks simple note that the function fT (t) depends upon the vector of parameters θ only implicitly through the differential equations (4.2) or (4.3). 4.4 Estimating Dropout and Recovery Rates Recall γ̃ = β + γ + δ. Given γ̃, we may estimate the recovery rate γ from the recovery density. Then assuming β is negligible, we have the approximate expression for drop-out δ ' γ̃ − γ. To estimate γ we consider now the recovery density. As shown in [29], the density of daily recovery times is given by Z t g(t) = f∞ (u)γe−γ(t−u) du. 0 For practical model fitting, a shift parameter ε ∈ R may be needed as ḡε (t) = R ∞ 0 g(t + ε ∧ 0) . g(u + ε ∧ 0) du The overall density of recovery is then the following mixture 1 ρ ḡε (t) + γe−γt . 1+ρ 1+ρ The analogous likelihood as in (4.4) can be now produced using the conditional recovery density g̃(t) = ğT (t) = R T 0 (4.5) g̃(t) for t ∈ [0, T ]. g̃(t) dt L(γ t1 , . . . , tk , T ) = n Y ğT (ti ) i=1 This likelihood is used to estimate γ directly conditional on estimated St (and thus f∞ ). 4.5 Estimating Size of an Outbreak We assume that the size of an outbreak k∞ is a fixed integer representing the likely number of total infections in the contact network of the confirmed cases only. Hence k∞ is not the prevalence of the disease in the population but rather just an estimate of the total outbreak size in the community of n individuals where we see infections. We estimate n at any given time T by the discount estimator n̂T = kT /(1 − ST ) where kT is the number of cases observed by time T . Then we estimate the total number of cases by the end of an epidemic as τ kT k̂∞ = 1 − ST where τ = 1 − S∞ is the final probability of infection defined in Section 4.3. 10 4.6 Prediction and uncertainty quantification We follow a semi-Bayesian approach for prediction and uncertainty quantification. We assume (independent) non-informative priors for the parameters β̃, γ̃, and ρ̃. This choice of prior distributions has the following important implication: the maximum likelihood estimates (MLEs) of β̃, γ̃, and ρ̃, obtained by numerically maximizing the likelihood function in (4.4), correspond to the mode of the posterior distribution. We then estimate the asymptotic covariance matrix of β̃, γ̃, and ρ̃ using the bootstrap method. Having estimated the mode of the posterior distribution and the asymptotic covariance, we can now perform a Laplace approximation to the posterior around the mode. This basically allows us to approximate the posterior distribution locally by a gaussian distribution, from which we can now draw posterior samples. While the cumulative epidemic curve corresponding to the MLEs obtained as a solution to (4.3) gives us the most likely trajectory, the posterior samples of β̃, γ̃, and ρ̃ are used to generate a pointwise Monte Carlo confidence interval around the most likely trajectory. Therefore, in essence, we generate predicted trajectories corresponding to the posterior samples and then compute appropriate quantiles at desired time points to get the confidence interval. 5 Appendix B - Estimating Hospitalizations from Statewide Predictions of Case Numbers Let C(t) be the trajectory of case onset times as generated by the statewide model. We wish to translate this to hospital census hi (t) for a geographic region i. For simplicity of exposition, we do not discuss separate types of hospital beds (e.g. ICU vs. non-ICU) here. Extension to separate bed-types is straight-forward. Consider an age group P a ∈ A. Let ni,a be the number of individuals in age group Pa in geographic region i, and let ni = a∈A ni,a denote the total population size of i. Let N = i ni denote the entire population size of Ohio. Converting C(t) to hi (t) involves the following steps: (i) Estimating the case onsets ci,a (t) for age group a in geographic region i. (ii) Deriving from this the onsets of severe cases si,a (t) that will eventually require hospitalization by age group and geographic region. (iii) Using probability distributions for time from case onset P to hospitalization and length of stay to estimate the hospital census hi,a (t), where hi (t) = a∈A hi,a (t). Estimating case onsets by age group and geographic region. We consider the following factors in estimating ci,a (t) from total case onsets C(t): the population size of i, age structure of i, and the relative likelihood by age of being identified P as a COVID-19 case. We first compute the total case onsets ci (t) := a∈A ci,a (t) for i in proportion to the population size of i: (5.6) ci (t) = ni C(t). N To distribute these case onsets by age, define the relative susceptibility sa of age group a as the ratio of the proportion of observed cases in a relative to the proportion of the state’s population that is in a. 11 A relative susceptibility of one corresponds to the proportion of observed cases across the state matching what would be expected if cases were distributed uniformly at random across individuals. Relative susceptibilities larger than one correspond to more cases identified in a than would be expected at random, while relative susceptibilities less than one correspond to fewer identified cases in a than would be expected at random. For the Ohio data we observe far fewer cases in youth (ages less than 21) than would be expected at random, but more cases in the elderly. We then compute ci,a (t) by multiplying ci (t) by a probability incorporating the relative susceptibility of a and the age structure of i: na,i sa (5.7) ci,a (t) = ci (t) P . `∈A ni,` s` Onset of cases that will eventually require hospitalization. It has been documented that case outcome varies strongly with age [11]. We use the mean of the probabilities reported in [10] of severe case outcome by age group. Let pa be the probability of severe case outcome for age group a. Then (5.8) si,a (t) = pa ci,a (t). Refinements include incorporating gender and other demographic covariates including comorbidities. We are actively studying these for the Ohio data and will incorporate our findings into future versions of the model. Hospital census over time. The trajectories si,a (t) correspond to onset times of cases that will eventually require hospitalization. To translate these into hospital census of COVID-19 patients, let wa (t) be the probability distribution for time between case onset and hospitalization, and qa (t) the probability distribution for length of stay in hospital for age group a. Then the convolution of si,a (t) with wa (t) gives hospital admissions over time, and the convolution of admissions with length of stay gives hospital occupancy: hi,a (t) = (si,a ∗ wa ∗ qa )(t). (5.9) References [1] The Dartmouth Atlas #research-methods-faq. of Healthcare. https://www.dartmouthatlas.org/faq/ [2] COVID-19 Hospital Impact Model for Epidemics (CHIME). https://penn-chime.phl.io/, 2020. [3] Frank Ball and Peter Neal. Network epidemic models with two levels of mixing. Mathematical Biosciences, 212(1):69–87, 2008. [4] Maurice Stevenson Bartlett. Methuen, 1960. Stochastic population models in ecology and epidemiology. [5] Caleb Deen Bastian, Wasiur R. KhudaBukhsh, Yuhan Pan, Eben Kenah, and Grzegorz A. Rempala. Predicting the size and duration of the outbreaks of COVID-19 under minimal assumptions. Technical Report, The Ohio State University College of Public Health, April 2020. 12 [6] Caleb Deen Bastian and Grzegorz A Rempala. Throwing stones and collecting bones: Looking for poisson-like random measures. Mathematical Methods in Applied Sciences (Early View), https://doi.org/10.1002/mma.6224, 2020. [7] Béla Bollobás. Random Graphs. Springer, 1998. [8] US Census Bureau. programs-surveys/acs. [9] US Census Bureau. mapping-files.html. American community Mapping files A. survey. https://www.census.gov/ https://www.census.gov/geographies/ [10] CDC COVID-19 Response Team. Preliminary estimates of the prevalence of selected underlying health conditions among patients with Coronavirus Disease 2019 — United States, February 12–March 28, 2020. Morbidity and Mortality Weekly Report, 69:382–386, 2020. [11] CDC COVID-19 Response Team. Severe outcomes among patients with coronavirus disease 2019 (COVID-19) — United States, February 12–March 16, 2020. Morbidity and Mortality Weekly Report, 69:343–346, 2020. [12] M. Childs, M. Kain, D. Kirk, M. Harris, J. Ritchie, L. Couper, I. Delwel, N. Nova, and E. Mordecai. Potential long-term intervention strategies for COVID-19. https://covid-measures.github.io/, 2020. [13] Richard Durrett. Random graph dynamics. Cambridge University Press, 2007. [14] N. M. Ferguson, D. Laydon, G. Nedjati-Gilani, N. Imai, K. Ainslie, M. Baguelin, S. Bhatia, A. Boonyasiri, Z. Cucunubá, G. Cuomo-Dannenburg, A. Dighe, I. Dorigatti, H. Fu, K. Gaythorpe, W. Green, A. Hamlet, W. Hinsley, L. C. Okell, S. van Elsland, H. Thompson, R. Verity, E. Volz, H. Wang, Y. Wang, P. G. T. Walker, C. Walters, P. Winskill, C. Whittaker, C. A. Donnelly, S. Riley, and A. C. Ghani. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. https:/doi.org/10.25561/77482, 2020. [15] G. Grasselli, A. Pesenti, and M. Cecconi. Critical care utilization for the COVID-19 outbreak in Lombardy, Italy. Lancet, doi:10.1001/jama.2020/4031, 2020. [16] H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000. [17] IHME COVID-19 health service utilization forecasting team and Christopher JL Murray. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months. medRxiv, 2020. [18] Péter L Simon et al. István Z Kiss, Joel C Miller. Mathematics of epidemics on networks. Cham: Springer, page 598, 2017. [19] K. A. Jacobsen, M. G. Burch, J. H. Tien, and G. A. Rempala. The large graph limit of a stochastic epidemic model on a dynamic multilayer network. Journal of Biological Dynamics, 12(1):746–788, 2018. [20] M. Kermack and A. G. Mckendrick. Contributions to the mathematical theory of epidemics. Part I. Proc. R. Soc. A, 115:700–721, 1927. 13 [21] Wasiur R. KhudaBukhsh and Caleb D. Bastian. Python Code for DSA-based Modeling. https: //github.com/wasiur/dynamic_survival_analysis. [22] R. Li, S. Pei, B. Chen, Y. Song, T. Zhang, W. Yang, and J. Shaman. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (COVID-19). medRxiv, https://doi.org/10.1101/2020.02.14.20023127, 2020. [23] M. E. J. Newman. Networks. Oxford University Press, 2nd edition, 2018. [24] Mark EJ Newman. The structure and function of complex networks. SIAM review, 45(2):167– 256, 2003. [25] A. Okabe, B. Boots, , and K. Sugihara. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. Wiley, 2nd edition, 2000. [26] Grzegorz A Rempala. Mathematical models of epidemics: Tracking coronavirus using dynamic survival analysis. https://mbi.osu.edu/events/ seminar-grzegorz-rempala-mathematical-models-epidemics-tracking-coronavirus-using-dynamic. [27] Aimee Taylor Rene Niehus, Pablo M De Salazar and Marc Lipsitch. Quantifying bias of COVID-19 prevalence and severity estimates in Wuhan, China that depend on reported cases in international travelers. medRxiv, 2020. [28] R. Verity, L. C. Okell, I. Dorigatti, P. Winskill, C. Whittaker, N. Imai, G. Cuomo-Dannenburg, H. Thompson, P. G. T. Walker, H. Fu, A. Dighe, J. T. Griffin, M. Baguelin, S. Bhatia, A. Boonyasiri, A. Cori, Z. Cucunubá, R. FitzJohn, K. Gaythorpe, W. Green, A. Hamlet, W. Hinsley, D. Laydon, G. Nedjati-Gilani, S. Riley, S. van Elsland, E. Volz, H. Wang, Y. Wang, X. Xi, C. A. Donnelli, A. Ghani, and N. M. Ferguson. Estimates of the severity of coronavirus disease 2019: a model-based analysis. Lancet Infectious Diseases, doi.org/10.1016/S1473-3099(20)30243-7, 2020. [29] Eben Kenah Wasiur R KhudaBukhsh, Boseung Choi and Grzegorz A Rempala. Survival dynamical systems: individual-level survival analysis from population-level epidemic models. Interface Focus, 10(1):20190048, 2020. [30] Joshua S. Weitz. COVID-19 near-term epidemic risk assessment for Georgia. https://github. com/jsweitz/covid-19-ga-summer-2020, 2020. [31] World Health Organization. Coronavirus disease 2019 (COVID-19) situation report – 70. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200330-sitrep-702020. [32] Zunyou Wu and Jennifer M McGoogan. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72,314 cases from the Chinese Center for Disease Control and Prevention. Journal of the American Medical Association, 2020. 14