National and Regional Influenza-Like-Illness Forecasts for the USA

There is demand from national health planners for forecasts of key metrics associated with influenza-like-illness (ILI); near-term weekly incidence, week of season onset, week of peak, and intensity of peak. Here, we describe our participation in a weekly prospective ILI forecasting challenge for the United States for the 2016/17 season and a subsequent evaluation of our performance. We implemented a meta-population model framework with 32 specific model variants. Model variants differed from each other in their assumptions about: the force-of-infection (FOI); use of uninformative priors; the use of discounted historical data for not-yet-observed time points; and the treatment of regions as either independent or coupled. Outputs from model variants were chosen subjectively as the the basis for our weekly forecasts. Coupled models were only available part way thought the season. Most frequently, during the 2016-17 season, we chose; FOI variants with both school vacations and humidity terms; uninformative priors; the inclusion of discounted historical data for not-yet-observed time points; and coupled regions (when available). Our near-term weekly forecasts substantially over-estimated early incidence when coupled models were not available. However, our forecast accuracy improved substantially once coupled solutions were available. We were able to forecast onset and peak timing and peak intensity with reasonable accuracy but without a long lead time. When we conducted retrospective forecasts for six previous seasons for which data were available, we found that the 2016/17 season was not typical: on average, coupled models performed better when fit without historically augmented data. We were able to substantially improve accuracy during a prospective forecasting exercise by coupling dynamics between regions. Although the reduction of subjectivity should be a long-term goal, some degree of human intervention is likely to improve forecast accuracy in the medium-term. Author summary Influenza typically infects approximately 4 million people each year, resulting in 400,000 or more deaths. Influenza-like-Illness (ILI) is a practical way for health-care workers to easily estimate likely influenza cases. The Centers for Disease Control (CDC) collects and disseminates this information, and has, for the last several years, run a forecasting challenge (the CDC Flu Challenge) for modelers to predict near-term weekly incidence, week of season onset, week of peak, and intensity of peak. We have developed a modeling framework that accounts for a range of mechanisms thought to be important for influenza transmission, such as climatic conditions, school vacations, and coupling between different regions. In this study we describe our forecast procedure for the 2016/2017 season and highlight which aspects resulted in better or worse forecasts. Most notably, we found that when the dynamics of different regions are coupled together, the forecast accuracy improves. We also found that the most accurate forecasts required some level of forecaster interaction, that is, the procedure could not be completely automated without a reduction in accuracy.


Introduction
Infectious pathogens with short generation times pose public health challenges because 16 they generate substantial near-term uncertainty in the risk of disease. This uncertainty 17 is most acute and shared globally during the initial stages of emergence of novel human 18 pathogens such as SARS [1], pandemic influenza [2], or Zika virus [3]. However, at A map of the continental US colored by the ten HHS regions. The green circle in each HHS region denotes the population density weighted location of the centroid of the region, and the radius of each circle is proportional to the weight of the region which is determined by its relative population. Regions are sometimes referred to by the city in which their HHS office is located: 1, Boston; 2, New York; 3, Philadelphia; 4, Atlanta; 5, Chicago; 6, Dallas; 7, Kansas City; 8, Denver; 9, San Fransisco; 10, Seattle.
transitions to the I compartment in region j) depends on: (1) the risk of infection from 95 those in the same region j, (2) the risk of infection from infected people from region i 96 who traveled to region j, and (3) the risk of infection encountered when traveling from 97 region j to region i. To account for the three mechanisms of transmission, ref [12] 98 defined the force of infection, or the average rate that susceptible individuals in region i 99 become infected per time step as: where D is the total number of regions. In our-case, unlike reference [12], the 101 transmissibility is not the same for all regions and it is allowed to depend on time: β j (t). 102 Given this force of infection we can write the coupled S-I-R equations for each region as: Eqs. (2)(3)(4) are the coupled version of the familiar S-I-R equations, where T g is the 103 recovery rate (assumed to be 2-3 days in the case of influenza). The mobility matrix, 104 m ij of Eq. 1 describes the mixing between regions. Thus, element i, j is the probability 105 for an individual from region i, given that the individual made a contact, that that 106 contact was with an individual from region j. As shown below, the sum over each row 107 in the mobility matrix is one and in the limit of no mobility between regions the 108 mobility matrix m ij is the identity matrix so that λ i (t) = β i (t) Ii Ni and we recover the 109 familiar (uncoupled) S-I-R equations: and its interaction kernel , κ(r ik ): This kernel is expected to depend on the geographic distance between the regions (r ij ), 113 and following Mills and Riley [12] we use a variation of the off-set power function for it: 114 κ(r ij ) = 1 1 + (r ij /s d ) γ (9) where s d is a saturation distance in km and the power γ determines the amount of 115 mixing between the regions: as γ decreases there is more mixing while as γ increases, 116 mixing is reduced. In the limit that γ → ∞there is no mixing between regions and we 117 recover the uncoupled SIR Eqs. (5)(6)(7). The DICE package is designed to allow the 118 estimation of these two parameters (γ and s d ), but they can also be set to fixed values. 119 The S-I-R equations model the total population, but the data are the number of 120 weekly observed cases or incidence rate for each spatial region (I R j ). The weekly 121 incidence rate is calculated from the continuous S-I-R model by discretizing the 122 rate-of-infection termλ j (t)S j (or β j (t)

Sj Ij
Nj in the uncoupled case): scaling by percent clinical p C j , and adding a baseline B j . p C j is the proportion of 124 infectious individuals that present themselves to a clinic with ILI symptoms and B j is a 125 constant that estimates non-S-I-R or false-ILI cases. The integral runs over one week 126 determining the number of model cases for week t i . ∆ t approximates the time delay 127 from when an individual becomes infectious to when they visit a sentinel provider for 128 ILI symptoms and is to 0.5 weeks based on prior calibration [13,14]. Eq. 10 describes 129 how DICE relates its internal, continuous S-I-R model to the discrete ILI data. In the 130 next section we describe the procedure used for fitting this property to an ILI profile.
The first time dependent term, F 1 (t), allows for a dependence of the transmission 135 rate on specific-humidity, the second (F 2 (t)) on the school vacation schedule, and the 136 third (F 3 (t)) allows the User to model an arbitrary we calculate our national forecast as the weighted sum of the regional profiles with the 152 weights given by the relative populations of the regions. It is also possible to fit the 153 national % ILI profile directly, without using any information from the ten HHS regions. 154 In the coupled scenario, the MCMC procedure uses Eqs. (2)(3)(4)  respectively.  This total of 32 model-runs were used to make predictions of incidence at both the 216 national and regional levels. The national curve was also fitted directly (without any 217 regional information) using all the models and priors, but these direct results were only 218 used at the end of the season when estimating the performance of each of our 219 procedures.
with the UP prior only on EW 9. Hence, some of the coupled results reported in this 223 section were not available in real-time and were generated at the end of the season (but 224 using only the %ILI data that was available in real-time at each week.)

225
Each week a single forecast was selected from these results for each of the ten HHS 226 regions and the national. At the regional level we selected a single forecast from one of 227 the (32) uncoupled or coupled procedures enumerated in the previous paragraph. We aggregated result obtained as the weighted average of our ten regional selections (with 234 the weights given by the relative population of each region).

236
Models selected for forecasts 237 We selected different FOI variants during different weeks. At the regional level,  q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q   intensity, the peak week of EW 6 was the same as the historical mean. Between EW50 277 (eight weeks before the season peaks) and EW 4 (two weeks before the season peak) our 278 forecast correctly predicted to within ±1 week of the observed peak week (EW 6). One 279 week before the season peaks, and at the peak week (EWs 5 and 6), our model forecast 280 has an error of two weeks.

281
Forecasts based on the mechanistic model performed better than the historic null (1-, 2-, 3-and 4-weeks ahead), for the prediction of the National %ILI intensity.

302
Coupled models were more accurate than non-coupled models for the 2016-17 season 303 and for historical seasons for all 4 lead times.

304
The performance of mechanistic models was comparable to that of the historical 305 average null model at the beginning and end of the season. However, in the middle of 306 the season when there is greater variation in the historical data, the performance of the 307 best mechanistic model variants was substantially better than that of the historical 308 average model.

We examined the performance of the different model variants for individual regions 317
for the near-term forecasting of %-ILI ( Figures S2 Fig and S3 Fig). Again, the coupled 318 models with uninformative prior outperformed other model variants. Although for some 319 regions, the improvement in forecast score for the uninformative prior variants over 320 other coupled variants was less pronounced (Regions 1 and 7), these models never 321 appeared to be inferior to the other variants. In this study, we have described our participation in a prospective forecasting challenge. 336 Although we drew on results from a large set of mechanistic models, our single forecast 337 for each metric was made after choosing between available model results for that metric 338 in that week and was therefore somewhat subjective. We performed poorly at the start 339 of the competition when our mechanistic models consistently over-estimated incidence. 340 However, during the middle phase of the season, our models produced less biased Onset Week Peak Week Peak Intensity there has been a gradual progression over time such that objective forecasts are 356 becoming more accurate than subjective forecasts. We note also that although we 357 describe the subjective process as it was conducted, we also provide a thorough 358 retrospective assessment of the predictive performance of each model variant. they explicitly represent spatial structure [20,21]. Given that the model structure we 365 used to represent space was relatively coarse [12], further work is warranted to test how 366 forecast accuracy at larger spatial scales can be improved by models that include 367 iteratively finer spatial resolution.

368
In submitting forecasts based on uninformed mechanistic priors using an uncoupled 369 model at the start of the season, we failed to learn lessons that have been present in the 370 influenza forecasting literature for some time [17]. Historical variance is low during the 371 start of the season and the growth pattern is not exponential. Therefore, it would be 372 reasonable to forecast early exponential growth only in the most exceptional of 373 circumstance, such as during the early days of a pandemic. Model solutions that are 374 anchored to the historical average in some way, such by the use of augmented data for 375 not-yet-seen time points, are likely to perform better.

376
Models that included humidity forcing performed better on average in our analysis 377 of all historical data than equivalent models that did not include those terms, especially 378 for the forecasting of ILI 1-to 4-weeks ahead [22]. However, we did not see similar support for the inclusion of school vacation terms improving accuracy, which has been 380 suggested in a retrospective forecasting study at smaller spatial scales (by this 381 group) [23]. The lack of support for school vacations in the present study could indicate 382 that the prior work was under-powered or that sampling and then averaging of school 383 vacation data across large spatial scales degrades its contribution to forecasts. 384 We found the experience of participating in a prospective forecasting challenge to be 385 different to that of a retrospective modeling study. The feedback in model accuracy was 386 Eq. 11 allows the transmission rate to depend on time using three terms. The first 403 time dependent term, F 1 (t), allows for a dependence of the transmission rate on 404 specific-humidity. In temperate regions specific humidity has a seasonal oscillation with 405 a minimum in the winter and a maximum in the summer. We follow Shaman et al. [24] 406 and relate the local SH, q j (t), to the reproduction number as: In the above equation, and unlike the work published by others, the values of the 408 parameters a and ∆ R are fitted. The effect of specific humidity can be combined with 409 that of school vacation which is discussed in the following sub-section. The second term in Eq. 11 allows the transmission rate to depend on the weekly 413 school vacation schedule (p j (t)) and we implement is as:  average. For all ten regions, and in agreement with the results for the nation, the 438 coupled procedure performs better than the uncoupled.
targets: onset week, peak week and peak intensity for the ten HHS regions 441 averaged over the 2010-2016 seasons.

442
Top, middle and bottom rows are onset week, peak week and peak intensity. The

443
NULL result is calculated using the historic mean regional profile. For all three targets, 444 and all ten regions, the coupled method does better than the uncoupled with the details 445 of the prior and force of infection models depending on the region.