PT - JOURNAL ARTICLE
AU - Sebastian Funk
AU - Anton Camacho
AU - Adam J. Kucharski
AU - Rachel Lowe
AU - Rosalind M. Eggo
AU - W. John Edmunds
TI - Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area Region of Sierra Leone, 2014–15
AID - 10.1101/177451
DP - 2018 Jan 01
TA - bioRxiv
PG - 177451
4099 - http://biorxiv.org/content/early/2018/11/23/177451.short
4100 - http://biorxiv.org/content/early/2018/11/23/177451.full
AB - Real-time forecasts based on mathematical models can inform critical decision-making during infectious disease outbreaks. Yet, epidemic forecasts are rarely evaluated during or after the event, and there is little guidance on the best metrics for assessment. Here, we propose an evaluation approach that disentangles different components of forecasting ability using metrics that separately assess the calibration, sharpness and unbiasedness of forecasts. This makes it possible to assess not just how close a forecast was to reality but also how well uncertainty has been quantified. We used this approach to analyse the performance of weekly forecasts we generated in real time in Western Area, Sierra Leone, during the 2013–16 Ebola epidemic in West Africa. We investigated a range of forecast model variants based on the model fits generated at the time with a semi-mechanistic model, and found that good probabilistic calibration was achievable at short time horizons of one or two weeks ahead but models were increasingly inaccurate at longer forecasting horizons. This suggests that forecasts may have been of good enough quality to inform decision making requiring predictions a few weeks ahead of time but not longer, reflecting the high level of uncertainty in the processes driving the trajectory of the epidemic.
Comparing forecasts based on the semi-mechanistic model to simpler null models showed that the best semi-mechanistic model variant performed better than the null models with respect to probabilistic calibration, and that this would have been identified from the earliest stages of the outbreak. As forecasts become a routine part of the toolkit in public health, standards for evaluation of performance will be important for assessing quality and improving credibility of mathematical models, and for elucidating difficulties and trade-offs when aiming to make the most useful and reliable forecasts.