Using networks to combine "big data" and traditional surveillance to improve influenza predictions

Sci Rep. 2015 Jan 29;5:8154. doi: 10.1038/srep08154.

Abstract

Seasonal influenza infects approximately 5-20% of the U.S. population every year, resulting in over 200,000 hospitalizations. The ability to assess infection levels more accurately and to predict which regions face higher infection risk in future time periods can inform targeted prevention and treatment efforts, especially during epidemics. Google Flu Trends (GFT) has generated significant hope that "big data" can be an effective tool for estimating disease burden and spread. The estimates generated by GFT are available in real time, two weeks earlier than traditional surveillance data collected by the U.S. Centers for Disease Control and Prevention (CDC). However, GFT has made some infamous errors and is significantly less accurate at tracking laboratory-confirmed cases than at tracking syndromic influenza-like illness (ILI) cases. We construct an empirical network using CDC data and combine it with GFT to substantially improve GFT's performance. The improved model predicts infections one week into the future as accurately as GFT estimates the present, and it performs particularly well during epidemics and in the regions most likely to facilitate influenza spread.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining*
  • Epidemics
  • Humans
  • Influenza A Virus, H1N1 Subtype
  • Influenza, Human / epidemiology*
  • Internet*
  • Models, Statistical
  • Population Surveillance / methods*
  • United States / epidemiology