PT - JOURNAL ARTICLE
AU - Taj Azarian
AU - Pamela P Martinez
AU - Brian J Arnold
AU - Lindsay R Grant
AU - Jukka Corander
AU - Christophe Fraser
AU - Nicholas J Croucher
AU - Laura L Hammitt
AU - Raymond Reid
AU - Mathuram Santosham
AU - Robert C Weatherholtz
AU - Stephen D Bentley
AU - Katherine L O’Brien
AU - Marc Lipsitch
AU - William P Hanage
TI - Prediction of post-vaccine population structure of Streptococcus pneumoniae using accessory gene frequencies
AID - 10.1101/420315
DP - 2018 Jan 01
TA - bioRxiv
PG - 420315
4099 - http://biorxiv.org/content/early/2018/09/18/420315.short
4100 - http://biorxiv.org/content/early/2018/09/18/420315.full
AB - Predictions of how a population will respond to a selective pressure are valuable, especially in the case of infectious diseases, which often adapt to the interventions we use to control them. Yet attempts to predict how pathogen populations will change, for example in response to vaccines, are challenging. Such has been the case with Streptococcus pneumoniae, an important human colonizer and pathogen, and the pneumococcal conjugate vaccines (PCVs), which target only a fraction of the strains in the population. Here, we use recent advances in knowledge of negative frequency-dependent selection (NFDS) acting on frequencies of accessory genes (i.e., the flexible genome) to predict the changes in the pneumococcal population after intervention. Implementing a deterministic NFDS model using the replicator equation, we can accurately predict which pneumococcal lineages will increase after intervention. Analyzing a population genomic sample of pneumococci collected before and after vaccination, we find that the predicted fitness of a lineage post-vaccine is significantly and positively correlated with the observed change in its prevalence.
Then, using quadratic programming to numerically solve for the frequencies of non-vaccine type lineages that best restore the pre-vaccine accessory gene frequencies, we accurately predict the post-vaccine population composition. We also test the predictive ability of frequencies of core genome loci, a subset of metabolic loci, and naïve estimates of prevalence change based on pre-vaccine lineage frequencies. Finally, we show how this approach can assess the migration and invasion capacity of emerging lineages, based on their accessory genome. In general, we provide a method for predicting the impact of an intervention on pneumococcal populations and other bacterial pathogens for which NFDS is a main driving force.
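
The two computational steps named in the abstract (a predicted fitness per lineage, then a constrained fit of non-vaccine type lineage frequencies to the pre-vaccine accessory gene frequencies) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the fitness formula, so the code assumes that a lineage's predicted fitness is the sum, over accessory genes, of its carriage of each gene times the gap between the pre-vaccine gene frequency and the frequency left after vaccine-type lineages are removed. The constrained least-squares problem is solved here with projected gradient descent rather than a dedicated quadratic programming solver. All function names, the three lineages, and the four genes are hypothetical.

```python
def mix(weights, gene_freqs):
    """Population-wide gene frequencies as a weighted average over lineages."""
    n_genes = len(gene_freqs[0])
    return [sum(w * row[l] for w, row in zip(weights, gene_freqs))
            for l in range(n_genes)]


def predicted_fitness(gene_freqs, target, current):
    """Assumed NFDS score: carriage of each gene times (target - current)."""
    return [sum(row[l] * (target[l] - current[l]) for l in range(len(target)))
            for row in gene_freqs]


def project_to_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) == 1}."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def fit_lineage_frequencies(gene_freqs, target, start, n_iter=500):
    """Minimise ||mix(x) - target||^2 over the simplex by projected gradient.

    Stand-in for a QP solver; the step size uses a Frobenius-norm bound on
    the Lipschitz constant of the gradient.
    """
    step = 1.0 / (2.0 * sum(g * g for row in gene_freqs for g in row))
    x = list(start)
    for _ in range(n_iter):
        resid = [p - t for p, t in zip(mix(x, gene_freqs), target)]
        grad = [2.0 * sum(row[l] * resid[l] for l in range(len(target)))
                for row in gene_freqs]
        x = project_to_simplex([xi - step * gi for xi, gi in zip(x, grad)])
    return x


# Hypothetical toy data: rows are lineages, columns are accessory genes.
gene_freqs = [[1, 0, 1, 0],   # lineage 0 (vaccine type)
              [1, 1, 0, 0],   # lineage 1 (non-vaccine type)
              [0, 0, 1, 1]]   # lineage 2 (non-vaccine type)
pre = [0.5, 0.3, 0.2]
is_vaccine_type = [True, False, False]

target = mix(pre, gene_freqs)  # pre-vaccine accessory gene frequencies

# Remove vaccine-type lineages and renormalise the remainder.
nvt_rows = [r for r, vt in zip(gene_freqs, is_vaccine_type) if not vt]
nvt_pre = [x for x, vt in zip(pre, is_vaccine_type) if not vt]
start = [x / sum(nvt_pre) for x in nvt_pre]

fitness = predicted_fitness(nvt_rows, target, mix(start, nvt_rows))
post = fit_lineage_frequencies(nvt_rows, target, start)
print(fitness, post)
```

In this toy case the second non-vaccine type lineage gets a positive score and the fitted frequencies move in the same direction: its share rises from 0.40 to about 0.45 while the other falls from 0.60 to about 0.55. With real data, the rows would be per-lineage accessory gene frequencies estimated from the genomic sample.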