Abstract
The polymicrobial context of chronic infection has received increasing attention due to the widespread use of microbiome sequencing technology. However, clinical microbiology analysis of infection samples in hospitals continues to focus only on established human pathogens. This disconnect between diverse ‘infection microbiomes’ and limited clinical microbiology profiling leaves open the possibility that important risk markers are going unexploited during infection management. To address this disconnect, we focus on lung infections in people with cystic fibrosis (CF). A cohort of CF patients (N=77) was recruited for this study. We collected health information (age, BMI, lung function) and clinical microbiology records for each patient. We also collected sputum samples during a period of clinical stability and determined lung microbiome compositions through 16S rDNA sequencing. We use a regularized linear regression algorithm (ElasticNet) to select informative features for predicting lung function. We find that models including whole-microbiome quantitation outperform models trained on pathogen quantitation alone, with or without the inclusion of patient metadata. Our most predictive models retain key pathogens as negative predictors (Pseudomonas, Achromobacter) along with established correlates of CF disease state (age, BMI, CF-related diabetes). In addition, our models select specific non-pathogen taxa (Fusobacterium, Rothia) as positive predictors of lung health. Our analysis does not address causality, leaving open whether these non-pathogen taxa play an active role in promoting lung health (e.g. by suppressing pathogens) or are simply informative biomarkers of patient health (orthogonal to age, BMI, etc.). Our results support a reconsideration of clinical microbiology pipelines to ensure the provision of the most informative data to guide clinical practice.
Introduction
Bacterial infections in otherwise healthy people are often rapidly resolved by effective immune responses, independent of antibiotic treatment. In some cases, however, infections fail to clear even with appropriate drug treatment, permitting the establishment of chronic (long-lasting) infection and imposing elevated morbidity and mortality risk on affected individuals.1 Chronic infections are a rising burden on global health-care systems as at-risk populations (e.g. people with diabetes) grow.2 Deficits in host barrier defenses and/or immune function in these at-risk people provide an opening for the establishment of infection, which is further compounded by changes in pathogen growth mode (e.g. biofilm formation3) and by the accumulation of other pathogens to form complex multispecies communities.4
The polymicrobial context of chronic infection has received increasing attention due to advances in microbiome sequencing technology. However, clinical microbiology analysis of infection samples in hospitals continues to focus only on the ‘usual suspects’ of established human pathogens – a relatively short list of organisms for which there is a long-established literature on risk to patient health. This disconnect between diverse ‘infection microbiomes’ and limited clinical microbiology profiling leaves open the possibility that important risk markers are going unexploited during infection management.
To address this potential disconnect, we focus on chronic lung infections in people with cystic fibrosis (CF). Cystic fibrosis is an autosomal recessive disease characterized by defective lung mucociliary clearance and an accumulation of viscous mucus in the lungs.5–7 This environment provides both nutrients for bacterial growth and protection from host immune responses,8–11 facilitating long-term microbial infections.12–15 CF lung infections have historically been studied as single-species phenomena, focused on a few key pathogens that are routinely identified by clinical microbiology labs (e.g. Pseudomonas aeruginosa and Staphylococcus aureus). However, the advent of inexpensive 16S rDNA sequencing has caused a major shift in CF lung microbiology research. Sequencing of expectorated sputum samples has revealed diverse communities of tens to hundreds of taxa, including numerous non-pathogenic bacteria.13,16
CF lung microbiome studies have linked lung microbiome composition to disease progression and overall patient health17,18 and found three key patterns: (1) severe disease is associated with pathogen dominance and loss of microbiome diversity in cross-sectional studies;17–19 (2) loss of microbiome diversity correlates with declining lung function in longitudinal studies;20 (3) prevalence of non-pathogenic fermentative anaerobes (Veillonella, Prevotella, Fusobacterium) is associated with higher lung function.21,22
While these correlative observational results are supported across multiple studies, their causal interpretation is the subject of some controversy. One line of argument proposes that these results reflect community ecological processes at play within the lung, where networks of facilitatory and inhibitory interactions among species govern community structure and subsequent harm to the host.12,23,24 The counter-argument is that these patterns are simply the result of oral anaerobe contamination during sample collection.25,26 Under this contamination model, increasing pathogen load against a constant background of oral microbiome contamination generates a spurious link between oral microbes, microbiome diversity, and patient health, assuming a causal relationship between pathogen burden and health.25 These conflicting hypotheses highlight the uncertainty about the role of taxa present in sputum, and the limitations of observational studies in establishing causal inference.
In the current study, we side-step the question of causal inference and instead focus on the degree to which expectorated sputum microbiome data (inclusive of potential oral contaminants) are informative of patient lung health. We hypothesize that non-pathogen data provide additional information that can improve the prediction of patient health outcomes, compared to established pathogen data alone. To test this hypothesis, we apply machine learning tools to an integrated lung microbiome and electronic medical record dataset for a cohort of 77 CF patients. We find that, compared to the benchmark of pathogen data alone, prediction of lung function was improved by the addition of non-pathogen taxa.
Results
Clinical and microbiome data summary
In total, we obtained expectorated sputum samples from 77 CF patients with varying lung function. We measure lung health by percent predicted forced expiratory volume in 1 second (ppFEV1) and stratify ppFEV1 into four categories from Normal to Severe; a summary of patient information is presented in Table 1. As expected, age varies significantly across lung function categories (ANOVA; p < 0.01). Blood glucose control (HbA1c levels) and bacterial load (log-scaled) are also significantly associated with lung function (ANOVA; p < 0.01 and p < 0.05, respectively). Unsurprisingly, culture-based detection of Pseudomonas aeruginosa was also strongly associated with lung function (ANOVA; p < 0.001).
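The four-level ppFEV1 binning and the one-way ANOVA used for Table 1 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names are hypothetical, the handling of values falling exactly on a cut-off (e.g. ppFEV1 = 80) is an assumption, and only the F statistic is computed. A p-value would come from the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f.sf`, or in one call with `scipy.stats.f_oneway`.

```python
import numpy as np

# Cut-offs follow the Methods (Normal >80%, Mild 80-60%, Moderate 60-40%,
# Severe <40%). Assigning boundary values to the lower category is an
# assumption made for this sketch.
def lung_function_category(ppfev1):
    if ppfev1 > 80:
        return "Normal"
    if ppfev1 > 60:
        return "Mild"
    if ppfev1 > 40:
        return "Moderate"
    return "Severe"


def one_way_anova_f(values, groups):
    """F statistic of a one-way ANOVA of `values` across the labels in `groups`."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    labels = np.unique(groups)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(
        values[groups == g].size * (values[groups == g].mean() - grand_mean) ** 2
        for g in labels
    )
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum() for g in labels
    )
    df_between = labels.size - 1
    df_within = values.size - labels.size
    return (ss_between / df_between) / (ss_within / df_within)
```

In practice one would map each patient's ppFEV1 through `lung_function_category` and pass the resulting labels as `groups` alongside a metadata column such as age.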
Turning to our 16S microbiome analyses, we found that the majority (>90%) of reads mapped to one of 13 genera (Fig 1a), highlighting the relatively species-poor nature of the sputum microbiome. Fig 1a illustrates that these 13 dominant genera are a mixture of recognized CF pathogens (red) and orally derived bacteria (black).
The predominant pathogenic species was Pseudomonas aeruginosa, accounting for 30.4% of all reads; P. aeruginosa sequences were detected in every patient sample. Other established CF pathogens (Staphylococcus, Achromobacter, Haemophilus, and Burkholderia) collectively represented a further 19.3% of all reads, while oral taxa accounted for over 45% of all reads (Fig 1). Total pathogen and oral taxa abundances both varied significantly (p<<0.001) with lung function (Table 1).
Microbiome Composition Correlates with Lung Function
To explore the relationship between the composition of sputum samples and patient health, we next analyzed microbiome composition across broad lung function categories. Fig 2a shows composition plots highlighting the relative abundance of six canonical CF pathogens. As expected, Pseudomonas was more prevalent in people with severely impaired lung function, whereas samples from people with normal lung function contained more Haemophilus and non-pathogen taxa (grey). However, the data also illustrate that the association of pathogens with lower lung function is not clear-cut: there are multiple individuals whose sputum is dominated by pathogens and who nonetheless have normal lung function. Conversely, there are multiple individuals with low prevalence of any or all pathogens who suffer from heavily impaired lung function. We next examined the composition of the non-pathogen component of sputum samples (the grey bar in Fig 2a) and found a striking consistency across individuals regardless of lung function (Fig 2b). Veillonella and Streptococcus consistently dominate the non-pathogen microbiome component, regardless of lung health or pathogen status. Integrating across pathogen and non-pathogen components, we find that samples from normal lungs are more diverse than those from severely impaired lungs (p<0.01, Fig 2c), in line with multiple other studies.27,28 Turning to ordination plots (principal coordinates analysis, Fig 2d), we find that ppFEV1 was significantly associated with microbiome composition (Mantel test, r=0.195, p<0.001).
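The ordination in Fig 2d rests on two steps: a Bray-Curtis dissimilarity matrix between samples, and a principal coordinates analysis (PCoA) of that matrix. The sketch below implements classical (Torgerson) PCoA with NumPy purely as an illustration; the function names are hypothetical, the software actually used for Fig 2d is not specified here, and packages differ in how they treat the negative eigenvalues that non-Euclidean distances such as Bray-Curtis can produce (this sketch clips them to zero).

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of a count table."""
    x = np.asarray(counts, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Sum of absolute differences over the total abundance of both samples.
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / (x[i] + x[j]).sum()
    return d


def pcoa(d, n_axes=2):
    """Classical principal coordinates analysis of a distance matrix.

    Returns sample coordinates on the first `n_axes` axes and the fraction of
    positive eigenvalue mass each axis carries.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centered Gower matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    kept = eigvals[:n_axes].clip(min=0.0)         # negative eigenvalues clipped to zero
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained
```

For Euclidean input distances, classical PCoA recovers the original configuration up to rotation and reflection, which makes a convenient sanity check.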
Integrating microbiome and patient meta-data
Fig 2 illustrates patterns of association between microbiome data and a critical patient health outcome. However, this analysis does not address multiple potential confounding variables, such as patient age, BMI or CF-related diabetes (CFRD). To look more globally at the associations among our multiple clinical and microbiome metrics, we generated a clustered correlation matrix across all variable pairs (Fig 3). We found a complex autocorrelation structure, with many expected consistencies. Hierarchical clustering notably groups 16S quantitation variables with patient metadata and clinical microbiology results. Unsurprisingly, FEV1 and ppFEV1 cluster together and are anticorrelated with ppFEV1 decline rate (average rate of decline in ppFEV1 since birth). Additionally, 16S quantitation results for Pseudomonas, Staphylococcus, Burkholderia, and Achromobacter cluster with their respective culture-based clinical microbiology results. This does not hold for Stenotrophomonas, which may be due to its infrequent detection.
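A clustered correlation matrix of the kind shown in Fig 3 combines Spearman correlations (Pearson correlations of ranks) with a dendrogram-based ordering of variables. The NumPy sketch below is a minimal, hedged version: using 1 - rho as the clustering distance and naive average linkage are assumptions not stated in the text, the returned leaf order is valid but not optimally flipped, and constant columns (which have undefined correlations) would need to be dropped first. Function names are hypothetical.

```python
import numpy as np


def rank_average(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                          # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_matrix(data):
    """Spearman correlations between the columns (variables) of `data`."""
    cols = np.asarray(data, dtype=float).T
    ranked = np.column_stack([rank_average(col) for col in cols])
    return np.corrcoef(ranked, rowvar=False)


def average_linkage_order(corr):
    """Leaf order from naive average-linkage clustering on distance 1 - rho."""
    dist = 1.0 - np.asarray(corr)
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]  # concatenating keeps a valid leaf order
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]
```

The ordered matrix for plotting is then `rho[np.ix_(order, order)]`, with `order = average_linkage_order(rho)`.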
Hierarchical clustering identified two large clusters of correlated variables. One correlated with ppFEV1, and included alpha diversity as well as 16S quantitation of Fusobacterium, Haemophilus, and Neisseria. The other anticorrelated with ppFEV1, and included ppFEV1 decline, pathogen abundance, CFRD and 16S quantitation of Pseudomonas and Achromobacter.
Predicting Lung Function
The hairball correlation matrix in Fig 3 highlights the statistical challenges in addressing our underlying question of identifying meaningful predictors of patient outcomes. First, there are many potential predictors (Fig 3 shows 44 patient parameters out of 86 total, including 59 bacterial genera), and second, there are substantial and at times strong correlations among these parameters. Further compounding the challenge, we have relatively few independent patient observations (N=77) compared to the number of potential predictors.
To address these challenges, we first restrict our microbiome analysis to only the top 23 genera in our dataset, to focus on commonly encountered taxa only (Fig 1). We also calculate three additional features: % pathogen, % oral taxa, and Shannon diversity. Next, we use machine learning methods to provide a principled basis for the retention of meaningful predictors. To address compositionality of 16S data, we incorporate total bacterial load (universal 16S primer qPCR) as a predictor. In addition, we use a centered log-ratio (clr) transform on our genus-level relative abundance data before standardizing to mean zero, unit variance inputs.
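The derived features and preprocessing described above can be sketched as follows: group fractions such as % pathogen and % oral taxa, Shannon diversity H = -sum(p ln p), the centered log-ratio transform clr(x)_i = ln x_i - mean_j(ln x_j), and per-feature standardization. This is a hedged NumPy illustration rather than the study's pipeline: the function names and genus groupings passed in are hypothetical, and the pseudocount used to avoid log(0) is an assumption, since the text does not state how zero counts were handled. Because of that pseudocount, applying the clr to counts or to relative abundances gives slightly different results.

```python
import numpy as np


def group_fraction(counts, genera, members):
    """Fraction of each sample's reads assigned to the genera listed in `members`."""
    counts = np.asarray(counts, dtype=float)
    mask = np.isin(genera, list(members))
    return counts[:, mask].sum(axis=1) / counts.sum(axis=1)


def shannon(abundance):
    """Shannon diversity of one sample; zero-abundance taxa contribute nothing."""
    p = np.asarray(abundance, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def clr(table, pseudocount=0.5):
    """Centered log-ratio transform of each row (sample).

    ASSUMPTION: a pseudocount of 0.5 is added so that zeros can be logged.
    """
    log_x = np.log(np.asarray(table, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)


def standardize(features):
    """Scale each column to mean 0 and unit (population) variance.

    A constant column would divide by zero and should be removed beforehand.
    """
    f = np.asarray(features, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)
```

For example, with genera `["Pseudomonas", "Veillonella", "Streptococcus"]`, `group_fraction(counts, genera, {"Pseudomonas"})` gives the pathogen fraction of each sample under that (illustrative) grouping.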
Our machine learning pipeline is outlined in Fig 4, illustrating our approach to assessing the relative predictive power of different subsets of patient data (patient electronic records and sputum 16S microbiome data). Our null hypothesis, following the work of Jorth et al. and others,25,26 is that clinical microbiology provides an adequate explanatory basis for lung function outcomes, and more specifically that the addition of non-pathogen 16S data does not improve predictive ability. We expect that the addition of patient metadata (age, BMI, etc.) will improve our ability to predict lung function due to the progressive nature of CF; however, our null hypothesis predicts that the addition of non-pathogen microbiome data will not improve predictive power, with or without the inclusion of metadata.
To illustrate our machine learning approach, we begin with the model trained on the full dataset (all 16S and metadata predictors, Fig 5). Fig 5a plots predicted versus observed lung function for both the training dataset (data on 53 patients used to train model parameters) and the test dataset (data on 24 patients held back during model training). The consistency of training-set and test-set R2 values suggests the model is not overfitting the training data. Fig 5b highlights the parameters retained in the predictive model and their weighting (blue for positive predictors, red for negative predictors). In Fig. S1 we illustrate the performance of models trained on subsets of the data, all of which show lower R2 values than the model trained on all data (Fig 5). However, the predictive features selected in the integrated model are broadly consistent with those selected by models trained on each dataset individually. Positive and negative predictors selected in the pathogen-only and all-16S models (Fig S1b) were also selected in the all data model. While bacterial load and CFTR mutation type were informative in a metadata-only model (metadata, Fig S1b), the all data model does not select these features. We hypothesize that mutation type and bacterial load share information with Rothia quantitation, and indeed find these features clustering closely in our correlation analysis (Fig 3).
Our analyses in Figures 5 and S1 suggest that the addition of non-pathogen 16S data improves model performance, as evidenced by improvements in R2, and flag specific non-pathogen taxa as potential predictors. To assess these suggestions more carefully, we computationally augment our training datasets using 1000-fold bootstrap resampling and train models on each of the bootstrapped datasets.
Figure 6a shows the relative model prediction performance (measured by mean squared error) for each of the five input data sources, plotting bootstrap-generated confidence intervals (boxplots) alongside single points for the non-bootstrapped train/test models from Figures 5 and S1 (black points). To establish a performance baseline, we designed a non-informative (randomized) input dataset by shuffling the entries within each feature of the ‘all data’ input set, which scrambles between-feature correlations while preserving the mean-zero, unit-variance structure of each feature. All models using patient metadata or microbiome data outperform this negative-control baseline.
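The within-feature shuffling used as a negative control takes only a few lines. In this NumPy sketch (function name hypothetical), each column is permuted independently while the outcome vector keeps its original order, so every feature retains its values, and therefore its mean and variance, but loses its relationship to the other features and to ppFEV1.

```python
import numpy as np


def shuffle_within_features(x, seed=0):
    """Return a copy of `x` with each column independently permuted."""
    rng = np.random.default_rng(seed)
    out = np.asarray(x, dtype=float).copy()
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])  # same values, new row order
    return out
```

A model trained on `shuffle_within_features(x)` against the unshuffled outcome then estimates the error achievable with no real signal.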
Turning to the key question of relative model performance, we find that the addition of non-pathogen taxa significantly improves predictive ability (significantly reduces bootstrapped MSE; Figure 6a), with or without the addition of patient metadata. Overall, models trained on all 16S quantitation significantly outperform models trained only on pathogen quantitation. Interestingly, while models trained on microbiome data and on metadata perform equivalently, combining the two datasets improves model performance. Looking broadly across models, we find reasonable consistency in positive and negative predictor selection between our non-bootstrapped train/test (black dots) and our bootstrapped (boxplots) models (Fig 6c-g).
We find multiple features selected across all training sets. Pseudomonas, Achromobacter, age, and diabetic status are consistently selected as negative predictors, while Haemophilus, Fusobacterium, Rothia, oral taxa abundance, and BMI are consistently positive predictors. All informative features selected in the independent models (Fig 6c-e) were also selected in the integrated model (Fig 6g). A small subset (< 50%) of the bootstrapped models also selected a handful of oral taxa, bacterial load, and CFTR mutation type as positive predictors of lung function (Fig 6g, gray boxplots). However, a majority of bootstrapped models and the train/test model did not select these as informative features.
As an additional check against overfitting, we obtain ranges of model errors (measured by mean squared error of predicted ppFEV1 values) using leave-one-out cross validation (Fig 6b). We do not find significant differences between cross-validated model errors across our training sets. Median cross-validated errors were consistently lower than the train/test split model error (Figure 6b), suggesting that our models are not overfitting.
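Leave-one-out cross-validation refits the model n times, each time holding out one patient and recording the squared error of the prediction for that patient; the median of these errors can then be compared with the holdout MSE. The sketch below is generic over the model and also includes a random 53/24 holdout split matching the sizes reported above. The ordinary least squares functions are a hypothetical stand-in used only to keep the example self-contained; they are not the ElasticNet models used in the study, and the seeds are arbitrary.

```python
import numpy as np


def holdout_split(n, n_train, seed=0):
    """Random train/test index split, e.g. 53 training and 24 test patients."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_train], idx[n_train:]


def loocv_errors(x, y, fit, predict):
    """Squared prediction error for each sample when it alone is held out."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit(x[train], y[train])
        errors[i] = (predict(model, x[i:i + 1])[0] - y[i]) ** 2
    return errors


# Stand-in model for illustration only: least squares with an intercept.
def ols_fit(x, y):
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def ols_predict(coef, x):
    return coef[0] + x @ coef[1:]
```

Comparing `np.median(loocv_errors(...))` on the training rows with the MSE on the held-out rows mirrors the overfitting check described above.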
Discussion
People with CF face the challenge of managing long-term chronic infections. Current management practice is driven by clinical microbiology identification of specific pathogens in expectorated sputum samples, alongside measures of overall health status (lung function, BMI, CFRD). In the current study, we used 16S sequencing to assess sputum microbiome content more broadly, and asked whether the addition of non-pathogen taxa improves our ability to predict patient lung health, with or without the inclusion of patient health data. To address this question, we applied machine learning tools to an integrated 77-patient lung microbiome and electronic medical record dataset. Our analysis revealed that the addition of non-pathogen data improves prediction of patient health, with the most accurate models selecting patient metadata, pathogen quantitation, and non-pathogen information. Our inclusive ‘all data’ models additionally point to a predictive role for specific non-pathogen taxa, in particular the oral anaerobe genera Rothia and Fusobacterium.
Despite the significant contribution of non-pathogen data, our results are still broadly consistent with what might be termed the ‘traditional’ view of CF microbiology. Established CF pathogens (P. aeruginosa, S. aureus, H. influenzae, B. cenocepacia) are the major drivers of patient outcomes, as evidenced by the substantial improvement in predictive performance whenever we include pathogen data (Fig 6a), and by the comparatively weak contribution of adding non-pathogen taxa. Note that we specifically use quantitative 16S measures of pathogen composition to provide a level playing field when comparing the predictive contributions of pathogens and non-pathogens. Fig 3 highlights that quantitative 16S and qualitative (presence/absence) clinical microbiology data are in general agreement.
The traditional role of CF pathogens as the central predictors of patient outcomes has been challenged over the past decade by the advent of microbiome sequencing. In a CF context, extensive surveys have documented an association between CF lung function and microbiome diversity, also evident in the current study (Fig 2). These results at face value suggest a biological role for these non-pathogen taxa, potentially competing with29 or facilitating30 pathogen taxa and therefore indirectly shaping disease outcomes. Jorth et al. recently published a forceful rejection of this ‘active microbiome’ view, stressing the causal role of changing pathogen densities in shaping disease outcomes, viewing shifting diversity metrics as a simple statistical artifact of shifting pathogen numbers against a roughly constant oral contamination background.25 Our analyses provide some support for this view, in particular the constancy of the non-pathogen microbiome across patients (Fig 2b) and the lack of substantial predictive improvement on addition of non-pathogen data (Fig 6b). However on a more fine-scaled analysis we see that specific non-pathogen taxa are retained in our most explanatory models, alongside pathogen taxa.
Our ‘all data’ models highlight Rothia and Fusobacterium as positive predictors of lung function across our 77 patients, in models that already take into account pathogen data, age and BMI. The retention of these specific taxa in both this full model and in partial models (Fig 6b-c) suggests that these taxa provide potentially valuable predictive information on current patient health. Of course, this analysis does not allow inference to causal mechanism or even direction of causality. It is entirely possible that these taxa are simply biomarkers of dimensions of improved health that are largely independent of age, BMI, and other established positive predictors that are already accounted for in the model. It is also possible that these specific taxa play a more active causal role, for instance holding specific pathogens at bay via competitive interspecific mechanisms.31
Interestingly, our ‘all data’ models also highlight Haemophilus, a canonical CF pathogen, as a positive predictor of lung function. Haemophilus influenzae infections are most common in younger CF patients,8,32 hence we would expect a positive association in a model that does not control for age (Fig 6c, 6d). However, Haemophilus retains its positive weighting in models that also account for age (Fig 6f-g). A second possibility is that the positive weighting of Haemophilus reflects pathogen-pathogen competition and the relatively less severe nature of Haemophilus infections in adults (i.e., Haemophilus is the ‘best of a bad job’). Fig 2a illustrates that we only appreciably detect two, and rarely three, coexisting pathogens of the six we find across all patients. These relatively depauperate pathogen communities imply that Haemophilus presence coincides with the absence of other, more severe pathogens, and indeed we see a dominance of negative correlations among pathogens (Fig 3). In this context we cannot preclude a protective role of Haemophilus against more severe pathogens in older patients.
A caveat of this analysis is the dependence of machine learning performance and robustness on particular data distributions, and the failure of linear algorithms such as LASSO and ElasticNet on microbiome-like data.33–35 This is in part due to the compositionality constraint of microbiome data, which can be mitigated by using absolute quantitation.36 However, training on absolute abundances introduces additional caveats, as order-of-magnitude differences in qPCR sample quantitation can in turn over-represent samples with higher bacterial loads. We address these issues by applying a centered log-ratio transform to relative abundance data and including log-scaled bacterial load as a candidate feature. While a small minority of bootstrapped models selected bacterial load as a positive predictor (Fig 6c, Metadata + All 16S Data), the majority of models did not. This further suggests that most of the microbiome information is encoded in the relative ratios of taxon abundances, which is broadly consistent with previous findings.25,26
Finally, our study is limited to a cross-sectional analysis, restricting us to predictions of lung function at the same time point as microbiome sampling and EMR collection. Assessing and refining our predictive machine learning algorithms on subsequent lung function data is an important future goal. Our primary objective is to predict future disease states and preemptively identify patients in need of medical intervention using early-warning microbiome markers. To this end, we plan to continue our analysis on a cohort of patients over time to evaluate predictive capacity for future health status.
In summary, our study finds that inclusion of non-pathogenic taxa significantly improves model prediction accuracy of patient health status. We identify two oral-derived taxa (Fusobacterium, Rothia) that are independently informative of lung function, which may be either biomarkers or potential probiotics. Our results call attention to the potential predictive utility of oral microbes (regardless of their functional roles) in the clinical assessment of CF patient health.
Methods
Subjects
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and national research committees. Authorization was obtained from each patient enrolled according to the protocol approved by the Emory University Institutional Review Board (IRB00010219 for adult and IRB00002161 for pediatric patients).
Sample collection
Expectorated sputum samples were obtained from the CF-BR at Children’s Healthcare of Atlanta and Emory University Pediatric CF Discovery Core from January 2015 to August 2016. De-identified patient information, including age, sex, height, BMI, CFTR genotype, degree of glucose tolerance (HbA1c), and ppFEV1, was obtained (Table 1). Among these CF patients, 39 were diagnosed with CF-related diabetes (CFRD) by a CF endocrinologist. The HbA1c value was missing for one CFRD subject.
All patients were clinically stable, defined as having less than a 10% change in ppFEV1 over the previous year with no medication changes for three weeks prior to sputum collection. Upon collection, sputum samples were stored and transported according to Emory CF-Biospecimen Registry protocols. Briefly, samples were diluted 1:3 (mass:volume) with PBS supplemented with 50 mM EDTA. Diluted samples were then homogenized by repeatedly drawing them through a syringe and 18-gauge needle. The resulting sputum homogenates were aliquoted and stored at −80 °C until all 77 samples were collected; they were then sent to MR DNA Lab (Shallowater, TX) for DNA extraction, sequencing library preparation, MiSeq sequencing, and absolute 16S quantitation. Clinical microbiology results were additionally obtained for the sputum sampling date.
DNA extraction and 16S sequencing
DNA was purified from sputum homogenate with the MoBio PowerSoil kit (MoBio, Carlsbad, CA). The V4 region of the resulting DNA was amplified with the 16S universal primers 515F (5’-GTGCCAGCMGCCGCGGTAA-3’) and 806R (5’-GGACTACHVGGGTWTCTAAT-3’). A single-step 30-cycle PCR integrating sequencing amplification and library adapter/barcode attachment was performed using the HotStarTaq Plus Master Mix Kit (Qiagen, USA), with an initial incubation at 94 °C for 3 minutes, followed by 28 cycles of 94 °C for 30 seconds, 53 °C for 40 seconds and 72 °C for 1 minute, and a final elongation step at 72 °C for 5 minutes. Amplification products were then normalized, pooled and purified using calibrated AMPure XP beads for Illumina MiSeq sequencing.
Bioinformatics pipeline
Illumina MiSeq sequencing generated a total of 10,603,544 sequences, with an average of 137,708 sequences per sample (minimum 76,281, maximum 191,868). All sequence processing was done in QIIME2 2018.2.0. Raw sequences were first demultiplexed and quality filtered on a per-nucleotide basis (min quality: 4, window: 3, min length fraction: 0.75, max ambiguous: 0). Reads were denoised using the deblur plugin, and sequences were trimmed to a length of 250 bp (sample stats: T, mean error: 0.005, indel_prob: 0.01, indel_max: 3, min_reads: 10, min_size: 2, jobs_to_start: 1). Taxonomic assignments were classified against both the SILVA and Greengenes databases and assigned at their highest taxonomic resolution. Discrepancies were resolved manually using BLAST against the non-redundant NCBI sequence database.
Based on taxonomic information, microbiome composition data were obtained for every sputum sample, and a phylogenetic tree was constructed via FastTree. To correct for variation in 16S rDNA copy number among taxa, the number of sequences assigned to each genus in each sample was divided by the known 16S rDNA copy number of that genus, or by four (the average 16S rDNA copy number) if this information was missing.37 Samples were rarefied to 17,000 reads to guarantee equal sampling depth for subsequent analysis.38
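Copy-number correction and rarefaction can be sketched as follows. Assumptions to flag: the copy-number table is supplied by the caller (the values used in any example here are placeholders, not real genus copy numbers), corrected counts are rounded to integers before rarefying because subsampling requires whole reads, and rarefaction is implemented as sampling without replacement via NumPy's `Generator.multivariate_hypergeometric` (available from NumPy 1.18). Function names are hypothetical.

```python
import numpy as np


def copy_number_correct(counts, genera, copy_numbers, default=4.0):
    """Divide each genus's read count by its 16S copy number (default 4 if unknown)."""
    counts = np.asarray(counts, dtype=float)
    divisors = np.array([copy_numbers.get(g, default) for g in genera], dtype=float)
    return counts / divisors  # broadcasts across samples (rows)


def rarefy(counts, depth, seed=0):
    """Subsample each sample (row) to exactly `depth` reads without replacement."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    out = np.zeros_like(counts)
    for i, row in enumerate(counts):
        if row.sum() < depth:
            raise ValueError(f"sample {i} has fewer than {depth} reads")
        out[i] = rng.multivariate_hypergeometric(row, depth)
    return out


# ASSUMPTION: rounding corrected counts before rarefying, e.g.
# rarefied = rarefy(np.rint(corrected).astype(np.int64), depth=17000)
```

Samples below the chosen depth raise an error here; in a real pipeline they would typically be dropped or the depth lowered.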
Statistical and Quantitative Analysis
Patient samples were binned by ppFEV1-based lung function (Normal: >80%, Mild: 80-60%, Moderate: 60-40%, Severe: <40%). Differences across lung function categories in patient metadata, clinical microbiology data, and 16S metadata were tested using ANOVA. The association between microbiome composition and ppFEV1 was tested using Mantel tests on Bray-Curtis distances with 9999 permutations. Within-sample and among-sample diversity were calculated using the Shannon diversity index and Bray-Curtis-based PCoA, respectively, on 16S quantitation data agglomerated to the genus level.39 Associations between continuous variables were tested using Spearman correlations. To mitigate compositional effects, 16S data were centered log-ratio (clr) transformed prior to all analyses. A full pairwise correlation matrix was calculated, with rows and columns ordered by hierarchical clustering.40
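A permutation Mantel test of the kind described above can be sketched as follows. The Pearson statistic, the one-sided p-value that counts the observed statistic among the permutations, and the use of absolute ppFEV1 differences as the second distance matrix are all assumptions made for illustration; established implementations such as vegan's `mantel` in R or scikit-bio's `skbio.stats.distance.mantel` offer these and other options. The function name is hypothetical.

```python
import numpy as np


def mantel(d1, d2, permutations=9999, seed=0):
    """Permutation Mantel test between two square distance matrices.

    Returns the Pearson correlation of their upper triangles and a one-sided
    p-value from jointly permuting the rows and columns of `d2`.
    """
    rng = np.random.default_rng(seed)
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices_from(d1, k=1)      # each sample pair counted once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    n = d1.shape[0]
    hits = 1                                # the observed arrangement itself
    for _ in range(permutations):
        p = rng.permutation(n)
        r = np.corrcoef(x, d2[np.ix_(p, p)][iu])[0, 1]
        if r >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# ASSUMPTION: ppFEV1 distance as absolute difference between patients, e.g.
# d_fev = np.abs(ppfev1[:, None] - ppfev1[None, :])
```

With 9999 permutations the smallest attainable p-value is 1/10000, which is why permutation counts of this form are common.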
Machine Learning
We use ElasticNet to fit regularized linear models predicting lung function (ppFEV1) from patient metadata, microbiome composition, and clinical microbiology results.41 All input features were standardized to zero mean and unit variance prior to model training. We create 6 input datasets based on information source: Clinical Micro, CF Pathogens, Other Taxa, All 16S Data (CF Pathogens + Other Taxa), Metadata, and All Data. We additionally perform within-feature shuffling on the All Data set to create a randomized dataset with the same dimensions, serving as a non-informative negative control.
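For readers unfamiliar with ElasticNet, the objective minimized is (1/2n)||y - b - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2. The L1 term drives uninformative coefficients exactly to zero (feature selection), while the L2 term stabilizes estimates among correlated features. The NumPy sketch below solves this by cyclic coordinate descent with soft-thresholding, using the same alpha/l1_ratio parameterization that scikit-learn's `ElasticNet` documents. It is an illustration, not the study's code, and the default hyperparameters are placeholders, since the text does not report how alpha and the mixing ratio were chosen.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, returning 0 inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def elastic_net(x, y, alpha=0.1, l1_ratio=0.5, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for the ElasticNet objective.

    Returns (intercept, coefficients).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean         # centering removes the intercept from the fit
    w = np.zeros(p)
    resid = yc.copy()                       # invariant: resid == yc - xc @ w
    col_sq = (xc ** 2).sum(axis=0) / n
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                    # constant feature: weight stays at zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = xc[:, j] @ resid / n + col_sq[j] * w[j]
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            resid -= xc[:, j] * (new - w[j])
            max_change = max(max_change, abs(new - w[j]))
            w[j] = new
        if max_change < tol:
            break
    intercept = y_mean - x_mean @ w
    return intercept, w
```

Increasing `alpha` zeros out more coefficients; in practice both `alpha` and `l1_ratio` would be tuned by cross-validation (scikit-learn's `ElasticNetCV` does this).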
For model validation, we employ three methods. First, we use a simple 70/30 train/test holdout, in which models are trained on 53 samples and used to predict the remaining 24. Model accuracy is measured using mean squared error (MSE). Second, we perform leave-one-out cross-validation on the training set to simulate model performance on new data, and compare the resulting MSE ranges to the holdout method to assess model overfitting. Finally, we perform bootstrap resampling on the training set to generate 1000 new training sets, fit a new regularized linear model to each, and obtain ranges and variances for the selected model coefficient weights. We perform each validation method on each input dataset to compare between information sources. We identify important predictor variables as those with nonzero median coefficient weights across all bootstrap models.
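The bootstrap selection rule (a feature counts as a predictor if its median coefficient across resamples is nonzero, with the sign giving the direction) can be sketched as follows. The fitting function is passed in: in the study it would be an ElasticNet fit, whose L1 penalty makes exact zeros common, whereas any least-squares stand-in used to exercise the code is hypothetical and rarely yields exact zeros. Function names and the seed are illustrative.

```python
import numpy as np


def bootstrap_coefficients(x, y, fit, n_boot=1000, seed=0):
    """Refit `fit(x, y) -> coefficient vector` on `n_boot` bootstrap resamples.

    Rows are drawn with replacement; returns an (n_boot, n_features) array.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        coefs.append(fit(x[idx], y[idx]))
    return np.array(coefs)


def selected_features(coefs, names):
    """Split features by the sign of their median bootstrap coefficient.

    Features whose median coefficient is exactly zero are not selected.
    """
    med = np.median(coefs, axis=0)
    positive = [name for name, m in zip(names, med) if m > 0]
    negative = [name for name, m in zip(names, med) if m < 0]
    return positive, negative
```

The spread of each column of the returned coefficient array gives the bootstrap ranges plotted as boxplots.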
Acknowledgements
We would like to thank Karan Kapuria and Eunbi Park for help with the development of our bioinformatic pipeline, and Peng Qiu for advice on our machine learning approach. We also thank the CDC (BAA 2016-N-17812, BAA 2017-OADS-01), the NIH (HR56L142857, R21AI143296) and the Cystic Fibrosis Foundation (BROWN19I0) for funding, as well as the Cystic Fibrosis and Airways Disease Research and Children’s Healthcare of Atlanta for initial pilot funding to Drs. Stecenko and Goldberg. Sputum samples were obtained from the CF-BR at Children’s Healthcare of Atlanta and Emory University Pediatric CF Discovery Core.