Comparative Analysis of 16S rRNA Gene and Metagenome Sequencing in Pediatric Gut Microbiomes

The colonization of the human gut microbiome begins at birth, and over time, these microbial communities become increasingly complex. Most of what we currently know about the human microbiome, especially in early stages of development, was described using culture-independent sequencing methods that allow us to identify the taxonomic composition of microbial communities using genomic techniques, such as amplicon or shotgun metagenomic sequencing. Each method has distinct tradeoffs, but there has not been a direct comparison of the utility of these methods in stool samples from very young children, which have different features than those of adults. We compared the effects of profiling the human infant gut microbiome with 16S rRNA amplicon vs. shotgun metagenomic sequencing techniques in 338 fecal samples; younger than 15, 15–30, and older than 30 months of age. We demonstrate that observed changes in alpha-diversity and beta-diversity with age occur to similar extents using both profiling methods. We also show that 16S rRNA profiling identified a larger number of genera and we find several genera that are missed or underrepresented by each profiling method. We present the link between alpha diversity and shotgun metagenomic sequencing depth for children of different ages. These findings provide a guide for selecting an appropriate method and sequencing depth for the three studied age groups.

a PCR amplification step, this type of profiling requires a relatively low number of sequenced reads 51 per sample to maximize identification of rare taxa and is generally cheaper than shotgun 52 metagenomic sequencing. 53 Shotgun metagenomics indiscriminately sequences the entire metagenome, and therefore 54 typically requires more sequenced reads per sample to find unique taxonomic identifiers. This need 55 for increased sequencing depth carries a higher cost (Comeau et al., 2017), but yields information on 56 many genes rather than only one. This substantially increases resolution in taxonomic assignments -57 metagenomic profiling often provides species-level assignment where amplicon sequencing is 58 restricted to identifying genera (Ranjan et

Metagenome data processing and analysis 142
Metagenomic data were analyzed using bioBakery workflows with all necessary dependencies and 143 default parameters (McIver et al., 2018). Briefly, KneadData (v 0.7.10) was used to trim and filter 144 raw sequence reads, and to separate human and 16S ribosomal gene reads from bacterial sequences in 145 both fecal and oral samples. Samples that passed quality control were taxonomically profiled to the 146 genus level using MetaPhlAn (v 3.0.7), which uses alignment to a reference database of "marker 147 genes" to identify taxonomic composition (Beghini et al., 2020). 148

Statistical Analysis 149
Statistical analyses were carried out in R (4.0.3). vegan (v 2.5-6) was used for all alpha-diversity 150 calculations: Shannon diversity index (Shannon, 1948) (alpha diversity measurement of evenness and 151 richness), evenness (how homogeneous the distribution of taxa counts are), and richness (number of 152 taxa in a community). Pairwise Bray-Curtis dissimilarity was used to assess beta-diversity, or the overall variation between each sample (Bray and Curtis, 1957). The Bray-Curtis dissimilarity metric 154 compares two communities based on the number or relative abundance of each taxon present in at 155 least one of the communities. When we calculated these values, we assumed that the set of 156 dissimilarities calculated across a group was independent, even when the same child was paired to 157 other children multiple times. These distance matrices were used for Principal Coordinates Analysis 158 (PCoA) to create ordinations. The two principal components that explained the most variation were 159 used to create biplots ( Figure S2). 160 Univariate comparisons were performed in two-sample two-tailed t-tests when we could 161 assume normality, and Wilcoxon Signed Rank tests when we could not. P-values of less than 0.05 (or 162 the equivalent after Benjamini-Hochberg false discovery rate correction (Benjamini and Hochberg, 163 1995)) were considered statistically significant. Mixed effects linear models in lme4 were used to 164 analyze data from subsampling results, to account for the fact that multiple subsamples were 165 generated from each sample. Shannon ~ 1.58 + 5.21x10 -4 *read depth -3.79x10 -1 *less than 15 166 months -4.38x10 -2 *older than 30 months -1.50x10 -4 *read depth: less than 15 months -1.56x10 -167 4 *read depth:older than 30 months. 168 2.7 Comparing missing and underrepresented genera in 16S rRNA to shotgun metagenomics 169 datasets 170 A genus was classified as being unique to a particular profiling method if reads were only assigned to 171 it through one method. Taxa that could not be resolved down to the genus level (taxonomic 172 assignments containing the phrases "unclassified," "unidentified," "group," or "uncultured") were 173 removed prior to calculating relative abundance diversity, and all downstream metrics. Genera that 174 only occurred in one but not the other method were classified as uniquely identified by 16S rRNA 175 profiling or shotgun metagenomics. We found the intersection of genera by identifying microbes that 176 were found at least once by both methods. 177 We used Wilcoxon Signed Rank tests to compare the abundances of microbes that were found 178 by both methods. This analysis was limited by the direct comparison of relative abundances instead 179 of direct counts. Because 16S rRNA profiling was able to identify more taxa at the genus level, this 180 meant that the relative abundances of its organisms were systematically lower. 181 2.8 Analyzing primer coverage 182 TestPrime 1.0 (Klindworth et al., 2013;Ludwig et al., 2004) was used to perform in silico PCR to 183 investigate how well certain primer pairs align to microbes in the SILVA database. We entered our 184 forward and reverse primers (515FB and 926R) into the TestPrime web-tool provided by SILVA 185 (Quast et al., 2013) to analyze the percent primer coverage of microbes found only with metagenomic 186 sequencing, but not by amplicon sequencing. Coverage is defined as the percentage of matches for a 187 particular taxonomic group (# of matches / (total # of mismatches + matches). The primers described 188 in Methods 2.4 were compared to sequences found within the SSU r138.1 SILVA database. A single 189 nucleotide mismatch between each primer and 16S rRNA gene sequence was considered a mismatch 190 for that organism. Once the percent coverage was calculated, we compared the average coverage of 191 microbes uniquely found by shotgun metagenomics, 16S rRNA profiling, or both methods. Some 192 genera identified uniquely by shotgun metagenomics were not as identified as hits to the primer, 193 despite being in the SILVA database. Their alignment was manually entered to be 0% for 194 downstream analysis. 195

Generating phylogenetic trees 196
The union of all genera that were identified by either 16S rRNA gene or shotgun metagenomic 197 sequencing was used to generate a phylogenetic common tree using TimeTree (Kumar et al., 2017). 198 In addition to these genera, Thermus aquaticus was added as an outgroup. This tree was visualized 199 using the Interactive Tree of Life (iTOL) v 5.5.1, (Letunic and Bork, 2019), along with metadata that 200 described which profiling method (either 16S, shotgun metagenomics, or both) was able to identify 201 the genus Bork, 2007, 2019). For taxa that were unidentified by a particular profiling 202 method, we investigated whether or not that taxon was present in the missing database. The 203 phylogenetic tree notes taxa that would be impossible to be identified by that method, as they were 204 not present in the relevant database. 205

Exploring the effect of read depth on diversity using metagenome samples 206
We investigated the results of decreasing read depth on alpha and beta-diversity by resampling 207 shotgun metagenomic reads from a subset of children within the RESONANCE cohort that had 208 deeply sequenced metagenomes (average 7,209,871± 2,562,647 reads). Metagenomic reads from 30 209 children were selected and 10k, 100k, 250k, 500k, 750k, and 1M reads were randomly sampled (with 210 replacement) from each child's reads. Each child was resampled at each depth four times for the 211 analysis involving RESONANCE subjects.
To investigate whether these observations were generally applicable to other childhood 213 cohorts, we performed the same subsampling analysis on the DIABIMMUNE cohort (Simre et al., 214 2016). Only a single sample for each depth was obtained for DIABIMMUNE subjects due to the 215 substantially higher number of original samples. DIABIMMUNE subjects were subsampled at depths 216 of 100k, 250k, 500k, 750k, 1M, and 10 M reads. All children were separated by developmental stage 217 (less than 15 months: n = 10, between 15 and 30 months, n = 10, over 30 months, n = 10). Reads 218 were reassigned taxonomy using MetaPhlAn (see section 2.7) and diversity was recalculated. The 219 majority of these samples subsampled at 10,000 reads had no identifiable taxa and were excluded 220 from downstream analysis. 221 3 Results 222

Alpha diversity increases with age in both 16S rRNA gene-and metagenomic-profiled 223
samples 224 First, we directly compared taxonomic profiles generated by shotgun metagenomic or amplicon 225 sequencing to assess their ability to detect poorly characterized or low abundance taxa. On average, 226 the proportion of microbes resolved to the genus level in a sample was 97.7% (SD = 1.7%) when 227 profiled by shotgun metagenomic sequencing and 78.2% (SD = 20.7%) when profiled by 16S rRNA 228 sequencing. As expected, regardless of the profiling method, the observed alpha (within-sample) 229 diversity of the gut microbiome of children increased in the first 30 months of life (Welch's t-test, p-230 value < 0.001). Given that we observed that children's microbiomes grow increasingly complex and 231 diverse, we hypothesized that any differences in ability of the profiling methods to identify less-232 abundant taxa would only be magnified with age. Consistent with this hypothesis, we found that 233 profiles created from shotgun metagenomics data had systematically lower alpha diversity than 234 profiles from 16S rRNA sequencing at the genus level across all developmental stages ( Figure 1A). 235 The mean of these differences between paired profiles increased as the children age, with the largest 236 differences observed in children older than 30 months (mean of the differences = 0.18, paired t-test, 237 p-value < 0.001). This suggests that the differences between 16S rRNA and shotgun metagenomics 238 profiling in capturing alpha diversity are amplified as children age and their microbial diversity 239 becomes increasingly complex. 240 We next examined between-sample, or beta, diversity within each of the three age groups to 241 determine if age or profiling method were associated with large between-sample differences.
Comparisons of beta diversity within children of the three groups indicated the similarity between gut 243 microbiome communities increased with age in both profiling methods. Regardless of which method 244 was used, Bray-Curtis dissimilarity, a pairwise measure of beta diversity between two communities, 245 was the smallest between children over the age of 30 months ( Figure 1B). 246 After observing differences in the two profiling methods among young children, we next 247 compared profiles generated from the different methods for the same fecal sample. If data from 248 shotgun metagenomics and 16S rRNA gene profiling both produced exactly the same gut microbial 249 profiles, we would expect that profiles from the same child's fecal sample would have a Bray-Curtis 250 dissimilarity of ~0. At a minimum, we would expect to see that the Bray-Curtis dissimilarity among 251 profiles constructed from the same stool sample would be smaller than the dissimilarity between two 252 profiles from two random children. As hypothesized, we observe that the average Bray-Curtis 253 dissimilarity among paired samples is much lower than that of unpaired samples ( Figure 1C; mean 254 difference = 0.348, Welch's t-test, p-value < 0.001). The largest differences in the paired profiles 255 were found in children less than 15 months (Figure 1C, 1D). 256

Discrepancies between 16S rRNA and shotgun metagenomics profiles 257
To further investigate the cause of the largest discrepancies in diversity between the two profiling 258 methods, we looked at biases in taxonomic representation at different taxonomic levels. At all 259 taxonomic levels, except the species level, 16S rRNA amplicon profiling identifies more taxa 260 (Figure 2A). We found that 41 families were found by both methods, while 33 and 14 were uniquely 261 identified by 16S rRNA and shotgun metagenomic profiling, respectively. At the genus level, of 202 262 genera identified across all samples, only 105 genera were identified with both amplicon and shotgun 263 metagenomic sequencing. 16S rRNA amplicon sequencing identified 63 genera not found by 264 metagenomic profiling including Acetobacter, Bacillus, Flavobacterium, Pseudomonas, and 265 Sulfitobacter, while only 34 genera were uniquely found using shotgun metagenomic sequencing, 266 such as Citrobacter, Coprococcus, Enterobacter, Gordonibacter, and Helicobacter (Figure 2B). At 267 the species level, 16S rRNA amplicon profiling was not able to resolve any taxa to the species level, 268 while shotgun metagenomics was able to identify 385 unique species. We decided to focus on 269 comparing taxonomic differences at the genus level, as that is the most specific taxonomic level in 270 which we are able to meaningfully compare the two methods. 271 After identifying genera that were found by only one of the two methods, we next 272 investigated whether there were any taxa that were systematically found at higher levels in one method versus the other. We found that Butyricicoccus was observed to have a significantly higher 274 relative abundance in 16S rRNA profiles compared to samples profiled with shotgun metagenomics 275 (Wilcoxon signed rank test for this and all microbes, p-value < 0.001 ) (Table S1)

. Similarly, 276
Romboutsia (p-value < 0.001) and Sutterella (p-value < 0.001) were found to have a higher relative 277 abundance when detected by 16S rRNA amplicon sequencing. In contrast, genera such as 278 Bifidobacterium (p-value < 0.001), Eggerthella (p-value < 0.001), and Klebsiella (p-value < 0.001) 279 systematically had higher relative abundance when detected by shotgun metagenomic techniques. After comparing two different profiling methods, we investigated the effect of reducing metagenomic 283 sequencing depth on observed alpha diversity among the three developmental groups. We selected 284 samples from a sub-group of 30 children (10 from each developmental stage) that were initially 285 sequenced at the highest depth (mean 7.2 million reads; SD = 2.6 million reads) and performed 286 random resampling of shotgun metagenomic reads at varying depths (100k, 250k, 500k, 750k and 287 1M reads). We then recalculated alpha diversity metrics (evenness, richness, and Shannon) for each 288 community of re-sampled reads after assigning taxonomy using MetaPhlAn. Figure 3A shows the 289 relationship between the evenness, richness, and sequencing depth across all the resamplings we 290 performed. Regardless of the starting community's diversity, as sequencing depth increased, 291 observed sample richness and evenness also increased ( Figure S3). For example, samples that were 292 only profiled with 100k reads had a mean Shannon Index of 1.35, whereas those sampled at 1M reads 293 had mean Shannon Index of 1.89 ( Figure 3B). 294 In addition, we observed that increasing sequencing depth affected children of different ages 295 differently. Not only did children younger than 15 months have a lower median Shannon Index when 296 we ignore sampling depth (<15 months median: 1.42, >15 months median: 1.99), the Shannon Index 297 increases more slowly with sampling depth in kids under 15 months. In particular, a mixed effects 298 linear model showed that the slope of the Shannon Index on sampling depth is significantly lower for 299 children under 15 months, compared to those between 15 and 30 months (p < 0.001), and the slope is 300 significantly lower for children between 15 and 30 months compared to those greater than 30 months 301 ( Figure 3B; p < 0.001).
While the step-wise increase in alpha diversity with sampling depth is statistically significant 303 for children less than 15 months (p < 0.001), the increase in observed alpha diversity is substantially 304 smaller than typical effect sizes in childhood microbiome studies. For instance, a recent meta-305 analyses of other studies that investigated alpha diversity of children that were and were not breastfed 306 observed average differences in Shannon Index to be 0.34 (95% Confidence Interval: [0.20, 0.48]) 307 (Ho et al., 2018), but increasing sequencing depth from 500k reads to 1M reads only increased this 308 metric by 0.06 ( Table 1, Table S2). Increasing interest in the human microbiome, especially during early child development, raises the 319 urgency of selecting appropriate methods for interrogating taxonomic and functional composition of 320 human-associated communities. Given that shotgun metagenomic sequencing is capable of providing 321 higher taxonomic resolution as well as information about gene functional potential, it is clearly 322 preferable to amplicon sequencing when working with high biomass samples such as stool and when 323 cost is not an issue. However, the higher cost of sequencing to provide sufficient sequencing depth 324 for shotgun metagenomics is relevant when resources are constrained. Because infant microbiomes 325 are substantially less diverse than adult microbiomes, we reasoned that lower sequencing depth (and 326 therefore lower cost) may enable comparable taxonomic resolution to amplicon sequencing at a 327 similar cost. 328 We, therefore, set out to analyze a group of child stool samples sequenced with both methods 329 and profiled with commonly used taxonomic-assignment tools so that direct comparisons could be 330 made. As expected, microbial communities from younger children (less than 15 months old) were 331 substantially less diverse than communities from older children, and both amplicon and shotgun 332 metagenomic sequencing with ~1.2 Gb per sample were able to capture comparable taxonomic diversity at the genus level across all age groups. It is important to note that metagenomic sequencing 334 generally captures more diversity due to its species-level resolution (Ranjan et al., 2016), but we 335 restricted our analysis to the genus level in order to make the most direct comparison to amplicon 336 sequencing. Interestingly, though the observed diversity overall was comparable between methods, 337 the actual taxonomic profiles generated by each method had substantial differences, particularly in 338 the youngest children. we showed that 16S rRNA gene amplicon and shotgun metagenomic sequencing each missed some 347 taxa, but more genera were identified overall by 16S rRNA gene profiling, at least in the 348 RESONANCE cohort. This may be due to an increased ability to identify very low abundance taxa or 349 some artifact of amplification or sequencing, though in the DIABIMMUNE cohort, more genera 350 were identified using shotgun metagenomic profiling, suggesting that the relative performance of 351 each method for some metrics may vary between populations. Interestingly, we also show that the 352 largest discrepancies between the two profiling methods were found in the youngest kids. This is 353 likely due in part to the low diversity of these samples, since loss of one genus in a profile with few 354 genera may have a larger impact on dissimilarity metrics. Another possible explanation is the large 355 fraction of many samples in young children (as much as 40% relative abundance) that could not be 356 resolved to the genus level (see section 3.1) with amplicon sequencing. As unresolved taxa were 357 excluded from our alpha diversity analysis, the true diversity could be much higher or lower than we 358 observe in those samples. 359 Some of the discrepancies we observed were due to technical differences in sequencing 360 methods. For example, some taxa found exclusively through 16S rRNA gene profiling were not 361 found in the MetaPhlAn database, including 16 genera that did not have reference genomes available. 362 All of the genera found uniquely by shotgun metagenomics were present in the SILVA database, but 363 their 16S rRNA gene sequences may not have perfectly complemented the primers we used. Though 16S rRNA PCR primers are often referred to as "universal," there is considerable sequence diversity 365 in the 16S rRNA gene, even in the most conserved regions and among bacteria of the same species 366 (Větrovský and Baldrian, 2013). Using TestPrime 1.0, we identified several genera that had very low 367 alignment with our primers, such as Solobacterium (2.2% alignment) and Pediococcus (1.3%) and 10 368 genera that were present in the SILVA database and identified using shotgun metagenomics, but were 369 not found to be hits with our primers. We also explored if certain clusters of taxa were more 370 systematically unidentified by a particular profiling method. For example, several genera identified 371 uniquely by shotgun metagenomic profiling had lower primer coverage compared to the genera 372 identified by 16S rRNA amplicon profiling (Figure S5). Other taxa were only identified using 16S 373 amplicon profiling (Figure 2B; ex. clade containing Ruegeria, Planktotalea, Planktomarina, and 374

Sulfitobacter). 375
Given that both profiling methods exhibited some biases against certain taxa, future study 376 designs should carefully consider which method is most appropriate to their research question, and 377 further investigation using communities where the ground truth of composition is known should be 378 pursued to interrogate whether these differences are systematic. In addition to uncertainty about the 379 true composition of these samples' communities, our study was also limited in scope to a single 16S to have its own biases. We chose to compare widely used and accessible methods to compare for 385 investigation of child microbiomes, but further investigation to select the best combination of 386 methods may be warranted. Finally, advances in sequencing technology (e.g., long-read sequencing 387 of 16S rRNA genes (Karst et al., 2020)), changes to reference databases and improved taxonomic 388 assignment methods may affect the performance and relative trade-offs in the future. 389

Conclusion 390
Understanding the advantages associated with different methods of investigating the human 391 microbiome will allow others in the field to use the most cost-effective methods to explore the 392 relationship between the gut microbiome and human health. Most research is limited by financial 393 resources, which impacts the number of controls, replicates, samples we can analyze, and the depth to 394 which we can characterize each sample. Better insight into how we can sequence more efficiently will allow us to use these finite resources more effectively. Hopefully, this will allow us to devote 396 resources where they will be best utilized (eg. deep sequencing for older children with higher alpha 397 diversity) and reduce them where they are not necessary.  Colors indicate which method was able to identify taxa (peach = identified by 16S, cyan = identified 595 by shotgun metagenomics, yellow = taxa was not present in the database of the method with which it 596 was not found). 597   TestPrime 1.0 was used to calculate the percent primer coverage of the primers used in our study for 634 amplicon sequencing. We compared the percent coverage for microbes found uniquely by 16S rRNA 635 sequencing, both methods, and shotgun metagenomic sequencing. A pairwise Wilcoxon test found 636 that primer coverage for microbes found uniquely by amplicon sequencing is significantly higher 637 than that in the genera found uniquely by shotgun metagenomics (p < 0.05). 638 Tables Table S1: Genera systematically over-represented with either profiling method  640 The Wilcoxon signed-rank test was used to compare the relative abundances of a particular genera, 641 calculated from 16S and shotgun metagenomics profiling. "Diff" is the average relative abundance 642 difference for a particular genera (mean 16S relative abundance -mean shotgun metagenomics 643 relative abundance) "P.adjust" is the p-value after Benjamini-Hotchberg correction. The table  644 presents genera with significant differences (adjusted p-value < 0.05), indicating genera that had 645 higher average relative abundances when profiled by 16S rRNA or shotgun metagenomics. "Method" 646 indicates the profiling method where the genus was more abundant. 647