Mother-Infant Gut Viruses and their Bacterial Hosts: Transmission Patterns and Dynamics during Pregnancy and Early Life

Early development of the gut ecosystem is crucial for lifelong health. While infant gut bacterial communities have been studied extensively, the infant gut virome remains under-explored. We longitudinally assessed the composition of gut viruses and their bacterial hosts in 322 total metagenomes and 205 metaviromes from 30 mothers during and after pregnancy and from their 32 infants during their first year of life. While the maternal gut virome composition remained stable during late pregnancy and after birth, the infant gut virome was dynamic in the first year of life and contained a higher abundance of active temperate phages than the maternal gut virome. The infant gut virome composition was also influenced by infant feeding mode and place of delivery. Lastly, we provide evidence of co-transmission of viral and bacterial strains from mothers to infants, demonstrating that infants acquire some of their virome from their mother's gut.

Highlights
- Longitudinal characterisation of the gut microbiome and virome in 30 mothers during pregnancy, at birth and 3 months after birth, and in 32 infants from birth across the first year of life.
- The maternal gut bacteriome changes from the first to the second trimester and then remains stable through birth and the first 3 months after birth.
- The maternal gut virome remains stable during late pregnancy, birth and the first 3 months after birth.
- The infant gut virome is highly dynamic during the first year of life and is shaped by infant feeding mode and place of delivery.
- The infant gut harbours more temperate bacteriophages than the maternal gut, but their relative abundance decreases with increasing infant age.
- Gut viral strains and their bacterial host strains are co-transmitted from mothers to their infants.
- Gut viral strains are transferred from mother to infant around birth, either directly or via transfer of their bacterial hosts followed by prophage induction.



Introduction
The human early-life gut ecosystem has garnered much interest in recent years because of its links to health and disease later in life, but core aspects of its origin and development remain poorly understood 1. Previous studies have characterised the development of the infant gut microbiome through the first 2-3 years of life, after which the gut microbiome reaches a state of high microbial richness and diversity similar to that of an adult 2-6. While research thus far has focused on the developing gut bacteriome, the gut ecosystem also comprises viruses, archaea and eukaryotes, whose roles in the early gut ecosystem are thought to be important but whose composition and development over time have received little attention.
Microbes from the maternal gut, skin and vaginal tract have been described as sources of the infant gut microbiota 7,8, and recent studies provide increasing support for the maternal gut bacterial reservoir as a key source of microbes transmitted from mothers to infants 9-15.

environmental studies have demonstrated that bacteriophages are key players in the modulation of bacterial communities 18,19, it is crucial to study them in the context of the developing human gut ecosystem, as the bacterial community is established in the months following birth. To this end, a key study by Liang et al., which examined virus-like particle (VLP) metaviromes from 20 healthy infants, provided evidence that the bacteriophages colonising the infant gut arise from excisions from pioneering infant gut

the abundance of their bacterial hosts. The hosts of these bacteriophages were not themselves associated with feeding mode (B. fragilis: p-value=0.3, beta=2.0; P. vulgatus: p-value=0.9, beta=-0.4; B. caccae: p-value=0.9, beta=-0.3). When corrected for both host abundance and the estimated number of prophages from whole metaviromes, the differential prevalence of active temperate phages of B. fragilis remained significantly higher in formula-fed infants (FDR=0.02, beta=1.1). This suggests that formula feeding might be associated with the induction of bacteriophages in B. fragilis, independent of changes in bacterial abundance.

Since only two mothers gave birth by caesarean section, we did not have sufficient power to explore the effect of birth mode on virome composition (Supplementary Fig. 1c). However, as 28% of the infants were born by vaginal delivery at home, we investigated whether home versus hospital delivery was associated with specific vOTUs, with vOTUs aggregated by their predicted microbial host, or with microbial taxa themselves.
Here we observed that the bacterial species Akkermansia muciniphila (Supplementary Fig. 4c) was more abundant in infants born at home than in those born in hospital (FDR=0.04, beta=2.5). Concomitantly, the phages of A. muciniphila were also more abundant in infants delivered at home (p=0.02, beta=1.5, Supplementary Fig. 4d). This highlights the importance of accounting for the birth environment when considering the early development of the infant virome and its interactions with its bacterial hosts.

Infants can acquire gut viruses and bacterial hosts from the maternal gut
Despite the large differences between adult and infant gut microbes, related mother-infant pairs have previously been shown to share gut bacterial species 11,12, but limited information is available about the sharing of viruses between maternal and infant guts. We therefore investigated whether the guts of mothers and their infants harbour the same viruses. We first compared the percentage of infant vOTUs that were shared with the pooled pre-birth (Month 7, Birth) and pooled post-birth (Month 1, 3) maternal samples. Infants shared a higher percentage of vOTUs with post-birth maternal samples (32.3%) than with pre-birth maternal samples (26.6%) across all infant timepoints (p=0.04, beta=3.3e-02, Fig. 3e). As pioneer viruses in the infant gut are thought to be primarily temperate phages induced from the first gut bacterial colonisers 20, we next assessed the sharing of vOTUs between mothers and infants while accounting for prophages detected in whole metaviromes. Notably, sharedness increased significantly when prophages in both the maternal and infant gut were considered (p=0.001, mean increase of 4.9%, Fig. 3f, Supplementary Fig. 4f). We then explored whether infant feeding mode, place of delivery and infant gestational age influenced the percentage of vOTUs shared between mother and infant, but found no such associations. On average, vOTUs shared with the maternal gut virome accounted for 32.7% (95% CI: [28.1, 37.4]) of the relative abundance of the infant gut virome. These findings suggest that the sharing of vOTUs between mothers and infants is more likely attributable to cohabitation than to direct seeding of these viruses during birth. Additionally, the higher degree of sharedness observed when considering the whole metavirome supports the notion that shared bacteria carrying prophages contribute to the colonisation process in infants.
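The sharedness comparison above reduces to a set operation per infant sample. The sketch below is illustrative only and is not the authors' pipeline. It assumes that each sample is represented as a set of detected vOTU identifiers. The vOTU names and timepoint keys are hypothetical, and the prophage-aware variant is not shown.

```python
# Hypothetical sketch: percentage of an infant's vOTUs that also occur in
# pooled maternal samples. vOTU IDs and timepoint keys are illustrative.


def pooled_votus(samples: dict[str, set[str]], timepoints: list[str]) -> set[str]:
    """Union of vOTUs detected in the given maternal timepoints."""
    pooled: set[str] = set()
    for tp in timepoints:
        pooled |= samples.get(tp, set())
    return pooled


def percent_shared(infant_votus: set[str], maternal_pool: set[str]) -> float:
    """Percentage of infant vOTUs that are also present in the maternal pool."""
    if not infant_votus:
        return 0.0
    return 100.0 * len(infant_votus & maternal_pool) / len(infant_votus)


mother = {
    "M7": {"v1", "v2", "v3"},
    "Birth": {"v2", "v4"},
    "M1": {"v1", "v5", "v6"},
    "M3": {"v5", "v7"},
}
infant_month3 = {"v1", "v5", "v6", "v8"}

pre_birth = pooled_votus(mother, ["M7", "Birth"])
post_birth = pooled_votus(mother, ["M1", "M3"])
print(percent_shared(infant_month3, pre_birth))   # 25.0
print(percent_shared(infant_month3, post_birth))  # 75.0
```

Pooling timepoints before intersecting means that a vOTU counts as shared if it appears in any maternal sample within the window.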
We next sought to determine whether there were cases of strain sharing within mother-infant pairs. To do so, we selected vOTUs that were shared between mother and infant and passed strict cut-offs for completeness and coverage (Methods), resulting in 51 vOTUs for downstream analysis. For these 51 vOTUs, we reconstructed consensus sequences from quality-trimmed metagenomic and metaviromic reads aligned to the vOTUs and calculated pairwise genetic distances (Kimura) between consensus sequences corresponding to the same vOTU. We compared these genetic distances between viruses shared by an infant and their own mother with those between the infant and unrelated mothers. For 28 of the 51 vOTUs (55%), the genetic distance between related mother-infant sample pairs was significantly lower than that between unrelated mother-infant sample pairs (Fig. 4a). We then defined strain sharing using a distance cut-off estimated assuming strain retention in longitudinal samples (Methods, Supplementary Fig. 4g). For 26 of these 28 viruses, we observed 841 strain transmission events between samples from related mothers and infants (Methods). Of the 26 transmitted viruses, 23 were shared with higher frequency within related mother-infant pairs than within unrelated pairs (FDR<0.05, Fig. 4b). Seven of these were found among PPVs in 50% of infants with more than 3 timepoints available. These persistent transmitted colonisers were predicted to infect bacteria from the genera Phocaeicola, Bacteroides and Parabacteroides. Next, we explored the transmission of the predicted bacterial hosts of the shared viruses. For the 26 transmitted viruses, we reconstructed 37 strains of their 29 bacterial hosts in both maternal and infant faecal samples (Methods).
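The text reports "Kimura" distances without naming the variant. The sketch below assumes the two-parameter (K80) model, a common choice that is also the default `model = "K80"` in `dist.dna()` from the R package ape. It also assumes that consensus sequences are already aligned to equal length. The strain cut-off is a hypothetical placeholder, because the study estimates its threshold from strain retention in longitudinal samples (Methods), and that procedure is not reproduced here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter (K80) distance between two aligned sequences.

    Positions where either sequence carries a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("consensus sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    arg1 = 1 - 2 * p - q
    arg2 = 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf  # too divergent for the K80 correction
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


def count_sharing_events(pair_distances: list[float], cutoff: float) -> int:
    """Count sample pairs whose distance is at or below a strain cut-off.

    `cutoff` is a hypothetical placeholder, not the study's estimated value.
    """
    return sum(1 for d in pair_distances if d <= cutoff)


# Hypothetical distances for three mother-infant sample pairs.
print(count_sharing_events([0.0, 0.002, 0.05], cutoff=0.01))  # 2
```

The same distance function applies unchanged to the consensus sequences of the reconstructed bacterial host strains.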
For 26 of the 30 (86.7%) reconstructed bacterial host strains present in both mother and infant, the distances between related mother-infant pairs were lower than those between unrelated mother-infant pairs (Fig. 4c, FDR<0.05). Of these 26 bacterial strains, 24 were shared with higher frequency within related mother-infant pairs than within unrelated pairs (Fig. 4d, FDR<0.05, Supplementary Fig. 5). These bacterial strains mostly belong to the genera Alistipes, Bacteroides, Bifidobacterium, Faecalibacterium, Parabacteroides, Phocaeicola and Sutterella.

To establish whether viral transmission occurred during or after birth, we investigated whether viral strain sharing differed between infant samples (at all timepoints) and maternal pre-birth (pregnancy month 7, birth) versus post-birth (month 1, month 3) samples. Here, we found a significant pre- versus post-birth difference in viral strain sharing for one virus (FDR<0.05, Table S39). Consistent with this, the host of this virus, Parabacteroides distasonis, was also shared with higher frequency between infant samples and post-birth maternal samples than between infant samples and pre-birth maternal samples (p-value<0.05), suggesting that this bacterium and its phage were more likely transmitted after birth. Subsequently, we tested whether bacteriophages were preferentially co-transmitted with their bacterial hosts, as opposed to other bacteria, by correlating their strain-sharing events in concurrent samples (Methods). Bacteriophages were predominantly co-transmitted with their bacterial hosts (p-value=0.01, Fig. 5a-b, Table S41). One example is the bacteriophage L85266_LS0, whose host Bacteroides uniformis shows a very similar topological pattern in its phylogenetic tree (Fig. 5c).
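The correlation of strain-sharing events described above can be illustrated with binary vectors. The sketch below is a hedged example, not the study's method (Methods are not shown here). It assumes that, for each concurrent mother-infant sample pair, the phage and a bacterium are each scored 1 (strain shared) or 0 (not shared). Pearson correlation on such 0/1 vectors equals the phi coefficient. The vectors are hypothetical, and no significance test is included.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; on 0/1 vectors this equals the phi coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # no variation: correlation undefined
    return sxy / math.sqrt(sxx * syy)


# Hypothetical strain-sharing calls (1 = shared) for the same five concurrent
# mother-infant sample pairs.
phage_shared = [1, 1, 0, 0, 1]
host_shared = [1, 1, 0, 0, 1]       # predicted host of the phage
unrelated_shared = [0, 1, 1, 0, 0]  # some other bacterium

r_host = pearson_r(phage_shared, host_shared)
r_other = pearson_r(phage_shared, unrelated_shared)
print(round(r_host, 3), round(r_other, 3))  # 1.0 -0.167
```

Identical sharing patterns give r=1, whereas an unrelated bacterium with a different pattern yields a lower value.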
We also found evidence of non-random co-occurrence of phages and their bacterial hosts in 14 of 32 virus-host pairs (linkage FDR<0.05, Methods, Table S40), indicating a common co-transmission mechanism across different mother-infant pairs. This co-occurrence was more often seen between phage-host pairs than between phage-unrelated bacterium pairs (Fisher's exact test, p-value=0.02, Fig. 5b). Co-transmission was observed for seven species of the genus Bacteroides, for Bifidobacterium bifidum and for Sutterella wadsworthensis.
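The phage-host versus phage-unrelated comparison above is a 2x2 contingency test. SciPy's `scipy.stats.fisher_exact` is the usual tool for this. The dependency-free sketch below computes the two-sided p-value by summing hypergeometric probabilities of all tables with the same margins that are no more probable than the observed table. The counts are toy values and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Toy counts: rows = phage-host pairs vs phage-unrelated bacterium pairs;
# columns = significant co-occurrence vs not.
print(fisher_exact_two_sided(3, 1, 1, 3))
```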

Possible mechanisms for the origin of the early-life virome
Having established that there are cases of viral strain transmission from mother to infant, we sought to explore the mechanisms underlying the colonisation of the infant gut by bacteriophages. Of the 26 transmitted viruses, 21 were identified as virulent bacteriophages. Based on their predicted lifecycle, these were likely transmitted through direct seeding of VLPs from the maternal to the infant gut.

We next investigated the origin of temperate phages in the infant gut. One of the strongest cases of co-transmission between bacteriophages and bacteria was B. bifidum and its predicted temperate bacteriophage L34922_LS1 (r=1, p-value=0.02, Fig. 5a). The phylogenetic trees of this temperate phage and its host are topologically very similar (Fig. 6a). We postulated that the high co-transmission rate might be attributable to the temperate nature of L34922_LS1, which enables it to integrate into, and be transmitted within, the genome of its host. As we observed L34922_LS1 in both metaviromes and metagenomes (Fig. 6b), this could also indicate phage induction from its transmitted host. To investigate this hypothesis, we first reconstructed the genome of B. bifidum from metagenomes in which L34922_LS1 was detected. Next, we mapped the L34922_LS1 genome sequence to the B. bifidum genome and observed high identity (>99%) and coverage (100%) of the L34922_LS1 sequence (Fig. 6c), confirming that this phage can occur as a prophage within the B. bifidum genome. Additionally, we detected both integrase and viral recombination genes in the L34922_LS1 sequence

infant and maternal samples (Fig. 6f). Overall, these observations suggest that L34922_LS1 originated from the B. bifidum of the mother.
We next attempted to trace the origin of temperate bacteriophages that were not shown to be significantly transmitted from mother to infant. One of these was L37775_LS1, a temperate bacteriophage identified in infant samples that is predicted to infect multiple species of the genus Bifidobacterium. After mapping the genome sequence of L37775_LS1 to patched Bifidobacterium genomes reconstructed from metagenomes concurrent with metaviromes carrying L37775_LS1, we narrowed its host range down to Bifidobacterium scardovii (qcov 100%, e-value<0.005), which was absent from maternal samples. Metagenomic and metaviromic read-alignment profiles against the B. scardovii genome revealed that L37775_LS1 is present at the indicated prophage region and can be induced from its host (Supplementary Fig. 6a-c). As B. scardovii was present only in infant gut metagenomes and metaviromes, our observations suggest that its phage L37775_LS1 does not originate from the maternal gut. Gene annotation showed that, in addition to an integrase gene, L37775_LS1 carries a CAZyme (Glycosyl hydrolases family 25, Supplementary Fig. 6d), suggesting that this phage might be associated with infant feeding.

Discussion
In this study, we characterised the faecal microbiome and virome of 30 mothers and their 32 infants longitudinally during pregnancy, at birth and during the first year of life. To our knowledge, this is the only study to examine the maternal virome longitudinally during pregnancy, at birth and after birth. In the maternal total microbiome, we observed a notable shift in composition between the first and second trimesters of pregnancy. This is in line with the findings of Koren et al., who proposed that hormonal shifts, immune system adaptations and dietary variations during pregnancy can affect the composition of gut bacteria 25,26. During late pregnancy, birth and after birth, however, the overall composition of the maternal gut microbiome and virome did not change. These results suggest that, once established during the second trimester of pregnancy, the maternal gut microbiome and virome remain stable throughout this critical period of maternal and infant health.

We demonstrated that the infant gut virome was highly dynamic during the first year of life and that, while it progressively came to resemble an adult-like virome, it was still very different from that of the mother at one year of age. The infant gut had a high proportion of temperate phages in the first 3 months of life, and this proportion decreased drastically at 6 months. We thus hypothesise that temperate phages are fundamental in seeding the gut virome, most likely through prophage induction in pioneering gut bacteria. However, even at one year of age, the abundance of prophages remained higher than that observed in adults, indicating ongoing viral development.
Altogether, our results indicate a high degree of prophage induction in the gut during the first 3 months of infancy, followed by stabilisation of the gut environment at a later age, when the availability of more bacterial hosts allows for more prophage integration.

Our results highlight the influence of infant feeding mode on infant virome composition. The higher alpha diversity we observed in exclusively formula-fed infants may be attributable to the different composition of formula compared with breast milk. The increased richness of active temperate bacteriophages in exclusively formula-fed infants suggests that formula feeding may provide specific nutrients or environmental conditions that promote the proliferation of temperate phages. The consistent association observed in both active metaviromes and prophages supports the notion that formula feeding has a lasting impact on the acquisition and maintenance of temperate phages in the infant gut. Our analysis of vOTUs grouped by their host bacteria revealed an intriguing finding related to Bacteroides fragilis. Although the relative abundance of B. fragilis itself was similar between feeding groups, formula feeding was associated with a higher presence of active temperate phages specifically targeting B. fragilis. This suggests that formula feeding may induce the production of bacteriophages that target this bacterial species, independent of changes in bacterial abundance. The mechanisms underlying this association warrant further investigation.

Only a few studies have addressed infant gut viromes in relation to maternal viromes 7,21,22. Duranti et al. focused entirely on the transmission of Bifidobacterium phages from the maternal to the infant gut and showed that these could be transmitted 7.
Our study showed that not only Bifidobacterium phages but also numerous phages predicted to infect bacteria from other genera, such as Bacteroides, were transmitted from mother to infant. A recent study by Walters et al. in 53 infants showed that overall infant gut virome composition was driven not by exposure to mothers but rather by dietary, environmental and infectious factors 22. However, that study made no direct comparison between infant and maternal gut microbial strains. Another study, by Maqsood et al., compared virus scaffold presence-absence between mothers and infants in 28 infant twin pairs and showed that, on average, 15% of the infant virome was shared with their own mother's gut virome 21. Our study revealed that, despite marked differences between the infant and maternal gut viromes, infants shared on average 32.7% of vOTUs with their mothers. This difference in sharedness may be attributable to our study encompassing longitudinal samples from both mother and infant over a longer timeframe, whereas Maqsood et al. examined the sharing of vOTUs between mother and infant cross-sectionally around birth. However, definitive claims about transmission cannot rely merely on the co-occurrence of virus sequences in maternal and infant viromes. It is essential to examine the genetic makeup of viral strains and their sequence similarity and to define a strict strain-identity discrimination threshold, an aspect not explored in the above-mentioned studies. For vOTUs shared between the infant and maternal guts, we reconstructed strains and showed that more than half of these strains were shared more frequently within related mother-infant pairs than within unrelated mother-infant pairs. This shows that, while the maternal gut might not be the main determinant of the infant gut virome, infants do share viral strains with their mother's gut.
In microbiome transmission studies, determining the directionality of strain inheritance poses a fundamental challenge. Although it is theoretically possible for an infant to acquire a strain and transmit it to the mother, this scenario is improbable given the greater diversity and stability of the adult gut microbiota. Acquisition of a shared strain by both individuals from a common environmental source is another plausible explanation. However, the prevailing consensus in the field is that transmission occurs primarily from mother to infant 12, and we assume this direction throughout this paper. Our findings suggest that some viral and bacterial strains are co-transmitted between related mother-infant pairs, particularly for species of the genus Bacteroides, Bifidobacterium bifidum and Sutterella wadsworthensis. We then show examples of how this occurs, both via direct transmission of bacteriophages and via prophage induction following transmission of their bacterial hosts. We also show that infants obtain some viral strains through induction from bacteria that did not come from their mothers.

Our study has several strengths. Firstly, our quantitative approach (avoiding amplification during the isolation and sequencing of VLP DNA) allowed accurate quantification of viruses and minimised bias in our estimation and characterisation of dsDNA viruses. Furthermore, by using both total and viral-enriched metagenomes, we could characterise the whole virome, including prophages. Additionally, our dense longitudinal sampling design in both mothers and infants allowed us to study viral compositional dynamics during critical periods such as pregnancy, birth and the post-birth period. Our study also addressed factors not previously studied in the context of maternal-to-infant gut microbial transmission, such as the place of delivery.
As home deliveries constituted 12.7% of all deliveries in the Netherlands in 2018 and 28% of deliveries in our cohort, we had a unique opportunity to explore the effect of place of birth on the viral and bacterial communities of the gut. However, our sample size is small, and larger studies are needed to draw firm conclusions.

Our study has several limitations. Firstly, due to our isolation method, we focused solely on the dsDNA viruses present in the gut and overlooked single-stranded DNA viruses and RNA viruses. While RNA viruses are typically thought to have a lower abundance in the healthy human gut, it is crucial for future investigations to examine this under-studied aspect of the infant virome. Including single-stranded DNA viruses and RNA viruses in future research will shed light on their potential roles within the gut ecosystem. Secondly, our sample size was small, which hampers associations with phenotypes and limits the generalisability of the findings to the wider population. Due to the small number of participants who had C-sections, we could not study the effect of this important factor on the gut virome, even though an impact of birth mode on the early-life gut virome has previously been described 29. Thirdly, while we use the term "human gut virome" throughout this study for consistency with earlier studies, we acknowledge that our results may be biased, as faecal samples do not capture the viruses residing in the gut mucosa. Hence, our results regarding the gut virome are limited to faecal viromes. Finally, despite our best efforts, we cannot guarantee that the viral scaffold database is free from bacterial contamination.

In conclusion, we characterised the total gut microbiome and gut virome in 30

Study cohort
The samples for this study were obtained from the Lifelines NEXT cohort, a birth cohort designed to study the effects of intrinsic and extrinsic determinants on health and disease in a four-generation design 30. Lifelines NEXT is embedded within the Lifelines cohort study, a prospective three-generation population-based cohort study recording the health and health-related aspects of 167,729 individuals living in the Northern Netherlands 26. In Lifelines NEXT, we included 1,450 pregnant Lifelines participants and intensively followed them, their partners and their children up to at least 1 year after birth. During the Lifelines NEXT study, biomaterials, including maternal and neonatal (cord) blood, placental tissue, faeces, breast milk, nasal swabs and urine, are collected from the mother and child at ten timepoints. Furthermore, data on medical, social, lifestyle and environmental factors are collected via questionnaires at 14 different timepoints and via connected devices 27. The current study is a pilot study of the first samples collected in the Lifelines NEXT project, without prior selection.

Informed consent
The Lifelines NEXT study was approved by the Ethics Committee of the University Medical Center Groningen, document number METC UMCG METc2015/600. Written informed consent forms were signed by the participants or their parents/legal guardians.

Sample collection
Mothers collected their faeces during pregnancy at weeks 12 and 28, very close to birth and during the first 3 months after birth (Fig. 1a). Infant faeces were collected from diapers by the parents at 1, 2, 3, 6, 9 and 12 months of age. Parents were asked to freeze the stool samples at home at -20°C within 10 min of stool production. Frozen samples were then collected, transported to the UMCG in portable freezers and stored at -80°C until extraction of microbial and viral DNA. For this study, we collected 361 samples for total microbiome analysis.

DNA extraction from total microbiome
Total microbial DNA was isolated from 0.2-0.5 g of faecal material using the QIAamp Fast DNA Stool Mini Kit (Qiagen, Germany) on the QIAcube (Qiagen) according to the manufacturer's instructions, with a final elution volume of 100 μl. Additionally, DNA was extracted from two negative controls consisting of Milli-Q water using exactly the same procedure. DNA eluates were stored at -20°C until further processing.

DNA extraction from VLPs
Out of 361 faecal samples, 259 were selected for VLP enrichment and VLP DNA extraction based on the amount of faecal material collected. These included maternal samples from week 28 of pregnancy, birth and months 1, 2 and 3 after delivery, and infant samples at birth and months 1, 2, 3, 6 and 12 after birth (Fig. 1a). To study the gut virome, DNA was extracted

Profiling gut virome composition
Metavirome sequencing reads underwent quality trimming and human read removal, as described above. Bacterial contamination of metaviromes was assessed by aligning reads to the single-copy chaperonin gene cpn60 database 36. On average, metaviromes contained 8.3% (95% CI: [7.2, 9.6]) bacterial genomic DNA per sample.

We used a de novo assembly approach to annotate the composition of the gut virome. Specifically, SPAdes (v3.14.1) 37 was run in metagenomic mode (--meta) with default settings to perform de novo assembly per metavirome. The average number of assembled scaffolds was 283,893 for maternal samples and 103,192 for infant samples. Scaffolds smaller than 1 kbp were removed, and the remaining scaffolds underwent rigorous per-sample filtering for gut virome annotation. Open reading frames (ORFs) in these scaffolds were predicted using Prodigal v2.6.3 38 in metagenomic mode. Ribosomal proteins were identified using a BLASTp 39 search (e-value threshold of 10⁻¹⁰) against a subset of ribosomal protein sequences from the COG database (release 2020). We used a hidden Markov model (HMM) algorithm (hmmsearch from the HMMER v3.3.2 package) 40 to compare the amino acid sequences of predicted protein products against the Prokaryotic Virus Orthologous Groups (pVOGs) HMM database 41. Hits were considered significant at an e-value threshold of 10⁻⁵. To detect viral sequences, VirSorter v1.0.3 42 was run with its expanded built-in database of viral sequences ('-db 2' parameter) in decontamination mode (--virome).
Scaffolds larger than 1 kbp were considered viral if they fulfilled at least one of six criteria, similar to those described previously:
(1) BLASTn alignment to the viral section of NCBI RefSeq (release 211) with e-value ≤10⁻¹⁰, covering >90% of the sequence length at >50% average nucleotide identity (ANI);
(2) at least three ORFs producing HMM hits to the pVOG database with e-value ≤10⁻⁵, with at least two ORFs per 10 kb of scaffold length;
(3) VirSorter-positive (all six categories, including suggestive);
(4) circular 43;
(5) BLASTn alignment (e-value ≤10⁻¹⁰, >90% query coverage, >50% ANI) to 1,489 dereplicated (99% ANI and 85% length) Crassvirales sequences larger than 50 kbp from the NCBI database (taxid:1978007) and published datasets 44-47;
(6) longer than 3 kbp with no hits (alignments >100 nucleotides, 90% ANI, e-value of 10⁻¹⁰) to the nt database (release 249).
In total, 281,789 scaffolds fulfilled at least one of these six criteria.

To remove putative cellular contamination from the virus sequences, scaffolds meeting the filtering criteria were pooled with all negative-control scaffolds (filtered only by size, >1 kbp) and dereplicated at 99% ANI and 85% alignment fraction (relative to the shorter sequence) using CheckV. Sequence clusters containing negative-control scaffolds were excluded from further consideration. The remaining 280,633 putative virus scaffolds were dereplicated at 95% ANI and 85% length to represent vOTUs at the species level 48. The resulting 110,526 vOTU representatives were screened for ribosomal RNA (rRNA) genes using a BLASTn search against the SILVA 138.1 NR99 rRNA gene database 49 with an e-value threshold of 10⁻³. An rRNA gene was considered detected in a scaffold if the hit covered >50% of the gene length.
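The six-criteria rule above is an "any of" filter over precomputed evidence for each scaffold. The sketch below is hypothetical and assumes that BLASTn, hmmsearch, VirSorter, circularity and nt searches have already been summarised into the fields shown. It does not call those tools. The class and field names are illustrative. Criterion 2 is interpreted here as a density of ORFs with pVOG hits. That reading is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlastHit:
    evalue: float
    query_cov_pct: float  # % of the scaffold length covered by the alignment
    ani_pct: float        # average nucleotide identity, %


@dataclass
class Scaffold:
    length_bp: int
    refseq_viral_hit: Optional[BlastHit] = None  # best hit to viral RefSeq
    pvog_hit_orfs: int = 0                       # ORFs with pVOG hits, e <= 1e-5
    virsorter_category: Optional[int] = None     # 1-6 if VirSorter-positive
    circular: bool = False
    crass_hit: Optional[BlastHit] = None         # best hit to Crassvirales set
    has_nt_hit: bool = False                     # any qualifying nt alignment


def _passes_blast(hit: Optional[BlastHit]) -> bool:
    # e-value <= 1e-10, >90% coverage, >50% ANI (criteria 1 and 5)
    return (hit is not None and hit.evalue <= 1e-10
            and hit.query_cov_pct > 90 and hit.ani_pct > 50)


def viral_criteria(s: Scaffold) -> List[int]:
    """Return the numbers (1-6) of the viral criteria a scaffold meets."""
    if s.length_bp <= 1_000:  # only scaffolds larger than 1 kbp are considered
        return []
    met = []
    if _passes_blast(s.refseq_viral_hit):
        met.append(1)
    # Assumption: "two ORFs per 10 kb" is read as pVOG-hit ORF density.
    if s.pvog_hit_orfs >= 3 and s.pvog_hit_orfs / (s.length_bp / 10_000) >= 2:
        met.append(2)
    if s.virsorter_category is not None and 1 <= s.virsorter_category <= 6:
        met.append(3)
    if s.circular:
        met.append(4)
    if _passes_blast(s.crass_hit):
        met.append(5)
    if s.length_bp > 3_000 and not s.has_nt_hit:
        met.append(6)
    return met


def is_viral(s: Scaffold) -> bool:
    return bool(viral_criteria(s))


example = Scaffold(length_bp=10_000, pvog_hit_orfs=3, has_nt_hit=True)
print(viral_criteria(example))  # [2]
```

Returning the list of criteria met, rather than a single boolean, makes it easy to tabulate which evidence types drive viral calls.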
Additionally, vOTU representatives were clustered with the 1,489 dereplicated Crassvirales sequences larger than 50 kbp and the genomes of the reference database "ProkaryoticViralRefSeq211-Merged" using vConTACT2 v0.11.3 with default parameters 50.
To align quality-filtered metavirome reads to the final curated vOTU representatives, we used Bowtie2 v2.4.5 in end-to-end mode. A count table was then generated using SAMTools v1.14 51. The breadth of sequence coverage per scaffold was calculated per sample using the SAMTools v1.14 'mpileup' command. To remove spurious Bowtie2 alignments, read counts were set to zero for scaffolds with less than 1× coverage over 75% of their length 44.
Confidence.score from iPHoP. To ensure consistency with the bacterial taxonomic annotation of MetaPhlAn4, the predicted host taxonomy from iPHoP was manually curated. For associations with phenotypes, the RPKM counts of vOTUs were aggregated at the genus and species levels of the predicted host taxonomy provided by iPHoP.
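The breadth-of-coverage filter and the RPKM normalisation can be sketched as follows. This assumes per-position read depths (including zero-depth positions) have already been extracted for each scaffold, and it reads the 75% rule as "fraction of positions with at least 1× depth". The RPKM definition uses the total number of quality-filtered reads in the sample as library size; that choice is an assumption, since the RPKM construction is not described in this section. All function names are hypothetical.

```python
from typing import Dict, List


def breadth_of_coverage(depths: List[int]) -> float:
    """Fraction of scaffold positions covered by at least one read."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= 1) / len(depths)


def zero_spurious_counts(counts: Dict[str, int], depths: Dict[str, List[int]],
                         min_breadth: float = 0.75) -> Dict[str, int]:
    """Set a scaffold's read count to 0 if <75% of its length has >=1x coverage."""
    return {
        scaffold: count if breadth_of_coverage(depths.get(scaffold, [])) >= min_breadth else 0
        for scaffold, count in counts.items()
    }


def rpkm(counts: Dict[str, int], lengths: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Reads per kilobase of scaffold per million reads (assumed library size)."""
    return {
        scaffold: count * 1e9 / (lengths[scaffold] * total_reads)
        for scaffold, count in counts.items()
    }


counts = {"vOTU_1": 100, "vOTU_2": 50}
depths = {"vOTU_1": [1] * 8 + [0] * 2, "vOTU_2": [3] * 5 + [0] * 5}
filtered = zero_spurious_counts(counts, depths)  # vOTU_2 has only 50% breadth
values = rpkm(filtered, {"vOTU_1": 1000, "vOTU_2": 2000}, total_reads=1_000_000)
```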

Ecological measurements and statistical analyses
To assess bacterial and viral alpha diversity, no filters were applied to the relative abundances (bacteria) or RPKM counts (viruses). Alpha diversity for both the bacteriome and the virome was calculated as the Shannon diversity index using the diversity() function in the R package 'vegan' v2.6-4 52.
Beta diversity analysis of the virome and microbiome communities was performed at the vOTU and bacterial species levels using Bray-Curtis dissimilarity, calculated between samples with the function vegdist() from the R package 'vegan'. We used non-metric multidimensional scaling (NMDS) to visualise the similarity of bacteriome and virome samples. For this, the function metaMDS() from the R package 'vegan' was used with two dimensions for visualisation and one dimension for the analyses of biome composition changes. Additionally, envfit() with 999 permutations was used to determine the correlation between the NMDS ordination and timepoint, along with the vector coordinates for Fig. 1b-e.
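For readers less familiar with the two indices, the formulas behind them are sketched below in plain Python. This mirrors the standard definitions (natural-log Shannon index, and Bray-Curtis as the sum of absolute differences over the sum of abundances), which we understand to be the defaults of vegan's diversity() and vegdist(); it is an illustration, not the analysis code.

```python
import math
from typing import Sequence


def shannon(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must have the same features")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    # Undefined when both samples are empty.
    return numerator / denominator if denominator > 0 else math.nan
```

For example, four equally abundant vOTUs give H = ln 4, and two samples with no shared features have a Bray-Curtis dissimilarity of 1.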
To test for differences in the overall composition of the virome and total microbiome (between mothers and infants and between timepoints), we used a linear mixed model implemented in lmerTest (v3.1-3) 53. The outcome was the first NMDS dimension (NMDS1). The predictor variable was timepoint (expressed as exact age in years or days after birth), and we corrected for the number of quality-trimmed reads and DNA concentration as fixed effects. Individual ID was included as a random effect.
To test for differences in the Shannon diversity index between mothers and infants for bacterial and viral abundances, we used a linear mixed model. Here, the tested variable was sample type (mother or infant); we corrected for the number of quality-trimmed reads and DNA concentration as fixed effects and included individual ID as a random effect. A similar linear mixed model was used to analyse the effect of timepoint on Shannon diversity in mothers and infants separately, with timepoint (expressed as exact age in years or days after birth) as the predictor variable, the number of quality-trimmed reads and DNA concentration as fixed effects and individual ID as a random effect. Similar linear mixed models were used to compare vOTU richness (number of detected vOTUs per sample) between mothers and infants.
To compare virome Shannon indices between mothers and infants at 1 year of age, we performed a Wilcoxon rank sum test. To analyse changes in the abundance of vOTUs aggregated at the host-genus level and of microbial genera over the first year of an infant's life, a centred log-ratio (clr) transformation was applied using the function decostand() from the R package 'vegan'. A biome-specific pseudocount, equal to half of the minimal abundance in the community data, was used.
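The clr transformation with a half-minimum pseudocount can be written out as below. Two assumptions are made here and labelled in the code: "minimal abundance" is taken to mean the smallest non-zero value in the biome's table, and the pseudocount is added to every entry before taking logs. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def half_min_pseudocount(table: Sequence[Sequence[float]]) -> float:
    """Half of the smallest non-zero abundance in the whole table (assumed meaning)."""
    positives = [v for row in table for v in row if v > 0]
    if not positives:
        raise ValueError("table has no non-zero abundances")
    return min(positives) / 2


def clr(sample: Sequence[float], pseudocount: float) -> List[float]:
    """Centred log-ratio: log(x + pseudocount) minus the sample's mean log."""
    # Assumption: the pseudocount is added to all entries, not only to zeros.
    logs = [math.log(v + pseudocount) for v in sample]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


table = [[0.0, 4.0], [2.0, 0.0]]
pc = half_min_pseudocount(table)  # smallest non-zero value is 2.0, so 1.0
transformed = [clr(row, pc) for row in table]
```

Because clr values are centred per sample, each transformed sample sums to zero, which is what makes features comparable across samples with different sequencing depths.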
Only microbial genera and host-genus vOTU aggregates present in more than 10% of infant samples were considered. Subsequently, a linear mixed model was fitted with timepoint (expressed as exact age in days after birth) as the predictor variable, the number of quality-trimmed reads, DNA concentration and mode of delivery as fixed effects, and individual ID as a random effect.
We used bootstrap resampling with replacement to calculate 95% CIs for the metrics of interest, that is, to estimate the range within which the true population values of these metrics were likely to fall. We calculated the mean of each bootstrap sample and repeated this process 1,000 times. The 95% CI was then determined from the 0.025 and 0.975 quantiles of the resulting distribution of bootstrap means (lower and upper bounds, respectively).
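Both steps described above, the >10% prevalence filter and the percentile bootstrap CI for a mean, are sketched below. The quantile helper uses linear interpolation (the same rule as R's default quantile type); the fixed seed and the function names are illustrative choices, not taken from the original analysis.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def prevalence_filter(table: Dict[str, List[float]],
                      min_fraction: float = 0.10) -> Dict[str, List[float]]:
    """Keep features detected (>0) in more than `min_fraction` of samples."""
    return {
        feature: values
        for feature, values in table.items()
        if values and sum(v > 0 for v in values) / len(values) > min_fraction
    }


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)


def bootstrap_ci(values: Sequence[float], n_boot: int = 1000,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap 95% CI for the mean, resampling with replacement."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    return quantile(boot_means, 0.025), quantile(boot_means, 0.975)


lo, hi = bootstrap_ci(list(range(1, 11)))
```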

Association of vOTUs aggregated by predicted host and bacterial species with phenotypes
The association analysis with phenotypes was conducted on infant samples using linear mixed models, restricted to bacterial species and vOTU aggregates by bacterial host that were present in at least 10% of the infant samples. In each model, one predictor (maternal age, infant sex, feeding mode, birthweight, place of birth or gestational age) was tested as a fixed effect. We further corrected for timepoint (expressed as exact age in days after birth), the number of quality-trimmed reads, DNA concentration and mode of delivery as fixed effects. Individual ID was included as a random effect.
For all analyses, the Benjamini-Hochberg false discovery rate (FDR) correction was applied to correct for multiple testing, with changes considered statistically significant at FDR<0.05. All statistical tests are two-sided unless explicitly stated.
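The Benjamini-Hochberg adjustment can be illustrated in a few lines. This is a plain-Python sketch of the standard step-up procedure (our understanding is that it matches R's `p.adjust(method = "BH")`); the p-values are made-up toy numbers, not results from this study.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # toy values
qvals = benjamini_hochberg(pvals)  # ≈ [0.02, 0.04, 0.04, 0.02]
significant = [q < 0.05 for q in qvals]
```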

vOTU and bacterial strain-specific analysis
To study viral strain-sharing within mother-infant pairs, we focused on the subset of vOTU representatives that were covered by reads over more than 95% of the genome length and shared between maternal metaviromes and/or metagenomes and infant metaviromes. This subset consisted of 4,965 vOTUs. We then selected vOTU representatives for further analysis based on the following criteria: (1) a high-quality or complete genome predicted by CheckV, or a circularised genome 43; (2) sequence length ≥3 kbp; and (3) presence in samples from at least five different families. Fifty-one vOTU representatives fulfilled these criteria. For each selected vOTU, we reconstructed consensus sequences for all samples in which the vOTU of interest was covered over more than 95% of the genome length, using the SAMTools function `consensus` with the flags `-m simple -r` on the Bowtie2 read alignments that were used to construct the RPKM table.
We next performed global alignments of the consensus sequences per vOTU using kalign v1.04 54. To improve alignment quality, we trimmed 100 bp from both ends of the global alignment, as these regions were enriched in gaps. Pairwise genetic distances were then calculated using the dist.dna() function from the R package ape v5.7-1 55 with default parameters, yielding Kimura 2-parameter distances (pairwise nucleotide substitution rates between strains). To compare the pairwise Kimura distances for virus strains between samples of related individuals, we used a one-sided Wilcoxon rank sum test with the alternative hypothesis that the distances between strains identified in samples of related individuals are smaller than the distances between strains from unrelated samples.
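The end-trimming and the Kimura 2-parameter (K80) distance can be sketched as follows, using d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitions and transversions among compared sites. One deliberate simplification: gaps and ambiguous bases are skipped per pair of sequences, whereas ape's dist.dna() by default removes such sites across the whole alignment. The function names are hypothetical.

```python
import math
from typing import List, Sequence

VALID = set("ACGT")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def trim_alignment(aligned: Sequence[str], n: int = 100) -> List[str]:
    """Drop `n` columns from both ends of every aligned sequence."""
    return [seq[n:len(seq) - n] for seq in aligned]


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID or y not in VALID:
            continue  # skip gaps and ambiguous bases for this pair only
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: distance undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```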
The significance of the distance comparison was derived from a permutation test with 1,000 iterations, designed to account for the highly unequal numbers of distances between strains of related and unrelated individuals. FDR correction for multiple testing was applied as described above.
To investigate strain-sharing between mothers and infants, we selected the viruses with lower distances between strains of related individuals than between strains of unrelated individuals. To define strain-sharing events within mother-infant pairs, we used an approach similar to that of Valles-Colomer et al. 56. In short, per vOTU, we compared the median-normalised distances between strains of the same individual over the entire study period (maximum 9 months for maternal samples and maximum 12 months for infants) with the normalised distances between strains of unrelated individuals. The strain identity cut-off was calculated using the cutpointr() function from the R package cutpointr v1.1.2 57, using the oc_youden_kernel method with the youden metric to identify the optimal cut-point. Additionally, the empirical FDR was defined as the 5th percentile of the unrelated-individual comparisons, applied when Youden's index was above 5%. We then compared the percentages of shared and different dominant strains in related and unrelated mother-infant pairs (all timepoint pairs) using the defined strain identity cut-off. If the normalised distance between strains was greater than the cut-off, the strains were deemed different; if it was smaller, this was considered a strain-sharing event. This allowed us to calculate the percentage of dominant strain-sharing in related and unrelated mother-infant pairs, which we then tested for significance using a one-sided Fisher's exact test, followed by Benjamini-Hochberg FDR correction for multiple testing.
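The cut-off logic can be sketched in plain Python. This is not cutpointr: `oc_youden_kernel` smooths the distributions with kernel density estimates, whereas the sketch below uses the simpler empirical Youden index over observed distances. The combination rule (Youden cut-off capped at the 5th percentile of unrelated distances) is our assumed reading of "Youden index or 5% FDR", and ties at the cut-off are treated as shared, which the text does not specify. All names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def median_normalise(distances: Sequence[float]) -> List[float]:
    """Divide all distances for one vOTU (or bacterial strain) by their median."""
    med = statistics.median(distances)
    if med == 0:
        raise ValueError("median distance is zero; cannot normalise")
    return [d / med for d in distances]


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)


def youden_cutoff(within: Sequence[float], unrelated: Sequence[float]) -> Tuple[float, float]:
    """Empirical cut-off maximising sensitivity + specificity - 1."""
    best_cut, best_j = math.nan, -math.inf
    for cut in sorted(set(within) | set(unrelated)):
        sensitivity = sum(d <= cut for d in within) / len(within)
        specificity = sum(d > cut for d in unrelated) / len(unrelated)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def strain_identity_cutoff(within: Sequence[float], unrelated: Sequence[float],
                           fdr: float = 0.05) -> float:
    """Youden cut-off, capped at the 5th percentile of unrelated distances (assumed)."""
    youden_cut, _ = youden_cutoff(within, unrelated)
    return min(youden_cut, quantile(sorted(unrelated), fdr))


def is_strain_sharing(distance: float, cutoff: float) -> bool:
    """Distances at or below the cut-off are called a strain-sharing event."""
    return distance <= cutoff
```

The resulting shared/different counts for related versus unrelated pairs form a 2×2 table, which could then be passed to, for example, SciPy's `scipy.stats.fisher_exact(table, alternative="greater")` for a one-sided test.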
We found 26 viruses to be shared between mothers and infants. Bacterial hosts for 25 of the 26 transmitted viruses were predicted at the species level using iPHoP. All predicted hosts of these viruses were used for the co-transmission analysis.

Bacterial-species-specific strain analysis
We reconstructed bacterial strain SNP haplotypes for the predicted hosts of the 25 transmitted viruses using StrainPhlAn4 35, resulting in 37 bacterial strain SNP haplotypes. This method reconstructs consensus sequence variants within species-specific marker genes and uses them to estimate strain-level phylogenies 50. We then performed multiple sequence alignment and used the Kimura 2-parameter method from the 'EMBOSS' package 51 to create phylogenetic distance matrices containing the pairwise nucleotide substitution rates between strains. Strain-sharing events were then identified using the same methods as described above.

Virus-host co-transmission from mothers to infants
To determine whether the shared viruses were co-transmitted with their predicted bacterial hosts, we applied a partial Mantel test to modified genetic distance matrices for bacterial and virus strains. This test assessed the correlation between strain-sharing events for bacteria and phages while controlling for longitudinally collected samples.
First, the Kimura genetic distance matrices were normalised by the median genetic distance per bacterial strain and per vOTU, respectively. Next, each normalised genetic distance was replaced with 0 if it did not exceed the vOTU- or bacterial-strain-specific cut-off for individual strain variation (Youden index or 5% FDR), and with 1 otherwise. This modification allowed us to focus on strain-sharing events rather than on the correlation between the genetic distances themselves.
Next, for each vOTU and bacterial strain, we selected subsets of concurrent samples in which both the vOTU and the bacterial strain were reconstructed. For bacterial strains, only total metagenomes were used. For viral strains, strains reconstructed from both total metagenomes and metaviromes were included; if both types were available for the same individual and timepoint, strains from metavirome samples were prioritised.
To account for repeated measurements, we created control matrices for the selected subsets of concurrent samples. In each control matrix, we assigned a value of 0 when the strains were reconstructed from samples of the same individual and a value of 1 when they were reconstructed from samples of different individuals. In this analysis, mothers and their infants were treated as different individuals.
The partial Mantel test, using the mantel.partial() function from the R package vegan, was performed on the modified genetic distance matrices for bacteria and viruses. We
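The matrix preparation and the partial Mantel statistic can be sketched in plain Python. This is an illustration modelled on our reading of vegan's mantel.partial() defaults (Pearson partial correlation of the upper-triangle entries, rows and columns of the first matrix permuted jointly, p = (hits + 1) / (permutations + 1)); it is not the vegan implementation, and the function names, seed and toy matrices are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def binarise(normalised: Matrix, cutoff: float) -> Matrix:
    """0 where the normalised distance does not exceed the cut-off (shared), else 1."""
    return [[0 if d <= cutoff else 1 for d in row] for row in normalised]


def control_matrix(individuals: Sequence[str]) -> Matrix:
    """0 for samples from the same individual, 1 for different individuals."""
    return [[0 if a == b else 1 for b in individuals] for a in individuals]


def upper_triangle(m: Matrix) -> List[float]:
    return [m[i][j] for i, j in combinations(range(len(m)), 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant matrix")
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z) -> float:
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt(max(0.0, 1 - rxz ** 2) * max(0.0, 1 - ryz ** 2))
    return (rxy - rxz * ryz) / denom if denom > 0 else math.nan


def partial_mantel(a: Matrix, b: Matrix, c: Matrix, permutations: int = 999,
                   seed: int = 0) -> Tuple[float, float]:
    """Partial Mantel r(a, b | c) with a one-sided permutation p-value."""
    xa, xb, xc = upper_triangle(a), upper_triangle(b), upper_triangle(c)
    observed = partial_r(xa, xb, xc)
    rng = random.Random(seed)
    order = list(range(len(a)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of `a` jointly
        permuted = [[a[i][j] for j in order] for i in order]
        if partial_r(upper_triangle(permuted), xb, xc) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: five samples, the first two from the same individual.
virus = [[0, 0, 0, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 1, 1],
         [1, 1, 1, 0, 1], [1, 1, 1, 1, 0]]
control = control_matrix(["A", "A", "B", "C", "D"])
r, p = partial_mantel(virus, virus, control)  # identical matrices: r is about 1
```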