The population-level impact of Enterococcus faecalis genetics on intestinal colonisation and extraintestinal infection

Enterococcus faecalis is a commensal bacterium commonly found in the human gastrointestinal tract and a cause of opportunistic infections typically associated with multidrug resistance. The E. faecalis genetic changes associated with pathogenicity and extraintestinal infection, particularly through gut-to-bloodstream translocation, are poorly understood. Here, we investigate the E. faecalis genetic signatures associated with intestinal colonisation versus extraintestinal infection, and with infection of hospitalised versus non-hospitalised individuals, using heritability estimation and a genome-wide association study (GWAS). We analysed 750 whole-genome sequences of faecal and bloodstream E. faecalis isolates from hospitalised patients and non-hospitalised individuals, predominantly in Europe. We found that E. faecalis infection of individuals depending on their hospitalisation status and extraintestinal infection are heritable traits and that ∼24% and ∼34% of their variation is explained by the considered genetic effects, respectively. Further, a GWAS using linear mixed models did not pinpoint any clear enrichment of individual genetic changes in isolates from different isolation sites and individuals with varying hospitalisation statuses, suggesting that these traits are highly polygenic. Altogether, our findings indicate that E. faecalis infection and extraintestinal infection are influenced by variation in genetic, host, and environmental factors, and ultimately the opportunistic pathogenic lifestyle of this versatile host generalist bacterium.

Author summary
Enterococcus faecalis is a commensal host generalist bacterium that also causes life-threatening invasive hospital- and community-associated infections, usually associated with multidrug resistance, globally. Although E. faecalis typically causes opportunistic infections associated with antibiotic use, immunocompromised status, and other factors, it also possesses an arsenal of virulence factors crucial for its pathogenicity. Despite this, the relative contribution of these virulence factors and other genetic changes to the pathogenicity of E. faecalis strains remains poorly understood, partly due to the limited availability of well-sampled E. faecalis population genomic datasets. Here, we investigated whether specific changes in the genomes of E. faecalis isolates influence its pathogenicity, namely infection of hospitalised and non-hospitalised individuals and the propensity to cause extraintestinal infection rather than intestinal colonisation. Our findings indicate that E. faecalis genetics partially influence infection of hospitalised and non-hospitalised individuals, and the propensity to cause extraintestinal infection compared to intestinal colonisation, possibly due to gut-to-bloodstream translocation, highlighting the potentially substantial role of host and environmental factors, including the gut microbiota, in the opportunistic pathogenic lifestyle of this bacterium.


Introduction
Enterococcus faecalis is a versatile generalist commensal bacterium which colonises the gastrointestinal tract and other niches in humans and animals, and survives in the environment, including nosocomial settings [1]. E. faecalis is a subdominant core member of the human gut microbiota, usually acquired early after birth, and its origin dates to the Paleozoic era ~400 to 500 million years ago [2]. Although E. faecalis predominantly exhibits a commensal lifestyle, the disruption of this harmless interaction with its host triggers the commensal-to-pathogen switch, ultimately making it a conditional or opportunistic pathogen [3,4]. Such accidental commensal-to-pathogen switching causes life-threatening opportunistic infections, including bacteraemia, endocarditis, intra-abdominal infection, pneumonia, and meningitis, typically associated with high mortality [5,6]. Since the 1970s, E. faecalis has emerged as a leading cause of community-acquired and nosocomial infections, most of which have become increasingly difficult to treat due to intrinsic and acquired antibiotic resistance, making it a major threat to public health globally [4,6,7,8,9]. Such increasing antibiotic resistance has reignited calls to develop enterococcal vaccines.
The commensal-to-pathogen switch of E. faecalis is marked by its overgrowth in the gut and subsequent translocation into the bloodstream via the intestinal epithelium [10]. Such extraintestinal translocation can lead to bacteraemia, infective endocarditis, and infections in other tissues distal to the intestines. However, the specific mechanisms driving E. faecalis bloodstream invasion, survival, and virulence are still being uncovered [3,5,11,12]. Observational studies have shown that antibiotics, such as cephalosporins, promote overgrowth and extraintestinal translocation of E. faecalis into the bloodstream [13,14], an observation supported by in vivo murine experimental models [14][15][16]. Such overgrowth of E. faecalis reflects the ecological side-effects of broad-spectrum antibiotics in driving dysbiosis of the gut microbiota, a phenomenon similarly observed with Clostridioides difficile (formerly known as Clostridium difficile) [17,18]. E. faecalis also harbours a diverse arsenal of putative virulence factors [19][20][21], which foster its adaptation and survival in the harsh clinical and midgut environments, and potentially promote extraintestinal translocation into the bloodstream. These virulence factors appear to be enriched in dominant epidemic E. faecalis lineages [22,23], highlighting their importance to the success of these clones. For example, the gelatinase gene (gelE) encodes a metalloprotease exoenzyme commonly associated with epidemic clones [22] and is important for infective endocarditis [24] and extraintestinal translocation into the bloodstream [25]. Other virulence factors, namely haemolysin and enterococcal surface protein (esp), are also important for virulence in endocarditis [26] and biofilm formation [27], respectively, although the role of the former in intestinal colonisation and translocation has been questioned [28,29].
Acquisition of extrachromosomal elements, including pathogenicity islands [30,31] and plasmids [32], has also been associated with virulence and survival in nosocomial settings [33]. Understanding the distribution of these known and novel E. faecalis virulence factors in strains sampled from different tissues and individuals with contrasting pathogenicity could potentially reveal mechanisms for enterococcal pathogenicity and uncover therapeutic targets.
Remarkable advances in whole-genome sequencing and computational biology have revolutionised population genomics since the sequencing of the first enterococcal genome [34]. The feasibility of large-scale whole-genome sequencing and analysis has facilitated detailed population-level studies to uncover the genetic basis of bacterial phenotypes [35]. For example, the application of genome-wide association studies (GWAS) to bacteria has revealed genetic variants associated with diverse phenotypes, including antimicrobial resistance [36], host adaptation [37], and pathogenicity [38]. A key feature of the GWAS approach is that it can identify novel genetic variants associated with phenotypes through systematic genome-wide screening, which does not bias the analysis towards "favourite" genes and mutations commonly studied in different laboratories. Although previous studies have attempted to compare the genetic and phenotypic differences between E. faecalis isolates causing intestinal colonisation and invasive disease [39], clinical and non-clinical strains [40], and isolates of diverse origins [41], these studies were limited by small sample sizes and the use of low-resolution molecular typing methods such as pulsed-field gel electrophoresis. Recent studies of E. faecalis and E. faecium identified unique mutations associated with outbreak strains, highlighting the potential effects of specific genetic changes on pathogenicity [12,42]. Despite the increasing affordability of population-scale microbial sequencing, the genetic basis of E. faecalis infection in individuals with different hospitalisation statuses, i.e., pathogenicity, and of extraintestinal infection, including that due to extraintestinal translocation, remains poorly understood. The application of GWAS approaches to discover the genetic changes driving the pathogenicity and virulence of E. faecalis could expedite antibiotic and vaccine development.
Here we leveraged a collection of 750 whole-genome sequenced E. faecalis isolates sampled from the faeces and blood specimens of hospitalised and non-hospitalised individuals [43]. We undertook a GWAS of the isolates to investigate whether specific genomic variations, including single-nucleotide polymorphisms (SNPs) and insertions/deletions, were associated with infection of hospitalised and non-hospitalised individuals and with extraintestinal infection. We show a predominantly higher differential abundance of virulence factors and antibiotic resistance in E. faecalis isolates from hospitalised than non-hospitalised individuals, as well as in isolates from blood compared to faeces. This largely reflects the effects of the genetic background or lineages, as no specific individual genetic changes showed population-wide effects on infection of individuals and extraintestinal infection. Additionally, we found that infection of individuals depending on their hospitalisation status and extraintestinal infection are heritable traits partially explained by E. faecalis genetics. Altogether, our findings provide evidence suggesting that the collective effects of several genetic variants and the genetic background or lineages, together with gut ecological factors, drive pathogenicity and extraintestinal infection of E. faecalis rather than population-wide effects of individual bacterial genetic changes. These findings have broader implications for E. faecalis disease prevention strategies, specifically, the need to target all genetic backgrounds when designing vaccines to achieve optimal protection against severe enterococcal invasive diseases.

Results
Clinical and genomic characteristics of E. faecalis isolates
To investigate the population genomics of E. faecalis pathogenicity, marked by infection of hospitalised and non-hospitalised individuals and extraintestinal infection, we compiled a dataset of whole-genome sequences of E. faecalis isolates sampled from blood and faecal specimens of hospitalised and non-hospitalised individuals between 1996 and 2016 [43] (Figure 1A). We included isolates from countries where both faecal and bloodstream isolates were collected, but not necessarily from the same individual. In total, our final dataset comprised isolates predominantly from Europe: the Netherlands (n=300) and Spain (n=436), with additional isolates from Tunisia (n=14) in northern Africa (Figure 1B). By hospitalisation status, 499 isolates were obtained from hospitalised patients while 251 isolates were sampled from non-hospitalised individuals (Supplementary Data 1). Regarding the isolation of E. faecalis from human body sites, 452 isolates were sampled from blood while 298 isolates were from faeces. There was a strong association between hospitalisation status and body isolation site (Chi-squared test, χ²=568.44, P<2.2×10⁻¹⁶), suggesting that most E. faecalis isolates associated with extraintestinal infection were acquired during hospitalisation. Specifically, E. faecalis isolates sampled from blood were predominantly from hospitalised individuals while the faecal isolates were mostly from non-hospitalised individuals (Figure 1C). This implied that hospitalisation status and body isolation site were correlated phenotypes, both reflecting the pathogenicity of E. faecalis infections.
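The association test between hospitalisation status and isolation site can be sketched as below. This is a minimal illustration rather than the analysis code used in the study: the function name `chi_squared_2x2` is hypothetical, the Yates continuity correction is an assumption, and the four cell counts are an assumption chosen only so that the row and column totals match those reported (499/251 and 452/298), since the per-cell counts are not given in the text.

```python
def chi_squared_2x2(a, b, c, d, yates=True):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, the Yates continuity correction is applied
    (an assumption; the text does not say whether it was used).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# ASSUMED cell counts (not from the source); only the margins are reported.
# Rows: hospitalised, non-hospitalised. Columns: blood, faeces.
hosp_blood, hosp_faeces = 452, 47
nonhosp_blood, nonhosp_faeces = 0, 251

statistic = chi_squared_2x2(hosp_blood, hosp_faeces, nonhosp_blood, nonhosp_faeces)
print(f"chi-squared = {statistic:.2f}")
```

Under these assumed counts the statistic is close to the reported value, but other cell counts with the same margins would give a different result, so the table should not be read as the study's data.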
Hospital-acquired and extraintestinal infections are heritable but predominantly explained by genetic background or lineages
To assess the overall genetic basis of the infection of individuals with different hospitalisation statuses, we quantified the proportion of the variability in the phenotypes explained by E. faecalis genetics. We calculated the narrow-sense heritability (h²) based on the kinship matrix generated using unitig sequences [44]. After adjusting for the geographical origin of the isolates, we found a heritability of h²=0.24 (95% CI: 0.10 to 0.39) and h²=0.34 (95% CI: 0.16 to 0.52) for infection of hospitalised and non-hospitalised individuals and extraintestinal infection, respectively. Next, we calculated the heritability for infection of individuals and extraintestinal infection using only the Spanish cohort, which had an even number of isolates from hospitalised and non-hospitalised individuals as well as from blood and faeces. We found consistent, but slightly higher, estimates of heritability for both infection of individuals (h²=0.28, 95% CI: 0.12 to 0.45) and extraintestinal infection (h²=0.43, 95% CI: 0.22 to 0.63) than those estimated from the combined dataset. We then re-estimated the heritability after adjusting for antibiotic resistance to different antibiotic classes without intrinsic resistance by collectively including antibiotic susceptibility data as covariates. We estimated a heritability of h²=0.26 (95% CI: 0.13 to 0.40) for infection of individuals with varying hospitalisation statuses and h²=0.14 (95% CI: 0.041 to 0.23) for extraintestinal infection. These estimates represented a 7.14% (h²=0.28 to 0.26) and 67.44% (h²=0.43 to 0.14) decrease in heritability for infection of individuals and extraintestinal infection, respectively. This effect is consistent with antibiotic use: antibiotics, mainly those administered to treat or prevent Gram-negative infections that are negative for extended-spectrum beta-lactamases (ESBL) and carbapenemases, or Gram-positive infections in cancer and other severely ill patients, select for resistant Gram-positive pathogens such as E. faecalis, as also seen in COVID-19 patients [45,46]. These findings suggest that E. faecalis infection of hospitalised individuals and extraintestinal infection are moderately heritable traits, and that the heritability of extraintestinal infection may be mostly explained by antibiotic resistance.
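The reported percentage decreases in heritability follow from simple arithmetic on the estimates before and after adjusting for antibiotic resistance. The short sketch below reproduces that calculation; the function name `relative_decrease` is hypothetical and introduced only for illustration.

```python
def relative_decrease(before, after):
    """Percentage decrease from `before` to `after`."""
    return (before - after) / before * 100


# Heritability estimates quoted in the text (Spanish cohort, then after
# adjusting for antibiotic resistance).
hospitalisation_drop = relative_decrease(0.28, 0.26)
extraintestinal_drop = relative_decrease(0.43, 0.14)

print(f"Infection of individuals: {hospitalisation_drop:.2f}% decrease")  # 7.14%
print(f"Extraintestinal infection: {extraintestinal_drop:.2f}% decrease")  # 67.44%
```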

Infection of individuals with different hospitalisation statuses and extraintestinal infection of E. faecalis isolates vary across lineages
We sought to investigate the distribution of the hospitalisation and body isolation site phenotypes in the context of the E. faecalis population structure. We generated a maximum likelihood phylogenetic tree using 251,983 core genome single nucleotide polymorphisms (SNPs), exclusively containing non-ambiguous nucleotide and deletion characters, and annotated it with the hospitalisation status and body isolation site phenotypes. The isolates were widely distributed across different genetic backgrounds based on the country of origin as well as body isolation site and hospitalisation status, a finding consistent with the literature that the severity of E. faecalis infections is not restricted to specific lineages, in contrast to the genetic separation between commensal and hospital-adapted lineages observed in E. faecium [47,48] (Figure 2). We then performed an in-depth analysis of the E. faecalis population structure using lineage definitions based on the PopPUNK genomic sequence clustering framework [49] by Pöntinen et al. [43]. Our isolates clustered into 96 clades, which corresponded to 120 sequence types (STs) or clones defined by the E. faecalis multi-locus sequence typing (MLST) scheme [50] (Figure 2). There was no single dominant ST associated with isolates sampled from hospitalised patients and blood (Figure 1D, E). As expected, the clusters defined by the MLST scheme were concordant with the PopPUNK clusters, although the latter were less granular than the former because they are defined based on genome-wide variation and are therefore robust to subtle genomic variation (Figure 1F). As similarly observed with the STs, no dominant clade was associated with hospitalisation status or human body isolation site (Figure 1G, H).
We then compared the relative frequency of individual STs and PopPUNK clades between isolates collected from hospitalised patients and non-hospitalised individuals. We found three clades more common in hospitalised patients than non-hospitalised individuals, namely clade 2 (adjusted P=1.01×10⁻⁶), clade 6 (adjusted P=6.88×10⁻⁷), and clade 7 (adjusted P=0.0464). In contrast, clade 4 was more common in non-hospitalised individuals than in hospitalised patients (adjusted P=2.24×10⁻¹¹) (Figure 3A; Supplementary Table 1). Due to the correlation between the hospitalisation status of the individuals and the human body isolation site, we found similar patterns in the relative abundance of the clades between blood and faecal isolates (Figure 3B; Supplementary Table 2). We found a higher abundance of ST6 (clade 2; adjusted P=1.65×10⁻⁴) and ST28 (clade 6; adjusted P=1.32×10⁻⁷) among hospitalised patients than in non-hospitalised individuals (Table 2). Together, these findings suggest an impact of some E. faecalis genetic backgrounds on extraintestinal infection in hospitalised individuals, who likely acquired infections in the hospital setting. Such E. faecalis populations are the most prevalent and are therefore likely to be selected and to cause infection in the human host at a higher propensity than other lineages.
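A per-clade frequency comparison of this kind can be sketched as follows. The statistical test and the multiple-testing correction are not named in this section, so the two-sided Fisher's exact test and the Bonferroni adjustment used here are assumptions, as are the function names and the clade counts, which are purely illustrative and not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# HYPOTHETICAL counts: clade -> (isolates from hospitalised, from non-hospitalised).
n_hosp, n_nonhosp = 499, 251
clade_counts = {"clade_A": (60, 5), "clade_B": (20, 45), "clade_C": (30, 15)}

raw = [
    fisher_exact_two_sided(h, n_hosp - h, nh, n_nonhosp - nh)
    for h, nh in clade_counts.values()
]
for clade, p_adj in zip(clade_counts, bonferroni(raw)):
    print(f"{clade}: adjusted P = {p_adj:.3g}")
```

Each clade is tested against all other isolates combined, so the number of tests, and hence the Bonferroni factor, equals the number of clades compared.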

Only a few virulence factors show variable prevalence in individuals with different hospitalisation statuses and isolation sites
As a host generalist species, E. faecalis exhibits high levels of recombination [50], which may facilitate the acquisition of genes promoting colonisation and virulence, driving the success of its clones [23]. We hypothesised that certain known virulence factors would be enriched among E. faecalis isolates from hospitalised patients, especially those with bloodstream infection, compared to non-hospitalised individuals without bloodstream infection. We used a candidate gene approach to compare the enrichment of a catalogue of E. faecalis virulence factors obtained from the virulence factor database (VFDB) [51] between isolates from individuals with different hospitalisation statuses and body isolation sites. We found a single gene (ecbA), encoding a collagen-binding protein known to promote adherence to epithelial surfaces and first described in E. faecium [52], which was enriched in isolates from hospitalised patients compared to non-hospitalised individuals (adjusted P=1.26×10⁻⁶). In contrast, three genes, namely, ecbA (adjusted […]). Additionally, nine capsule biosynthesis genes (cpsC to cpsK) were more common among hospitalised than non-hospitalised individuals, as well as among isolates from extraintestinal infection compared to intestinal colonisation (Figure 3E, F; Supplementary Table 3). These findings are partly consistent with previous studies [22,23], although the present study investigated a larger catalogue of virulence factors. Therefore, we conclude that certain virulence factors are associated with individuals with different hospitalisation statuses, and possibly promote extraintestinal translocation of E. faecalis into the bloodstream in hospitalised individuals.

Distribution of antibiotic resistance among individuals with different hospitalisation statuses and body isolation tissues
Hospitalised patients are more exposed to antibiotics than non-hospitalised individuals because more antibiotics are used in hospital settings than outside. Therefore, E. faecalis isolates from hospitalised patients with intestinal colonisation and extraintestinal infection are more likely to have acquired resistance than isolates from non-hospitalised individuals. Because most patients were probably hospitalised because of other complaints and developed the E. faecalis infection during hospitalisation, we hypothesised that E. faecalis isolates sampled from hospitalised individuals and extraintestinal infection would show a higher frequency of antibiotic resistance traits than isolates from non-hospitalised individuals and intestinal colonisation. The rationale behind this hypothesis was that antibiotic-susceptible E. faecalis strains are more likely to be cleared from the gut following antibiotic use, leaving more space for the surviving antibiotic-resistant strains to cause extraintestinal infection and, subsequently, severe disease. We investigated this hypothesis by comparing the abundance of antibiotic resistance genes for seven antibiotic classes, namely, glycopeptide (vancomycin), aminoglycosides, macrolides, tetracyclines, phenicols, and oxazolidinones (linezolid), in E. faecalis isolates from hospitalised and non-hospitalised individuals, and from blood and faeces. Regressing the number of antibiotic classes to which each isolate was resistant on hospitalisation status, while adjusting for the country of origin, showed resistance to more antibiotic classes among isolates from hospitalised than non-hospitalised individuals (effect size β=1.78, P<2×10⁻¹⁶) (Figure 4A, Supplementary Table 4). As expected, due to the correlation between hospitalisation status and body isolation site (Figure 1C, Supplementary Table 4), we found a similar pattern for isolation site, i.e., isolates from extraintestinal infection harboured resistance traits to a higher number of antibiotic classes than isolates from intestinal colonisation (effect size β=1.64, P=7.18×10⁻¹⁸) (Figure 4B, Supplementary Table 4). Next, we compared the relative abundance of genotypically antibiotic-resistant isolates for each antibiotic class among E. faecalis isolates from hospitalised and non-hospitalised individuals. We found a higher relative abundance of genotypically inferred antibiotic-resistant isolates in hospitalised than non-hospitalised individuals for aminoglycosides (adjusted P=8.53×10⁻¹¹), macrolides (adjusted P=8.48×10⁻¹¹), phenicols (adjusted P=0.00054), and tetracyclines (adjusted P=1.17×10⁻⁶), but not for glycopeptides (adjusted P=1) and oxazolidinones (adjusted P=1), which had almost negligible resistance (Figure 4C, Supplementary Table 4). Again, we observed similar patterns in blood and faecal isolates (Figure 4D, Supplementary Table 4).

No evidence of population-wide effects of individual E. faecalis genetic changes on infection of individuals with different hospitalisation statuses and body isolation sites
Having demonstrated differences in the prevalence of virulence factors, likely driven by lineage or strain genetic background effects, we next undertook a GWAS using linear mixed models to identify individual E. faecalis genetic changes with population-wide effects on infection of individuals with varying hospitalisation status. We hypothesised that genetic variation in known and unknown virulence factors would be disproportionately distributed among E. faecalis isolates from hospitalised and non-hospitalised individuals. In total, we selected 99,355 SNP variants and 461,699 unitig sequences, which capture variation in both the core and accessory genome, present in 5 to 95% of the isolates for the GWAS. Contrary to our hypothesis, we found no statistically significant differences in the distribution of SNPs and unitigs between isolates from hospitalised and non-hospitalised individuals independent of the strain genetic background in the GWAS using linear mixed models [53] (Figure 5A, B). Altogether, these findings demonstrated that the infection of individuals with varying hospitalisation status with E. faecalis is not driven by individual genetic changes independently of their genetic background, suggesting that all E. faecalis strains are intrinsically adapted for extraintestinal infection, partly through translocation into the bloodstream.
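The frequency filter applied to variants before the GWAS (present in 5 to 95% of isolates) can be illustrated with a short sketch. The function name, the variant identifiers, and the data layout are hypothetical, and treating the 5% and 95% bounds as inclusive is an assumption, since the text does not state how the boundaries were handled.

```python
def filter_by_frequency(presence, n_isolates, lower=0.05, upper=0.95):
    """Keep variants carried by between `lower` and `upper` of all isolates.

    `presence` maps a variant identifier to the set of isolates carrying it.
    Bounds are treated as inclusive (an assumption).
    """
    kept = {}
    for variant, carriers in presence.items():
        frequency = len(carriers) / n_isolates
        if lower <= frequency <= upper:
            kept[variant] = frequency
    return kept


# HYPOTHETICAL toy data: 100 isolates labelled 0..99.
n_isolates = 100
presence = {
    "unitig_rare": set(range(2)),          # 2% of isolates: too rare
    "unitig_common": set(range(40)),       # 40% of isolates: retained
    "unitig_near_fixed": set(range(99)),   # 99% of isolates: too common
}

retained = filter_by_frequency(presence, n_isolates)
print(retained)  # {'unitig_common': 0.4}
```

Excluding very rare and nearly fixed variants is a common design choice because such variants carry little information about a binary phenotype at this sample size.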
We then carried out an additional GWAS to identify genetic changes associated with extraintestinal infection of the E. faecalis strains by comparing faecal and bloodstream isolates. As in the GWAS based on hospitalisation status, we found no SNPs or unitigs associated with the human body isolation site independent of the strains' genetic background (Figure 5C, D). However, we found the strongest signal in a ~48.1 kb genomic region located between positions ~1,390,000 and ~1,450,000 bp in the V583 E. faecalis genome [34]. Since horizontal gene transfer is a critical process in the mobilisation of pathogenicity-associated genes [31,54], we hypothesised that this region may represent a pathogenicity island. Re-annotation of the nucleotide sequence for this region revealed several phage-associated genes, which suggested the potential integration of a bacteriophage. We then performed phage prediction using the entire V583 E. faecalis genome sequence to annotate the SNPs and unitig sequences identified in the GWAS. We found a total of nine prophage sequences in the genome, including one with intact attL and attR attachment sites and integrase sequences located at genomic positions 1,398,051 to 1,446,151 bp. This prophage showed high genetic similarity to prophages including PHAGE_Entero_phiFL3A_NC_013648, PHAGE_Lister_B054_NC_009813 (27), and PHAGE_Lactob_LBR48_NC_027990. Furthermore, most of the phage-associated genes and protein sequences showed high genetic similarity to those found on prophages associated with several bacterial genera, including Enterococcus, Lactobacillus, Bacillus, Listeria, and Staphylococcus. These findings highlighted a potential virulence locus that should be prioritised for further investigation to understand its role in E. faecalis pathogenicity.
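The size of the predicted prophage and the overlap between GWAS hits and its coordinates can be checked with a few lines of arithmetic. The sketch below uses the prophage coordinates quoted in the text; the function names and the two example hit positions are hypothetical and serve only to show the interval test.

```python
# Prophage coordinates in the V583 genome, as quoted in the text.
PROPHAGE_START = 1_398_051
PROPHAGE_END = 1_446_151


def region_length_kb(start, end):
    """Length of a genomic interval in kilobases."""
    return (end - start) / 1000


def in_region(position, start=PROPHAGE_START, end=PROPHAGE_END):
    """True if a genome position lies within the interval (inclusive)."""
    return start <= position <= end


print(region_length_kb(PROPHAGE_START, PROPHAGE_END))  # 48.1

# HYPOTHETICAL variant positions, used only to illustrate the overlap check.
for position in (1_400_000, 1_390_000):
    print(position, in_region(position))
```

The computed length of 48.1 kb matches the size quoted for the region with the strongest signal.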

Discussion
Tremendous advances in sequencing technology and analytical approaches have occurred over the past two decades since the sequencing of the first enterococcal genome, E. faecalis strain V583 [34]. We have witnessed a remarkable shift in microbiology from focusing on the biology of a single strain to the population, due to the growing feasibility of sequencing large microbial collections. However, despite the increasing availability of population-level E. faecalis genomic datasets, no systematic studies have investigated the population-wide effects of individual genetic changes on infection of individuals with varying hospitalisation status and extraintestinal infection, and the overall contribution of E. faecalis genetics to these phenotypes [5]. Such studies could reveal critical pathways for E. faecalis virulence, including survival in the bloodstream through evasion of innate host immune defences, and inform the development of therapeutics [12]. Here, we address this knowledge gap by investigating the effect of known and novel virulence factors, lineages, and the entire repertoire of E. faecalis genomic changes in a large collection of human faecal isolates, representing a snapshot of the E. faecalis diversity in the gut, and isolates sampled from blood specimens of individuals with different hospitalisation status. Our findings demonstrate that the abundance of certain virulence and antibiotic resistance determinants is higher in E. faecalis isolates associated with severe disease and extraintestinal infection, largely driven by the effects of strains, lineages, or genetic background, but not by population-wide effects of individual genetic changes. This is consistent with observations in hospitals that strains of any lineage mostly cause infection in patients with underlying diseases or poor general condition.
E. faecalis is a versatile pathogen that survives in a wide range of challenging niches, including the human gut, blood, and the environment, such as clinical settings. Such adaptation and survival of E. faecalis in these diverse environments are modulated by several mechanisms, including antimicrobial resistance [55], intracellular survival [56][57][58][59], and biofilm formation [27]. Although several virulence factors of E. faecalis have been described [24][25][26][27], whether and how these factors contribute to infection of individuals with varying hospitalisation status and extraintestinal infection, especially through gut-to-bloodstream translocation, remains poorly understood. Previous genetic studies shed light on how the distribution of virulence factors shapes the adaptation of E. faecalis clones to different environments despite the limitation of small sample sizes [39,41]. In this study, we demonstrate enrichment of known virulence genes in isolates associated with different hospitalisation status using a larger collection of isolates. These include genes encoding aggregation substance adherence factors (EF0485 and EF0149) [32]; lantipeptide cytolysin subunits CylL-L and CylL-S (cylL-l and cylL-s), cytolysin subunit modifier (cylM), and cytolysin regulator R2 (cylR2) exotoxins [60]; and polysaccharide capsule biosynthesis genes (cpsC to cpsK) involved in immune modulation or anti-phagocytosis [61]. These findings suggested that the variable abundance of these virulence genes in hospitalised and non-hospitalised individuals could influence E. faecalis pathogenicity, possibly because they primarily contribute to intestinal colonisation, survival, and fitness or competitiveness in different intestinal compartments in the dysbiotic gut of hospitalised patients.
Once strains harbouring these genes are established in higher numbers in the gastrointestinal tract, transmission is promoted, which in turn promotes the evolution and fixation of these virulence genes in the population. Interestingly, the observed higher antibiotic resistance, especially to aminoglycosides, in isolates from blood and hospitalised individuals than in isolates from faeces and non-hospitalised individuals suggests that antibiotic-resistant E. faecalis strains are more likely to survive and overgrow after the use of these antibiotics, consistent with findings reported elsewhere [14][15][16]62,63]. Conversely, although differences in the distribution of virulence factors and clades or STs were observed, the GWAS of E. faecalis pathogenicity, after adjusting for the genetic background of the isolates, implied that no individual genetic changes influence the severity of disease at the population level. These findings are consistent with the notion that genetic traits influencing virulence are less likely to be selected than those promoting colonisation, as similarly seen in other pathogens [64]. Altogether, these findings suggest that the distribution of the E. faecalis virulence factors may largely depend on the genetic background, implying that the lineage effects on pathogenicity may be more pronounced than the population-wide effects of individual genetic changes. Alternatively, there may be a predominance of certain lineages in the elderly, as seen with other opportunistic pathogens [65], whose risk factors for infection, including hospital exposure history, antibiotic treatment, and other underlying conditions, make them favourable for the selection of E. faecalis strains enriched in antibiotic resistance genes and other adaptive traits.
Likewise, the distribution of known E. faecalis virulence factors by isolation site mirrored the patterns observed for infection of individuals with varying hospitalisation status due to the correlation between these phenotypes. These findings suggested that no individual genetic changes are overrepresented in blood and gut niches independent of the genetic background, which implied that while individual genetic changes may have an impact on extraintestinal infection, their effect at the population level is likely minimal. However, some genetic changes could be linked to specific lineages, making it challenging to disentangle their effects from the genetic background. Nevertheless, the absence of genetic changes statistically associated with the body isolation site, after adjusting for the population structure, suggests that these variants are not under positive selection, likely because extraintestinal infection represents an evolutionary dead-end for E. faecalis [66]. Therefore, even if such genetic changes exist, they may be rare and likely exhibit small effect sizes, making their detection challenging without analysing large datasets with thousands of genomes. We speculate that the observed strong, but not statistically significant, signals in a single prophage, integrated at chromosome coordinates 1,398,051 to 1,446,151 bp in the V583 E. faecalis genome [34], could exemplify a potential locus with small population-wide effects on virulence. Indeed, prophages play a critical role in the pathogenicity of E. faecalis [67][68][69][70] and other bacterial pathogens, such as Staphylococcus aureus [37,71]. Therefore, further studies using even larger genomic datasets than in the present study and adjusting for other important covariates, such as prior antibiotic usage and immune status, are required to fully investigate the impact of the identified E. faecalis prophage in modulating extraintestinal infection.
Crucially, such studies should prospectively collect samples to minimise confounding due to cohort and temporal variability between the cases and controls for a robust GWAS. Furthermore, definitive E. faecalis genetic signals for extraintestinal infection may be identified by comparing isolates obtained from the blood of patients and faeces from individuals with confirmed negative blood culture as controls. Inclusion of E. faecalis strains from community-acquired infections could also overcome the confounding effects due to factors related to hospitalisation, such as E. faecalis from individuals with community-acquired bacteraemia who are at a higher risk of developing infective endocarditis [72]. Altogether, our findings demonstrate that no individual E. faecalis genetic changes exhibit population-wide statistical association with extraintestinal infection implying that all E. faecalis strains are capable of translocating into the bloodstream and causing severe diseases, consistent with their known opportunistic pathogenic lifestyle. Although E. faecalis genetic changes that are important for survival in the blood may exist, these would not be fixed in the population, especially if they have no impact on colonisation, as individual strains would have to accidentally "re-discover" them repeatedly. Therefore, vaccination strategies targeting all rather than specific genetic backgrounds would lead to increased protection from severe E. faecalis diseases.
The estimated heritability of ~24% for infection of individuals with different hospitalisation status and ~34% for extraintestinal infection suggests that the contribution of E. faecalis genetics to these phenotypes is not negligible but relatively modest compared to that observed for other phenotypes, such as antimicrobial resistance [73]. Our findings are consistent with those from recent bacterial GWAS of pathogenicity in Streptococcus pneumoniae [74] and Group B Streptococcus (Streptococcus agalactiae) [75]. However, other studies have found negligible heritability for pathogenicity in Neisseria meningitidis [64], which supports our findings and suggests that the evolution of the pathogenicity trait is neutral. Interestingly, antibiotic resistance explained ~70% of the heritability in E. faecalis extraintestinal infection, but surprisingly only ~10% of the heritability in infection of individuals with different hospitalisation statuses, consistent with the prevailing hypothesis that antibiotic resistance plays a major role in bloodstream invasion [14][15][16][62,63]. Indeed, broad-spectrum antibiotic use disrupts the stable gut microbial community by removing typically antibiotic-susceptible competitor species, leading to the overgrowth and dissemination of E. faecalis into the bloodstream [62,63]. However, follow-up studies of E. faecalis isolates sampled from faeces of healthy individuals and bloodstream of patients, adjusting for other important variables, such as prior antibiotic use, are required to determine specific genetic changes modulating pathogenicity and virulence and account for potential missing heritability. Overall, these findings suggest that the effect of the host and gut environmental factors, such as microbiota perturbations due to antibiotic use, likely outweighs the population-wide impact of individual E. faecalis genetic changes in modulating its virulence and pathogenicity [76].
Our findings, derived from a geographically and temporally diverse whole-genome dataset of E. faecalis isolates, demonstrate that the severity of E. faecalis infections is not primarily driven by population-wide effects of individual genetic changes that potentially enhance extraintestinal infection. This further illustrates the opportunistic pathogenic lifestyle of this bacterium and suggests that pathogenicity, that is, infection of individuals with different hospitalisation status and extraintestinal infection, could be an accidental consequence of gut colonisation dynamics, as seen in other gut commensals [66]. Ultimately, the commensal-to-pathogen switch and virulence of E. faecalis may be predominantly modulated by multiple genetic variants (i.e., the traits are polygenic), genetic background or lineages, epigenetic mechanisms, host factors and the gut milieu, including the ecological side-effects of broad-spectrum antibiotics on the gastrointestinal microbiota.

Sample characteristics and microbiological processing
For this study, we selected a total of 750 human E. faecalis isolates from a collection of whole-genome sequences of isolates collected from several European countries described by Pöntinen et al. [43]. We included isolates from countries where both faecal and blood specimens were collected, namely The Netherlands (n=300), Spain (n=436), and Tunisia (n=14). The isolates represented collections from University Medical Center Utrecht (UMCU), Utrecht, The Netherlands (n=300); European Network for Antibiotic Resistance and Epidemiology at the University Medical Center Utrecht (ENARE-UMC), Utrecht, The Netherlands (n=6); Hospital Ramòn y Cajal (HYRC), Madrid, Spain (n=375); University of Porto, Porto, Portugal (n=14); and Spain (n=55). By human body isolation site, 298 isolates were sampled from faeces while 452 were from blood. Of these, 499 were collected from hospitalised patients while 251 were from non-hospitalised individuals. The isolates were collected over a twenty-one-year period (1996 to 2016); therefore, our dataset was both geographically and temporally diverse. We did not use clinical metadata related to the patients and all isolate identifiers were de-identified; therefore, additional institutional review board approval was not required.

Genome sequencing, molecular typing, assembly, and annotation
Short-read sequencing was done at the Wellcome Sanger Institute using Illumina HiSeq X paired-end sequencing platform. As part of our quality control procedures, we used Kraken (version 0.10.66) [77] to check potential species contamination. We assembled sequence reads that passed the quality control using Velvet de novo assembler (version 1.2.10) [78] and annotated the resultant draft assemblies using Prokka (version 1.14.6) [79]. To generate multiple sequence alignments for the whole genome sequences, we mapped the reads against the V583 E. faecalis reference genome [34] using the Snippy (version 4.6.0) haploid variant calling and core genome pipeline (https://github.com/tseemann/snippy). We performed in silico genome-based typing of the isolates using multi-locus sequence typing (MLST), using sequence type (ST) or clone definitions in the MLST database (https://pubmlst.org/efaecalis) [50,80], implemented in SRST2 [81].

Phylogenetic reconstruction and population structure analysis
To generate a phylogeny of the E. faecalis isolates, we first identified genomic positions containing single nucleotide polymorphisms (SNPs) using SNP-sites (version 2.3.2) [82]. Next, we used the SNPs to construct a maximum-likelihood phylogenetic tree using IQ-TREE (version 2.1.2) [83]. We selected the general time reversible (GTR) and Gamma substitution models. We processed and rooted the generated phylogeny at the midpoint of the longest branch using the APE package (version 4.3) [84] and phytools (version 0.7.70) [85]. We annotated and visualised the rooted phylogeny using the "gridplot" and "phylo4d" functions implemented in phylosignal (version 1.3) [86] and phylobase (version 0.8.6) packages (https://cran.r-project.org/package=phylobase), respectively.
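The first step above, reducing a whole-genome alignment to its polymorphic columns, can be illustrated with a minimal Python sketch. This is not the SNP-sites implementation: the function names and the toy alignment are hypothetical, and the sketch assumes that only unambiguous A, C, G, and T calls count when deciding whether a column is variable.

```python
def variable_sites(alignment):
    """Return 0-based alignment columns that contain at least two distinct
    unambiguous bases (A, C, G, T). Gaps and ambiguity codes are ignored
    when deciding whether a column is polymorphic (an assumption)."""
    sequences = list(alignment.values())
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for column in range(length):
        bases = {seq[column].upper() for seq in sequences} & set("ACGT")
        if len(bases) > 1:
            sites.append(column)
    return sites


def snp_alignment(alignment):
    """Keep only the polymorphic columns of each aligned sequence."""
    sites = variable_sites(alignment)
    return {name: "".join(seq[i] for i in sites) for name, seq in alignment.items()}


# Hypothetical toy alignment; isolate names and sequences are made up.
toy = {
    "isolate_a": "ACGTACGTAC",
    "isolate_b": "ACGTACGAAC",
    "isolate_c": "ACNTACGAAC",
    "isolate_d": "TCGTACGTA-",
}
print(variable_sites(toy))
print(snp_alignment(toy))
```

Columns that differ only by an N or a gap are dropped here, so the reduced alignment carries only positions that are informative for tree building.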

Antibiotic resistance and virulence gene profiles
We identified genotypic antibiotic resistance for seven major antibiotic classes, namely, glycopeptide (vancomycin), aminoglycosides, macrolides, tetracyclines, phenicols, and oxazolidinones as described by Pöntinen et al. [43]. We screened the sequencing reads for the presence and absence of antibiotic resistance genes using ARIBA (version 2.14.4) [87] with the ResFinder 3.2 database [88]. We included additional genes conferring resistance to vancomycin, namely, vanA (European Nucleotide Archive [ENA] accession: AAA65956.1), vanB (ENA accession: AAO82021.1), vanC (ENA accession: AAA24786.1), vanD (ENA accession: AAD42184.1), vanE (ENA accession: AAL27442.1), and vanG (ENA accession: NG_048369.1), and to linezolid, namely, cfrD (ENA accession: PHLC01000011). We compared the abundance of antibiotic resistance genes per isolate using a generalised linear regression model with a Poisson log link function with pathogenicity or hospitalisation status and country of origin as covariates, the latter to adjust for geographical differences. We used the test of equal proportions to compare the relative abundance of genotypic antibiotic resistance for each antibiotic class among hospitalised and non-hospitalised individuals, and blood and faeces.
We also compared the presence and absence of E. faecalis virulence genes obtained from the virulence factor database (VFDB) [51] between isolates from hospitalised and non-hospitalised individuals, and between those associated with intestinal colonisation and extraintestinal infection. These included genes encoding proteins involved in adherence to the epithelial surfaces (ace, ebpA, ebpB, ebpC, ecbA, EF0149, EF0485, efaA, and srtC), exoenzymes (EF0818, EF3023, gelE, and sprE), biofilm formation (bopD, fsrA, fsrB, and fsrC), immune modulation or anti-phagocytosis (cpsA-K), and exotoxins (cylL-l, cylL-s, cylM, and cylR2). We used BLASTN (version 2.9.0+) [89] to determine the presence and absence of the virulence genes. To avoid incorrectly missing genes potentially split between multiple contigs during de novo genome assembly, we considered all the high-scoring segment pairs (HSPs) with a minimum length of 100 bp using BioPython [90].
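As an illustration of the test of equal proportions, the following Python sketch implements a pooled two-sample z-test for proportions. The text does not state which implementation or options were used, so this is an assumption: the sketch omits the continuity correction that some implementations, such as R's prop.test, apply by default. The group sizes match the blood and faecal isolate counts given in the text, but the resistance counts are invented for illustration.

```python
import math


def two_proportion_test(hits_a, total_a, hits_b, total_b):
    """Two-sided pooled z-test for equality of two proportions.
    Returns (z statistic, p-value). No continuity correction is applied."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if standard_error == 0:
        # All isolates or no isolates carry the trait in both groups.
        return 0.0, 1.0
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of aminoglycoside-resistant isolates (made up);
# the totals are the blood (452) and faecal (298) sample sizes.
z, p = two_proportion_test(hits_a=200, total_a=452, hits_b=100, total_b=298)
print(f"z = {z:.2f}, p = {p:.3g}")
```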
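The handling of genes split across contigs can be sketched as follows in Python. This is an assumed reconstruction rather than the authors' script: it takes BLASTN hits as (start, end) coordinates on the query gene, discards hits shorter than 100 bp as described above, merges the remaining intervals, and reports the fraction of the gene covered. The `min_coverage` cut-off used to call a gene present is hypothetical, since the text does not state one.

```python
def covered_fraction(hsps, gene_length, min_length=100):
    """Fraction of a query gene covered by the union of BLAST hits.
    hsps: iterable of (start, end) positions on the query gene, 1-based and
    inclusive, possibly from different contigs. Hits shorter than
    min_length are ignored."""
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    kept = sorted(
        (min(start, end), max(start, end))
        for start, end in hsps
        if abs(end - start) + 1 >= min_length
    )
    covered = 0
    current_start = current_end = None
    for start, end in kept:
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping hit: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / gene_length


def gene_present(hsps, gene_length, min_coverage=0.8):
    """Call a gene present when enough of it is covered.
    The 0.8 threshold is a hypothetical value for illustration."""
    return covered_fraction(hsps, gene_length) >= min_coverage


# A 1,000 bp gene split over two contigs, plus one short spurious hit.
hits = [(1, 400), (380, 900), (950, 1000)]
print(covered_fraction(hits, 1000), gene_present(hits, 1000))
```

Merging intervals before summing avoids double-counting the region where hits from two contigs overlap.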

Genome-wide association study
To generate the input SNP data for the GWAS, we used VCFtools (version 0.1.16) [91] to convert bi-allelic SNPs into the pedigree file accepted by PLINK software [92]. We filtered out genomic positions with SNPs with minor allele frequency <5% or missing variant calls in >10% of the isolates using PLINK (version 1.90b4) [92]. Next, we identified unitig sequences, which are variable-length k-mer sequences generated from non-branching paths in a compacted De Bruijn graph. First, we built a De Bruijn graph using assemblies of all the isolates based on 31 bp k-mer sequences using Bifrost (version 1.0.1) [93]. We then queried the generated De Bruijn graph using the query option in Bifrost to generate the presence and absence patterns of each identified unitig in the assemblies of each isolate. We then combined the presence and absence patterns of all the isolates into a single file and merged them with the phenotype data (isolation source or hospitalisation status) to generate PLINK-formatted pedigree files, which were used for the downstream GWAS analysis. We used the same threshold for variant frequency to filter out rare unitigs before the GWAS.
We undertook GWAS analyses using SNPs and unitigs to identify genetic variants associated with pathogenicity (hospitalisation) and extraintestinal infection of E. faecalis. We used FaST-LMM (FastLmmC, version 2.07.20140723) [53], which uses a linear mixed model for the GWAS. For both the SNP and unitig analyses, we specified a kinship matrix based on the unitig presence and absence data as a random covariate to adjust for the clonal population structure of the isolates, which is a major confounder in bacterial GWAS analyses [35]. Since the GWAS tools used in this study were originally developed to mostly handle human diploid DNA data, we coded the variants as human mitochondrial DNA (which is haploid) by specifying the chromosome number as 26 [94,95]. To correct for multiple testing, we used the Bonferroni correction method to adjust the statistical significance (P-values) inferred by the GWAS based on the likelihood ratio test. We specified the genome length of the E. faecalis V583 reference genome (3,218,031 bp) as the maximum number of possible genomic variants, assuming variants can independently occur at each genomic position. Since this assumption may not necessarily be true, our approach is likely to be more conservative than the Bonferroni correction based on the number of tested variants; therefore, it may minimise false positives but may slightly increase false negatives. The advantage of our approach is that by using the same number of possible variants based on the genome length, a consistent P-value threshold can be used to adjust different types of genetic variation, i.e., SNPs, accessory genes, k-mers, and unitigs, to simplify interpretation and comparison of statistical significance across different studies.
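The frequency and missingness filters described above can be sketched in plain Python for haploid presence/absence data. This is an illustrative stand-in for the PLINK filtering step, not a reimplementation of it: the function name, data layout, and toy variants are hypothetical, while the 5% and 10% thresholds come from the text.

```python
def filter_variants(genotypes, min_maf=0.05, max_missing=0.10):
    """Keep variants with minor allele frequency >= min_maf and a
    missing-call fraction <= max_missing.
    genotypes: dict mapping a variant ID to a list with one entry per
    isolate, coded 1 (variant present), 0 (absent), or None (no call)."""
    kept = {}
    for variant, calls in genotypes.items():
        if not calls:
            continue
        called = [call for call in calls if call is not None]
        missing_fraction = (len(calls) - len(called)) / len(calls)
        if not called or missing_fraction > max_missing:
            continue
        frequency = sum(called) / len(called)
        minor_allele_frequency = min(frequency, 1 - frequency)
        if minor_allele_frequency < min_maf:
            continue
        kept[variant] = calls
    return kept


# Hypothetical variants across 40 isolates (all values are made up).
toy_variants = {
    "common": [1] * 20 + [0] * 20,             # MAF 0.50, no missing calls
    "rare": [1] + [0] * 39,                    # MAF 0.025, filtered out
    "sparse": [None] * 5 + [1] * 15 + [0] * 20,  # 12.5% missing, filtered out
    "ok_missing": [None] * 2 + [1] * 10 + [0] * 28,  # 5% missing, kept
}
print(sorted(filter_variants(toy_variants)))
```

The same function applies to SNPs and unitigs, since both are represented here as binary presence/absence calls per isolate.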
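The genome-length-based Bonferroni threshold can be worked through with a short Python sketch. The genome length is taken from the text; the family-wise significance level of 0.05 is an assumption, since the text does not state the value used, and the count of 100,000 tested variants is a hypothetical figure included only to show why the genome-length denominator is the more conservative choice.

```python
import math

GENOME_LENGTH_BP = 3_218_031  # V583 reference genome length, from the text
ALPHA = 0.05                  # assumed significance level (not stated in the text)


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold under the Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, threshold):
    """Return the variants whose P-value falls below the threshold."""
    return {variant: p for variant, p in p_values.items() if p < threshold}


threshold = bonferroni_threshold(ALPHA, GENOME_LENGTH_BP)
print(f"threshold = {threshold:.3e}, -log10 = {-math.log10(threshold):.2f}")

# Correcting for a hypothetical 100,000 tested variants gives a less
# strict threshold than correcting for every genomic position.
threshold_tested = bonferroni_threshold(ALPHA, 100_000)
print(threshold_tested > threshold)

# Hypothetical P-values for two variants (made up).
print(significant({"unitig_1": 3e-9, "unitig_2": 2e-7}, threshold))
```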
We visualised the GWAS results using Manhattan plots generated using standard plotting functions in R (version 4.0.3) (https://www.R-project.org/). Specific genomic features associated with each SNP and unitig were analysed further by comparing the genomic sequences to the V583 E. faecalis reference genome [34] using BLASTN (version 2.5.0+) [96] and BioPython (version 1.78) [90]. To identify potential issues arising due to the population structure, we generated Q-Q plots to compare the observed and expected statistical significance using qqman (version 0.1.7) [97]. We calculated the overall proportion of the variance of the phenotype explained by E. faecalis genetics, i.e., narrow-sense heritability, using GCTA (version 1.93.2) [44].
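The comparison behind a Q-Q plot can be sketched in Python without any plotting library. The sketch pairs each observed P-value with the value expected at the same rank under a uniform null distribution, both on the -log10 scale. The plotting position (rank - 0.5) / n is one common convention and is an assumption here; it is not necessarily the exact rule applied by qqman, and the function name is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 P-values.
    Expected values assume P-values are uniform on (0, 1) under the null,
    using the plotting position (rank - 0.5) / n."""
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P-values must lie in the interval (0, 1]")
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Hypothetical P-values (made up). Points lying above the diagonal, where
# observed exceeds expected, indicate inflation of the test statistics.
example = [0.001, 0.02, 0.3, 0.6]
for expected, observed in qq_points(example):
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

Systematic departure from the diagonal across most of the range, rather than only in the tail, is the pattern that usually points to residual population structure.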

Data availability statement
Sequence data used in this study have been deposited at the ENA with accession codes "PRJEB28327" and "PRJEB40976". Specific accession codes for each isolate are provided in Supplementary Data 1.