The pathobiology of Mycobacterium abscessus revealed through phenogenomic analysis

The medical and scientific response to emerging pathogens is often severely hampered by ignorance of the genetic determinants of virulence, drug resistance, and clinical outcomes that could be used to identify therapeutic drug targets and forecast patient trajectories 1–5. Taking the newly emergent multidrug-resistant bacteria Mycobacterium abscessus as an example 6, we show that combining high dimensional phenotyping with whole genome sequencing in a phenogenomic analysis can rapidly reveal actionable systems-level insights into bacterial pathobiology. Using in vitro and in vivo phenotyping, we discovered three distinct clusters of isolates, each associated with a different clinical outcome. We combined genome-wide association studies (GWAS) with proteome-wide computational structural modelling 7 to define likely causal variants, and employed direct coupling analysis (DCA) 8 to identify co-evolving, and therefore potentially epistatic, gene networks. We then used in vivo CRISPR-based silencing to validate our findings, defining a novel secretion system controlling virulence in M. abscessus, and illustrating how phenogenomics can reveal critical pathways within emerging pathogenic bacteria.


INTRODUCTION
Over the last two decades, M. abscessus, a rapid growing species of nontuberculous mycobacteria (NTM), has emerged as a major threat to individuals with Cystic Fibrosis (CF) and other chronic lung disease 9 . Rates of infection of CF patients have increased around the World 9-11 , in part due to hospital-based person-to-person transmission 12,13 and the emergence of globally-spread dominant circulating clones that are associated with increased virulence and worse clinical outcomes 14 . Infections with M. abscessus are challenging and sometimes impossible to treat 9,15,16 , lead to accelerated inflammatory lung damage 17,18 , and may prevent safe transplantation 19 .
To date, very little is known about how M. abscessus infects humans 6 , how it causes inflammatory lung damage, and how it resists antibiotics 6 . There is thus an urgent need to better understand the pathophysiology of M. abscessus, to define optimal drug targets, and to predict the virulence and antibiotic susceptibility of clinical isolates.
We therefore sought to combine detailed in vitro and in vivo phenotyping, whole genome sequencing, computational structural modelling, and epistatic analysis to provide a phenogenomic map of M. abscessus that might define critical pathways involved in virulence and drug resistance.

RESULTS
We first characterised 331 clinical M. abscessus isolates across 58 phenotypic dimensions exploring five key pathogenic traits: planktonic growth in different carbon sources; antibiotic resistance (at early and late time points) against a selection of drugs recommended by clinical treatment guidelines 9 ; in vitro infection of a human macrophage cell line model (differentiated THP1 cells), monitored using high content confocal microscopy; in vivo infection of Drosophila melanogaster, measuring host survival and inflammatory responses; and clinical outcomes following infection, available through previously collected metadata 14 (Figure 1a,   Supplementary Figure 1).
We examined the relationship between phenotypes, finding correlations within, and sometimes between, pathogenic traits (Figure 1b, Supplementary Figure 2). To explore whether there were distinct patterns of bacterial behaviours, we used experimentally-derived data to plot individual isolates in phenotypic space, identifying three discrete groups, each associated with different clinical outcomes (Figure 1c,d, Supplementary Figure 3), representing distinct heritable traits (Supplementary Figure 4). Isolates in Groups 1, 2, and 3 demonstrated progressively faster growth in liquid culture and within macrophages, higher rates of macrophage and Drosophila death, and greater inflammatory responses. While Group 2 isolates were associated with the most favourable clinical outcome, potentially related to their macrolide susceptibility (a key determinant of treatment response 20,21 ), we found that Group 1 and Group 3 isolates, although similarly macrolide resistant, had very different clinical outcomes, highlighting the importance of other phenotypic characteristics in determining prognosis, and suggesting that more virulent (and thus immunogenic) isolates might be cleared more easily by patients (as previously reported for other pathogenic bacteria [22][23][24][25] ).
To understand the genetic basis for these important variations in M. abscessus behaviour, we used whole genome sequence data to perform a genome-wide association study (GWAS) for each phenotype (Figure 2a), evaluating approximately 270,000 genetic variants comprising single nucleotide polymorphisms (SNPs), insertions, and deletions. We used mixed models corrected for population structure 26 to identify locus effects, as well as uncorrected linear models to ensure we captured lineage effects 27 . In total, we identified 1926 hits (involving 1000 genes) across 46 phenotypes (Supplementary Table 1 Current GWAS approaches are limited in their ability to accurately identify causal variants by both the presence of linkage disequilibrium, which in the case of M. abscessus (as with other bacteria 28,29 ) is extensive and genome-wide (Figure 2a, Supplementary Figure 6), and by a failure to consider the impact of mutations on protein function 30,31 .
We therefore applied proteome-wide computational structural modelling to evaluate the likely functional impact of all nonsynonymous SNPs across the genome, by applying our graphbased machine learning method mCSM 7 to our comprehensive M. abscessus structural database Mabellini 32 (Figure 2b), in order to identify likely causal mutations.
As an example, the GWAS for intracellular replication of M. abscessus within macrophages identified a number of hits at genome-wide significance including a cluster of variants within the mycobactin operon (Figure 2c), containing genes involved in iron scavenging 33,34 .
Structural modelling predicted that one variant, a missense mutation (Ile256Thr) in the mycobactin polyketide synthetase (mbtD) gene, was most likely to result in loss of protein function and therefore be causally related to the phenotypic change, probably through altering the ability of intracellular M. abscessus to access iron.
To understand whether mutations across the genome might have co-evolved, indicating potential epistatic interactions between genes, we deployed correlation-compressed direct coupling analysis (ccDCA 8 ) on whole genome sequences from 2366 clinical isolates of M.
abscessus to identify whether variant co-occurrence deviated from the expected frequencies based on linkage disequilibrium 35,36 , and thus indicate evolutionary co-selection. We evaluated 10 12 potential couplings (resulting from approximately 10 6 genetic variants), and identified 1,168,913 that were significantly enriched (accepting a false discovery rate of 10 -6 ; Finally, we sought to integrate outputs from our detailed multidimensional phenotyping, structure-guided GWAS analysis, and DCA-based epistatic mapping, to achieve a systemslevel understanding of the genetic basis for important pathological processes in M. abscessus. We focused on in vivo infection in Drosophila, a model that replicates many of the features of human mycobacterial infection (Figure 4a) [40][41][42][43] . Amongst the top hits from our structureguided GWAS analysis (Figure 4b, Supplementary Figure 8) were a deletion in a component of a putative Type II secretion system (MAB_0471) and a deleterious mutation in a nonribosomal peptide synthetase (MAB_3317c). Both variants had independently arisen as homoplastic mutations across the M. abscessus phylogenetic tree (Figure 4c), including within the ancestor of one of the dominant circulating clones (DCC2) of M. a. abscessus, responsible for several transmission networks amongst CF patients 13,14 . We found that isolates with deletions in the Type II secretion system were associated with prolonged survival of infected Drosophila and more persistent clinical infection of CF patients (Figure 4d).
We sought to experimentally validate both these GWAS hits through inducible CRISPR-based transcriptional silencing (CRISPRi) as previously described 44 . Although we found no effect of Our DCA analysis revealed that both these GWAS hits were part of a discrete network of probably epistatic genes involved in bacterial secretion, cell wall biosynthesis, metabolism, and transcriptional regulation (Figure 4f, Supplementary Figure 10). To experimentally test this prediction of epistasis, we selected another gene from the same network (MAB_0472) and transcriptionally silenced it during in vivo infection. We found that Drosophila survival was also increased by its CRISPRi knockdown (Figure 4g)