Achromobacter genetic adaptation in cystic fibrosis

Achromobacter is an emerging pathogen in patients with cystic fibrosis (CF) and Achromobacter caused infections are associated with more severe disease outcomes and high intrinsic antibiotic resistance. While conventional CF pathogens are studied extensively, little is known about the genetic determinants leading to antibiotic resistance and the genetic adaptation in Achromobacter infections. Here, we analyzed 101 Achromobacter genomes from 51 patients with CF isolated during the course of up to 20 years of infection to identify within-host adaptation, mutational signatures, and genetic variation associated with increased antibiotic resistance. We found that the same regulatory and inorganic ion transport genes were frequently mutated in persisting clone types within and between Achromobacter species indicating convergent genetic adaptation. Genome-wide association study (GWAS) of six antibiotic resistance phenotypes revealed the enrichment of associated genes involved in inorganic ion transport genes, transcription gene enrichment in β-lactams, and energy production and translation gene enrichment in the trimethoprim/sulfonamide group. Overall, we provide insights into the pathogenomics of Achromobacter infections in patients with CF airways. Since emerging pathogens are increasingly recognised as an important healthcare issue, our findings on evolution of antibiotic resistance and genetic adaptation can facilitate better understanding of disease progression and how mutational changes have implications for patients with CF.


Introduction
Achromobacter species is an emerging pathogen causing chronic bacterial infections in patients with CF and is a major cause of morbidity and mortality in these patients. (1,2) Analysis of pathogen genomes, i.e. pathogenomics, have shown that within-host pathogen genetic adaptation plays a role in these infections. (3,4) While conventional CF pathogens, e.g. Pseudomonas aeruginosa and Staphylococcus aureus, are studied extensively, little is known about the extent within-and between-patient genetic adaptation has in Achromobacter infections, particularly A. ruhlandii, A. insuavis and A. xylosoxidans as they mainly cause chronic, long-term infections in patients with CF. (5-7) Furthermore, genetic features could be used for successful resistance profile predictions as conventional methods are both time consuming and occasionally do not reflect the in vivo susceptibility profiles. (8)(9)(10) Knowledge of within-host Achromobacter adaptation and genetic factors leading to antibiotic resistance development are key for urgently needed new treatment strategy development and pathogen elimination.
Here, we analyzed 101 previously whole genome sequenced (WGS) Achromobacter isolates from 51 patients to investigate the genetic relatedness and within-host genetic changes of Achromobacter over the course of up to 20 years of infections. First, we aimed to identify the main gene content differences between-and within-Achromobacter isolates. Second, we aimed to identify pace and patterns in genetic changes, and compare it to the other pathogenic bacteria in CF. Finally, we attempted to define the most significant associations between Achromobacter genetic features and antibiotic resistance phenotypes. Ultimately, this work on the main Achromobacter genomic changes acquired during infections in patients with CF, leads to the possibility of genomic-based disease progression prediction, and improved strategies to track and treat persistent airway infections.

Bacterial isolates
Our analysis included 101 clinical isolates of Achromobacter that were defined in detail previously by . (11) The isolates were sampled from 51 patients with CF attending the Copenhagen Cystic Fibrosis Center at Rigshospitalet, Copenhagen, Denmark. Over the timespan of 0-20 years (median 6.5 years), 64 isolates from 25 patients were longitudinally collected (median 2 isolates) and 37 isolates from 29 patients were single isolates. 29, 18 and 52 isolates belonged to A. ruhlandii, A. insuavis and A. xylosoxidans, respectively. Furthermore, single isolates belonging to A. aegrifaciens and a new genogroup were sequenced.

Bacterial genome sequencing and definition of clone type
Genomic DNA was prepared from Achromobacter isolates using a DNeasy Blood and Tissue kit (Qiagen) and sequenced on an Illumina MiSeq platform, generating 250 base paired-end reads. On average, 1,124,551 reads (range of 350,677-2,118,817) for each of the genomic libraries were obtained. Clone types were defined by Pactyper (12) using the default parameters and species' core genome defined by GenAPI. (13) Lineage was defined as all isolates belonging to the same species and the same clone type.

Bacterial genome assembly
Sequence reads from each isolate were corrected and assembled by SPAdes 3.10.1 (14) using default parameters and k-mer sizes ranging from 21 to 127. Assembled contigs were joined to 216 scaffolds on average (92-506).

Aggregated pan-genome generation, characterization and visualization
Aggregated pan-genome was created by clustering all pan-genomes from longitudinally collected lineages and de novo assemblies from single-isolate lineages with GenAPI. (13) Every gene in the aggregated pan-genome was then aligned back to the individual pangenomes/de novo assemblies to determine if the gene is (1) non-present in the lineage, (2) present and variable within the lineage or (3)

Substitution rate estimation
The nucleotide substitution rate (21) estimation was performed for each lineage containing 3 or more isolates sampled at different timepoints (10 lineages in total) by using BEAST 2.6.1.
(22) Sequence alignments from BacDist were used as input with the following parameters: (1) sequences were annotated with the sampling date ("dated tips"), (2) HKY substitution model with strict clock parameters, (3) gamma prior for clock rate, (4) prior for population size: 1/X, (5) tree prior: coalescent constant population. MCMC was run for 50,000,000 iterations. Convergence was checked by inspecting an effective sample size and parameter value traces in the Tracer 1.7.1 software. (23) Multiple tests for each sample were performed to ensure reproducibility and convergence. The obtained clock rate (per site per year) was multiplied by the alignment size to obtain a substitution rate per genome per year.

Virulence and antibiotic resistance gene identification
Orthologs of resistance and virulence genes in 61 Achrmobacter lineages were identified with Abricate (24) using VFDB (containing 2,597 genes; retrieved: 2020.09.21) (25) for virulence genes and Resfinder 4 (containing 3,122 genes; retrieved: 2020.09.21) (26) for resistance genes. Gene ortholog was considered present in the corresponding database if the alignment made up minimum 50% of the gene length and its identity was minimum 75%.

Frequently mutated gene definition
Most frequently mutated genes were defined as the top 1% of all mutated genes for the species. If there were more genes mutated with the same frequency as the 1% most frequently mutated genes, these genes were also included in the final analysis. The identified most mutated genes were annotated by EGGNOG-mapper 2.0.1 (27) using DIAMOND and EGGNOG's bacterial database.
P. aeruginosa orthologs were identified by performing clustering with CD-HIT (28) using word size of 3 and 50% identity thresholds. Joint Achromobacter most frequently mutated genes were identified by clustering with CD-HIT (28) with word size of 3 and 80% identity threshold. different antibiotic resistance phenotypes and de novo assembled scaffolds of 92 isolates for which the antibiotic susceptibility profiles were available (Gabrielaite et al., 2020 (11) for detailed information on isolate susceptibility). Core genome SNP-based phylogenetic tree was used to correct for population structure while all available annotations of Achromobacter genes from UniProt database (30) were used for unitig annotation (271,851 genes; retrieved: 2020.04. 19). Ten most significant unitigs for each antibiotic test were used for further analysis as the tool authors advise against using a p-value threshold when testing several phenotypes. (29) Achromobacter gene annotations were identified by clustering significant unitig gene sequences with CD-HIT (28) with word size of 3 and 80% identity threshold.

Genome-wide association study with antibiotic resistance phenotypes
Enrichment of COG annotated genes was estimated by comparing the fraction of the associated genes with the fraction in the Achromobacter reference genomes for the 5 most frequently associated gene groups where more than one gene was present in a group.

Achromobacter dataset and incorrect annotations of public genomes
The genomes of 101 Achromobacter isolates from the airways of 51 patients with CF attending the Copenhagen Cystic Fibrosis Center at Rigshospitalet were sequenced to follow the within-host evolution and genetic adaptation of the lineages over the initial 0-20 years of infection ( Figure 1). All isolates were previously defined as belonging to 5 different species: A. ruhlandii (N=29), A insuavis (N=18), A. xylosoxidans (N=52), A. aegrifaciens (N=1) and a new genogroup (N=1). The latter two species were excluded from further analysis as the species contained only a single isolate. The remaining 99 Achromobacter genomes were grouped to 61 lineages which were defined as all isolates from one patient belonging to the same species and clone type.
We first performed phylogenetic analysis for all Achromobacter genomes available in the RefSeq database (141 samples, Table S1) together with our clinical Achromobacter genomes.
Our sequenced genomes were widely distributed across the genetic variability observed  Table S1).

Aggregated pan-genome
Aggregated pan-genome was constructed from pan-genomes of each of the 61 lineages (35 of which were single-isolate lineages). This approach allowed us to account for the nature of the dataset where multiple clonal isolates from the same patient were available. The aggregated Achromobacter pan-genome consisted of 21,198 genes: 2,887 core genes, 18,311 accessory genes of which 6,917 genes were unique to a single lineage.
The aggregated pan-genome ( Figure 2B) contained Achromobacter species-specific genes (649 for A. ruhlandii, 648 for A. insuavis, and 494 for A. xylosoxidans) present in all isolates of the respective species but not in the isolates from other species. Pan-genomes for each Achromobacter species were defined by using all bacterial isolates available  for the species. The size of the species' pan-genomes contained 7,070-14,833 genes, of which 4,225-5,130 were core genes, 1,940-10,608 accessory genes and 976-3,162 isolate-unique genes ( Table 1).  The aggregated pan-genome of 61 Achromobacter lineages containing gene presence, absence, and gene variability (i.e., gene is present in some isolates while absent in other isolates within the lineage) information.

Genetic adaptation: Mutations of the same genes across lineages
To explore within-host genetic adaptation in Achromobacter, we first identified the genes which were most frequently mutated within each species. Genes were defined as frequently mutated if they were among the 1% most commonly mutated genes within species. If more than 1% of the genes were mutated with the same frequency, those genes were also included in the analysis. A total of 27, 16, and 28 genes were identified as most frequently mutated for A. ruhlandii, A. insuavis, and A. xylosoxidans, respectively ( Figure 3B). The clusters of orthologous groups (COG) functional annotations were performed for all species ( Figure 4A; Table S2 for detailed information) with the highest mutation frequency in genes coding for signal transduction (COG T); inorganic ion transport and metabolism (COG P); replication, recombination and repair (COG L); intracellular trafficking, secretion and vesicular transport (COG U); and transcription (COG K).
Although knowledge is lacking about many Achromobacter gene functions, bacterial sequence similarity analysis allowed us to identify possible antibiotic resistance and virulence-related genes among the most frequently mutated genes in Achromobacter. After manual literature search 10, 4 and 4 genes were defined as related to antibiotic resistance in A. ruhlandii, A. insuavis and A. xylosoxidans, respectively, whereas 8, 2 and 7 genes were defined as virulence-related genes (Table S2).
Seven genes were defined as candidate pathoadaptive genes in several Achromobacter species. One gene was frequently mutated in all three species (10 out of 26 lineages) and six genes were observed as most frequently mutated in two of the Achromobacter species ( Figure   3C; Table 2). Six lineages did not acquire a single mutation in any of the seven pathoadaptive genes; however, 13 lineages acquired mutations in three or more genes ( Figure 3C). Lineages which acquired mutations in three or more pathoadaptive genes acquired significantly less non-synonymous mutations than lineages with less mutated pathoadaptive genes (p-value=4.99·10 -2 ; Fisher's exact test). Furthermore, lineages only acquired non-synonymous mutations in the seven candidate pathoadaptive genes (Table S3).  The ratio of non-synonymous to synonymous substitutions (dN/dS) was significantly different between the 1% most frequently mutated genes and non-frequently mutated genes in Achromobacter (dN/dS 99% =1.21 vs dN/dS 1% =2.10; Fisher's exact test; p=3.4·10 -4 , respectively).

Finally, we investigated gene loss and acquisition patterns in 26 longitudinally collected
Achromobacter lineages. We observed genes to be 2 times more often lost than acquired, and lost/acquired in groups rather than individually; however, no convergent evolution patterns in Achromobacter gene loss/acquisition were identified (detailed analysis in Text S1).

Genome-wide association between Achromobacter genotypes and antibiotic resistance
To test for associations between bacterial genetics and antibiotic resistance phenotypes we performed unitig-based DBGWAS analysis. Out of 21 antibiotics where the resistance was phenotypically tested, only 10 antibiotics had both susceptible and resistant isolates from all three Achromobacter species. Accordingly, we performed the association analysis for these 10 antibiotics (see Materials and Methods for detailed information). Unitigs passed a 5% FDR corrected q-value threshold only for CHL, IPM, MEM, TZP, SMZ and SXT. Ten most significant unitigs were used for the six remainder association test analysis, resulting in 60 genes (50 unique genes) significantly associated with antibiotic resistance phenotypes ( Figure   4B; Table S5). The most abundant group of associated unitigs belonged to inorganic ion transport (N=13; 2.4x enriched; COG P) genes. The other four most abundantly associated genes belonged to transcription (N=8; COG K); energy production and conversion (N=7; COG C); translation, ribosomal structure and biogenesis (N=6; COG J), and defense mechanism (N=5; COG V) groups. Furthermore, transcription genes were enriched in the β lactam (IPM, MEM and TZP) antibiotic group (N=7; 2.1x enriched) while translation, ribosomal structure and biogenesis genes were 5.4x enriched in trimethoprim/sulfonamides (SMZ and SXT) group (N=4). Defense mechanism genes were 9.2x and 9.0x enriched in β lactam and trimethoprim/sulfonamide groups, respectively while energy production and conversion genes were 2.9x enriched in a trimethoprim/sulfonamide group.
Of 9 antibiotic resistance genes from the ResFinder 4 database which were present in the aggregated pan-genome, none were associated with antibiotic resistance phenotypes in the GWAS analysis. Furthermore, core and accessory genes were equally associated with antibiotic resistance phenotypes (Table S5).

Discussion
Achromobacter is an emerging pathogen causing chronic respiratory tract airway infections in patients with CF; however, the genetic epidemiology of these infections is little understood. We sequenced and analyzed 101 genomes of Achromobacter isolates from 51 patients with CF which is the largest longitudinally collected Achromobacter genome dataset available to-date. This allowed us to investigate the population genomics and within-host adaptation, including genome-wide association analysis with antibiotic resistance phenotypes.
Phylogenetic analysis of our dataset with 141 publicly available Achromobacter genomes from the RefSeq database revealed that our dataset well represented the genetic diversity of A. xylosoxidans and A. insuavis species. However, because of DES overrepresentation, A.
ruhlandii isolates did not reflect the species genetic diversity. Furthermore, ANI together with core-genome based phylogenetic analysis revealed that more than 10% of publicly available Achromobacter genomes are supposedly misannotated in the RefSeq database which we anticipated to unravel to ease future research on Achromobacter (Table S1).
From the aggregated pan-genome analysis we showed that core genome sizes were comparable between A. ruhlandii, A. insuavis and A. xylosoxidans which are similar to core genome size identified previously by Li et al. 2013 (35). Nevertheless, the number of accessory and unique genes in species' pan-genomes varied greatly as a result of the different number of independent genomes for each species. A. ruhlandii isolates mostly belonged to DES, which is spread through patient-to-patient transmission (11,31), therefore, have lower pan-genome plasticity than A. xylosoxidans or A. insuavis.
Unlike many bacterial species, substitution rates for Achromobacter are not known. ( SNPs/year per site) (41).
While it was previously suggested to use bla OXA genes for Achromobacter species typing (42)(43)(44), we showed that in some cases such strategy would not be sufficient for species identification as isolates can carry none of the bla OXA genes. Furthermore, we identified that A. insuavis can carry one out of several highly similar bla OXA genes which might further complicate such species typing approach. We also show that none of the A. insuavis isolates carried catB10 chloramphenicol resistance gene ortholog; however, these identified resistance genes alone were not sufficient to explain the differences in antibiotic resistance phenotype between lineages and species. insuavis and A. xylosoxidans genes confirms that there is strong selective pressure for changes in these genes during adaptation to the patients with CF airway. Altogether, dN/dS>1 in both frequently and non-frequently mutated genes show that more than the top 1% mutated genes are under selective pressure; nonetheless, a larger dataset is needed to identify more genes without sacrificing analysis accuracy.
AxyZ (mexZ ortholog), which was the only candidate pathoadaptive gene in all three species, is involved in the development of multidrug resistance by regulating AxyXY-OprZ RND-

Conclusions
In conclusion, by using the largest dataset to-date of Achromobacter clinical isolates from patients with CF, we used a comprehensive analytical framework for thorough bacterial genomic data analysis. Thus, we identified pathoadaptive and antibiotic resistance genes in Achromobacter causing CF airway infections. Furthermore, we showed that current knowledge about antibiotic resistance gene presence or mutations in those genes cannot sufficiently explain the resistance phenotypes, and GWAS offers a new approach of addressing this problem. The gained knowledge allows us to better understand the requirements for successful Achromobacter adaptation during infection in airways of patients with CF which could help accurately predict antibiotic susceptibility and clinical progression of Achromobacter infections, and further the development of urgently needed optimized treatment strategies.

Availability of data and materials
Achromobacter whole genome sequencing data is available at European Nucleotide Archive under study accession number PRJEB39108.

Supplementary information
Text S1. Achromobacter gene presence-absence analysis results.    (2) variable in a group or individually.