Genetic adaptation and transmission of Achromobacter in cystic fibrosis

Abstract Achromobacter species are increasingly being detected in patients with cystic fibrosis (CF), and this emerging pathogen has been associated with antibiotic resistance and more severe disease outcomes. Nonetheless, little is known about the extent of transmission and genetic adaptation in Achromobacter infections. We sequenced the genomes of 101 clinical isolates of Achromobacter collected from 51 patients with CF—the largest longitudinal dataset to-date. We performed a comprehensive pathogenomic analysis to identify pathogen population structure, within-host adaptation, mutational signatures, patient-to-patient transmission events, and associated genetic variation with antibiotic resistance phenotypes. We found that the population of Achromobacter isolates was composed of five different species where A. xylosoxidans accounted for 52% of the infections. Most patients were infected by unique Achromobacter clone types; nonetheless, patient-to-patient transmission events identified by shared clone types were observed in 35% (N=18) of patients. We found that the same regulatory and inorganic ion transport genes were frequently mutated in persisting clone types within and between species indicating convergent genetic adaptation. Genome-wide association study (GWAS) of six antibiotic resistance phenotypes revealed that the majority of associated genes were involved in transcription and inorganic ion transport. Overall, we provide insight into pathogenomics of chronic Achromobacter infections and show the relevance of whole genome sequencing of clinical isolates. Our findings on evolution and genetic adaptation can facilitate the understanding of disease progression, inform antibiotic treatment, and identify patient-to-patient transmission.


Introduction
patients with CF, ultimately leading to the possibility of genomic-based disease progression prediction and improved strategies to track and treat persistent airway infections. . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint

Population structure-five different species
The genomes of 101 Achromobacter isolates from the airways of 51 patients with CF were sequenced to follow the within-host evolution and genetic adaptation of the lineages over the initial 0-18 years of infection ( Figure 1). All isolates prior to this study were identified in the routine clinical microbiology laboratory by MALDI-TOF or API N20 typing to belong to A. xylosoxidans species.
We first compared the sequenced genomes with eight publicly available complete Achromobacter reference genomes. While the reference genomes were annotated as A.
xylosoxidans in the RefSeq database, average nucleotide identity (ANI) analysis with the identity threshold of 95% showed that the eight reference genomes represented four different species (Table S1). We then conducted further ANI analysis and phylogenetic analysis on all Achromobacter genomes available in the RefSeq database (141 samples, Table S2) together with clinical Achromobacter isolates to identify how our dataset is representative of the overall Achromobacter genomes. Our phylogenetic analysis ( Figure 2A) revealed that Achromobacter annotation is inconsistent among the RefSeq genomes and requires correction to improve species designation (suggested corrections in Table S2).
The genomes of our clinical isolates clustered into five species when using ANI threshold of 95%: A. ruhlandii, A. xylosoxidans, A. insuavis, A. aegrifaciens and a new genogroup ( Figure   2B). In further analysis, each species was analyzed separately using genetically closest complete reference genome (see Materials and Methods for detailed information).
Out of 51 patients, 15 (25%) were infected with A. ruhlandii, 12-with A. insuavis (20%), 31-with A. xylosoxidans (52%), and 2-with other Achromobacter species. Furthermore, isolates from 13 out of 15 patients infected with A. ruhlandii species belonged to the previously defined . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint  CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint phylogenetic tree can be accessed on Microreact webserver (70). (B) Phylogenetic tree of 101 Achromobacter clinical isolates. Colors represent bacterial isolates from different patients; black arrows point to Achromobacter reference genomes which were included in the phylogenetic analysis. The phylogenetic tree can be accessed on Microreact webserver (71). (C) Core genome SNP-based phylogenetic tree from de novo assembled DES isolates of which 7 were from patients attending CF center in Aarhus (blue label background; (11)) and 25 were from patients attending CF center in Copenhagen. The four patient clusters are separated by branch and label colors. The phylogenetic tree can be accessed on iTOL webserver (72). (D) All identified putative patient-to-patient transmission events, the smallest pairwise distances between the isolates are stated above the arrows. Hypermutator lineages are marked with an asterix. AX01-A. ruhlandii, AX02-A. insuavis and AX03-A. xylosoxidans isolates.
DES clone type ( Figure 2C). Finally, 10 out of 51 patients were infected with more than one clone type (N=2) or species (N=9) of Achromobacter ( Figure 1). The A. aegrifaciens and the new genogroup isolates were both genetically distant from all other isolates, and were excluded from further analysis. The remaining 99 Achromobacter isolates were grouped into 61 lineages: i.e., all isolates of the same species and the same clone type isolated from the same patient (1-4 isolates per lineage).

Aggregated pan-genome analysis
The aggregated pan-genome was constructed from pan-genomes of each of the 61 lineages (35 of which were single-isolate lineages). This approach allowed us to account for the nature of the dataset where multiple clonal isolates from the same patient are available. The aggregated pan-genome consisted of 21,198 genes: 2,887 core genes, 18,311 accessory genes of which 6,917 genes were unique to a single lineage.
The aggregated pan-genome ( Figure 3A) contained Achromobacter species-specific genes (649 for A. ruhlandii, 648 for A. insuavis, and 494 for A. xylosoxidans) present in all isolates of the respective species but not in the isolates from other species which further supported the three species separation. Pan-genomes for each Achromobacter species were defined by using . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint all available bacterial isolates available  for the species . The size of the species' pangenomes contained 7,070-14,833 genes, of which 4,225-5,130 were core genes, 1,940-10,608 accessory genes and 976-3,162 isolate-unique genes (Table 1).

Achromobacter transmission between patients
For all three Achromobacter species, our genomic analysis identified events of patient-topatient transmission. Transmission events were identified based on pairwise SNP distance and phylogenetic relationships of isolates. We furthermore attempted to estimate the date of the most recent common ancestor (MRCA) for all suspected transmission events ( Figure 1; Table S3). In total, 13 transmission events were identified with one, two, and three transmitted clone types in A. ruhlandii, A. insuavis and A. xylosoxidans, respectively, among 16 patients ( Figure 1). It was previously reported that DES clone type (AX01DK01) has spread among Danish patients with CF (11), and we were able to identify several more recent transmission events between patients. Isolates where patient-to-patient transmission was suspected had relatively small SNP distances ( Figure 2D; Table S4) and/or a phylogenetic relationship ( Figure S1) in which isolates from one patient are within the clade of the isolates from another patient. Ultimately, we observed that patients were always infected with unique clone types if no transmission between patients was suspected (DES isolates were assumed to have spread through patient-to-patient transmission).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Overall, we observed high Achromobacter isolate diversity between and within patients in 26 longitudinally collected lineages: 0-1,034 SNP differences (average: 74 SNPs, median: 15 SNPs) were identified within lineages where late isolates contained 3-fold more SNPs than early isolates ( Figure 3B; Table S5).  (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint

Substitution rate and hypermutational signatures
Eleven of the 26 longitudinally sampled lineages showed an increased number of SNPs when accounting for the time period between isolates, and we defined these lineages as hypermutable ( Figure 3B). The excess number of SNPs in the hypermutable lineages was driven by an excess number of transition substitutions (higher than three-fold transition-totransversion (Ts/Tv) ratios were identified in hypermutator lineages; Figure 3C). We aimed to identify if any of the hypermutator lineages had insertions, deletions or frameshifts in the mismatch repair (MMR) system and a total of seven mutations were identified in mutL (p.Ala376fs; p.Ala384delinsProPro; p.Ala384fs; p.Ala444-Ala445insAlaLeuAlaProGlnAla; p.Thr369-Pro370delinsAlaSerThrAla) and mutS (p.Gln174-Ala185del; p.Glu435fs) genes.
Of the 61 lineages, 18 (30%) contained at least one of these mutations in genes coding for the MMR system ( Figure 4A). Two hypermutable lineages did not have any insertions, deletions, or frameshift mutations in mutL and mutS genes while two non-hypermutator lineages had mutations in the MMR system genes. In sum, all nine longitudinal A. ruhlandii lineages were identified as hypermutators while only one lineage was defined as hypermutator for both A.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint

Genetic adaptation: Mutation of the same genes across lineages
To explore within-host convergent evolution in Achromobacter, we first identified the genes which were most frequently mutated within each species. Genes were defined as frequently mutated if they were among 1% most commonly mutated genes between lineages. If more than 1% of the genes were mutated with the same frequency, those genes were also included in the analysis. A total of 27, 16, and 28 genes were identified as most frequently mutated for A. ruhlandii, A. insuavis, and A. xylosoxidans, respectively ( Figure 4C). The clusters of orthologous groups (COG) functional annotations are summarized in Figure 5A for all species (Table S8 for detailed information) with the highest mutation frequency in genes coding for signal transduction; inorganic ion transport and metabolism; replication, recombination and repair; and transcription.
Although knowledge is lacking about many Achromobacter gene functions, bacterial sequence similarity analysis allowed us to identify possible antibiotic resistance and virulence-related genes among the most frequently mutated genes in Achromobacter. After manual literature search 10, 4 and 4 genes were defined as related to antibiotic resistance in A. ruhlandii, A. insuavis and A. xylosoxidans, respectively, whereas 8, 2 and 7 genes were defined as virulence-related genes (Additional File 1: Table S9).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint TonB-dependent hemin, ferrichrome receptor Transport One gene was frequently mutated in all three species and six genes were observed as most frequently mutated in two of the Achromobacter species (Table 2) Finally, the ratio of non-synonymous to synonymous substitutions (dN/dS) was significantly different between the 1% most frequently mutated genes and non-frequently mutated genes ( Figure 5C) in A. ruhlandii and A. xylosoxidans (Fisher's exact test; p<2.2·10 -16 and p=1.6·10 -6 , respectively).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Figure 5. Functional groups and relative rate of nonsynonymous mutations of genes associated with host adaptation. COG annotation of (A) three Achromobacter reference genomes and the most frequently mutated genes, and (B) the aggregated pan-genome and most frequently lost or acquired genes. (C) Non-synonymous to synonymous (dN/dS) mutation ratio between most frequently mutated and non-frequently mutated genes in each Achromobacter species. Red stars mark significant differences between two groups after performing Fisher's exact test. (D) COG annotations of the most significant unitigs associated with six different antibiotic resistance phenotypes.

Gene presence-absence analysis
As a crucial part of bacterial adaptation and evolution, we investigated the most frequently lost and acquired genes in the 26 longitudinally sampled Achromobacter lineages. We defined a gene as lost if it was present in the first isolate but absent in one or more of the later . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint isolates and a gene was defined as acquired if it was absent in the first isolate but present in one or more of the later isolates. The lineage pan-genomes contained on average 5,940 (5,709-6,377) genes of which on average 62 (14-183) were lost or acquired. In total, we found 1,464 genes to be variable within lineages and 735 of these were unique to a single lineage in the aggregated pan-genome ( Figure 3A). We observed that genes were 1.8 times more often lost than acquired and lost/acquired in groups 37 times more commonly than individually (Table S10).
Most frequently lost or acquired genes were defined if a gene was lost/acquired in a minimum of 3 lineages. In total, 38 genes passed these criteria; however, after the manual inspection, one gene was defined as a false positive: the gene was manually assessed as present, yet the gene was not assembled with a minimum requirement of 25% of the gene length coverage, thus was predicted to be absent by GenAPI. The remaining 37 genes that were frequently lost or acquired, were annotated with EGGNOG-mapper: transport and metabolism genes ( Figure 5B, colored in blue) were more frequently lost/acquired, especially amino acid transport and metabolism genes, when compared to the composition of the aggregated pan-genome. Contrary, structural, cellular organization and cell cycle genes; reproduction; and gene expression regulating genes were less frequently lost/acquired ( Figure   5B). Overall, aggregated pan-genome gene composition is comparable to individual reference genome gene composition ( Figure 5A and 5B).

Genome-wide association between Achromobacter genotypes and antibiotic resistance
To test for associations between bacterial genetics and phenotypic antibiotic resistance we performed unitig-based DBGWAS analysis. Out of 21 antibiotics where the resistance was phenotypically tested, only 10 had both susceptible and resistant isolates from all three Achromobacter species for which the association analysis was performed (see Materials and . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Methods for detailed information; Figure S2; Table S11). No unitigs passed a 5% FDR corrected q-value threshold for AMC, CAZ, CST, and TGC. Ten most significant unitigs were used for the six remainder association test analysis, resulting in 60 gene features (49 unique genes) significantly associated with antibiotic resistance phenotypes ( Figure 5D; Table S12). The majority of associated unitigs were annotated by EGGNOG-mapper as belonging to transcription regulation (N=11) or inorganic ion transport (N=14) genes. Furthermore, energy production and conversion (N=7), carbohydrate production and conversion (N=4), and defense mechanism (N=4) genes were also observed to be significantly associated with the resistance phenotype in one or more antibiotics. Of 49 genes, 12 were associated with known antibiotic resistance phenotypes and were previously described as important for antibiotic resistance development in other bacteria (Table S12). xylosoxidans isolates were not susceptible to any antibiotic.

Discussion
Achromobacter is an emerging pathogen causing chronic respiratory tract infections in patients with CF; however, the genetic epidemiology of these infections is little understood.
We sequenced and analyzed 101 genomes of Achromobacter isolates from 51 patients with CF which is the largest longitudinally collected Achromobacter genome dataset available todate. This allowed us to investigate the population genomics, within-host adaptation, mutational mechanisms, including antibiotic resistance phenotype and bacterial genetics association, and patient-to-patient transmission events with high sensitivity. Furthermore, we defined a comprehensive analytical framework that can be applied to other research involving exploration of within-host adaptation of pathogenic bacteria (Figure 6).
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Our genomic analysis showed that MALDI-TOF/API 20 is not accurate for Achromobacter species-level typing as it has also been recently indicated by others (18,19); thus we suggest that sequencing of marker genes (e.g., bla OXA or nrdA) or whole genome (WGS) should be used for species typing in the clinical setup. . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Core genome sizes were comparable between A. ruhlandii, A. insuavis and A. xylosoxidans even though the number of accessory and unique genes in species' pan-genomes varied greatly. A. ruhlandii isolates mostly belong to DES (as confirmed by the phylogenetic analysis) which are known to be spread through patient-to-patient transmission and, therefore, have lower pan-genome plasticity than A. xylosoxidans or A. insuavis. Moreover, the phylogenetic analysis from patients attending Aarhus and Copenhagen CF centers also suggests multiple transmission events between patients attending different CF centers. insuavis species. However, because of DES overrepresentation, A. ruhlandii isolates did not reflect the species genetic diversity. Furthermore, ANI and core-genome based phylogenetic analysis revealed that more than 10% of publicly available Achromobacter genomes are supposedly misannotated in the RefSeq database which we anticipated to unravel to ease future research on Achromobacter.
The majority of Achromobacter infections are acquired from the environment and prevalence of patient-to-patient transmission remains controversial: while some studies identified patients being infected with unique clone types of Achromobacter (20)(21)(22) other studies . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint reported the cases of suspected patient-to-patient transmission. (23,24) We identified multiple transmission events in each Achromobacter species which occasionally were transmitted to multiple patients. However, unlike in P. aeruginosa, the observed SNP distance between the transmitted isolates was higher (13); hence, complicated the successful transmission identification and might explain why no patient-to-patient transmission was discovered in previous studies. (20)(21)(22) Interestingly, DES-a known hypermutator-is widely transmitted between patients with CF even though some studies show reduced transmissibility of hypermutator strains. (25) Finally, we observed that infection of the same clone type in several patients was always a sign of patient-to-patient transmission of Achromobacter. This knowledge could be applied in diagnostics where sharing an Achromobacter clone type would be treated as the first sign of suspected transmission between patients and would lead to further testing. Unlike many bacterial species, substitution rates for Achromobacter are not known. (26) Here, we successfully estimated the within-host substitution rate for A. xylosoxidans species included in the analysis. Our A. ruhlandii dataset only consisted of hypermutator lineages which led to a high substitution rate estimate, and our substitution rate estimate for A.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Moreover, ortholog virulence gene analysis revealed that Achromobacter carries several virulence factors without clear differences between species with several virulence gene orthologs coding for host cell invasion (cheW and cheY) and facilitating evasion of the host immune response (brkB (31)) were observed in all lineages. Furthermore, contrary to being commonly found among A. insuavis and A. xylosoxidans, 9 virulence gene orthologs (bcrD, bplA, bplB, bplC, bplF, bscJ, bscN, bscR, and bscS) were not identified in any of the DES lineages, which are well-adapted to human airway, further supporting the adaptive trade-off evolution hypothesis that virulence genes are not required or are selected against in chronic infections. (26) While it was previously suggested to use bla OXA genes for Achromobacter species typing (32)(33)(34), we showed that in some cases such strategy would not be sufficient for species identification as isolates can carry none of the bla OXA genes. Furthermore, none of the A.
insuavis isolates carried catB10 chloramphenicol resistance gene; however, these observations alone were not sufficient to explain the differences in antibiotic resistance phenotype between lineages and species.
Among the candidate pathoadaptive genes (frequently mutated and lost/acquired genes), we identified multiple antibiotic resistance genes which were markedly more frequently mutated among A. ruhlandii isolates than A. insuavis or A. xylosoxidans isolates. This phenomenon might signal about the continuous adaptive evolution even in highly antibiotic-resistant strains. (7) Nonetheless, more antibiotic resistance and virulence genes among frequently mutated genes might be identified if gene annotation of Achromobacter reference genomes improved. Furthermore, axyZ (mexZ ortholog), which was the only frequently mutated gene in all three lineages, is involved in the development of multidrug resistance by regulating AxyXY-OprZ RND-type efflux system, hence is crucial during adaptation to the host . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint environment. (35) In sum, our identified candidate pathoadaptive genes (belonging to defense mechanism; signal transduction; inorganic ion transport and metabolism; transcription and replication gene functional classes) are comparable to the observations in P. aeruginosa infecting patients with CF (13) and other smaller-scale studies on Achromobacter. (16) The observed gene loss and acquisition patterns were comparable to the ones observed in P.
aeruginosa: genes were almost 2 times more often lost than acquired and 37 times more frequently lost in groups than individually; however, unlike in P. aeruginosa, no convergent loss or acquisition of gene clusters was observed. (36) High between-patient diversity of Achromobacter is comparable to previous observations of Marvig et al. (2015) (13) in P. aeruginosa. Noticeably higher dN/dS ratios among the most frequently mutated genes when compared with non-frequently mutated genes indicate adaptation and selective pressure from the environment for these genes to be mutated.
Additionally, high dN/dS ratios in both frequently and non-frequently mutated genes show that more than top 1% mutated genes are under selective pressure; nonetheless, a larger dataset is needed to identify more genes without sacrificing analysis accuracy.
To further explore the differences in antibiotic susceptibility between Achromobacter isolates, we performed a k-mer based GWAS analysis. Limited number of bacterial isolates and high innate resistance to certain antibiotics restricted our analysis to only 6 successful GWAS associations which revealed that transcription regulation and inorganic ion transport genes are likely playing a role in Achromobacter antibiotic resistance development. These findings are consistent with our identified putative pathoadaptive genes indicating that presence of antibiotics might be driving the selective pressure. Inorganic ion transport gene changes could have a secondary influence on antibiotic resistance as such changes help . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint overcome the problem of iron deficiency in the human airways allowing better intrinsically resistant bacteria survival despite the presence of antibiotics. Similar patterns were previously identified in other bacteria causing chronic infections in patients with CF. (26,37,38) Another frequently associated feature was transcription regulation genes; changes in such genes might modify the transcription of intrinsic antibiotic resistance genes. This phenomenon was previously described in both Achromobacter (39) and other bacterial pathogens (40,41).
Nevertheless, including more isolates for the bacterial GWAS analysis could reveal more putative genomic features associated with antibiotic resistance phenotype.
Our study has several limitations. First, even larger studies are necessary to further characterize and identify the genetic epidemiology of Achromobacter infections. Second, the lack of genome annotation and overall knowledge about Achromobacter limited the interpretation of putative pathoadaptive genes and genes associated with resistance phenotypes. Finally, a single isolate at a given time point is not sufficient to completely reflect the genetic diversity of the bacterial population (42,43); therefore, some of our findings might be the result of diversification and not the fixation of the adaptive mutations in

Achromobacter.
In conclusion, by using the largest dataset to-date of Achromobacter clinical isolates from patients with CF, we used a comprehensive analytical framework for thorough bacterial genomic data exploration which could facilitate future pathogenic bacteria genomic studies.
Our work revealed that WGS can be used for successful Achromobacter typing and patientto-patient transmission identification. Our analytical framework allowed us to explore the Achromobacter mutational and transmission patterns, identify putative pathoadaptive genes, locate virulence and antibiotic resistance genes. The gained knowledge allows us to better . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint understand the requirements for successful Achromobacter adaptation during chronic infection in airways of patients with CF which could help predict clinical progression of Achromobacter infections, further the development of urgently needed better treatment strategies and patient-to-patient transmission prevention.

Bacterial isolates
The analysis included 101 clinical isolates of Achromobacter that prior to this study were identified in the routine clinical microbiology laboratory as A. xylosoxidans by API N20 (bioMérieux) or MALDI-TOF typing (Bruker, Germany). The isolates were sampled from 51 patients with CF attending the Copenhagen Cystic Fibrosis Center at Rigshospitalet, Denmark. Over the timespan of 0-18 years (median 6.5 years), 64 isolates from 25 patients, belonging to the same species and clone type, were longitudinally collected (median 2 isolates) and 37 isolates from 29 patients were single isolates.

Bacterial genome sequencing and definition of clone type
Genomic DNA was prepared from Achromobacter isolates on a QIAcube system using a DNeasy Blood and Tissue kit (Qiagen) and sequenced on an Illumina HiSeq 2000 platform, generating 250-bp paired-end reads and using a multiplexed protocol to obtain an average of 1,124,551 reads (range of 350,677-2,118,817) for each of the genomic libraries. Clone types were defined by Pactyper (https://github.com/MigleSur/Pactyper) (44) using the default parameters and species' core genome defined by GenAPI. (45,46) Lineage was defined by all isolates belonging to the same species and the same clone type.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint

Bacterial genome assembly
Sequence reads from each isolate were corrected and assembled by SPAdes version 3.10.1 (47) using default parameters and k-mer sizes ranging from 21 to 127. Assembled contigs were joined to 216 scaffolds on average (92-506) and were used for further analysis.

De novo assembly-based phylogenetic tree generation
Core genome SNP-based phylogenetic trees of 101 de novo assembled Achromobacter isolates and DES isolates were generated using parsnp version 1. The joint phylogenetic analysis of all publicly available Achromobacter genomes from RefSeq database (50) (access date: 2019.10.29) and clinical Achromobacter isolates from patients with CF was performed on core genome SNPs by RAxML (51) using GTRCAT model. The core genome was defined during the aggregated pan-genome construction and was used as a reference for variant calling from de novo assemblies with Snippy. (52) Phylogenetic trees were visualized with Microreact webservice. (53)

Average nucleotide identity calculation
Wrong public Achromobacter genome annotation identification from RefSeq database (50) and separation of CF isolates of different species was performed by calculating ANI with fastANI version 1.11. (54) 95% threshold was used to define two bacterial isolates as different species.
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint

Aggregated pan-genome generation, characterization and visualization
An aggregated pan-genome was created by clustering all pan-genomes from longitudinal lineages and de novo assemblies from single-isolate lineages with GenAPI. (46)  SNP distances and phylogenetic analysis from BacDist were used for patient-to-patient transmission identification.

Hypermutator identification
Hypermutators were identified in longitudinal lineages by evaluating the number of pairwise SNP distances between isolates over the time interval (BacDist analysis (57)). If 10 or more SNPs/year were on average observed to be introduced, the lineage was concluded to be . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint hypermutable. Insertions, deletions and frameshifts in the MMR system mutL and mutS genes were evaluated to identify which genetic changes could cause hypermutability. (58)

Substitution rate and MRCA estimation
The bacterial substitution rate (59)

Frequently mutated gene definition
Most frequently mutated genes were defined as the top 1% of all mutated genes for the species. If there were more genes mutated with the same frequency as the 1% most frequently . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint mutated genes, these genes were also included in the final analysis. The identified most mutated genes were annotated by EGGNOG-mapper version 1.0.3 (65) using DIAMOND, EGGNOG's bacterial database, 1·10 -8 e-value cutoff, and 0.8 minimum query sequence coverage settings. The predicted COGs were used for further gene analysis.
Pseudomonas aeruginosa orthologs were identified by performing clustering with CD-HIT (66) using word size of 3 and 50% identity thresholds. Joint Achromobacter most frequently mutated genes were identified by clustering with CD-HIT (66) with word size of 3 and 80% identity threshold.

Gene loss/acquisition analysis
De novo assembled genomes were annotated using Prokka version 1.12 (67)  isolates for which the antibiotic susceptibility profiles were available. Core genome SNPbased phylogenetic tree was used to correct for population structure while all available annotations of Achromobacter genes from UniProt database (69) were used for unitig . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint annotation (271,851 genes; retrieved: 2020.04. 19). Ten most significant unitigs for each antibiotic test were used for further analysis as the tool authors advise against using a p-value threshold when testing several phenotypes. (68)  (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Availability of data and materials
Achromobacter whole genome sequencing data is available at European Nucleotide Archive under study accession number PRJEB39108.

Acknowledgments
Ulla Johansen is thanked for expert technical assistance and Niels Høiby is thanked for collecting the earliest Achromobacter isolates. All figures were partly or completely created using BioRender (https://biorender.com/).

Disclosure declaration
We declare no conflict of interest.

Ethics declarations
Use of the stored clinical isolates was approved by the local ethics committee at the Capital Region of Denmark RegionH (registration number H-4-2015-FSP). Table S1. ANI between eight complete Achromobacter genome RefSeq sequences used in this study. new proposed annotations which were based on ANI and core genome SNP-based phylogeny. Colored cells in the second column corresponds to the annotation which does not match with the new annotation. Isolates belonging to the same undescribed species are colored in the third column with the same color. Table S3. Estimated date of patient-to-patient transmission of Achromobacter isolates based on BEAST analysis and inferred time since most recent common ancestor (MRCA). Table S4. Smallest SNP distances within and between lineages involved in patient-to-patient transmission.   . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Supplementary information
The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint Table S8. List of most frequently mutated genes in Achromobacter with their COG function, corresponding E-value and gene annotation. xylosoxidans with gene description and gene relatedness to antibiotic resistance or virulence.
References defining gene's relatedness to antibiotic resistance or virulence are provided where relevant. Table S10. Table of pan-genome, core, accessory and unique genes in Achromobacter lineages. Variable genes are separated by whether they were (1) lost or acquired and (2) variable in a group or individually. Table S11. Isolate susceptibility data to 21 tested antibiotics. R -resistant to the antibiotic, Iintermediately resistant to the antibiotic, S -susceptible to the antibiotic. Table S12. Ten most significant GWAS associations (q-value<0.05) for each antibiotic susceptibility association test.  . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted August 4, 2020. ; https://doi.org/10.1101/2020.08.04.235952 doi: bioRxiv preprint