Abstract
Pseudomonas aeruginosa uses quorum sensing (QS) to coordinate the expression of multiple genes necessary for establishing and maintaining infection. lasR QS mutations have been shown to arise frequently in cystic fibrosis (CF) lung infections; however, there has been far less emphasis on determining whether QS system mutations arise across other environments. To test this, we utilized 852 publicly available sequenced P. aeruginosa genomes from the International Pseudomonas Consortium Database (IPCD) to study P. aeruginosa QS mutational signatures. We found that across all isolates, LasR is the most variable protein sequence compared to other QS proteins. In order to study isolates by source, we focused on a subset of 654 isolates collected from CF, wounds, and non-infection environmental sources, where we could clearly identify their source. Using this subset analysis, we found that LasR mutations are not specific to CF lungs, but are common across all environments. We then used amino acid length as a proxy for observing loss of function in LasR proteins among the strains. We found that truncated LasR proteins are more abundant in P. aeruginosa strains isolated from human infection than from the environment. Overall, our findings suggest that the evolution of lasR QS mutations in P. aeruginosa is common and not limited to infection environments.
Introduction
Pseudomonas aeruginosa is a Gram-negative opportunistic pathogen equipped with a large genome, which enables it to be metabolically versatile and capable of occupying a range of different habitats, especially human and animal impacted environments [1, 2]. It is intrinsically resistant to many classes of antibiotic, and it produces a range of tissue damaging extracellular products such as exoenzymes and phenazine pigments in order to aid its dissemination and spread within a host [2]. P. aeruginosa is one of the most prominent bacterial pathogens that colonizes cystic fibrosis (CF) lungs and it is often the dominant CF lung pathogen, particularly as the infection becomes more chronic over time [3]. In addition, P. aeruginosa frequently infects chronic wounds, often in conjunction with other microbial species [4].
One of the major adaptations of P. aeruginosa during chronic infection is the loss of quorum sensing (QS). QS in P. aeruginosa regulates the expression of hundreds of genes, including those that encode secreted products and virulence factors [5, 6]. In P. aeruginosa, QS is regulated by a complex hierarchical network of genes, composed of two complete N-acyl homoserine lactone (AHL) circuits, LasR-LasI and RhlR-RhlI, as well as an orphan regulator termed QscR [5, 7]. The Las and Rhl systems are composed of LuxR-LuxI pairs, homologous to other bacterial QS systems. The LuxR-type receptors (LasR, RhlR) act as transcriptional regulators, and the LuxI-type proteins (LasI, RhlI) are signal synthases. LasI produces 3-oxo-dodecanoyl-L-homoserine lactone (3OC12-HSL), and RhlI produces butanoyl-L-homoserine lactone (C4-HSL) [5]. Both signals can function in a combinatorial manner to regulate certain genes [8, 9]. Working in conjunction with the two AHL systems is an alkyl-quinolone (AQ) system, comprising the pqsABCDE operon and the pqsH, pqsL and pqsR (mvfR) genes. These genes drive the synthesis of, and response to, 2-heptyl-3-hydroxy-4-quinolone (the Pseudomonas quinolone signal, PQS), which is used as a QS signal and which also has iron chelating properties [10, 11].
LasR was first identified in 1991 as a regulator of the lasB (elastase) gene [12]. It has since been described as a key QS regulator in the well-studied laboratory strains PAO1 and PA14 [2], where it has been shown to sit at the top of the QS hierarchy, regulating both the rhl and pqs systems [5, 6]. lasR mutants have frequently been isolated from CF lungs [13–16] and, more recently, some CF strains have been shown to use RhlR to regulate the rhl and pqs systems in the absence of a functional LasR [6, 16–19]. The decoupling of the AHL QS hierarchy requires the inactivation of MexT, a regulator of the multi-drug efflux pump operon MexEF-OprN [18, 20]. PqsE and RhlR have also been shown to function as a ligand:receptor pair [21].
The ecological and evolutionary implications of QS re-wiring remain to be explored, and the drivers for lasR mutation before and during infection are unknown. To date, little is known about QS mutations outside of an infection environment. In this study, we therefore explored the diversity and frequency of QS mutations across a range of ecologically distinct environments in order to determine (i) which QS genes are frequently mutated; and (ii) whether there are mutational signatures, or patterns in QS gene mutation, specific to isolate source.
Results
We utilized the published sequences of 852 P. aeruginosa isolates from the International Pseudomonas Consortium Database (IPCD), a database representing a range of P. aeruginosa strains from different sources including rivers, infections and plants [22]. We queried a number of key QS genes from the las, rhl and pqs systems against gene sequences from PAO1 using BLASTn for all 852 isolates. We determined the putative amino acid sequence for each gene and calculated dissimilarity scores using BLOSUM80, an empirical amino acid substitution matrix [13]. All analyses were conducted in R version 4.0.2.
We first looked at the number of sequences we had for each QS protein, and the diversity of the protein sequences. When we queried each QS gene nucleotide sequence against the 852 isolates, the query returned fewer than 852 sequences for each gene. This disparity is likely due to gaps in sequences, gene deletions, and extensive mutations, preventing BLASTn from returning a hit. Fig. 1A shows that the lasR query returned the fewest sequences, suggesting that there are many strains that contain large deletions or truncations in lasR, or are missing the lasR gene entirely. After translating the sequences, we found that LasR also had the most unique protein sequences across the 852 isolates (Fig. 1B). Given this finding, we analyzed whether there was a mutational signature for the las system in order to determine whether certain kinds of mutation or divergence were specific to las genes, and whether these mutations were specific to isolate source. We created PCA plots of LasR (Fig. 1C) and LasI (Fig. 1D) proteins from all returned isolates and found that the LasR proteins were distributed across the PCA plot, and that the most divergent LasR proteins were truncated. Compared to LasR, the other key QS proteins were more conserved across isolates.
After analyzing all QS proteins, we then specifically focused on LasR. To determine if there were LasR mutations specific to each environment, we categorized the strains by source. Using data from the IPCD, we selected a subset of 654 strains labeled as “environmental”, “cystic fibrosis” or “CF”, and “wound” or “ulcer” or “burn” and reclassified them as environmental (209 strains), CF (396 strains), or wound (wound, ulcer and burn; 49 strains). The remaining 198 strains from the original set of 852 strains were of uncertain origin and were therefore not used in this particular analysis. To establish a threshold by which a protein could be deemed functional or not, we looked at truncated LasR proteins within each environment. We compared the amino acid length of LasR in the IPCD strains to the PAO1 LasR protein, which is equal in length to LasR in many commonly researched strains including PA14, PAK and an epidemic CF strain, LESB58. Our assumption was that a truncated protein, whether due to a shortened DNA sequence or a premature stop codon, would be nonfunctional. We used a stringent cut-off of 100% of the reference length, so that any protein shorter than full-length was considered truncated. Fig. 2A shows the proportion of each group that had truncated LasR proteins, with CF, environmental, and wound isolates having 20%, 11% and 30% truncations respectively. We also used a PCA plot to visualize LasR amino acid variation by environment using BLOSUM80-generated dissimilarity scores (Fig. 2B). Overall, we found that lasR mutations are ubiquitous across all environments, but that a larger percentage of strains with truncated LasR proteins is found in infection environments.
Discussion
In P. aeruginosa, lasR QS mutants are frequently isolated from human chronic infection, but it has remained unclear whether such mutants specifically evolve in infection environments or are common across multiple environments. Using a publicly available database of 852 fully sequenced isolates, including isolates from CF, wounds and non-infection (environmental) sources, we determined the frequency and pattern of lasR and other QS mutations in P. aeruginosa. We found that (i) LasR is the most variable protein of all the major QS proteins; and (ii) lasR mutations are found in isolates across all environments, suggesting that any environment can drive the evolution of these mutations.
But what does drive the evolution of lasR mutations and what fitness benefits do lasR mutations provide to P. aeruginosa isolates or populations? First, lasR mutants could arise in populations through social cheating, where mutants exploit the social interactions and exoproducts produced by lasR intact cells [23, 24]. Controlled experiments have shown that lasR mutants can socially exploit wild type cells in vitro [24] and in vivo [25], although it is unclear whether the spatial structuring found within infections will allow the close proximity of different isolates to allow for regular cheating. Importantly, QS genes have recently been shown to be down-regulated during infection compared to in vitro conditions, questioning the long-held belief that a functional QS system is essential for P. aeruginosa to establish and persist in human infections [6, 26]. This would likely reduce any fitness benefits of being a lasR mutant persisting via social cheating. Second, lasR mutants may have increased fitness in particular environments due to certain phenotypes driven by the mutation being beneficial. For example, lasR mutations have previously been shown to confer a growth advantage with particular carbon and nitrogen sources, including amino acids [27]. Third, lasR mutants may be more competitive than lasR positive cells, which provides fitness benefits against other P. aeruginosa strains or other species.
There are, however, likely evolutionary benefits for both the maintenance and loss of LasR so that both lasR positive and negative strains can stably coexist in heterogeneous populations and contribute to an overall community function. In recent support of this idea, it has been shown that (i) lasR- strains overproduce Rhl-associated factors and cross-feed wild type cells in low iron environments, which will likely impact infection dynamics of mixed populations [28]; (ii) mixed lasR +/− populations display decreased virulence in mouse models of infection [25]; and (iii) mixed populations exhibit enhanced tolerance to beta-lactam antibiotics [29]. Taken together, this suggests there are likely to be considerable fitness advantages to cells growing in heterogeneous QS populations, perhaps as a bet-hedging mechanism for future disturbance events.
Overall, our work highlights that lasR mutations are the most commonly found QS mutation across different environments, although we do not know whether mutations in the lasR gene always result in a loss of QS function. Indeed, recent studies on QS in P. aeruginosa have revealed that the complex and intertwined las, rhl, and pqs systems can be rewired in the event that lasR becomes mutated [9, 16, 18, 19]. It is not always clear whether these strains are entirely QS-null or if they have re-wired their QS systems to circumvent the loss of lasR. Further work is needed to determine why strains lose functional LasR proteins, and what fitness benefits the strains or community gain. Future work should focus more strongly on the ecology of mixed QS phenotypes to better understand QS involvement in infection and other environments. With ongoing work identifying QS inhibitors targeting the las QS system, the frequency of lasR mutated strains found in our study suggests that this particular approach may have limited efficacy.
Materials and Methods
Querying QS genes from the International Pseudomonas Consortium Database
Using nucleotide sequences from PAO1, we queried QS genes (see Fig. 1) using BLASTn for isolates from the IPCD [12]. We chose this strain because it is a fully sequenced, frequently used lab strain. We then translated these sequences into protein sequences and calculated putative amino acid sequence similarities using BLOSUM80 [13]. First, we compared genes found in each isolate against our reference strain, PAO1, normalized against the similarity of the reference against itself. We then calculated the mean dissimilarity score of all isolates compared to PAO1. Because some isolates were missing genes due to sequencing errors or true truncations, the number of isolates with a given gene present was under 852 for all genes. All analyses, including translation steps, were conducted in R version 4.0.2. All code and files are available on GitHub (https://github.gatech.edu/koconnor36/Frequency_of_quorum_sensing_mutations_in_Pa2021).
Creating an IPCD database using BLASTn
We pulled IPCD data from GenBank. We used the makeblastdb command to generate a database of all isolate contigs.
Using BLASTn to find QS genes for each isolate
Using our generated database, we queried the PAO1 sequence of each gene, obtained from GenBank, against the database. We generated csv files for each gene, which included the gene sequences for each isolate.
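As a rough illustration of the two BLAST+ steps described above (not the commands used in this study), the calls could be assembled as follows. The file names, the database name and the chosen tabular output columns are assumptions made for the example. The code only builds the argument lists and does not run BLAST.

```python
from typing import List


def makeblastdb_cmd(contigs_fasta: str, db_name: str) -> List[str]:
    """Argument list for building a nucleotide BLAST database from contigs."""
    return ["makeblastdb", "-in", contigs_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_fasta: str, db_name: str, out_tsv: str) -> List[str]:
    """Argument list for querying one PAO1 gene against the contig database."""
    # Tabular output (format 6) that also reports the matched subject sequence.
    # The column selection is an assumption, not the study's setting.
    columns = "6 qseqid sseqid pident length sstart send evalue sseq"
    return ["blastn", "-query", query_fasta, "-db", db_name,
            "-outfmt", columns, "-out", out_tsv]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(makeblastdb_cmd("ipcd_contigs.fasta", "ipcd_db"))
    print(blastn_cmd("lasR_PAO1.fasta", "ipcd_db", "lasR_hits.tsv"))
```

Each list can be passed to subprocess.run once BLAST+ is installed. Keeping the arguments as a list avoids shell-quoting problems with the multi-word -outfmt value.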
Translating nucleotide to amino acid sequence
We translated genes to proteins using a custom R script. We first retained only sequences starting with a canonical ATG start codon. We excluded sequences with fully unresolvable nucleotides (coded as “-”), but allowed fuzzy codons so long as they resolved to unambiguous amino acids. We translated the sequences meeting these criteria using the translate function from the Biostrings R package (v.2.58.0).
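The filtering and translation rules above can be sketched in a few lines. This is a Python illustration, not the custom R script or the Biostrings translate function used in this study. The function names are hypothetical, and stopping at the first stop codon is an assumption made here so that a premature stop yields a shorter protein.

```python
from itertools import product
from typing import Optional

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> Optional[str]:
    """Return the residue if every expansion of a fuzzy codon agrees, else None."""
    if any(base not in IUPAC for base in codon):
        return None
    expansions = product(*(IUPAC[base] for base in codon))
    residues = {CODON_TABLE["".join(e)] for e in expansions}
    return residues.pop() if len(residues) == 1 else None


def translate_gene(seq: str) -> Optional[str]:
    """Translate a gene, or return None if it fails the filters described above."""
    seq = seq.upper()
    if not seq.startswith("ATG") or "-" in seq:
        return None  # no canonical start codon, or an unresolvable position
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = translate_codon(seq[i:i + 3])
        if residue is None:
            return None  # fuzzy codon that maps to more than one amino acid
        if residue == "*":
            break  # assumption: translation ends at the first stop codon
        protein.append(residue)
    return "".join(protein)
```

For example, the fuzzy codon GCN resolves to alanine whichever base N stands for, so it is accepted. RAY could encode asparagine or aspartate, so a sequence containing it is rejected.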
Calculating dissimilarity scores for isolates’ QS proteins
All sequence analyses were performed in R (v.4.0.2) using the Biostrings package v.2.58.0. We compared isolate protein sequences to PAO1 protein sequences using BLOSUM80, a substitution matrix suited to comparing closely related protein sequences, such as those within a species. We found that close to 50% of all isolate proteins were identical to PAO1.
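One way to read the normalization described in the Methods (isolate-versus-reference similarity divided by the reference's similarity to itself) is sketched below. This is an illustration under stated assumptions rather than the analysis code. The exact formula, the ungapped position-by-position comparison and all function names are assumptions, and a toy match/mismatch matrix stands in for the real BLOSUM80 values.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple

Matrix = Dict[Tuple[str, str], int]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_matrix(match: int = 5, mismatch: int = -2) -> Matrix:
    """Placeholder substitution matrix; these are NOT BLOSUM80 scores."""
    return {(a, b): (match if a == b else mismatch)
            for a in AMINO_ACIDS for b in AMINO_ACIDS}


def similarity(seq_a: str, seq_b: str, matrix: Matrix) -> int:
    """Ungapped score over shared positions (zip stops at the shorter sequence)."""
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


def dissimilarity(isolate: str, reference: str, matrix: Matrix) -> float:
    """1 minus the isolate-vs-reference score, normalized by the reference self-score."""
    self_score = similarity(reference, reference, matrix)
    return 1.0 - similarity(isolate, reference, matrix) / self_score


def mean_dissimilarity(isolates: Iterable[str], reference: str, matrix: Matrix) -> float:
    """Average dissimilarity of a set of isolate proteins to the reference."""
    return mean(dissimilarity(seq, reference, matrix) for seq in isolates)
```

Under this definition a protein identical to the reference scores 0, and both substitutions and truncations raise the score. Positions missing from a truncated protein contribute nothing to the numerator.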
Determining truncation rates for LasR and categorizing isolates by location
We compared the length of the LasR protein from each isolate with the length of the reference LasR protein from PAO1. If the isolate protein was shorter than the PAO1 protein (less than 100% of its length), we categorized it as truncated. Sequences were categorized as CF-originated (CF), environmental (ENV), or wound (WND). If the sequence was entered into IPCD as environmental, we adopted that label. Additionally, we included sequences labeled as originating from animal hosts as environmental. For CF, we only included sequences with sources explicitly labeled as CF. For wound, we included sequences labeled as wound, ulcer, and burn.
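The categorization and truncation rule above can be sketched as follows. This is illustrative Python, not the code used in this study. The label strings in the lookup table are assumptions about how sources might be spelled in the metadata, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumed spellings of source labels, mapped to the three groups used here.
SOURCE_GROUPS = {
    "environmental": "ENV", "animal": "ENV",
    "cystic fibrosis": "CF", "cf": "CF",
    "wound": "WND", "ulcer": "WND", "burn": "WND",
}


def classify_source(label: str) -> Optional[str]:
    """Map a raw source label to CF, ENV or WND; None means uncertain origin."""
    return SOURCE_GROUPS.get(label.strip().lower())


def is_truncated(protein_length: int, reference_length: int) -> bool:
    """Stringent cut-off: anything shorter than the reference is truncated."""
    return protein_length < reference_length


def truncation_rates(records: Iterable[Tuple[str, int]],
                     reference_length: int) -> Dict[str, float]:
    """Fraction of truncated LasR proteins per group, from (label, length) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [truncated, total]
    for label, length in records:
        group = classify_source(label)
        if group is None:
            continue  # isolates of uncertain origin are left out
        counts[group][1] += 1
        if is_truncated(length, reference_length):
            counts[group][0] += 1
    return {group: truncated / total for group, (truncated, total) in counts.items()}
```

A call such as truncation_rates([("CF", 239), ("cystic fibrosis", 100)], reference_length=239) would report half of the CF group as truncated. The lengths in that call are made up for the example.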
Author contributions
SPD and KOC designed the study. KOC and CYZ performed the in silico analysis of the data. All authors contributed to the writing of the manuscript.
Competing interests
The authors declare no competing interests.
Funding and acknowledgements
We thank the Cystic Fibrosis Foundation for funding to SPD (DIGGLE18I0 and DIGGLE20G0). We also thank members of the Diggle Lab and Jon Gerhart for helpful discussion.