Abstract
Operons are a hallmark of bacterial genomes, where they allow concerted expression of multiple functionally related genes as single polycistronic transcripts. They are rare in eukaryotes, where each gene usually drives expression of its own independent messenger RNAs. Here we report the horizontal operon transfer of a catecholate-class siderophore biosynthesis pathway from Enterobacteriaceae into a group of closely related yeast taxa. We further show that the co-linearly arranged secondary metabolism genes are actively expressed, exhibit mainly eukaryotic transcriptional features, and enable the sequestration and uptake of iron. After transfer to the eukaryotic host, several genetic changes occurred, including the acquisition of polyadenylation sites, structural rearrangements, integration of eukaryotic genes, and secondary loss in some lineages. We conclude that the operon genes were likely captured in the shared insect gut habitat, modified for eukaryotic gene expression, and maintained by selection to adapt to the highly competitive, iron-limited environment.
Main Text
The core processes of the Central Dogma of Biology, transcription and translation, are broadly conserved across living organisms. Nonetheless, there are seemingly fundamental differences between the domains of life in how these processes are realized. Eukaryotic transcription is spatially and temporally separated from translation and generally operates on individual genes through a complex interplay of transcription factors and chromatin remodeling complexes. Nascent mRNAs are co-transcriptionally processed by adding 3’ polyadenosine (poly(A)) tails and 5’ caps of 7-methyl-guanosine (m7G) before they are trafficked out of the nucleus for translation. In bacteria, transcription is tightly coupled with translation, and both occur inside the cytosol. Furthermore, bacterial transcription often operates on clusters of genes, known as operons, where a single regulatory region controls the expression of physically linked genes as a polycistronic mRNA that is minimally processed and translated into several polypeptides at similar abundance. In contrast, eukaryotic operons, which are rare in most taxa but are frequently found in nematodes (1, 2) and tunicates (3, 4), are processed by trans-splicing and related mechanisms. Operon dissemination has been proposed to occur predominantly via horizontal gene transfer (HGT) (5, 6), a process where organisms acquire genes from sources other than their parents. HGT is pervasive and richly documented among bacteria, but it is thought to be rarer in eukaryotes (7–11). Known examples of bacterium-to-eukaryote HGT occurred as single genes, but never as operons. Nonetheless, horizontal operon transfer (HOT) into eukaryotes would allow even complex pathways to spread rapidly, especially in environments where competition for key nutrients is intense.
One such nutrient is iron, which plays crucial roles in many essential cellular processes (12–14) and is a key determinant of virulence in both animal and plant pathogens (15–17). Many specialized systems have evolved to sequester it from the surrounding environment, one of which is the biosynthesis of small-molecule iron chelators called siderophores. Most bacteria synthesize catecholate-class siderophores (18), whereas hydroxamate-class siderophores are commonplace in fungi (19). A notable exception is the budding yeast lineage (subphylum Saccharomycotina), which has long been thought to completely lack the ability to synthesize its own siderophores, despite its ability to utilize them (19). Here we survey a broad range of fungal genomes for known components of iron uptake and storage systems. Although most systems are broadly conserved, we identify a clade of closely related yeast species that contains a bacterial siderophore biosynthesis pathway. Through phylogenetic hypothesis testing, we show that this pathway was acquired through horizontal operon transfer (HOT) from the bacterial family Enterobacteriaceae, which includes Escherichia coli, Erwinia carotovora, Yersinia pestis, and relatives that share the insect gut niche with many of these yeasts (20). After acquisition, the operon underwent structural changes and progressively gained eukaryotic characteristics, while maintaining the clustering of functionally related genes. Transcriptomic experiments and analyses show that the siderophore biosynthesis genes are actively expressed, contain poly(A) tails, and exhibit evidence of mostly monocistronic transcripts, as well as some potentially bicistronic transcripts. In vivo assays also demonstrate the biosynthesis and secretion of functional catecholate-class siderophores in several of these yeast species.
This remarkable example shows how eukaryotes can acquire a functional bacterial operon, while modifying its transcription to domesticate and maintain expression as a set of linked eukaryotic genes.
Results
Iron uptake and storage is conserved in fungi
We surveyed the genome sequences of 175 fungal species and observed broad conservation of genes involved in low-affinity iron uptake, vacuolar iron storage, reductive iron assimilation, heme degradation, and siderophore import systems (Fig. 1, Table S1). In contrast, genes involved in siderophore biosynthesis pathways were more dynamic. Siderophore biosynthesis was thought to be completely absent in budding yeasts (19), but the genomes of Lipomyces starkeyi and Tortispora caseinolytica contain homologs of the SidA, SidC, SidD, SidF, and SidL genes involved in the biosynthesis of ferricrocin and fusarinine C, which are hydroxamate-class siderophores synthesized from L-ornithine by many filamentous fungi, such as Aspergillus nidulans (19). Since these species are the earliest-branching budding yeast taxa, the presence of this pathway in their genomes is likely an ancestral trait inherited from the last common ancestor of the Pezizomycotina and Saccharomycotina, while its absence in most yeasts is likely due to a loss early in budding yeast evolution. Surprisingly, the genomes of three closely related Trichomonascaceae species (Candida versatilis, Candida apicola, and Starmerella bombicola) contain multiple homologs of bacterial siderophore biosynthesis genes (entA-F) that are predicted to enable the synthesis of catecholate-class siderophores from chorismate (21) (Fig. S1). These genes are co-linear and predicted to be expressed from the same strand of DNA, features that are both reminiscent of the operons where these genes are found in bacteria.
Plus (green) and minus (orange) signs indicate the presence and absence of iron uptake and storage systems in specific taxonomic groups. The numbers in parentheses (green) indicate the number of species in a taxonomic group that possess a specific system, if it is not ubiquitous in that group. Blue box indicates the budding yeasts. RIA - Reductive Iron Assimilation. IRGF – Iron-Responsive GATA Factor. For details about specific taxa and individual genes see Table S2. Asterisks (*) mark paraphyletic groups. Note that only Wickerhamiella/Starmerella (W/S) clade fungi contain the bacterial or catecholate-class siderophore biosynthesis pathway.
Horizontal operon transfer (HOT) from bacteria to yeasts
To investigate the evolutionary history of these genes, we sequenced and analyzed 18 additional genomes from the Wickerhamiella/Starmerella clade (W/S clade, Table S2) and identified the catecholate-class siderophore biosynthesis pathway in 12 of these species (Fig. 2a, 2c). To determine whether the yeast siderophore biosynthesis genes were horizontally acquired from a bacterial operon, we first used the ent genes found in yeasts to perform BLAST queries against the bacterial data present in GenBank and found that the top hits belonged to a range of species from the family Enterobacteriaceae. Since no single taxon was overrepresented, we surveyed 1,336 publicly available genomes from the class Gammaproteobacteria, to which the Enterobacteriaceae belongs, for the presence of entA-entF homologs and extracted them from all 207 genomes where all six genes could be reliably identified (Table S3). We then reconstructed unconstrained maximum-likelihood (ML) phylogenies for each ent gene, as well as for a concatenated super-alignment of all six genes (entABCDEF, Table S4). Since entF contributed nearly two-thirds of the total alignment length, we also evaluated a super-alignment of the remaining five genes (entABCDE, Fig. 2a).
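The concatenation step described above can be illustrated with a minimal sketch. It assumes each gene alignment is held as a dictionary mapping taxon name to an aligned sequence; the function names and the toy sequences are hypothetical and are not taken from the study's pipeline. The toy lengths are chosen so that the longest gene makes up two-thirds of the super-alignment, mirroring the reason given for also analyzing the five-gene matrix.

```python
from typing import Dict, List


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]], gene_order: List[str]
) -> Dict[str, str]:
    """Join per-gene alignments into one super-alignment.

    Taxa missing from a gene alignment are padded with gap characters
    so that every row of the result has the same length.
    """
    taxa = sorted({taxon for gene in gene_order for taxon in gene_alignments[gene]})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        alignment = gene_alignments[gene]
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"rows of {gene} differ in length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def length_fraction(gene_alignments: Dict[str, Dict[str, str]], gene: str) -> float:
    """Fraction of the total super-alignment length contributed by one gene."""
    def width(name: str) -> int:
        return len(next(iter(gene_alignments[name].values())))

    total = sum(width(name) for name in gene_alignments)
    return width(gene) / total


# Toy alignments (hypothetical sequences, for illustration only).
toy = {
    "entA": {"yeast": "MKAL", "ecoli": "MK-L"},
    "entE": {"yeast": "MTI", "ecoli": "MTV"},
    "entF": {"yeast": "MSQHLPLVAAQPGI", "ecoli": "MSQHLPLVAAQ-GI"},
}

six_gene_like = concatenate_alignments(toy, ["entA", "entE", "entF"])
five_gene_like = concatenate_alignments(toy, ["entA", "entE"])
print(len(six_gene_like["yeast"]), len(five_gene_like["yeast"]))
print(round(length_fraction(toy, "entF"), 3))
```

When one gene dominates the matrix in this way, its signal can swamp that of the shorter genes, which is why comparing trees with and without it is a useful sanity check.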
(A) ML phylogeny from the super-alignment of entABCDE genes from 207 Gammaproteobacteria and 12 yeasts, rooted at the midpoint. Bootstrap support values are shown for relevant branches within the Enterobacteriaceae (red). Other Gammaproteobacteria are green. (B) Detailed view of the yeast clade from the main phylogeny, with bootstrap supports. (C) Alternative scenarios for the horizontal operon transfer. (D) P-values of the AU test of different evolutionary hypotheses tested in this study; EO – Enterobacteriaceae origin; non-EO – non-Enterobacteriaceae origin; 12-mono - 12 yeast sequences are monophyletic, 11-mono - 11 yeast sequences monophyletic and one unconstrained (12 alternatives tested, lowest p-value shown, full details in Table S5); 5G – topology of the yeast clade constrained to the one inferred from the super-alignment of entABCDE genes.
The orange area indicates per-base coverage by RNA-seq reads (read coverage). The blue area indicates per-base cumulative coverage by RNA-seq reads and inserts between read pairs (span coverage). The black line indicates the ratio of the read coverage over the span coverage, which is expected to remain ∼50% in the middle of gene transcripts and rise towards 100% at transcript termini. The expected 3’ coverage bias can be observed for individual transcripts in the raw coverage data.
Diagram of siderophore biosynthesis genes in the C. versatilis genome, drawn to scale, as well as a gene encoding a class II asparaginase adjacent on the 3’ end. Counts above indicate read pairs cross-mapping between genes. Counts below indicate reads containing putative poly(A) tails.
Consistent with the BLAST results, the yeast sequences formed a highly supported, monophyletic group nested within the Enterobacteriaceae lineage on all gene trees, placing their donor lineage after the divergence of the Serratia/Rouxiella lineage and before the divergence of the Pantoea/Erwinia lineage from closer relatives of E. coli. To formally test the hypothesis of an Enterobacteriaceae origin, we reconstructed phylogenies under the constraints that yeast sequences either group together with the Enterobacteriaceae (EO) or outside of that clade (non-EO). We then employed the approximately unbiased (AU) test to determine if the EO phylogenies were a statistically better explanation of the data than the non-EO phylogenies. The EO phylogeny was strongly preferred (p-value < 10⁻³) for the six- and five-gene concatenation data matrices (Fig. 2d). Individual genes carried weak signal due to their short lengths, but the entC, entE, and entF genes nonetheless strongly supported the Enterobacteriaceae origin (p-value < 0.05), entA and entB had consistent but weaker support, and no individual gene rejected the EO hypothesis. Next, we sought to determine the course of the transfer event and tested a single-source, single-transfer hypothesis against multi-source and multi-transfer alternatives, each of which predicted specific phylogenetic patterns (Fig. 2c). AU tests on the reconstructed phylogenies did not support multiple transfer events and instead supported the simplest explanation that the HOT event occurred from a single source lineage directly into a single common ancestor of the W/S clade yeasts (Fig. 2d).
Transferred genes have mainly eukaryotic transcript features
To determine whether and how these yeasts overcame the differences between eukaryotic and bacterial gene expression, we sequenced mRNA from C. versatilis, C. apicola, and St. bombicola. These species were chosen due to their diverse gene cluster structures and positions on the phylogenetic tree: C. versatilis was chosen as an early-branching representative whose structure appeared to be more similar to the ancestral operon, while St. bombicola and C. apicola appeared to represent more derived stages of evolution in the eukaryotic hosts. Each of the three species expressed mRNAs for the siderophore biosynthesis genes, and C. versatilis expression was the highest (Table S6).
We then examined the transcriptomic data for characteristics that are typically bacterial or eukaryotic. The lengths of the intergenic regions were not divisible by three, so we immediately excluded the hypothesis that the genes were translated as a single fused polypeptide. The C. versatilis genes were expressed at similar levels, whereas St. bombicola and C. apicola genes showed significant diversity in their expression (Table S6, Fig. 4, Figs S2-S4). Interestingly, we also observed that the siderophore biosynthesis genes in C. versatilis had much shorter intergenic sequences than their counterparts in St. bombicola and C. apicola, which were each shorter than their respective genome-wide means (within gene cluster intergenic means were 158, 484, and 377 bp versus genome-wide means of 370, 549, and 455 bp for C. versatilis, St. bombicola and C. apicola, respectively). Shorter intergenic distances can enhance transcriptional coupling between neighboring genes inside operons (22, 23), so these results suggest that C. versatilis might have retained this feature due to selection for concerted expression.
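The comparison of intergenic spacing can be made explicit with a short calculation. The sketch below uses only the mean lengths quoted in the paragraph above; the function names are hypothetical, and the frame check is applied to made-up lengths because the individual intergenic lengths are not listed in the text.

```python
# Mean intergenic lengths (bp) quoted in the text.
cluster_mean_bp = {"C. versatilis": 158, "St. bombicola": 484, "C. apicola": 377}
genome_mean_bp = {"C. versatilis": 370, "St. bombicola": 549, "C. apicola": 455}


def relative_spacing(cluster: dict, genome: dict) -> dict:
    """Within-cluster mean intergenic length as a fraction of the genome-wide mean."""
    return {species: cluster[species] / genome[species] for species in cluster}


def preserves_frame(intergenic_length_bp: int) -> bool:
    """True if a spacer of this length would keep a downstream ORF in frame.

    Read-through translation into a single fused polypeptide requires the
    spacer length to be a multiple of three.
    """
    return intergenic_length_bp % 3 == 0


ratios = relative_spacing(cluster_mean_bp, genome_mean_bp)
for species, ratio in ratios.items():
    print(f"{species}: {ratio:.2f} of the genome-wide mean")

# Hypothetical spacer lengths, for illustration only.
print(preserves_frame(159), preserves_frame(158))
```

On these numbers the C. versatilis cluster spacing is roughly 0.43 of its genome-wide mean, compared with about 0.88 for St. bombicola and 0.83 for C. apicola, which is the contrast the paragraph describes.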
(A) CAS-based overlay assay of siderophore production. Under normal conditions, the medium remains blue, but in the presence of an iron chelator, it changes color from blue to orange. Species legend: (1) Saccharomyces cerevisiae FM1282, (2) Yarrowia lipolytica NRRL YB-423T, (3) Candida hasegawae, (4) Candida pararugosa, (5) Wickerhamiella cacticola, (6) Wickerhamiella domercqiae, (7) Candida versatilis, (8) Candida davenportii, (9) Candida apicola, (10) Candida riodocensis, (11) Candida kuoi, (12) Starmerella bombicola, (13) Escherichia coli MG1655 (positive control). Results of the CAS assay for all analyzed species can be found in Fig. S5. (B) Distribution of siderophore biosynthesis genes in the genomes of species depicted in panel A.
To further investigate operon-like characteristics that may have been retained, we analyzed read pairs in which the forward and reverse reads mapped to different genes, providing physical evidence of transcripts composed of multiple genes. To quantify this signal, we calculated the per-site ratio of the actual sequence coverage and the coverage spanned by the inserts between read pairs (i.e. coverage/span coverage). Ratios of 50% are expected for most of the length of a transcript, while ratios of 100% indicate the ends of the transcripts. Thus, transcript boundaries are visualized as a coverage trough between two spikes approaching 100% ratios. Ratios below 100% at the putative 5’ or 3’ ends of annotated transcripts, coupled with non-zero coverage of their intergenic regions, provide evidence of overlapping (and potentially bicistronic) transcripts. Most transcripts predicted to be involved in siderophore biosynthesis were monocistronic in St. bombicola, C. apicola, and C. versatilis, but C. versatilis had a sub-population of potentially overlapping mRNAs, including the entB and entD genes on one end, as well as the entE, entA, and entH genes on the other (Fig. 4), with the entE-entA-entH genes showing the strongest signal of overlap. Previously reported yeast bicistronic transcripts have been attributed mainly to inefficiencies in the RNA transcription machinery (24, 25), whereas the yeast ent transcripts we have described here encode functionally related steps of a biosynthesis pathway that may retain some polycistronic characteristics from their ancestry as parts of a bacterial operon.
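The coverage ratio described above can be sketched as follows. This is one possible reading of the description, not the study's actual implementation: read coverage counts the mates overlapping a site, span coverage counts the fragments (first mate start to second mate end) overlapping it, and mates within a pair are assumed not to overlap each other. Coordinates are 0-based and half-open, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

# (mate1_start, mate1_end, mate2_start, mate2_end), 0-based half-open.
ReadPair = Tuple[int, int, int, int]


def _coverage(intervals: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-site depth of a set of intervals, computed with a difference array."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def coverage_ratio(pairs: List[ReadPair], length: int) -> List[Optional[float]]:
    """Read coverage divided by span coverage at each site (None where span is 0).

    Assumes the two mates of a pair do not overlap; otherwise the ratio
    could exceed 1.
    """
    reads = [iv for s1, e1, s2, e2 in pairs for iv in ((s1, e1), (s2, e2))]
    spans = [(min(s1, s2), max(e1, e2)) for s1, e1, s2, e2 in pairs]
    read_cov = _coverage(reads, length)
    span_cov = _coverage(spans, length)
    return [r / s if s else None for r, s in zip(read_cov, span_cov)]


# Toy transcript of 20 sites, tiled by fragments of length 8 with 2-bp mates,
# so sequenced bases make up half of each fragment.
transcript_length = 20
toy_pairs = [(s, s + 2, s + 6, s + 8) for s in range(0, 13)]
ratio = coverage_ratio(toy_pairs, transcript_length)
print([None if r is None else round(r, 2) for r in ratio])
```

In this toy case the ratio is 1.0 at both termini and 0.5 in the interior, matching the expected pattern. A ratio that stays below 1.0 across an annotated gene boundary, with non-zero intergenic coverage, is the signature of fragments that bridge two genes.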
We also examined the transcriptomic data for evidence of transcriptional processing and found that many of the siderophore biosynthesis genes contained putative poly(A) tails (Fig. 4, Figs S2-S4). We did not find any evidence suggesting that 5’ caps were added by trans-splicing (26) or by alternatively cis-splicing a common cassette exon upstream of each protein-coding region (27). Thus, we conclude that, even in C. versatilis, the majority of transcripts are likely transcribed and processed through conventional eukaryotic mechanisms that involve distinct promoters and polyadenylation sites for each gene. These results further suggest that most sequence modifications for eukaryotic expression act pre- or co-transcriptionally, rather than through specialized sequences to enable translation.
Bacterial siderophore biosynthesis is functional in yeasts
To determine whether yeasts that contain the ent biosynthesis genes actually produce siderophores, we grew them on a low-iron medium overlaid with iron-complexed indicators. In the presence of iron chelators, such as siderophores, the medium changes color from blue to orange, in a characteristic halo pattern that tracks the diffusion gradient of siderophores secreted from colonies into the surrounding medium. We tested the 18 yeast species from the W/S clade that we sequenced, together with eight outgroup species spread broadly across the yeast phylogeny (including S. cerevisiae) and E. coli as a positive control, and we observed unambiguously strong signals of siderophore production in five species, all of which contained the siderophore biosynthesis genes (Fig. 4, Fig. S5). The lack of signal in other species harboring the siderophore biosynthesis genes could suggest the secondary inactivation of the pathway (through mechanisms other than nonsense or frameshift mutations, which are absent), but it is more likely that siderophore production is below the sensitivity of the CAS assay or is not induced under the conditions studied. Nevertheless, this experiment conclusively shows that the bacterial siderophore biosynthesis genes are not only transcriptionally active, but also fully functional in at least some W/S clade yeasts.
Evolution of a bacterial operon inside a eukaryotic host
Given the significant differences in Central Dogma processes between bacteria and eukaryotes, we investigated how the horizontally transferred operon was successfully assimilated into these yeasts by mapping key changes in gene content, structure, and regulation onto the phylogeny (Fig. 5a). First, the phylogenetic distribution of the operon genes suggests at least five cases of secondary loss in W/S clade yeasts, a common occurrence for other fungal gene clusters (28–31). Although all taxa contain the six core genes (entA-F), C. versatilis uniquely harbors a homolog of the entH gene, which encodes a proofreading thioesterase that is not strictly required for siderophore biosynthesis (32). Since no homologs or remnants of other genes from the bacterial operon could be identified, we hypothesize that they were lost due to functional redundancy with genes already present in yeast genomes (e.g. the bacterial ABC transporters fepA-G are redundant with the yeast major facilitator superfamily transporters ARN1-4, the bacterial esterase fes is redundant with yeast ferric reductases FRE1-8). Second, most extant Enterobacteriaceae species closely related to the source lineage share an operon structure similar to that of E. coli (Table S4), which is more complex than that of the W/S clade yeasts (Fig. 5b). Based on this evidence and a molecular clock (33), we infer that an ancient bacterial operon, whose structure was somewhere between that of E. coli and C. versatilis, was horizontally transferred into a yeast cell tens of millions of years ago. The operon may have contained fewer genes than extant bacterial operons, or some shared gene losses or rearrangements may have occurred to produce a structure similar to that of C. versatilis in the last common ancestor of the W/S clade yeasts. 
Modern yeasts of this clade evolved at least four different structures through several lineage-specific rearrangements that tended to create derived gene cluster structures with more eukaryotic characteristics, including increasing the size of the intergenic regions, splitting the gene cluster in two in C. apicola, and intercalating at least four eukaryotic genes. The intercalation of a gene encoding a eukaryotic ferric reductase (FRE), which is involved in reductive iron assimilation, between two operon genes in a subset of species offers a particularly telling example. The genetic linkage of these two mechanisms for acquiring iron shows that bacterial and eukaryotic genes can stably co-exist, and perhaps even be selected together as gene clusters for co-inheritance or co-regulation, through eukaryotic mechanisms.
(A) ML phylogeny reconstructed from the concatenated alignment of 661 conserved, single copy genes (834,750 sites), with branch supports below 100 shown. Species in bold denote genomes sequenced in this study, while species in red denote genomes containing the siderophore biosynthesis genes. Black diamonds indicate secondary losses in yeast lineages, accompanied by losses of the siderophore importer ARN genes, which are often found in close proximity. (1) Horizontal operon transfer from an Enterobacteriaceae lineage. (2) Rearrangement and integration of genes encoding ferric reductase (FRE) and an uncharacterized transmembrane protein (TM). (3) Disruption by integration of the SNZ-SNO gene pair and translocation. (B) Genetic structure of the siderophore biosynthesis operon in E. coli and yeasts. Individual colors represent homologous ORFs, drawn to scale, and gray marks genes not found in yeasts. Black circles represent contig termini within 25kb.
Discussion
The horizontal transfer of this siderophore biosynthesis operon is the first clearly documented example of the acquisition of a bacterial operon by a eukaryotic lineage. Several examples of horizontal gene transfer between different domains of life have been uncovered (9, 34–37), but the transfer of entire operons into eukaryotes has been merely speculated upon as an intriguing potential route of acquisition of secondary metabolism pathways (34, 38). The previous lack of evidence for HOT into eukaryotes led authors to propose barriers due to pathway complexity (39) and differences in core Central Dogma processes (7, 40). Where could the transfer of the siderophore biosynthesis operon between Enterobacteriaceae and yeasts have occurred, and how could the bacterial operon have been functionally maintained in the yeasts’ genomes? Eukaryotes have been proposed to acquire bacterial genes through several mechanisms, including virus-aided transmission (41), environmental stress-induced DNA damage and repair (42, 43), and a phagocytosis-based gene ratchet (44). The species that harbor the siderophore biosynthesis operon have been isolated predominantly from insects (45–47), where stable bacterial and eukaryotic communities coexist inside their guts (20). Moreover, this niche harbors diverse Enterobacteriaceae populations in which horizontal gene transfer has been reported (48, 49), and insect guts have recently been described as a “mating nest” for yeasts (50). Since Enterobacteriaceae and yeasts can conjugate directly in some cases (51), it is plausible that the last common ancestor of the W/S clade yeasts incorporated the operon from a bacterial co-inhabitant of an insect gut. Due to the intense competition for nutrients in this ecosystem, including a constant arms race with the host organism itself (52), yeasts able to make their own siderophores and sequester iron may have had a substantial advantage over those relying on siderophores produced by others.
Given the fundamental differences between bacterial and eukaryotic gene regulation, how could a bacterial operon have been maintained in a eukaryotic genome upon transfer? If it had not been actively expressed and functional, the genes of the operon would have been rapidly lost from the genome through neutral evolutionary processes. Although eukaryotes do not encode proteins with significant similarity to the bacterial regulator Fur that controls the expression of the bacterial ent genes, their iron response is governed by transcription factors that belong to the GATA family. Indeed, the consensus Fur-binding site (5’-GATAAT-3’) is remarkably similar to that of the fungal transcription factors that respond to iron (5’-WGATAA-3’) (19, 53). This similarity suggests the intriguing possibility that the siderophore genes could have readily switched from being regulated by a bacterial transcription factor to a eukaryotic transcription factor, at least for the most 5’ promoter. Siderophores are potent chelators that can efficiently sequester iron even at low concentrations (54), so even a low basal expression level of the newly acquired bacterial genes could have been enough to convey a considerable selective advantage. This initial eukaryotic expression, perhaps aided by noisy transcriptional and translational processes that include leaky scanning and internal ribosome entry sites (IRESs), could then have been optimized by acquiring more eukaryotic characteristics, such as longer intergenic regions that were gradually refined into promoters, distinct polyadenylation sites, and a shift from polycistronic to bicistronic and eventually to primarily monocistronic transcripts. The incorporation of a eukaryotic gene encoding a ferric reductase would have further improved the efficiency of iron acquisition in the highly competitive ecological niche of insect guts, while enhancing the eukaryotic characteristics of the gene cluster.
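The overlap between the two consensus sites quoted above can be checked directly. The following sketch scans a sequence for a consensus written with IUPAC nucleotide codes (W stands for A or T); the function names and the test sequence are hypothetical, and only the forward strand is searched.

```python
# Subset of the IUPAC nucleotide codes needed for simple consensus matching.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "N": "ACGT",
}

FUR_SITE = "GATAAT"      # consensus Fur-binding site quoted in the text
FUNGAL_SITE = "WGATAA"   # consensus fungal iron-responsive site quoted in the text


def matches(consensus: str, window: str) -> bool:
    """True if every base in the window is allowed by the consensus symbol."""
    return len(consensus) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(consensus, window)
    )


def find_sites(consensus: str, sequence: str) -> list:
    """0-based start positions of forward-strand matches to the consensus."""
    k = len(consensus)
    return [
        i for i in range(len(sequence) - k + 1)
        if matches(consensus, sequence[i:i + k])
    ]


# Hypothetical promoter fragment, for illustration only.
fragment = "CCTGATAATCC"
print(find_sites(FUR_SITE, fragment))
print(find_sites(FUNGAL_SITE, fragment))
```

In this toy fragment the two matches are offset by a single base and share the GATAA core, which illustrates how one short stretch of sequence could satisfy both consensus sites.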
Our HOT finding dramatically expands the known boundaries of cross-domain gene flow. The transfer, maintenance, expression, and adaptation of a multi-gene bacterial operon to a eukaryotic host underscore the flexibility of transcriptional and translational systems to produce adaptive changes from novel and unexpected sources of genetic information.
List of Supplementary Materials
Materials and Methods
Captions for Tables S1-S6 (separate files)
Acknowledgments
We thank David J. Eide and Michael D. Bucci for advice on low-iron media; Nicole T. Perna and Jeremy D. Glasner for E. coli strain MG1655; the Eide, Perna, Rokas and Hittinger labs for comments and discussions; RIKEN for publicly releasing 20 genome sequences; Lucigen Corporation (Middleton, WI) for use of their Covaris for gDNA sonication; and the University of Wisconsin Biotechnology Center DNA Sequencing Facility for providing Illumina sequencing facilities and services. This work was conducted in part using the computational resources of the Wisconsin Energy Institute and the Center for High-Throughput Computing at the University of Wisconsin-Madison. This material is based upon work supported by the National Science Foundation under Grant Nos. DEB-1442113 (to A.R.) and DEB-1442148 (to C.T.H. and C.P.K.), in part by the DOE Great Lakes Bioenergy Research Center (DOE Office of Science BER DE-FC02-07ER64494 to Timothy J. Donohue), the USDA National Institute of Food and Agriculture (Hatch Project 1003258 to C.T.H.), and the National Institutes of Health (NIAID AI105619 to A.R.). C.T.H. is a Pew Scholar in the Biomedical Sciences, supported by the Pew Charitable Trusts. D.T.D. is supported by an NHGRI training grant to the Genomic Sciences Training Program (5T32HG002760). Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer.
Raw DNA and RNA sequencing data were deposited in GenBank under Bioproject ID PRJNA396763. Whole Genome Shotgun assemblies have been deposited at DDBJ/ENA/GenBank under the accessions NRDR00000000-NREI00000000. The versions described in this paper are versions NRDR01000000-NREI01000000.
Author contributions: J.K. (study design, genome assembly, annotation, phylogenetic analyses, RNA-seq data analysis, text); D.T.D. (study design, CAS assays, RNA isolation and strand-specific library preparation, text); D.A.O., J.D., and A.B.H. (genomic DNA isolation and library preparation); X.S. and X.Z. (preliminary genomic analyses); and C.P.K., A.R., and C.T.H. (study design, text).