Abstract
The anaerobic ammonium oxidation (anammox) bacteria transform ammonium and nitrite to dinitrogen gas, and this obligate anaerobic process accounts for nearly half of the global nitrogen loss from surface environments. Yet its origin and evolution, which may give important insights into the biogeochemistry of early Earth, remain enigmatic. Here, we compile a comprehensive sequence data set of anammox bacteria and confirm their single origin within the phylum Planctomycetes. After accommodating the uncertainties and factors influencing time estimates with different statistical methods, we estimate that anammox bacteria originated at around the so-called Great Oxidation Event (GOE; 2.32 to 2.5 billion years ago [Ga]), which is thought to have fundamentally changed global biogeochemical cycles. We further show that during the origin of anammox bacteria, genes involved in oxidative stress, bioenergetics and anammox granule formation were recruited, which may have contributed to their survival on an increasingly oxic Earth. Our findings suggest that the rising levels of atmospheric oxygen, which made nitrite increasingly available, were a potential driving force for the emergence of anammox bacteria. This is one of the first studies that link the GOE to the evolution of obligate anaerobic bacteria.
Introduction
Anaerobic ammonium oxidation (anammox, NH4+ + NO2− → N2 + 2H2O), which usually occurs in anoxic marine, freshwater and wetland settings, accounts for up to 50% of the removal of fixed nitrogen (N) in nature. Along with denitrification, it is recognized as an important biological process that leads to N loss from the environment 1,2. In contrast to the nitrification-denitrification process for wastewater treatment, coupling partial nitrification and anammox is cost-effective and environmentally friendly due to its lower oxygen requirement for aeration (nitrite [N(+III)], instead of nitrate [N(+V)], is sufficient for the anammox metabolism), carbon-free cultivation (some denitrifying bacteria are heterotrophic) and its negligible emissions of greenhouse gases like N2O. Consequently, anammox bacteria, the organisms that perform anammox, have been widely used in wastewater treatment 2–5. Nevertheless, their evolutionary history and antiquity are poorly known, which hinders accurate reconstructions of the biogeochemical nitrogen cycle over geologic time.
Previous studies investigated the roles of key genes driving anammox, including hzsABC (hydrazine synthase)6, hdh (hydrazine dehydrogenase), hao (hydroxylamine oxidoreductase) 7 and nxr (nitrite oxidoreductase) 8. In general, the vital enzymes encoded by these genes either are directly involved in anammox or participate in replenishing electrons to the cyclic electron flow which is consumed by the reduction of NAD+ during the ATP synthesis 9,10. Further, all members of anammox bacteria are found within the phylum Planctomycetes 11. Six candidate genera of anammox bacteria, namely ‘Candidatus Brocadia’, ‘Candidatus Kuenenia’, ‘Candidatus Jettenia’, ‘Candidatus Scalindua’, ‘Candidatus Anammoximicrobium’ and ‘Candidatus Anammoxoglobus’, have been proposed based on 16S ribosomal RNA (rRNA) gene sequences 12, but none of them have been successfully isolated into pure cultures. Apparently, the habitat of anammox bacteria requires simultaneous presence of reduced (ammonia) and oxidized (nitrite) inorganic N compounds. Such habitats are often found at the aerobic-anaerobic interface in aquatic ecosystems, including the margins of oxygen minimum zones (OMZs) in the ocean and sediment-water interfaces, where ammonium originates from the anaerobic degradation of organic matter and nitrite can be produced by aerobic ammonia oxidation 13. Although a recent study 14 revealed the potential of anammox bacteria to utilize metal oxides as the electron acceptor instead of nitrite, the feasibility of this metabolism in nature remains largely unresolved. Thus, the availability of nitrite is presumably the determinant factor to the origin of anammox. It is believed that the early Earth (before 3 Ga) was fully anoxic and deficient in aerobic ecosystems that were able to generate nitrite/nitrate 15–17. 
The first transient and/or localized occurrences of aerobic nitrogen cycling appear in the Mid- to Neoarchean (2.7 Ga 18–21), but evidence of widespread nitrate availability does not appear until around 2.4 billion years ago (Ga), the so-called Great Oxidation Event (GOE) 22–24. The GOE marks the rise of free O2 in the atmosphere above 10⁻⁵ times modern levels, and was ultimately a result of the emergence of oxygenic cyanobacteria 25. In contrast to nitrite, ammonium, the other substrate of anammox metabolism, was thermodynamically stable in the deep ocean throughout the Archean 26 and probably extending well into the Proterozoic 27. It is therefore tempting to hypothesize that the origin of anammox coincided with Neoarchean oxygen oases or the GOE itself, when molecular oxygen (O2) rose permanently to a concentration that is biologically meaningful 28, thereby creating the (micro-)environments satisfying anammox metabolism by providing enough nitrite.
Several tools exist for investigating the link between the evolution of metabolic pathways and geo-environmental transformations. One way to explore the evolutionary history of a specific metabolic pathway is based on organic biomarkers, which, however, are often affected by poor preservation over geologic timescales 29. For instance, ladderanes, a type of lipid that is unique to anammox bacteria 30, are rarely preserved to a level that can be used to date their evolutionary origin. Another approach is based on nitrogen isotope ratios of sedimentary records 26,31,32. However, the isotopic fractionation factors for different metabolic pathways overlap widely (denitrification: -5 to -30‰; anammox: -16 or -24‰ 27), making it difficult to single out and elucidate the evolution of the different redox reactions within the N cycle by this method 33. Alternatively, molecular dating, which estimates the age of the last common ancestor (LCA) of analyzed lineages by comparing their sequences based on the molecular clock theory 34, provides a powerful strategy to investigate this issue. Here, we hypothesize that the rising level of O2 during the GOE, which made nitrite available to power anammox 27, was a driving force of the origin of anammox bacteria. To test this hypothesis and to obtain more insights into the evolution of anammox bacteria, we compiled an up-to-date genomic data set of Planctomycetes (see Supplementary Text section 1; Dataset S1.1), placed the evolution of anammox bacteria into the context of geological events, and investigated genomic changes underlying the origin of this ecologically important bacterial lineage.
Results and discussion
Monophyletic origins of anammox bacteria and anammox genes
The anammox bacteria form a monophyletic group within the phylum Planctomycetes in a comprehensive phylogenomic tree with 881 Planctomycetes genomes (Fig. S1). The anammox bacteria clade in this phylogenomic tree (Fig. S1) comprises four known genera including the early-branching Ca. Scalindua and Ca. Kuenenia, and the late-branching Ca. Jettenia and Ca. Brocadia. It also includes two understudied lineages which we named ‘basal lineage’ and ‘hzsCBA-less lineage’ (Fig. S1), both of which have relatively low genome completeness compared to the other anammox bacteria and are represented by metagenome-assembled genomes (MAGs) sampled from groundwater (see Supplementary Text section 2.1). Note that the absence of hzsCBA in several genomes within the ‘hzsCBA-less lineage’ might be ascribed to the loss of these genes in evolution or their low genome completeness (Fig. S1). Because of their shallow (late-branching) phylogenetic position, this uncertainty should not affect the inference of the origin of anammox bacteria. Two described genera (Ca. Anammoximicrobium and Ca. Anammoxoglobus) do not have genomes available by the time of the present study (last accessed: April 2021), and are therefore not included in the phylogenomic analysis. Furthermore, using the last updated set of 2,077 Planctomycetes genomes (retrieved in April 2021) from NCBI, we obtained a consistent topology of anammox bacteria (Fig. S2).
To check whether important groups, particularly early-branching lineages, have been encompassed in our analyzed genome data sets, we built a 16S rRNA gene tree (Fig. S3) using the identified 16S rRNA genes from genomic sequences of anammox bacteria and the deposited 16S rRNA gene amplicons in SILVA from the class Brocadiae, which comprises both anammox and non-anammox bacteria (see Supplementary Texts section 2.2). Since the anammox bacteria with genome sequences in this 16S rRNA gene tree (Fig. S3) showed a branching order congruent with the topology in the phylogenomic tree (Fig. S1), and since the 16S rRNA gene tree included 913 (clustered from 20,142) sequences sampled from a wide array of habitats (marine, sediments, man-made reactor, freshwater and terrestrial ecosystems), the 16S rRNA gene phylogeny, which could better capture the diversity of anammox bacteria by including uncultured samples, likely encompasses the early-branching lineages of anammox bacteria. Taken together, our genome set, which encompasses the genomes of the early-branching lineages, is appropriate to be used to study the evolutionary origin of anammox bacteria (Dataset S1.3).
We further analyzed the evolution (Fig. S4) of the genes (hzsCBA) critical to the anammox reaction 7,35. Reconciliation (see Supplementary Texts section 2.3) of the gene and the genome-based species phylogenies pointed to a single origin of anammox metabolism at the LCA of anammox bacteria (Fig. S4). Taken together, the above analyses suggest that our genome data sets have encompassed the earliest-branching lineages and deep phylogenetic diversity of anammox bacteria, and indicate its monophyletic origin, thereby providing a foundation for estimating the origin of anammox metabolism by estimating the origin time of anammox bacteria.
The evolutionary origin of anammox bacteria coincided with the rising O2
We employed MCMCTree to estimate the divergence times of the anammox bacteria with i) soft bounds (time constraints allowing a small probability of violation) based on cyanobacteria fossils, and ii) relaxed molecular clock approaches that allow substitution rates to vary among branches. Using a best-practice dating scheme (Fig. 1; see Supplementary Text section 3), we dated the origin of anammox bacteria to 2,117 million years ago (Ma) (95% highest posterior density [HPD] interval, 2,002 - 2,226 Ma). Further, we repeated the analysis based on the expanded set of Planctomycetes genomes (Genome set 2; Dataset S1.1) and based on the alternative constraint topology inferred with a better profile mixture model (LG+C60+G; Genome set 1; Dataset S1.1). In general, both analyses resolved topologies (Fig. S5) consistent with that shown in Figure 1. Overall, they gave similar posterior times of the LCA of the anammox bacteria at 2,105 Ma (95% HPD: 1,961 - 2,235 Ma) and 2,005 Ma (95% HPD: 1,869 - 2,146 Ma), respectively.
Fig. 1. The evolutionary timeline of anammox bacteria using MCMCTree. The chronogram was estimated based on the calibration set C1 (see Supplementary Text section 3). The blue bars on the four calibration nodes and the last common ancestor (LCA) of anammox bacteria represent the 95% highest posterior density (HPD) interval of the posterior time estimates. These alternative calibration sets were selected by choosing the three that accommodate different calibration constraints (see Supplementary Text section 3.3). More alternative time estimates are provided in Figure S4. The vertical grey bar represents the period of the GOE from 2,500 to 2,320 Ma. The calibration constraints used within the phylum Cyanobacteria are marked with orange text: the LCA of Planctomycetes and Cyanobacteria (Root), the total group of oxygenic Cyanobacteria (Node 2), the total group of Nostocales (Node 3), and the total group of Pleurocapsales (Node 4). The genome completeness estimated by CheckM is visualized with a gradient color strip. The adjacent color strip to its right indicates the type of genomic sequence used in our study, including metagenome-assembled genomes (MAGs) and whole-genome sequencing (WGS) from either enriched culture sample or isolate. The diagram below the dated tree illustrates the change of atmospheric partial pressure of O2 (PO2) and nitrogen isotope fractionations. The blue line shows the proposed model according to Lyons, et al. 25. The green arrow indicates the earliest evidence for aerobic nitrogen cycling at around 2.7 Ga. PAL on the right axis means PO2 relative to the present atmospheric level. The black dots denote the change of nitrogen isotope (δ15N) values according to Kipp, et al.
Analyses with different combinations of calibrations and parameters broadly converge to similar time estimates (Fig. S6; Dataset S2.1). Specifically, assigning a more ancient time constraint to the total group of oxygenic cyanobacteria, which is based on the biogeochemical evidence of the presence of O2 at nearly 3.0 Ga 36 instead of the GOE (Dataset S2.1), shifted the posterior dates of the LCA of anammox bacteria to the past by ∼300 million years (Myr) (e.g., C1 versus C10; Fig. S6). Varying the maximum bound of the root of the tree of life between 4.5 Ga, 3.8 Ga and 3.5 Ga had a minor impact (around 100 Myr) on the time estimates of anammox bacteria (e.g., C1 versus C3; Fig. S6). Though the auto-correlated rates (AR) model was less favored than the independent rates (IR) model based on the model comparison (see Supplementary Text section 3.3; Dataset S2.2), it is worth noting that the posterior ages estimated by the AR model were 1-12% younger than those estimated by the IR model (Dataset S2.1). Besides, there remains a possibility that there are unsampled or even extinct lineages of anammox bacteria that diverged earlier than all anammox bacteria analyzed in the present study. This scenario, if true, hints that the first anammox bacterium could have originated before the occurrence of the LCA of sequenced anammox bacteria but later than their split from the sister lineage (Node X in Fig. 1; up to 2,600 Ma). Accommodating the above uncertainties, our analyses suggest that the origin of anammox bacteria, and hence of anammox metabolism, most likely falls within the 2.6-2.0 Ga interval. Running the MCMCTree analysis with no sequence data showed different distributions of time estimates for both the clade of anammox bacteria and the four calibration points (Fig. S7), suggesting that sequence data are informative for our molecular clock analysis 37.
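The 95% HPD intervals quoted in this section summarize posterior samples of node ages. As a minimal sketch of the general idea (not the MCMCTree implementation, which may differ in detail), an HPD interval can be approximated as the narrowest window that contains the requested fraction of the sorted samples. The function name hpd_interval and the synthetic normal draws below are illustrative assumptions; the draws only loosely mimic the reported estimate and are not output of the dating runs described here.

```python
import random


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Synthetic node ages in Ma (assumption: normal draws, fixed seed).
rng = random.Random(0)
ages = [rng.gauss(2117, 57) for _ in range(20000)]
lo, hi = hpd_interval(ages)
print(f"95% HPD: {lo:.0f} - {hi:.0f} Ma")
```

For a symmetric, unimodal posterior the HPD interval is close to the equal-tailed credible interval; the two diverge when the posterior is skewed, which is one reason HPD intervals are commonly reported for divergence times.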
The geochemical context of the origin of anammox bacteria
We speculate that the timing of the origin of the anammox metabolism is linked to the increasing availability of nitrite in surface environments, because ammonium was likely present in the deep anoxic ocean throughout the Precambrian 27 and therefore probably not a limiting substrate. The first abiotic source of nitrite on the early Earth would have been lightning reactions between N2 and CO2 in the Archean atmosphere 38,39. A prior study suggested that this process led to micromolar levels of nitrite in seawater 38, which implies the possibility of an earlier origin of anammox bacteria. However, this estimate was based on perhaps unrealistically high amounts of CO2 40. Furthermore, there is no isotopic evidence in the early Archean rock record prior to 3 Ga for the presence of a significant nitrite or nitrate reservoir in the early ocean 15–17, and it is conceivable that any nitrite that was supplied to the ocean by lightning was rapidly reduced to ammonium or N2 by ferrous iron, possibly even abiotically 41,42. Hence, although a background lightning flux of nitrite almost certainly existed, there is no evidence that it supplied a reliable metabolic substrate for early life.
The first transient appearances of nitrite/nitrate-dependent metabolisms are captured by the sedimentary nitrogen isotope record in late Mesoarchean soils at 3.2 Ga 19 and in Neoarchean shallow-marine settings at 2.7 Ga and 2.5 Ga 18,20,21. These observations may reflect local and/or temporally restricted oxygen oases 43,44. Widespread nitrite/nitrate availability is inferred for the Paleoproterozoic (2.4-1.8 Ga), i.e. in the immediate aftermath of the GOE 22–24. Importantly, O2 is necessary, although a trace amount is feasible, to extant nitrifying organisms for the oxidation of ammonium to nitrite and nitrate 45–47. Hence the rise of oxygen almost certainly triggered the growth of the nitrite/nitrate reservoir, and therefore provided one of the key substrates for anammox bacteria. It has been shown that a low concentration of nitrite significantly decreases the N removal rate by anammox bacteria in reactors 48. In modern OMZs, which may to some extent serve as analogues for the Paleoproterozoic redox-stratified ocean, the concentration of nitrite is the rate-limiting factor to the anammox metabolism 49. Hence it was most likely that the rise of nitrite, driven by the rise of O2, facilitated the appearance of anammox bacteria at 2.6-2.0 Ga.
In modern environments, the majority of nitrite used in anammox is derived from nitrate reduction or aerobic ammonia oxidation 50, the latter performed by ammonia-oxidizing archaea (AOA) or bacteria (AOB). As noted above, nitrification likely appeared by at least 2.7 Ga, as indicated by nitrogen-isotope evidence for local accumulation of nitrate in surface ocean waters 27,51,52. A recent study inferred that the LCA of AOA dates back to ∼2.3 Ga and first appeared on land, driven by increasing O2 concentrations in the atmosphere at that time, and that the expansion of AOA from land to the ocean did not occur until nearly 1.0 Ga 53. This hints at a dominant role of AOB in the early ocean, before 2.3 Ga. Consistent with this idea, the vast majority of the early-branching lineages of anammox bacteria (Ca. Scalindua) and the sister Planctomycetes lineages of the anammox clade in the phylogenomic tree (Fig. 1) are found in marine environments. Furthermore, in the 16S rRNA gene tree, which represents a greater diversity than the phylogenomic tree, marine lineages still account for the majority of the earliest-branching lineages of anammox bacteria (Fig. S3). Thus, it seems likely that the LCA of anammox bacteria originated in the marine realm where AOB thrived first. The above arguments, although speculative, tentatively suggest that the nitrite demand for anammox had been readily met by the GOE by taking advantage of the significant amount of O2 newly available, which is broadly consistent with our molecular dating results suggesting an origin of anammox shortly before or after the GOE. Note that there are recently reported anammox bacteria lineages showing less dependence on nitrite, which may use nitric oxide or hydroxylamine as the electron acceptor in anammox 12. These lineages are affiliated with Ca. Kuenenia and Ca. Brocadia, which evolved recently, implying that the ability to use alternative electron acceptors is likely not ancestral to all anammox bacteria.
This aspect further indicates that lightning was likely not an important source of metabolic substrates in the early Archean, because nitric oxide is the major product of the lightning reaction between N2 and CO2 38,39.
Genomic changes related to anammox metabolism upon the origin of anammox bacteria
We further explored the genomic changes characterizing the origin of anammox bacteria (see Supplementary Text section 4). This includes the gains of the aforementioned genes that directly participate in anammox, viz., hzsABC, hdh, and hao. The current view of the anammox metabolism posits that electrons consumed by anammox for carbon fixation are replenished by the oxidation of nitrite to nitrate by nitrite oxidoreductase (NXR) 10, which was inferred to be acquired upon the origin of anammox bacteria (Fig. 2). Moreover, multiple auxiliary genes for N assimilatory pathways including N regulatory protein (glnB), assimilatory nitrite reductase large subunit (nirB), transporters like amt (ammonium), nirC (nitrite) and NRT family proteins (nitrate/nitrite), were also gained at the origin of anammox bacteria (Fig. 2). However, genes for nitrogen-related dissimilatory pathways encoding nitrite reduction to nitric oxide (nirK or nirS) and to ammonium (nrfAH), respectively, were likely acquired after the origin of anammox bacteria (Fig. 2). Anammox occurs at the membrane of anammoxosome, an organelle predominantly composed of a special type of lipid called ladderane. The ladderane membranes show low proton permeability, which helps maintain the proton motive force during the anammox metabolism 30. Although the biosynthetic pathway of ladderane is yet to be characterized, a previous study 54 predicted 34 candidate genes responsible for the synthesis of ladderane lipids, four of which were potentially acquired during the origin of anammox bacteria (see Supplementary Text section 4.2; Fig. 2), hinting at their potential roles in ladderane synthesis and providing clues to future experimental investigations.
Fig. 2. The phyletic pattern of ecologically relevant genes in the comparison between anammox bacteria and non-anammox bacteria. The phylogenomic tree on the left was constructed with 887 genomic sequences described in Supplementary Text section 2.1. Solid circles at the nodes indicate the ultrafast bootstrap values in 1,000 bootstrapped replicates. Those Planctomycetes genomes not used for comparative genomics analyses are collapsed into grey triangles, and the numbers of collapsed genomes are labelled next to the triangles. The target group and reference group for comparative genomic analysis are within an orange and a blue box, respectively. For each genome, the genome completeness estimated by CheckM is visualized with a color strip and labelled beside the leaf nodes. The adjacent color strip to its right represents the type of genomic sequence used in our study, including metagenome-assembled genomes (MAGs) and whole-genome sequencing (WGS) from either enriched culture sample or isolate. The filled and empty circles, respectively, represent the presence and absence of particular genes in corresponding genomes. For gene clusters, only genomes with at least half of the members of the gene cluster are indicated by a filled circle. The classifications of annotated genes are labelled above the gene names. hzsCBA, hydrazine synthase subunit C, B and A; hdh, hydrazine dehydrogenase; hao, hydroxylamine dehydrogenase; nxrAB, nitrite oxidoreductase subunit A and B; nirK, copper-containing and NO-forming nitrite reductase; nirS, cytochrome NO-forming nitrite reductase; nrfAH, ammonia-forming nitrite reductase subunit A and H; glnB, nitrogen regulatory protein P-II; nirC, nitrite transporter; NRT, nitrate/nitrite transporter; amt, ammonium transporter; kuste2805, 3603, 3605-3606, genes proposed by Rattray, et al. 54 to be involved in the synthetic pathways for ladderane; cbiG, cobalt-precorrin 5A hydrolase; cbiD, cobalt-precorrin-5B(C1)-methyltransferase; cbiOMQ, cobalt/nickel transport system; AhpC, peroxiredoxin; Ccp, cytochrome c peroxidase; dfx, superoxide reductase; fprA, H2O-forming enzyme flavoprotein; CcsAB, cytochrome c maturation systems; petB, cytochrome b subunit of the bc complex; petC, Rieske Fe-S protein; ahbABCD, heme biosynthesis; sat, sulfate adenylyltransferase; aprAB, adenylylsulfate reductase, subunit A and B; fsr, sulfite reductase (coenzyme F420); higB-1, toxin; higA-1, antitoxin; mnhABCDEG, multicomponent Na+/H+ antiporter; fdhAB, formate dehydrogenase subunit A and B; nuo(A-N), NADH-quinone oxidoreductase; ndh(A-N), NAD(P)H-quinone oxidoreductase; nqrABCDEF, Na+-transporting NADH:ubiquinone oxidoreductase; rnfABCDEG, Na+-translocating ferredoxin:NAD+ oxidoreductase; atp(A-H), F-type H+-transporting ATPase; lacAZ, beta-galactosidase; melA, galA, alpha-galactosidase; ebgA, beta-galactosidase; galK, galactokinase;, fructokinase; fruK, 1-phosphofructokinase; rhaB, rhamnulokinase; rbsK, ribokinase; araB, L-ribulokinase; xylB, xylulokinase; kdgK, 2-dehydro-3-deoxygluconokinase; GALK2, N-acetylgalactosamine kinase; cah, cephalosporin-C deacetylase; argE, acetylornithine deacetylase; nagA, N-acetylglucosamine-6-phosphate deacetylase; HDAC11, histone deacetylase 11.
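The presence rule for gene clusters stated in this legend (a filled circle only when at least half of the cluster members are found in a genome) can be written as a short check. The sketch below is illustrative only: the function name cluster_present and the example genome contents are hypothetical, and the member list for mnhABCDEG is simply expanded from the cluster name.

```python
def cluster_present(cluster_genes, genome_genes):
    """True if at least half of the cluster members occur in the genome."""
    genome = set(genome_genes)
    found = sum(1 for gene in cluster_genes if gene in genome)
    # Compare 2 * found with the cluster size to avoid fractional thresholds.
    return 2 * found >= len(cluster_genes)


# Members expanded from the cluster name mnhABCDEG (six genes).
mnh = ["mnhA", "mnhB", "mnhC", "mnhD", "mnhE", "mnhG"]

# Hypothetical genomes, used only to exercise the rule.
genome_with_half = {"mnhA", "mnhB", "mnhC", "hdh"}
genome_with_third = {"mnhA", "mnhB", "hdh"}

print(cluster_present(mnh, genome_with_half))
print(cluster_present(mnh, genome_with_third))
```

A threshold of this kind tolerates incomplete MAGs, at the cost of occasionally scoring a partially lost cluster as present.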
Anammox bacteria occur in anoxic habitats and exhibit low oxygen tolerance 12. Specifically, the O2-sensitive intermediates generated by anammox, e.g., hydrazine, a powerful reductant, require anoxic conditions. Accordingly, the likely acquired peroxidases include cytochrome c peroxidase (Ccp), which could scavenge hydrogen peroxide in the periplasm 55, and the most prevalent peroxidase 56, thioredoxin-dependent peroxiredoxin (AhpC). Besides peroxide scavengers, the desulfoferrodoxin (Dfx), which functions as a superoxide reductase (SOR) to reduce superoxide to hydrogen peroxide and which is broadly distributed among anaerobic bacteria 56, was likely acquired during the origin of anammox bacteria (Fig. 2). Another acquired gene is fprA, which encodes flavo-diiron proteins that scavenge O2. Furthermore, cbi genes, which are exclusively involved in the anaerobic biosynthesis of vitamin B12 57, are widely present in the genomes of anammox bacteria, suggesting their acquisition during the early evolution of anammox bacteria (Fig. 2).
Investigating other metabolisms that were also acquired upon the origin of anammox bacteria allows the coeval ecology to be reconstructed. Our data show that sat (Fig. 2), which encodes sulfate adenylyltransferase for assimilatory incorporation of sulfate into bioavailable adenylyl sulfate, as well as aprAB and fsr (Fig. 2) for encoding dissimilatory sulfite reductase, were acquired upon the origin of anammox bacteria (Fig. 2). This acquired capability of sulfate metabolism coincides with the increase in seawater sulfate concentrations from ∼100 μM or less throughout much of the Archean 58,59 to over 1 mM after the rise of atmospheric O2 60,61.
Unlike other Planctomycetes, anammox bacteria are generally autotrophs that use the Wood-Ljungdahl pathway for carbon assimilation 62. Consequently, they potentially lost many genes involved in carbohydrate utilization (Fig. 2). This metabolic loss is further supported by the enrichment analysis, in which the genes predicted to be lost by the LCA of anammox bacteria are enriched in pathways involving hydrolase activity, intramolecular oxidoreductase activity and carbohydrate kinase activity (Fig. 2; see Data and Code availability), hinting at a decreased ability of anammox bacteria to degrade organics. Also lost are the genes (mnhABCDEG) encoding the Na+/H+ antiporter (Fig. 2), which usually regulates sodium tolerance and pH homeostasis in marine environments 63. Only members of the dominant marine genus Ca. Scalindua retain this gene cluster. Although building up a proton motive force is vital to energy harvesting, the mechanism for this process remains largely unexplored in anammox bacteria 64. Our analysis shows that nuo genes, which encode an NADH-quinone oxidoreductase that can build up a proton motive force, were likely acquired by the LCA of anammox bacteria (Fig. 2). Moreover, the presence of genes coding for Na+-coupled respiratory enzymes, for example rnf (ferredoxin-NADH oxidoreductase) and nqr (ubiquinone-NADH oxidoreductase), suggests that anammox bacteria could also use the sodium motive force for ATP synthesis (Fig. 2).
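The statistical test behind the enrichment analysis is not specified in this section. One common choice, shown here purely as an assumption, is a one-sided hypergeometric test: given N annotated genes of which K belong to a functional category, it gives the probability of observing at least k category members among n lost genes. The function name and the counts in the example are hypothetical and do not come from the present data.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 3,000 genes in the background, 60 annotated as
# carbohydrate kinases, 200 genes inferred as lost, 12 of them kinases.
p_value = hypergeom_enrichment_p(N=3000, K=60, n=200, k=12)
print(f"p = {p_value:.2e}")
```

When many categories are tested at once, the resulting p-values would normally be corrected for multiple testing before any category is called enriched.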
Additionally, iron is a vital element for HZS 6 and HDH 65, and in vitro studies revealed that increased iron concentrations can promote the growth of anammox bacteria 66. The ancient oceans are thought to have been ferruginous (high iron availability) according to sedimentary records of iron speciation across several coeval marine settings from the Archean through most of the Proterozoic 67, meaning that large amounts of soluble iron would have been available for anammox bacteria. Interestingly, a series of iron-related genes to make use of the abundant iron from the environment were likely acquired upon the origin of anammox bacteria. First, anammox bacteria likely acquired the gene fur, which encodes a ferric uptake regulator for iron assimilation. Moreover, the iron-containing proteins, specifically cytochrome, are encoded by the acquired genes petBC, which potentially encodes Rieske-heme b 10, and CcsAB, which encodes a protein involved in cytochrome c maturation systems (Fig. 2). Notably, ahbABCD, which encodes proteins-synthesizing b-type hemes in anammox bacteria 68, were likely acquired before the origin of anammox bacteria (Fig. 2). Furthermore, we identified that the gene Bfr that encodes the oligomeric protein, bacterioferritin involved in the uptake and storage of iron 69 was likely acquired during the origin of anammox bacteria (Fig. 2), which may help to hoard iron in settings where iron became locally sparse, such as along euxinic (sulfide-rich) or oxic marine margins in the Paleoproterozoic 70. It is thought that the appearance of euxinic margins trapped iron that upwelled from the deep ocean, leading to the demise of iron oxide deposits (i.e., banded iron formations) 71. 
Our results tentatively support this model, if the microenvironments that those primordial anammox bacteria colonized were located near the euxinic-oxic interface where nitrite and ammonium were available while upwelled iron was titrated out of the water column by freely dissolved hydrogen sulfide.
Taken together, the timeline and corresponding genomic changes of anammox bacteria have implications for the physiological characteristics of descendant anammox bacteria and for the origin of other nitrogen-transforming pathways. For example, the earlier-branching lineage Ca. Scalindua exhibits high nitrite affinity (0.45 μM) compared to other anammox genera (up to 370 μM) 12,72, which may reflect a progressive growth of the marine nitrite reservoir, linked to the protracted oxygenation of the surface environments over geologic timescales 25. Likewise, genomic data reveal that canonical denitrification also radiated across the tree of life after the GOE 73, and the gains (Fig. S8; see Supplementary Text section 4.4) of the nrfAH genes for dissimilatory nitrate reduction to ammonium (DNRA) in the late-branching lineages (Ca. Brocadia and Ca. Kuenenia) at around 800 Ma are consistent with the time when nitrite/nitrate availability was increased by the Neoproterozoic Oxygenation Event 74. The time estimate and genomic changes of the anammox bacteria thus go some way towards enhancing the understanding of the physiological characteristics of modern anammox bacteria and the historical N cycle.
Concluding remarks
Traditionally, the metabolic evolution of the biosphere across the GOE has mainly been investigated in the phylum Cyanobacteria 75, likely because oxygenic photosynthesis is believed to be a major driver of the GOE. More recently, scientists have increasingly realized that the impact of the GOE on the bacterial world may not be limited to cyanobacteria. This was well illustrated by Ren, et al. 53, who showed that the GOE was likely a driving force of the origin of the aerobic AOA and thus an aerobic nitrogen cycle on Earth. Furthermore, another molecular dating study suggested expanded arsenic resistance systems as biological adaptations to the increased toxicity of oxidized arsenic species after the GOE 76. However, to our knowledge, little has so far been known about the relationships between the GOE and obligate anaerobic metabolisms.
Here, using molecular dating and comparative genomics approaches, we link the emergence of an obligate anaerobic bacterial group, which drives the loss of fixed N in many environments, to the rise of O2, and highlight their evolutionary responses to major environmental disturbances. Apparently, the GOE opened novel niches for the origin and subsequent expansion of diverse aerobic prokaryotes such as AOA and AOB, which in turn facilitated the origin of other bacterial lineages, including some anaerobes like the anammox bacteria, by providing the resources for their energy conservation. Our results open up the possibility that a significant proportion of the Proterozoic nitrite budget was consumed by anammox bacteria, and that the sedimentary nitrogen isotope record may be influenced by their activity. Given that anammox bacteria tend to produce less N2O as a by-product compared to canonical nitrification/denitrification 4,5, our findings may have implications for Proterozoic climate, because N2O has previously been invoked as an important greenhouse gas at that time 77. Lastly, our study suggests that the impacts of the GOE go well beyond aerobic prokaryotes. Specifically, the origin of anammox bacteria was likely driven by the expansion of nitrite/nitrate availability, as supported by isotopic evidence from sedimentary records. We therefore conclude that molecular dating is a powerful approach to complement isotopic evidence for resolving the timeline of biological evolution and for providing additional constraints on climate models of the distant past.
Materials and Methods
Overall, we compiled two genome sets and one 16S rRNA gene set in our study (Dataset S1.1). A phylogenomic tree (Fig. S1) was generated based on the concatenated alignment of 120 ubiquitous proteins (bac120; Dataset S1.3) proposed for tree inference by the Genome Taxonomy Database 78. All published genomic sequences (952 in total) affiliated with the phylum Planctomycetes in NCBI GenBank up to December 2019 were retrieved (Dataset S1.2). To examine whether the patterns obtained with this Planctomycetes dataset still hold, we further constructed an expanded set of Planctomycetes genomes by retrieving a total of 2,077 Planctomycetes genomes released in GenBank by April 2021 (Fig. S2; see Data and Code availability). The key anammox genes hzsCBA and hdh were identified by BLASTP searches against manually curated reference sequences. For each gene, the identified protein sequences were aligned using MAFFT (v7.222) 79 and the alignments were refined with trimAl (v1.4) 80. All phylogenies were constructed with IQ-TREE (v1.6.2) 81, with substitution models automatically selected by ModelFinder 82 and branch support assessed with 1,000 ultrafast bootstrap replicates. Note that the constraint topologies for dating analysis (Fig. 1; Fig. S5) were inferred with the profile mixture model (LG+C60+G), which better accommodates across-site heterogeneity in deep-time evolution. Following manual curation, four well-recognized genera and two separate lineages of anammox bacteria were highlighted with different colours, and the presence of key genes was annotated with filled symbols beside the labels (Fig. S1). Furthermore, a 16S rRNA gene tree was generated with similar methods using 16S rRNA genes identified from the downloaded genomes and those retrieved from the SILVA database (Fig. S3; Dataset S1.3). All trees (including phylogenomic and 16S trees) in our study were visualized with iTOL 83. More details are given in the supplementary text, section 2.
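As an illustration of the per-gene workflow described above (alignment, trimming, tree inference), the following minimal Python sketch assembles the corresponding command lines. It is not the authors' script. The file names, the function names, the MAFFT `--auto` mode and the trimAl `-automated1` mode are assumptions, since the text does not state which modes were used; only ModelFinder-based model selection (`-m MFP`) and 1,000 ultrafast bootstrap replicates (`-bb 1000`, the IQ-TREE 1.x option) follow from the description.

```python
import subprocess


def gene_tree_commands(gene, hits_fasta):
    """Return (argv, stdout_path) pairs for one gene family.

    hits_fasta is a hypothetical FASTA file holding the BLASTP-identified
    protein sequences for this gene.
    """
    aligned = f"{gene}.aln.faa"   # hypothetical output names
    trimmed = f"{gene}.trim.faa"
    return [
        # MAFFT writes the alignment to stdout; --auto is an assumed mode.
        (["mafft", "--auto", hits_fasta], aligned),
        # trimAl refinement; -automated1 is an assumed trimming mode.
        (["trimal", "-in", aligned, "-out", trimmed, "-automated1"], None),
        # IQ-TREE 1.x: ModelFinder selection plus 1,000 ultrafast bootstraps.
        (["iqtree", "-s", trimmed, "-m", "MFP", "-bb", "1000", "-pre", gene], None),
    ]


def run_commands(commands):
    """Execute the commands in order, redirecting stdout where requested."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Calling `run_commands(gene_tree_commands("hdh", "hdh_hits.faa"))` would require the three programs to be installed and on the search path; building the commands separately from running them makes the assumed options easy to inspect and change.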
Molecular dating analysis was carried out using the program MCMCTree from the PAML package (4.9j) 84. In our study, 13 calibration sets, constructed with different time constraints on the root and on three calibration nodes within the cyanobacteria lineage, were used (see Supplemental Texts section 3.2). The topology constraint for dating analysis was generated using 85 genomes, which were sampled from the phylogenomic tree of the phylum Planctomycetes (see Supplemental Texts section 3.1), under the LG+C20+F+G model with the posterior mean site frequency (PMSF) approximation 85. To perform dating analysis, clock models and different calibration sets were compared in 26 schemes (see Supplemental Texts section 3; Dataset S2.1) 86,87. For each scheme, the approximate likelihood method 88 of MCMCTree was run in duplicate with identical iteration parameters (burn-in: 10,000; sampling frequency: 20; number of samples: 20,000). The convergence of each scheme was evaluated by comparing the posterior dates of the two independent runs. With the updated genome set 2 (Dataset S1.1), we repeated the taxon sampling process and generated an alternative phylogeny for dating analysis (Fig. S5) using the LG+C20+F+G model (C20: 20 classes of site-specific amino acid profiles) under the PMSF approximation. Similarly, we generated another phylogeny for dating analysis using the same genome set as in Fig. 1 but with the profile mixture model (LG+C60+G), which uses 60 classes of amino acid profiles 89. Note that omitting the +F parameter would not significantly change the results, since +F only adds one further amino acid profile calculated from the original data. These two repeat dating analyses were conducted with the best-practice dating scheme (IR model and calibration C1).
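The run settings and the convergence check described above can be sketched as follows. In MCMCTree, the chain length is the burn-in plus the sampling frequency multiplied by the number of samples, so each run here comprises 10,000 + 20 × 20,000 = 410,000 iterations. The text does not specify the criterion used to compare the two runs; the sketch below assumes, purely for illustration, that posterior mean node ages from the two runs are compared by their largest relative difference and their Pearson correlation. The function names and the example ages are hypothetical.

```python
from math import sqrt


def mcmc_total_iterations(burnin, sampfreq, nsample):
    """Total MCMC iterations for one MCMCTree run."""
    return burnin + sampfreq * nsample


def compare_runs(ages_run1, ages_run2):
    """Compare posterior mean node ages (same node order) from two runs.

    Returns (largest relative difference, Pearson correlation).
    This comparison criterion is an assumption, not the authors' stated method.
    """
    if len(ages_run1) != len(ages_run2) or len(ages_run1) < 2:
        raise ValueError("need two equally long lists with at least two nodes")
    # Relative difference per node, scaled by the mean of the two estimates.
    rel_diffs = [abs(a - b) / ((a + b) / 2) for a, b in zip(ages_run1, ages_run2)]
    mean1 = sum(ages_run1) / len(ages_run1)
    mean2 = sum(ages_run2) / len(ages_run2)
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ages_run1, ages_run2))
    var1 = sum((a - mean1) ** 2 for a in ages_run1)
    var2 = sum((b - mean2) ** 2 for b in ages_run2)
    if var1 == 0 or var2 == 0:
        raise ValueError("node ages must vary within each run")
    return max(rel_diffs), cov / sqrt(var1 * var2)


# Hypothetical posterior mean ages (Ga) for three nodes in two runs.
max_diff, r = compare_runs([3.10, 2.40, 1.20], [3.12, 2.39, 1.21])
print(mcmc_total_iterations(10_000, 20, 20_000), max_diff, r)
```

Two converged runs should give a small largest relative difference and a correlation close to 1; what counts as small enough is a judgment call that the sketch leaves to the user.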
The phylogenetic differences between gene trees and the species tree were reconciled with GeneRax 90, using unrooted gene trees as input and automatically optimized duplication, transfer and loss (DTL) rates. For each gene identified from genome set 2 (Dataset S1.1), we used as the reference a species tree comprising all anammox bacteria, pruned from the phylogenomic tree (Fig. S2). We used the recommended parameters, including the SPR strategy, the undated DTL model and a maximum radius of five.
For comparative genomics analysis, the protein-coding sequences were annotated against KEGG 91, CDD 92, InterPro 93, Pfam 94, TIGRFAM 95 and TCDB 96, individually (see Supplemental Texts section 4). Following annotation, the potentially gained/lost genes between groups (anammox bacteria versus non-anammox bacteria; an anammox bacterial genus versus all other anammox bacteria) were statistically tested using two-sided Fisher's exact tests. The resulting p-values were then corrected with the Benjamini-Hochberg FDR (false discovery rate) procedure. Genes with corrected p-values smaller than 0.05 and a larger ratio in the study group were defined as potentially gained genes, and genes with corrected p-values smaller than 0.05 and a smaller ratio in the study group were defined as potentially lost genes. These potentially gained/lost genes were summarized according to their detailed functions (see Data and Code availability).
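The testing procedure above can be illustrated with a minimal, dependency-free Python sketch. It is not the deposited code. It assumes that the "ratio" is the proportion of genomes in a group that carry the gene family, and the family names, counts and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_families(counts, n_study, n_other, alpha=0.05):
    """counts maps family -> (genomes with it in study group, in other group)."""
    names = list(counts)
    pvals = [
        fisher_exact_two_sided(s, n_study - s, o, n_other - o)
        for s, o in (counts[name] for name in names)
    ]
    calls = {}
    for name, q in zip(names, benjamini_hochberg(pvals)):
        s, o = counts[name]
        if q >= alpha:
            calls[name] = "not significant"
        elif s / n_study > o / n_other:
            calls[name] = "potentially gained"
        else:
            calls[name] = "potentially lost"
    return calls


# Hypothetical presence counts in 10 study-group and 10 other genomes.
print(classify_families({"famA": (10, 0), "famB": (0, 10), "famC": (5, 5)}, 10, 10))
```

The correction is applied across all tested families at once, so the adjusted p-value of any one family depends on how many families are tested together.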
Data and Code availability
All genomic sequences used, the phylogenetic trees generated, the estimated divergence times and the Python code used are deposited in the online repository https://github.com/luolab-cuhk/anammox-origin.
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgements
We thank Bite Pei for building the basic dataset for this project, Yang Qian for his help in editing an earlier version of the manuscript, Jinjin Tao for her helpful discussion, and Mario dos Reis for his suggestions on dating analysis. This work is funded by the National Science Foundation of China (92051113), the Hong Kong Research Grants Council Area of Excellence Scheme (AoE/M-403/16), the Direct Grant of CUHK (4053495), and the CUHK Impact Postdoctoral Fellowship Scheme (to S.W.).