Abstract

Heme biosynthesis represents one of the most essential metabolic pathways in living organisms, providing the precursors for cytochrome prosthetic groups, photosynthetic pigments, and vitamin B12. Using genomic data, we have compared the heme pathway in the diatom Thalassiosira pseudonana and the red alga Cyanidioschyzon merolae to those of green algae and higher plants, as well as to those of heterotrophic eukaryotes (fungi, apicomplexans, and animals). Phylogenetic analyses showed the mosaic character of this pathway in photosynthetic eukaryotes. Although most of the algal and plant enzymes showed the expected plastid (cyanobacterial) origin, at least one of them (porphobilinogen deaminase) appears to have a mitochondrial (α-proteobacterial) origin. Another enzyme, glutamyl-tRNA synthase, obviously originated in the eukaryotic nucleus. Because all the plastid-targeted sequences consistently form a well-supported cluster, this suggests that genes were either transferred from the primary endosymbiont (cyanobacteria) to the primary host nucleus shortly after the primary endosymbiotic event or replaced with genes from other sources at an equally early time, i.e., before the formation of three primary plastid lineages. The one striking exception to this pattern is ferrochelatase, the enzyme catalyzing the first committed step to heme and bilin pigments. In this case, two red algal sequences do not cluster either with the other plastid sequences or with cyanobacterial sequences and appear to have a proteobacterial origin like that of the apicomplexan parasites Plasmodium and Toxoplasma. Although the heterokonts also acquired their plastid via secondary endosymbiosis from a red alga, the diatom has a typical plastid-cyanobacterial ferrochelatase. We have not found any remnants of the plastidlike heme pathway in the nonphotosynthetic heterokonts Phytophthora ramorum and Phytophthora sojae.

Introduction

The heme biosynthesis pathway is common to prokaryotes and eukaryotes. It is a fundamental metabolic pathway needed for biosynthesis of cytochromes, chlorophylls, phycobilins, and the corrin nucleus of vitamin B12. However, the first part of the pathway, the synthesis of 5-aminolevulinate (ALA), differs among organisms. In photosynthetic eukaryotes and all prokaryotes that are not members of the α-proteobacterial group, ALA is synthesized by the C5 pathway starting with the five-carbon precursor glutamate, which is ligated to tRNAGlu by glutamyl-tRNA synthase (GluRS) (fig. 1A), reduced to form glutamate-1-semialdehyde by glutamyl-tRNA reductase, and then transaminated by glutamate-1-semialdehyde 2,1-aminomutase (GSA) to give ALA (Beale 1999). On the other hand, all eukaryotes that do not possess a photosynthetic plastid (animals, fungi, and apicomplexans), as well as α-proteobacteria, form ALA by condensation of succinyl-CoA with glycine in a reaction catalyzed by ALA synthase (fig. 1B).

FIG. 1.—

General scheme of the heme biosynthesis pathway. The first part of the pathway, synthesis of ALA, differs in various organisms. In photosynthetic eukaryotes, ALA is synthesized in the plastid via the “glutamate” or C5 pathway (A), while in animals, fungi, and apicomplexans, ALA is produced in the mitochondrion using the Shemin or “succinyl-CoA” pathway (B).

From ALA to protoporphyrin IX, the pathway is generally the same in all organisms. In photoautotrophs, the pathway branches at protoporphyrin IX (fig. 1), with the insertion of Fe(II) to give heme or Mg(II) to give Mg-protoporphyrin IX, the first committed step of chlorophyll synthesis. In photosynthetic eukaryotes, the story is further complicated by the fact that heme is required for cytochrome biosynthesis in at least three cellular compartments: the cytoplasm, the mitochondrion, and the chloroplast. In the chloroplast, heme is needed not only for the assembly of the cytochrome b6f complex of the photosynthetic electron transport chain but also as the precursor of phytochromobilin (the chromophore of the photoregulatory molecule phytochrome) and the phycobilin pigments in those algae that use phycobilisomes for light harvesting. In plants, the steps from Mg-protoporphyrin IX to chlorophyll are localized in the chloroplast, but the subcellular localization of the reactions common to heme and chlorophyll biosynthesis has been the subject of much discussion. However, it now appears that all these enzymes are localized only in the chloroplast (Papenbrock and Grimm 2001; Cornah, Terry, and Smith 2003), with the exception of protoporphyrinogen oxidase, which is dually targeted to chloroplasts and mitochondria (Watanabe et al. 2001), and a small amount of ferrochelatase activity found in mitochondria (Cornah, Terry, and Smith 2003).

Many lines of evidence support the idea that the first chloroplast was the result of an endosymbiotic relationship between a cyanobacterium and a nonphotosynthetic eukaryote (Delwiche and Palmer 1997; McFadden 2001). This eukaryote almost certainly had a mitochondrion and was able to synthesize heme, although we have no idea where the heme biosynthesis pathway was located. Most of the cyanobacterial genes were lost from the plastid genome, but as many as 2,000 were transferred to the host nucleus (Martin et al. 2002). In order for the products of transferred genes to function in the plastid, they had to acquire presequences that would target them to the plastid. At the same time, some of the host's nuclear genes could also have acquired such presequences. An example of the latter case is the plastid-targeted fructose bisphosphate aldolase of plants and green algae, which appears to have resulted from duplication of the gene encoding the cytosolic enzyme (Gross et al. 1999).

All the algae with chlorophyll c are believed to have acquired their plastids by a process called secondary endosymbiogenesis, in which a red algal endosymbiont was engulfed and domesticated by a nonphotosynthetic eukaryotic host (Delwiche and Palmer 1997; McFadden 2001). These algae include the photosynthetic heterokonts such as diatoms, brown algae, and chrysophytes, as well as haptophytes, cryptophytes, and dinoflagellates. It is not clear if they all originated in a single endosymbiotic event, but comparison of heterokont, haptophyte, and cryptophyte plastid genes suggests that these three groups, collectively referred to as Kingdom Chromista (Cavalier-Smith 2002), had a common ancestor (Yoon et al. 2002). However, the dinoflagellates belong to the Kingdom Alveolata along with ciliates and apicomplexans. Although the apicomplexan parasites such as Plasmodium and Toxoplasma are not photosynthetic, they do have a relict plastid with a reduced plastid genome, and the plastid is the site of a number of metabolic pathways (Ralph et al. 2004). In fact, several pieces of evidence support the suggestion that there was only one secondary endosymbiotic event involving a red algal endosymbiont, giving rise to the chromalveolates (Cavalier-Smith 2002), a group including both the chromists and the alveolates (Fast et al. 2001; Patron, Rogers, and Keeling 2004).

In any case, in order for the secondary endosymbiotic relationship(s) to become established, there must have been a substantial amount of gene transfer from the red algal nucleus to the host nucleus, and these transferred genes would have included many that encoded chloroplast-targeted proteins such as the enzymes of the heme biosynthesis pathway. Now that the draft genome sequences of the diatom Thalassiosira pseudonana (Armbrust et al. 2004) (http://genome.jgi-psf.org/thaps1/thaps1.home.html) and the red alga Cyanidioschyzon merolae (Matsuzaki et al. 2004) (http://merolae.biol.s.u-tokyo.ac.jp/) are available, it is possible to investigate the evolutionary history of the heme biosynthesis enzymes in photosynthetic eukaryotes with primary and secondary plastids. We also looked for these genes in the draft genomes of two nonphotosynthetic heterokonts, the plant pathogens Phytophthora ramorum and Phytophthora sojae (http://genome.jgi-psf.org/ramorum/ and http://genome.jgi-psf.org/sojae/). If the common ancestor of all the heterokonts had a plastid, the oomycetes might have retained red algal–like genes, even though they now have no trace of a plastid. We also analyzed sequences from Plasmodium falciparum, Plasmodium yoelii (http://www.ncbi.nlm.nih.gov/), and Toxoplasma gondii (http://www.toxodb.org/ToxoDB.shtml).

Materials and Methods

Sequences

The genome of the diatom T. pseudonana (http://genome.jgi-psf.org/) was searched for putative genes of the heme pathway (Armbrust et al. 2004). The diatom gene models were compared to the entire National Center for Biotechnology Information protein database to make sure a well-identified gene was one of the best matches. Each gene model was examined and, if necessary, extended to make sure it had the N-terminal endoplasmic reticulum (ER) signal sequence required for the first step of chloroplast targeting in heterokont algae (Lang, Apt, and Kroth 1998; Kroth and Strotmann 1999; Kroth 2002). The identity of the T. pseudonana ferrochelatase was confirmed by sequencing. Sequences of the red alga C. merolae were obtained from the genomic Web site (http://merolae.biol.s.u-tokyo.ac.jp/): they were well annotated and did not need any further editing. A partial sequence of ferrochelatase from the red alga Porphyra yezoensis was assembled from two largely overlapping expressed sequence tags (ESTs) (http://www.kazusa.or.jp/en/plant/porphyra/EST/) giving a total length of 185 amino acids. The Galdieria sulfuraria EST database (http://genomics.msu.edu/galdieria) had a single EST (A438G06) encoding a 285-residue sequence clearly related to red algal ferrochelatases. Ferrochelatase from the apicomplexan parasite T. gondii was extracted from the ToxoDB Web site (http://www.toxodb.org/ToxoDB.shtml): from 30 contigs, the 2 longest were selected for phylogenetic analysis. Because they display some variability in their amino acid sequences, we used both sequences for tree reconstructions. Annotated gene models from P. sojae and P. ramorum with good EST support were obtained at the Joint Genome Institute genome Web site (http://genome.jgi-psf.org/).
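The assembly of a partial sequence from two overlapping fragments, as described for the P. yezoensis ferrochelatase, can be illustrated with a minimal sketch. It assumes exact agreement in the overlapping region and operates on already-translated peptide fragments. Real EST assembly must tolerate sequencing errors and is normally done at the nucleotide level with dedicated software. The function name `merge_overlapping`, the `min_overlap` parameter, and the example peptides are hypothetical and are not taken from the study.

```python
def merge_overlapping(left, right, min_overlap=10):
    """Join two fragments using the longest suffix of `left` that exactly
    matches a prefix of `right`. Returns None if no overlap of at least
    `min_overlap` residues exists. Hypothetical helper for illustration."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


if __name__ == "__main__":
    # Made-up peptide fragments sharing the four residues "AKQR".
    merged = merge_overlapping("MKTAYIAKQR", "AKQRQISFVK", min_overlap=4)
    print(merged, len(merged))
```

Searching from the longest overlap downward means that "largely overlapping" fragments are joined at their maximal shared region rather than at a short chance match.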

Phylogenetic Analysis

Phylogenetic trees of all enzymes of the heme biosynthesis pathway were constructed using amino acid sequences from photosynthetic eukaryotes, fungi, animals, proteobacteria, cyanobacteria, Firmicutes, and Archaea downloaded from GenBank™ (http://www.ncbi.nlm.nih.gov/); accession numbers are given in Supplementary Table 1 (Supplementary Material online). The origins of sequences not in GenBank are given in Supplementary Table 2 (Supplementary Material online). Sequences were aligned using the ClustalX program (Thompson et al. 1997), and the alignment was manually corrected to exclude gaps and ambiguously aligned regions. Phylogenetic trees were constructed using maximum likelihood (ML) and neighbor-joining (NJ) methods. The PhyML program (Guindon and Gascuel 2003) was used to construct ML trees with the JTT substitutional matrix (Jones, Taylor, and Thornton 1992) and a discrete gamma distribution in eight categories plus invariant sites. The gamma shape parameter α and the fraction of invariant sites were estimated from the data set. ML bootstrap support was computed using SEQBOOT in PHYLIP version 3.6a3 (Felsenstein 2001) and PhyML. NJ trees were constructed with the JTT substitutional matrix, using the AsaturA program, which is designed to deal with mutational saturation of amino acids (Van de Peer et al. 2002). The AsaturA program defines amino acid substitutions with high and low probabilities of occurrence as “frequent” or “rare,” respectively; for each sequence pair, the number of frequent and rare amino acid replacements is plotted against the calculated pairwise evolutionary distance. By testing several cutoff values for each tree, it is possible to determine what fraction of positions is mutationally saturated, and those positions are omitted from further phylogenetic analysis. We used the cutoff value that gave the highest number of rare (i.e., unsaturated) substitutions without change of the tree topology. NJ bootstrap support was computed from 1,000 replicates.

Most trees were rooted with a representative Firmicute because (1) Firmicutes are considered a monophyletic group (e.g., Battistuzzi, Feijao, and Hedges 2004), (2) they are not related to organelle donors (proteobacteria, cyanobacteria), and (3) their use produced trees where the major taxa formed the expected monophyletic groups. It must be pointed out that the choice of an out-group is almost always somewhat arbitrary; our trees could just as well have been presented as unrooted, with no change to our conclusions. In the case of GluRS, the tree is rooted by Thermotoga maritima, representing the most basal taxon in the data set (Woese 1987). The tree based on oxygen-independent coproporphyrinogen oxidase is presented as unrooted because there are no available Firmicutes sequences.
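Two of the data-handling steps described in this section, exclusion of gapped alignment columns and bootstrap resampling of columns, can be sketched in a few lines. This is an illustration of the general technique only, not the procedure actually used: the alignment in the study was corrected manually, and resampling was done with SEQBOOT and PhyML. The function names, the toy alignment, and the fixed random seed are hypothetical.

```python
import random


def strip_gapped_columns(alignment, gap_chars="-"):
    """Return a copy of the alignment (dict of name -> aligned sequence)
    with every column that contains a gap character removed."""
    sequences = list(alignment.values())
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = len(sequences[0])
    keep = [i for i in range(width)
            if all(s[i] not in gap_chars for s in sequences)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Build pseudo-replicate alignments by sampling columns with
    replacement; each replicate has the same width as the input."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    width = len(next(iter(alignment.values())))
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(width) for _ in range(width)]
        replicates.append({name: "".join(seq[i] for i in columns)
                           for name, seq in alignment.items()})
    return replicates


if __name__ == "__main__":
    toy = {"taxonA": "MK-LVA", "taxonB": "MKTLVA", "taxonC": "MRTL-A"}
    trimmed = strip_gapped_columns(toy)
    print(trimmed)
    for replicate in bootstrap_replicates(trimmed, 3):
        print(replicate)
```

A tree would then be inferred from each replicate, and the bootstrap support for a branch is the percentage of replicate trees in which that branch appears.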

Targeting Prediction

Putative diatom signal peptides were predicted by SignalP (Nielsen et al. 1997; Nielsen, Brunak, and von Heijne 1999; http://www.cbs.dtu.dk/services/SignalP) and green algal and plant chloroplast transit peptides (cTPs) by TargetP (plant option) (Nielsen et al. 1997; Emanuelsson et al. 2000; http://www.cbs.dtu.dk/services/TargetP) and Predotar (http://www.inra.fr/predotar/). TargetP (plant and nonplant options) and Predotar were also used to predict putative mitochondrial-targeting presequences. The targeting of nuclear-encoded proteins from apicomplexans was predicted using the Prediction of Apicoplast Targeting Sequences program (http://gecco.org.chemie.uni-frankfurt.de/pats/pats-index.php) (Zuegge et al. 2001).

Results

Overview

In the T. pseudonana (diatom) and C. merolae (red algal) genomes, we found gene models for all the enzymes of the plant heme pathway (fig. 1A). Most of them were single copy, but as in other organisms, there were two unrelated coproporphyrinogen oxidases, an oxygen-dependent and an oxygen-independent form (Dailey 1997), and both a “cytosolic” and an “organellar” GluRS. We did not find genes encoding the alternative “α-proteobacterial–type” (succinyl-CoA) pathway for ALA synthesis (fig. 1B) in either alga.

In the diatom, almost all the genes encoded an ER-targeting signal sequence at the 5′-end, as would be expected for a nuclear-encoded protein that is transported to the plastid via the endomembrane system (Apt et al. 2002; Kroth 2002). We used the ER signal sequence as our criterion for plastid targeting because signal sequences are highly conserved across taxonomic groups and are predicted with 80%–85% reliability by the program SignalP (Nielsen et al. 1997; Nielsen, Brunak, and von Heijne 1999). We did not rely upon computer-based predictions for the diatom cTPs (which target the protein across the inner two membranes, i.e., those derived from the primary plastid) because both TargetP and Predotar are trained on sets derived from green plants and did not give consistent results with diatom homologs of proteins experimentally demonstrated to be plastid located in other photosynthetic eukaryotes (Armbrust et al. 2004; B. K. Chaal and B. R. Green, unpublished data).

Step 1: GluRSs

Genes encoding two types of GluRS were found in the diatom, in the red alga, and in plants. The sequences form two distinct clusters in all trees constructed (fig. 2). One diatom gene, T. pseudonana 2, and two C. merolae genes cluster with sequences from oomycetes, fungi, microsporidia, animals, and plants. This highly supported cluster is strongly affiliated with GluRSs from Archaea and is quite different from eubacterial homologues, suggesting that these gene sequences were derived from the original host cytosolic genes. There appears to have been an early gene duplication in the eukaryote lineage. Neither ML (fig. 2A) nor NJ (fig. 2B) analysis was able to resolve the branching order within the eukaryotes. Two of the plant sequences (O. sativa 2 and A. thaliana 2) possess putative cTPs at their N-termini according to the TargetP program (Emanuelsson et al. 2000). No ER-targeting domain or potential transit peptide was found for the diatom gene model, even after searching the nucleotide sequence for 1,000 nt upstream. The N-terminus of the translated gene model did not give a strong prediction with any program, so we do not know in what cell compartment its product functions.

FIG. 2.—

Phylogenetic trees of GluRS (glutamate tRNA ligase) amino acid sequences. (A) ML tree (loglk = −37,096.71883) constructed using PhyML with JTT substitutional matrix and discrete gamma distribution in eight + one categories. All parameters were estimated from the data set (gamma shape = 1.473; Pinv = 0.017). Tree was rooted using Thermotoga maritima (Thermotogales) as out-group. Numbers above branches indicate ML bootstrap support (JTT, one category of sites, 100 replicates)/NJ bootstrap support (JTT, AsaturA cutoff = 446, 1,000 replicates); ML support only in eukaryotic-archaeal cluster. (B) Alternative topology (NJ tree) of the eukaryotic-archaeal cluster obtained using AsaturA program (cutoff = 446). Numbers above branches indicate NJ bootstrap support. PT (black box), sequences with putative plastid targeting leaders; MT (white box), sequences possessing putative mitochondrial leaders.

The second diatom sequence, T. pseudonana 1, clusters with several plant and red algal sequences with good support but lies within a large badly resolved eubacterial clade. Even if the eukaryotic-archaeal cluster is excluded from the analysis, the phylogenetic position of the second plant-algal cluster cannot be resolved (data not shown). For all the proteins in this cluster, TargetP and Predotar gave inconsistent or ambiguous (mitochondrion and plastid) predictions for an N-terminal–targeting presequence. The diatom sequence has a potential mitochondrial leader but no ER signal sequence, so this protein may be imported by the mitochondrion.

Steps 2–4 and 7–9: Cyanobacterial-Plastid Clade

Glutamyl-tRNA reductase, the next enzyme of the pathway, is clearly of cyanobacterial origin (fig. 3A). The green algal and plant sequences form a well-supported clade and, together with the red algal and diatom sequences, are sister group to the cyanobacterial sequences. Cyanobacterial and plastid-targeted GSA sequences also form a well-supported clade (fig. 3B), but the branching order within the clade is not well resolved. No sequences for either of these enzymes have been found in the genomes of Phytophthora or the apicomplexans.

FIG. 3.—

ML trees based on amino acid sequences of enzymes from Steps 2–5. (A) Glutamyl-tRNA reductase (loglk = −17,399.23256, gamma shape = 1.507; Pinv = 0.025). NJ AsaturA cutoff = 693. Tree rooted using Heliobacillus mobilis as out-group. (B) GSA (loglk = −13,748.91073, gamma shape = 1.032; Pinv = 0.000). NJ AsaturA cutoff value = 514. Tree rooted using Bacillus anthracis as out-group. (C) Porphobilinogen synthase (loglk = −15,061.76323, gamma = 1.164; Pinv = 0.043). AsaturA cutoff = 526. Tree rooted using H. mobilis as out-group. (D) Porphobilinogen deaminase (loglk = −15,466.38941; gamma = 1.340; Pinv = 0.081). AsaturA cutoff value = 614. Tree rooted using Staphylococcus epidermidis as out-group. Numbers above branches indicate ML bootstrap support (JTT, one category of sites, 300 replicates)/NJ bootstrap support (JTT, AsaturA cutoff value as specified for each tree, 1,000 replicates). Sequences putatively targeted to plastid are marked by PT (black box); sequences possessing putative mitochondrial leaders are indicated by MT (white box).

Porphobilinogen synthase (Step 4, fig. 3C) is the only protein for which there is a good selection of secondary endosymbiont sequences: T. pseudonana forms a well-supported cluster with the diatom Odontella sinensis and other heterokonts (Fucus vesiculosus and Laminaria digitata). This cluster is sister group to a cluster consisting of two rhodophyte sequences (Gracilaria gracilis and C. merolae) and the sequence from the chlorarachniophyte Bigelowiella natans. A recently published analysis of nuclear-encoded plastid-targeted proteins in B. natans showed that a significant fraction of these proteins (including GSA, fig. 3B) was not evolutionarily related to a green plastid ancestor (Archibald et al. 2003). The Plasmodium sequences form very long branches but are part of the cyanobacterial-plastid cluster. The two Phytophthora sequences branch with the fungal sequences, rather than with the other heterokont sequences.

Plastid and cyanobacterial sequences are on the same branch for uroporphyrinogen decarboxylase (Step 7, fig. 4A) and one form of protoporphyrinogen oxidase (Step 9, fig. 4D). In both cases, the red algal and diatom sequences are on the same branch but with poor statistical support, as was the case for glutamyl-tRNA reductase and porphobilinogen synthase. The two forms of red algal and diatom uroporphyrinogen decarboxylase appear to be the result of a gene duplication prior to the secondary endosymbiotic event, and both are predicted to be plastid targeted. The second form of protoporphyrinogen oxidase is found only in the green plants and appears to be of proteobacterial origin. Neither the Plasmodium nor the Phytophthora sequence is part of the cyanobacterial-plastid clade.

FIG. 4.—

ML trees based on amino acid sequences of enzymes from Steps 7–9. (A) Uroporphyrinogen decarboxylase (loglk = −18,487.67565; gamma shape = 1.453; Pinv = 0.040). AsaturA cutoff = 559. Tree rooted using Staphylococcus epidermidis as out-group. (B) Oxygen-independent coproporphyrinogen oxidase (loglk = −17,056.01128; gamma shape = 1.661; Pinv = 0.106). AsaturA cutoff = 446. Tree rooted using Thermoanaerobacter tengcongensis sequence as out-group. (C) Oxygen-dependent coproporphyrinogen oxidase (loglk = −8,305.35663; gamma shape = 1.369; Pinv = 0.100). AsaturA cutoff = 578. Tree rooted using Caulobacter crescentus (Proteobacteria) sequence as an out-group. (D) Protoporphyrinogen oxidase (loglk = −20,058.38689; gamma shape = 2.530; Pinv = 0.005). AsaturA cutoff = 343. Tree rooted using Listeria innocua sequence as out-group. Numbers above branches indicate ML bootstrap support (JTT, one category of sites, 300 replicates)/NJ bootstrap support (JTT, AsaturA cutoff value as specified, 1,000 replicates). Sequences putatively targeted to plastid are marked by PT (black box); sequences possessing putative mitochondrial leaders are indicated by MT (white box).

There are two unrelated coproporphyrinogen oxidases, the O2-independent (fig. 4B) and O2-dependent (fig. 4C). The cyanobacterial-plastid clade is well supported for the O2-independent enzyme. This gene was not found in apicomplexan or Phytophthora genomes. For the O2-dependent form, there is a plastid clade next to the Phytophthora branch but it is obviously not related to cyanobacteria. The tree as a whole does not support any particular origin of eukaryotic O2-dependent coproporphyrinogen oxidases.

Step 5: Porphobilinogen Deaminase: Proteobacterial Origin

In the case of porphobilinogen deaminase (fig. 3D), the plastid sequences form one cluster, which is invariably placed within an α-proteobacterial clade with high bootstrap support, suggesting a possible mitochondrial origin of this particular enzyme. The Plasmodium (apicomplexan) sequences appeared with low support at the root of the α-proteobacteria–plastid cluster, suggesting that they also might have an α-proteobacterial origin. The cyanobacterial sequences are quite separate, forming a sister group to the cluster containing opisthokonts and the two sequences from Phytophthora.

Step 10: Ferrochelatase

The red algal ferrochelatase sequences did not fit into the general plastid pattern (fig. 5). All three of them (from C. merolae, P. yezoensis, and G. sulfuraria) formed a well-supported branch which clustered with the apicomplexan and some proteobacterial sequences. The proteobacterial-apicomplexan-red algal branch was well supported in both ML and NJ trees. All the other plastid ferrochelatase sequences, including those of the diatoms T. pseudonana and Phaeodactylum tricornutum, formed a separate well-supported cluster with cyanobacterial sequences.

FIG. 5.—

Ferrochelatase (Step 10). (A) ML tree based on amino acid sequences (loglk = −15,844.57515; gamma shape = 1.630; Pinv = 0.023). Numbers above branches indicate ML bootstrap support (JTT, one category of sites, 300 replicates). (B) Cluster containing red algal and apicomplexan ferrochelatase sequences from NJ tree (JTT matrix, AsaturA cutoff = 395). PT (black box), sequences putatively targeted to plastid; MT (white box), sequences possessing putative mitochondrial leaders.

The Plasmodium ferrochelatase was previously proposed to be of proteobacterial origin on the basis of NJ and parsimony analysis, before the diatom and red algal sequences were available (Sato and Wilson 2003). Figure 5 supports this interpretation and suggests that the three red algae have also replaced their cyanobacterial-type ferrochelatase with a proteobacterial one. This was somewhat unexpected because the diatoms (with plastids of red algal origin) have the cyanobacterial form. The Phytophthora sequences again branch with the fungal sequences rather than with those of the photosynthetic heterokonts.

Discussion

Mosaic Origin

The evolution of metabolic pathways is a subject of much current interest, with a number of different hypotheses to explain the origin of new enzymes (Nara, Hashimoto, and Aoki 2000; Illingworth et al. 2003; Schmidt et al. 2003). In this work, we address the next level of evolution: gene acquisition/replacement related to the endosymbiotic origins of organelles. Because all known prokaryotes, eubacteria and archaea, require heme for numerous cellular functions, it is safe to assume that both the original amitochondriate eukaryote (assumed to have arisen from the archaeal lineage) and the α-proteobacterium that became the mitochondrion were able to synthesize heme. This may underlie the fact that in animal cells ALA is synthesized in the mitochondrion via succinyl-CoA, while the following four steps in the pathway (fig. 1) take place in the cytosol, ending with coproporphyrinogen III. Coproporphyrinogen III is then transported back to the mitochondrion where the last three steps of the heme pathway are located (Ralph et al. 2004).

The primary chloroplast endosymbiosis led to a massive transfer of cyanobacterial genes to the nucleus: at least 1,700 are still recognizable as cyanobacterial homologues in Arabidopsis (Martin et al. 2002). Some of them replaced the preexisting nuclear genes, but in other cases, a duplicate of the nuclear gene acquired a targeting sequence and took over part of the chloroplast pathway (Martin and Herrmann 1998). According to our phylogenetic analyses, most of the genes for enzymes involved in heme biosynthesis in photosynthetic eukaryotes originated in the cyanobacterial endosymbiont. For oxygen-dependent coproporphyrinogen oxidase (Step 8, fig. 4C) and uroporphyrinogen synthase (Step 6, data not shown), the plastid sequences form a well-supported cluster that is unrelated to the cyanobacterial cluster but not clearly related to any other group. The close relationship among sequences from all photosynthetic eukaryotes suggests that these genes were among those transferred to the host nucleus during the primary endosymbiogenesis, before the divergence of the two major primary plastid-containing lineages.

However, several enzymes in the pathway do not appear to be of cyanobacterial origin. In the case of porphobilinogen deaminase, the plastid enzymes are clearly related to those of α-proteobacteria (fig. 3D). One possibility is that the cyanobacterial gene was transferred to the nucleus after the primary endosymbiotic event but replaced by a gene of proteobacterial origin before the divergence of the red, green, and glaucophyte algae. Another possibility is that the cyanobacterial gene was never transferred to the nucleus and a copy of the preexisting mitochondrial nuclear-encoded gene was retargeted to the chloroplast. The animal and fungal enzymes are located in the cytosol and could be of either cytosolic or proteobacterial (mitochondrial) origin, but on our phylogenetic trees, they appear closely related to cyanobacterial sequences (fig. 3D). Such clustering of proteins from cyanobacteria and nonphotosynthetic eukaryotes has also been found in phosphoadenosine phosphosulfate reductase and heme oxygenase (M.O., unpublished data) and may be an artifact due to the limited sampling of eukaryotes.

The evolution of the GluRSs is complicated by the fact that they belong to the Glx-tRNA synthase family, which contains both GluRSs and glutaminyl-tRNA synthases (GlnRS) (Siatecka et al. 1998). Phylogenetic trees divide this family into the α group, which contains only eubacterial GluRS, and the β group, which has three well-separated subgroups: archaeal GluRS, eukaryotic GluRS, and eukaryotic GlnRS (Siatecka et al. 1998). Plants, C. merolae, and T. pseudonana have a eukaryotic GluRS that forms a sister group to the archaeal GluRS (fig. 2A). This gene is clearly not of cyanobacterial or α-proteobacterial origin: it most likely originated in the nucleus of the first photosynthetic eukaryote. Plants and the two algae have a second GluRS sequence that appears within the α group of eubacterial genes. However, we could not determine whether they originated from within the cyanobacteria (plastid) or the proteobacteria (mitochondrion), even with separate phylogenetic analysis of α and β groups. Four of the α group sequences (A. thaliana, Nicotiana tabacum, C. merolae, and T. pseudonana) possess putative mitochondrial leaders based on TargetP predictions, but Hordeum vulgare has a putative plastid leader.

It is clear that the heme biosynthesis pathway has a mosaic character in photosynthetic eukaryotes. This is not the only example of a mosaic metabolic pathway. The polyamine biosynthesis pathway in Arabidopsis (Illingworth et al. 2003), the eukaryotic pyrimidine biosynthesis pathway (Nara, Hashimoto, and Aoki 2000), and the glycolytic pathway in plants (Martin and Herrmann 1998) are all composed of enzymes with different evolutionary origins. We have also found that the shikimate pathway is an evolutionary mosaic (M.O. and B.R.G., unpublished data).

Secondary Endosymbiosis and the Red Lineage

Whether or not they are of cyanobacterial origin, the enzymes of plants, red algae, and diatoms appear to have a common origin, with the exception of red algal ferrochelatase (fig. 5). The simplest explanation is that the cyanobacterial ferrochelatase gene was replaced by one from a proteobacterium in the lines leading to Porphyra and Cyanidioschyzon but not in the line involved in secondary endosymbiosis.

Sato and Wilson (2003) first pointed out the proteobacterial origin of apicomplexan ferrochelatase, confirmed by our trees which have larger taxon sampling (fig. 5). It must be noted that the heme biosynthesis pathway in apicomplexans complicates any scenario involving a common ancestor of all groups with red algal plastids. Although heterokonts and apicomplexans may have had a common ancestor (Fast et al. 2001), the heme pathway in apicomplexan parasites is very different from that in plants and diatoms (Armbrust et al. 2004; Ralph et al. 2004). Part of the pathway appears to be localized in the apicomplexan complex plastid (apicoplast) (Ralph et al. 2004; Sato et al. 2004), but ALA is synthesized via the succinyl-CoA pathway in the mitochondrion and the two penultimate steps may be localized in the cytosol (Ralph et al. 2004). Three apicoplast enzymes, porphobilinogen deaminase (fig. 3D), uroporphyrinogen decarboxylase (fig. 4A), and ferrochelatase, are of proteobacterial origin. It has recently been shown that the ferrochelatase encoded by the P. falciparum genome is located in the apicoplast (Varadharajan et al. 2004), in spite of the fact that it does not have a typical apicoplast-targeting sequence (Sato and Wilson 2003). However, a substantial fraction of heme biosynthesis in the intraerythrocyte stage is due to host enzymes imported into the cytosol (Varadharajan et al. 2004). What effect this has had on the evolution of the pathway is unknown.

Nonphotosynthetic Heterokonts

Phytophthora species are plant pathogens classified as oomycetes, the sister group to the photosynthetic heterokonts (e.g., diatoms). It is still unclear whether the oomycetes once had a plastid and lost it or whether the photosynthetic heterokonts acquired plastids after the two lineages separated. If the chromalveolate hypothesis is correct, the common ancestor of the oomycetes and the diatoms must have had a plastid. However, the oomycete sequences invariably branch with fungal and animal sequences (figs. 2A, 2B, 3, and 4) and not with those of the diatom or the cyanobacteria. This suggests that the red algal nuclear genes were not successful in replacing the endogenous nuclear genes in the oomycete lineage. It also suggests that the oomycetes diverged from the other chromalveolate lineages before the secondary endosymbiont was fully integrated, which might imply a very rapid divergence of all the chromalveolate lineages after the secondary endosymbiotic event.
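The observation that oomycete sequences "branch with" fungal and animal sequences is a statement about sister-group relationships in each tree. As a minimal illustration, assuming a rooted tree written as nested tuples, the Python sketch below reports which taxa share an immediate ancestor with a chosen leaf. The topology in TOY_TREE and the function names leaves and sister_group are hypothetical and were invented for this example; they are not taken from the figures, and the sketch ignores branch lengths and bootstrap support.

```python
# Hypothetical rooted toy topology (NOT one of the published trees).
# A leaf is a string; an internal node is a tuple of child nodes.
TOY_TREE = (
    ("Phytophthora", ("fungus", "animal")),
    (("diatom", ("green_plant", "red_alga")), "cyanobacterium"),
)


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def sister_group(tree, target):
    """Return the leaves that share the target's immediate ancestor.

    Returns None if the target leaf is not present in the tree.
    """
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == target:
            # All other children of this node are the target's sisters.
            return leaves(tree[:i] + tree[i + 1:])
        found = sister_group(child, target)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(sorted(sister_group(TOY_TREE, "Phytophthora")))
    print(sorted(sister_group(TOY_TREE, "diatom")))
```

In this toy topology the Phytophthora leaf is sister to the fungal and animal leaves, whereas the diatom leaf is sister to the plant and red algal leaves, which mirrors the pattern described in the text. A real analysis would also have to consider rooting and statistical support for each node.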

A recently published hypothesis suggested that all eukaryotes that currently have plastids had a primary plastid in their evolutionary history (Nozaki et al. 2003). This rather startling suggestion was based on phylogenetic trees of four concatenated proteins (α- and β-tubulin, actin, and elongation factor-1α) that divided eukaryotes into two major groups: Group A, opisthokonts (metazoans and fungi), and Group B, all photosynthetic eukaryotes and their nonphotosynthetic relatives, with red algae at the base. Group B therefore included all members of the Alveolata (ciliates, dinoflagellates, and apicomplexans) and the Discicristata (euglenoids and trypanosomatids) as well as the Heterokontophyta and green algae/plants. If this were true, the secondary host nucleus would already have contained genes for chloroplast-targeted proteins because it had once had a primary plastid. However, Phytophthora clusters with their Group A (opisthokonts) in most of our trees and shows no trace of having had a primary or a secondary plastid. Although the Nozaki et al. (2003) hypothesis cannot be rigorously tested until genomic sequences of a much wider sampling of eukaryotes are available, it is important to point out that it also would require that the mosaic character of the heme pathway evolved during the primary endosymbiogenesis, before the primary plastids and their hosts diverged into the red, green, and glaucophyte lineages.

Conclusions

In summary, we found that the evolutionary history of the heme biosynthesis pathway is very much the same in plants possessing primary green plastids as in the red algae and in the diatom with a red secondary plastid. In all photosynthetic eukaryotes, the pathway has a mosaic character, and the enzymes involved have cyanobacterial (plastid), α-proteobacterial (mitochondrial), and cytosolic (eukaryotic nucleus) origins.

Charles Delwiche, Associate Editor

Acknowledgments

We thank Ross Waller for useful discussions and for suggesting the inclusion of Phytophthora sequences and Katerina Jiroutova for sequencing the T. pseudonana ferrochelatase. Financial support was provided by the Natural Sciences and Engineering Research Council of Canada and the award of a Canada Council Killam Fellowship to B.R.G., by the Research Plan of the Institute of Parasitology ASCR no. z60220518, and by the Academy of Sciences of the Czech Republic, project no. A500220502.

References

Apt, K. E., L. Zaslavkaia, J. C. Lippmeier, M. Lang, O. Kilian, R. Wetherbee, A. R. Grossman, and P. G. Kroth. 2002. In vivo characterization of diatom multipartite plastid targeting signals. J. Cell Sci. 115:4061–4069.

Archibald, J. M., M. B. Rogers, M. Toop, K. Ishida, and P. J. Keeling. 2003. Lateral gene transfer and the evolution of plastid targeted proteins in the secondary plastid-containing alga, Bigelowiella natans. Proc. Natl. Acad. Sci. USA 100:7678–7683.

Armbrust, E. V., J. A. Berges, C. Bowler et al. (45 co-authors). 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306:79–86.

Battistuzzi, F. U., A. Feijao, and S. B. Hedges. 2004. A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol. Biol. 4:44.

Beale, S. I. 1999. Enzymes of chlorophyll biosynthesis. Photosynth. Res. 60:43–73.

Cavalier-Smith, T. 2002. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr. Biol. 12:R62–R64.

Cornah, J. E., M. J. Terry, and A. G. Smith. 2003. Green or red: what stops the traffic in the tetrapyrrole pathway? Trends Plant Sci. 8:224–230.

Dailey, H. A. 1997. Enzymes of heme biosynthesis. J. Biol. Inorg. Chem. 2:411–417.

Delwiche, C. F., and J. D. Palmer. 1997. The origin of plastids and their spread via secondary symbiosis. Pp. 53–86 in D. Bhattacharya, ed. Origin of algae and their plastids. Springer-Verlag, Vienna, Austria.

Emanuelsson, O., H. Nielsen, S. Brunak, and G. von Heijne. 2000. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300:1005–1016.

Fast, N. M., J. C. Kissinger, D. S. Roos, and P. J. Keeling. 2001. Nuclear-encoded, plastid targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol. Biol. Evol. 18:418–426.

Felsenstein, J. 2001. PHYLIP (phylogeny inference package). Version 3.6a3. Department of Genetics, University of Washington, Seattle.

Gross, W., D. Lenze, U. Nowitzki, J. Weiske, and C. Schnarrenberger. 1999. Characterization, cloning, and evolutionary history of the chloroplast and cytosolic class I aldolases of the red alga Galdieria sulphuraria. Gene 230:7–14.

Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.

Illingworth, C., M. J. Mayer, K. Elliott, C. Hanfrey, N. J. Walton, and A. J. Michael. 2003. The diverse bacterial origins of the Arabidopsis polyamine biosynthetic pathway. FEBS Lett. 549:26–30.

Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.

Kroth, P. G. 2002. Protein transport into secondary plastids and the evolution of primary and secondary plastids. Int. Rev. Cytol. Surv. Cell Biol. 221:191–255.

Kroth, P. G., and H. Strotmann. 1999. Diatom plastids: secondary endocytobiosis, plastid genome and protein import. Physiol. Plant. 107:136–141.

Lang, M., K. E. Apt, and P. G. Kroth. 1998. Protein transport into “complex” diatom plastids utilizes two different targeting signals. J. Biol. Chem. 273:30973–30978.

Martin, W., and R. G. Herrmann. 1998. Gene transfer from organelles to the nucleus: how much, what happens, and why? Plant Physiol. 118:9–17.

Martin, W., T. Rujan, E. Richly, A. Hansen, S. Cornelsen, T. Lins, D. Leister, B. Stoebe, M. Hasegawa, and D. Penny. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl. Acad. Sci. USA 99:12246–12251.

Matsuzaki, M., O. Misumi, T. Shin-I et al. (42 co-authors). 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428:653–657.

McFadden, G. I. 2001. Primary and secondary endosymbiosis and the origin of plastids. J. Phycol. 37:951–959.

Nara, T., T. Hashimoto, and T. Aoki. 2000. Evolutionary implications of the mosaic pyrimidine-biosynthetic pathway in eukaryotes. Gene 257:209–222.

Nielsen, H., S. Brunak, and G. von Heijne. 1999. Machine learning approaches to the prediction of signal peptides and other protein sorting signals. Protein Eng. 12:3–9.

Nielsen, H., J. Engelbrecht, S. Brunak, and G. von Heijne. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10:1–6.

Nozaki, H., M. Matsuzaki, M. Takahara, O. Misumi, H. Kuroiwa, M. Hasegawa, T. Shin-i, Y. Kohara, N. Ogasawara, and T. Kuroiwa. 2003. The phylogenetic position of red algae revealed by multiple nuclear genes from mitochondria-containing eukaryotes and an alternative hypothesis on the origin of plastids. J. Mol. Evol. 56:485–497.

Papenbrock, J., and B. Grimm. 2001. Regulatory network of tetrapyrrole biosynthesis-studies of intracellular signalling involved in metabolic and developmental control of plastids. Planta 213:667–681.

Patron, N. J., M. B. Rogers, and P. J. Keeling. 2004. Gene replacement of fructose-1,6-bisphosphate aldolase supports the hypothesis of a single photosynthetic ancestor of chromalveolates. Eukaryot. Cell 3:1169–1175.

Ralph, S. A., G. G. van Dooren, R. F. Waller, M. J. Crawford, M. J. Fraunholz, B. J. Foth, C. J. Tonkin, D. S. Roos, and G. I. McFadden. 2004. Metabolic maps and functions of the Plasmodium falciparum apicoplast. Nat. Rev. Microbiol. 2:1–15.

Sato, S., B. Clough, L. Coates, and R. J. M. Wilson. 2004. Enzymes for heme biosynthesis are found in both the mitochondrion and plastid of the malaria parasite Plasmodium falciparum. Protist 155:117–125.

Sato, S., and R. J. M. Wilson. 2003. Proteobacteria-like ferrochelatase in the malaria parasite. Curr. Genet. 42:292–300.

Schmidt, S., S. Sunyaev, P. Bork, and T. Dandekar. 2003. Metabolites: a helping hand for pathway evolution? Trends Biochem. Sci. 28:336–341.

Siatecka, M., M. Rozek, J. Barciszewski, and M. Mirande. 1998. Modular evolution of the Glx-tRNA synthetase family. Eur. J. Biochem. 256:80–87.

Thompson, J. D., T. J. Gibson, F. Plesniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

Van de Peer, Y., T. Frickey, J. S. Taylor, and A. Meyer. 2002. Dealing with saturation at the amino acid level: a case study based on anciently duplicated zebrafish genes. Gene 295:205–211.

Varadharajan, S., B. K. C. Sagar, P. N. Rangarajan, and G. Padmanaban. 2004. Localization of ferrochelatase in Plasmodium falciparum. Biochem. J. 384:429–436.

Watanabe, N., F.-S. Che, M. Iwano, S. Takayama, S. Yoshida, and A. Isogai. 2001. Dual targeting of spinach protoporphyrinogen oxidase II to mitochondria and chloroplasts by alternative use of two in-frame initiation codons. J. Biol. Chem. 276:20474–20481.

Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–271.

Yoon, H. S., J. D. Hackett, G. Pinto, and D. Bhattacharya. 2002. A single, ancient origin of the plastid in the Chromista. Proc. Natl. Acad. Sci. USA 99:15507–15512.

Zuegge, J., S. Ralph, M. Schmuker, G. I. McFadden, and G. Schneider. 2001. Deciphering apicoplast targeting signals—feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins. Gene 280:19–26.

Author notes

*Department of Botany, University of British Columbia, Vancouver BC, Canada; †Institute of Parasitology, Academy of Sciences of Czech Republic, Branišovská, České Budějovice, Czech Republic; and ‡Faculty of Biological Sciences, University of South Bohemia, Branišovská, České Budějovice, Czech Republic

Supplementary data