Abstract
Some sea slugs sequester chloroplasts from algal food in their intestinal cells and photosynthesize for months. This phenomenon, kleptoplasty, poses a question of how the chloroplast retains its activity without the algal nucleus, and there have been debates on the horizontal transfer of algal genes to the animal nucleus. To settle the arguments, we report the genome of a kleptoplastic sea slug Plakobranchus ocellatus and found no evidence that photosynthetic genes are encoded on the nucleus. Nevertheless, we confirmed that photosynthesis prolongs the life of mollusk under starvation. The data present a paradigm that a complex adaptive trait, as typified by photosynthesis, can be transferred between eukaryotic kingdoms by a unique organelle transmission without nuclear gene transfer. Our phylogenomic and transcriptomic analysis showed that genes for proteolysis and immunity underwent gene expansion and are upregulated in the chloroplast-enriched tissue, suggesting that these molluscan genes are involved in this DNA-independent transformation.
Introduction
Since the Hershey–Chase experiment (Hershey and Chase, 1952) which proved that DNA is the material transferred to bacteria in phage infections, horizontal gene transfer (HGT) has been considered essential for cross-species transformation (Arber, 2014). Although the prion hypothesis rekindled interest in proteins as an element of phenotype propagation (Crick, 1970; Wickner et al., 2015), HGT is still assumed to be the cause of transformation. For example, in secondary plastid acquisition scenario in dinoflagellates, 1) a non-phototrophic eukaryote sequesters a unicellular archaeplastid; 2) endogenous gene transfer to the non-phototrophic eukaryote leads to shrinkage of the archaeplastidan nuclear DNA (nucDNA); and 3) the archaeplastidan nucleus disappears and its plastid becomes a secondary plastid in the host (Reyes-Prieto et al., 2007).
Chloroplast sequestration in sea slugs has attracted much attention due to the uniqueness of the phenotype acquisition from algae. Some species of sacoglossan sea slugs (Mollusca: Gastropoda: Heterobranchia) can photosynthesize using the chloroplasts of their algal food (Fig. 1) (de Vries and Archibald, 2018; Kawaguti, 1965; Pierce and Curtis, 2012; Rumpho et al., 2011; Serôdio et al., 2014). These sacoglossans ingest speciesspecific algae and sequester the chloroplasts into their intestinal cells. This phenomenon is called kleptoplasty (Gilyarov, 1983; Pelletreau et al., 2011). The sequestered chloroplasts (named kleptoplasts) retain their electron-microscopic structure (Fan et al., 2014; Kawaguti, 1965; Martin et al., 2015; Pelletreau et al., 2011; Trench, 1969) and their photosynthetic activity (Cartaxana et al., 2017; Christa et al., 2014a; Cruz et al., 2015; Händeler et al., 2009; Taylor, 1968; Teugels et al., 2008; Wägele and Johnsen, 2001; Yamamoto et al., 2009). The retention period of the photosynthesis differs among sacoglossan species (one day to >300 days) (Christa et al., 2015, 2014a, 2014b; Evertsen et al., 2007; Laetz and Wägele, 2017), development stages and depends on the plastid “donor” species (Curtis et al., 2007; Laetz and Wägele, 2017).
The absence of algal nuclei in sacoglossan cells makes kleptoplasty distinct from other symbioses and plastid acquisitions (de Vries and Archibald, 2018; Rauch et al., 2015). Electron microscopic studies have indicated that the sea slug maintains photosynthetic activity without algal nuclei (Hirose, 2005; Kawaguti, 1965; Laetz and Wägele, 2018; Martin et al., 2015; Pierce and Curtis, 2012). Because most photosynthetic proteins are encoded on the algal nucleus rather than plastids, the mechanism by which photosynthetic proteins are maintained in kleptoplasty is especially intriguing, given that the photosynthetic proteins have a high turnover rate (de Vries and Archibald, 2018; Pelletreau et al., 2011). Previous PCR-based studies have suggested the HGT of algal-nucleic photosynthetic genes (e.g., psbO) to the nucDNA of the sea slug Elysia chlorotica (Pierce et al., 1996, 2010, 2007, 2003; Rumpho et al., 2008; Schwartz et al., 2014). A genomic study of E. chlorotica (N50 = 824 bases) provided no reliable evidence of HGT but predicted that fragmented algal DNA and mRNAs contribute to its kleptoplasty (Bhattacharya et al., 2013). Schwartz et al. (2014) reported in situ hybridization-based evidence for HGT and argued that the previous E. chlorotica genome might overlook the algae-derived gene. Although an improved genome of E. chlorotica (N50 = 442 kb) has published recently, no mention was made of the presence or absence of algae-derived genes (Cai et al., 2019). The genomic studies of sea slug HGT have been limited to E. chlorotica, and the studies have used multiple samples with different genetic backgrounds for the genome assembling (Bhattacharya et al., 2013; Cai et al., 2019). The genetic diversity of sequencing data may have inhibited genome assembling. Although transcriptome analyses of other sea slug species failed to detect HGT (Chan et al., 2018; Wägele et al., 2011), the transcriptomic data was insufficient to ascertain genomic gene composition (de Vries et al., 2015; Rauch et al., 2015).
Here, we present genome sequences of another sacoglossan species, Plakobranchus ocellatus, to clarify whether HGT is the primary system underlying kleptoplasty. For over 70 years, P. ocellatus has been studied for its ability to retain kleptoplasts for long-term (>3 months) (Christa et al., 2013; Evertsen et al., 2007; Greve et al., 2017; Kawaguti, 1941; Trench et al., 1970; Wade and Sherwood, 2017; Wägele et al., 2011). However, recent phylogenetic analysis showed P. ocellatus to be a species complex (set of closely related species) (Fig. 1) and, therefore, useful to revisit previous studies about P. ocellatus (Christa et al., 2014c; Krug et al., 2013; Maeda et al., 2012; Meyers-Muñoz et al., 2016; Yamamoto et al., 2013). Hence, we first confirmed the photosynthetic activity and adaptive relevance of kleptoplasty to P. ocellatus type black (a species confirmed by Krug et al. (2013) via molecular phylogenetics, hereafter “PoB”). We then constructed genome sequences of PoB (N50 = 1.45 Mb) and of a related species, Elysia marginata (N50 = 225 kb). By improving the DNA extraction method, we have successfully assembled the genome sequences from a single sea slug individual in each species. Our comparative genomic and transcriptomic analyses of these species demonstrate the complete lack of photosynthetic genes in these sea slug genomes and provide evidence for an alternative hypothetical mechanism of kleptoplasty.
Results
Kleptoplast photosynthesis prolongs the life of P. ocellatus type black
To explore the photosynthetic activity of PoB, we measured three photosynthetic indexes: the photochemical efficiency of kleptoplast photosystem II (PSII), the oxygen production rate after starvation for 1-3 months, and the effect of illumination on PoB longevity. The value of Fv/Fm, which reflects the maximum quantum yield of PSII, was 0.68–0.69 in the “d38” PoB group (starved for 38 days), and was 0.57–0.64 in the “d109” group (starved for 109–110 days). These values were only slightly lower than those of healthy Halimeda borneensis, a kleptoplast donor of PoB, which showed Fv/Fm values of 0.73–0.76 (Fig. 2a, Supplementary Table 1), indicating that PoB kleptoplasts retain a similar photochemical efficiency of PSII to that of the food algae for over three months. On the measurement of oxygen concentrations in seawater, starved PoB (“d38” and “d109”) displayed gross photosynthetic oxygen production (Fig. 2c). Without sea slugs, there was no light-dependent increase in oxygen concentration: i.e., no detectable microalgal photosynthesis in the seawater (Supplementary Fig. 1). The results demonstrate that PoB kleptoplasts retain photosynthetic activity for over three months. consistent with previous P. cf. ocellatus studies (Christa et al., 2014c; Evertsen et al., 2007). We then measured the longevity of starved PoB specimens under different light conditions. Mean longevity was 156 days under continuous darkness and 195 days under a 12 h:12 h light-dark cycle (p = 0.022) (Fig. 2d, Supplementary Table 2), indicating that longevity was significantly extended when the animals were exposed to light. Our observation is consistent with the observation of Yamamoto et al. (2013) that the survival rate of PoB after 21 days under starvation is light-dependent. Although a study using P. cf. ocellatus reported that photosynthesis had no positive effect on survival rate (Christa et al., 2014c), our results indicate that this finding is not applicable to PoB. Taken together, the data for the three photosynthetic indexes indicate that kleptoplast photosynthesis increases resistance to starvation in PoB.
Photosynthetic genes in kleptoplasts
To reveal the genetic autonomy of kleptoplasts, we sequenced whole kleptoplast DNAs (kpDNA) from PoB and compared the sequences with algal plastid and nuclear genes. Illumina sequencing provided two types of circular kpDNA and one whole mitochondrial DNA (mtDNA) (Fig. 3, Supplementary Fig. 2-7). The mtDNA sequence was almost identical to the previously sequenced P. cf. ocellatus mtDNA (Greve et al., 2017) (Supplementary Fig. 4). The sequenced kpDNAs corresponded with those of the predominant kleptoplast donors of PoB (Maeda et al., 2012): i.e., Rhipidosiphon lewmanomontiae (AP014542, hereafter “kRhip”), and Poropsis spp. (AP014543, hereafter “kPoro”) (Fig. 3b and Supplementary Fig. 5).
To determine if the kpDNA gene repertoires were similar to those of green algal chloroplast DNAs (cpDNAs), we sequenced H. borneensis cpDNA and obtained 17 whole cpDNA sequences from public database (Supplementary Fig. 6, and Supplementary Tables 3 and 4). The PoB kpDNAs contained all of the 59 conserved chloroplastic genes in Bryopsidales algae (e.g., psbA, rpoA), although they lacked 4–5 of the dispensable genes (i.e.,petL,psb30, rpl32, rpl12, and ccs1) (Fig. 3c and Supplementary Fig. 7).
To test whether the kpDNAs contained no additional photosynthetic genes, we then used a dataset of 614 photosynthetic genes (hereafter, the “A614” dataset), which were selected from our algal transcriptome data and public algal genome data (Supplementary Table 5 and 6). A tblastn homology search using the A614 obtained no reliable hits (E-value <0.0001) against our kpDNA sequences, except for the gene chlD, which resembled kpDNA-encoded chlL (Fig. 3de, Supplementary Figs. 8–10). A positive control search against an algal nucDNA database (C. lentillifera, https://marinegenomics.oist.jp/umibudo/viewer/info?project_id=55) (Arimoto et al., 2019) found reliable matches for 93% (575/614) of the queries (Fig. 3d), suggesting that the method has high sensitivity. Thus, the comparison with algal plastid and nucleic genes clarified that the kpDNAs lack multiple photosynthetic genes (e.g., psbO).
Absence of horizontally transferred algal genes in the P. ocellatus type black nucleic genome
To determine if the PoB nucleic genome contains algae-derived genes (i.e., evidence of HGT), we sequenced the nuclear genome of PoB and searched for algae-like sequences in the gene models, genomic sequences, and pre-assembled short reads. Our genome assembly contained 927.9 Mbp (99.1% of the estimated genome size, 8,647 scaffolds, N50 = 1.45 Mbp, 77,230 gene models) (INSD; PRJDB3267, Supplementary Figs. 11–14, Supplementary Tables 7–10). BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis using the eukaryota_odb9 dataset showed high coverage (93%) of the eukaryote conserved gene set (Supplementary Table 7), indicating that our gene modeling was sufficiently complete to enable HGT searches.
Searches of the PoB gene models found no evidence of HGT from algae. Although simple homology searches (blastp) against the RefSeq database found 127 PoB gene models with top-hits against Cyanobacteria or eukaryotic algae, the prediction of taxonomical origin using multiple blast hit results (LCA analysis with MEGAN software) (Huson et al., 2007) denied the algal-origin of the genes (Supplementary Table 11). A blastp search using the A614 dataset which contains sequences of the potential gene donor (e.g., transcriptomic data of H. borneensis) also determined no positive evidence of HGT (Supplementary Table 11).
In gene function assignment with gene ontology (GO) terms, no PoB gene model was annotated as a “Photosynthesis (GO:0015979)”-related gene, although the same method found 72–253 photosynthesis-related genes in the five algal species used as references (Fig. 4a). Our GO-analysis found six PoB genes assigned with the child terms of “Plastid” (GO:0009536). However, ortholog search with animal and algal genes did not support the algal origin of these genes (Supplementary Fig. 15 and 16; Supplementary Table 10). We consider that these pseudo-positives in the similarity search and GO assignment are caused by sequence conservation of the genes beyond the kingdom.
To confirm that our gene modeling did not overlook a photosynthetic gene, we directly searched the A614 dataset against the PoB and C. lentillifera (algal, positive control) genome sequences with tblastn and Exonerate software (Slater and Birney, 2005). Against the C. lentillifera genome, we found 455 (tblastn) and 450 (Exonerate) hits; however, using the same parameters we only detected 1 (tblastn) and 2 (Exonerate) hits against the PoB genome (Fig. 4b, Supplementary Fig. 17). The three loci detected in PoB contain the genes encoding serine/threonine-protein kinase LATS1 (p258757c71.5), Deoxyribodipyrimidine photolyase (p855c67.9), and phosphoglycerate kinase (p105c62.89). Phylogenetic analysis with homologous genes showed the monophyletic relationships of these three genes with molluscan homologs (Supplementary Figs. 18–20). This indicates that the three PoB loci contain molluscan genes rather than algae-derived ones. We thus conclude that our PoB genome assembly contains no algae-derived photosynthetic genes.
To examine if our genome assembly failed to construct algae-derived regions in the PoB genome, we searched for reads resembling photosynthetic genes among the pre-assembled Illumina data. From the 1,065 million pre-assembled reads, 1,698 reads showed similarity against 261 of the A614 dataset queries based on MMseq2 searches. After normalization by query length, the number of matching reads against algal queries was about 100 times lower than that against PoB single-copy genes: the normalized read count was 25 ± 105 (mean ± SD) for the A614 dataset and 2601 ± 2173 for the 905 PoB single-copy genes (p <0.0001, Welch’s two-sample t-test) (Fig. 4c). This large difference indicates that many of the algae-like reads were derived from contaminating microalgae rather than PoB nucDNA. Although, for five algal queries (e.g., peptidyl-prolyl cis-trans isomerase gene), the number of matching reads was comparable between PoB and the A614 dataset (Fig. 4c), the presence of homologous genes on the PoB genome suggests that the reads were not derived from an algae-derived region, but rather resemble molluscan genes. For example, simple alignment showed that the C. lentillifera–derived g566.t1 gene, which encodes peptidyl-prolyl cis-trans isomerase, was partially similar to the PoB p310c70.15 gene (Supplementary Fig. 21), and 76% (733/ 970) of reads that hit against g566.t1 were also hitting against p310c70.15 under the same MMseq2 parameter (Supplementary Fig. 21). We hence consider that no loss of algal-derived regions occurred in our assembly process.
Changing the focus to HGT of non-photosynthetic algal genes, we calculated the indexes for prokaryote-derived HGT (h index) (Boschetti et al., 2012) and eukaryotic-algae-derived HGT (hA index, see Methods, Supplementary Table 12) for PoB and two non-kleptoplastic gastropods (negative controls). We detected three PoB gene models as potential algae-derived genes (Fig. 4d); however, two of these encoded a transposon-related protein and the other encoded an ankyrin repeat protein that has a conserved sequence with an animal ortholog. Furthermore, the non-kleptoplastic gastropods (e.g., Aplysia californica) had similar numbers of probable HGT genes (Fig. 4d). Taking these results together, we conclude that there is no evidence of algae-derived HGT in our PoB genomic data.
We then examined if algae-like RNA is present in PoB, because a previous study of another sea slug species E. chlorotica hypothesized that algae-derived RNA contributes to kleptoplasty (Bhattacharya et al., 2013). This previous study used short-read–based blast searches and RT-PCR analyses to detected algal mRNA (e.g., psbO mRNA) in multiple adult E. chlorotica specimens (no tissue information was provided) (Bhattacharya et al., 2013). To analyze the algal RNA distribution in PoB, we constructed 15 RNA-Seq libraries from six tissue types (digestive gland [DG], parapodium, DG-exenterated parapodium [DeP], egg, head, and pericardium; Supplementary Fig. 22) and conducted MMseq2 searches (Fig. 4c). Although almost all (594/614) of the A614 dataset queries matched no reads, 19 queries matched 1–10 reads, and the C. lentillifera-derived g566.t1 query matched over 10 reads (Fig. 4e). This high hit rate for g566.t1 (Peptidyl-prolyl cis-trans isomerase in Fig. 4e), however, is due to its high sequence similarity with PoB ortholog, p310c70.15 as mentioned above. A previous anatomical study showed the kleptoplast density in various tissues to be DG > parapodium > DeP = head = pericardium >>> egg (Hirose, 2005). Therefore, the amount of algae-like RNA reads did not correlate with the kleptoplast richness among the tissues. Enrichment of algae-like RNA was only found in the egg. The PoB egg is considered to be a kleptoplast-free stage and is covered by a mucous jelly, which potentially contains environmental microorganisms (Fig. 4e and Supplementary Fig. 22c). We hence presume that these RNA fragments are not derived from kleptoplasts but from contaminating microalgae. We conclude that our searches for algae-like RNA in PoB found no credible evidence of algae-derived RNA transfer and no correlation between algal RNA and kleptoplasty.
Kleptoplasty-related P. ocellatus type black genes
Because we found the PoB genome to be free of algae-derived genes, we considered that a neofunctionalized molluscan gene might contribute to kleptoplasty. To find candidate kleptoplasty-related molluscan genes (KRMs), we focused on genes that were upregulated in DG (the primary kleptoplasty location) versus DeP in the RNA-Seq data described above. We found 162 DG-upregulated genes (FDR <0.01, triplicate samples) (Fig. 5a and Supplementary Figs. 23–25). By conducting GO analysis, we identified the functions of 93 of the DG-upregulated genes and showed that they are enriched for genes involved in proteolysis (GO terms: “Proteolysis”, “Aspartic-type endopeptidase activity”, “Cysteine-type endopeptidase inhibitor activity”, and “Anatomical structure development”), carbohydrate metabolism (“Carbohydrate metabolic process”, “One-carbon metabolic process”, “Cation binding”, and “Regulation of pH”), and immunity (“Defense response”) (Supplementary Table 13). Manual annotation identified the function of 42 of the remaining DG-upregulated genes. Many of these are also related to proteolysis and immunity: three genes relate to proteolysis (i.e., genes encoding interferon-gamma-inducible lysosomal thiol reductase, replicase polyprotein 1a, and phosphatidylethanolamine-binding protein), and 21 genes contribute to natural immunity (i.e., genes encoding lectin, blood cell aggregation factor, and MAC/Perforin domain-containing protein) (Supplementary Fig. 24 and 25). Our manual annotation also found four genes encoding apolipoprotein D, which promotes resistance to oxidative stress (Charron et al., 2008), and three genes involved in nutrition metabolism (i.e., genes encoding betaine–shomocysteine S-methyltransferase 1-like protein, intestinal fatty acid-binding protein, and cell surface hyaluronidase). Because the analyzed slugs were starved for a month, we consider that the DG-upregulated genes contribute to kleptoplasty rather than digestion.
We then conducted a comparative genomic analysis to find orthogroups that expand or contract in size along the metazoan lineage to PoB. Our phylogenomic analysis showed that 6 of the 193 orthogroups that underwent gene expansion in this lineage contained DG-upregulated genes, supporting the notion that these genes play a role in kleptoplasty (Fig. 5b, Supplementary Figs. 26 and 27, Supplementary Table 14 and 15). The most distinctive orthogroup was OG0000132, which contained 203 cathepsin D-like genes, 45 of which were DG-upregulated, in PoB; Fisher’s exact test supported the significant enrichment of the DG-upregulated genes (p <0.0001, Supplementary Table 15). Other heterobranchian mollusks only had 4–5 genes in OG0000132 (Fig. 5b and Supplementary Fig. 28). These gene duplications in PoB might reduce selection pressure to maintain function via redundancy, and promote new function acquirement of the paralogs, as occurs in the well-known neofunctionalization scenario (Conrad and Antonarakis, 2007). DG-upregulated genes were also detected as significantly enriched in OG0000005 (18 genes) and OG0000446 (4 genes) (both p <0.0001) (Fig. 5b), which contain lectin-like and apolipoprotein D-like genes, respectively. DG-upregulated genes were detected in OG0000002, OG0009179, and OG0000194, but were not significantly enriched in these orthogroups (p > 0.05). OG0000002 contains the DG-upregulated gene for type 2 cystatin, but also contains various genes with reverse transcriptase domains. This suggests that the reverse transcriptase domain clustered the various genes as one orthogroup and the gene number expansion was due to the high self-duplication activity of the retrotransposon in PoB. OG0009179 and OG0000194 contain DG-upregulated genes of unknown function. From the above results, we finally selected 67 genes as promising targets of study for PoB kleptoplasty: 45 genes for cathepsin D-like proteins, 18 genes for lectin-like proteins, and four genes for apolipoprotein D-like protein (gene annotation data has been deposited in DOI 10.6084/m9.figshare.12318977).
Evolution of the candidate KRMs
For a more detailed analysis of the evolution of the kleptoplasty-related in sacoglossan lineages, we constructed a new draft genome sequence of another sacoglossan sea slug E. marginata (previously identified as E. ornata (Krug et al., 2013)). PoB and E. marginata belong to the same family (Plakobranchidae, Fig. 1e). Both species sequester plastids from Bryopsidales algae; however, the kleptoplast retention time is limited to a few days in E. marginata (Yamamoto et al., 2009). This suggests that their common ancestor obtained a mechanism to sequester algal plastids, but E. marginata did not develop a system for their long-term retention. Hence, we considered that comparing gene expansion in these species would clarify the genetic basis of plastid sequestration and long-term retention.
Using the same methods as described for PoB, we sequenced one complete circular kpDNA, one complete mtDNA, and 790.3 Mbp of nucDNA (87.8% of estimated genome size; 14,149 scaffolds; N50 = 0.23 Mbp; 70,752 genes, Supplementary Tables 7, 16, and 17) for E. marginata. The constructed gene models covered 89.5 % of the BUSCO eukaryota_odb9 gene set (Supplementary Table 7). No credible photosynthetic gene was detected (annotation data, DOI 10.6084/m9.figshare.12318977).
We then phylogenetically analyzed the evolution of representative candidate PoB KRMs (i.e., cathepsin D-like, apolipoprotein D-like, and lectin-like genes). In the case of the cathepsin D-like genes, the sacoglossan (PoB + E. marginata) genes formed a specific subgroup in the OG0000132-based phylogenetic tree (Fig. 5c, far left), and the gene duplication in OG0000132 seemed to be accelerated along the PoB lineage (203 genes in PoB versus 5 in E. marginata; Fig. 5b). All sacoglossan cathepsin D-like genes belonged to a clade with several other heterobranchian homologs; this clade contained three sub-clades (α, β, γ in Fig. 5c). The basal α-clade contained three Aplysia californica genes, one Biomphalaria glabrata gene, and three sacoglossan genes. The β- and γ-clades contained sacoglossan genes only, and an A. californica gene was located at the basal position of the β- and γ-clades. Almost all duplicated PoB genes (201/203) belonged to the γ-clade, which also included one E. marginata gene (e8012c40.2). These phylogenetic relationships suggest that the γ-clade has undergone dozens of gene duplication events in the PoB lineage. Interestingly, all DG-upregulated DEGs were contained in the γ-clade, and the PoB paralogs belonging to the α- and β-clades showed different expression patterns from the γ-clade paralogs; the gene p609c69.52 (α-clade) was ubiquitously expressed in the examined tissues, and p374c67.53 (β-clade) was only expressed in the egg (Fig. 5c, center right). The mammalian genes encoding cathepsin D and its analog (cathepsin E) are ubiquitously expressed on various tissue types (Benes et al., 2008). We, therefore, consider that 1) the ubiquitously expressed p609c69.52 gene in α-clade is a functional ortholog of the mammalian cathepsin D gene; 2) the p374c67.53 gene in β-clade relates sea slug embryo development; 3) the γ-clade genes have been acquired with the development of plastid sequestration. The cathepsin D-like genes formed multiple tandem repeat structures in the PoB genome, although other Heterobranchia had no tandem repeat (Fig. 5c, far right; Supplementary Figs. 28 and 29). In E. marginata, the gene e8012c40.2 located at the basal position of the γ-clade had no repeat structure, although two genes from the α-clade (e4981c37.5 and e4981c37.4) and two from the β-clade (e2244c39.16 and e2244c39.20) made two tandem repeats (Supplementary Figs. 28 and 29). The tree and repeat structure suggest that the γ-clade separated from the β-clade as a single copy gene in the common ancestor of PoB and E. marginata, and duplicated in the PoB lineage (Fig. 5c). The revealed genomic structure indicates that the duplicates were not due to whole-genome duplication but rather a combination of several subgenomic duplication events: the dispersed duplicates are likely due to replicative transposition by a transposable element, and the tandem repeats are likely due to a local event (e.g., unequal crossover).
The gene duplication in OG0000446 seems to have happened in the PoB lineage and at the node between PoB and E. marginata (Fig. 5b, Supplementary Figs. 30 and 31). The sacoglossan genes were duplicated in a monophyletic clade (Clade I) only, and all DG-upregulated DEGs were contained in the Clade I. We hypothesize that duplication on the common lineage relates to plastid sequestration, and the PoB-specific duplication events contribute to long-term kleptoplasty. All DG-upregulated DEGs were contained in the Clade I. We hypothesize that duplication on the common lineage relates to plastid sequestration, and the PoB-specific duplication events contribute to long-term kleptoplasty.
In OG0000005 (lectin-like gene group), the gene counts were comparable between PoB (367) and E. marginata (213) (Fig. 5b); however, the phylogenetic tree suggests that this similarity is due to different gene duplication events in each species. One-by-one orthologous gene pairs were rare between PoB and E. marginata (only 14 were detected), and many of the other homologs formed species-specific sub-clades (Supplementary Fig. 32). Lectins are carbohydrate-binding molecules that mediate attachment of bacteria and viruses to their intended targets (Lis and Sharon, 1998; Yamasaki et al., 2008). The observed gene expansions of lectin-like genes in each of these sea slugs may widen their targets and may complicate the natural immune system to distinguish the kleptoplast from other antigens. We detected four clades having expanded homologs on the sacoglossan lineage (clades A–D in Supplementary Fig. 32). These clades contained all of the 34 DEGs between DG and DeP tissues. The DG-upregulated and -downregulated genes were placed on different clades, except for one DG-upregulated gene (p3334c67.98) in Clade A. These results indicate that the determined DEGs duplicated after the specification of PoB, and the homologs have different expression patterns depending on the clade.
Discussion
Here we demonstrate that 1) kleptoplast photosynthesis extends the lifetime of PoB under starvation, 2) the PoB genome encodes no algal-nucleus-derived genes, and 3) PoB individuals upregulate genes for carbohydrate metabolism, proteolysis and immune response in their digestive gland. Combination of the genomic and transcriptomic analysis identified candidate KRM genes (apolipoprotein D-like, cathepsin D-like, and lectin-like genes), which relates proteolysis and immune response. Together, the data present a paradigm of kleptoplasty in which PoB obtains the adaptive photosynthesis trait by DNA-independent transformation.
Our genomic sequence of PoB clarified the gene repertory of its kpDNA, mtDNA, and nucDNA. The whole kpDNA sequence indicated that PoB kleptoplasts can produce some proteins involved in photosynthesis (e.g., PsbA, a core protein in PSII) (Fig. 3) if gene expression machinery is sufficiently active, as reported in E. chlorotica (Green et al., 2000; Pierce et al., 2007). We then demonstrated the absence of core photosynthetic genes in PoB genome. For instance, we did not detect the genes encoding PsaF of photosystem I, PsbO of photosystem II, or RbcS of Rubisco in kpDNAs, mtDNA, or nucDNA (Figs. 3 and 4), despite our queries (e.g., A612 dataset, Supplementary Table 6) containing multiple algal orthologs of these genes; these gene products are essential for photosynthesis in various plants and algae, and are encoded on their nuclear genome, not plastid genome (Farah et al., 1995; Izumi et al., 2012; Pigolev et al., 2009). This means that PoB can perform adaptive photosynthesis (Fig. 2) without de novo synthesis of these gene products. The absence of algae-derived HGT is consistent with previous transcriptomic analyses of P. cf. ocellatus (Wägele et al., 2011) and other sacoglossan species (de Vries et al., 2015; Han et al., 2015; Wägele et al., 2011). A previous genome study of E. chlorotica predicted that fragmented algal DNA and/or mRNAs contribute to its kleptoplasty (Bhattacharya et al., 2013), and fluorescence in situ hybridization study detected algal gene signals on E. chlorotica chromosomes (Schwartz et al., 2014). Although no photosynthesis-related algal genes were found in a more comprehensive version of the E. chlorotica nuclear genome sequence, Cai et al. (2019) provided no discussion about the HGT. From our results for PoB, we propose that algal DNA and/or RNA are not an absolute requirement for kleptoplast photosynthesis.
Our combination of genomics and transcriptomics suggests that the maintenance of algae-derived protein activity is the most probable mechanism for retaining PoB photosynthesis. Because of the limited longevity of the photosynthetic proteins in algal cells and/or in vitro (Roberts et al., 2013), previous studies have discussed elongation of algal protein lifespans via protective sea slug proteins as an alternative hypothesis to HGT (de Vries and Archibald, 2018; Serôdio et al., 2014). Our study shows three types of molluscan genes as candidate protective sea slug proteins: apolipoprotein D-like, cathepsin D-like, and lectin-like genes.
Previous RNA-Seq studies of Elysia timida and E. chlorotica, found upregulation of superoxide dismutase (SOD) genes in response to photostress (Chan et al., 2018; de Vries et al., 2015), and postulated that SOD protests algal proteins in the kleptoplasts from oxidative damage. We found no significant upregulation of the SOD gene in PoB DG (Supplementary Fig. 24). However, we did find the upregulation of apolipoprotein D-like genes and cathepsin D-like genes and the expansion of these genes in the PoB lineages (Fig. 5). Both proteins are candidates for protective proteins against oxidative stress. Apolipoprotein D, a lipid antioxidant, confers resistance to oxidative stress in higher plants and animal brain (Bishop et al., 2010; Charron et al., 2008). Cathepsin D degrades intracellular proteins and contributes to the degradation of damaged mitochondria (Benes et al., 2008). In general, damaged photosynthetic proteins generate abundant reactive oxygen species (ROS), which promotes further protein damage. In PoB, the chain of protein inactivation may be broken through 1) inhibition of ROS accumulation by apolipoprotein D-like proteins and 2) active degradation of damaged proteins by cathepsin D-like proteases. Ortholog analysis of our new E. chlorotica data found no gene number expansion of these homologs in E. chlorotica: i.e., only three apolipoprotein D-like homologs and four cathepsin D-like homologs (Supplementary Table 18). Although the details of the retention process may differ among species and abiotic conditions, it is attractive to speculate that oxidative stress resilience is of major importance for kleptoplasty in multiple sacoglossan species.
The observed increase in expression of lectin genes in PoB DG tissue and their expansion in PoB lineages (Fig. 5b, Supplementary Figs. 24 and 32) are also consistent with the kleptoplast photosynthetic protein retention hypothesis. Almost all metazoans display natural immunity, and lectins are involved in self□recognition in immunity (Geijtenbeek and Gringhuis, 2009; Worthley et al., 2005). The diverse lectins expressed in PoB DG tissue may bind the antigens of algae-derived molecules, mediate detection of non-self-proteins and/or saccharides, and lead to the selective degradation and retention of algae-derived proteins and organelles.
Our genomic data indicate that proteomic analysis of kleptoplasts is warranted. A previous isotopic study indicated that function-unknown sea slug proteins are transported to E. chlorotica kleptoplasts (Pierce et al., 1996). Although several algal photosynthetic proteins have been immunoassayed in kleptoplasts, animal nuclear-encoded proteins have not been examined (Green et al., 2000; Pierce et al., 1996). Our in-silico study found no typical chloroplast localization signal in PoB KRMs (Supplementary Figs. 28 and 30), however, our genomic data will help the future identification of kleptoplast-localized sea-slug proteins by peptide mass fingerprinting.
Here, we provide the first genomic evidence of photosynthesis acquisition without horizontal DNA or RNA transfer. Previous studies have demonstrated that DNA is the core material for heredity (Hershey and Chase, 1952; Watson and Crick, 1953) and have assumed that horizontal DNA transfer causes cross-species phenotype acquisition (Acuña et al., 2012; Anderson, 1970; Boto Luis, 2014; Dehal et al., 2002). Our studies, however, indicate that PoB gains adaptive photosynthetic activity without acquiring any of the many algal-nucleic-genes involved in photosynthesis. This is evidence that a phenotype (and organelle) can move beyond a species without DNA or RNA transfer from the donor. Recent studies of shipworms (wood-feeding mollusks in the family Teredinidae) showed that they utilize several symbiont-derived proteins for their food digestion (O’Connor et al., 2014). Golden sweeper fish (Parapriacanthus ransonneti) gain the luciferase for their bioluminescence from ostracod prey, suggesting phenotype acquisition via sequestration of a non-self-protein (kleptoprotein) (Bessho-Uehara et al., 2020). These two examples of DNA/RNA-independent transformation, however, are limited to the transfer of simple phenotypes that depend on just a few enzymes. In contrast, the well-known complexity of photosynthesis suggests that sea slug kleptoplasty depends on DNA/RNA-independent transformation of complex pathways requiring multiple enzymes (e.g., entire photosystems and the Calvin-cycle). It is attractive to speculate that other symbiont-derived organelles (e.g., mitochondria), and obligate endosymbiotic bacteria and protozoan kleptoplasts (e.g., in Dinophysis acuminata) (Hackett et al., 2003) can move beyond the species via a DNA-independent system. Although several organisms have multiple HGT-derived functional genes, it is still unclear how the organism evolutionary obtained the appropriate expression control system of the non-self gene (Sasakura et al., 2016). Our PoB data suggest that the transfer of adaptive complex phenotypes sometimes precede gene transfer from the donor species, having the potential to explain the process of cross-species development of complex phenotypes. Some organisms may evolutionary obtained the HGT-derived genes and appropriate control system of the gene expression after the transfer of phenotype. Our finding of DNA-independent complex phenotype acquisition may open new viewpoints on cross-species evolutionary interaction.
Materials and Methods
Sampling of sea slugs and algae
Samples were collected from southwestern Japan; specifically, PoB and H. borneensis were collected at shares of the island of Okinawa, and E. marginata was collected from Kinkowan bay. Regarding B. hypnoides, a cultivated thallus was initially collected from Kinkowan bay and used in our laboratory for several years. Collected samples of PoB and E. marginata in seawater were transported respectively to laboratories at NIBB and Kyoto Prefectural University under dark conditions within 2 days. The samples were then acclimated in an aquarium filled with artificial seawater (REI-SEA Marine II; Iwaki, Japan) at 24 °C.
Photosynthetic activity of sea slugs and algae
The photosynthetic activity of PoB was measured after 38, 109, or 110 days of incubation under a 12 h:12 h light-dark cycle without food. During the light phase, the photosynthetic photon flux density was 10 μmol photons m-2 s-1 (LI-250A Light Meter with LI-193 Underwater Spherical Quantum Sensor, LI-COR). We did not change the seawater during the incubation period except to adjust salinity using distilled water. Photosynthetic activity indexes (oxygen generation rate and PAM Fluorometry) were measured using oxygen-sensor spots (Witrox 4; Loligo Systems, Tjele, Denmark) and PAM-2500 (WALZ, Effeltrich, Germany, Supplementary Fig. 1), respectively.
The oxygen-sensor spots were affixed to the inside of a glass respirometry chamber. Before performing measurements, the system was calibrated using sodium sulfite (0% O2 saturation) and fully O2-saturated seawater (100% O2 saturation). A sea slug was placed into a respirometry chamber filled with fully O2-saturated filtered artificial seawater (7 ml). The top of the chamber was closed with a glass slide. All visible bubbles were removed from the chamber. The chamber was maintained at a constant temperature (23-24 °C) using a water jacket attached to temperature-controlled water flow. The Witrox temperature probe for calibration was immersed in the water jacket.
The oxygen concentration was measured sequentially under changing light conditions. The percent O2 saturation was monitored continuously and recorded using AutoResp software (Loligo Systems) for 10 min after the respirometry chamber acclimation period (10 min). The oxygen consumption rate by respiration was measured under the dark condition. Next, we exposed the respirometry chambers to the red LED light (800 μmol photons m-2 s-1) to measure the change in the oxygen concentration under the light condition due to the balance between the photosynthesis and respiration rates. The chambers were illuminated from the sides because a top-mounted LED light increased the noise measured by the oxygen meter. The light direction with respect to the sea slug was inconsistent because the slug continued to move around the chamber during the measurement period. In sea slugs, the rate of photosynthesis appeared to be unaffected by the direction of illumination because the rate of O2 generation rate under a certain constant illumination did not change, regardless of the sea slug position in the chamber. The percent O2 saturation was measured for 10 min. One blank (i.e., without sea slug) condition was run as a negative control to account for background biological activity in the seawater. AutoResp software was used to convert the percent saturation to an oxygen concentration ([O2], mg O2 1-1) based on the rate of change in the percent O2 saturation, the water temperature, and barometric pressure (fixed at 1013 Pa). We performed a regression analysis by using the “lm” function in R (ver. 3.5.2, tidyverse 1.2.1 package) to calculate the changing oxygen concentration rate under dark and light conditions, and obtained a gross oxygen generation rate by photosynthesis (oL + oD = oG, oL; Oxygen production rate under light, oD; Oxygen consumption rate under dark, oG; gross oxygen generation by photosynthesis).
During the PAM Fluorometry analysis, each sea slug was caged in a single well of a 12-well cell culture plate (Corning, Corning, NY, USA) after adaptation to the dark for 15 min. To ensure reproducibility, we caged the sea slug upside down (i.e., the ventral surface was brought to the upside), softly squeezed the animal with a plastic sponge, and connected the PAM light probe to the plate from underneath the well. Consequently, the samples could not move during the measurement, and the PAM light probe always measured the fluorescence of the dorsal surface. The maximal quantum yield, Fv/Fm, was determined by a saturation pulse of >8000 μmol photons m-2 s-1 and a measurement light of 0.2 μmol photons m-2 s-1.
Effect of the light condition on P. ocellatus longevity
We measured the longevity of PoB using a modified medaka (Japanese rice fish) housing rack system (Iwaki, Japan). For the longevity measurement, we used different individuals from the photosynthetic activity measurement. The longevity was investigated from the samples used in the above-mentioned photosynthetic activity measurement. Using centrally filtered systems, our water tank rack maintained consistent water conditions (e.g., temperature and mineral concentration) among the incubation chambers (sub-tanks) and enabled a focus on the effect of the light condition. After acclimating the collected sea slugs under the same conditions for 1 week in an aquarium, the organisms were incubated separately for 8 months under different light conditions (continuous dark and 12 h:12 h light-dark cycle). We evaluated the conditions of the sea slugs daily and defined death as a sea slug that remained motionless for 30 seconds after stimulation (i.e., touching with plastic bar).
Sequencing of H. borneensis chloroplast DNA
We used a combination of pyrosequencing and Sanger sequencing to evaluate H. borneensis cpDNA. Collected H. borneensis thalli were washed with tap water to remove the attached organisms. The cleaned thalli (27 g) were frozen in liquid nitrogen, ground with a T-10 Basic Homogenizer (IKA, Germany) to a fine powder, suspended in 15 mL of AP1 buffer from a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany), and centrifuged (500 × g, 1 min) to remove the calcareous parts. Total DNA was purified from the supernatant according to the protocol supplied with the DNeasy Plant Mini Kit. The resulting DNA yield (76.8 μg) was measured using a dsDNA HS Assay Qubit Starter Kit (Thermo, Waltham, MA, USA), and was used to prepare a single-fragment library for pyrosequencing. The Pyrosequencer GS-FLX Titanium platform (Roche, Germany) was used to generate 23.04 Mb of total singleton reads (68,032 reads, average read length: 368 bp). After filtering low-quality reads, the remaining 21,865 reads (~8 Mb) were submitted for assembly by Newbler (Roche). Of the 6,309 obtained contigs (N50 = 663 bp), the cpDNA sequences were identified by blastx searches (ver. 2.2.28) against the protein-coding sequences of cpDNA from the chlorophyte alga B. hypnoides (NC_013359). The gaps between the four identified contigs (55,551, 21,264, 8,076, and 3,435 bp) and ambiguous sites in the contigs were amplified by PCR using inverse primers. PCR products were sequenced by primer walking and Sanger sequencing with Takara LA Taq (Takara, Japan), a dGTP BigDye Terminator Cycle Sequencing FS Ready Reaction Kit (Thermo), and an ABI PRISM 3130xl DNA Sequencer (Thermo). Regions that could not be read by direct sequencing were amplified using specific primers, cloned with a TOPO TA cloning kit (Thermo), and sequenced with plasmid-specific primers. Complete cpDNA sequences were obtained by assembling the GS-FLX contigs and reads generated by Sanger sequencing using Sequencher ver. 4.10 (Gene Codes Corporation, MI, USA). All open reading frames >100 bp were annotated using a blastx search (ver. 2.2.31+) against the non-redundant protein sequences (nr) database in GenBank and a tblastx search of chloroplast genes from other algae (B. hypnoides, Chlamydomonas reinhardtii, V. litorea, and H. borneensis). Introns were detected using RNAweasel (Gautheret and Lambert, 2001), Rfam (Kalvari et al., 2018), and Mfold (Zuker, 2003). We confirmed the splicing sites via alignment with orthologous genes from other green algal cpDNAs. We used MAFFT v7.127b (Katoh and Standley, 2013) to perform the alignment. The origins of bacteria-like proteins were explored using a blastx search against the nr database and a phylogenetic analysis with blast-hit sequences. We used MAFFT (Katoh and Standley, 2013) for alignment, Trimal v1.4 (Capella-Gutiérrez et al., 2009) for trimming, and RaxML v8.2.4 (Stamatakis, 2014) for phylogenetic tree construction. The resulting gene maps were visualized using Circos ver. 0.69-2.
Sequencing of kpDNAs from P. ocellatus
Kleptoplast sequences were generated using an Illumina system. Total DNA was extracted from the digestive gland (kleptoplast-rich tissue) and parapodia (including the digestive gland, kleptoplast-less muscle, and reproductive systems) of a single PoB individual using a CTAB-based method (Murray and Thompson, 1980). Two Illumina libraries with 180- and 500-bp insertions were constructed from each DNA pool (DRR063261, DRR063262, DRR063263, amd DRR063264). An S220 Focused-Ultrasonicator (Covaris, MA, USA), Pippin Prep (Sage Science, MA, USA), and a TruSeq DNA Sample Prep Kit (Illumina, CA, USA) were used for DNA fragmentation, size selection, and library construction, respectively. Libraries were sequenced (101 bp from each end) on a HiSeq 2500 platform (Illumina). A total of 42,206,037 raw reads (8.53 Gb) were obtained. After filtering the low-quality and adapter sequences, the remaining 5.21 Gb of sequences were used for assembly. Paired sequences from 180-bp libraries were combined into overlapping extended contigs using FLASH ver. 1.2.9 (Magoc and Salzberg, 2011) with the default settings. An input of 14,867,401 paired-end sequences and FLASH were used to construct 13,627,554 contigs (101–192 bp). The joined fragments and filtered paired sequences from 500-bp libraries were assembled using Velvet assembler ver. 1.2.07 (Zerbino and Birney, 2008) with parameters that were optimized based on the nucDNA and kpDNA sequence coverage depths; the estimated nucDNA depth was approximately 30× based on the k-mer analysis, and the predicted kpDNA depth was 272× based on the read mapping to previously obtained kleptoplast rbcL sequences (AB619313; 1195 bp) using Bowtie2 ver. 2.0.0 (Langmead and Salzberg, 2012). After several tests to tune the Velvet parameters, the best assembly was achieved with a k-mer of 83 and exp_cov of 50. The resulting assembly comprised 1,537 scaffolds (>2000 bp) containing 4,743,113 bp (N50 = 2830 bp). We then identified two kpDNAs (AP014542 and AP014543) from this assembly based on the sequence similarity with H. borneensis cpDNA and a mapping back analysis. blastx ver. 2.2.31 assigned bit scores >1000 to the two scaffolds (AP014542 = 1382, AP014543 = 1373, database = coding sequences in the obtained H. borneensis cpDNA, query = all constructed scaffolds). Mapping back, which was performed using BWA ver. 0.7.15-r1140, showed that the coverage depths of AP014542 and AP014543 correlated with the relative abundance of kleptoplasts; the average coverage depth of the read was increased by 2–4 fold in a library from kleptoplast-rich tissue (DRR063263) relative to a library from kleptoplast-poor tissue (DRR063261). The same degree of change was never observed in other scaffolds (Supplementary Fig. 33).
The two kleptoplast sequences were annotated and visualized using the same method described for H. borneensis cpDNA. The phylogenetic positions of PoB kleptoplasts and algal chloroplasts were analyzed using the rbcL gene sequences from 114 ulvophycean green algae according to the maximum likelihood method (Fig. 3b, Supplementary Fig. 5). A phylogenetic tree was constructed according to the same method used to search for the origins of bacteria-like genes in H. borneensis cpDNA.
Analysis of the H. borneensis and B. hypnoides transcriptomes
De novo transcript profiles of H. borneensis and B. hypnoides were obtained from Illumina RNA-seq data. We extracted total RNA using the RNeasy Plant Mini Kit (Qiagen), constructed single Illumina libraries using the TruSeq RNA Sample Prep Kit, and sequenced the library of each species (101 bp from each end) on a HiSeq 2500 platform. A total of 290,523,622 reads (29 Gb) and 182,455,350 raw reads (18 Gb) were obtained for H. borneensis and B. hypnoides, respectively. After filtering the low-quality and adapter sequences, the obtained reads were assembled using Trinity ver. 2.4.0 (Grabherr et al., 2011) and clustered using CD-Hit ver. 4.6 (Fu et al., 2012) with the -c 0.95 option. The TransDecoder ver. 2.0.1 was used to identify 26,652 and 24,127 candidate coding regions from H. borneensis and B. hypnoides, respectively. The gene completeness of the transcripts was estimated using BUSCO ver. 2.0 (Waterhouse et al., 2018). The obtained H. borneensis transcripts covered 86.5% (262/303) of the total BUSCO groups), while the B. hypnoides transcripts covered 92.7% (281/303) of the conserved genes in Eukaryota (database, eukaryota_odb9). The transcripts were annotated using AHRD ver. 3.3.3 (https://github.com/groupschoof/AHRD) based on the results of a blastp search against nr, RefSeq, and Chlamydomonas proteome dataset on UniProt. The composed functional domains on the transcripts were annotated using InterProScan ver. 5.23-62 (Jones et al., 2014). To distinguish the reliable target species transcripts, we predicted the original transcript species using MEGAN ver. 5 (Huson et al., 2007) and selected 11,629 and 8,630 transcripts as viridiplantal genes. We have presented the details of the annotation procedure visually in Supplementary Figs. 8 and 9.
We manually selected a query dataset from the algal transcripts to search algae-derived genes on the sea slug DNAs. We selected 176 and 129 transcripts from H. borneensis and B. hypnoides, respectively. To perform more comprehensive searches, we also obtained queries from three public genomic datasets derived from Caulerpa lentillifera, Chlamydomonas reinhardtii, and Cyanidioschyzon merolae; these queries were termed the A614 dataset (Supplementary Table 6, DOI 10.6084/m9.figshare.12318947).
Sequencing of the P. ocellatus type black genome
The mean nucDNA size in three PoB individuals was estimated using flow cytometry. Dissected parapodial tissue (5 mm2) was homogenized in 1 ml of PBS buffer containing 0.1 % triton X-100 (Thermo) and 0.1 % RNase A (Qiagen) using a BioMasher (Nippi, Tokyo, Japan). We then filtered the homogenate through a 30-μm CellTrics filter (Sysmex, Hyogo, Japan) and diluted the filtrate with PBS buffer to a density of <5×106 cell/ml. The resulting solution was mixed with genome size standard samples and stained with a 2% propidium iodide solution (SONY, Tokyo, Japan). We used Acyrthosiphon pisum (genome size = 517 Mb) and Drosophila melanogaster (165 Mbp) samples processed using the same method described for PoB as genome size standards. The mixture was analyzed on a Cell Sorter SH800 (SONY) according to the manufacturer’s instructions. We repeated the above procedure for three PoB individuals and determined an estimated genome size of 936 Mb (Supplementary Fig. 12).
Genomic DNA was extracted from a single PoB individual using the CTAB-based method (Murray and Thompson, 1980). The adapted buffer compositions are summarized in Supplementary Table 19. A fresh PoB sample (collected on October 17, 2013, and starved for 21 days) was cut into pieces and homogenized in 2×CTAB buffer using a BioMasher. To digest the tissues, we added a 2% volume of Proteinase K solution (Qiagen) and incubated the sample overnight at 55° C. The lysate was emulsified by gentle inversion with an equal volume of chloroform; after centrifugation (12,000× g, 2 min), the aqueous phase was collected using a pipette. This phase was combined with a one-tenth volume of 10% CTAB buffer, mixed well at 60 °C for 1 min, and again emulsified with chloroform. These 10% CTAB buffer and chloroform treatment steps were repeated until a clear aqueous phase was achieved. We then transferred the aqueous phase to a new vessel, overlaid an equal volume of CTAB precipitation buffer, and mixed the liquids gently by tapping. The resulting filamentous precipitations (DNA) were removed using a pipette chip and incubated at room temperature for 10 min in High Salt TE buffer. We then purified the DNA according to the protocol supplied with the DNeasy Blood and Tissue Kit (Qiagen); briefly, we transferred the supernatant after vortex mixing, added equal volumes of buffer AL (supplied with the kit) and EtOH to the supernatant, and processed the sample on a Qiagen spin column according to the protocol. We obtained a final DNA quantity of 15 μg from a PoB individual.
The genomic sequence of PoB was obtained via Illumina paired-end and mate-pair DNA sequencing. Two paired-end Illumina libraries containing 250- and 600-bp insertions (DRR029525, DRR029526) were constructed using a TruSeq DNA Sample Prep Kit (Illumina). Three mate-pair libraries with 3k-, 5k-, and 10k-bp insertions (DRR029528, DRR029529, DRR029530) were constructed using a Nextera Mate Pair Library Prep Kit (Illumina). The libraries were sequenced (150 bps from each end) on a HiSeq 2500 platform (Illumina). A total of 1,130,791,572 and 787,040,878 raw reads were obtained for the paired-end (170 Gb) and mate-pair (118 Gb) libraries, respectively. After filtering the low-quality and adapter sequences, the remaining 161 Gb of sequences were assembled using Platanus assembler ver. 1.2.1 (Kajitani et al., 2014) with the default setting. The assembly comprised 8,716 scaffolds containing 928,345,517 bp. Repetitive regions were masked using a combination of RepeatModeler ver. open-1.0.8 and RepeatMasker ver. open-4.0.5 (http://www.repeatmasker.org). We used the default parameters for identification and masking. A total of 268,300,626 bp (29%) of the assemblies were masked with RepeatMasker.
We used strand-specific RNA-Seq sequencing for gene modeling. PoB RNA was extracted from an individual after starvation for 20 days (collected on October 17, 2013). We used TRIzol™ Plus RNA Purification Kit (Thermo) to extract RNA according to the manufacturer’s protocol. A paired-end Illumina library (DRR029460) was constructed using the TruSeq Stranded mRNA LT Sample Prep Kit (Illumina). Libraries were sequenced (150 bp from each end) on a HiSeq 2500 platform (Illumina). The library produced a total of 286,819,502 raw reads (28 Gb).
The PoB assemblies were processed using single transcript-based gene model construction pipeline (AUGUSTUS ver. 3.2) (Stanke and Morgenstern, 2005), two transcriptomic data mapping tools (Trinity ver. 2.4.0 and Exonerate ver. 2.2.0) (Grabherr et al., 2011; Slater and Birney, 2005), and two non-transcript-based model construction pipelines (GeneMark-ES ver. 4.33 and glimmerHMM ver. 3.0.4) (Majoros et al., 2004; Ter-Hovhannisyan et al., 2008). The four obtained gene sets were merged with the EVidenceModeler ver. 1.1.1 pipeline (Haas et al., 2008) to yield a final gene model set. For AUGUSTUS, we used Braker pipeline ver. 1.9 (Hoff et al., 2019) to construct PoB-specific probabilistic models of the gene structure based on strand-specific RNA-Seq data. After filtering the low-quality and adapter sequences, the remaining 181,873,770 RNA-Seq reads (16 Gb) were mapped to the PoB genome assembly using TopHat ver. v2.1.1 (Kim et al., 2013) with the default setting, as well as to the Braker pipeline-constructed PoB-specific probabilistic models from mapped read data. TopHat mapped 132,786,439 of the reads (73%) to the PoB model. AUGUSTUS then predicted 78,894 gene models from the TopHat mapping data (as splicing junction data) and the Braker probabilistic model. Using Trinity and Exonerate, we then constructed de novo transcriptomic data from the RNA-Seq data and aligned these to the genome. Trinity constructed 254,336 transcripts, which were clustered to 194,000 sequences using CD-Hit ver. 4.6 (-c 0.95); subsequently, Transdecoder identified 44,596 protein-coding regions from these sequences. Exonerate (--bestn 1 --percent 90 options) then aligned the 13,141 of the transcripts to the genome. GeneMark-ES with the default setting predicted 107,735 gene models, and glimmerHMM predicted 115,633 models after the model training, with 320 manually constructed gene models from long scaffolds. EVidenceModeler was then used to merge the model with the following weight settings: AUGUSTUS = 9, Exonerate = 10, GeneMark-ES = 1, glimmerHMM = 2. Finally, EVidenceModeler predicted 77,444 gene models.
We then removed the contaminant-derived bacterial scaffolds from the PoB assemblies. We defined bacterial scaffolds as those encoding >1 bacterial gene with no lophotrochozoan gene. The bacterial genes were predicted using MEGAN software according to a blastp search against the RefSeq database. Of the 40,330 gene hits identified from the RefSeq data, MEGAN assigned the origins for 39,113 genes. Specifically, 719 and 23,559 genes were assigned as bacterial and lophotrochozoan genes, respectively. Fifty-five of the 8,716 scaffolds contained two or more bacterial genes and no lophotrochozoan gene and were removed (Supplementary Table 8).
We also removed kleptoplast- or mitochondria-derived scaffolds from the assemblies (Supplementary Table 9). We determined the source of scaffolds based on blast bit score against the three referential organelle DNAs (kRhip AP014542, kPoro AP014543, and PoB mtDNA AP014544) and the difference of the read depth value from the other (nuclear-derived) scaffolds. Our blastn search detected 13 and one scaffolds as sequences of kleptoplast or mitochondrial origin, respectively (bit score >1000, Supplementary Table 9). Mapping back of the Illumina read (DRR029525) by Bowtie ver. 2.4.1 indicated that the depth values of the 14 scaffolds were 537-5143, and the averaged depth value of the other scaffolds was 31 (Supplementary Tables 8 and 9), indicating the 14 scaffolds are derived from organelle DNAs or repetitive region on the nuclear DNA. Mapping of the reads derived from DG (kleptoplast enriched tissue) (DRR063263) and parapodium (including a muscle, gonad, and digestive gland) (DRR063261) indicated that the relative read depth of the 13 kpDNA-like scaffolds (against the averaged depth value of other scaffolds) was higher in the DG sample than in the parapodium sample, supporting that the sequences are derived from the kleptoplast (Supplementary Fig. 34). We then confirmed the scaffolds contain no algal-nuclear-derived photosynthetic gene using two methods; dot-plots with referential organelle DNAs and homology search using A612 query set (Supplementary Fig. 17, 35-37). Hence, even if the scaffolds originate from the nucleus, our results indicate no evidence of HGT of photosynthetic algal nuclear-encoded genes. We deposited the removed scaffolds sequences in the FigShare under DOI 10.6084/m9.figshare.12587954.
The final PoB assembly comprised 8,647 scaffolds containing 927,888,823 bp (N50 = 1,453,842 bp) and 77,230 genes. Gene completeness was estimated using BUSCO ver. 2.0 (Waterhouse et al., 2018). The predicted gene models were annotated using AHRD ver. 3.3.3. The results of a blastp search against the SwissProt, Trembl, and Aplysia californica proteome datasets on UniProt were used as reference data for AHRD under the following weight parameter settings: SwissProt = 653, Trembl = 904, and A. californica = 854. The functional domains were annotated using InterProScan ver. 5.23-62.
We performed a blastp analysis against the RefSeq database to identify algae-derived genes from the constructed genes models. After translating the protein-encoding region to amino acid sequence data, we adapted the blastp search to include the “-e-value 0.0001” option. The output was analyzed using MEGAN software with the following LCA and analysis parameters: Min Score = 50, Max Expected = 1.0E-4, Top Percent = 20, Min Support Percent = 0.1, Min Support = 1, LCA percent = 90, and Min Complexity = 0.3.
The GO annotation was assigned using Blast2GO ver. 5.2.5 according to the blastp searches against the RefSeq database and InterProScan results. We then used SonicParanoid ver. 1.0.11 for orthogroup detection. The species analyzed in the orthogroup detections are summarized in Supplementary Table 10. The phylogenetic tree was constructed using IQ-tree. The resulting trees were visualized using iTol ver. 4.
We used Exonerate ver. 2.2.0 (with the --bestn 1 --model protein2genom options) to identify algal genes in the PoB genome. We used the A614 dataset as a query after translating the sequence to amino acids. Caulerpa lentillifera (green algae) genomic data were used as a control to estimate the sensitivity of our method. The results were handled and visualized using R (tidyverse packages ver. 1.2.1).
We used MMseq2 ver. 2.6 (--orf-start-mode 1) to search for algae-like reads among the trimmed Illumina reads. The matching threshold was set using a default E-value <0.001. As a positive control, we selected PoB genes from the BUSCO analysis of our genomic model data. We selected 911 gene models detected by BUSCO ver. 2 as single-copy orthologs of the metazoa_odb9 gene set and named the dataset P911 (DOI 10.6084/m9.figshare.12318977).
The horizontal gene transfer index “h” and the modified index “hA” were calculated using our R script, HGT_index_cal.R (https://github.com/maedat/HGT_index_cal; R ver. 3.6.1). The “h” index was calculated as the difference in bit scores between the best prokaryote and best eukaryote matches in the blast alignments, and “hA” was calculated as the difference in bit scores between the best lophotrochozoan and best algae matches. The blast databases of adapted species are summarized in Supplementary Table 12.
RNA-Seq analysis of P. ocellatus type black tissues
Total RNA samples from five PoB individuals and one egg mass were obtained for a gene expression analysis. An overview of sample preparation is illustrated in Supplementary Fig. 22. Collected adult PoB individuals were dissected manually after an incubation of 21–94 days. An egg mass was obtained via spontaneous egg lying in an aquarium. We used a TRIzol™ Plus RNA Purification Kit (Thermo) to extract RNA according to the manufacturer’s protocol. We constructed six paired-end and nine single-end Illumina libraries using a combination of the RiboMinus Eukaryote Kit (Thermo), RiboMinus concentration module (Thermo), and TruSeq RNA Sample Preparation Kit v2 (Illumina) according to the manufacturers’ protocols. Libraries were sequenced (101 bp) on a HiSeq 2500 platform (Illumina). A total of 280,445,422 raw reads (28 Gb) were obtained from the libraries (DOI 10.6084/m9.figshare.12301277).
After filtering the low-quality and adapter sequences, 150,701,605 RNA-Seq reads (13 Gb) were obtained. We used only the R1 reads for the six paired-end datasets. We used MMseq2 to identify algae-derived reads from the trimmed reads, using the A614 dataset as a query. We applied the same parameters as the above-described DNA read search. The PoB gene dataset P911 was also used as a positive control.
We used the Hisat-stringtie-DESeq2 pipeline to conduct a differential gene expression analysis of DGs and DG-exenterated parapodia (epidermis, muscle, and reproductive systems, Dep). According to the Stringtie protocol manual (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual), trimmed RNA-Seq reads were mapped to the PoB genome assembly using Hisat2 ver. 2.1.0 with the default setting. The obtained BAM files were processed using Stringtie ver. 1.3.4d (-e option) with PoB gene model data (gff3 format) acquired through the above-mentioned EVidenceModeler analysis. The resulting count data were analyzed using R and the DESeq2 package, and 1,490 differentially expressed genes (p-value <0.01 and padj <0.05) were identified between the tissues. We used the GOseq (Young et al., 2010) and topGO packages in R to apply a GO enrichment analysis to the upregulated genes in DG tissue (threshold: p-value <0.01).
Sequencing of the E. marginata genome
The E. marginata genome sequencing process was nearly identical to the methodology applied to PoB. Flow cytometry yielded an estimated genome size of 900 Mb. We extracted genomic DNA from an individual using a CTAB-based method. We constructed four types of Illumina libraries: two paired-end libraries with 250- and 500-bp insertions, and two mate-pair libraries with 3k-s and 5k-bp insertions (DOI 10.6084/m9.figshare.12301277). Using the HiSeq 2500 platform (Illumina), we obtained 562,732,268 and 608,977,154 raw reads for the paired-end (84 Gb) and mate-pair (91 Gb) libraries, respectively. After filtering the low-quality and adapter sequences, the remaining 40 Gb of sequences were assembled using the Platanus assembler. The assembly comprised 14,285 scaffolds containing 791,005,940 bp.
For gene modeling, strand-specific RNA-Seq sequencing of E. marginata was performed. A paired-end Illumina library (DRR029460, also see DOI 10.6084/m9.figshare.12301277) was constructed using a TruSeq Stranded mRNA LT Sample Prep Kit (Illumina) and an RNA sample extracted from an E. marginata individual via a TRIzol™ Plus RNA Purification Kit (Thermo). Libraries were sequenced (150 bp from each end) on a HiSeq 2500 platform (Illumina). A total of 286,819,502 raw reads (28 Gb) were obtained from the library. Using the gene modeling procedure described for PoB, EVidenceModeler constructed 71,137 gene models of the genomic assemblies based on the RNA-Seq data.
We next removed the contaminant-derived bacterial scaffolds from the genomic assemblies. Using the same gene annotation as applied to PoB, we determined that the 110 of the 14,285 scaffolds contained >1 bacterial genes and no lophotrochozoan gene and removed these scaffolds. The organelle-derived scaffolds (kleptoplast DNA and mitochondrial DNA) were identified using blastn searches and removed from the final assemblies. A blastn search (query = all scaffolds, database = chloroplast DNA of B. hypnoides NC_013359.1 or mitochondrial DNA of PoB) identified 25 kleptoplast-matching and one mitochondria-matching scaffold (bit score >1000). We then reassembled the complete kleptoplast DNA and mitochondrial DNA using the same method as described for PoB organellar DNA assembling.
Ortholog analysis of sacoglossan genes
Orthologous relationships were classified using OrthoFinder (ver. 2.2.3) (Emms and Kelly, 2015), and rapidly expanded/contracted families identified from the OrthoFinder results were analyzed using CAFE (ver. 4.2) (Han et al., 2013). CAFE analysis used 16 metazoan species as reference species (Supplementary Fig. 23). Phylogenetic trees for CAFE were constructed using PREQUAL (ver. 1.02) (Whelan et al., 2018) for sequence trimming, MAFFT (ver. 7.407) for sequence alignment, IQ-tree (ver. 1.6.1) for the maximum likelihood (ML) analysis, and r8s (v1.81) (Sanderson, 2003) for conversion to an ultrametric tree. An ML tree was constructed from 30 single-copy genes in 15 major species according to the OrthoFinder results (DOI 10.6084/m9.figshare.12319862), and was converted to an ultrametric tree based on the divergence times of Amphiesmenoptera-Antliophora (290 Myr) and Euarchontia-Glires (65□Myr).
We also analyzed the expanded/contracted families using the Z-scores of the assigned gene numbers for each orthogroup. The analysis included all 16 reference species and two sacoglossan species, and an expanded group of 38 was determined on the PoB lineage (Threshold: Z-score > 2) (Supplementary Fig. 27). The phylogenetic relationships in the orthogroups were also analyzed using a PREQUAL-MAFFT-IQ-tree. The domain structures and gene positions on the constructed genome data were visualized using the GeneHere script (https://github.com/maedat/GeneHere) and Biopython packages.
Author contributions
T.Maeda, S.T., and S.Shigenobu. conceived of and designed the experiments; T.Maeda, S.T., J.M., performed the photochemical and physiological experiments and analyses; T.Maeda, A.A., T.Y., S.Shimamura., Y.T., Y.N., K.T., T.T., Y.S., M.K., N.S., T.N., M.H., T. Maruyama, J.O., and S.Shigenobu performed the genomic and transcriptomic experiments and analyses. T.Maeda and S.Shigenobu wrote the paper following advice from other authors.
Competing Interests
The authors declare no competing financial interests.
Data availability
All of the raw sequence data obtained in this research have been deposited in the DDBJ Sequence Read Archive (DRA) under BioProject PRJDB4939, PRJDB3267, PRJDB10060, and PRJDB5024. All data collected in this study that are summarized in the figures have been made available on FigShare, at DOI: 10.6084/m9.figshare.12300869, 10.6084/m9.figshare.12301865, 10.6084/m9.figshare.12301277, 10.6084/m9.figshare.12311990, 10.6084/m9.figshare.12316163, 10.6084/m9.figshare.12316283, 10.6084/m9.figshare.12316868, 10.6084/m9.figshare.12316895, 10.6084/m9.figshare.12318947, 10.6084/m9.figshare.12318962, 10.6084/m9.figshare.12318974, 10.6084/m9.figshare.12318977, 10.6084/m9.figshare.12318989, 10.6084/m9.figshare.12318992, 10.6084/m9.figshare.12318998, 10.6084/m9.figshare.12319001, 10.6084/m9.figshare.12319424, 10.6084/m9.figshare.12319532, 10.6084/m9.figshare.12318920, 10.6084/m9.figshare.12319739, 10.6084/m9.figshare.12319736, 10.6084/m9.figshare.12319802, 10.6084/m9.figshare.12319826, 10.6084/m9.figshare.12319832, 10.6084/m9.figshare.12319844, 10.6084/m9.figshare.12318908, 10.6084/m9.figshare.12319853, 10.6084/m9.figshare.12319859, 10.6084/m9.figshare.12319862, 10.6084/m9.figshare.12319889, 10.6084/m9.figshare.12628709, and 10.6084/m9.figshare.12587954. The codes used to analyze HGT index and to visualize gene distribution on scaffolds have been made available on https://github.com/maedat/HGT_index_ca and https://github.com/maedat/GeneHere.
Acknowledgment
We appreciate the incisive comments offered by Masayoshi Kawaguchi, Atsushi J. Nagano, and Kan Tanaka. The sample collection was supported by Katsuhiko Tanaka, Tohru Iseto, Rie Nakano, and Hirose Euichi. This work was supported by the MEXT/JSPS KAKENHI (grant numbers 16H06279, 25128713 and 22128001), National Institute for Basic Biology (NIBB) Functional Genomics Facility, NIBB Data Integration and Analysis Facility, and the Japan Advanced Plant Science Network. Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics.
Footnotes
Email addresses: T. Maeda: maedat{at}nibb.ac.jp, ST: shun{at}nibb.ac.jp, TY: tyoshida{at}jamstec.go.jp, S. Shimamura: shimas{at}jamstec.go.jp, YT: takakiy{at}jamstec.go.jp, YN: nagai.y{at}jamstec.go.jp, AT: atoyoda{at}nig.ac.jp, YS: ysuzuki{at}k.u-tokyo.ac.jp, AA: aarimoto{at}hiroshima-u.ac.jp, KI: hsk.ishii{at}gmail.com, NS: norisky{at}oist.jp, TN: tomoakin{at}staff.kanazawa-u.ac.jp, MH: mhasebe{at}nibb.ac.jp, T. Maruyama: pee02660{at}nifty.ne.jp, JM: minagawa{at}nibb.ac.jp, JO: junichi.obokata{at}setsunan.ac.jp, S. Shigenobu: shige{at}nibb.ac.jp
For the sequences excluded from the final genome of sea slug as kleptplast sequences, we verified that there are no horizontally transmitted genes from algae in these sequences, and added figures and other information.