Abstract
Porifera are a diverse animal phylum with species performing important ecological roles in aquatic ecosystems, and have become models for multicellularity and early-animal evolution. Demosponges form the largest class in sponges, but previous studies have relied on the only draft demosponge genome of Amphimedon queenslandica. Here we present the 125-megabase draft genome of a contractile laboratory demosponge Tethya wilhelma, sequenced to almost 150x coverage. We explore the genetic repertoire of transporters, receptors, and neurotransmitter metabolism across early-branching metazoans in the context of the evolution of these gene families. Presence of many genes is highly variable across animal groups, with many gene family expansions and losses. Three sponge classes show lineage-specific expansions of GABA-B receptors, far exceeding the gene number in vertebrates, while ctenophores appear to have secondarily lost most genes in the GABA pathway. Both GABA and glutamate receptors show lineage-specific domain rearrangements, making it difficult to trace the evolution of these gene families. Gene sets in the examined taxa suggest that nervous systems evolved independently at least twice and either changed function or were lost in sponges. Changes in gene content are consistent with the view that ctenophores and sponges are the earliest-branching metazoan lineages and provide additional support for the proposed clade of Placozoa/Cnidaria/Bilateria.
Introduction
The presence of neurons is a defining character of animals, and is symbolic of their alleged superiority over all other life on earth. Nonetheless, the four non-bilaterian phyla, Porifera, Placozoa, Ctenophora and Cnidaria, are most different from other animals in their sensory systems and are often mistakenly referred to as “lower” animals in common parlance, despite the fact that, like bilaterians, non-bilaterians exist at the tips of the tree of life. Indeed, animals such as corals and sponges appear immobile or often unresponsive, challenging early theorists in their ideas of what is and is not an animal. Yet we now know that representatives from all four non-bilaterian phyla demonstrate dynamic responses to outside stimuli.
Neural evolution has been discussed previously in the context of paleontology (reviewed in [Wray et al., 2015]) and metazoan phylogeny (reviewed in [Jékely et al., 2015]). Indeed, it has been suggested that many features of bilaterian neurons and nervous systems represent separate, parallel evolutionary events from a “simple” nervous system. A simple nervous system must then have arisen from proto-neurons [Schierwater et al., 2009]; however, it is unclear what these might have looked like.
Several qualities can be used to define neurons or proto-neurons [Leys, 2015, Nickel, 2010], such as synapses, electrical excitability, membrane potential, or secretory functions, though no single quality (and ultimately gene set) solely defines such cells as neurons. Two non-bilaterian groups, ctenophores and cnidarians, are thought to have true neurons. In the remaining two non-bilaterian phyla, sponges and placozoans, many components of neural cells are found without any neuron-like cells having been identified [Srivastava et al., 2010, Riesgo et al., 2014a, Leys, 2015], although synapse-like structures have been identified in placozoan fiber cells that show vesicles close to an osmiophilic contact [Grell and Benwitz, 1974].
Comparative analyses revealed a gradient of neural-like qualities, indicating that “neuron-or-not” classifications are not straightforward. While ctenophores, cnidarians, and bilaterians have true neurons, structural and biochemical differences [Moroz et al., 2014, Moroz, 2015] led to the proposition that neurons in ctenophores and cnidarians may not be homologous, but rather separate evolutionary outcomes from neural-like precursor cells. Potentially, in the case of independent origins, neurons are “easy” to evolve, since this would only involve co-expression of various pan-metazoan genetic modules in the same cell type. Alternatively, early rudimentary signaling systems may have been energetically costly and not especially useful in pre-Cambrian oceans, and in such cases, it may have been comparatively easy to lose such genes and with them neuronal-type cells.
Interpretation of neural evolution requires an accurate metazoan phylogeny, and the phylogenetic relationships of early-branching metazoans have been a topic of continued controversy. Some analyses support the traditional phylogenetic position of sponges as sister group to all other metazoans (“Porifera-sister”) [Philippe et al., 2009, Pick et al., 2010, Nosenko et al., 2013, Pisani et al., 2015, Simion et al., 2017] while others suggest that Ctenophora are the sister group to all other animals (“Ctenophora-sister”) [Dunn et al., 2008, Ryan et al., 2013, Whelan et al., 2015], and some analyses also recover the classical view, a Coelenterata clade uniting Cnidaria and Ctenophora [Philippe et al., 2009, Simion et al., 2017]. Importantly, phylogenomic analyses can be prone to systematic artifacts under some circumstances, depending on taxon sampling [Pick et al., 2010, Philippe et al., 2011], gene set [Nosenko et al., 2013], phylogenetic model [Pisani et al., 2015], or use of nucleotides instead of proteins [Jarvis et al., 2014]. Other methods based on presence or absence of the genes themselves have been proposed to provide a sequence-independent inference of phylogeny [Ryan et al., 2010, Ryan et al., 2013, Pisani et al., 2015], relying on the assumption that gene loss is a rare event. However, non-bilaterians have the additional problem that basic knowledge of many aspects of their biology is absent [Dunn et al., 2015], and so the biological context that may separate or unite groups is limited.
In the context of phylogeny, the branching order critically affects whether neurons evolved multiple times or were lost (see schematic in Figure 1). Given the gradient of neural-like qualities, the actual evolutionary scenario may lie somewhere between a simple gain and a simple loss of neurons. While some previous studies have focused on neural evolution in ctenophores [Ryan et al., 2013, Alberstein et al., 2015, Li et al., 2015] or on analysing the genomic data from A. queenslandica [Krishnan et al., 2014], these alone do not provide a comprehensive picture of all animals.
Here we have sequenced the genome of the contractile laboratory demosponge Tethya wilhelma [Sara et al., 2001] and examined the protein repertoire in the context of genes mediating contraction and other neural-like functions. Many metabolic genes show unique expansions in different sponge clades, as well as in other phyla, making it challenging to clearly assign functions based on similarity to human proteins. We consider these expansions in the context of phylogenetics, showing that even though sponges lack neurons, signaling pathways have still expanded. This gives support to the hypothesis that early neural-like cells have become neurons multiple times in the history of animals.
Results
Genome assembly and annotation
We generated a total of 61 gigabases of paired-end reads from a whole specimen of T. wilhelma (Figure 2) and all associated bacteria. Because of a close association with microbes, some contigs were expected to have derived from bacteria, as many reads had unexpectedly high GC content (Supplemental Figures 1-4). After assembly and filtration of bacterial contigs, the final assembly was 125Mb, similar to A. queenslandica, with an N50 value of 70kb (Supplemental Table 1). Gene annotation was done with a combination of a deeply-sequenced RNAseq library from an adult sponge and ab initio gene predictions. Because of the high density of genes, extensive manual curation was often necessary to correct genes on the same strand that were erroneously merged. After correction and filtering of the ab initio predictions, we counted 37,416 predicted genes, comparable with the counts in A. queenslandica (40,122) [Fernandez-Valverde et al., 2015] and S. ciliatum (40,504) [Fortunato et al., 2014].
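For readers less familiar with the N50 statistic quoted above, the following minimal sketch shows how it is commonly computed from a list of contig lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

# Toy example with hypothetical contig lengths (not real data).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

In the toy example the total is 300 bases, so the cumulative sum first reaches half of the total at the second-longest contig.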
General trends in splice variation were similar between T. wilhelma and A. queenslandica (Supplemental Tables 2 and 3), suggesting similar underlying biology or genome structure. One-to-one orthologs from T. wilhelma and A. queenslandica had relatively low identity (Supplemental Figure 5), with the average identity of 57.8%, showing a high genetic diversity within Porifera. The average identity is lower when compared to S. ciliatum (49.7%), N. vectensis (53.5%) and human (52.0%), which is not surprising given that A. queenslandica and T. wilhelma are both demosponges. Although both genomes are too fragmented to find syntenic chromosomal regions, ordered blocks of genes are still identifiable between T. wilhelma and A. queenslandica (Supplemental Figure 6), though not with S. ciliatum.
Neurotransmitter metabolism across early-branching metazoans
Compared to most other metazoans, sponges have a limited set of behaviors (contraction, closure of osculum or choanocyte chambers to control flow), yet respond to many signaling molecules present in bilaterians [Ellwanger and Nickel, 2006, Ellwanger et al., 2007]. Some genes involved in vertebrate-like neurotransmitter metabolism have been found in sponges [Riesgo et al., 2014a, Krishnan and Schiöth, 2015], although many display a sister-group relationship to homologs found in other animals and appear to have a complex evolutionary history with duplications in sponges and other non-bilaterian animals (Figure 3, Supplemental Figures 7-9), making the prediction of their functions difficult. Implicitly, presence of a gene usually means a one-to-one orthology relationship with a functionally annotated protein, probably a human protein. Since many of the non-bilaterian proteins in our set are many-to-many orthologs to human proteins with known functions, declaring presence or absence of any individual gene or genetic module is not correct in the strictest sense, as one-to-many or many-to-many orthologs are not the same gene. In such cases, it is not currently possible to computationally predict which, if any, of the sponge orthologs shares its function with a human protein.
For instance, biosynthesis of monoamine neurotransmitters (dopamine, serotonin, etc.) requires two enzymes, tryptophan hydroxylase and tyrosine hydroxylase. These two enzymes appear to have arisen in bilaterians from duplications of an ancestral phenylalanine hydroxylase [Cao et al., 2010], though evidence is lacking as to whether this ancestral protein had multiple functions that specialized after duplication (sub-functionalization) or developed new functions (neofunctionalization) post-duplication. The absence of these proteins in non-bilaterians seems to be ancestral; in other words, they had not evolved yet when these groups split and diversified.
Among other non-bilaterians, some monoamine neurotransmitters are found in cnidarians [Carlberg and Rosengren, 1985], but are mostly absent in ctenophores (or at detection limit) [Moroz et al., 2014]. Indeed, previous studies were unable to find homologs of DOPA decarboxylase (AADC, Supplemental Figure 8), dopamine β-hydroxylase (DBH, Supplemental Figure 7), monoamine oxidase (MAO, Supplemental Figure 9), or tyrosine hydroxylase (TH) in the genome of the ctenophore M. leidyi or any available ctenophore transcriptome, and it was suggested that some of these proteins were absent in sponges as well (see Supplementary Tables 17 and 19 in [Ryan et al., 2013]). However, we found orthologs of MAO and homologs of AADC and DBH in several sponges, though it is unclear if they perform the same function as the human proteins. Additionally, homologs of four enzymes, AADC, MAO, DBH, and ABAT, are present in single-celled eukaryotes but not ctenophores, implying a secondary loss of these protein families in this phylum.
GABA receptors
The neurotransmitter gamma-amino butyric acid (GABA) has been shown to affect contraction in T. wilhelma [Ellwanger et al., 2007] and the freshwater sponge E. muelleri [Elliott and Leys, 2010]. The genome of T. wilhelma contains metabotropic receptors (GABA-B, mGABARs), but not ionotropic GABA receptors (GABA-A, iGABARs). While humans have two mGABARs and the ctenophore M. leidyi has only one, the T. wilhelma genome has nine. Sponges appear to have undergone a large expansion of this protein family (Figure 4), similar to the expansion of glutamate-binding GPCRs previously observed in sponges [Krishnan et al., 2014]. Based on the structure of the binding pocket of human GABAR-B1 [Geng et al., 2013], many differences are observed across the mGABAR protein family, even showing that many residues involved in coordination of GABA are not conserved between the two human proteins or all other animals (Supplemental Figure 11). Contrary to previous reports [Ramoino et al., 2010], we were unable to find normal mGABARs in the two calcareous sponges S. ciliatum and L. complicata. Instead, in these two species, the best BLAST hits from human GABA-B receptors (the putative mGABARs) had the best reciprocal hits to Insulin-like growth-factor receptors. Structurally, this was due to the normal seven-transmembrane domain being swapped with a C-terminal protein kinase domain (Figure 5), meaning these are not true metabotropic GABA receptors. Similarly, in the filasterean Capsaspora owczarzaki, the N-terminal ligand binding domain is also exchanged with other domains, suggesting as well that these are not true metabotropic GABA receptors.
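The reciprocal best-hit reasoning used above for the calcareous sponge sequences can be illustrated with a minimal sketch. It assumes that BLAST results have already been parsed into (query, subject, bitscore) tuples; the function names, protein identifiers, and scores are hypothetical and serve only to show the logic, not the actual searches performed here.

```python
def best_hits(hits):
    """Map each query to its highest-bitscore subject.
    hits: iterable of (query, subject, bitscore) tuples."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_status(forward_hits, reverse_hits):
    """For each forward query, report its best hit, the best hit of that
    subject in the reverse search, and whether the two are reciprocal."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    status = {}
    for query, subject in forward.items():
        back = reverse.get(subject)  # None if the subject had no reverse hit
        status[query] = (subject, back, back == query)
    return status

# Hypothetical identifiers and scores, for illustration only.
forward = [("human_GABBR1", "calc_prot_1", 210.0),
           ("human_GABBR1", "calc_prot_2", 150.0)]
reverse = [("calc_prot_1", "human_IGF1R", 320.0),
           ("calc_prot_1", "human_GABBR1", 190.0)]
print(reciprocal_status(forward, reverse))
```

In this toy case the best sponge hit of the human receptor points back to a different human protein, so the pair would not be treated as reciprocal best hits.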
Glutamate receptors
Glutamate is of particular interest as it is a key metabolic intermediate and the main excitatory neurotransmitter in animal nervous systems, acting on two types of receptors: the metabotropic glutamate receptors (mGluRs) and the ionotropic ones (iGluRs). Some sponge species possess iGluRs, though these receptors were absent in the transcriptomes of several demosponges [Riesgo et al., 2014a]. We were unable to find iGluRs in the genome of T. wilhelma, in the genome and transcriptomes of any other demosponge, or in the genomes of two choanoflagellates (M. brevicollis and S. rosetta). The top BLAST hits in demosponges have a GPCR domain instead of the ion channel domain, indicating that these are not true iGluRs (Supplemental Figure 13). Because the domain structure in plants is the same as in most animal iGluRs, the ion channel domain was most likely swapped out in demosponges.
The homoscleromorpha/calcarea clade appears to have an independent expansion of iGluRs (Supplemental Figure 12), though the normal ion transporter domain is switched with a SBP-bac-3 domain (PFAM domain PF00497) compared to all other iGluRs (Supplemental Figure 13). Additionally, ctenophores and placozoans appear to have dramatic expansions of this protein family as well [Ryan et al., 2013, Moroz et al., 2014, Alberstein et al., 2015], suggesting that a small set of iGluRs was present in the common ancestor of eukaryotes and has diversified multiple times in both plants and animals, while other clades appear to have modified or lost these proteins.
Vesicular transporters
Secretory systems are a common feature of all eukaryotes, as most cells have endoplasmic reticulum to secrete proteins or make membrane proteins. Neurons secrete peptides (conceptually identical to any other protein) or small-molecule neurotransmitters in a paracrine fashion, specifically to other neural cells. Compared to peptides, small-molecule neurotransmitters need to be loaded into vesicles by dedicated transport proteins. Vesicular glutamate transporters (VGluTs, SLC17A6-8) are part of a superfamily of transporters [Sreedharan et al., 2010] that carry glutamate, aspartate, and nucleotides. The position of sponge proteins in the tree is inconsistent with a clear role in glutamate transport (Supplemental Figure 14), as several sponge clades and ctenophores occur as sister group to multiple duplications. Transporters in sponges, ctenophores, and choanoflagellates may well act upon glutamate or other amino acids, but this needs to be experimentally investigated.
Similar to glutamate, GABA is loaded into vesicles by the vesicular inhibitory amino acid transporter (VIAAT). Ctenophores, sponges, and placozoans lack one-to-one orthologs of VIAAT (Supplemental Figure 15). Several other transporters are thought to transport GABA (ANTL or SLC6 class) and many other amino acids. SLC6-class transporters, which transport diverse amino acids, are found in all non-bilaterian groups, so the function of VIAAT may be redundant.
Glycine receptors
Glycine is known to affect the contraction of T. wilhelma [Ellwanger and Nickel, 2006]. Some ctenophore iGluRs have been shown to bind glycine [Alberstein et al., 2015] due to the replacement of a conserved serine by arginine (S687 in human GluN1), though this appears to be specific to ctenophores, as essentially all other iGluRs have the conserved serine/threonine at this position. Because no ionotropic glycine receptors could be identified in the T. wilhelma genome (or in any other sponges, ctenophores or placozoans), other proteins may be responsible for mediating this effect.
Mechanical receptors
Some sponges can contract in response to mechanical agitation, as reported for the demosponges E. muelleri [Elliott and Leys, 2007] and T. wilhelma [Nickel, 2010]. Several diverse protein families appear to be responsible for the sense of touch [Árnadóttir and Chalfie, 2010]. A subgroup of the TRP (transient receptor-potential) channels, TRP-N, thought to mediate mechanosensation was determined to be absent in sponges [Schuler et al., 2015], and we were unable to identify any in either T. wilhelma or S. ciliatum, although other TRP-class channels were found [Ludeman et al., 2014, Schuler et al., 2015]. Because the mechanosensory function of TRP channels may be redundant, we analysed for the presence of PIEZO, a 280kDa trimeric protein [Ge et al., 2015] involved in touch sensation in mammals [Coste et al., 2012]. Although two homologs were found in vertebrates, we found one copy in all other animals (Supplemental Figure 16) as well as fungi, plants and most other eukaryotic groups, suggesting an ancient and conserved function of this protein.
Voltage-gated channels
Voltage-gated ion channels are necessary for the propagation of electrical signals down axons and dendrites [Zakon, 2012, Moran et al., 2015], and have specificities for sodium, potassium, or calcium. Previous analyses were unable to clearly identify potassium or sodium channels in sponges [Liebeskind et al., 2011]; only one partial potassium channel was found in the transcriptome of the homoscleromorph Corticium candelabrum [Riesgo et al., 2014a, Li et al., 2015]. We were unable to find any voltage-gated sodium or potassium channels in the genome or transcriptome of any sponge. We then examined voltage-gated hydrogen channels (hvcn1), as these proteins have been found in a number of single-celled eukaryotes [Smith et al., 2011], and are highly conserved. These channels were found in all sponge groups, although the high protein identity resulted in a poorly-resolved tree (Supplemental Figure 19).
Reports of action potentials in hexactinellids [Leys et al., 1999, Leys et al., 2007, Nickel, 2010] showed that sponge action potentials were inhibited by divalent cations [Leys et al., 1999], suggesting a role of calcium channels instead. Because voltage-gated sodium and calcium channels arose from a duplication event [Gur Barzilai et al., 2012], the ion selectivity may be variable within this protein family. Most sponges have only a single CaV-channel (Supplemental Figure 18) and several Hv-channels, and no voltage-gated channel of any kind was found in any glass sponge. However, all glass sponge sequences are from transcriptomes; therefore, either the expression level of the true channels is low in glass sponges, or they have independently evolved another mechanism to propagate action potentials.
Discussion
Gene content variation of metazoa
Among the thousands of genes in the genome, we focused on genes that may be mediating contractile behavior in T. wilhelma, and the interactions of those genes within broader metabolic pathways. Many of the “housekeeping” genes in our study have lineage-specific duplications in at least one animal phylum. Considering the importance of “single-copy” proteins in phylogenetic analyses, as taxon sampling improves, it may be found that very few or no genes are single copy across most or all animal phyla. Many other genes that are critical for neural functioning in bilaterians have independent losses in other animal lineages (Figure 6).
Glutamate and GABA receptor evolution
There is a stark contrast in the relative abundance of mGABARs and iGluRs in sponges and ctenophores. The relative dearth of mGABARs in ctenophores may reflect the apparent absence of amino-butyrate amino-transferase (ABAT) in ctenophores, suggesting that ctenophores use an alternate pathway to produce glutamate or metabolize GABA (Figure 3), rarely use GABA as a neurotransmitter, or simply are missing this pathway. Other aminotransferases such as GLUD or TAT may perform some of the exchange between α-ketoglutarate and glutamate, particularly as ctenophores have two copies of GLUD while most animals have only one. Ctenophores also have multiple (variable) copies of glutamate synthase and three copies of KYAT, one of which may serve to balance glutamate metabolism in these animals.
There are two possible explanations for the diversity of mGABARs in sponges. Given the high variability of amino acids in the mGABAR binding pocket (Supplemental Figure 11), it is plausible that many of these receptors do not bind GABA at all, and have diversified for other ligands. There is precedent for this, as it was shown that the independent expansion of ctenophore iGluRs also included several key mutations to the binding pocket which changed the ligand specificity of these proteins [Alberstein et al., 2015]. Under the other hypothesis, all of the receptors could bind GABA, essentially mediating the same contraction signal, but their kinetics could differ and be influenced by factors such as temperature. Because sponges are mostly immobile, they are often subject to environmental variation in terms of light, oxygen, and temperature. The possession of a set of proteins capable of triggering the same response (e.g. contraction) under varying daily or seasonal environmental conditions (e.g. temperature) would be beneficial and may explain the diverse set of receptors observed in sponges. Experimental characterization of these binding domains is necessary and may even show that a combination of these hypotheses explains the diversification of mGABARs in Porifera.
The apparent absence of true mGABARs in calcareous sponges (the genome of S. ciliatum and transcriptome of L. complicata) conflicts with a previous study that identified key proteins in the GABA pathway by immunostaining [Ramoino et al., 2010]. The best mGABAR BLAST hits found in the two calcareous sponges display a conserved ligand binding domain, but the seven-transmembrane domain has been swapped with a tyrosine kinase domain (Figure 5). Structural similarity of the conserved N-terminal domain may result in a false-positive signal in studies using immunostaining with standard antibodies [Ramoino et al., 2010]. On the other hand, unlike in ctenophores, which apparently lack ABAT, this enzyme was found in both of the calcareous sponges analyzed. Thus it would be surprising if these sponges had no capacity to create or respond to GABA. Since true vertebrate-like mGABARs are found in all other sponge classes, and our study could only examine two calcareous sponges, it could be that mGABAR presence is variable in this class. The genome of S. ciliatum contains 40 proteins annotated as mGluRs [Fortunato et al., 2014], so a third possibility is that even in the absence of true mGABARs, some of these proteins may have evolved affinity for GABA and mediate its signaling in calcareous sponges.
Although a putative iGluR was identified in the transcriptome of the demosponge Ircinia fasciculata, this sequence was only a fragment, so the glutamate affinity and domain structure could not be determined. As with the mGABARs, the domain structure is different between the sponge classes. Otherwise, it appears that only calcareous sponges and homoscleromorphs have NMDA/AMPA-like iGluRs. The presence of these proteins in plants and other single-celled eukaryotes suggests that at least iGluRs were present in the common ancestor of all eukaryotes, and their absence in demosponges is likely the product of secondary losses. In the context of contractions of T. wilhelma, the abundance of mGluRs and mGABARs could plausibly work in antagonistic ways via the action of different G-proteins making ionotropic channels not necessary for the modulation of this behavior.
Variation in neurotransmitter metabolism
Many of the oxidative enzymes in the monoamine pathway require molecular oxygen, suggesting an important role of this molecule in both the synthesis of the neurotransmitters (with PAH, TH, and DBH) and their inactivation (with MAO). Two catabolic pathways arise from tyrosine (Figure 3) and require oxygen at nearly all steps. It is unclear why intermediate products of one of these two pathways, the catecholamine pathway, became neurotransmitters and those of the other did not, particularly as the hydroxyphenylpyruvate pathway is universally found in animals and catabolic intermediates are likely to be ubiquitous.
MAO was found in most animal groups, but we were unable to find any in placozoans or ctenophores. The topology of the MAO phylogenetic tree suggests a secondary loss of this protein in these phyla (Supplemental Figure 9). Related genes (PAOX, polyamine oxidase) were found in placozoans with several placozoan-specific duplications, and again, potentially one of these may catalyze the oxidation of aromatic amines. The analysis of these proteins also uncovered a clade including sponges, cnidarians, and lancelets, though the function of these proteins cannot be predicted based on homology searches. In vitro characterization of these enzymes may reveal their function and provide evidence as to how they could have been important for metabolism in early animals and were subsequently replaced or lost in most other metazoan lineages.
Remarkably, the DBH group has independent expansions in three sponge classes as well as placozoans and cnidarians (Supplemental Figure 7). No DBH homologs were identified in calcareous sponges or in ctenophores. A putative homolog of this group was found in the choanoflagellate M. brevicollis but not in any other non-metazoan. The alignment and the phylogenetic position of the M. brevicollis protein suggest that it may be a member of the copper-binding oxygenase superfamily, rather than a true homolog of DBH (see Supplemental Alignment).
The presence of DBH-like and AADC-like enzymes in most animal groups suggests the possibility of making phenylethanolamines (like octopamine or noradrenaline) from tyrosine, and then subsequently inactivating them with MAO. All demosponges appear to lack AADC, and ctenophores appear to lack both of these enzymes, calling into question a previous report of the detectability of monoamine neurotransmitters in ctenophores [Carlberg and Rosengren, 1985].
Conserved properties of neurons
Neurons are defined by the presence of five key aspects: membrane potential, voltage-gated ion channels, secretory pathways, ligand-gated ion channels, and cell-cell junctions to form synapses. Voltage-gated channels, secretory systems, and ligand-gated ion channels are discussed above. Membrane potential is maintained in animal cells by sodium-potassium pumps (ATP-ases), which are a class of cation pumps exclusively found in animals [Stein, 1995, Sáez et al., 2009]. It is thought that such pumps are necessary because animals are the only multicellular group that lacks any kind of cell wall, thus careful control of ionic balance is necessary to resist osmotic stress [Stein, 1995]. In non-bilaterian animals, cell layers are in direct contact with water, so potentially all cells need this protein to function normally. Therefore, having neuron-like functionality is unlikely to rest upon the gain or loss of this gene. The last feature is the presence of cell-cell connections. Many proteins involved in synapse structure or neurotransmission are found in sponges [Srivastava et al., 2010, Riesgo et al., 2014a, Moran et al., 2015, Leys, 2015], though it is not clear which genes are necessary for neural functioning, or which may have evolved independently.
Neural evolution and losses
Based on recent phylogenies, both Porifera-sister and Ctenophora-sister evolutionary scenarios require either at least one loss of neurons or two independent gains (Figure 1) of this cell-type. The only scenario that allows for a single evolution of neurons and no losses is the “Coelenterata” hypothesis (reviewed in [Jékely et al., 2015]), which joins cnidarians and ctenophores in a clade. However, many molecular datasets [Dunn et al., 2008, Ryan et al., 2013, Whelan et al., 2015, Pisani et al., 2015] and morphological evidence [Harbison, 1985] argue against this scenario (but also see [Philippe et al., 2009] and [Simion et al., 2017]). One other alternative is that placozoans have an unidentified neuron-like cell in a Porifera-sister context, which would therefore allow for a single origin of neural systems in animals and no losses.
What do the two different scenarios mean for the evolution of neuronal cells? Considering the basic properties of neurons related to electrical signaling or secretory pathways, it has been shown before that many of the genes involved are universally found in animals. A single origin and multiple losses implies that the genetic toolkit necessary for all of these functions was present in the same single-celled organism or the same cell type (a hypothetical proto-neuron) of the last common ancestor of crown-group Metazoa, and either that cell type was lost or its functions were split up.
Sponges and ctenophores both appear to have lost several gene families (Figure 6), though ctenophores nonetheless have neural cells. Thus, the losses of the GABA or monoamine pathways are not critical for the functioning of neural cell types overall. However, voltage-gated potassium and sodium channels are thought to be essential for the propagation of electrical signals down axons and dendrites and have been found in all animal groups except sponges [Moran et al., 2015]. The NaV-channel tree shows a single origin of this protein family (Supplemental Figure 17), and presence of these channels in choanoflagellates suggests they were present in the common ancestor of all animals; the apparent absence in sponges therefore is probably a secondary loss. By comparison, ctenophores have a mostly-unique expansion of Kv-channels relative to the rest of metazoans [Li et al., 2015] and a duplication in NaV channels. Together with the loss of this protein family in sponges, the gene content argues for a combination of both multiple, independent gains and a loss of neural-type cells and their associated functions across animals.
Properties of the earliest metazoans are unknown, including life cycle or number of cell types, but it is most parsimonious that the first obligate multicellular animals did not have anything resembling a modern bilaterian nervous system [Wray et al., 2015]. Yet, the genomic evidence shows that these animals had the cellular capacity to respond to environmental or paracrine signals, regulate intracellular ion concentrations and respond to changes in those concentrations, and secrete small molecules that could serve as effectors in unconnected (but proximal) cells. Thus, the earliest animals likely had the capacity to develop nerve cells using the genetic toolkit they possessed, though the number of times this occurred is unclear. This capacity appears to have been lost in sponges with the loss of voltage-gated channels. As we were unable to find putative genes to mediate action potentials in glass sponges, either all four transcriptomes were incomplete or the unique action potentials of glass sponges may represent a third case of the evolution of neural-like functions in Metazoa.
Methods
Sequence data
Project overview can be found at spongebase.net. Reference data from the demosponge Tethya wilhelma are available at: https://bitbucket.org/molpalmuc/tethya_wilhelma-genome
Raw reads for T. wilhelma are available on NCBI SRA under accession numbers SRR2163223 (genomic reads), SRR2296844 (mate pairs), SRR5369934 (DNA Moleculo), and SRR4255675 (RNAseq).
Genome assembly
Processing and assembly
We generated 25Gb of 100bp paired-end Illumina reads of genomic DNA and 35Gb of 125bp Illumina gel-free mate-pair reads. Contigs were assembled with SOAPdenovo2 [Luo et al., 2012] using a kmer of 83bp. We also generated 436Mb of Moleculo synthetic long reads. Because both haplotypes are represented in the Moleculo reads, we merged the Moleculo reads using HaploMerger [Huang et al., 2012]. Contigs and merged Moleculo reads were then scaffolded using the gel-free mate-pairs with SSPACE [Boetzer et al., 2011] and BESST [Sahlin et al., 2014]. The first draft assembly had 7,947 contigs, totaling 145 megabases.
Removal of low-coverage contigs
To examine the completeness of the genome, we generated a plot of kmer coverage against GC percentage for the contigs (Supplemental Figure 1) using custom Python scripts (available at http://github.org/wrf/lavaLampPlot). This revealed 1,040 contigs with a coverage of zero that were carried over from the Moleculo reads and were not assembled (Supplemental Figure 2), accounting for 6 megabases. Because these reads likely derived from bacterial contamination in the aquarium water, the contigs were removed, leaving 6,907 contigs totaling 138 megabases.
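The filtering step above can be illustrated with a minimal sketch. This is not the lavaLampPlot code: the function names, the dictionary layout, and the toy sequences and coverage values are all assumptions made for the example.

```python
def gc_percent(seq):
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")  # ignore N and gaps
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def split_by_coverage(contigs, coverage, min_coverage=0.0):
    """Separate contigs with coverage above min_coverage from the rest.

    contigs:  dict of contig name -> sequence
    coverage: dict of contig name -> mean kmer coverage (hypothetical input)
    """
    kept, removed = {}, {}
    for name, seq in contigs.items():
        if coverage.get(name, 0.0) > min_coverage:
            kept[name] = seq
        else:
            removed[name] = seq
    return kept, removed


# Toy data, invented for illustration only
contigs = {"c1": "ATGCATGCAT", "c2": "GGGCCCGGGC", "c3": "ATATNNATAT"}
coverage = {"c1": 131.0, "c2": 0.0, "c3": 45.5}

kept, removed = split_by_coverage(contigs, coverage)
for name in sorted(kept):
    # One point of a coverage-versus-GC plot per retained contig
    print(name, coverage[name], gc_percent(kept[name]))
```

Each retained contig contributes one (coverage, GC) point to a plot like the one described, while zero-coverage contigs are set aside.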
Separation of bacterial contigs
Additionally, the plot revealed many contigs with lower coverage (20x-90x) and high GC content (50-75%) suggesting the presence of bacteria (Supplemental Figure 3). Because many of these contigs were shorter than 10kb, separation of the bacterial contigs was done through several steps. We found 4,858 contigs with mapped RNAseq reads and GC content under 50%, as expected of metazoans. These contigs accounted for 88% of the sponge assembly, or 121 megabases. For the 2,014 contigs with no mapped RNAseq, we used blastn to search the contigs against the A. queenslandica scaffolds and all complete bacterial genomes from GenBank (5,242 sequences). Based on subtraction of bitscores, 62 contigs were identified as sponge and 565 were identified as bacterial. For the remaining 1,387 contigs, most of which were under 10kb, we repeated the search with tblastx against A. queenslandica scaffolds and the genomes of Sinorhizobium medicae and Roseobacter litoralis, which were the most similar complete genomes to the two bacterial 16S rRNAs identified in the contigs. After all sorting, 798 putative bacterial contigs accounted for 12.7 megabases and were separated to bring the total to 6,109 sponge contigs. Contigs for the two bacteria were binned by tetranucleotide frequency using MetaWatt [Strous et al., 2012] (Supplemental Figure 4).
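The bitscore subtraction can be sketched as follows, assuming BLAST tabular output (-outfmt 6) in which the twelfth column is the bitscore. The decision rule (sign of the difference between the best sponge and best bacterial bitscore), the min_difference parameter, and all names and toy values are assumptions for illustration; the exact rule used in the analysis is not specified in the text.

```python
def best_bitscores(tabular_lines):
    """Return the best bitscore per query from BLAST tabular (outfmt 6) lines."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, bitscore = fields[0], float(fields[11])
        if bitscore > best.get(query, 0.0):
            best[query] = bitscore
    return best


def classify_contigs(contig_names, sponge_best, bacterial_best, min_difference=0.0):
    """Label each contig by the difference between its two best bitscores."""
    calls = {}
    for name in contig_names:
        difference = sponge_best.get(name, 0.0) - bacterial_best.get(name, 0.0)
        if difference > min_difference:
            calls[name] = "sponge"
        elif difference < -min_difference:
            calls[name] = "bacterial"
        else:
            calls[name] = "unassigned"  # no hits, or a tie
    return calls


def row(query, subject, bitscore):
    """Build a toy 12-column line; only columns 1, 2 and 12 matter here."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])


# Toy hits, invented for illustration only
sponge_best = best_bitscores([row("c1", "aq_s1", 400), row("c1", "aq_s2", 250),
                              row("c2", "aq_s1", 60)])
bacterial_best = best_bitscores([row("c2", "bact1", 900), row("c1", "bact2", 80)])

calls = classify_contigs(["c1", "c2", "c3"], sponge_best, bacterial_best)
print(calls)
```

Contigs left unassigned by such a rule correspond to those that would need a more sensitive follow-up search, as was done here with tblastx.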
Genome coverage and completeness
Coverage was estimated two ways: kmer frequency and read mapping. Kmers of 31bp were counted using the Jellyfish kmer counter [Marçais and Kingsford, 2011] and analyzed using custom Python and R scripts (“fastqdumps2histo.py” and “jellyfish_gc_coverage_blob_plot_v2.R”, available at http://github.org/wrf/lavaLampPlot). As expected, the kmer distribution showed two peaks, one for kmers at heterozygous positions and one for homozygous positions, with the homozygous peak at 131-fold coverage. Because of sequencing errors, this method often underestimates coverage, so to confirm this estimate we mapped all reads to the genome using Bowtie2 [Langmead and Salzberg, 2012]. The summed length of the mapped reads divided by the total assembly length gave an estimated physical coverage of 159-fold.
Of the original reads, 185 million (86.5%) mapped back to the assembled sponge contigs. Completeness for gene content was assessed with BUSCO [Simão et al., 2015], which identified 728 (86%) complete genes and 42 (4.9%) predicted-incomplete genes. Overall, these data suggest that the genome assembly is adequate for downstream analyses.
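The read-mapping estimate described above amounts to total mapped bases divided by assembly length. A minimal sketch with invented toy numbers (the function names are hypothetical, and none of the values below are from this study):

```python
def mean_coverage(mapped_read_lengths, assembly_length):
    """Return total mapped bases divided by the assembly length."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return sum(mapped_read_lengths) / assembly_length


def mapped_fraction(mapped_reads, total_reads):
    """Return the fraction of reads that mapped back to the assembly."""
    return mapped_reads / total_reads


# Toy example: five mapped reads on a 50 bp "assembly"
read_lengths = [100, 100, 100, 125, 125]
print(mean_coverage(read_lengths, 50))
print(mapped_fraction(3, 4))
```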
Genome annotation
Transcriptome versions
The transcriptome for T. wilhelma was assembled de novo using Trinity (release r20140717) [Grabherr et al., 2011, Haas et al., 2013]. Default parameters were used, except for strand specific assembly, in silico read normalization, and trimming (--SS_lib_type RF --normalize_reads --trimmomatic). This produced 127,012 transcripts with an average length of 913bp. Assembled transcripts were mapped to the genomic assembly using GMAP [Wu and Watanabe, 2005] to produce a GFF file of the transcript mapping. Of these, 114,744 transcripts were mapped 166,847 times, allowing for multiple mappings.
For the genome-guided transcriptome, strand-specific RNAseq reads were mapped against the genome build using Tophat2 v2.0.13 [Kim et al., 2013] with strand-specific mapping (option --library-type fr-firststrand) and otherwise default parameters. Mapped reads were then joined into transcripts using StringTie v1.0.2 [Pertea et al., 2015] with default parameters.
Additionally, ab initio gene models were predicted using AUGUSTUS [Stanke et al., 2008]. AUGUSTUS was trained on the webAugustus server [Hoff and Stanke, 2013] using the most highly expressed transcript for each Trinity component and the assembled contigs. This identified 27,551 putative genes. The majority of these overlapped partially or completely with a predicted gene based on the Trinity mapping or StringTie genes. However, 3,866 genes (4,321 transcripts) had no overlap with any predicted exon from either the Trinity or StringTie set, and were kept for the final set. Considering the possibility that some of these may be pseudogenes, we aligned these proteins to the SwissProt database with BLASTP [Camacho et al., 2009]. Of these, only 759 had reliable hits (E-value < 10^-5) to 688 proteins. The annotated functions were diverse, including proteins similar to many receptors and large structural proteins such as fibrillin (potentially any protein with EGF repeats), dynein heavy chain, and titin; because very large proteins may be split across multiple contigs, the predicted genes may be only fragments of the full gene. Only 42 of the hits were against transposable elements.
Filtering of the final gene set
Because assembly of transcripts for both StringTie and Trinity relies on overlaps in the genome or RNAseq reads, genes that overlap in the untranslated regions (UTRs) can sometimes be erroneously fused. Tandem duplications can also lead to RNAseq reads bridging the two tandem copies, so that both copies are called as the same gene. For StringTie, we developed a custom Python script to separate non-overlapping transcripts belonging to the same “gene” (stringtiesplitgenes.py, available at https://bitbucket.org/wrf/sequences/). The original StringTie set contained 46,572 transcripts for 32,112 genes, while the corrected set contained 33,200 genes and identified 1,088 new non-overlapping genes.
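One way to separate non-overlapping transcripts assigned to the same gene is to cluster them by overlapping genomic spans. The sketch below is an assumption about the general approach, not a reproduction of stringtiesplitgenes.py, which may well compare exons rather than whole transcript spans; the function name and the toy coordinates are hypothetical.

```python
def split_gene(transcripts):
    """Group transcripts of one gene into clusters of overlapping spans.

    transcripts: list of (transcript_id, start, end), 1-based inclusive
    Returns a list of clusters, each a list of transcript ids.
    """
    clusters = []
    cluster_end = None
    for tid, start, end in sorted(transcripts, key=lambda t: (t[1], t[2])):
        if cluster_end is None or start > cluster_end:
            clusters.append([tid])          # no overlap: start a new gene
            cluster_end = end
        else:
            clusters[-1].append(tid)        # overlaps the open cluster
            cluster_end = max(cluster_end, end)
    return clusters


# Toy "gene" with three transcripts, invented for illustration
gene = [("t1", 100, 500), ("t2", 400, 900), ("t3", 1500, 2000)]
print(split_gene(gene))
```

In this toy case the third transcript shares no bases with the other two, so it would be reported as a separate gene.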
Positional errors in the genome or allelic variations may result in some RNAseq reads not mapping to the genome, so some genes are fragmented in the genome-guided transcriptome but not the de novo assembly. Making use of the protein predictions from TransDecoder, we compared the predicted proteins between the two transcriptomes using a custom Python script (transdecodersplitgenes.py, available at https://bitbucket.org/wrf/sequences/). This identified 406 StringTie transcripts that were better modeled by Trinity transcripts.
Functional gene annotation
Many genes of functional importance were examined manually, and the best transcript from StringTie, Trinity, or AUGUSTUS was retained for the final gene set. In the GFF and fasta versions of the transcriptomes, names of protein functions were assigned in several ways. Firstly, target genes that were manually curated and edited, such as those used in all trees, are named by the generic function or the annotated function of the closest human protein. For instance, the dopamine beta-hydroxylase (DBH) homolog in T. wilhelma was manually corrected, and its position in a phylogenetic tree demonstrated that demosponges diverged before the duplication which created DBH and the two DBH-like proteins in humans; thus, the T. wilhelma protein is annotated as DBH-like. Secondly, automated ortholog finding pipelines (HaMStR [Ebersberger et al., 2009]) used for phylogeny [Cannon et al., 2016] have identified homologs in T. wilhelma, which have been manually checked based on positions in the phylogenetic trees. Thirdly, single-direction BLAST results were kept as annotations provided that the BLAST hit had a bitscore over 1000, or a bitscore over 300 and the T. wilhelma protein covered at least 75% of the best hit against the human protein dataset from SwissProt. The bitscore and length cutoffs were applied to reduce the number of annotations based on a single domain.
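The third rule can be written as a small predicate. This sketch assumes that "covered" means aligned length divided by the length of the best hit; that interpretation, the function name, and the example values are assumptions, while the three thresholds are those stated above.

```python
def keep_annotation(bitscore, aligned_length, hit_length,
                    high_bitscore=1000.0, low_bitscore=300.0, min_coverage=0.75):
    """Decide whether a single-direction BLAST hit is kept as an annotation.

    Kept if bitscore > 1000, or if bitscore > 300 and the alignment
    covers at least 75% of the best hit (coverage definition assumed).
    """
    if bitscore > high_bitscore:
        return True
    covered = aligned_length / hit_length
    return bitscore > low_bitscore and covered >= min_coverage


# Toy hits, invented for illustration
print(keep_annotation(1200, 100, 2000))   # high bitscore alone is enough
print(keep_annotation(350, 800, 1000))    # moderate bitscore, 80% covered
print(keep_annotation(350, 700, 1000))    # moderate bitscore, 70% covered
```

The coverage condition is what excludes moderate-scoring hits that match only a single shared domain of a longer protein.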
Analysis of splice variation
Using the transcriptome from StringTie, splice variation was assessed using a custom Python script (splicevariantstats.py, available at https://bitbucket.org/wrf/sequences/). In this script, several ambiguous definitions were clarified to define the different splice types. Firstly, single exon genes with no variants are distinguished from single exon genes with variations, that is, a gene with two exons can have a variant with one exon. For loci with only two transcripts, the canonical or main transcript is defined as the one with the higher expression level, as measured by the higher FPKM value reported from StringTie. For loci with three or more transcripts, main or canonical exons are those included in at least two transcripts. A cassette exon must occur in fewer than 50% of the transcripts for a locus; otherwise, the case is defined as a skipped exon. A retained intron is any portion that exactly spans two other exons; for highly expressed transcripts this may include erroneously retained introns due to intermediates in splicing. A summary of the splicing types is displayed in Supplemental Table 2.
Intron retention was recently reported to be a common mode of alternative splicing in A. queenslandica [Fernandez-Valverde et al., 2015]. We found 3,295 transcripts with 3,565 retained intron events (Supplemental Table 2). We then analyzed the length of the retained introns and found the phase of the retained piece to be randomly distributed (unlike cassette exons, Supplemental Table 3), suggesting that many of the retained introns result from incomplete splicing rather than functional retention.
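The phase analysis can be illustrated by tallying retained-intron lengths modulo 3, since only a retained piece whose length is a multiple of three leaves the downstream reading frame unchanged. This is a minimal sketch with invented lengths; the function name is hypothetical and this is not the script used for Supplemental Table 3.

```python
from collections import Counter


def length_phase_counts(lengths):
    """Count lengths by remainder modulo 3, returned as [mod0, mod1, mod2]."""
    counts = Counter(length % 3 for length in lengths)
    return [counts.get(remainder, 0) for remainder in range(3)]


# Toy retained-intron lengths, invented for illustration
lengths = [90, 91, 92, 120, 301, 47]
print(length_phase_counts(lengths))
```

A roughly even split across the three remainders is what would be expected if retention were not constrained to preserve the reading frame.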
Microsynteny across sponges
Putative synteny blocks were identified using a custom Python script (microsynteny.py, available at https://bitbucket.org/wrf/sequences/). Briefly, the script combines the gene positions on scaffolds for both the query and the reference with BLASTX hits for the query against the reference. If a minimum of three genes in a row on a query scaffold match to different genes on the same reference scaffold, the group is kept. By default, a block is discarded once more than five intervening genes separate consecutive matches, and the next gene must occur within 30kb. This method was designed to work for highly fragmented genomes with thousands of scaffolds, so the order and direction of the corresponding genes on the reference scaffold do not need to match those of the query scaffold.
StringTie transcripts for T. wilhelma were aligned against the A. queenslandica v2.0 protein set with BLASTX [Camacho et al., 2009], and positions were taken from the accompanying A. queenslandica v2.0 GTF. The same procedure was attempted against the S. ciliatum gene models, though essentially no syntenic blocks were detected, indicating either substantial differences in gene content or gene order between demosponges and calcareous sponges.
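The block-finding rule described above can be sketched as a single greedy pass along one query scaffold. This is a simplified illustration under stated assumptions, not microsynteny.py: the function name, the input layout, the choice to measure the 30kb limit from the last matched gene, and the toy data are all hypothetical. Because it is greedy, a gene skipped while one block is open cannot start a block of its own.

```python
def find_blocks(query_genes, best_hits, min_genes=3, max_skipped=5, max_distance=30000):
    """Find putative synteny blocks along one query scaffold.

    query_genes: list of (gene_id, start, end), sorted by position
    best_hits:   dict of gene_id -> (reference scaffold, reference gene)
    Returns a list of (reference scaffold, [(query gene, reference gene), ...]).
    """
    blocks = []
    block, used_refs = [], set()
    scaffold, skipped, last_end = None, 0, 0
    for gene_id, start, end in query_genes:
        hit = best_hits.get(gene_id)
        if block:
            in_reach = start - last_end <= max_distance
            new_match = (hit is not None and hit[0] == scaffold
                         and hit[1] not in used_refs)
            if in_reach and new_match:
                block.append((gene_id, hit[1]))
                used_refs.add(hit[1])
                skipped, last_end = 0, end
                continue
            if in_reach and skipped < max_skipped:
                skipped += 1            # tolerate an intervening gene
                continue
            if len(block) >= min_genes:  # block can no longer be extended
                blocks.append((scaffold, block))
            block = []
        if hit is not None:              # open a new block at this gene
            scaffold, block, used_refs = hit[0], [(gene_id, hit[1])], {hit[1]}
            skipped, last_end = 0, end
    if len(block) >= min_genes:
        blocks.append((scaffold, block))
    return blocks


# Toy scaffold, invented for illustration
genes = [("q1", 0, 1000), ("q2", 2000, 3000), ("q3", 4000, 5000),
         ("q4", 6000, 7000), ("q5", 50000, 51000), ("q6", 52000, 53000),
         ("q7", 54000, 55000)]
hits = {"q1": ("refA", "a1"), "q2": ("refA", "a5"), "q4": ("refA", "a2"),
        "q5": ("refA", "a3"), "q6": ("refB", "b1"), "q7": ("refA", "a4")}
print(find_blocks(genes, hits))
```

In the toy data, q1, q2 and q4 form one block on refA even though the reference genes are out of order, while q5 and q7 lie more than 30kb away and form a group of only two genes, which is discarded.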
Collection of reference data
Proteins for Oikopleura dioica [Denoeud et al., 2010] were downloaded from Genoscope. Gene models for Ciona intestinalis [Dehal et al., 2002], Branchiostoma floridae [Putnam et al., 2008], Trichoplax adhaerens [Srivastava et al., 2008], Capitella teleta, Lottia gigantea, Helobdella robusta [Simakov et al., 2013], Saccoglossus kowalevskii [Simakov et al., 2015], and Monosiga brevicollis [King et al., 2008] were downloaded from the JGI genome portal. Gene models for Sphaeroforma arctica, Capsaspora owczarzaki [Suga et al., 2013] and Salpingoeca rosetta [Fairclough et al., 2013] were downloaded from the Broad Institute.
We used genomic data of the cnidarians Nematostella vectensis [Moran et al., 2014], Exaiptasia pallida [Baumgarten et al., 2015], and Hydra magnipapillata as well as transcriptomes from 33 other cnidarians [Bhattacharya et al., 2016, Zapata et al., 2015, Pratlong et al., 2015, Brinkman et al., 2015, Ponce et al., 2016], mostly corals.
For demosponges, we used the genome of Amphimedon queenslandica [Srivastava et al., 2010, Fernandez-Valverde et al., 2015] and transcriptomic data from: Mycale phyllophila [Qiu et al., 2015], Petrosia ficiformis [Riesgo et al., 2014a], Crambe crambe [Versluis et al., 2015], Cliona varians [Riesgo et al., 2014b], Halisarca dujardini [Borisenko et al., 2016], Crella elegans [Pérez-Porro et al., 2013], Stylissa carteri, Xestospongia testutinaria [Ryu et al., 2016], Scopalina sp., and Tedania anhelens. We used data from the genome of the calcareous sponge Sycon ciliatum [Fortunato et al., 2014] and the transcriptome of Leucosolenia complicata. For hexactinellids (glass sponges), we used transcriptome data from Aphrocallistes vastus [Ludeman et al., 2014], Hyalonema populiferum, Rosella fibulata, and Sympagella nux [Whelan et al., 2015]. For homoscleromorphs, we used two transcriptomes from Oscarella carmela and Corticium candelabrum [Ludeman et al., 2014].
We used data from the two published draft genomes of ctenophores [Ryan et al., 2013, Moroz et al., 2014], as well as transcriptome data from 11 additional ctenophores: Bathocyroe fosteri, Bathyctena chuni, Beroe abyssicola, Bolinopsis infundibulum, Charistephane fugiens, Dryodora glandiformis, Euplokamis dunlapae, Hormiphora californensis, Lampea lactea, Thalassocalyce inconstans, and Velamen parallelum.
We used data from the unpublished draft genome of a novel placozoan species, designated H13.
Gene trees
For protein trees, candidate proteins were identified by reciprocal BLAST alignment using blastp or tblastn. All BLAST searches were done using the NCBI BLAST 2.2.29+ package [Camacho et al., 2009]. Because most functions were described for human, mouse, or fruit fly proteins, these served as the queries for all datasets. Candidate homologs were kept for analysis if they reciprocally aligned by blastp to a query protein, usually human. Alignments for protein sequences were created using MAFFT v7.029b, with L-INS-i parameters for accurate alignments [Katoh and Standley, 2013]. Phylogenetic trees were generated using either FastTree [Price et al., 2010] with default parameters or RAxML-HPC-PTHREADS v8.1.3 [Stamatakis, 2014], using the PROTGAMMALG model for proteins and 100 bootstrap replicates with the “rapid bootstrap” (-f a) algorithm and a random seed of 1234.
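The reciprocal filtering step can be sketched as follows, assuming the best hits of the forward and reverse searches have already been parsed from BLAST output. The function name, the dictionary layout, and the identifiers are hypothetical and serve only to show the logic.

```python
def reciprocal_candidates(forward_hits, reverse_best, query_ids):
    """Keep candidates whose best reverse hit is one of the query proteins.

    forward_hits: dict of query protein -> list of candidate ids it hit
    reverse_best: dict of candidate id -> its best hit in the reference set
    query_ids:    set of query protein ids (for example, human proteins)
    """
    kept = set()
    for candidates in forward_hits.values():
        for candidate in candidates:
            if reverse_best.get(candidate) in query_ids:
                kept.add(candidate)
    return sorted(kept)


# Toy identifiers, invented for illustration
queries = {"human_DBH", "human_MOXD1"}
forward = {"human_DBH": ["twi_001", "twi_002"], "human_MOXD1": ["twi_001", "twi_003"]}
reverse = {"twi_001": "human_MOXD1", "twi_002": "human_unrelated", "twi_003": "human_DBH"}

print(reciprocal_candidates(forward, reverse, queries))
```

Accepting any member of the query set as the reverse hit, rather than requiring the same protein in both directions, is a design choice of this sketch that tolerates lineage-specific duplications among the query proteins.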
Domain annotation
Domains for individual protein trees were annotated with “hmmscan” v3.1b1 from the HMMER package [Eddy, 2011], searching the proteins against the PFAM-A database v27.0 [Finn et al., 2016]. Signal peptides were predicted using the stand-alone version of SignalP v4.1 [Petersen et al., 2011]. Domain structures were visualized using a custom Python script, “pfampipeline.py”, available at https://github.com/wrf/genomeGTFtools.
Acknowledgments
W.R.F. would like to thank K. Achim, M. Nickel, J. Musser, J. Ryan, and I. Oldenburg for helpful comments. G.W. and D.E. would like to thank M. Nickel for providing the initial T. wilhelma specimens to set up the culture in Munich. This work was supported by an LMUexcellent grant (Project MODELSPONGE) to D.E. and G.W. as part of the German Excellence Initiative, partially by research grant 9278 (“Early evolution of multicellular sponges”) from VILLUM FONDEN to G.W., and NIH grant NIGMS-5-R01-GM087198 to S.H.D.H. The authors declare no competing interests.