Abstract
Gap junction channels are formed by two unrelated protein families. Non-chordates use the primordial innexins, while chordates use connexins, which superseded the gap junction function of innexins. Chordates retained innexin homologs, but N-glycosylation prevents them from forming gap junctions. It is puzzling why chordates seem to use the new gap junction protein exclusively and why no chordate appears to form gap junctions from non-glycosylated innexins. Here, we identified glycosylation sites of 2270 innexins from 152 non-chordate and 274 chordate species. Among all chordates, we did not find a single innexin without glycosylation sites. Surprisingly, the glycosylation motif is also widespread among non-chordate innexins, indicating that glycosylated innexins are not a novelty of chordates. In addition, we discovered a loss of innexin diversity during early chordate evolution. Most importantly, the most basal living chordates, which lack connexins, exclusively possess innexins with glycosylation sites. A bottleneck effect might thus explain why connexins have become the only protein used to form chordate gap junctions.
Introduction
Animals from hydra to human use gap junction channels to couple adjacent cells and thus enable direct intercellular communication. Interestingly, gap junction channels are formed by two unrelated families of integral membrane proteins: innexins and connexins. The innexins are the primordial gap junction proteins and have been identified in all animals except sponges, placozoans and echinoderms (Slivko-Koltchik et al., 2019). The connexins arose de novo during early chordate evolution and constitute the gap junction channels of all living chordates except lancelets (Abascal et al., 2013; Mikalsen et al., 2021; Slivko-Koltchik et al., 2019). Despite the lack of sequence homology (Alexopoulos et al., 2004), connexin- and innexin-based gap junction channels are remarkably similar in topology (Maeda et al., 2009; Michalski et al., 2020; Oshima et al., 2016) (Figure 1A, B) and function (Pereda et al., 2017; Skerrett et al., 2017). Nevertheless, chordates are thought to have completely replaced innexin-based gap junctions with the novel connexin-based gap junctions. Vertebrates still express innexin homologs (Baranova et al., 2004; Panchin et al., 2000), called pannexins, but these are thought to have lost the ability to form gap junctions and to function only as non-junctional membrane channels (Dahl et al., 2014; Esseltine et al., 2016; Sosinsky et al., 2011). This hypothesis is based on the discovery that the three pannexins of humans and mice are glycoproteins. Each of these pannexins contains an experimentally identified consensus motif (Asn-X-Ser/Thr) for asparagine (N)-linked glycosylation within either the first or the second extracellular loop (Penuela et al., 2007; Penuela, Simek, et al., 2014; Ruan et al., 2020; Sanchez-Pupo et al., 2018). This motif enables the posttranslational attachment of sugar moieties to the asparagine residue, which prevents two pannexin channels of adjacent cells from forming an intercellular channel (Ruan et al., 2020) (Figure 1C). 
Based on these findings, it has been assumed that each vertebrate pannexin is equipped with an N-linked glycosylation site (NGS) and has thus lost its gap junction function (Sosinsky et al., 2011). However, it remains unclear whether all vertebrate pannexins are really glycosylated and thus presumably function only as single membrane channels. It is also unknown whether glycosylation is indeed a novel modification that chordates gained to prevent their innexin homologs from forming gap junctions. Notably, previous studies have shown that at least two non-chordate species, Aedes aegypti (Calkins et al., 2015) and Caenorhabditis elegans (Kaji et al., 2007), possess an innexin with an extracellular NGS that can be glycosylated. These findings raise the intriguing possibility that N-glycosylation might actually be common in both chordate and non-chordate innexins and might have played an important role in the evolution of gap junction proteins.
Since the experimental identification of N-glycosylated proteins is technically demanding, time-consuming and expensive, established computational methods are commonly used to predict NGSs from primary amino acid sequences (Gupta et al., 2002; Pitti et al., 2019). In this study, we used the wealth of data now available in public protein and genome databases to analyze the occurrence of NGSs in non-chordate and chordate innexins in silico. Based on our findings, we suggest a new evolutionary scenario in which a loss of innexin diversity could explain why the connexins arose de novo during early chordate evolution and why connexins completely replaced the innexins, which so successfully serve diverse functions in the nervous systems of invertebrates.
Results and Discussion
We first screened for innexins across multiple non-chordate taxa by using known innexin sequences as queries in BLAST searches. Only hits that fulfilled defined criteria were included in our study (see Materials and Methods). In total, we collected the amino acid sequences of 1405 non-chordate innexins from 152 species across 7 higher-level taxonomic groups (ctenophores, cnidarians, molluscs, annelids, platyhelminthes, nematodes and arthropods). We then searched each sequence for the consensus motif for N-glycosylation (Asn-X-Ser/Thr). Because extracellular glycosylation of pannexins hinders gap junction formation (Ruan et al., 2020), we only included NGSs located in the extracellular loops of the innexins. Surprisingly, we found that innexins with extracellular NGSs are widespread among the examined non-chordate phyla and make up more than 80% of the innexins in ctenophores (Figure 1D, E, Figure 1–source data 1). Neither the positions of the NGSs within the extracellular loops nor the residues surrounding the N-glycosylation consensus motifs are conserved between phyla (Figure 1F). Within individual phyla, we found some innexin orthologs with highly conserved NGSs and extracellular loops (Figure 1–figure supplement 1). However, we did not find any extracellular NGS that was conserved in all species of a phylum. This presumably reflects the phylum-specific diversification of innexins: as shown in previous studies (Abascal et al., 2013; Hasegawa et al., 2014; Moroz et al., 2014) and in Figure 1D, innexins originated early in metazoan evolution and have diversified within the different non-chordate phyla. Thus, innexins with extracellular NGSs evolved independently many times within individual phyla.
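The sequence-level motif search can be illustrated with a minimal Python sketch. It assumes only the consensus rule given in the text (Asn, then any residue except proline, then Ser or Thr) and optional 1-based coordinates of the extracellular loops; the function name `find_sequons`, the toy sequence and the loop coordinates are hypothetical. A plain motif match over-predicts glycosylation, which is why the Materials and Methods additionally apply NetNGlyc scores; the sketch covers only the motif step.

```python
import re

# Zero-width lookahead so that overlapping motifs (e.g. "NNTS") are all found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(seq, regions=None):
    """Return 1-based positions of the Asn in every N-X-S/T motif (X != Pro).

    regions: optional list of 1-based, inclusive (start, end) intervals,
    e.g. extracellular loops; motifs whose Asn lies outside them are dropped.
    """
    hits = [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
    if regions is None:
        return hits
    return [p for p in hits if any(s <= p <= e for s, e in regions)]

toy = "MKNGSAANPTQQNNTSL"  # hypothetical sequence
print(find_sequons(toy))                      # [3, 13, 14]  ("NPT" is skipped)
print(find_sequons(toy, regions=[(10, 20)]))  # [13, 14]
print(find_sequons("MKQGSA"))                 # []  (N->Q removes the motif)
```

As the last call shows, substituting the asparagine of a motif removes it entirely, which is consistent with the point that a single mutation can be enough to eliminate a glycosylation site.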
The wide occurrence of innexins with NGSs in all non-chordate phyla (Figure 1D, E), together with the experimentally confirmed NGSs (Calkins et al., 2015; Kaji et al., 2007), strongly suggests that a large fraction of non-chordate innexins are glycoproteins. Such glycosylated innexins might likewise be unable to form gap junction channels and instead function as non-junctional channels. However, some of the innexins with predicted NGSs have previously been shown to form functional gap junction channels (Figure 1F, Figure 1–source data 3). This means either that the predicted NGSs of these innexins are not glycosylated (Apweiler, 1999) or that glycosylation does not necessarily abolish gap junction function in the diverse innexins of invertebrates.
Our findings show that non-chordate animals possess a vast diversity of innexins, with and without NGSs, which might thus function either as non-junctional membrane channels or as intercellular gap junction channels. This is in sharp contrast to the situation in chordates, where all innexins are assumed to be glycosylated and unable to form gap junction channels (Dahl et al., 2014; Esseltine et al., 2016; Sosinsky et al., 2011). But is there really not a single chordate species that uses pannexin-based gap junctions? Up to now, extracellular NGSs had only been identified in human (Ruan et al., 2020), mouse (Penuela et al., 2007), rat (Boassa et al., 2007) and zebrafish (Kurtenbach et al., 2013; Prochnow et al., 2009) pannexins. To clarify the prevalence of extracellular NGSs in chordates, we again used public protein and genome databases to screen for innexins across multiple chordate taxa. In total, we collected the amino acid sequences of 865 chordate innexins from 274 species across 9 higher-level taxonomic groups (lancelets, tunicates, lampreys, cartilaginous fish, bony fish, amphibians, reptiles, birds, and mammals) and, as described above, searched the extracellular loops of each sequence for the N-glycosylation consensus motif (Asn-X-Ser/Thr). Our results show that every innexin in every examined chordate species has at least one NGS in its extracellular loops (Figure 2A, B, Figure 2–source data 1). Moreover, in vertebrates, the sequences of the extracellular loops as well as the positions of the glycosylation motifs are highly conserved (Figure 2C-F). Among the three pannexins, conservation is particularly high in the extracellular loops of Pannexin-2. 
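The group-level percentages described above (more than 80% in ctenophores, and every innexin in every chordate group) amount to counting, per taxonomic group, the innexins with at least one extracellular NGS. A minimal sketch, assuming a hypothetical input of (group, innexin identifier, number of extracellular NGSs) tuples; the records below are invented for illustration and are not data from this study.

```python
from collections import defaultdict

def ngs_fraction_by_group(records):
    """Fraction of innexins per taxonomic group with >= 1 extracellular NGS.

    records: iterable of (group, innexin_id, n_extracellular_ngs) tuples
    (a hypothetical input format chosen for this sketch).
    """
    total = defaultdict(int)
    with_ngs = defaultdict(int)
    for group, _innexin_id, n_ngs in records:
        total[group] += 1
        if n_ngs > 0:
            with_ngs[group] += 1
    # Groups without any NGS-bearing innexin get 0 via the defaultdict.
    return {group: with_ngs[group] / total[group] for group in total}

# Invented toy records, not values from the Figure 1 or Figure 2 source data.
toy_records = [
    ("ctenophores", "inx_a", 2),
    ("ctenophores", "inx_b", 0),
    ("ctenophores", "inx_c", 1),
    ("ctenophores", "inx_d", 3),
    ("lancelets", "inx_1", 1),
]
fractions = ngs_fraction_by_group(toy_records)
print(fractions)  # {'ctenophores': 0.75, 'lancelets': 1.0}
```

Under this framing, the chordate result corresponds to a fraction of exactly 1.0 for each of the nine chordate groups.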
This conservation persists even after the whole-genome duplication in the common ancestor of teleost fishes (Glasauer et al., 2014), an event that generally provides genetic raw material for evolutionary innovation and functional divergence. Nevertheless, each species retained its pannexins with NGSs (Figure 2F and Supplementary File 4). This is remarkable because a single mutation in the N-glycosylation motif might be sufficient to restore the ability of pannexins to form gap junction channels (Ruan et al., 2020). The surprisingly high conservation of the location and the surrounding sequences of the NGSs suggests that pannexins serve essential roles and are under correspondingly strong stabilizing selection (Abascal et al., 2013).
In summary, we show that innexins with NGSs occur in both non-chordate and chordate species. Even simple organisms at the beginning of metazoan evolution attached sugar moieties to some of their innexins, presumably to prevent them from forming gap junction channels. Consequently, the vertebrate pannexins did not diverge and change their function in response to the appearance of the connexins, but rather originated from an innexin already equipped with an NGS. This would be consistent with findings that single membrane channels formed by pannexins and innexins have the same physiological functions and similar biophysical and pharmacological properties (Dahl et al., 2014).
If it is typical for invertebrates to use a great diversity of glycosylated and non-glycosylated innexins and even to form gap junctions from both (Figure 1-figure supplement 2), the situation in vertebrates becomes even more puzzling: Why do all vertebrates retain only glycosylated innexins, why do they not form gap junctions from them (Ruan et al., 2020), and why did they instead evolve, and exclusively use, the new connexins for functions that an innexin could equally fulfill? We suggest that looking at early chordate evolution may solve this puzzle. The chordates comprise three subphyla: the lancelets, the tunicates, and the vertebrates. The lancelets represent the most basal chordate lineage, which diverged before the split between tunicates and vertebrates (Putnam et al., 2008). The vertebrates split into the jawless fish (lampreys), the most ancient vertebrate group (Smith et al., 2018), and the jawed vertebrates (Figure 3B). As our analysis revealed, the lancelets and the lampreys, as well as most tunicates, have only one innexin. In the jawed vertebrates, we find three innexins (called pannexins) in every species. This is expected from the two whole-genome duplications in the early vertebrate lineage, which led first to Pannexin-2 and afterwards to Pannexin-1 and Pannexin-3 (Abascal et al., 2013; Fushiki et al., 2010). Only teleost fishes have a fourth pannexin, generated by a whole-genome duplication during teleost evolution (Bond et al., 2012; Glasauer et al., 2014). This limited diversity contrasts strongly with the rich innexin diversity within the non-chordate phyla (Figure 1C) (Abascal et al., 2013; Hasegawa et al., 2014; Moroz et al., 2014). Moreover, we identified extracellular NGSs in each of the innexins of lancelets, tunicates, and lampreys (Figure 2A, B and Figure 3A). 
The innexin sequences of these groups are less conserved than those of jawed vertebrates, and the positions of the extracellular NGSs differ between lancelets and tunicates. Most importantly, the only innexin of lancelets, which lack connexins (Mikalsen et al., 2021; Slivko-Koltchik et al., 2019) (Figure 3D), contains an NGS in its extracellular loop 1. This suggests that the most basal chordates not only had a limited number of innexins but might also have been unable to form functional gap junctions. Interestingly, it was at this point in chordate evolution that the connexins arose de novo (Figure 1B-D) and developed into diverse gene families (up to 22 connexins in mammals and 46 connexins in bony fish) (Mikalsen et al., 2021).
Based on our findings, we propose that a bottleneck effect at the origin of chordates might have been crucial for the evolution of the novel connexins (Figure 3E). In this evolutionary scenario, innexins were recruited as gap junction proteins in the common cnidarian/bilaterian ancestor. While the innexins diversified functionally in cnidarians and protostomes, the last common ancestor of the deuterostomes had lost this innexin diversity and retained only one innexin, which was presumably glycosylated and did not form gap junction channels (Figure 3B). The high conservation of NGSs in the vertebrate innexins described here (Figure 2C-F), their expression in every organ (Penuela et al., 2014) and their association with a variety of diseases (Esseltine et al., 2016; Penuela et al., 2014) suggest that the non-junctional innexin channels already served essential physiological functions in the basal chordates and could not be converted into gap junctions. The loss of innexin diversity on the one hand and the strict conservation of the NGSs in the remaining innexin on the other could thus explain rather simply why the connexin family arose de novo and why it became the exclusive gap junction protein in all deuterostomes, although innexin-based gap junctions would have been fully capable of serving all these functions (Baker et al., 2014; Bao et al., 2007; Bhattacharya et al., 2019; Lane et al., 2018; Liu et al., 2016; Phelan et al., 2001; Skerrett et al., 2017; Welzel et al., 2018; Yaksi et al., 2010), as they do so successfully in the sophisticated nervous systems of invertebrates (Calabrese et al., 2016; Hall, 2017; Kristan et al., 2005; Marder et al., 2005; Otopalik et al., 2019).
Materials and Methods
Database searches
We used public databases to collect innexin amino acid sequences of chordate and non-chordate species. The taxonomic groups analyzed in this study were therefore constrained by the genomic data publicly available. We screened for innexins across multiple taxa by using diverse sequences of the innexin family (PF00876) as queries in BLAST searches. All retrieved sequences were further assessed, and only innexin sequences that fulfilled all of the following criteria were included in our analyses: (1) The sequence was already assigned to the innexin family (PF00876), or a reciprocal BLAST search with the hit sequence as query against the UniProt database identified a known innexin as the top hit; (2) The sequence is predicted to contain four transmembrane domains connected by two extracellular loops and one intracellular loop, with intracellular N- and C-termini (see Figure 1A). To assess this, we predicted the membrane topology with the TMHMM Server v2.0 (http://www.cbs.dtu.dk/services/TMHMM/); (3) The sequence is neither fragmented nor a duplicate entry. In total, we retrieved 1405 innexin protein sequences of seven non-chordate groups (phylum ctenophores, phylum cnidarians, phylum molluscs, phylum annelids, phylum platyhelminthes, phylum nematodes and phylum arthropods) and 865 sequences of nine chordate groups (subphylum lancelets, subphylum tunicates, class lampreys, class cartilaginous fish, superclass bony fish, class amphibians, class reptiles, class birds and class mammals). All innexin sequences of molluscs, annelids, platyhelminthes, nematodes, arthropods, cartilaginous fish, bony fish, amphibians, reptiles, birds, and mammals were obtained from the protein databases at NCBI (http://www.ncbi.nlm.nih.gov) and UniProt (http://www.uniprot.org). The innexin sequences of the ctenophore species were obtained from the Neurobase genome database (http://neurobase.rc.ufl.edu/Pleurobrachia). 
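Criterion (2) can be checked mechanically from a predicted topology. The sketch below assumes a TMHMM-style topology string (as in the `Topology=` field of TMHMM's short output), in which `i` or `o` labels the inside or outside segment preceding each helix and a final letter labels the C-terminal segment; the example string and its coordinates are hypothetical. An innexin-like sequence then has four helices with segments ordered i, o, i, o, i, and the two `o` segments between helices are the extracellular loops, whose coordinates can in turn restrict a glycosylation motif search.

```python
import re

# One helix: side label of the preceding segment, then "start-end".
HELIX = re.compile(r"([io])(\d+)-(\d+)")

def parse_topology(topology):
    """Parse e.g. 'i20-42o60-82i150-172o200-222i' into helices and sides.

    Returns (helices, sides): helices as 1-based (start, end) tuples and
    sides as the 'i'/'o' label of every non-membrane segment, N- to C-term,
    so len(sides) == len(helices) + 1.
    """
    helices, sides, pos = [], [], 0
    for match in HELIX.finditer(topology):
        if match.start() != pos:  # reject unparsed characters in between
            raise ValueError(f"unexpected topology string: {topology!r}")
        sides.append(match.group(1))
        helices.append((int(match.group(2)), int(match.group(3))))
        pos = match.end()
    tail = topology[pos:]
    if tail not in ("i", "o"):
        raise ValueError(f"unexpected topology string: {topology!r}")
    sides.append(tail)
    return helices, sides

def innexin_like_topology(topology):
    """Four helices, N- and C-terminus inside, two extracellular loops."""
    helices, sides = parse_topology(topology)
    return len(helices) == 4 and sides == ["i", "o", "i", "o", "i"]

def extracellular_loops(topology):
    """1-based (start, end) intervals of 'o' segments lying between helices."""
    helices, sides = parse_topology(topology)
    return [
        (helices[k - 1][1] + 1, helices[k][0] - 1)
        for k in range(1, len(helices))
        if sides[k] == "o"
    ]

example = "i20-42o60-82i150-172o200-222i"  # hypothetical prediction
print(innexin_like_topology(example))  # True
print(extracellular_loops(example))    # [(43, 59), (173, 199)]
```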
The innexin sequences of the cnidarian species were obtained from UniProt and the Marimba genome database (http://marimba.obs-vlfr.fr). The LanceletDB database (http://genome.bucm.edu.cn/lancelet) was used to retrieve innexin sequences of lancelets. The innexin sequences of the tunicate species were obtained from NCBI and the ANISEED database (https://www.aniseed.cnrs.fr). The innexin sequences of lampreys were retrieved from the NCBI and the SIMRBASE database (https://genomes.stowers.org). The full list of species and taxa, along with accession numbers and links to the corresponding databases can be found in Figure 1-source data 1 and Figure 2-source data 1.
Identification of potential N-glycosylation sites (NGS)
To identify potential N-glycosylation sites within the extracellular loops of the non-chordate and chordate innexins, we generated 16 multiple sequence alignments, one for each taxonomic group (seven non-chordate and nine chordate groups). For each group, we first imported all innexin protein sequences of each species into Jalview (version 2.11.1.4) (Waterhouse et al., 2009). Innexin sequences from the UniProt database were retrieved directly into Jalview with its UniProt sequence fetcher; sequences from other databases were added manually. After aligning the innexin sequences with ClustalW (Thompson et al., 1994), we used the resulting multiple sequence alignment of each group to identify potential NGSs in the extracellular domains of each innexin. NGSs were predicted with the NetNGlyc 1.0 Server (http://www.cbs.dtu.dk/services/NetNGlyc/), which uses artificial neural networks to examine the sequence context of the N-X-S/T motif. An NGS was included in our analyses if (1) X in the N-X-S/T motif was any amino acid except proline; (2) the potential score was > 0.5 and at least five of the nine neural networks agreed (jury agreement ≥ 5/9); and (3) the NGS was located in extracellular loop 1 or 2. The positions of all extracellular N-glycosylation sites are reported in Figure 1-source data 1 and Figure 2-source data 1.
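The three inclusion criteria can be expressed as a simple filter. This is a minimal sketch, assuming each NetNGlyc prediction has already been transcribed into a record holding the Asn position, the three-residue motif, the potential score and the number of agreeing networks (out of nine); the `Prediction` class, the loop coordinates and the example values are hypothetical, and the sketch does not parse the server's actual output. Note the strict inequality for the potential (> 0.5) versus the inclusive jury threshold (≥ 5/9).

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    position: int     # 1-based position of the Asn
    motif: str        # the three residues N-X-S/T, e.g. "NGT"
    potential: float  # NetNGlyc potential score
    jury: int         # networks (out of 9) agreeing with the prediction

def passes_criteria(pred, loops, min_potential=0.5, min_jury=5):
    # (1) N-X-S/T with X any amino acid except proline
    motif_ok = (len(pred.motif) == 3 and pred.motif[0] == "N"
                and pred.motif[1] != "P" and pred.motif[2] in "ST")
    # (2) potential strictly above 0.5 and at least 5 of 9 networks agreeing
    score_ok = pred.potential > min_potential and pred.jury >= min_jury
    # (3) located in extracellular loop 1 or 2 (1-based, inclusive intervals)
    loop_ok = any(start <= pred.position <= end for start, end in loops)
    return motif_ok and score_ok and loop_ok

loops = [(43, 59), (173, 199)]  # hypothetical EL1 and EL2 coordinates
predictions = [
    Prediction(50, "NGT", 0.71, 9),   # kept
    Prediction(55, "NPS", 0.62, 6),   # rejected: proline at X
    Prediction(100, "NLT", 0.80, 9),  # rejected: not in an extracellular loop
    Prediction(180, "NAS", 0.52, 4),  # rejected: jury agreement 4/9
    Prediction(185, "NIT", 0.66, 7),  # kept
    Prediction(190, "NKS", 0.50, 7),  # rejected: potential not > 0.5
]
kept = [p.position for p in predictions if passes_criteria(p, loops)]
print(kept)  # [50, 185]
```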
Phylogenomic tree construction
We used phylogenetic trees to visualize the incidence of innexins with N-glycosylation motifs in their extracellular domains within the different taxonomic groups. To generate phylogenetic trees of the 1398 non-chordate and the 865 chordate innexins, we first created two global alignments that combined the alignments of the seven non-chordate and of the nine chordate groups, respectively. Both alignments were generated using MEGA X (Kumar et al., 2018; Stecher et al., 2020) with the default ClustalW parameters. Both multiple sequence alignments were then processed with the Gblocks server (http://molevol.cmima.csic.es/castresana/Gblocks.html) (Castresana, 2000) to automatically detect and remove poorly aligned, nonhomologous, and excessively divergent alignment columns. Before the phylogenetic analyses, ModelTest-NG (Darriba et al., 2020) was run with default parameters on the two trimmed alignments to determine the best-fitting probabilistic model of sequence evolution. We then reconstructed the phylogenetic trees of the non-chordate and the chordate innexins with raxmlGUI 2.0 (Edler et al., 2021), using the maximum likelihood (ML) method with the JTT model and 100 bootstrap replicates. The phylogenetic trees were visualized, edited and annotated with iTOL v5 (https://itol.embl.de) (Letunic et al., 2021).
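The 100 bootstrap replicates refer to the nonparametric bootstrap: each pseudo-replicate alignment is built by sampling alignment columns with replacement, a tree is inferred from every replicate, and branch support is the fraction of replicate trees that contain a given clade. The software used here performs these steps internally; the sketch below only illustrates the resampling step, assuming a hypothetical dictionary of equal-length aligned sequences, and does not reimplement the JTT model or the ML tree search.

```python
import random

def bootstrap_replicates(alignment, n_replicates=100, seed=1):
    """Nonparametric bootstrap pseudo-replicates of a multiple alignment.

    alignment: dict {sequence_id: aligned sequence}, all of equal length.
    Each replicate keeps every sequence but is built from `length` columns
    drawn with replacement from the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    rng = random.Random(seed)  # fixed seed for reproducible replicates
    replicates = []
    for _ in range(n_replicates):
        # The same sampled column indices are applied to every sequence.
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {sid: "".join(seq[c] for c in columns) for sid, seq in alignment.items()}
        )
    return replicates

# Hypothetical toy alignment (gaps shown as "-").
toy_alignment = {"inx_a": "MKNGS-T", "inx_b": "MKQGT-T", "inx_c": "MRNASLT"}
reps = bootstrap_replicates(toy_alignment)
print(len(reps), len(reps[0]["inx_a"]))  # 100 7
```

Resampling whole columns, rather than residues of individual sequences, preserves the homology statements of the alignment within each replicate.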
Data availability
All data generated or analyzed during this study are included in the manuscript and in the supporting files.
Acknowledgements
We thank Antje Halwas for generating the multiple sequence alignments and Andreas Möglich for inspiring discussions.