Evolutionary analyses reveal independent origins of gene repertoires and structural motifs associated to fast inactivation in calcium-selective TRPV channels

Important for calcium homeostasis, TRPV5 and TRPV6 are calcium-selective channels belonging to the transient receptor potential (TRP) gene family. In this study, we investigated the evolutionary history of these channels to add an evolutionary context to the already available physiological information. Phylogenetic analyses revealed that paralogs found in mammals, sauropsids, amphibians, and chondrichthyans, are the product of independent duplication events in the ancestor of each group. Within amniotes, we identified a traceable signature of three amino acids located at the amino-terminal intracellular region (HLH domain). The signature correlates well with both the duplication events and the phenotype of fast inactivation observed in mammalian TRPV6 channels. Electrophysiological recordings and mutagenesis suggest that calcium-induced fast inactivation represents an evolutionary innovation that emerged independently after gene duplication.


Introduction
Calcium ions (Ca 2+ ) are essential to maintain cellular life; as a consequence, Ca 2+ homeostasis is highly controlled in all living organisms. TRPV5 and TRPV6 are ion channels that are highly calcium-selective and are members of the Transient Receptor Potential (TRP) gene family (Van Abel et al., 2005). These proteins, expressed at the apical membrane of Ca 2+ transporting epithelia, serve as entry channels in transepithelial Ca 2+ transport (Van Abel et al., 2005). As part of the molecular machinery controlling the passage of calcium ions through epithelial barriers, these channels are essential for calcium homeostasis in vertebrates (Bianco et al., 2007;Hoenderop et al., 2003;Van Abel et al., 2005).
The biophysical features of TRPV5 and TRPV6 make them ideally suited to facilitate intestinal absorption and renal reabsorption of Ca 2+ in mammals (Peng et al., 2017). In agreement with their importance, TRPV6 deficient mice have impaired Ca 2+ homeostasis, evidenced by poor weight gain, decreased bone mineral density, and reduced fertility (Bianco et al., 2007). In contrast, TRPV5 null mice have less severe physiological consequences .
The calcium selectivity of TRPV5 and TRPV6 is accompanied by different mechanisms of Ca 2+dependent inactivation (Hoenderop et al., 2001), which controls Ca 2+ entry during the first steps of transcellular transport (Van Abel et al., 2005). One major difference between TRPV5 and TRPV6 rests in how they are distributed among tissues. TRPV5 is predominantly expressed in the distal convoluted tubules (DCT) and connecting tubules (CNT) of the kidney. In contrast, TRPV6 is more broadly expressed; it is found not only in the Ca 2+ absorbing epithelia of the intestine but in placenta, pancreas, and prostate among others. This suggests that the physiological role of TRPV6 is not limited to the maintenance of Ca 2+ homeostasis (Peng et al., 2017).
Both TRPV5 and TRPV6 channels have been described in mammals, yet there is only one calcium-selective TRP channel encoded in the genomes of bony fish (TRPV6) and birds (TRPV6) (Peng, 2011;Qiu and Hogstrand, 2004). It has been suggested that TRPV5 and TRPV6 originated by gene duplication in the last common ancestor of mammals, between 312 and 177 million years ago (Peng, 2011;Saito et al., 2011). Besides these observations, not much else is known about the evolutionary history of the calcium-selective TRP channels in vertebrates. In principle, comparative analysis of calcium-selective TRP channels in representative species of vertebrates should provide valuable information regarding the mode of evolution of genes that are required for calcium homeostasis. Phylogenetic analyses in combination with functional assays should allow inferences regarding their duplicative history, the evolution of function and expression, as well as, the ancestral state of all vertebrates.

Results and Discussion
To understand the evolutionary history of the TRPV5 and TRPV6 genes, we first reconstructed a phylogenetic tree including sequences from species of all major groups of vertebrates.
According to our maximum likelihood analyses, the monophyly of the group that includes TRPV5 and TRPV6 sequences from vertebrates is strongly supported (Fig. 1). Synteny analyses provide further support for the monophyly of the clade as genes encoding TRPV5 and TRPV6 are located in a conserved genomic region (Fig. 2). A close inspection of the alignment also supports the monophyly of the clade as several conserved residues can be easily mapped throughout the alignment and gaps introduced by the outgroups (TRPV1-4) are not shared by any of the TRPV5 and TRPV6 proteins (Supplementary Figure 1). Thus, phylogenetic, synteny, and primary sequence analyses provide strong evidence for the monophyly of the TRPV5 and TRPV6 channels in vertebrates.
Within the TRPV5/TRPV6 clade, our phylogenetic analyses suggest not a single but four independent expansion events during the evolutionary history of vertebrates (Fig. 1). The ancestral condition of a single gene copy (referred to here as TRPV5/6) is present in cyclostomes, bony fish, and coelacanths ( Fig. 1). However, in chondrichthyans (e.g. sharks, rays, and chimaeras), the ancestral gene underwent a duplication event, giving rise to an extra copy ( Fig. 1). Among tetrapods, our analysis suggests that the repertoire present in mammals, sauropsids, and amphibians originated independently in the ancestor of each group (Fig. 1).
Amphibians possess the most diverse repertoire of all vertebrates where four well-supported gene lineages were recovered (Fig. 1). These likely originated in the ancestor of the group that can be traced back to the carboniferous period. Among mammals, the single gene copy present in the ancestor underwent a duplication event, originating a repertoire of two genes (TRPV5 and TRPV6), which have been retained in all major groups (Fig. 1). The case of sauropsids is less straightforward. The single gene copy present in the ancestor of the group also underwent a duplication event originating a repertoire of two genes ( Fig. 1). However, during the radiation of the group, TRPV5 was retained in all major lineages whereas TRPV6 was lost in birds ( Fig. 1). A comparison of the chicken and crocodile genomic regions provides further support that the gene present in birds corresponds to TRPV5 and not to TRPV6 as it has been previously suggested (Peng, 2011;Saito et al., 2011) as it shows that birds still possess some exons of the TRPV6 gene in their genomes (supplementary fig. 2). Thus, our results provide strong evidence that essential genes required for calcium homeostasis have been duplicated four independent times during the evolutionary history of vertebrates. This observation implies that TRPV5 and TRPV6 genes in mammals, sauropsids, amphibians, and chondrichthyans are not 1:1 orthologs.
From a physiological perspective, our results suggest that the division of labor needed for Ca 2+ absorption and reabsorption has occurred independently in mammals, sauropsids, amphibians, and chondrichthyans. In mammals this phenomenon is well characterized, while TRPV6 helps in Ca 2+ absorption, TRPV5 is mainly dedicated to calcium re-absorption in the kidney (Peng et al., 2017;Van Abel et al., 2005). This correlates well with their expression in tissues; while TRPV6 channels are broadly expressed in a variety of tissues (including the kidney, intestine, placenta, pancreas, and prostate among others), TRPV5 is restricted to the distal convoluted tubule and connecting tubule of the kidney (Peng et al., 2017;Van Abel et al., 2005).
In contrast to mammals, much less is known regarding the expression of these genes in other vertebrates; thus, we characterized the transcription profiles of TRPV5 and TRPV6 in representative species of vertebrate (Fig. 3). According to our results, and in support of previous observations in zebrafish (Vanoevelen et al., 2011), marine vertebrates mostly express calciumselective TRPV channels in their gills, no matter the number of gene copies present in their genomes. The case of chondrichthyans suggests that the expansion of the gene repertoire did not give rise to a pattern of division of labor rather served to increase the number of channels expressed in the gills. Only one of the four gene lineages present in the genome of amphibians is highly expressed in the kidney (Fig. 3), suggesting an association with early adaptations to a terrestrial lifestyle (Mahasen, 2016). In reptiles, both lineages are expressed in the kidney, however TRPV6 was detected in testis as well (Fig. 3). Finally, birds represent an interesting case study as they retained only one of the copies that were present in their ancestor (i.e. TRPV5). As seen in other amniotes, the TRPV5 gene of birds is mainly expressed in the kidney (Fig. 3). Thus, our results show that the expression of calcium-selective TRPV channels in the gills is the ancestral condition of vertebrates. Moreover, it seems that together with the conquest of land, the repertoire of genes not only expanded but also divided the territories of expression. While one of the copies is mainly expressed in kidneys, the other is broadly expressed in a variety of tissues.
The division of labor observed in calcium-selective TRP channels has been accompanied by subtle but important functional adaptations. Given negative membrane potentials, calciumselective TRP channels permeate calcium ions to the cytoplasm. Together with this, they also exhibit two-phase calcium-dependent inactivation (Hoenderop et al., 2001). The slow component of this inactivation is shared by both channels and is determined by the binding of the Ca 2+ -Calmodulin complex to the C-terminal region of the channel (Hoenderop et al., 2001;Nilius et al., 2003;Singh et al., 2018). In contrast, in physiological conditions, rapid inactivation that depends solely on calcium ions is observed only in TRPV6 channels (Hoenderop et al., 2001;Nilius et al., 2002). In support of our evolutionary analyses, whole cell recordings in the presence of 2mM extracellular Ca 2+ show an absence of fast inactivation in chicken channels (chTRPV5) while this is normally seen in human TRPV5 (hTRPV5) channels (Fig. 4A). Thus, functional and evolutionary evidence strongly suggest that birds retained TRPV5 in their genomes.
Previous studies have identified residues located at the S2-S3 linker (Nilius et al., 2002) and downstream of the transmembrane segment S6 (Suzuki et al., 2002) as important for the rapid inactivation of TRPV6 channels. A close inspection of the S2-S3 linker showed the presence of a loose signature that mirrors with the duplication events that occurred in mammals and, to some extent, reptiles and birds (Fig. 4B). According to the recently published structures (Hughes et al., 2018;Saotome et al., 2016), the S2-S3 loop seems to be conveniently adjacent to the TRP domain helix (TD-helix) and close to an N-terminal helix-loop-helix (HLH) motif. This is interesting given that the TD-helix is a known modulator of TRP channel activity (Gregorioteruel et al., 2015;Valente et al., 2008) and together with HLF forms a structural triad (Fig. 4C). This triad is supported by a cluster of interactions that have been previously implicated in the modulation of TRP channel activity (Liao et al., 2013;Rosasco and Gordon, 2017;Steinberg et al., 2014). A closer inspection of the alignment used for our phylogenetic reconstruction revealed a second and stronger sequence signature that correlates well with the observed gene duplication events (Fig. 4D). The signature is composed of three amino acids located at the HLH domain in mammals, reptiles, and birds. The presence of these signatures, at two distant locations of the primary sequence and in channels that have originated independently, suggest a strong functional association between the S2-S3 loop and the HLH domain. To test this prediction, we transferred this signature observed at the HLH of the TRPV6 channels (inactivation signature 1, IS1) into the TRPV5 channels and describe the inactivation profile by means of whole-cell patch clamp electrophysiological recordings in the presence of 2mM extracellular Ca 2+ . In agreement with our prediction, engineered mammalian and avian TRPV5 channels now containing the three-residue inactivation signature of TRPV6 showed a phenotype of calcium-dependent fast inactivation (Figs. 4E and 4F). Furthermore, the process of fast-inactivation is absent in the TRPV5/6 channel of the spotted gar (Lepisosteus oculatus; gTRPV5/6) ( Fig. 4G and 4H), suggesting that the absence of fast inactivation (i.e the TRPV5 phenotype) represents the ancestral condition of vertebrates. Similar to our previous experiment, introducing IS1 in the single copy gene of the spotted gar was enough to transfer the fast inactivation phenotype to the fish channel ( Fig. 4G and 4H). The anole lizard (Anolis carolinensis, aTRPV5) represents a valuable opportunity provided by a natural mutant to understand the molecular basis of the inactivation. In this case, the amino acid signature at the S2-S3 linker (inactivation signature 2, IS2) suggests that the anole lizard TRPV5 channel should be capable of inactivation (Fig. 4B), whereas IS1 suggests the opposite (Fig. 4D). Electrophysiological recordings showed that while the anole lizard TRPV6 (aTRPV6) channel is not capable of inactivation (in contrast to that of mammals), aTRPV5 elicits modest inactivation where nearly 20% of the residual current is lost at steady states ( Fig.   4G and 4H). We then explored whether this modest inactivation was correlated with the "inverted" IS2 signature present in the anole lizard TRPV5 channel. By reversing the natural oddity to what the analysis suggested should be the correct signature for lack of inactivation (I415V, F420Y); we were able to completely suppress inactivation ( Fig. 4G and 4H). Overall, this confirm previous observations where mutations introduced in the S2-S3 linker of the mammalian TRPV6 channels significantly delayed inactivation (Nilius et al., 2002). On the other hand, the introduction of IS1 in the anole lizard TRPV5 channel produced a robust inactivation phenotype similar to the mammalian counterpart regardless of the presence of the inactivation signature at the linker ( Fig. 4G and 4H). Thus, our results strongly suggest that the coordinated activity of both the HLH motif and the S2-S3 linker determine the phenotype of fast inactivation observed in TRPV6 channels.
To explore whether changes in the elements of the HLH/S2-S3 linker/TD-helix triad correlated with the phenotypes observed, we studied the primary sequences in these three different protein regions. We observed that both the HLH and the S2-S3 linker seem to be more conserved in mammals and sauropsids and more variable in amphibians and fish (Fig. 5A). In contrast, the consensus sequence corresponding to the TD-helix is relatively conserved in all examined groups, as expected for bona fide TRPV channels (Fig. 5A). In mammals and sauropsids, the channels possess both inactivation phenotypes, and coincidently the HLH domain has a scaffold of conserved amino acids around the critical residues modulating the fast inactivation phenotype (Fig. 5A, green box, blue asterisks). This scaffold is different from that observed in amphibians and fish ( Figure 5A, green box). Moreover, the signature residues aspartic acid (position 8) and valine (position 12) are well conserved in amphibians and fish suggesting that developing fast inactivation is not possible (Fig. 5A, blue asterisks). On the other hand, a similar but less strong pattern is detectable at the S2-S3 linker. Here, mammals and sauropsids have a more conserved set of amino acids around the residues defining the phenotype of fast inactivation (Fig. 5A, yellow box, pink asterisks). Moreover, in amphibians and fish, there is conservation of just one of the linker's signature amino acids associated with fast inactivation (tyrosine at position 12). Taken together, our analyses suggest that the S2-S3 linker's ability to modulate coordination among elements of the structural triad probably precedes the phenotype of fast inactivation itself ( Fig. 4B and Fig. 5A). Additionally, the necessary requirement to accomplish calcium-dependent fast inactivation was to engineer the HLH domain in one of the duplicated genes, a scenario we can only observe in contemporary species that live in terrestrial environments.
Amino acids at or close to the HLH region have been implicated in the modulation of temperature-activation in TRPV1 and TRPV3 (Liu and Qin, 2017;Yao et al., 2011) and chemical-activation in TRPA1 (Saito et al., 2014). At the same time, the region close to the S2-S3 linker has been shown to have coordinated calcium ions in TRPM4 and TRPM2 (Autzen et al., 2018;Huang et al., 2018). To explore whether the HLH/S2-S3 linker/TD-helix triad might function as a calcium-binding region in TRPV5 and TRPV6 channels, we used the structural information publically available to perform a more detailed analysis of the region. An initial inspection of the electrostatics of the soluble domains of the channels suggests the presence of electronegative cavities close to the S2-S3 linker and the HLH domain (Fig. 5B). Moreover, these electronegative regions are dense in oxygen atoms, suggesting the presence of putative calcium binding sites (Fig. 5C). Thus, we performed short (100 ns) molecular dynamics simulations with 150 mM Ca 2+ in solution to explore whether these cavities, around the region defined by the structural triad, are able to coordinate calcium ions. Simulations performed in TRPV5 (rbTRPV5, PDB: 6B5V (Hughes et al., 2018)) and TRPV6 (rTRPV6, PDB: 5IWK (Saotome et al., 2016)) allowed us to identify three putative calcium-binding sites associated with the HLH domain/S2-S3 linker/TD-helix region ( Figure 5D). Although these putative calcium-binding sites share amino acid positions between rbTRPV5 (E288, E289, D406) and rTRPV6 (D287, D288, D405), the number of amino acids coordinating calcium is larger in rTRPV6 ( Fig. 5D;   supplementary figure 3). Our simulations suggest that the channels might rearrange differently upon calcium binding, thus originating differences in the relative position of the S2-S3 linker with respect to the TRP domain helix (supplementary videos 1 and 2). Taken together, our structural analysis suggests that the coordinated activity of the S2-S3 linker and HLH domain evolved to originate fast inactivation by introducing innovations at the HLH domain that direct calcium binding and reshape the interactions of the triad. Additionally, the degrees of freedom of the S2-S3 linker are likely important in modulating the state of the channel gate.
In summary, our study shows that the evolutionary history of TRPV5 and TRPV6 channels in vertebrates is dynamic. We propose that the ancestral condition of vertebrates included a single copy gene that encoded a non-fast inactivation channel. This gene could have had properties similar to the ones observed in contemporary species of cyclostomes, bony fish, and coelacanths, including high selectivity for divalent ions (Qiu and Hogstrand, 2004). This would suggest that calcium selectivity in these channels precedes calcium-dependent fastinactivation. During the evolutionary history of vertebrates, chondrichthyans, amphibians, sauropsids, and mammals independently originated a more diverse repertoire of calcium-selective TRP channels. The independent origin of genes seems not to be uncommon in nature and implies that gene repertoires are not directly comparable, as they do not have the same evolutionary origin (Opazo and Zavala, 2018). It has been shown that the convergence of phenotypes could be reached by different pathways (Natarajan et al., 2016), thus the reinvention of these calcium-selective TRP channels opens an opportunity to understand in more detail the diversity of mechanisms associated with transepithelial Ca 2+ transport. In terrestrial vertebrates, the origin of new lineages of TRPV channels is associated with division of the expression territories and origin of the fast inactivation phenotype. Interestingly, the change in the pattern of expression not only correlates with a clear expansion to the kidney and the intestine where calcium absorption/reabsorption takes place but coincides with the environmental conditions that the species inhabit. Among amniotes, duplicated copies are differentiated by specific amino acid substitutions in both the HLH domain and the S2S-S3 linker, and these amino acid substitutions could be strongly correlated with the fast inactivation phenotype. The presence of conserved residues in the HLH domain leads to differences in calcium coordination and interdomain interactions within the structural HLH domain/S2-S3 linker/TD-helix triad, highlighting the significance of the region for TRP channel function. Subtle differences in the coordination of this triad might extend to other channels of the TRP family and might represent one of the molecular cornerstones of TRP channel inner workings.

Sequence data and phylogenetic analyses
We retrieved calcium-selective TRP channel sequences from representative species of all major groups of vertebrates. Our sampling included species from mammals, birds, reptiles, amphibians, coelacanths, holostean fish, teleost fish, chondrichthyans and cyclostomes (Supplementary Table   1). Protein sequences were obtained from the Orthologous MAtrix project (OMA) (Altenhoff et al., 2018). In cases where the species were not included in the OMA project, we searched the NCBI database (refseq_genomes, htgs, and wgs) using tbalstn (Altschul et al., 1990) with default settings. Protein sequences were aligned using the FFT-NS-1 strategy from MAFFT v.7 (Katoh and Standley, 2013). We used the proposed model tool of IQ-Tree v1.6.6 (Trifinopoulos et al., 2016) to select the best-fitting model of amino acid substitution (JTT + F + R6). We employed maximum-likelihood to obtain the best tree using the program IQ-Tree v1.6.6 (Trifinopoulos et al., 2016). Node support was assessed with 1000 bootstrap pseudoreplicates using the ultrafast routine. TRPV1, TRPV2, TRPV3, TRPV4 and TRPA1 were used as outgroups.

Assessment of Conserved Synteny
We examined genes found upstream and downstream of the TRPV5 and TRPV6 genes in species representative of all major groups of vertebrates. We used estimates of orthology and paralogy derived from the EnsemblCompara database (Herrero et al., 2016); these estimates were obtained from an automated pipeline that considers both synteny and phylogeny to generate orthology mappings. These predictions were visualized using the program Genomicus v93.01 (Louis et al., 2014). Our assessments were performed in humans (Homo sapiens), chicken (Gallus gallus), American alligator (Alligator mississippiensis), Chinese softshell turtle (Pelodiscus sinensis), anole lizard (Anolis carolinensis), African clawed frog (Xenopus laevis), coelacanth (Latimeria chalumnae), spotted gar (Lepisosteus oculatus), zebrafish (Danio rerio), and Callorhinchus milii, the elephant fish, which is sometimes labeled as elephant shark. In the case of the elephant fish, and American alligator, flanking genes were examined using the Entrez gene database from NCBI (Maglott et al., 2011).
RNASeq data from a spectrum of tissues including brain, heart, intestine, kidney, liver, muscle, ovary, testis, gills and skin, were gathered from the Short Read Archive (SRA). Accession numbers for species and tissue specific libraries can be found in Supplemental Table 2. For all species except the elephant shark and the African clawed frog, we used Ensembl predicted gene sequences for gene expression references. Elephant fish and X. laevis cDNA sequences were collected from NCBI. To remove redundancy from libraries, all predicted TRPV5 and TRPV6 transcripts were removed from each library and replaced with our annotations. Adapter sequences were removed from the RNASeq reads using Trimmomatic 0.38 (Bolger et al., 2014), and reads were filtered for quality using the parameters HEADCROP:1, LEADING:30, TRAILING:30, SLIDINGWINDOW:5:30, and MINLEN:50. We used RSEM 1.3.1 (Li and Dewey, 2011) to map reads with Bowtie 1.2.2 (Langmead et al., 2009) and to estimate gene expression in units of TPM.

Electrophysiology and solutions
Whole-cell currents were measured with an Axopatch-200B amplifier. We used borosilicate pipettes (o.d. 1.5mm, i.d. 0.86mm, Warner Instruments, Hamden, CT) with resistance of 2 to 4.0 M to form seals with access resistance between 2 and 6 G . Macroscopic currents were recorded in response to a voltage step protocol from zero to -160 mV. Inactivation was analyzed in a time window of 60 ms. Recordings were digitized at 10KHz and filtered at 5KHz using a digitada 1320 (Molecular Devices, LLC, California). The analysis was performed using Clampfit 10.3 (Molecular Devices, LLC, California). Fast inactivation was assessed by computing the residual currents, defined as the ratio of the current value at the end of the negative pulse over the current at the beginning of the voltage pulse (Derler et al., 2006). The standard extracellular solution (Ringer-Na + ) contained (in mM) 140 NaCl, 5 KCl, 2 CaCl 2 , 2MgCl 2 , 8 glucose and 10 HEPES, at pH 7.4 adjusted with NaOH. The standard internal (pipette) solution contained (in mM) 105 CsF, 35 NaCl, 10 EGTA and 10 HEPES, at ph 7.4 adjusted with CsOH. All of the experiments were performed at room temperature (20 to 25ºC).

Statistical Analysis
Data are expressed as mean ± S.E. Overall statistical significance was determined using analyses of variance (ANOVA one way and two ways) with a Bonferroni post-test. For all conditions, the average was obtained from at least seven independent experiments. Outliers were defined using GraphPad QuickCalcs (https://www.graphpad.com/quickcalcs/Grubbs1.cfm), and outliers were removed from the analysis.

Bioinformatics and Molecular Dynamics Simulations
Three-dimensional structures of rat TRPV6 (rTRPV6, PDB: 5IWK) (Saotome et al., 2016) and rabbit TRPV5 (rbTRPV5, PDB: 6B5V) (Hughes et al., 2018) were prepared at pH 7.0 and subjected to energy minimization in a vacuum using the Maestro-Schrödinger suite (Schrödinger Release 2018-2: Maestro, Schrödinger, LLC, New York, NY, 2018. The channels were embedded into a pre-equilibrated POPC lipid bilayer in an orthorhombic box with periodic borders. This was filled with Single Point Charge (SPC) water molecules and an ionic concentration of 150 mM Ca 2+ and Clions. Full atom Molecular Dynamics (MD) simulations of 100ns were performed in triplicate in a NPT ensemble (P = 1 atm, T = 310 K) using Desmond (Bowers et al., 2006) andOPLS v.2005 force field. Structures were collected every 0.12 nanoseconds during the MD simulations; thus, 835 frames were obtained per simulation. The electrostatic potential of the region of interest was calculated using the ABPS plugin (version 1.3) of VMD (Humphrey et al., 1996). Protein residues interacting with calcium ions were identified taking into account a maximal distance of 2.5 Å and the three simulations together.