Abstract
The Channichthyidae family (icefish) are the only known vertebrates devoid of haemoglobin. Mitochondrial changes, such as tight coupling of the mitochondria, have facilitated sustained oxygen consumption and respiratory activity in these fish. It is therefore important to understand features in the sequence and structure of the proteins directly involved in proton transport, which could have physiological implications. ATP synthase subunit a (ATP6) and subunit 8 (ATP8) are proteins that function as part of the F0 component (proton pump) of the F0F1 complex. Both are encoded by the mitochondrial genome and involved in oxidative phosphorylation. To explore mitochondrial sequence variation for ATP6 and ATP8, we gathered sequences and predicted structures of these two proteins from fish of the suborder Notothenioidei, including a sub-Antarctic species. We compared these with seven other vertebrate species to reveal whether there might be physiologically important differences that can help us to understand the unique biology of the icefish.
Introduction
The oceans which surround Antarctica, with their sub-zero temperatures, are home to fish of the suborder Notothenioidei, a prime example of a marine species flock. This suborder comprises four families, three of which include only sub-Antarctic species (Bovichtidae, Pseudaphritidae, Eleginopsidae), and the Nototheniidae, which encompasses all Antarctic and secondary non-Antarctic species; these are sometimes referred to as Cryonotothenioidea[1–3].
Notothenioids are renowned for their physiological adaptations to cold temperatures. These include the ability to synthesise antifreeze glycoproteins (AFGPs) and antifreeze-potentiating proteins (AFPPs)[4]. The capacity to synthesise AFGPs is a biochemical adaptation that enabled the Notothenioidei to colonize and thrive in the extreme polar environment[5]. These proteins are largely composed of a Thr-Ala-Ala repeat with a disaccharide conjugated via the hydroxyl group of the Thr residue, and they reduce the freezing point of the animal's internal fluids[6,7].
Within the Nototheniidae family the subfamily Channichthyinae, also known as icefish, are remarkable due to the absence of haemoglobin and, in some species, myoglobin too[8–10]. The sub-zero temperatures of the water they inhabit allow the highest levels of oxygen solubility, which is suggested to facilitate their survival despite the loss of globin proteins[10].
Myoglobin is absent in the oxidative skeletal muscle in all icefish, but the absence of myoglobin in cardiac muscle has been reported in only six of the sixteen species of the Channichthyinae subfamily[11,12]. While the molecular genetics of how myoglobin expression has been lost have been studied, the physiological differences between those that express and those that do not express myoglobin are not fully understood. Small intracellular diffusion distances to mitochondria and a greater percentage of cell volume occupied by mitochondria are two evolutionary adaptations that can compensate for the absence of myoglobin[13,14]. In the particular case of Champsocephalus gunnari, the mRNA transcript of myoglobin is present in the cardiac tissue but a 5-bp frameshift insertion hinders the synthesis of protein from the mRNA transcript[11,15].
Notothenioidei have high densities of mitochondria in muscle cells, versatility in mitochondrial biogenesis and a unique lipidomic profile[16–18]. These features have also been hypothesised to facilitate sustained oxygen consumption and respiratory activity in the absence of haemoglobin and myoglobin.
Complex V of the electron transport chain, ATP synthase, is responsible for the production of intracellular ATP from ADP and inorganic phosphate. It is composed of an F0 and an F1 component; the F0 component is responsible for channelling protons from the intermembrane space across the inner mitochondrial membrane and into the mitochondrial matrix[19–21]. The rotation of the c-ring in F0, and with it the γ-subunit of the central stalk, facilitates the translocation of protons across the inner mitochondrial membrane that ultimately drives the catalytic mechanism of the F1 component[22,23].
The motor unit F0, embedded in the inner membrane of mitochondria, is composed of subunits b, OSCP (oligomycin sensitivity conferring protein), d, e, f, g, h, i/j and k, which are encoded by nuclear genes, and subunits a (ATP6) and 8 (ATP8), which are encoded by mitochondrial genes[24]. Although the structure of the complex was first resolved decades ago, and hypotheses of the chemical mechanism were developed over half a century ago, significant breakthroughs continue to be made in our understanding of both the structure and function of the enzyme and its F0 component[25–28].
Both ATP synthase subunit a (ATP6) and subunit 8 (ATP8) function as part of the F0 component of ATP synthase and are encoded by genes that overlap within the mitochondrial genome[29]. The overlap spans a short sequence whose length varies between species, and the translation initiation site of subunit 6 lies within the coding region of subunit 8.
The peripheral stalk is a crucial part of the F0 component, forming a physical connection between the membrane sector of the complex and the catalytic core. It provides flexibility, aids in the assembly and stability of the complex, and forms the dimerization interface between ATP synthase pairs[30]. ATP8 is an integral transmembrane component of the peripheral stalk, serving an important role in the assembly of the complex[31]. The C-terminus of ATP8 extends 70 Å from the surface of the membrane and makes contacts with subunits b, d and F6, while the N-terminus has been reported to make connections with subunits b, f and 6 in the intermembrane space[32,33]. Subunit 8 is also known to play a role in the activity of the enzyme complex[34].
ATP6 is an α-helical protein embedded within the inner mitochondrial membrane that interacts closely with the c-ring of F0, providing aqueous half-channels that shuttle protons to and from the rotating c-ring[20,35]. It has previously been reported that ATP6 has at least five hydrophobic transmembrane-spanning α-helices, of which two, h4 and h5, are well conserved across many species[36].
Proteins encoded by mitochondrial DNA (mtDNA) are involved in oxidative phosphorylation and can directly influence the metabolic performance of this pathway. Evaluating the selective pressures acting on these proteins can provide insights into their evolution, where mutations in the mtDNA can be favourable, neutral or harmful. Amino acid changes can cause inefficiencies in the electron transport chain, causing oxidative damage through excess production of reactive oxygen species and eventually interrupting the production of mitochondrial energy. Due to the tight coupling of icefish mitochondria relative to their red-blooded relatives, any changes in the structure of ATP synthase subunits, particularly those directly involved in the transport of protons across the membrane, could result in significant physiological outcomes[37].
In this work, we combine sequence analyses and secondary structure prediction to explore mitochondrial genetic variation for ATP6 and ATP8 in the Notothenioidei suborder species Champsocephalus gunnari (C. gunnari), Chionodraco rastrospinosus (C. rastrospinosus), Chaenocephalus aceratus (C. aceratus), Notothenia coriiceps (N. coriiceps), Trematomus bernacchii (T. bernacchii) and the sub-Antarctic Eleginops maclovinus (E. maclovinus), and compare these with seven other vertebrate species: Nothobranchius furzeri (turquoise killifish, N. furzeri, Cyprinodontiformes), Danio rerio (zebrafish, D. rerio, Cypriniformes), the lizard Anolis carolinensis (A. carolinensis, Squamata), the domestic guinea pig Cavia porcellus (C. porcellus), the bowhead whale Balaena mysticetus (B. mysticetus), the naked mole-rat Heterocephalus glaber (H. glaber) and the eastern red bat Lasiurus borealis (L. borealis). Together these comparisons shed light on the molecular evolution of these proteins in vertebrate species.
Methodology
Extraction of gene and protein sequences of ATP8 and ATP6 from the suborder Notothenioidei and other vertebrates
The complete coding sequences (CDS) and protein sequences were obtained from the National Center for Biotechnology Information (NCBI) protein database. We chose only RefSeq sequences, which provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins (https://www.ncbi.nlm.nih.gov/; last searched: 17th August 2020).
Multiple Protein Sequence alignment (MSA)
(-/-) indicates absence of both haemoglobin and myoglobin genes, (-/+) indicates absence of haemoglobin but presence of myoglobin, and (+/+) indicates presence of both. The sequences for the Notothenioidei suborder species Champsocephalus gunnari (-/-), Chionodraco rastrospinosus (-/+), Chaenocephalus aceratus (-/-), Notothenia coriiceps (+/+), Trematomus bernacchii and Eleginops maclovinus (+/+), and for seven other vertebrate species, Nothobranchius furzeri, Danio rerio, Anolis carolinensis, Cavia porcellus, Balaena mysticetus, Heterocephalus glaber and Lasiurus borealis, were aligned using Clustal Omega[38] to prepare the initial alignment of the ATP6 protein, grouped by the presence or absence of haemoglobin and myoglobin in the species. The alignments were also verified using two other progressive methods, MAFFT[39] and MUSCLE[38]. The same method was applied to the ATP8 protein. The MSA was visualised and edited using JALVIEW[40].
Codon Alignment
Complete nucleotide coding sequences for the genes ATP6 and ATP8 from the thirteen vertebrate species were retrieved from the NCBI GenBank database (see Table 1). The sequences were aligned using Clustal Omega[38], then manually edited and visualised as codons using MATLAB version R2018b (9.5.0).
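The codon-level view of an alignment can be illustrated with a minimal sketch. It is written in Python for illustration only (the analysis itself used MATLAB), the function names are hypothetical, and the two fragments are invented rather than taken from Table 1. It splits aligned coding sequences into codons and lists the codon positions at which two sequences share the third nucleotide.

```python
def split_codons(cds):
    """Split an aligned coding sequence into consecutive 3-nt codons."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def shared_third_positions(cds_a, cds_b):
    """Return 1-based codon positions where both sequences share the third nucleotide.

    Codons containing alignment gaps ('-') are skipped.
    """
    codons_a, codons_b = split_codons(cds_a), split_codons(cds_b)
    return [
        pos
        for pos, (a, b) in enumerate(zip(codons_a, codons_b), start=1)
        if "-" not in a + b and a[2] == b[2]
    ]


# Hypothetical aligned fragments (invented for illustration only).
seq_a = "GTGACTCTAAGC"
seq_b = "ATGACCCTAAGT"
print(split_codons(seq_a))                   # ['GTG', 'ACT', 'CTA', 'AGC']
print(shared_third_positions(seq_a, seq_b))  # [1, 3]
```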
Phylogenetic Tree
The phylogenetic tree for the ATP6 protein was created from the MSA obtained with Clustal Omega[38] and saved in PHYLIP format. Using the Simple Phylogeny tool[38], a phylogenetic tree was created with the neighbour-joining algorithm and visualised with iTOL v5[41].
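Neighbour joining operates on a matrix of pairwise distances between aligned sequences. The sketch below shows one simple way such a matrix can be computed, using the uncorrected proportion of differing residues (p-distance). This is an assumption made for illustration: the distance measure and corrections applied by Simple Phylogeny may differ, and the sequence names and toy alignment are invented.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return a symmetric {(name_a, name_b): distance} mapping for an alignment dict."""
    names = sorted(alignment)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = p_distance(alignment[a], alignment[b])
    return dist


# Toy alignment with invented names and residues (not the ATP6 data).
toy_alignment = {
    "species_A": "MNLSFFDQF",
    "species_B": "MNLSFFDQF",
    "species_C": "MTL-FFDQL",
}
matrix = distance_matrix(toy_alignment)
print(matrix[("species_A", "species_C")])  # 0.25 (2 differences over 8 ungapped columns)
```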
Comparison of properties of amino acids among the sequence from the above-mentioned species
Using the ExPASy[42] tool ProtScale[43], different amino acid properties such as the molecular weight of amino acids across the sequence, hydrophobicity trend of amino acids, α-helix forming amino acids, average flexibility trend and mutability for the protein ATP6 were compared graphically among the thirteen species (https://web.expasy.org/protscale/).
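Profiles of this kind are sliding-window averages of a per-residue scale. The following minimal sketch illustrates the idea for hydrophobicity, assuming the Kyte-Doolittle scale and an unweighted window of nine residues; the scale and window actually selected in ProtScale are not stated here, so both are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the unweighted sliding-window mean hydropathy along a protein sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Invented toy peptide: a hydrophobic stretch followed by a polar one.
profile = hydropathy_profile("IIISSS", window=3)
print([round(value, 2) for value in profile])
```

Plotting such profiles for each species on shared axes gives the kind of graphical comparison described above.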
Structure prediction for protein sequences
The MSA was structurally validated using the structure prediction tool I-TASSER[44] (Iterative Threading ASSEmbly Refinement), a hierarchical approach to protein structure and function prediction, which was used to generate the protein structure for ATP6 from the different species (https://zhanglab.ccmb.med.umich.edu/I-TASSER/).
Figures
Protein structure images were produced with PyMOL v. 2.3.2 (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC). Graphs were produced with MATLAB version R2018b (9.5.0). Sequence logos for the protein ATP6 were created with the WebLogo web server using 4000 vertebrate protein sequences (http://weblogo.threeplusone.com/).
RESULTS
Codon Alignment
MSA of the ATP6 (see Fig. 1) and ATP8 (see Fig. 2) sequences from the different vertebrate species (Table 1 & Table 2), for both nucleotides (codons) and proteins, identified several conserved codons and amino acid residues. Five of the six Antarctic fish species have twelve nucleotides (four codons) at the 5’ end of the gene sequence which are not found in the other eight vertebrate species. The codon alignment of ATP6 for E. maclovinus, N. coriiceps, C. rastrospinosus and C. aceratus shows GTG, coding for methionine, as the start codon for the protein. GTG, which ordinarily codes for the amino acid valine, has been accepted as a mitochondrial start codon in invertebrate mitogenomes[45–47]. A common feature of the species that have GTG as a start codon is that N. coriiceps, E. maclovinus and C. rastrospinosus have genes coding for myoglobin, where the latter is devoid of haemoglobin. C. aceratus does not express myoglobin due to a 15 bp sequence insertion; other than that difference, its myoglobin gene sequence is identical to that of C. rastrospinosus[12]. The only exception to this is the red-blooded species T. bernacchii, but this may be attributed to the unverified source of its sequence submission.
Another trend observed through sequence alignment is that species that are more similar and have the same amino acid at a particular position also have codons with the same nucleotide (nt) at the third position. ‘TGA’ codons, which are stop codons in the standard genetic code, are found within the translated sequence; here they code for tryptophan, as seen in human and yeast mitochondria[48]. A variation in the length of the sequences was observed, with an average length of 683 nt for ATP6 and 74 nt for ATP8 gene sequences. The ATP6 sequence ends with a TAA stop codon in all species except the two red-blooded Antarctic fish species, N. coriiceps and E. maclovinus.
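The two codon reassignments discussed here, GTG read as an initiating methionine and TGA read as tryptophan, can be made concrete with a small translation sketch. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2) and treats only ATG and GTG as start codons, which is a simplification; the function name and the test sequences are hypothetical and are not taken from the alignments.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial code (NCBI translation table 2), codons ordered TTT, TTC, TTA, ...
# Differences from the standard code: TGA = W, ATA = M, AGA and AGG = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Assumption: only the two start codons discussed in the text are handled.
START_CODONS = {"ATG", "GTG"}


def translate_mito(cds):
    """Translate a coding sequence, reading a GTG or ATG start codon as methionine.

    Translation stops at the first stop codon; a trailing incomplete codon is ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        if i == 0 and codon in START_CODONS:
            aa = "M"
        protein.append(aa)
    return "".join(protein)


# Invented example: GTG start, internal TGA (tryptophan), TAA stop.
print(translate_mito("GTGTGATAA"))  # MW
```

An internal GTG is still read as valine, so only the first codon is treated specially.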
Overlapping genes
Both genes are encoded on the same strand, where they overlap (see Table 1). The overlap between ATP8 and ATP6 was 22 nt for five of the six species of the Notothenioidei suborder, the exception being the icefish C. gunnari, where the overlap was 10 nt. H. glaber, L. borealis and C. porcellus had an overlap of 43 nt between ATP6 and ATP8. The shortest overlaps between the two genes were observed in A. carolinensis, with 10 nt, and in N. furzeri and D. rerio, with 7 nt.
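Overlap lengths such as these follow directly from the annotated gene coordinates. The sketch below shows the arithmetic, assuming 1-based inclusive coordinates on the same strand; the function name and the coordinates are hypothetical and chosen only to reproduce overlaps of 10 nt and 22 nt, not copied from any GenBank record.

```python
def overlap_length(gene_a, gene_b):
    """Number of nucleotides shared by two genes given as 1-based inclusive (start, end)."""
    start = max(gene_a[0], gene_b[0])
    end = min(gene_a[1], gene_b[1])
    return max(0, end - start + 1)


# Hypothetical coordinates for illustration only.
atp8 = (1, 168)
atp6_short_overlap = (159, 842)
atp6_long_overlap = (147, 842)

print(overlap_length(atp8, atp6_short_overlap))  # 10
print(overlap_length(atp8, atp6_long_overlap))   # 22
```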
Protein alignment and structural changes in ATP6
The complete amino acid sequences for ATP6 and ATP8 were aligned separately for the thirteen vertebrate species (see Figs. 4 & 5). Protein sequence alignment showed conserved residues across the species based on identity and similarity. Four Antarctic fish species, N. coriiceps, T. bernacchii, C. rastrospinosus and C. aceratus, and the sub-Antarctic E. maclovinus have four additional amino acids at the N-terminus, giving a total of 231 residues. As previously mentioned, the only exception is C. gunnari, with 227 residues, similar to the other fish species N. furzeri and D. rerio. A. carolinensis, L. borealis, H. glaber and C. porcellus have 226 residues and B. mysticetus has 225 residues. ATP6 in vertebrates is known to have 226-228 residues. In humans, four point mutations in the ATP6 gene account for 82% of disease associated with this gene, suggesting point mutations could have physiological relevance[49,50]. Common features in all 13 species were as follows: (1) several hydrophobic amino acids (light pink) were conserved across the sequences, (2) insertions and deletions of amino acids occurred more frequently near the N-terminus, and (3) the C-terminus of the protein sequence is hydrophilic. Dashes in the amino acid sequence represent gaps, which may reflect an insertion or deletion of a residue. Gaps in the alignment are observed for L. borealis, C. porcellus, B. mysticetus and H. glaber at position 35, and at the C-terminal end for A. carolinensis and B. mysticetus, at positions 226 and 225 respectively. Position 35 is occupied predominantly by hydrophilic residues, except in C. gunnari and N. furzeri, where it is substituted with alanine and leucine respectively. All the Antarctic species and E. maclovinus have a serine at this position, except C. gunnari.
When we look at the codon alignment of the ATP6 gene, serine is predominantly encoded by the codon TCT at position 39 for all the species except T. bernacchii, and the alanine in C. gunnari is encoded by GCT (see Fig. 1). A multiple sequence alignment of four thousand vertebrate protein sequences was used to generate the WebLogo (see Fig. 4), which shows the conservation of amino acid residues across the species. Position 42 shows that proline and serine are conserved, as we find in the notothenioid species.
A similar pattern was found in the amino acid alignment of ATP8, where the species that showed a gap in the previous alignment, B. mysticetus, H. glaber, C. porcellus and L. borealis, have hydrophilic residues, whereas the other species have a gap at position 47. This observation could be attributed to the overlapping nature of the nucleotide sequences coding for the two proteins. The protein sequence of ATP6 was observed to be more conserved than that of ATP8. The amino acid sequences at the N-terminus are more diverse, and the methionine residues are usually followed by amino acids with short polar side chains[51]. Alanine is a non-polar amino acid whereas serine is a polar amino acid. The hydrophobicity plot, average flexibility, mutability and coil prediction across the sequences show that T. bernacchii and E. maclovinus have similar trends in their physicochemical properties across the sequence. Notothenia coriiceps, C. aceratus and C. rastrospinosus follow this trend. Champsocephalus gunnari is the only species out of the six notothenioids that varies from the others.
The phylogenetic tree generated from the thirteen protein sequences for ATP6 (Fig. 3) shows that the species fall into three visible clusters. The first cluster contains H. glaber, B. mysticetus, C. porcellus and L. borealis, the second contains A. carolinensis, D. rerio and N. furzeri, and the final cluster comprises the six Notothenioidei. Within the Notothenioidei cluster, the icefish C. rastrospinosus (presence of myoglobin) and C. aceratus share a common node with C. gunnari, which can be interpreted as the most recent common ancestor, in agreement with the most recent and complete phylogenies of notothenioids[3].
Protein structure differences were predicted at positions 38-39 for C. gunnari (icefish), N. furzeri, D. rerio and A. carolinensis, where a strand-strand structure is found at that position. All other species have coil structures at those positions (see Fig. 6). For T. bernacchii and E. maclovinus there is also a prediction for a strand structure at positions 42-43.
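The letter heights in a sequence logo reflect per-column information content. A minimal sketch of that calculation is given below, assuming a 20-letter amino acid alphabet and omitting the small-sample correction that WebLogo applies; the function names are hypothetical and the toy alignment is invented, not drawn from the four thousand sequences.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences):
    """Information content for every column of an alignment given as equal-length strings."""
    return [column_information(column) for column in zip(*sequences)]


# Invented toy alignment: column 1 is fully conserved, column 2 is split between S and P.
toy = ["MS", "MS", "MP", "MP"]
print([round(bits, 2) for bits in alignment_information(toy)])  # [4.32, 3.32]
```

A column shared equally by two residues, such as serine and proline, loses exactly one bit relative to a fully conserved column.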
Discussion
We present an analysis highlighting differences in sequence and structure observed between the red- and white-blooded notothenioids in the two mtDNA-encoded proteins of complex V, ATP8 and ATP6. Our analyses are based on the current genome annotation, which is subject to change as more information becomes available. We used only RefSeq sequences, as these are reviewed by NCBI, represent a compilation of the current knowledge of a gene and its protein products, and are synthesised using information integrated from multiple sources. RefSeq is used as a reference standard for a variety of purposes such as genome annotation and reporting locations of sequence variation. The RefSeq and GenBank entries available for ATP6 sequences of the Antarctic fish, NC_015653.1, AP006021.1 (N. coriiceps), NC_039543.1, MF622064.1 (C. rastrospinosus), NC_033386.1, KY038381.1 (E. maclovinus), NC_015654.1, YP_004581502.1 (C. aceratus), which were submitted by different authors, have GTG as the start codon for these notothenioid species. The protein length of ATP6 is consistent across all the entries, at 231 amino acids.
It has previously been shown that mitochondria from icefish are more tightly coupled than those of their red-blooded counterparts[37]. Tightly coupled mitochondria usually have competent membranes, so protons can only enter the matrix by passing through complex V. The initial difference we see is that the red-blooded species N. coriiceps, E. maclovinus and T. bernacchii, and two icefish, C. rastrospinosus, which is devoid of haemoglobin but has myoglobin, and C. aceratus, which has a myoglobin gene nearly identical to that of C. rastrospinosus, have an additional 12 nucleotides at the 5’ end of the ATP6 gene. The only exception to this is the icefish C. gunnari.
GTG as an alternative start codon
The biosynthesis of proteins from their respective mRNAs requires an initiation codon for translation. ATG is the usual initiation codon, but GTG has been reported as an initiation codon in some lower organisms; the frequency of annotated alternative initiation codons in higher organisms is less than 1%[52]. An in vitro study of GTG-mediated translation of enhanced green fluorescent protein suggested that initiation with a GTG codon results in lower expression levels of the protein, and a similar observation was made for the protein endopin 2B-2[53]. Mutation of the ATG initiation codon to GTG has also been associated with human diseases such as beta-thalassemia and Norrie disease, where the GTG mutation leads to inactivation of the gene[54,55]. Another example is a disruption caused by GTG as the initiation codon in the gene CYP2C19, which resulted in poor metabolism of the drug mephenytoin when compared to the gene with an ATG initiation codon[56]. Numerous studies on bacteria and lower organisms show GTG acting as a start codon; although GTG normally codes for a non-methionine amino acid, when it acts as a start codon the initial amino acid is methionine[53,57]. There is only a single report of a vertebrate species, the rat, where GTG is the start codon in mtDNA[58]. An ATG to GTG exchange in the human gene FRMD7 (FERM Domain Containing 7), a first-base transition of the start codon, has been found to cause morphological changes in the optic nerve head[59]. The level of the corresponding protein expression has been shown to be lower when translation is initiated using an alternative codon such as GTG rather than ATG[53,60]. GTG was also observed as a start codon for ATP8 in the invertebrate Philomycus bilineatus, which adds to the evidence for GTG as an acceptable start codon[47].
A small but increasing number of mammalian genes have been found to use an alternative initiation codon, including genes for regulatory proteins such as transcription factors, growth factors and a few kinases in humans and rats. The findings in all these studies show a similar trend of lower protein production when compared to an ATG start codon[61–63]. It has been shown that fish inhabiting colder climates have undergone stronger selective constraints in order to avoid deleterious mutations[64,65]. mtDNA-encoded genes such as ATP6 could be placed under selective pressure by low environmental temperatures. A larger ratio of substitution for different sites could indicate proteins undergoing adaptation[66]. A previously reported decrease in ATP6 activity was associated with incomplete ATPase complexes that are capable of ATP hydrolysis but not ATP synthesis. ATPase complexes completely lacking subunit a were capable of maintaining structural interactions between the F1 and F0 parts of the enzyme, but the interactions were found to be weaker[67].
The GTG initiation of ATP6 in these fish species could suggest parallel evolution of the translation machinery. The favouring of GTG as a start codon could also mean a higher stability of the protein, as the GC base pair has higher thermal stability than the AT base pair, which is attributed to stronger stacking interactions between GC bases and the presence of three hydrogen bonds compared with two in the AT base pair[68].
Overlap of ATP8 and ATP6 genes
The protein-coding genes ATP8 and ATP6 are located adjacent to each other and overlap on the same strand in humans and other vertebrates, with an overlap of 44 nt observed in humans (NCBI: NC_012920.1). It has previously been reported that the ATP8-ATP6 overlap is generally 10 nt in fish genomes[69]. T. bernacchii, E. maclovinus, N. coriiceps, C. rastrospinosus and C. aceratus show an overlap of 22 nt, and C. gunnari has a 10 nt overlap, as reported previously in the other fish genomes mentioned above. The gene coding for ATP8 ends with the stop codon TAG in all notothenioid species and with TAA in the other vertebrate species; the single exception was H. glaber, which ends with a TAG stop codon. It has previously been hypothesised that TAG is a sub-optimal stop codon which is less likely to be selected. One study showed that protein-coding genes ending with TAA stop codons are, on average, more abundant than genes ending with TGA or TAG, and further showed that a switch of stop codon from TGA to TAG might pass through the mutational path of a TAA stop codon, which could be subject to positive selection in several groups[70].
Protein Alignment and Structural changes in ATP6
The four Antarctic fish species N. coriiceps, T. bernacchii, C. rastrospinosus and C. aceratus, and the sub-Antarctic E. maclovinus, have four additional amino acids at the N-terminus of ATP6 and a total of 231 residues. As previously mentioned, the only exception is C. gunnari, with 227 residues, similar to N. furzeri and D. rerio. N-terminal addition of amino acids can influence the properties of a protein, as it can change its molecular weight, charge and hydrophobicity, as has been seen in the yeast metacaspase prion protein Mca1[71]. Amino acid position 35 is populated with predominantly hydrophilic residues, apart from the two species C. gunnari and N. furzeri, where alanine and leucine, respectively, are found. All the other Antarctic fish species and E. maclovinus have a serine at this position. When we look at the codon alignment of the ATP6 gene, serine is encoded by the codon TCT at position 39 for all the species except T. bernacchii (encoded by TCC), and the alanine in C. gunnari is encoded by GCT. Serine is the only amino acid encoded by two codon sets (TCN and AGY) that cannot be interconverted by a single nucleotide substitution.
A missense mutation is one in which a single base-pair change alters a codon so that it codes for a different amino acid. Although such a base substitution affects a single codon, it can still have a significant effect on the protein produced. It has recently been shown that serine at a highly conserved position is more often encoded by TCN codons and tends to substitute non-synonymously to proline and alanine, which indicates that the codon set by which serine is encoded reflects different types of selection for the amino acid and its acceptable substitutions[72].
The hydrophobicity plot, average flexibility, mutability and coil prediction across the sequences highlight differences in the physicochemical properties of the protein ATP6 in C. gunnari.
The secondary structure of a protein is the way in which the polypeptide chain is coiled and folded locally according to the primary sequence. Beta-strands give stability to the structure of a protein, while their intrinsic flexibility can sometimes return them to a coil configuration in order for the protein to perform other functions. Structural changes were observed at positions 38-39 for C. gunnari, N. furzeri, D. rerio and A. carolinensis, where a strand-strand structure was predicted at that position. All other species are predicted to have coil structures at those positions (see Fig. 6). T. bernacchii and E. maclovinus are predicted to have strand structures at positions 42-43.
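The relationship between the serine codon sets and their single-substitution neighbours can be checked with a short sketch. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2); all names are hypothetical. It confirms that a TCN serine codon cannot reach an AGY serine codon in one substitution, and that a first-position change in TCT yields proline, threonine or alanine, the last being the TCT to GCT difference noted for C. gunnari.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial code (NCBI translation table 2), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(codon_a, codon_b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def first_position_substitutions(codon):
    """Amino acids reached by changing only the first nucleotide of a codon."""
    return {CODON_TABLE[base + codon[1:]] for base in BASES if base != codon[0]}


serine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "S")
tcn_set = [c for c in serine_codons if c.startswith("TC")]
agy_set = [c for c in serine_codons if c.startswith("AG")]

# Minimum number of substitutions needed to move between the two serine codon sets.
min_steps = min(hamming(a, b) for a in tcn_set for b in agy_set)

print(tcn_set, agy_set)
print(min_steps)                            # 2
print(first_position_substitutions("TCT"))  # proline, threonine and alanine
```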
Protein structure, dynamics and function are all interlinked and it is vital to understand the structure of a protein in relation to function to comprehend molecular processes[73]. We have used the unique biology of the icefish to gain a better understanding of the variability of ATP6 and ATP8 sequence and structure which has importance for mitochondrial function.
CONFLICTS OF INTEREST
The authors declare that they have no conflict of interest.
FUNDING
Gunjan Katyal was supported by the Vice Chancellor’s International Scholarship for Research Excellence, University of Nottingham (2018-2021). Brad E Banks was supported by the Biotechnology and Biological Sciences Research Council [grant number BB/J014508/1].