Abstract
Upstream of the 3’-untranslated region in the SARS-CoV-2 genome is ORF10 which has been proposed to encode for the ORF10 protein. Current research is still unclear on whether this protein is synthesized, but further investigations are still warranted. Herein, this study uses multiple bioinformatic tools to biochemically and functionally characterize the ORF10 protein, along with predicting its tertiary structure. Results indicate a highly ordered, hydrophobic, and thermally stable protein that contains at least one transmembrane region. This protein also possesses high residue protein-binding propensity, primarily in the N-terminal half. An assessment of forty-one missense mutations reveal slight changes in residue flexibility, mainly in the C-terminal half. However, these same mutations do not inflict significant changes on protein stability and other biochemical features. The predicted model suggests the ORF10 protein contains a β-α-β motif with a β-molecular recognition feature occurring in the first β-strand. Functionally, the ORF10 protein could be a membrane protein. A single pocket was identified in this protein but found to possess low druggability. The ORF10 itself consists of two distinct lineages: the SARS-CoV lineage and the SARS-CoV-2 lineage. Evidence of strong positive selection (dN/dS = 4.01) and purifying selection (dN/dS = 0.713) were found within the SARS-CoV-2 lineage and SARS-CoV lineage, respectively. Collectively, these results continue to assess the biological relevance of ORF10 and its putatively encoded protein, thereby aiding in diagnostic and possibly vaccine development.
1. Introduction
An outbreak of coronavirus disease 2019 (COVID-19) in Wuhan, China has resulted in a global pandemic causing, at the time this manuscript was prepared, well over 102,135,000 cases and 2,210,000 deaths.1 The virus responsible for causing COVID-19, severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), contains an individual copy of a single-stranded positive-sense RNA genome that is ∼29.8 kilobases in length.2 In particular, the 3’-terminus encompasses multiple open reading frames (ORFs) that encode four main structural proteins: the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins.2 In addition, shorter length ORFs have been detected and proposed to encode for approximately nine different accessory proteins (3a, 3b, 6, 7a, 7b, 8, 9b, 9c, 10) in the SARS-CoV-2 genome.2,3 Past research indicates these accessory proteins are dispensable for viral growth in vitro; however, the genes encoding for these proteins are maintained in the coronavirus (CoV) genome, suggesting they might play important roles within the environment of the infected host.4
To some degree, all of these accessory proteins have been structurally and/or functionally characterized. For example, the 3a protein was shown to induce apoptosis whereas the 9b and 9c proteins were suggested to be involved in membrane interactions during virion assembly and host-virus interactions, respectively.5,6 Prior work has been able to detect for the expression of RNA transcripts corresponding to these proteins in SARS-CoV-2; however, RNA transcripts belonging to ORF10 have not been observed in large amounts.3,7 Pancer et al. 2020 had also identified several SARS-CoV-2 variants with prematurely terminated ORF10 sequences, finding that disease was not attenuated, transmission was not hindered, and replication proceeded similarly to strains possessing intact ORF10 sequences.8 In light of these findings, there is some uncertainty regarding the biological relevance of ORF10 in SARS-CoV-2, with suggestions that its genome annotation should be amended. As an unintended consequence, this putatively encoded protein within SARS-CoV-2 has remained somewhat understudied.
Found upstream of the 3’-untranslated region (3’-UTR), ORF10 (117 nt long) encodes for a protein that is 38 amino acids in length.9 Although there exists sequences homologous to the ORF10 protein in other closely related bat and pangolin CoVs, there are no experimentally derived crystallographic structures for the ORF10 protein.10,11 Studies attempting to predict the secondary structural elements of this protein indicate one α-helix and, depending on the study, two β-strands.12,13 The ORF10 protein also lacks significant levels of disorder; however, a short molecular recognition feature (MoRF) likely spans residues 3-7.12 This protein is hydrophobic with the α-helix having been identified as a possible transmembrane (TM) helix as well.14
The exact implications are unclear, but the ORF10 protein contains high numbers of promiscuous cytotoxic T lymphocyte (CTL) epitopes, primarily on the α-helix.12,15 A recent study by Liu et al. 2020 found that in severe cases of COVID-19, the ORF10 was overexpressed when compared against ORF10 expression levels in moderate cases.16 If the ORF10 protein is synthesized, this may help to partially explain the nature behind severe cases of COVID-19. The large number of CTL epitopes in conjunction with the overproduction of the ORF10 protein could result in elevated immune responses towards SARS-CoV-2, leading to some potentially fatal immunopathological outcomes.
The ORF10 protein was shown to associate with the CUL2ZYG11B complex by interacting with the substrate adapter ZYG11B, thereby overtaking the complex to modulate aspects of ubiquitination and enhance viral pathogenesis.13 However, the mechanisms and residues involved in this interaction have not been described in detail. Alternatively, ORF10 may act itself or serve as a precursor for other RNAs in regulating gene expression/replication, translation efficiency, or interfering with cellular antiviral pathways.7
Two lineages for the ORF10 have been observed. The SARS-CoV lineage consists of sequences interrupted by a stop codon (TAA) encoding for a truncated protein and was found to be undergoing neutral evolution (dN/dS = 1.18).14 The SARS-CoV-2 lineage consists entirely of uninterrupted sequences encoding for a complete protein and was found to be undergoing strong positive selection (dN/dS = 3.82) without relaxed constraints.14 This has led to the possibility that the ORF10 in SARS-CoV-2 and closely related CoVs might encode a conserved and functional protein. Furthermore, the ORF10 present in SARS-CoV-2 and closely related CoVs is thought to have evolved from a mutation occurring within the ancestral stop codon originally found in the SARS-CoV lineage at nucleotide position 76.17
Collectively, there still remains few efforts spent on investigating the ORF10 protein within SARS-CoV-2. Herein, this current study uses a bioinformatic approach to characterize this protein, along with providing a more updated phylogenetic and evolutionary analysis. This study also examines the structural and biochemical effects of mutations found in this protein, along with assessing its potential as an antiviral drug target.
2. Materials and Methods
2.1 Sequence, Phylogenetic, and Evolutionary Analysis
The nucleotide (NC_045512.2) and protein (YP_009725255.1) reference sequences for ORF10 in SARS-CoV-2 were acquired from the NCBI RefSeq Database. BLASTn was used to collect sequences from both distantly and closely related CoVs (Table S1).18 Sequence alignments were performed using MUSCLE on the MEGA-X v10.1.7 software.19,20 Alignment reliability and total similarity were determined by overall mean distance and calculated using the p-distance substitution model. If the p-distance was found to be ≤ 0.7 then the alignment was considered reliable and suitable for subsequent analyses.21
A single phylogenetic gene tree was constructed using the neighbor-joining method and visualized on MEGA-X.22 Uniform rates among sites and complete deletion were used. The phylogeny was tested using the bootstrap method.
The average nonsynonymous/synonymous substitution (dN/dS) rate was calculated using the single-likelihood ancestor counting (SLAC) method.23 To determine either relaxed (k < 1) or intensified (k > 1) selection, the RELAX method was utilized.24 Both methods were performed using the datamonkey webtool (http://datamonkey.org/). Wu-Kabat variability coefficients were calculated using the protein variability server (http://imed.med.ucm.es/PVS/).25
2.2 Protein Characterization and Secondary Structure Prediction
Per-residue disorder scores were determined using several online tools as previously described.26 Scores were then used to calculate average protein disorder. To detect phosphorylation sites, the DEPP server (http://www.pondr.com/cgi-bin/depp.cgi) was used. A hydrophobicity plot was generated using ProtScale.27 The grand average of hydropathicity (GRAVY), aliphatic index, and instability index were determined using ProtParam.27 The propensity of residues involved in protein-binding were evaluated by SCRIBER.28 Secondary structural elements were predicted using PSIPRED v4.0.29 TM regions were predicted using TMPred.30
2.3 Protein Modeling and Evaluation
The webserver IntFOLD was employed to make use of an ab initio modeling approach in constructing the ORF10 protein.31 Models were evaluated based on quality and confidence scoring. The best model was then refined using the 3Drefine webserver.32 Of the models generated post-refinement, the one with a higher qualitative model energy analysis (QMEAN) global Z-score was viewed as most favorable.33 This final model was evaluated by PROCHECK and ERRAT on the SAVES v6.0 webtool (https://saves.mbi.ucla.edu/).34-36 This theoretical structure was deposited in ModelArchive (https://modelarchive.org/) and given the ID: ma-9yzbf. Global and local quality comparisons against two previously constructed models were also performed.37,38
The ProFunc server predicted the function of the ORF10 protein based on the newly produced model.39 The online web server DoGSiteScorer was used to predict and describe pockets within the ORF10 protein.40 A druggability score was returned between 0 and 1. The higher the score, the more druggable the pocket was estimated to be. UCSF Chimera v1.15 was used to visualize and obtain high-quality images of protein models.41 Electrostatic surfaces were generated based on the AMBER ff14SB charge model. Hydrophobicity surfaces were produced according to the standard Kyte-Doolittle scale.
2.4 Mutation Analysis
Using forty-one ORF10 protein variants, the effect these mutations had on protein stability and other biochemical features were determined. The same methods that were used to determine the GRAVY, instability index, aliphatic index, and per-residue disorder scores were used to evaluate the effects of these mutations. The effect each mutation had on protein stability was determined using the I-MUTANT v2.0 webtool.42 The free energy change value (DDG) predictor was used to assess changes in stability brought on by each mutation. If a negative value is returned, then the stability of the protein decreases. If a positive value is returned, then the stability of the protein increases. The designated temperature and pH used were 25°C and 7, respectively.
3. Results
3.1 Sequence, Phylogenetic, and Evolutionary Analysis
The whole nucleotide alignment (p-distance = 0.08) had 92% similarity. The phylogenetic gene tree revealed two lineages (Figure 1a). The SARS-CoV lineage (p-distance = 0.04) had 96% similarity whereas the SARS-CoV-2 lineage (p-distance = 0.08) had 92% similarity. An alignment of protein sequences taken from the SARS-CoV-2 lineage revealed a total of thirty-one conserved sites (Figure 1b). A Wu-Kabat plot displayed higher variability coefficients in the C-terminal half (20-38) rather than the N-terminal half (1-19) of the ORF10 protein, with amino acid position twenty-five presenting the highest variability coefficient in the lineage (Figure 1c).
The SARS-CoV lineage had a dN/dS = 0.713 to indicate purifying selection. Oppositely, the SARS-CoV-2 lineage had a dN/dS = 4.01 (Figure 1a). To determine if this dN/dS ratio was either the result of positive or relaxed purifying selection, the RELAX method was used. In doing so, a k = 6.79 was obtained; therefore, the hypothesis of a relaxed constraint was rejected in favor of strictly positive selection acting upon the SARS-CoV-2 lineage.
3.2 Protein Characterization and Secondary Structure Prediction
A per-residue disorder plot for the ORF10 protein in SARS-CoV-2 showed that residues within both termini held the highest disorder scores whereas moving inwards resulted in gradually decreasing scores (Figure 2). The average disorder score for the ORF10 protein was 0.095, which indicates a highly ordered protein. No specific phosphorylation sites had been detected within this protein (Table S2). The GRAVY score was 0.637 to indicate a very hydrophobic protein. The aliphatic and instability indices were 107.03 and 16.06, respectively. Both of these values suggest a thermally stable protein. The hydrophobicity plot uncovered two hydrophobic regions covering residues 3-19 and 28-36, along with a hydrophilic region covering residues 20-27 (Figure 3a). As for protein-binding propensity, residues in the N-terminal half had a tendency to possess greater scores than residues found in the C-terminal half (Figure 3b). In particular, residues 1-14 presented with a stretch of relatively high propensity scores to implicate this region’s possible involvement in protein-binding (Figure 3b).
The prediction of secondary structures indicates that a single α-helix (residues 11-21) and two β-strands (residues 4-8 & 26-34) occur in the ORF10 protein (Table S3). Furthermore, only one TM region was predicted to span residues 3-19, corresponding to one of the hydrophobic regions that was shown in the hydrophobicity plot (Figure 3a).
3.3 Protein Modeling and Predictive Function
IntFOLD produced five different ORF10 protein models (Table S4). The model having the lowest p-value (3.33e-03) and highest quality score (0.3714) was subjected to refinement. Of the five protein models generated post-refinement, the model with a QMEAN Z-score of −0.66 was selected as most favorable (Table S5). This final model had an ERRAT score of 93.333 and a Ramachandran plot indicated that 94% of residues (32 total) existed in the most favorable region while only 6% of residues (2 total) existed in the additional allowed region (Figure S1). Functional analysis results by ProFunc have been summarized (Table 1). The predicted ontological terms were based on “long shot” hits using a reverse template comparison against known Protein Data Bank structures; therefore, these functional characteristics should not be viewed as definitive.
Structurally, the model presented with a β-α-β motif spanning residues 3-31, along with a 3/10-helix spanning residues 34-37 (Figure 4a). An electrostatic surface map revealed two regions of positive charge and one region of negative charge (Figure 4c). Electropositive regions were influenced by R20 and R24 whereas the electronegative region was influenced by D31. As expected, a majority of the protein’s surface was hydrophobic; however, hydrophilic regions did appear (Figure 4d). For example, residues spanning 20-27 presented with surface coloring that reflected hydrophilic character, as was shown in the hydrophobicity plot (Figure 3a).
A single hydrophobic pocket was detected using DoGSiteScorer (Figure 4b). This pocket had a volume of 78.14 Å3, a surface area of 221.79 Å2, and a depth of 5.31 Å. This pocket had a total of 19 hydrophobic interactions, followed by 8 hydrogen bond acceptors and 3 hydrogen bond donors. The druggability score for this pocket was 0.14 and interpreted as a poor drug target.
Prior to this study, two ORF10 protein models were constructed. These models were built using either the QUARK or I-TASSER web server.36,37 Both models varied in placement of secondary structural elements and topology (Figure S2). The QMEAN global Z-scores for both models (QUARK: −3.88 / I-TASSER: −2.63) were lower, indicating the model produced in this study had higher global, and also local, quality scoring (Figure S3).
3.4 Mutation Analysis
One missense mutation per variant was observed. The longest conserved region only spanned residues 15-16 (Figure 5). Per-residue disorder scores in the N-terminus did not fluctuate as much when compared to per-residue disorder scores in the C-terminus (Figure 2). Regardless, the average disorder for each variant indicated highly ordered proteins. The predicted effects that each missense mutation had on protein structure, stability, and other biochemical features have been summarized (Table 2).
4. Discussion
In this study, the evolutionary history for ORF10 is similar to the history previously described by Cagliani et al. 2020.14 Currently, there exists two distinct and conserved lineages for the ORF10 in CoVs: the SARS-CoV and SARS-CoV-2 lineages. Both lineages are easily distinguishable based upon whether they encode for either a truncated (SARS-CoV lineage) or full-length (SARS-CoV-2 lineage) protein. The SARS-CoV lineage is undergoing purifying selection whereas the SARS-CoV-2 lineage is undergoing positive selection without any relaxed constraints. A bat-CoV (SARS-related-CoV Rc-o319) isolated from Japan in 2013 possessed an uninterrupted ORF10 sequence with approximately 95% nucleotide and 87% amino acid similarity to ORF10 in SARS-CoV-2.10 Following the addition of this new sequence, this study provides an updated phylogenetic and evolutionary look at ORF10 and continues to reinforce the current hypothesis that positive selection is occurring within the SARS-CoV-2 lineage. These data also suggest that ORF10 might encode a conserved and functional protein in SARS-CoV-2 and other closely related bat and pangolin CoVs. This lineage does have greater amino acid variations within the C-terminal; however, the cause (if any exist) behind this remains unclear.
Although the SARS-CoV lineage was not a major focus, it is worth noting that in contrast to the work performed by Cagliani et al. 2020 they found the SARS-CoV lineage was being subjected to neutral evolution.14 As such, a more detailed evolutionary analysis could help determine if the SARS-CoV lineage is under either purifying selection or neutral evolution.
The prediction of secondary structural elements for the ORF10 protein describe the presence of one α-helix and two β-strands. When compared to the structural assignment of residues in the ORF10 protein model, only 76% of residues match. One significant difference was the model contained a 3/10-helix at the C-terminus. Since this model is predicative and no experimentally derived structure is available for the ORF10 protein, this 3/10-helice’s existence along with its structural and functional role is unknown. The local scoring for residues in this structure were also lowest in comparison to the rest of the model. In order to better understand the ORF10 protein, it is recommended that an experimentally derived structure be obtained as to potentially further substantiate the predicted model generated during this study.
According to Hassan et al. 2020, a MoRF is thought to occur in residues 3-7.12 A unique feature associated with MoRFs is their ability to exist in semi-disordered (i.e. flexible) states until binding, after which they will assume a designated secondary structure.43 Therefore, it is possible residues 3-7 (or 3-8) function as a β-MoRF and that upon binding to a corresponding protein ligand, will assume the structure of a single β-strand. Evidence in support of this comes from the fact that residues in this region do associate with a β-strand, and have disorder scores reflecting on residues that possess moderate flexibility to accommodate changes in structure.
Protein disorder predictions show the ORF10 protein in SARS-CoV-2 is highly ordered. It is only near each terminus that residues begin to indicate some degree of flexibility. These results contradict those by Hassan et al. 2020.12 Their study indicates more flexible residues, including several disordered residues within the C-terminus, exist in the ORF10 protein.12 This disagreement is likely attributed to the method of prediction, as this study used multiple prediction algorithms to evaluate disorder. In doing so, per-residue disorder scores were not over estimated by a single algorithm; therefore, more conservative but reliable predictions were achieved.26 The results of this study indicate that mutations in the C-terminus instill changes in residue flexibility; however, these changes do not result in shifts towards disorder. Conversely, mutations in the N-terminus caused minimal differences in residue flexibility, and as noted in the C-terminus, these changes did not result in shifts towards disorder. Overall, the forty-one mutations analyzed do not pose any significant changes on the average disorder for the ORF10 protein in SARS-CoV-2.
An overwhelming number of missense mutations in the ORF10 protein decrease protein stability; however they do not result in a completely unstable protein according to the instability and aliphatic indices as determined by ProtParam. As such, these mutations are not likely to have major effects on structure or function; however, Hassan et al. 2020 does assert that some mutations are deleterious and may decrease the protein’s structural integrity.12 Modeling of these mutations could help determine if they would negatively impact the structure of the ORF10 protein.
The ORF10 protein in SARS-CoV-2 might be a membrane protein. Whether this protein is integrated within or peripheral to the membrane remains unclear, but given the extent of nonpolar residues it is reasonable to consider this protein as integrated. Furthermore, TM predictions suggest residues 3-19 span the membrane with the N-terminus exposed to the extracellular space or, if within an organelle, the cytoplasmic space. With additional evidence coming from ProFunc and the hydrophobicity plot, it appears residues 3-19 compose a TM region. Residues 28-36 could be another TM region as well; however, TMPred did not implicate these residues. Nonetheless, scores for these residues in the hydrophobicity plot were lower but still above the cutoff threshold.
In particular, CoVs encode multiple viroporins.44 For example, in SARS-CoV the 8a protein oligomerizes to form ion channels inside mitochondrial membranes that induce apoptosis by depolarizing membrane potential.14,44–46 The ORF10 protein in SARS-CoV-2 may oligomerize in such a way that polar and electrically charged residues coat the inside of a pore that facilitates the transport of ions or small molecules for viral replication, virulence, and/or pathogenicity. Based on the ORF10 protein model, it consists almost entirely of one β-α-β motif. These motifs are usually found within proteins that associate with porins. A prior study found the ORF10 protein colocalized with accessory proteins 7b and 8 in the endoplasmic reticulum (ER); however, its role in the ER was not described.47 In this case, the ORF10 protein may oligomerize and form a viroporin that associates with ER membranes as a means to assist in viral replication and/or processing of other viral proteins. Alternatively, the ORF10 protein could be interacting with another protein. As previously stated, the ORF10 protein was suggested to associate with the substrate adapter ZYG11B.13 This interaction is likely mediated by residues in the N-terminal half on account of the β-MoRF and high protein-binding propensity scores occurring throughout residues 1-14. It is tempting to think the ORF10 protein may be multifunctional; however, further studies are necessary to explore this idea.
A major limitation to this study is the quality of sequences due to the varying methods used to acquire them. Although unlikely, there is always the possibility of errors and missing information that may affect the results obtained herein. In addition, with only six sequences used, the evidence of positive selection should not be regarded as conclusive but as additional support for the hypothesis that positive selection is acting on the SARS-CoV-2 lineage for ORF10. Additional sequences of good quality from other closely related CoVs can help to resolve this matter. It is also important to note that mutations analyzed in this study are likely present in other geographic locations; therefore, the list of locations in this study should not be viewed as exhaustive.
Overexpression of ORF10 has been shown to occur in severe cases of COVID-19 whereas in milder cases its expression seems to be minimal.16 Therefore, knowing the structure and biochemical characteristics associated with the ORF10 protein might aid in developing diagnostic tests that detect for the ORF10 protein, thereby helping to determine the likely progression the disease may take in patients.16 In addition, based on results by DoGSiteScorer the ORF10 protein would not make for a suitable drug target. Despite not serving as a likely drug target, the predicted structure for the ORF10 protein may be useful in further computational analyses detailing events in viral pathogenesis or virulence. Hopefully, this study can also provide a foundation that drives further work assessing the biological relevance of ORF10 and its putatively encoded protein in SARS-CoV-2, thereby aiding in diagnostic and possibly vaccine development.
Conflict of Interest
The author declares that no conflicts of interest exist.
Funding Information
The work described in this article received no specific grant from any funding agency.
Acknowledgments
I would like to thank the previous researchers who had deposited the sequences used in this study into NCBI. Without access to those sequences, the research described in this manuscript could have never been accomplished.