Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

Potential T-cell and B-cell Epitopes of 2019-nCoV

Ethan Fast, View ORCID ProfileRuss B. Altman, View ORCID ProfileBinbin Chen
doi: https://doi.org/10.1101/2020.02.19.955484
Ethan Fast
aNash, Menlo Park, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Russ B. Altman
bDepartment of Bioengineering, Stanford University, Stanford, CA, USA
cDepartment of Genetics, Stanford University, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Russ B. Altman
Binbin Chen
cDepartment of Genetics, Stanford University, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Binbin Chen
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Preview PDF
Loading

Abstract

As of early March, 2019-nCoV has infected more than one hundred thousand people and claimed thousands of lives. 2019-nCoV is a novel form of coronavirus that causes COVID-19 and has high similarity with SARS-CoV. No approved vaccine yet exists for any form of coronavirus. Here we use computational tools from structural biology and machine learning to identify 2019-nCoV T-cell and B-cell epitopes based on viral protein antigen presentation and antibody binding properties. These epitopes can be used to develop more effective vaccines and identify neutralizing antibodies. We identified 405 viral peptides with good antigen presentation scores for both human MHC-I and MHC-II alleles, and two potential neutralizing B-cell epitopes near the 2019-nCoV spike protein receptor binding domain (440-460 and 494-506). Analyzing mutation profiles of 68 viral genomes from four continents, we identified 96 coding-change mutations. These mutations are more likely to occur in regions with good MHC-I presentation scores (p=0.02). No mutations are present near the spike protein receptor binding domain. Based on these findings, the spike protein is likely immunogenic and a potential vaccine candidate. We validated our computational pipeline with SARS-CoV experimental data.

Significance Statement The novel coronavirus 2019-nCoV has affected more than 100 countries and continues to spread. There is an immediate need for effective vaccines that contain antigens which trigger responses from human T-cells and B-cells (known as epitopes). Here we identify potential T-cell epitopes through an analysis of human antigen presentation, as well as B-cell epitopes through an analysis of protein structure. We identify a list of top candidates, including an epitope located on 2019-nCoV spike protein that potentially triggers both T-cell and B-cell responses. Analyzing 68 samples, we observe that viral mutations are more likely to happen in regions with strong antigen presentation, a potential form of immune evasion. Our computational pipeline is validated with experimental data from SARS-CoV.

Coronaviruses (CoV) first became widely known after the emergence of severe acute respiratory syndrome (SARS) in 2002. They are are positive, single-strand RNA viruses that often infect birds and a variety of mammals and sometimes migrate to humans (1). While coronaviruses that infect humans usually present mild symptoms (e.g. 15% of cases involving the common cold are caused by a coronavirus), several previous outbreaks, such as SARS and Middle East respiratory syndrome (MERS) (2), have caused significant fatalities in infected populations (3).

In December 2019, several patients with a connection in Wuhan, China developed symptoms of viral pneumonia. Sequencing of viral RNA determined that these cases were caused by a novel coronavirus named 2019-nCoV or SARS-CoV-2 (4, 5). The virus has infected more than 100,000 people across 100 countries as of March 7th 2020 according to a World Health Organization report (6). COVID-19, the disease caused by 2019-nCoV, has led to the deaths of thousands of patients and is clearly transmissible between humans (3).

The deployment of a vaccine against 2019-nCoV would arrest its infection rate and help protect vulnerable populations. However, no vaccine has yet passed beyond clinical trails for any coronavirus, whether in humans or other animals (7, 8). In animal models, inactivated whole virus vaccines without focused immune epitopes offer incomplete protection (8). Vaccines for SARS-CoV and MERS-CoV have also resulted in hypersensitivity and immunopathologic responses in mice (7, 9). The diversity among coronavirus strains presents another challenge (1, 9), as there are at least 6 distinct sub-groups in the coronavirus genus. And while SARS-CoV and 2019-nCoV share the same subgroup (2b), their genomes are only 77% identical (4). Understanding the shared and unique epitopes of 2019-nCoV will help the field design better vaccines or diagnostic tests for patients.

Both humoral immunity provided by B-cell antibodies and cellular immunity provided by T-cells are essential for effective vaccines (10, 11). Though humans can usually mount an antibody response against viruses, only neutralizing antibodies can fully block the entry of viruses into human cells (12). The location of antibody binding sites on a viral protein strongly affects the body’s ability to produce neutralizing antibodies; for example, HIV has buried surface proteins that protect themselves from human antibodies (13). For this reason, it would be valuable to understand whether 2019-nCoV has potential antibody binding sites (B-cell epitopes) near their interacting surface with its known human entry receptor Angiotensin-Converting Enzyme 2 (ACE2).

Beyond neutralizing antibodies, human bodies also rely upon cytotoxic CD8 T-cells and helper CD4 T-cells to fully clear viruses. The presentation of viral peptides by human Major Histocompatibility Complex (MHC or HLA) class I and class II is essential for anti-viral T-cell responses (14). In contrast to B-cell epitopes, T-cell epitopes can be located anywhere in a viral protein since human cells can process and present both intracellular and extracellular viral peptides (15).

These immunology principles guide the design of our current study (Fig. 1). First we focus on T-cell epitopes. 2019-nCoV carries 4 major structural proteins—spike (S), membrane (M), envelope (E) and nucleocapsid (N)—and at least 6 other open reading frames (ORFs) (1, 16). All protein fragments have the potential to be presented by MHC-I or MHC-II and recognized by T-cells. We apply NetMHCpan4 (17) and MARIA (15), two artificial neural network algorithms, to predict antigen presentation and identify potential T-cell epitopes.

Fig. 1.
  • Download figure
  • Open in new tab
Fig. 1. Computational pipeline to identify epitopes in 2019-nCoV.

The 2019-nCoV genome codes for Spike (S), Membrane (M), Envelope (E), Nucleocapsid (N) and at least 6 other open reading frames (ORFs). The S protein is a target for antibodies and we model its 3D structure with Discotope2 to identify likely binding sites (B-cell epitopes). We scan peptide sequences from individual viral proteins with NetMHCPan4 and MARIA for MHC-I and MHC-II presentation to identify potential T-cell epitopes. Our MHC-I analysis include common alleles for HLA-A, HLA-B, and HLA-C, and our MHC-II analysis include common alleles of HLA-DR.

Next we focus on B-cell epitopes. The spike (S) protein is the main trans-membrane glycoprotein expressed on the surface of 2019-nCoV and is responsible for receptor binding and virion entry to cells (1, 18). We narrow our B-cell epitope search to S protein since it is the most likely target of human neutralizing antibodies. We first perform homology modeling to estimate the full 2019-nCoV spike protein structure and compare our results to partially solved spike protein structures (18–20). Discopte2 uses a combination of amino acid sequences and protein surface properties to predict antibody binding sites given a protein structure (21). We use this tool to identify potential 2019-nCoV B-cell epitopes as the co-crystal structure of 2019-nCoV’s spike protein and antibody is not yet available.

Finally, we examine the viral mutation pattern. Global sequencing efforts (22) have enabled us to obtain a cohort of 68 2019-nCoV genomes from four continents. Based on the resulting viral mutation profile, we examine whether mutations are driven by the immune selection pressure. Highly mutated regions might be excluded from the vaccine candidate pool.

Results

T-cell epitopes of 2019-nCoV based on MHC presentation

We scanned 11 2019-nCoV genes with NetMHCpan4 and MARIA to identify regions highly presentable by HLA complexes (human MHC). A epitope is labeled positive if presentable by more than one third of common alleles among the Chinese population. We display a summary of our T-cell epitope findings in Table 1. In Fig. 2 we show regions of strong MHC presentation across HLA-A, HLA-B, HLA-C, and HLA-DR alleles, both at a medium threshold (Fig. 2A), and at a high threshold (Fig. 2B). Consistent with the similar finding from SARS-CoV, proteins ORF1AB, S, and E contain high numbers of presentable antigen sequences across both MHC-I and MHC-II (8). These highly presentable peptides on functional viral proteins provide a candidate pool for T-cell epitope identifications and epitope-focused vaccine design. The presentation scores of individual antigen peptides can be found in SI Appendix, Supplementary Tables 5 and 6.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 1.

Summary of potential 2019-nCoV T-cell epitope candidates based on HLA antigen presentation scores. The HLA-A, HLA-B, HLA-C, and HLA-DR columns refer to the number of peptide sequences that were predicted to present across more than one third of common alleles among the Chinese population. Length refers to the number of amino acids in a given gene. HLA-C are more likely to present 2019-nCoV antigens across one third of alleles compared to HLA-A or B. Fisher’s exact test P-value levels are indicated with *** (< 0.001), ** (< 0.01), and * (< 0.05).

Fig. 2.
  • Download figure
  • Open in new tab
Fig. 2. MHC presentation regions for 11 2019-nCoV genes across HLA-A, HLA-B, HLA-C, and HLA-DR.

We plot results at both medium (A) and high (B) cut-offs for presentation. The X-axis indicates the position of each viral protein, and the y-axis indicates each human HLA gene family. Each blue stripe indicates the fraction of HLA alleles that can present a 9mer or 15mer viral peptide starting from a given position. A peptide being presented by more than one third of common alleles (red) is considered to be high coverage.

NetMHCpan4 predicts that a larger number of 2019-nCoV protein sequences can be presented by HLA-C alleles across all genes (Fisher’s exact test, p = 4.2e−59) compared to HLA-A or HLA-B (Table 1). Specifically, ORF1ab (p = 6.7e−54), S (p = 1.4e−9), ORF3a (p = 0.01), and M (p = 0.025) can be better presented by HLA-C alleles after Bonferroni correction.

T-cell epitope validation

To estimate the ability of antigen presentation scores to identify 2019-nCoV T-cell epitopes, we performed a validation study with known SARS-CoV CD8 and CD4 T-cell epitopes (Fig. 3). From 4 independent experimental studies (23–26), we identified known CD8 T-cell epitopes (MHC-I, n=17) and known CD4 T-cell epitopes (MHC-II, n=3). We also obtained known non-epitopes (n=1236 and 246) for CD4, while for CD8 we include all 9mer sliding windows of S protein without a known epitope as non-epitopes. MHC-I presentation (with recommended cut-off 98% (17)) has a sensitivity of 82.3%, specificity of 97.0% and an AUC of 0.98 (Fig. 3A). MHC-II presentation (recommended cut-off 95% (15)) has a sensitivity of 66.6%, specificity of 91.1% and an AUC of 0.83 (Fig. 3B).

Fig. 3.
  • Download figure
  • Open in new tab
Fig. 3. Validation of NetMHCpan and MARIA for T-cell epitope identification with SARS-CoV S protein epitopes.

NetMHCpan4 and MARIA presentation percentiles were used to differentiate experimentally identified CD8 T-cell epitopes (MHC-I, n=17)(23–25) and CD4 T-cell epitopes (MHC-II, n=3)(26) from negative peptides (n=1236 and 246). Negative peptides for CD8 T-cell epitopes are all 9mer sliding windows of S protein without a reported epitope. Negative peptides for CD4 T-cell epitopes are experimentally tested. (A) NetMHCpan4 has a sensitivity of 82.3% and specificity of 97.0% with a 98th presentation percentile cut-off, and an overall AUC of 0.98. (B) MARIA has a sensitivity of 66.6% and specificity of 91.1% with a 95th presentation percentile cut-off, and an overall AUC of 0.83.

Given the high similarity between SARS-CoV and 2019-nCoV proteins, we expect our analysis on 2019-nCoV to perform similarly against future experimentally validated T-cell epitopes. The detailed scores and sequences of this analysis can be found in SI Appendix, Supplementary Tables 7 and 8.

Viral mutation and antigen presentation

Human immune selection pressure has been shown to drive viral mutations which evade immune surveillance (e.g. low MHC presentation). We hypothesize that a similar phenomenon can occur in 2019-nCoV. We curated a cohort of 68 viral genomes across four continents and identified 93 point mutations, 2 nonsense mutation and 1 deletion mutation compared to the published reference genome (5). We plot point mutations against regions of MHC-I or MHC-II presentation in Fig. 4. The full protein sequence and mutation information can be found in SI Appendix, Dataset S2 and S3.

Fig. 4.
  • Download figure
  • Open in new tab
Fig. 4. 2019-nCoV point mutations correlates with MHC-I presentation regions.

93 point mutations at 59 unique positions (red triangles) are present in a cohort of 68 2019-nCoV cohort. Each row represents a viral protein and the shades of blue indicate whether MHC-I or MHC-II or both can present an antigen from this region. A region is labeled as presented by MHC-I or MHC-II if more than one third of the corresponding alleles can present the peptide sequences. Mutations are more likely be located in regions with MHC-I presentation (p = 0.02). No relationship was observed with MHC-II (p = 0.58) or MHC-I and MHC-II double positive region (dark blue, p = 0.14).

Mutations occur in most genes with the exception of E, ORF6, ORF7b, and N. We identify a recurrent mutation L84S in ORF8 in 20 samples. Mutations are more likely to occur in regions with MHC-I presentation (Fischer’s exact test, p = 0.02). No relationship was observed with MHC-II (p = 0.58) or an aggregate of MHC-I and MHC-II (p = 0.14). This is consistent with the role of CD8 T-cell in clearing infected cells and mutations supporting the evasion of immune surveillance (27).

We identified two nonsense mutations (ORF1AB T15372A and ORF7A C184T) in two viral samples from Shenzhen, China (EPI_ISL_406592 and EPI_ISL_406594). ORF7A C184U causes an immediate stop gain and a truncation of the last 60 amino acids from the ORF7A gene, which can be presented by multiple MHC alleles as indicated in our heatmap (Fig. 4). In a sample from Japan (EPI_ISL_407084), the viral gene ORF1AB loses nucleotides 94-118, which code for GDSVEEVL. An associated antigen FGDSVEEVL is predicted to be good presented by multiple MHC-I alleles (e.g. 99.9% for HLA-C*08:01, 99.2% for HLA-C*01:02, and 99.0% for HLA-B*39:01).

In summary, we observed that 2019-nCoV mutations often occur among or delete presentable peptides, a potential form of immune evasion. Patient MHC allele information and T-cell profiling are needed to better assess this phenomenon. However, patients typically carry fewer than 3 mutations, so a vaccine against multiple epitopes will not be nullified by such a small number of mutations.

Potential B-cell epitopes of 2019-nCoV

We obtained the likely 3D structure of the full 2019-nCoV’s spike protein (SI Appendix, Dataset S4 and S5) by homology modeling with SARS-CoV’s spike protein (31). Given our understanding of SARS (32), the spike protein has two major conformations: up and down. The down position is more stable but less accessible to human entry receptors (e.g. ACE2). The receptor-binding domain (RBD) of the protein undergoes hinge-like conformational change to enter the up conformation, which is less stable but more accessible to human receptors (31). We compared our homology modeled structures to a later solved Cryo-EM structure (18) and found high similarity (Appendix SI, Fig. S1). Notably, the Cryo-EM solved structure has missing residues in the RBD due to limited electron density. Using modeled structures overcomes this issue.

We scanned spike protein structures with Discotope2 to identify potential antibody binding sites on the protein surface (Figs. 5 and 6). For the SARS-CoV S protein down conformation, we identified a strong cluster of antibody sites (>30 residues, pink or red) on the receptor binding domain (RBD, the ACE2 interacting surface). Three independent studies (28–30) discovered B-cell epitopes in this region (blue) with a combination of in vitro and ex vivo approaches. Additionally, Discotope2 identifies residue 541-555 as another binding site, which was supported by two independent studies (Fig. 5A, blue) (29, 30). This gives us some confidence in the ability of Discotope2 to predict B-cell epitopes if given a unknown protein structure.

Fig. 5.
  • Download figure
  • Open in new tab
Fig. 5. Predicted B-cell epitopes on SARS-CoV and 2019-nCoV spike (S) protein “down” position.

3D structures of both spike proteins in “down” position were scanned with Discotope2 to assess potential antibody binding sites (B-cell epitopes). Red indicates residues with score > 0.5 (cut-off for specificity of 90%), and pink indicates residues with score > −2.5 (cut-off for specificity of 80%). (A) The receptor binding domain (RBD) of SARS spike protein has concentrated high score residues and is comprised of two linear epitopes. One additional binding site is predicted to be around residue 541-555. Three independent experimental studies (28–30) identify patient antibodies recognize these three linear epitopes (blue). (B) Modeled 2019-nCoV spike protein is predicted to have a similar strong antibody binding site near RBD and an additional site around residue 246-257. (C) The predicted antibody binding site on 2019-nCov RBD potentially overlaps with the interacting surface for the known human entry receptor ACE2 (yellow). (D) Eight 2019-nCoV spike protein point mutations are present in a cohort of 68 viral samples. No mutations are near the ACE2 interacting surface. One mutation (S247R) occurs near one of the predicted antibody binding sites (residue 246-257).

Fig. 6.
  • Download figure
  • Open in new tab
Fig. 6. Predicted B-cell epitopes on 2019-nCoV spike (S) protein in up conformation.

We scanned 3D structures of modeled (A) and experimentally solved (B) 2019-nCoV S protein in the up position with Discotope2 to assess potential antibody binding sites (B-cell epitopes). Red indicates residues with a score greater than 0.5 (with a cut-off for specificity of 90%), and pink indicates residues with score > −2.5 (with a cut-off for specificity of 80%). (A) Discotope2 identified multiple B-cell epitopes on the modeled 2019-nCov S protein. The top candidates cluster around the receptor binding domain (RBD). (B) The RBD B-cell epitotope cluster is comprised of two linear epitopes (440-460, 494-506). (C,D) Discotope2 identified highly similar B-cell epitopes with solved 2019-nCoV co-crystal structures from two independent research groups (19, 20). Predicted antibody binding sites overlap with the interacting surface for the known human entry receptor ACE2 (yellow).

For the 2019-nCoV S protein down conformation, Discotope2 identified a similar antibody binding site on the S protein potential RBD, but with fewer residues (17 residues, Fig. 5B). Residues 440-460 and 494-506 located at the top of the RBD show strong antibody binding properties (>0.5 raw scores). Discotope2 also identified another antibody binding site (residue 246-257) different from SARS-CoV but with lower scores (>-2.5 raw scores).

For the 2019-nCoV S protein up conformation, Discotope2 identified similar antibody binding sites compared to down conformation (440-460 and 494-506, Fig. 6). We wanted to explore whether these two linear epitopes are critical for viral interaction with the human entry protein ACE2. After our original analysis, two solved co-crystal structures of 2019-nCoV S protein RBD and human ACE2 become available (19, 20). We ran Discotope2 and obtained strikingly similar results (Fig. 6C and D). The main antibody binding site substantially overlaps with the interacting surface where ACE2 (yellow) binds to S protein, so an antibody binding to this surface is likely to block viral entry into cells.

Mutations in key antibody binding sites can help viruses evade the human immune system. We identified 8 point mutation sites on S protein from a cohort of 68 2019-nCoV samples (Fig. 5D). All mutations are distant from the S protein RBD, and one mutation (S247R) occurs near one of the predicted minor antibody binding site (residue 246-257).

In summary, we observed major structural and B-cell epitope similarity between SARS-CoV and 2019-nCoV spike proteins. RBDs in both proteins seem to be important B-cell epitopes for neutralizing antibodies, however, SARS-CoV appears to have larger attack surface than 2019-nCoV (Fig. 5A, 5B). Fortunately, we have not observed any mutations altering binding sites for this major B-cell epitope in 2019-nCoV.

Promising candidates for 2019-nCoV vaccines

After understanding the landscape of T-cell and B-cell epitopes of 2019-nCoV, we summarize our results in Table 2 for promising candidates for 2019-nCoV vaccines. Ideal antigens should be presented by multiple MHC-I and MHC-II alleles in a general population and contain linear B-cell epitopes associated with neutralizing antibodies. Specifically, we first identified regions with high coverage of MHC-II, then ranked them based on their MHC-I coverage. We further search these top candidates with 90% sequence similarity on IEDB (33) to assess whether any candidate has been previously reported for being a B-cell epitope or T-cell epitope.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 2.

Top potential epitopes for key 2019-nCoV proteins. We ranked epitopes based on their likely coverage of presentation by MHC-I and MHC-II alleles. S protein 494-508 is highly ranked based on MHC presentation and is also one of the predicted top B-cell epitopes, localized near the S protein receptor binding domain (Fig. 5). MHC-I coverage is calculated by the 9mer with the highest MHC-I coverage for each epitope (highlighted in orange). All candidates are likely to be presented by both MHC-I and MHC-II. “SARS” under the antibody column indicates that one or more SARS homolog of this peptide is a known B-cell epitope.

Around 1000 peptide 15mers can be presented by a wide range of MHC-II (>33%), which is consistent with our knowledge about promiscuity of MHC-II presentation (15, 34). When further requiring a peptide to be presented by more than 33% or 66% of common MHC-I alleles, we narrowed the results to 405 or 44 promising T-cell epitopes, respectively (SI Appendix, Supplementary Tables 5 and 6). The top 13 candidates in key viral proteins can be presented by 52-83% of MHC-I alleles (Table 2). Most notably, S protein 494-508 is not only a good MHC presenter but also a predicted B-cell epitope near the S protein receptor binding domain (Fig. 5B), which indicates a good vaccine candidate.

When searching these 13 candidates on IEDB, we observed several of their homologous peptides in SARS are T-cell and B-cell epitopes. SARS peptide SIVAYTMSL is similar to 2019-nCoV peptide SQSIIAYMSLGAEN and elicits T-cell responses in chromium cytotoxicity assays (35). Previously discovered SARS B-cell epitopes LMSFPQAAPHGVVFLHV (36), ASFRLFARTRSMWSF(37), KRTATKQYNVTQAF-GRR (38), and SRVKNLNSSEGVPDLLV (39) share high homology with 2019-nCoV S protein 1056-1060, M 97-111, N 264-278, and E 52-66. These search results support validity of our computational pipeline.

Discussion

In summary, we analyzed the 2019-nCoV viral genome for epitope candidates and found 405 likely T-Cell epitopes, with strong MHC-I and MHC-II presentation scores, as well as 2 potential neutralizing B-Cell epitopes on the S protein. We validated our methodology by running a similar analysis on SARS-CoV against known SARS T-cell and B-cell epitopes. Analyzing viral mutations across 68 patient samples we found that they have a tendency to occur in regions with strong MHC-I presentation, a possible form of immune evasion. However, no mutations are present near the spike protein receptor binding domain, which is critical for viral entrance to cells and neutralizing antibody binding.

We observed that significant portions of 2019-nCoV viral protein can be presented by multiple HLA-C alleles. This might be due to relatively conserved HLA-C binding motif on the 9th position (the main anchor residue) where Leucine, Methionine, Phenylalanine, Tyrosine are commonly favored across alleles (40). HLA-C*07:02, HLA-C*01:02, and HLA-C*06:02 all have >10% allele frequency in the Chinese population (41). This suggests the future epitope-based vaccines should include HLA-C related epitopes beyond the common HLA-A*02:01 and HLA-B*40:01 approach.

Similar to SARS-CoV (1, 8), the 2019-nCoV spike protein is likely to be immunogenic as it carries a number of both T-cell and B-cell epitopes. Recombinant surface protein vaccines have demonstrated strong clinical utilities in hepatitis B and human papillomavirus (HPV) vaccines (11, 42). A recombinant 2019-nCoV spike protein vaccine combined with other top epitopes and an appropriate adjuvant could be a reasonable next step for 2019-nCoV vaccine development. Recently, two biotech companies Moderna and Inovio have announced early clinical trials for spike protein based vaccines.

From an economic perspective, the development of coronavirus vaccines faces significant financial hurdles as most such viruses do not cause endemic infections. After an epidemic episode (e.g. SARS) there is little financial incentive for private companies to develop vaccines against a specific strain of virus (2, 8, 10). To address these incentive issues, governments might provide greater funding support to create vaccines against past and future outbreaks.

Our pipeline provides a framework to identify strong epitope-based vaccine candidates and might be applied against any unknown pathogens. We were able to model structures of 2019-nCoV spike protein as soon as the DNA sequence became available, and the modeled structure has high similarity with later solved crystal structures (SI Appendix, Fig. S1). This suggests the researchers can rely on homology remodeling when emerging pathogens first appear if the new pathogen shares high sequence similarity with an existing pathogen.

Previous animal studies show antigens with high MHC presentation scores are more likely to elicit strong T-cell responses, but the correlation between these responses and vaccine efficacy is relatively weak (7, 14, 43). When combined with future clinical data, our work can help the field untangle the relationship between antigen presentation and vaccine efficacy.

Materials and Methods

We obtained 2019-nCoV (SARS-CoV-2019) and SARS-CoV reference sequence data from NCBI GeneBank (NC_045512 and NC_004718) (4, 16). We obtained viral sequences associated with 68 patients from GISAID on Feb 1st 2020.

We broke each gene sequence in 2019-nCoV into sliding windows and used NetMHCpan4 (17) and MARIA (15) to predict MHC-I and MHC-II presentation scores across 32 MHC alleles common in the Chinese population (41). We marked a peptide sequence as covered by MHC-I and MHC-II if it is presented by more then 33% of constituent alleles. For validation, we applied our methodology to known SARS T-cell epitopes and non-epitopes.

We predicted likely human antibody binding sites (B-cell epitopes) on SARS and 2019-nCoV S protein with Disctope2 (21) after first obtaining an approximate 3D structure of 2019-nCoV S protein by homology modeling SARS S protein (PDB: 6ACC) with SWISS-MODEL (31, 44). To validate our B-cell epitope predictions, we compared our top 3 B-cell epitopes for SARS S protein with previously experimentally identified epitopes (28–30).

For mutation identification, we translated viral genome sequences from patients into protein sequences and compared them with the reference sequence to identify mutations with edit distance analysis. We compared positions of point mutations with MHC-I or MHC-II presentable regions in the protein with Fisher’s exact test.

ACKNOWLEDGMENTS

We thank Stanford Schor and Allen Lin for the discussion about coronaviruses. We thank all labs contributing to the global efforts of sequencing 2019-nCoV samples and we attached the full acknowledgment list in SI Appendix, Supplementary Table 1.

Footnotes

  • B.C. and E.F. designed the study and implemented the analysis pipeline, and drafted the manuscript. R.A. provided insights into structural biology and reviewed the manuscript.

  • Authors have no competing interests.

  • We added the Figure 6 and updated Figure 5 as new structural data of 2019-nCoV becomes available.

References

  1. 1.↵
    S Perlman, J Netland, Coronaviruses post-sars: update on replication and pathogenesis. Nat. reviews microbiology 7, 439–450 (2009).
    OpenUrl
  2. 2.↵
    WMCR Group,, et al., State of knowledge and data gaps of middle east respiratory syndrome coronavirus (mers-cov) in humans. PLoS currents 5 (2013).
  3. 3.↵
    C Huang, et al., Clinical features of patients infected with 2019 novel coronavirus in wuhan, china. The Lancet (2020).
  4. 4.↵
    F Wu, et al., A new coronavirus associated with human respiratory disease in china. Nature, 1–8 (2020).
  5. 5.↵
    P Zhou, et al., A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 1–4 (2020).
  6. 6.↵
    World Health Organization report (https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200216-sitrep-27-covid-19.pdf) (Accessed February 2020).
  7. 7.↵
    CY Yong, HK Ong, SK Yeap, KL Ho, WS Tan, Recent advances in the vaccine development against middle east respiratory syndrome-coronavirus. Front. Microbiol. 10, 1781 (2019).
    OpenUrlCrossRef
  8. 8.↵
    CM Coleman, et al., Purified coronavirus spike protein nanoparticles induce coronavirus neutralizing antibodies in mice. Vaccine 32, 3169–3174 (2014).
    OpenUrlCrossRefPubMed
  9. 9.↵
    CT Tseng, et al., Immunization with sars coronavirus vaccines leads to pulmonary immunopathology on challenge with the sars virus. PloS one 7 (2012).
  10. 10.↵
    R Rappuoli, S Black, DE Bloom, Vaccines and global health: In search of a sustainable model for vaccine development and delivery. Sci. Transl. Medicine 11, eaaw2888 (2019).
    OpenUrl
  11. 11.↵
    SE Olsson, et al., Induction of immune memory following administration of a prophylactic quadrivalent human papillomavirus (hpv) types 6/11/16/18 l1 virus-like particle (vlp) vaccine. Vaccine 25, 4931–4939 (2007).
    OpenUrlCrossRefPubMedWeb of Science
  12. 12.↵
    D Suarez, S Schultz-Cherry, Immunology of avian influenza virus: a review. Dev. & Comp. Immunol. 24, 269–283 (2000).
    OpenUrlCrossRefPubMedWeb of Science
  13. 13.↵
    B Briney, et al., Tailored immunogens direct affinity maturation toward hiv neutralizing antibodies. Cell 166, 1459–1470 (2016).
    OpenUrlCrossRefPubMed
  14. 14.↵
    SR Pedersen, et al., Immunogenicity of hla class i and ii double restricted influenza a-derived peptides. PloS one 11 (2016).
  15. 15.↵
    B Chen, et al., Predicting hla class ii antigen presentation through integrated deep learning. Nat. biotechnology 37, 1332–1343 (2019).
    OpenUrl
  16. 16.↵
    R Lu, et al., Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet (2020).
  17. 17.↵
    V Jurtz, et al., Netmhcpan-4.0: improved peptide–mhc class i interaction predictions integrating eluted ligand and peptide binding affinity data. The J. Immunol. 199, 3360–3368 (2017).
    OpenUrl
  18. 18.↵
    D Wrapp, et al., Cryo-em structure of the 2019-ncov spike in the prefusion conformation. Science (2020).
  19. 19.↵
    R Yan, Y Zhang, Y Guo, L Xia, Q Zhou, Structural basis for the recognition of the 2019-ncov by human ace2. bioRxiv (2020).
  20. 20.↵
    J Lan, et al., Crystal structure of the 2019-ncov spike receptor-binding domain bound with the ace2 receptor. bioRxiv (2020).
  21. 21.↵
    JV Kringelum, C Lundegaard, O Lund, M Nielsen, Reliable b cell epitope predictions: impacts of method development and improved benchmarking. PLoS computational biology 8 (2012).
  22. 22.↵
    Y Shu, J McCauley, Gisaid: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance 22 (2017).
  23. 23.↵
    H Chen, et al., Response of memory cd8+ t cells to severe acute respiratory syndrome (sars) coronavirus in recovered sars patients and healthy individuals. The J. Immunol. 175, 591–598 (2005).
    OpenUrl
  24. 24.
    M Zhou, et al., Screening and identification of severe acute respiratory syndrome-associated coronavirus-specific ctl epitopes. The J. Immunol. 177, 2138–2145 (2006).
    OpenUrl
  25. 25.↵
    YP Tsao, et al., Hla-a* 0201 t-cell epitopes in severe acute respiratory syndrome (sars) coronavirus nucleocapsid and spike proteins. Biochem. biophysical research communications 344, 63–71 (2006).
    OpenUrl
  26. 26.↵
    OW Ng, et al., Memory t cell responses targeting the sars coronavirus persist up to 11 years post-infection. Vaccine 34, 2008–2014 (2016).
    OpenUrl
  27. 27.↵
    R Channappanavar, C Fett, J Zhao, DK Meyerholz, S Perlman, Virus-specific memory cd8 t cells provide substantial protection from lethal severe acute respiratory syndrome coronavirus infection. J. virology 88, 11034–11044 (2014).
    OpenUrlAbstract/FREE Full Text
  28. 28.↵
    R Hua, Y Zhou, Y Wang, Y Hua, G Tong, Identification of two antigenic epitopes on sars-cov spike protein. Biochem. biophysical research communications 319, 929–935 (2004).
    OpenUrl
  29. 29.↵
    H Yu, et al., Selection of sars-coronavirus-specific b cell epitopes by phage peptide library screening and evaluation of the immunological effect of epitope-based peptides on mice. Virology 359, 264–274 (2007).
    OpenUrlPubMed
  30. 30.↵
    JP Guo, M Petric, W Campbell, PL McGeer, Sars corona virus peptides recognized by antibodies in the sera of convalescent cases. Virology 324, 251–256 (2004).
    OpenUrlCrossRefPubMedWeb of Science
  31. 31.↵
    W Song, M Gui, X Wang, Y Xiang, Cryo-em structure of the sars coronavirus spike glycoprotein in complex with its host cell receptor ace2. PLoS pathogens 14, e1007236 (2018).
    OpenUrlCrossRefPubMed
  32. 32.↵
    TJ Yang, et al., Cryo-em analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging glycans. Proc. Natl. Acad. Sci. (2020).
  33. 33.↵
    R Vita, et al., The immune epitope database (iedb) 3.0. Nucleic acids research 43, D405–D412 (2015).
    OpenUrlCrossRefPubMed
  34. 34.↵
    Y Hu, et al., Immunologic hierarchy, class ii mhc promiscuity, and epitope spreading of a melanoma helper peptide vaccine. Cancer Immunol. Immunother. 63, 779–786 (2014).
    OpenUrlCrossRefPubMed
  35. 35.↵
    Y Lv, Z Ruan, L Wang, B Ni, Y Wu, Identification of a novel conserved hla-a* 0201-restricted epitope from the spike protein of sars-cov. BMC immunology 10, 61 (2009).
    OpenUrl
  36. 36.↵
    Y He, et al., Identification of immunodominant sites on the spike protein of severe acute respiratory syndrome (sars) coronavirus: implication for developing sars diagnostics and vaccines. The J. Immunol. 173, 4050–4057 (2004).
    OpenUrl
  37. 37.↵
    Y He, Y Zhou, P Siddiqui, J Niu, S Jiang, Identification of immunodominant epitopes on the membrane protein of the severe acute respiratory syndrome-associated coronavirus. J. clinical microbiology 43, 3718–3726 (2005).
    OpenUrlAbstract/FREE Full Text
  38. 38.↵
    Y He, et al., Mapping of antigenic sites on the nucleocapsid protein of the severe acute respiratory syndrome coronavirus. J. clinical microbiology 42, 5309–5314 (2004).
    OpenUrlAbstract/FREE Full Text
  39. 39.↵
    SC Chow, et al., Specific epitopes of the structural and hypothetical proteins elicit variable humoral responses in sars patients. J. clinical pathology 59, 468–476 (2006).
    OpenUrlAbstract/FREE Full Text
  40. 40.↵
    M Rasmussen, et al., Uncovering the peptide-binding specificities of hla-c: a general strategy to determine the specificity of any mhc class i molecule. The J. Immunol. 193, 4790–4802 (2014).
    OpenUrl
  41. 41.↵
    F Zhou, et al., Deep sequencing of the mhc region in the chinese population contributes to studies of complex disease. Nat. genetics 48, 740–746 (2016).
    OpenUrlCrossRefPubMed
  42. 42.↵
    K Tajiri, et al., Analysis of the epitope and neutralizing capacity of human monoclonal antibodies induced by hepatitis b vaccine. Antivir. research 87, 40–49 (2010).
    OpenUrl
  43. 43.↵
    P Bettencourt, et al., Identification of antigens presented by mhc for vaccines against tuberculosis. npj Vaccines 5, 1–14 (2020).
    OpenUrl
  44. 44.↵
    T Schwede, J Kopp, N Guex, MC Peitsch, Swiss-model: an automated protein homology-modeling server. Nucleic acids research 31, 3381–3385 (2003).
    OpenUrlCrossRefPubMedWeb of Science
Back to top
PreviousNext
Posted March 18, 2020.
Download PDF

Supplementary Material

Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Potential T-cell and B-cell Epitopes of 2019-nCoV
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Potential T-cell and B-cell Epitopes of 2019-nCoV
Ethan Fast, Russ B. Altman, Binbin Chen
bioRxiv 2020.02.19.955484; doi: https://doi.org/10.1101/2020.02.19.955484
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
Potential T-cell and B-cell Epitopes of 2019-nCoV
Ethan Fast, Russ B. Altman, Binbin Chen
bioRxiv 2020.02.19.955484; doi: https://doi.org/10.1101/2020.02.19.955484

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Microbiology
Subject Areas
All Articles
  • Animal Behavior and Cognition (3586)
  • Biochemistry (7545)
  • Bioengineering (5495)
  • Bioinformatics (20732)
  • Biophysics (10294)
  • Cancer Biology (7951)
  • Cell Biology (11610)
  • Clinical Trials (138)
  • Developmental Biology (6586)
  • Ecology (10168)
  • Epidemiology (2065)
  • Evolutionary Biology (13578)
  • Genetics (9520)
  • Genomics (12817)
  • Immunology (7906)
  • Microbiology (19503)
  • Molecular Biology (7641)
  • Neuroscience (41982)
  • Paleontology (307)
  • Pathology (1254)
  • Pharmacology and Toxicology (2192)
  • Physiology (3259)
  • Plant Biology (7018)
  • Scientific Communication and Education (1293)
  • Synthetic Biology (1947)
  • Systems Biology (5418)
  • Zoology (1113)