A Multiple Peptides Vaccine against nCOVID-19 Designed from the Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) via the Immunoinformatics Approach

Due to the current COVID-19 pandemic, the rapid discovery of a safe and effective vaccine is essential. Consequently, this study aims to predict a potential COVID-19 peptide-based vaccine utilizing the Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) via the Immunoinformatics approach. To achieve this goal, several Immune Epitope Database (IEDB) tools, molecular docking, and safety prediction servers were used. According to the results, the Spike peptide SQCVNLTTRTQLPPAYTNSFTRGVY is predicted to have the highest binding affinity to the B-Cells. The Spike peptide FTISVTTEI has the highest binding affinity to the MHC I HLA-B1503 allele. The Nucleocapsid peptides KTFPPTEPK and RWYFYYLGTGPEAGL have the highest binding affinity to the MHC I HLA-A0202 allele and the three MHC II alleles HLA-DPA1*01:03/DPB1*02:01, HLA-DQA1*01:02/DQB1*06:02, and HLA-DRB1, respectively. Furthermore, those peptides were predicted to be non-toxic and non-allergenic. Therefore, the combination of those peptides is predicted to stimulate better immunological responses with acceptable safety.

SARS-CoV-2 is a novel strain first detected in the city of Wuhan, China, in December 2019 [2]. It causes fever, cough, dyspnea, and bilateral infiltrates on chest imaging, and may progress to pneumonia [3]. COVID-19 is characterized by rapid spreading; "As the 27 Feb, it is reported in 47 countries, causing over 82,294 infections with 2,804 deaths" [4], and by the 5th of May, more than 3.5 million positive cases and 0.25 million deaths had been identified globally [5]. Unfortunately, COVID-19 so far has no effective antiviral drug for treatment or vaccine for prevention; hence, extensive research should be conducted on the development of safe and effective vaccines and antiviral drugs [4].
To develop a safe and effective COVID-19 vaccine rapidly, the WHO recommended that "we must test all candidate vaccines until they fail to ensure that all of them have the chance of being tested at the initial stage of development". Following this recommendation, there are currently over 120 proposed vaccines, six of them in clinical evaluation and 70 in pre-clinical evaluation [6]. Vaccine development is pursued through multiple approaches, including inactivated, live-attenuated, non-replicating viral vector, DNA, RNA, recombinant protein, and peptide-based vaccines. "As of 8 April 2020, the global COVID-19 vaccine R&D landscape confirmed 78 active candidate vaccine" [7].
Consistent with global efforts, this study aims to predict a potential COVID-19 peptide-based vaccine utilizing the Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) via the Immunoinformatics approach. Owing to their good antigenicity, the Nucleocapsid and Spike Glycoprotein are appropriate targets for vaccine design [8]. Peptide vaccines are sufficient to stimulate cellular and humoral immunity without allergic responses [9]. They are "safe, simply produced, stable, reproducible, cost-effective" [10], and permit a broad spectrum of immunity [9]. Consequently, they are the focus of this study, and they have also been utilized in multiple studies concerning COVID-19 vaccines [11][12][13][14].

Protein Sequence Retrieval
A total of 100 COVID-19 Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) sequences were retrieved from the National Center for Biotechnology Information (NCBI) database [15] in FASTA format in March 2020. The sequences with their accession numbers are listed in the Supplementary file S1.

Multiple Sequences Alignment
The retrieved COVID-19 Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) sequences were aligned using the ClustalW algorithm [17] on the BioEdit software version 7.2.5 [18] to identify the conserved regions between sequences.
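As an illustration of what "conserved regions" means operationally, the following minimal sketch scans an already aligned set of sequences and reports the runs of columns that are identical in every sequence. It is not the BioEdit or ClustalW procedure itself; the function name `conserved_regions`, the `min_len` parameter, and the toy alignment (including its variant and gap positions) are assumptions made for this example.

```python
from typing import List, Tuple


def conserved_regions(aligned: List[str], min_len: int = 1) -> List[Tuple[int, str]]:
    """Return (start_index, segment) for each run of fully conserved columns.

    A column counts as conserved when every sequence carries the same
    residue there and that residue is not a gap ('-').
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    regions: List[Tuple[int, str]] = []
    start = None
    # Iterate one position past the end so a trailing run is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                regions.append((start, aligned[0][start:i]))
            start = None
    return regions


# Toy alignment (hypothetical variants, for illustration only).
toy = [
    "MSDNGPQNQRNAPRIT",
    "MSDNGPQSQRNAPRIT",
    "MSDNGPQNQRN-PRIT",
]
print(conserved_regions(toy, min_len=4))  # [(0, 'MSDNGPQ'), (12, 'PRIT')]
```

Indices are 0-based; a real workflow would read the alignment exported from the alignment software rather than hard-coded strings.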

B-Cells Peptides Prediction
The B-Cells peptides were predicted from the conserved regions using the linear epitope prediction tool "BepiPred-test" on the Immune Epitope Database (IEDB) [19] at the default threshold value of -0.500. To predict the epitopes accurately, a combination of the hidden Markov model (HMM) and the propensity scale method (Parker and Levitt) [20] was used.
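The thresholding step can be pictured as follows: each residue receives a score, and contiguous stretches of residues scoring at or above the threshold are reported as candidate linear epitopes. The sketch below assumes per-residue scores are already available; the function name `epitope_runs`, the toy sequence, and the toy scores are hypothetical and are not output of the IEDB tool.

```python
from typing import List, Tuple


def epitope_runs(
    sequence: str, scores: List[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, str]]:
    """Return (start_index, peptide) for runs of residues with score >= threshold."""
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")

    runs: List[Tuple[int, str]] = []
    start = None
    # A sentinel of -inf at the end closes any run that reaches the last residue.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        above = score >= threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_len:
                runs.append((start, sequence[start:i]))
            start = None
    return runs


# Hypothetical residues and scores, for illustration only.
toy_sequence = "ACDEFGHIKL"
toy_scores = [-0.9, 0.2, 0.4, -0.6, -0.5, 0.1, 0.3, 0.8, -1.0, 0.0]
print(epitope_runs(toy_sequence, toy_scores, threshold=-0.5, min_len=2))
# [(1, 'CD'), (4, 'FGHI')]
```

The threshold is passed as a parameter so the same sketch applies to whichever default a given tool version uses.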

The Surface Accessibility Prediction
The surface accessibility of the B-Cells peptides was predicted via the Emini Surface Accessibility tool [21] on the IEDB [19] at the default threshold value.

The Antigenic Sites Prediction
To identify the antigenic sites within the Nucleocapsid phosphoprotein and Spike Glycoprotein, the Kolaskar and Tongaonkar antigenicity method on the IEDB [19] was used at the default threshold value.

T-Cell Peptides Prediction
To predict the interaction with different MHC I alleles, the Major Histocompatibility Complex class I (MHC I) binding prediction tool on the IEDB [19] was used. The peptide length was set to 9 amino acids. To predict the binding affinity, the Artificial Neural Network (ANN) prediction method was selected with a half-maximal inhibitory concentration (IC50) value of less than 100 nM.
In contrast, to predict the interaction with different MHC II alleles, the Major Histocompatibility Complex class II (MHC II) binding prediction tool was used. To predict the binding affinity, the NN-align algorithm was selected with an IC50 value of less than 500 nM.
The Human allele reference sets (HLA DR, DP, and DQ) were included in the prediction.
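The two mechanical steps described above, enumerating fixed-length peptides and keeping those whose predicted IC50 falls below a cutoff, can be sketched as follows. This is not the IEDB tool's code; the names `kmers`, `Prediction`, and `strong_binders` are assumptions, and the IC50 values in the usage example are placeholders rather than results of this study.

```python
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    allele: str
    peptide: str
    ic50_nm: float  # predicted IC50; lower means stronger binding


def kmers(sequence: str, k: int) -> List[str]:
    """All overlapping peptides of length k (sliding window, step 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def strong_binders(predictions: Iterable[Prediction], cutoff_nm: float) -> List[Prediction]:
    """Keep predictions whose IC50 is strictly below the cutoff."""
    return [p for p in predictions if p.ic50_nm < cutoff_nm]


# A Nucleocapsid segment discussed in this study, used only as window input.
segment = "DAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDD"
nine_mers = kmers(segment, 9)
print(len(nine_mers), "KTFPPTEPK" in nine_mers)  # 37 True

# Placeholder IC50 values (hypothetical, NOT taken from the study's tables).
toy_predictions = [
    Prediction("HLA-A0202", "KTFPPTEPK", 42.0),
    Prediction("HLA-A0202", "AYKTFPPTE", 350.0),
]
print(strong_binders(toy_predictions, cutoff_nm=100))
```

The same `strong_binders` call with `cutoff_nm=500` would mirror the class II cutoff stated above.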

The Population Coverage Prediction
To predict the percentage of peptides binding with various MHC I and MHC II alleles that cover the world population, the population coverage tool on the IEDB [19] was used.
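A rough sketch of the idea behind such a coverage estimate is given below, under simplifying assumptions that are ours and not necessarily those of the IEDB tool: each individual carries two alleles per locus drawn independently (Hardy-Weinberg proportions), and loci are independent of one another. Under these assumptions, if the peptide-binding alleles at a locus have summed frequency $f$, the fraction of individuals carrying at least one of them is $1-(1-f)^2$. The function names and the allele frequencies in the example are hypothetical.

```python
from typing import Dict, List


def locus_coverage(allele_freqs: List[float]) -> float:
    """Fraction of individuals carrying at least one covered allele at one locus."""
    covered = sum(allele_freqs)
    if not 0.0 <= covered <= 1.0 + 1e-9:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    # Probability that neither of the two chromosomes carries a covered allele
    # is (1 - covered)^2 under Hardy-Weinberg proportions.
    return 1.0 - (1.0 - covered) ** 2


def population_coverage(freqs_by_locus: Dict[str, List[float]]) -> float:
    """Combine loci, assuming they are independent of each other."""
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical allele frequencies, for illustration only.
toy_freqs = {"HLA-A": [0.2, 0.1], "HLA-B": [0.2]}
print(round(population_coverage(toy_freqs), 4))
```

Real coverage estimates depend on population-specific allele frequency tables, which is why the reported coverage differs between regions.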

3D Structure Modeling and Visualization
To model the 3D structure of the Nucleocapsid, Spike, and MHC molecules, the SWISS-MODEL server [25] and the Phyre2 web portal [26] were used. To visualize the modeled structures, the UCSF Chimera 1.8 software [27] was used.

Results
According to the IEDB [19] prediction, the average binding scores for the Nucleocapsid phosphoprotein and Spike Glycoprotein were 0.558 and 0.470, respectively. All values equal to or greater than the default threshold were predicted as potential B-cell binders. Regarding the cytotoxic T-lymphocyte peptides, the MHC I binding prediction tool predicted that 46 peptides from the Nucleocapsid and 192 peptides from the Spike Glycoprotein could interact with the different MHC I alleles. The most promising peptides are listed in Table 3.
In contrast, the MHC II binding prediction tool predicted that 774 peptides from the Nucleocapsid and 1111 peptides from the Spike Glycoprotein could interact with the different MHC II alleles. The most promising peptides are listed in Table 4.
According to the predictions of the MDockPeP [29] and HPEPDOCK [30] servers, the Spike peptide (FTISVTTEI) has the highest binding affinity to the MHC I HLA-B1503 allele and the Spike peptide (MIAQYTSAL) has the highest binding affinity to the MHC I HLA-C1203 allele. The Nucleocapsid peptide (KTFPPTEPK) has the highest binding affinity to the MHC I HLA-A0202 allele (Table 6).
According to the AllergenFP v.1.0 [22], AllerCatPro v. 1.7 [23], and ToxinPred [24] servers, all the predicted peptides except the Spike peptide (EVFNATRFASVYAWN) were non-allergenic and non-toxic (Tables 8 and 9). The hydrophobicity scores of the Spike Glycoprotein peptides were not available.

Discussion
Due to the current COVID-19 pandemic, the rapid discovery of a safe and effective vaccine is an essential issue [37].
Since a successful vaccine relies on the selection of the most antigenic parts and the best approaches [38], the COVID-19 Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) were selected to design a peptide vaccine. The antigenicity of the Nucleocapsid and Spike is well predicted [8], and the advantages of peptide vaccines are well established [9,10].
Peptide design via the Immunoinformatics approach is achieved through multiple steps, including the prediction of B-cell and T-cell peptides, surface accessibility, antigenic sites, and population coverage. After the selection of the candidate peptides, their interaction with the MHC molecules is simulated and their safety is predicted [39].
Regarding the B-Cells peptides prediction, the successful candidates must pass the threshold scores in the Bepipred, Parker hydrophilicity, Kolaskar and Tongaonkar antigenicity, and Emini surface accessibility tests [40]. The IEDB Bepipred test [19] on the Nucleocapsid predicted eleven peptides; however, the peptide DAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDD was the only one that passed all the tests. In contrast, the IEDB Bepipred test [19] on the Spike predicted forty-two peptides, but the peptides SQCVNLTTRTQLPPAYTNSFTRGVY and LGKY were the only two that passed all the tests. As the length of effective B-cell peptides varies from 5 to 30 amino acids [41], the peptide LGKY is too short and the peptide DAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDD is too long. Consequently, the Spike peptide SQCVNLTTRTQLPPAYTNSFTRGVY is predicted to have the highest binding affinity to the B-Cells (Table 2).
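The length criterion applied to the three peptides that passed all the tests can be checked with a few lines of code. This is only a sketch of the 5 to 30 amino acid window stated above; the function name `within_length` and the variable names are assumptions.

```python
# Peptides that passed all B-cell tests, as named in the text.
candidates = {
    "Nucleocapsid": ["DAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDD"],
    "Spike": ["SQCVNLTTRTQLPPAYTNSFTRGVY", "LGKY"],
}


def within_length(peptide: str, min_len: int = 5, max_len: int = 30) -> bool:
    """True when the peptide length lies in the accepted B-cell epitope range."""
    return min_len <= len(peptide) <= max_len


for protein, peptides in candidates.items():
    for peptide in peptides:
        status = "kept" if within_length(peptide) else "rejected"
        print(f"{protein}: {peptide} (length {len(peptide)}) {status}")

selected = [p for ps in candidates.values() for p in ps if within_length(p)]
print(selected)  # ['SQCVNLTTRTQLPPAYTNSFTRGVY']
```

The lengths are 45, 25, and 4 residues, so only the 25-residue Spike peptide remains.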
Concerning the T-Cells peptides prediction, the test measures the peptides' binding affinity to the MHC molecules [19]. The available MHC I alleles HLA A, HLA B, HLA C, HLA E, and MHC II alleles HLA-DR, HLA-DQ, and HLA-DP were used. The collective results of the IEDB tests [19] revealed that the Spike glycoprotein peptides FTISVTTEI and MIAQYTSAL and the Nucleocapsid peptide KTFPPTEPK are the most promising MHC I peptides. On the other hand, the Spike peptides EVFNATRFASVYAWN and VFRSSVLHSTQDLFL and the Nucleocapsid peptides AALALLLLDRLNQLE, ALALLLLDRLNQLES, PRWYFYYLGTGPEAG, and RWYFYYLGTGPEAGL are the most promising MHC II peptides.
To stimulate better immunological responses, the predicted peptides must interact and bind effectively with the MHC I and MHC II molecules [42]; therefore, their interaction with the MHC molecules must be studied.
The simulation and prediction of the interaction between the predicted peptides and the MHC molecules are conducted using molecular docking studies that rely on the calculation of the binding free energy. The lowest binding energy scores of the MHC-Peptide complex will indicate the best interaction and the highest stability [43].
To validate the results of molecular docking, the MDockPeP [29] and HPEPDOCK [30][31][32][33][34] servers were used. The MDockPeP server predicts the MHC-Peptides interaction by "docking the peptides onto the whole surface of protein independently and flexibly using a novel the conformation restriction in its novel iterative approach. It ranks the docked Peptides via the ITScorePeP scoring function that uses the known protein-peptide complex structures in the calculations" [29]. In contrast, HPEPDOCK uses "a hierarchical flexible peptide docking approach" to predict the MHC-Peptides interaction [30]. The MHC I alleles HLA-A0202, HLA-B1503, and HLA-C1203 were predicted to present the highly conserved SARS-CoV-2 peptides more effectively [44]; hence, they were used in the molecular docking study.
The molecular docking results showed that the Spike peptide FTISVTTEI has the lowest docking energy score with the MHC I HLA-B1503 allele; hence, it is predicted to have the highest binding affinity. The Spike peptide MIAQYTSAL showed the lowest docking energy score with the MHC I HLA-C1203 allele; consequently, it is predicted to have the highest binding affinity to this allele. In contrast, regarding the MHC I HLA-A0202 allele, the results of the MDockPeP [29] server showed that the Nucleocapsid peptide KTFPPTEPK has the lowest docking energy score, but the results of the HPEPDOCK [30] server showed that the Spike peptide MIAQYTSAL has the lowest docking energy score (Table 6).
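The selection rule used here (the lowest docking energy score marks the best binder, and two servers may or may not agree) can be sketched as below. The scores in the example are placeholders chosen only to reproduce the kind of disagreement described for HLA-A0202; they are NOT the values of Table 6, and the function names are assumptions.

```python
from typing import Dict, Tuple


def best_peptide(scores: Dict[str, float]) -> str:
    """Peptide with the lowest (most favourable) docking energy score."""
    return min(scores, key=scores.get)


def server_consensus(
    server_scores: Dict[str, Dict[str, float]]
) -> Tuple[Dict[str, str], bool]:
    """Return each server's top peptide and whether all servers agree."""
    picks = {server: best_peptide(scores) for server, scores in server_scores.items()}
    agreed = len(set(picks.values())) == 1
    return picks, agreed


# Placeholder scores for one allele (hypothetical, not from the study).
toy_scores = {
    "MDockPeP": {"KTFPPTEPK": -150.0, "MIAQYTSAL": -140.0},
    "HPEPDOCK": {"KTFPPTEPK": -200.0, "MIAQYTSAL": -210.0},
}
picks, agreed = server_consensus(toy_scores)
print(picks)   # {'MDockPeP': 'KTFPPTEPK', 'HPEPDOCK': 'MIAQYTSAL'}
print(agreed)  # False
```

When the servers disagree, as in this toy case, additional evidence such as the interaction analysis is needed to decide between the candidates. Scores from different servers are on different scales and are only compared within a server.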
To illustrate the MHC-Peptide interaction, PoseView [35] at the ProteinPlus web portal [36], which illustrates the 2D interactions, and the Cresset Flare viewer [8], which illustrates the 3D interactions, were used.
The Spike peptide MIAQYTSAL interacts with the MHC I HLA-C1203 allele by forming hydrogen bonds with the amino acids Tyr33A, Arg86A, Lys90A, Gln179A, and Thr187A, and hydrophobic bonds with the amino acids Ile2, Gln4, Gln94A, Ala8, Arg86A, Tyr183A, and Trp191A. In comparison, it interacts with the MHC I HLA-A0202 allele by forming hydrogen bonds with the amino acids Thr6, Glu87A, Arg121A, and Trp180A, and hydrophobic bonds with the amino acids Ile2, Trp171A, Tyr183A, and Trp191A. It forms six hydrogen bonds and seven hydrophobic bonds with the MHC I HLA-C1203 allele, which is more than the number of bonds it forms with the MHC I HLA-A0202 allele. This finding indicates the higher binding affinity of the Spike peptide MIAQYTSAL to the MHC I HLA-C1203 allele and supports the MDockPeP [29] server score (Table 6 and Figures 5, 6).
Among the reported MHC I alleles, the HLA-B1503 allele was predicted to have "the greatest ability to present the highly conserved SARS-CoV-2 peptides" [44]; therefore, the Spike peptide FTISVTTEI is predicted to elicit the strongest response, since binding with the MHC I stimulates the natural killer and cytotoxic T cells [45].
Regarding the interaction with the MHC II molecule, the Spike peptide EVFNATRFASVYAWN showed the lowest docking energy score with the three MHC II alleles HLA-DPA1*01:03/DPB1*02:01, HLA-DQA1*01:02/DQB1*06:02, and HLA-DRB1; hence, it is predicted to have the highest binding affinity to the three alleles. It is therefore predicted to stimulate the CD4+ (helper) T cells more effectively, since the MHC II molecule presents the antigenic peptides to the CD4+ (helper) T cells [46].
In comparison between the Spike and Nucleocapsid peptides, the Spike peptide (FTISVTTEI) showed a higher binding affinity to the MHC I HLA-B1503 allele. The Nucleocapsid peptides (KTFPPTEPK) and (RWYFYYLGTGPEAGL) showed a higher binding affinity to the MHC I HLA-A0202 allele and the three MHC II alleles HLA-DPA1*01:03/DPB1*02:01, HLA-DQA1*01:02/DQB1*06:02, and HLA-DRB1, respectively. However, the total population coverage of the peptides FTISVTTEI and KTFPPTEPK is not high (Table 10).
Joshi A, et al., in their predictive COVID-19 peptide-based vaccine study, found that the ORF-7A protein's peptide (ITLCFTLKR) binds most effectively with the MHC I alleles HLA-A*11:01 and HLA-A*68:01 [13]. Enayatkhani M, et al. included the Nucleocapsid (N) in their predictive study, but they studied its interaction with the MHC I HLA-A*11:01 allele [11]. Kalita P, et al. included the Nucleocapsid protein and Spike glycoprotein, and used the predicted peptides from the Nucleocapsid, Spike, and Membrane glycoprotein to design a subunit vaccine [12]. Singh A, et al. used the Nucleocapsid protein to design a multi-peptide vaccine [14]. Since the present study included neither the two alleles HLA-A*11:01 and HLA-A*68:01 nor the design of a subunit or multi-peptide vaccine, a direct comparison with these studies is not applicable.
Besides binding with the MHC molecules, the predicted peptides must be non-toxic and non-allergenic; hence, their safety was predicted using the AllergenFP v.1.0 [22], AllerCatPro v. 1.7 [23], and ToxinPred [24] servers. The results showed that all the peptides were non-toxic. The AllerCatPro v. 1.7 [23] server results showed no evidence of allergenicity for any of the peptides; however, the AllergenFP v.1.0 [22] server predicted the Spike peptide (EVFNATRFASVYAWN) as an allergen (Tables 8 and 9).
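One conservative way to combine the three servers' outputs is to retain only peptides that no server flags. The sketch below encodes the outcome stated in the text (a single AllergenFP flag, no AllerCatPro or ToxinPred flags); the "flagged by any server" rule and all names in the code are assumptions made for illustration, not a procedure stated by the servers themselves.

```python
from typing import Dict, List, Set

# Candidate T-cell peptides named in the discussion.
peptides: List[str] = [
    "FTISVTTEI", "MIAQYTSAL", "KTFPPTEPK",
    "EVFNATRFASVYAWN", "VFRSSVLHSTQDLFL",
    "AALALLLLDRLNQLE", "ALALLLLDRLNQLES",
    "PRWYFYYLGTGPEAG", "RWYFYYLGTGPEAGL",
]

# Peptides flagged by each server, following the outcome reported in the text.
flags: Dict[str, Set[str]] = {
    "AllergenFP v.1.0": {"EVFNATRFASVYAWN"},
    "AllerCatPro v. 1.7": set(),
    "ToxinPred": set(),
}


def passes_all(peptide: str, server_flags: Dict[str, Set[str]]) -> bool:
    """True when no server flags the peptide as allergenic or toxic."""
    return not any(peptide in flagged for flagged in server_flags.values())


safe = [p for p in peptides if passes_all(p, flags)]
print(len(safe), "EVFNATRFASVYAWN" in safe)  # 8 False
```

A less strict rule, such as majority voting across servers, would retain the flagged peptide, so the choice of rule should be stated explicitly.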

Conclusion
A potential COVID-19 peptide-based vaccine was predicted from the Nucleocapsid phosphoprotein (N) and Spike Glycoprotein (S) via the Immunoinformatics approach. The Spike peptide SQCVNLTTRTQLPPAYTNSFTRGVY is predicted to have the highest binding affinity to the B-Cells. The Spike peptide FTISVTTEI has the highest binding affinity to the MHC I HLA-B1503 allele. The Nucleocapsid peptides KTFPPTEPK and RWYFYYLGTGPEAGL have the highest binding affinity to the MHC I HLA-A0202 allele and the three MHC II alleles HLA-DPA1*01:03/DPB1*02:01, HLA-DQA1*01:02/DQB1*06:02, and HLA-DRB1, respectively. Furthermore, those peptides were predicted to be non-toxic and non-allergenic. Therefore, the combination of those peptides is predicted to stimulate better immunological responses.
Since the study is an in silico predictive work, further experimental studies are recommended to validate the obtained results.