The discovery of potential natural products for targeting SARS-CoV-2 spike protein by virtual screening

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters into the cells through its spike proteins binding to human angiotensin-converting enzyme 2 (ACE2) protein and causes virus infection in host cells. Until now, there are no available antiviral drugs have been reported that can effectively block virus infection. The study aimed to discover the potential compounds to prevent viral spike proteins to bind to the human ACE2 proteins from Taiwan Database of Extracts and Compounds (TDEC) by structure-based virtual screening. In this study, to rapidly discover potential inhibitors against SARS-CoV-2 spike proteins, the molecular docking calculation was performed by AutoDock Vina program. Herein, we found that 39 potential compounds may have good binding affinities and can respectively bind to the viral receptor-binding domain (RBD) of spike protein in the prefusion conformation and spike-ACE2 complex protein in silico. Among those compounds, especially natural products thioflexibilolide A and candidine that were respectively isolated from the soft corals Sinularia flexibilis and Phaius mishmensis may have better binding affinity than others. This study provided the predictions that these compounds may have the potential to prevent SARS-CoV-2 spike protein from entry into cells.


Introduction
Recently, World Health Organization (WHO, https://www.who.int/) declared that the outbreak spread of the novel coronavirus severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes over 8 million reported cases and more than 4 hundred thousand deaths in 216 countries. Most of the coronaviruses cause only mild 3 respiratory distress [1,2]. However, three coronaviruses namely the severe acute respiratory syndrome coronavirus (SARS-CoV), SARS-CoV-2, and the Middle East respiratory syndrome coronavirus (MERS-CoV) have been found so far highly pathogenic transmitted from animal to human [3]. The SARS-CoV in 2003 brings about a 10 % case fatality rate (CFR), and MERS presented a CFR of about 34.4% [4][5][6]. The CFR for the novel coronavirus cannot be stated at the moment, as the pandemic is not over yet, although 3.4% is estimated and one thing is apparent the transmission rate of the infection is higher than the previous outbreaks [7].
The SARS-CoV-2 causes mainly severe acute respiratory distress by attacking lung cells and other complications in the heart, kidney, brain, and spleen and ultimately succumbed to a disastrous effect in coronavirus afflicted patients [8]. In this hour of crisis, there is no available effective standard-of-care present to treat the coronavirus affected patients. This makes an urgent need to find out agents to get control over the pandemic. Scientists globally make an effort to do their best to come up with effective therapeutic agents. So, to design a drug for the virus first we have to see the indispensable processes of the virus, which help them to survive and replicates its copy number. The genome of the virus showed a sequence similarity of 96.3% BatCoV RaTG13 and 79% with the SARS-CoV [9,10]. SARS-CoV-2 is an enveloped single-stranded positive-sense RNA genome containing viruses. The RNA consists of 29,891 nucleotides, which transcribed for 6-12 ORFs, can be translated into around 28 identified non-structural and structural proteins (NCBI Reference Sequence: NC_045512.2) [11].
SARS-CoV-2 enters into the host cells containing ACE2, such as lung, oral, and nasal mucosa by glycosylated spike protein [12]. Human ACE2 interacts with SARS-CoV-2 spike proteins and facilitates SARS-CoV-2 entry into target cells [13]. The 4 viral glycosylated spike protein composed of S1 and S2 subunits in the coronaviruses which enables them access into the host cell. The RBD of S1 subunit in SARS-CoV-2 attaches to ACE2, led to the shed of S1 subunit and subsequently triggers the cleavage of the S2 subunit by host protease protein, TMPRSS2, that can cleave S1/S2 protease cleavage site [13]. The cleavage changes the conformation and allowing HR1, HR2 to form 6-HB, which facilitates membrane fusion and virus releases its payload RNA into the cytoplasm. The first translation product of the RNA genome is polyprotein pp1a and pp1ab. Subsequently, 3CL protease (3CLpro) and papain-like protease (PLpro) cleave pp1a and pp1ab to functional proteins required for genome amplification. The other structural proteins are produced and the host ER-Golgi system assembles and releases mature virus from infected cells [14,15]. The current understanding strongly recommends targeting the surface protein of the viral particle, which possibly prevents them from entering the host cell.
SARS-CoV-2 spike protein is a trimeric protein. The RBD of each subunit has two types of conformations. One conformation is "up" and the other is "down".
When the conformation is "up", the viral spike protein could smoothly interact with human ACE2 protein, otherwise, it is inaccessible [16]. Comparing the structure of SARS-CoV-2 and SARS-CoV in spike protein, they have similar amino acid sequences and functions [17]. However, the RBD of SARS-CoV-2 spike protein in the down conformation N packs tightly against the N terminal domain (NTD) of the neighboring protomer, whereas the SARS-CoV-2 in the "down" conformation is angled closer to the central cavity of the trimer. Additionally, the binding affinity between SARS-CoV-2 spike protein and ACE2 protein is stronger than SARS-CoV, even 10-20 folds [18]. Therefore, we deduce that the structure of SARS-CoV-2 is similar to SARS-CoV, but not the same. 5 Drug discovery and development is a time taking, costly, and complex process.
It usually takes years of effort to get clinically successful [19]. However, the virtual screening of authentic databases and the use of advanced bioinformatics and cheminformatics can reduce the time to come up with the best drug match to the selected target. This process of virtual screening has become a gold standard method for the preliminary phase of drug designing. To discover the potential hits against SARS-CoV-2, we have screened the essential entry pathway targeting virus penetration into cells. This study aims to find the potential hit compounds from Taiwan Database of Extracts and Compounds (TDEC, https://tdec.kmu.edu.tw/) for and against SARS-CoV-2 spike protein. We focus on the screen of hit compounds that can interact with viral spike protein and spike-ACE2 complex protein, leading to the prevention of viral payload into the host cytoplasm. We have selected our target mainly in the RBD of spike proteins for viral and host cell interactions. TDEC is an academic and scientific platform for investigators in different fields to share their research information. It includes much information, such as compounds' structures, physicochemical properties, and biological activities, etc. on pure natural isolates, crude extracts, and synthesized extract from plants, microbes, marine organisms, and Chinese Herbal Medicines. In the present study, an attempt has been made to virtually screen candidates from TDEC for a selected target and did all the high throughput bioinformatics and cheminformatics analysis to get an insight into the interaction of spike protein and hits. 6

Protein superimposition
The protein superimposition was performed by PyMOL software (version 0.99rc6) to compare the conformation of spike proteins in two states [20]. One was spike protein in the prefusion ACE2-free conformation, and the other was RBD of spike protein in the ACE2-bound conformation. In this study, the simulated structure of the RBD of spike protein in the prefusion ACE2-free conformation was constructed by homology modeling using the structure of the experimental structure of ACE2-free spike protein resolved by cryogenic electron microscopy (Cryo-EM) as the modeling template (S1A Fig) and the simulated spike protein was able to be obtained from SWISS MODEL website (S1B Fig) [18,21]. The other was RBD of the spike protein bound with ACE2 protein that was an experimental protein structure resolved by X-ray crystallography and it was able to be downloaded from Protein Data Bank (PDB, https://www.rcsb.org/) (S1C Fig) [22]. The protein superimposition calculations were analyzed and shown by PyMOL software.

Structure-based virtual screening
The virtual screenings with molecular docking calculations were performed by AutoDock Vina (version 1.1.2) program within PyRx (version 0.8) software to discover the potential compounds binding into the viral spike protein from the compound database [23,24]. The 2,321 structures of compounds were downloaded (May 9, 2020) from TDEC, and then collected those compounds together and save it as an sdf format file by Discovery Studio 2019 visualizer software (DS 2019) [25].
Subsequently, these structures of compounds were optimized by energy minimization with steepest descent algorithm using MMFF94 force field, and then translated and 7 divided to 2,321 individual pdbqt format files by Open Babel program within PyRx software [26]. The experimental structure of spike-ACE2 complex protein resolved by X-ray crystallography was obtained from Protein Data Bank (PDB ID: 6M0J) and the simulated structure of prefusional spike protein was downloaded from SWISS MODEL website [21,22,27]. After that, the substrates, including ligands, metal ions, and water molecules existed in both protein structures were all removed and the atoms of residues in both protein structures were modified by added the polar hydrogens and partial charges using DS 2019 software with CHARM force field.
Moreover, to discover the potential compounds binding into the RBD of spike protein in the prefusion conformation and the site of the connective interface of spike-ACE2 complex protein from the database by virtual screening, dimensions of docking search spaces were respectively set big enough to contain the residues of ACE2 binding site on the RBD of spike proteins. Therefore, the size of the docking sites was respectively set as follows. In experimental spike-ACE2 complex protein

Protein superimposition
A previous report had been shown that once the RBD of SARS-CoV-2 spike protein bound to human ACE2 protein, it can assist virus entry into the cells [17].
Recently, the 3D protein structure of the viral spike protein has been resolved by Cryo-EM technique [18]. However, we found that the structure of the spike protein resolved by Cryo-EM was incomplete because it lacked parts of residues in the RBD of spike protein (S1A Fig). Therefore, the structure of the spike protein with the complete amino acid sequences in the prefusion conformation was needed to be constructed by simulation. Fortunately, a useful and trusted simulated structure of SARS-CoV-2 spike protein in the prefusion conformation that has been constructed by homology modeling and could be easily downloaded from SWISS-MODEL website (S1B Fig). Additionally, the viral RBD of spike protein bound with human ACE2 protein, the spike-ACE2 complex protein, was also recently resolved by X-ray crystallography (S1C Fig). To investigate whether the conformation between the RBD of spike proteins in ACE2-free state and ACE2-bound state have differences or not, the protein superimposition was performed by PyMOL software using the structural aligning method. The data showed that the value of root mean square (RMS) of the two conformations was 1.735 Å (< 2Å). It meant that the superimposition between both RBD of spike proteins was good (Fig 1). However, comparing the RBD regions of both spike proteins in the simulation models, it was significantly different in structural conformation (blue ring in Fig 1). The simulation showed that the position of receptor-binding motif (RBM) regional structure in prefusion conformation was close to the RBD core than in the fusion conformation. Besides, the pose of RBM regional structure in the fusional conformation was more close to ACE2 protein than in the prefusion conformation. Generally, SAR-CoV-2 RBM contains many contacting residues binding to human ACE2 protein [22]. We concluded that the conformation changing of the RBM regional structure in the spike protein might affect the spike protein binding behavior to human ACE2 protein.

Structure-based virtual screening
The study reported that C-terminal domain (CTD) of SARS-CoV spike protein bound to ACE2 protein with different kinds of angles and its conformations or poses would be changed to well fit the conformation of dynamic ACE2 protein, such as ACE2 unbound-up conformation, ACE2-bound conformation, and ACE2 unbounddown conformation [10]. Additionally, we have observed that the conformation of the RBD of SARS-CoV-2 spike protein in the prefusional state is different from in the fusional state (Fig 1). In other words, we can speculate that the conformation of the dynamic RBD of SARS-CoV-2 spike protein would change to well bind and fit dynamic ACE2 protein. Therefore, we could design a strategy that two potential hitting sites could be set as the compound docking sites to prevent the spike protein to bind to ACE2 protein. One was that compounds docked on the RBD of the spike protein to prevent spike protein to bind to the human ACE2 protein, and the other was that compounds docked to the site of the connective interface of the spike-ACE2 complex protein to influence the affinity of binding between dynamic viral spike protein and dynamic human ACE2 protein and to cause nonfunctional changing of structural conformation to prevent viral spike protein from cleaved by human proteases. In this study, our aim was going to discover the compounds that not only can bind to the RBD of spike protein in the prefusion conformation but also bind to the site of the connective interface of spike-ACE2 complex protein. To discover the potential compounds, the virtual screening with molecular docking calculation was performed by AutoDock Vina program within PyRX software. All molecular structures prepared for virtual screening were downloaded from TDEC website. The results showed that the numbers of docked compounds (binding energy < -8 kcal/mol) in the prefusion RBD of spike protein and the spike-ACE2 complex protein were respectively 53 and 222. Among those compounds, the numbers of overlapped compounds and non-overlapped compounds were respectively 39 and 197. The binding energies and properties of 39 overlapped compounds were respectively shown in Fig 2, S2 Fig, and S1 Table. Subsequently, once the value of the screening threshold raised and limited to -9 kcal/mol, only compounds TDEC2018CN001781 (Thioflexibilolide A) and TDEC2020CN000246 (Candidine) reached the condition.      with amino acid His34, and its interactive distance was 4.71Å (Fig 5D). The data also showed that candidine could dock with the prefusional RBD of spike protein (cyan cartoon) and its value of binding energy was -9.0 kcal/mol ( Fig   6A). It also formed many interactions with amino acids in the prefusional RBD of the with amino acid Leu452, and their interactive distances were respectively 3.83Å, 3.95Å, 3.77Å, and 3.55Å. Moreover, it also formed a pi-pi T-shaped interaction (pinking purple dashed line) with amino acid Phe486, and its interactive distance was 5.39Å (Fig 6B). Candidine docked with the site near the connective interface of the 14 RBD of spike-ACE2 complex protein and its value of binding energy was -9.8 kcal/mol (Fig 6C). Besides, it also formed many interactions with amino acids in the space near the connective interface of the RBD of spike-ACE2 complex protein. The results showed that it formed an electrostatic hydrogen-bonding interaction (green dashed line) with amino acid Glu37, and its interactive distance was 2.25Å. Besides, it also formed a hydrophobic pi-sigma interaction (dark purple dashed line) with amino acid Asn33, and its interactive distance was 3.94Å (Fig 6D). vaccines for coronaviruses [13,30]. The predictions obtained from virtual screening such as binding energy, possible flexibility in conformation, are still very useful to lead the way of drug discovery or drug repurposing. As represented in Fig 4, we have started with 2,321 compounds taken from TDEC and applied the filters to get compound which can bind to the target structures with high affinity. Notably, we got the two natural products, thioflexibilolide A and candidine, which passed all the filters applied with the strong binding energy respectively -9.2 kcal/mol and -9.0 kcal/mol on the prefusion spike protein and -9.8 kcal/mol and -8.9 kcal/mol on the spike-ACE2 complex protein (Fig 5 and Fig 6). The other compounds listed in the S1 in the further phenotypic assay. Among them, hit compounds were able to influence the interaction between viral RBD of the spike protein and the human ACE2 protein in two dynamic proteins. Besides, we also found among these 39 compounds that Dioscin, Actinomycin D and Saikosaponin C had been reported that they had antivirus activity against others virus (S2 Table). Although, those compounds that we discovered from the database may have the potential ability to prevent the binding between viral spike protein and human ACE2 protein, the efficacy of compounds were still needed to be further verified by bioassay.
Thioflexibilolide A and candidine were natural products respectively isolated from soft coral Sinularia flexibilis and Phaius mishmensis [31,32]. In chemical structure, the structure of candidine is more solid than thioflexibilolide A. It it might be caused that the RBM of spike protein is major composed of chain type of structure rather than α-helix or β-sheet structure (S1 Fig). Besides, we also deduce that two compounds bind to side face respectively with the RBM of the ACE2-free spike protein and the connective interface of spike-ACE2 complex protein probably affect the affinity of binding between viral spike protein and human ACE2 protein or influence its conformation changing to prevent spike protein to be cleaved by host protease like TMPRSS2 (Fig 5 and Fig 6).
To rapidly develop useful drugs against SARS-CoV-2, computer-aid drug screening was performed in many studies. SARS-CoV spike protein were homology proteins with greater than 70% similar sequences and have the same function for binding ACE2. However, the previous report also showed that the binding affinity between SARS-CoV-2 spike protein and ACE2 was higher than SARS-CoV spike protein approximately 10-20 folds [18,39].
These reports suggested that it is highly similar but not the same in partial structure or conformation in the spike proteins between SARS-CoV-2 and SARS-CoV. Moreover, spike-ACE2 complex protein was constructed by protein-protein docking may not good enough to directly exhibit the protein-protein binding situation in fact. Recently, the RBD of SARS-CoV-2 spike protein in the prefusion conformation (PDB ID: 6VSB) and the complex protein that SARS-CoV-2 spike protein's RBD bound with ACE2 protein had already been resolved (PDB ID: 6M0J) respectively by Cryo-EM and X-ray crystallography with high resolution [18,22]. Therefore, the two new resolving protein structures were likely more close to the structural situations in fact.
In other words, it could raise the accuracy in structure-based virtual screening with docking calculation because of those structures resolved with high resolution that is why we did that in this study.
Developing the viral spike antibody is one of the good strategies to fight against SARS-CoV-2. It could bind to the RBD of spike protein to neutralize SARS-CoV-2 had been reported [40,41]. However, it is concerned about inducing antibodydependent enhancement (ADE) [42]. Herein, we reported that the 39 potential compounds against SARS-CoV-2 spike protein may have the advantage of avoiding to induce ADE.
ACE2 was a type I transmembrane protein and it functioned as a carboxypeptidase to mediate homeostasis of renin-angiotensin system (RAS), such as blood volume, lung and cardiovascular regulating functions, by cleaving angiotensin II (Ang II) to Ang(1-7) [43]. ACE2 was higher expressed in the small intestine, kidneys, testicles, adipose tissue, and thyroid and moderate in the lungs, liver, adrenal glands, and rectum and lower in the blood, blood vessels, spleen, brain, muscle, and bone marrow [44,45]. On the other hand, gender and age characters could affect ACE2 expression under inflammatory conditions in the skin, lung, brain, and thyroid [44]. Recent studies also showed that gender could affect SARS-CoV-2 infection susceptibility and virus clearance [17]. Besides, another report also pointed out that COVID-19 patients' symptoms including dysfunctions in diarrhea, taste, and smell; injuries in the liver, kidney, and heart may have a close relationship with ACE2 expression [46]. Herein, our results showed the 39 potential compounds could prevent the binding between viral spike protein and human ACE2 protein in silico. It 20 is speculated that those potential compounds could fight against COVID-19 and lower the expression in the above symptoms for humans.
Natural products proved to be beneficial due to the contribution from a long time to develop effective drugs for several diseases [47]. Therefore, we have tried our endemic TDEC which is very rich and diverse in natural product resources derived from traditional medicine, domestic microbes, and marine organisms.
thioflexibilolide A is well-known for its neuroprotective and anti-inflammatory activities derived from soft corals, whereas about gravicycle is reported non-toxic after its in vitro toxicity assessment in the cell [48]. The results we are reporting here about the compounds which are screened by employing virtual screening with molecular docking approach from the database for disabling SARS-CoV-2 spike protein's RBD from its interaction with ACE2 receptor of the host cell is very first of its kind as far as possibly know. The compounds perform well to get the filter into potential drug candidate during our screening process, and we strongly recommend our compound for further in vitro and in vivo investigation.