A framework for predicting potential host ranges of pathogenic viruses based on receptor ortholog analysis

Viral zoonoses are a serious threat to public health and global security, as reflected by the growing number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cases. However, as pathogenic viruses are highly diverse, identification of their host ranges remains a major challenge. Here, we present a combined computational and experimental framework, called REceptor ortholog-based POtential virus hoST prediction (REPOST), for the prediction of potential virus hosts. REPOST first selects orthologs from diverse species by identity and phylogenetic analyses. Second, these orthologs are preliminarily classified as permissive or non-permissive by infection experiments. Then, key residues are identified by comparing permissive and non-permissive orthologs. Finally, potential virus hosts are predicted by a key residue-specific weighted module. We applied REPOST to SARS-CoV-2 by studying angiotensin-converting enzyme 2 (ACE2) orthologs from 287 vertebrate animals. REPOST efficiently narrowed the range of potential virus host species (with 95.74% accuracy).

[...] can also be used to supplement our classification. At least three orthologs of each type are needed. (3) Key residues are identified by comparing permissive and non-permissive orthologs (for details, see the SARS-CoV-2 example). (4) Potential virus hosts are predicted by a key residue-specific weighted module.

[...] (Fig. S1A, B). Protein sequences contained 344 to 862 amino-acid residues, and the length in 25% of species was the same as that of human ACE2 (805 amino-acid residues) (Fig. S1C). We first performed the phylogenetic analysis and found that ACE2 orthologs from the same taxon usually clustered into the same branch (Fig. S2). We also analyzed the pairwise identities of all sequences. The results indicate that the ACE2 protein sequences were highly conserved within each taxon examined, as well as within each subclass of mammals (Fig. 2B), suggesting that classification can start with a few representatives from each taxon. We then ranked the taxa by ACE2 identity to human ACE2, which yielded the following order from high to low: primates, rodents, carnivores, other placental species, even-toed ungulates, whales and dolphins, bats, marsupials, birds, other chordates, amphibians, lizards, and bony fishes (Fig. 2C). Based on this ranking, ACE2 in vertebrate species other than mammals had low identity to human ACE2, suggesting that these species are unlikely to be potential hosts or reservoirs for SARS-CoV-2. A previous study supports this observation: poultry (which are birds) are not susceptible to SARS-CoV-2 (Shi et al., 2020). Finally, we chose 16 representative ACE2 orthologs (highly similar to human ACE2) from mammals (including primates, rodents, carnivores, bats, and even-toed ungulates) for subsequent analysis (Fig. 3).
Among these species are wild animals, zoo animals, pets, and livestock that are frequently in close contact with humans, and model animals used in biomedical research.

Taken together, we eventually obtained 17 permissive and 3 non-permissive ACE2 orthologs. These orthologs were used to identify key residues whose substitution may impair the ACE2-SARS-CoV-2 interaction.

After sequence comparison, we found 33 ACE2 protein residues of M. musculus, 20 residues of C. jacchus, and 50 residues of E. fuscus that differed from those of all SARS-CoV-2-permissive species (Fig. 4A, Table S1). Taking the residues at the ACE2-SARS-CoV-2 interaction interface as an example, we found substitutions at residues Q24, D30, K31, E42, M82, Y83, K353, and G354 that distinguished the ACE2s of C. jacchus, M. musculus, and E. fuscus from those of all permissive species (Fig. 4B). These residues may be the key sites affecting ACE2-SARS-CoV-2 binding. The ACE2 orthologs were characterized by a peptidase M2 domain (PD) which located outside [...]

Based on the above work, we developed a residue-specific weighted module for predicting the susceptibility of untested mammal species (Fig. 5A). The module takes multiple receptor orthologs as input, including permissive and non-permissive orthologs from tested species and orthologs from the species to be predicted. It first calculates a residue-weighted distance matrix for all orthologs, taking into account three priors: the PD domain, the 95 key residues, and the contact surface with the SARS-CoV-2 S protein (Fig. 4C). We then used this distance matrix to select, as potentially permissive (or non-permissive) candidates, orthologs that were much closer to known permissive (or non-permissive) orthologs than to known non-permissive (or permissive) orthologs (Methods).
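The key-residue comparison and the distance-based selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the mismatch-based distance, the weight of 3.0 for key columns, the `margin` threshold, and the toy eight-residue sequences (which are not real ACE2 sequences) are all assumptions made for illustration, and the sequences are assumed to be pre-aligned to equal length.

```python
# Toy, pre-aligned sequences (hypothetical, NOT real ACE2 orthologs).
permissive = {"perm_1": "QDKEMYKG", "perm_2": "LEKEMYKG"}
non_permissive = {"nonperm_1": "NDKQMFHG"}


def key_residues(permissive, non_permissive):
    """Return 0-based alignment columns where some non-permissive ortholog
    carries a residue that is seen in none of the permissive orthologs."""
    length = len(next(iter(permissive.values())))
    keys = []
    for i in range(length):
        allowed = {seq[i] for seq in permissive.values()}
        if any(seq[i] not in allowed for seq in non_permissive.values()):
            keys.append(i)
    return keys


def weighted_distance(a, b, weights):
    """Weighted fraction of mismatched columns (an assumed distance)."""
    mismatch = sum(w for x, y, w in zip(a, b, weights) if x != y)
    return mismatch / sum(weights)


def call_candidate(query, permissive, non_permissive, weights, margin=0.1):
    """Label a query by its nearest known permissive / non-permissive ortholog.
    `margin` is a hypothetical stand-in for 'much closer'."""
    d_perm = min(weighted_distance(query, s, weights) for s in permissive.values())
    d_non = min(weighted_distance(query, s, weights) for s in non_permissive.values())
    if d_non - d_perm > margin:
        return "candidate permissive"
    if d_perm - d_non > margin:
        return "candidate non-permissive"
    return "undetermined"


keys = key_residues(permissive, non_permissive)
# Assumed weighting: key columns count three times as much as other columns.
weights = [3.0 if i in keys else 1.0 for i in range(8)]

print(keys)
print(call_candidate("LEKEMYKG", permissive, non_permissive, weights))
print(call_candidate("NDKQMFKG", permissive, non_permissive, weights))
```

In this sketch a single weight vector stands in for the three priors; orthologs that fall within the margin are left undetermined rather than forced into either class.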
The optimized distance matrix clearly separated known permissive and non-permissive orthologs, with no discernible mixing ( [...] necessity to monitor susceptible hosts to prevent future outbreaks. In addition, we identified 10 previously unrecognized non-permissive ortholog sequences from primate, rodent, bat, and marsupial species. We also predicted that species other than mammals were not likely to be hosts of SARS-CoV-2, as all non-mammalian ACE2 orthologs were too dissimilar from known permissive orthologs (Fig. S3A).

[...] among bats than among other tested mammals, and ACE2 orthologs of bats were located on two distant branches of the evolutionary tree, highlighting the possibility that bat species act as reservoirs of SARS-CoV-2 or its progenitor viruses. Notably, we also found that ACE2 orthologs from a wide range of mammals, including pets (e.g., cats and dogs), livestock (e.g., pigs, cattle, rabbits, sheep, horses, and goats), and animals commonly kept in zoos or aquaria, could act as functional receptors [...] non-permissive ortholog sequences from primate, rodent, bat, and marsupial species. These findings will enrich negative datasets, increasing the accuracy of the screening of key residues that affect virus-receptor interaction, and will aid the establishment and training of optimized predictive models.
We propose that REPOST will strengthen the ability to rapidly identify potential hosts of new pathogenic viruses affecting not only humans, but also animals. Another advantage of REPOST is the ease of key residue screening, which may lead to the identification of promising targets for the development of broad-spectrum antiviral therapies. REPOST can also be applied in other cases; for viruses with more than one cellular receptor, for example, all receptor orthologs can be used as input for systematic analysis. When the viral receptor cannot be identified, sequence information for all cellular membrane proteins can be integrated as input for prediction. REPOST, however, can be [...] results. In summary, the establishment of the REPOST predictive framework may be of great significance for the prevention and control of future outbreaks.

Protein sequence identity and phylogenetic analyses
The amino-acid sequences of ACE2 orthologs from 287 vertebrates were downloaded from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/). [...]

The optimal combination was selected from the 1,000 possible combinations listed in Table S3 to maximize the following separation score: [...]. All orthologs with [...] ≥ 0.5 (candidate permissives) and [...] ≤ 0 (candidate non-permissives) were then prioritized (Table S2). All 11 candidate non-permissives except Ornithorhynchus anatinus and Myotis davidii (whose residue conformations at the S protein contact surfaces were extremely similar to those of human ACE2) were tested experimentally.
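To make the selection over 1,000 combinations concrete, the following is a minimal grid-search sketch. It does not reproduce the separation score used here, whose formula is not shown above. Instead it assumes a simple stand-in (mean between-class distance minus mean within-class distance), assumes that the 1,000 combinations arise from ten candidate values for each of three prior weights, and uses toy sequences and column sets that are purely hypothetical.

```python
from itertools import combinations, product
from statistics import mean

# Toy, pre-aligned sequences (hypothetical, NOT real ACE2 orthologs).
permissive = ["QDKEMYKG", "LEKEMYKG"]
non_permissive = ["NDKQMFHG", "NDKQMFKG"]

# Hypothetical column sets standing in for the three priors:
# domain columns, key-residue columns, contact-surface columns.
priors = [{0, 1, 2, 3, 4, 5}, {0, 3, 5, 6}, {0, 3}]


def weighted_distance(a, b, weights):
    """Weighted fraction of mismatched columns (an assumed distance)."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y) / sum(weights)


def column_weights(length, priors, prior_weights):
    """Baseline weight 1 per column, plus the weight of every prior covering it."""
    weights = [1.0] * length
    for cols, w in zip(priors, prior_weights):
        for i in cols:
            weights[i] += w
    return weights


def separation_score(permissive, non_permissive, weights):
    """Assumed stand-in score: mean between-class minus mean within-class distance."""
    between = mean(weighted_distance(p, n, weights)
                   for p in permissive for n in non_permissive)
    within_pairs = (list(combinations(permissive, 2))
                    + list(combinations(non_permissive, 2)))
    within = mean(weighted_distance(a, b, weights) for a, b in within_pairs)
    return between - within


def score_for(combo):
    """Score one combination of the three prior weights."""
    return separation_score(permissive, non_permissive,
                            column_weights(8, priors, combo))


# Assumed grid: ten values per prior weight, giving 10**3 = 1,000 combinations.
grid = [round(0.1 * k, 1) for k in range(1, 11)]
combos = list(product(grid, repeat=3))

best = max(combos, key=score_for)
best_score = score_for(best)
print(best, best_score)
```

An exhaustive search is practical at this scale because each combination requires only a handful of pairwise distance evaluations.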

Acknowledgements
This work was supported by the Beijing Nova Program of Science and Technology