Predicting missing links in global host-parasite networks

Parasites that infect multiple species cause major health burdens globally, but for many, the full suite of susceptible hosts is unknown. Predicting undocumented host-parasite associations will help expand knowledge of parasite host specificities, promote the development of theory in disease ecology and evolution, and support surveillance of multi-host infectious diseases. Analysis of global species interaction networks allows information to be leveraged across taxa, but link prediction at this scale is often limited by extreme network sparsity and a lack of comparable trait data across species. Here we use recently developed methods to predict missing links in global mammal-parasite networks using readily available data: network properties and evolutionary relationships among hosts. We demonstrate how these link predictions can efficiently guide the collection of species interaction data and increase the completeness of global species interaction networks. We amalgamate a global mammal host-parasite interaction network (>29,000 interactions) and apply a hierarchical Bayesian approach for link prediction that leverages information on network structure and scaled phylogenetic distances among hosts. We use these predictions to guide targeted literature searches of the most likely yet undocumented interactions, and identify empirical evidence supporting many of the top "missing" links. We find that link prediction in global host-parasite networks can accurately predict parasites of humans, domesticated animals, and endangered wildlife, representing a combination of published interactions missing from existing global databases, and potential but currently undocumented associations. Our study provides further insight into the use of phylogenies for predicting host-parasite interactions, and highlights the utility of iterated prediction and targeted search to efficiently guide the collection of host-parasite interaction data.
These data are critical for understanding the evolution of host specificity, and may be used to support disease surveillance through a process of predicting missing links, and targeting research towards the most likely undocumented interactions.


Introduction
This transformation allows for changes in the relative evolutionary distances among hosts and was shown to have good statistical properties for link prediction in a subset of the GMPD (Elmasri et al., 2020). We apply these three models to the full dataset. The tree scaling parameter is applied across the whole phylogeny, but since the importance of recent versus deep evolutionary relationships among hosts is likely to vary across parasite types (Park et al., 2018), we additionally run the models on the dataset subset by parasite taxonomy (arthropods, bacteria, fungi, helminths, protozoa, and viruses). For all models we used 10-fold cross-validation to pre[…]

[…] highlight Europe as a hotspot, while those from the phylogeny model reveal the highest density of missing links predicted in tropical and central America, followed by tropical Africa and Asia (Fig. 4B). The hotspot map generated by the combined model was closer to that produced by the affinity only model, but with higher relative risk in sub-Saharan Africa, South America, and parts of southeast Asia (Fig. 4C). The top ranked links from the affinity and combined models were largely dominated by hu[…] (Table SM 2).
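The effect of the early-burst (EB) tree scaling on host similarity can be illustrated with a minimal sketch. It assumes that η acts as the exponent of an EB rate function, rate(t) = exp(ηt) with the base rate fixed at 1, applied to the root-to-common-ancestor depth of each pair of hosts; the exact parameterization used by Elmasri et al. may differ, and the three-host tree below is a toy example, not part of the GMPD.

```python
import numpy as np

def eb_covariance(shared_depth: np.ndarray, eta: float) -> np.ndarray:
    """Early-burst rescaling of a phylogenetic covariance matrix (sketch).

    shared_depth[i, j] is the branch length from the root to the most recent
    common ancestor of hosts i and j (the diagonal holds root-to-tip depths).
    With rate exp(eta * t) and base rate 1, the variance accumulated over
    [0, s] is (exp(eta * s) - 1) / eta; eta -> 0 recovers Brownian motion.
    """
    if abs(eta) < 1e-12:
        return shared_depth.astype(float).copy()
    return np.expm1(eta * shared_depth) / eta

def to_correlation(cov: np.ndarray) -> np.ndarray:
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Toy ultrametric tree of depth 1: hosts A and B share 0.5 units of history,
# host C diverged from both at the root.
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

for eta in (0.0, 8.0):
    corr = to_correlation(eb_covariance(S, eta))
    # A-B correlation: 0.5 under Brownian motion, about 0.018 with eta = 8
    print(eta, round(corr[0, 1], 3))
```

Under these assumptions, a positive η concentrates most of the variance near the tips, after A and B have diverged, so closely related hosts are expected to be less similar than under Brownian motion. This matches the interpretation of positive η as reduced phylogenetic conservatism given in the Phylogeny scaling section.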

By conducting targeted literature searches of the top predicted missing links for each model by data subset, we found that multiple links had published support but were not included in the orig[…] (Table SM 2).

Using a phylogeny-informed bipartite network model, we were able to accurately predict missing links in a very large global mammal-parasite network. That we are able to make robust predictions even with extremely sparse input data indicates that this modelling approach may be useful in other large, data-poor ecological networks. We compared the performance of our […]

[…], and the Global Names Resolver (resolver.globalnames.org) were conducted to resolve taxonomic conflicts. Synonymous names were corrected to the name with the majority of references, or to the preferred name in recently published literature or taxonomic revision when this information was available. Host associations for parasite names that were later split into multiple species were removed from the dataset (e.g. Bovine papillomavirus). All hosts and parasites reported below the species level were assigned to their respective species (e.g. Alcelaphus buselaphus jacksoni was truncated to Alcelaphus buselaphus), and any record reported only to genus level or higher (e.g. Trichostrongylus sp.) was removed. Our requirement for source databases to report Latin binomials facilitated taxonomy harmonization, but resulted in the exclusion of databases such as GIDEON, which is a useful resource for modelling outbreaks of human infections based on disease common names, but does not provide unambiguous parasite scientific names (4). Of the 215 human diseases reported in Smith et al. (2014), we identified 197 that could be attributed to a predominant causal agent or group of organisms, all of which were already represented in our more comprehensive dataset. The remaining 18 human diseases represented those with no known causes (e.g. Brainerd diarrhea, Kawasaki disease), and those regularly caused by a diverse range of pathogens (e.g. viral conjunctivitis, invasive fungal infection, tropical phagedenic ulcer).
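Two of the name-cleaning rules described above (truncating names below species level, dropping genus-only records, and resolving synonyms to the name cited by the most references) can be sketched as follows. The helper names and the toy reference IDs are hypothetical; the sketch only handles Latin binomials (virus names, which do not follow binomial nomenclature, would need separate handling) and does not implement the override by recent taxonomic revisions or the removal of names that were later split.

```python
from collections import defaultdict
from typing import Iterable, Optional

def to_binomial(name: str) -> Optional[str]:
    """Reduce a Latin name to 'Genus species'; return None above species level."""
    tokens = name.split()
    if len(tokens) < 2:
        return None  # genus-only record
    genus, epithet = tokens[0], tokens[1]
    if epithet.rstrip(".").lower() in {"sp", "spp"}:
        return None  # e.g. "Trichostrongylus sp."
    return f"{genus} {epithet}"  # drops any subspecies or variety epithet

def resolve_synonyms(records: Iterable[tuple[str, str]],
                     synonym_groups: Iterable[set[str]]) -> dict[str, str]:
    """Map every name in a synonym group to the member cited by the most
    distinct references (ties broken alphabetically).

    records: (name, reference_id) pairs; synonym_groups: sets of synonymous names.
    """
    refs = defaultdict(set)
    for name, ref in records:
        refs[name].add(ref)
    canonical = {}
    for group in synonym_groups:
        preferred = max(sorted(group), key=lambda n: len(refs[n]))
        canonical.update({n: preferred for n in group})
    return canonical

print(to_binomial("Alcelaphus buselaphus jacksoni"))  # Alcelaphus buselaphus
print(to_binomial("Trichostrongylus sp."))            # None
```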

Parameter estimation and predictive performance
For parameter estimation we split each dataset into ten folds, and used the MCMC algorithm described in Elmasri et al. (5) to estimate model parameters across each of the ten folds. For each model we determined the number of iterations required for parameter convergence by visual inspection of parameter traceplots, auto-correlation plots, and effective sample size (see Elmasri et al. (5) for detailed discussion of convergence diagnostics). For each fold we generated a posterior interaction matrix by averaging 1000 sample posterior matrices, where each sample matrix is constructed by drawing parameters at random from the last 10000 MCMC samples.
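The averaging step can be sketched as below. This is a minimal illustration, assuming the affinity-only link probability 1 - exp(-γ_i δ_j) with host parameters γ and parasite parameters δ (the phylogeny and combined models would substitute their own link probability), and assuming draws are taken with replacement; the function names and synthetic draws are hypothetical. The second function shows the element-wise mean over the ten per-fold matrices used later to build a single posterior predictive matrix.

```python
import numpy as np

def posterior_interaction_matrix(gamma_draws, delta_draws, n_samples=1000,
                                 n_last=10000, rng=None):
    """Average link-probability matrices built from randomly chosen MCMC draws.

    gamma_draws: (n_iter, n_hosts) host-parameter samples.
    delta_draws: (n_iter, n_parasites) parasite-parameter samples.
    ASSUMPTION: affinity-only link probability 1 - exp(-gamma_i * delta_j).
    """
    rng = np.random.default_rng(rng)
    n_iter = gamma_draws.shape[0]
    # Draw iteration indices (with replacement) from the last n_last samples.
    idx = rng.integers(max(0, n_iter - n_last), n_iter, size=n_samples)
    total = np.zeros((gamma_draws.shape[1], delta_draws.shape[1]))
    for t in idx:
        # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
        total += -np.expm1(-np.outer(gamma_draws[t], delta_draws[t]))
    return total / n_samples

def average_folds(fold_matrices):
    """Posterior predictive matrix: element-wise mean over per-fold matrices."""
    return np.mean(np.stack(fold_matrices), axis=0)

rng = np.random.default_rng(0)
gamma = rng.gamma(shape=1.0, scale=1.0, size=(12000, 5))  # synthetic draws
delta = rng.gamma(shape=1.0, scale=1.0, size=(12000, 4))
P = posterior_interaction_matrix(gamma, delta, rng=1)
print(P.shape)  # (5, 4)
```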
To assess predictive performance, we employed cross-validation across each of the ten folds used for parameter estimation. For each fold, we set a fraction of the observed interactions (1s) to unknowns (0s), and attempted to predict them using a model fit to the remaining interactions. Here we only held out links of parasites with a minimum of two observed interactions, as the model would not be able to recover interactions for parasites that infect a single host species. Predictive performance was quantified using the area under the receiver operating characteristic (ROC) curve, which is a popular measure of potential predictive ability for binary outcomes. For each fold, an ROC curve is obtained by thresholding the predictive probabilities of the posterior interaction matrix, then calculating the true positive and false positive rates compared to the hold-out set. In this way the posterior interaction matrix is converted to a set of binary interactions at the threshold value that maximizes the area under the ROC curve (AUC). To calculate ROC curves for each model-dataset combination, we took the average ROC values across the 10 folds. However, because our study is motivated by the belief that some 0s in our data are actually unobserved 1s, AUC may not be the best metric for comparing model performance, as it increases when models correctly recover observed 0s. Therefore, in addition to AUC, we also assess predictive performance based on the percent of 1s accurately recovered, following a similar procedure. For prediction and guiding of the targeted literature searches, we generate a single posterior predictive interaction matrix for each model-dataset combination by averaging these ten posterior interaction matrices.
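The hold-out and scoring steps can be sketched as follows, assuming the posterior interaction matrix P and the observed 0/1 matrix Z are NumPy arrays with hosts as rows and parasites as columns. The hold-out fraction, the per-link random selection, the rule that each held-out parasite keeps at least one observed host, and the fixed threshold for the percent-recovered score are illustrative choices, not the exact procedure of the study. AUC is computed with the rank-based (Mann-Whitney) formula, comparing held-out 1s against cells that are 0 in the full data.

```python
import numpy as np

def hold_out(Z, frac=0.1, rng=None):
    """Hide a random fraction of observed links (1s) by setting them to 0.

    Only parasites (columns) with at least two documented hosts are eligible,
    and each keeps at least one observed host. Returns the training matrix
    and a boolean mask of held-out cells.
    """
    rng = np.random.default_rng(rng)
    Z_train = Z.copy()
    held = np.zeros(Z.shape, dtype=bool)
    for j in np.flatnonzero(Z.sum(axis=0) >= 2):
        hosts = np.flatnonzero(Z[:, j])
        chosen = hosts[rng.random(hosts.size) < frac]
        if chosen.size == hosts.size:  # never hide every host of a parasite
            chosen = chosen[1:]
        Z_train[chosen, j] = 0
        held[chosen, j] = True
    return Z_train, held

def auc(pos_scores, neg_scores):
    """Rank-based AUC: P(positive outranks negative), ties counted as 1/2."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.sort(np.asarray(neg_scores, dtype=float))
    below = np.searchsorted(neg, pos, side="left")
    ties = np.searchsorted(neg, pos, side="right") - below
    return float((below + 0.5 * ties).sum() / (pos.size * neg.size))

def evaluate(P, Z, held, threshold=0.5):
    """AUC of held-out 1s versus unobserved cells, and % of held-out 1s
    whose predicted probability reaches the (assumed) threshold."""
    pos = P[held]
    neg = P[Z == 0]
    pct = 100.0 * np.mean(pos >= threshold)
    return auc(pos, neg), pct
```

Because negatives here are all unobserved cells, some of which may be true but undocumented links, this AUC is expected to understate performance, which is the motivation given above for also reporting the percent of 1s recovered.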

Model diagnostics
All models showed high predictive accuracy in cross-fold validation, with area under the receiver operating characteristic curve (AUC) values ranging from 0.842 to 0.978 (a maximum AUC of 1 signifies perfect predictive accuracy), and with between 72.54% and 98.00% of the held-out documented interactions successfully recovered (Fig. 2A, Table SM 1; see Fig. SM 2 for posterior interaction matrices for the full dataset). While remaining high overall, AUC tended to decrease with the total size of the interaction matrix, with the phylogeny only model demonstrating the lowest AUC when applied to the largest interaction matrices: the full matrix and the helminth subset (Fig. 2A). The percent of documented interactions (1s) correctly recovered from the held-out portion also decreased with the total size of the interaction matrix (Fig. 2C), with the combined model outperforming the other models in all subsets except for the full dataset and the virus subset (Table SM 1). This may reflect that successfully predicting held-out links becomes more difficult in sparser matrices, and here sparsity increases with matrix size. Further, for the phylogeny-only model, the larger drop in performance relative to other models may reflect variation in the optimal tree scaling parameter across different parasite subsets (see Figs. SM 9 & SM 10 for examples across parasite types). In the larger interaction matrices, additional parasite diversity is represented, implying that fitting a single tree transformation may result in sub-optimal prediction, as the relative importance of shallow versus deep phylogenetic distances is likely to vary across parasite subgroups within these matrices. Future research may benefit from being able to fit multiple different scaling parameters per parasite group.
For each model, we conducted literature searches for the top ten most likely links without documentation in the current database. This resulted in 177 unique host-parasite links after removing duplicate predictions across models and subsets (see Materials & Methods). Of the undocumented links for which literature searches were conducted, we identified 72 links with evidence of infection (direct observation, genetic sequencing, or positive serology), and an additional 14 links with some evidence, but for which additional confirmatory data are required (e.g. antibodies but no confirmed cases for human infections, known cross-reactivity of the serological test used, an unconfirmed visual diagnosis, or the identification of a genetically similar but previously unknown parasite). Of the remaining links for which we could not find conclusive evidence, we highlight 39 that should be targeted for surveillance. These include links where there is known geographic overlap in the ranges of the host and parasite and host ecologies likely facilitate exposure. We also identify a number of links that are highly likely in the model, but are unlikely due to the mode of disease transmission, non-overlapping host and parasite geographies, or potential competitive interactions with closely related parasites. Overall, the full and phylogeny only models tended to identify a greater number of links with published evidence, and fewer ecologically unlikely links (Fig. 2D).
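Selecting the most likely undocumented links and pooling them across models before searching can be sketched as below, assuming P is a posterior predictive matrix and Z the observed 0/1 matrix with matching host and parasite label lists. The function names and the toy labels in the usage lines are hypothetical.

```python
import numpy as np

def top_undocumented(P, Z, hosts, parasites, k=10):
    """Return the k highest-probability (host, parasite) pairs absent from Z."""
    scores = np.where(Z == 0, P, -np.inf)        # exclude documented links
    flat = np.argsort(scores, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, P.shape)
    return [(hosts[i], parasites[j]) for i, j in zip(rows, cols)
            if np.isfinite(scores[i, j])]

def literature_search_list(candidates_by_model):
    """Pool candidate links across models/subsets, dropping duplicates
    while keeping first-seen order."""
    return list(dict.fromkeys(link for links in candidates_by_model.values()
                              for link in links))

P = np.array([[0.9, 0.2, 0.6], [0.7, 0.95, 0.1]])
Z = np.array([[1, 0, 0], [0, 1, 0]])
print(top_undocumented(P, Z, ["h1", "h2"], ["p1", "p2", "p3"], k=2))
# [('h2', 'p1'), ('h1', 'p3')]
```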

Phylogeny scaling
To account for uncertainty in the phylogenetic distances among hosts and improve prediction, the model estimates a tree scaling parameter (η) based on an early-burst model of evolution (6). Across models, η was estimated to be positive, corresponding to a model of accelerating evolution and suggesting less phylogenetic conservatism in host associations among closely related taxa than predicted under a pure Brownian motion model (6). Not surprisingly, η varied when the data were subset by parasite type (Figs. SM 9 & SM 10). Interestingly, arthropods and fungi were estimated to have the smallest η parameters (both ∼8.15), perhaps reflecting the tendency for fungi to include opportunistic pathogens such as Pneumocystis carinii and Chrysosporium parvum. In contrast, larger η was estimated for helminths and viruses (10.28 and 9.54, respectively), consistent with the observation of Park et al. (7) that mean phylogenetic specificity is similar in these two groups, though viruses are more variable and contain more extreme specialist and generalist parasites. Overall, the η parameter estimated for the full dataset (10.76) was most similar to that of the helminth subset, reflecting the representation of helminths in the full dataset (roughly 64% of the observed host-parasite interactions). The discrepancy in phylogeny scaling across the Combined model and the subsets by parasite type likely contributes to explaining why the performance of the phylogeny only model is lower in the full dataset. Future extensions may benefit from developing a more flexible model to allow for an interaction between phylogenetic scaling and parasite taxonomy, rather than using a single scaling parameter.

• A recent molecular phylogeny of the Taenia genus supported the creation of a new genus and renaming of Taenia mustelae to Versteria mustelae (8). Since then there has been a report of a fatal infection of a captive orangutan (Pongo pygmaeus) by a previously unknown Versteria species closely related to species found in wild mustelids, suggesting the need for increased vigilance of Versteria infections in humans (9).
• While bluetongue virus is known to infect a wide range of ruminants, it is not currently considered to infect humans (10).
• Bovine viral diarrhea viruses are not considered to be human pathogens, but there is some concern about zoonotic potential as they are highly mutable, have the ability to replicate in human cell lines, and have been isolated from humans on rare occasions (11).
• Although antibodies to Neospora caninum have been reported in humans, the parasite has not been identified in human tissues and the zoonotic potential is not known (12).
• Mastophorus muris is a rodent-specific nematode that requires arthropods as intermediate hosts. While this makes it unlikely to infect humans, it was recently documented in an urban population of rats in the UK (13), indicating the potential for human exposure.
• While natural infections of Plagiorchis species in humans are rare, the first case of human infection by the bat parasite Plagiorchis vespertilionis was reported in 2007 (14). The source of infection is uncertain; it has been suggested that freshwater fish and snails may be undocumented intermediate hosts and that infection was due to ingestion of raw freshwater fish.
• Carnivore protoparvovirus can infect a number of hosts in the order Carnivora (15), though there seems to be no evidence of human infection.
• Alaria alata, an intestinal parasite of wild canids, has not been identified in humans, but is considered a potential zoonotic risk as other Alaria species have been reported to cause fatal illness in humans (16).
• Due to the characteristics of the biological cycle of Taenia pisiformis and the observation that it is innocuous in humans, this parasite has been used as a model for the study of other zoonoses relevant to human health, including T. solium (17).
• The definitive hosts of Physocephalus sexalatus are commonly wild and domestic pigs, but it is sometimes found in other mammals and some reptiles (18). However, the parasite uses beetles as an intermediate host, which makes human infection unlikely.
• Currently the role of cattle in the epidemiology of Chagas disease (caused by Trypanosoma cruzi) is unknown, though the majority of cattle in Latin America (280 million head; 1/4 of the world population) may be exposed (19). (20) report that 177 species have been documented as susceptible to infection by T. cruzi, with domestic hosts in some cases being responsible for the maintenance of local parasite populations over long periods of time. While cattle have tested positive in serological studies, cows and other domestic species are also infected by Trypanosoma and Phytomonas species that can cause cross-reactions in diagnostic tests (21).

Affinity only - Full Dataset - Domestic Hosts
• Rabies has an extremely large host range and, surprisingly, rats are rarely reported as suffering from rabies, though there have been a few reported cases of rabid Rattus norvegicus in the United States (22).
• The natural hosts of Hymenolepis diminuta are rats, and cattle have not been found to be susceptible to infection; however, H. diminuta eggs have been found in the feces of dairy cattle, likely the result of ingesting forage contaminated with rodent feces (23).
• Mesocestoides lineatus has a three-stage lifecycle with two intermediate hosts and a large range of carnivorous mammals as definitive hosts (24). Human infections of the tapeworm Mesocestoides lineatus are rare but can occur through the consumption of chickens, snails, snakes, or frogs (25); it is therefore unlikely that cows, as herbivores, will ingest the intermediate life stages of this parasite.
• Capillaria hepatica (syn. Calodium hepaticum) is a globally distributed zoonotic parasite that uses rodents as main hosts, but is known to cause infection in over 180 mammalian species, including cattle (26).
• Birds are the primary vertebrate hosts for St. Louis encephalitis virus, though amplification by certain mammals has been suggested (27). There is some serological evidence of infection in domestic mammals, including cattle (28). The common vector Culex nigripalpus feeds primarily on birds, but shows a seasonal shift from avian hosts in the spring to mammalian hosts in the summer, indicating it may be able to act as a bridging vector among different host species (27).
• Canine distemper virus infects a wide range of hosts within the order Carnivora, but has also been found to cause fatal infection in some non-human primates and peccaries (29).
• Anisakis simplex uses cetaceans as final hosts, with marine invertebrates and fish as intermediate hosts (30). Whales are infected through ingestion, indicating that while cattle may be susceptible, they would need sufficient exposure to marine-based feed.
• Ovis aries is reported to be infected by Trypanosoma cruzi (20).
• We found no evidence of Bos taurus infection by Taenia mustelae.
• We did not find any record of rabies infecting Vulpes rueppellii, but this should be investigated as this disease is known to cause severe declines in wild canids (39).

Affinity only - Full Dataset - Wild Hosts
• Rabies has been documented to infect the endangered San Joaquin kit fox (Vulpes macrotis mutica) and is suggested to have caused a catastrophic decline of the species in the 1990s (40).
• Domestic dogs are considered a major predation threat to the Tibetan fox (Vulpes ferrilata) (41), and rabies is confirmed to circulate in wild and domestic animals in Tibet (42).
• Domestic dogs alter the ecology of Andean foxes (Lycalopex culpaeus), have been observed hunting them, and are a potential source of infection (43).
• Diseases from domestic dogs (largely canine distemper) are considered a major threat to the endangered Lycalopex fulvipes (44), indicating that rabies may also pose a risk.
• There is one report of serological evidence of rabies in Lycalopex griseus in Chile in 1989 (45) (reported as Pseudalopex griseus, a formerly accepted name (1)).
• We did not find evidence of rabies infection in Lycalopex gymnocercus, however its distribution overlaps with species known to be important in the transmission of rabies in Brazil (46).
• The endangered Dhole (Cuon alpinus) is known to suffer from rabies and was a source of fatal human infections during an outbreak in the 1940s (47).
• Rabies has been reported as potentially infecting Speothos venaticus (48) and there is a report of an individual with positive serology (49).
• We could not find evidence of Schistosoma mansoni infection in Holochilus chacarius. Congener Holochilus braziliensis was experimentally shown to be a viable host, although infection resulted in host death (50).
• Rabies in Bison bison is considered rare, but there are multiple cases reported (51).
• We could not find evidence of rabies infection in Bos frontalis, but as this is a semi-wild and endangered species (53) and other Bos species are susceptible, the disease may pose a conservation risk. This may also be the case for the endangered Bos javanicus.
• We did not find a specific report of rabies infection in Vicugna vicugna, although all South American camelids are noted to be susceptible and display clinical signs of infection (54).
• There are documented cases of rabies infecting Rattus rattus (e.g. (55)), though it appears to be rare.
• Rabies in Rattus norvegicus was predicted by the affinity only model with the full dataset for domestic hosts.
• Pet guinea pigs Cavia porcellus have been infected with rabies after being bitten by a raccoon (56).
• There are reported cases of rabid Oryctolagus cuniculus in the United States (22).
• Toxoplasma gondii is known to infect Bos grunniens and cause severe economic losses (57).
• Most of these links were predicted by models discussed above, except for rabies infection in Aepyceros melampus, which has been identified as suffering from spillover infections (58), and in Rupicapra rupicapra, in which rabies has been documented in Europe (36).
• Rhipicephalus evertsi and R. appendiculatus are common ticks in East and Southern Africa (59) and are unlikely to interact with non-African hosts such as Cervus elaphus and Vulpes vulpes (though there are some populations of Vulpes vulpes in North Africa). Future iterations of our implemented link prediction framework may benefit from the inclusion of information on geographic range overlap among host species; however, this may reduce the ability to identify future host-parasite associations that may occur given range expansions or species translocations.
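One way to include geographic range overlap without discarding links entirely is to down-weight, rather than delete, predictions for host-parasite pairs whose ranges do not overlap. The sketch below is a hypothetical extension, not part of the published framework: ranges are assumed to be sets of region labels (the labels here are placeholders), and the penalty value is arbitrary.

```python
import numpy as np

def range_adjusted(P, host_ranges, parasite_ranges, penalty=0.1):
    """Down-weight predicted links whose host and parasite ranges do not overlap.

    P: (n_hosts, n_parasites) link probabilities.
    host_ranges / parasite_ranges: lists of sets of region labels (assumed).
    penalty=0 gives a hard geographic filter; penalty=1 ignores geography.
    """
    overlap = np.array([[bool(h & p) for p in parasite_ranges]
                        for h in host_ranges])
    return np.where(overlap, P, penalty * P)

P = np.array([[0.8, 0.5]])
print(range_adjusted(P, [{"A"}], [{"A", "B"}, {"C"}]))
```

A soft penalty keeps geographically mismatched pairs in the ranking, so links that could arise through range expansions or species translocations are deprioritized rather than lost, which addresses the trade-off noted above.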

Arthropods - Affinity only
• Amblyomma hebraeum, the main vector of Ehrlichia ruminantium in southern Africa, prefers large hosts such as cattle and wild ruminants, although immature stages are found to feed on a wide range of hosts including scrub hares, guineafowl, and tortoises (60). Considering the species is restricted to southern Africa, it is unlikely to interact with Vulpes vulpes, although its wide host range during immaturity indicates that V. vulpes may be susceptible given the opportunity.
• Hyalomma truncatum is found across sub-Saharan Africa, where it commonly infests domestic and wild herbivores, and domestic dogs (60). Considering the species is restricted to Africa, it is unlikely to interact with Vulpes vulpes, although its tendency to infest domestic dogs indicates that V. vulpes may be susceptible given the opportunity.
• Rhipicephalus appendiculatus was found to be the most prevalent tick species on domestic pigs in the Busia District of Kenya (62), indicating that increased monitoring may also identify R. evertsi on domestic pigs.
• While multiple cervids have been reported with sarcoptic mange (61), we cannot find any report of infection in white-tailed deer Odocoileus virginianus, although there are numerous reports of mange caused by Demodex species, including the host-specific Demodex odocoilei (63), potentially indicating competition between Sarcoptes and Demodex species.
• There does not appear to be a published record of sarcoptic mange in Canis adustus; however, in areas with sympatric jackal species, C. adustus usually displays ecological segregation by preferring denser vegetation (64). This may indicate that while C. adustus may be susceptible to sarcoptic mange, differences in its ecology relative to other canids may limit transmission, making overt infections difficult to document.

Arthropods - Phylogeny only
• Lycalopex fulvipes is endangered (44), and its small population size and restricted geographic range may reduce exposure to S. scabiei; however, as sarcoptic mange is implicated in the declines of other wild canids, it should be targeted in disease monitoring programs for this species.
• While Lycalopex vetulus is not endangered, it displays some adaptability to anthropogenic disturbance (65), which may expose it to sarcoptic mange through contact with domestic dogs. In addition, Lycalopex vetulus is sympatric with the crab-eating fox (Cerdocyon thous), a documented host of S. scabiei (61). The IUCN reports a gap in conservation actions for L. vetulus regarding the role of disease in population regulation, and their status as reservoirs of scabies, canine distemper, leishmaniasis, and rabies (65).
• While the Arctic fox (Vulpes lagopus) is considered the most important terrestrial game species in the Arctic (66), we were unable to find documented infection by the "human flea" (Pulex irritans). P. irritans is thought to be unable to persist in Arctic environments due to the temperature thresholds necessary for breeding (67), though this may change in the future with continued Arctic warming.
• The endangered Dhole (Cuon alpinus) has been documented as suffering from mange as early as 1937 (47), and dholes appear to be especially susceptible to disease outbreaks due to their large group sizes and amicable behaviour within packs.
• S. scabiei was identified in Speothos venaticus (69), and was identified as potentially contributing to the loss of individuals from a group in Mato Grosso, Brazil (70).
• We cannot find a record of mange in Ovis ammon, though outbreaks of sarcoptic mange have been documented in ibex and blue sheep in the Taxkorgan Reserve, China, where O. ammon are also present, although this population has received little study (72).
• We were unable to find any published evidence of Sarcoptes scabiei infesting Vulpes velox. However, sarcoptic mange is known to infest several species of canids and is prevalent in coyotes (Canis latrans) within the range of Vulpes velox (73). Criffield et al. (2009) surveyed for S. scabiei on V. velox but observed no clinical signs, and suggested that as the grey fox (Urocyon cinereoargenteus) is somewhat resistant to mange in laboratory tests, V. velox may be similarly resistant, though there have been no experimental infestations. Finally, V. velox is debated to be conspecific with Vulpes macrotis (74), which has been documented with sarcoptic mange (75).

Arthropods - Combined model
• Sarcoptes scabiei has been documented in the endangered San Joaquin kit fox Vulpes macrotis mutica, and is considered a significant threat to its conservation (75).
• Rhipicephalus evertsi is a common tick in East and Southern Africa (59) and is unlikely to interact with non-African hosts such as Odocoileus virginianus. Future iterations of our link prediction method may benefit from the inclusion of information on geographic range overlap among host species; however, this may reduce the ability to identify future host-parasite associations that may occur given range expansions or species translocations.
• The remaining links are discussed above.
• Bartonella grahamii is a pathogen of rodents worldwide, but was first identified as causing an infection in an immunocompromised human in 2013 (76).

Bacteria - Affinity only
• Anaplasma bovis, causal agent of bovine anaplasmosis, is not currently considered zoonotic (77), but Anaplasma phagocytophilum, the causative agent of human anaplasmosis, is placed as a sister taxon to A. bovis in a recent phylogeny (78). Similarly, A. marginale, the causative agent of anaplasmosis in cattle, is also not considered zoonotic, but it reaches high prevalence in cattle and humans are likely exposed to the tick vector (77).
• Mycoplasma mycoides is not typically thought to infect humans, but there is one report of disease and positive serology in a farm worker exposed to multiple calves infected with M. mycoides subsp. mycoides LC (79).
• There has been one documented case of infection in an immunocompromised human with a Mycoplasma haemofelis-like bacterium (80), and the authors note that disease-causing latent mycoplasma infections in immunocompromised and non-immunocompromised patients are an emerging issue.
• The zoonotic potential of Chlamydophila pecorum is not known, although it is associated with abortions in small ruminants (81).
• Lawsonia intracellularis was recently recognised as the cause of an emerging intestinal disease in horses (Equine proliferative enteropathy), but is currently not considered to be zoonotic (82).
• Mycoplasma conjunctivae causes a highly contagious ocular infection of sheep, goats, and wild Caprinae, and is possibly zoonotic as it has been associated with eye inflammation in young children (83).
• Histophilus somni is a pathogen of bovine and ovine hosts, but has not been documented to infect humans (84).
• Leptospira species are commonly regarded as infecting a wide range of mammals (85). (86) identified Leptospira interrogans serovar canicola in the urine of jackals in Israel, though they did not identify the jackal species. However, Canis aureus is the only jackal species present in the country (87), providing some support for this host-parasite association, though the findings of (86) should be verified.

Bacteria - Phylogeny only model
• We did not find any documentation of leptospirosis in Canis mesomelas or Lycaon pictus but considering it is found in multiple species in Africa including domestic dogs (88), wild canids are likely to be exposed.
• (89) sequenced DNA from blood samples of Canis mesomelas in South Africa and identified 16S rDNA sequences very similar to Anaplasma phagocytophilum. This study also identified other Anaplasma species, indicating the potential for 16S rDNA sequencing to gather evidence of predicted host-parasite associations and to discover previously unknown pathogens.
• E. coli is a ubiquitous commensal microbe of vertebrates (90) and pathogenicity is linked to particular strains, indicating that our approach may be expanded by identifying the host ranges of particular subspecies or virulent strains of common commensal bacteria. Canis latrans has been identified as harbouring atypical enteropathogenic E. coli and may serve as a reservoir in agricultural areas near the United States-Mexico border (91). Wild Equus burchellii have been found to harbour antibiotic resistant E. coli in South Africa (92) and Tanzania (93), and captive Equus zebra hartmannae have been found with antibiotic resistant E. coli (94). Similarly, antibiotic resistant E. coli have been found in captive Saguinus geoffroyi (95).
• We did not find any evidence of Leptospira infections in Vulpes lagopus. We did find one report of positive serology for Leptospira interrogans in Vulpes velox macrotis (96), though there is debate as to whether this subspecies is actually its own species, Vulpes macrotis (1).
• E. coli is a ubiquitous commensal microbe of vertebrates (90), and Ovis canadensis has been surveyed for pathogenic E. coli in Washington, USA, though no individuals tested positive (97). Bison bison have been highlighted as a potentially important reservoir of pathogenic E. coli O157:H7 for human infection (98).

Bacteria - Combined model
• While there is some debate whether sheep are relatively immune or highly susceptible to infection by Mycobacterium bovis, spillover infections have been documented to occur when animals are exposed to contaminated pasture (99).
• Ovis canadensis has been identified as exposed to Leptospira interrogans through serological surveys (100).
• While tick-borne Anaplasma phagocytophilum is found to persist in a large range of terrestrial mammalian hosts (101), we find no evidence of infection in Zalophus californianus or any other marine mammals.
• We identified one survey of Anaplasma sp. in Ovis canadensis in Montana which conducted testing for Anaplasma phagocytophilum, though no individuals were positive (102). (102) speculate that this lack of infection despite the potential for exposure may be due to the exclusion of Anaplasma genotypes such as A. ovis, which is found to infect Ovis canadensis.
• Regular exposure of Phoca vitulina to Leptospira interrogans has been reported (103), though low antibody titers were interpreted as exposure rather than infection.
• We did not find evidence of Anaplasma phagocytophilum infection in American bison (Bison bison), though A. phagocytophilum is known to infect European bison (Bison bonasus) in Poland (101).
• The remaining links are discussed above.

Fungi -Affinity only model
• Geomyces destructans is the cause of white-nose syndrome in multiple bat species (104), but is not considered to be zoonotic.
• Chrysosporium parvum and related species are soil fungi that cause pulmonary infections in rodents, fossorial mammals, their predators, and occasionally humans, though the taxonomy of these pathogens is muddied in the literature (105).
• Neocallimastix frontalis appears to be a commensal fungus of bovid rumens (106), and we could not find documentation of zoonotic infection.
• Pilobolus spp. play a role in the decomposition of herbivore dung. Although they are nonpathogenic to herbivores, they can facilitate the spread of attached parasitic lungworms through their projectile dispersal system (107). We could find no evidence of human infections.
• Pneumocystis carinii belongs to a genus whose members normally reside in the pulmonary parenchyma of a wide range of mammals (108). It can cause life-threatening pneumonia in immunocompromised hosts and has been documented causing infections in cattle (109), though we found no evidence of infection in Phascolarctos cinereus.
• Trichophyton terrestre is part of a large species complex, some variants of which are documented to cause human infection (110).
• We cannot find any documentation of Chaetomidium arxii infection in humans, although this genus is well known for its opportunistic animal and human pathogens (111).

Fungi -Phylogeny only
• Pneumocystis carinii has been documented to infect pigs (112,108), goats (108), and Oryctolagus cuniculus (113). While it has been documented in other cervids (113), we find no evidence of infection in Cervus elaphus.
• The remaining top links are discussed above.

Helminths -Affinity only model
• Although the distribution, ecology, and epidemiology of Echinococcus multilocularis in North America are still largely unknown, it does not appear to infect sheep or any other ungulates, as it is maintained in a carnivore-rodent prey cycle (114).
• Mesocestoides lineatus has a three-stage lifecycle with two intermediate hosts and a large range of carnivorous mammals as definitive hosts (24). Human infections with this tapeworm are rare but can occur through the consumption of chickens, snails, snakes, or frogs (25). Because sheep are herbivores, they are unlikely to consume the intermediate life stages of this parasite.
• Capillaria hepatica (syn. Calodium hepaticum) is a globally distributed zoonotic parasite that uses rodents as its main hosts. While it is known to cause infection in over 180 mammalian species, a recent review of hosts did not include domesticated sheep (26). However, a recent study of Capillaria in Brazil identified two cases that were possibly caused by C. hepatica (115).
• The remaining links are discussed above.

Helminths -Phylogeny only
• Schistosoma mansoni infection in Holochilus chacarius was predicted by the phylogeny only model in the full dataset (discussed above).
• Although Echinococcus granulosus has not been reported to infect northern nail-tail wallabies (Onychogalea unguifera), other wallaby species, including the endangered bridled nail-tail wallaby (Onychogalea fraenata), are involved in transmission of the parasite in Australia (116). Onychogalea unguifera may also be involved in E. granulosus transmission, but because of its stable conservation status its parasites may be less well studied than those of the bridled nail-tail wallaby.
• Similarly, Rugopharynx australis is known to infect multiple wallaby species; however, the diversity of Rugopharynx and their susceptible hosts is still being discovered (117), suggesting that Onychogalea unguifera may be a promising target for future study.
• While there does not appear to be evidence of Echinococcus granulosus infection in Canis adustus, other Canis species in Africa are known hosts (118), suggesting that C. adustus should be a target for future surveillance.
• A recent molecular survey of gastrointestinal parasites of wild ruminants in Tunisia identified Nematodirus spathiger in endangered Gazella leptoceros that were genetically identical to those found in other domestic and wild ruminants (119). This is an example of a successful exploratory study aimed at describing the diversity of parasites in threatened species.
• We found little information on the parasites of the near threatened Kobus vardonii. However, it is known to inhabit floodplains and grasslands near permanent water in south-central Africa (120), where it is likely to be exposed to Cotylophoron cotylophoron, a "rumen fluke" whose larvae emerge from snail intermediate hosts and encyst on vegetation, later being ingested by ruminant definitive hosts in East Africa (121).
• Echinococcus granulosus is usually maintained by a domestic cycle of dogs eating raw livestock offal (118). While Lycalopex gymnocercus, a vertebrate-eating congener of Lycalopex vetulus, has been documented to host the parasite (122), L. vetulus is unlikely to become infected with E. granulosus as it has a largely insectivorous diet (123).
• Canis mesomelas has been reported with infection of Trichinella spiralis in the Kruger National Park, South Africa (124).
• Gazella leptoceros is also predicted to be susceptible to Trichostrongylus vitrinus. Although (119) did not identify this parasite in their study, T. vitrinus has been documented in lambs in Tunisia (125), indicating potential range overlap with G. leptoceros.
• Progamotaenia festiva is known to infect multiple Onychogalea species (126), but we could not find evidence of infection in the endangered Onychogalea fraenata.
• The remaining links are discussed above.

Helminths -Combined model
• The natural hosts of Hymenolepis diminuta are rats (23), and we find no evidence of infection in goats.
• Anisakis simplex uses cetaceans as final hosts, with marine invertebrates and fish as intermediate hosts (30). Whales are infected by ingesting these intermediate hosts, so while sheep may be susceptible, they would need sufficient exposure to marine-based feed to become infected.
• We could not find evidence of Echinococcus granulosus infection in Sorex araneus; however, this species and other Sorex spp. are known hosts of Echinococcus multilocularis (127).
• The other top links are discussed above.

Protozoa -Affinity only model
• As T. cruzi is currently restricted to the Americas (20), it is unlikely to infect black rhinos (Diceros bicornis) or gorillas (Gorilla gorilla) under natural conditions, unless facilitated by human activities.
• Giardia has been identified in a captive-bred Diceros bicornis calf in San Diego (128), indicating the potential for grey literature from zoos and captive breeding facilities to inform potential host-parasite interactions.
• Toxoplasma gondii has been documented to infect chimpanzees (Pan troglodytes), and interestingly appears to mirror the infection-induced behaviour in rodents and humans, with infected chimpanzees attracted to the urine of leopards, their only natural predator (129).
• A Gorilla gorilla individual seropositive for T. gondii was recently found at a primate center in Gabon (130).
• T. cruzi was recently identified in Equus caballus, marking the first evidence of infection in equids (131).
• Although T. cruzi naturally occurs only in the Americas, making natural infection of chimpanzees (Pan troglodytes) unlikely, a fatal infection was documented in a captive individual in Texas (132).

Protozoa -Phylogeny only
• Captive Canis aureus with antibodies against T. gondii have been identified in the United Arab Emirates (133).
• Two recent reviews of parasites in non-human primates find no documented infection of the critically endangered cotton-top tamarin (Saguinus oedipus) by Trypanosoma cruzi, although multiple Saguinus spp. have been documented with infections (134,135). However, a 1982 study of Colombian monkeys and marmosets identified S. oedipus as a host for T. cruzi for the first time (136). This highlights the potential conservation importance of this parasite for S. oedipus and the need for periodic disease surveys of critically endangered species.
• Similarly, Saimiri oerstedii is not listed by these reviews as a host of T. cruzi, although a 1972 study identifies S. oerstedii as a reservoir for the parasite in Panama (137). While this report should be followed up with contemporary diagnostic methods, it reiterates the difficulty of exhaustively searching the literature for interaction data and the utility of link prediction methods for directing these efforts.
• Toxoplasma gondii infection in Cuon alpinus has rarely been investigated, except for one captive individual that tested negative in serological testing (138).
• High prevalence of antibodies against Toxoplasma gondii was found in wild dogs (Lycaon pictus) in the Kruger National Park, South Africa, and was documented as causing a fatal infection in one pup (139), indicating that this parasite has the potential to influence the population dynamics of this endangered canid.
• We did not find any reports of T. gondii infection in Taiwan serow Capricornis swinhoei, although direct evidence of infection has been found in Japanese serow (142) and T. gondii has been found to infect multiple animals in Taiwan (143).
• The Siberian tiger Panthera tigris altaica acts as a definitive host for T. gondii and is observed to naturally shed oocysts (144).

Protozoa -Combined model
• The first eight links were predicted by models discussed above.
• Bubalus bubalis is a well established host of Toxoplasma gondii, though they are considered resistant to clinical toxoplasmosis (145).
• Natural infections of Toxoplasma gondii in Papio anubis were recently documented via genetic sequencing (146).

Viruses -Affinity only
• Simian immunodeficiency virus strains from wild primates have previously shifted to infect humans, giving rise to HIV-1, which is responsible for the AIDS pandemic (147).
• Rabies positive Macaca mulatta have recently been reported in India (148).
• Bovine alphaherpesvirus 1, the main causal agent of infectious bovine rhinotracheitis, is largely restricted to cattle and is not currently considered to infect humans (149).
• Canine mastadenovirus A (formerly Canine adenovirus 1) is known to infect dogs and circulate in wild carnivores (15), though we could not find evidence of human infection.
• Ovine herpesvirus 2, the causal agent of sheep-associated malignant catarrhal fever, is asymptomatic in its natural hosts but causes severe disease in susceptible animals that are dead-end hosts; however, to date there is no evidence of infection in humans (150).
• The remaining links are discussed above.

Viruses -Phylogeny only
• The first four links were predicted by previous models and discussed above.
• Rabies has been isolated from a single Myotis nattereri individual in France (151).
• In 2017, an individual Myotis blythii from Croatia tested positive for antibodies against rabies (152).
• Rabies has been detected in Myotis myotis in a few European countries (153).
• We did not find evidence of rabies infection in Myotis macrodactylus or Myotis mystacinus.
• Rabies-positive Myotis dasycneme have been identified in the Netherlands (154).
• We did not find evidence of rabies infection in Chlorocebus aethiops; however, several cercopithecine monkeys are known to be susceptible to infection (35).
• The remaining links are discussed above.