High-throughput Identification of Eukaryotic Parasites and Arboviruses in Mosquitoes

Vector-borne pathogens cause many human infectious diseases and are responsible for high mortality and morbidity throughout the world. They can also cause livestock epidemics with dramatic social and economic consequences. Due to the high costs, vector-borne disease surveillance is often limited to current threats, and the investigation of emerging pathogens typically occur after the reports of clinical cases. Here, we use high-throughput sequencing to detect and identify a wide range of parasites and viruses carried by mosquitoes from Cambodia, Guinea, Mali and Maryland. We apply this approach to individual Anopheles mosquitoes as well as pools of mosquitoes captured in traps; and compare the outcomes of this assay when applied to DNA or RNA. We identified known human and animal pathogens and mosquito parasites belonging to a wide range of taxa, insect Flaviviruses, and novel DNA sequences from previously uncharacterized organisms. Our results also revealed that analysis of the content of an entire trap is an efficient approach to monitor and identify potential vector-borne pathogens in large surveillance studies, and that analyses of RNA extracted from mosquitoes is preferable, when possible, over DNA-based analyses. Overall, we describe a flexible and easy-to-customize assay that can provide important information for vector-borne disease surveillance and research studies to efficiently complement current approaches.


Introduction 69
Different arthropods can, during a blood feeding, transmit viruses, protists and helminths to humans (1). 70 These organisms cause some of the most prevalent human infectious diseases, including malaria, 71 dengue, schistosomiasis or Chagas disease, and are responsible for more than 700,000 human deaths 72 worldwide every year (2-4). Vector-borne diseases are also responsible for some of the most alarming 73 recent epidemics in the western hemisphere, either due to the emergence of new pathogens (e.g., Zika 74 (5, 6)), the reemergence of historically important pathogens (e.g., Yellow Fever (7)) or the expansion of 75 diseases beyond their historical ranges (e.g., West Nile (8) and Chikungunya (7)). In addition to this 76 burden on human health, many vector-borne diseases affect domesticated animals (e.g., heartworms (9, 77 10)), livestock (e.g., Theileriosis (11,12)) and wild animals (e.g., avian malaria (13,14)). Some of these 78 animal diseases have dramatic economic consequences in endemic areas (11,12), while others are 79 zoonotic diseases, further affecting human populations (15)(16)(17)(18)(19)(20). 80 Efficient vector-borne disease surveillance is critical for reducing disease transmission and preventing 81 outbreaks. Past elimination campaigns for vector-borne diseases, usually targeting a specific human 82 pathogen, have often relied on entomological approaches such as widespread insecticide spraying and 83 disruption of larval habitats (21,22). To be successful, such efforts need to be guided by detailed 84 knowledge of the parasites' and vectors' distributions. Unfortunately, current entomological 85 surveillance approaches are extremely resource-intensive: the collection of samples is time consuming 86 requiring trained personnel, vector species identification is laborious, and the detection of pathogens is 87 expensive since hundreds of mosquitoes typically need to be screened to identify a few infected ones. 88 Consequently, public health officials and vector biologists typically focus on monitoring only a few 89 specific pathogens associated with the most current threats. These constraints are particularly 90 problematic as they hamper the early detection of emerging pathogens and vector surveillance is often 91 implemented in response to reports of clinical cases rather than preventively. 92 We have recently described a sequencing-based method using high-throughput amplicon sequencing to 93 detect known and previously uncharacterized eukaryotic parasites from biological samples in a 94 comprehensive, high-throughput and cost-efficient manner (23). Here, we present the application of this 95 approach to characterize eukaryotic parasites and arboviruses from more than 900 individual Anopheles 96 mosquitoes collected in Cambodia, Guinea and Mali, as well as from 25 pools of mosquitoes captured in 97 CDC CO 2 -baited light traps in Maryland, USA. We also compare the performance of the assay when 98 screening DNA and RNA from the same samples. Overall, our study demonstrates how this sequencing-99 based assay could significantly improve monitoring of human and animal vector-borne pathogens. 100 101

Methods 102
Samples 103 We analyzed a total of 930 individual mosquitoes, as well as 25 pools each containing 50-291 104 mosquitoes (2,589 total) ( Table 1 and Supplemental Tables 1-3). 105 First, we analyzed DNA previously extracted from 265 individual Anopheles mosquitoes collected in the 106 Cambodian provinces of Pursat, Preah Vihear, and Ratanakiri (24). These mosquitoes were collected 107 using cow-or human-baited tents, human landing collections, CDC light traps and barrier-screen fences 108 and immediately preserved by desiccation upon collection. These 265 Anopheles mosquitoes represent 109 22 different species collected between July and August of 2013 (see Supplemental Table 1 for details). 110 Second, we included DNA samples from 81 individual mosquitoes collected in Bandiagara, Mali. DNA 111 from these samples was extracted using Chelex® 100 (Bio-Rad) after incubation of bisected and 112 homogenized mosquitoes in 1% saponin in PBS. 113 Third, we extracted DNA from 584 individual Anopheles mosquitoes collected in six sites in Guinea and 114 preserved in ethanol immediately upon collection. These mosquitoes were collected by human landing 115 catch and pyrethrum spray (Supplemental Table 2). Each mosquito was homogenized in 200 µl 116 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint ATL/proteinase K solution using five RNase-free 1 mm zirconium oxide beads in a TissueLyser II for 12 117 minutes at 20 m/s. We centrifuged the solution at 2500 rpm for three minutes and incubated them at 118 55°C for one hour. We performed a second homogenization step for four minutes at 20 m/s followed by 119 a final incubation at 55°C overnight. We then isolated DNA using the Qiagen DNeasy 96 Blood & Tissue 120 Kit according to the manufacturer's instruction and eluted DNA from each sample in 200 µl. 121 Finally, we analyzed 25 pools of mosquitoes collected throughout Prince George's county (Maryland,122 USA) by the Maryland Department of Agriculture using CO 2 -baited light traps (Supplemental Table 3). 123 Each pool contains all mosquitoes from one light-trap (~50-291 mosquitoes) and was stored at room 124 temperature for up to 24 hours before long-term storage at -20°C. We homogenized each pool of 125 mosquitoes using a Qiagen TissueLyser II with Teenprep Matrix D 15 ml homogenization tubes (MP 126 Biomedicals) and isolated successively RNA and DNA from each sample using the RNeasy PowerSoil 127 Total RNA kit (Qiagen) with the RNeasy PowerSoil DNA Elution Kit and a final elution volume of 100 µl. 128

Evaluation of Arbovirus primers 129
We tested universal flavivirus primers retrieved from the literature (25) on West Nile (n=3), Zika (n=2) 130 and Dengue (n=2) viral RNAs obtained from the American Type Culture Collection (ATCC). We 131 synthesized cDNA from 2 µL of RNA using M-MLV reverse transcriptase (Promega) and random 132 hexamers, and amplified the resulting cDNA with GoTaq® DNA polymerase (Promega) under the 133 following conditions: initial two-minute denaturing step at 95°C followed by 40 cycles of 95°C for 30 134 seconds, 50°C for 30 seconds and 72°C for 40 seconds. A final extension at 72°C for ten minutes was 135 followed by incubation at 4°C. We ran the products on an agarose gel to determine whether each virus 136 RNA was amplifiable. 137 PCR amplification of pathogen nucleic acids before high-throughput sequencing 138 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ;https://doi.org/10.1101https://doi.org/10. /2021 First, we synthesized cDNA using M-MLV reverse transcriptase (Promega) and random hexamers from 139 either 1) 2 µl of the nucleic acids isolated from the Guinean mosquitoes (i.e., using RNA carried over 140 during the DNA extraction), or 2) 3 µL of RNA extracted from the pools of Maryland mosquitoes. 141 Then, we amplified DNA and cDNA (when available) from each sample, as well as from 176 no-DNA 142 controls, with a total of 11 primer pairs, each targeting a specific taxon known to contain human 143 pathogens ( Table 2). For each primer pair, we amplified DNA and cDNA using GoTaq® DNA polymerase 144 (Promega) under the following conditions: initial denaturing step at 95°C followed by 40 cycles of 95°C 145 for 30 seconds, 50°C for 30 seconds and 72°C for 30 seconds. A final extension at 72°C for ten minutes 146 was followed by incubation at 4°C. All primers used in these taxon-specific PCRs included 5'-end tails to 147 serve as priming sites for a second PCR. We then pooled all PCR products generated from one sample 148 and performed a second PCR using primers targeting these tails to incorporate, at the end of each 149 amplified molecule, i) a unique oligonucleotide "barcode" specific to each sample and ii) DNA sequences 150 complementary to the Illumina sequencing primers (26, 27) (Supplemental Figure 1). Finally, we pooled 151 together the resulting barcoded libraries and sequenced them on an Illumina sequencer to generate an 152 average of 12,703 paired-end reads of 251 or 301 bp per sample. 153

Bioinformatic analyses 154
We first separated the reads generated from each sample according to their unique barcodes and 155 merged the overlapping ends of each read pair using PANDAseq (28) to generate consensus DNA 156 sequences and correct sequencing errors (that disproportionally occur at the end of the reads). Note 157 that all primers were designed to amplify DNA sequences shorter than ~450 bp, allowing overlap of at 158 least 50 bp between paired-end reads. All read pairs that did not merge correctly were discarded from 159 further analyses. We identified and trimmed the primer sequences from each read and eliminated all 160 consensus sequences shorter than 100 bp as they likely represent experimental artefacts (e.g., PCR 161 chimeras and primer dimers). Using reads from all samples together, we recorded how many unique 162 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint DNA sequences were obtained and how many reads carried each of these unique DNA sequences. 163 Sequences observed less than ten times in the entire dataset were omitted as they likely resulted from 164 PCR or sequencing errors (26). We then compared each unique DNA sequence to all sequences 165 deposited in the NCBI nt database using BLAST (29) and used custom pipelines 166 (https://github.com/MVesuviusC/2020MosquitoSurveillancePaper) to retrieve the taxonomic 167 information associated with the most similar sequence(s). For each sample, only sequences with at least 168 10 reads and more than 70% identity with an annotated NCBI sequence over the entire sequence length 169 were further considered. This identity cutoff, while low, allows inclusion of results from highly 170 genetically divergent organisms which can then be scrutinized further. This is critical when identifying 171 species without closely related sequences available. If DNA sequences from multiple species were 172 equally similar to one of our sequences, we recorded all corresponding species names. Finally, we 173 summarized, for each mosquito, the parasite species or virus identified, the percentage identity 174 between the reads and the most similar NCBI sequence(s), and the number of reads supporting the 175 identification in this sample. 176

Phylogenetic analyses 177
To better characterize specific DNA sequences with ambiguous species identification, we analyzed these 178 sequences together with orthologous sequences from closely related species. Briefly, we used 179 PrimerTree (26) to retrieve NCBI orthologous DNA sequences from all species of the targeted taxon. We 180 aligned these sequences with the DNA sequence(s) amplified from the mosquito(es) using MAFFT (30) 181 and reconstructed neighbor-joining trees using MEGA (31) to estimate the phylogenetic position of the 182 amplified DNA sequences. 183

Further determination of taxonomical assignments 184
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made corresponding to the expected 900 bp PCR product, and dissolved it in 100 µl of water at 60°C for 20 195 minutes. We then re-amplified 10 µl of this DNA using 35 PCR cycles with the same conditions. After gel 196 electrophoresis, we treated the PCR reaction with 0.046 µl of Exonuclease I (NEB) and 0.4625 µl of 197 Shrimp alkaline phosphatase (Affymetrix) at 37°C for 30 minutes, with a final five-minute inactivation 198 step at 95°C. We then Sanger sequenced each PCR product in both directions using the forward and 199 reverse primers. We manually trimmed the reads and merged them using Flash (32). We aligned the 200 reads, along with known Theileria sequences from the NCBI nucleotide database, using MAFFT (30, 33) 201 and generated a neighbor joining tree with 500 bootstraps and plotted it in MEGA7 (34). 202 To identify the species of the filarial worms detected in two individual mosquitoes, we designed primers 203 to amplify a 3.5 kb portion of the mitochondrial DNA. Briefly, we downloaded all available filarial worm 204 (Filarioidea) mitochondrial sequences from the NCBI nucleotide database, aligned them, generated a 205 consensus sequence and designed primers using primer3 (35). We then used these primers 206 (TTCGTCGTGAGACAGAGCGG, AGGCCATTGACGGATGGTTTGTAC) to amplify DNA from the two positive 207 mosquitoes using the Expand™ Long Range dNTPack kit (Sigma) using the following conditions: initial 208 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint denaturing step at 95°C for two minutes followed by 45 cycles of 92°C for 30 seconds, 55°C for 30 209 seconds and 68°C for five minutes. A final extension at 68°C for ten minutes was followed by incubation 210 at 4°C. We then performed a second PCR to add 10 bp barcodes to the 5' end of both forward and 211 reverse primers to allow differentiating both samples after sequencing. The two barcodes differed by 8 212 and 7 bases for the forward and reverse primers, respectively, with no more than 2 identical bases in a 213 row (Supplemental Table 4). For this second PCR, we used the following conditions: initial denaturing 214 step at 95°C for two minutes followed by 10 cycles of 92°C for 30 seconds, 55°C for 30 seconds and 68°C 215 for five minutes. A final extension at 68°C for ten minutes was followed by incubation at 4°C. We 216 purified the amplicons using AMPure XP beads (Beckman Coulter) (2:1 DNA:beads ratio) and then 217 combined equimolar amounts of each barcoded PCR product before circular consensus sequencing on a 218 PacBio Sequel. We then generated a consensus sequence for each sample and aligned these sequences 219 to known nematode mitochondrial sequences using Mafft (30) and generated a neighbor joining tree in 220 MEGA (36). 221

Assessment of the dynamics of viral and mosquito RNA degradation 222
To assess the dynamics of viral RNA degradation over time, we analyzed colony Culex pipiens 223 mosquitoes known to carry Culex flavivirus. The colony was initiated from diapausing adult Culex pipiens 224 that were collected from Oak Lawn and Des Plaines, IL, on 2/8/10. These two collections were combined 225 to make one colony, which was determined to be Culex flavivirus positive according reverse 226 transcriptase PCR (25). We examined three pools of five mosquitoes for each condition (i.e., stored with 227 no preservative, in ethanol or in RNAlater (Invitrogen)) and at each time point (i.e., fresh, after two-228 week or after four-week storage at room temperature). After 0, 2 or 4 weeks at room temperature, the 229 mosquitoes were stored at -80°C until RNA isolation. We isolated RNA from each pool of mosquitoes 230 using Qiazol (Qiagen) and eluted into 50 µl. We synthesized cDNA from 7 µl of RNA using m-MLV 231 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Amplicon sequencing for high-throughput characterization of microorganisms in mosquitoes 246
We analyzed 265 Anopheles mosquitoes collected in Cambodia, 665 Anopheles mosquitoes collected in 247 Guinea and Mali as well as the content of 25 light traps, each containing 50-291 mosquitoes, collected in 248 Maryland, USA. We screened each sample for a wide range of eukaryotic parasites using 10 primer sets 249 designed to amplify DNA from all species of the taxa known to include human pathogens: 250 Apicomplexans, Kinetoplastids, Parabasalids, nematodes, Platyhelminthes and Microsporidians (Table  251 2). We also screened RNA extracted from the individual African Anopheles and from the pools of 252 mosquitoes from Maryland for flaviviruses (see Materials and Methods, Table 2). After taxon-specific 253 amplification, we pooled all PCR products generated from the same mosquito together, barcoded them 254 and sequenced all libraries to generate an average of 12,703 paired-end reads per sample 255 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint (Supplemental Figure 1). After merging read pairs, stringent quality filters and removal of the products 256 of off-target amplification (e.g., Anopheles and bacteria DNA sequences), we obtained 61,177 unique 257 DNA sequences, each represented by ten reads or more, and accounting in total for 6,796,105 reads 258 (Supplemental Table 5). These sequences were amplified with all primers and from a total of 185 259 samples: 42 out of 265 Cambodian mosquitoes (16%), 120 out of 665 African mosquitoes (18%), and 23 260 out of the 25 pools (92%) of mosquitoes collected in Maryland were positive for at least one of the taxa 261 tested. On average, each sequence was supported by 1,306 reads per sample (range: 10-43,440). By 262 contrast, out of 176 negative controls, only 12 (7%) yielded any sequence from the targeted taxa and 263 those were represented by 213 reads on average (range: 10-3,539). 264

Identification of eukaryotic parasites 265
We retrieved DNA sequences identical to sequences previously amplified from Theileria parasites from 266 22 African and 15 Cambodian mosquitoes, as well as from seven of the Maryland traps. Theileria 267 sequences were successfully amplified with both the Apicomplexa and Eimeronia primer pairs. All 268 samples positive for Theileria with the Eimeronia primers were also positive with the Apicomplexa 269 primers. On the other hand, the Eimeronia primers provided sufficient information to assign each 270 sequence to a single species, while the sequences amplified with Apicomplexa primers were unable to 271 differentiate among the Theileria species (see also below). We detected sequences identical to 272 Plasmodium falciparum in eight African samples and two Cambodian samples, while sequences most 273 similar (82.0%-99.5% identity) to bird Plasmodium species were amplified from 20 of the 25 traps in 274 Maryland ( Table 3). We also amplified a sequence that was identical to several Babesia species (100% 275 identity) in one trap by two different primer pairs. Finally, we detected DNA from a known 276 apicomplexan parasite of mosquitoes, Ascogregarina barretti (37), in two of the traps. 277 From all individual mosquitoes, only one Cambodian Anopheles yielded a Kinetoplast sequence that was 278 most similar to Strigomonas culicis (96.9% identity). By contrast, 22 of the traps were positive for 279 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The taxonomic resolution of the nematode primers was lower than that of the other taxon-specific 298 primer pairs and the amplified sequences often matched multiple species (or even genera). We 299 amplified nematode sequences from 33 African Anopheles, including sequences most similar to 300 Abursanema, Acanthocheilonema,Auanema,Caenorhabditis,Dipetalonema,Filarioidea,Loa,301 Loxodontofilaria, Madathamugadia, Onchocerca, Pelecitus, Setaria or Trichuris,although the sequence 302 similarity (95.3-100%) clearly indicated that, in some cases, the exact identity of the species was 303 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint unknown (see also below). Ten Cambodian mosquitoes were positive for Setaria digitata (100% identity) 304 while other mosquitoes yielded sequences that matched Setaria and one or more of the following 305 genera: Aproctella, Breinlia,Dipetalonema,Dirofilaria,Loa,Loxodontofilaria,Madathamugadia,306 Onchocerca, Pelecitus. Nineteen different traps from Maryland produced nematode sequences with 307 particularly high read counts of Setaria, Yatesia and Dirofilaria sequences (98.9% -100% identity). Other 308 genera detected in the traps included Acanthocheilonema, Aproctella,Cercopithifilaria,Choriorhabditis,309 Dipetalonema,Elaeophora,Filarioidea,Loa,Loxodontofilaria,Onchocercidae. 310 Overall, using this single assay, we screened over 3,500 mosquitoes from three geographic locations and 311 identified DNA sequences from numerous microorganisms encompassing six classes, 12 orders and 23 312 families ( Table 3). 313

Identification of Flaviviruses in mosquitoes 314
To detect and identify flaviviruses, we used a primer pair predicted in silico to amplify a wide range of 315 flaviviruses, including all known human pathogens (25), and we validated that these primers successfully 316 (71.1%-74.3%) and they clearly separated from those viruses in phylogenetic analysis (Figure 1 and 322 Supplemental Figure 2). These sequences likely derive from viruses that have not been sequenced yet 323 but, since that they cluster with other mosquito flaviviruses (Figure 1 and Supplemental Figure 2), it is 324 likely that they represent mosquito-infecting viruses rather than new human pathogens. 325 One limitation of our study is that the mosquitoes collected in Maryland, USA were, as typical in many 326 entomological surveys, stored at room temperature upon collection which might have affected RNA 327 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint preservation. To assess the stability of viral and mosquito RNA in samples stored at room temperature, 328 we kept pools of colony mosquitoes known to be infected with Culex flavivirus at room temperature for 329 up to four weeks after collection, with and without preservative (ethanol or RNAlater). After RNA 330 extraction and cDNA synthesis, we determined the amount of mosquito and virus RNA amplifiable using 331 real-time PCR (see Material and Methods for details). Without preservative, the mosquito RNA was 332 largely degraded after two weeks (detectable in only one of three replicates) and undetectable after 333 four weeks (Supplemental Figure 3). By comparison, under the same conditions, viral RNA was still 334 detectable after four weeks (Supplemental Figure 3). As expected, when the mosquitoes were 335 preserved in either ethanol or RNAlater, neither viral nor mosquito RNA showed major change in 336 concentration over four weeks at room temperature. 337

Follow-up phylogenetic studies 338
The taxon-specific primers used in the high-throughput sequencing assay were designed to amplify all 339 members of the chosen group while avoiding off-target amplification and providing as much taxonomic 340 information as possible. However, these criteria, combined with the requirement for short sequences 341 (to be sequenceable on a massively parallel sequencer) sometimes limits their resolution. 342 Thus, the Apicomplexa primers amplified multiple Theileria sequences but did not distinguish among 343 species. We therefore amplified a longer DNA sequence (900 bp) of the 18S rRNA locus from the 344 Theileria-positive African and Cambodian mosquitoes and sequenced them using Sanger sequencing 345 technology. Phylogenetic analysis of these longer sequences, together with known Theileria species 346 sequences deposited in NCBI, showed that the parasites amplified from the Cambodian mosquitoes 347 were closely related to T. sinensis, while those from African mosquitoes were most closely related to T. 348 velifera and T. mutans (Figure 2). 349 We also detected, in several African mosquitoes, filarial worm sequences whose taxonomic assignment 350 was uncertain. One sequence was 100% identical to both Loa loa and Dipetalonema sp. YQ-2006 (also 351 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint known as Mansonella) while the sequence obtained from the same mosquitoes using a different primer 352 pair was also most similar to Dipetalonema (Mansonella) but with 99.2% identity. To clarify the 353 taxonomy of these sequences, we used PacBio long read technology to sequence 3.5kb of filarial worm 354 mitochondrial DNA (amplified by long range PCR from these two mosquitoes). We compared these 355 sequences to known filarial worm mitochondrial DNA sequences and found that these were most similar 356 to, but distinct from, Mansonella perstans (94 and 96 nucleotide differences or~97.0% identity), while 357 Loa loa was much more distantly related (~83.3% identity) (Figure 3). The genetic distance between 358 Mansonella perstans and Loa loa in this tree was much higher (519 nucleotide differences, 83.5% 359 identity) than using short amplicon data where these two sequences were identical, providing greater 360 confidence in the phylogenetic analysis. We concluded that the filarial worms are most likely either 361 Mansonella perstans or a very closely related species. 362

Analysis of individual vs pooled mosquitoes 363
We analyzed both individual mosquitoes and pools of 50-291 mosquitoes. For the pools, 23 out of the 364 25 produced sequences demonstrating that the amplification of pools of up to 291 mosquitoes is 365 feasible without significant PCR inhibition. Out of the 930 individual mosquitoes, 162 (17.4%) had at 366 least ten reads from one or more parasites or arboviruses. By comparison, 23 out of the 25 traps (92%) 367 yielded such sequences, meaning that fewer samples needed to be screened to detect pathogens when 368 samples were pooled. Note however that the individual mosquitoes and the CDC traps were not 369 collected from the same geographic locations and this could possibly cofound the results described. 370

Analysis of DNA vs. RNA extracted from the same mosquito traps 371
All primers used for detecting parasites and viruses are located within single-exon genes and can amplify 372 either DNA or cDNA with the same efficiency. However, the PCRs target genes that are typically highly 373 expressed (e.g., ribosomal RNA genes) and we would therefore expect many more copies of RNA than 374 DNA per cell, (although this could be diminished by the faster degradation of RNA molecules compared 375 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ;https://doi.org/10.1101https://doi.org/10. /2021 to DNA). To evaluate the relative sensitivity of our assay for screening DNA and RNA, we compared the 376 results obtained by analyzing matched DNA and RNA isolated from the same mosquito pools. We found 377 that for Spirurida, Kinetoplast, Microsporidia and Plasmodium PCR assays, 62.2% of the sequences 378 identified were detected only in the cDNA sample and not in the corresponding DNA sample from the 379 same trap. For those cases where a sequence was detected in both cDNA and DNA from a given trap, 380 the cDNA yielded more reads in 89 of 119 instances (with, on average, 24.5-fold more reads). Read Vector-borne disease surveillance is an essential component of infectious disease control as it can 390 enable rapid detection of outbreaks and guide targeted elimination efforts (e.g., through insecticide 391 spraying). However, current approaches are extremely demanding in regards to human and financial 392 resources, both for the sample collection and the identification of potential pathogens. Consequently, 393 public health officials and vector biologists often have to focus on a handful of parasites associated with 394 the most current threats. Current detection approaches also often lead to duplicated efforts, as 395 different agencies interested in specific pathogens perform sample collection independently and have a 396 high risk of failing to detect emerging pathogens until they cause outbreaks. Here, we describe 397 application of a genomic assay that allows identification of a wide range of pathogens that can cause 398 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint human and animal diseases, as well as of parasites of the vector that could potentially be useful as 399 biological controls. 400 The analyses of several hundred mosquitoes collected in Cambodia, Mali, Guinea and Maryland revealed 401 well-known human pathogens including P. falciparum, which was the target of the initial study of the 402 Cambodian samples (24). In addition, we detected Theileria species and Setaria digitata, which cause 403 livestock diseases in Southeast Asia (38-42). While we were initially unable to conclusively determine 404 the exact Theileria species with our initial assay, targeted follow-up studies using longer amplicons and 405 Sanger sequencing (Figure 2) revealed that the sequences amplified from the African mosquitoes were 406 most closely related to T. velifera and T. mutans, which are both known to infect African cattle (43),  407 whereas the Cambodian mosquitoes carried sequences most closely related to T. sinensis, a species that 408 infects cattle in China (44). 409 Theileria parasites are transmitted by ticks, not mosquitoes, and the DNA sequences recovered likely 410 derive from parasites taken up by the mosquitoes during a blood meal but likely not transmissible to 411 another hosts. The Schistosoma species detected in mosquitoes from Africa also likely result from 412 parasites present in a bloodmeal. In this regard, it is interesting to note that when one considers the 413 samples collected in Maryland and analyzed with both DNA and RNA, the read counts (a proxy for the 414 abundance of extracted molecules) for transmissible parasites (e.g., Plasmodium) or parasites of the 415 mosquitoes (e.g., Crithidia, Strigomonas and Takaokaspora) were typically higher in the RNA samples 416 than in the matched DNA samples while the opposite was true for parasites "sampled" during the blood 417 meal but unlikely to develop in Anopheles mosquitoes (e.g., Theileria, Trypanosoma) (Figure 4, 418 Table 6). We speculate that this difference is due to the 419 difference between developing, live, parasites still synthesizing RNA molecules and dead (possibly 420 digested) parasites for which the RNA is slowly being degraded. Comparison of DNA and RNA from the 421 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Supplemental Figure 4 and Supplemental
The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint same mosquito could perhaps provide a tool to differentiate transmissible parasites from those sampled 422 by the vector. 423 We also identified, in two African mosquitoes, sequences similar to known filarial worms but identical to 424 multiple sequences present in the database. Using this information, we characterized longer DNA 425 sequences and showed that these two mosquitoes likely carried Mansonella perstans parasites. Since 426 the PCR primers are designed to amplify any member of the selected taxa, they can reveal the presence 427 of novel pathogens as long as they are phylogenetically related to known parasites. This feature is a key 428 advantage of our assay for vector-borne disease surveillance as it may enable early detection of 429 emerging pathogens and zoonoses and provide a basis for rapid response. 430 In addition to known human parasites and potential emerging pathogens, this single-stop assay also 431 provides another source of information valuable for vector-borne disease control: 9% of the individual 432 mosquitoes and 62% of pooled mosquito samples screened yielded sequences of microsporidians 433 related to well-characterized arthropod parasites, which could potentially be used to guide the 434 development of targeted biological vector control. This ability to detect multiple parasites at once in a 435 high-throughput manner and across a wide range of taxonomical groups could reduce duplication of 436 collection efforts and costs, as mosquitoes collected for one purpose could be screened for many 437 parasites affecting both humans and animals. In addition, comprehensive characterization of the 438 parasites present in a given mosquito may also improve our understanding of the general factors 439 regulating infection and transmission: several studies have shown that immunity and previous infections 440 can influence the response of mosquitoes to human parasites and their transmission (45-47) and 441 information of current infections of wild-caught mosquitoes could, for example, significantly improve 442 our assessment of their vector capacity. 443 Several of the infectious diseases that have recently caused major public health challenges by spreading 444 outside of their typical range (7,8) or emerging as novel human infectious diseases(5, 6), are caused by 445 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ;https://doi.org/10.1101https://doi.org/10. /2021 viruses transmitted by mosquitoes. We therefore extended our assay to capture, using the same 446 approach as for eukaryotic parasites, both known and novel flaviviruses. Since flaviviruses are RNA 447 viruses and RNA degrades much faster than DNA, we first examined how nucleic acid degradation 448 influenced our ability to detect virus over time. To test RNA preservation, we collected mosquitoes 449 known to carry Culex flavivirus and isolated RNA from pools of five mosquitoes, either immediately 450 frozen or kept at room temperature for two or four weeks, with either no preservative, ethanol or 451 RNAlater. The mosquitoes stored in preservatives had minimal loss of viral (and mosquito) RNAs as 452 determined by qRT-PCR (Supplemental Figure 3). Even when stored without preservatives, viral RNA 453 were detectable after 4 weeks at room temperature (although with a reduction of, on average, 10.7 PCR 454 cycles), demonstrating a remarkable stability of the RNA, possibly due to protection provided by the 455 viral capsid (by contrast very little mosquito RNA remained amplifiable after two weeks at room 456 temperature, Supplemental Figure 3). As a proof-of-principle and to demonstrate the potential of this 457 approach for viral disease surveillance, we screened the Maryland mosquito pools and the individual 458 African mosquitoes for flaviviruses. We identified several viruses, distinct from known viruses (Figure 1) 459 and, based on their phylogenic position, likely to infect mosquitoes rather than humans. 460 Based on the results described above, we believe that this single high-throughput assay can provide a 461 wide range of information critical for vector-borne disease researchers and public health officials. 462 However, several limitations need to be noted. One caveat is that, whereas false positive detection of a 463 species is highly unlikely (aside from laboratory cross-contamination), several factors could lead to false 464 negatives. Thus, while the primers were designed to amplify all known sequences of a given taxon as 465 effectively as possible, nucleotide differences at the primer binding sites could prevent efficient 466 amplification of a specific species. This potential problem could be particularly problematic if several 467 related parasites are present in the same sample but are differentially amplified: for example, it could 468 be possible that a Plasmodium parasite might be mis-detected if the sequences generated by an 469 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ;https://doi.org/10.1101https://doi.org/10. /2021 Apicomplexan primer pair are out competed by Theileria sequences. Similarly, poor preservation of the 470 nucleic acids in one sample could also lead to false negatives. False negatives could also occur for 471 stochastic reasons: if only a few parasite cells are present in one sample (e.g., an Anopheles mosquito 472 infected by a Plasmodium ookinete) it is possible that no DNA will be present in the PCR reaction 473 (especially if the extract gets divided across many reactions). One approach to circumvent this limitation 474 could be to test cDNA instead of DNA (48, 49): our analyses of the Maryland mosquitoes showed that, 475 for many primer sets, amplification of cDNA resulted in higher read counts than amplification of DNA 476 extracted from the same samples, despite the sub-optimal preservation of these samples. Another 477 limitation is the specificity of taxonomic assignment. As discussed above, if the sequenced amplicon 478 does not contain enough information to distinguish similar species, subsequent experiments may be 479 required to confirm pathogen identity for important detection events. 480 Finally, we showed that analyses of fairly large pools of mosquitoes (up to ~300 mosquitoes) were 481 possible with our assay. This feature could be extremely useful in specific situations, such as for 482 efficiently detecting emerging pathogens, monitoring the spread of pathogens into new regions, or for 483 validating the success of elimination control programs. 484 485

Conclusion 486
This study demonstrates how our high-throughput, one-stop assay could efficiently complement current 487 toolkits to prevent vector-borne diseases by providing a comprehensive description of known and 488 emerging human viruses and parasites, informing on animal pathogens that could affect a region's 489 economy, and indicating possible biological control candidates that could be used against these disease 490 vectors. One additional feature of this sequencing-based assay is the ease of customizing it to different 491 settings and research questions. Since the assay relies on PCR primers, it is straightforward to add and 492 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made   . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021.   Theileria sequences amplified from samples positive by high-throughput sequences and those from 667 known Theileria species deposited in NCBI. Sequences amplified from Cambodian mosquitoes are 668 indicated in green circles, those amplified from African mosquitoes in red squares. 669 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint The neighbor-joining tree shows the relationships between annotated filarial worm 671 sequences and a 3.5 kb sequence amplified from two African mosquitoes (red squares) positive for 672 filarial worms and sequenced using PacBio chemistry. 673 (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 14, 2021. ; https://doi.org/10.1101/2021.01.12.426319 doi: bioRxiv preprint