A random priming amplification method for whole genome sequencing of SARS-CoV-2 and H1N1 influenza A virus

Background: Non-targeted whole genome sequencing is a powerful tool for comprehensively identifying the constituents of microbial communities in a sample. Because the analysis does not have to be directed at any particular organism before sequencing, it can reduce the introduction of bias and false-negative results. It also allows the assessment of genetic aberrations in the genome (e.g., single nucleotide variants, deletions, insertions and copy number variants), including in non-protein-coding regions. Methods: The performance of four different random priming amplification methods for recovering RNA viral genetic material of SARS-CoV-2 was compared in this study. In method 1 (H-P) the reverse transcriptase (RT) step was performed with random hexamers, whereas in methods 2-4 the RT step incorporated an octamer primer with a known tag. In methods 1 and 2 (K-P) sequencing was applied to material derived from the RT-PCR step, whereas in methods 3 (SISPA) and 4 (S-P) an additional amplification was incorporated before sequencing. Results: The SISPA method was the most effective and efficient of the methods we tested for non-targeted/random priming whole genome sequencing of SARS-CoV-2. The SISPA method described in this study allowed for whole genome assembly of SARS-CoV-2 and influenza A(H1N1)pdm09 in mixed samples. We determined the limit of detection and characterization of SARS-CoV-2 virus, which was 10^3 pfu/ml (Ct, 22.4) for whole genome assembly and 10^1 pfu/ml (Ct, 30) for metagenomics detection. Conclusions: The SISPA method is particularly useful for obtaining genome sequences from RNA viruses or investigating complex clinical samples, as no prior sequence information is needed. It might be applied to monitor genomic virus changes and virus evolution, and can be used for fast metagenomics detection or to assess the general picture of different pathogens within a sample.


Metagenomics detection
Three independent methods were used to detect the presence of the viruses in the samples (Figure 5).
(1) Assembly: The first method used the contigs assembled by the SPAdes assembler within an in-house pipeline. If a contig was larger than 150 bases (i.e., the average read size), a random 100 bp segment of that contig was sampled. These samples were aligned with BLAST to the nt database. If any of the sampled segments mapped to a virus, its top ten hits were examined, and the contig it was derived from was aligned to the nt database with BLAST (allowing a maximum of 10 hits per contig). The resulting BLAST alignments were collated to generate a coverage graph of the contigs along the viruses they mapped to.
(2) K-mer analysis: The second method analysed k-mers in individual reads (Figure 5). Each read was inspected using Kraken and its minikraken database to build a report containing the possible organisms the sequences originated from and the number of reads supporting their presence. References for any organisms with a minimum of 100 reads were downloaded and reads were mapped to these references using
(3) Mapping: The final method is the alignment of reads to reference SARS-CoV-2 and influenza A genomes. These alignments were used to generate read depth graphs (Figure 5).
The first (assembly) method can identify organisms if they are present in the sequencing data at a concentration high enough to be assembled. The second method can detect viruses at a lower concentration. The final method is sensitive if the references are close to the isolates in the samples. We considered a virus to be present in the sample if the plots showed non-random coverage (e.g., uniform coverage or a long stretch with coverage) of a closely related viral genome.
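The contig-sampling step of method (1) and the read-count threshold of method (2) can be sketched as follows. This is a minimal illustration and not the in-house pipeline itself: the function names, the in-memory dictionaries and the optional seeded random generator are assumptions made for the example. Only the 150-base cutoff, the 100 bp segment length and the 100-read minimum come from the text.

```python
import random

MIN_CONTIG_LEN = 150   # contigs must be larger than this (average read size, from the text)
SEGMENT_LEN = 100      # length of the random segment sampled from each contig
MIN_READS = 100        # minimum number of supporting reads to keep an organism


def sample_segments(contigs, rng=None):
    """Return one random 100 bp segment for every contig longer than 150 bases.

    `contigs` maps a contig name to its sequence (hypothetical in-memory form).
    """
    rng = rng or random.Random()
    segments = {}
    for name, seq in contigs.items():
        if len(seq) > MIN_CONTIG_LEN:
            # Pick a start position so the segment fits inside the contig.
            start = rng.randrange(len(seq) - SEGMENT_LEN + 1)
            segments[name] = seq[start:start + SEGMENT_LEN]
    return segments


def organisms_to_map(read_counts, min_reads=MIN_READS):
    """Keep organisms supported by at least `min_reads` reads.

    `read_counts` maps organism name -> read count (a hypothetical summary
    of a classifier report, not the report format itself).
    """
    return sorted(org for org, n in read_counts.items() if n >= min_reads)


# Usage: only the 200-base contig is sampled; the 120-base one is skipped.
contigs = {"contig_1": "ACGT" * 50, "contig_2": "ACGT" * 30}
segments = sample_segments(contigs, random.Random(0))
```

The sampled segments would then be passed to the alignment step, and the organisms returned by `organisms_to_map` would be the ones whose references are downloaded for read mapping.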

Results
Four random amplification methods coupled with Illumina sequencing.
In this study, four random amplification methods coupled with Illumina sequencing were compared for their ability to obtain full genome sequences of SARS-CoV-2 virus (Figure 1). Whole genome amplification (WGA) of RNA material starts with RNA extraction, followed by conversion of RNA into cDNA and then dsDNA synthesis. Once synthesised, the dsDNA can be used directly for library preparation using Nextera XT DNA (Illumina), or further amplified in PCR or isothermal reactions before being used for library preparation. For method 1 (H-P), a simple, isothermal, random-hexamer-primed, phi29 DNA polymerase-based whole genome amplification was applied to the dsDNA obtained from an RT-PCR step performed with the SuperScript™ IV One-Step RT-PCR system (ThermoFisher) and random hexamer primers. For method 2 (K-P), method 3 (SISPA) and method 4 (S-P), the hexamer primer in the RT-PCR step was replaced with primer K-8N (Material and Methods section). This primer (K-8N) contains a known tag (here called "K") linked to a random octamer (8-N). Following the RT-PCR step, the tag is incorporated randomly into the cDNA. Klenow DNA polymerase was used to generate dsDNA in an isothermal reaction (Material and Methods section). The final product obtained after the RT-PCR and Klenow reactions in methods 2, 3 and 4 is tagged dsDNA (the "K" sequence incorporated into the dsDNA). This dsDNA was then used for isothermal (method 2) or PCR-based (methods 3 and 4) amplification. In method 2, the aim was to use an isothermal reaction to amplify and elongate the dsDNA fragments. For that reason, we used multiple displacement amplification (MDA) by Φ29 DNA polymerase with a mix of hexamer and K-8N primers.
Finally, for method 3 (SISPA) and method 4 (S-P), PCR-based amplification was used, where the aim was to amplify the dsDNA using primer K (Material and Methods section), which binds to the primer tag so that the tag works as a primer-binding extension site in the PCR reaction. Method 4 (S-P) had an additional MDA step after PCR to amplify and elongate the template by Φ29 DNA polymerase using only hexamer primers, without any modification of the protocol (Genomiphi™ V2, Material and Methods section).

Comparison of the methods to sequence the whole genome of SARS-CoV-2 when abundant genetic material was present.
In this study, assembly of the full or near-full genome (≥ 97% genome coverage) of SARS-CoV-2 virus was achieved using all four amplification methods tested when a high titre of ENG-2 virus was analysed (2.6x10^6 pfu/ml, CT value: 12.22) (Table 1). Under conditions of abundant genetic material, the SISPA method (method 3) produced the highest number of reads that mapped to the SARS-CoV-2 reference genome and the highest average depth of genome coverage (Table 1, Suppl. Figure 1). The percentage of reads mapped to the reference SARS-CoV-2 virus genome was 47.35% and 14.79% for SISPA (method 3) and S-P (method 4), respectively, whilst for H-P (method 1) and K-P (method 2) amplification it was below 1% of the total sequencing reads generated ( CoV-2 viral genome assembly using SISPA (method 3) is shown in Table 2 and demonstrates that at the high virus titre, both reference mapping and de novo assemblies produced a full genome sequence with high depth of coverage per gene. Depth of coverage was above 10,000 nucleotides per base for the viral genes orf1ab, orf7b, orf8, N and orf10, and above 2,000 nucleotides per base for S, orf3a, E, M and orf6 (Table 2 and Suppl. Figure 2).

The SISPA (method 3) for WGS of SARS-CoV-2 is reproducible.
We applied the SISPA method to four other cell-cultured SARS-CoV-2 isolates (EDB-2, EDB-8, EDB-10 and EDB-12) to assess the reproducibility of the method in giving depth of coverage across the whole genome (Table 3, Figure 2). The distribution of SARS-CoV-2 sequencing reads is shown in Figure 2; full genome assembly was achieved for all four additional isolates. The percentage of reads obtained after sequencing that mapped to the reference SARS-CoV-2 genome, resulting in complete genome assembly, was between 33% and 84% for the SARS-CoV-2 viruses tested (Table 3). We obtained a high average coverage depth across the genome for all viral genes, with a mean of 46,181.62 nucleotides per base (ranging from 16,935.4 to 70,780 nucleotides per base) (Table 3). The coverage depth per base position for each viral gene was at least 20,000 nucleotides per base for the orf1ab gene (from 20,330 to 88,920), 10,000 for orf7b (from 12,817 to 29,776) and orf8 (from 11,888 to 27,124), 5,000 for orf7a (from 5,222 to 18,832), 3,000 for the S gene (from 3,351 to 21,345) and orf3a (from 3,600 to 15,605), 2,000 for E and orf6 (from 2,155 to 16,381), and 1,500 for the M gene (from 1,675 to 15,179). A very high depth of coverage was achieved for the N gene (above 30,000 nucleotides per base) and the orf10 gene (above 85,000 nucleotides per base) (Table 3).

Comparison of limits of detection for SARS-CoV-2 virus for the four amplification methods.
To assess the limit of detection and the limits on full genome sequence assembly for each of the four method protocols, we used a ten-fold dilution series of the Eng-2 SARS-CoV-2 virus. As anticipated, for all methods the genome coverage and depth of coverage correlated with virus titre (Figure 3 and Suppl. Figure 1).
Our results showed that a full genome sequence could be assembled with a low abundance of viral genetic material, at a minimum viral titre of 2.6x10^3 pfu/ml (CT: 22.4), using the SISPA or S-P protocols (methods 3 and 4) (Table 1, Figure 3, Suppl. Figure 1). The percentage of reads mapped to the reference genome at this low virus titre was between 2% and 5% (S-P and SISPA, respectively). The average depth of coverage for the SISPA method at a virus load of 2.6x10^3 pfu/ml (CT: 22.4) was 248 nucleotides per base (ranging from 1,100 for the orf1ab gene to 60 for the orf3a and E genes) (Table 2 and Suppl. Figure 2). In comparison, the H-P and K-P protocols (methods 1 and 2) were able to produce full genome assemblies only when the input virus titre was high, above 2.6x10^6 pfu/ml (Table 1 and Figure 3). Even at this input, however, the depth of coverage and the percentage of mapped viral reads recovered with the H-P and K-P methods were low (below 1%). These two methods were therefore excluded from further analysis, because overwhelming competition from non-specific or host genome sequences did not permit assembly of the SARS-CoV-2 viral genome. Below 10^3 pfu/ml we were not able to assemble a full or near-full SARS-CoV-2 viral genome by any of the methods applied (Table 1, Figure 3 and Suppl. Figure 1). However, at 2.6x10^2 pfu/ml both the SISPA and S-P methods gave over 80% coverage of the SARS-CoV-2 genome (Table 1 and Figure 3). In-depth analysis of the SISPA method indicated that the 84% genome coverage corresponded to 100% coverage of the following open reading frames: orf7a, orf7b, orf8, N and orf10, whilst orf1ab was 95.5% covered and S 84.5% covered (Table 2 and Suppl. Figure 2). The genome region containing orf3a, E, M and orf6, a contiguous region between nucleotides 25400 and 27350 of the SARS-CoV-2 genome, had no coverage (Table 2 and Suppl. Figure 2).
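The two summary metrics used throughout this section, genome coverage and average depth of coverage, can be computed from a per-base depth vector. The sketch below is an illustration only: the function name, the list-based input and the minimum depth of 1 read for a position to count as covered are assumptions, not details of the study's pipeline. The 97% cutoff for a full or near-full genome is the one stated in the text.

```python
def summarise_coverage(depths, min_depth=1, full_cutoff=97.0):
    """Summarise per-base read depths for one reference genome.

    depths      -- list of read depths, one value per reference position
    min_depth   -- depth at which a position counts as covered (assumed: 1)
    full_cutoff -- percent coverage regarded as full or near-full (97%, from the text)
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    percent_covered = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return {
        "percent_covered": percent_covered,
        "mean_depth": mean_depth,
        "full_or_near_full": percent_covered >= full_cutoff,
    }


# Toy 10-position genome with a 2-position gap (hypothetical values).
toy_depths = [5, 5, 5, 0, 0, 5, 5, 5, 5, 5]
summary = summarise_coverage(toy_depths)
```

Restricting `depths` to the positions of a single gene gives the per-gene figures of the kind reported in Table 2, for example a region with no coverage yields 0% covered and a mean depth of 0.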
The average depth of coverage and the number of reads mapped to the reference genomes using the SISPA or S-P method decreased drastically (to below 1%) at inputs below 10^2 pfu/ml of virus, with no reproducible mapping possible at this level. This suggests that the limit for whole genome assembly of SARS-CoV-2 using the SISPA method (or the S-P method) is above 10^3 pfu/ml but that, depending on the genome region of interest, SISPA could give detail down to 10^2 pfu/ml (Ct = 25.34).
Although the genome coverage of virus with a titre below 10^2 pfu/ml (Ct = 25.32) decreased drastically, it was still possible to detect the SARS-CoV-2 genome after SISPA or S-P amplification using metagenomics. The limit of detection using metagenomics was 2.6x10^1 pfu/ml (CT: 29.34) using the SISPA or S-P methods (Table 1). No virus was detected by metagenomics above a CT value of 30 in this study.
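As a rough cross-check of the titre and Ct pairs quoted in this section, the expected Ct shift across a ten-fold dilution series can be sketched under the textbook assumption of perfect PCR efficiency (template doubling every cycle), in which each ten-fold dilution adds log2(10), about 3.32 cycles. This idealised model is an assumption introduced here for illustration and is not part of the study; measured Ct values will deviate from it.

```python
import math


def expected_ct(start_ct, tenfold_dilutions, efficiency=1.0):
    """Expected Ct after a number of ten-fold dilutions.

    efficiency=1.0 means the template doubles every cycle (idealised assumption).
    """
    cycles_per_log10 = 1.0 / math.log10(1.0 + efficiency)
    return start_ct + tenfold_dilutions * cycles_per_log10


# Starting point reported in the text: 2.6x10^6 pfu/ml at CT 12.22.
for n in range(6):
    print(f"2.6x10^{6 - n} pfu/ml -> idealised Ct {expected_ct(12.22, n):.2f}")
```

For three ten-fold dilutions the model gives a Ct of about 22.2, close to the reported CT of 22.4 at 2.6x10^3 pfu/ml.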

Full genome recovery of SARS-CoV-2 and A(H1N1)pdm09 influenza virus multiplexed in a single reaction
The SISPA method produced the most viral reads of the four methods employed at all the dilutions tested; we therefore used this method to recover full genome sequences of SARS-CoV-2 and A(H1N1)pdm09 influenza virus mixed together in a single sample (Figure 4, panel B). To assess the limit of detection for full genome assembly of SARS-CoV-2 virus in a mixed viral sample, 10-fold serially diluted SARS-CoV-2 viral RNA (initial concentration 2.6x10^6 pfu/ml, Ct value of 13.61) was spiked with a constant amount of A(H1N1)pdm09 viral RNA (Ct = 24.88 ± 0.19). The full genome sequences of A(H1N1)pdm09 and SARS-CoV-2 were assembled from each sample by de novo assembly and reference mapping (Figure 4, panel B). In all samples sequenced, the influenza virus genome was fully sequenced by de novo methodology. The full genome of SARS-CoV-2 was assembled only at initial viral titres of 2.6x10^5 pfu/ml (CT: 17) and above (Figure 4B and Table 4). This differed from the scenario of SARS-CoV-2 alone, when we were able to WGS the virus at a viral titre greater than 2. in this study the SISPA protocol allowed for whole genome assembly of both viruses using only one primer in a sequence-independent reaction. The percentage of SARS-CoV-2 viral reads obtained at high virus load ranged from 33% to 84% depending upon the sample and resulted in full coronavirus genome assemblies. However, 2% to 14% of viral-specific sequencing reads was enough to successfully assemble the full or near-full SARS-CoV-2 genome, depending upon whether the sample was a single infection or a mixed infection with influenza virus, respectively.
Moreover, we have obtained high ( pharyngeal virus shedding is very high during the first week of symptoms, with a peak at 7.11 × 10^8 RNA copies per throat swab on day 4, followed by an average titre of 3.44 × 10^5 copies per swab after day 5 of infection, whereas the average viral load in sputum samples was 7.00 × 10^6 copies per ml, with a maximum of 2.35 × 10^9 copies per ml at the For the whole genome assembly, we showed that the viral-specific sequencing reads appear to be distributed between the RNA viruses contained in the sample rather than any host-derived sequences, and correlated directly with the initial input viral load. Although the S-P technique presented in this study did not improve sequencing depth after Illumina sequencing, it might also be a useful method to consider when Oxford Nanopore sequencing of SARS-CoV-2 is used, as this method produces a longer average dsDNA fragment size of template for further library preparation (we obtained an average dsDNA fragment size of 20 kb, ranging from 17.9 kb to 22.2 kb). An overwhelming number of non-viral sequencing reads were obtained with the H-P or K-P methods, resulting in fewer than 1% virus sequencing reads even when a high virus titre was used in this study. This should be considered when applying hexamer-only amplification, as it can result in low depth of coverage, making it impossible to perform viral genome assembly when the virus titre is low; pre-detection methods are then required so that mapping can be directed rather than de novo. Compared with other recently published studies that utilize PCR-based targeted enrichment and either Illumina or Oxford Nanopore sequencing [10, 63, 64], the main advantage of the SISPA (and/or S-P) method presented in this study is its simplicity (e.g.
only one K-8N primer used) and the possibility of applying the method to any unknown sample, as no prior knowledge about the pathogen is needed. As we showed here, this protocol was successfully applied to a mixed infection of SARS-CoV-2 and influenza A(H1N1)pdm09 viruses in a single reaction and allowed us to recover whole genome sequences of both viruses. Interestingly, a decreased number of SARS-CoV-2 viral-specific sequencing reads loosely correlated with an increased number of influenza A(H1N1)pdm09 virus-specific reads (y = -0.6758x + 35.468, R^2 = 0.43), but importantly we did not observe an increase in GalGal host genome sequencing reads, suggesting that the method presented here is capable of selectively recovering low-abundance viral RNA genetic sequences. However, it is important to mention that targeted whole genome sequencing, where multiple pairs of primers are used, also resolves the problem of generating an overwhelming number of host genome sequencing reads, even though it does not allow for assembly of multiple pathogens in a single reaction. Hence, the sequencing method of choice depends on the aims of the study in which the method is applied. The ARTIC network offers the most up-to-date targeted whole genome sequencing methods (https://artic.network/ncov-2019).
Furthermore, we assessed the feasibility of virus identification and estimated its limit of detection for diagnosis of COVID-19 infection or co-infection with influenza viruses. We showed that by using the SISPA or S-P protocols presented in this study, the full genome sequence can be assembled when initial viral titres are as low as 2.6x10^3 pfu/ml for SARS-CoV-2 virus alone in the sample, and approximately 10^5 pfu/ml (SISPA method) if it is a mixed infection of both viruses with influenza virus at high titre.
However, it is unknown how likely both viruses are to be found at a high viral load in a single clinical sample, or how one virus will influence the replication of the other [65,66]. We also assessed the detection limit of the amplification methods presented in this study using metagenomics approaches. The in-house metagenomics pipeline (Figure 5) enabled us to detect SARS-CoV-2 virus in the sample when the initial virus titre corresponded to a Ct value of approximately 30, regardless of whether the sample was a single or mixed infection, and no prior sequence information was needed. This might suggest that the method presented here should allow detection of asymptomatic or pre-symptomatic patients as median Ct value (for two genetic targets: the N1 and N2 viral ORF1 and E-gene, respectively, which indicate that the distribution of Ct values observed in symptomatic patients is approximately 5 Ct value above our metagenomics pipeline limit of detection. Metagenomics analysis of samples that contain less virus than corresponds to a Ct of 30 might be possible; however, for that purpose a pre-processing step in sample preparation, such as DNase treatment or viral concentration techniques [71], might need to be applied to improve the efficacy of viral amplification and sequencing. Notably, the method presented here does not rely on primer specificity, unlike conventional qRT-PCR [72], and therefore changes in viral genomes (mutations or deletions) do not affect pathogen detection. Previous studies have shown active genetic recombination events in SARS-CoV-2 genomes, which may reduce the accuracy of conventional qRT-PCR detection; primers should therefore be chosen carefully to address these challenges [31, 73-75].

Conclusion
In conclusion, the performance of four different random priming amplification methods for recovering RNA viral genetic material (SARS-CoV-2) was compared in this study. The SISPA technique allowed for whole genome assembly of SARS-CoV-2 and influenza A(H1N1)pdm09 in mixed-virus single samples. We assessed the limit of detection and characterization of SARS-CoV-2 virus, which lies at 10^3 pfu/ml (Ct, 22.4) for full-length SARS-CoV-2 genome assembly and at a Ct of 30 for virus detection. We also presented the S-P technique, which might be useful for Oxford Nanopore real-time sequencing, as no non-targeted primer-based protocol is available yet. The whole genome sequences recovered with the SISPA (or S-P) method presented in this study are free of primer bias and allowed for polymorphism analysis. This method is particularly useful for obtaining genome sequences from RNA viruses or investigating complex clinical samples (such as mixed infections in a single reaction), as no prior sequence information is needed. The method might be useful to monitor SARS-CoV-2 changes such as mutations or deletions in the virus genome, to perform simple and fast metagenomics detection, and to assess the general picture of different microbes within the sample, which might help to identify other co-factors associated with COVID-19 infection.

Tables
Table 1. Comparison of the performance of four different random priming amplification methods for recovering RNA viral genetic material of the SARS-CoV-2 genome. SARS-CoV-2 was quantified by plaque assay titration on Vero E6 cells (pfu/ml) and qRT-PCR (CT value). For metagenomics, three independent methods were used to detect the presence of the virus in the samples.
Kraken: each read was inspected using Kraken and its database to build a report containing the possible organisms the sequences originated from and the number of reads supporting their presence. Blast: if a contig assembled by SPAdes using the in-house pipeline was larger than 150 bases, a random 100 bp segment of that contig was sampled. These samples were aligned with BLAST to the nt database. Align: the final method is the alignment of sequencing reads to a customised reference database.

was 10-fold serially diluted (from 2.3x10^6 pfu/mL, marked as "0" on the x-axis).

represents single SARS-CoV-2 virus genome assembly (the percentage of genome coverage after reference mapping and de novo assemblies). The virus was 10-fold serially diluted, starting from a viral load of 2.3x10^6 pfu/ml, marked as "0" on the x-axis, followed by 2.3x10^5 pfu/mL (marked as 1), 2.3x10^4 pfu/mL (marked as 2), etc. Right panel (B) represents genome assembly (the percentage of genome coverage after reference mapping and de novo assemblies) of two viruses, SARS-CoV-2 and influenza A(H1N1)pdm09, in a mixed-virus single sample. SARS-CoV-2 virus was 10-fold serially diluted, starting from a viral load of 2.3x10^6 pfu/mL, marked as "0" on the x-axis, and was spiked with a constant amount of H1N1 virus (7.4x10^6 pfu/mL).

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
The work described herein was funded by The Pirbright Institute BBSRC ISP grants BBS/E/I/00007037 and BBS/E/I/00007039. The funders had no role in the design of the study, in the collection, analysis and interpretation of data, or in writing the manuscript.

Author contributions
The work was conceptualized by KC and HS. Experimental work was executed by KC, CT, DB, GF and JF. The manuscript was written by KC and HS and edited by all authors.

Acknowledgments
We acknowledge the Pirbright High Throughput Sequencing unit and the provision of SARS-CoV-2 strains from Public Health England and Dr Christine Tait

Goldstein T, Anthony SJ, Gbakima A, Bird BH, Bangura J, Tremeau-Bravard A, Belaganahalli MN, Wells HL, Dhanota JK, Liang E et al: The discovery of Bombali virus adds further support for bats as hosts of ebolaviruses. Nat Microbiol 2018, 3(10):1084-1089.
18. Callegaro A, Di Filippo E, Astuti N, Ortega PA, Rizzi M, Farina C, Valenti D, Maggiolo F: Early clinical response and presence of viral resistant minority variants: a proof of concep
ong-Term Care Skilled Nursing Facility - King County, Wa
Large-Scale, In-House Production of Viral Transport Media To Support SARS-CoV-2 PCR Testing in a Multihospital Health Care Network during the COVID-1