Abstract
Retroviral infections create a large population of cells, each defined by a unique proviral insertion site. Methods based on short-read high-throughput sequencing can identify thousands of insertion sites, but the proviruses within remain unobserved. We have developed Pooled CRISPR Inverse PCR sequencing (PCIP-seq), a method that leverages long reads on the Oxford Nanopore MinION platform to sequence the insertion site and its associated provirus. We have applied the technique to natural infections produced by three exogenous retroviruses (HTLV-1, BLV and HIV-1), as well as to endogenous retroviruses in both cattle and sheep. The high efficiency of the method facilitated the identification of tens of thousands of insertion sites in a single sample. We observed thousands of SNPs and dozens of structural variants within proviruses. While initially developed for retroviruses, the method has also been successfully extended to DNA extracted from HPV-positive PAP smears, where it could assist in identifying viral integrations associated with clonal expansion.