Abstract
The mechanisms and consequences of defective interfering particle (DIP) formation during influenza virus infection remain poorly understood. The development of next-generation sequencing (NGS) technologies has made it possible to identify large numbers of DIP-associated sequences, providing a powerful tool for understanding their biological relevance. However, NGS approaches pose numerous technical challenges, including the precise identification and mapping of deletion junctions in the presence of frequent mutations and base-calling errors, and the potential for numerous experimental and computational artifacts. Here we detail an Illumina-based sequencing framework and bioinformatics pipeline capable of generating highly accurate and reproducible profiles of DIP-associated junction sequences. We use a combination of simulated and experimental control datasets to optimize pipeline performance and demonstrate the absence of significant artifacts. Finally, we use this optimized pipeline to generate a high-resolution profile of DIP-associated junctions produced during influenza virus infection and demonstrate how these data can provide insight into mechanisms of DIP formation. This work highlights the specific challenges associated with NGS-based detection of DIP-associated sequences and details the computational and experimental controls required for such studies.