Abstract
One particularly promising feature of nanopore sequencing is the ability to reject reads, enabling real-time selection of molecules without complex sample preparation. This is based on the idea of deciding whether a molecule warrants full sequencing after reading only a small initial part of it. Previously, such decisions have been based on a priori determination of which regions of the genome were considered of interest. Here, we instead consider more general and complex strategies that incorporate already-observed data in order to optimize the rejection strategy and maximize the information gained from the sequencing process. For example, in the presence of coverage bias, redistributing data from areas of high coverage to areas of low coverage would be desirable.
We present BOSS-RUNS, a mathematical and algorithmic framework to calculate the expected benefit of new reads and generate dynamically updated decision strategies for nanopore sequencing. During sequencing, in real time, we quantify the current uncertainty at each site of one or multiple reference genomes, and for each novel DNA fragment being sequenced we decide whether the potential decrease in uncertainty at the sites it will most likely cover warrants reading it in its entirety. This dynamic, adaptive sampling allows real-time focus of sequencing efforts onto areas of highest benefit.
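The decision rule described above can be illustrated with a minimal sketch. It is not the BOSS-RUNS implementation: it assumes, purely for illustration, that per-site uncertainty is the Shannon entropy of a pseudocount-smoothed base distribution, that the benefit of a read is the summed entropy over the sites it is expected to cover, and that a read is accepted when this benefit reaches a fixed threshold. The names `site_entropy`, `read_benefit`, and `accept_read`, and the `pseudocount` and `threshold` parameters, are hypothetical.

```python
import math


def site_entropy(counts, pseudocount=1.0):
    """Shannon entropy (bits) of the smoothed base distribution at one site.

    `counts` holds the observed A, C, G, T counts. Entropy is an assumed
    stand-in for "uncertainty"; the paper's own measure may differ.
    """
    total = sum(counts) + pseudocount * len(counts)
    probs = [(c + pseudocount) / total for c in counts]
    return -sum(p * math.log2(p) for p in probs)


def read_benefit(entropies, start, expected_length):
    """Sum the current uncertainty over the sites a read would likely cover."""
    end = min(start + expected_length, len(entropies))
    return sum(entropies[start:end])


def accept_read(entropies, start, expected_length, threshold):
    """Accept the read if its illustrative benefit reaches the threshold."""
    return read_benefit(entropies, start, expected_length) >= threshold


# Toy reference: five well-covered sites followed by five uncovered sites.
coverage = [[50, 0, 0, 0]] * 5 + [[0, 0, 0, 0]] * 5
entropies = [site_entropy(c) for c in coverage]

# A fragment mapping to the well-covered region adds little information,
# while one mapping to the uncovered region adds much more.
keep_high = accept_read(entropies, start=0, expected_length=5, threshold=5.0)
keep_low = accept_read(entropies, start=5, expected_length=5, threshold=5.0)
```

In this toy setting the fragment starting in the well-covered region is rejected and the one starting in the uncovered region is sequenced in full, which mirrors the intended redistribution of data from high-coverage to low-coverage areas. In a dynamic strategy, `entropies` would be recomputed as new reads accumulate.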
We demonstrate the effectiveness of BOSS-RUNS by mitigating coverage bias across and within the species of a microbial community. Additionally, we show that our approach leads to improved variant calling due to its ability to sample more data at the most relevant genomic positions.
Competing Interest Statement
ML was a member of the ONT MinION access program and has received free flow cells and sequencing reagents in the past. ML has received reimbursement for travel, accommodation and conference fees to speak at events organized by ONT. EB is a paid consultant to ONT and a small-scale equity and options holder in ONT.
Footnotes
Major update with new methods, algorithmic accelerations, a description of the software and its availability, and experimental results.