Abstract
In QIAseq targeted DNA panels, synthetic primers (short single-strand DNA sequences) are used for target enrichment via complementary DNA binding. Off-target priming could occur in this process when a primer binds to some loci where the DNA sequences are identical or very similar to the target template. These off-target DNA segments go through the rest of the workflow, wasting sequencing resources in unwanted regions. Off-target cannot be avoided if some segments of the target region are repetitive throughout the genome, nor can it be quantified until after sequencing. But if off-target rates can be prospectively predicted, scientists can make informed decisions about investment on high off-target panels.
We developed pordle (predicting off-target rate with deep learning and epcr07), a convolutional neural network (CNN) model to predict off-target binding events of a given primer. The neural network was trained using 10 QIAseq DNA panels with 29,274 unique primers and then tested on an independent QIAseq panel with 7,576 primers. The model predicted a 10.5% off-target rate for the test panel, a -0.1% bias from the true value of 10.6%. The model successfully selected the better primer (in terms of off-target rate) for 89.2% of 3,835 pairs of close-by primers in the test panel whose off-target rates differ by at least 10%. The order-preserving property may help panel developers select the optimal primer from a group of candidates, which is a common task in panel design.
Competing Interest Statement
The authors have declared no competing interest.