Abstract
Motivation Simulating high-throughput sequencing reads that mimic empirical sequence data is of major importance for designing and validating sequencing experiments, as well as for benchmarking bioinformatic workflows and tools.
Results Here, we present InSilicoSeq 2.0, a software package written entirely in Python that simulates realistic Illumina-like sequencing reads for a variety of sequencing machines and assay types. In addition to the existing functionality, InSilicoSeq now supports amplicon-based sequencing and comes with premade error models of various quality levels for the MiSeq, HiSeq, NovaSeq and NextSeq sequencing platforms. It also allows users to generate custom error models for any short-read sequencing platform from a BAM file. Furthermore, it has improved computational performance and a reduced memory footprint compared to the original implementation. We demonstrate the novel amplicon sequencing capability by simulating Adaptive Immune Receptor Repertoire (AIRR) reads and show that the PHRED scores of the simulated reads closely resemble those of actual sequencing data. InSilicoSeq 2.0 generated 15 million amplicon-based paired-end reads in under an hour at a total cost of €4.3e-05 per million bases, which makes a case for testing experimental designs through simulation prior to actual sequencing.
Availability Source code is freely available under the MIT licence at https://github.com/HadrienG/InSilicoSeq
Competing Interest Statement
The authors have declared no competing interest.