TY  - JOUR
T1  - LRSim: a Linked Reads Simulator generating insights for better genome partitioning
JF  - bioRxiv
DO  - 10.1101/103549
SP  - 103549
AU  - Ruibang Luo
AU  - Fritz J. Sedlazeck
AU  - Charlotte A. Darby
AU  - Stephen M. Kelly
AU  - Michael C. Schatz
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/01/26/103549.abstract
N2  - Motivation: Linked reads are a form of DNA sequencing commercialized by 10X Genomics that uses highly multiplexed barcoding within microdroplets to tag short reads to progenitor molecules. The linked reads, spanning tens to hundreds of kilobases, offer an alternative to long-read sequencing for de novo assembly, haplotype phasing and other applications. However, no linked-read simulator is available, making it difficult to measure their capability or develop new informatics tools. Results: Our analysis of 13 real linked-read datasets revealed the characteristics of their barcodes, molecules and partitions. Based on this, we introduce LRSim, which simulates linked reads by emulating the library preparation and sequencing process with fine control of 1) the number of simulated variants; 2) the linked-read characteristics; and 3) the Illumina reads profile. From the phasing and genome assembly of multiple datasets, we derive recommendations on coverage, fragment length, and partitioning when sequencing human and non-human genomes. Availability: LRSim is released under the MIT license and is freely available at https://github.com/aquaskyline/LRSIM. Contact: rluo5@jhu.edu
ER  - 