PT - JOURNAL ARTICLE
AU - Song Gao
AU - Denis Bertrand
AU - Burton KH Chia
AU - Niranjan Nagarajan
TI - OPERA-LG: Efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees
AID - 10.1101/020230
DP - 2016 Jan 01
TA - bioRxiv
PG - 020230
4099 - http://biorxiv.org/content/early/2016/03/22/020230.short
4100 - http://biorxiv.org/content/early/2016/03/22/020230.full
AB - The assembly of large, repeat-rich eukaryotic genomes continues to represent a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be prohibitively expensive for larger genomes. Advances in assembly algorithms are thus essential to exploit the characteristics of short and long-read sequencing technologies to consistently and reliably provide high-quality assemblies in a cost-efficient manner. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, with consistent improvement over state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation (Illumina, Ion Torrent) and third-generation (PacBio, ONT) sequencing technologies. OPERA-LG efficiently scaffolds large genomes with provable scaffold properties, providing an avenue for systematic augmentation and improvement of 1000s of existing draft eukaryotic genome assemblies.