PT - JOURNAL ARTICLE
AU - Fahad Alqahtani
AU - Ion I. Măndoiu
TI - SMART: Statistical Mitogenome Assembly with Repeats
AID - 10.1101/795633
DP - 2019 Jan 01
TA - bioRxiv
PG - 795633
4099 - http://biorxiv.org/content/early/2019/10/16/795633.short
4100 - http://biorxiv.org/content/early/2019/10/16/795633.full
AB - By using next-generation sequencing technologies it is possible to quickly and inexpensively generate large numbers of relatively short reads from both the nuclear and mitochondrial DNA contained in a biological sample. Unfortunately, assembling such whole-genome sequencing (WGS) data with standard de novo assemblers often fails to generate high quality mitochondrial genome sequences due to the large difference in copy number (and hence sequencing depth) between the mitochondrial and nuclear genomes. Assembly of complete mitochondrial genome sequences is further complicated by the fact that many de novo assemblers are not designed for circular genomes, and by the presence of repeats in the mitochondrial genomes of some species.
In this paper we describe the Statistical Mitogenome Assembly with Repeats (SMART) pipeline for automated assembly of complete circular mitochondrial genomes from WGS data. SMART uses an efficient coverage-based filter to first select a subset of reads enriched in mtDNA sequences. Contigs produced by an initial assembly step are filtered using BLAST searches against a comprehensive mitochondrial genome database, and used as “baits” for an alignment-based filter that produces the set of reads used in a second de novo assembly and scaffolding step. In the presence of repeats, the possible paths through the assembly graph are evaluated using a maximum-likelihood model.
Additionally, the assembly process is repeated a user-specified number of times on re-sampled subsets of reads, and the reconstructed sequences with the highest bootstrap support are selected for annotation.
Experiments on WGS datasets from a variety of species show that the SMART pipeline produces complete circular mitochondrial genome sequences with a higher success rate than current state-of-the-art tools, even from low-coverage WGS data. The pipeline is available through an easy-to-use web interface at https://neo.engr.uconn.edu/?tool_id=SMART.
Abbreviations: SMART, Statistical Mitogenome Assembly with Repeats; 1KGP, 1000 Genomes Project; ATP, adenosine triphosphate; NGS, next-generation sequencing; WGS, whole-genome sequencing; WES, whole-exome sequencing; COI, cytochrome c oxidase I; NCBI, National Center for Biotechnology Information; BLAST, Basic Local Alignment Search Tool; TPR, true positive rate; PPV, positive predictive value.
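The abstract does not say how SMART's coverage-based read filter is implemented. The Python sketch below is purely illustrative and assumes a k-mer-counting approach, one common way to exploit the much higher copy number of mtDNA; it is not SMART's code. Every read's k-mers are counted across the whole read set, and a read is kept only when the median count of its k-mers reaches a threshold. The names `canonical`, `read_kmers`, `coverage_filter` and the parameters `k` and `min_count` are hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical helper: map a k-mer and its reverse complement to one key,
# so reads from either DNA strand contribute to the same count.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def read_kmers(read: str, k: int):
    """Yield the canonical k-mers of a read."""
    for i in range(len(read) - k + 1):
        yield canonical(read[i:i + k])


def coverage_filter(reads, k=5, min_count=3):
    """Illustrative coverage-based filter (assumed k-mer approach, not SMART's code).

    High-copy sequence such as mtDNA produces k-mers that recur in many reads,
    so its reads have a high median k-mer count; single-copy nuclear reads
    usually stay below min_count and are discarded.
    """
    counts = Counter()
    for read in reads:
        counts.update(read_kmers(read, k))
    kept = []
    for read in reads:
        kmer_counts = [counts[km] for km in read_kmers(read, k)]
        # Reads shorter than k have no k-mers and are skipped.
        if kmer_counts and median(kmer_counts) >= min_count:
            kept.append(read)
    return kept


# Toy data: one "mitochondrial" read sequenced 4 times, two single-copy "nuclear" reads.
reads = ["ACGGTCATGCA"] * 4 + ["TTTTTGGGGG", "GATTACAGAT"]
print(coverage_filter(reads))  # ['ACGGTCATGCA', 'ACGGTCATGCA', 'ACGGTCATGCA', 'ACGGTCATGCA']
```

The small `k` and `min_count` values only keep the toy example readable; a filter applied to real WGS data would use a much larger k and a threshold tuned to the sequencing depth of the sample.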