RT Journal Article
SR Electronic
T1 Metassembler: Merging and optimizing de novo genome assemblies
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 016352
DO 10.1101/016352
A1 Alejandro Hernandez Wences
A1 Michael C. Schatz
YR 2015
UL http://biorxiv.org/content/early/2015/03/10/016352.abstract
AB Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for metassembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net.
Abbreviations:
CEGMA - Core Eukaryotic Genes Mapping Approach
CE-statistic - Compression/Expansion statistic
Comp Ref Bases - Compressed Reference Bases
Ctg NG50 - Contig N50 size relative to the estimated/reference genome size
Ctg GC-NG50 - Contig GAGE Corrected N50, relative to the reference genome size
Ctg RC-NG50 - Contig REAPR Corrected N50, relative to the estimated genome size
Dup Ref Bases - Duplicated Reference Bases
GAGE - Genome Assembly Gold Standard Evaluation
GAM-NGS - Genomic Assemblies Merger for Next Generation Sequencing
ICA - Independent Component Analysis
PCA - Principal Components Analysis
REAPR - Recognising Errors in Assemblies using Paired Reads
Scf NG50 - Scaffold N50 size relative to the estimated/reference genome size
Scf GC-NG50 - Scaffold GAGE Corrected N50, relative to the reference genome size
Scf RC-NG50 - Scaffold REAPR Corrected N50, relative to the estimated genome size
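Several of the contiguity metrics listed above (Ctg NG50, Scf NG50) are N50 sizes computed against an estimated or reference genome size rather than the assembly's own total length. The sketch below illustrates that definition only; it is not taken from the Metassembler code, the function name `ng50` is hypothetical, and the GAGE- and REAPR-corrected variants (which first split sequences at detected errors) are not modeled here.

```python
def ng50(lengths, genome_size):
    """Hypothetical helper: return the NG50 of a set of contig or scaffold lengths.

    NG50 is the largest length L such that sequences of length >= L together
    cover at least half of genome_size (the estimated or reference size).
    Returns 0 if the whole assembly covers less than half of genome_size.
    """
    threshold = genome_size / 2
    covered = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
    return 0


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]          # toy lengths, total 280 bp
    print(ng50(contigs, genome_size=400))    # NG50 against a 400 bp genome
    print(ng50(contigs, genome_size=sum(contigs)))  # plain N50 uses assembly size
```

With these toy lengths the NG50 against a 400 bp genome is 50, while the plain N50 (using the 280 bp assembly size) is 80. Normalizing by a fixed genome size is what keeps NG50 comparable across assemblies of different total length.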