TY  - JOUR
T1  - Metassembler: Merging and optimizing de novo genome assemblies
JF  - bioRxiv
DO  - 10.1101/016352
SP  - 016352
AU  - Alejandro Hernandez Wences
AU  - Michael C. Schatz
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/03/10/016352.abstract
N2  - Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence. We apply it to the four genomes from the Assemblathon competitions and show it consistently and substantially improves the contiguity and quality of each assembly. We also develop guidelines for metassembly by systematically evaluating 120 permutations of merging the top 5 assemblies of the first Assemblathon competition. The software is open-source at http://metassembler.sourceforge.net. Abbreviations: CEGMA: Core Eukaryotic Genes Mapping Approach; CE-statistic: Compression/Expansion statistic; Comp Ref Bases: Compressed Reference Bases; Ctg NG50: Contig N50 size relative to the estimated/reference genome size; Ctg GC-NG50: Contig GAGE Corrected N50, relative to the reference genome size; Ctg RC-NG50: Contig REAPR Corrected N50, relative to the estimated genome size; Dup Ref Bases: Duplicated Reference Bases; GAGE: Genome Assembly Gold Standard Evaluation; GAM-NGS: Genomic Assemblies Merger for Next Generation Sequencing; ICA: Independent Component Analysis; PCA: Principal Components Analysis; REAPR: Recognising Errors in Assemblies using Paired Reads; Scf NG50: Scaffold N50 size relative to the estimated/reference genome size; Scf GC-NG50: Scaffold GAGE Corrected N50, relative to the reference genome size; Scf RC-NG50: Scaffold REAPR Corrected N50, relative to the estimated genome size.
ER  - 