Abstract
While new sequencing technologies have lowered financial barriers to whole genome sequencing, resulting assemblies are often fragmented and far from ‘finished’. Subsequent improvements towards chromosomal-level status can be achieved by both experimental and computational approaches. Requiring only annotated assemblies and gene orthology data, comparative genomics approaches that aim to capture evolutionary signals to predict scaffold neighbours (adjacencies) offer potentially substantive improvements without the costs associated with experimental scaffolding or re-sequencing. We leverage the combined detection power of three such gene synteny-based methods applied to 21 Anopheles mosquito assemblies with variable contiguity levels to produce consensus sets of scaffold adjacency predictions. Three complementary validations were performed on subsets of assemblies with additional supporting data: six with physical mapping data; 13 with paired-end RNA sequencing (RNAseq) data; and three with new assemblies based on re-scaffolding or incorporating Pacific Biosciences (PacBio) sequencing data. Improved assemblies were built by integrating the consensus adjacency predictions with supporting experimental data, resulting in 20 new reference assemblies with improved contiguities. Combined with physical mapping data for six anophelines, chromosomal positioning of scaffolds improved assembly anchoring by 47% for A. funestus and 38% for A. stephensi. Reconciling an A. funestus PacBio assembly with synteny-based and RNAseq-based adjacencies and physical mapping data resulted in a new 81.5% chromosomally mapped reference assembly and cytogenetic photomap. While complementary experimental data are clearly key to achieving high-quality chromosomal-level assemblies, our assessments and validations of gene synteny-based computational methods highlight the utility of applying comparative genomics approaches to improve community genomic resources.
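To make the idea of a consensus set of scaffold adjacency predictions concrete, the following is a minimal sketch and not the procedure used in this work. It assumes each method reports adjacencies as unordered pairs of scaffold names, and it assumes a simple support threshold (kept if at least two of three methods agree); the abstract does not state the actual consensus rule. Scaffold orientation and conflicts between predictions are ignored here. The scaffold names, the method labels, and the function `consensus_adjacencies` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set

# An adjacency is modelled as an unordered pair of scaffold names
# (assumption: orientation is ignored in this sketch).
Adjacency = FrozenSet[str]


def adjacency(a: str, b: str) -> Adjacency:
    """Build an unordered adjacency between two scaffolds."""
    return frozenset((a, b))


def consensus_adjacencies(
    predictions: Iterable[Set[Adjacency]], min_support: int = 2
) -> Set[Adjacency]:
    """Keep adjacencies predicted by at least `min_support` methods.

    The threshold rule is an illustrative assumption, not the
    consensus procedure described in the paper.
    """
    counts: Counter = Counter()
    for method_set in predictions:
        # Each method contributes at most one vote per adjacency.
        counts.update(method_set)
    return {adj for adj, n in counts.items() if n >= min_support}


# Hypothetical predictions from three synteny-based methods.
method_a = {adjacency("s1", "s2"), adjacency("s2", "s3")}
method_b = {adjacency("s2", "s1"), adjacency("s4", "s5")}
method_c = {adjacency("s2", "s3"), adjacency("s4", "s5"), adjacency("s6", "s7")}

consensus = consensus_adjacencies([method_a, method_b, method_c])
```

In this toy input, the s6 and s7 pair is supported by a single method and is therefore excluded, while the other three adjacencies each have two supporting methods. Raising `min_support` to 3 would require unanimous agreement.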