PT - JOURNAL ARTICLE
AU - Joel Armstrong
AU - Glenn Hickey
AU - Mark Diekhans
AU - Alden Deran
AU - Qi Fang
AU - Duo Xie
AU - Shaohong Feng
AU - Josefin Stiller
AU - Diane Genereux
AU - Jeremy Johnson
AU - Voichita Dana Marinescu
AU - David Haussler
AU - Jessica Alföldi
AU - Kerstin Lindblad-Toh
AU - Elinor Karlsson
AU - Guojie Zhang
AU - Benedict Paten
TI - Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era
AID - 10.1101/730531
DP - 2019 Jan 01
TA - bioRxiv
PG - 730531
4099 - http://biorxiv.org/content/early/2019/08/09/730531.short
4100 - http://biorxiv.org/content/early/2019/08/09/730531.full
AB - Cactus, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequence. We describe progressive extensions to Cactus that enable reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We show that Cactus is capable of scaling to hundreds of genomes and beyond by describing results from an alignment of over 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment yet created. Further, we show improvements in orthology resolution leading to downstream improvements in annotation.