bioRxiv
New Results

Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era

Joel Armstrong, Glenn Hickey, Mark Diekhans, Alden Deran, Qi Fang, Duo Xie, Shaohong Feng, Josefin Stiller, Diane Genereux, Jeremy Johnson, Voichita Dana Marinescu, David Haussler, Jessica Alföldi, Kerstin Lindblad-Toh, Elinor Karlsson, Guojie Zhang, Benedict Paten
doi: https://doi.org/10.1101/730531

Affiliations
1 UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA 95060, USA (Joel Armstrong, Glenn Hickey, Mark Diekhans, Alden Deran, David Haussler, Benedict Paten)
2 China National GeneBank, BGI–Shenzhen, Shenzhen 518083, China (Qi Fang, Duo Xie, Shaohong Feng, Guojie Zhang)
3 University of Chinese Academy of Sciences, Beijing 100049, China (Duo Xie)
4 State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China (Shaohong Feng, Guojie Zhang)
5 Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark (Josefin Stiller, Guojie Zhang)
6 Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), 7 Cambridge Center, Cambridge, Massachusetts 02142, USA (Diane Genereux, Jeremy Johnson, Jessica Alföldi, Kerstin Lindblad-Toh, Elinor Karlsson)
7 Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, SE-751 23 Uppsala, Sweden (Voichita Dana Marinescu, Kerstin Lindblad-Toh)
8 Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA (Elinor Karlsson)
9 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China (Guojie Zhang)

For correspondence: bpaten@ucsc.edu (Benedict Paten)

Abstract

Cactus, a reference-free multiple-genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes and struggles in regions of highly duplicated sequence. We describe progressive extensions to Cactus that enable reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We show that Cactus is capable of scaling to hundreds of genomes and beyond by describing results from an alignment of over 600 amniote genomes, which is, to our knowledge, the largest multiple vertebrate genome alignment yet created. Further, we show improvements in orthology resolution that lead to downstream improvements in annotation.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted August 09, 2019.

Subject Area

  • Genomics