Skip to main content

BUSCO: Assessing Genome Assembly and Annotation Completeness

  • Protocol
  • First Online:
Gene Prediction

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1962))

Abstract

Genomics drives the current progress in molecular biology, generating unprecedented volumes of data. The scientific value of these sequences depends on the ability to evaluate their completeness using a biologically meaningful approach. Here, we describe the use of the BUSCO tool suite to assess the completeness of genomes, gene sets, and transcriptomes, using their gene content as a complementary method to common technical metrics. The chapter introduces the concept of universal single-copy genes, which underlies the BUSCO methodology, covers the basic requirements to set up the tool, and provides guidelines to properly design the analyses, run the assessments, and interpret and utilize the results.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Protocol
USD 49.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 149.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 199.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Vurture GW, Sedlazeck FJ, Nattestad M et al (2017) GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204. https://doi.org/10.1093/bioinformatics/btx153

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Chikhi R, Medvedev P (2014) Informed and automated k-mer size selection for genome assembly. Bioinformatics 30:31–37. https://doi.org/10.1093/bioinformatics/btt310

    Article  CAS  PubMed  Google Scholar 

  3. Hunt M, Kikuchi T, Sanders M et al (2013) REAPR: a universal tool for genome assembly evaluation. Genome Biol 14:R47. https://doi.org/10.1186/gb-2013-14-5-r47

    Article  PubMed  PubMed Central  Google Scholar 

  4. Simão FA, Waterhouse RM, Ioannidis P et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212. https://doi.org/10.1093/bioinformatics/btv351

    Article  CAS  PubMed  Google Scholar 

  5. Waterhouse RM, Seppey M, Simão FA et al (2018) BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35:543–548. https://doi.org/10.1093/molbev/msx319

    Article  CAS  PubMed  Google Scholar 

  6. Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23:1061–1067. https://doi.org/10.1093/bioinformatics/btm071

    Article  CAS  PubMed  Google Scholar 

  7. Waterhouse RM, Zdobnov EM, Kriventseva EV (2011) Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi. Genome Biol Evol 3:75–86. https://doi.org/10.1093/gbe/evq083

    Article  CAS  PubMed  Google Scholar 

  8. Kriventseva EV, Kuznetsov D, Tegenfeldt F et al (2019) OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res 47:D807–D811. https://doi.org/10.1093/nar/gky1053

    Article  CAS  PubMed  Google Scholar 

  9. Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421. https://doi.org/10.1186/1471-2105-10-421

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Keller O, Kollmar M, Stanke M, Waack S (2011) A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics Oxf Engl 27:757–763. https://doi.org/10.1093/bioinformatics/btr010

    Article  CAS  Google Scholar 

  11. Eddy SR (2011) Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195. https://doi.org/10.1371/journal.pcbi.1002195

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Araujo NS, Santos PKF, Arias MC (2018) RNA-Seq reveals that mitochondrial genes and long non-coding RNAs may play important roles in the bivoltine generations of the non-social Neotropical bee Tetrapedia diversipes. Apidologie 49:3–12. https://doi.org/10.1007/s13592-017-0542-2

    Article  CAS  Google Scholar 

  13. Keren H, Lev-Maor G, Ast G (2010) Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet 11:345–355. https://doi.org/10.1038/nrg2776

    Article  CAS  PubMed  Google Scholar 

  14. Kollmar M, Mühlhausen S (2017) Nuclear codon reassignments in the genomics era and mechanisms behind their evolution. Bioessays 39:1600221. https://doi.org/10.1002/bies.201600221

    Article  CAS  Google Scholar 

  15. Ioannidis P, Simao FA, Waterhouse RM et al (2017) Genomic features of the Damselfly Calopteryx splendens representing a Sister Clade to most insect orders. Genome Biol Evol 9:415–430. https://doi.org/10.1093/gbe/evx006

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Holt C, Campbell M, Keays DA et al (2018) Improved genome assembly and annotation for the rock pigeon (Columba livia). G3 Genes Genomes Genet 8:1391–1398. https://doi.org/10.1534/g3.117.300443

    Article  CAS  Google Scholar 

  17. Plomion C, Aury J-M, Amselem J et al (2018) Oak genome reveals facets of long lifespan. Nat Plants. https://doi.org/10.1038/s41477-018-0172-3

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Armstrong EE, Prost S, Ertz D et al (2018) Draft genome sequence and annotation of the Lichen-forming fungus Arthonia radiata. Genome Announc 6:e00281–e00218. https://doi.org/10.1128/genomeA.00281-18

    Article  PubMed  PubMed Central  Google Scholar 

  19. Carruthers M, Yurchenko AA, Augley JJ et al (2018) De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species. BMC Genomics 19:32. https://doi.org/10.1186/s12864-017-4379-x

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Teh BT, Lim K, Yong CH et al (2017) The draft genome of tropical fruit durian (Durio zibethinus). Nat Genet 49:1633–1641. https://doi.org/10.1038/ng.3972

    Article  CAS  PubMed  Google Scholar 

  21. Core Team R (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

    Google Scholar 

  22. Wickham H (2009) Ggplot2: elegant graphics for data analysis. Springer, New York, NY

    Book  Google Scholar 

  23. Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59. https://doi.org/10.1186/1471-2105-5-59

    Article  PubMed  PubMed Central  Google Scholar 

  24. Blanco E, Parra G, Guigó R (2007) Using geneid to identify genes. In: Baxevanis AD, Davison DB, Page RDM et al (eds) Current protocols in bioinformatics. John Wiley & Sons, Inc., Hoboken, NJ

    Google Scholar 

  25. Borodovsky M, Lomsadze A (2011) Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Curr Protoc Bioinformatics 35:4.6.1–4.6.10. https://doi.org/10.1002/0471250953.bi0406s35

    Article  Google Scholar 

Download references

Acknowledgments

We would like to thank all members of the Zdobnov group, in particular Felipe Simão and Christopher Rands for their useful comments. This work was partly supported by the Swiss Institute of Bioinformatics SER funding and the Swiss National Science Foundation funding 31003A_166483 to E.Z.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Evgeny M. Zdobnov .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Check for updates. Verify currency and authenticity via CrossMark

Cite this protocol

Seppey, M., Manni, M., Zdobnov, E.M. (2019). BUSCO: Assessing Genome Assembly and Annotation Completeness. In: Kollmar, M. (eds) Gene Prediction. Methods in Molecular Biology, vol 1962. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9173-0_14

Download citation

  • DOI: https://doi.org/10.1007/978-1-4939-9173-0_14

  • Published:

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9172-3

  • Online ISBN: 978-1-4939-9173-0

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics