bioRxiv
New Results

False gene and chromosome losses affected by assembly and sequence errors

Juwan Kim, Chul Lee, Byung June Ko, DongAhn Yoo, Sohyoung Won, Adam Phillippy, Olivier Fedrigo, Guojie Zhang, Kerstin Howe, Jonathan Wood, Richard Durbin, Giulio Formenti, Samara Brown, Lindsey Cantin, Claudio V. Mello, Seoae Cho, Arang Rhie, Heebal Kim, Erich D. Jarvis
doi: https://doi.org/10.1101/2021.04.09.438906
Juwan Kim
1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
Chul Lee
1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
Byung June Ko
2Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
DongAhn Yoo
1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
Sohyoung Won
1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
Adam Phillippy
3Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
Olivier Fedrigo
4Vertebrate Genome Lab, The Rockefeller University, New York City, USA
Guojie Zhang
5China National Genebank, BGI-Shenzhen, Shenzhen, China
6Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Denmark
Kerstin Howe
7Wellcome Sanger Institute, Cambridge, UK
Jonathan Wood
7Wellcome Sanger Institute, Cambridge, UK
Richard Durbin
7Wellcome Sanger Institute, Cambridge, UK
8Department of Genetics, University of Cambridge, Cambridge, UK
Giulio Formenti
4Vertebrate Genome Lab, The Rockefeller University, New York City, USA
9Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
Samara Brown
9Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
Lindsey Cantin
9Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
Claudio V. Mello
10Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, USA
Seoae Cho
11eGnome, Inc, Seoul, Republic of Korea
Arang Rhie
3Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
Heebal Kim
1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
2Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
11eGnome, Inc, Seoul, Republic of Korea
  • For correspondence: heebal@snu.ac.kr ejarvis@rockefeller.edu
Erich D. Jarvis
4Vertebrate Genome Lab, The Rockefeller University, New York City, USA
9Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
12Howard Hughes Medical Institute, Chevy Chase, Maryland USA
  • For correspondence: heebal@snu.ac.kr ejarvis@rockefeller.edu

Abstract

Many genome assemblies have been found to be incomplete and to contain misassemblies. The Vertebrate Genomes Project (VGP) has been producing assemblies with an emphasis on being as complete and error-free as possible, utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. Here we evaluate these new vertebrate genome assemblies relative to the previous references for the same species, including a mammal (platypus), two birds (zebra finch, Anna’s hummingbird), and a fish (climbing perch). We found that 3 to 11% of genomic sequence was entirely missing in the previous reference assemblies, which included nearly entire GC-rich and repeat-rich microchromosomes with high gene density. Genome-wide, 25 to 60% of the genes were either completely or partially missing in the previous assemblies, and this was in part due to a bias in GC-rich 5’-proximal promoters and 5’ exon regions. Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the VGP assemblies.

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted April 09, 2021.
Subject Area

  • Genomics