bioRxiv
New Results

An improved de novo assembly and annotation of the tomato reference genome using single-molecule sequencing, Hi-C proximity ligation and optical maps

Prashant S. Hosmani, Mirella Flores-Gonzalez, Henri van de Geest, Florian Maumus, Linda V. Bakker, Elio Schijlen, Jan van Haarst, Jan Cordewener, Gabino Sanchez-Perez, Sander Peters, Zhangjun Fei, James J. Giovannoni, Lukas A. Mueller, Surya Saha
doi: https://doi.org/10.1101/767764
Prashant S. Hosmani
1Boyce Thompson Institute, Ithaca, NY 14850, USA
Mirella Flores-Gonzalez
1Boyce Thompson Institute, Ithaca, NY 14850, USA
Henri van de Geest
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Florian Maumus
3URGI, INRA, Université Paris-Saclay, Versailles, France
Linda V. Bakker
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Elio Schijlen
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Jan van Haarst
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Jan Cordewener
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Gabino Sanchez-Perez
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Sander Peters
2Applied Bioinformatics Group, Wageningen University & Research, The Netherlands
Zhangjun Fei
1Boyce Thompson Institute, Ithaca, NY 14850, USA
James J. Giovannoni
1Boyce Thompson Institute, Ithaca, NY 14850, USA
Lukas A. Mueller
1Boyce Thompson Institute, Ithaca, NY 14850, USA
Surya Saha
1Boyce Thompson Institute, Ithaca, NY 14850, USA
  • For correspondence: ss2489@cornell.edu

Abstract

The original Heinz 1706 reference genome was produced by a large international team of scientists from a variety of input sources, including 454 sequences, full-length BACs, and BAC and fosmid ends sequenced with Sanger technology. We present here the latest tomato reference genome (SL4.0), assembled de novo from PacBio long reads and scaffolded using Hi-C contact maps. The assembly was validated using Bionano optical maps and 10X linked-read sequences. It is highly contiguous, with fewer gaps than previous genome builds, and almost all scaffolds have been anchored and oriented to the 12 tomato chromosomes. We found more repeats than in previous versions, and one of the largest repeat classes identified is the LTR retrotransposons. We also describe updates to the reference genome and annotation since the last publication. The corresponding ITAG4.0 annotation has 4,794 novel genes along with 29,281 genes preserved from ITAG2.4. Most of the updated genes have extensions in the 5’ and 3’ UTRs, resulting in a doubling of annotated UTRs per gene. The genome and annotation can be accessed through SGN via the BLAST database, the pathway database (SolCyc), Apollo, the JBrowse genome browser and FTP, all available at https://solgenomics.net.
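
The contiguity and gap claims in the abstract can be made concrete with a small sketch. The code below is not the authors' pipeline; it is a minimal illustration, assuming that gaps are represented as runs of N characters in scaffold sequences and that contiguity is summarized with the common N50 statistic. The function names (n50, count_gaps, summarize) and the toy sequences are hypothetical and chosen only for this example.

```python
import re
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def count_gaps(sequence: str) -> int:
    """Count gaps, assumed here to be maximal runs of N or n."""
    return len(re.findall(r"[Nn]+", sequence))


def split_contigs(sequence: str) -> List[str]:
    """Split a scaffold into contigs at every run of N or n."""
    return [piece for piece in re.split(r"[Nn]+", sequence) if piece]


def summarize(scaffolds: Dict[str, str]) -> Dict[str, int]:
    """Compute simple contiguity statistics for a set of scaffolds."""
    contig_lengths = [
        len(contig)
        for seq in scaffolds.values()
        for contig in split_contigs(seq)
    ]
    return {
        "scaffolds": len(scaffolds),
        "gaps": sum(count_gaps(seq) for seq in scaffolds.values()),
        "scaffold_n50": n50([len(seq) for seq in scaffolds.values()]),
        "contig_n50": n50(contig_lengths),
    }


# Toy input (made up for illustration, not real tomato sequence):
# one scaffold with a single gap, and one gapless scaffold.
toy = {
    "chr_a": "ACGT" * 5 + "NNNN" + "ACGT" * 3,
    "chr_b": "ACGTACGT",
}
stats = summarize(toy)
print(stats)
```

Run on two builds of the same genome, a summary like this would show a more contiguous assembly as a lower gap count and a higher contig N50. Real assemblies would be read from FASTA files rather than an in-memory dictionary.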

Footnotes

  • https://solgenomics.net

  • https://solgenomics.net/organism/Solanum_lycopersicum/genome

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted September 14, 2019.