bioRxiv
New Results

Extensive gene duplication in Arabidopsis revealed by pseudo-heterozygosity

Benjamin Jaegle, Luz Mayela Soto-Jiménez, Robin Burns, Fernando A. Rabanal, Magnus Nordborg
doi: https://doi.org/10.1101/2021.11.15.468652
Benjamin Jaegle
¹ Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
Luz Mayela Soto-Jiménez
¹ Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
Robin Burns
¹ Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
³ Department of Plant Sciences, University of Cambridge, Cambridge, UK
Fernando A. Rabanal
² Max Planck Institute for Developmental Biology, Tübingen, Germany
Magnus Nordborg
¹ Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
For correspondence: magnus.nordborg@gmi.oeaw.ac.at

Abstract

Background It is becoming apparent that genomes harbor massive amounts of structural variation, and that this variation has largely gone undetected for technical reasons. In addition to being inherently interesting, structural variation can cause artifacts when short-read sequencing data are mapped to a reference genome. In particular, spurious SNPs (which do not show Mendelian segregation) may result from mapping reads to duplicated regions. Re-calling SNPs from the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million heterozygous SNPs (44% of the total). Given that Arabidopsis thaliana (A. thaliana) is highly selfing, we hypothesized that these SNPs reflect cryptic copy-number variation, and investigated them further.
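The mechanism behind such spurious calls can be illustrated with a toy genotype caller. The sketch below is hypothetical and is not the pipeline used in this study (production callers use likelihood models rather than fixed cutoffs): `call_genotype` labels a site heterozygous when the alternate-allele fraction falls between two assumed thresholds (0.2 and 0.8), and the read counts are invented. It shows how a fully homozygous line that carries a diverged second copy of a gene, absent from the reference, yields a heterozygous call once reads from both copies pile up on the single reference locus.

```python
def call_genotype(ref_reads: int, alt_reads: int,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Naive allele-fraction genotype caller (illustrative thresholds only)."""
    total = ref_reads + alt_reads
    if total == 0:
        return "./."  # no coverage: missing call
    alt_frac = alt_reads / total
    if alt_frac < het_low:
        return "0/0"
    if alt_frac > het_high:
        return "1/1"
    return "0/1"


# Hypothetical inbred line with two copies of a gene: the reference-like
# copy and a diverged duplicate located elsewhere in its genome. Reads
# from both copies map to the one locus present in the reference.
reads_from_original_copy = {"ref": 14, "alt": 0}
reads_from_duplicate_copy = {"ref": 0, "alt": 12}

pooled_ref = reads_from_original_copy["ref"] + reads_from_duplicate_copy["ref"]
pooled_alt = reads_from_original_copy["alt"] + reads_from_duplicate_copy["alt"]

print(call_genotype(14, 0))                   # original copy alone: 0/0
print(call_genotype(pooled_ref, pooled_alt))  # both copies pooled: 0/1
```

Because the line is homozygous at both copies, the resulting "0/1" call will not segregate in a Mendelian fashion, which is exactly the signature of the spurious SNPs described above.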

Results Whereas genuine heterozygosity is expected to occur in tracts within individuals, the observed heterozygosity at a given locus is instead shared across individuals, in a pattern that strongly suggests segregating duplications rather than true heterozygosity. Focusing on pseudo-heterozygosity in annotated genes, we used GWAS to map the positions of the duplicates, identifying 2500 putatively duplicated genes. The results were validated using de novo genome assemblies of six lines. Specific examples include an annotated gene and a nearby transposon that in fact transpose together.
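The two signals contrasted above, and the idea of mapping a duplicate by association, can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' method: heterozygous calls are coded 1 in an individuals-by-sites matrix, every name and value is hypothetical, and the "GWAS" is a plain correlation scan between heterozygosity at one site (treated as a binary phenotype) and homozygous markers elsewhere, with no correction for population structure.

```python
from typing import Dict, List, Sequence


def het_sharing(het: List[List[int]]) -> List[float]:
    """Fraction of individuals heterozygous at each site (rows = individuals)."""
    n_ind = len(het)
    return [sum(row[j] for row in het) / n_ind for j in range(len(het[0]))]


def longest_het_run(row: Sequence[int]) -> int:
    """Longest run of consecutive heterozygous sites within one individual."""
    best = current = 0
    for is_het in row:
        current = current + 1 if is_het else 0
        best = max(best, current)
    return best


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def scan_for_duplicate(phenotype: Sequence[int],
                       markers: Dict[str, Sequence[int]]) -> str:
    """Marker most strongly associated with pseudo-heterozygosity (no structure correction)."""
    return max(markers, key=lambda name: abs(correlation(markers[name], phenotype)))


# Toy data: 6 inbred lines (rows) x 6 adjacent sites (columns); 1 = heterozygous call.
het = [
    [0, 0, 1, 0, 0, 0],  # line A: carries a duplicate of the gene at site 2
    [0, 0, 1, 0, 0, 0],  # line B: carries the duplicate
    [0, 0, 1, 0, 0, 0],  # line C: carries the duplicate
    [0, 0, 0, 0, 0, 0],  # line D
    [0, 0, 0, 0, 0, 0],  # line E
    [0, 0, 0, 1, 1, 1],  # line F: residual heterozygous tract (e.g. recent outcross)
]

print([round(f, 2) for f in het_sharing(het)])  # [0.0, 0.0, 0.5, 0.17, 0.17, 0.17]
print([longest_het_run(row) for row in het])    # [1, 1, 1, 0, 0, 3]

# Heterozygosity at site 2 as a binary phenotype, scanned against
# hypothetical homozygous markers (0/1) elsewhere in the genome.
phenotype = [row[2] for row in het]
markers = {
    "chr1:1000": [0, 1, 0, 1, 0, 1],
    "chr3:5000": [1, 1, 1, 0, 0, 0],  # in perfect LD with the duplicate's location
    "chr5:2000": [1, 0, 0, 1, 1, 0],
}
print(scan_for_duplicate(phenotype, markers))   # chr3:5000
```

Site 2 is heterozygous as an isolated site in several lines, whereas line F shows a contiguous tract confined to one individual; the association scan then points to the marker linked to the duplicate's insertion site. A real analysis over hundreds of accessions would use a mixed model to account for population structure.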

Conclusions Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts, and suggests that great caution is needed when analysing SNP data from short-read sequencing. The finding that 10% of annotated genes are copy-number variable, together with the realization that neither gene nor transposon annotation necessarily tells us what is actually mobile in the genome, suggests that future analyses based on independently assembled genomes will be very informative.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • https://github.com/benjj212/duplication-paper.git

  • https://doi.org/10.5281/zenodo.5702395

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted November 16, 2021.
Subject Area

  • Genomics