bioRxiv
New Results

Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations

Divon Lan, Gludhug Purnomo, Ray Tobler, Yassine Souilmi, Bastien Llamas
doi: https://doi.org/10.1101/2022.07.17.500374
Divon Lan
1Australian Centre for Ancient DNA, School of Biological Sciences, The Environment Institute, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia
  • For correspondence: divon@genozip.com bastien.llamas@adelaide.edu.au
Gludhug Purnomo
1Australian Centre for Ancient DNA, School of Biological Sciences, The Environment Institute, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia
2Centre of Excellence for Australian Biodiversity and Heritage (CABAH), School of Biological Sciences, University of Adelaide, Adelaide, SA 5005, Australia
Ray Tobler
1Australian Centre for Ancient DNA, School of Biological Sciences, The Environment Institute, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia
2Centre of Excellence for Australian Biodiversity and Heritage (CABAH), School of Biological Sciences, University of Adelaide, Adelaide, SA 5005, Australia
3Evolution of Cultural Diversity Initiative, Australian National University, College of Asia and the Pacific, Canberra, ACT 0200, Australia
Yassine Souilmi
1Australian Centre for Ancient DNA, School of Biological Sciences, The Environment Institute, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia
4National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 0200, Australia
Bastien Llamas
1Australian Centre for Ancient DNA, School of Biological Sciences, The Environment Institute, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia
2Centre of Excellence for Australian Biodiversity and Heritage (CABAH), School of Biological Sciences, University of Adelaide, Adelaide, SA 5005, Australia
4National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 0200, Australia
5Indigenous Genomics Research Group, Telethon Kids Institute, Adelaide, SA 5000, Australia

Abstract

We introduce Dual-Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to operate seamlessly across two coordinate systems, using whichever system is most advantageous for each pipeline step. This simplifies bioinformatics workflows and reduces both the number of files generated and the associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations between coordinate systems than two widely used alternative tools, LiftoverVcf and CrossMap.
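To make the idea of "two coordinate systems in one VCF-compliant file" concrete, here is a minimal Python sketch. It assumes the second coordinate is kept in a hypothetical INFO key `OTHER` (which would be declared with a header line such as `##INFO=<ID=OTHER,Number=3,Type=String,...>`), so each record stays a standard tab-separated VCF line. This is not Genozip's actual DVCF encoding, whose fields are described at https://genozip.com/dvcf.html. The sketch also ignores strand flips and REF/ALT swaps, which a real liftover must handle.

```python
# Hypothetical sketch: carrying a second coordinate inside a standard VCF INFO field.
# The INFO key "OTHER" is an illustrative assumption, NOT Genozip's actual DVCF tag.

OTHER_KEY = "OTHER"  # hypothetical INFO key holding "chrom,pos,ref" in the other reference


def parse_info(info: str) -> dict:
    """Parse a VCF INFO column ('.' or 'K=V;FLAG;...') into an ordered dict."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value if value else True  # flags carry no value
    return out


def format_info(fields: dict) -> str:
    """Serialize a dict back into a VCF INFO column."""
    if not fields:
        return "."
    return ";".join(k if v is True else f"{k}={v}" for k, v in fields.items())


def add_other_coord(line: str, chrom: str, pos: int, ref: str) -> str:
    """Annotate a VCF data line with its coordinate in a second reference."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    info[OTHER_KEY] = f"{chrom},{pos},{ref}"
    cols[7] = format_info(info)
    return "\t".join(cols)


def swap_coords(line: str) -> str:
    """Render the record in the other coordinate system, keeping the current one in INFO.

    Simplification: assumes the same strand and unchanged REF/ALT alleles.
    """
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    chrom, pos, ref = info[OTHER_KEY].split(",")
    info[OTHER_KEY] = f"{cols[0]},{cols[1]},{cols[3]}"  # stash current CHROM,POS,REF
    cols[0], cols[1], cols[3] = chrom, pos, ref
    cols[7] = format_info(info)
    return "\t".join(cols)
```

For example, annotating `chr1 1000 rs1 A G 50 PASS DP=10` with the coordinate `1,1500,A` yields `DP=10;OTHER=1,1500,A` in INFO, and `swap_coords` then produces `1 1500 rs1 A G 50 PASS DP=10;OTHER=chr1,1000,A`. Swapping twice restores the original record, which is the property that lets a pipeline step pick whichever coordinate system suits it without writing a second file.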

Availability and Implementation: Genozip is free for academic use. Documentation is available at https://genozip.com/dvcf.html, the Genozip user manual at https://genozip.com/manual.html, and the source code at https://genozip.com/source.html. The scripts for reproducing the benchmarks are available at https://github.com/divonlan/genozip-dvcf-results.

Competing Interest Statement

D.L. intends to receive royalties from commercial users of Genozip.

Footnotes

  • https://genozip.com/dvcf.html

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 18, 2022.
Subject Area

  • Bioinformatics