cyvcf2: fast, flexible variant analysis with Python

Bioinformatics. 2017 Jun 15;33(12):1867-1869. doi: 10.1093/bioinformatics/btx057.

Abstract

Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files.

Results: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files and illustrate its speed, simplicity and utility.
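To make concrete what "parsing and querying" a VCF file involves, here is a minimal sketch using only the Python standard library. It is not cyvcf2's implementation or API. The sample records, the helper names (`parse_info`, `parse_vcf`), and the allele-frequency threshold are all assumptions chosen for illustration, and the sketch assumes the FORMAT column contains only GT.

```python
# Illustrative only: a tiny pure-Python VCF reader, NOT cyvcf2's implementation.
# The records below are made-up example data.
VCF_TEXT = """\
##fileformat=VCFv4.2
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB
1\t1000\t.\tA\tG\t50\tPASS\tAF=0.25;DP=40\tGT\t0/1\t0/0
1\t2000\trs1\tC\tT,G\t99\tPASS\tAF=0.05,0.01;DB\tGT\t1/2\t0/1
"""


def parse_info(field):
    """Turn an INFO column such as 'AF=0.25;DB' into a dict (flags map to True)."""
    info = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf(text):
    """Yield one dict per variant record (hypothetical helper, GT-only FORMAT)."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            continue
        yield {
            "chrom": fields[0],
            "pos": int(fields[1]),
            "ref": fields[3],
            "alt": fields[4].split(","),
            "info": parse_info(fields[7]),
            "genotypes": dict(zip(samples, fields[9:])),
        }


variants = list(parse_vcf(VCF_TEXT))

# Example query: keep variants where any alternate allele has AF >= 0.1.
common = [
    v for v in variants
    if max(float(x) for x in v["info"]["AF"].split(",")) >= 0.1
]
```

With cyvcf2 installed, a comparable loop would iterate over `VCF(path)` and read attributes such as `variant.CHROM`, `variant.POS` and `variant.INFO.get("AF")`. The library handles the parsing and type conversion that this sketch does by hand.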

Contact: bpederse@gmail.com or aaronquinlan@gmail.com.

Availability and implementation: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.

MeSH terms

  • Genetic Variation*
  • Genotyping Techniques / methods*
  • Humans
  • Metadata
  • Sequence Analysis, DNA / methods*
  • Software*