A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

Bioinformatics. 2015 Jan 15;31(2):268-70. doi: 10.1093/bioinformatics/btu630. Epub 2014 Sep 30.

Abstract

Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no comprehensive, freely available programming library exists. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics.
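
To make the parse, validate and format cycle concrete, the following is a minimal standard-library sketch for a single variant class: simple nucleotide substitutions written as accession, coordinate type, position and the reference and alternate bases. It is not the package's API. The names SimpleVariant and parse_simple_variant, the regular expression and the example string are assumptions introduced purely for illustration, and the HGVS recommendations cover far more syntax than this pattern accepts.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately narrow pattern: accession ":" type "." position ref ">" alt.
# It accepts only single-base DNA substitutions such as "NM_000551.3:c.292T>C".
_SUBSTITUTION = re.compile(
    r"(?P<ac>[A-Z]{2}_\d+\.\d+):"   # versioned accession, e.g. NM_000551.3
    r"(?P<type>[cgmn])\."           # coordinate type (coding, genomic, mito, non-coding)
    r"(?P<pos>\d+)"                 # 1-based position
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


@dataclass(frozen=True)
class SimpleVariant:
    """Illustrative container for one parsed substitution (not the library's model)."""
    ac: str
    type: str
    pos: int
    ref: str
    alt: str

    def format(self) -> str:
        # Rebuild the string form from the structured fields.
        return f"{self.ac}:{self.type}.{self.pos}{self.ref}>{self.alt}"


def parse_simple_variant(text: str) -> SimpleVariant:
    """Parse and minimally validate a substitution; raise ValueError otherwise."""
    match = _SUBSTITUTION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a supported substitution: {text!r}")
    variant = SimpleVariant(
        ac=match.group("ac"),
        type=match.group("type"),
        pos=int(match.group("pos")),
        ref=match.group("ref"),
        alt=match.group("alt"),
    )
    # Intrinsic checks only: no sequence lookup is attempted in this sketch.
    if variant.pos < 1:
        raise ValueError("positions are 1-based")
    if variant.ref == variant.alt:
        raise ValueError("reference and alternate bases are identical")
    return variant


if __name__ == "__main__":
    v = parse_simple_variant("NM_000551.3:c.292T>C")
    print(v.ac, v.type, v.pos, v.ref, v.alt)
    print(v.format())
```

Keeping the parsed variant as a structured object rather than a string is what allows validation and re-formatting to be separate steps. A check of the stated reference base against the actual sequence would need access to sequence data, which this sketch does not attempt.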

Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs).

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Computational Biology / methods*
  • Databases, Factual*
  • Genetic Variation / genetics*
  • Genome, Human*
  • Humans
  • Molecular Sequence Annotation
  • Software*
  • Terminology as Topic*