Cutevariant: a GUI-based desktop application to explore genetic variations

Sacha Schutz; Pierre Marijon; Tristan Montier; Emmanuelle Genin

doi:10.1101/2021.02.10.430619

Abstract

Cutevariant is a user-friendly, GUI-based desktop application for genomic research, designed to search for variations in DNA samples collected in annotated files encoded in the Variant Call Format (VCF). The application imports data into a local relational database, from which complex filter queries can be built either from the intuitive GUI or using a Domain Specific Language (DSL). Cutevariant provides more features than any existing application without compromising on performance. Its plugin-based architecture makes it highly customizable. Cutevariant is distributed as multi-platform client-side software under an open source licence and is available at https://github.com/labsquare/Cutevariant. It has been designed from the beginning to be easily adopted by end-users without IT expertise.

Introduction

Next-Generation Sequencing (NGS) has opened new opportunities in genomic research, such as the identification of DNA variations from genome, exome or panel experiments. These data are delivered as files encoded in the standard Variant Call Format (VCF version 4.0) [1], where the variations are listed together with the genotype information of the different samples. Tools such as VEP [2] or SnpSift [3] can be used to add annotations such as genes or functional impact. Biologists can then filter variants by applying customized criteria to these annotations. In medicine, the identification of mutations in rare diseases is a typical use case. This filtering procedure requires sophisticated software tools that can nevertheless be easily adopted by end-users who are not necessarily IT-aware.

Several management systems have been developed to ease the filtering step. GEMINI [4] and VariantTools [5] are command-line applications in which data from the VCF files are loaded into a relational database managed by SQLite [6]. Filtering can thus be made very efficient using the SQL query syntax. Other tools such as SnpSift [3] or BCFtools [7] apply filters directly while reading the VCF files line by line, thus avoiding the need to create an intermediate data structure. This comes at the cost of speed, especially when variants must be sorted or grouped. While these tools are quite flexible and allow any kind of filtering, the command-line interface is not very intuitive, which discourages users who are not IT specialists.

This called for the development of applications driven by user-friendly Graphical User Interfaces (GUI). Some, specializing in diagnostics, offer online solutions with a complete set of patient management features but require uploading the VCF files. The most popular of this kind are either proprietary software such as SeqOne [8] or software distributed under an open source licence such as the recently published VarFish [9]. A major drawback of this scheme is that large amounts of genetic data transit through public networks, which raises confidentiality and performance issues, and that a dedicated server is required, which might not be available to every end-user. Moreover, these solutions are tailored for human data and therefore cannot be adopted by all end-users. GUI applications that do not require a server and work out of the box are therefore preferable. The web-based applications VCFMiner [10], BrowseVCF [11] and VCF.Filter [12] implement such a solution. VCFMiner is distributed as a container running with Docker [13] and thus requires a customized desktop configuration. BrowseVCF provides its own launcher, making it quite user-friendly, but the application is no longer supported. Both applications import the data from VCF files into an indexed database and provide different GUI forms to create filters. Their main drawback is the limited set of filters available through the GUI, complex filters requiring a domain specific language. In addition, web applications perform poorly compared to native desktop applications. Despite the availability of these tools, many biologists still use Microsoft Excel to filter their variants and face severe problems [14].

To address the shortcomings of the existing applications, we have developed Cutevariant, a user-friendly and ergonomic desktop application implemented in Python within the Qt5 framework. It takes full advantage of both a GUI and a command-line interface, together with a Domain Specific Language called VQL that allows the user to build complex filter expressions. It is distributed as multi-platform client-side software under an open source licence. Thanks to an architecture based on plugins, Cutevariant is fully customizable and can easily be extended with additional features.

Materials and methods

VCF file importation and preprocessing

Cutevariant imports data from VCF files (with or without SnpEff/VEP annotation) into a normalized SQLite database (Figure 1) stored as a *.db file, optionally together with a PED file describing affected samples and their relationships. Fields of the variants and annotations tables are created dynamically according to the content of the VCF file. The import step uses a VCF parser to produce JSON-like arrays tailored for populating the SQLite database. It is based on a strategy design pattern, so that any format can be supported by subclassing an abstract Reader object. The available distribution supports raw VCF files and VCF files annotated with VEP or SnpEff following the ANN specifications [15].
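The strategy pattern described above can be illustrated with a minimal sketch. The names `AbstractReader`, `SimpleVcfReader` and `import_reader`, the hand-written line parser and the fixed table layout are assumptions made for illustration; they are not the Cutevariant classes, which rely on PyVCF and create fields dynamically.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator


class AbstractReader(ABC):
    """Strategy interface: one subclass per supported input format."""

    @abstractmethod
    def get_variants(self) -> Iterator[Dict]:
        """Yield one dictionary per variant."""


class SimpleVcfReader(AbstractReader):
    """Toy reader for the first fixed VCF columns (INFO is ignored)."""

    def __init__(self, lines: Iterable[str]):
        self.lines = lines

    def get_variants(self) -> Iterator[Dict]:
        for line in self.lines:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
            yield {
                "chr": chrom,
                "pos": int(pos),
                "ref": ref,
                "alt": alt,
                "qual": None if qual == "." else float(qual),
            }


def import_reader(conn: sqlite3.Connection, reader: AbstractReader) -> int:
    """Create a variants table, fill it from any reader, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variants "
        "(id INTEGER PRIMARY KEY, chr TEXT, pos INTEGER, "
        "ref TEXT, alt TEXT, qual REAL)"
    )
    rows = list(reader.get_variants())
    conn.executemany(
        "INSERT INTO variants (chr, pos, ref, alt, qual) "
        "VALUES (:chr, :pos, :ref, :alt, :qual)",
        rows,
    )
    conn.commit()
    return len(rows)


# Toy input (made-up records) imported into an in-memory database.
vcf_lines = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr7\t117559590\t.\tG\tT\t50\tPASS\t.",
    "chr7\t117559600\t.\tA\tC\t.\tPASS\t.",
]
conn = sqlite3.connect(":memory:")
n_imported = import_reader(conn, SimpleVcfReader(vcf_lines))
```

Supporting another format would then only require a new subclass that yields the same dictionaries; `import_reader` itself stays unchanged.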

Fig. 1:

Cutevariant database schema. Only mandatory fields are displayed. fields_n are dynamically created during the import step based on the content of the VCF file

Before being imported into the database, data are cleaned and normalized following the same procedure as the VT norm [16] application: single lines describing multi-allelic variants are split into multiple lines. Computed annotations, not present in the original file, are created automatically. For example, the count_var field contains the number of samples that carry the variant. It is thus possible to select variants present in more than N samples by filtering on this column. This feature is similar to countVar() from the SnpSift [3] filter command.
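A minimal sketch of these two preprocessing steps, assuming a simplified record layout (a dictionary whose `genotypes` entry maps each sample to a tuple of allele indexes). The function name `split_multiallelic` is hypothetical, and the left-alignment and trimming performed by a full normalizer are deliberately omitted.

```python
from typing import Dict, List


def split_multiallelic(record: Dict) -> List[Dict]:
    """Return one record per ALT allele, each with its own count_var."""
    out = []
    for index, alt in enumerate(record["alt"].split(","), start=1):
        # In VCF genotypes, allele 0 is REF and allele i is the i-th ALT,
        # so a sample carries this ALT if index i appears in its genotype.
        carriers = sum(1 for gt in record["genotypes"].values() if index in gt)
        out.append(
            {
                "chr": record["chr"],
                "pos": record["pos"],
                "ref": record["ref"],
                "alt": alt,
                "count_var": carriers,
            }
        )
    return out


# Toy multi-allelic record with three samples.
record = {
    "chr": "chr1",
    "pos": 1000,
    "ref": "A",
    "alt": "T,C",
    "genotypes": {"s1": (0, 1), "s2": (1, 2), "s3": (0, 0)},
}
split = split_multiallelic(record)
```

Here the single input line becomes two records: one for `T`, carried by `s1` and `s2`, and one for `C`, carried by `s2` only.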

From the Cutevariant main window, the new project button starts a wizard and triggers the import process. Depending on the size of the input, import and indexing might take some time, but this has only a minimal impact on overall performance since the step is performed only once. Alternatively, VCF import can be triggered from the command line using the cutevariant-cli tool. This lets experienced users integrate the import step at the end of a pipeline.

User interface layout

The main view (Figure 2) of the Cutevariant GUI displays the list of variants together with their annotations. Several GUI controllers allow the user to update the view and display the list in different formats.

Fig. 2:

The Cutevariant main view, showing the variant list sub-window (middle), some of the controller sub-windows (left; not all are displayed) and the VQL editor sub-window (bottom).

fields_editor: to show or hide selected annotations.
filter_editor: to build a nested list of conditional rules with OR/AND binary operators.
variant_info: to display in an organised way all annotations related to the currently selected variant.
source_editor: to manage different views and perform set operations (union, intersection, difference) and BED file intersections.
word_set: to manage lists of words used to generate simple filters, e.g., filter all variants belonging to a given gene list or a dbSNP list.

Most of these actions end up building a VQL query that can be checked in the VQL-editor sub-window. The variants list can then be updated either with the controllers or by editing the VQL query directly.

Variant Query Language (VQL)

To facilitate the composition of complex filter queries, the application integrates a Domain Specific Language (DSL) named Variant Query Language (VQL). The syntax of VQL has been designed to look like a subset of the SQL language working on a virtual database schema. It makes use of the Python module textX [17], which provides several tools to define a grammar and create parsers producing an Abstract Syntax Tree. VQL queries can be composed in the VQL editor sub-window. However, to avoid forcing users to learn the VQL language, a query can also be defined from the GUI using the controller sub-windows listed above. The VQL query is translated, through an intermediate JSON object, into a well-formatted SQL query that is processed by the SQLite database manager.

As an example, the following VQL query:

is translated into the following SQL query:
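The translation chain (VQL text, intermediate JSON-like object, SQL) can be illustrated with a minimal sketch that takes the query of Fig. 3 as input. The regular-expression parser below is a stand-in for the textX grammar, the layout of the intermediate object is an assumption, and the generated SQL targets a single flat table rather than the normalized Cutevariant schema.

```python
import re
from typing import Dict, List, Tuple

# Narrow VQL subset: SELECT f1,f2 FROM source [WHERE a=... AND b>=...]
VQL_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<source>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*$",
    re.IGNORECASE,
)
CONDITION = re.compile(r"^\s*(\w+)\s*(=|<=|>=|<|>)\s*(.+?)\s*$")


def parse_vql(query: str) -> Dict:
    """Parse the VQL subset into a JSON-like dictionary."""
    match = VQL_PATTERN.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    fields = [f.strip() for f in match.group("fields").split(",")]
    if not all(re.fullmatch(r"\w+", f) for f in fields):
        raise ValueError("field names must be plain identifiers")
    conditions = []
    if match.group("where"):
        parts = re.split(r"\s+AND\s+", match.group("where"), flags=re.IGNORECASE)
        for part in parts:
            cond = CONDITION.match(part)
            if cond is None:
                raise ValueError(f"unsupported condition: {part!r}")
            field, op, raw = cond.groups()
            # Quoted values are strings; anything else is read as a number.
            value = raw.strip("'\"") if raw[0] in "'\"" else float(raw)
            conditions.append({"field": field, "operator": op, "value": value})
    return {
        "fields": fields,
        "source": match.group("source"),
        "filters": {"AND": conditions},
    }


def to_sql(obj: Dict) -> Tuple[str, List]:
    """Turn the intermediate object into parameterized SQL."""
    sql = f"SELECT {', '.join(obj['fields'])} FROM {obj['source']}"
    conditions = obj["filters"]["AND"]
    if conditions:
        sql += " WHERE " + " AND ".join(
            f"{c['field']} {c['operator']} ?" for c in conditions
        )
    return sql, [c["value"] for c in conditions]


vql = "SELECT chr,pos,consequence FROM variants WHERE gene='CFTR' AND impact='HIGH'"
intermediate = parse_vql(vql)
sql, params = to_sql(intermediate)
```

Values are passed as bound parameters rather than pasted into the SQL string, a design choice of this sketch that keeps user input out of the statement text.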

Filter expressions

Filter expressions are defined in the VQL WHERE clause. In the filter editor, they are displayed as a nested set of editable condition rules. Logical (AND/OR) and comparison (=, <, >, ≤, ≥, /=, IN, NOT IN, IS NULL) operators are supported. Regular-expression matching with the tilde operator (~) and a special WORDSET keyword are included as well. This keyword allows the user to test whether a field belongs to a set of words defined beforehand. For instance, in VQL, to select all variants from a user-defined list of genes:
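How a nested set of condition rules, including a word-set lookup, could be compiled into a SQL WHERE clause is sketched below. The tree layout, the `{"wordset": name}` marker and the function `compile_filter` are assumptions for illustration and do not reproduce the VQL syntax or the Cutevariant internals.

```python
from typing import Dict, List, Set, Tuple

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "!=", "IN", "NOT IN"}


def compile_filter(node: Dict, wordsets: Dict[str, Set[str]]) -> Tuple[str, List]:
    """Recursively compile a nested AND/OR rule tree into SQL and parameters."""
    # Branch node: {"AND": [children]} or {"OR": [children]}
    for logic in ("AND", "OR"):
        if logic in node:
            parts, params = [], []
            for child in node[logic]:
                child_sql, child_params = compile_filter(child, wordsets)
                parts.append(child_sql)
                params.extend(child_params)
            return "(" + f" {logic} ".join(parts) + ")", params

    # Leaf node: {"field": ..., "operator": ..., "value": ...}
    field, op, value = node["field"], node["operator"], node["value"]
    if op not in ALLOWED_OPERATORS or not field.isidentifier():
        raise ValueError(f"invalid rule: {node!r}")
    if isinstance(value, dict) and "wordset" in value:
        if op not in ("IN", "NOT IN"):
            raise ValueError("a word set requires IN or NOT IN")
        # Replace the marker with the words of the named set.
        value = sorted(wordsets[value["wordset"]])
    if op in ("IN", "NOT IN"):
        placeholders = ", ".join("?" for _ in value)
        return f"{field} {op} ({placeholders})", list(value)
    return f"{field} {op} ?", [value]


# Variants in a user-defined gene list that are HIGH impact or well supported.
wordsets = {"my_genes": {"CFTR", "BRCA1"}}
rules = {
    "AND": [
        {"field": "gene", "operator": "IN", "value": {"wordset": "my_genes"}},
        {
            "OR": [
                {"field": "impact", "operator": "=", "value": "HIGH"},
                {"field": "qual", "operator": ">=", "value": 30},
            ]
        },
    ]
}
where_sql, where_params = compile_filter(rules, wordsets)
```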

Group variants

The GROUP BY keyword allows the user to split the view into two panels: on the left, the list of groups, and on the right, the list of all variants belonging to the selected group. This makes exploration easier; for instance, grouping variants by gene helps to detect compound heterozygotes.
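The two-panel behaviour can be sketched with two SQLite queries, one per panel. The flat table, the toy records and the function names `list_groups` and `variants_in_group` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, gene TEXT)"
)
# Made-up records used only to exercise the queries.
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?)",
    [
        ("chr7", 117559590, "G", "T", "CFTR"),
        ("chr7", 117587800, "A", "G", "CFTR"),
        ("chr17", 43045700, "C", "T", "BRCA1"),
    ],
)


def list_groups(conn):
    """Left panel: one row per gene with the number of variants it holds."""
    return conn.execute(
        "SELECT gene, COUNT(*) FROM variants GROUP BY gene ORDER BY gene"
    ).fetchall()


def variants_in_group(conn, gene):
    """Right panel: all variants of the group selected on the left."""
    return conn.execute(
        "SELECT chr, pos, ref, alt FROM variants WHERE gene = ? ORDER BY pos",
        (gene,),
    ).fetchall()


groups = list_groups(conn)
cftr_variants = variants_in_group(conn, "CFTR")
```

A gene that appears in the left panel with two or more variants is a natural starting point when looking for compound heterozygotes.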

Set operations

Just like Variant Tools, Cutevariant supports operations between variant sets. Each query result can be stored in a view using the CREATE VQL keyword or by clicking the corresponding GUI button. For instance, the following query will create a new view called new_view.

It is then possible to build a query directly from this view. The following query returns the same output as the previous one:

Each view behaves as a set with three operations available (difference, intersection, union), which compare variants on the chr, pos, ref and alt fields. The following queries show how to create a new view based on different set operations:
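The comparison rule can be sketched in plain Python, with each view reduced to a set of (chr, pos, ref, alt) keys. This illustrates the semantics only, on made-up records; it is not the VQL syntax and not how Cutevariant stores views.

```python
from typing import Dict, Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]


def as_key_set(view: Iterable[Dict]) -> Set[VariantKey]:
    """Identify each variant by chr, pos, ref and alt, ignoring other fields."""
    return {(v["chr"], v["pos"], v["ref"], v["alt"]) for v in view}


view_a = [
    {"chr": "chr1", "pos": 100, "ref": "A", "alt": "T", "gene": "G1"},
    {"chr": "chr1", "pos": 200, "ref": "C", "alt": "G", "gene": "G1"},
]
view_b = [
    # Same variant as in view_a, even though the annotation differs.
    {"chr": "chr1", "pos": 200, "ref": "C", "alt": "G", "gene": "other"},
    {"chr": "chr2", "pos": 300, "ref": "G", "alt": "A", "gene": "G2"},
]

a, b = as_key_set(view_a), as_key_set(view_b)
union = a | b
intersection = a & b
difference = a - b
```

Because only the four key fields are compared, two records that differ in their annotations still count as the same variant.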

Plugin architecture

The Cutevariant GUI architecture relies entirely on plugins, whose source is available in the plugins directory. A plugin consists of a module containing different Python files that implement a Plugin class with several overloaded virtual methods. Adding or removing GUI controllers is therefore straightforward.

In addition, as in Excel, cells of the variant view can be formatted conditionally. By subclassing the Formatter class, one can change the style of a cell with different colors, text or icons according to its value. For instance, impact fields with the value HIGH can be displayed with a red background to catch the user's attention. Currently, Cutevariant supports only one formatter: cuteStyle.
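The idea of conditional formatting by subclassing can be sketched as follows. The base class shown here, its `format` method and the returned style dictionary are hypothetical and Qt-free; the actual Formatter interface in Cutevariant is not described in this text and may differ.

```python
from typing import Any, Dict


class Formatter:
    """Hypothetical base class: returns style hints for one cell."""

    def format(self, field: str, value: Any) -> Dict[str, str]:
        return {}  # default: no special styling


class ImpactFormatter(Formatter):
    """Highlight the impact column according to its value."""

    COLORS = {"HIGH": "red", "MODERATE": "orange"}

    def format(self, field: str, value: Any) -> Dict[str, str]:
        if field == "impact" and value in self.COLORS:
            return {"background": self.COLORS[value]}
        return super().format(field, value)


formatter = ImpactFormatter()
high_style = formatter.format("impact", "HIGH")
other_style = formatter.format("gene", "CFTR")
```

A view would call `format` once per cell and apply the returned hints, so a new visual rule only requires a new subclass.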

Cutevariant allows the user to build a custom URL from a variant and open it in an external application. This is used, for example, to open a web link to a dbSNP entry or to show the BAM alignment in the IGV software at the corresponding variant location.
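Building such a link amounts to filling a URL template with fields of the selected variant. In the sketch below, the `build_url` helper, the placeholder syntax and the example.org address are assumptions for illustration; they are not the templates shipped with Cutevariant nor the URL scheme of any real service.

```python
from typing import Dict
from urllib.parse import quote


def build_url(template: str, variant: Dict) -> str:
    """Fill {field} placeholders with URL-encoded variant values."""
    safe = {key: quote(str(value), safe="") for key, value in variant.items()}
    return template.format(**safe)


variant = {"chr": "chr7", "pos": 117559590, "ref": "G", "alt": "T"}
# Hypothetical endpoint used only as an example.
template = "https://example.org/variant?chr={chr}&pos={pos}"
url = build_url(template, variant)
```

The resulting string could then be handed to the standard `webbrowser.open` function or to any external viewer that accepts a URL.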

With plugins, experienced users can customize Cutevariant with dedicated features or create new ones and share them with the user community.

Technical details and continuous integration

Cutevariant is a cross-platform application implemented in Python 3.7 using the Qt5 framework for the user interface (PySide2 ≥ 5.11). The VCF parser uses the PyVCF ≥ 0.6.8 library. The syntax and parser of the VQL language rely on the textX ≥ 1.8.0 library. SQLite3 is the database manager, accessed through the Python standard library. The source code and documentation are available on GitHub [18]. Continuous integration runs on GitHub-CI and unit tests are written with the Pytest framework [19]. The application is distributed as 32-bit and 64-bit Windows packages. Cutevariant is also available as a Python package from the Python Package Index (PyPI) [20].

Results

In Table 1 we list the features available in Cutevariant compared to other applications available on the market.

Table 1.

Features available in various applications available on the market.

Cutevariant timing performance for import and query execution is reported in Table 2 and compared with that of VCF-Miner. Other GUI applications could not be tested with our data set for several reasons: a parsing error with BrowseVCF, an upload size limit for VCF-Server, and the lack of an indexed database for VCF-Filters and VCF-Explorer. Cutevariant outperforms VCF-Miner except for 1KG.chr22.anno.vcf, because the large number of samples makes it costly to compute the join tables between samples and variants.

Table 2.

Comparison of time performance between Cutevariant and VCF-Miner for import and query execution. The query used filters variants with QUAL ≥ 30 and DEPTH ≥ 30. Executed on an Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz with 16 GB RAM.

Use case 1: SARS-CoV-2 analysis

In the context of the Covid-19 pandemic, we have tested Cutevariant to identify mutations along the genome of the SARS-CoV-2 virus. For this, we downloaded from the ENA database a dataset (PRJNA673096) of 245 samples stored as FASTQ files, produced on the Illumina sequencing platform using an amplicon library. The pipeline is available on GitHub [21]. The data originate from the US Delaware Public Health Laboratory. FASTQ files have been aligned on the NC_045512.2 genome of SARS-CoV-2 with the BWA software [22]. Variants have been called with the FreeBayes application [23] and all 245 samples have been merged into a single VCF file annotated with SnpEff [24]. This file has been imported into Cutevariant for exploration. We executed a VQL statement (Fig. 4) to extract variants within the gene S and sorted the result by the count_var annotation, which gives the total number of samples carrying the variant. Sorting is easily done by clicking on the corresponding header of the view. The mutation p.Asp614Gly (highlighted in Fig. 4) is found in 239 samples out of 245. This variant has already been described [25] as a dominant one that emerged at the beginning of the pandemic. In the same way, by scrutinizing all the genes, we have identified two other mutations, (ORF1ab)p.Thr265Ile and (ORF3a)p.Gln57His, which are exclusive to the North American population [26].

Fig. 3:

Abstract Syntax Tree (AST) of the VQL query SELECT chr,pos,consequence FROM variants WHERE gene=‘CFTR’ AND impact=‘HIGH’. The AST is parsed into a Python object.

Fig. 4:

Mutations found in gene S of SARS-CoV-2 by a Cutevariant analysis of 245 samples.

Use case 2: Cohort analysis

We have repeated with Cutevariant the analysis given as an example by SnpSift [27]. It is a cohort analysis of 17 individuals, 3 of whom are affected by a nonsense mutation in the CFTR gene (G542*). This analysis cannot be performed with any of the graphical applications listed previously (Table 1). After importing the annotated VCF file and the corresponding PED file, a VQL query was processed by Cutevariant to select variants with HIGH impact that are homozygous in case samples but not in control samples. SnpSift uses the following query:

The Cutevariant equivalent VQL query providing the same results reads as:
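The selection logic of this use case (HIGH impact, homozygous in every case sample, homozygous in no control sample) can be sketched in plain Python. This is neither the SnpSift expression nor the VQL query; the record layout, in which each genotype is the number of ALT alleles carried, the toy data and the function name are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def hom_in_cases_not_controls(
    variants: Iterable[Dict], cases: List[str], controls: List[str]
) -> List[Dict]:
    """Keep HIGH-impact variants homozygous in all cases and in no control."""
    selected = []
    for variant in variants:
        gt = variant["genotypes"]  # sample name -> number of ALT alleles (0, 1, 2)
        if (
            variant["impact"] == "HIGH"
            and all(gt[s] == 2 for s in cases)
            and all(gt[s] != 2 for s in controls)
        ):
            selected.append(variant)
    return selected


cases = ["case1", "case2"]
controls = ["ctrl1", "ctrl2"]
variants = [
    {"name": "v1", "impact": "HIGH",
     "genotypes": {"case1": 2, "case2": 2, "ctrl1": 1, "ctrl2": 0}},
    {"name": "v2", "impact": "HIGH",
     "genotypes": {"case1": 2, "case2": 1, "ctrl1": 0, "ctrl2": 0}},
    {"name": "v3", "impact": "LOW",
     "genotypes": {"case1": 2, "case2": 2, "ctrl1": 0, "ctrl2": 0}},
    {"name": "v4", "impact": "HIGH",
     "genotypes": {"case1": 2, "case2": 2, "ctrl1": 2, "ctrl2": 0}},
]
candidates = hom_in_cases_not_controls(variants, cases, controls)
```

In this toy cohort only `v1` passes: `v2` is heterozygous in one case, `v3` is not HIGH impact and `v4` is homozygous in a control. The case and control lists play the role of the affection status read from the PED file.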

Discussion

Performance

Cutevariant is implemented with the open-source Qt for Python [28], which provides a set of Python bindings to build modern user interfaces. Instead of using native Qt/C++ as the coding language, we have opted for Python because it is by far the most frequently used language in the bioinformatics community. This choice does not cause any significant performance degradation of the Cutevariant GUI. Execution time for queries performed on a complete genome with many filters can, however, become particularly long. This is primarily due to the SQL COUNT statement, which browses through all the variants to calculate their total number. The table JOIN statement is also time-consuming. This is the consequence of the choice made for Cutevariant, unlike GEMINI, to store samples and a few annotations in separate tables in order to avoid table denormalization and to minimize disk space. This time penalty has been reduced, on the one hand, by using a memory cache so that identical VQL queries do not need to recalculate the count of variants and, on the other hand, by running queries asynchronously in dedicated threads, so that the GUI does not freeze while a progress bar shows the loading status.

Web app vs Desktop app

Cutevariant is a serverless desktop application and therefore does not provide annotation or multi-user features. The annotation step must be carried out upstream, at the end of an analysis pipeline, using dedicated tools such as SnpSift or VEP. Multi-user capabilities allow users to share custom annotations and comments. For instance, a user marks a variant as pathogenic and this information is shared among all users. Although this feature is not supported by Cutevariant, it can be delegated to other tools such as MyVariant.info [29], which provides a database of variants with which Cutevariant can communicate through a REST API. These data can then be used as a source of annotation in the annotation step of the pipeline.

A general purpose and customizable tool

Cutevariant is a general-purpose tool to filter variants. It is fully customizable thanks to its plugin-based implementation and thus offers features and modularity that are not available in existing applications. Since Cutevariant is not specific to the analysis of the human genome, it can be used with any VCF file, as we demonstrated here with the SARS-CoV-2 example. GUI options dedicated to specific tasks are not hard-coded in the application but can easily be added to Cutevariant by creating new plugins. As an example of such added GUI options, the Trio Analysis plugin, selected from the Tools menu, allows users to build from the GUI a VQL filter that includes the transmission mode and the family tree.

Conclusion

Cutevariant is a new desktop application devoted to exploring genetic variations in VCF data produced by next-generation sequencing. It is the first GUI software of its kind that integrates both a user-friendly graphical user interface and a domain specific language. Starting from a low learning threshold, end-users can easily perform complex filtering to identify variants of interest. Cutevariant is a standalone application that runs on standard desktop computers under Linux, macOS or Windows. The Python-based plugin architecture makes the application easy to extend with new features, and thus offers the bioinformatics community at large the possibility to take part in their development.

Funding

This work has been supported by UBO, Université de Bretagne Occidentale, France.

Conflict of Interest: none declared.

Acknowledgments

We would like to thank Lucas Bourneuf and Pierre Vignet for their contributions.

Footnotes

https://labsquare.github.io/cutevariant/

References

1. Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean, and Richard Durbin. The variant call format and VCFtools. Bioinformatics, 27:2156–2158, 8 2011.
2. William McLaren, Laurent Gil, Sarah E. Hunt, Harpreet Singh Riat, Graham R.S. Ritchie, Anja Thormann, Paul Flicek, and Fiona Cunningham. The Ensembl Variant Effect Predictor. Genome Biology, 17:1–14, 6 2016.
3. Pablo Cingolani, Adrian Platts, Le Lily Wang, Melissa Coon, Tung Nguyen, Luan Wang, Susan J. Land, Xiangyi Lu, and Douglas M. Ruden. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6:80–92, 2012.
4. Umadevi Paila, Brad A. Chapman, Rory Kirchner, and Aaron R. Quinlan. GEMINI: Integrative exploration of genetic variation and genome annotations. PLoS Computational Biology, 9, 7 2013.
5. Gao T. Wang, Bo Peng, and Suzanne M. Leal. Variant association tools for quality control and analysis of large-scale sequence and genotyping array data. American Journal of Human Genetics, 94:770–783, 5 2014.
6. Richard D Hipp. SQLite. https://www.sqlite.org/index.html, 2020.
7. Heng Li. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21):2987–2993, 09 2011.
8. SeqOne. https://seq.one/, Jan 2021.
9. Manuel Holtgrewe, Oliver Stolpe, Mikko Nieminen, Stefan Mundlos, Alexej Knaus, Uwe Kornak, Dominik Seelow, Lara Segebrecht, Malte Spielmann, Björn Fischer-Zirnsak, Felix Boschann, Ute Scholl, Nadja Ehmke, and Dieter Beule. VarFish: comprehensive DNA variant analysis for diagnostics and research. Nucleic Acids Research, 48(W1):W162–W169, 04 2020.
10. Steven N. Hart, Patrick Duffy, Daniel J. Quest, Asif Hossain, Mike A Meiners, and Jean-Pierre Kocher. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Briefings in Bioinformatics, 17(2):346–351, 07 2015.
11. W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. The human genome browser at UCSC. Genome Res., 12(6):996–1006, 06 2002.
12. Heiko Müller, Raul Jimenez-Heredia, Ana Krolo, Tatjana Hirschmugl, Jasmin Dmytrus, Kaan Boztug, and Christoph Bock. VCF.Filter: interactive prioritization of disease-linked genetic variants from sequencing data. Nucleic Acids Research, 45(W1):W567–W572, 05 2017.
13. Empowering app development for developers. https://www.docker.com/.
14. Mark Ziemann, Yotam Eren, and Assam El-Osta. Gene name errors are widespread in the scientific literature. Genome Biology, 17, 8 2016.
15. Pablo Cingolani, Fiona Cunningham, Will Mclaren, and Kai Wang. Variant annotations in VCF format. http://www.ensembl.org/Help/Glossary?id=492.
16. Adrian Tan, Gonçalo R. Abecasis, and Hyun Min Kang. Unified representation of genetic variants. Bioinformatics, 31(13):2202–2204, 02 2015.
17. I. Dejanović, R. Vaderna, G. Milosavljević, and Ž. Vuković. TextX: A Python tool for Domain-Specific Languages implementation. Knowledge-Based Systems, 115:1–4, 1 2017.
18. Cutevariant. https://github.com/labsquare/cutevariant.
19. Pytest. https://docs.pytest.org/en/stable.
20. Python Package Index. https://pypi.org/.
21. covid pipeline. https://github.com/dridk/Sars-CoV-2-NGS-pipeline.
22. Heng Li and Richard Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25:1754–1760, 7 2009.
23. Erik Garrison and Gabor Marth. Haplotype-based variant detection from short-read sequencing. http://arxiv.org/abs/1207.3907, 7 2012.
24. P. Cingolani, A. Platts, M. Coon, T. Nguyen, L. Wang, S.J. Land, X. Lu, and D.M. Ruden. A program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: Snps in the genome of drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2):80–92, 2012.
25. Bette Korber, Will M. Fischer, Sandrasegaram Gnanakaran, Hyejin Yoon, James Theiler, Werner Abfalterer, Nick Hengartner, Elena E. Giorgi, Tanmoy Bhattacharya, Brian Foley, Kathryn M. Hastie, Matthew D. Parker, David G. Partridge, Cariad M. Evans, Timothy M. Freeman, Thushan I. de Silva, Adrienne Angyal, Rebecca L. Brown, Laura Carrilero, Luke R. Green, Danielle C. Groves, Katie J. Johnson, Alexander J. Keeley, Benjamin B. Lindsey, Paul J. Parsons, Mohammad Raza, Sarah Rowland-Jones, Nikki Smith, Rachel M. Tucker, Dennis Wang, Matthew D. Wyles, Charlene McDanal, Lautaro G. Perez, Haili Tang, Alex Moon-Walker, Sean P. Whelan, Celia C. LaBranche, Erica O. Saphire, and David C. Montefiori. Tracking changes in sars-cov-2 spike: Evidence that d614g increases infectivity of the covid-19 virus. Cell, 182:812–827.e19, 8 2020.
26. Xumin Ou, Zhishuang Yang, Dekang Zhu, Sai Mao, Mingshu Wang, Renyong Jia, Shun Chen, Mafeng Liu, Qiao Yang, Ying Wu, Xinxin Zhao, Shaqiu Zhang, Juan Huang, Qun Gao, Yunya Liu, Ling Zhang, Maikel Peppelenbosch, Qiuwei Pan, and Anchun Cheng. Tracing two causative snps reveals sars-cov-2 transmission in north america population. bioRxiv, page 2020.05.12.092056, 5 2020.
27. Snpeff usage example. https://pcingola.github.io/SnpEff/examples/.
28. The Qt Company. Qt for Python: The official Python bindings for Qt. https://www.qt.io/qt-for-python.
29. Variant annotation as a service. https://myvariant.info/.
30. Silvia Salatino and Varun Ramraj. BrowseVCF: a web-based application and workflow to quickly prioritize disease-causative variants in VCF files. Briefings in Bioinformatics, 18:774–779, 9 2017.
31. Steven N. Hart, Patrick Duffy, Daniel J. Quest, Asif Hossain, Mike A. Meiners, and Jean Pierre Kocher. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Briefings in Bioinformatics, 17:346–351, 3 2016.
32. Jianping Jiang, Jianlei Gu, Tingting Zhao, and Hui Lu. VCF-Server: A web-based visualization tool for high-throughput variant data mining and management. Molecular Genetics and Genomic Medicine, 7, 7 2019.
33. F. Anthony San Lucas, Gao Wang, Paul Scheet, and Bo Peng. Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools. Bioinformatics, 28:421–422, 2 2012.
34. The Qt Company. Cross-platform software development for embedded and desktop. https://www.qt.io/.
35. Manuel Holtgrewe, Oliver Stolpe, Mikko Nieminen, Stefan Mundlos, Alexej Knaus, Uwe Kornak, Dominik Seelow, Lara Segebrecht, Malte Spielmann, Björn Fischer-Zirnsak, Felix Boschann, Ute Scholl, Nadja Ehmke, and Dieter Beule. VarFish: comprehensive DNA variant analysis for diagnostics and research. Nucleic Acids Research, 48:W162–W169, 7 2020.
36. Damian Smedley, Julius O B Jacobsen, Marten Jager, Sebastian Köhler, Manuel Holtgrewe, Max Schubach, Enrico Siragusa, Tomasz Zemojtel, Orion J Buske, Nicole L Washington, William P Bone, Melissa A Haendel, and Peter N Robinson. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nature Protocols, 10:2004, 2015.
37. DNA sequencing. https://www.integragen.com/service-solutions/dna-sequencing, Oct 2020.
38. Adrian Tan, Gonçalo R. Abecasis, and Hyun Min Kang. Unified representation of genetic variants. Bioinformatics, 31:2202–2204, 7 2015.

Posted February 15, 2021.

Download PDF

Data/Code

Citation Tools

Subject Area

Bioinformatics

Subject Areas

All Articles

Animal Behavior and Cognition (5215)
Biochemistry (11752)
Bioengineering (8752)
Bioinformatics (29200)
Biophysics (14974)
Cancer Biology (12096)
Cell Biology (17411)
Clinical Trials (138)
Developmental Biology (9421)
Ecology (14182)
Epidemiology (2067)
Evolutionary Biology (18308)
Genetics (12245)
Genomics (16803)
Immunology (11869)
Microbiology (28097)
Molecular Biology (11594)
Neuroscience (60969)
Paleontology (451)
Pathology (1871)
Pharmacology and Toxicology (3238)
Physiology (4959)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2886)
Systems Biology (7340)
Zoology (1651)

[1] 1.↵
Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean, and Richard Durbin. The variant call format and VCFtools. Bioinformatics, 27:2156–2158, 8 2011.
OpenUrl CrossRef PubMed Web of Science

[2] 2.↵
William McLaren, Laurent Gil, Sarah E. Hunt, Harpreet Singh Riat, Graham R.S. Ritchie, Anja Thormann, Paul Flicek, and Fiona Cunningham. The ensemble variant effect predictor. Genome Biology, 17:1–14, 6 2016.
OpenUrl CrossRef PubMed

[3] 3.↵
Pablo Cingolani, Adrian Platts, Le Lily Wang, Melissa Coon, Tung Nguyen, Luan Wang, Susan J. Land, Xiangyi Lu, and Douglas M. Ruden. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6:80–92, 2012.
OpenUrl CrossRef PubMed Web of Science

[4] 4.↵
Umadevi Paila, Brad A. Chapman, Rory Kirchner, and Aaron R. Quinlan. GEMINI: Integrative exploration of genetic variation and genome annotations. PLoS Computational Biology, 9, 7 2013.

[5] 5.↵
Gao T. Wang, Bo Peng, and Suzanne M. Leal. Variant association tools for quality control and analysis of large-scale sequence and genotyping array data. American Journal of Human Genetics, 94:770–783, 5 2014.
OpenUrl CrossRef PubMed

[6] 6.↵
Richard D Hipp. SQLite. https://www.sqlite.org/index.html, 2020.

[7] 7.↵
Heng Li. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21):2987–2993, 09 2011.
OpenUrl CrossRef PubMed Web of Science

[8] 8.↵
SeqOne.https://seq.one/, mJan 2021.

[9] 9.↵
Manuel Holtgrewe, Oliver Stolpe, Mikko Nieminen, Stefan Mundlos, Alexej Knaus, Uwe Kornak, Dominik Seelow, Lara Segebrecht, Malte Spielmann, Björn Fischer-Zirnsak, Felix Boschann, Ute Scholl, Nadja Ehmke, and Dieter Beule. VarFish: comprehensive DNA variant analysis for diagnostics and research. Nucleic Acids Research, 48(W1):W162–W169, 04 2020.
OpenUrl

[10] 10.↵
Steven N. Hart, Patrick Duffy, Daniel J. Quest, Asif Hossain, Mike A Meiners, and Jean-Pierre Kocher. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Briefings in Bioinformatics, 17(2):346–351, 07 2015.
OpenUrl PubMed

[11] 11.↵
W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and a. D. Haussler. The human genome browser at UCSC. Genome Res., 12(6):996–1006, 06 2002.
OpenUrl Abstract/FREE Full Text

[12] 12.↵
Heiko Müller, Raul Jimenez-Heredia, Ana Krolo, Tatjana Hirschmugl, Jasmin Dmytrus, Kaan Boztug, and Christoph Bock. VCF.Filter: interactive prioritization of disease-linked genetic variants from sequencing data. Nucleic Acids Research, 45(W1):W567–W572, 05 2017.
OpenUrl CrossRef

[13] 13.↵
Empowering app development for developers. https://www.docker.com/.

[14] 14.↵
Mark Ziemann, Yotam Eren, and Assam El-Osta. Gene name errors are widespread in the scientific literature. Genome Biology, 17, 8 2016.

[15] 15.↵
Pablo Cingolani, Fiona Cunningham, Will Mclaren, and Kai Wang. Variant annotations in VCF format. http://www.ensembl.org/Help/Glossary?id=492.

[16] 16.↵
Adrian Tan, Gon·alo R. Abecasis, and Hyun Min Kang. Unified representation of genetic variants. Bioinformatics, 31(13):2202–2204, 02 2015.
OpenUrl CrossRef PubMed

[17] 17.↵
I. Dejanović, R. Vaderna, G. Milosavljević, and Vuković. TextX: A Python tool for Domain-Specific Languages implementation. Knowledge-Based Systems, 115:1–4, 1 2017.
OpenUrl

[18] 18.↵
Cutevariant.https://github.com/labsquare/cutevariant.

[19] 19.↵
Pytest.https://docs.pytest.org/en/stable.

[20] 20.↵
Python Package Index. https://pypi.org/.

[21]
covid pipeline. https://github.com/dridk/Sars-CoV-2-NGS-pipeline.

[22]
Heng Li and Richard Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25:1754–1760, 7 2009.

[23]
Erik Garrison and Gabor Marth. Haplotype-based variant detection from short-read sequencing. http://arxiv.org/abs/1207.3907, 7 2012.

[24]
P. Cingolani, A. Platts, M. Coon, T. Nguyen, L. Wang, S.J. Land, X. Lu, and D.M. Ruden. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2):80–92, 2012.

[25]
Bette Korber, Will M. Fischer, Sandrasegaram Gnanakaran, Hyejin Yoon, James Theiler, Werner Abfalterer, Nick Hengartner, Elena E. Giorgi, Tanmoy Bhattacharya, Brian Foley, Kathryn M. Hastie, Matthew D. Parker, David G. Partridge, Cariad M. Evans, Timothy M. Freeman, Thushan I. de Silva, Adrienne Angyal, Rebecca L. Brown, Laura Carrilero, Luke R. Green, Danielle C. Groves, Katie J. Johnson, Alexander J. Keeley, Benjamin B. Lindsey, Paul J. Parsons, Mohammad Raza, Sarah Rowland-Jones, Nikki Smith, Rachel M. Tucker, Dennis Wang, Matthew D. Wyles, Charlene McDanal, Lautaro G. Perez, Haili Tang, Alex Moon-Walker, Sean P. Whelan, Celia C. LaBranche, Erica O. Saphire, and David C. Montefiori. Tracking changes in SARS-CoV-2 Spike: Evidence that D614G increases infectivity of the COVID-19 virus. Cell, 182:812–827.e19, 8 2020.

[26]
Xumin Ou, Zhishuang Yang, Dekang Zhu, Sai Mao, Mingshu Wang, Renyong Jia, Shun Chen, Mafeng Liu, Qiao Yang, Ying Wu, Xinxin Zhao, Shaqiu Zhang, Juan Huang, Qun Gao, Yunya Liu, Ling Zhang, Maikel Peppelenbosch, Qiuwei Pan, and Anchun Cheng. Tracing two causative SNPs reveals SARS-CoV-2 transmission in North America population. bioRxiv, page 2020.05.12.092056, 5 2020.

[27]
SnpEff usage example. https://pcingola.github.io/SnpEff/examples/.

[28]
The Qt Company. Qt for Python: The official Python bindings for Qt. https://www.qt.io/qt-for-python.

[29]
Variant annotation as a service. https://myvariant.info/.

[30]
Silvia Salatino and Varun Ramraj. BrowseVCF: a web-based application and workflow to quickly prioritize disease-causative variants in VCF files. Briefings in Bioinformatics, 18:774–779, 9 2017.

[31]
Steven N. Hart, Patrick Duffy, Daniel J. Quest, Asif Hossain, Mike A. Meiners, and Jean-Pierre Kocher. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Briefings in Bioinformatics, 17:346–351, 3 2016.

[32]
Jianping Jiang, Jianlei Gu, Tingting Zhao, and Hui Lu. VCF-Server: A web-based visualization tool for high-throughput variant data mining and management. Molecular Genetics and Genomic Medicine, 7, 7 2019.

[33]
F. Anthony San Lucas, Gao Wang, Paul Scheet, and Bo Peng. Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools. Bioinformatics, 28:421–422, 2 2012.

[34]
The Qt Company. Cross-platform software development for embedded and desktop. https://www.qt.io/.

[35]
Manuel Holtgrewe, Oliver Stolpe, Mikko Nieminen, Stefan Mundlos, Alexej Knaus, Uwe Kornak, Dominik Seelow, Lara Segebrecht, Malte Spielmann, Björn Fischer-Zirnsak, Felix Boschann, Ute Scholl, Nadja Ehmke, and Dieter Beule. VarFish: comprehensive DNA variant analysis for diagnostics and research. Nucleic Acids Research, 48:W162–W169, 7 2020.

[36]
Damian Smedley, Julius O B Jacobsen, Marten Jager, Sebastian Köhler, Manuel Holtgrewe, Max Schubach, Enrico Siragusa, Tomasz Zemojtel, Orion J Buske, Nicole L Washington, William P Bone, Melissa A Haendel, and Peter N Robinson. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nature Protocols, 10:2004, 2015.

[37]
DNA sequencing. https://www.integragen.com/service-solutions/dna-sequencing, Oct 2020.

[38]
Adrian Tan, Gonçalo R. Abecasis, and Hyun Min Kang. Unified representation of genetic variants. Bioinformatics, 31:2202–2204, 7 2015.