PT - JOURNAL ARTICLE
AU - Peng Zhang
AU - Bertrand Boisson
AU - Jean-Laurent Casanova
AU - Laurent Abel
AU - Yuval Itan
TI - SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
AID - 10.1101/408625
DP - 2018 Jan 01
TA - bioRxiv
PG - 408625
4099 - http://biorxiv.org/content/early/2018/09/05/408625.short
4100 - http://biorxiv.org/content/early/2018/09/05/408625.full
AB - Human whole-genome sequencing generally reveals about 4,000,000 genetic variants, including 20,000 coding variants, in each individual studied. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF files as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is currently no existing online tool for extracting DNA or protein sequences for genomic variants from VCF files with user-defined parameters in a user-friendly, efficient, and standardized manner. We developed the SeqTailor webserver to bridge this gap. It can be used for the rapid extraction of (1) DNA sequences around genetic variants, with customizable window sizes, from the hg19 or hg38 human reference genomes; and (2) protein sequences encoded by the DNA sequences around genetic variants, with built-in SnpEff annotation and customizable window sizes, from human canonical transcripts. The SeqTailor webserver streamlines the sequence extraction process, and accelerates the analysis of genetic variant data with software requiring DNA or protein sequences. SeqTailor will facilitate the study of human genomic variation, by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available from http://shiva.rockefeller.edu/SeqTailor/.
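The abstract describes extracting DNA sequences around variants with a customizable window size. The following is a minimal sketch of that idea, not SeqTailor's implementation. It assumes the reference chromosome is already loaded as a Python string, treats VCF POS as 1-based, and interprets the window as a hypothetical number of flanking bases kept on each side of the REF allele. The helper names `parse_vcf_line` and `extract_window` are illustrative only. Protein extraction, which the abstract says relies on SnpEff annotation and canonical transcripts, is not covered.

```python
from typing import Tuple


def parse_vcf_line(line: str) -> Tuple[str, int, str, str]:
    """Return (CHROM, POS, REF, first ALT) from one tab-separated VCF data line.

    Only the first ALT allele is kept; this is a simplifying assumption.
    """
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt.split(",")[0]


def extract_window(reference: str, pos: int, ref: str, alt: str,
                   window: int) -> Tuple[str, str]:
    """Return (reference_window, alternate_window) around a VCF-style variant.

    `window` is the number of flanking bases on each side of REF
    (a hypothetical convention, not necessarily SeqTailor's).
    """
    start = pos - 1                  # convert 1-based VCF POS to 0-based index
    end = start + len(ref)
    if reference[start:end].upper() != ref.upper():
        raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
    left = reference[max(0, start - window):start]   # upstream flank, clipped at 0
    right = reference[end:end + window]              # downstream flank
    return left + ref + right, left + alt + right
```

For example, `extract_window("ACGTACGTAC", 5, "A", "G", 2)` returns `("GTACG", "GTGCG")`: two bases of flank on each side, with the alternate allele substituted in the second sequence. Checking REF against the reference before slicing catches genome-build mismatches (for example, hg19 coordinates applied to hg38).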