PT - JOURNAL ARTICLE
AU - Viola Ravasio
AU - Edoardo Giacopuzzi
TI - GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS
AID - 10.1101/149146
DP - 2017 Jan 01
TA - bioRxiv
PG - 149146
4099 - http://biorxiv.org/content/early/2017/06/14/149146.short
4100 - http://biorxiv.org/content/early/2017/06/14/149146.full
AB - Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. Even though current platforms produce high-quality sequencing data, false-positive variants remain an issue and can confound subsequent analysis and interpretation of results. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which uses deep learning algorithms to distinguish false from true variants in exome sequencing experiments performed with Illumina or Ion platforms. GARFIELD-NGS consists of 4 distinct models tested on the NA12878 gold-standard exome variant dataset (NIST v.3.3.2): Illumina INDELs, Illumina SNPs, ION INDELs, and ION SNPs. AUC values for each variant category are 0.9267, 0.7998, 0.9464, and 0.9757, respectively. GARFIELD-NGS is also robust on low-coverage data down to 30X and on Illumina two-colour data. Our tool outperformed previously proposed hard filters, and it calculates for each variant a score from 0 to 1, allowing different thresholds to be applied according to the desired level of sensitivity and specificity. GARFIELD-NGS processes standard VCF input using Perl and Java scripts and produces a regular VCF output; thus, it can be easily integrated into existing analysis pipelines. GARFIELD-NGS is freely available at https://github.com/gedoardo83/GARFIELD-NGS.
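
The abstract states that each variant receives a score from 0 to 1 and that the threshold can be chosen according to the desired sensitivity and specificity. The following is a minimal sketch of that idea, not the tool's own code: it assumes the score is stored in the VCF INFO column under a hypothetical key `GARFIELD_SCORE` (the actual tag name is not given in the abstract), and the helper names `info_score`, `filter_vcf_lines`, and `sensitivity_specificity` are illustrative only.

```python
# Hypothetical INFO key; the abstract does not name the tag the tool writes.
SCORE_KEY = "GARFIELD_SCORE"


def info_score(vcf_line, key=SCORE_KEY):
    """Return the float stored under `key` in the INFO column, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:  # INFO is the 8th tab-separated VCF column
        return None
    for entry in fields[7].split(";"):
        name, sep, value = entry.partition("=")
        if name == key and sep:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def filter_vcf_lines(lines, threshold, key=SCORE_KEY):
    """Keep header lines and records whose score is >= threshold.

    Records without a parsable score are dropped (a choice made for
    this sketch, not a documented behaviour of the tool).
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        score = info_score(line, key)
        if score is not None and score >= threshold:
            kept.append(line)
    return kept


def sensitivity_specificity(scored, threshold):
    """Compute (sensitivity, specificity) at a given score threshold.

    `scored` is an iterable of (score, is_true_variant) pairs, e.g. taken
    from a truth set such as a gold-standard sample.
    """
    tp = fn = tn = fp = 0
    for score, is_true in scored:
        called = score >= threshold
        if is_true and called:
            tp += 1
        elif is_true:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Raising the threshold discards more low-scoring calls, which generally trades sensitivity for specificity; evaluating `sensitivity_specificity` over a range of thresholds on labelled variants is one way to pick an operating point.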