RT Journal Article
SR Electronic
T1 GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 149146
DO 10.1101/149146
A1 Viola Ravasio
A1 Edoardo Giacopuzzi
YR 2017
UL http://biorxiv.org/content/early/2017/06/14/149146.abstract
AB The exome sequencing approach is extensively used in research and diagnostic laboratories to discover pathological variants and to study the genetic architecture of human diseases. Even though current platforms produce high-quality sequencing data, false-positive variants remain an issue and can confound subsequent analysis and result interpretation. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which uses a deep learning algorithm to discriminate between false and true variants in exome sequencing experiments performed with Illumina or Ion platforms. GARFIELD-NGS consists of 4 distinct models tested on the NA12878 gold-standard exome variants dataset (NIST v.3.3.2): Illumina INDELs, Illumina SNPs, ION INDELs, and ION SNPs. The AUC values for these variant categories are 0.9267, 0.7998, 0.9464, and 0.9757, respectively. GARFIELD-NGS is also robust on low-coverage data down to 30X and on Illumina two-colour data. Our tool outperformed previously proposed hard filters and calculates for each variant a score from 0 to 1, allowing different thresholds to be applied depending on the desired level of sensitivity and specificity. GARFIELD-NGS processes standard VCF file input using Perl and Java scripts and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines. GARFIELD-NGS is freely available at https://github.com/gedoardo83/GARFIELD-NGS.