PT - JOURNAL ARTICLE
AU - Remi Torracinta
AU - Fabien Campagne
TI - Training Genotype Callers with Neural Networks
AID - 10.1101/097469
DP - 2016 Jan 01
TA - bioRxiv
PG - 097469
4099 - http://biorxiv.org/content/early/2016/12/30/097469.short
4100 - http://biorxiv.org/content/early/2016/12/30/097469.full
AB - We present an open source software toolkit for training deep learning models to call genotypes in high-throughput sequencing data. The software supports SAM, BAM, CRAM and Goby alignments and the training of models for a variety of experimental assays and analysis protocols. We evaluate this software on the Illumina Platinum whole genome datasets and find that a deep learning model trained on 80% of the genome achieves an accuracy of 0.986 on variants (genotype concordance) when trained with 10% of the data from a genome. The software is distributed at https://github.com/CampagneLaboratory/variationanalysis. The software makes it possible to train genotype calling models on consumer hardware with CPUs or GPU(s). It will enable individual investigators and small laboratories to train and evaluate their own models and to make open source contributions. We welcome contributions to extend this early prototype or evaluate its performance on other gold standard datasets.