PT - JOURNAL ARTICLE
AU - Antti Larjo
AU - Robert Eveleigh
AU - Elina Kilpeläinen
AU - Tony Kwan
AU - Tomi Pastinen
AU - Satu Koskela
AU - Jukka Partanen
TI - Accuracy of programs for the determination of HLA alleles from NGS data
AID - 10.1101/183038
DP - 2017 Jan 01
TA - bioRxiv
PG - 183038
4099 - http://biorxiv.org/content/early/2017/09/01/183038.short
4100 - http://biorxiv.org/content/early/2017/09/01/183038.full
AB - The human leukocyte antigen (HLA) genes code for proteins that play a central role in the function of the immune system by presenting peptide antigens to T cells. As HLA genes show extremely high genetic polymorphism, HLA typing at the allele level is demanding and is based on DNA sequencing. Determination of HLA alleles is warranted because many HLA alleles are major genetic factors that confer susceptibility to autoimmune diseases, and because matching of HLA alleles is important in transplantation. Here, we compared the accuracy of several published HLA-typing algorithms that are based on next-generation sequencing (NGS) data. As genome screens are becoming increasingly routine in research, we wanted to test how well HLA alleles can be deduced from genome screens not designed for HLA typing. The accuracies were assessed using datasets consisting of NGS data produced using the ImmunoSEQ platform, including the full 4 Mbp HLA segment, from 94 stem cell transplantation patients and exome sequences from the 1000 Genomes collection. When used with the default settings, none of the methods gave perfect results for all the genes and samples. However, we found that ensemble prediction of the results or modifications of the settings could be used to improve accuracy. Most of the algorithms did not perform very well for the exome-only data. The results indicate that the use of these algorithms for accurate HLA allele determination based on NGS data is not straightforward.
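
The abstract mentions that ensemble prediction across programs could improve accuracy but does not describe the scheme used. As an illustration only, the sketch below assumes a simple majority vote over the unordered allele pair reported by each program for one gene in one sample. The tool names, the `ensemble_call` function, and the `min_votes` threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

AllelePair = Tuple[str, str]


def ensemble_call(
    calls: Dict[str, Optional[AllelePair]],
    min_votes: int = 2,
) -> Optional[AllelePair]:
    """Majority vote over per-tool genotype calls for one gene in one sample.

    `calls` maps a (hypothetical) tool name to the two alleles it reported,
    or None if the tool made no call. Allele order within a pair is ignored.
    Returns the winning pair, or None when support is too low or tied.
    """
    # Sort each pair so ("A*02:01", "A*01:01") and ("A*01:01", "A*02:01")
    # count as the same genotype.
    votes = Counter(
        tuple(sorted(pair)) for pair in calls.values() if pair is not None
    )
    if not votes:
        return None

    ranked = votes.most_common(2)
    best, count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == count
    if count < min_votes or tied:
        return None  # assumption: abstain rather than pick arbitrarily
    return best


# Example with made-up calls from three placeholder tools.
example = {
    "tool_a": ("A*02:01", "A*01:01"),
    "tool_b": ("A*01:01", "A*02:01"),
    "tool_c": ("A*02:01", "A*03:01"),
}
consensus = ensemble_call(example)
```

Abstaining on ties is one possible design choice; a weighted vote based on each program's per-gene accuracy would be another, and nothing here indicates which approach the authors used.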