Abstract
Background The reported R-package provides an easy way to run and evaluate genotype imputation studies, with functions for preparing input files for AlphaImpute and for efficiently calculating imputation accuracies. The correlation between true and imputed genotypes is used as the accuracy measure because it is directly related to the accuracy of genomic prediction using imputed genotypes. The package calculates this correlation and also counts correctly and incorrectly imputed genotypes.
Results Implementing the correlation in Fortran resulted in faster calculations and lower memory use than using base R functions. The performance of an imputation should not be reported only as the average correlation between true and imputed genotypes. It is demonstrated that the highest average correlation does not necessarily indicate the best imputation, and that the range of obtained correlations provides a more nuanced picture of imputation performance.
Conclusions An R-package is available that provides a fast, standardized, and tested implementation for computing the correlations.
List of Abbreviations
- LD: Low-density (genotyping panel)
- HD: High-density (genotyping panel)
- SNP: Single nucleotide polymorphism