Abstract
Herein we introduce Numericware i, software that computes a matrix of all pairwise identical-in-state (IIS) coefficients from genotypic data. Since the emergence of high-throughput genotyping technology, calculating an IIS matrix between many pairs of entities has required large amounts of computer memory and lengthy processing times. Numericware i addresses these limitations with two algorithmic methods: multithreading and forward chopping. Multithreading allows computational routines to run concurrently on multiple CPU cores. Forward chopping addresses memory limitations by dividing the genotypic data into appropriately sized subsets. Numericware i thus allows researchers who need to estimate an IIS matrix for large genotypic data sets to use typical laptop or desktop computers. For comparison with other software, we calculated genetic relationship matrices using Numericware i, SPAGeDi and TASSEL on the same small data set. Numericware i produced kinship coefficients between zero and two, while the matrices from SPAGeDi and TASSEL spanned different ranges of values, including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high (0.993), whereas SPAGeDi showed almost no correlation with Numericware i (0.088) or TASSEL (0.087). To compare their capacity with high-dimensional data, we applied the three programs to a simulated data set consisting of 500 entities by 1,000,000 SNPs. Numericware i took 71 minutes using seven CPU cores on a laptop (DELL LATITUDE E6540), while SPAGeDi and TASSEL failed to start. Numericware i is freely available for Windows and Linux under a CC-BY license at https://figshare.com/s/f100f33a8857131eb2db.
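The abstract names multithreading and forward chopping but does not give the estimator or the chunking rule. The Python sketch below shows one way the two ideas can fit together, under assumptions that are ours and not taken from Numericware i: genotypes are coded 0/1/2 as allele counts, the per-SNP IIS score between entities i and j is 2 - |g_i - g_j| (so the averaged coefficient falls between zero and two, matching the range reported above), and forward chopping is approximated by fixed-size blocks of SNP columns processed one at a time. The names `snp_chunks`, `accumulate_row` and `iis_matrix`, and the `chunk_size` and `workers` parameters, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Iterator, List

Genotypes = List[List[int]]  # rows = entities, columns = SNPs coded 0/1/2 (assumed coding)


def snp_chunks(geno: Genotypes, chunk_size: int) -> Iterator[Genotypes]:
    """Yield column blocks of at most chunk_size SNPs.

    Hypothetical stand-in for forward chopping: a real tool would read
    each block from disk instead of slicing an in-memory matrix.
    """
    n_snps = len(geno[0])
    for start in range(0, n_snps, chunk_size):
        yield [row[start:start + chunk_size] for row in geno]


def accumulate_row(block: Genotypes, acc: List[List[int]], i: int) -> None:
    """Add summed allele-sharing scores for pairs (i, j), j >= i, in one block.

    Each task writes only row i of the upper triangle, so no two threads
    ever update the same cell.
    """
    gi = block[i]
    for j in range(i, len(block)):
        # Assumed per-SNP IIS score: 2 if identical, 1 if one allele apart, 0 if opposite homozygotes.
        acc[i][j] += sum(2 - abs(a - b) for a, b in zip(gi, block[j]))


def iis_matrix(geno: Genotypes, chunk_size: int = 1000, workers: int = 4) -> List[List[float]]:
    """Return the n x n matrix of mean per-SNP IIS scores (values in [0, 2])."""
    n = len(geno)
    n_snps = len(geno[0])
    acc = [[0] * n for _ in range(n)]  # running sums, upper triangle only
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block in snp_chunks(geno, chunk_size):
            # One task per entity row; list() waits for the whole block
            # (and re-raises any worker exception) before the next block is read.
            list(pool.map(partial(accumulate_row, block, acc), range(n)))
    # Mirror the upper triangle and divide by the number of SNPs.
    return [[acc[min(i, j)][max(i, j)] / n_snps for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    toy = [[0, 1, 2, 2], [0, 1, 2, 0], [2, 1, 0, 0]]
    for row in iis_matrix(toy, chunk_size=3, workers=2):
        print(row)
```

In this sketch only one SNP block plus the n-by-n accumulator are held at a time, which is the memory trade-off forward chopping targets, and the result does not depend on `chunk_size`. In CPython the global interpreter lock means pure-Python threads mostly interleave rather than run truly in parallel, so a production implementation would move the inner loop into compiled code or separate processes to benefit from multiple cores.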
Footnotes
* bkim{at}noble.org