TY  - JOUR
T1  - Benchmarking Database Systems for Genomic Selection Implementation
JF  - bioRxiv
DO  - 10.1101/519017
SP  - 519017
AU  - Yaw Nti-Addae
AU  - Dave Matthews
AU  - Victor Jun Ulat
AU  - Raza Syed
AU  - Guilhem Sempéré
AU  - Adrien Pétel
AU  - Jon Renner
AU  - Pierre Larmande
AU  - Valentin Guignon
AU  - Elizabeth Jones
AU  - Kelly Robbins
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/01/13/519017.abstract
N2  - Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs [22]. Making effective use of this information requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples, with a turnaround time rapid enough to allow selection before crosses need to be made. In reality, breeders often have only a short window of time to make decisions once they have collected all their phenotyping data and received the corresponding genotyping data. This makes it challenging to organize the information and use it in downstream analyses to support breeders' decisions. Implementing genomic selection routinely as part of breeding programs therefore requires an efficient genotype data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extraction times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can work more efficiently with both orientations of the allele matrix. Availability: http://gobiinx1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse Contact: yn259{at}cornell.edu
ER  - 