RT Journal Article
SR Electronic
T1 Predicting Dog Phenotypes from Genotypes
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.04.13.488108
DO 10.1101/2022.04.13.488108
A1 Emily R. Bartusiak
A1 Míriam Barrabés
A1 Aigerim Rymbekova
A1 Julia Gimbernat-Mayol
A1 Cayetana López
A1 Lorenzo Barberis
A1 Daniel Mas Montserrat
A1 Xavier Giró-i-Nieto
A1 Alexander G. Ioannidis
YR 2022
UL http://biorxiv.org/content/early/2022/04/14/2022.04.13.488108.1.abstract
AB We analyze dog genotypes (i.e., positions of dog DNA sequences that often vary between different dogs) in order to predict the corresponding phenotypes (i.e., unique observed characteristics). More specifically, given chromosome data from a dog, we aim to predict the breed, height, and weight. We explore a variety of linear and non-linear classification and regression techniques to accomplish these three tasks. We also investigate the use of a neural network (both in linear and non-linear modes) for breed classification and compare the performance to traditional statistical methods. We show that linear methods generally outperform or match the performance of non-linear methods for breed classification. However, we show that the reverse is true for height and weight regression. Finally, we evaluate the results of all of these methods based on the number of input features used in the analysis. We conduct experiments using different fractions of the full genomic sequences, resulting in input sequences ranging from 20 SNPs to ∼200k SNPs. In doing so, we explore the impact of using a very limited number of SNPs for prediction. Our experiments demonstrate that these phenotypes in dogs can be predicted with as few as 0.5% of randomly selected SNPs (i.e., 992 SNPs) and that dog breeds can be classified with 50% balanced accuracy with as few as 0.02% SNPs (i.e., 40 SNPs). Competing Interest Statement: The authors have declared no competing interest.
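
The SNP counts quoted in the abstract can be checked with a little arithmetic, and the "balanced accuracy" metric is easy to state in code. The sketch below is illustrative only and is not the authors' pipeline: the panel size `TOTAL_SNPS` is an assumption inferred from "0.5% ... (i.e., 992 SNPs)", and the helper names `snp_count`, `random_snp_subset`, and `balanced_accuracy` are hypothetical.

```python
import random

# Assumed panel size, inferred from the abstract: 992 SNPs / 0.005 = 198,400.
# The abstract itself only says "~200k SNPs".
TOTAL_SNPS = 198_400


def snp_count(fraction, total_snps):
    """Number of SNPs kept when sampling a given fraction of the panel."""
    return round(fraction * total_snps)


def random_snp_subset(total_snps, fraction, seed=0):
    """Sorted indices of a random SNP subset, drawn without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_snps), snp_count(fraction, total_snps)))


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare breeds count as much as common ones."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    print(snp_count(0.005, TOTAL_SNPS))   # 0.5% of the assumed panel
    print(snp_count(0.0002, TOTAL_SNPS))  # 0.02% of the assumed panel
    # Toy labels (made up): always predicting the majority breed
    # gets 75% plain accuracy on this sample.
    truth = ["lab", "lab", "lab", "pug"]
    guess = ["lab", "lab", "lab", "lab"]
    print(balanced_accuracy(truth, guess))
```

Under the assumed panel size, 0.02% works out to 39.68 SNPs, which rounds to the 40 quoted in the abstract. Balanced accuracy is used here as the plain mean of per-class recall; whether the paper applies any further adjustment is not stated in the abstract.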