RT Journal Article
SR Electronic
T1 Hierarchical modeling of haplotype effects based on a phylogeny
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.01.31.928390
DO 10.1101/2020.01.31.928390
A1 Selle, Maria Lie
A1 Steinsland, Ingelin
A1 Lindgren, Finn
A1 Brajkovic, Vladimir
A1 Cubric-Curik, Vlatka
A1 Gorjanc, Gregor
YR 2020
UL http://biorxiv.org/content/early/2020/02/01/2020.01.31.928390.abstract
AB This paper introduces a hierarchical model to estimate haplotype effects based on phylogenetic relationships between haplotypes and their association with observed phenotypes. In a population there are usually many, but not all possible, distinct haplotypes and few observations per haplotype. Further, haplotype frequencies tend to vary substantially: few haplotypes have high frequency and many haplotypes have low frequency. Such a data structure challenges estimation of haplotype effects. However, haplotypes often differ by only a few mutations, and leveraging these similarities can improve the estimation of haplotype effects. There is extensive literature on this topic. Here we build on these observations and develop an autoregressive model of order one that hierarchically models haplotype effects by leveraging phylogenetic relationships between the haplotypes described with a directed acyclic graph. The phylogenetic relationships can take the form of either a tree or a network, and we therefore refer to the model as the haplotype network model. The haplotype network model can be included as a component in a phenotype model to estimate associations between haplotypes and phenotypes. The key contribution of this work is that by leveraging the haplotype network structure we obtain a sparse model, and by using hierarchical autoregression the flow of information between similar haplotypes is estimated from the data. We show with a simulation study that the hierarchical model can improve estimates of haplotype effects compared to an independent haplotype model, especially when there are few observations for a specific haplotype. We also compared it to a mutation model and observed comparable performance, though the haplotype model has the potential to capture background-specific effects. We demonstrate the model with a case study of modeling the effect of mitochondrial haplotypes on milk yield in cattle.