TY - JOUR
T1 - Modelling haplotypes with respect to reference cohort variation graphs
JF - bioRxiv
DO - 10.1101/101659
SP - 101659
AU - Rosen, Yohei
AU - Eizenga, Jordan
AU - Paten, Benedict
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/01/28/101659.abstract
N2 - Current statistical models of haplotypes are limited to cohorts of haplotypes that can be represented by arrays of values at linearly ordered bi- or multiallelic loci. These methods cannot model structural variants, overlapping variants, or nested variants. A variation graph is a mathematical structure that can encode arbitrarily complex genetic variation. We present the first model which uses a variation graph representation of haplotypes. We present an algorithm to calculate the likelihood that a haplotype arose from a population through recombinations and demonstrate time complexity linear in haplotype length and sublinear in population size. We demonstrate mathematical extensions to allow modelling of mutations. Our results provide a starting point for haplotype inference on variation graphs. This is an essential step forward for clinical genomics and genetic epidemiology since it is the first haplotype model which can represent all types of variation in the population.
ER -