PT - JOURNAL ARTICLE
AU - Rosen, Yohei
AU - Eizenga, Jordan
AU - Paten, Benedict
TI - Modelling haplotypes with respect to reference cohort variation graphs
AID - 10.1101/101659
DP - 2017 Jan 01
TA - bioRxiv
PG - 101659
4099 - http://biorxiv.org/content/early/2017/01/28/101659.short
4100 - http://biorxiv.org/content/early/2017/01/28/101659.full
AB - Current statistical models of haplotypes are limited to cohorts of haplotypes that can be represented by arrays of values at linearly ordered bi- or multiallelic loci. These methods cannot model structural variants, nor can they model overlapping or nested variants. A variation graph is a mathematical structure that can encode arbitrarily complex genetic variation. We present the first model which uses a variation graph representation of haplotypes. We present an algorithm to calculate the likelihood that a haplotype arose from a population through recombinations, and we demonstrate that its time complexity is linear in haplotype length and sublinear in population size. We demonstrate mathematical extensions to allow modelling of mutations. Our results provide a starting point for haplotype inference on variation graphs. This is an essential step forward for clinical genomics and genetic epidemiology, since it is the first haplotype model which can represent all types of variation in the population.