Abstract
In this report we present Flash, a multimarker association tool based on a novel algorithm that generates haplotypes from raw genotype data. The algorithm belongs to the entropy minimization class of methods [4, 7] and is composed of a two-stage deterministic-heuristic part and an optional stochastic optimization. It scales well to very large datasets and runs faster than competing tools such as BEAGLE [5] and MACH [10] while maintaining comparable accuracy. The quality of the results is assessed by comparing the switch error. Finally, the haplotypes are used to perform a haplotype-based Genome-wide Association Study (GWAS). The association results are compared with a multimarker and a single-SNP association test performed with Plink [12]. Our experiments confirm that, as stated in the literature, the multimarker association test can be more powerful than the single-SNP test. Moreover, Flash and Plink give similar results for the multimarker association test, but Flash reduces the computation time by about an order of magnitude when using haplotypes of 5 SNPs.
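For readers unfamiliar with the switch error used in the quality assessment, the following is a minimal Python sketch of how such a metric is commonly computed for one individual. It is an illustration under stated assumptions and not the implementation used by Flash: the function name `switch_error_rate`, the 0/1 allele coding, and the toy data are all hypothetical, and the inferred haplotype is assumed to be consistent with the true genotype at every site.

```python
from typing import Sequence


def switch_error_rate(
    true_h1: Sequence[int],
    true_h2: Sequence[int],
    inferred_h1: Sequence[int],
) -> float:
    """Fraction of adjacent heterozygous-site pairs phased inconsistently.

    Assumes biallelic markers coded 0/1 and an inferred haplotype that is
    consistent with the true genotype (hypothetical setup for illustration).
    """
    # Only heterozygous sites carry phase information.
    het = [i for i in range(len(true_h1)) if true_h1[i] != true_h2[i]]
    if len(het) < 2:
        return 0.0  # no adjacent heterozygous pair, so no switch is possible

    # At each heterozygous site, note which true haplotype the inferred one copies.
    follows_first = [inferred_h1[i] == true_h1[i] for i in het]

    # A switch occurs whenever the copied haplotype changes between neighbours.
    switches = sum(a != b for a, b in zip(follows_first, follows_first[1:]))
    return switches / (len(het) - 1)


# Toy example: four heterozygous sites (indices 0, 1, 2, 4), one phase switch.
true_h1 = [0, 0, 1, 1, 0]
true_h2 = [1, 1, 0, 1, 1]
inferred_h1 = [0, 0, 0, 1, 1]
rate = switch_error_rate(true_h1, true_h2, inferred_h1)
print(rate)  # one switch over three adjacent heterozygous pairs
```

Because the metric only looks at the relative phase of neighbouring heterozygous sites, swapping the two inferred haplotypes of an individual leaves the value unchanged.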