Abstract
The functional mechanisms underlying disease associations identified by genome-wide association studies remain unknown for susceptibility loci located outside gene coding regions. In addition to the regulation of gene expression, the synthesis of effects from multiple surrounding functional variants has been suggested as an explanation for hard-to-interpret associations.
Here, we define filter criteria, based on linkage disequilibrium measures and allele frequencies, that reflect the expected properties of synthesizing variant sets. For eligible candidate sets, we search for those haplotypes that are highly correlated with the risk alleles of a genome-wide associated variant.
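The haplotype search described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the toy data, and the use of squared Pearson correlation (r²) as the correlation measure are all assumptions made for illustration, and the filter criteria on linkage disequilibrium and allele frequency are omitted. Each phased chromosome is represented by its alleles at a candidate variant set, and each haplotype's carrier indicator is scored against the risk allele of the associated variant.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 sequences."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic indicator carries no information
    d = pxy - px * py  # linkage disequilibrium coefficient D
    return d * d / denom


def best_haplotype(chromosomes, risk):
    """Return the haplotype whose carrier indicator best tracks the risk allele.

    chromosomes: one allele tuple per phased chromosome (hypothetical input format)
    risk: 0/1 per chromosome, 1 if it carries the risk allele
    """
    best_hap, best_r2 = None, 0.0
    for hap in set(chromosomes):
        indicator = [1 if c == hap else 0 for c in chromosomes]
        r2 = r_squared(indicator, risk)
        if r2 > best_r2:
            best_hap, best_r2 = hap, r2
    return best_hap, best_r2


# Toy data (invented): six chromosomes typed at three candidate variants.
chromosomes = [(1, 1, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0)]
risk = [1, 1, 0, 0, 0, 0]

hap, r2 = best_haplotype(chromosomes, risk)
# r2 of each single variant against the risk allele, for comparison
single_r2 = [r_squared([c[i] for c in chromosomes], risk) for i in range(3)]
```

In this toy example no single candidate variant is perfectly correlated with the risk allele, whereas one three-variant haplotype is, which is the pattern the search is meant to detect.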
We applied our methods to 1,000 Genomes reference data and confirmed Crohn’s Disease and Type 2 Diabetes susceptibility loci. Of these loci, 32% could be explained by three-variant haplotypes carrying at least two functional variants, compared with 16% for random variants (P = 2.92 · 10⁻⁶). More importantly, we detected examples of known loci whose association can be fully explained by surrounding missense variants. First, three missense variants from MUC19 synthesize rs11564258 (LOC105369736/MUC19, intron; Crohn’s Disease). Next, rs2797685 (PER3, intron; Crohn’s Disease) is synthesized by a 57-kilobase haplotype defined by five missense variants from PER3 and three missense variants from UTS2. Finally, the association of rs7178572 (HMG20A, intron; Type 2 Diabetes) can be explained by the synthesis of eight haplotypes, each carrying at least one missense variant in PEAK1, TBC1D2B, CHRNA5 or ADAMTS7.
In summary, application of our new methods highlights the potential of synthesis analysis to guide functional follow-up investigation of findings from association studies.