Abstract
Background Association studies use statistical links between genetic markers and variation in a phenotype’s value across many individuals to identify genes controlling variation in the target phenotype. However, this approach, particularly when conducted on a genome-wide scale (GWAS), has limited power to identify the genes responsible for variation in traits controlled by complex genetic architectures.
Results Here we employ simulation studies utilizing real-world genotype datasets from association populations in four species with distinct minor allele frequency distributions, population structures, and patterns of linkage disequilibrium to evaluate the impact of variation in both heritability and trait complexity on both conventional mixed linear model based GWAS and two new approaches specifically developed for complex traits. Mixed linear model based GWAS rapidly loses power for more complex traits. FarmCPU, a method based on multi-locus mixed linear models, provides the greatest statistical power for moderately complex traits. A Bayesian approach adopted from genomic prediction provides the greatest statistical power to identify causal genetic loci for extremely complex traits.
Conclusions Using estimates of the complexity of the genetic architecture of target traits can guide the selection of appropriate statistical methods and improve the overall accuracy and power of GWAS.
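As an illustration of the kind of simulation design summarized in the Results, the following minimal sketch generates a phenotype from a genotype matrix for a chosen number of causal loci (QTNs) and a target heritability. It is a sketch under stated assumptions, not the procedure used in this study: the function name `simulate_phenotype`, the normally distributed effect sizes, the randomly generated placeholder genotypes, and the QTN counts 2, 16, and 128 are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_phenotype(genotypes, n_qtn, heritability, rng):
    """Simulate one trait from a genotype matrix (hypothetical helper).

    genotypes: list of per-individual lists of allele counts (0, 1, or 2).
    n_qtn: number of causal loci, used here as the measure of trait complexity.
    heritability: target share of phenotypic variance that is genetic, in (0, 1].
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    n_snp = len(genotypes[0])
    # Choose the causal loci at random, without replacement.
    qtn_idx = rng.sample(range(n_snp), n_qtn)
    # Assumption: additive effects drawn from a standard normal distribution.
    effects = [rng.gauss(0.0, 1.0) for _ in qtn_idx]
    # Genetic value = sum of allele count times effect over the causal loci.
    genetic = [sum(row[j] * b for j, b in zip(qtn_idx, effects))
               for row in genotypes]
    var_g = statistics.pvariance(genetic)
    # Pick the residual variance so that var_g / (var_g + var_e) = heritability.
    var_e = var_g * (1.0 - heritability) / heritability
    sd_e = var_e ** 0.5
    phenotype = [g + rng.gauss(0.0, sd_e) for g in genetic]
    return phenotype, qtn_idx, genetic


rng = random.Random(0)
# Placeholder genotypes (300 individuals, 200 SNPs, allele frequency 0.3).
# A real analysis would load genotype calls from the association population.
genotypes = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(200)]
             for _ in range(300)]

for n_qtn in (2, 16, 128):  # arbitrary complexity levels for illustration
    y, qtn_idx, g = simulate_phenotype(genotypes, n_qtn, 0.5, rng)
    realized_h2 = statistics.pvariance(g) / statistics.pvariance(y)
    print(n_qtn, round(realized_h2, 2))
```

Because the residual noise is sampled, the realized heritability printed for each run fluctuates around the target rather than matching it exactly. The simulated phenotypes and the known `qtn_idx` positions could then be passed to each association method to compare how many causal loci each one recovers.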
List of abbreviations
- GWAS: Genome-Wide Association Study
- GBS: Genotyping-By-Sequencing
- PCA: Principal Component Analysis
- LD: Linkage Disequilibrium
- SNP: Single Nucleotide Polymorphism
- MAF: Minor Allele Frequency
- QTN: Quantitative Trait Nucleotide
- GEMMA: Genome-wide Efficient Mixed Model Association
- GLM: General Linear Model
- MLM: Mixed Linear Model
- MLMM: Multi-Locus Mixed-Model
- FDR: False Discovery Rate
- HDRA: High-Density Rice Array
- HCC: the Holland Computing Center