Abstract
A major complication in understanding disease biology from GWAS is the inability to identify a complete set of causal genes. Integrating multiple omics data sources could provide an important functional link between associated variants and candidate genes. Machine learning can take advantage of this variety of data and offers a way to prioritize disease genes. Yet classical positive-negative classifiers impose strong limitations on gene prioritization, such as the lack of reliable non-causal genes for training.
Here, we developed a novel gene prioritization tool, Gene Prioritizer (GPrior). It is an ensemble of five positive-unlabeled (PU) bagging classifiers that treat all genes of unknown relevance as an unlabeled set. For each phenotype, GPrior selects the optimal combination of these algorithms to tune the model.
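To make the positive-unlabeled bagging idea concrete, here is a minimal pure-Python sketch of the general PU bagging scheme, not GPrior's own code. Each round draws a bootstrap sample of unlabeled genes and treats it as pseudo-negatives against the known positive genes. Only the out-of-bag unlabeled genes are then scored. A gene's final score is its average over the rounds in which it was out-of-bag. The nearest-centroid base learner, the function names (`pu_bagging_scores`, `_centroid_score`) and the parameters `n_rounds` and `sample_size` are hypothetical stand-ins chosen for brevity. They are not GPrior's five classifiers or its interface.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def _centroid(rows: Sequence[Vector]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroid_score(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Hypothetical base learner: relative closeness to the positive centroid, in [0, 1]."""
    dp, dn = _dist(x, pos_c), _dist(x, neg_c)
    if dp + dn == 0:
        return 0.5
    return dn / (dp + dn)


def pu_bagging_scores(
    positives: Sequence[Vector],
    unlabeled: Sequence[Vector],
    n_rounds: int = 50,
    sample_size: int = 0,
    seed: int = 0,
) -> List[float]:
    """Average out-of-bag scores for unlabeled genes (higher = more positive-like)."""
    rng = random.Random(seed)
    k = sample_size or len(positives)  # default: pseudo-negative set as large as the positive set
    pos_c = _centroid(positives)  # all known positives are used in every round
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Bootstrap (sample with replacement) pseudo-negatives from the unlabeled genes.
        idx = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(idx)
        neg_c = _centroid([unlabeled[i] for i in idx])
        # Score only the unlabeled genes this round's model did not train on.
        for i, x in enumerate(unlabeled):
            if i not in in_bag:
                totals[i] += _centroid_score(x, pos_c, neg_c)
                counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Toy 2-D "gene features": unlabeled[0] is a hidden positive near the known positives.
    positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
    unlabeled = [[1.1, 1.0], [-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1], [0.0, 0.1]]
    print(pu_bagging_scores(positives, unlabeled, n_rounds=100, seed=1))
```

The key design choice is that no gene is ever declared non-causal. Unlabeled genes serve only as randomly resampled pseudo-negatives, and a hidden causal gene that lands in some bootstrap samples is still scored in the rounds where it is left out.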
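The abstract does not say how GPrior chooses its combination of algorithms. One plausible reading is sketched below purely as an assumption: run an exhaustive search over non-empty subsets of the classifiers, average the selected models' gene scores, and keep the subset that best ranks a set of known disease genes. ROC AUC as the selection criterion, plain score averaging, and the names `select_combination` and `auc` are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_combination(
    model_scores: Dict[str, Sequence[float]], labels: Sequence[int]
) -> Tuple[Tuple[str, ...], float]:
    """Hypothetical: return the subset of models whose averaged scores give the best AUC."""
    names = sorted(model_scores)
    best_combo: Tuple[str, ...] = ()
    best_auc = -1.0
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Average the per-gene scores of the models in this subset.
            avg = [
                sum(model_scores[m][i] for m in combo) / r
                for i in range(len(labels))
            ]
            score = auc(avg, labels)
            # Strict ">" keeps the smallest (earliest-found) subset on ties.
            if score > best_auc:
                best_combo, best_auc = combo, score
    return best_combo, best_auc


if __name__ == "__main__":
    # Toy per-gene scores from three hypothetical models; labels mark known disease genes.
    model_scores = {
        "A": [0.9, 0.8, 0.1, 0.2, 0.3],
        "B": [0.1, 0.2, 0.9, 0.8, 0.7],
        "C": [0.6, 0.4, 0.5, 0.3, 0.2],
    }
    labels = [1, 1, 0, 0, 0]
    print(select_combination(model_scores, labels))
```

With five classifiers, the exhaustive search covers only 31 non-empty subsets, so brute force is cheap. The known genes used for this selection should be kept separate from those used to report final performance. Otherwise, the chosen combination's score is optimistically biased.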
Altogether, GPrior fills an important niche among methods for post-processing GWAS data, significantly improving the ability to pinpoint disease genes compared with existing solutions.
Competing Interest Statement
The authors have declared no competing interest.