Abstract
In genetic epidemiology, rare-variant case-control studies investigate the association between rare genetic variants and human diseases. Rare variants yield sparse covariates that are predominantly zero, and this sparsity biases maximum likelihood estimators of log odds ratio (log-OR) parameters away from their null value of zero. Several penalized-likelihood methods have been developed to mitigate this sparse-data bias in case-control studies. In this article, we study penalized logistic regression using a class of log-F priors, indexed by a shrinkage parameter m, that shrink the biased maximum likelihood estimator (MLE) towards zero. We propose a maximum marginal likelihood method for estimating m, where the marginal likelihood is obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm and (ii) a Laplace approximation combined with derivative-free optimization. We evaluate the statistical properties of the estimator through simulation studies and apply the methods to genetic data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
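To make approach (ii) concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes one binary covariate per variant, a log-F(m, m) prior on each log-OR, an intercept fixed at the sample log-odds of case status (a simplification introduced here), and a composite objective that sums per-variant Laplace approximations. The function names (log_f_logpdf, log_joint, laplace_log_marginal, estimate_m) and the search bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import betaln, expit


def log_f_logpdf(beta, m):
    """Log density of the log-F(m, m) distribution evaluated at beta."""
    return 0.5 * m * beta - m * np.logaddexp(0.0, beta) - betaln(0.5 * m, 0.5 * m)


def log_joint(beta, alpha, m, x, y):
    """Logistic log-likelihood plus the log-F(m, m) log-prior on the log-OR beta."""
    eta = alpha + beta * x
    return np.sum(y * eta - np.logaddexp(0.0, eta)) + log_f_logpdf(beta, m)


def laplace_log_marginal(m, x, y):
    """Laplace approximation to the log of the integral of exp(log_joint) over beta."""
    # Assumption for this sketch: intercept fixed at the sample log-odds of being a case.
    alpha = np.log(y.mean() / (1.0 - y.mean()))
    # Mode of the (log-concave) joint in beta, found by a bounded 1-D search.
    fit = minimize_scalar(lambda b: -log_joint(b, alpha, m, x, y),
                          bounds=(-10.0, 10.0), method="bounded")
    b_hat = fit.x
    p = expit(alpha + b_hat * x)
    q = expit(b_hat)
    # Negative second derivative of log_joint with respect to beta at the mode.
    curvature = np.sum(x**2 * p * (1.0 - p)) + m * q * (1.0 - q)
    return -fit.fun + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(curvature)


def estimate_m(X, y, bounds=(0.1, 50.0)):
    """Derivative-free maximization over m of the summed per-variant approximations."""
    def objective(m):
        return -sum(laplace_log_marginal(m, X[:, k], y) for k in range(X.shape[1]))
    return minimize_scalar(objective, bounds=bounds, method="bounded").x
```

Here X is an n-by-K array of variant indicators and y is a 0/1 case-status vector containing both cases and controls; estimate_m(X, y) returns the value of m within the search bounds that maximizes the approximate objective. Larger m corresponds to a prior more concentrated at zero and hence stronger shrinkage of the log-ORs.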
Competing Interest Statement
The authors have declared no competing interest.