RT Journal Article
SR Electronic
T1 Gene Age Gap Estimate (GAGE) for major depressive disorder: a penalized biological age model using gene expression
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2024.09.03.610913
DO 10.1101/2024.09.03.610913
A1 Li, Yijie (Jamie)
A1 Kuplicki, Rayus
A1 Ford, Bart N.
A1 Kresock, Elizabeth
A1 Figueroa-Hall, Leandre
A1 Savitz, Jonathan
A1 McKinney, Brett A.
YR 2024
UL http://biorxiv.org/content/early/2024/09/04/2024.09.03.610913.abstract
AB Recent associations between Major Depressive Disorder (MDD) and measures of premature aging suggest accelerated biological aging as a potential biomarker for MDD susceptibility or MDD as a risk factor for age-related diseases. Statistical and machine learning regression models of biological age have been trained on various sources of high-dimensional data to predict chronological age. Residuals or “gaps” between the predicted biological age and chronological age have been used for statistical inference, such as testing whether an increased age gap is associated with a given disease state. Recently, a gene expression-based model of biological age showed a higher age gap for individuals with MDD compared to healthy controls (HC). In the current study, we propose a machine learning approach that simplifies gene selection by using a least absolute shrinkage and selection operator (LASSO) penalty to construct an expression-based Gene Age Gap Estimate (GAGE) model. We construct the LASSO-GAGE (L-GAGE) model in an RNA-Seq study of 78 unmedicated individuals with MDD and 79 HC and then test for accelerated biological aging in MDD. When testing L-GAGE association with MDD, we account for factors such as sex and chronological age to mitigate regression-to-the-mean effects. The L-GAGE shows higher biological aging in MDD subjects than HC, but the elevation is not statistically significant.
However, when we dichotomize chronological age, the interaction between MDD status and age is significant in the L-GAGE model. This effect remains statistically significant even after adjusting for chronological age and sex. We find that cytomegalovirus (CMV) serostatus is associated with elevated L-GAGE. We also investigate the feature selection methods Random Forest and nearest-neighbor projected-distance regression (NPDR) to characterize age-related genes, and we find functional enrichment of infectious disease and SARS-CoV pathways.
Competing Interest Statement: The authors have declared no competing interest.
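The age-gap reasoning in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the LASSO training step is omitted, the predicted ages, chronological ages, and group labels are made-up toy values, and the helper names `age_gaps` and `residualize` are hypothetical. It only shows how a gap (predicted minus chronological age) might be computed and then adjusted for chronological age with a simple least-squares fit, which is one common way to reduce regression-to-the-mean effects. Adjustment for sex, which the abstract also mentions, is left out for brevity.

```python
from statistics import mean


def age_gaps(predicted, chronological):
    """Age gap = predicted (biological) age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


def residualize(y, x):
    """Residuals of y after a one-predictor ordinary least-squares fit on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy values, assumed for illustration only (not data from the study).
chron = [25, 30, 35, 40, 45, 50, 55, 60]   # chronological ages
pred = [31, 33, 38, 41, 44, 50, 52, 57]    # ages predicted by some trained model
is_mdd = [1, 0, 1, 0, 1, 0, 1, 0]          # 1 = MDD, 0 = healthy control

gaps = age_gaps(pred, chron)
adjusted = residualize(gaps, chron)        # gap with the linear age trend removed

# Group means of the age-adjusted gap; a real analysis would use a
# regression model with a significance test rather than a raw difference.
mdd_mean = mean(g for g, m in zip(adjusted, is_mdd) if m == 1)
hc_mean = mean(g for g, m in zip(adjusted, is_mdd) if m == 0)
print("mean adjusted gap, MDD:", mdd_mean)
print("mean adjusted gap, HC:", hc_mean)
```

In the toy data the raw gap shrinks as chronological age rises, which is the pattern regression to the mean produces; residualizing on age removes that linear trend before the groups are compared.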