RT Journal Article
SR Electronic
T1 Identifying Longevity Associated Genes by Integrating Gene Expression and Curated Annotations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.01.31.929232
DO 10.1101/2020.01.31.929232
A1 F. William Townes
A1 Jeffrey W. Miller
YR 2020
UL http://biorxiv.org/content/early/2020/02/02/2020.01.31.929232.abstract
AB Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.