1. Abstract
Aging is characterized as a progressive decline in fitness that ultimately results in death. We set out to build both tissue-specific and multi-tissue transcriptomic clocks to make global tissue age predictions in individuals from GTEx. Existing work in the field primarily uses epigenetic clocks as predictors of age, but these models have known issues and are significantly less interpretable than their transcriptomic counterparts. Because transcriptomic clocks are built directly on gene expression, we can use their features to infer candidate mechanisms of aging. Linear regression remains the standard analysis technique, but we improved upon its baseline performance with more modern techniques, exploring both XGBoost and multilayer perceptrons (MLPs). Because heterogeneity of cell types within a tissue is known to introduce noise into these models, we also experimented with predictions from deconvolved cell-type data, which accounts for cellular composition and reduces signal distortion from rare cell types.
We found that MLPs are not well suited to the task because of limited training data, whereas XGBoost improves on the baseline performance of existing tissue-specific clocks. These models allowed us to directly compute the genes most important to age prediction, and we showed that several of these genes have been independently identified elsewhere as correlated with age. Given the small size of our datasets, we were unable to draw firm conclusions about multi-tissue predictors, but preliminary results suggest that the approach shows promise and merits further investigation. Likewise, given our limited deconvolved cell-type data, we did not observe strong results, and we note that this area also needs further investigation.
By improving upon the performance of existing models, we demonstrated that a more modern machine learning technique, XGBoost, can further our understanding of aging mechanisms through extraction of the most relevant genes from the fitted models. This is significant because the genetic causes of aging are still not fully understood, and research on aging lags behind that in other domains. Because identifying tissues that age at different rates is of specific interest, our tissue-specific models may have further applications in this area, including informing pathologies in tissues found to be aging faster and analyzing how people of similar chronological age can have vastly different tissue ages. An extended technical presentation of this work can be found here, and a highly simplified non-technical overview presentation can be found here.
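The general idea of fitting a boosted-tree age predictor and then ranking genes by their contribution can be illustrated with a small, self-contained sketch. This is not the authors' pipeline and does not use XGBoost itself: it is a minimal gradient-boosting regressor built from depth-one trees (stumps) in pure Python, run on synthetic data. The gene indices, the simulated age formula, and all function names (`fit_stump`, `fit_boosted_stumps`, `predict_age`) are hypothetical and chosen only for illustration.

```python
import random

def fit_stump(X, residuals):
    """Return the (gain, feature, threshold, left_mean, right_mean) split
    that most reduces the squared error of the residuals, or None."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Reduction in sum of squared errors from splitting here.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_boosted_stumps(X, y, n_rounds=60, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        stumps.append((j, thr, left, right))
        importance[j] += gain  # gain-based feature importance
        for i, row in enumerate(X):
            pred[i] += learning_rate * (left if row[j] <= thr else right)
    return {"base": base, "stumps": stumps, "lr": learning_rate,
            "importance": importance}

def predict_age(model, row):
    """Sum the shrunken stump outputs on top of the mean age."""
    out = model["base"]
    for j, thr, left, right in model["stumps"]:
        out += model["lr"] * (left if row[j] <= thr else right)
    return out

# Synthetic "expression" data (assumption): 5 genes, 200 samples.
# Age depends on gene 0 through a threshold effect and weakly on gene 2.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [50 + 15 * (1 if r[0] > 0 else -1) + 5 * r[2] + random.gauss(0, 2) for r in X]

model = fit_boosted_stumps(X, y)
mean_age = sum(y) / len(y)
mae_baseline = sum(abs(yi - mean_age) for yi in y) / len(y)
mae_boost = sum(abs(yi - predict_age(model, r)) for yi, r in zip(y, X)) / len(y)
ranked = sorted(range(5), key=lambda j: model["importance"][j], reverse=True)
print("genes ranked by importance:", ranked)
```

The errors here are computed on the training data only, so they say nothing about generalization; a real comparison against a linear baseline would need held-out samples, and an actual XGBoost model adds regularization and deeper trees that this sketch omits.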
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
mdlu{at}mit.edu, jessjs{at}mit.edu