bioRxiv
New Results

Supervised machine learning with feature selection for prioritization of targets related to time-based cellular dysfunction in aging

Nina Truter, Zuné Jansen van Rensburg, Radouane Oudrhiri, Raminderpal Singh, Carla Louw
doi: https://doi.org/10.1101/2022.06.24.497511
Nina Truter, 1Incubate.bio
Zuné Jansen van Rensburg, 1Incubate.bio
Radouane Oudrhiri, 2Eagle genomics
Raminderpal Singh, 1Incubate.bio
Carla Louw, 1Incubate.bio
For correspondence: carla@incubate.bio

Abstract

Background Global life expectancy has been increasing without a corresponding increase in health span and with greater risk for aging-associated diseases such as Alzheimer’s disease (AD). An urgent need to delay the onset of aging-associated diseases has arisen and a dramatic increase in the number of potential molecular targets has led to the challenge of prioritizing targets to promote successful aging. Here, we developed a pipeline to prioritize aging-related genes which integrates the plethora of publicly available genomic, transcriptomic, proteomic and morphological data of C. elegans by applying a supervised machine learning approach. Additionally, a unique biological post-processing analysis of the computational output was performed to better reveal the prioritized gene’s function within the context of pathways and processes involved in aging across the lifespan of C. elegans.

Results Four known aging-related genes (daf-2, involved in insulin signaling; let-363 and rsks-1, involved in mTOR signaling; and age-1, involved in PI3 kinase signaling) were present in the top 10% of 4380 ranked genes related to different markers of cellular dysfunction, validating the computational output. Further, our ranked output showed that 91% of the top 438 ranked genes were known genes listed on GenAge, while the remaining genes had not yet been associated with aging-related processes.

Conclusion These ranked genes can be translated to known human orthologs potentially uncovering previously unknown information about the basic aging processes in humans. These genes (and their downstream pathways) could also serve as targets against aging-related diseases, such as AD.

Introduction

The global life expectancy has increased by more than 6 years in the last two decades, without a corresponding increase in health span (life without major disease or disability) (1). With an expected near doubling in the number of people over the age of 60 years globally by 2050, this poses a major socioeconomic burden and an urgent need exists to delay the onset of age-related diseases such as Alzheimer’s disease (AD) and cardiovascular disease (1).

A major drive for the implementation of interventions that increase health span and delay senescent span is therefore underway (Figure 1; modified from (2) for C. elegans), with the aim of promoting successful aging and limiting unsuccessful aging. Successful aging can be defined as the decline in cellular, tissue, and organ function over an organism’s lifespan without the onset of pathology and with the presence of high physical, cognitive, and social function (adapted from (3) using (4)). An improved understanding of the cellular and molecular mechanisms and the rate of their deterioration is needed to develop therapies for successful aging. This requirement has contributed to the definition of the nine hallmarks of aging, which include intracellular processes that manifest during normal aging (5). These include: genomic instability, telomere attrition, epigenetic alterations, loss of proteostasis, deregulated nutrient-sensing, mitochondrial dysfunction, cellular senescence, stem cell exhaustion, and altered intercellular communication (5).

Figure 1:

Markers of dysfunction observed over the lifespan of C. elegans related to the aging process, specific to this study. Aging-related functional decline includes morphological aging features which degrade over C. elegans’s lifespan and other indicators of dysfunction which include molecular markers of intestinal aging that accumulate over C. elegans’s lifespan. The black line indicates the normal aging trajectory in C. elegans, while the green dotted line indicates a successful aging trajectory in C. elegans.

Since the introduction of the above hallmarks of aging, potential molecules which could modulate the activity of mechanistic pathways involved in aging have been reported. The GenAge database summarizes targets associated with human longevity and currently describes 307 genes (6). These targets include genes from known age-associated pathways, such as the sirtuin, insulin/IGF-1 signaling, AMP-activated protein kinase (AMPK), and mTOR pathways (7,8). Although some success has been achieved by targeting components in these pathways, with promising results in clinical trials such as the inhibition of mTOR by rapamycin (9), the relative contribution of each hallmark, and consequently of its underlying molecular mechanisms, to the overall aging process remains unclear (10). This makes the prioritization of targets for therapeutic intervention challenging. Moreover, longitudinal studies of human aging as well as the development of pharmacotherapies are time consuming and costly (11). Given these challenges, a particular need arises to better prioritize targets based on their ability to modulate the aging process, without solely relying on in vitro and in vivo experiments (11,12).

To address these challenges, database searches, biological assays, and machine learning techniques have increasingly been used to better identify targets of interest (12–15). These techniques have been used to advance research in the field of neurodegeneration, with a specific focus on AD. Targets in AD have been identified through a multi-omics approach focusing on protein networks and pathways during disease progression (15). Semi-supervised machine learning has also been used to identify an MRI biomarker for mild cognitive impairment (MCI), effectively predicting the clinical conversion from MCI to AD up to three years prior to disease onset (16). MRI data have also been used in a machine learning approach to aid in disease diagnosis (17). Given the complexity of the biological systems and the wealth of available data, machine learning appears invaluable for accurately and efficiently identifying targets for neurodegenerative diseases.

The careful consideration of targets identified with machine learning techniques in the context of biological processes is necessary to assess the feasibility and actionability of the identified targets. One such approach was taken by Pun et al. (2022), who used the PandaOmics platform to investigate targets associated with aging and age-associated diseases from human cross-sectional data based on the nine hallmarks of aging (12). Although the use of human data to identify targets is highly relevant for drug discovery, obtaining rich longitudinal data to understand disease progression across an individual’s lifespan has remained extremely challenging. Current approaches are therefore typically reliant on cross-sectional data, further limiting the deduction of cellular dynamics in aging. Short-lived simple animal models can, however, aid in circumventing this limitation as they are ideal for studying the intracellular evolutionarily conserved processes of aging (18,19).

C. elegans is one of the most popular simple organisms used to investigate aging (8). Its short lifespan enables the in-depth characterization of its life stages to generate time-based data on different levels of Omics (20,21). Importantly, several of the intracellular hallmarks of aging are conserved across species. Although C. elegans lacks some of the evolutionarily advanced aging hallmarks present in other multicellular organisms due to the postmitotic nature of many of its cells, the core hallmarks of aging can be investigated in a multicellular organism with less complexity than found in mammalian species (18). As the hallmarks of aging are interconnected, many of the non-conserved, more advanced hallmarks exert their eventual effects through conserved pathways (18). Therefore, understanding the core mechanisms of aging in isolation can inform strategies that aim to improve their function, in order to promote successful aging (Figure 1). Altered insulin signaling by knockout of daf-2, for example, resulted in a reduced accumulation of aging pigment (lipofuscin) and improved locomotor capacity, which is indicative of successful aging (22).

A further benefit of working with C. elegans is the vast range of available data, including time-based transcriptomic, proteomic, phosphorylation status, molecular and detailed morphological data, which allows the relationships between different Omics layers and their possible contributions to the aging process to be described (20,21,23,24). However, prioritization of aging-related targets remains a challenge in this organism because many targets exist and they are associated with longevity rather than necessarily with the aging process.

In this study, supervised machine learning and recursive feature elimination techniques were used to construct a pipeline to prioritize genes associated with aging and thereby identify potential targets. To achieve this, several time-based biological layers, including markers of cellular and macroscopic function were used to characterize the aging process in C. elegans. This included a broad range of categorical and time-dependent phenotypic and genotypic data; i.e. RNA and protein levels, phosphorylation status, and the biological process description of each gene. A unique post-processing workflow was established to translate the prioritized genes to pathways and cellular processes in human aging, to support their potential druggability for aging-related diseases.

Methods

A computational pipeline (workflow) was constructed which includes supervised machine learning on time-based data of C. elegans genes (feature variable) and aging-related genes (response variable) (Figure 2). An example of the computational output with post-processing using biological analysis is presented below.

Figure 2:

Workflow demonstrating the synergy between biological analysis and supervised machine learning using time-based C. elegans data to prioritize aging-related targets using: i. Biologically-informed curation of data from C. elegans studies and databases, used to select genes for analysis. ii. Processing of aging-related and proxy genes (response variables) and 4380 general genes (feature variables) to produce a ranking of aging-related targets. iii. Post-processing of aging-related genes to include supporting information for their use as aging-related targets through biological analysis to produce a prioritized aging-related target.

Data collection and formatting

Data were collected from open-source databases and from literature. The data collected from different studies (Table 1) were combined using the Wormbase Gene IDs or, alternatively, the gene names. Several layers of Omics were used to understand the broader context in which the targets could exert effects on the aging process (Figure 3). All computational analyses were conducted in Python 3.8 (25). MinMaxScaler from the Scikit-learn package was used to rescale the combined data (26). Time course data were used in the original format while the categorical data were converted to binomial data by one-hot encoding (26). Pandas and Numpy packages were used for data formatting (27,28). Genes with missing data fields were excluded from the dataset. A total of 4380 active C. elegans genes were included in further computational analyses (Figure 2).
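The formatting steps described above can be illustrated with a minimal sketch. The gene names, column names and values below are hypothetical, and pandas `get_dummies` is swapped in for the one-hot encoding step for brevity; whether the original pipeline rescaled each data column in this way is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy table: two time-course columns and one categorical column.
genes = pd.DataFrame(
    {
        "gene": ["gene-a", "gene-b", "gene-c", "gene-d"],
        "rna_day1": [2.0, 4.0, 6.0, None],
        "rna_day5": [10.0, 30.0, 20.0, 15.0],
        "biotype": ["protein_coding", "ncRNA", "protein_coding", "ncRNA"],
    }
).set_index("gene")

# Exclude genes with missing data fields.
complete = genes.dropna()

# Rescale each numerical column to the range [0, 1].
numeric_cols = ["rna_day1", "rna_day5"]
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(complete[numeric_cols]),
    index=complete.index,
    columns=numeric_cols,
)

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(complete["biotype"], prefix="biotype").astype(int)

# Combined feature table: one row per gene.
features = scaled.join(encoded)
print(features)
```

Min-max rescaling keeps every numerical field on a common [0, 1] range, so that no single data layer dominates purely because of its units.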

Table 1:

Datasets used for supervised machine learning model training and their sources. “Categorical” data refers to quantitative data converted to binomial data and “numerical” data refers to time course datasets.

Figure 3:

Visual representation of the different Omics layers used in the presented pipeline and the data used to characterize them.

Compiling a concise list of genes associated with processes of aging

A list of genes associated specifically with aging processes was required as response variables for training of a supervised machine learning model to highlight targets within aging processes (Figure 2). First, a complete dataset of the known life-extending genes (n=887) was utilized from the public database GenAge (6), https://genomics.senescence.info/genes/stats.php, accessed Feb 2022. Only 429 genes had the required data (categorical information, gene expression, protein levels and phosphorylation status) to be used within our computational model. However, due to the discrepancy between aging and longevity (36) the life-extending (longevity) genes were manually filtered to compile a list of genes consisting only of genes associated with known aging processes. Genes were chosen if their function listed on UniProt or Wormbase formed part of a known intracellular process associated with the hallmarks of aging (5,18,35). These processes of aging included: mitochondrial dysfunction (mitochondrial integrity and biogenesis and reactive oxygen species), sequence information transfer (a combination of the hallmarks genome instability and epigenetic alterations), immune signaling (the intracellular component of inflammation), loss of proteostasis and regulatory pathways (including deregulated nutrient sensing) (Figure 3). Once completed, the list contained 378 genes that function within a known aging process (Supplementary Table 1). These genes were used as response variables during model training, either as proxy genes associated with dysfunction markers or as non-specific aging associated genes.

Dysfunction markers representative of aging

Various markers of aging indicative of organism dysfunction in C. elegans over time were used to understand how genes could contribute to the aging process, not only on a genetic level but also on a physiological/functional basis. Three types of dysfunction markers were selected across omics layers: either macroscopic morphological markers (“morphological aging features”), molecular markers (“intestinal aging”) or genetic markers (“genes associated with aging processes”) (Figure 3).

Morphological features

Morphological features are used to quantify the macroscopic health of C. elegans over time as it ages (32). Furthermore, changes in morphological features have been shown to be related to cellular function, for example, changes in dopaminergic neuronal function in knock-out studies are evident in locomotion (37). Both locomotion and pharyngeal pumping are related to lifespan and possibly to each other, with both showing a reduction in speed over time (38). Pharyngeal pumping and three features related to locomotion that decline significantly during aging (speed of tail tip, width of tail base and tail tip angular velocity relative to tail base) were included as dysfunction markers (32,39).

Markers of intestinal aging

The C. elegans intestine is the location of many stress responses that change during aging (40). Several known C. elegans aging markers are associated with the intestine, including autofluorescence and aging pigment (22,41). Furthermore, the accumulation of E. coli in the intestinal lumen is related to the lifespan of C. elegans (42). An increase in autofluorescence, aging pigment or E. coli accumulation is indicative of general cellular dysfunction during aging, including a decline in immune signaling and redox balance dysfunction (22,41,42).

Non-specific genes associated with the aging process

Genes from the list of 378 aging-related genes that did not form part of the morphological or intestinal aging markers were used as “genes associated with aging processes” markers (n=329).

Selection of proxy genes for dysfunction markers

Although all 378 aging-associated genes were used for model training, we wished to interpret the computational output within the context of dysfunction markers associated with aging. Therefore, seven proxy genes were selected for each marker of dysfunction based on computational analysis. To identify proxy genes computationally, we assumed that a change in a dysfunction marker over time will be highly correlated (positively or negatively) with the change in expression over time of the genes associated with the respective marker. Based on this assumption, we selected the genes with the lowest (closest to zero) Root Mean Square Error (RMSE) between the change in gene expression and dysfunction marker over time. For some dysfunction markers, only seven proxy genes could be found with RMSE values < 0.3; therefore, seven genes were selected for each marker to ensure equal contributions towards the ranking process by each dysfunction marker. Gaussian Process Regression was used to fill gaps in temporal data (21,22,32,39,41,42), and linear regression was used to calculate the RMSE values.
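The proxy gene selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline itself: the kernel choice, the helper names (`fill_time_course`, `regression_rmse`, `select_proxies`), the gene names and all values are hypothetical, and the trajectories are assumed to be on a comparable (e.g. rescaled) range so that RMSE values can be compared against a threshold.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LinearRegression


def fill_time_course(days, values, grid):
    """Fit a Gaussian process to sparse observations and predict on a common time grid."""
    kernel = RBF(length_scale=5.0) + WhiteKernel(noise_level=1e-3)  # assumed kernel
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    gpr.fit(np.asarray(days, dtype=float).reshape(-1, 1), np.asarray(values, dtype=float))
    return gpr.predict(np.asarray(grid, dtype=float).reshape(-1, 1))


def regression_rmse(expression, marker):
    """RMSE of a linear fit of the marker trajectory on the expression trajectory."""
    x = np.asarray(expression, dtype=float).reshape(-1, 1)
    y = np.asarray(marker, dtype=float)
    predicted = LinearRegression().fit(x, y).predict(x)
    return float(np.sqrt(np.mean((y - predicted) ** 2)))


def select_proxies(expression_by_gene, marker, n_proxies=7):
    """Return the n_proxies genes with the lowest RMSE, plus all RMSE scores."""
    scores = {gene: regression_rmse(expr, marker) for gene, expr in expression_by_gene.items()}
    return sorted(scores, key=scores.get)[:n_proxies], scores


# Hypothetical marker measured on four days, filled in on a daily grid.
grid = np.arange(1, 11)
marker = fill_time_course([1, 4, 7, 10], [1.0, 0.7, 0.4, 0.1], grid)

rng = np.random.default_rng(0)
expression_by_gene = {
    "gene-x": 3.0 - 2.0 * marker,          # perfectly (negatively) related to the marker
    "gene-y": rng.normal(size=grid.size),  # unrelated noise
}
proxies, scores = select_proxies(expression_by_gene, marker, n_proxies=1)
print(proxies, scores)
```

Because a linear fit absorbs both sign and scale, a gene whose expression moves exactly opposite to the marker still obtains an RMSE close to zero, which matches the stated assumption that positive and negative correlations are equally informative.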

Importantly, to ensure that the genetic associations with the morphological aging features were biologically sensible, the genes selected to represent a feature had to be involved in the respective process or expressed within the corresponding tissue (based on Wormbase information and (43)). Specifically, genes with a low RMSE value that are involved in movement and/or expressed in the tail were used for the features “speed of tail tip”, “width of tail base”, and “tail tip angular velocity relative to tail base”. Similarly, genes expressed in the pharynx were used for pharyngeal pumping (Wormbase information and (43)).

Gene ranking with recursive feature elimination

All genes were ranked according to their similarity to the dysfunction marker genes. This similarity was determined by Recursive Feature Elimination (RFE), a feature selection algorithm. During RFE, a machine learning model was iteratively retrained while the weakest predictive “feature” (in this case, a gene) was removed with each iteration. This process aids in the training of cleaner, more effective models by eliminating unnecessary features. RFE outputs a list of all features (genes), ranked from most relevant to least relevant for the model’s predictive capability. In this study, RFE was used during model training to eliminate the genes that contribute least to the processes of aging-associated dysfunction. Instead of the traditional elimination of unnecessary features, we eliminated unnecessary genes, generating an output of genes ranked from most similar to least similar to the dysfunction markers. A Support Vector Machine and Generalised Linear Models were used within the RFE wrapper algorithm (26).
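A generic illustration of how RFE produces a complete ranking is given below, using scikit-learn's `RFE` wrapper around a linear Support Vector Machine on synthetic data. The data, feature names and parameter values are assumptions for demonstration only; in particular, arranging the data so that genes occupy the feature (column) dimension is an assumption about how the approach described above could be set up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in data: 200 samples, 12 features, binary response.
X, y = make_classification(
    n_samples=200,
    n_features=12,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Removing one feature per iteration until a single feature remains
# yields a complete ordering of all features (rank 1 = eliminated last).
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=1, step=1)
selector.fit(X, y)

ranked = [feature_names[i] for i in np.argsort(selector.ranking_)]
print(ranked)
```

Setting `n_features_to_select=1` with `step=1` is what turns a feature selection procedure into a ranking procedure: every feature receives a unique rank according to the iteration in which it was eliminated.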

Validation of genes associated with aging processes ranking

Genes from the GenAge database known to be associated with processes of aging were ranked, along with all other genes, by their predicted association with aging processes. The model was validated by its ability to highly rank genes known to be associated with aging. This validation is visualized in Figure 4, in which the predicted rankings of these genes are distributed across 10 quantiles. We considered the model to have accurate predictive capability if >75% of known genes associated with aging processes were ranked in the top three quantiles (30%) of the ranking output. To further validate the computational ranking output, the top three genes from GenAge (those for which modulating expression results in the largest life extension) were selected: daf-2 (insulin receptor-like gene), age-1 (PI3 kinase signaling), and let-363 (mTOR signaling). The positions of these genes within our ranked output were determined.
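The validation criterion above can be expressed as a short check. In this sketch the helper names (`quantile_counts`, `passes_validation`) and the example ranks are hypothetical; only the total of 4380 genes, the 10 quantiles and the 75% threshold come from the text.

```python
import numpy as np

N_GENES = 4380  # total number of ranked genes


def quantile_counts(ranks, n_total=N_GENES, n_bins=10):
    """Count how many ranks (1 = best) fall into each of n_bins equal-sized quantiles."""
    ranks = np.asarray(ranks)
    bins = np.ceil(ranks * n_bins / n_total).astype(int)  # rank 1..n_total -> bin 1..n_bins
    return np.bincount(bins, minlength=n_bins + 1)[1:]


def passes_validation(ranks, n_total=N_GENES, top_bins=3, threshold=0.75):
    """Check whether more than `threshold` of known genes rank in the top `top_bins` quantiles."""
    counts = quantile_counts(ranks, n_total)
    fraction = float(counts[:top_bins].sum() / counts.sum())
    return bool(fraction > threshold), fraction


# Hypothetical ranks of eight known aging-related genes.
known_ranks = [10, 200, 438, 439, 1314, 1315, 2000, 4380]
counts = quantile_counts(known_ranks)
ok, fraction = passes_validation(known_ranks)
print(counts, ok, fraction)
```

With 4380 genes, each quantile holds 438 genes, so the top three quantiles correspond to ranks 1 to 1314.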

Figure 4:

Ranking of known genes associated with aging processes (6,34) by the model as validation of the ranked output. The number of known aging-related genes placed in each of the 10 quantiles of the ranked output is shown for each dysfunction marker type (aging-related genes, green; intestinal aging, orange; morphological aging feature, light green) and across all three dysfunction marker types (overall; red). The rankings of the three known aging-related genes, daf-2 (black), let-363 (gray), and age-1 (purple), as well as the selected aging-related gene, rsks-1 (blue), are shown by each dysfunction marker type and overall.

rsks-1 as an example aging target

The computational output includes an overall ranking of the gene’s strength of association with known genes from aging processes. Furthermore, the similarity of the ranked genes to specific dysfunction marker genes is included. The output also includes correlation analyses between the chosen gene’s mRNA expression, protein levels, phosphorylation status, and categorical information, and those of all associated dysfunction markers. This can potentially associate the ranked gene more strongly with a dysfunctional process based on the strength of the similarity. In order to validate the computational output, a known gene associated with the aging process of the mTOR pathway, rsks-1, was selected. This selection was made since rsks-1 was ranked in the top 100 genes of the computational output, had been reported in relatively few publications (16 publications on Pubmed, https://pubmed.ncbi.nlm.nih.gov/, in the last 5 years), and its human ortholog has existing drugs targeting its activity (S6K1 inhibitors) (44,45).

Discovery of known and predicted associations between genes (matrix)

Possible associations between the chosen gene (rsks-1) and the dysfunction marker genes with the highest similarity were explored. This was done by placing the dysfunction marker genes and rsks-1 into their designated aging processes used for our list of aging-related genes (Supplementary Table 1). The aging processes attributed to the genes were chosen based on their UniProt and Wormbase biological descriptions (35). The known associations between genes (Pubmed) and the predicted associations based on the computational output were visualized in a diagram for further interpretation.

In summary, a supervised machine learning model was trained to rank genes based on their similarity to known aging-related genes from GenAge. These known aging-related genes are grouped into dysfunction marker categories: morphological features, intestinal aging, and non-specific aging processes. Genes can be ranked by their similarity to all known aging genes or by the sub-groups of genes in the dysfunction marker categories.

Results

Assigning genes to dysfunction marker categories

Dysfunction markers of aging were selected in three categories based on their biological relevance to aging. For the categories (i) Intestinal aging and (ii) morphological aging feature, their respective seven proxy genes are listed in Table 2. All other genes formed part of the (iii) genes associated with aging processes category.

Table 2:

Dysfunction marker types and their respective markers representative of aging. Dysfunction marker proxy genes associated with the eight dysfunction markers were selected from literature and regression analyses (see Methods, “Selection of proxy genes for dysfunction markers”). The RMSE values of the regression analyses are included in brackets.

Ranking of C. elegans genes based on dysfunction marker proxy genes

Genes were ranked based on their similarity to aging-related dysfunction markers. The ranking of genes by their potential involvement in the process of aging (i.e., their potential importance in determining the appearance of the dysfunction markers) is shown in the Supplementary Information (Supplementary Table 2).

Validation of ranked output using known aging genes

The accuracy of the ranking of genes by their potential involvement in the aging process is validated by the model’s ranking of known aging-related genes. The ranking of known genes associated with aging processes from the GenAge database (6,34) is shown in Table 3. The majority of the known aging-related genes were ranked in the first three quantiles (n=1314) of the ranking output (Figure 4).

Table 3:

Ranking of known aging-related genes (daf-2, let-363, and age-1), and the selected aging-related gene, rsks-1, by the model out of 4380 C. elegans genes. The ranks of the genes per dysfunction marker type are shown, as well as the ranking percentiles in brackets.

Three known aging-related genes (daf-2, insulin signaling; let-363, mTOR signaling; age-1, PI3 kinase signaling) and one general aging-related gene (rsks-1) were selected for further validation of the output. The rankings of these four genes are also shown in Figure 4, indicating their positions based on each dysfunction marker type (aging-related genes, intestinal aging, and morphological aging feature). All four genes were ranked in Q1 (with a ranking of <400) based on the overall ranking (mean of all rankings by the individual dysfunction marker genes) as well as the rankings by aging-related genes, morphological aging features, and intestinal aging. The rankings of the known aging-related genes out of the 4380 genes that were analyzed are shown in Table 3.
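The aggregation into an overall ranking and the conversion of a rank into a percentile can be sketched as follows. The gene names and per-category ranks are hypothetical, and averaging over three marker-type columns is a simplification of the mean over individual dysfunction marker genes described above; the helper `top_percent` is likewise an assumed name.

```python
import math

import pandas as pd

N_GENES = 4380  # total number of ranked genes

# Hypothetical ranks of four genes under each dysfunction marker type.
ranks = pd.DataFrame(
    {
        "aging_related_genes": [1, 3, 2, 4],
        "intestinal_aging": [2, 1, 4, 3],
        "morphological_aging": [1, 2, 3, 4],
    },
    index=["gene-a", "gene-b", "gene-c", "gene-d"],
)

# Overall rank: order the genes by their mean rank across the columns.
ranks["mean_rank"] = ranks.mean(axis=1)
ranks["overall_rank"] = ranks["mean_rank"].rank(method="min").astype(int)


def top_percent(rank, n_total=N_GENES):
    """Smallest whole percentile band that contains the given rank."""
    return math.ceil(100 * rank / n_total)


print(ranks)
print(top_percent(55), top_percent(39))
```

For a ranking of 4380 genes, a rank of 55 falls within the top 2% and a rank of 39 within the top 1%, while rank 438 marks the boundary of the top 10% (Q1).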

Figure 5 shows that the top quantile of ranked genes (Q1 in Figure 4) consists of genes that are listed on GenAge, known to be aging-related genes (91% of the Q1 genes) (6,34). The remaining 9% of the Q1 genes are, to the best of our knowledge, not yet associated with aging-related processes. Based on the gene descriptions of Uniprot, these genes are involved in a variety of functional processes that could potentially affect aging (35). Some of these processes include iron metabolism, protein degradation and amino acid metabolism.

Figure 5:

Summary of the top quantile of the ranked genes (Q1 in Figure 4). 91% (452) of the top ranked genes are known aging-related genes (6,34), while 9% (42) are not previously known to be associated with aging. The functional categories, based on Uniprot’s gene descriptions (35), of the potential new aging-related genes are listed.

rsks-1 is similar to several dysfunction marker proxy genes

The gene rsks-1 shows an overall high ranking (top 2%) and is highly ranked with the three aging dysfunction types (morphological aging feature marker rank: 6%; intestinal aging marker rank: 6%; Aging-related gene rank: 1%) (Table 3). This high ranking is determined by the large degree of similarity of rsks-1 with several dysfunction marker genes across the different types of aging dysfunction markers (see Figure 6 for similarity with different types of aging dysfunction markers and Figure 7 for individual gene similarity). The aging-related genes showed the highest ranking score for rsks-1 (Figure 6), based on the phosphorylation data, biotype categorical data, gene ontology information, as well as the expression patterns in tissue types and different life stages.

Figure 6:

Correlation between rsks-1 and all dysfunction marker genes (rows) per data field (columns 2 - 11), and the overall similarity contribution of each dysfunction marker gene to rsks-1’s ranking (first column).

Figure 7:

rsks-1’s known and predicted associations with the dysfunction marker genes based on the computationally determined similarity of gene expression (E), phosphorylation (P) and shared function from gene ontology (G). Black lines denote known associations based on literature, whereas red lines denote newly predicted associations.

Dysfunction marker genes linked to rsks-1

In Figure 7, the known and predicted interactions between rsks-1 and similar dysfunction marker genes in their associated aging processes and systems based on UniProt keywords are displayed (35). Based on computational output, rsks-1 was highly similar to genes associated with mitochondrial function, redox balance, proteostasis and regulatory pathway systems. Based on existing literature and database searches (e.g. Pubmed, Wormbase, UniProt, https://www.uniprot.org/, and String, https://string-db.org/), rsks-1 and the following genes have a known interaction or are present in interacting pathways: unc-51, hsf-1, atg-9, daf-7, daf-8, glp-1, hlh-30, akt-1, daf-2, daf-15 (46–52). rsks-1 is predicted by the computational output to be highly associated with the following genes: ced-3, unc-32, eat-6, hcf-1, hmg-4, egl-27, age-1, itr-1, let-765, arr-1, with no current supporting literature of a direct association.

Discussion

As our understanding of the aging process improves, the number of potential aging-related targets increases. This leads to the challenge of prioritizing targets of the aging process to promote successful aging and prevent the development of age-related diseases. Moreover, several functional parameters and the overall context of molecular targets in aging are poorly reflected in current approaches to identify novel targets of interest. In this study, we propose a pipeline that integrates the plethora of publicly available genomic, transcriptomic, proteomic and morphological data of C. elegans with a supervised machine learning approach to prioritize aging-related genes. These ranked genes can be translated to known human orthologs, potentially uncovering previously unknown information on the basic aging processes in humans. These genes could also serve as targets against aging-related diseases, such as AD. To test the capability of the pipeline, we used a known aging-related gene (rsks-1), enabling us to contextualize our findings with current literature.

The computational output ranked aging-related genes of C. elegans highly

We repurposed recursive feature elimination with SVM and GLM models to develop a machine learning technique which ranks genes by their potential involvement in the process of aging. The computational output was validated by identifying the ranks of well-known aging-related genes such as age-1, daf-2, and let-363, all of which were ranked within the top 3% of the 4380 C. elegans genes. rsks-1, a ribosomal S6 kinase gene, was also ranked highly by our machine learning technique due to its high similarity with the identified proxy genes (associated with the dysfunction markers) and genes associated with aging processes. The overall ranking of rsks-1 was 55 out of 4380 genes (top 2% of all C. elegans genes), and it had the highest similarity with other genes associated with aging processes (n = 39, top 1%) (Table 3). The ranking of the known aging-related genes and rsks-1 is comparable to the GenAge ranking of longevity genes based on their ability to extend lifespan; however, rsks-1 is ranked much higher by our pipeline (6,34). This could be because GenAge ranks genes only by the percentage increase in lifespan after gene modulation, rather than by the involvement of the gene in the aging process.

The C. elegans gene daf-2 (insulin-like receptor) is well-studied in the aging field and its knockout increases lifespan by 169%, whereas knockout of rsks-1 results in a lifespan increase of 20% (46). The dual knockout of daf-2 and rsks-1 produces a synergistic lifespan extension of 454% (46). However, the mechanisms through which rsks-1 affects the aging process and produces this synergistic effect remain unclear. Characterizing the mechanisms involved in rsks-1 functioning will help discover and detail associated targets involved in the aging process, which could promote successful aging if inhibited or activated/stimulated.
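
To make the term "synergistic" concrete, the reported percentages can be compared with what two independent effects would predict. The sketch below uses two common null models, additive and multiplicative; both are assumptions chosen for illustration and are not necessarily the criteria applied in (46). The function names are hypothetical.

```python
def fold_change(percent_increase):
    """Convert a percentage lifespan increase into a fold change."""
    return 1.0 + percent_increase / 100.0


def expected_additive(p1, p2):
    """Expected percentage increase if the two effects simply add."""
    return p1 + p2


def expected_multiplicative(p1, p2):
    """Expected percentage increase if the fold changes multiply."""
    return (fold_change(p1) * fold_change(p2) - 1.0) * 100.0


# Percentage lifespan increases as quoted in the text from (46).
DAF2, RSKS1, DOUBLE = 169.0, 20.0, 454.0

additive = expected_additive(DAF2, RSKS1)
multiplicative = expected_multiplicative(DAF2, RSKS1)

print(f"additive expectation:       {additive:.1f}%")
print(f"multiplicative expectation: {multiplicative:.1f}%")
print(f"observed double mutant:     {DOUBLE:.1f}%")
```

Under either null model the observed 454% extension is well above the expectation (189% additive, about 223% multiplicative), which is consistent with describing the interaction as synergistic.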

Our pipeline identified other aging-related genes possibly related to rsks-1’s mechanism of aging. The genes from aging processes which were uncategorized (not directly associated with dysfunction markers) were used to detect similar aging-related genes, such as rsks-1. Additionally, proxy genes from the other dysfunction marker types (intestinal aging markers and morphological aging features) were used to predict possible associations (Figure 7) through which rsks-1 could affect the aging process in C. elegans.

The potential interaction of rsks-1 with aging-related genes

The interactions between rsks-1 and the proxy genes that had a similarity score with rsks-1 were assessed using publicly available databases (PubMed) (Figure 7). These genes include atk-1, daf-15, hlh-30, hsf-1, glp-1, daf-8, unc-51, atg-9, ced-3, unc-32, eat-6, hcf-1, hmg-4, egl-27, age-1, itr-1, let-765, and arr-1.

Previous studies support several of the associations proposed by our pipeline between the proxy genes and rsks-1. The STRING database (52) has predicted a functional association between rsks-1 and the dysfunction markers atk-1 and daf-15, based on their putative homologs interacting in other organisms. A possible relationship has been shown between hlh-30 and rsks-1 in C. elegans, with knockout of rsks-1 resulting in increased hlh-30 mRNA levels compared to wild type (51). Further, a specific genetic interaction has been found between rsks-1 and hsf-1 through RNA interference screening (50). An RNA interference study showed that targeting unc-51 and atg-9 extended the lifespan of rsks-1 mutant C. elegans (47). There is also a functional interaction between rsks-1 and the dysfunction marker glp-1, in which rsks-1 promotes glp-1 fate; however, the nature of this interaction remains unclear (49). Combining null mutations of rsks-1 and daf-8 produced a strong synergistic Daf-c phenotype (48), further supporting the association between the proxy genes and rsks-1 indicated by the similarity score from our pipeline.

To the best of our knowledge, the remaining dysfunction marker genes have not previously been described in the literature as interacting or being associated with rsks-1. These genes are ced-3, unc-32, eat-6, hcf-1, hmg-4, egl-27, age-1, itr-1, let-765, and arr-1. Our computational output indicates a possible interaction or association between these genes and rsks-1 based on a combination of coexpression, phosphorylation pattern, and functional description (categorical data). The potential relationship between the proxy genes and rsks-1 could allude to new mechanisms through which rsks-1 affects aging in C. elegans. Our pipeline, therefore, not only ranks aging-related genes but also provides valuable information on how the ranked genes might interact or be associated with the proxy genes based on their similarity score.
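
One way to picture a similarity score that mixes numerical and categorical evidence is sketched below. This is a hypothetical construction, not the scoring used by the pipeline: coexpression is summarized by a Pearson correlation rescaled to [0, 1], phosphorylation pattern and functional description are each treated as sets compared with the Jaccard index, and the three parts are averaged with equal weights. All field names, weights, and toy values are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(gene_a, gene_b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of three similarity components, each in [0, 1]."""
    parts = (
        (pearson(gene_a["expression"], gene_b["expression"]) + 1.0) / 2.0,
        jaccard(gene_a["phospho_pattern"], gene_b["phospho_pattern"]),
        jaccard(gene_a["function_terms"], gene_b["function_terms"]),
    )
    return sum(w * p for w, p in zip(weights, parts))


# Toy records with made-up values (not taken from the study's data).
gene_x = {
    "expression": [1.0, 2.0, 3.0, 4.0],
    "phospho_pattern": {"site_1", "site_2"},
    "function_terms": {"kinase", "TOR signaling"},
}
gene_y = {
    "expression": [2.0, 4.0, 6.0, 8.0],
    "phospho_pattern": {"site_1"},
    "function_terms": {"kinase"},
}

score = combined_similarity(gene_x, gene_y)
print(f"similarity(gene_x, gene_y) = {score:.3f}")
```

Keeping every component on the same [0, 1] scale before averaging prevents one data type from dominating the score simply because of its units.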

Using cellular and morphological dysfunction to describe effects of targets on the aging process and possible relevance to human disease

When implementing the pipeline presented here, the proxy genes related to dysfunction markers enable the detection of aging-related genes that have a high potential to affect aging outcomes. This is due to the close association between their gene expression and the change in the dysfunction marker over the lifespan of C. elegans. Although gene expression is not indicative of protein activity, for our purposes it is an initial indicator of a possible relationship between a gene and an aging-related feature (Supporting Figures: see Supplementary information).

As an example, daf-2 was selected as a proxy gene for aging pigment due to its high RMSE value when comparing its expression with aging pigment accumulation. Knockout of daf-2 has resulted in a decreased accumulation of aging pigment and improved locomotory function (22), supporting its relevance to the presence of aging pigment. Similarly, in our computational output rsks-1 shows the highest similarity to genes related to aging pigment and tail tip movement. Both daf-2 and rsks-1 impact C. elegans lifespan through shared pathways, ultimately involving translation regulation (53). Therefore, it is expected that they would have similar effects on molecular and morphological aging features, which is reflected in the computational output.
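
The comparison between an expression time course and a dysfunction marker time course can be sketched as follows. This is an illustration under assumptions, not the study's procedure: both series are min-max scaled so that units do not matter, the time points are assumed to be aligned, and the values are invented. How RMSE values are then used to select proxy genes follows the Methods and is not reproduced here.

```python
import math


def min_max(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def expression_marker_rmse(expression, marker):
    """RMSE between scaled expression and scaled marker trajectories."""
    return rmse(min_max(expression), min_max(marker))


# Invented trajectories sampled at the same (hypothetical) time points.
marker = [0.0, 1.0, 2.0, 4.0]
candidates = {
    "gene_a": [10.0, 20.0, 30.0, 50.0],  # tracks the marker
    "gene_b": [50.0, 30.0, 20.0, 10.0],  # moves against the marker
}

for name, expression in candidates.items():
    print(name, round(expression_marker_rmse(expression, marker), 4))
```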

Aging pigment forms due to a combination of dysregulation in proteostasis, as well as disturbances in redox balance. Locomotion (such as tail tip movement) is related to aging pigment formation (22). These functional outcomes could be equivalent to dysregulation of cellular function in neuronal cells, such as amyloid beta aggregation in humans. As changes in rsks-1’s function may lead to improvement (or deterioration) in the healthspan of C. elegans during aging, this may suggest that targeting its human homolog S6 kinase in neurons could prevent the accumulation of proteins that contribute to the onset of neurodegenerative diseases. This is supported by the observation that a genetic reduction in S6K1 reduced the generation of amyloid beta in mice (54). Therefore, the pipeline presented here is able to detect and describe potential aging-related genes relevant to AD.

Conclusion and future outlook

The aim of this study was to develop a computational pipeline that prioritizes C. elegans genes by their probability of being involved in aging-related functional processes. This was achieved by using a supervised machine learning technique to rank genes by their similarity to known aging-related genes and dysfunction marker genes (known to be present in aging C. elegans). The ranked gene output showed that 91% of the top-ranked quantile of genes are known aging-related genes, while the remainder could be potential novel aging-related genes. The accuracy of the ranked output was shown through the high ranking of the known aging-related genes age-1, daf-2, let-363, and rsks-1.

Next, rsks-1 was used as an example gene to showcase the output’s functionality. The dysfunction marker (proxy) genes with a high similarity to rsks-1 could potentially indicate a functional interaction or association. Furthermore, the dysfunction markers (e.g., aging pigment and tail tip speed) could be used to understand how a highly ranked gene may affect the aging process. Throughout, the computational output was validated through contextualization with the most recent literature.

Future work that draws on information beyond gene expression (such as protein and phosphorylation data) to describe the connection between genes and dysfunction markers may further strengthen approaches similar to the pipeline presented here. In addition, causal inference analysis, as described in the works of Pearl et al. (2016, 2018), may add value by uncovering the causal structure in the system, allowing one to identify potential new treatment targets not evident from correlation and machine learning analyses (55,56).

Well-designed experiments guided by the current computational output may address current gaps, such as the association between proxy genes and dysfunction markers, which would improve the relevance of the identified age-related targets. Such approaches may allow a more in-depth characterization of the association between gene expression and dysfunction marker decline. Finally, by using a workflow similar to that demonstrated in the present study for rsks-1, unknown genes could be explored and prioritized to potentially identify novel targets for aging.

Funding

Not applicable.

Contributions

NT and ZJVR: Data collection, analysis, and interpretation of results. CL: Project design and development, computational analysis. RO provided computational and mathematical advice. RS and all authors contributed to manuscript writing and revision. All authors read and approved the final manuscript.

Competing interests

NT, ZJVR, RS, CL are employed by incubate.bio, a commercial company developing computational solutions for aging research and drug discovery in the field of neurodegenerative diseases.

Acknowledgements

We would like to acknowledge Prof Ben Loos and Dr. Dawie van Niekerk for assistance with formatting and editing of the manuscript, and Dr. Anthony Sedgwick for assistance with the initial conceptualization of the study.

References

  1. WHO. Ageing and Health [Internet]. World Health Organization. 2021 [cited 2022 Apr 22]. Available from: https://www.who.int/news-room/fact-sheets/detail/ageing-and-health#:~:text=By%202050%2C%20the%20world’s%20population,2050%20to%20reach%20426%20million.
  2. Arking R. Biology of longevity and aging: pathways and prospects. Oxford University Press, USA; 2019.
  3. Rowe JW, Kahn RL. Successful aging. Gerontologist. 1997 Aug;37(4):433–40.
  4. Fedarko NS. The biology of aging and frailty. Clin Geriatr Med. 2011 Feb;27(1):27–37.
  5. López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks of aging. Cell. 2013 Jun 6;153(6):1194–217.
  6. Tacutu R, Thornton D, Johnson E, Budovsky A, Barardo D, Craig T, et al. Human aging genomic resources: new and updated databases. Nucleic Acids Res. 2018;46(D1):D1083–90.
  7. Bitto A, Wang AM, Bennett CF, Kaeberlein M. Biochemical Genetic Pathways that Modulate Aging in Multiple Species. Cold Spring Harb Perspect Med. 2015 Nov 2;5(11).
  8. Zhang S, Li F, Zhou T, Wang G, Li Z. Caenorhabditis elegans as a Useful Model for Studying Aging Mutations. Front Endocrinol (Lausanne). 2020 Oct 5;11:554994.
  9. Nielsen JL, Bakula D, Scheibye-Knudsen M. Clinical trials targeting aging. Front Aging. 2022 Feb 4;3.
  10. Tsurumi A, Li WX. Aging mechanisms—A perspective mostly from Drosophila. Advanced Genetics. 2020 Dec;1(1).
  11. Le Couteur DG, McLachlan AJ, Quinn RJ, Simpson SJ, de Cabo R. Aging biology and novel targets for drug discovery. J Gerontol A Biol Sci Med Sci. 2012 Feb;67(2):168–74.
  12. Pun FW, Leung GHD, Leung HW, Liu BHM, Long X, Ozerov IV, et al. Hallmarks of aging-based dual-purpose disease and age-associated targets predicted using PandaOmics AI-powered discovery engine. Aging (Albany NY). 2022 Mar 29;14(6):2475–506.
  13. Tanveer M, Richhariya B, Khan RU, Rashid AH, Khanna P, Prasad M, et al. Machine learning techniques for the diagnosis of Alzheimer’s disease. ACM Trans Multimedia Comput Commun Appl. 2020 Apr 28;16(1s):1–35.
  14. Shangguan Z. A Review of Target Identification Strategies for Drug Discovery: from Database to Machine-Based Methods. J Phys: Conf Ser. 2021 Apr 1;1893(1):012013.
  15. Bai B, Wang X, Li Y, Chen P-C, Yu K, Dey KK, et al. Deep multilayer brain proteomics identifies molecular networks in Alzheimer’s disease progression. Neuron. 2020 Mar 18;105(6):975–991.e7.
  16. Moradi E, Pepe A, Gaser C, Huttunen H, Tohka J, Alzheimer’s Disease Neuroimaging Initiative. Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects. Neuroimage. 2015 Jan 1;104:398–412.
  17. Fan Z, Xu F, Qi X, Li C, Yao L. Classification of Alzheimer’s disease based on brain MRI and machine learning. Neural Comput & Applic. 2020 Apr;32(7):1927–36.
  18. Lemoine M. The evolution of the hallmarks of aging. Front Genet. 2021 Aug 26;12:693071.
  19. Cohen AA, Deelen J, Jones OR. Editorial: mechanisms and pathways contributing to the diversity of aging across the tree of life. Front Cell Dev Biol. 2022 Feb 16;10:854700.
  20. Narayan V, Ly T, Pourkarimi E, Murillo AB, Gartner A, Lamond AI, et al. Deep Proteome Analysis Identifies Age-Related Processes in C. elegans. Cell Syst. 2016 Aug;3(2):144–59.
  21. Hastings J, Mains A, Virk B, Rodriguez N, Murdoch S, Pearce J, et al. Multi-Omics and Genome-Scale Modeling Reveal a Metabolic Shift During C. elegans Aging. Front Mol Biosci. 2019 Feb 6;6:2.
  22. Gerstbrein B, Stamatas G, Kollias N, Driscoll M. In vivo spectrofluorimetry reveals endogenous biomarkers that report healthspan and dietary restriction in Caenorhabditis elegans. Aging Cell. 2005 Jun;4(3):127–37.
  23. Martineau CN, Baskaner B, Seinstra RI, Schafer WR, Brown AE, Nollen EA, et al. Deep behavioural phenotyping reveals divergent trajectories of ageing and quantifies health state in C. elegans. bioRxiv. 2019 Feb 20.
  24. Huang J, Wu Z, Wang J, Zhang X. Quantitative phosphoproteomics reveals GTBP-1 regulating C. elegans lifespan at different environmental temperatures. Biochem Biophys Res Commun. 2018 Sep 10;503(3):1962–7.
  25. Van Rossum G, Drake FL. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace; 2009.
  26. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011 Oct;12:2825–30.
  27. McKinney W. Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference. SciPy; 2010. p. 56–61.
  28. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020 Sep 16;585(7825):357–62.
  29. Davis P, Zarowiecki M, Arnaboldi V, Becerra A, Cain S, Chan J, et al. WormBase in 2022—data, processes, and tools for analyzing Caenorhabditis elegans. Genetics. 2022.
  30. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000 May;25(1):25–9.
  31. The Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 2021 Jan 8;49(D1):D325–34.
  32. Martineau CN, Brown AEX, Laurent P. Multidimensional phenotyping predicts lifespan and quantifies health in Caenorhabditis elegans. PLoS Comput Biol. 2020 Jul 21;16(7):e1008002.
  33. Bastian FB, Roux J, Niknejad A, Comte A, Fonseca Costa SS, De Farias TM, et al. The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals. Nucleic Acids Res. 2021;49(D1):D831–47.
  34. Budovsky A, Craig T, Wang J, Tacutu R, Csordas A, Lourenço J, et al. LongevityMap: a database of human genetic variants associated with longevity. Trends Genet. 2013;29(10):559–60.
  35. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.
  36. McDonald RB, Ruhe RC. Aging and longevity: why knowing the difference is important to nutrition research. Nutrients. 2011 Feb 28;3(3):274–82.
  37. Maulik M, Mitra S, Bult-Ito A, Taylor BE, Vayndorf EM. Behavioral Phenotyping and Pathological Indicators of Parkinson’s Disease in C. elegans Models. Front Genet. 2017 Jun 13;8:77.
  38. Huang C, Xiong C, Kornfeld K. Measurements of age-related changes of physiological processes that predict lifespan of Caenorhabditis elegans. Proc Natl Acad Sci USA. 2004 May 25;101(21):8084–9.
  39. Yin J-A, Gao G, Liu X-J, Hao Z-Q, Li K, Kang X-L, et al. Genetic variation in glia-neuron signalling modulates ageing rate. Nature. 2017 Nov 8;551(7679):198–203.
  40. McGhee JD. The C. elegans intestine. WormBook. 2007 Mar 27;1–36.
  41. Pincus Z, Mazer TC, Slack FJ. Autofluorescence as a measure of senescence in C. elegans: look to red, not blue or green. Aging (Albany NY). 2016;8(5):889–98.
  42. Portal-Celhay C, Bradley ER, Blaser MJ. Control of intestinal bacterial proliferation in regulation of lifespan in Caenorhabditis elegans. BMC Microbiol. 2012 Mar 27;12:49.
  43. Kaletsky R, Yao V, Williams A, Runnels AM, Tadych A, Zhou S, et al. Transcriptome analysis of adult Caenorhabditis elegans cells reveals tissue-specific gene and isoform expression. PLoS Genet. 2018 Aug 10;14(8):e1007559.
  44. Tolcher A, Goldman J, Patnaik A, Papadopoulos KP, Westwood P, Kelly CS, et al. A phase I trial of LY2584702 tosylate, a p70 S6 kinase inhibitor, in patients with advanced solid tumours. Eur J Cancer. 2014 Mar;50(5):867–75.
  45. Park JS, Kang DH, Lee DH, Bae SH. PF-4708671, a specific inhibitor of p70 ribosomal S6 kinase 1, activates Nrf2 by promoting p62-dependent autophagic degradation of Keap1. Biochem Biophys Res Commun. 2015 Oct 23;466(3):499–504.
  46. Chen D, Li PW-L, Goldstein BA, Cai W, Thomas EL, Chen F, et al. Germline signaling mediates the synergistically prolonged longevity produced by double mutations in daf-2 and rsks-1 in C. elegans. Cell Rep. 2013 Dec 26;5(6):1600–10.
  47. Hashimoto Y, Ookuma S, Nishida E. Lifespan extension by suppression of autophagy genes in Caenorhabditis elegans. Genes Cells. 2009 Jun;14(6):717–26.
  48. Friberg J. The control of growth and metabolism in Caenorhabditis elegans. Umeå centrum för molekylär patogenes (UCMP) (Medicinska fakulteten); 2006.
  49. Roy D, Kahler DJ, Yun C, Hubbard EJA. Functional Interactions Between rsks-1/S6K, glp-1/Notch, and Regulators of Caenorhabditis elegans Fertility and Germline Stem Cell Maintenance. G3 (Bethesda). 2018 Oct 3;8(10):3293–309.
  50. Seo K, Choi E, Lee D, Jeong D-E, Jang SK, Lee S-J. Heat shock factor 1 mediates the longevity conferred by inhibition of TOR and insulin/IGF-1 signaling pathways in C. elegans. Aging Cell. 2013 Dec;12(6):1073–81.
  51. Lapierre LR, De Magalhaes Filho CD, McQuary PR, Chu C-C, Visvikis O, Chang JT, et al. The TFEB orthologue HLH-30 regulates autophagy and modulates longevity in Caenorhabditis elegans. Nat Commun. 2013;4:2267.
  52. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021 Jan 8;49(D1):D605–12.
  53. Lan J, Rollins JA, Zang X, Wu D, Zou L, Wang Z, et al. Translational Regulation of Non-autonomous Mitochondrial Stress Response Promotes Longevity. Cell Rep. 2019 Jul 23;28(4):1050–1062.e6.
  54. Caccamo A, Branca C, Talboom JS, Shaw DM, Turner D, Ma L, et al. Reducing ribosomal protein S6 kinase 1 expression improves spatial memory and synaptic plasticity in a mouse model of Alzheimer’s disease. J Neurosci. 2015 Oct 14;35(41):14042–56.
  55. Pearl J, Mackenzie D. The Book of Why. New York: Basic Books; 2018.
  56. Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics: A Primer. Wiley; 2016.
Posted June 28, 2022.