Abstract
Maintenance of cellular function requires highly coordinated communication between trillions of biomolecules. However, over time, communication deteriorates, thereby disrupting effective information flow and compromising cellular health. To quantify the age-related loss of molecular communication, we applied information theory to measure communication efficiency between transcription factors (TFs) and their corresponding target genes (TGs). Using single-cell RNA-seq data from the limb muscle of young, middle-aged, and aged mice, we found that the precision with which TFs regulate TGs diminished with age, but that information transfer was preferentially preserved in a subset of gene pairs associated with homeostasis, a phenomenon we termed “age-based canalization”. Collectively, these data suggest that aging may be accompanied by a reallocation of resources that favors messages crucial to the maintenance of stability and survival.
One sentence summary: As communication efficiency in the regulatory network diminishes, aged cells prioritize homeostatic over adaptive functions.
Main text
“It is by avoiding the rapid decay into the inert state of equilibrium that an organism appears so enigmatic.”
Schrödinger, ‘What is life?’ p70
All matter undergoes ‘wear and tear’ due to the destructive effects of physical forces; this is a fundamental truth even for biological systems. The maintenance of proper functioning requires trillions of biomolecules to communicate with each other in an orderly fashion. However, over time, communication deteriorates, thereby disrupting effective information flow within biomolecular networks(1). As a result, essential biological functions are progressively impaired, and organisms become increasingly vulnerable to death (i.e., aging). Aging has been defined as a progressive loss of body function due to the accumulation of damage to molecules, cells, and tissues(2–5). To date, studies that have estimated escalating biological noise within an aging system have focused on variability in the expression of individual genes and/or signaling pathways(6–9). However, estimation of biological noise for individual genes does not provide an integrative view of disorder in the aging system. Towards an integrated view, in our previous work, we considered the inter-connectivity of biomolecules and quantified network Shannon entropy. We demonstrated that the collective ‘molecular disorderliness’ of the skeletal muscle transcriptome escalates with age (10). This finding suggests that, at the level of the tissue, biomolecular communication becomes increasingly unpredictable over time.
Effective communication is also essential in the regulation of biological processes and function. Indeed, cells perform complex tasks by functioning as a network of interacting genes, and successful regulation among genes occurs when the cellular system can produce a reliable and reproducible response to a given external stimulus. In the context of the regulatory network, a communication channel can be defined with transcription factor (TF) expression representing the ‘input message’ and target gene (TG) expression representing the ‘output message’. Effective regulation occurs when the mutual information (MI) between the gene pairs is high, with MI defined as the information obtained about the input by observing the output. The maximum MI for a given channel over all possible input patterns constitutes the information capacity of the channel(11). Although Claude Shannon introduced mathematical concepts to quantify the transfer of information in 1948, it took many decades before principles of information theory were applied to the regulatory function of genes. It was not until the 2000s that investigators characterized the information flow within a simple transcriptional regulatory system (12–17). Tkačik et al. showed that the transcriptional regulatory system can have multiple states carrying more than one bit of information (12, 17), thereby generalizing beyond the notion that gene regulatory elements represent a simple “on-off” state. Over the next decade, researchers applied a similar framework to estimate the information transduced by receptors and biochemical signaling pathways (1, 18–20). Still unknown, however, is whether and how information flow is altered in aging regulatory networks. To fill this gap, here we tested the hypothesis that intrinsic biological noise arising from unpredictable fluctuations in gene expression with aging disrupts effective communication between TFs and TGs, leading to compromised function.
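For reference, the two quantities introduced above can be written in their standard textbook form. The notation is our own shorthand and is an assumption, not taken from the original text: X denotes the TF expression level (input), Y the TG expression level (output), and H the Shannon entropy in bits.

```latex
\[
I(X;Y) \;=\; \sum_{x,y} p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)}
\;=\; H(Y) - H(Y \mid X)
\;=\; H(X) - H(X \mid Y),
\]
\[
C \;=\; \max_{p(x)} \; I(X;Y).
\]
```

Under this reading, H(Y | X) is the part of the output entropy that the input does not explain, and C is the MI attained by the most favorable distribution of TF levels for a fixed input-output relationship p(y | x).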
Useful information transmitted between regulatory gene pairs decreases with age
As a first step to test our hypothesis, we employed an information theoretic approach to quantify the escalating biological noise of the communication network at the level of single cell transcripts. The Tabula Muris Consortium has compiled the transcriptional profiles of multiple tissues at the single-cell level across the lifespan of mice(21). Specifically, we used transcript abundance obtained by FACS-Smart-seq2 (Fluorescence Activated Cell Sorting) from the skeletal muscle of young (3 months), middle-aged (18 months), and aged (24 months) mice for our analyses. In agreement with a recent study (22), we observed that overall transcript counts in skeletal muscle decreased with age (Figure 1A). To quantify whether molecular disorder of transcript expression within single cells increases with age, we computed Shannon entropy of mean gene counts per animal. A higher Shannon entropy indicates that the probability distribution of gene expression becomes more uniform, thereby decreasing the ability to reliably predict gene expression. Consistent with our previous work using bulk RNA-seq data from skeletal muscle (10), we found that entropy at the level of single cells increases over time (Figure 1B).
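As an illustration of the entropy calculation described above, the following minimal sketch computes Shannon entropy in bits from a vector of gene counts. It assumes that the counts are normalized across genes to form a probability distribution; the function name `shannon_entropy` and the toy vectors are hypothetical and are not taken from the study's code or data.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy (bits) of the distribution obtained by normalizing counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Genes with zero counts contribute nothing to the sum (0 * log 0 := 0).
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())

# Toy vectors (not data from the study): one dominated by a single gene,
# one spread evenly over four genes.
peaked = [97, 1, 1, 1]
uniform = [25, 25, 25, 25]

print(shannon_entropy(peaked))
print(shannon_entropy(uniform))
```

The evenly spread vector yields the larger entropy, which matches the interpretation in the text: a more uniform distribution makes expression harder to predict.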
(A) Histogram showing that mean gene counts decrease with age in single-cell RNA-seq TMS data (males: n=4,2,4; females: n=2,2,0). (B) Shannon entropy of gene expression increases with age in single-cell RNA-seq TMS data (males: n=4,2,4; females: n=2,2,0). A non-parametric Kruskal-Wallis test (p = 0.0970) followed by a post-hoc Bonferroni test (p = 0.0922) was performed. The blue shaded region is the error bar representing standard deviations. (C) Schematic representing Shannon’s noisy channel as applied to an aging biological system. Our hypothesis is that in an aged system, increasing biological noise leads to increased error accumulation and compromised function. (D) Venn diagram and schematic showing the parts of the communication channel. Both useful information and noise in the output are normalized by total entropy as described in the methods. The blue shaded region represents standard deviations. *** represents p<0.01 for the Bonferroni post-hoc test performed after the non-parametric Kruskal-Wallis test. (E) Mutual information declines with age when considering transcriptional regulatory networks within single cells.
We next asked, is the increase in gene expression entropy with aging accompanied by a decrease in efficiency of communication between genes? Recent evidence suggests that cell-to-cell transcriptional ‘noise’, typically defined as the standard deviation of a single transcript, increases with age (6–9). Yet, variance in transcript expression as a measure of noise is limited by the fact that it does not give an indication of how effectively the transcripts communicate within the regulatory network. In some cases, greater variance facilitates better regulation (1, 23). For a communicating pair of regulatory transcripts, noise can alternatively be defined as the information lost between input and output. We term this ‘noise in input’ at the regulator end, and ‘noise in output’ at the effector end (Figure 1C, 1D). In our model system, we considered each cell to be an individual network of communication channels. Using a framework whereby each communication channel has an input transcription factor (TF) and an output target gene (TG), we generated a network of TF:TG pairs representing single cells for each age group (24). We then applied Shannon’s noisy channel coding theorem to describe how communication between TF and TG changes with age (Figure 1C)(11). According to this theorem, mutual information (MI) is the amount of information that can be conveyed through a noisy communication channel. We found that average MI, or the average information effectively transmitted between TF:TG pairs, declined with age, as evidenced by a leftward shift in the distribution (Figures 1D, 1E).
We next evaluated whether the decline in MI is accompanied by increasing noise in output over time. For our communication channel, noise can be attributed to a combination of factors that, together, prevent effective communication between ‘input’ TF and the final ‘output’ TG. We found that the noise in output increased with age (Figures 1D, 1E). Our observation of the increased biological noise and a corresponding decreased MI suggests that TF-TG communication pathways become more unpredictable with age.
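The decomposition of a TF:TG channel into useful information and noise can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: we assume a plug-in (frequency-count) estimate from discretized expression values, we read ‘noise in input’ as the conditional entropy H(TF | TG) and ‘noise in output’ as H(TG | TF), and we take the joint entropy as the ‘total entropy’ used for normalization. The function names and toy inputs are hypothetical.

```python
import numpy as np

def joint_distribution(x, y):
    """Empirical joint probability table of two discrete sequences."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)  # count co-occurrences of each state pair
    return joint / joint.sum()

def entropy_bits(p):
    """Shannon entropy (bits) of a 1-D probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def channel_summary(tf, tg):
    """Split the joint entropy of a TF:TG pair into MI and two noise terms."""
    pxy = joint_distribution(tf, tg)
    h_tf = entropy_bits(pxy.sum(axis=1))
    h_tg = entropy_bits(pxy.sum(axis=0))
    h_joint = entropy_bits(pxy.ravel())
    return {
        "mi": h_tf + h_tg - h_joint,        # I(TF; TG)
        "noise_in_input": h_joint - h_tg,   # assumed to mean H(TF | TG)
        "noise_in_output": h_joint - h_tf,  # assumed to mean H(TG | TF)
        "total_entropy": h_joint,           # assumed normalizer
    }

# Toy inputs (not data from the study).
perfect = channel_summary([0, 1, 2, 0, 1, 2], [5, 7, 9, 5, 7, 9])
independent = channel_summary([0, 0, 1, 1], [0, 1, 0, 1])
print(perfect)
print(independent)
```

In the first toy pair the TG is fully determined by the TF, so all of the entropy is shared; in the second the TG is independent of the TF, so MI vanishes and the TG entropy is entirely noise in output. The three terms always sum to the joint entropy, which is why dividing each by it gives fractions comparable across age groups.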
Loss of information with aging is driven by a decreased channel capacity
According to Shannon’s noisy channel theorem, channel capacity (CC) is the maximum theoretical limit to the rate of information that can be transmitted through a noisy communication channel (11). For our system, CC is the MI maximized over the distribution of TF levels and is directly related to the total number of input-output mappings that are resolvable given the noise. CC can be thought of as the maximal precision with which a particular TF can regulate its TG (12). CC depends on two quantitative features, range and variability (Figure 2A, Supplemental File). Range is defined as the difference between the minimum and maximum output values for a given input. CC is expected to increase with range. Variability is defined as the variance of output for each input expression value. CC is expected to decrease with variability. There are three possibilities for how the CC may change relative to our observation of a decrease in MI with increasing age (Figure 2B). First, CC may remain unchanged over time, suggesting that the communication system becomes progressively inefficient despite the maintained potential of the TF to regulate its TG. Second, CC may decline with MI, suggesting that the potential of a TF to regulate its TG declines with age, driving a loss of effective communication between the two. Finally, CC could increase over time even as MI decreases. The interpretation of this third, least likely, scenario would be that the aging system displays an increased regulatory potential, but that other factors preclude the regulatory network from taking advantage of such potential, resulting in faulty communication. We sought to determine which of these possibilities manifests in our aging muscle system.
(A) Schematic representing a hypothetical probability distribution for a TF. Variability and Range are two factors that affect the Channel capacity value. (B) Schematic representing hypothesized scenarios that may cause decline in CC. (C) Total CC follows mutual information trend indicating that biological noise is steady with increasing age. The error bar indicates 95% confidence interval. (D) Stacked bar graph showing that range for 64% of TF-TG pairs (n=344) decreases with age. (E) Stacked bar graph showing that variability for 87% of TF-TG pairs (n=344) decreases with age.
Computing CC is mathematically challenging, especially for a system with continuous input and output signals(12, 20). Therefore, we simplified the mapping space into a coarse-grained 4-state model (Supplemental Figure 2A, 2B). The 4 states we defined were based on simplifying input-output expression levels into binary variables: 0 if the gene count is zero, and 1 if the gene count is non-zero. Hence, the 4 states for TF-TG mappings were (0, 0), (0, 1), (1, 0), and (1, 1). Consistent with Figure 1D, we observed with this reduced description an overall decreasing trend for MI with age, and furthermore we found that this decrease was accompanied by a reduced CC (Figure 2C). This suggests that a loss of effective communication over time may be dictated by diminishing precision with which TFs are able to regulate TGs.
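One way to make the coarse-grained model concrete is sketched below. The binarization follows the rule stated above; everything else is an assumption on our part. In particular, the text does not state how CC was computed, so we substitute the standard Blahut-Arimoto iteration for the capacity of a discrete memoryless channel. All function names and the toy counts are hypothetical.

```python
import numpy as np

def binarize(counts):
    """Binary state used in the 4-state model: 0 if the count is zero, else 1."""
    return (np.asarray(counts) > 0).astype(int)

def transition_matrix(tf_counts, tg_counts):
    """Estimate the 2x2 matrix P(TG state | TF state) from paired per-cell counts."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (binarize(tf_counts), binarize(tg_counts)), 1.0)
    row_sums = joint.sum(axis=1, keepdims=True)
    channel = joint / np.where(row_sums > 0, row_sums, 1.0)
    # Assumption: a TF state that was never observed gets an uninformative row.
    channel[row_sums[:, 0] == 0] = 0.5
    return channel

def kl_rows(channel, q):
    """KL divergence (bits) between each channel row and the output distribution q."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(channel > 0, channel * np.log2(channel / q), 0.0)
    return terms.sum(axis=1)

def mutual_information(p_in, channel):
    """MI (bits) for a given input distribution and channel."""
    p_in = np.asarray(p_in, dtype=float)
    used = p_in > 0  # inputs with zero probability contribute nothing
    return float(p_in[used] @ kl_rows(channel[used], p_in @ channel))

def channel_capacity(channel, tol=1e-10, max_iter=10_000):
    """Blahut-Arimoto estimate of capacity (bits) and the maximizing input distribution."""
    p = np.full(channel.shape[0], 1.0 / channel.shape[0])
    for _ in range(max_iter):
        d = kl_rows(channel, p @ channel)
        lower, upper = float(p @ d), float(d.max())  # bounds that bracket the capacity
        if upper - lower < tol:
            break
        p = p * np.exp2(d)  # reweight inputs toward the more informative states
        p /= p.sum()
    return lower, p

# Toy per-cell counts (not data from the study).
tf_counts = [0, 0, 0, 0, 0, 0, 3, 5]
tg_counts = [0, 0, 0, 0, 0, 1, 2, 4]
channel = transition_matrix(tf_counts, tg_counts)
p_obs = np.bincount(binarize(tf_counts), minlength=2) / len(tf_counts)
mi = mutual_information(p_obs, channel)
cc, p_opt = channel_capacity(channel)
print(f"MI = {mi:.3f} bits, CC = {cc:.3f} bits")
```

Because CC is the MI maximized over input distributions, the MI measured at the observed TF distribution can never exceed it; the gap between the two indicates how far the observed TF usage is from the most informative one.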
The next logical question was, what causes CC to decline with age? As described above, range and variance are the two factors that contribute to CC. We found that as the organism ages, most TF-TG pairs displayed a decrease in range (Figure 2D) but that the variance also decreased (Figure 2E). This implies that the TG expression becomes more uniform across cells (i.e., lower variance), but in a way that is less dependent on regulation by TFs (i.e., reduced range). Though initially unexpected, further evaluation of the data revealed that, because of the overall downregulation in gene expression with age (Figure 1A), variance is constrained as expression values approach zero (Supplemental File), explaining the decrease in variance with age.
Homeostatic functions are preferentially preserved with increasing age
The above findings demonstrate that an aging system displays an overall decrease in the transmission of ‘useful’ information and that this decrease is attributed to the inability of TFs to effectively regulate TGs within the regulatory network. To understand the extent to which gene regulation may mitigate the escalating disorder that accompanies aging, we calculated the total Shannon entropy of genes that are a part of the regulatory network compared to genes that are not. Interestingly, whereas the total entropy of genes not a part of the regulatory network remained constant over time (Supplemental Figure 3A), TFs and TGs within the regulatory network displayed a decreased entropy with aging (Supplemental Figures 3B, 3C). This suggests that regulation may render a subset of genes within the regulatory network less susceptible to escalating molecular disorder as the organism ages.
The above observation led us to ask whether there may be a distinct set of genes within the regulatory network that are preferentially regulated according to biological relevance. To answer this question, we generated a list of TF-TG pairs and their corresponding MI values across young and aged groups. TF-TG gene pairs were divided into two categories: one subset of genes that displays a preserved MI with age, and a second subset that displays a compromised MI with age. For this, the gene subset with preserved MI was defined a priori as starting with a high MI (>0.3) in young and persisting over time into old age (MI change threshold <0.2). In contrast, the gene subset that displayed a compromised MI was defined as starting with a high MI (>0.3) in young but decreasing over time (MI change threshold >0.2). We found that only 27 gene pairs (33rd percentile) displayed a preserved MI with aging, whereas 89 gene pairs (67th percentile) displayed compromised MI over time (Figure 3A). When we plotted MI and CC values as per the 4-state model for preserved and compromised genes, there was little change in the MI and CC for preserved genes over time, while both MI and CC displayed statistically significant declines for compromised genes (Figures 3B, 3C). Further inspection of the data revealed that the maintained precision with which TFs regulate corresponding TGs for the preserved gene set was a result of a maintained range and variability with aging (Figures 3D, 3E). However, the decreased variance and, particularly, range observed in the compromised genes suggest that these TGs operate more independently of their TFs as the organism ages (Figures 3F, 3G).
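The classification rule described above can be expressed as a short sketch. The thresholds (0.3 and 0.2) come from the text; the remaining choices are assumptions: we treat the ‘MI change’ for preserved pairs as an absolute difference between young and aged, we require a decrease for compromised pairs, and we label anything that fits neither definition as unclassified. The pair names and MI values are hypothetical placeholders, not results from the study.

```python
def classify_pair(mi_young, mi_aged, high_mi=0.3, max_change=0.2):
    """Label a TF-TG pair as 'preserved', 'compromised', or 'unclassified'."""
    if mi_young <= high_mi:
        return "unclassified"  # the pair did not start with a high MI
    if abs(mi_young - mi_aged) < max_change:
        return "preserved"     # MI stays close to its young value
    if mi_young - mi_aged > max_change:
        return "compromised"   # MI drops by more than the threshold
    return "unclassified"      # e.g. MI rises sharply or sits on the boundary

# Hypothetical pairs with made-up MI values (young, aged).
toy_pairs = {
    "TF_a:TG_a": (0.60, 0.55),
    "TF_b:TG_b": (0.70, 0.20),
    "TF_c:TG_c": (0.10, 0.10),
    "TF_d:TG_d": (0.40, 0.90),
}
labels = {name: classify_pair(*mi) for name, mi in toy_pairs.items()}
print(labels)
```

Keeping an explicit unclassified label makes the edge cases visible: pairs that start below the high-MI cutoff, and pairs whose MI increases with age, are not covered by either definition in the text.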
(A) Each cell in the heat map represents the mutual information of the specific gene pair in that age group. The gene pairs are arranged in decreasing order of MI at 24m. Genes that preserve MI have >0.3 when young and <0.2 difference in 3m vs 24m, whereas genes that compromise MI have >0.3 when young and >0.2 difference in 3m vs 24m. Mann-Whitney U test shows p<0.01 in both preserved and compromised categories. (B) Channel capacity for preserved genes follows the mutual information trend, indicating that biological noise is steady with increasing age. The shaded region indicates the Standard Error of the Mean. Mann-Whitney U test was performed. No significant change in MI and CC. (C) Channel capacity for compromised genes also follows the mutual information trend, indicating that biological noise is steady with increasing age. The shaded region indicates the Standard Error of the Mean. Mann-Whitney U test was performed. Significant change in MI and CC (*** p<0.01). (D) Stacked bar graph showing that range for 52% of preserved TF-TG pairs (n=27) decreases with age. (E) Stacked bar graph showing that variability for 64.5% of preserved TF-TG pairs (n=27) decreases with age. (F) Stacked bar graph showing that range for 86% of compromised TF-TG pairs (n=89) decreases with age. (G) Stacked bar graph showing that variability for 82% of compromised TF-TG pairs (n=89) decreases with age.
(A) Gene Ontology (GO) terms associated with the preserved target genes are predominantly related to homeostasis. (B) Gene Ontology (GO) terms associated with the compromised target genes are predominantly related to tissue remodeling and growth.
Finally, we performed functional enrichment of TGs to better understand the biological processes for those genes in which information transfer is either maintained or dispensed with aging. ‘Homeostasis’ emerged as the most common process among those genes in which MI was preserved with aging. In contrast, genes that displayed a compromised information flow with aging were largely associated with ‘muscle adaptation’, ‘tissue remodeling’, ‘regulation of reproductive process’, and ‘organ growth’ (Figures 4A, 4B). These findings suggest that there may be preferential preservation of biological functions that are crucial to survival of the organism.
Discussion
While a mathematical theory of communication was first proposed over 70 years ago, its application to quantify transcriptional information processing with age, as we have done here, is novel. The information theory approach we adopted represents an unbiased way to characterize biologically relevant information transmission with aging at the resolution of single cells (25–27). We found that the amount of useful information transmitted between a TF and its TG progressively declines over time. Further, we found that the maximum precision with which a TF can regulate a TG declines with age, and that this decreasing capacity may be a primary driver of the loss of useful information as the system ages. Finally, classification of gene pairs into preserved and compromised sets revealed that homeostatic functions are preferentially preserved over time at the expense of tissue plasticity and adaptability.
From a biological perspective, our information theoretic approach offers the advantage of disentangling unique components of transcriptional heterogeneity and how this heterogeneity evolves with aging. Whereas previous studies have demonstrated that variability in gene expression, or transcriptional heterogeneity, increases with time (6–8), few studies have investigated distinct aspects of transcriptional regulation that contribute to the overall increased heterogeneity with aging. Perez-Gomez et al. referred to the random, or stochastic, transcriptional variability as “transcriptional noise”, while non-random transcriptional variability, which may result from deterioration of systemic regulation, was defined as “transcriptional drift” (28). The authors pointed out that a major challenge lies in distinguishing changes in which the expression of a single gene drifts away from its original level due to a lack of regulation versus adaptive changes that are initiated by regulated processes. The authors also pointed out that this distinction can be better made when correlation or co-expression of multiple genes is taken into consideration. In line with this proposition, we defined noise with respect to a pair of genes in a communication channel, and we focused on how efficiency of transcriptional regulation changes with age. Given that MI reflects non-random transcriptional variability, our results suggest that preserved genes display less transcriptional drift, whereas the compromised genes display increased transcriptional drift with increasing age.
From a mathematical perspective, our information theoretic approach has several advantages over conventional statistical measures such as covariance, correlation coefficient, or linear regression. Although these latter measures are typically employed to describe variability/heterogeneity in an aging system(6–8), the usage of MI as first defined by Shannon has several advantages in the context of a biological system (11, 29). First, MI is assumption-free when considering the mathematical function between TFs and TGs. That is, unlike linear regression, there is no linearity assumption, thereby allowing us to consider all possible nonlinear dependencies that are commonly observed in a complex system. In addition, MI is applicable to both continuous and discrete variables, and it is invariant to reparameterization. Stated otherwise, unlike the Spearman correlation coefficient or covariance, the MI between raw counts of TF and TG is the same as the MI between any one-to-one function of TF and TG, e.g., MI(TF; TG) = MI(log2(TF); log2(TG)). This is particularly valuable given that we use normalized scRNA-seq gene counts. Another advantage of our approach is that MI also obeys the data processing inequality, where information in the output relative to the input is necessarily either lost or stays the same at each noisy step in the transmission process but is never “spontaneously” created. This ensures that, unlike covariance, correlation coefficient, and regression, there is no unexplainable manifestation of new noise or new information for each age group. Finally, MI has a clear quantitative interpretation as information, i.e., there would be 2^MI(TF;TG) distinguishable levels of TG for the range of TF expression (15). Since the numerical value of MI will remain consistent, usage of MI enhances the reproducibility and rigor of our studies.
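The invariance property described above can be checked numerically with a small sketch. This is an illustration on simulated counts, not an analysis of the study's data, and it assumes a plug-in (frequency-count) MI estimate on discrete values; for estimators that bin continuous values, the invariance holds only approximately. The function names are hypothetical, and log2(count + 1) is used instead of log2(count) so that zero counts remain defined.

```python
import numpy as np

def plugin_mi(x, y):
    """Plug-in mutual information (bits) between two discrete sequences."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def scramble(values, rng):
    """Apply a random one-to-one relabeling (generally non-monotonic)."""
    labels, inverse = np.unique(np.asarray(values).ravel(), return_inverse=True)
    return rng.permutation(len(labels))[inverse.ravel()]

# Simulated counts (not data from the study): the TG tracks the TF plus noise.
rng = np.random.default_rng(0)
tf = rng.poisson(3.0, size=2000)
tg = tf + rng.poisson(1.0, size=2000)

mi_raw = plugin_mi(tf, tg)
mi_log = plugin_mi(np.log2(tf + 1), np.log2(tg + 1))
mi_scrambled = plugin_mi(scramble(tf, rng), scramble(tg, rng))

print(mi_raw, mi_log, mi_scrambled)
print("distinguishable TG levels:", 2 ** mi_raw)
```

All three estimates agree up to floating-point rounding because a one-to-one transformation only relabels the states and leaves the joint probability table unchanged up to a reordering of rows and columns. The last line applies the 2^MI interpretation from the text to the simulated pair.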
Given these advantages of an unbiased information theoretic approach, the next critical consideration is to place the observed decline in MI into a biologically relevant context. The observed decline in CC with age suggests that TFs lose the ability to precisely control TGs. While we did not directly evaluate cellular conditions that may contribute to this loss of control, biological changes resulting in CC decline may include, for example, resource re-allocation, altered energy availability, and waste accumulation. Indeed, our data suggest that resource allocation may not be the same for different biological functions as the organism ages. Specifically, genes associated with homeostasis displayed a preserved MI over time, whereas genes associated with tissue adaptability and plasticity displayed a diminished MI. This finding suggests that the organism achieves preferential preservation of cellular functions by regulating certain gene pairs more tightly than others.
Waddington suggested that a robustness to perturbation reflects a long-term prioritization for optimal phenotypes(30). Broadly speaking, canalization is an evolutionary process in which the final phenotype persists even in the face of challenges or perturbations(31). For example, genetic canalization is when the phenotype remains stable in the face of genetic mutations or variations. Environmental canalization is when the phenotype remains stable in response to environmental variations, such as temperature and pressure. In our case, we found that preserved genes remain stable in the presence of age-related perturbations, which we termed “age-based canalization”. Previous investigators have suggested that canalization is an inevitable consequence of increasing chances of survival amid complex dynamic processes (30–32). This is consistent with our observation of preserved expression of genes associated with homeostasis at the expense of genes associated with adaptation and plasticity. On the other hand, in the face of unchecked transcriptional variability, adaptive responses may tend to be detrimental, such as in the case of cancer.
Taken together, the application of an information theoretic approach serves as a biologically intuitive mathematical model that has the potential to describe the progressively disrupted cellular communication with age. In the long term, we anticipate that this model may serve as a basis to develop a clinically interpretable biological age metric, since this communication channel framework can be easily applied to any omic-based biological network data, including epigenomic, proteomic, and metabolomic networks. This model can also be extrapolated to other tissue types and organ systems. Since our model captures aspects that are fundamental to the aging process, we expect that it can also be easily translated to humans and other multi-cellular species that undergo aging.
Although there are several advantages of this approach, limitations should be noted. First, we used gene expression values from single-cell RNA-seq data, which contain a combination of technical and biological variability. Distinguishing between these two types of variability is challenging. Since MI is invariant to reparameterization, the model captures core properties of biology despite the transformations and normalization applied to the raw transcript counts. However, our model will prove to be more accurate as single-cell technologies become more sophisticated in generating high-resolution sequencing data. Moreover, it would be interesting in future studies to design an in vivo experiment that collects time-series measurements of transcript expression levels as the organism ages(33). This could potentially serve as a biological validation of our current findings about preferential preservation of homeostatic functions.
In conclusion, our findings demonstrate that increasing biological noise can be quantified using the mathematical basis of communication and suggest that this increase in noise may be preferentially regulated depending on the gene function. Safeguarding of specific pathways into old age contrasts with the popular stochastic error accumulation theory of aging, which purports that molecular mistakes are random (34). An enhanced understanding of how certain biological functions are given priority (i.e., ‘biological wisdom’) may aid in the future design of targeted therapeutics with the ultimate goal of sustaining organismal health and function over time.
Author contributions
Conceptualization: SS, FA
Methodology: SS, FA, AM, RWL
Investigation: SS, RWL
Visualization: SS, RWL
Funding acquisition: FA
Project administration: FA, AM
Supervision: FA, AM, GM
Writing – original draft: SS, FA
Writing – review & editing: SS, FA, AM, RWL, GM
Competing interests
Authors declare that they have no competing interests.
Data and materials availability
All files and code are available on GitHub (https://github.com/sruthi-hub/Aging_mutual_info_TMS_FACS).
Acknowledgments
This work was funded by NIA R01 AG052978 (FA) and R01AG061005 (FA). We thank the Center for Research and Computing at the University of Pittsburgh for providing the computing platform used for the analyses. We also thank the attendees of the Gordon conference on Stochastic Physics in Biology (2021), the Gordon conference on Systems Aging (2022), and the Keystone conference on Single-cell biology (2022) for engaging in fruitful conversations.
Footnotes
Supplemental figures updated.