Meta-Analysis of the Gene Expression Profiles of Aging Brain Reveals Consistent Increase in Heterogeneity

In largely non-mitotic tissues such as the brain, cells are prone to gradual accumulation of stochastic genetic and epigenetic errors, which may lead to increased gene expression variation with time, both among cells and possibly also among individuals. Evolutionary theory also predicts increased genetic variation after puberty, associated with the expression of slightly deleterious variants. Although increased inter-individual expression heterogeneity during brain aging was previously reported, whether this process starts at the beginning of life or it is mainly restricted to the aging period has not been studied. The regulation and functional significance of putative age-related heterogeneity are also unknown. Here we address these issues by a systematic analysis of 19 transcriptome datasets from diverse brain regions covering the whole postnatal lifespan. Among all datasets, we observed a significantly higher increase in inter-individual gene expression heterogeneity during aging (20 to 98 years of age) than during postnatal development (0 to 20 years of age). Moreover, increased heterogeneity during aging was consistent among different brain regions, which was associated with many biological processes and pathways that are important for aging and neural function, including longevity regulating pathway, autophagy, mTOR signaling pathway, axon guidance and neuron to neuron synapse. Overall, our results suggested that age-related increase in gene expression heterogeneity during aging is a general effect among human brain transcriptome and, may play a significant role in processes of aging-related changes of brain functions. We also provide the necessary functions to calculate heterogeneity change with age as an R package, ‘hetAge’.


Introduction
Aging is a complex process characterized by a gradual decline in maintenance and repair mechanisms, accompanied by an increase in genetic and epi-mutations, and oxidative damage to protein and lipids (Gorbunova, Seluanov, Mao, & Hine, 2007;Lu et al., 2004). Human brain experiences dramatic structural and functional changes in the course of aging. Age-related changes in the human brain include a decline in gray matter and white matter volumes (Sowell, Thompson, & Toga, 2004), an increase in axonal bouton dynamics (Grillo et al., 2013) and reduced synaptic plasticity, which may be associated with the decline in cognitive functions (Dorszewska, 2013).
Changes during brain aging are suggested to be a result of stochastic processes, unlike changes associated with postnatal neural development which are known to be primarily controlled by regulatory processes (Polleux, Ince-Dunn, & Ghosh, 2007;Schratt, 2009;Stefani & Slack, 2008). The molecular mechanisms underlying age-related alteration of regulatory processes and eventually leading to aging-related phenotypes, however, are barely understood.
Over the past decade, a number of transcriptome studies focusing on age-related changes in human brain gene expression profiles were published (Kang et al., 2011;Lu et al., 2004;Miller et al., 2014;Somel et al., 2010;Tebbenkamp, Willsey, State, & Šestan, 2014). These studies report aging-related differential expression patterns in many functions, including synaptic functions, energy metabolism, inflammation, stress response, and DNA repair. Analyzing age-related change in gene expression profiles in diverse brain regions, we previously showed that gene expression changes occur in the opposite direction during postnatal development (pre-20 years of age) and aging (post-20 years of age), which may be associated with aging-related phenotypes in healthy brain aging (Dönertaş et al., 2017). While different brain regions associated with specific, and often independent, gene expression profiles (Kang et al., 2011;Miller et al., 2014;Tebbenkamp et al., 2014), these studies show that agerelated alteration of gene expression profiles during aging is a widespread effect across different brain regions.
One of the suggested effects of aging on gene expression is increased variability between individuals and somatic cells, which has been previously reported by several studies. Some of these studies find an increase in age-related heterogeneity in heart, lung and white blood cells of mice (Angelidis et al., 2018;Bahar et al., 2006;Pilar Martinez-Jimenez et al., 2017), C.elegans (Herndon et al., 2002), and human twins (Fraga et al., 2005), but also see (Viñuela et al., 2018) study where they find more decrease than increase in heterogeneity in human twins and (Ximerakis et al., 2018) where they show the direction of heterogeneity change depends on cell-type in aging mice brain. Using GTEx data covering different brain regions (ages ranging from 20 to 70 in years), Brinkmeyer-Langford et al identify a set of differentially variable genes between different age groups, although they do not observe increased variation in the old (Brinkmeyer-Langford, Guan, Ji, & Cai, 2016). A more recent study, performing single-cell RNA sequencing of human pancreatic cells, identifies an increase in transcriptional noise and somatic mutations with age (Enge et al., 2017). In an earlier study, using microarray datasets from different tissues of humans and rats, we observed that an increase in agerelated heterogeneity of expression is a general effect in the transcriptome (Somel, Khaitovich, Bahn, Pääbo, & Lachmann, 2006). However, we found no significant consistency across datasets, nor any significant enrichment in functional gene groups. In another study, a meta-analysis suggested differences across brain regions collected from the same individuals are higher in aging than in development, suggesting an increase in inter-individual variability (Dönertaş et al., 2017). A more recent prefrontal cortex transcriptome analysis we conducted revealed a weak increase in agedependent heterogeneity, which was consistent at the gene, transcriptome and pathway level (Kedlian, Donertas, & Thornton, 2019).
Although the age-related increase in heterogeneity has been suggested in previous studies, whether it is a time-dependent process that starts at the beginning of life or it is only seen after puberty, and its functional consequences were not explored. In this study, we retrieved transcriptome data from independent microarray-based studies covering the whole lifespan from diverse brain regions and conducted a comprehensive analysis to identify the prevalence of age-related heterogeneity change in human brain aging, compared with the postnatal development. We confirmed that increased age-related heterogeneity is a consistent trend in human brain transcriptome during aging but not during development and is associated with aging-related biological functions.

Results
To investigate how heterogeneity in gene expression changes with age, we used 19 published microarray datasets from three independent studies. Datasets included 1,010 samples from 17 different brain regions of 298 individuals, ranging from 0 to 98 years in age (Table S1, Figure S1). In order to analyze the age-related change in gene expression heterogeneity during aging compared to the change in development, we divided datasets into two groups as development (0 to 20 years of age, n = 441) and aging (20 to 98 years of age, n=569). We used the age of 20 to separate preadulthood from adulthood based on commonly used age intervals in earlier studies (see Methods).
For the analysis, we focused only on the genes for which we have a measurement across all datasets (n=11,137).

Age-related change in gene expression levels
Although the primary focus of this study is to explore how heterogeneity in gene expression changes with age, we first characterized the changes in gene expression level. In order to quantify age-related changes in gene expression, we used a linear model between gene expression levels and age (see Methods, Figure S2). We transformed the ages to the fourth root scale before fitting the model as it provides relatively uniform distribution of ages across the lifespan, but we also confirmed different age scales yield quantitatively similar results (see Figure S3). We measured expression change of each gene in aging and development periods separately and considered regression coefficients from the linear model (β values) as a measure of age-related expression change ( Figure S4, Table S2).
We first analyzed similarity in age-related expression changes across datasets by calculating pairwise Spearman's correlation coefficients (ρ) among the β values ( Figure 1a). Development datasets showed a higher correlation (median ρ= 0.57) compared to aging datasets (median ρ=0.45). Although this difference was not significant (Permutation test p = 0.1, Figure S5a), weaker consistency during aging may reflect the stochastic nature of aging, causing higher noise in aging datasets. In addition, we observed a mostly negative correlation between aging and development (median rho = -0.04), consistent with our previous report of gene expression reversal, which involved the same datasets (Dönertaş et al., 2017).
The principal component analysis (PCA) of age-related expression changes (β) revealed distinct clusters of development and aging datasets ( Figure 1b). Moreover, aging datasets were more dispersed than development datasets (median pairwise Euclidean distance between PC1 and PC2 was 77 for aging and 21 for development), which may again reflect stochasticity in gene expression change during aging and can indicate more heterogeneity among different brain regions or datasets during aging than in development.
We next identified genes showing significant age-related expression change (FDR corrected p < 0.05), for development and aging datasets separately ( Figure 1c). Development datasets showed more significant changes compared to aging (Permutation test p = 0.003, Figure S5c), which may again indicate higher expression variability among individuals during aging. Moreover, the direction of change in development was mostly positive (14 with more positive and 5 with more negative), whereas in aging datasets, we observed more genes with a decrease in expression level (13 with more negative and 5 with no significant change, and 1 with an equal number of positive and negative changes).

Age-related change in gene expression heterogeneity
In order to assess age-related change in heterogeneity, we used the unexplained variance (residuals) from the linear model we constructed to calculate the change in gene expression level. For each gene in each dataset separately, we calculated Spearman's correlation coefficients (ρ) between the absolute value of residuals and age, irrespective of the gene having a significant change in expression (see Methods, Figure S2). We considered ρ values as a measure of heterogeneity change, where positive values mean an increase in heterogeneity with age (Table S2). Moreover, we repeated this approach using loess regression instead of a linear model between expression level and age. We confirmed the correlations between the change in heterogeneity based on linear model and loess regression were high ( Figure S14) but preferred to continue with the results based on the linear model as loess regression was observed to be more sensitive to the changes in sample sizes and parameters. Then, we asked if datasets show similar changes in heterogeneity by calculating pairwise Spearman's correlation across datasets (Figure 2a). Unlike the correlations among expression level changes, agerelated change in expression heterogeneity did not show a higher consistency during development. In fact, although not significant (permutation test p = 0.2, Figure S5b), the median value of the correlation coefficients was higher in aging (0.23), than development (0.11).
A principal component analysis (PCA) showed that heterogeneity change is also able to differentiate aging datasets from development ( Figure 2b). Similar to the pairwise correlations (Figure 2a), aging datasets clustered more closely than development datasets (median pairwise Euclidean distances between PC1 and PC2 are 41 and 44 for aging and development, respectively). Both observations imply more similar changes in heterogeneity during aging.
Using the p-values from Spearman's correlation between age and the absolute value of residuals for each gene, we then investigated the genes showing a significant change in heterogeneity during aging and development (FDR corrected p-value < 0.05). We found almost no significant change in heterogeneity during development, except for Colantuoni2011 dataset, for which we have high statistical power due to the large sample size. Aging datasets, on the other hand, showed more significant changes in heterogeneity (Permutation test p = 0.06, Figure S5d) and the majority of the genes tended to increase in heterogeneity ( Figure 2c). However, the genes showing a significant change did not overlap across aging datasets ( Figure S6). Since the significance of the changes is highly dependent on the sample size, instead of focusing on these changes, we utilized having multiple datasets and focused on shared trends across them, capturing weak but reproducible trends across multiple datasets.
Nevertheless, these analyses indicated relatively more consistent heterogeneity change among datasets in aging, compared to development, which may imply that heterogeneity change is one of the characteristics of aging (see Discussion).

Consistent increase in heterogeneity during aging
As our previous analyses suggested age-related changes in heterogeneity can differentiate development and aging and show more similarity in aging, we sought to characterize these changes.
The method that we used in this study uses the consistent changes in heterogeneity across datasets, instead of considering significant ones within individual datasets.
We first examined profiles of age-related heterogeneity change in aging and development. 18/19 aging datasets showed more increase than decrease in heterogeneity with age (median rho > 0), while the median heterogeneity change in one dataset was zero, meaning there is an equal number of genes with increase and decrease in heterogeneity. In development, on the other hand, only 5/19 datasets showed more increase in heterogeneity, while heterogeneity of remaining 14/19 datasets showed more decrease with age (median rho < 0) ( Figure 3a). The overall increase in age-related heterogeneity during aging was significantly higher than development (permutation test p<0.001, Figure 5e). We also checked if there is a relationship between the changes in heterogeneity during development and aging (e.g. if those genes that decrease in heterogeneity tend to increase in heterogeneity during aging) but did not find any significant trend ( Figure S15).
A potential explanation why we see different patterns of heterogeneity change with age in development and aging could be the accompanying changes in the expression levels, as it is challenging to disband dependence between the mean and variance. To address this possibility, we first examined calculated Spearman's correlation between the changes in heterogeneity and expression, for each dataset. Surprisingly, we saw an opposing profile for development and aging; while the change in heterogeneity and expression were positively correlated in development, they showed a negative correlation in aging ( Figure 3b).
Having observed both a tendency to increase and a higher consistency in heterogeneity change during aging, we next asked if particular genes become more heterogeneous consistently across datasets and how the numbers compare with development. We first calculated, for each gene, the number of datasets with an increase in heterogeneity, for development and aging separately ( Figure   3c). To calculate significance and expected consistency, while controlling for dataset dependence, we performed 1,000 random permutations of individuals' ages and re-calculated the heterogeneity changes (see Methods). Importantly, we only created random permutations for the heterogeneity change but not the gene expression changes. In development, there was no significant consistency in heterogeneity change in either increase or decrease. During aging, however, there was a significant shift toward heterogeneity increase, i.e. genes showed more than expected consistency toward heterogeneity increase across aging datasets (Figure 3c, lower panel). We identified 147 common genes with a significant increase in heterogeneity across all aging datasets (one-sided permutation test p < 0.001, Table S3). Based on our permutations, we estimated that 84/147 genes would be expected to have consistent increase just by chance, suggesting almost 60% false positives. In development, however, there was no significant consistency in heterogeneity change in either direction (increase or decrease). Nevertheless, comparing the consistency in aging and development, there was an apparent shift towards a consistent increase in aging -even if we cannot confidently report the genes that become significantly more heterogeneous with age across multiple datasets.
Low statistical power due to the small number of independent datasets is likely to contribute to the high false positive rate. The relationship between expression and heterogeneity change with age. Spearman correlation analysis was performed between age-related expression changes (β values) and age-related heterogeneity changes (rho values) of 11,137 common genes, separately for each dataset. Y-axis shows the correlation coefficient between expression and heterogeneity changes for each dataset. Spearman correlation analysis was performed between age-related expression changes (β values) and age-related heterogeneity changes (rho values) of 11,137 common genes, separately for each dataset. The dotted red line reflects no correlation between expression and heterogeneity. (c) Expected and observed consistency in the heterogeneity change across datasets in aging and development. There is a significant shift toward heterogeneity increase in aging (one-sided permutation test <10-7) (lower panel), while there is no significant consistency in either direction in development (upper panel).

Heterogeneity Trajectories
We next asked if there are specific patterns of heterogeneity change, e.g. increase only after a certain age. We used the genes with a consistent increase in heterogeneity with age, during aging (n = 147) to explore the trajectories of heterogeneity change. Genes grouped with k-means clustering showed multiple patterns in heterogeneity increase (Table S3). Three patterns are observed: i) genes in clusters 3 and 7 show noisy but a steady increase throughout aging, ii) genes in clusters 4, 5 and 8 show increase in early aging but slightly decrease after a certain age, revealing a reversal (up-down) pattern, and iii) the other genes increase in heterogeneity dramatically after the age of 60 (clusters 1, 2 and 6). We also analyzed the accompanying changes in mean expression levels for these clusters.
Except for cluster 1, which shows a decrease in expression level at around the age of 60 and then shows a dramatic increase, all clusters show a steady scaled mean expression level at around zero, i.e. different genes in a cluster show different patterns ( Figure S16).
We further tested the genes showing dramatic heterogeneity increase after the age of 60 (clusters 1, 2 and 6) for association with Alzheimer's Disease, as the disease incidence increases after 60 as well, however, found no evidence for such an association (see Figure S7). . The x-and y-axis show age (on fourth root scale) and heterogeneity levels, respectively. Mean heterogeneity change for the genes in each cluster was drawn by spline curves. The colors and line-types of curves specify different brain regions and data sources, respectively.

Functional analysis
To examine the functional associations of heterogeneity changes with age, we performed gene set enrichment analysis using KEGG pathways (Kanehisa, Sato, Furumichi, Morishima, & Tanabe, 2019), Gene Ontology (GO) categories (Ashburner et al., 2002; The Gene Ontology Consortium, 2019), Disease Ontology (DO) (Kibbe et al., 2015), Reactome pathways (Fabregat et al., 2018), Transcription Factor (TF) Targets (TRANSFAC) (Matys et al., 2003), and miRNA targets (MiRTarBase) (Chou et al., 2016). In particular, we rank ordered genes based on the number of datasets that show a consistent increase in heterogeneity and asked if the extremes of this distribution are associated with the gene sets that we analyzed. There was no significant enrichment for any of the categories for the consistent changes in development. The significantly enriched KEGG pathways for the genes that become consistently heterogeneous during aging included longevity regulating pathway, autophagy, mTOR signaling pathway and other pathways that are previously suggested to be important for aging (Figure 5a). Among the pathways listed in Figure 5a, only protein digestion and absorption, primary immunodeficiency, linoleic acid metabolism, and fat digestion and absorption pathways had negative enrichment score, meaning these pathways were significantly associated with the genes having the least number of datasets showing an increase. However, it is important to note that this does not mean these pathways have a decrease in heterogeneity as the distribution of consistent heterogeneity is skewed (Figure 3c, lower panel). We also calculated if the KEGG pathways that we identified are particularly enriched in any of the heterogeneity trajectories we identified. Although we lack the necessary power to test the associations statistically, we saw that i) group 1, which showed a stable increase in heterogeneity, is associated more with the metabolic pathways and mRNA surveillance pathway, ii) group 2, which showed first an increase and a slight decrease at later ages, is associated with axon guidance, mTOR signaling, and phospholipase D signaling pathways, and iii) group 3, which showed a dramatic increase after age of 60, is associated with autophagy, longevity regulating pathway and FoxO signaling pathways. The full list is available as Figure S8. x and y-axes show the number of datasets with a consistent increase and the density for each significant pathway, respectively. The dashed red line shows x=9.5, which is the middle point for 19 datasets, representing no tendency to increase or decrease. Values higher than 9.5, shown with red color, indicate an increase in heterogeneity while values lower than 9.5, shown with blue color, indicate a decrease in heterogeneity and the darkness show the consistency in change across datasets. The pathway scheme on the right exemplifies the distribution of the genes in a pathway. More detailed schemes for all significant pathways with the gene names are given as SI. (b) Correlation between the change in heterogeneity and number of transcriptional regulators, i.e. miRNA and transcription factors. Each point represents a dataset, and the color shows the data source. pvalues are calculated using a permutation test. The dashed line at y=0 shows zero correlation.
The distribution of consistent heterogeneity in development and aging also showed a clear difference.
The pathway scheme for the longevity regulating pathway, colored based on the number of datasets with a consistent increase, shows how particular genes compare between development and aging.
The visualizations for all significant pathways, including the gene names are given in the Supplementary Information. Although we focused on KEGG pathways here, other significantly enriched gene sets, including GO, Reactome, TF and miRNA sets are included in as Tables S4-11. In general, while the consistent changes in development did not show any enrichment (except for miRNAs, see Table S11), we detected a significant enrichment for the genes that become more heterogeneous with age during aging, with the exception that disease ontology terms were not significantly associated with the consistent changes in either development or aging. The gene sets included specific categories such as autophagy and synaptic functions as well as large functional categories such as regulation of transcription and translation processes, cytoskeleton or histone modifications. There were 30 significantly enriched Transcription Factors, including EGR and FOXO, and 99 miRNAs (see Table S9-10 for the full list). We also asked if the genes that become more heterogenous consistently across datasets are known aging-related genes, using GenAge Human gene set (Tacutu et al., 2018), but did not find a significant association ( Figure S9).
Apart from having specific regulators that affect the heterogeneity, we also asked if the total number of transcription factors or miRNAs regulating a gene might be related to the heterogeneity (Figure 5b).
We calculated the correlations between the total number of regulators and the heterogeneity changes while controlling for the expression changes in development and aging. Genes that show a decrease in expression first and increase during aging (down-up) did not show any significant association between the change in heterogeneity and the number of regulators. Genes that show a decrease in expression during ageing, irrespective of their expression during development (down-down and updown), showed a higher correlation between the change in heterogeneity and the number of regulators in aging, and was mostly positive in aging datasets, meaning genes with a higher number of regulators become more heterogeneous with age. Genes that showed an increase in expression throughout the lifespan (up-up) also had a higher correlation between the heterogeneity and the number of miRNAs in aging, but this trend is not observed for transcription factors.
Johnson & Dong, et al. previously compiled a list of traits that are age-related and been sufficiently tested for genome-wide associations (Johnson, Dong, Vijg, & Suh, 2015). Using the genetic associations for those traits in GWAS Catalog, we tested if there are significantly enriched traits for the consistent changes in heterogeneity during aging (Table S12). Although there was no significant enrichment, all these age-related terms had positive enrichment scores, i.e. they all tended to include genes that consistently become more heterogeneous with age during aging.

Discussion
Aging is characterized by a gradual decrease in the ability to maintain homeostatic processes which leads to functional decline, age-related diseases, and eventually to death. This age-related deterioration, however, is thought as not a result of expression changes in a few individual genes, but rather as a consequence of an age-related alteration of the whole genome, which could be a result of an accumulation of both epigenetic and genetic errors in a stochastic manner (Enge et al., 2017;Vijg, 2004). This stochastic nature of aging hinders the identification of the age-related change patterns in gene expression from a single dataset with a limited number of samples.
In this study, we examined 19 gene expression datasets compiled from three independent studies to identify the changes in gene expression heterogeneity with age. We implemented a regression-based method and identified genes showing a consistent change in heterogeneity with age. As we did not observe a substantial significant age-related heterogeneity change in most of the datasets, which could be due to lack of power due to the small sample sizes, we took advantage of a meta-analysis approach and focused on consistent signals among datasets, irrespective of their effect sizes and significance. Although this approach will fail to capture region-specific patterns, it includes genes that fail to pass the significance threshold due to insufficient power. Furthermore, we expected our method to be robust to noise and confounding effects within individual datasets.
Analyzing age-related gene expression changes, we first identified that more significant and concordant changes during development than in aging. Additionally, genes showing significant change during aging tended to decrease in expression (Figure 1). While, more consistent expression change during development may be an indication of active regulatory processes that control neural development, accumulation of detrimental effects during aging may disturb this regulation and result in increased noise. As previously suggested, an accumulation of stochastic effects during aging may cause a decrease in expression levels, leading to age-related phenotypes (Lu et al., 2004). The significant age-related decrease in expression during aging can also be explained within this framework. Overall, initial analysis of gene expression changes in aging and development suggested increased transcriptional noise during aging.
We next focused on age-related heterogeneity change and found a significant increase in age-related heterogeneity during aging, compared to development. Notably, increased heterogeneity is not limited to individual brain regions, but a consistent pattern across different brain regions during aging. Our results support the view that aging is a result of stochastic evolutionary and cellular dynamics, reflecting the heterogeneous nature of the aging. We found that age-related heterogeneity change is more consistent among aging datasets which may reflect an underlying systemic mechanism. Further, more genes showed significant heterogeneity change in aging, and the majority of these genes showing significant heterogeneity change in aging tended to increase in heterogeneity. Importantly, our results indicate increased heterogeneity among individuals, rather than cells, as we used bulk RNA expression datasets where each value is an average for the tissue. Stochastic effects resulting in increased cell-to-cell variation might be evened out in bulk RNA expression data and may result in steady heterogeneity level in tissue level. Increased heterogeneity at the tissue level can be explained by several biological factors: (1) Environmental or physiological changes, causing stochastic genetic and epigenetic errors, resulting in increased heterogeneity among individuals; (2) Stochastic effects at the cellular level could be passed on to other cells via cellular interactions; (3) The interaction between age and individual's genotype or somatic mutations may cause inter-individual heterogeneity; (4) A mechanism causing increased heterogeneity can convergently create similar states across cells. Another factor that may explain age-related heterogeneity change may be related to change in cell-type contributions, which can be expressed as a change in heterogeneity.
What are the possible molecular mechanisms causing an increase in age-related heterogeneity during aging? To address this question, we first confirmed that it was not a result of low statistical power ( Figure S1) or a technical artifact (Figure 3b, S10, S17). Specifically, we tested whether increased heterogeneity during aging can be a result of the mean-variance relationship, but we found no significant effect that can confound our results. In fact, the mean-variance relationship in development and aging showed opposing profiles. We further analyzed this by grouping genes based on their expression in development and aging ( Figure S10). The group that showed the biggest difference between development and aging was those that decrease in expression, which could suggest that the decrease in development are more coordinated and well-regulated whereas the decrease in aging occurs due to stochastic errors. Another potential confounder is the post-mortem interval (PMI), which is the time between death and sample collection. Since we do not have this data for all datasets we analyzed, we could not account for this in our model. However, using the list of genes previously suggested as associated with PMI (Zhu, Wang, Yin, & Yang, 2017), we checked if the consistency among aging datasets could be driven by PMI. Only 2 PMI-associated genes were among the 147 that become consistently heterogeneous, and the distribution also suggested there is no significant relationship ( Figure S17).
It was previously suggested that accumulation of somatic mutations with age, which is a stochastic process, may be associated with transcriptome instability, resulting in functional and cognitive decline (Lodato et al., 2018;Lombard et al., 2005;Lu et al., 2004;Vijg, 2004). In an earlier study, we also proposed that increased heterogeneity among individuals may be a result of an accumulation of somatic mutations (Somel et al., 2006). Although we do not have access to somatic mutation data and thus cannot test age-related mutation accumulation in this study, our results can be discussed within this framework. If mutation accumulation also results in a decrease in expression as previously suggested (Lu et al., 2004), we can also explain the significant age-related decrease in expression during aging in this context. However, it was also shown that genome-wide substitutions in single cells are not so common as to influence genome stability and cause transcriptional noise at the cellular level (Enge et al., 2017;Lodato et al., 2015). An alternative explanation may be related to epigenetic regulation (Booth & Brunet, 2016). Variation in chromatin modifications was suggested to be increased with age, which was proposed to be a major mechanism that results in increased heterogeneity with age (Cheung et al., 2018). Another possible mechanism that may contribute to increased heterogeneity may include transcriptional regulation that is considered to be inherently stochastic (Maheshri & O'Shea, 2007). Several studies found that variation in gene expression is positively correlated with the number of TFs controlling gene's regulation (Gustavo Valadares Barroso, Natasa Puzovic, 2018). One proposed transcription regulation mechanism includes nonspecific DNA binding and sliding among DNA during target-search of TFs (Sharon et al., 2014).
Authors of the study further suggested that transcription factor target search may contribute gene expression noise, and therefore explain the effect of the number of binding sites on expression noise.
Our results also suggest that genes with a higher number of regulators and decrease in expression become more heterogeneous, and this correlation is higher in aging. Together with these studies, our results support that decrease in the regulation of transcriptional machinery may be associated with increased heterogeneity observed in aging. Nevertheless, it worth noting that the relationship between the number of regulators and heterogeneity does not necessarily reflect a functional relationship but rather could be a consequence of research bias. Since we used regulatory information that is experimentally determined, it is possible that genes that we found to be increased in heterogeneity during aging may be studied more in the context of transcription regulation and are associated with higher number of regulators.
One important limitation of our study relates to microarray-based data that we analyzed. Since gene expression levels measured by microarray do not reflect an absolute abundance of mRNAs, but rather are relative expression levels, we were able to examine relative change in gene expression. A recent study, analyzing single-cell RNA Sequencing data from aging Drosophila brain, identified an age-related decline in total mRNA abundance (Davie et al., 2018). It is also suggested that, in microarray studies, genes with lower expression levels tend to have higher variance (Aris et al., 2004). In this context, whether the change in heterogeneity is a result of the total mRNA decay is an important question. As an attempt to see if the age-related increase in heterogeneity is dependent on the technology used to generate data, we repeated the initial analysis using the expression datasets for the human brain, generated by GTEx Consortium (GTEx Consortium, 2015) (Figure S11-13). Nine out of thirteen datasets confirmed that there is more increase in heterogeneity at the transcriptome level, while the remaining four datasets were from BA24, cerebellar hemisphere, cerebellum and substantia nigra regions. The change in expression and heterogeneity, on the other hand, were positively correlated and the correlation was much higher in the magnitude. Unfortunately, expression level and variation in RNA-seq is challenging to disentangle. Thus, the biological relevance of the relationship between the age-related change in expression and heterogeneity still awaits to be discovered through detailed experimental and computational approaches. Nevertheless, RNA-seq analysis also suggests an overall increase in age-related heterogeneity increase.
Although a growing body of evidence suggests that an increase in gene expression heterogeneity accompanies aging, its functional role and significance in aging are still open questions. While it seems reasonable to assume increased heterogeneity may result in functional decline (Somel et al., 2006), there are also molecular mechanisms that reduce the effects of noise in transcript levels, such as nuclear retention of mRNA that can buffer cytoplasmic transcript level (Bahar Halpern et al., 2015), and also it was previously suggested that variability in mRNA levels might be advantageous in unicellular organisms as bet-hedging strategy, implying that variable expression of genes may provide evolutionary advantage to organisms via phenotypic plasticity (Chalancon et al., 2012;Eldar & Elowitz, 2010). Our gene set enrichment analysis revealed a set of significantly enriched pathways that are known to modulate aging, including longevity regulating pathway, autophagy, mTOR signaling pathway (Figure 5a.). Further, GO terms shared among genes showing increased heterogeneity during aging include some previously identified common pathways in aging and agerelated diseases. FOXO transcription factors, regulating stress resistance, metabolism, cell cycle arrest, apoptosis, also found to be significantly enriched. We have also tested if these genes are associated with age-related diseases through GWAS, and although not significant, we found a positive association with all age-related traits defined in Johnson & Dong et al. 2015. Overall, these results indicate the effect of heterogeneity on pathways that modulate aging and may reflect the significance of increased heterogeneity in aging. Importantly, we identified genes that are enriched in terms related to neural and synaptic functions, such as axon guidance, neuron to neuron synapse, postsynaptic specialization. Additionally, significantly enriched Transcription Factors includes early growth response (EGF), which is known to be regulating the expression of many genes involved in synaptic homeostasis and plasticity (Gashler & Sukhatme, 1995;Li, Carter, Gao, Whitehead, & Tourtellotte, 2005;Pagel & Deindl, 2011). These results may reflect the role of increased heterogeneity in synaptic dysfunction observed in the mammalian brain, which is considered to be a major factor in age-related cognitive decline (Morrison & Baxter, 2012).
In summary, performing a meta-analysis of transcriptome data from diverse brain regions we found a significant increase in gene expression heterogeneity during aging, compared to development.
Increased heterogeneity was a consistent pattern among diverse brain regions in aging, while no significant consistency observed across development datasets. Our results support the view of aging as a result of stochastic dynamics, unlike development. We also reported that genes showing a consistent increase in heterogeneity during aging are involved pathways that are important for aging and neural function. Therefore, our results suggest that the increase in heterogeneity is one of the characteristics of brain aging and is unlikely to be only driven by the passage of time as we observe different trends before puberty.

Dataset collection
Microarray datasets: Raw data used in this study were retrieved from the NCBI Gene Expression Omnibus (GEO) from three different sources (Table S1). All three datasets consist of human brain gene expression data generated through microarray experiment. In total, we obtained 1017 samples from 298 individuals, spanning the whole lifespan with age ranging from 0 to 98 year ( Figure S1).

RNA-seq dataset:
We used the transcriptome data generated by GTEx consortium (v6p) (GTEx Consortium, 2015). We only used the samples with a death circumstance of 1 (violent and fast deaths due to an accident) and 2 (fast death of natural causes) on Hardy Scale so that we do not include any samples with illnesses. As we focus only on the brain, we used all 13 brain tissues. As a result, we analyzed 623 samples, samples from 99 individuals.

Separating datasets as aging and development datasets:
To differentiate changes in gene expression heterogeneity during aging with regard to development, we used the age of 20 to separate preadulthood from adulthood. It was shown that the age of 20 corresponds to the first age of reproduction in human societies (Walker et al., 2006). Structural changes after the age of 20 in the human brain were previously linked to age-related phenotypes, specifically neuronal shrinkage and a decline in total length of myelinated fibers (Sowell et al., 2004). Earlier studies examining age-related gene expression changes in different brain regions also showed a global change in gene expression patterns after the age of 20 (Colantuoni et al., 2011;Dönertaş et al., 2017;Somel et al., 2010). Thus, consistent with these studies, we separated datasets using the age of 20 into development (0 to 20 years of age, n = 441) and aging (20 to 98 years of age, n=569).

Preprocessing
Microarray datasets: RMA correction (using 'oligo' library in R) and log2 transformation were applied to Somel2011 and Kang2011 datasets. For Colantuoni2011 dataset, we used the preprocessed data deposited in GEO, which was loess normalized, as there was no public R package to analyze the raw data. We quantile normalized all datasets using 'preprocessCore' library in R. Since our analysis focused on consistent patterns across datasets, we minimized the effects of confounding factors through quantile normalization, and we considered consistent results as potentially a biological signal.
We also applied an additional correction procedure for Somel2011 datasets, in which there was a batch effect influencing the expression levels, as follows: for each probeset (1) calculate mean expression (M), (2) scale each batch separately, (3) add M to each value. We excluded outliers given in Table S1, through a visual inspection of the first two principal components for the probeset expression levels. We mapped probeset ids to Ensembl gene IDs 1) using Ensembl database, through the 'biomaRt' library in R for Somel2011 dataset, 2) using the GPL file deposited in GEO for Kang2011 as probeset IDs were not complete in Ensembl, and 3) using the Entrez gene ids in the GPL file deposited in GEO for Colantuoni2011 dataset and converting them the Ensembl gene ids using ensemble database, through the "biomaRt" library in R. Lastly, we scaled expression levels for genes by 'scale' function in R. Age values of each dataset were converted to fourth root of age (in days) to ensure the relationship between age and expression is linear.

RNA-Seq dataset:
The genes with median RPKM value of 0 are excluded from data. The RPKM values provided in the GTEx data are log2 transformed and quantile-normalized. Similar to the microarray data, we excluded the outliers based on the visual inspection of the first and second principal components (Table S1). As ages are given as an interval in GTEx, we used the mean of values in our analysis.

Age-related expression change
We used linear regression to assess the relationship between age and gene expression. The model used in the analysis is: where Y i is the scaled log2 expression level for the i th gene, A is the fourth root of age and ε i is the residual. We performed the analysis for each dataset and considered β 1 value as a measure of change in expression. P-values obtained from the model were corrected for multiple testing according to Benjamini & Hochberg procedure by using 'p.adjust' function in R.

Age-related heterogeneity change
In order to quantify the age-related change in gene expression heterogeneity, we calculated Spearman's correlation coefficient (ρ). The correlations were calculated between the absolute values of residuals obtained from equation (1)

Principal Component Analysis
We conducted principal component analysis on both age-related changes in expression (β) and heterogeneity (ρ). We followed a similar procedure for both analyses, in which we used 'prcomp' function in R. Analysis was performed on a matrix containing β values (for the change in expression level) and ρ values (for the change in heterogeneity), for 11,137 commonly expressed genes for all 38 development and aging datasets. The change in expression (β) or heterogeneity (ρ) values were scaled for each dataset before calculating principal components. First two principal components that explain the variance between variables the most are used to examine the patterns of aging and development datasets.

Permutation test
We performed a permutation test, taking non-independence of Somel2011 and Kang2011 datasets into account. These datasets include multiple samples from the same individuals for different brain regions. We first randomly permuted ages among individuals, not samples, for 1000 times in each data source, using 'sample' function in R. Next, we assigned ages of individuals to corresponding samples and calculated age-related expression and heterogeneity change for each dataset, corresponding to brain regions. For the tests related to the changes in gene expression with age, we used a linear model between gene expression levels and the randomized ages. However, for the tests related to the changes in heterogeneity with age, we measured the correlation between the randomized ages and the absolute value of residuals from the linear model that is between expression levels and non-randomized ages for each gene. In this way, we preserved the relationship between age and expression, and we were able to ensure that our regression model was

Clustering
We used k-means algorithm ('kmeans' function in R) to cluster genes according to their heterogeneity profiles. We first subset the heterogeneity levels (absolute value of the residuals from equation (1)) to include only the genes that show a consistent increase with age and then scaled the heterogeneity levels, so that each gene has a mean heterogeneity level of zero and standard deviation of 1. Since the number of samples in each dataset is different, just running k-means on the combined dataset would not equally represent all datasets. Thus, we first calculated the spline curves for scaled heterogeneity levels for each gene in each dataset (using 'smooth.spline' function in R, with three degrees of freedom). We interpolate at 11 (the smallest sample size) equally distant age points within each dataset. Then we use the combined interpolated values to run k-means algorithm with k=8.
To test association of the clusters with Alzheimer's Disease, we retrieved overall AD association scores of the 147 consistent genes (n = 40) from the Open Targets Platform (Koscielny et al., 2017).

Functional Analysis
We used "clusterProfiler" package in R to run Gene Set Enrichment Analysis, using Gene Ontology (GO) Biological Process (BP), GO Molecular Function (MF), GO Cellular Compartment (CC), Reactome, Disease Ontology (DO), and KEGG Pathways. We performed GSEA on all gene sets with a size between 5 and 500, and we corrected the resulting p values with Benjamini Hochberg correction method. We used the number of datasets with a consistent increase to run GSEA so that we can test if the genes with a consistent increase or decrease in their expression are associated with specific functions. We repeated the same analysis using the heterogeneity change levels (Spearman's rho between the absolute value of residuals and age) for each dataset to confirm the gene sets are indeed associated with the increase/decrease in heterogeneity. We visualized the KEGG pathways using 'KEGGgraph' library in R and colored the genes by the number of datasets that show an increase.
We also performed an enrichment analysis of the transcription factors and miRNA to test if specific TFs or miRNAs regulate the genes that become more heterogeneous consistently. We collected gene-regulator association information using Harmonizome database (Rouillard et al., 2016), "MiRTarBase microRNA Targets" (12086 genes, 596 miRNAs) and "TRANSFAC Curated Transcription Factor Targets" (13216 genes, 201 TFs) sets. We used 'fgsea' package in R, which allows GSEA on a custom gene set. We tested the association for each regulator with at least 10 and at most 500 targets. Moreover, we tested if the number of regulators is associated with the change in heterogeneity. We first calculated the correlation between the heterogeneity change with age (or the number of datasets with an increase in heterogeneity) and the number of TFs or miRNAs regulating that gene, for aging and development separately and accounting for the direction of expression changes in these periods (i.e. separating genes into down-down, down-up, up-down, and up-up categories based on their expression in development and aging). To test the difference in the correlations between aging and development, we used 1,000 random permutations of the number of TFs. For each permutation, we randomized the number of TFs and calculated the correlation between heterogeneity change (or the number of datasets with an increase in heterogeneity) and the randomized numbers. We then calculate the percentage of datasets where aging has a higher correlation than development. Using the distribution of percentages, we test if the observed value is expected by chance.

Software
All analysis is done using R and the code to calculate heterogeneity changes with age is available as an R package 'hetAge', which is documented in https://mdonertas.github.io/hetAge/. "ggplot2" (Wickham, 2017) and "ggpubr" (Kassambara, 2018) R libraries are used for the visualization purposes.