Abstract
Since chronological age is not a complete and accurate indicator of organism aging, the concept of biological age has emerged as a well-accepted way to quantify the aging process in humans and laboratory animals. In this study, we performed a systematic statistical evaluation of the relationships between locomotor activity and biological age, mortality risk, and frailty using human physical activity records from the 2003-2006 National Health and Nutrition Examination Survey (NHANES) and UK BioBank (UKBB) databases. These records are from subjects ranging from 5 to 85 years old and include 7-day long continuous tracks of activity provided by wearable monitors as well as data for a comprehensive set of clinical parameters, lifestyle information and death records, thus enabling quantitative assessment of frailty and mortality. We proposed a statistical description of the locomotor activity tracks and transformed the provided time series into vectors representing individual physiological states for each participant. Using this data, we performed an unsupervised multivariate analysis and observed development and aging as a continuous trajectory consisting of distinct phases, each corresponding to subsequent human life stages. Therefore, we suggest the distance measured along this trajectory as a definition of the biological age. Consistent with the Gompertz law, mortality, estimated with the help of a proportional hazard model, was found to be an exponential function of biological age as quantified herein. However, we observed that the significant contribution of clinical frailty to mortality risk can be independent of biological age. We used the biological age and mortality models to show that some lifestyle variables, such as smoking, produce a reversible increase in all-cause mortality without a significant effect on biological age. 
In contrast, medical conditions, such as type 2 diabetes mellitus (T2DM) or hypertension, are associated with significant aging acceleration and a corresponding increase in mortality as well. The results of this work demonstrate that significant information relevant to aging can be extracted from human locomotor activity data and highlight the opportunity provided by explosive deployment of wearable sensors to use such information to encourage lifestyle modifications and clinical development of therapeutic interventions against the aging process.
I. INTRODUCTION
Accurate and non-invasive quantification of the aging process is essential for successful translation of basic research in the field of aging into possible future clinical practice. Most studies of aging in model organisms, such as C. elegans, S. cerevisiae, and D. melanogaster, involve direct measurements of lifespan to characterize pro- or anti-aging effects of gene variants, nutrition conditions or experimental therapies. In longer-lived animals, such as mammals, and especially in humans, analysis of longevity itself is generally not practical since it would require long and exceptionally well-controlled experiments. Many age-related physiological changes are similar in mice and humans and hence can be described by a universal frailty index, which was recently proposed as a promising pre-clinical indicator to quantify aging [1]. Other useful metrics of aging include health span, maximum lifespan, and biological age [2], along with less commonly employed properties, rooted in aging theory phenomenology, such as Strehler and Mildvan’s concept of vitality [3]. It remains to be seen, however, if and how any of these measures of aging are related to each other in human populations and whether the same relations hold and hence can be reliably examined in animal models.
Aging is a continuous phenotypic change. Therefore, to answer questions regarding the age-dependencies of physiological variables, it is appropriate to apply the language and tools of dynamical systems theory to relate changes in physiological parameters to organism-level properties, such as mortality and lifespan. However, investigations of this kind are complicated by the high intrinsic dimensionality of high-throughput biological data, overlaid by batch effects and prohibitive costs of large-scale human studies. This situation is improving due to the recent explosive deployment of web-connected wearable devices that provide personal digitized health records, including measurements of locomotion, heart rate, skin temperature, etc. It is projected that 400M such devices will be in use worldwide by 2020 [4]. This provides an unprecedented opportunity to monitor physiological changes in large human populations [5] and improve our understanding of how such changes are related to health and lifespan [6], and hence presents a rich and as yet untapped source of large-scale human data for aging research.
In this study, we performed a systematic statistical evaluation of the relationships between locomotor activity and biological age, mortality risk, and frailty using human physical activity records from the 2003-2006 National Health and Nutrition Examination Survey (NHANES) and UK Biobank (UKBB) databases. These large databases contain data for 7-day long continuous tracks of activity provided by wearable monitors as well as health and lifestyle information, death records, and data for a comprehensive set of clinical parameters sufficient to compute a frailty index for every participant. We proposed a statistical description of 7-day long locomotor activity tracks and performed an unsupervised principal components analysis of the study participants’ physical activity representations. This revealed human development as a continuous process following a trajectory in the physical activity parameters space. According to the Gompertz law [7], the mortality rate in human populations increases exponentially starting at age 40. The dynamics of the physiological variables along the corresponding trajectory segment is approximately a linear function of age. Therefore, we proposed the distance measured along the path as the natural definition of biological age or bioage.
Although bioage was found to account for most of the variation in mortality observed in our study, there was a notable contribution to mortality that was independent of biological age and associated with a high frailty index score. Statistically, we observed an increasing fraction of such participants starting approximately at the middle-age to old-age transition point on the aging trajectory. We used models of biological age and mortality to show that some lifestyle choices, such as smoking, produce a reversible increase in all-cause mortality without an appreciable effect on biological age. On the other hand, some medical conditions, such as type 2 diabetes mellitus (T2DM) and hypertension, were associated with significant aging acceleration (elevated biological age in the chronological age-matched cohorts).
Overall, the results of this work provide an indication of the power of the vast amounts of activity-related and physiological data that can be easily obtained via simple wearable monitoring devices. In particular, with respect to aging, our statistical analysis of locomotor activity data from such devices has revealed a new physical activity-based descriptor of biological age (bioage) and suggests that mortality risk is determined by both bioage-related and bioage-independent components, which may be differentially amenable to modification by lifestyle changes or therapeutics.
II. RESULTS
A. Quantification of Human Locomotor Activity
For this study, we used two large-scale repositories of wearable accelerometer track records made available by the 2003-2006 National Health and Nutrition Examination Survey (NHANES, 12053 subjects, age range 5-85 years old) and UK Biobank (UKBB, 95609 subjects, age range 45-75 years old). For both NHANES and UKBB, a 7-day long continuous activity track was collected for each subject, as well as data for a comprehensive set of clinical variables, and a record of death within up to ten years following activity monitoring. Human physical activity is usually collected in the form of time series of direct sensor readouts, such as 3D accelerations, sampled at a specified frequency. Instead, NHANES provides sequences of transformed quantities such as the number of steps or activity counts per minute. Figure 1A shows plots of two representative 2-day long activity count tracks from a younger (age 43) individual and an older (age 65) individual, selected to have the same level of total activity. Nevertheless, their patterns of activity were qualitatively different. Transitions between the states corresponding to different levels of physical activity appeared to be random. Therefore, instead of trying to approximate the precise shape of the activity time series, we chose to apply a probabilistic model, specifically a Markov chain approximation, a simple yet powerful tool from the theory of stochastic processes (see [8] for a review of applications, including stochastic modelling of biological systems).
Statistical description of the participants’ activity was based on the concept that the probability distribution of any future state of a Markov chain is completely determined by its current state and the probabilities of transitions between different states. Therefore, we divided the range of physical activity measurements into eight discrete bins representing activity states (numbered from 1 to 8 and corresponding to increasing activity levels, see histograms to the left of the activity tracks in Figure 1A) and counted the transitions between every consecutive pair of states along the track. The number of transitions from state j to i was then normalized to the number of times that state j was encountered along the activity record. This yielded the kinetic transition rate, i.e. the probability of a stochastic “jump” from state j to state i per unit time. We then combined these transition rates into the transition matrix (TM) elements (shown as bins in heatmaps to the right of the activity tracks in Figure 1A), which is thus a complete description of the underlying Markov chain model (see Materials and Methods and Figure 1A for additional details).
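The counting and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name transition_matrix, the bin edges, and the short example track are all hypothetical, and three states are used instead of eight for brevity. Columns are normalized by the number of outgoing transitions from each state, which differs from the number of visits only for the final sample of the track.

```python
from bisect import bisect_right

def transition_matrix(counts, edges):
    """Return T with T[i][j] = estimated P(next state = i | current state = j)."""
    n = len(edges) + 1                                  # number of activity states
    states = [bisect_right(edges, c) for c in counts]   # discretize each sample
    jumps = [[0] * n for _ in range(n)]
    for j, i in zip(states, states[1:]):                # consecutive pair: j -> i
        jumps[i][j] += 1
    # Number of transitions leaving each state j (column totals).
    leaving = [sum(jumps[i][j] for i in range(n)) for j in range(n)]
    return [[jumps[i][j] / leaving[j] if leaving[j] else 0.0 for j in range(n)]
            for i in range(n)]

# Hypothetical per-minute activity counts and bin edges (assumed values).
track = [0, 0, 5, 120, 130, 4, 0, 0, 150, 3]
T = transition_matrix(track, edges=[1, 100])
for row in T:
    print([round(x, 3) for x in row])
```

Each column of T sums to one whenever the corresponding state has at least one outgoing transition, so the flattened matrix can serve directly as a per-participant descriptor vector.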
On a more technical level, the TM element values can be related to the organism’s responses to external perturbations on physiological time scales. To make this connection, we checked explicitly that the TM elements satisfied a detailed balance condition [9] and hence the TM eigenvalues represent inverse equilibration times. Using the relation between the autocorrelation function of the time series and the Markov chain TM from Appendix A, we reconstructed a Power Spectral Density (PSD) display (Figure 1B) of the physical activity signals for the same two study participants shown in Figure 1A. Figure 1B also shows the discrete sets of TM eigenvalues (the TM spectra) for the same individuals. The crossover frequency on the PSD plots coincides with the lowest TM eigenvalue, corresponding to a time scale in the range of tens of minutes. The time scale corresponding to the eigenvalue is considerably longer than any period associated with body motion and, therefore, should reflect the organism’s physiological state. The observed decrease of the limiting time scale signifies a reduction of temporal correlations of physical activity, commonly observed in human [10] and animal [11] studies of aging and age-associated neurological and mental disorders, including Alzheimer’s disease [12], depression [13], and bipolar disorder [14].
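One way to test a detailed balance condition numerically is sketched below. It assumes a column-stochastic matrix T with T[i][j] the probability of a jump from state j to state i, estimates the stationary distribution p by repeated application of T, and reports the largest violation of T[i][j] p[j] = T[j][i] p[i]. The function names and the 3-state example matrix are hypothetical, and the exact check performed in the study is not specified in the text.

```python
def stationary(T, iters=2000):
    """Approximate the stationary distribution by repeatedly applying T."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(T[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p

def detailed_balance_gap(T):
    """Largest |T[i][j] p[j] - T[j][i] p[i]|; zero when detailed balance holds."""
    p = stationary(T)
    n = len(T)
    return max(abs(T[i][j] * p[j] - T[j][i] * p[i])
               for i in range(n) for j in range(n))

# Hypothetical chain with jumps only between neighbouring activity states.
T_chain = [[0.9, 0.2, 0.0],
           [0.1, 0.7, 0.3],
           [0.0, 0.1, 0.7]]
print(detailed_balance_gap(T_chain))
```

A chain whose jumps connect only neighbouring states, as in this example, always satisfies detailed balance, whereas a chain with a preferred cyclic direction does not, so the gap gives a simple diagnostic of reversibility.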
A transition matrix is a conceptually simple and physically intuitive way to illustrate the aggregate characteristics of physical activity time series. TM elements are kinetic transition rates and the spectral properties of TM are directly related to the organism’s responses at physiological time scales. Therefore, TM are useful descriptors containing information for a set of parameters representing human physiological state vectors in a form that can be used alone (as in this study) or along with other physiological metrics in human health-related applications.
B. Manifestations of Aging in Locomotor Activity
According to our assumptions described above, TM provide an aggregate representation of each study participant’s activity state during the time of monitoring. To reveal the intrinsic structure of the physical activity data for the entire NHANES study population, we used Principal Components Analysis (PCA, [15]). The results of this analysis are shown in Figure 2A, with the position of each point defined by the average PC score for a cohort of age-matched men or women separated by one year of age. The plot of PC1 vs. PC2 vs. PC3 in Figure 2A clearly shows that the activity state vector evolves over the course of human lifespan. The aging trajectory is continuous and yet is visually broken into distinct phases that are easily recognizable as life stages (chronological age ranges). Following a previously established age classification [16], we divided the trajectory into four segments covering childhood and adolescence (younger than 16 years old), followed by early adulthood (16-35 y.o.), middle (35-65 y.o.) and older ages (older than 65 y.o.). Each life stage has a characteristic range of average PC values that is essentially the same for men and women. These results suggest a universal character of changes associated with each period in development and aging.
According to the Gompertz law, mortality risk in human populations increases exponentially in mid-life, starting from the age of about 40 [7, 17]. The corresponding age range boundary between the development trajectory fragments II and III is visible in Figure 2A and can be identified with the early-adulthood to middle-age transition. The subsequent cross-over between middle-age and older-age occurs in the vicinity of the average human lifespan. To better focus our study on aging, we limited all further analysis to participants older than 40 years old. In this restricted dataset, aging was observed to manifest itself as the evolution of the participants’ physiological state along the PC1 direction, which by definition is the direction of the most variance in the data. PC1 scores in this group of participants were strongly associated with chronological age (Pearson’s correlation coefficient r = 0.64, see Figure 2B). PCA is an unsupervised learning technique and, therefore, can be used to infer functional dependence of physiological state variables on age without a prior hypothesis. We therefore propose the first PC score, PC1, as a natural definition of biological age, or bioage, a quantitative measure of the aging process in the most relevant age range. This argument is supported by our observation that no other PCs show any significant association with age (Figure 2B). A more careful examination of Figure 2A reveals that variation of PC2 and PC3 scores is associated with the transition between early adulthood and middle age stages.
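The idea of reading biological age off the first principal component can be illustrated on synthetic data. The sketch below is not the analysis pipeline of the study: it extracts the leading principal axis by power iteration on the covariance matrix (a standard way to obtain PC1 without a linear algebra library), and the descriptors, loadings and noise level are invented for the example. The sign of a principal axis is arbitrary, so only the magnitude of the correlation with age is meaningful.

```python
import random

def first_pc(rows, iters=200):
    """Leading principal axis and PC1 scores via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    X = [[r[k] - mean[k] for k in range(d)] for r in rows]      # centre the data
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    scores = [sum(x[k] * v[k] for k in range(d)) for x in X]
    return v, scores

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic cohort (assumed): three descriptors that drift linearly with age.
rng = random.Random(0)
ages = [rng.uniform(40, 85) for _ in range(300)]
loadings = [0.5, -0.3, 0.1]                                     # hypothetical
data = [[a * w + rng.gauss(0.0, 1.0) for w in loadings] for a in ages]

axis, pc1 = first_pc(data)
print(abs(pearson(ages, pc1)))
```

In this toy setting the age signal dominates the variance by construction, so the correlation is much higher than the r = 0.64 reported for the real data.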
Biological age defined as PC1 was found to increase approximately linearly with chronological age for NHANES participants over 40 years old (Figure 2B). The variance in biological age (PC1 distribution width squared in age- and sex-matched cohorts) in this population increased linearly with age (inset in Figure 2B). The latter result is a hallmark of diffusion, suggesting that biological age not only drifts in time but also undergoes a random walk under the influence of stochastic forces (see Discussion for more details).
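The statement that a linearly growing variance is a hallmark of diffusion can be checked with a toy simulation of a random walk with drift. All parameters below (drift, noise amplitude, cohort size, yearly time step) are arbitrary choices for illustration and are not fitted to the NHANES data.

```python
import random
from statistics import mean, pvariance

def simulate(n_people=2000, years=40, drift=1.0, sigma=0.5, seed=1):
    """Each path adds a constant drift plus Gaussian noise once per year."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_people):
        b, path = 0.0, []
        for _ in range(years):
            b += drift + rng.gauss(0.0, sigma)
            path.append(b)
        paths.append(path)
    return paths

paths = simulate()

def cohort_stats(t):
    """Mean and variance of the simulated variable after t years."""
    values = [p[t - 1] for p in paths]
    return mean(values), pvariance(values)

m10, v10 = cohort_stats(10)
m40, v40 = cohort_stats(40)
print(m40 / m10, v40 / v10)   # both ratios should be close to 40 / 10 = 4
```

Both the cohort mean and the cohort variance scale with elapsed time, which is the qualitative pattern described for PC1 in Figure 2B and its inset.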
Any appreciable difference in the rate (change in biological age per unit of time) or direction of aging among study participants would lead to faster growth of the bioage distribution width as a function of chronological age and is therefore not supported by the data. To highlight the apparent stability of the aging trajectory, we assessed the effect of a common age-related disease, type 2 diabetes mellitus (T2DM), on biological age as defined by PC1. The average and range (standard deviation) of biological age in age-matched cohorts of healthy subjects and T2DM patients are compared in Figure 2B. Generally, the T2DM patients appeared to be biologically older than their chronological age-matched peers. The difference between the healthy and T2DM groups did not significantly change with age, which suggests that the disease does not modify the rate of aging.
C. Biological Age and the Gompertz law of Mortality
To examine how biological age (again, defined as PC1 from the analysis of locomotor activity data described above) relates to lifespan and mortality, we first confirmed that the mortality rate indicated by the NHANES death register is an exponential function of chronological age as predicted by the Gompertz law. Next, since the state vector dynamics in the physical activity parameters space was approximately linear with age, we were able to naturally model the exponentially increasing individual risks of death by using a log-linear model, such as a variant of the Cox proportional hazards model [19]. We used the parametric Cox-Gompertz proportional hazards model, fitted by maximum likelihood (negative log-likelihood minimization) as adapted from [20], since this model allowed us to obtain explicit mortality predictions rather than implicit proportional hazards values, which are uncertain up to an unknown baseline hazard function. To produce a robust model, we used a batch mode stochastic gradient descent with cross-validation (see Materials and Methods section V C for details). The final model yielded the Gompertz exponent Γ ≈ 0.08 ± 0.01 per year, which is very close to the commonly accepted value Γ ≈ 0.085 per year corresponding to a mortality rate doubling time of approximately 8 years [21]. Mortality risk was determined in this way for every NHANES study participant and found to increase exponentially as a function of biological age (Figure 3A; determination coefficient of the corresponding log-linear model R² = 0.81). This further supports our identification of the PC1 score as a quantitative measure of aging.
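The link between the Gompertz exponent and the mortality rate doubling time is simple arithmetic: if the hazard grows as exp(Γ t), it doubles every ln(2)/Γ years. The sketch below works this out for the two values of Γ quoted above and also evaluates the corresponding Gompertz survival probability. The baseline hazard h0 is a hypothetical value chosen only to make the example run; the functions do not reproduce the fitted Cox-Gompertz model.

```python
import math

def doubling_time(gamma):
    """Years needed for a hazard growing as exp(gamma * t) to double."""
    return math.log(2.0) / gamma

def gompertz_hazard(age, gamma, h0):
    """Mortality rate at a given age under the Gompertz law."""
    return h0 * math.exp(gamma * age)

def survival(age_from, age_to, gamma, h0):
    """Probability of surviving from age_from to age_to (integrated hazard)."""
    cumulative = (h0 / gamma) * (math.exp(gamma * age_to) - math.exp(gamma * age_from))
    return math.exp(-cumulative)

print(round(doubling_time(0.085), 2))   # about 8.15 years
print(round(doubling_time(0.08), 2))    # about 8.66 years

h0 = 5e-5                               # hypothetical baseline hazard per year
print(survival(40, 80, 0.085, h0))
```

The doubling time depends only on Γ, which is why the fitted exponent can be compared directly with the literature value without knowing the baseline hazard.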
Variation in biological age explains most of the variation in mortality observed in the NHANES study (Figure 3A). However, there was also a significant spread in the predicted log-mortality values in biological age-matched cohorts, amounting to as much as 25% of the total difference in the population. Given the value of the Gompertz exponent of the mortality model, the observed differences correspond to up to ≈ 10 years of remaining lifespan. To understand the nature of the unexplained difference in mortality, we decomposed the log-mortality predictions into a sum of two parts, one of which is associated with (proportional to) biological age and one of which is statistically independent of biological age.
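One simple way to realize such a decomposition is ordinary least squares: the fitted part is proportional to (centred) biological age and the residual is, by construction, uncorrelated with it. The sketch below makes that assumption explicit; the text does not specify the exact procedure used in the study, uncorrelated is a weaker property than statistically independent, and the input numbers are invented.

```python
def split_log_mortality(bioage, log_mort):
    """Split log-mortality into a bioage-proportional part and a residual."""
    n = len(bioage)
    mb = sum(bioage) / n
    mm = sum(log_mort) / n
    sxx = sum((b - mb) ** 2 for b in bioage)
    sxy = sum((b - mb) * (m - mm) for b, m in zip(bioage, log_mort))
    slope = sxy / sxx
    dependent = [slope * (b - mb) for b in bioage]              # bioage-related part
    independent = [m - mm - d for m, d in zip(log_mort, dependent)]
    return slope, dependent, independent

# Hypothetical bioage (PC1 score) and predicted log-mortality values.
bioage = [-1.0, 0.0, 1.0, 2.0]
log_mort = [-6.1, -5.2, -4.9, -3.8]
slope, dep, indep = split_log_mortality(bioage, log_mort)
print(round(slope, 3), [round(r, 3) for r in indep])
```

The residual component plays the role of the bioage-independent log-mortality examined in the following analyses.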
Next, we investigated the relation between biological age, the risk of death, and another commonly employed descriptor of aging: a clinical Frailty Index (FI). FI is a composite measure of the health of an individual calculated as the proportion of health deficits present in an individual out of the total number of age-related health variables considered [22]. Using a FI variant adapted for use with the NHANES meta-data from [18], we computed FI values for every study participant and categorized them into one of the three FI ranges: “non-frail”, “vulnerable”, and “frail/most frail”, as defined in the same source. Figure 3B shows boxplots of the sex- and chronological age-adjusted bioage distributions in cohorts of NHANES participants, split according to their FI category. These data revealed a statistically significant correspondence between aging acceleration, i.e. higher values for biological age after adjustment for sex and chronological age, and FI. The increases in aging acceleration between the non-frail and vulnerable groups and between the vulnerable and frail/most frail groups were both significant.
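The FI definition above translates directly into a proportion of deficits. In the sketch below the function names, the ten-item deficit list and, in particular, the category cut-offs are illustrative assumptions; the cut-offs actually used in the study are those defined in [18] and are not reproduced in the text.

```python
def frailty_index(deficits):
    """Proportion of deficits present (1 = present, 0 = absent)."""
    return sum(deficits) / len(deficits)

def frailty_category(fi, cutoffs=(0.10, 0.21)):
    """Map an FI value to one of three ranges; cut-offs are placeholders."""
    low, high = cutoffs
    if fi <= low:
        return "non-frail"
    if fi <= high:
        return "vulnerable"
    return "frail/most frail"

# Hypothetical participant with 2 of 10 assessed deficits present.
deficits = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
fi = frailty_index(deficits)
print(fi, frailty_category(fi))
```

In practice the deficit list would contain several dozen clinical and questionnaire variables, and items may be graded between 0 and 1 rather than binary.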
For comparison, in Figure 3C, we provide box-plots of the sex-adjusted distributions of the log-mortality component that is biological age-independent for the same three cohorts of NHANES participants. This shows that bioage-independent mortality is similar between the non-frail and vulnerable groups. At the same time, the bioage-independent component is significantly higher in the highest FI range (frail/most frail). Therefore, except in cases with high clinical FI values, biological age serves as an indicator of frailty as well as mortality (see above). In this study FI increases gradually first and then reaches its maximum value at approximately the age corresponding to the transition between middle- and old-age (development stages III and IV in Figure 2A), marking the end of health span (we will focus on this in Discussion). These observations suggest that the bioage-independent contribution to mortality can be attributed to the most frail phenotype.
D. Association of unhealthy lifestyles and medical conditions with increased locomotor activity-based biological age and mortality risk
In the next stage of our study, we evaluated how biological age and mortality (biological age-dependent and -independent components) are influenced by known hazardous behaviours/lifestyle choices or medical conditions. As shown in Figures 4A and 4B, we constructed volcano plots to illustrate statistical associations between the predicted log-mortality components for NHANES participants and the NHANES 2003-2006 Questionnaire, Demographics and Examination lifestyles and medical conditions labels.
Figure 4A and its legend list the lifestyles and medical conditions that showed statistically significant associations with biological aging acceleration, defined as the elevated bioage in a group after adjustment for sex and chronological age (see e.g. [23]). The identified associations included high blood pressure (A1), C-reactive protein level (B1), clinical parameters associated with diabetes [the diagnosis itself (E1), taking insulin (E2), elevated levels of glycated haemoglobin (G1) and blood serum glucose (J2, J5)], along with self-reported weight (K1-K3) and nutrition status information (F1-F2). The association of clinical parameters related to diabetes with aging acceleration is consistent with the generally higher locomotor activity-based biological age (PC1 value) of NHANES participants diagnosed with T2DM compared to healthy controls across age cohorts (Figure 2B).
In a similar manner, the volcano plot in Figure 4B shows lifestyles and medical conditions that demonstrated statistically significant associations with the bioage-independent log-mortality component. The most striking association here was with parameters related to smoking (F1-F3), including elevated blood levels of cotinine (B1), the predominant metabolite of nicotine, which is used as a biomarker for exposure to tobacco smoke. Associations with general (C1) and physical health (C2) conditions, along with mental health-related parameters (C3, D1), are in agreement with our association of bioage-independent hazard with frailty (see above).
Comparison of the predicted log-mortality component distributions between smokers and non-smokers, after adjustment for age and sex, showed little effect on biological age (i.e., no strong evidence of aging acceleration associated with smoking, see Figure 4C). In contrast, the difference in the bioage-independent component of log-mortality between smokers and non-smokers was highly significant (Figure 4D). Study participants who smoked in the past but quit smoking demonstrated a significant improvement in the bioage-independent component of log-mortality compared to current smokers (Figure 4D), although there was no difference in bioage between the groups after adjustment for sex and age (Figure 4C). From a dynamic point of view, this is not very surprising, since the bioage-independent mortality component originates from deviations of the physiological state from the development trajectory due to lifestyle choices and diseases and, if small, should be reversible. This finding is also in qualitative agreement with an estimated longevity dividend of as much as 5 years upon quitting smoking [24].
The impacts of smoking on biological age and mortality were verified using an independent dataset of locomotor activity records from UK Biobank (UKBB, 95,605 locomotor activity samples after exclusion of poor quality records). We computed TM descriptors using time series of activity counts per minute for UKBB participants and applied the mortality model without any additional pre-training. As seen for the NHANES study, there was no significant difference in biological age means or distributions after correcting for sex and age (i.e., no aging acceleration) between current smokers, previous smokers and non-smokers (compare the UKBB and NHANES analysis results in Figures 6A and 4C). Also similar to the NHANES findings, there was a statistically significant difference in the biological age-independent component of mortality between current smokers and either of the other two groups (participants with no smoking history and those who quit smoking in the past). The log-mortality ratio between previous smokers and current smokers was very similar to that observed for NHANES (see Figure 6B).
III. DISCUSSION
We performed a systematic identification of biomarkers of age and frailty using an extensive collection of human physical activity records. The phenotypic changes associated with development and aging have different dynamics depending on the life stage. We identify the age range 40+ as the most relevant to the studies of aging in humans in relation to the Gompertz mortality law.
Biological age is a phenomenological organism-level property that is linearly associated with age and serves as a key indicator of aging and all-cause mortality. The linear association of any physiologically relevant variables with age is a hallmark of aging studies in human subjects and, therefore, can be used to construct useful “biological clocks”. Examples of this include DNA methylation [25, 26], IgG glycosylation [27], blood biochemical parameters [28], gut microbiota composition [29], and cerebrospinal fluid proteome [30]. To date, the “epigenetic clock” based on DNA methylation (DNAm) levels [25, 26] appears to be the most accurate measure of aging, showing remarkably high correlation with chronological age. The DNAm clock predicts all-cause mortality in later life better than chronological age [31] and is elevated in groups of individuals with HIV, Down syndrome [32, 33], or obesity [34], but is not correlated with smoking status [35].
We find that the biological age signature computed from locomotor activity is elevated in cohorts of NHANES participants diagnosed with T2DM or those characterized by hypertension, increased levels of C-reactive protein, excess weight, or elevated frailty index. An earlier analysis of aging acceleration using the epigenetic clock with data from the Women’s Health Initiative study did not identify significant associations with these indications [23]. We were able to obtain the same results with the help of the same approach in a similarly sized age- and gender-matched subpopulation of NHANES participants. A more careful examination, however, reveals that the multivariate test from [23] was underpowered due to intrinsic correlations between the participants’ characteristics. We observed that, e.g., removal of the diabetes-associated features such as blood glucose and insulin levels from the model leads to a dramatic increase of the significance of the association between the disease and aging acceleration.
The tightly coordinated change of the physiological variables with age is a direct consequence of intrinsic low-dimensionality of the organism state dynamics, a hallmark feature of criticality. This observation should not be surprising since slow evolution of physiological variables associated with major “biological programs”, such as morphogenesis [36] and aging [37], exhibits characteristic properties of critical dynamics, such as critical slowing down, rising variance, strong correlations between key variables, and non-Gaussianity in the distribution of fluctuations [38]. We find that in humans between approximately age 40 and the average lifespan, the evolution of physiological parameters with age is dominated by the dynamics of a single mode. In [37] we suggested that the stochastic drift of the collective variable associated with the critical mode is the driving force behind the characteristic increase in mortality with age. We observed that the corresponding mode vector coincides with the singular eigenvector of the data covariance matrix and can, therefore, be reliably identified with the help of PCA in an unsupervised way (i.e., without prior assumptions regarding the functional dependence of the biologically-relevant variables on age). Since the first PC score (PC1) grows approximately linearly with age, the stochastic broadening of the PC1 variation is small, and the PC scores with lesser variance (PC2, PC3) are practically independent of age, we propose PC1 as a natural measure of biological age (Figure 2B). The association between the aging mode variable and biological age was confirmed by the finding that most of the mortality variation in our study can be explained by biological age alone (Figure 3A).
Biological age thus emerges as an organism-level property associated with the dynamical properties of the underlying regulatory network. Therefore, we expect multiple phenotypic changes associated with biological age on various levels of an organism’s organization to occur in a coordinated fashion. This suggests that different biological clocks based on any convenient phenotypic feature, even features as different as DNA methylation and locomotor activity, should yield very similar biological age predictions for the same subject.
In our study, we observed that the variance of biological age in sex- and age-matched cohorts increases linearly with age. This is a signature of a random walk, suggesting that biological age, as an organism-level property, not only increases as a function of age but also undergoes Brownian motion under the influence of stochastic forces. Therefore, we suggest that popular regression models of biological age can be further refined by explicitly including the effects of stochastic broadening.
The exponential form of mortality as a function of biological age shown in Figure 3A is typical for situations where the lifespan of a system is limited by the decay of a metastable state (see, e.g., [39]). This is schematically illustrated in Figure 5, where the small fluctuations of the physiological state variables that are independent of biological age are reversible, which means that the mode variables describing the state vector deviations from the aging trajectory are dynamically stable. In contrast, the large amplitude fluctuations are most certainly not reversible and manifest themselves as diseases. The dots in Figure 5 represent the data for NHANES participants, averaged over the sex- and age-matched cohorts and projected onto a subspace spanned by the direction associated with biological age and bioage-independent mortality (axes in the horizontal plane).
We superimposed the experimental points on a schematic representation of the effective potential energy surface (the vertical axis) set by the underlying regulatory network constraints. Around the early adulthood to mid-life transition, the organism state vector starts in a potential energy basin A separated from the dynamically unstable regions C by sufficiently high potential energy barriers B. The dynamics of the state variables are critical, which means that there is no or almost no curvature in the potential energy in the direction associated with biological age. With the natural assumption that the mode coupling is weak, the barrier heights depend on biological age linearly, and hence the probability of barrier crossing increases exponentially with biological age. Once the (presumably lowest) barrier is crossed, the dynamic stability along the corresponding mode vector will be lost (see example trajectories 1 and 2 in Figure 5, which differ by the age at which barrier crossing occurs). Resilience, understood as the ability to regain the homeostatic state, disappears, and the deviations of the physiological parameters develop beyond control. The situation manifests itself as development of extreme frailty (Figure 3C) and, eventually, certain death of the organism. On a population level, the loss of stability happens approximately at the age range corresponding to the average lifespan. This point on the aging trajectory corresponds to the middle-age to older-age transition and signifies the end of health span.
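The step from a linearly decreasing barrier to an exponentially increasing crossing probability can be made concrete with an Arrhenius-type escape rate, rate = attempt * exp(-barrier / noise). This functional form is an assumption introduced here for illustration (it is the standard expression for noise-driven escape from a metastable state), and all parameter values below are hypothetical.

```python
import math

def crossing_rate(bioage, barrier0=10.0, slope=0.5, noise=1.0, attempt=1.0):
    """Escape rate over a barrier whose height falls linearly with bioage."""
    barrier = barrier0 - slope * bioage
    return attempt * math.exp(-barrier / noise)

# The log of the rate is linear in bioage, so successive ratios are constant.
rates = [crossing_rate(b) for b in (1.0, 2.0, 3.0)]
print(rates[1] / rates[0], rates[2] / rates[1])   # both equal exp(slope / noise)
```

Under this assumption the effective Gompertz-like exponent with respect to biological age is the ratio of the barrier slope to the noise strength.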
The physical picture behind the presented scenario reveals the dynamic origins of the postulates underlying the Strehler-Mildvan theory of aging [3], in which a linear decrease in vitality leads to an exponential increase in mortality with age. We note that the aging at criticality conjecture is then the necessary mechanistic link behind the phenomenological vitality concept and the dynamics of the physiological state variables. We relate the biological age with Strehler-Mildvan vitality deficit. The stochastic drift along the aging direction is then naturally the driving force behind the vitality attrition in the Strehler-Mildvan theory. The exponential Gompertz mortality increase with age appears to be a consequence of a gradual loss of the dynamic stability in the directions independent of biological age.
Since evolution of the physiological state vector is not reducible to aging drift alone, biological age constitutes an essential, but not the only, contribution to human mortality. Our work indicates that the bioage-independent component of the risk of death is significant and associated with extreme frailty; this is consistent with the conclusions of [40], where a frailty index showed superior performance compared to DNAm age in mortality predictions. Also, in an epigenome-wide association study [41], the reported DNAm signature of all-cause mortality was found to comprise a component independent of the “epigenetic clock”. We show that the physiological state vector fluctuations that are independent of bioage, and the associated component of mortality risk, are signatures of an organism's responses to stresses, diseases, and hazardous behaviors (e.g., smoking, Figure 4B). Our findings related to the impact of smoking agree with results obtained earlier in [35], where the frailty index demonstrated a significant correlation with methylation sites associated with smoking. We also established that variations in the bioage-independent component of mortality can be induced by smoking early in life, but reversed if the individual quits smoking. Therefore, the biological age-dependent and -independent components of mortality are independent factors contributing to human lifespan determination.
In conclusion, this report demonstrates a way to quantify human physical activity time series and extract locomotor activity-based signatures of aging acceleration and increased mortality risk in association with diseases and hazardous lifestyles. A systematic study of aging and frailty in a large NHANES dataset revealed the dynamical origin of biological age and its relation to the characteristic increase in mortality with age, the Gompertz mortality law.
On a practical level, the results of this study lead us to propose using the hazard function, a property associated with all-cause mortality, as an ultimate measure of the health or wellness of an individual. This can then be decomposed into the biological age-associated component (which is determined by the accumulated effects of the individual's life history) and the potentially modifiable biological age-independent component. Both mortality components can be quantified from a single one-week long physical activity track collected by a consumer-grade wearable device.
Our findings highlight an opportunity for deployment of fully automated wellness intelligence systems capable of ambiently processing tracker information and providing dynamic feedback to the general public for improved engagement in health-promoting lifestyle modifications, disease interception, and clinical development of therapeutic interventions against the aging process.
V. MATERIALS AND METHODS
A. NHANES dataset
Locomotor activity records and questionnaire/laboratory data from the National Health and Nutrition Examination Survey (NHANES) 2003-2004 and 2005-2006 cohorts were downloaded from [http://www.cdc.gov/nchs/nhanes/index.htm]. NHANES provides locomotor activity in the form of 7-day long continuous tracks of activity counts sampled at a frequency of 1 min⁻¹ and recorded by a physical activity monitor (ActiGraph AM-7164 single-axis piezoelectric accelerometer) worn on the hip. Of 14,631 study participants (7176 in the 2003-2004 cohort and 7455 in the 2005-2006 cohort), we filtered out samples with abnormally low (average activity count <50) or high (>5000) physical activity. We also excluded participants aged 85 and older since the NHANES age data field is top coded at 85 years of age and we desired precise age information for our study.
To calculate a statistical descriptor of each participant’s locomotor activity, we first converted activity counts into discrete states with bin edges bk, k = 1…K. Activity level states 1…K − 1 were then defined as half-open intervals bk ≤ a < bk+1, state 0 as a < b1 and state K as a ≥ bK, where a is the activity count value. In this study, we defined 8 activity states (0…7) with bin edges bk = e^k − 1, k = 1…7. Thus, each sample was converted into a track of activity states and a transition matrix (TM) was then calculated for each participant (see below). To ensure that our analysis dealt only with days on which a participant actually performed some physical activity, we applied an additional filter. We excluded days with less than 200 minutes corresponding to activity states > 0. Only participants with 4 or more days that passed this additional filter were retained, yielding a total of 11839 samples (age, years: 35 ± 23, range 6 – 84; women: 51%). For PCA and Survival analysis, the only samples used were those for participants aged 40 and older with known follow-up on survival/mortality outcome (age, years: 59 ± 12, range 40 – 84; women: 50%). Once PCA loading vectors were identified, we plotted all NHANES samples’ scores in Figure 2A, including those for which survival/mortality data were not available.
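The discretization and the day filter described above can be sketched as follows. This is a minimal illustration and not the in-house code used in the study: the function names `to_states` and `valid_days` are hypothetical, and the sketch assumes that a track starts at midnight and covers whole days of 1440 minutes.

```python
import numpy as np

# Bin edges b_k = e^k - 1 for k = 1..7, giving 8 activity states (0..7).
BIN_EDGES = np.exp(np.arange(1, 8)) - 1.0

def to_states(counts):
    """Map per-minute activity counts to discrete states 0..7.

    State 0 is a < b_1, state k is b_k <= a < b_{k+1}, state 7 is a >= b_7.
    """
    return np.digitize(np.asarray(counts, dtype=float), BIN_EDGES, right=False)

def valid_days(states, minutes_per_day=1440, min_active=200):
    """Boolean mask of days with at least `min_active` minutes in states > 0.

    Assumption: the track length is a whole number of days.
    """
    days = np.asarray(states).reshape(-1, minutes_per_day)
    return (days > 0).sum(axis=1) >= min_active
```

A participant would then be retained if `valid_days(...).sum() >= 4`.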
Transition matrices (TM) Tij, i = 1…8, j = 1…8 were calculated as a set of transition rates from each state j to each other state i (the diagonal elements correspond to the probability of remaining in the same activity state). TM elements were calculated as Tij = N(j → i)/N(j), where N(j) is the number of minutes corresponding to state j and N(j → i) is the number of times the state j was immediately followed by state i (in the consecutive minute along the sample record). We next converted the TM from a discrete point map to continuous notation: W = T − I, i.e. Wij = Tij − δij, where I is the identity matrix. W is the proper TM for which the apparatus of Markov chains can be used. We used this property to calculate Power Spectral Densities (PSD) and eigenfrequencies (shown in Figure 1B) based on the assumption that the Markov chain model can be an approximation of observed activity records. We flattened the 8×8 TM of each sample into a 64-dimensional descriptor vector for Principal Component Analysis (PCA) and Survival analysis. Additionally, we converted the flattened descriptor to log-scale to ensure an approximately normal distribution for elements of the locomotor descriptor (a useful property for the stability of the linear models that we applied in PCA and Survival analysis). All near-zero elements (< 10⁻³, which corresponds to less than 10 transitions during a week) were imputed by the value of 10⁻³ before log-scaling.
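The descriptor construction described above can be sketched as follows. The function names are hypothetical and the code is an illustration only; in particular, N(j) is taken here as the number of minutes in state j that have a successor in the record, and transitions across excluded days are not treated specially.

```python
import numpy as np

def transition_matrix(states, n_states=8):
    """T[i, j] = N(j -> i) / N(j): column j holds the transition
    probabilities out of state j (columns of visited states sum to 1)."""
    s = np.asarray(states)
    counts = np.zeros((n_states, n_states))
    # Row index = next state i, column index = current state j.
    np.add.at(counts, (s[1:], s[:-1]), 1.0)
    n_from = counts.sum(axis=0)  # minutes in state j that have a successor
    return np.divide(counts, n_from, out=np.zeros_like(counts), where=n_from > 0)

def rate_matrix(T):
    """Continuous-notation matrix W = T - I."""
    return T - np.eye(T.shape[0])

def log_descriptor(T, floor=1e-3):
    """Flatten the TM and log-scale it, imputing elements below `floor`."""
    return np.log(np.maximum(T.ravel(), floor))
```

For an 8×8 TM, `log_descriptor` returns the 64-dimensional vector used as input to PCA and survival analysis.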
B. UKBB dataset
We accessed data from UK Biobank (UKBB) under the approved research project 21988 (formerly 9086). At the time the present study was conducted (2015-2017), locomotor activity data were available for 103710 UKBB participants. Physical activity was measured using Axivity AX3 tri-axial accelerometers worn on the wrist for 7 consecutive days. The data were recorded in the low-level format as continuous tracks of 3D acceleration values sampled at 100 Hz. Some tracks indicated that hardware errors occurred during the monitoring period. Participants with more than 10 such hardware errors in their track were excluded from our analysis, leaving 102914 participants. To make it possible to apply the PCA and Survival analysis models established using NHANES data to the UKBB data, we downsampled the original UKBB tracks to 1 min⁻¹ (as used in NHANES). For this purpose, individual acceleration records were split into 1-minute slices, and for each slice, the natural logarithm of the sum over the power spectral density (PSD) of the signal within that slice was calculated. Each of these PSDs was calculated from the absolute values of acceleration using the Welch method with a 512-point Hann window function and 50% window overlap.
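A possible form of this downsampling step is sketched below. It is a NumPy-only stand-in for the Welch method rather than the code used in the study, and several choices are assumptions: "absolute values of acceleration" is read as the magnitude of the 3D acceleration vector, each segment is mean-subtracted before windowing, and no particular PSD scaling is applied (a constant factor only shifts the logarithm). The function names are hypothetical.

```python
import numpy as np

def welch_power(x, nperseg=512):
    """Sum over a Welch-style spectrum: Hann window, 50% overlap,
    periodograms averaged over segments (unnormalized)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    step = nperseg // 2  # 50% overlap
    starts = range(0, len(x) - nperseg + 1, step)
    segs = np.stack([x[s:s + nperseg] for s in starts])
    segs = segs - segs.mean(axis=1, keepdims=True)  # assumption: constant detrend
    spec = np.abs(np.fft.rfft(segs * win, axis=1)) ** 2
    return spec.mean(axis=0).sum()

def downsample_track(acc_xyz, fs=100):
    """One log-power value per full minute of an (n_samples, 3) array."""
    mag = np.linalg.norm(np.asarray(acc_xyz, dtype=float), axis=1)
    n = fs * 60  # samples per 1-minute slice
    n_min = len(mag) // n
    return np.array([np.log(welch_power(mag[m * n:(m + 1) * n]))
                     for m in range(n_min)])
```

A library routine such as SciPy's Welch estimator could be used in place of `welch_power`; the sketch avoids it only to stay self-contained.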
The downsampled UKBB tracks represent the level of physical activity per minute but are quantitatively different from the NHANES activity counts. We used a quantile normalization procedure to re-scale the UKBB values to the range of discrete activity states of NHANES. We selected NHANES participants in the age range 45-75 and dropped ⅙ of participants with the lowest and highest average activities. The combined tracks from the remaining 2398 NHANES participants were used to calculate the occupancy fractions pk = N(k)/N for each NHANES activity state (here N(k) is the number of times the state k was seen and N is the total number of minutes in all tracks). Then we randomly selected 5000 UKBB participants from the same age range and similarly dropped ⅙ of participants with the lowest and highest average activities; this resulted in selection of 3288 UKBB participants. Using the combined UKBB tracks from the selected participants, we found UKBB bin edges b′k such that the occupancy fractions for the corresponding activity states were equal to the occupancy fractions in NHANES. Note that such quantile normalization automatically accounts for shift, linear and monotonic non-linear scaling of values, and so the resulting UKBB activity states are roughly equivalent to the ones from NHANES. Once bin edges for UKBB were obtained, the downsampled UKBB tracks were processed exactly as described above for NHANES. TMs and corresponding descriptors were obtained for 95609 UKBB participants (age, years: average 62, median 63, range 43-79; women: 56%).
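The bin-edge matching can be illustrated with a short sketch. It assumes that the new edges are simply the quantiles of the pooled target values taken at the cumulative reference occupancy fractions; the function names are hypothetical.

```python
import numpy as np

def occupancy_fractions(states, n_states=8):
    """p_k = N(k) / N over a pooled reference track of discrete states."""
    states = np.asarray(states)
    return np.bincount(states, minlength=n_states) / len(states)

def matched_bin_edges(values, ref_fractions):
    """Bin edges b'_k such that binning `values` reproduces `ref_fractions`."""
    # Cumulative fraction of minutes lying below each of the K edges.
    cum = np.cumsum(ref_fractions)[:-1]
    return np.quantile(np.asarray(values, dtype=float), cum)
```

Binning the target values with `np.digitize(values, edges)` then gives states whose occupancy fractions match the reference up to ties and interpolation effects.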
C. Survival analysis
We estimated the hazard rate (i.e. mortality rate) for each participant using a parametric Cox-Gompertz proportional hazard model fitted by maximum likelihood estimation, adapted from [20]. The model estimates the mortality rate in response to the values of each sample's explanatory covariates, which in our study are the locomotor activity descriptors. In its general form, the Cox proportional hazards model is defined only up to an arbitrary baseline hazard function. The Cox-Gompertz model, in contrast, explicitly accounts for the Gompertz exponential increase of mortality with age and provides an estimation of Gompertz parameters for the studied dataset. We used NHANES survival and mortality follow-up data to train the model. The corresponding parametric log-likelihood was maximized using the theano Python package. In this likelihood, M0 and Γ are the initial mortality rate and mortality doubling time constant of the Gompertz law, respectively; Δti is the time between sampling of locomotor activity and the censoring or death event (in years); xi is the set of explanatory covariates (locomotor descriptors and a gender label to account for gender-related differences in mortality) of the i-th participant; and β is the set of weighting coefficients corresponding to the locomotor descriptors. To train the model we used the subset of participants for whom the mortality/survival outcome was known. Covariates xi were normalized to zero mean and unit variance across the said subset. A regularization parameter λ was introduced to limit overfitting. We screened λ in the log-space range 10⁻⁵ to 10⁵ for stability of the trained weighting coefficients β and selected λ = 0.1. We did not add chronological age explicitly to the model because our intention was to let the model attribute the age-related increase in mortality completely to locomotor biological age. The bioage-related and -independent components of log Mortality were obtained by linear detrending of log Mortality on bioage.
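The explicit form of the log-likelihood is not reproduced in the text above. As an illustration only, a penalized Gompertz proportional hazards likelihood could be written as below. Everything specific in this sketch is an assumption: the hazard is taken as M0·exp(Γ·t + β·x) with t measured from the time of activity sampling, the penalty is taken to be of the ridge form λ‖β‖², and the function and argument names are hypothetical.

```python
import numpy as np

def neg_log_likelihood(log_m0, gamma, beta, X, dt, event, lam=0.1):
    """Penalized negative log-likelihood of a Gompertz PH model (sketch).

    X     : (n, p) standardized covariates
    dt    : (n,) follow-up time in years until death or censoring
    event : (n,) 1 for death, 0 for censoring
    Requires gamma != 0.
    """
    eta = X @ beta
    # Log hazard at the end of follow-up: log M0 + Gamma * dt + beta . x
    log_h = log_m0 + gamma * dt + eta
    # Cumulative hazard: integral of M0 * exp(Gamma * t + eta) from 0 to dt
    cum_h = np.exp(log_m0 + eta) * np.expm1(gamma * dt) / gamma
    return -(event * log_h - cum_h).sum() + lam * np.sum(beta ** 2)
```

Such a function could be passed to any general-purpose optimizer; in this sketch the log hazard at the time of sampling, log M0 + β·x, plays the role of log Mortality.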
D. Volcano plot
NHANES 2003-2006 study participants in the age range 40-70 y.o. were divided into groups for each relevant entry from NHANES Questionnaire and Laboratory data. For entries containing continuous data, participants were divided into three groups by percentiles: 0 to 13, 14 to 86 and 87 to 100. For entries with categorical data, participants were grouped by their categorical label. In addition, we allocated two marginal groups corresponding to the first two and the last two labels, if an entry had more than 3 categories. The means of log-mortality were calculated for each group and differences of means for all possible combinations of group pairs were evaluated for statistical significance (p < 0.001) by the Mann-Whitney U test. For each pair of groups, we assigned the group containing more participants as the control group, while the other group was assigned as hazardous. Therefore, the positive sign for Δ log(mortality) corresponds to the increased mortality for the hazardous group. The difference of Δ log(mortality) for the bioage-related component was referred to as aging acceleration, and p-values were transformed using the −log10 function. The significance level was adjusted for multiple testing with Bonferroni correction with the factor of n = 10⁴ (total number of observed pairs rounded up to a power of 10).
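The grouping and sign convention described above can be sketched as follows. The function names are hypothetical, the handling of values exactly at the percentile cut-offs is an assumption, and the Mann-Whitney U test itself is not reimplemented here. The Bonferroni helper reflects one reading of the text, namely that the nominal level 0.001 is divided by n = 10⁴.

```python
import numpy as np

def percentile_groups(values):
    """Masks for the low (0-13), middle (14-86) and high (87-100)
    percentile groups of a continuous entry."""
    v = np.asarray(values, dtype=float)
    lo, hi = np.percentile(v, [13, 87])
    return {"low": v <= lo, "mid": (v > lo) & (v < hi), "high": v >= hi}

def delta_log_mortality(log_mort, mask_a, mask_b):
    """Mean log-mortality of the smaller ('hazardous') group minus
    that of the larger ('control') group."""
    a, b = log_mort[mask_a], log_mort[mask_b]
    control, hazardous = (a, b) if len(a) >= len(b) else (b, a)
    return hazardous.mean() - control.mean()

def bonferroni_threshold(alpha=0.001, n_tests=1e4):
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests
```

On the volcano plot, a pair of groups would then be marked significant when its −log10(p) exceeds `-np.log10(bonferroni_threshold())`.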
All analyses were conducted using a set of in-house developed scripts in Python [http://www.python.org] and R [http://www.r-project.org].
IV. ACKNOWLEDGEMENT
This study was conducted using the UK Biobank Resource, application number 21988. We would like to thank G. Ivashkevich, I. Molodtsov, A. Tarkhov, V. Kogan from Gero LLC for extensive help in conducting the research.
P.O. Fedichev is a shareholder of Gero. T.V. Pyrkov, E. Getmantsev, B. Zhurov, K. Avchaciov, M. Pyatnitskiy, L. Menshikov, K. Khodova, and P.O. Fedichev are employees of Gero LLC. An international PCT patent application submitted by Gero LLC on the described methods and tools for non-invasively evaluating health is pending.
Appendix A: Transition matrix theory.
Under the Markov chain model, the evolution of the probability Pi (t) to find the system at state i for the system with N discrete states is governed by the master equation which in the linear mode can be written as where kij ≥ 0 is the rate of transition from state j to state i. By introducing the transition matrix (TM) according to we can rewrite Eq. A1 as
Note from Eq. A2 and definition of kij we have Wij ≥ 0 for i ≠ j, Wii ≤ 0 and from which it follows that the probability norm is preserved = 0, as it should be.
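The displayed equations of this appendix are not reproduced in the text. Under standard master-equation conventions, and consistent with the stated properties Wij ≥ 0 for i ≠ j and Wii ≤ 0, the relations referred to above would plausibly read as follows. This is a reconstruction offered as an assumption, not the original typesetting, and the correspondence to the labels A1 to A4 is inferred from the surrounding text.

```latex
\begin{align}
\frac{dP_i}{dt} &= \sum_{j \neq i} \left( k_{ij} P_j - k_{ji} P_i \right), \\
W_{ij} &= k_{ij} \quad (i \neq j), \qquad W_{ii} = -\sum_{j \neq i} k_{ji}, \\
\frac{dP_i}{dt} &= \sum_{j} W_{ij} P_j, \\
\sum_{i} W_{ij} &= 0 \quad \Longrightarrow \quad \frac{d}{dt} \sum_{i} P_i = 0 .
\end{align}
```

In this form each column of W sums to zero, which is also the property of W = T − I when each column of the discrete matrix T sums to one.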
In the following analysis we will assume that the TM W is irreducible and has distinct eigenvalues. The reasoning for such assumptions will be provided later. Under these assumptions W can be diagonalized where Ak and Bk are left and right eigenvectors corresponding to eigenvalue λk. Note that the systems of left and right eigenvectors are the inverse of each other:
To solve Eq. A3 we introduce and using Eqs. A5 and A6 rewrite Eq. A3 as for which the solution is from which using Eqs. A6 and A7 we get where Gij (t) is the probability P (i,t|j,0) to find the system at state i at time t if the system originally was at state j at time 0 and P0 is the initial distribution.
The assumption that W has distinct eigenvalues together with Eq. A4 implies that W has exactly one zero eigenvalue. Since the order of eigenvalues is arbitrary, we can state that where the latter inequality follows from W being a TM.
Indeed, W is real-valued, therefore for any eigenvalue λi and corresponding left eigenvector Ai we have and where * is complex conjugate. After multiplying the first equation by , the second by Aij and summing we get
Representing all Aij in exponential form Aij = ρij exp (ιϕij), dividing by |Aij|2 and replacing Wjj using Eq. A4 we get
This equation holds for all i and j. For a given i let us choose a particular j such that ρik ≤ ρij. Since all ρik and Wkj for k ≠ j are non-negative by definition, the Eq. A11 becomes Re λi ≤ 0.
According to Eq. A3, any equilibrium state is the right eigenvector corresponding to the zero eigenvalue. Since W has only one such eigenvector (up to scaling), we have a unique equilibrium distribution given by
The eigensystem has several interesting properties. From Eqs. A9 and A10 we get Gij (+∞) = A1jB1i and the distribution at t = +∞ is
For any initial distribution P0 the corresponding P∞ is an equilibrium state: and since equilibrium is unique for any P0. From this and Eq. A13 we have
Using Eq. A4, for the right eigenvectors Bk we get and therefore
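The eigensystem properties stated so far can be checked numerically on a small example. The sketch below is illustrative only: `eigensystem` is a hypothetical helper, right eigenvectors are stored as columns of B and left eigenvectors as rows of A = B⁻¹, and the matrix is assumed to have distinct eigenvalues and columns summing to zero.

```python
import numpy as np

def eigensystem(W):
    """Return eigenvalues (zero eigenvalue first), right eigenvectors
    (columns of B) and left eigenvectors (rows of A), with A @ B = I."""
    lam, B = np.linalg.eig(W)
    order = np.argsort(-lam.real)      # largest real part (zero) first
    lam, B = lam[order], B[:, order]
    B[:, 0] = B[:, 0] / B[:, 0].sum()  # equilibrium distribution sums to 1
    A = np.linalg.inv(B)               # mutually inverse eigenvector systems
    return lam, B, A

# Example generator for 3 states: W[i, j] is the rate j -> i.
W_demo = np.array([[-0.5,  0.2,  0.1],
                   [ 0.3, -0.6,  0.4],
                   [ 0.2,  0.4, -0.5]])
lam, B, A = eigensystem(W_demo)
p_eq = B[:, 0].real                    # unique equilibrium distribution
```

For this example the first eigenvalue is zero, the remaining ones have negative real parts, the left eigenvector of the zero eigenvalue is constant, and every other right eigenvector sums to zero.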
Let us consider a discrete real-valued stochastic process x (t) having value xi when the system happens to be in state i. According to the Wiener-Khinchin theorem, the power spectral density Sx (ω) for x (t) is the Fourier transform of the autocorrelator
Using the fact that Rxx (τ) is an even real-valued function we obtain
Here we follow the common physical convention that the total power of the signal is given by . Expanding Eq. A16 we get where P (i,t + τ|j,t) is the probability to find the system at state i at time t + τ if the system was at state j at time t and P (j,t) is the probability to find the system at state j at time t, with the evolution of the system starting from some state P0. From the definitions we have P (i,t + τ|j, t) = Gij (τ) for τ ≥ 0 and P (j,t) = Pj (t). Using this and Eqs. A8 and A9 rewrite Eq. A18 as where τ > 0 and the summation is done for each index from 1 to N. By rearranging and using Eq. A10 we get from which using Eq. A12 we finally obtain
Note that Rxx (τ) is not dependent on the initial distribution P0, as is expected for a system with an equilibrium state. The integration of Eq. A19 using Eq. A17 is straightforward, and we get
Eq. A20 is valid for any irreducible diagonalizable TM W. In particular, some λk may be complex. However, for the real-valued matrix W, complex eigenvalues and the corresponding eigenvectors always come in complex conjugate pairs, which, together with Eq. A10, implies that Sx (ω) is always real and positive, as any PSD should be.
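A numerical version of this calculation is sketched below. It is not a transcription of Eq. A20: it assumes a stationary process, the one-sided convention S(ω) = 2 Re ∫₀^∞ R(τ) e^(−iωτ) dτ, and it drops the constant (mean-squared) term of the autocorrelator, which would only contribute at ω = 0. The function name `markov_psd` is hypothetical.

```python
import numpy as np

def markov_psd(W, x, omega):
    """Continuous part of the PSD of x(t) for a continuous-time Markov
    chain with generator W (columns sum to zero, distinct eigenvalues)."""
    x = np.asarray(x, dtype=float)
    omega = np.asarray(omega, dtype=float)
    lam, B = np.linalg.eig(W)        # columns of B: right eigenvectors
    A = np.linalg.inv(B)             # rows of A: left eigenvectors
    k0 = np.argmin(np.abs(lam))      # index of the zero eigenvalue
    p_eq = np.real(B[:, k0] / B[:, k0].sum())
    # Weight of mode k in R(tau): (sum_i x_i B_ik) * (sum_j A_kj x_j p_eq_j)
    c = (x @ B) * (A @ (x * p_eq))
    keep = np.arange(len(lam)) != k0  # drop the constant term
    # 2 Re int_0^inf c_k exp(lam_k tau) exp(-i omega tau) dtau
    terms = c[keep, None] / (1j * omega[None, :] - lam[keep, None])
    return 2.0 * np.real(terms.sum(axis=0))
```

For a two-state process with rates a (0 → 1) and b (1 → 0) and x = (0, 1), this reproduces the familiar Lorentzian 2p(1 − p)(a + b)/((a + b)² + ω²) with p = a/(a + b).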
Due to the time symmetry of the fundamental physical laws, for systems in thermodynamic equilibrium the detailed balance assumption holds:
Biological organisms as a whole are not systems in thermodynamic equilibrium and the description of the motion using the Markov chain model is only a rough approximation, so there are no a priori reasons to assume detailed balance. However, experimentally the correlation between and is good, so it is interesting to see how Sx (ω) looks under the detailed balance assumption.
First we introduce a derived matrix
When Eq. A21 holds, the derived matrix is symmetric and can therefore be eigendecomposed into where all eigenvalues λk are real and eigenvectors μk are orthonormal:
From Eqs. A22 and A23 we get where from which using Eq. A24 follows which implies that Eq. A25 is an eigendecomposition of W as in Eq. A5, so we can use Eq. A20, which becomes
Here we used Ãki = , obtained from Eq. A26, to express Sx (ω) via right eigenvectors alone. Each of the right eigenvectors Bk is defined up to a multiplication constant, however the scaling is fixed for : from Eqs. A24 and A26 we have from which we can find a proper scaling for an arbitrary right eigenvector Bk:
Sx (ω) can be calculated under the detailed balance assumption as follows: calculate the right eigensystem for W, scale the found eigenvectors Bk using Eq. A29 and finally calculate Sx (ω) using Eq. A27. The same procedure can be applied when the detailed balance assumption holds only approximately, as long as we drop the imaginary part of the found eigenvalues and right eigenvectors. Note that even when all eigenvalues are real, Eq. A27 is not equivalent to Eq. A20 without the detailed balance assumption. In particular, scaling according to Eq. A29 is not enough for Eq. A28 to hold, which is required for Eq. A27 to be precise.
Appendix B: Appendix figures
Footnotes
* tim.pyrkov{at}gero.com, peter.fedichev{at}gero.com