Aging Skeletal Muscle Proteomics Finds Changes in Spliceosome, Immune factors, Proteostasis and Mitochondria

A progressive decline of skeletal muscle strength with aging is a primary cause of mobility loss and frailty in older persons, but the molecular mechanisms of such decline are not fully understood. Here, using quantitative discovery proteomic data from skeletal muscle specimens collected from 58 healthy persons aged 20 to 87 years show that ribosomal proteins and proteins related to energetic metabolism, including those related to the TCA cycle, mitochondria respiration, and glycolysis were underrepresented in older persons. Proteins with important roles in innate and adaptive immunity, involved in proteostasis and regulation of alternative splicing were all overrepresented in muscle from older persons. Changes with aging of alternative splicing were confirmed by RNA-seq. Overall, older muscle has a profound deficit of energetic metabolism, a pro-inflammatory environment and increased proteostasis. Upregulation of the splicing machinery maybe an attempt to compensate for these changes and this could be tested in future studies.


Introduction
One of the most striking phenotypes of aging is the decline of skeletal muscle strength, which occurs in all aging individuals and contributes to the impairment of lower extremity performance and loss of mobility 1-3 . The magnitude of decline in strength is higher than that expected from the loss of muscle mass, suggesting that the contractile capacity of each unit of muscle mass is progressively lower with aging. The reasons for such a decline of contractile capacity are unclear, and several hypotheses have been proposed 4 . Studies conducted in humans by 31 P magnetic resonance spectroscopy as well as "ex vivo" respirometry have shown that skeletal muscle oxidative capacity declines with aging and such decline affects mobility performance 5-8 . Ample evidence from animal models, and more limited evidence from human studies also suggest that aging causes progressive muscle denervation, with enlargement of the motor units and degeneration of the neuromuscular junction, but whether these changes account for the change of contractile performance of human muscle with aging has not been studied 9-13 .
Currently, no treatment is available to prevent or delay the decline of muscle strength and function with aging. Thus, understanding the mechanisms driving the decline in muscle contractile capacity with aging is essential to identify new targets of intervention. Previous studies attempted to address this question by performing cross-sectional untargeted proteomic analysis in skeletal muscle biopsy specimens from young and old individuals.
However, these studies were limited in size, focused on cancer cachexia, analyzed single fibers, did not account for levels of physical activity or did not explore the effect of aging over its continuous range and, therefore, could not distinguish changes due to aging from those due to disease or sedentary state [14][15][16][17][18] . To overcome these earlier limitations, we have performed a quantitative mass spectrometry-based proteome analysis (tandem mass tag, TMT) of skeletal muscle biopsies obtained from individuals distributed over a wide age range, who were extremely healthy based on strict objective clinical criteria. We characterized proteins that were overrepresented and underrepresented in older individuals and using these data we made inferences about molecular pathways affected by aging in skeletal muscle.

Quantitative Skeletal Muscle Proteome Analysis of Healthy Aging
Skeletal muscle biopsies were collected from 60 healthy participants of the Genetic and Epigenetic Study of Aging Laboratory Testing (GESTALT) aged 20 to 87 years who were defined as 'healthy' based on very strict evaluation criteria at the National Institute on Aging Clinical Unit in Baltimore 19 . Exclusion criteria included any diseases that required chronic treatment with the exclusion of mild hypertension fully controlled with one drug only, any physical or cognitive impairment, and any abnormal values in pre-defined list of blood clinical tests (see methods for details). Participants who consented for a muscle biopsy were homogeneously distributed across the age strata 20-34 (n=13), 35-49 (n=11), 50-64 (n=12), 65-79 (n=12) and 80+ (n=10), and biopsies were analyzed by tandem shotgun mass spectrometry-based quantitative proteomics method ( Figure 1A, Table S1). Using multiplexed isobaric labelling tags (TMT) and a customized analytical strategy 20 21 we identified 400,000 tryptic peptides from 6.7 million spectra (396 multiplexed MS runs from 12 TMT 6plex experiments), which allowed the quantification of 5,891 proteins (Table S2).
To control for batch variability and avoid bias, we included a reference sample in all 12 TMT sets. A loading normalization was implemented that assumed that the sums of all intensities from all the proteins across the samples in a single TMT experiment were equal and that the sample loading effects, peptide bias effects and the residual error were normally distributed across a constant variance across samples ( Figure S1A). To test the effectiveness of these approaches, we examined TMT batch effects in several analyses, allowing for experimentspecific random effects. We then averaged the expression values from each TMT across the sample groups and found that the ranks between TMTs were highly correlated ( Figure 1B-C, Figure S1B). Together, these findings indicate that the protein quantification across the 12 TMT experiments was robust.
Of the initial 5,891 proteins detected, we excluded from the analysis 1,511 proteins that were not quantifiable in at least 3 participants per age strata (at least 15 participants total) and performed the analysis in the remaining 4,380 proteins detected in more than 15 donors (3 per age strata), which were quantified from 46,834 unique peptides and 2.7 million spectra ( Figure 1D). We used Partial Least Squares (PLS) analysis to explore the overall clustering of the 4,380 proteins across age groups ( Figure 1E). The age groups (colorcoded) were well separated along at least one axis in the three-dimensional clustering classification ( Figure S1C). As expected, most of the proteins identified were classified as "muscle proteins", and the top 10 most abundant muscle proteins accounted for 45% of the total spectral abundance. Low-abundant mitochondrial proteins, such as cardiac phospholamban (PLN), were also quantified.

Focus on the Aging Biological Mechanisms
The relationship of age with the skeletal muscle proteins was estimated by linear mixed regression models that included sex, race, level of physical activity, type I/type II muscle fiber ratio, body mass index (BMI) and TMT batch effect as covariates (method section). Of note, the age beta-coefficients (aging effect size) are small because they express the difference in protein "per year" of age. For example, the difference in protein between two individuals that differ by 20 years would be 20 times the size of the beta coefficient. We adjusted for physical activity because it both tends to decline with age and strongly affects biological processes in muscle cells [22][23][24] . Previous studies demonstrated that gender and race strongly affect body composition and muscle mass 25 . Skeletal muscle tissue includes different myofiber types: type I fibers (slow-twitch), type IIa fibers (fast-oxidative), and IIb fibers (fast glycolytic muscle fibers containing four different myosin isoforms), each supported by different energetic metabolism and with different protein composition. An analysis for a proxy measure of "muscle fiber ratio" 26 was estimated by calculating the ratio of myosin 7 (MYH7), the slow-twitch fiber isoform, and the sum of fast-twitch fiber isoforms (MYH1, MYH2 and MYH4) ( Figure S2A1-A4); as expected, the fiber ratio of slow/fast was higher with older age ( Figure S2A5). The slight change of slow/fast fiber ratio was significant and outweighed the wide variability among individuals (p=0.005); BMI was adjusted because obese persons tend to have muscle fat infiltration and lower muscle quality and muscle-fat interaction may affect muscle composition and function 4,27 . Gender may also have an impact on protein expression in skeletal muscle, as males and females are known to have differences in muscle mass; however, because of the limited sample size, we did not stratify the analysis by gender. This analysis should be done in future larger studies.
Proteins were then deemed significantly underrepresented or overrepresented in older age based on p-values for age-coefficients in the regression equation, calculated from Satterthwaite's t-tests ( Figure 2A; Figure S2B). There were 1265 proteins significantly associated with age (p <0.05, with BH correction <0.1, 917 proteins), suggesting that approximately 29% of the skeletal muscle proteome changed with aging after 20 years of age (Table S3). Of these, 29% (361) were significantly underrepresented and 71% (904) were overrepresented with older age. The age-associated analysis across the experimental dataset and across multiple comparisons was highly robust ( Figure S2C).
Protein LSm14A is implicated in processing the assembly of processing bodies, involved in mRNA turnover, and can also bind to viral nucleic acids and initiate IFN-β production, contributing to innate immunity 28 . TIMP3 regulates the adipogenic differentiation of fibro/adipogenic progenitors (FAPs) in skeletal muscle, and its overrepresentation may explain the tendency for fat infiltration in aging muscle 29 . Consistent with this hypothesis, Perilipin 1 (PLIN1, β=0.014, p=0.0003), a lipid droplet-coating protein, and adipogenesis regulatory factor (ADIRF, β=0.01173, p=1.78E-05), a protein that is only expressed in adipose tissue, were among the most overrepresented proteins in old muscle. APCS is indicative of systemic amyloid, and its overrepresentation in aging muscle has never been previously described.
To explore differences of protein expression profiles across the lifespan, we generated a heatmap of the 1,265 age-associated proteins and looked for clusters of proteins showing parallel changes with age ( Figure 2B). Hierarchical clustering of protein expression suggested that the strongest difference was between young (20-34) and old (80+). There were small differences before the age of 50, but afterwards there was on average three log fold protein expression differences, and even more substantial differences after the age of 64. The separation of protein expression between three age groups (20-49, 50-64, and 65+) was confirmed by PLS analysis ( Figure 2C). These findings are consistent with data showing that the age-associated decline of muscle strength is already detectable after the fourth decade of life and substantially accelerates after the age of 70 31 .
Next, we grouped all quantified proteins according to biological mechanisms associated with aging in skeletal muscle ( Figure 2D.1). We also included a category for all contractile and architectural muscle proteins (named hereafter "muscle proteins"). Though the highest abundance proteins detected correspond to muscle proteins, the largest category were mitochondria proteins (15%). Each of the other categories represented <9% of total proteins. Protein classes differed between those underrepresented and overrepresented with older age are summarized in ( Figure 2D.2 and 2D.3) and are described in detail in subsequent sections. Specifically, proteins implicated in muscle contraction, muscle architecture, mitochondria metabolism, as well as ribosome function decreased with older age. By contrast, proteins related to genomic maintenance, transcriptional regulators, splicing, neuromuscular junction, proteostasis, senescence and immune function increased with age.
Other smaller subcategories of proteins were also differentially abundant ( Figure 2D.3).

Contractile, Architectural and Neuromuscular Junction Proteins (NMJ)
Since many proteins decreasing with age were contractile proteins, we classified these further by function. The top 95 proteins in this class are involved in the architectural and functional stabilization of the sarcomere, including sarcospan (SSPN, β=0.002, p=0.016) ( Figure 2E), a dystrophin-associated protein complex important for muscle regeneration, actin-binding LIM domain and actin-binding protein 1 (LIMA1, β=0.003, p=0.009), a cytoskeleton-binding protein that stabilizes actin filaments, and plectin (PLEC, β=0.0007, p=0.036), a large cytoskeleton protein that preserves interactions within the acto-myosin complex. Increases in delta sarcoglycan (SGCD, β=0.0019, p=0.00004, and gamma sarcoglycan (SGCG, β=0.0016, p=0.0062) were consistent with mouse studies showing that dystrophin, sarcoglycan subcomplex γ-and δ-sarcoglycan were overexpressed with aging, perhaps a compensatory mechanism to avoid damage in the sarcomere during contraction or as biomarkers of continuous repair 32 . Interestingly, MAPT (tau, mostly expressed in neurons and involved in the assembly and stabilization of microtubules), was also significantly underrepresented in older muscle ( Figure 2E). A crucial component of muscle function is the neuromuscular junction (NMJ), and since the abundance of all NMJ related proteins increased with age we examined the agrin signaling pathway of NMJ. Agrin (AGRN) and acetylcholine esterase (ACHE) increased with age but not significantly ( Figure S2D). By contrast, the levels of Syne-1 which anchors both synaptic and non-synaptic myonuclei for proper neuron innervation and respiration increased with age (SYNE1 β=0.002, p=0.005) as did beta-2-syntrophin, which is believed to be involved in acetylcholine receptor clustering (SNTB2, β=0.0029, p=0.0003).

Decline of Mitochondrial Proteins with Age
Because of the striking difference in abundance of mitochondrial and energy metabolism proteins with age, we studied these proteins by protein annotations using Uniprot keywords, GO ontology terms and extensive manual curation based on the most recent literature. The coverage of mitochondrial proteins quantified by our analysis compared to those described in the literature ranged from 92% for TCA proteins to 52% for proteins located in outer mitochondrial membrane [possibly due to incomplete tissue disruption 33,34 ] ( Figure 3A). The coverage of the bioenergetics and mitochondrial proteome in our dataset is similar to that reported by other authors 15,33 . Of the mitochondrial proteins identified, the abundance of 25% of them (173 proteins) changed with age, mostly (70%) declining with age. Notably, however, outer membrane proteins were more abundant ( Figure 3B); for example, NADHcytochrome b5 reductase 3 (CYB5R3), an NADH-dehydrogenase located in the outer membrane of ER and mitochondria, whose overexpression is known to mimic many effects of caloric restriction, was significantly overrepresented in older age ( Figure S3A) 35,36 . The permanence of mitochondrial protein debris in aging muscle has been previously reportedattributed to defective autophagy, and through to cause activation of the inflammasome and a proinflammatory state 37 .
Of the enzymatic mitochondrial proteins, 99 were respiratory chain proteins (Complex I-V and assembly complex proteins), and most of them declined with aging (28 proteins p<0.05; Figure 3C). Surprisingly, succinate dehydrogenase complex assembly factor 2 (SDHAF2), required for covalent FAD insertion into complex II, the electron transport chain, and the TCA cycle, were significantly overrepresented with older age (Figure 3C  We then analyzed proteins from complex I to V and found that 16 proteins were significantly lower at older age ( Figure 3C, Figure S3B). Among 41 proteins involved in energy production, most were underrepresented at older ages. Of 22 proteins quantified for TCA cycle, only malate dehydrogenase (MDH1), isocitrate dehydrogenase (IDH1), fumarate hydratase (FH) and succinate--CoA ligase (SUCLG1) ( Figure S3C) were significantly lower at older ages. The decreasing levels of IDH-1 with age is unsurprising, as previous studies have shown a decrease in abundance of IDH-1 in older C.elegans 38 . IDH1 converts isocitrate to α-ketoglutarate by reducing NADP to NADPH in the process. In addition, to IDH1, NADP+ is also reduced to NADPH via the mitochondrial NAD(P)-malic enzyme (ME2) 39 and predominantly through NNT (NAD(P) transhydrogenase) and the pentose phosphate pathway. In our study, NNT (β=-0.003, p=0.001) significantly decreased with aging.
Interestingly, the decrease in expression levels of both NNT and IDH1 with age, suggests a decreased capacity of the mitochondria to maintain proton gradients and results in oxidative damage. Further, NADK (NAD+ Kinase), which is highly regulated by the redox state of the cell and regulates NADP synthesis in vivo decreased with age (NADK2, β=-0.001, p=0.052).
The changes in the NADP/NADPH levels influence cellular metabolism, calcium signaling and anti-inflammatory processes and regeneration of glutathione 40 . NAD + declines with age in several tissues and its metabolism has been implicated in the aging process and age-related pathologies including loss of skeletal muscle mass 41,42 . NAD + is synthesized in vivo predominantly via the salvage pathway and the de novo and Preiss-Handler pathways 43,44 . We specifically examined age differences in abundance of proteins from these pathways. We found that NAM-N-methyl transferase (NNMT, β=0.007, p=0.016), nicotinamide ribose kinases (NMRK1, β=-0.003, p=0.002), poly-ADP-ribose polymerases (PARP1, β=0.002, p=0.003) and CD73 (NT5E, β=0.004, p=0.056) were significantly increased with at older ages, while only NMRK1 decreased with age ( Figure 3D). NAMPT, which converts NAM to NMN, was not significantly different with age while NMRK1, which converts NR to NMN, was significantly lower in the muscle of older participants. These findings may explain the mechanism by which NMN tends to be lower in tissue from older compared to younger persons. Two additional mechanisms that may exacerbate the decline in NMN and NAD + with aging, namely the increased expression levels of CD73 that converts NMN to NR and the increase in expression levels of PARP1, which converts NAD + into NAM and ADP-ribose.

Changes of Proteins Involved in Genomic Maintenance and Cellular Senescence
Most genomic maintenance proteins increased in abundance with age, especially those involved in DNA damage recognition and repair, such as double-strand break repair protein (MRE11), X-ray repair cross-complementing protein 5 (XRCC5), and structural maintenance of chromosomes protein 1A (SMC1A) ( Figure S4A). Prelamin-A/C (LMNA), Lamin-B1 (LMNB1) and Lamin-B2 (LMNB2), members of the LMN family of protein components of nuclear lamina that help maintain nuclear and genome architecture, were all overrepresented with older age ( Figure S4B). Sirtuin 2 (SIRT2, β=-0.0013, p=0.032), implicated in genomic stability, metabolism and aging, was also found to be lower in older skeletal muscle ( Figure S4C).

Implications of Proteins that Modulate Transcription and Splicing
Of all the 69 age-associated transcription regulatory proteins quantified, only 8 were underrepresented with older age, including kelch-like protein 31 (KLHL31, β=-0.0017, p=0.003), which is implicated in the maintenance of skeletal muscle structure, increases with muscle growth and prevents congenital myopathy in mice 45 . By contrast, myocyte-specific enhancer factor 2D (MEF2D, β=0.003, p=0.018), essential for myogenesis and muscle regeneration and regulator of KLHL31 production, increased with age 46 . Contrary to earlier reports, CTCF (β=0.009, p=0.026), a transcriptional activator and repressor protein that finetunes chromatin architecture, also increased with age ( Figure 4A).
A major unexpected finding of our analysis was the strong increase in major spliceosome complex proteins with aging ( Figure S5A). The spliceosome comprises five small nuclear RNAs (snRNAs), U1, U2, U4, U5, and U6, that form functional complexes with proteins to regulate alternative splicing, a process by which different exons of one pre-mRNA are variably combined to generate different proteins 47 . We found differential expression of many proteins widely distributed across the five spliceosome complexes and other spliceosomeassociated protein factors essential for mRNA maturation and gene expression ( Figure 4B).
In particular, of the ~300 proteins and spliceosome-associated factors described 48 49 , we quantified 99 and 57 of them, respectively, were overrepresented in older muscle ( Figure   4C). Overall, spliceosomal proteins increased by ~15% between the ages of 20 and 87 years ( Figure 4D). Spliceosome components are actively rearranged during assembly, catalysis, disassembly and recycling, each step involving recruitment and recycling of several proteins 50 . To understand whether aging affects preferentially one of these biological steps, we categorized the spliceosomal complexes and snRNPs into E complex, A complex, and B complex (assembly complex, 37 proteins), Bact complex and C complex (catalysis complex, 7 proteins) and snRNPs (recycling, SART1 protein) ( Figure 4E), but we found no evidence of proteins from a specific complex being more overrepresented with aging than proteins from other complexes ( Figure S5B). LSm RNA-binding protein (LSM14A) was the most overrepresented assembly protein, displaying a 20-fold increase with age. The overrepresentation of spliceosomal proteins, such as the pre-mRNA-processing-splicing factor 8 (PRPF8) ( Figure 4E, inset), the key catalytic core and the largest and most conserved protein in the spliceosome, suggests that pre-RNA processing may be upregulated in older skeletal muscle.
A systematic change in the splicing machinery with older age was previously suggested by transcriptomic analyses skeletal muscle biopsies 51 and human peripheral blood leukocytes 52 of young and old individuals. In both studies, processing of mRNAs was the feature that best discriminated younger and older persons, suggesting that modulation of alternative splicing is one of the signatures of aging 53 . Although the mechanisms and consequences of the rise in splicing factors with aging are unknown, they may indicate either a dysregulation of the splicing apparatus or a shift toward increased splicing and/or altered splice isoform diversity with aging 54 .

Age-Associated Alternative Splicing and Splicing Events
The marked rise in overrepresentation of splicing machinery proteins with aging prompted questions about its functional consequences. Emerging literature suggest that change in expression of splicing factors is a major determinant for selection of specific splicing variants and changes in splicing variants contributes to some aging phenotypes, including agerelated diseases 55 56 . We analyzed potential differences in mRNA splicing with age (see methods) using RNA-seq data that were available for most of the same specimens used for the proteomic study (n=53). Specifically, we studied a set of variations of the exon-intron structure, known as transcriptional events, namely Alternative First exon (AF), Skipped Exon (SE), Alternative Last exon (AL), Alternative 3' splice-site (A3), Alternative 5' splice-site (A5), Retained Intron (RI) and Mutually Exclusive Exons (MX) 57 . Donor-specific splicing index (PSI, which measures each isoform as a % of total isoforms) was calculated for each AS event in each sample and a linear mixed regression model was used to identify age-associated PSIs for each splicing event. Analysis of 144,830 transcripts from RNA-seq datasets showed that around 3.7% of the skeletal muscle transcripts (5,459 transcripts, corresponding to 6,255 events) showed relative abundance changes with aging ( Figure 5A; Table S4). Next, we calculated the frequency and distribution of splicing events with aging as well as the directionality of such changes and found that 2,714 events were significantly less frequent at older ages and 3,545 events significantly more frequent at older ages ( Figure 5B; Table S5). The overall number of events increased slightly with older age but AS events, at least for the 6,255 AS events quantified, increased significantly with age (r2=0.33, p=6.001e-06) ( Figure 5C).
We then investigated whether any specific class of skeletal muscle AS events was enriched in our age-association analysis compared to the list of splicing events described for human skeletal muscle in the Ensembl human transcriptome ( Figure 5D). The rates of observed skeletal muscle events are very similar to those reported in the Ensembl transcriptome ( Figure 5D) except for ME, A3, SE and AF. The largest difference was in the skipped exon (SE) class of events, where a higher percentage of transcripts were exon-skipped compared to Ensembl events, with 27% of all the skeletal muscle AS events of the exon skipping type.
A previous study reported 35% of the erythroid genes show evidence of AF exons, indicating that alternative promoters and AF are widespread in the human genome and play a major role in regulating expression of select isoforms in a tissue-specific manner 58 . This finding is in line with our result of 36% AF in our skeletal muscle data.
We next examined whether AS events occurred in proteins connected with pathways that are known to be dysregulated with aging; interestingly, among the top fold enriched (FE) gene ontology (GO) biological processes associated with age, splicing changes were more frequent on those that negatively regulated IκB kinase/NF-κB signaling (FE=2.86, p=3.18E-04), and those that regulated mitophagy (autophagy of mitochondria; FE=3.71, p=2. 23E-04) and fatty acid beta oxidation (FE=3.21, p=1.72E-04). The GO biological process with positive age-associated splicing events were mitochondrial morphogenesis (FE=5.15, p=8.98E-03), response to mitochondrial depolarization (FE=4.93, p=2.46E-04), and endoplasmic reticulum calcium ion homeostasis (FE=4.48, p=2.31E-04). These data suggest that the upregulation of alternative splicing in skeletal muscle with aging may be reactive to change that occur with aging either by rising an inflammatory response or by activating damage-response mechanisms at a time when energy becomes scarce.
Among the 5,459 transcripts (from 3,791 genes) that were alternatively spliced with age, 4967 transcripts were protein-coding. We compared these genes with the age-associated proteins and found that 8.9% of the age-associated alternatively spliced transcripts (385) were reflected in protein changes ( Figure 5E). This comparison of age-associated proteins and alternatively splicing mRNAs suggests that 30% (385) of the age-associated proteins undergo alternative splicing. Among this group, 64 proteins are involved in cellular organization or biogenesis (GO:007180), and proteins like tubulin (TUBB2B, TUBB), profilin 2 (PFN2) and actin-related protein 2/3 complex subunit 4 (ARPC4) are involved in the cytoskeletal regulation by Rho GTPase pathway. A further PANTHER database classification of these proteins shows an enrichment in categories like RNA/DNA binding, cytoskeletal, translational and ribosomal proteins ( Figure 5E protein categories). Overall, these findings suggest that a large percentage of proteins that change with aging also undergo splicing variations, and this is especially true for mitochondrial proteins. The physiological reasons for these changes remain unknown.

Depletion of Ribosomal Proteins with Age
Similarly, to previous studies, we found that a large number of ribosomal proteins were differentially expressed with older age (Figure 2, Figure S6A) 59,60 . In particular, all the 60S and 40S ribosomal proteins were globally reduced in older muscle; exceptions included 60S ribosomal proteins L12 and L3 (RPL12, β=0.0008, p=0.024, RPL3, β=0.003, p=0.016), as well as H/ACA ribonucleoprotein complex subunit 4 (DKC1, β=0.002, p=0.034) and nucleolar protein 58 (NOP58, β=0.003, p=0.00007), which were overrepresented in old muscle. RPL12, RPL3, and DKC1 play a role in viral mRNA translation, while NOP58 is important for ribosomal biogenesis ( Figure S6A-C). Changes in ribosome proteins may signal a decline in protein synthesis with aging, which may lead to slow recycling and progressive damage accumulation in contractile proteins.
The loss of chaperone function during aging may be compensated by an increase in autophagic activity ( Figure 6C), as misfolded proteins must be removed and degraded through an alternative mechanism. Indeed, most proteostasis proteins overrepresented with aging were related to autophagy except HSPA8 and Eukaryotic translation initiation factor 4 gamma 1 (EIF4G1). For example, TDP-43, a DNA/RNA-binding protein that tends to form aggregates in tissues such as skeletal muscle and brain and is both removed by autophagy and involved in autophagy maintenance, increased significantly with aging (TARDBP, β=0.002, p=0.0002) ( Figure 6D). On the contrary, calreticulin (CALR), a quality control chaperone induced under ER stress that stimulated autophagy, was significantly higher in muscle of older participants (β=0.001, p=0.022) ( Figure 6D) 65 . Of note, calreticulin is used by macrophages to tag cells to be removed by programmed cell phagocytosis 66 . Consistent with this finding, CALR is considered a main biomarker of age-related diseases and frailty 67 .

Pro-inflammatory and Anti-Inflammatory Immune Proteins of Aging Muscle
Of the 32 immune-related age-associated proteins that were quantified ( Figure 7A 68,69 . Interestingly, we also identified proteins that were concurrently downregulated, such as Microtubule-associated serine/threonine-protein kinase 2 (MAST2, β=-0.002, p=0.023) and Phosphatidylinositol 3,4,5-trisphosphate 5-phosphatase 2 (INPPL1, β=-0.0009, p=0.036), that could accentuate the inflammatory phenotype by attenuating the negative regulation of NF-κB ( Figure 7B.2) 70,71 . Thus, increased expression of NF-κB activators and decreased expression of NF-κB attenuators may synergize to elevate chronic inflammation in aging muscle. We also noted increased expression of High mobility group protein B2 (HMGB2, β=0.004, p=0.001), a well-known 'alarmin' 72 that is released from dying cells or within neutrophil extracellular traps (NETs), that may further exacerbate the inflammatory milieu. Cumulatively, our observations are consistent with the emergence and/or enrichment of pro-inflammatory macrophages in aging human muscle. Second, we found evidence of an anti-inflammatory activity that could potentially offset the pro-inflammatory milieu of aging muscle ( Figure 7C). This was most evident in strong ageassociated up-regulation of annexin A1 (ANXA1, β=0.008, p=0.00001), a protein that has been linked to resolution of inflammation 73 . Elevated levels of adiponectin (ADIPOQ, β=0.002, p=0.008), a chemokine produced exclusively by adipocytes, likely reflected increased adipogenic activity in aging muscle. However, it is interesting to note that ADIPOQ has also been proposed to inhibit endothelial NF-κB activation 74,75 and may, thereby, have context-dependent anti-inflammatory functions. Finally, erbin (ERBIN, β=0.002, p=0.019), a nuclear lamina-associated protein that was overrepresented with age in our studies, has been implicated in reducing NF-κB activation by some stimuli 76 , with associated reduction in pro-inflammatory gene expression.
Third, coordinate up-regulation of several members of the alternate complement pathway, such as CFAH (β=0.003, p=0.028, and CFAD (β=0.003, p=0.039) and modulators of complement activity such as CD antigen CD55 (DAF and CD55, β=0.006, p=0.00007) indicate ongoing innate immune activity in aging muscle ( Figure 7D). Whether this trend reflects increased presence of dying cells and cell debris or below-threshold autoimmune activity remains to be determined. The latter could be mediated, for example, by Immunoglobulin heavy constant gamma 4 (IGHG4, β=0.008, p=0.019) which we found to be increased with age. This antibody isotype has been implicated in the generation of autoantibodies against muscle-specific kinases that are prevalent in certain forms of myasthenia gravis 77 . The possible connection between the aging muscle and chronic neurodegenerative disorders in which destruction of self-tissue by complement has been ascribed a causative role 78 is an intriguing area for future investigation.

Conclusions
The biological mechanisms that mediate the deleterious effect of aging on skeletal muscle are still controversial, as some evidence suggested that the decline of mitochondrial content, volume and energetic efficiency plays a primary role, while other evidence showed show no significant change for the same parameters with aging, especially if the level of physical activity was considered 79 . To investigate systematically the changes in expressed proteins that might drive the decline in skeletal muscle function, we conducted an in-depth quantitative measurement of age-related changes in protein abundance in human skeletal muscle. While we did not use model systems or in vivo experiments, because of the careful design of the study, the selection of an extraordinary healthy population, the depth of protein detection and rigorous analysis made it possible to produce a descriptive quantitative dataset to show aging-associated molecular changes. We used a MS-based isobaric relative quantitative approach for proteome analysis that provides broad coverage of the proteins of human skeletal muscle in very healthy individuals over a wide age range and we adjusted our analysis for potential confounders. The biological function of most of the protein reported in this study was gathered by an extensive review of the literature and instead of relying only on annotation of Uniprot or GO database, we manually curated the functional classification used in the analysis. We present evidence that our approach is robust and sensitive to true biological variability. We confirmed the altered expression of proteins implicated in pathways differentially active in human skeletal muscle with aging, including more highly abundant mitochondrial proteins and less abundant inflammatory proteins. We also identified subsets of proteins increasing with age that were not previously described, namely proteins implicated in alternative splicing and autophagy. Our work provides a rich resource to study the effect of aging on the human skeletal muscle proteome and sets the stage for future research on the mechanisms driving the age-associated decline in muscle function.

Study design and participants
Muscle biopsies analyzed in this study were collected from participants from the Genetic and Epigenetic Study of Aging and Laboratory Testing (GESTALT). Participants were enrolled in GESTALT if they were free of major diseases, except for controlled hypertension or a history of cancer that had been clinically silent for at least 10 years, were not chronically on medications (except one antihypertensive drug), had no physical or cognitive impairments, had a BMI less than 30 kg/m 2 , and did not train professionally. Inclusion criteria were gathered from information on medical history, physical exams, and blood test interpreted by a trained nurse practitioner 80 . Participants were evaluated at the Clinical Research Unit of the National Institute on Aging Intramural Research Program. Data and muscle specimens from 60 participants were available for this study. However, two participants were excluded because the muscle specimen provided was too small to obtain reliable proteomic data. Therefore, data from 58 participants dispersed over a wide age-range (20-34 y, n=13; 35-49 y, n=11; 50-64 y, n=12; 65-79 y, n=12; 80+ y, n=10) were used for this study.
Anthropometric parameters were objectively assessed. The level of physical activity was determined using an interview-administered standardized questionnaire originally developed for the Health, Aging and Body Composition Study 81 and modeled after the Leisure-Time Physical Activity questionnaire 82 . Total participation time in moderate to vigorous physical activity per week was calculated by multiplying frequency by amount of time performed for each activity, summing all of the activities, then dividing by two to derive minutes of moderate to vigorous physical activity per week, the following categories were used: <30 minutes per week of high intensity physical activity was considered "not active" and coded as 0; high-intensity physical activity ≥30 and <75 minutes was considered "moderately active" and coded as 1, high-intensity physical activity ≥75 and <150 minutes was considered "active" and coded as 2, and high-intensity physical activity ≥150 minutes was considered to "highly active" and coded as 3. An ordinal variable from 0 to 3 was used in the analysis. The

Muscle biopsies
The depth of the subcutaneous fat (uncompressed and compressed) was determined using MRI images of the middle thigh performed on the previous day. A region above the vastus lateralis muscle was marked at the mid-point of a line drawn between the great trochanter and the mid-patella upper margin. The skin was prepped with povidone-iodine (Betadine®) and ethyl alcohol, and the outside areas covered with sterile drapes. The biopsy site was anesthetized intradermally using a 27-gauge needle and then subcutaneously using a 23gauge x 1 1/2 -inch needle, follow by an 18-gauge spinal needle, with ~15 mL of 1% lidocaine with sodium bicarbonate. The operator was careful that the anesthetic was infiltrated in the subcutaneous tissue and above the muscle fascia but not the muscle fibers not to distort the tissue structure and induce a gene expression response. A 6-mm Bergstrom biopsy needle was inserted through the skin and fascia incision into the muscle, and muscle tissue samples were obtained using a standard method. Biopsy specimens cut into small sections were snap frozen in liquid nitrogen and subsequently stored at −80 °C until used for analyses.

Sample preparation and protein extraction
On average 8 mg of muscle tissue was pulverized in liquid nitrogen and mixed with the lysis buffer containing protease inhibitor cocktail (8 M Urea, 2M Thiourea, 4% CHAPS, 1% Triton X-100, 50 mM Tris, pH 8.5 (Sigma)). Protein concentration was determined using commercially available 2-D quant kit (GE Healthcare Life Sciences). Sample quality was confirmed using NuPAGE® protein gels stained with fluorescent SyproRuby protein stain (Thermo Fisher).
In order to remove detergents and lipids 300 μg of muscle tissue lysate were precipitated using standard methanol/chloroform extraction protocol 83 . Proteins were resuspended in concentrated urea buffer (8 M Urea, 2 M Thiourea, 150 mM NaCl (Sigma)), reduced with 50 mM DTT for 1 hour at 36°C and alkylated with 100 mM iodoacetamide for 1 hour at 36°C in the dark. The concentrated urea was diluted 12 times with 50 mM ammonium bicarbonate buffer and proteins were digested for 18 hours at 36°C using trypsin/LysC mixture (Promega) in 1:50 (w/w) enzyme to protein ratio. Protein digests were desalted on 10 x 4.0 mm C18 cartridge (Restek, cat# 917450210) using Agilent 1260 Bio-inert HPLC system with the fraction collector. Purified peptides were speed vacuum-dried and stored at -80°C until further processing.
Tandem Mass Tags (TMT) labeling was used to perform quantitative proteomics. Each TMT labeling reaction contains 6 labels to be multiplexed in a single MS run. Donor IDs were blinded, and samples were randomized to prevent TMT bias. Each TMT 6-plex set included one donor from each of the 5 age groups and one reference sample. 5 muscle samples 100 μg each corresponding to 5 different age groups and one separately prepared master reference sample were labeled with 6-plex tandem mass spectrometry tags using a standard TMT labeling protocol (Thermo Fisher). 200 femtomole of bacterial beta-galactosidase digest (SCIEX) was spiked into each sample prior to TMT labeling to control for labeling efficiency and overall instrument performance. Labeled peptides from 6 different TMT channels were combined into one experiment and fractionated.

High-pH RPLC fractionation and concatenation strategy
High-pH RPLC fractionation was performed on Agilent 1260 bio-inert HPLC system using 3.9 mm X 5 mm XBridge BEH Shield RP18 XP VanGuard cartridge and 4.6 mm X 250 mm XBridge Peptide BEH C18 column (Waters). Solvent composition was as follows: 10 mM ammonium formate (pH 10) as mobile phase (A) and 10 mM ammonium formate and 90% ACN (pH 10) as mobile-phase B 84 .
TMT labeled peptides prepared from the skeletal muscle tissues were separated using a linear organic gradient that went from 5% to 50% B in 100 min. Initially, 99 fractions were collected during each LC run at 1 min interval each. Three individual high-pH fractions were concatenated into 33 combined fractions with the 33 min interval between each fraction (fractions 1, 34, 67=combined fraction 1, fractions 2, 35, 68=combined fraction 2 and so on).
Combined fractions were speed vacuum dried, desalted and stored at -80°C until final LC-MS/MS analysis.

LC-MS/MS analyses
Purified peptide fractions from skeletal muscle tissues were analyzed using UltiMate 3000 Nano LC Systems coupled to the Q Exactive HF mass spectrometer (Thermo Scientific, San Jose, CA). Each fraction was separated on a 35 cm capillary column (3 um C18 silica, Hamilton, HxSil cat# 79139) with 150 um ID on a linear organic gradient using 650 nl/min flow rate. Gradient went from 5 to 35% B in 205 min. Mobile phases A and B consisted of 0.1% formic acid in water and 0.1% formic acid in acetonitrile, respectively. Tandem mass spectra were obtained using Q Exactive HF mass spectrometer with the heated capillary temperature +280°C and spray voltage set to 2.5 kV. Full MS1 spectra were acquired from 300 to 1500 m/z at 120000 resolution and 50 ms maximum accumulation time with automatic gain control [AGC] set to 3x10 6 . Dd-MS2 spectra were acquired using dynamic m/z range with fixed first mass of 100 m/z. MS/MS spectra were resolved to 30000 with 155 ms of maximum accumulation time and AGC target set to 2x105. Twelve most abundant ions were selected for fragmentation using 30% normalized high collision energy. A dynamic exclusion time of 40 s was used to discriminate against the previously analyzed ions.

Proteomics informatics
The mgf files generated (using MSConvert, ProteoWizard 3.0.6002) from the raw data from each sample fraction was searched with Mascot 2.4.1 and X!Tandem CYCLONE Mascot and X!Tandem search engine results were analyzed in Scaffold Q+ 4.4.6 (Proteome Software, Inc). The TMT channels' isotopic purity was corrected according to the TMT kit.peptide and protein probability was calculated by PeptideProphet 85 and ProteinProphet probability model 86 . The PeptideProphet model fits the peptide-spectrum matches into two distributions, one an extreme value distribution for the incorrect matches, and the other a normal distribution for correct matches. The protein was filtered at thresholds of 0.01% peptide FDR, 1% protein FDR and requiring a minimum of 1 unique peptide for protein identification. We allow single peptide hits for two reasons: first, any peptide that is quantifiable is detected across all samples (n=58); second, we identify proteins with more than one search engine, so the protein identification is confirmed at least twice, even for single-peptide hits. For these reasons the even single peptides are unlikely to be random hits. As for single peptide quantification, the spectrum-to-spectrum variability is no higher between spectra from the same peptide than between spectra from different peptides from the same protein. Therefore, it is unlikely that there is any differential 'bias' in reporter ions from peptide to peptide. More importantly, TMT is taken as relative, not absolute, quantification. So even if there were such a bias, it would be the same across samples, so the relative quantification would not be affected. Reporter ion quantitative values were extracted from Scaffold and decoy spectra, contaminant spectra and peptide spectra shared between more than one protein were removed. Typically, spectra are shared between proteins if the two proteins share most of their sequence, usually for protein isoforms.
Reporter ions were retained for further analyses if they were exclusive to only one protein, and they were identified in all 6 channels across each TMT set. Since we have multiple age group across each TMT experiment, we analyzed the proteins for missing reporter ion intensity.
For this analysis the protein with missing reporter ion in some of the channels (not more than 2 channels) for each TMT experiment was identified and missing value imputation was performed using multiple imputation with chained equations (MICE) R library by predictive mean matching. Mean imputation was performed <0.01% in one or two TMT channels in most of the TMT experiments, except TMTset1 (the missing reporter ion for channel 5 is 0.03%). The reporter ion intensity from the proteins derived from the imputation method (on an average <10 proteins) were concatenated with reporter ion intensity identified in all 6 channels and further analysis performed using adjudicated values. The log2 transformed reporter ion abundance was normalized by median subtraction from all reporter ion intensity spectra belonging to a protein across all channels 20,21 . Relative protein abundance was estimated by median of all peptides for a protein combined. Protein sample loading effects from sample preparations were corrected by median polishing, i.e., subtracting the channel median from the relative abundance estimate across all channels to have a median zero as described elsewhere 20,21 . Quantified proteins were clustered together if they shared common peptides and corresponding gene names were assigned to each protein for simplicity and data representation. Annotation of the proteins were performed by manual curation and combining information from Uniprot, GO and PANTHER database. Further bioinformatics analysis was performed using R programming language (3.4.0) and the free libraries available on Bioconductor.

Linear mixed effect model and statistical analyses
Linear mixed regression model was implemented to examine age effects and the data was adjusted for physical activity, gender, race, bmi, type I and type II myosin fiber ratio and TMT mass spectrometry experiments. Protein significance from the regression model was determined with p-values derived from lmerTest. Partial Least Square analysis (PLS) was used to derive models with classification that maximized the variance between age groups. PLS loadings were derived from linear model adjusted protein results. The regression model was performed using R 3.3.4 87 with lme4 v1.1. library. Heat maps and hierarchical cluster analysis were performed using the non-linear minimization package in R 88 . GraphPad PRISM 6.07 and R packages were used for statistical analysis and generation of figures. STRING analysis (10.5 version) was used for obtaining protein-protein interaction network. Enrichment analysis was performed by GeneSet Enrichment Analysis (GSEA) and PANTHER, the pathways were mapped and visualized by Cytoscape 3.0 89 . One-way ANOVA, nonparametric, and chi-square tests (continuous and categorical variables) were used to test for sample differences between age groups.

RNA extraction and purification
Total RNA was prepared by lysing cell pellets (2x106) in 700 μl Qiazol and extracted using Qiagen miRNeasy mini kit according to the manufacturer's recommendation (Qiagen Inc, CA, USA) from the same samples (n=54). Small ribosomal RNA was further depleted using Qiagen GeneRead rRNA Depletion Nano Kit. Total RNA quality and quantity was checked using RNA-6000 nano kits on the Agilent 2100-Bioanalyzer. 375 ng of high-quality RNA was used for first-strand and second-strand cDNA synthesis followed by single primer isothermal amplification (SPIA) using NuGEN Ovation RNA-Seq System V2 kits according to manufacturer's protocol. This kit amplified both polyA-tailed and non-polyA tailed RNA and removed ribosomal RNA. The amplified cDNA was sheared using Bioruptor (Diagenode) to an average size of 150-250 bases. The sequencing library was prepared using Illumina ChIP-Seq kits according to the manufacturer's protocol (Illumina, San Diego, CA). In short, the ends of the fragments were repaired using T4 DNA polymerase, E. coli DNA Pol I large fragment (Klenow polymerase), and T4 polynucleotide kinase (PNK) and an A-overhang was added to the 3' end. Adapters were ligated to the DNA fragments and size-selected (250-350 bases) on a 4.5% agarose gel. An 18-cycle PCR amplification was performed followed by a second 4.5% agarose gel size selection before cluster generation in cbot2 and sequencing with Illumina Hiseq2500 sequencer using V4 reagents. Single-read sequencing was performed for 138 cycles and Real-Time Analysis (RTA) v1.18.66.3 generated the base-call files (BCL files). BCL files were de-multiplexed and converted to standard FASTQ files using bcl2fastq program (v2.17.1.14)

RNA-Seq quantification and splicing analysis
The quality of the bases was checked using FASTQC program (v11.2) before and after adapter removal and last base trimming by cutadapt program (v1.9). The cleaned FASTQ files were aligned, quantified and annotated to the human hg19 genome using Salmon 90 with the concept of quasi-mapping with two phase inference procedure for gene model annotations. The GC bias corrected, quantified transcript isoform abundance values (TPM) were used for further computation of relative abundance of the events or transcripts isoforms known as percent spliced-in (PSI) by SUPPA 57 . Since the variability of low-expressed genes between biological replicates were reported, the transcript data were filtered for the transcripts which were expressed in at least three donors per each age group. Thus, we excluded ~23% of the transcripts from total transcript quantification for further splicing analysis. Events coordinates are extracted from the Ensembl annotation (GRCh37.75) and alternative splicing events were generated. PSI values of alternative splicing events for each transcript from each sample (n=53) were estimated and the PSI values showing a good agreement with the RNA seq data were kept for further analysis. The magnitude of the PSI change (differential splicing) across the age were calculated with a linear mixed model analysis performed on the PSI to estimate the age-related splicing changes of the transcript isoform. The PSI regression model was adjusted with the aging confounders as same as described above for protein regression model except fiber ratio. For transcript data we used RNA experiment batches as a random effect. The age beta coefficient for each alternative splicing event transcript PSI was reported as the magnitude to the splicing event-specific PSI change with age. Significance of the alternative splicing events was calculated by lmerTest and was reported if the observation had a p-value <0.05 at transcript level for age beta coefficient.

Age-association of proteins and transcripts
Proteins or transcripts either significantly upregulated or down regulated with age, present in 50% of the samples or at least in three samples for each age group, and significant (p<0.05) were considered as age-associated. Age-association was measured by linear mixed model adjusted for confounders of aging phenotype either in protein analysis or in RNAseq analysis and were further filtered for significance calculation. Age beta coefficient for each protein or transcript were calculated from log2 normalized data on which a mixed linear regression model was applied. Thus, the age beta coefficient represents the mean log2 fold expression per year of age. LmerTest was used for calculating p-values from t-tests via Satterthwaite's degrees of freedom method. Any protein or transcript was represented as age-associated if the p-value for the protein or transcript was <0.05. P-values for multiple comparisons were adjusted using Benjamini-Hochberg method in R and adjusted p-values were reported on supplemental tables. Age-associated proteins and age-associated alternatively spliced transcripts were further analyzed into two categories, either age-association beta coefficient (<0) was under represented with age--indicating a decrease in the abundance of the protein with a year of age or age-association beta coefficient (>0) was over represented with age-indicating the abundance of the protein was increased with a year of age. For simplicity of reporting, we calculated the enrichment of these proteins/transcripts over the total ageassociated protein/transcripts and reported as underrepresented and overrepresented with age.

Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD011967.