Identification of novel mitochondrial and mitochondrial related genetic loci associated with exercise response in the Gene SMART study

Mitochondria supply intracellular energy requirements during exercise. Specific mitochondrial haplogroups and mitochondrial genetic variants have been associated with athletic performance, and exercise responses. However, these associations were discovered using underpowered, candidate gene approaches, and consequently have not been replicated. Here, we used whole-mitochondrial genome sequencing, in conjunction with high-throughput genotyping arrays, to discover novel genetic variants associated with exercise responses in the Gene SMART (Skeletal Muscle Adaptive Response to Training) cohort (n=62 completed). We performed a Principal Component Analysis of cohort aerobic fitness measures to build composite traits and test for variants associated with exercise outcomes. None of the mitochondrial genetic variants but nine nuclear encoded variants in eight separate genes were found to be associated with exercise responses (FDR<0.05) (rs11061368: DIABLO, rs113400963: FAM185A, rs6062129 and rs6121949: MTG2, rs7231304: AFG3L2, rs2041840: NDUFAF7, rs7085433: TIMM23, rs1063271: SPTLC2, rs2275273: ALDH18A1). Additionally, we outline potential mechanisms by which these variants may be contributing to exercise phenotypes. Our data suggest novel nuclear-encoded SNPs and mitochondrial pathways associated with exercise response phenotypes. Future studies should focus on validating these variants across different cohorts and ethnicities. AUTHOR SUMMARY Previous exercise genetic studies contain many flaws that impede the growth in knowledge surrounding change in exercise outcomes. In particular, exercise studies looking at mtDNA variants have looked at very small portions of the mitochondrial genome. Mitochondria are the ‘power house’ of the cell and therefore understanding the mitochondrial genetics behind adaptations to training can help us fill knowledge gaps in current research. Here, we utilised a new mitochondrial genetic sequencing technique to examine all mitochondrial and mitochondrial related genetic variations. We have shown that there were no mitochondrial specific variants that influenced exercise training however there were 9 related variants that were significantly associated with exercise phenotypes. Additionally, we have shown that building composite traits increased the significance of our association testing and lead to novel findings. We will be able to understand why response to training is so varied and increase the effectiveness of exercise training on a host of metabolic disorders.

these associations were discovered using underpowered, candidate gene approaches, and consequently have not 23 been replicated. Here, we used whole-mitochondrial genome sequencing, in conjunction with high-throughput 24 genotyping arrays, to discover novel genetic variants associated with exercise responses in the Gene SMART 25 (Skeletal Muscle Adaptive Response to Training) cohort (n=62 completed). We performed a Principal Component 26 Analysis of cohort aerobic fitness measures to build composite traits and test for variants associated with exercise 27 outcomes. None of the mitochondrial genetic variants but nine nuclear encoded variants in eight separate genes 28 were found to be associated with exercise responses (FDR<0.05) (rs11061368: DIABLO, rs113400963: 29 FAM185A, rs6062129 and rs6121949: MTG2, rs7231304: AFG3L2, rs2041840: NDUFAF7, rs7085433: 30 TIMM23, rs1063271: SPTLC2, rs2275273: ALDH18A1). Additionally, we outline potential mechanisms by 31 which these variants may be contributing to exercise phenotypes. Our data suggest novel nuclear-encoded SNPs 32 and mitochondrial pathways associated with exercise response phenotypes. Future studies should focus on 33 validating these variants across different cohorts and ethnicities.

40
we utilised a new mitochondrial genetic sequencing technique to examine all mitochondrial and mitochondrial 41 related genetic variations. We have shown that there were no mitochondrial specific variants that influenced 42 exercise training however there were 9 related variants that were significantly associated with exercise 43 phenotypes. Additionally, we have shown that building composite traits increased the significance of our 44 association testing and lead to novel findings. We will be able to understand why response to training is so varied 45 and increase the effectiveness of exercise training on a host of metabolic disorders.

48
Responses to exercise training depends on the type of exercise stimulus, and varies considerably between 49 individuals (1-3). This variability is tissue-specific, and may be explained by a combination of genetic variants, 50 epigenetic signatures, other molecular and lifestyle factors (4,5). Mitochondria are the key mediators of 51 intracellular energy and are involved in many essential cell metabolism and homeostasis processes (6) with 52 exercise training improving mitochondrial function and content (6-9).

53
The mitochondrial genome encodes 37 genes that are highly conserved but differ slightly amongst different 54 regional isolates (haplogroups) (10). Mitochondrial haplogroups and Single Nucleotide Polymorphisms (SNPs), 55 in conjunction with SNPs in mitochondrial-related genes (nuclear encoded mitochondrial proteins: NEMPs) have 56 previously been associated with athletic performance in highly trained populations and response to exercise 57 training in the general population (11). While these studies have advanced our understanding, they have primarily 58 utilised targeted genotyping technology such as candidate gene approaches, or Sanger sequencing to investigate 59 specific mitochondrial coding regions and NEMPs, such as NRF2 and PGC1α (12-15). Many of these studies also 60 lacked robust technical measures on aerobic fitness measures (9). As such, many of the identified variants have 61 not been replicated, and exercise-related genetic variants remain unknown (16).

62
To date, studies assessing mitochondrial DNA (mtDNA) variants and NEMPs pertaining to exercise training have 63 focused on protein-coding variants, with no studies looking at the more subtle effects of synonymous and non-64 coding changes (11,(17)(18)(19)(20). Further, these studies have often based haplogroup analyses on sequencing or 65 genotyping of the mitochondrial hypervariable region(s) (~500-1,000bp), with no consideration for the remaining 66 mitochondrial genome (~15,000bp) and the specific haplogroup of exercise participants. For instance, 3`UTR 67 (untranslated regions) variants that do not directly affect protein function may however affect translation, mRNA 68 shuttling to specific organelles, or epigenetic modification such as microRNA silencing (21). Intronic variants 69 may also lead to splice site changes directly contributing altered protein structure and function (22). As Next

70
Generation Sequencing has become more widely available and affordable, sequencing of the whole mitochondrial 71 genome (16,569 bp) is now feasible to uncover genetic variants associated with physical fitness phenotypes. When 72 used in combination with SNP genotyping arrays, it is possible to examine, not only the 37 mitochondrially-73 encoded genes, but variants within all nuclear NEMP genes simultaneously.

74
Therefore, the aim of the present study was to examine the association between genetic variants (i.e. mitochondrial 4 76 hypothesise that by utilising whole-mitochondrial sequencing, we will uncover novel genetic variants associated 77 with exercise responses.

79
Exercise responses and Principal Component Analysis (PCA)

80
Participant characteristics and response to exercise for all phenotypes are detailed in Table 1

90
There were 60 distinct haplogroups within the Gene SMART completed cohort of 62 participants. As such, there 91 were no statistically significant associations between the mitochondrial haplogroups with exercise response traits.

92
A summary table of the mitochondrial haplogroups found within the Gene SMART participants is shown in Table   93 2.       147 SI). 6 SNPs in 5 distinct genes were associated with ΔTT and 2 SNPs in 2 distinct genes were associated with 148 PC2. The most significant variant was rs2041840 associated with PC2 and located within NDUFAF7; we found 149 that the rs12712528 variant also within NDUFAF7 had a moderate correlation with rs2041840 (R 2 =0.5) Figure   150 3a. This variant was also trending towards significance in the Δ-Weight and Δ-VO 2max response phenotypes (Table   151 3). The T allele at rs2041840 was associated with a better response to exercise. The Locus Zoom plot (Fig 3b) 152 surrounding the MTG2 gene was also gene-rich with 11 proximal genes. The two associated variants (rs6062129 153 and rs6121949) were moderately correlated (R 2 =0.5), however there were no SNPs found within the proximal 154 genes. The locus zoom plot for the variants found within the AFG3L2 gene (Fig 3c)

169
In this study, we utilised state-of-the-art mitochondrial sequencing, along with high-throughput targeted 170 genotyping of mitochondrial-related variants encoded by the nucleus (NEMPs) to discover novel genetic variants 171 associated with responses to exercise. A total of 28 mitochondrial and 4,325 nuclear encoded mitochondrial 172 associated variants passed the nominal significance thresholds for the various candidate gene association tests.

173
We did not detect mitochondrial variants associated with exercise response, but we uncovered eight NEMPs in 174 seven distinct genes associated with exercise response.

175
Novel exercise loci

176
The most significant variant was associated with the composite exercise response phenotype and located within 177 an intron of NDUFAF7 (rs2041840). The T allele was associated with better exercise response as shown by the 178 positive beta values. NDUFAF7 encodes an arginine methyltransferase that is essential for mitochondrial complex 179 I assembly (24). We have showed that this variant was in a gene rich region with 8 proximal genes (Fig 3a),

180
indicating possible effects for this variant in any of the proximal genes or indeed for genes that may be further 181 away from the loci.

182
The two intronic variants within the MTG2 gene were found to be associated with the change in time trial measures 183 and appeared to be moderately linked (Figure 3b).

192
An intronic variant within AFG3L2 was also shown to be associated with the composite exercise response 193 phenotype (rs7231304), but this gene has not previously been associated with exercise response. However, 194 mutations in AFG3L2 have been shown to cause spinocerebellar ataxia through the development of mitochondrial 195 proteotoxicity (25, 26). As such, the intronic variation within this gene might inhibit exercise response through 197 (Fig 3c), however no genes within this locus shared a recombination rate above 10%. There were two proximal

198
SNPs with a moderate correlation to the SNP of interest also within the AFG3L2 genic region.

199
The T allele at the exonic rs7085433 variant in the TIMM23 gene was associated with the change in time trial

213
Overexpression of this protein has also been shown to cause elevated sphingolipid formation and therefore 214 mitochondrial autophagy (32). Much like the TIMM23 rs7085433 variant, the effect size for time to completion 215 in Time Trial (β = 587.7 seconds) indicated that carriers of T allele/genotype have slower TT and therefore poorer 216 response to exercise when compared to carriers of the C allele/genotype. We hypothesise that the C allele for this 217 variant may induce a novel miRNA binding site in the transcript resulting in the silencing of the SPTLC2 gene.

225
None of the mitochondrial genetic variants identified in this study were associated with exercise response in the 226 present study to a threshold of FDR<0.05. Additionally, we lacked enough statistical power to associate 227 mitochondrial haplogroup with exercise responses as the cohort was extremely heterogenous.

228
The g.A8701G variant within the ATP6 gene causes a missense change within its respective protein (p.Thr59Ala)

229
and has been well characterised in hypertensive cases (33). This variant was nominally significant in both the Δ-

230
LT phenotype and the PC3 composite trait within the cohort. As the Δ-LT trait was provides a smaller contribution 231 to PC3, the variant was assumed to be partially associated with a mixture of the Δ-TT and Δ-VO 2max phenotypes.

232
The effect size of this variant indicated a poor response to exercise training (β = -26.19 LT).

233
Interestingly, all the variants associated with PC4 were related to the utilisation of the amino acid Leucine. Firstly,

266
BMI and were moderately trained with an age range of (31.33 ± 7.94 years).

267
The Gene SMART study design has been previously reported (37). Briefly, participants were required to provide 268 medical clearance to satisfy the inclusion criteria. Following familiarisation, baseline exercise performance was 269 determined on a cycle ergometer during a 20 km time trial (TT), and two graded exercise tests (GXTs); these tests

270
were administered a few days apart and no more than two weeks apart to limit temporal variability in performance.

272
Genomic DNA was extracted for 77 participants regardless of completion status from 2.0mL of whole blood using

289
The ped file generated from Illumina GenomeStudio v2.0 software was converted into binary format. We did not 290 impute any genotypes to prevent false positive associations and a larger multiple testing burden.

302
Participant stratification into high and low response groups lead to a loss of statistical power in association testing.

303
As such, and to avoid classifying responders and non-responders via arbitrary thresholds, we chose to keep the 304 phenotypes as continuous variables for association testing (45).

305
To ascertain variants that were associated with exercise response for key physiological traits, we utilised the delta 306 (Δ) change (Post phenotype -Pre phenotype) quantitative trait data for; peak power output (ΔWpeak in Watts); 307 power at lactate threshold (ΔLT in Watts); peak oxygen uptake (ΔVO 2peak in mL/min/kg body weight); and time 308 to completion measurement for a 20 km time trial (ΔTT in seconds). As the quantitative traits were all continuous 310 response phenotypes, we conducted a Principal Component Analysis (PCA) of the response phenotypes using the 311 R package FactoMineR (46). PCA is a dimensionality reduction method that computes linear combinations of the 312 multiple response phenotypes into principal components (PCs) so that the variance between individuals is 313 maximised. Every individual is then represented by one value for each PC, considered a composite trait of the 314 different response phenotypes. A more detailed description of PCA for composite trait association testing is shown 315 in Supplementary Fig 1.

316
Missing phenotypic values were excluded from the phenotype table prior to PCA to prevent skewing of data and 317 to maintain appropriate PCs. Following the PCA, these variables were set as "missing" for the association analysis.

318
We also tested the individual response phenotypes and compared the significance levels of variants between the 319 composite traits with those within each PC. This resulted in 4 PCs that cumulatively explained > 90% of the 320 variance between participants.

321
Statistical analysis

322
Analysis of the response traits was performed in SPSS using a paired samples t-test. SPSS was also used to test 323 associations between mitochondrial haplogroups and exercise response with a Wald test. Analyses for the 324 mitochondrial SNPs and NEMP SNPs were kept separate for analysis using different association models. We used 325 PLINK V1.90p to perform quantitative linear association tests (95% CI) with both dominant and recessive models.

326
An additive model was also attempted but yielded the same results as our dominant model. We adjusted all 327 association results for age and effect sizes were determined using raw beta regression coefficient values (i.e.

328
genotype X is associated with β [unit specific to trait of interest] changes in the phenotype). Variants that passed 329 a nominal P value threshold of P<0.05 were considered for further analysis whereas variants that passed multiple 330 testing adjustment using the Benjamini-Hochberg False Discovery Rate (FDR<0.05) method were considered 331 significant associations. We performed adjustment for multiple testing for each phenotype separately. All variants 332 from the association tests were plotted in R using the tidyverse, ggplot2, and qqman packages.