Abstract
Mitochondrial dysfunction is involved in a wide array of devastating diseases, but the heterogeneity and complexity of these diseases’ symptoms challenge theoretical understanding of their causation. With the explosion of -omics data, we have the unprecedented ability to gain deep understanding of the biochemical mechanisms of mitochondrial dysfunction. However, there is also a need to make such datasets interpretable, and quantitative modelling allows us to translate such datasets into intuition and suggest rational biomedical treatments. Working towards this interdisciplinary goal, we use a recently published large-scale dataset to develop a descriptive and predictive biophysical model of progressive increase in mutant load of the MELAS 3243A>G mtDNA mutation. The experimentally observed behaviour is surprisingly rich, but we find that a simple, biophysically-motivated model intuitively accounts for this heterogeneity and yields a wealth of biological predictions. Our findings suggest that cells attempt to maintain wild-type mtDNA density through cell volume reduction, and thus energy demand reduction, until a minimum cell volume is reached. Thereafter, cells toggle from demand reduction to supply increase, upregulating energy production pathways. Our analysis provides further evidence for the physiological significance of mtDNA density, and emphasizes the need for performing single-cell volume measurements jointly with mtDNA quantification. We propose novel experiments to verify the hypotheses made here, to further develop our understanding of the threshold effect, and connect with rational choices for mtDNA disease therapies.
Author Summary
Mitochondria are organelles which produce the major energy currency of the cell: ATP. Mitochondrial dysfunction is associated with a multitude of devastating diseases, from Parkinson’s disease to cancer. Large volumes of data related to these diseases are being produced, but translation of these data into rational biomedical treatment is challenged by a lack of theoretical understanding. We develop a mathematical model of progressive increase of mutant load in mitochondrial DNA, for the mutation associated with MELAS (the most common mitochondrial disease), to address this. We predict that cells attempt to maintain the ratio of healthy mtDNA to cell volume by reducing their cell volume until they reach a minimum cell volume. As mutant load continues to increase, cells switch strategy by increasing their energy supply pathways. Our work accounts for large-scale experimental data and makes testable predictions about mitochondrial dysfunction. It also provides support for increasing mitochondrial content, as well as reduction in dependence upon mitochondrial metabolism via the ketogenic diet, as relevant treatments for mitochondrial disease.
Introduction
Mitochondria are organelles known for their role in the production of ATP, the major energy currency of the cell. Because of their role in biosynthesis [1] and energy supply, as well as their importance in cell death signalling [2], their dysfunction is implicated in a host of diseases, ranging from neurodegeneration [3] to cancer [4]. Fundamental understanding of these organelles and their dysfunction is therefore of far-reaching biomedical importance.
Mitochondria generate ATP by passing electrons along a series of complexes in their inner membrane, which pump protons across the membrane to generate an electrochemical gradient; this gradient is used by ATP synthase to convert ADP to ATP. The series of electron-transferring complexes is known as the electron transport chain (ETC), and this pathway of ATP generation is called oxidative phosphorylation (OXPHOS). Mitochondria also possess their own circular DNA (mtDNA), which is present in multiple copies per cell. Each genome encodes 13 proteins (subunits of complexes I, III and IV of the ETC and of ATP synthase), 22 tRNAs and 2 rRNAs. An important class of diseases affecting mitochondria are those caused by a mutation in mtDNA. The most common [5,6], and most studied, of these is MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes) syndrome, which is often associated with the 3243A>G mutation in a mitochondrial tRNA gene. Its prevalence shows large regional variability, ranging from 1:6000 in Finland [7] to 1:424 in Australia [8]. tRNAs affected by the mutation cause amino-acid misincorporations during translation, generating defective mitochondrial protein, and defective respiration, when mutant load (or heteroplasmy) is high [9].
A common feature in many diseases associated with mutations of mitochondrial DNA, including MELAS, is the non-linear physiological response of cells and tissues to increasing levels of mtDNA heteroplasmy. In particular, cells appear to be able to withstand high levels of heteroplasmy without showing any significant metabolic or physiological defect. For instance, fibroblasts possessing the MELAS mutation were shown to have unaffected respiratory enzyme activity until mutant load exceeded around 60% [10]. Also, Chomyn et al. [11] showed that oxygen consumption of cells does not significantly reduce until MELAS heteroplasmy exceeds ~90%. This observation has been named the threshold effect (reviewed in [12]).
It has been argued that the threshold effect occurs because mitochondria possess spare capacity at the translational, enzymatic and biochemical levels, which are each able to absorb some degree of stress, and thus delay the phenotypic response of increasing heteroplasmy, until a particular threshold heteroplasmy is exceeded, which is typically large [12]. Within this picture, each physiological feature (such as enzymatic activity or oxygen consumption), may be expected to display step-like behaviour with respect to increasing heteroplasmy. Asynchrony of thresholds between different features, such as 60% for enzyme activity [10] and 90% for oxygen consumption [11], may be explained by spare capacity at intermediate levels: a biochemical threshold effect in this example, where metabolic fluxes are altered to compensate for fewer functional enzymes [12].
A recent study published by Picard et al. [13] established 143B TK− osteosarcoma cell lines containing the MELAS 3243A>G mutation across the full dose response of mutant load. They measured a diverse array of features including RNA expression, protein expression, cell volume, growth rates, mitochondrial morphology and mtDNA content. The sheer diversity of data collected, across multiple levels of heteroplasmy, makes this an important dataset in understanding the threshold effect and mitochondrial dysfunction. Under the interpretation of the threshold effect presented above, one might expect a monotonic response to heteroplasmy, as spare capacity is depleted and the cell seeks alternative means of energy provision. However, Picard et al. observed complex multiphasic responses across numerous physiological readouts as heteroplasmy was increased [13].
The authors of that study identified four distinct transcriptional phases in the gene expression profiles of MELAS 3243A>G cells: 0%, 20-30%, 50-90% and 100% mutant load. They argue that continuous changes in heteroplasmy result in discrete changes in phenotype, because there exists a limited number of states that the nucleus can acquire in response to progressive changes in retrograde signalling [13]. In this work, by considering a distilled subset of the data from [13], and using simple, physically motivated arguments, we attempt to provide a simplified account of this dataset to gain better understanding of the consequences of this mutation and the threshold effect.
Our mathematical model suggests that cells attempt to maintain homeostasis in wild-type mtDNA density at low heteroplasmies, through reduction of cell volume and therefore cellular energy demand. We propose the existence of a single critical heteroplasmy, where cells are no longer able to maintain this homeostasis, and toggle from energy demand reduction to supply increase. In this regime, energy supply pathways are upregulated. Our model also identifies an additional bioenergetic transition, in excess of 90% mutant load, as cells become fully homoplasmic. We explore the possibility of reduced transcriptional activity in mutant mtDNAs/mitochondria, limited tRNA diffusivity, and a connection between cellular proliferation rate and cell volume, finding all of the above to have explanatory power. We propose new experiments to verify the novel hypotheses made here, to drive forward understanding of the threshold effect.
Results
Per-cell Interpretation of Omics Data Highlights Multiphasic Dynamics in Response to Heteroplasmic Load
We aim to understand mean cellular behaviour in response to rising levels of 3243A>G heteroplasmy. It is therefore important that the data we use to build this description is in per-cell dimensions. We perform normalisations of the data from [13] to create measures of overall gene expression of bioenergetic pathways from measurements of individual genes, and adjust for potential bias from variable cell volume induced by heteroplasmy (see Materials and Methods). Fig. 1 shows the core subset of physiological features from [13] we attempt to describe here, after these normalisations.
A simple interpretation of the threshold effect predicts the existence of spare capacity in the transcription, translation, enzyme complex and biochemical levels of the cell, in response to increasing heteroplasmy [12]. Under this interpretation of the threshold effect, we might expect all of these functions to have no more than one turning point with increasing heteroplasmy.
However, the data in Fig. 1, and indeed the dataset of Picard et al. [13] overall, show a much more complex response. For instance, ETC transcripts clearly show two turning points, suggesting some kind of transient compensatory response. The turning points are also asynchronous across features: for instance, ETC transcripts peak at h = 0.6, but glycolysis transcripts peak at h = 0.9. This highlights the need to extend our understanding of the threshold effect, as well as the challenge of modelling such a complex dataset parsimoniously.
We note that the measurement uncertainty, where reported in [13], for our selected features of interest is small relative to the variation with respect to heteroplasmy (see Fig. 1), justifying a non-linear fit to the data. It should be noted, however, that this uncertainty only reflects the technical variability in measurement, and does not include potential biological variability of these features. We use a Bayesian approach to account for this uncertainty appropriately (see Generative Model Description).
Integrated Omics Data Motivates a Model of the Causal Relationships between Bioenergetic Variables
We present a qualitative description of our model in Fig. 2, which we will develop into a full quantitative description below. One of our central claims is the existence of a single transition in cellular behaviour, in response to increasing heteroplasmy of the 3243A>G mutation, over the 0-90% heteroplasmy range. We propose that, at low heteroplasmy values, cells attempt to maintain homeostasis in wild-type mtDNA density by reducing their volume. This reduces biosynthetic and translational energy demands, by the simplifying assumption that energy demand scales directly with cell volume.
Our model suggests that at a critical heteroplasmy, h*, cells undergo a demand/supply toggle where energy supply is upregulated. Electron transport chain (ETC) transcripts are stabilized through reduced degradation, and glycolysis is increased. This bioenergetic compensatory behaviour at intermediate heteroplasmies allows cell volume to recover.
As heteroplasmy continues to increase, we claim that degradation of ETC transcripts becomes negligible. Thus, further increases in heteroplasmy result in a reduction in ETC protein content, and ETC exhaustion ensues.
These behaviours are captured in the mathematical framework of our model. However, as cells transition from 90% to 100% mutant mtDNA, another transition in cellular behaviour appears to occur, according to the data of [13]. Cells downregulate glycolysis, and yet retain cell volume and growth rate. The mode of energy production in this case is unclear, and opens new questions as to the most relevant energy supplies and demands in homoplasmic cells (see Key Claims and Predictions of Biophysical Model of Heteroplasmy).
Interactions between Bioenergetic Variables can be Cast as a Bottom-up Quantitative Model
We now present a quantitative description of our model, see Table 1, whose mechanistic interpretations will be more fully explored in Key Claims and Predictions of Biophysical Model of Heteroplasmy. Our model attempts to unify the experimentally-measured features of [13] within a simple, physically plausible, bottom-up cell biological representation. We stress that our choice of model structure was not developed independently of the data in [13]; hence, at the level of choice of model structure, we have limited control of over-fit. However, the uncertainty in its parameters, given the data and a set of priors (see Generative Model Description), was computed using Bayesian inference. So whilst our parametrization of the model has statistical control for uncertainty, we have not employed a statistical model selection framework. We believe this to be appropriate and practically unavoidable, as our objective is to yield a new reduced account for these heterogeneous data, present novel hypotheses and propose new experiments.
Wild-type mtDNA scaling
A theme apparent in the data of Picard et al. is an overall downward trend of ETC mRNA and ETC protein with increasing heteroplasmy (h). We therefore use the hypothesis that these quantities scale with the amount of wild-type mtDNA (N+). We assume that N=mtDNA copy number=const (set to 1 after normalisation, without loss of generality) which then defines N+, see Eq.(1). The successful performance of this simple model for N+ is shown in Figure S1. Note that this model has no free parameters, so was neglected in our Bayesian inference.
ETC mRNA
Transcript copy number is determined by the balance of transcription (β) and degradation (δm) rates. Given our assumption of N+ scaling, it can be shown (see Text S1) that Eq.(2) may be used to model the ETC mRNA pool size (METC). We further assume a constant mean transcription rate β for parsimony, and allow the degradation rate δm to vary with heteroplasmy in response to cellular signals. We require the degradation rate to be high for low heteroplasmies, and low at high heteroplasmies, to describe the ability of cells to upregulate their transcript copy number with rising heteroplasmy. A biologically-motivated choice of function which achieves this is a sigmoid, see Eq.(3), where kmRNA, km and h0 are constants.
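Eq.(2) and Eq.(3) are not reproduced in this text, so the following is only a minimal sketch of how they might be combined. It assumes a steady-state pool size METC = β N+ / δm with N+ = 1 − h, and a logistic form for the degradation rate. The logistic expression, the function names and the parameter values are illustrative assumptions, not the fitted model.

```python
import math

def degradation_rate(h, k_mrna, k_m, h0):
    """Assumed logistic form: near k_mrna well below h0, near 0 well above it."""
    return k_mrna / (1.0 + math.exp(k_m * (h - h0)))

def etc_mrna(h, beta, k_mrna, k_m, h0):
    """Steady-state ETC mRNA pool: transcription from wild-type mtDNA over degradation."""
    n_plus = 1.0 - h  # wild-type mtDNA, total copy number normalised to 1
    return beta * n_plus / degradation_rate(h, k_mrna, k_m, h0)

# Illustrative parameter values only (not posterior estimates).
params = dict(beta=1.0, k_mrna=1.0, k_m=10.0, h0=0.4)
for h in (0.0, 0.2, 0.4, 0.6):
    print(h, round(etc_mrna(h, **params), 3))
```

Under this assumed form, the pool tracks N+ at low heteroplasmy and rises once degradation falls away near h0, which is the qualitative behaviour the paragraph above asks of the sigmoid.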
ETC protein
It is intuitive to assume that mean protein levels scale with transcript levels, although this relationship may be noisy [14]. Following a similar assumption for ETC mRNA, we also assume that ETC protein (P+) scales with wild-type mtDNA levels. Using analogous arguments to those for METC (see Text S1), we show that a reasonable model for ETC protein is Eq.(4), where δp = const denotes the baseline degradation rate of mitochondrial protein.
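One way to read Eq.(4), which is not reproduced in this text, is as the steady state of a mass-action production term balanced by first-order degradation. The rate constant κ below is a hypothetical symbol introduced only for this sketch; the argument actually used is in Text S1 and may be set up differently.

```latex
\frac{dP^{+}}{dt} = \kappa\, M_{\mathrm{ETC}}\, N^{+} - \delta_p P^{+} = 0
\quad\Longrightarrow\quad
P^{+} = \frac{\kappa\, M_{\mathrm{ETC}}\, N^{+}}{\delta_p}
```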
Glycolysis mRNA
We assume that the glycolysis mRNA pool size (Mgly) is invariant to heteroplasmy up to a critical heteroplasmy h*, above which glycolysis is gradually upregulated as a result of cellular control. It is therefore parsimonious to model glycolysis regulation as a spline of a constant and a linear segment which toggles at h*, see Eq.(5), where c1 and m2 are free parameters, and the intercept of the linear segment is c2 = c1 − m2h*, by continuity.
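The constant-then-linear spline described above can be sketched as follows. The function name and the parameter values are illustrative assumptions; only the structure (constant c1 below h*, slope m2 above it, intercept fixed by continuity) comes from the text.

```python
def glycolysis_mrna(h, c1, m2, h_star):
    """Constant-then-linear spline for the glycolysis mRNA pool size."""
    c2 = c1 - m2 * h_star  # intercept of the linear segment, fixed by continuity at h_star
    if h <= h_star:
        return c1
    return m2 * h + c2

# Illustrative (not fitted) parameter values.
levels = [glycolysis_mrna(h, c1=1.0, m2=0.5, h_star=0.4) for h in (0.0, 0.4, 0.9)]
print(levels)
```

Because c2 is derived rather than fitted, the spline contributes three free parameters (c1, m2 and h*) instead of four.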
Cell volume
We propose that the energy demands of the cell may be well approximated as scaling with cell volume, see Text S1 for further discussion. As glycolysis and OXPHOS provide energy supply to first order, we assume that mean cell volume in an asynchronous population of cells (V) is effectively determined by a scaled sum of glycolysis and OXPHOS contributions to energy balance, such that the cell obeys an energy supply=energy demand relationship, see Eq.(6), where ko, kg=const.
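As a minimal sketch of the supply = demand relationship, cell volume can be written as a weighted sum of the ETC protein and glycolysis mRNA levels. The input values and weights below are hypothetical and chosen only to show how a fall in OXPHOS supply can be partly offset by glycolysis.

```python
def cell_volume(p_etc, m_gly, k_o, k_g):
    """Mean cell volume as a weighted sum of OXPHOS and glycolytic supply."""
    return k_o * p_etc + k_g * m_gly

# Hypothetical inputs: ETC protein falls while glycolysis mRNA rises.
v_low = cell_volume(p_etc=1.0, m_gly=1.0, k_o=0.6, k_g=0.4)
v_high = cell_volume(p_etc=0.5, m_gly=1.5, k_o=0.6, k_g=0.4)
print(v_low, v_high)
```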
From Fig. 1, it is clear that this assumption fails at h=1, where glycolysis levels and cell volume are comparable to h=0 levels, and yet ETC proteins are only 30% of wild-type levels. As ATP levels are below wild-type levels in these cells (see Fig. S6E in [13]), the mode of energy production is not clear and further metabolomic data may be required. We therefore exclude all h=1 data, and limit the domain of our model to 0 ≤ h ≤ 0.9.
Growth rate
We observe that the cellular proliferation rate (which we call growth rate, G, for consistency with [13]) varies with heteroplasmy (see Fig. 1). We hypothesize that there exists a relationship between mean cell volume and growth rate. It can be shown that, assuming individual cells increase their volume linearly through the cell cycle, growth rate varies inversely with mean cell volume (shown in Text S1). This is shown in Eq.(7), where kgr is a constant.
We show in Text S1 that, under an exactly exponential model of cell growth, G is independent of V. However, given that there is presumably a wide class of cytoplasmic growth-vs-time profiles which cells may obey, we use a linear model as a parsimonious example of how cell growth may be connected to cytoplasmic volume.
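The contrast between the two growth laws can be sketched as follows, under assumptions that are not stated in this text: each cell is born at volume V_0 and divides at 2V_0, the accumulation rates α and k are hypothetical constants that do not depend on heteroplasmy, and G is taken as ln 2 divided by the cell cycle duration T. The full argument is in Text S1 and may be set up differently.

```latex
% Linear growth: V(t) = V_0 + \alpha t, division when V = 2 V_0
T_{\mathrm{lin}} = \frac{V_0}{\alpha}, \qquad
G = \frac{\ln 2}{T_{\mathrm{lin}}} = \frac{\alpha \ln 2}{V_0} \;\propto\; \frac{1}{\bar{V}}
\quad \text{since } \bar{V} \propto V_0 .

% Exponential growth: V(t) = V_0 e^{k t}, division when V = 2 V_0
T_{\mathrm{exp}} = \frac{\ln 2}{k}, \qquad
G = \frac{\ln 2}{T_{\mathrm{exp}}} = k
\quad \text{(independent of } V_0 \text{ and hence of } \bar{V} \text{)} .
```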
Maximum respiratory capacity
It has long been recognised that cells carrying the MELAS mutation experience a respiratory defect when heteroplasmic load exceeds approximately 90% [11, 15], and that this is due to a defect in protein synthesis [9, 11]. We therefore assume that maximum respiratory capacity (Rmax) is always determined by protein content. This yields a simple linear expression, see Eq.(8), where kp=const.
Model summary
In summary, our model of mean cellular behaviour with respect to heteroplasmy describes 7 features from Picard et al. [13] (N+, METC, P+, Mgly, V, G, Rmax) and has 12 adjustable parameters (as discussed later, this is fewer than the number required for 7 linear models), a table of which is shown in Table S2. In writing down this phenomenological model, we have attempted to account for a physiologically important subset of the data generated in [13], using bottom-up arguments wherever possible. In doing so, a number of novel, falsifiable hypotheses are made.
Parametrizations of a Simple Biophysical Model Account for Complex Observations Across Range of Heteroplasmic Load
The fit of the model described above is shown in Fig. 3. For 0 ≤ h ≤ h*, where h* is the critical heteroplasmy at which glycolysis is upregulated (0.34 ≤ h* ≤ 0.44, 25-75% CI), our model reproduces the reduction in ETC transcript pool size. Similarly, we observe that ETC protein pool size also reduces, as do cell volume and maximum respiratory capacity.
Our model is able to successfully capture the transient compensatory responses in ETC mRNA, ETC protein and cell volume which begin around the critical heteroplasmy h*. For heteroplasmies between h* and h ≈ 0.5, ETC mRNA degradation reduces, causing ETC mRNA to be upregulated, along with ETC protein and maximum respiratory capacity. In this region, glycolysis becomes induced above wild-type levels, and cell volume can be observed to also recover.
In excess of h ≈ 0.5, our model shows the observed reductions in ETC mRNA, ETC protein and maximum respiratory capacity. We see that continued upregulation of glycolysis mRNA allows cell volume to remain at an approximately constant value, although diminished relative to a wild-type cell. Consequently, heteroplasmic cells with 0.2 ≤ h ≤ 0.9 are predicted to proliferate at a faster rate than wild-type cells (see Fig. 3E).
Key Claims and Predictions of Biophysical Model of Heteroplasmy
Here, we revisit the interpretations of our model in light of the mathematical description developed above, and explore the evidence for the biological insights it provides. We make experimental proposals to validate our claims, which are given in Text S2. The set of mechanistic interpretations which follow from our mathematical model, see Fig. 4, are:
Wild-type mtDNA density is maintained homeostatically at low heteroplasmy
There exists a minimum cell volume which is approached at the critical heteroplasmy
Cells toggle from demand reduction (i.e. cell volume reduction) to supply increase (i.e. glycolysis and ETC mRNA upregulation), at the critical heteroplasmy
Mutant mtDNAs do not significantly contribute to the mitochondrial mRNA pool
Mitochondrial tRNAs remain moderately localised to their parent mtDNA
Maximum respiratory capacity is determined by ETC protein levels through a linear relationship
Cell growth rate is the reciprocal of mean volume, thus smaller cells grow faster
Wild-type mtDNA density homeostasis is maintained until a minimum volume is reached near the critical heteroplasmy
The parameter h* is the mutant load at which the cell begins to upregulate ETC mRNA and glycolysis mRNA. But what causes this change in behaviour at this particular value of heteroplasmy? By examining the posteriors of our model fit (Fig. 3), we infer that cell volume takes its minimum value shortly before the most probable value of h* (see Fig. 5). We hypothesize that an attempt to conserve wild-type mtDNA density (N+/V) determines the position of h*.
For h < h*, wild-type mtDNA density is maintained despite increasing heteroplasmy, because cell volume diminishes. As a result of this reduced demand, the cell can tolerate diminished mitochondrial power supply. However, cell shrinkage cannot continue indefinitely and we hypothesize that the cell reaches a minimum cell volume at h ≈ h*. Once heteroplasmy exceeds this value, the cell toggles its energy balance strategy from demand reduction to supply increase, and the cell recovers in volume.
There is evidence in the literature that wild-type mtDNA density is an important quantity. Bentlage and Attardi [15] observed that long-term culture of heteroplasmic MELAS cells resulted in an increase in mtDNA copy number, resulting in increased oxygen consumption. Whilst this was often accompanied by a decrease in heteroplasmy, some cell lines also exhibited this at constant heteroplasmy. This is consistent with the cell attempting to increase the absolute number of wild-type mtDNAs, perhaps to compensate for heteroplasmic load, and suggests that the absolute value of N+ is a physiologically important quantity.
The density of mitochondrial content per unit cytoplasmic volume has been observed by many authors to be tightly regulated and physiologically predictive. The historical observations of Posakony et al. [16] showed that the mean ratio of mitochondrial content to cytoplasmic volume is kept relatively constant throughout the cell cycle in HeLa cells, occupying ~10-11% of cytoplasmic area throughout. Similar observations have been reproduced in more recent studies, in various other systems. Rafelski et al. found in budding yeast that mitochondrial content was proportional to bud size, and that all buds attain the same average ratio regardless of the mother’s age or mitochondrial content [17], suggesting a stable scaling relation. Also, Johnston et al. [18] found that the density of mitochondrial mass was predictive of cell cycle dynamics, indicating that N/V (N=total number of mtDNAs) is physiologically relevant and potentially linked to cell energy supply and growth dynamics. Indeed Jajoo, Paulsson and co-workers [19] found that the density of mitochondrial DNA tracks the quantity of cytoplasm inherited upon division in wild-type fission yeast. Finally, Otten et al. found a positive correlation between cell volume and mtDNA copy number in zebrafish oocytes [20].
We may speculate as to the interpretation of a minimum cell volume. One straightforward interpretation is that a minimum cell volume corresponds to a mechanical constraint: a cell may only become so small because the machinery required to perform tissue-specific metabolic and structural tasks requires a minimum amount of space.
An alternative to this is a bioenergetic minimum cell volume. Numerous historical studies have shown that there exist appreciable energy demands which do not scale linearly with volume [21–23]; for instance, processes which only serve the nucleus such as DNA replication, or demands associated with the plasma membrane. If a unit volume of cytoplasm has a particular energy output, which satisfies the energy demand of that unit volume plus an energy surplus, then continued reduction of cell volume results in the total energy surplus of the cytoplasm being unable to meet the demands of the nucleus and plasma membrane. At this bioenergetic minimum cell volume, the nucleus may signal to increase energy production pathways to restore the energy surplus of a unit volume of cytoplasm. If we assume that power supply per unit volume must be maintained, then as cells become smaller in radius (r), surface area demands per unit volume scale with r²/r³ = 1/r, whereas constant energy demands per cell per unit volume scale with 1/r³. In this way, demands associated with cell surface area may be the first to become prohibitive as cells reduce in size, more so than constant demands which scale with a larger negative power of r. Both mechanical and bioenergetic limits no doubt exist, but which of these constraints is first encountered upon volume reduction is open.
ETC mRNA degradation diminishes at the critical heteroplasmy contributing to energy demand/supply toggle
The induction of glycolysis at the critical heteroplasmy is observed in our model by construction, see Eq.(5), since glycolysis is modelled to increase linearly when heteroplasmy exceeds this point. However, by observing the posterior distribution of the ETC mRNA degradation rate (see Fig. 6), we see that the critical heteroplasmy also coincides with the beginning of reduction in ETC transcript degradation with respect to heteroplasmy. Since ETC mRNA pool size varies with the inverse of this degradation rate (see Eq.(2)), ETC transcripts are consequently upregulated in tandem with glycolysis transcripts. This occurs until ETC degradation diminishes to negligible levels around h ≈ 0.5, where this particular control mechanism becomes exhausted. Thus, the critical heteroplasmy coincides with a shift from energy demand reduction, to supply increase from both glycolysis and OXPHOS contributions.
Since mtDNA is transcribed as a single polycistronic transcript [24], the stoichiometry of individual mRNA species must be controlled via active degradation. This is achieved by a balance between processes which stabilize and degrade mRNA [25]. The Picard data set can be explored further to seek corroborating evidence, by observing the ratio of ETC mRNA degraders to stabilizers. We find a qualitative similarity between this ratio (see Figure S7) and the posterior distribution of the ETC degradation rate (see Fig. 6), both displaying a substantial reduction between h=0.3 and h=0.5.
Mutant mtDNAs do not significantly contribute to the mitochondrial mRNA pool
Our hypothesis that ETC mRNA transcript pool size is proportional to wild-type mtDNA copy number, i.e. METC ∝ N+, was invoked as a simple explanation for the overall downward trend with heteroplasmy. We favoured this explanation over, for instance, allowing the transcript birth rate to decrease with heteroplasmy, as such behaviour would contradict the behaviour of the degradation rate, which acts to increase transcript pool size. The implication of this model is that mutant mtDNAs do not contribute strongly to the transcript pool, either through a transcription defect or through selective degradation of all transcripts from mutated mtDNAs; the precise mechanism is not prescribed by the model. (We discuss alternative models to N+ scaling for METC in Alternative Hypotheses.)
Mitochondrial tRNAs are relatively localised to their parent mtDNA
It has been observed that homoplasmic MELAS cells are able to translate mitochondrially-encoded proteins; however, misincorporations cause these translation products to become unstable [9]. Assuming rapid degradation of such proteins, mutant mtDNAs are not expected to contribute strongly to the total ETC protein content of the cell, which has been confirmed by experimental observation in homoplasmic cells [9]. We adopted the model of ETC protein being proportional to wild-type mtDNA copy number, i.e. P+ ∝ N+, as a parsimonious model for such scaling.
One interpretation of this assumption is that we can identify Eq.(4) as obeying mass-action kinetics between ETC mRNA and wild-type mtDNA molecules, with a constant baseline degradation rate. A simple interpretation of this is that ETC mRNAs must come into the proximity of wild-type mtDNAs to be translated.
One way in which this might be achieved is if tRNAs remain spatially localised to their parent mtDNA; in other words, tRNAs have low diffusivity. ETC mRNAs which come into contact with mutant mtDNAs are translated into mutated protein only, since mutated tRNAs are much more available there; these proteins are then rapidly degraded. Conversely, mRNAs which localise with wild-type mtDNAs are only translated into normal protein, since only wild-type tRNAs are available. (We discuss alternative models to N+ scaling for P+ in Alternative Hypotheses.)
Evidence in the literature for this claim is mixed. It has been observed that mitochondrial mRNAs, such as ND6, localise to mtDNA, suggesting that mtDNA may be a site for mitochondrial translation [26]. However, cybrids homoplasmic for the tRNA mutations 3243A>G and 4269A>G recover respiratory function when fused together to form hybrids [27]. This recovery is presumably due to the diffusion of the healthy form of each tRNA, so that normal proteins may be translated. See Text S2 and Table S1 for experimental suggestions to determine the extent of tRNA diffusivity.
Cell volume is not explained by cell cycle variations
Our model predicts that cells, on average, change their size as heteroplasmy is varied, due to variation in power supply from OXPHOS and glycolysis. However, since cells vary their volume by a factor of 2 throughout the cell cycle, it is possible that cells with different heteroplasmies spend different durations at various stages of the cell cycle, which would explain the observed variation in expected cell volume with heteroplasmy (see Fig. 1D). We sought evidence for this hypothesis by computing the ratio of expression levels for genes associated with different stages of the cell cycle [28], see Figure S8. However, we found little evidence to support the enrichment of cell cycle markers at any particular level of heteroplasmy.
OXPHOS contributions to energy supply are stabilized at the critical heteroplasmy
The relative contribution of OXPHOS to energy supply, i.e. koP+/(koP+ + kgMgly), is also interesting to observe as heteroplasmy is varied. We observed a transient stabilization in OXPHOS contributions around h*. A discussion of this is presented in Text S3.
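The ratio above can be computed directly from the two supply terms. In this minimal sketch the function name, inputs and weights are hypothetical illustrations rather than fitted values.

```python
def oxphos_fraction(p_etc, m_gly, k_o, k_g):
    """Fraction of modelled energy supply attributed to OXPHOS."""
    oxphos = k_o * p_etc
    glycolysis = k_g * m_gly
    return oxphos / (oxphos + glycolysis)

# Hypothetical values: reduced ETC protein, induced glycolysis.
frac = oxphos_fraction(p_etc=0.5, m_gly=1.5, k_o=0.6, k_g=0.4)
print(round(frac, 3))
```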
Cells proliferate inversely with their size
Due to our reciprocal model connecting cell volume and growth (see Eq.(7)), our model suggests that wild-type cells proliferate more slowly relative to heteroplasmic cells due to their larger size.
Maximum respiratory capacity linearly tracks ETC protein content
It has long been suggested that cells above a particular threshold heteroplasmy experience a respiratory defect [10, 11, 15]. In our model, we found that a simple linear relationship between ETC protein and maximum respiratory capacity was sufficient to describe the data available (see Eq.(8)). With a more classical interpretation, we might have expected the need to deploy a model which has switching behaviour in excess of 60% heteroplasmy [10] for maximum respiratory capacity, in analogy with glycolysis transcript levels (see Eq.(5)).
Reactive oxygen species may explain the transition to homoplasmy but mode of energy production remains unclear
In Eq.(6), we claim that cell volume is determined by the weighted sum of glycolysis transcripts and ETC protein. Over the range 0.9 < h ≤ 1, glycolysis transcripts reduce by 57%, whereas ETC protein and cell volume remain comparable, thus breaking the supply=demand relationship, as we have modelled it. Consequently, our model fails to describe the transition from h=0.9 → 1.
A potential explanation for the reduction in glycolysis transcripts over this range comes from the fact that glycolysis provides substrate for oxidative phosphorylation. Damaged electron transport chain proteins may produce an excess of reactive oxygen species (ROS) [29], which can damage mitochondrial proteins, DNA and membranes. If, at high heteroplasmy, any flux through the electron transport chain causes high levels of ROS, then cells may attempt to reduce flux through glycolysis to avoid production of these species.
Some evidence from Picard et al. supports this hypothesis, where superoxide dismutase (SOD) activity is largely constant with heteroplasmy, except for homoplasmic mutant MELAS cells, which have ~ 20% higher SOD activity than wild-type cells (see Fig. S7D of [13]). Furthermore, it is known that ROS can reversibly inhibit the activity of GAPDH, one of the enzymes involved in glycolysis [30, 31].
However, given that fatty acid oxidation (see Figure S9) is strongly downregulated over this range, it remains unexplained how homoplasmic mutant cells maintain their cell volume (and growth rate), given their reduced reliance upon mitochondrial and glycolytic metabolism. Further metabolomic measurements may be required to uncover this mode of energy production.
ATP levels are also observed to decrease over this range (see Fig. S6E of [13]), which may even suggest that an alternative fuel currency besides ATP supports the growth and size of these cells. However, more careful investigation of this observation may be of value, since it is important to draw the distinction between ATP pool sizes and ATP fluxes, the latter perhaps being more indicative of ATP usage, and the former being indicative of only relative production/consumption rates.
Alternative Hypotheses
Here we explore several alternative hypotheses, and point out a number of reservations in accepting these alternatives over the model presented above.
Wild-type mtDNA copy number scaling for ETC mRNA
Our claim that ETC mRNA scales linearly with wild-type mtDNA copy number (see Eq.(2)) was made for parsimony, but other models exist. mtDNA exists in multiple copies per organelle; one estimate is that there are between 4 and 40 mtDNAs per mitochondrion [32]. If we suppose that mitochondria containing only mutated mtDNA are unable to transcribe, and that all other mitochondria are able to transcribe, then this results in a non-linear scaling of METC with h. Instead of scaling with N +, ETC mRNA would be governed by the probability that 100% of mtDNAs per organelle are mutated (p100, which follows a binomial distribution with probability h), since only organelles that are not fully mutant would contribute. We might think of this as an organellar threshold effect. This model is plausible if we assume that mitochondria power their own transcription, and even a single wild-type mtDNA is able to power transcription for the entire organelle.
However, since there exists a range of possible values for the number of mtDNAs per mitochondrion in a cell, this non-linearity is likely to be somewhat smoothed out. So having ETC mRNA scaling with N + is still plausible at a cellular level, even under this hypothesis, and is more parsimonious.
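The smoothing argument can be illustrated numerically. The following is a minimal sketch, assuming that each of the n mtDNAs in an organelle is independently mutant with probability h (so that p100 = h^n), and that organelle copy numbers between 4 and 40 are equally likely; the equal weighting and the function names are illustrative assumptions, not part of the model in the main text.

```python
def p100(h, n):
    """Probability that all n mtDNAs in one organelle are mutant.

    Binomial probability of n mutants out of n draws, each mutant with
    probability h (independence between mtDNAs is an assumption).
    """
    return h ** n


def transcribing_fraction(h, n_values):
    """Mean fraction of organelles holding at least one wild-type mtDNA.

    Averages 1 - p100 over the supplied copy numbers with equal weight;
    the equal weighting is an illustrative assumption.
    """
    return sum(1.0 - p100(h, n) for n in n_values) / len(n_values)


if __name__ == "__main__":
    copy_numbers = range(4, 41)  # 4 to 40 mtDNAs per mitochondrion [32]
    for h in (0.0, 0.5, 0.9, 0.99, 1.0):
        fixed = 1.0 - p100(h, 10)  # every organelle has exactly 10 mtDNAs
        mixed = transcribing_fraction(h, copy_numbers)
        print(f"h={h:.2f}  n=10: {fixed:.3f}  n=4..40: {mixed:.3f}")
```

Comparing the fixed-n column with the mixed column indicates how averaging over a range of copy numbers changes the sharpness of the organellar threshold under these assumptions.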
We also investigated the possibility that ETC mRNA scales with N + + μN −, where N −=the number of mutant mtDNAs (see Text S4), where we constrained 0 ≤ μ ≤ 1. We found large support for values of μ close to 1 (see Figure S3), but draws from the posterior distribution of METC were often purely linear and thus inappropriate for understanding threshold effects (see Figure S15). For this reason, we rejected the mutant transcription model in favour of the model presented in the main text.
Wild-type mtDNA copy number scaling for ETC protein
A similar organellar threshold argument of scaling with the probability that 100% of mtDNAs per organelle are mutated, p100 scaling (instead of N +), may hold if mitochondria power their own translation. This would relax the need to invoke low tRNA diffusivity. But, again, for parsimony we favoured N + scaling.
We also explored the ability of misincorporation effects to explain the ETC protein data, as opposed to tRNA localisation (see Text S4). If tRNAs are well mixed, then an ETC protein may possess a tolerance to the number of misincorporations per protein. Upon performing Bayesian inference, we found that the most likely tolerance to the MELAS mutation was 100% of residues: in other words, ETC proteins are immune to the MELAS mutation. This has been shown experimentally to be incorrect [9]. Furthermore, draws from the posterior distribution of METC were often purely linear and thus inappropriate for understanding threshold effects (see Figure S15). For these reasons, we rejected the tRNA misincorporation model in favour of the model presented in the main text.
Discussion
In this study, through use of a distilled subset of data from [13] and using minimal arguments, we have attempted to explore the apparent marked difference between the complex multiphasic observations of Picard et al., and the classical step-like models associated with the threshold effect.
We argue that a single critical heteroplasmy, h*, is sufficient to explain this subset of data over the heteroplasmy range 0 ≤ h ≤ 0.9 and that other multiphasic behaviour arises naturally from the simple physical/biological assumptions of our model. Our model suggests that cells undergo an energy demand/supply toggle at h*, from demand reduction to supply increase. We hypothesize that homeostasis in wild-type mtDNA density is maintained via cell volume reduction, ensuring that the available functioning power sources are matched to a corresponding level of cellular demand, until a minimum cell volume is reached which coincides with h*. This triggers the demand/supply bioenergetic toggle where energy production pathways are upregulated. We believe this re-emphasizes the need for quantification of single-cell mtDNA content to be associated with volume measurements of the same cell: mtDNA density is a relevant physiological variable [16–20]. We find that the mode of energy production over the range 0.9 ≤ h ≤ 1 is unclear, and that further metabolomic investigations may be required to determine this.
Our model further generates hypotheses that mutant mtDNAs (or alternatively homoplasmic organelles) have a reduced contribution to transcription, that tRNAs have low diffusivity, and that a relationship exists between mean cell volume and cell growth. We have proposed novel experiments to verify these hypotheses, to further develop our understanding of the threshold effect.
A potential consequence of our predictions is that mtDNA copy number, wild-type mtDNA copy number, or wild-type mtDNA copy number density could be valuable control axes in therapy, with the aim of ensuring optimal values of wild-type mtDNA copy number density. Increasing mitochondrial DNA copy number, for instance through activation of the PGC-1α pathway, may facilitate the increase of cell volume, deferring the critical heteroplasmy to higher values by delaying the approach towards a minimum cell volume. We might reason that this enhances a wild-type phenotype at higher heteroplasmy values, potentially deferring the full MELAS phenotype, which typically appears between ~50-90% mutant load [33], to higher heteroplasmies. Indeed, it has been found that increasing mitochondrial biogenesis can ameliorate mitochondrial myopathy in vivo [34].
We might also argue that as cells toggle from energy demand reduction to supply increase, further bolstering of this compensatory response may have clinical significance. For instance, since we observe that cells switch to glycolytic metabolism to compensate for diminishing mitochondrial power supply, further encouragement of this energy mode may be therapeutic. This is supported by the recent observation that promoting the hypoxia response is protective against multiple forms of respiratory chain inhibition [35]. Alternatively, since we predict that cells innately downregulate ETC mRNA degradation, seeking to upregulate mitochondrial transcription may aid the cell in maintaining a sufficient mRNA pool size. Furthermore, promoting alternative energy production pathways such as fatty acid oxidation via the ketogenic diet may also aid in reducing the dependence on oxidative phosphorylation. This diet has been associated with increased mitochondrial transcripts [36] and mitochondrial content [36,37], and has been shown to slow mitochondrial myopathy progression in transgenic Deletor mice [37]. Indeed, the diet has recently been used in the clinic as an adjunctive therapy for a MELAS patient harbouring the 3260A>G mutation, where it successfully decreased the frequency of seizures and stroke-like episodes [38].
Materials and Methods
Data Normalization
Bioenergetic pathways, such as glycolysis or oxidative phosphorylation, consist of a set of enzymes, whose corresponding genes may be correlated in their expression. To obtain a measure of the overall expression level of a pathway, we use the mRNA concentration (in RPKM, reads per kilobase of transcript per million mapped reads) of each gene encoding an enzyme of the pathway (ei,k(h), for gene i and technical replicate k at heteroplasmy h) and take a normalized sum (Eq.(9)), where nr is the number of technical replicates and N is the number of genes in the pathway of interest. This quantity normalizes the expression level of each gene to its h=0 level, to avoid effects from consistently highly-expressed genes. The factor of 1/N ensures that the quantity takes the value 1 at h=0, so it may be interpreted as a fold-change in expression relative to h=0.
The standard error of this normalized sum is given by Eq.(10), in terms of the sample variance over xk. Eq.(9) and Eq.(10) are applied to glycolysis and ETC mRNAs in our main model, yielding dimensionless, normalized measures of transcript levels for each biological pathway.
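As an illustration of this normalization, the sketch below computes a pathway-level fold-change and its standard error from per-gene, per-replicate expression values. The order of operations is an assumption: each gene is divided by its replicate-averaged h=0 level, genes are averaged within each replicate to give xk, and the xk are then averaged over replicates. The function and argument names are hypothetical.

```python
import math
from statistics import mean, variance


def normalized_pathway_expression(expr_h, expr_0):
    """Return (fold-change, standard error) for one pathway at heteroplasmy h.

    expr_h[i][k]: expression of gene i in technical replicate k at h.
    expr_0[i][k]: the same quantity at h=0.
    Requires at least two replicates, since a sample variance is needed.
    """
    n_genes = len(expr_h)
    n_rep = len(expr_h[0])
    # Replicate-averaged h=0 level of each gene (assumed normalizer).
    baseline = [mean(gene) for gene in expr_0]
    # x_k: average over genes of the h=0-normalized expression in replicate k.
    x = [
        sum(expr_h[i][k] / baseline[i] for i in range(n_genes)) / n_genes
        for k in range(n_rep)
    ]
    # Standard error of the mean from the sample variance over x_k.
    sem = math.sqrt(variance(x) / n_rep)
    return mean(x), sem


if __name__ == "__main__":
    h0 = [[10.0, 12.0, 8.0], [100.0, 90.0, 110.0]]  # made-up values
    print(normalized_pathway_expression(h0, h0))
```

By construction the returned fold-change is 1 when the h=0 data are passed for both arguments, which matches the interpretation given above.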
ATP synthase is excluded from both mRNA and protein data, as it is expected to be regulated differently from other ETC proteins. This difference arises because mitochondrial membrane potential is required for cell growth [39], and glycolytic ATP may be used, even in cells without mtDNA, by ATP synthase to maintain membrane potential [40]. Thus, protein levels of ATP synthase may be expected to be regulated quite differently to those of the electron transport chain, and not generally indicative of respiratory activity.
For ETC protein, we simply use the sample mean of complexes I, III and IV, since the data given by Picard et al. [13] is already normalized.
Data Transformation to Per-Cell Dimensions
The data of Picard et al. [13] that we consider consist of RNA-seq and Western blot measurements of mRNA and protein levels respectively. We wish to model the bioenergetic strategy of an average cell, so it is important that the data we use to parametrize our model are of per-cell dimensions. We show in Text S5 that it is appropriate to multiply protein and transcript data by cell volume to gain per-cell dimensions.
Error Propagation
Our work focuses on describing mean behaviour with respect to heteroplasmy, so uncertainty in this mean must be quantified. For Mgly and METC, we used error propagation on the normalised transcript levels (see Eq.(9)) and V to derive the volume-adjusted transcript uncertainties for the data in Fig. 1, where the standard error of the normalised transcript level is defined in Eq.(10) and sV is the SEM for cell volume (raw data provided by Martin Picard). For the case of ETC protein data, since the corresponding experiments in Picard et al. [13] had only a single technical replicate, we derived an uncertainty by simply multiplying the normalised protein value (see Data Normalization) by sV.
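For orientation, a standard first-order error propagation for a product of two quantities is sketched below. It assumes independent errors in the normalised transcript level and in the cell volume; the symbols \tilde{M}, \bar{x} and s_{\bar{x}} are illustrative notation rather than the notation of the original equation.

```latex
\tilde{M}(h) = \bar{x}(h)\,V(h),
\qquad
s_{\tilde{M}}(h) \approx \tilde{M}(h)
\sqrt{\left(\frac{s_{\bar{x}}(h)}{\bar{x}(h)}\right)^{2}
    + \left(\frac{s_{V}(h)}{V(h)}\right)^{2}}
```

Setting s_{\bar{x}} = 0 in this expression reduces the uncertainty to \bar{x}(h)\,s_V(h), which has the same form as the treatment of the ETC protein data described above.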
Growth Rate Determination
The speed with which cells proliferate is dependent upon heteroplasmy, as can be seen in Figure S11. However, by day 6 of growth, cell growth appears to change its behaviour, with evidence of saturation; we therefore truncate the raw data to day 5 and calculate the exponential growth rate by linear regression in log-lin space.
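The truncation and log-linear regression can be sketched as follows, assuming the raw data are cell counts recorded on numbered days; the function name, argument names and the example values are hypothetical.

```python
import math


def exponential_growth_rate(days, counts, last_day=5):
    """Least-squares slope of ln(count) against time, using days <= last_day.

    Truncating at last_day discards later time points, where growth may
    have begun to saturate.
    """
    pts = [(t, math.log(c)) for t, c in zip(days, counts) if t <= last_day]
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    y_mean = sum(y for _, y in pts) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in pts)
    return sxy / sxx  # growth rate per day


if __name__ == "__main__":
    days = list(range(8))
    # Synthetic counts: exponential growth that plateaus after day 5.
    counts = [1000.0 * math.exp(0.5 * min(t, 5)) for t in days]
    print(exponential_growth_rate(days, counts))
```

Including the plateau points in the fit would bias the estimated rate downwards, which is the motivation for the truncation.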
Generative Model Description
We used a Bayesian framework to find the supported parameter values given the data, using the Metropolis-Hastings algorithm [41]. To do this, we included an additional 6 noise parameters, for the features where parameter inference was performed (i.e. all of the features except N+, which has no free parameters, see Eq.(1)). For these 6 features (METC, P +, Mgly, V, G, Rmax), we assumed that the data were generated subject to Gaussian noise (see Generative Model Description).
Thus, the full statistical model contains 12 parameters (excluding the 6 noise parameters, one for each feature), with 32 data points which enter the likelihood (after excluding h=1 data). To summarise, counting the 6 features which have free parameters, the model consists of 12/6=2 mean parameters per feature, on average. Note that simply fitting linear models to the 6 features in Fig. 1 would also require 2 parameters per feature. The model fit is shown in Fig. 3.
To connect our model of mean cellular behaviour to the data of Picard et al. [13], we assume that the sample mean of feature i (yi,j) at a discrete value of heteroplasmy h=j is generated via Gaussian noise whose mean is given by the model for that feature, which is one element of the set of models for the 6 features. We stress that the data we train our model on, yi,j, is the sample mean, rather than the raw data. This is a less common approach; however, we believe that it is appropriate, as individual replicates only give us information on the technical variability measured in [13], whereas the total error is a combination of both technical and biological variability. Training our models on individual replicates would be likely to underestimate the true variability of the data, so we favoured training on the sample mean only. This raises the challenge of establishing an appropriately permissive model for our uncertainty in σi.
We can infer the distribution of the parameters (θ) of these models, given the data yi,j, using Bayes' rule and a prior distribution over θ (P (θ))
The log-likelihood in this case is
We drop the constant Σi,j − 1/2 log(2π) from our log-likelihood, since we will only be interested in differences in the log-likelihood to perform Bayesian inference using the Metropolis-Hastings algorithm [41].
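A log-likelihood of this kind can be sketched as below, assuming independent Gaussian noise with one standard deviation σi per feature and with the constant −1/2 log(2π) terms omitted. The data layout, the function name and the use of callables for the feature models are illustrative assumptions.

```python
import math


def log_likelihood(data, models, theta, sigma):
    """Gaussian log-likelihood up to an additive constant.

    data[i]   : list of (h_j, y_ij) pairs for feature i (sample means).
    models[i] : callable f_i(h, theta) giving the model mean of feature i.
    sigma[i]  : noise standard deviation of feature i (must be > 0).
    The -0.5*log(2*pi) term per data point is dropped, since only
    differences in log-likelihood matter for Metropolis-Hastings.
    """
    total = 0.0
    for i, points in enumerate(data):
        for h, y in points:
            resid = y - models[i](h, theta)
            total += -math.log(sigma[i]) - resid ** 2 / (2.0 * sigma[i] ** 2)
    return total


if __name__ == "__main__":
    # Hypothetical single feature described by a straight line in h.
    line = lambda h, theta: theta[0] + theta[1] * h
    data = [[(0.0, 1.0), (0.5, 2.1), (0.9, 2.7)]]
    print(log_likelihood(data, [line], [1.0, 2.0], [0.2]))
```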
We used exponential priors on σi as our error model. The constant λi was chosen such that the scale of decay of probability was on the same scale as the range of the data. Noting that 〈σi〉=1/λi under this prior, we chose λi to be Ω divided by the range of the data for feature i, where Ω is a hyper-parameter of the prior and Ω ≥ 0. Note that we may interpret Ω=0 as a uniform prior, since P (σi)=const in this case. In order to make the posterior distribution well-defined, we may think of the case of Ω=0 as a uniform prior P (σi)=unif(0, α) for some α which is large enough to be never encountered during the finite number of iterations used in our Markov chain Monte Carlo sampling strategy. This is as opposed to an improper uniform prior which would make the posterior distribution unnormalized. In this case α=100 is a sufficiently large upper bound to never be encountered in the 1010 iterations of the sampler.
We began with Ω=0 as the most permissive choice of prior possible, given the model in Eq.(15). We found that when Ω=0 the maximum a posteriori estimates were qualitatively similar to those for Ω=2 (our final choice, which we justify below); see Fig. 3 (Ω=2) and Figure S12A-F (Ω=0). However, we found that the posterior 25-75% confidence intervals supported model fits for METC, P + and Rmax which were relatively poor when Ω=0, compared to Ω=2 (see Figure S12A-F). We determined that large values of h* were indicative of purely linear fits to the data, which is unlikely given the wider body of evidence demonstrating the nonlinearity of the threshold effect. This is seen in Figure S12G-L (high h*, poorer fit) when compared with Figure S12M-R (low h*, better fit). Comparison between Figure S12G and Figure S12M is particularly noteworthy, where the 25-75% posterior confidence interval for high h* sub-samples predicts METC ≈ 0 for all values of h, which is physiologically implausible, whereas low h* sub-samples display non-linear fits which more faithfully track the data. Figure S13 shows that the high h* mode is of comparable prevalence to the low h* mode when Ω=0.
We therefore investigated the sensitivity to the choice of Ω in Figure S13. We see that increasing Ω reduces the width of the marginal posterior distribution of h*, constraining the posterior distribution to lie around the nonlinear solutions shown in Fig. 3. We found that the permissive prior Ω=2 was sufficient to strongly subdue this physiologically implausible large h* mode. This can be interpreted as a prior belief that our model uncertainty is, on average, 50% of the range of the data (since the prior mean of σi is 1/λi). We believe this to be a sensible prior choice, encoding our prior belief that the threshold effect is nonlinear while providing only a gentle constraint on parameters.
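The role of Ω can be made concrete with a short sketch of the log-prior on a single noise parameter. It assumes λi equals Ω divided by the range of the data for feature i, which is inferred from the statement that Ω=2 corresponds to a mean uncertainty of 50% of the range, and it treats Ω=0 as unif(0, α) with α=100; the function and argument names are hypothetical.

```python
import math


def log_prior_sigma(sigma, y_values, omega=2.0, alpha=100.0):
    """Log of the prior density on one noise parameter sigma.

    omega > 0 : exponential prior with rate lam = omega / range(y_values),
                so the prior mean of sigma is range(y_values) / omega.
    omega == 0: uniform prior on (0, alpha).
    The form of lam is an assumption inferred from the text.
    """
    if sigma <= 0.0:
        return -math.inf
    if omega == 0.0:
        return -math.log(alpha) if sigma < alpha else -math.inf
    lam = omega / (max(y_values) - min(y_values))
    return math.log(lam) - lam * sigma


if __name__ == "__main__":
    y = [1.0, 1.4, 2.2, 3.0]  # made-up feature values with range 2.0
    for omega in (0.0, 1.0, 2.0, 4.0):
        print(omega, log_prior_sigma(1.0, y, omega=omega))
```

Larger values of Ω shorten the decay length of the prior, penalizing large noise parameters more strongly, in line with the sensitivity analysis described above.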
We favoured uniform priors on the remaining parameters so that the posterior would be dominated by the likelihood. However, a number of the parameters in the model were uncertain over orders of magnitude; in these cases, we allowed the log of these parameters to take uniform distributions. Explicitly, our priors were chosen as:
The ranges for h* and fm are justified since these quantities can physically only be between 0 and 1. c1 and m2 are parameters of linear models for Mgly (see Table S2 and Eq.(5)) for data which has been normalized to the scale of 1; therefore priors were chosen with suitably large ranges. Similarly for kgr, a proportionality constant relating growth to cell volume (see Table S2 and Eq.(7)), we expect kgr to be of the order of 1, since the data has been normalized, and chose suitably relaxed priors. The ranges for all other parameters, which were sampled in log-space due to our greater uncertainty of their values, were chosen to be suitably large so as to be unlikely to reach the boundary of the prior during sampling with MCMC.
The parameters β and kmRNA from Eq.(2) were highly correlated. For more efficient chain mixing, we rearranged Eq.(2) into the form where ζ=β/kmRNA, and used the prior and again used relatively relaxed boundaries for the uniform prior.
We used the Metropolis-Hastings algorithm [41] to sample from the posterior, using a Gaussian random walk as our transition kernel, whose covariance matrix was determined from a trial run of the adaptive Metropolis algorithm [42]. All code was written in either Python or C, and is available upon request. The MCMC chain trajectory is presented in Figure S2.
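For readers unfamiliar with the sampler, a minimal random-walk Metropolis-Hastings sketch is given below. It is not the code used in this work: it uses independent Gaussian steps per parameter (a diagonal proposal covariance) rather than a covariance matrix tuned by adaptive Metropolis, and the function name, arguments and toy target are illustrative assumptions.

```python
import math
import random


def metropolis_hastings(log_post, theta0, step_sizes, n_iter, seed=0):
    """Random-walk Metropolis-Hastings with a diagonal Gaussian proposal.

    log_post   : callable returning the log-posterior (up to a constant);
                 it may return -inf outside the prior support.
    theta0     : starting parameter values, with finite log-posterior.
    step_sizes : proposal standard deviation for each parameter.
    Returns the chain as a list of parameter lists.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_post(theta)
    chain = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, s) for t, s in zip(theta, step_sizes)]
        lp_new = log_post(proposal)
        diff = lp_new - lp
        # Accept with probability min(1, exp(diff)); the proposal is
        # symmetric, so no Hastings correction term is needed.
        if diff >= 0.0 or rng.random() < math.exp(diff):
            theta, lp = proposal, lp_new
        chain.append(list(theta))
    return chain


if __name__ == "__main__":
    # Toy target: a standard normal in one dimension.
    chain = metropolis_hastings(lambda th: -0.5 * th[0] ** 2, [0.0], [1.0], 20000)
    values = [c[0] for c in chain]
    print(sum(values) / len(values))
```

In practice the log-posterior passed to such a sampler would be the sum of the log-likelihood and the log-priors, and a proposal covariance reflecting parameter correlations generally improves chain mixing.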
Acknowledgements
We are grateful to Martin Picard for providing raw data, advice on our model, and experimental suggestions. Till Hoffmann provided technical advice for the Bayesian inference aspects of our work. We would like to thank David Fell, Thomas Ouldridge and Hanne Hoitzing for their useful comments and suggestions.