Abstract
We present a simple and robust technique to extract kinetic rate models and thermodynamic quantities from single molecule time traces. SMACKS (Single Molecule Analysis of Complex Kinetic Sequences) is a maximum likelihood approach that works equally well for long trajectories as for a set of short ones. It resolves all statistically relevant rates and also their uncertainties. This is achieved by optimizing one global kinetic model based on the complete dataset, while allowing for experimental variations between individual trajectories. In particular, neither a priori models nor equilibrium have to be assumed. The power of SMACKS is demonstrated on the kinetics of the multi-domain protein Hsp90 measured by smFRET (single molecule Förster resonance energy transfer). Experiments in and out of equilibrium are analyzed and compared to simulations, shedding new light on the role of Hsp90’s ATPase function. SMACKS pushes the boundaries of single molecule kinetics far beyond current methods.
The ability to reveal conformational state sequences at steady state is a unique feature of single molecule time traces. Conformational kinetics is detectable in or out of equilibrium, which enables direct calculation of thermodynamic quantities. Single molecule Forster resonance energy transfer (smFRET) is one of the most common methods to do so. According to the current standard analysis of kinetic smFRET trajectories, state sequences are deduced using hidden Markov models (HMM) (1–3) and rates are then obtained from single-exponential fits to the respective dwell time histogram of every observed state.
This standard approach is feasible under the following two conditions: First, every state has a characteristic FRET efficiency. Second, all transition rates are similar. In this case, there is a sampling rate at which every state is reached many times before irreversible photo-bleaching. Both requirements are broken by regular proteins, which commonly exhibit rates on diverse time-scales and conformations that are experimentally indistinguishable, but differ kinetically (kinetic heterogeneity) (4–6). As a consequence, multi-exponential dwell time distributions are obtained. The interpretation of such distributions may lead to erroneous conclusions (see below).
With our new experiment-friendly approach, we overcome these problems by training one global HMM based on a set of experimental time traces. The procedure copes with experimental shortcomings and kinetic heterogeneity. Further, it provides several means of model evaluation including error quantification. Finally, we demonstrate how to deduce kinetics and thermodynamics of the heat-shock protein Hsp90.
Results
Rate extraction from an ideal model system. Holliday-junctions (7) have become a widely used model system for conformational dynamics studied by smFRET. These DNA four-way junctions alternate constantly between two equilibrium conformations (8). Such dynamics were recorded by a custom-built objective-type total internal reflection fluorescence (TIRF) microscope (Fig. 1A) with alternating laser excitation (ALEX) (9). An example trace is shown in Fig. 1B. As expected for a two-state system, the FRET histogram shows two peaks (Fig. 1C) and the dwell time histograms are well fit by single-exponential functions (Fig. 1D). In this case, all standard methods work well and the extracted rates will be correct.
Rate extraction from typical protein systems. In contrast, the situation is more complicated for proteins, which usually adopt significantly more than two states (10). As an example, we show equivalent single protein time traces revealing conformational changes of the heat-shock protein Hsp90 (11) (Fig.1E). This homo-dimeric protein fluctuates between N-terminally open and closed conformations (12) resulting in two peaks in the FRET histogram (Fig. 1F). The fluctuations occur on a broad range of time-scales resulting in very long and short dwells, and generally fewer transitions per trace (here 3 on average). Despite the two apparent FRET populations, both dwell time distributions are multi-exponential (Fig. 1G). Yet, no systematical change in FRET efficiency from fast to slow dwells is observed (Fig. S1). Such behavior (hereafter referred to as degenerate FRET efficiencies) is indicative of truly hidden states that cannot be separated by FRET efficiency, but differ kinetically.
The kinetic analysis is complicated by the limited detection bandwidth of smTIRF experiments. It is restricted by the exposure time, on the one side, and the mean observation time - limited by photo-bleaching - on the other side. While enzymatic anti-bleaching agents largely increase the observation time of DNA-based samples, those are much less effectual against bleaching of all-protein systems. Furthermore, their use with protein systems is problematic, as they might interact with the protein under study. Accordingly, the detection bandwidth spans less than a factor of 200 at a reasonable signal to noise ratio - independent of the sampling rate applied.
In this situation, the classical dwell time analysis ignores large parts of the data, because only dwells with clearly defined start and end points are considered. As a consequence, predominantly long dwell times are missed, resulting in transition rates that are systematically overestimated. Already in a two state system, deviations of more than a factor of two occur (Fig. 2A). Importantly, even so-called static traces (without any transition) contain kinetic information. They occur in the experiment as a result of the finite observation time, especially if both fast and slow processes occur. In other words: the presence of at least two transitions per trace is an inappropriate and misleading criterion for trace selection. Yet it is the intrinsic requirement of dwell time analysis. Moreover, the connectivity of states is completely ignored by dwell time analysis. Please note that the limitations of dwell time analysis have been recognized in the patch clamp field more than 20 years ago (13). Nevertheless, it is still the standard analysis in the smFRET field today.
A better solution for typical protein systems. In view of the experimental reality, we developed a new Single-Molecule Analysis for Complex Kinetic Sequences - short: SMACKS. It combines all experimentally available information in one HMM, which allows us to investigate important thermodynamic concepts that go significantly beyond dwell time analysis.
Such an HMM consists of invisible or “hidden” kinetic states that generate certain detectable signals (e.g. high FRET, low FRET) with a given probability. The sequence of states is assumed to be memory-less, i.e. the probability of a certain transition depends only on the current state. Any time-homogeneous Markovian analysis requires stationarity - but not thermodynamic equilibrium. An HMM is parameterized by one start probability πi per state transition probabilities aij between all hidden states assembled in the transition matrix A, and a set B of so-called emission probabilities bi that link the hidden states to the observables (14, 15).
By exploiting the original two observables - donor and acceptor fluorescence - instead of the FRET efficiency (only one observable), the robustness with respect to uncorrelated noise is significantly increased. These fluorescence signals are appropriately described by 2D Gaussian probability density functions (PDFs),, parameterized by the means μ and the co-variance matrix V - all in dimensions of donor and acceptor fluorescence. Representative emission PDFs are graphed at the right hand side of Fig. 1B, E.
The mathematically available parameter space for emission probabilities is further restricted by physical knowledge about FRET. Namely, the mean total fluorescence intensity is required to remain constant within one trace (Eq. 1 and SI Methods), whereas experimental variations between individual molecules are tolerated.
Here the donor and acceptor intensities (with means μD and μA) were corrected for background, experimental cross-talk and the gamma factor beforehand. The resulting allowed “FRET-line” is displayed in the emission graphs (Fig. 1B, E right).
In the classical HMM implementation 14, the model λ(π, A, B) is iteratively rated by the forward-backward algorithm and optimized by the Baum-Welch algorithm until convergence to maximum likelihood. The Viterbi algorithm is used to compute the most probable state sequence for every trace given the previously trained model. In contrast to earlier published ensemble approaches (16–19), SMACKS works without additional (hyper-) parameters or prior discretization.
The full procedure was tested on various synthetic datasets generated by known input models, in or out of equilibrium, with or without degenerate FRET efficiencies. Synthetic data contained noise, photo-bleaching, randomly offset individual traces and a realistic dataset size (see SI Methods and example data in Fig. S2).
SMACKS resolved accurate transition rates despite degenerate FRET efficiencies, where neither dwell time derived rates nor error estimates were meaningful (Fig. 2B).
Demonstration of SMACKS using experimental data. We start with a set of smFRET time traces obtained with alternating laser excitation (ALEX) that were selected and corrected as previously described (20) (SI Methods). Namely, the donor and acceptor intensities (ID, IA) satisfy (Eq. 2). However, previous smoothing is not required.
An apparent state model can be deduced from visual inspection of the FRET time traces and the FRET histogram. This model (2 states for Hsp90) is used in a first trace-by-trace HMM optimization to train individual emission PDFs on each molecule separately. The trained parameters are examined visually by comparing the resulting Viterbi path to the input data. Notably, by searching for flat plateaus, HMM echoes a characteristic requirement for singlemolecule fluorescence data.
For static traces (here 34% of all traces), a model with more than one state will not converge sensibly. Therefore, static traces are included using the mean emission PDFs of the remaining dataset (see fifth molecule in Fig. 1E).
As a next step (Fig. 3A), an ensemble HMM run is performed to optimize the start and transition probabilities based on the entire dataset, while holding the predetermined, individual emission PDFs fixed. While different strategies have been tested, this solution worked equally well for experimental and simulated data. The kinetic heterogeneity found in Hsp90 is investigated by comparing different state models including duplicates and triplicates of the apparent states (Fig. 3B).
Similar to others (1, 3, 21), we then use the Bayesian information criterion (BIC) (22) for model selection (see SI Methods). We find that Hsp90’s conformational dynamics are best described by a 4-state model with 2 high FRET (closed) and 2 low FRET (open) states. This is consistent with the bi-exponential dwell time distributions shown in Fig. 1G.
Once the optimal number of states is deduced, the model is further refined by inspecting the Viterbi paths. The transition map (Fig. 3C left) shows the quality of both, the original input data and the state allocation based on the obtained model. It reveals the clustering of the transitions in FRET space. Importantly, the transition map itself cannot report on the number of states in the model, because it is the consequence of a predetermined model. The occurrence of all transitions is shown in a 2D histogram (Fig. 3C middle). For a system functioning at thermodynamic equilibrium, detailed balance requires that the transition histogram is symmetric about the main diagonal. Out of 12 possible transitions in a fully connected 4-state model, only 8 cyclic transitions are populated for Hsp90 with ATP. Despite the reduced number of free parameters, a cyclic 4-state model fits the data with equal likelihood. This is in line with the maximal number of theoretically identifiable transitions (8 in the case of 2 open (o) and 2 closed (c) states 23). While being difficult to interpret in the context of Hsp90, a cyclic - o-c-o-c-model would theoretically fit the data equally well. Further information on the interpretation of degenerate state models is given in (23–25).
Model evaluation. In most previous kinetic studies on smFRET, the only reported error estimates were the uncertainties of fit coefficients from fitting dwell time distributions, disregarding systematic overestimation and variations throughout the dataset (Fig. 2). In contrast, we propose three tests to assess the reliability of the results from the above procedure.
First, the most illustrative test for the consistency of the trained model with the original data is “re-simulation” using the obtained transition matrix, the experimental bleach rate and degenerate states (here 2o, 2c). Fig. 3D (left) shows very good agreement between the re-simulated and the experimental dwell time distribution. FRET histograms can be re-simulated, too.
Second, the convergence of the HMM to the global maximum is tested by using multiple random start parameters (26). In all attempts, the parameters converged to the same maximum likelihood estimators (MLE). Additionally, random subsets of the data (here 66% of the traces) reveal the heterogeneity of the dataset (Fig. 3D middle). Here all random subsets (n > 50) converged to the same model with normally distributed parameters (RMSD ≈ 30% of the MLE).
Third, the confidence interval of every trained parameter is calculated using likelihood ratio tests (Fig. 3D right) (3, 27). It reports on the dataset heterogeneity and the precision of the HMM (see SI Methods).
In summary, the accuracy and precision of the kinetic state model deduced by our semi-ensemble HMM approach was demonstrated by threefold evaluation. A reliable state model is necessary to take the next step and resolve kinetic and thermodynamic information from proteins in or out of equilibrium.
The kinetic model of Hsp90. Hsp90 is a slow ATPase (28) and changes between open and closed conformations at room temperature. In Fig. 4, kinetic results for Hsp90’s conformational dynamics are compared under different nucleotide conditions (2mM ADP, ATP, AMP-PNP) or without nucleotides (apo). The quality of the input data and the resulting state allocation is visible on the transition map in FRET space (Fig 4A). It is evident that Hsp90’s conformational changes are less defined in the absence of nucleotides. The same FRET efficiencies are detected in all experiments: Elow=0.1, Ehigh=0.8 (Fig. 4D). Consistently, the transitions cluster around these efficiencies. The optimal state model contains four states under all conditions (Fig. 4E): states 0/1 are long-/short-lived low FRET states, states 2/3 are short-/long-lived high FRET states, respectively (Fig. 4C). While four links are required to describe Hsp90’s conformational dynamics in the presence of ATP (Fig. 4B, C), under apo, ADP and AMP-PNP conditions models with three links are sufficient (detailed in SI Note 1) (23). The rates under apo and ADP conditions are similar to those in the presence of ATP (Fig. 4F). Only with the non-hydrolysable ATP-analogue, AMP-PNP, the rates between both short-lived states are inverted.
This is in agreement with the pronounced shift towards the closed conformation observed in the FRET histogram, in the presence of AMP-PNP (Fig. 4D).
Exploring energy coupling. Protein machines, such as Hsp90, use external energy (e.g. from ATP hydrolysis) and therefore operate out of equilibrium. A central question is where (in the conformational cycle) energy consumption couples into protein function. Based on SMACKS, we can address this question quantitatively. It boils down to determining the free energy difference over closed cycles (29, 30) (in units of thermal energy, kT):
As expected and required, we find that in the absence of an external energy source, Hsp90’s conformational dynamics are at equilibrium. At first sight unexpected, we find for Hsp90 in the presence of ATP ΔGcyc = (0.9 ± 0.9) kT. This indicates that the energy of ATP hydrolysis is not coupled to the observed conformational changes, which is consistent with earlier results (12).
A schematic energy landscape is shown in Fig. 4G. The 3D illustration highlights SMACKS ability to split two observable FRET states into four states based on their distinct kinetic behavior. Quantitative energies are shown in Fig. 4H.
Experimental limits for resolving energy coupling. Clearly, the accuracy of the resolved ΔGcyc depends on the size of the dataset. Especially for systems away from equilibrium, very slow (“reverse”) rates can occur. Due to the finite dataset, only few respective transitions are observed, resulting in large relative errors for these small rates. In this case, an alternative formulation of ΔGcycusing the number of transitions Ntrans. found by the Viterbi algorithm is more robust:
Eq. 4 represents a lower bound for the free energy difference, given the finite dataset (zero transitions are set to one to avoid poles). If all rates are well resolved, Eqs. 3 and 4 yield the same result.
In the following, two limit cases for the coupling of conformational changes to ATP hydrolysis are considered systematically (ΔGcyc = 30 kT for ATP to ADP hydrolysis assuming 1% ADP, 3mM Mg2+, 250mM KCl and 100% efficiency) (31). In the first case (Fig. 5A), the full 30 kT are introduced within one step. Whereas in the second case (Fig. 5B), the energy is successively released over four steps, comparable to contributions by ATP binding, hydrolysis and ADP or Pi release, proposed e.g. for the human mitochondrial F1-ATPase (32). Although realistic mechanisms will be a mixture of the two, these ideal cases allow for a systematic calculation of the maximally observable free energy ΔGobs as a function of the dominating forward rate (Fig. 5A, B bottom). Even in the absence of noise and degenerate states, the observed free energy difference is limited by the finite dataset size. The same is true for the more realistic model shown in Fig. 5C: Eq. 4 applied to discrete state sequences yields 20.5 kT of the original 30 kT. This is because very unlikely transitions do not occur throughout the dataset (Fig. 5C bottom). After including all the experimental shortcomings and degenerate FRET efficiencies, SMACKS recovered ΔGcyc = (12 ± 2) kT. This is 58% of the free energy, which was actually present in the synthetic data.
In view of these results, we stimulated Hsp90’s hydrolysis rate more than tenfold by its co-chaperone Aha1 (33). If we had missed out on the directionality due to the slow ATPase rate, this should ultimately allow us to resolve putative energy coupling. Fig. 5D shows that even highly stimulated hydrolysis does not induce conformational directionality in Hsp90: ΔGcyc = (−0.4± 1.2) kT in the presence of 3.5 μM Aha1. Our results strengthen the notion that Hsp90’s large conformational changes are mainly independent of ATP hydrolysis.
Discussion
SMACKS is a novel HMM approach, which resolves all relevant rates that characterize the observed conformational dynamics, from a set of (short) smFRET time traces. The underlying states are identified by their FRET efficiency or kinetic behavior or both. SMACKS is a tailor-made solution for the wide family of protein machines that are clearly more challenging than DNA prime examples. It represents a significant advance that enables direct quantification of the energy coupled to conformational changes.
This progress is achieved by the following six key features:
SMACKS exploits the original fluorescence signal of the FRET donor and acceptor as 2D input. The FRET-specific anticorrelation provides significantly increased robustness with respect to uncorrelated noise. This unique information is lost in 1D FRET trajectories.
SMACKS tolerates experimental intensity variations between individual molecules, while at the same time, the transition rates are extracted from the entire dataset.
SMACKS minimizes the bias of photo-bleaching, because it determines transition rates based on their occurrence in the dataset. Thus, the range of detectable timescales can be expanded by increasing the dataset.
SMACKS performs the entire analysis on the experimental (i.e. noisy) fluorescence data. In fact, the knowledge about a given data point’s reliability is used to weight its contribution accordingly. Therefore, SMACKS is robust enough to handle realistic noise levels in protein systems.
SMACKS identifies hidden states that share indistinguishable FRET efficiencies, but differ kinetically.
SMACKS quantifies the precision of extracted rates. The precision is limited by the dataset size and signal quality, but it is not compromised by systematic overestimation, which contrasts with earlier studies.
The ATP-dependent molecular machine Hsp90 served here as an illustrative test case. SMACKS shed new light on the enigmatic and controversially discussed ATPase function (34). Clearly, the N-terminal conformational dynamics are not coupled to ATP hydrolysis, even in the presence of the co-chaperone Aha1. Further reaction coordinates will be explored by SMACKS to elucidate driven conformational changes and finally uncover the role of Hsp90’s slow ATPase function.
In summary, our results demonstrate how SMACKS provides new power and confidence for the kinetic analysis of single molecule time traces in general. In particular for smFRET studies on sophisticated protein systems, SMACKS is unparalleled. We anticipate that SMACKS will reveal drive mechanisms in a large number of protein machines.
Methods
Hsp90 or Holliday junctions specifically biotinylated and labeled with fluorescent dyes (Atto550/Atto647N maleimide) were immobilized on a passivated and Neutravidin coated fused silica coverslip that shows no auto-fluorescence upon ALEX (532nm or 635nm) in TIRF geometry using an EMCCD for detection. Measurements were performed at 5Hz at 21°C. More detailed descriptions are given in SI Methods. Monte Carlo simulations and HMM calculations were run in Igor Pro (Wavemetrics) on an ordinary desktop PC. Synthetic data contained Gaussian noise (σ=0.3*signal), random offsets (±0.2*signal), degenerate FRET, efficiencies (two low / two high), a sampling rate of 5Hz and a bleach rate of 0.03Hz (see Fig. S2). All formulae utilized in semi-ensemble HMM (Forward-Backward, Baum-Welch and Viterbi algorithms) with continuous observables in 2D are included in SI Methods. The complete source code together with example data will be available shortly after publication at: http://www.singlemolecule.uni-freiburg.de/SMACKS.
AUTHOR CONTRIBUTIONS
S.S. and T.H. designed research; S.S. performed experiments and simulations; S.S. analyzed data after consultation with all authors; S.S. and M.G. developed software; S.S. and T.H. wrote manuscript; all authors evaluated the data, discussed the results and commented on the manuscript.
The authors declare no conflict of interest.
ACKNOWLEDGMENTS
We thank Attila Szabo, Jens Timmer, Udo Seifert and Bjorn Hellenkamp for helpful discussions and Markus Jahn for providing Aha1. This work is funded by the European Research Council through the ERC Grant Agreement n. 681891 and the German Science Foundation (SFB863, A4).