DNA Calorimetric Force Spectroscopy at Single Base Pair Resolution

DNA hybridization is a fundamental reaction with wide-ranging applications in biotechnology. The nearest-neighbor (NN) model provides the most reliable description of the energetics of duplex formation. Most DNA thermodynamics studies have been done in melting experiments in bulk, of limited resolution due to ensemble averaging. In contrast, single-molecule methods have reached the maturity to derive DNA thermodynamics with unprecedented accuracy. We combine single-DNA mechanical unzipping experiments using a temperature jump optical trap with machine learning methods and derive the temperature-dependent DNA energy parameters of the NN model. In particular, we measure the previously unknown ten heat-capacity change parameters ΔCp, relevant for thermodynamical predictions throughout the DNA stability range. Calorimetric force spectroscopy establishes a groundbreaking methodology to accurately study nucleic acids, from chemically modified DNA to RNA and DNA/RNA hybrid structures.

The elastic properties of ssDNA are strongly temperature dependent (see Fig. 1B, main text). Accurately measuring these properties requires modeling all contributions to the trap-pipette distance, $\lambda$, which include the optical trap displacement ($x_b$), the dsDNA handles ($x_h$), the ssDNA ($x_{ss}$), and the molecular diameter ($d_0$). The setup contributions ($x_b$ and $x_h$) can be evaluated using the effective stiffness method [48], in which these terms are approximated by an effective stiffness, $k_{\rm eff}^{-1} \approx k_h^{-1} + k_b^{-1}$. The use of short handles (29 bp) simplifies the evaluation of the stretching terms because their stiffness is much larger than the trap stiffness ($k_h \gg k_b$), implying that $k_{\rm eff} \approx k_b$. Moreover, if the force varies in a relatively narrow range ($f_{\max} - f_{\min} \lesssim 10$ pN), the trap stiffness can be considered nearly force-independent, so $k_{\rm eff}$ is constant along the folded branch of the FDC. Therefore, we can estimate $k_{\rm eff}$ by fitting the slope preceding the first force rip in the FDC to the linear equation $f = k_{\rm eff}\,x$ (orange dashed line in Extended Data Fig. 1). This allows us to compute the (effective) contribution of the handles and optical trap, $x_{\rm eff}$, to the total trap-pipette distance, $\lambda$.
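As a rough illustration of the effective stiffness method, the sketch below fits the pre-rip branch to $f = k_{\rm eff}\,x$ and subtracts $x_{\rm eff} = f/k_{\rm eff}$ from the trap position. The function names, the synthetic data, the relation $x_{\rm eff} = f/k_{\rm eff}$, and the stiffness values (0.07 pN/nm for the trap, 100 pN/nm for the handles) are illustrative assumptions, not values taken from the experiments.

```python
def series_stiffness(k_h, k_b):
    """Effective stiffness of handles and trap acting as springs in series."""
    return 1.0 / (1.0 / k_h + 1.0 / k_b)


def fit_stiffness_through_origin(x, f):
    """Least-squares slope k of the line f = k * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    sxf = sum(xi * fi for xi, fi in zip(x, f))
    return sxf / sxx


def hairpin_extension(lam, f, k_eff):
    """Subtract the assumed setup contribution x_eff = f / k_eff from lambda."""
    return [li - fi / k_eff for li, fi in zip(lam, f)]


# Synthetic folded branch (before the first rip): purely linear response.
k_b_assumed = 0.07                           # pN/nm, hypothetical trap stiffness
x = [10.0 * i for i in range(1, 21)]         # nm, distance from the zero-force point
f = [k_b_assumed * xi for xi in x]           # pN

k_eff = fit_stiffness_through_origin(x, f)
x_hairpin = hairpin_extension(x, f, k_eff)   # ~0 while the hairpin is folded
```

With stiff handles, `series_stiffness(100.0, 0.07)` differs from the trap stiffness by less than 0.1%, which is the sense in which $k_{\rm eff} \approx k_b$.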

Stochastic Gradient Descent in a Nutshell
The basic principle behind stochastic approximation can be traced back to the Robbins-Monro algorithm of the 1950s [60]. Since then, stochastic gradient descent (SGD) methods have become one of the most widely used optimization methods [61-65]. SGD is an iterative method for optimizing an objective function, $J(w)$, with suitable smoothness properties (e.g., differentiable or subdifferentiable). The set of parameters, $w^*$, minimizing $J(w)$ is iteratively approximated with an update proportional to the negative gradient of the objective function, $-\nabla_w J(w)$. Starting from an initial guess of $w$, at each step of the algorithm the parameters are updated according to
$$ v_{t+1} = \beta v_t - \eta \nabla_w J(w_t), \qquad w_{t+1} = w_t + v_{t+1}, \tag{1} $$
where $v_t$ is the velocity of the optimization and $\eta \geq 0$ is the step size (called learning rate). The parameter $\beta$ (the so-called momentum coefficient) accounts for a fraction of the previous step in the current update. The critical difference between SGD and standard gradient descent algorithms is that information (total entropy and coefficients) from only one FEC segment ($\Delta x_k$) at a time is used to calculate the step, and the segment is picked randomly at each step.
The SGD convergence rate can be improved by considering Nesterov's Accelerated Gradient (NAG), introduced in 1983 [66, 67]. According to NAG, the update equations are
$$ v_{t+1} = \beta v_t - \eta \nabla_w J(w_t + \beta v_t), \qquad w_{t+1} = w_t + v_{t+1}. \tag{2} $$
While the classic momentum (CM) algorithm updates the velocity vector by computing the gradient at $w_t$, the NAG algorithm computes the gradient at $w_t + \beta v_t$. To make an analogy, while CM faithfully trusts the gradient at the current iteration, NAG puts less faith in it and looks ahead in the direction suggested by the velocity vector; it then moves in the direction of the gradient at the look-ahead point. If $\nabla_w J(w_t + \beta v_t) \approx \nabla_w J(w_t)$, then the two updates are similar. The advantage of using NAG is that, for smooth convex objectives, it converges at a rate of $O(1/t^2)$, while CM converges at a rate of $O(1/t)$.
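The difference between the two updates can be made concrete with a minimal sketch on a one-dimensional quadratic objective, $J(w) = w^2/2$. The objective, the function names, and the values of $\eta$ and $\beta$ are assumptions chosen for illustration only; the gradient here is deterministic, so the sketch shows the update rules and not the stochastic sampling.

```python
def grad(w):
    """Gradient of the toy objective J(w) = 0.5 * w**2."""
    return w


def cm_step(w, v, eta, beta):
    """Classic momentum: gradient evaluated at the current point w."""
    v_new = beta * v - eta * grad(w)
    return w + v_new, v_new


def nag_step(w, v, eta, beta):
    """Nesterov accelerated gradient: gradient at the look-ahead point."""
    v_new = beta * v - eta * grad(w + beta * v)
    return w + v_new, v_new


def run(step, n_steps=200, w0=5.0, eta=0.1, beta=0.9):
    """Iterate a given update rule from w0 with zero initial velocity."""
    w, v = w0, 0.0
    for _ in range(n_steps):
        w, v = step(w, v, eta, beta)
    return w


w_cm = run(cm_step)    # both approach the minimum at w = 0
w_nag = run(nag_step)
```

With zero initial velocity the first step of the two schemes coincides, since the look-ahead point $w_t + \beta v_t$ equals $w_t$; they differ only once a velocity has built up.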
To derive the DNA NNBP entropies from unzipping experiments, we used an SGD minimization implementing NAG update equations. Let us rewrite Eq. (2) (main text) as $\Delta S_0 = C \Delta s$, where $\Delta S_0$ is the vector of entropies measured with the Clausius-Clapeyron equation for each of the $K$ FEC segments, $\Delta s$ is the vector of the $I = 8$ NNBP entropy parameters, and $C$ is the $K \times I$ matrix of the coefficients, $c_{k,i}$.
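Thus, for a given loss function (e.g., least squares), the algorithm has to minimize the mismatch between the measured and the predicted segment entropies, which for least squares reads
$$ J(\Delta s) = \sum_{k=1}^{K} \Big( \Delta S_{0,k} - \sum_{i=1}^{I} c_{k,i}\, \Delta s_i \Big)^2. \tag{3} $$
By using this method, we measured the DNA entropies at the single base-pair level for each experimental temperature in the range [280, 315] K (see results in Fig. 3C, main text, and Extended Data Table 3).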
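A minimal sketch of such a minimization is given below, assuming a least-squares loss and noise-free synthetic data. It is not the authors' implementation: the function name, the random coefficient matrix, the "true" entropies, and the values of $\eta$, $\beta$, and the number of steps are all hypothetical. One randomly chosen segment $k$ is used per step, and the gradient is evaluated at the NAG look-ahead point.

```python
import random


def sgd_nag_least_squares(C, S, eta=0.05, beta=0.5, n_steps=20000, seed=0):
    """Minimize sum_k (S[k] - C[k] . s)^2 using one random row per step."""
    rng = random.Random(seed)
    n_par = len(C[0])
    s = [0.0] * n_par          # initial guess for the NNBP entropies
    v = [0.0] * n_par          # velocity
    for _ in range(n_steps):
        k = rng.randrange(len(C))                      # pick one segment
        look = [s[i] + beta * v[i] for i in range(n_par)]
        resid = sum(C[k][i] * look[i] for i in range(n_par)) - S[k]
        # Gradient of 0.5 * resid**2 with respect to s is resid * C[k].
        v = [beta * v[i] - eta * resid * C[k][i] for i in range(n_par)]
        s = [s[i] + v[i] for i in range(n_par)]
    return s


# Synthetic test problem: I = 8 parameters, K = 50 segments (illustrative only).
data_rng = random.Random(1)
true_ds = [-21.0, -20.5, -19.0, -22.5, -24.0, -18.5, -23.0, -20.0]
C = [[data_rng.random() for _ in true_ds] for _ in range(50)]
S = [sum(c * t for c, t in zip(row, true_ds)) for row in C]

est = sgd_nag_least_squares(C, S)
max_err = max(abs(e - t) for e, t in zip(est, true_ds))
```

Because the synthetic system is exactly consistent, the per-segment gradients all vanish at the solution, so a constant step size suffices here; noisy experimental data would typically call for a decaying step size or averaging.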

Prediction of the DNA Unzipping Curve
In unzipping experiments, the total trap-pipette distance, $\lambda$, can be written as the sum of the elastic contributions of the setup and the molecule (Eq. (4)), where $x_b(f)$ is the displacement of the bead from the center of the optical trap, $x_h(f)$ and $x_{ss}(f, n)$ account for the extension of the two double-stranded handles and the ssDNA extension, respectively (described with the WLC model, Eq. (5), Methods), and $x_d(f)$ is the projection of the folded hairpin of diameter $d$ (typically $d = 2$ nm for DNA and RNA hairpins [49]) along the pulling axis [68]. The latter is modeled with the freely-jointed chain in Eq. (6), Methods. For a given $\lambda$, the total system free energy, $\Delta G_{tot}(\lambda, n)$ (Eq. (5)), is the sum of the hairpin free energy of hybridization according to the NN model, $\Delta G_0(n) = \sum_{i=1}^{n} \Delta g_{0,i}$, and the energy contributions of the corresponding elastic terms in Eq. (4).

Computation of the Equilibrium FDC
Let us consider the case where thermal fluctuations are neglected in the FDC computation. Thus, at a given value of $\lambda$, the system is always in the state of minimum energy, $\Delta G_{eq}(\lambda) = \Delta G_{tot}(\lambda, n^*)$. To compute the equilibrium free energy of the system, let us first introduce the system partition function, $Z$. At each $\lambda$, this is defined as the sum over all the possible states, i.e., all the possible sequences of $n$ open base pairs, which is
$$ Z(\lambda) = \sum_{n} \exp\left( -\frac{\Delta G_{tot}(\lambda, n)}{k_B T} \right), \tag{6} $$
where the sum runs over the allowed values of $n$ and $N$ is the total number of base pairs of the sequence. Finally, by recalling that $\Delta G = -k_B T \ln Z$, the equilibrium force is given by
$$ f_{eq}(\lambda) = -k_B T \frac{\partial \ln Z(\lambda)}{\partial \lambda}. \tag{7} $$
Computing Eq. (6) requires solving the transcendental Eq. (4) with respect to $f$ (which can be done numerically) and then computing Eq. (5) for all $n \in [0, N - 1]$. For each $\lambda$, the value $n^*$ minimizing the equilibrium free energy, $\Delta G_{eq} = \Delta G_{tot}(\lambda, n^*(\lambda))$, gives the most probable number of open base pairs. Finally, the computation of the equilibrium force in Eq. (7) gives a theoretical prediction for the unzipping curve of a given sequence (Extended Data Fig. 5).
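The structure of this computation can be sketched with a deliberately simplified toy model. Instead of the WLC and freely-jointed chain of the Methods, the sketch assumes Hookean springs in series (trap stiffness `k_b`, stiffness `k_s` per released ssDNA unit), which makes the elastic energy available in closed form and avoids solving the transcendental equation for $f$. All names, the spring model, and the numerical values (including the alternating NNBP energies in pN nm) are assumptions for illustration.

```python
import math

KBT = 4.11  # pN nm, thermal energy near room temperature


def total_energy(lam, n, dg, k_b=0.07, k_s=15.0):
    """Toy total free energy: hybridization cost of n open NNBPs plus
    the energy of Hookean springs in series stretched to length lam."""
    g0 = sum(dg[:n])
    compliance = 1.0 / k_b + n / k_s
    return g0 + 0.5 * lam * lam / compliance


def log_partition(lam, dg):
    """ln Z(lam), summed over n = 0..len(dg), with a shift for stability."""
    e = [-total_energy(lam, n, dg) / KBT for n in range(len(dg) + 1)]
    m = max(e)
    return m + math.log(sum(math.exp(x - m) for x in e))


def equilibrium_force(lam, dg, h=1e-3):
    """f_eq = -kBT d(ln Z)/d(lam), evaluated by central finite difference."""
    return -KBT * (log_partition(lam + h, dg) - log_partition(lam - h, dg)) / (2 * h)


def most_probable_n(lam, dg):
    """Number of open NNBPs minimizing the total free energy at lam."""
    return min(range(len(dg) + 1), key=lambda n: total_energy(lam, n, dg))


# Hypothetical 30-NNBP sequence with alternating weak and strong motifs.
dg = [6.0 if i % 2 == 0 else 10.0 for i in range(30)]
lams = [5.0 * i for i in range(81)]                   # 0 to 400 nm
fdc = [equilibrium_force(l, dg) for l in lams]        # toy equilibrium FDC
n_star = [most_probable_n(l, dg) for l in lams]       # step-like opening
```

In this toy model `n_star` never decreases with `lam`, and each jump in it corresponds to a drop in `fdc`, which is the qualitative behavior described in the text; the quantitative shape depends entirely on the assumed elasticity.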

Equilibrium Free Energy
The total free energy in Eq. (5) is the sum of two main contributions: the hybridization energy, $\Delta G_0(n)$, which depends linearly on the number of hybridized NNBPs $n$, and the stretching energy, $\Delta G_{el}(\lambda, n)$, which depends on both $n$ and $\lambda$. For a given $\lambda$, the equilibrium configuration of the system is that with minimum $\Delta G_{el}(\lambda, n^*)$ and maximum $\Delta G_0(n^*)$ among all possible values of $n$. Notice that for a hairpin of $N$ bp, $n$ ranges from 0 (native state) to $N - 1$ NNBPs (totally unfolded), which gives $N$ possible system configurations for each value of $\lambda$.
Let us suppose that the system starts at equilibrium, with $n_1$ open bp. Upon increasing $\lambda$, the elastic term in Eq. (5) also increases. The number of open bp, $n_1$, remains constant until a value $n = n_2 > n_1$ is found such that $\Delta G_{tot}(\lambda, n_1) = \Delta G_{tot}(\lambda, n_2)$ (Extended Data Fig. 4A, top). Even though the total energy of these two states is the same, their internal energetic balance is different (Extended Data Fig. 4A, bottom). The system minimizes the elastic free energy and switches to state $n_2$ by releasing $\Delta n = n_2 - n_1$ bp. Notice that, although opening $\Delta n$ bp increases the hybridization energy of the system, the released ssDNA causes the elastic contribution to decrease. In general, $\Delta G_{el} \gg \Delta G_0$, so the global balance of the state $n_2$ is lower than that of $n_1$. Therefore, the equilibrium free energy of hybridization, $\Delta G_0(n^*)$, is a step function increasing with $\lambda$ (Extended Data Fig. 4B), with each discontinuity corresponding to a rip along the equilibrium FDC.
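The rip condition can be spelled out in one line. Assuming only the two-term decomposition $\Delta G_{tot} = \Delta G_0 + \Delta G_{el}$ stated above and the NN sum for $\Delta G_0(n)$, equating the total free energies of the two states gives:

```latex
\begin{align*}
\Delta G_0(n_1) + \Delta G_{el}(\lambda, n_1)
  &= \Delta G_0(n_2) + \Delta G_{el}(\lambda, n_2) \\
\underbrace{\Delta G_{el}(\lambda, n_1) - \Delta G_{el}(\lambda, n_2)}_{\text{elastic energy released}}
  &= \underbrace{\Delta G_0(n_2) - \Delta G_0(n_1)}_{\text{cost of opening } \Delta n \text{ bp}}
   = \sum_{i=n_1+1}^{n_2} \Delta g_{0,i}
\end{align*}
```

In words, a rip occurs at the value of $\lambda$ where the elastic energy released by the extra ssDNA exactly pays for the hybridization energy of the $\Delta n$ base pairs that open.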

Fit of the NNBP parameters
The $T$-dependent NNBP entropies and enthalpies permit us to derive the heat capacity changes $\Delta c_{p,i}$ for each motif from the relations
$$ \Delta s_i(T) = \Delta s_{m,i} + \Delta c_{p,i} \log\left( T / T_{m,i} \right), \qquad \Delta h_i(T) = \Delta h_{m,i} + \Delta c_{p,i} \left( T - T_{m,i} \right), \tag{8} $$
where $T_{m,i}$ is the melting temperature of motif $i$, and $\Delta s_{m,i}$ and $\Delta h_{m,i}$ are the entropy and enthalpy at $T = T_{m,i}$, respectively. The extraction of the NNBP thermodynamic parameters ($\Delta c_{p,i}$, $\Delta s_i$, $\Delta h_i$, $T_{m,i}$) has to be carried out carefully, as the results are sensitive to experimental errors and parameter initialization. In particular, $\Delta s_{m,i}$, $\Delta h_{m,i}$, and $T_{m,i}$ strongly depend on their initialization values when directly fitted from Eqs. (8), as an error in $\Delta s_{m,i}$ ($\Delta h_{m,i}$) gets compensated by $T_{m,i}$ and vice versa.
To derive the $\Delta c_{p,i}$, we fit the NNBP entropies to the equation $\Delta s_i(T) = A_i + \Delta c_{p,i} \log(T)$, with $A_i = \Delta s_{m,i} - \Delta c_{p,i} \log(T_{m,i})$. Notice that we derive $\Delta c_{p,i}$ from the NNBP entropies as they are obtained from the experimental data, in contrast to enthalpies, which are computed from the free energies. Given $\Delta c_{p,i}$, we fit the NNBP free energies, $\Delta g_i(T) = \Delta h_i(T) - T \Delta s_i(T)$, to the equation obtained by combining Eqs. (8),
$$ \Delta g_i(T) = B_i - A_i T + \Delta c_{p,i}\, T \left( 1 - \log T \right) \tag{9} $$
(blue dashed lines in Fig. 3B, main text). Notice that $B_i = \Delta h_{m,i} - \Delta c_{p,i} T_{m,i}$. By definition, $T_{m,i}$ is the high-temperature value where $\Delta g_i(T_{m,i}) = 0$. Finally, a new fit to Eqs. (8) using the previously derived values of $\Delta c_{p,i}$ and $T_{m,i}$ (red and blue dashed lines in Fig. 2D, main text) gives $\Delta s_{m,i}$ and $\Delta h_{m,i}$. The results are shown in Fig. 4 and Table 1 of the main text.
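The sequential logic of this fit can be sketched as follows for a single motif, using noise-free synthetic data. This is a simplified stand-in and not the authors' fitting code: the function names, the bracketing interval for the root search, and the synthetic parameter values are assumptions, and the final refit of the entropy and enthalpy relations is replaced here by the algebraic definitions of $A_i$ and $B_i$.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_motif(T, ds, dg, bracket=(300.0, 400.0)):
    """Return (dcp, Tm, ds_m, dh_m) for one motif."""
    # Step 1: ds(T) = A + dcp * log(T), so dcp is the slope in log(T).
    dcp, _ = linear_fit([math.log(t) for t in T], ds)
    # Step 2: with dcp fixed, dg - dcp*T*(1 - log T) = B - A*T is linear in T.
    y = [g - dcp * t * (1.0 - math.log(t)) for t, g in zip(T, dg)]
    slope, B = linear_fit(T, y)
    A = -slope

    def g_model(t):
        return B - A * t + dcp * t * (1.0 - math.log(t))

    # Step 3: Tm is the high-temperature root of dg(T) = 0 (bisection).
    lo, hi = bracket
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g_model(lo) * g_model(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    Tm = 0.5 * (lo + hi)
    # Step 4: invert the definitions of A and B.
    return dcp, Tm, A + dcp * math.log(Tm), B + dcp * Tm


# Synthetic motif (hypothetical values): cal/mol/K, K, cal/mol.
dcp0, Tm0, dsm0 = -40.0, 350.0, -20.0
dhm0 = Tm0 * dsm0                      # ensures dg(Tm0) = 0
T = [280.0 + 5.0 * i for i in range(8)]            # 280 to 315 K
ds = [dsm0 + dcp0 * math.log(t / Tm0) for t in T]
dg = [dhm0 + dcp0 * (t - Tm0) - t * s for t, s in zip(T, ds)]

dcp, Tm, ds_m, dh_m = fit_motif(T, ds, dg)
```

Fixing $\Delta c_{p,i}$ from the entropies first leaves only a linear problem for the free energies, which is one way to avoid the compensation between $\Delta s_{m,i}$, $\Delta h_{m,i}$, and $T_{m,i}$ mentioned above.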
Extended Data: Figures
Fig. 1: Computation of the FEC from the experimental FDC. The force versus hairpin extension, $x_H$, (black line) is computed by subtracting from the trap position, $\lambda$, (grey line) the elastic contribution of the optically trapped bead, $x_b$, and DNA handles, $2x_h$, (green dashed line). To measure the $T$-dependent ssDNA elasticity, we fit the FEC after the last rip to the WLC model (orange dashed line). Notice that the average unzipping force, $f_m$, (red line) remains constant upon computing the FEC. Data are shown at $T = 25$ °C.
Fig. 2, Fig. 3: [...] The results are reported in Extended Data Table 1. A fit to data according to $\Delta s_{ss}(T) = \Delta s_{ss,0} + \Delta c_p^{ss} \log(T/T_m)$ (orange dashed line) gives the ssDNA heat capacity change per base at zero force, $\Delta c_p^{ss} = -11.2 \pm 0.2$ cal mol$^{-1}$ K$^{-1}$. (B) $T$-dependence of the total entropy change, $\Delta S_0(T)$, upon unzipping the 3.6 kbp DNA hairpin measured using the Clausius-Clapeyron equation (see Eq. (2), main text).
Fig. 4: Derivation of the theoretical FDC. (A) Schematics of the stretching and hybridization energy contributions. Upon unzipping, the molecule has $n_1$ open bp before the force rip (left) and $n_2 > n_1$ open bp after the rip (right). At the force rip (black dots), the total free energy of the system is the same in both states, and the system changes from the highest free energy branch ($n_1$, pink) to the lowest energy branch ($n_2$, pink). (B) The free energy of hybridization upon unzipping the hairpin is a monotonically increasing step function, with each discontinuity corresponding to a rip along the equilibrium FDC.
Fig. 6: Prediction of the DNA duplexes melting temperatures. (A) Comparison of the melting temperatures for the set of 92 DNA oligos studied by Owczarzy et al. in Ref. 55 (horizontal axis) and the values predicted with the unzipping energy parameters (vertical axis). Perfect agreement between the two data sets would imply all points falling on the dashed grey line $x = y$. Predictions obtained with Eq. (10) of Sec. 6, Methods (blue squares) show a systematic discrepancy with respect to the experimental values (dashed red line). By accounting for the entropic correction (Eq. (11) of Sec. 6, Methods), predictions agree with the experimental measurements within errors (orange triangles). Results are reported in Extended Data Table 5. (B) Derivation of the entropic correction, $\delta\Delta s$. To do this, we subtracted the inverses of the measured, $T_m^{\rm Bulk}$, and predicted, $T_m^{\rm Unz}$, melting temperatures (blue squares). This equals the difference between the inverse of Eq. (11) and Eq. (10) (see Eq. (12) in Sec. 6, Methods). A linear fit to data (dashed red line) yields $\delta\Delta s = 6(1)$ cal mol$^{-1}$ K$^{-1}$ $\sim 4R \log 2$, where $R = 1.987$ cal mol$^{-1}$ K$^{-1}$ is the ideal gas constant. The orange triangles show the theoretical correction to $T_m$ per DNA duplex predicted by assuming $\delta\Delta s \equiv 4R \log 2$.

Extended Data: Tables
Table 1: T-dependence of the DNA FDCs. FDC average unzipping force, $f_m$, persistence length, $l_p$, interphosphate distance, $d_b$, and ssDNA stretching entropy per base, $\Delta s_{ss}$, in the studied temperature range (in Celsius and Kelvin degrees). The errors (in brackets) refer to the last digit. The error in temperature is ±1 °C (K).
The 10 DNA free-energies measured from unzipping a 3.6 kbp hairpin in the temperature range [280, 315] K (see text).
The entropy of the last two motifs (GC/CG and TA/AT) has been computed by applying circular symmetry relations. The error (in brackets) refers to the last digit.
Melting temperatures of the 92 DNA duplexes studied by Owczarzy et al. in Ref. 55 at a concentration $c = 2$ µM and 1020 mM NaCl. The experimental values ($T^{\rm Exp}$) are compared with predictions obtained with the unzipping parameters by using Eq. (10) ($T^{\rm Unz}_{\rm Bi}$) and Eq. (11) ($T^{\rm Unz}_{\rm Uni}$) for bimolecular and unimolecular reactions, respectively (see main text and Sec. 6, Methods). Finally, $T^{\rm UO}$ and $T^{\rm Hug}$ are obtained with the unified oligonucleotide parameters (UO) in Ref. 43 and the Huguet et al. (2017) parameters in Ref. 37. Results are reported with errors: $T^{\rm Exp} \pm 1.6$ °C, $T^{\rm Unz} \pm 1.5$ °C, $T^{\rm UO} \pm 1.5$ °C, and $T^{\rm Hug} \pm 1.5$ °C. Temperatures are given in degrees Celsius.