## Abstract

The high fidelity of DNA polymerase is critical for the faithful replication of genomic DNA. Several approaches have been proposed to quantify the fidelity of DNA polymerase. Direct measurements of the error frequency of the replication products give the true fidelity by definition but turn out to be very hard to implement. Two biochemical kinetic approaches, the steady-state assay and the transient-state assay, were then suggested and widely adopted. In these assays, the error frequency is indirectly estimated by using the steady-state or the transient-state kinetic theory combined with the measured kinetic rates. However, whether these indirectly estimated fidelities are equivalent to the true fidelity has never been clarified theoretically; in particular, the different strategies used to quantify the proofreading efficiency of DNAP often lead to inconsistent results. The reason for this confusion is that it is mathematically challenging to formulate a rigorous and general theory of the true fidelity. Recently we succeeded in establishing such a theoretical framework. In this paper, we develop this theory to examine comprehensively the theoretical foundation of the kinetic assays and the relation between the fidelities obtained by different methods. We conclude that while the steady-state assay and the transient-state assay can always measure the true fidelity of exonuclease-deficient DNA polymerases, they do so for exonuclease-efficient DNA polymerases only conditionally (the proper way to use these assays to quantify the proofreading efficiency is also suggested). We thus propose a new kinetic approach, the single-molecule assay, which indirectly but precisely characterizes the true fidelity of either exonuclease-deficient or exonuclease-efficient DNA polymerases.

## INTRODUCTION

The high fidelity of DNA polymerase (DNAP) is critical for faithful replication of genomic DNA. Quantitative studies on DNAP fidelity began in the 1960s and became an important issue in biochemistry and molecular biology. Intuitively, the DNAP fidelity can be roughly understood as the reciprocal of the overall mismatch (error) frequency when a given DNA template is replicated with both the matched dNTPs (denoted as dRTP or R) and the mismatched dNTPs (denoted as dWTP or W). For instance, the synthetic polymer poly- was used as the template and the replication reaction was conducted with both dRTPs (dATP and ) and dWTP (dGTP). The ratio of the incorporated dRTPs to dWTPs in the final products was then determined to quantify the overall error frequency[1]. Similarly, a homopolymer poly-dC was used as the template and the total number of the incorporated dWTP (dTTP) and dRTP (dGTP) was then measured to give the error frequency[2]. Beyond such overall fidelity, the site-specific fidelity was defined as the reciprocal of the error frequency at individual template sites. In principle, the error frequency at any template site can be directly counted if a sufficient amount of full-length replication products can be collected and sequenced (this will be denoted as the true fidelity *f*_{true}), *e.g.* by using deep sequencing techniques [3, 4]. However, this type of sequencing-based approach requires a heavy workload and has rarely been adopted in fidelity assays. It is also hard to specify the sequence-context influences on the fidelity. A similar but much simpler strategy is to investigate only the error frequency at the assigned template site by single-nucleotide incorporation assays. 
Such assays are conducted for *exo*^{−}-DNAP (exonuclease-deficient DNAP), in which dRTP and dWTP compete to be incorporated to the primer terminal only at the assigned single template site, and the amounts of the final reaction products containing the incorporated dRTP or dWTP are then determined by gel analysis to give the error frequency, *e.g.*[5, 6]. By designing various template sequences, one can further dissect the sequence-context dependence of the site-specific error frequency. Although the above definitions of DNAP fidelity are simple and intuitive, the direct measurements are very challenging since mismatches occur at frequencies too low to be detected even when heavily biased dNTP pools are used. Besides, the single-nucleotide incorporation assays do not apply to *exo*^{+}-DNAP (exonuclease-efficient DNAP) because the coexistence of the polymerase activity and the exonuclease activity makes the reaction products very complicated and hard to interpret. Hence two alternative kinetic approaches were proposed.

The steady-state method was developed by A. Fersht for *exo*^{−}-DNAP, and is based on the Michaelis-Menten kinetics of the incorporation of a single dRTP or dWTP at the same assigned template site[7]. The two incorporation reactions are conducted separately under steady-state conditions to obtain the specificity constant (the quasi-first order rate constant) (*k*_{cat}*/K*_{m})_{R} or (*k*_{cat}*/K*_{m})_{W} respectively, where *k*_{cat} is the maximal steady-state turnover rate of dNTP incorporation and *K*_{m} is the Michaelis constant. The site-specific fidelity is then characterized as the ratio between the two incorporation velocities, *i.e.* (*k*_{cat}*/K*_{m})_{R}[dRTP]/ (*k*_{cat}*/K*_{m})_{W} [dWTP] (denoted as steady-state fidelity *f*_{s·s}), which is nothing but the specificity commonly defined for multi-substrate enzymes. This assay has been widely acknowledged as the standard method in DNAP fidelity studies. Nevertheless, there is an apparent difference between the specificity and the true fidelity of *exo*^{−}-DNAP. Enzyme specificity is operationally defined and measured under the steady-state condition, which is usually established in experiments by two requirements, *i.e.* the substrate is in large excess to the enzyme, and the enzyme can dissociate from the product after a single turnover is finished. These two requirements are often met by reactions catalyzed by non-processive enzymes, and the enzyme specificity is indeed a good measure of the relative contents of final products of competing substrates. DNAP, however, is a processive enzyme and rarely dissociates from the template, which violates the second requirement. Additionally, DNA replication *in vivo* consists of only a single template DNA but many DNAPs, which violates the first requirement. Hence, no steady-state assumptions can be made *a priori* for single-nucleotide incorporation reactions either *in vivo* or *in vitro*. 
So, is the enzyme specificity really relevant to the true fidelity of *exo*^{−}-DNAP? So far as we know, there was only one experimental work which did the comparison and indicated the possible equivalence of *f*_{s·s} to *f*_{true} for Klenow fragment (KF^{−})[6], but no theoretical work has ever been published to investigate the true fidelity of DNA replication and examine the equivalence of *f*_{s·s} and *f*_{true} in general.
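As a minimal illustration of the steady-state fidelity defined above, the sketch below evaluates *f*_{s·s} from two sets of Michaelis-Menten parameters. The function name and all numerical values are hypothetical and chosen only for illustration; they are not taken from any experiment cited here.

```python
def steady_state_fidelity(kcat_R, Km_R, kcat_W, Km_W, dRTP, dWTP):
    """Return f_ss = (kcat/Km)_R [dRTP] / ((kcat/Km)_W [dWTP]).

    Rates are in 1/s and concentrations in the same unit as Km
    (for example micromolar); only the ratios matter.
    """
    spec_R = kcat_R / Km_R  # specificity constant for the matched dNTP
    spec_W = kcat_W / Km_W  # specificity constant for the mismatched dNTP
    return (spec_R * dRTP) / (spec_W * dWTP)


# Hypothetical parameters (assumed values, not measured data).
f_ss = steady_state_fidelity(kcat_R=50.0, Km_R=5.0,
                             kcat_W=0.05, Km_W=500.0,
                             dRTP=100.0, dWTP=100.0)
print(f"f_ss = {f_ss:.3g}")
```

With equal dNTP concentrations, the result reduces to the ratio of the two specificity constants, which is the usual way the steady-state fidelity is reported.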

Besides the steady-state method, the transient-state kinetic analysis was also proposed to obtain the specificity constant[8, 9]. Under the pre-steady-state condition or the single-turnover condition, one can obtain the parameter *k*_{pol}*/ K*_{d} (a substitute for *k*_{cat}*/K*_{m}) for the single-nucleotide incorporation reactions with *exo*^{−}-DNAP, and define the site-specific fidelity as (*k*_{pol}*/K*_{d})_{R}[dRTP]/(*k*_{pol}*/K*_{d})_{W} [dWTP] (denoted as transient-state fidelity *f*_{t·s}). Both *k*_{pol}*/K*_{d} and *k*_{cat}*/K*_{m} can only be properly interpreted by kinetic models, so the relation between the two parameters is actually model-dependent. For the commonly used two-step kinetic model (including only dNTP binding and the subsequent chemical step), it can be shown that they are equal[10]. For complex models including additional steps (*e.g.* DNA binding to DNAP, translocation of DNAP on the template, PPi release, etc.), their equivalence can also be proved in general (details will be given in later sections). But again the relevance of *f*_{t·s} to *f*_{true} is not yet clarified. Although the experiment has indicated the possible equivalence of *f*_{t·s} and *f*_{true} for KF^{−}[6], a general theoretical examination is still needed.

Further, these methods fail to measure unambiguously the site-specific fidelity of *exo*^{+}-DNAP. For *exo*^{+}-DNAP, the total fidelity is often assumed to consist of two multiplicative factors. The first is the initial discrimination *f*_{ini} contributed solely by the polymerase domain, which can be given by *f*_{s·s} or *f*_{t·s}. The second factor is the additional proofreading efficiency *f*_{pro} contributed by the exonuclease domain, which is defined by the ratio of the elongation probability of the terminal R (*P*_{el,R}) to that of the terminal W (*P*_{el,W}). Here the elongation probability is given by *P*_{el} = *k*_{el}*/*(*k*_{el} + *k*_{ex}), where *k*_{el} is the elongation rate to the next site, and *k*_{ex} is the excision rate of the terminal nucleotide at the assigned site (*e.g.* Eq.(A1-A6) in Ref.[11]). *P*_{el,R} is usually assumed close to 100%, so *f*_{pro} approximately equals 1 + *k*_{ex,W} */k*_{el,W}. Although these expressions seem reasonable, there are some problems that were not clarified. First, the definition of *f*_{pro} is subjective though intuitive, so a rigorous theoretical foundation is needed. Second, the rate parameters *k*_{el} and *k*_{ex} are not well defined since both the elongation and the excision are multi-step processes, *i.e. k*_{el} and *k*_{ex} are unknown functions of the involved rate constants and there is no unique way to define them. They could be theoretically defined under steady-state assumptions (Eq.(6) in Ref.[12]) or operationally defined by experimental assays (*e.g.* steady-state assays[13, 14] or transient-state assays [15]), but different ways often lead to inconsistent interpretations and quite different estimates of *f*_{pro} (as will be clarified in RESULTS AND DISCUSSION Sec.2). Additionally, *k*_{el} should be more properly understood as the effective elongation rate in the sense that the elongated terminal (the added nucleotide) is no longer excised. 
This condition is not met if the *exo*^{+}-DNAP can proofread the buried mismatches (*e.g.* the penultimate or antepenultimate mismatches, etc.). In these cases, *k*_{el} is affected not only by the next template site but also by further sites. Such far-neighbor effects were not seriously considered in previous studies. So, what exactly is the relation between the total fidelity *f*_{tot} (= *f*_{ini} · *f*_{pro}) and *f*_{true}?
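The conventional definition of the proofreading efficiency quoted above can be sketched numerically as follows. The code computes *P*_{el} = *k*_{el}*/*(*k*_{el} + *k*_{ex}) for the matched and mismatched terminals and compares the exact ratio with the approximation 1 + *k*_{ex,W}*/k*_{el,W}. The function names and rate values are hypothetical; the sketch only illustrates the conventional expressions and says nothing about how *k*_{el} and *k*_{ex} should be defined.

```python
def elongation_probability(k_el, k_ex):
    """P_el = k_el / (k_el + k_ex): chance that the terminal is extended
    rather than excised."""
    return k_el / (k_el + k_ex)


def proofreading_efficiency(k_el_R, k_ex_R, k_el_W, k_ex_W):
    """Conventional f_pro = P_el,R / P_el,W."""
    return (elongation_probability(k_el_R, k_ex_R)
            / elongation_probability(k_el_W, k_ex_W))


# Hypothetical rates in 1/s (assumed values for illustration only).
k_el_R, k_ex_R = 300.0, 0.2   # matched terminal: fast elongation, slow excision
k_el_W, k_ex_W = 0.01, 2.0    # mismatched terminal: slow elongation, fast excision

f_pro = proofreading_efficiency(k_el_R, k_ex_R, k_el_W, k_ex_W)
f_pro_approx = 1.0 + k_ex_W / k_el_W   # valid when P_el,R is close to 1
print(f"f_pro = {f_pro:.2f}, approximation = {f_pro_approx:.2f}")
```

The two numbers agree closely only because *P*_{el,R} is nearly 1 for the assumed rates; the approximation degrades when excision of the matched terminal is not negligible.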

Recently two equivalent rigorous theories were proposed to investigate the true fidelity of either *exo*^{−}-DNAP or *exo*^{+}-DNAP, *i.e.* the iterated function systems by P. Gaspard [16] and the first-passage (FP) method by us [17]. In particular, we have obtained very simple and intuitive mathematical formulas by the FP method to compute *f*_{true} of *exo*^{+}-DNAP rigorously, which cannot be achieved by the steady-state or the transient-state analysis. With these firmly established results, we can address all the above questions in detail. In the following sections, we will first give a brief review of the FP method and the major conclusions already obtained for simplified kinetic models of DNA replication. Then we will generalize these conclusions to more realistic kinetic models for *exo*^{−}-DNAP and *exo*^{+}-DNAP, and carefully examine the relations between *f*_{s·s}, *f*_{t·s} and *f*_{true}. In particular, the FP analysis makes it possible to take full advantage of single-molecule techniques to investigate the site-specific fidelity, whereas the conventional steady-state or transient-state analysis applies only to ensemble reactions but not to single-molecule processes. Feasible single-molecule assays for either *exo*^{−}-DNAP or *exo*^{+}-DNAP will also be suggested.

## METHODS

### 1. Basics of the FP method

The first-passage (FP) method was proposed to study the replication of the entire template by *exo*^{+}-DNAP [17], which also applies to single-nucleotide incorporation reactions.

Here the highly simplified reaction scheme in Fig.1 is taken as an example to illustrate the basic logic of this method. Each template site is characterized by two rates: the incorporation rate of dNTP to the primer terminal at the template site *i* − 1 (its dNTP-concentration dependence is not explicitly shown here) and the excision rate of the primer terminal at the template site *i*. In Fig.1, dRTP and dWTP compete for each template site during the replication, so there will be various sequences in the final full-length products. The FP method describes the entire template-directed replication process by chemical kinetic equations, and directly computes the sequence distribution of the full-length products, from which *f*_{true} can be precisely calculated. It is worth noting that the FP method needs neither extra assumptions such as steady-state or quasi-equilibrium assumptions, nor an explicit solution of the kinetic equations as in the transient-state analysis, which is often a formidable task. Some illustrative examples of FP calculations will be given in later sections. Here we only list the major results in terms of the incorporation and excision rates. Detailed computation can be found in Ref.[17].

Intuitively, the incorporation and excision rates depend on the identity (A, G, T or C) and the state (matched or mismatched) not only of the base pair at site *i* but also of the one or more preceding base pairs. If there are only nearest-neighbor (first-order) effects, both rates are functions of the two base pairs at sites *i* − 1 and *i*: *X*_{i−1} (or *X*_{i}) represents the nucleotide at site *i* − 1 (or *i*) on the template, *α*_{i−1} represents the nucleotide at site *i* − 1 on the primer, and *α*_{i} represents the next nucleotide to be incorporated to the primer terminal at site *i* (for the incorporation rate) or the terminal nucleotide of the primer at site *i* to be excised (for the excision rate). *X* and *α* can be any of the four types of nucleotides A, G, T and C. Similarly, the rates depend on three consecutive base pairs for the second-order neighbor effects, and so on for far-neighbor (higher-order) effects.

### 2. The true fidelity calculated by the FP method

For DNAP having first-order neighbor effects, in a wide range of the involved rate constants, we have derived the analytical expression of the fidelity at site *i* [17],
*R* represents the matched nucleotide, and *W* represents any one of the three types of mismatched nucleotides. For simplicity, we omit all the superscripts below unless it causes misunderstanding. Each term in the sum represents the error frequency of a particular type of mismatch, whose reciprocal is the mismatch-specific fidelity studied in the conventional steady-state assay or transient-state assay,
where
is the initial discrimination, and
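is the proofreading efficiency. This is similar to *f*_{pro} defined in INTRODUCTION, if the incorporation rate of the next nucleotide following the terminal W and the excision rate of the terminal W are regarded as *k*_{el,W} and *k*_{ex,W} respectively.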

For DNAP having second-order neighbor effects, with some reasonable assumptions about the rate parameters, we can obtain the fidelity at site *i* [17],
Each term in the sum represents the mismatch-specific error frequency at site *i*. Its reciprocal defines the mismatch-specific fidelity, which again consists of the initial discrimination and the proofreading efficiency, but the latter differs significantly from *f*_{pro} in INTRODUCTION, since the effective elongation rate is no longer the nearest-neighbor incorporation rate but an expression which includes the next-nearest neighbor effects. The same logic can be readily generalized to higher-order neighbor effects, where the proofreading efficiency will be more complicated [17, 18].

In real DNA replication, both the dNTP incorporation and the dNMP excision are multi-step processes. By using the FP method, the complex reaction scheme can be reduced to the simplified scheme Fig.1, and the fidelity can still be calculated by Eq.(2) or Eq.(5), with only one modification: the incorporation rates and the excision rates are now effective rates, which are functions of the involved rate constants. In the following sections, we will derive these functions for different multi-step reaction models, and compare them with those obtained by steady-state or transient-state assays. For simplicity, we only discuss DNAP having first-order neighbor effects in detail, since almost all the existing literature has focused on this case. Higher-order neighbor effects will also be mentioned in SUMMARY.

## RESULTS AND DISCUSSION

### 1. Fidelity assays of exo^{−}-DNAP

#### 1.1. The true fidelity measured by the direct competition assay

Fig.2 shows a three-step kinetic model of the competitive incorporation of a single dRTP or dWTP to site *i* + 1. The true fidelity is precisely given by the ratio of the final product *D*_{i}*R* to *D*_{i}*W* when the substrate DNA is totally consumed, *i.e.*, *f*_{true} = [*D*_{i}*R*](*t* → ∞)*/*[*D*_{i}*W*](*t* → ∞), which can be calculated by the FP method.

A part of the kinetic equations for this model are given below,
here *α* =R,W. The dNTP binding rate is denoted as . The basic idea of FP method is not to directly solve the kinetic equations rigorously (*e.g.* in the transient-state analysis) or approximately by imposing extra assumptions (*e.g.* the steady-state assumption). Instead, the two equations are integrated to give the products at time *t*,
The second term approaches zero as *t* increases to infinity. dNTP is usually in large excess to template DNA either *in vivo* or *in vitro*, so [dNTP] remains approximately constant during the reaction. Then the fidelity is simply given by
*f*_{true} is exactly the initial discrimination defined by Eq.(3) with the two effective incorporation rates of dRTP and dWTP.

In practice, when the reaction time *t* is large enough for sufficient product accumulation (*i.e.*, the second term on the right side of Eq.(8) is far smaller than the first term), the measured [*D*_{i}*R*](*t*)*/*[*D*_{i}*W*](*t*) becomes nearly time-invariant, and thus it is a good measure of *f*_{true}. In the direct competition assay conducted by Bertram *et al.* [6], the incorporation reaction was terminated when about half of the substrate DNA had reacted. This termination criterion *per se* does not meet the above requirement, so other evidence should be considered. For instance, [*D*_{i}*R*](*t*)*/*[*D*_{i}*W*](*t*) is proportional to [dRTP]*/*[dWTP] if the reaction time *t* is large, so one can decide whether *t* is sufficiently large by examining whether [*D*_{i}*R*](*t*)[dWTP]*/*[*D*_{i}*W*](*t*)[dRTP] becomes nearly constant when [dWTP] or [dRTP] is changed. Combined with such evidence, Bertram *et al.* were able to show that [*D*_{i}*R*](*t*)*/*[*D*_{i}*W*](*t*) measured under their termination condition is really a good measure of the true fidelity.
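The approach of the product ratio to a time-invariant value can be illustrated with a small numerical sketch. The model below is a simplification assumed only for illustration (it is not the three-step scheme of Fig.2): each dNTP binds reversibly to the enzyme-DNA complex and is then incorporated irreversibly, with [dNTP] absorbed into pseudo-first-order binding rates. All names and rate values are hypothetical. The kinetic equations are integrated by a plain explicit Euler scheme and the ratio of the two products is sampled over time.

```python
def competition_time_course(bind_R, unbind_R, chem_R,
                            bind_W, unbind_W, chem_W,
                            t_end=10.0, dt=1e-4, n_samples=10):
    """Integrate a simplified direct-competition model (explicit Euler).

    States are fractions of the total substrate DNA: free complex,
    complex with bound dRTP or dWTP, and the two final products.
    Returns a list of (time, [product R]/[product W]) samples.
    """
    free, c_R, c_W, p_R, p_W = 1.0, 0.0, 0.0, 0.0, 0.0
    n_steps = int(round(t_end / dt))
    record_every = max(1, n_steps // n_samples)
    samples = []
    for step in range(1, n_steps + 1):
        d_free = -(bind_R + bind_W) * free + unbind_R * c_R + unbind_W * c_W
        d_c_R = bind_R * free - (unbind_R + chem_R) * c_R
        d_c_W = bind_W * free - (unbind_W + chem_W) * c_W
        d_p_R = chem_R * c_R
        d_p_W = chem_W * c_W
        free += dt * d_free
        c_R += dt * d_c_R
        c_W += dt * d_c_W
        p_R += dt * d_p_R
        p_W += dt * d_p_W
        if step % record_every == 0:
            samples.append((step * dt, p_R / p_W))
    return samples


# Hypothetical pseudo-first-order rates in 1/s (assumed values).
rates = dict(bind_R=10.0, unbind_R=100.0, chem_R=50.0,
             bind_W=10.0, unbind_W=500.0, chem_W=0.5)
samples = competition_time_course(**rates)

# Effective incorporation rates of this two-step model:
# binding rate times the probability of incorporation before unbinding.
k_eff_R = rates["bind_R"] * rates["chem_R"] / (rates["unbind_R"] + rates["chem_R"])
k_eff_W = rates["bind_W"] * rates["chem_W"] / (rates["unbind_W"] + rates["chem_W"])
expected_ratio = k_eff_R / k_eff_W

for t, ratio in samples:
    print(f"t = {t:5.1f} s   [D_i R]/[D_i W] = {ratio:8.2f}")
print(f"ratio of effective rates = {expected_ratio:8.2f}")
```

In this assumed model the sampled ratio settles quickly and its long-time value coincides with the ratio of the two effective incorporation rates, in line with the statement that the product ratio measures *f*_{true} once *t* is large enough.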

#### 1.2. Effective rates of multi-step reactions uniquely determined by FP method

The above FP treatment can be directly extended to multi-state incorporation schemes like Fig.3 to get the effective incorporation rate, as below,

Details of the calculation can be found in Supplementary Materials (SM) Sec.I B. Here *k*_{1} is proportional to the dNTP concentration, so *k*^{∗} = *k*^{∗0}[dNTP]. The true fidelity is still given by Eq.(3), where the incorporation rates of dRTP and dWTP are the effective rates defined here. Fig.3 describes the processive dNTP incorporation by DNAP without dissociation from the substrate DNA. If the dissociation is considered, the reaction scheme will be more complex, but the effective incorporation rates can still be given as above, as will be shown in RESULTS AND DISCUSSION Sec.2.2.
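One way to see how a multi-step incorporation can be reduced to a single effective rate is sketched below for a linear chain of reversible steps ending in an irreversible step. This is an assumption made for illustration: the effective rate is taken as the binding rate multiplied by the probability that the bound complex reaches the product before returning to the unbound state. Whether this coincides term by term with Eq.(10) depends on the details of Fig.3, which are not reproduced here. The function name and rate values are hypothetical.

```python
def effective_incorporation_rate(forward, backward):
    """Effective rate of a linear multi-step incorporation (illustrative).

    forward[0] is the pseudo-first-order dNTP binding rate (k_1, which
    already contains [dNTP]); forward[j] is the forward rate of step j+1.
    backward[j] is the reverse rate of step j+1; the last forward step is
    assumed irreversible, so len(backward) == len(forward) - 1.
    """
    if len(backward) != len(forward) - 1:
        raise ValueError("need exactly one fewer backward rate than forward rates")
    denominator = 1.0
    running_ratio = 1.0
    for j in range(len(backward)):
        # Each term weighs the chance of sliding back j+1 steps
        # against moving forward (random-walk splitting probability).
        running_ratio *= backward[j] / forward[j + 1]
        denominator += running_ratio
    return forward[0] / denominator


# Two-step example: binding (10 /s), unbinding (100 /s), chemistry (50 /s).
k_two_step = effective_incorporation_rate([10.0, 50.0], [100.0])

# Three-step example with an extra reversible conformational change.
k_three_step = effective_incorporation_rate([2.0, 3.0, 4.0], [1.0, 5.0])
print(k_two_step, k_three_step)
```

Because forward[0] is proportional to [dNTP], the value returned is also proportional to [dNTP], consistent with *k*^{∗} = *k*^{∗0}[dNTP].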

#### 1.3. The steady-state assay measures the true fidelity

The steady-state assays measure the initial velocity of product generation under the condition that the substrate is in large excess to the enzyme. The normalized velocity per enzyme is in general given by the Michaelis-Menten equation *v*^{pol}_{s·s} = *k*_{cat}[dNTP]*/*(*K*_{m} + [dNTP]).
Here the superscript *pol* indicates the polymerase activity, and the subscript *s* · *s* indicates the steady state. Fitting the experimental data by this equation, one can get the specificity constant *k*_{cat}*/K*_{m} either for dRTP incorporation or dWTP incorporation and estimate the fidelity (the initial discrimination) by *f*_{s·s} = (*k*_{cat}*/K*_{m})_{R}[dRTP]/(*k*_{cat}*/K*_{m})_{W} [dWTP]. What is the relation between *k*_{cat}*/K*_{m} and the effective incorporation rate in Eq.(3)?

To understand the exact meaning of *k*_{cat}*/K*_{m}, the complete multi-step incorporation reaction scheme Fig.4 must be considered, which explicitly includes the DNAP binding step and the dissociation step. The last dissociation step is reasonably assumed irreversible, since the enzyme is very unlikely to rebind to the same substrate molecule after dissociation because the substrate is in large excess to the enzyme. Under the steady-state condition, it can be easily shown

Here *k*^{∗} is defined in Eq.(10). *K*_{s·s} = 1 + *k*_{off} */k*_{on}, . *K*_{s·s} is exactly the same for either dRTP incorporation or dWTP incorporation. So the fidelity is given as
This is understandable: the steps before dNTP binding should not contribute to the initial discrimination. However, this does not mean that those steps do not contribute to the total fidelity. Actually they can affect the proofreading efficiency, as will be demonstrated in RESULTS AND DISCUSSION Sec.2.

#### 1.4. The transient-state assay measures the true fidelity

The transient-state assay often refers to two different methods, the pre-steady-state assay or the single-turnover assay. Since the theoretical foundations of these two methods are the same, we only discuss the latter below for simplicity.

In single-turnover assays, the enzyme is in large excess to the substrate, and so the dissociation of the enzyme from the product is neglected. The time course of the product accumulation or the substrate consumption is monitored. The data is then fitted by exponential functions (single-exponential or multi-exponential) to give one or more exponents (*i.e.* the characteristic rates). In DNAP fidelity assay, these rates are complex functions of all the involved rate constants and dNTP concentration, which in principle can be analytically derived for any given kinetic model. For instance, for the commonly-used simple model including only substrate binding and the subsequent irreversible chemical step, one can directly solve the kinetic equations to get two rate functions. It was proved by K. Johnson that the smaller one obeys approximately the Michaelis-Menten-like equations[10].
The subscript *t* · *s* indicates the transient state. Similar to the steady-state assays, *k*_{pol}*/K*_{d} is regarded as the specificity constant and thus DNAP fidelity is defined as the enzyme specificity *f*_{t·s} = (*k*_{pol}*/K*_{d})_{R}[dRTP]*/*(*k*_{pol}*/K*_{d})_{W} [dWTP]. It was also shown that *k*_{pol}*/K*_{d} equals *k*_{cat}*/K*_{m} for the two-step model[10], so *f*_{t·s} = *f*_{s·s}.
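The two-step statement above can be checked numerically with a short sketch. For reversible dNTP binding followed by an irreversible chemical step, the two characteristic rates are the roots of a quadratic; the code below computes them and compares the low-[dNTP] slope of the smaller rate with the steady-state specificity constant of the same scheme, *k*_{on}*k*_{chem}*/*(*k*_{off} + *k*_{chem}). The names k_on, k_off, k_chem and all values are hypothetical labels introduced here, not the notation of Ref.[10].

```python
import math


def characteristic_rates(k_on, k_off, k_chem, dntp):
    """Return (small, large) decay rates of the two-step single-turnover model.

    Scheme: ED + dNTP <-> ED.dNTP (k_on, k_off), ED.dNTP -> product (k_chem).
    The rates solve  x^2 - a x + k_on*dntp*k_chem = 0  with
    a = k_on*dntp + k_off + k_chem.
    """
    a = k_on * dntp + k_off + k_chem
    disc = math.sqrt(a * a - 4.0 * k_on * dntp * k_chem)
    large = 0.5 * (a + disc)
    small = k_on * dntp * k_chem / large  # stable form of 0.5 * (a - disc)
    return small, large


# Hypothetical constants: k_on in 1/(uM s), k_off and k_chem in 1/s.
k_on, k_off, k_chem = 1.0, 100.0, 50.0

# Slope of the smaller rate at very low [dNTP] (in uM).
low_dntp = 1e-4
small, large = characteristic_rates(k_on, k_off, k_chem, low_dntp)
slope_transient = small / low_dntp

# Steady-state specificity constant kcat/Km of the same two-step scheme.
spec_steady = k_on * k_chem / (k_off + k_chem)
print(slope_transient, spec_steady)
```

For the assumed constants the two numbers agree to within rounding, which is the content of the equality *k*_{pol}*/K*_{d} = *k*_{cat}*/K*_{m} for this simple model.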

The equality *k*_{pol}*/K*_{d} = *k*_{cat}*/K*_{m} actually holds for more general models like Fig.5. The rigorous proof is too lengthy to be presented here (details can be found in SM Sec.III B). Below we only give some intuitive explanations.

Since there are *N* states in the reaction scheme Fig.5, the time evolution of the system can be described by *N* exponentially decaying functions with *N* characteristic rates. If the smallest rate is much smaller than the others, it can be easily proven that it follows the same form as Eq.(14) in general. Thus *k*_{pol}*/K*_{d} can be obtained by mathematically extrapolating [dNTP] to zero. Intuitively, when [dNTP] approaches zero, dNTP binding is the rate-limiting step, and all the steps after it will be so slow that the accumulation of each intermediate state is almost zero, *i.e.* they are approximately in steady state. This gives the velocity per substrate DNA, *k*^{∗}[*E*_{1} · *D*_{i−1}]*/*[*D*_{0}], where *k*^{∗} is defined by Eq.(10) and [*D*_{0}] is the total concentration of DNA. On the other hand, all the steps before dNTP binding are relatively much faster and approximately in equilibrium, which leads to [*E*_{1} · *D*_{i−1}] ≈ (*k*_{on}*/k*_{off})[*D*_{i−1}], *i.e.* [*E*_{1} · *D*_{i−1}] ≈ [*D*_{0}]*/*(1 + *k*_{off} */k*_{on}). Here the free DNAP concentration is almost equal to the total DNAP concentration [*E*_{0}] since DNAP is in large excess to DNA. So the normalized velocity per substrate is *k*^{∗}*/*(1 + *k*_{off} */k*_{on}), which leads to
*K*_{t·s} = 1 + *k*_{off} */k*_{on}. This is exactly the same as Eq.(12). So the fidelity can be given by

#### 1.5. A new approach to measure the true fidelity: a single-molecule assay

As stated above, neither the steady-state assay nor the transient-state assay can give the effective incorporation rates. This is not a problem for the fidelity assay of *exo*^{−}-DNAP, but is indeed a serious problem for *exo*^{+}-DNAP (as shown in later sections). Then, how can one estimate the effective rates by a general method? A possible way is to dissect the reaction mechanism directly, *i.e.* to measure the rate constants of each step by transient-state experiments [19–24], and then calculate the effective rate according to Eq.(10). This approach is rigorous but requires a heavy workload. Are there direct measurements of the effective rates? Here we suggest a possible single-molecule approach based on the FP analysis.

In a typical single-molecule experiment, the different states of the enzyme or the substrate can be distinguished by techniques such as smFRET[23, 24]. So, if the state *E*_{1} · *D*_{i−1} and the state reached after the incorporation in Fig.4 can be properly identified, the following single-molecule experiment can be done to measure the effective incorporation rates.

Initiate the nucleotide incorporation reaction by adding *exo*^{−}-DNAP and dNTP to the substrate *D*_{i−1} and begin to record the state-switching trajectory of a single enzyme-DNA complex. Here dNTP can be dRTP or dWTP, and the primer terminal can be matched (R) or mismatched (W). When a single DNAP is captured by the substrate DNA, it can catalyze the incorporation of one or more nucleotides, depending on the template sequence context and the dNTP used. Then one can select a particular time window from the recorded trajectory, starting from the first arrival at *E*_{1}·*D*_{i−1} (denoted as the starting point) and ending at the first arrival at the state reached after the incorporation (denoted as the ending point). In this time window, the system may make multiple visits to *E*_{1}·*D*_{i−1}. Count the total time the system resides at *E*_{1}·*D*_{i−1}. This so-called residence time may be clearly measured under low concentrations of dNTP. Collect sufficient samples to get the averaged residence time Γ_{1,i−1}, which directly gives the required effective incorporation rate as its reciprocal. The proof of this relation is given in SM Sec.IV A.

The advantage of this single-molecule analysis is its model-independence. Since the relation between Γ_{1,i−1} and the effective incorporation rate holds in general, the measurement of Γ_{1,i−1} does not depend on any hypothesis about the details of the reaction scheme (in fact, steps after dNTP binding are often unclear). So this method is a promising alternative to the conventional ensemble assays, particularly in cases where the latter may fail (see later sections).
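The residence-time idea can be illustrated with a small stochastic simulation. The sketch below assumes a hypothetical minimal scheme (DNAP dissociation and rebinding, reversible dNTP binding, irreversible chemistry) and assumes that the effective incorporation rate is the reciprocal of the mean total residence time in the state before dNTP binding. State names, function names, and rate values are all illustrative assumptions, not the scheme of Fig.4.

```python
import random


def total_residence_time(rng, k_off, k_bind, k_unbind, k_chem):
    """Simulate one trajectory and return the total time spent in 'ED'.

    'ED' plays the role of E1.D_{i-1}. From 'ED' the complex either binds
    dNTP (k_bind, pseudo-first-order) or loses the DNAP (k_off); after
    dissociation the DNAP is assumed to rebind eventually, and the time
    spent dissociated is not counted. From the dNTP-bound state the
    nucleotide either unbinds (k_unbind) or is incorporated (k_chem).
    """
    state = "ED"
    time_in_ed = 0.0
    while state != "product":
        if state == "ED":
            total = k_off + k_bind
            time_in_ed += rng.expovariate(total)  # dwell time of this visit
            state = "EDN" if rng.random() < k_bind / total else "free"
        elif state == "free":
            state = "ED"  # rebinding; its duration does not enter the count
        else:  # state == "EDN"
            p_chem = k_chem / (k_unbind + k_chem)
            state = "product" if rng.random() < p_chem else "ED"
    return time_in_ed


# Hypothetical rates in 1/s (assumed values for illustration only).
k_off, k_bind, k_unbind, k_chem = 5.0, 2.0, 100.0, 50.0

rng = random.Random(0)  # fixed seed for reproducibility
n_traj = 20000
mean_residence = sum(
    total_residence_time(rng, k_off, k_bind, k_unbind, k_chem)
    for _ in range(n_traj)
) / n_traj

# Effective rate of the same scheme: binding rate times the probability
# of incorporation before unbinding.
k_eff = k_bind * k_chem / (k_unbind + k_chem)
print(f"1 / mean residence time = {1.0 / mean_residence:.3f} 1/s, k_eff = {k_eff:.3f} 1/s")
```

In this assumed scheme the estimate does not depend on k_off or on how long the DNAP stays dissociated, which illustrates why the residence-time measurement is insensitive to steps other than those following dNTP binding.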

### 2. Fidelity assays of exo^{+}-DNAP

It is widely conjectured that the total fidelity of *exo*^{+}-DNAP consists of the initial discrimination *f*_{ini} and the proofreading efficiency *f*_{pro}. The former *f*_{ini} can be well characterized by the methods introduced in the preceding sections. The latter *f*_{pro}, however, is assumed equal to 1 + *k*_{ex,W} */k*_{el,W}, where *k*_{ex} and *k*_{el} are not well defined and may have different meanings in different assays. Below we discuss some usual ways to characterize these rates. The reaction scheme under discussion is shown in Fig.6.

In this reaction scheme, before the excision, the primer terminal can transfer between *Pol* and *Exo* in two different ways, *i.e.* the intramolecular transfer without DNAP dissociation (the transfer rates are denoted as *k*_{pe} and *k*_{ep}), and the intermolecular transfer in which DNAP can dissociate from and rebind to either *Pol* or *Exo* (the rates are denoted as *k*_{on} and *k*_{off}). These two modes have been revealed by single-turnover experiments[15] and directly observed by smFRET[25]. *k*^{∗} is the effective incorporation rate, as explained below.

#### 2.1. Effective rates uniquely determined by the FP method

Applying the FP method to the kinetic equations for the reaction scheme Fig.6, one can reduce this complex scheme to the simplified scheme Fig.1, with rigorously defined effective rates given below. The logic of the reduction is the same as that in RESULTS AND DISCUSSION Sec.1.2. Details can be found in SM Sec.I C.
The rate constants can be written more explicitly if the states of the base pairs at site *i, i* − 1, *etc.* are explicitly indicated. All the rate constants in the same formula have the same state-subscript. *k*^{∗} is defined by Eq.(10). *k*_{p→e} and *k*_{e→p} define the effective intermolecular transfer rates between *Pol* and *Exo*. So the sums *k*_{pe} + *k*_{p→e} and *k*_{ep} + *k*_{e→p} represent the total transfer rates *via* both intramolecular and intermolecular ways. With these effective rates, the real initial discrimination *f*_{true,ini} and the real proofreading efficiency *f*_{true,pro} can be calculated in the same way as the initial discrimination in Eq.(3) and the proofreading efficiency in Eq.(4) respectively.

*f*_{true,pro} differs considerably from that given by K. Johnson *et al.*, who may be the first to discuss the contribution of the two transfer pathways to the proofreading efficiency of T7 DNAP. Without a rigorous theoretical foundation, they gave an intuitive expression for *f*_{pro} [15]. The effective elongation rate *k*_{el,W} was interpreted as the steady-state incorporation velocity (at certain [dNTP]), which is incorrect as will be explained in the section below. The ambiguous quantity *θ* in their expression was supposed to lie between 0 and 1 (depending on the fate of the DNA after dissociation) and, unlike the effective rates defined above, cannot be expressed explicitly in terms of the rate constants. So *f*_{pro} is not equivalent to *f*_{true,pro}.

#### 2.2. The steady-state assay can not measure the proofreading efficiency

Because of the coexistence of the polymerase activity and the exonuclease activity, reaction schemes consisting of a single-nucleotide incorporation and a single-nucleotide excision are theoretically unacceptable and also impossible to implement in experiments. So the usual steady-state assay does not apply to *exo*^{+}-DNAP. It is also improper to define the elongation probability by imposing steady-state assumptions on such unrealistic reaction models, as given by Eq.(6) in Ref.[12]. Nevertheless, the steady-state assay can still be employed to study the polymerase and exonuclease separately.

When mixed with the *exo*^{+}-DNAP, the substrate DNA can bind either to the polymerase domain or to the exonuclease domain. For some *exo*^{−}-DNAP, the exonuclease domain may exist and still be able to bind (but not excise) the substrate DNA, which is not discussed in preceding sections. For the steady-state assays of dNTP incorporation by such DNAPs, the reaction scheme becomes complicated (Fig.7; again, the last enzyme dissociation steps are reasonably assumed irreversible under the steady-state condition).

Under the steady-state condition, one can easily compute the specificity constant
. So the fidelity is given by
which is exactly *f*_{true,ini}. Combined with Eq.(12), we conclude that the form of Eq.(18) is universal: *k*^{∗} represents the effective incorporation rate of the subprocess beginning from dNTP binding (DNAP dissociation is not involved), and the remaining factor is a simple function of the equilibrium constants of all steps before dNTP binding, no matter how complex the reaction scheme is. Since this factor depends only on the identity and the state of the primer terminal but not on the next incoming dNTP, the enzyme specificity is indeed equal to *f*_{true,ini} in general.

There were also some studies that used the steady-state assay to define the effective elongation rate and the effective excision rate. For instance, some works used the specificity constant[15, 19] or the maximal turnover rate [13] as the effective elongation rate. As shown above, however, the effective elongation rate is not equal to the specificity constant or *k*_{cat}. So the steady-state assay fails in principle to measure the effective elongation rate, unless . This condition is met by T7 DNAP (*k*_{pe} ≪ *k*_{ep} and were observed for the mismatched terminal[15]), and may even be generally met since replicative DNAPs are believed to be highly processive (*i.e.* ) and always tend to bind DNA preferentially at the polymerase domain (*i.e. k*_{pe}*/k*_{ep} ≪ 1). So the specificity constant, but not *k*_{cat}, might be used in practice as the effective elongation rate.

In the steady-state assay of the excision reaction (Fig.8), one may measure the initial velocity and interpret it as the effective excision rate [13]. Whereas is determined by all the rate constants in Fig.8, some rate constants like and are absent from . So in principle is not equal to . Under some conditions, *e.g.* and the dissociation of the enzyme from the substrate is fast enough after the excision, the initial velocity may be approximately equal to (details can be found in SM Sec.II C). But these conditions may not be met by real DNAP, *e.g.*, was observed for T7 polymerase[15]. Unless carefully designed control tests provide compelling evidence, itself is not a good measure for .

One can also change the concentration of the substrate DNA to obtain the specific constant of the excision reaction in experiments[14], as can be shown theoretically
which is just irrelevant to . It can be shown further in any case
Details can be found in SM Sec.II C. In the experiment to estimate *f*_{pro} for ap-polymerase[13], the authors wrongly interpreted and as *k*_{cat} and respectively, and gave that . They thought this measure roughly reflects the true fidelity. Now it is clear that the two quantities are completely unrelated.

#### 2.3. The transient-state assay can measure the proofreading efficiency conditionally

When DNA can bind to either the polymerase domain or the deficient exonuclease domain, the scheme for the transient-state assay of the dNTP incorporation is depicted in Fig.9. By the same logic presented in preceding sections, the specificity constant defined by transient-state assays can be written as
. This is exactly the same as Eq.(18). So, like the steady-state assay, the transient-state assay also applies in general to estimate *f*_{true,ini}. Additionally, the specificity constant, but not *k*_{pol}, can be used to estimate when , as mentioned in the above section.

The transient-state assay of the exonuclease activity is often done under single-turnover conditions. The time course of product accumulation or substrate consumption is fitted by a single exponential or a double exponential to give one or two characteristic rates. In the single-exponential case, the rate is simply taken as the effective excision rate . In the double-exponential case, however, there is no criterion for which one to select. This causes large uncertainty since the two rates often differ by one or more orders of magnitude. In the experiment on T7 polymerase[15], two types of excision reactions were conducted, with or without preincubation of DNA and DNAP. A single characteristic rate was obtained in the former, while two rates were obtained in the latter, the smaller of which is almost equal to the rate in the former case. So this smaller one was selected as . In the experiment on human mitochondrial DNAP[26, 27], however, the larger of the two fitted exponents was selected in some cases. For instance, the two fitted exponents of the excision reaction of the substrate 25×1/45 (the DNA contains a single mismatch at the primer terminal) are 1.1 s^{−1} and 0.04 s^{−1}, and the former was selected as [26]. Different choices of the exponents can result in estimates of differing by orders of magnitude. In the following, we show that the smallest of the fitted exponents may be equal to under some conditions.

The minimal scheme for the transient-state assay of the excision reaction is depicted in Fig.10. By solving the corresponding kinetic equations, one can get three characteristic rates and the smallest one is given by

Here . When the concentration of DNAP is large enough to ensure and *ϵ* ≈ 0 (compared to other terms in the denominator), Eq.(23) can be simplified as
If , which is met if DNA binds preferentially to the polymerase domain, then we get (of the same order of magnitude), is defined in Eq.(17). So, if the real excision reaction follows the minimal scheme, may be interpreted as . In the experiment on human mitochondrial DNAP [26, 27], the authors adopted this interpretation, but used *k*_{pol} as the effective elongation rate, and calculated the proofreading efficiency as . It is now clear that this quantity is not a proper measure of . Here may probably be replaced by (*k*_{pol}[dNTP]/*K*_{d})_{WR}, if and *k*_{ep} > *k*_{pe}.
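How widely the characteristic rates of an excision reaction can be separated may be illustrated with a deliberately reduced scheme. The sketch below is not the minimal scheme of Fig.10: it assumes that the enzyme is already bound to the DNA (as after preincubation), so that only two transient states remain, *E*_{p}·*D* ⇌ *E*_{e}·*D* → excised product, and the kinetic equations give two characteristic rates rather than three. Here `k_pe` and `k_ep` stand for the *Pol*-to-*Exo* and *Exo*-to-*Pol* transfer rates, while the excision rate `k_x`, the function name and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def characteristic_rates(k_pe, k_ep, k_x):
    """Return (slow, fast) decay rates of the assumed two-state scheme
    Ep.D <-> Ee.D -> excised product (all rates in 1/s).

    The rate matrix of the two transient states has
    trace = -(k_pe + k_ep + k_x) and determinant = k_pe * k_x,
    so the two decay rates are the roots of
    x**2 - (k_pe + k_ep + k_x) * x + k_pe * k_x = 0.
    """
    rate_sum = k_pe + k_ep + k_x
    rate_product = k_pe * k_x
    fast = 0.5 * (rate_sum + math.sqrt(rate_sum**2 - 4.0 * rate_product))
    slow = rate_product / fast  # avoids cancellation when slow << fast
    return slow, fast

# Hypothetical values: DNA binds preferentially at the polymerase domain.
slow, fast = characteristic_rates(k_pe=0.1, k_ep=5.0, k_x=2.0)
print(f"slow = {slow:.4g} 1/s, fast = {fast:.4g} 1/s, ratio = {fast / slow:.3g}")
```

With these hypothetical values the two rates differ by more than two orders of magnitude, so an estimate based on the larger exponent and one based on the smaller exponent would differ by the same factor.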

It is worth emphasizing that the interpretation of is severely model-dependent. The reaction scheme could be more complicated than the minimal model in Fig.10, *e.g.* there may be multiple substeps in the intramolecular transfer process since the two domains are far apart (24 nm[28]), particularly when there are buried mismatches in the primer terminal. For any complex scheme, one can calculate the smallest characteristic rate and the real excision rate . These two functions always differ greatly (examples can be found in SM Sec.III D). So the single-turnover assay *per se* is not a universally reliable method to measure the effective excision rate. Is there a model-independent method to measure such excision rates? Below we suggest a possible single-molecule assay.

#### 2.4. The single-molecule assay measures the true fidelity

Similar to RESULTS AND DISCUSSION Sec.1.5, a single-molecule assay can be proposed to directly measure the effective rates and , if the states in Fig.9 and Fig.10 can be well defined in the experiments. For instance, the states *E*_{p} · *D, E*_{e} · *D* and *E* + *D* can be clearly resolved by smFRET [25, 29].

To measure , the experiment is initiated by mixing DNAP and dNTP to the single molecule DNA. If *E*_{p} · *D*_{i}, *E*_{e} · *D*_{i}, *E* + *D*_{i} and *E*_{p} · *D*_{i+1} in Fig.9 can be distinguished in the nucleotide incorporation process, then the residence time at *E*_{p} · *D*_{i} can be counted from a time window of the state-switching trajectory of the enzyme-DNA complex with the starting point *E*_{p} · *D*_{i} and the ending point *E*_{p} · *D*_{i+1}. Collecting sufficient samples to obtain the averaged residence time Γ_{p,i}, one can get by the FP analysis

The measurement of follows the same logic. The experiment is initiated by adding DNAPs to the single-molecule DNA. If the states *E*_{p} · *D*_{i}, *E*_{e} · *D*_{i}, *E* + *D*_{i} and *E*_{p} · *D*_{i−1} in Fig.10 can be distinguished in the excision process, the state-switching trajectory between the starting point *E*_{p} · *D*_{i} and the ending point *E*_{p} · *D*_{i−1} can be recorded. Then the averaged residence time Γ_{p,i} at *E*_{p} · *D*_{i} is obtained, which gives . Sometimes, however, the excision may occur without visiting *E*_{p} · *D*. The trajectories recorded in such cases are not included in the averaging. Detailed explanations can be found in SM Sec.IV B. This analysis also applies to more complex reaction mechanisms and one can always get .
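The trajectory selection described above can be sketched in a few lines. This is a minimal illustration, assuming each recorded trajectory has already been idealized into a list of `(state, dwell_time)` pairs; the state labels, the function names and the toy numbers are hypothetical, and the residence time is taken here to be the cumulative dwell time at the starting state within the window. Converting the averaged residence time into an effective rate still requires the FP analysis of SM Sec.IV B, which is not reproduced here.

```python
from statistics import mean

def residence_time(trajectory, start, end):
    """Cumulative dwell time at `start` within the window that opens at the
    first visit to `start` and closes at the first arrival at `end`.

    Returns None when the end point is reached without visiting `start`,
    or when the record stops before the end point is reached.
    """
    total = 0.0
    visited_start = False
    for state, dwell in trajectory:
        if state == end:
            return total if visited_start else None
        if state == start:
            visited_start = True
            total += dwell
    return None  # incomplete record: end point never reached

def averaged_residence_time(trajectories, start, end):
    """Average the residence time over all trajectories that contain a window."""
    times = [residence_time(t, start, end) for t in trajectories]
    times = [t for t in times if t is not None]
    if not times:
        raise ValueError("no trajectory contains a complete window")
    return mean(times)

# Toy excision trajectories (hypothetical labels and dwell times in seconds).
toy_trajectories = [
    [("Ep.Di", 0.2), ("Ee.Di", 0.1), ("Ep.Di", 0.3), ("Ee.Di", 0.05), ("Ep.Di-1", 1.0)],
    [("E+Di", 0.4), ("Ep.Di", 0.1), ("Ee.Di", 0.2), ("Ep.Di-1", 1.0)],
    [("E+Di", 0.4), ("Ee.Di", 0.2), ("Ep.Di-1", 1.0)],  # never visits Ep.Di: discarded
]
gamma = averaged_residence_time(toy_trajectories, "Ep.Di", "Ep.Di-1")
print(gamma)
```

The third toy trajectory reaches the ending point without visiting the starting state, so it is excluded from the average, in line with the rule stated above.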

### 3. More realistic models including DNAP translocation

So far we have not considered an important step, DNAP translocation, in the above kinetic models. Goodman *et al.* discussed the effect of translocation on the transient-state gel assay very early[30], and recently DNAP translocation has been observed for phi29 DNAP by using nanopore techniques[31–34] or optical tweezers[35]. However, so far no theory or experiment has addressed the effect of translocation on the replication fidelity.

By using optical tweezers, Morin *et al.* had shown that DNAP translocation is not powered by PPi release or dNTP binding[35] and it’s indeed a thermal ratchet process. So the minimal scheme accounting for DNAP translocation can be depicted as Fig.11. *k*_{t} and *r*_{t} are the forward and the backward translocation rate respectively. *E*_{pre} · *D*_{i} and *E*_{post} · *D*_{i} indicate the pre-translocation and the post-translocation state of DNAP respectively. Here, the primer terminal can only switch intramolecularly between *E*_{e} and *E*_{pre} (but not *E*_{post}), according to the experimental observation [34]. We also assume DNAP can bind DNA either in state *E*_{pre} · *D*_{i} or in state *E*_{post} · *D*_{i} with possibly different binding rates and dissociation rates.

This complex scheme can be reduced to the simplified scheme Fig.1 by using the FP analysis. The obtained effective rates are briefly written as and Here *η, η*′, *ξ* are complex functions of all the rate constants in Fig.11 except *k*^{∗}, which are too complex to be given here (see details in SM Sec.V A). These effective rates differ considerably from those defined by steady-state or transient-state assays. Below we only show the difference between the calculated *f*_{true} and the operationally-defined *f*_{t·s} in transient-state assays.
The initial discrimination can be precisely measured by transient-state assays, . But the two proofreading efficiencies are very different, (*f*_{t·s,pro})_{i} ≠ (*f*_{true,pro})_{i}. The complex functions (*f*_{true,pro})_{i} and (*f*_{t·s,pro})_{i} are given in SM Sec.V A,B. They may be approximately equal only under some extreme conditions, *e.g.* and . This may be true when the terminal is in the matched state, so the translocation is in fast equilibrium and the states *pre* and *post* can be treated as a single state, as always assumed in conventional gel assays or other ensemble assays. But these conditions may not be met if there is a terminal mismatch or a buried mismatch which may slow down the translocation [36]. In such cases, *f*_{t·s} is quite different from *f*_{true}. To reliably estimate *f*_{true}, we suggest the following single-molecule assay to directly measure the effective rates.

First, if the states *pre* and *post* cannot be distinguished in the experiment, indicating that the translocation is a fast process, the assays presented in preceding sections (RESULTS AND DISCUSSION Sec.1.5 or Sec.2.4) can be used.

Second, if the translocation is a relatively slow process, so that either *pre* or *post* can be directly observed (*e.g.* for Dpo4 polymerase by smFRET [37]), then the effective incorporation rate is no longer *k*^{∗} but *k*^{∗}(1 − *qη*′ *ξ*). However, it is hard to obtain this effective rate in a single measurement, since it consists of both the polymerase and the exonuclease contributions. Fortunately one can measure the factors *k*^{∗} and 1 − *qη*′ *ξ* separately. The measurement of *k*^{∗} is basically the same as that given in RESULTS AND DISCUSSION Sec.1.5 and Sec.2.4. The reaction scheme is shown in Fig.12. The experiment is initiated by adding DNAP and dNTP to the single-molecule DNA. The time trajectory between the starting point *E*_{post} · *D*_{i−1} and the ending point *E*_{pre} · *D*_{i} is selected, if *E*_{post} · *D*_{i−1}, *E*_{pre} · *D*_{i} and other states can be well distinguished. Then the average residence time at *E*_{post} · *D*_{i−1} gives or , which defines *f*_{true,ini}. Detailed explanations can be found in SM Sec.V C.

The logic to measure *qη*′ *ξ* is given below, as shown in Fig.13.

1. The experiment is initiated by mixing DNAP with DNA.

2. Record the state-switching trajectory of the complex. It may go directly to *E*_{post} · *D*_{i−1} without visiting *E*_{pre} · *D*_{i}. Or it may arrive at *E*_{pre} · *D*_{i} *via* whatever pathway before the excision, and then go to *E*_{post} · *D*_{i−1} (with or without visiting *E*_{post} · *D*_{i}). We collect trajectories of the latter case, and denote *E*_{pre} · *D*_{i} as the starting point (it may be visited repeatedly), *E*_{post} · *D*_{i} and *E*_{post} · *D*_{i−1} as the two ending points.

3. Select all the windows from the trajectories, which are between the starting point and either ending point. The windows are classified in two types, *i.e.* between *E*_{pre} · *D*_{i} and *E*_{post} · *D*_{i}, or between *E*_{pre} · *D*_{i} and *E*_{post} · *D*_{i−1} without visiting *E*_{post} · *D*_{i}.

4. Count the total number of either type of window *n*_{post,i}, *n*_{post,i−1}, and one gets *n*_{post,i−1}/(*n*_{post,i−1} + *n*_{post,i}) = (*qη*′/*ξ*)_{i}. Details can be found in SM Sec.V C.
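The window counting in the steps above can be sketched as follows. This is a minimal illustration, assuming each trajectory has been idealized into an ordered list of state labels; the labels, the function name and the toy trajectories are hypothetical. A window is opened whenever the complex is at the starting point *E*_{pre} · *D*_{i} and is closed at whichever ending point is reached first, so a trajectory that never visits the starting point contributes no window.

```python
def count_windows(trajectories, start="Epre.Di",
                  end_forward="Epost.Di", end_excised="Epost.Di-1"):
    """Count the two window types over all trajectories.

    Returns (n_post_i, n_post_i_minus_1): the number of windows that end at
    Epost.Di, and the number that end at Epost.Di-1 without visiting Epost.Di.
    """
    n_post_i = 0
    n_post_i_minus_1 = 0
    for states in trajectories:
        window_open = False
        for state in states:
            if state == start:
                window_open = True            # (re)visit of the starting point
            elif state == end_forward:
                if window_open:
                    n_post_i += 1
                window_open = False           # window closed at Epost.Di
            elif state == end_excised:
                if window_open:
                    n_post_i_minus_1 += 1
                break                         # excision done: trajectory ends
    return n_post_i, n_post_i_minus_1

# Toy trajectories (hypothetical state sequences).
toy_trajectories = [
    ["Epre.Di", "Epost.Di", "Epre.Di", "Ee.Di", "Epost.Di-1"],
    ["E+Di", "Ee.Di", "Epost.Di-1"],  # excised without visiting Epre.Di
    ["Epost.Di", "Epre.Di", "Epost.Di", "Epre.Di", "Epost.Di",
     "Epre.Di", "Ee.Di", "Epost.Di-1"],
]
n_post_i, n_post_i_minus_1 = count_windows(toy_trajectories)
excision_fraction = n_post_i_minus_1 / (n_post_i_minus_1 + n_post_i)
print(n_post_i, n_post_i_minus_1, excision_fraction)
```

The last line evaluates the ratio *n*_{post,i−1}/(*n*_{post,i−1} + *n*_{post,i}) of the final step; with real data the statistical uncertainty of this fraction would also need to be estimated from the number of windows collected.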

The measurement of follows the same logic as in RESULTS AND DISCUSSION Sec.2.4. The reaction scheme is shown in Fig.14. The experiment is initiated by adding DNAPs to the single-molecule DNA. The time window selected from the trajectory is between the starting point *E*_{post} · *D*_{i} and the ending point *E*_{post} · *D*_{i−1}, if *E*_{post} · *D*_{i}, *E*_{post} · *D*_{i−1} and other states can be well distinguished. Then the average residence time at *E*_{post} · *D*_{i} gives . Similarly, the trajectories recorded without visiting *E*_{post} · *D*_{i} are not included in the averaging. Detailed explanations can be found in SM Sec.V C.

## SUMMARY

The conventional kinetic assays of DNAP fidelity, *i.e.* the steady-state assay or the transient-state assay, have indicated that the initial discrimination *f*_{ini} is about 10^{4∼5} and the proofreading efficiency *f*_{pro} is about 10^{2∼3} [28]. Although these assays have been widely used for decades and these estimates of *f*_{ini} and *f*_{pro} have been widely cited in the literature, they are not beyond question since the logic underlying these methods is not well founded. No rigorous theory of the true fidelity *f*_{true} has ever been proposed, and its relation to the operationally defined *f*_{s·s} or *f*_{t·s} has never been clarified.

In this paper, we carefully examined the relations among *f*_{s·s}, *f*_{t·s} and *f*_{true}, based on the FP method recently proposed by us to investigate the true fidelity of *exo*^{−}-DNAP or *exo*^{+}-DNAP. We conclude that these three definitions are equivalent in general for *exo*^{−}-DNAP, *i.e.* either the steady-state assay or the transient-state assay can give *f*_{true} precisely just by measuring the specificity constant (*k*_{cat}*/K*_{m} or *k*_{pol}*/K*_{d}).

For *exo*^{+}-DNAP, however, the situation is more complicated. The steady-state assay or the transient-state assay can still be applied to measure the initial discrimination *f*_{ini}, as done for *exo*^{−}-DNAP (so the above cited estimates of *f*_{ini} are reliable). But either method fails to measure the effective elongation rate and the effective excision rate and thus in principle cannot characterize *f*_{pro}. So the widely cited estimates *f*_{pro} ∼ 10^{2∼3} are highly questionable. Our analysis shows that only if the involved rate constants meet some special conditions can the two assays approximately give the effective incorporation rates, and only the transient-state assay can approximately give the effective excision rate. Without other supporting evidence that the required conditions are met, the conventional assays of *f*_{pro} *per se* are not reliable. Of course, the transient-state method can be used to measure the rate constants of each step of the excision reaction[15] and then *f*_{pro} can be calculated, but this is laborious.

So we proposed an alternative method, the single-molecule assay, to obtain the fidelity of either *exo*^{−}-DNAP or *exo*^{+}-DNAP by directly measuring all the required effective rates, without dissecting the details of the reaction scheme. It could be a general and reliable method for fidelity assays if some key states of the enzyme-substrate complex can be well resolved by single-molecule techniques. In RESULTS AND DISCUSSION Sec.1.5, Sec.2.4 and Sec.3, we have designed several protocols for conducting the single-molecule experiments and data analysis, which are feasible in principle though they may be hard to implement in practice.

Last but not least, we have focused on the first-order (nearest) neighbor effects in this paper, but higher-order neighbor effects may also be important to the fidelity. Here we take the second-order (next-nearest) neighbor effect as an example. According to Eq.(5), the initial discrimination *f*_{ini} is of the same form as that defined by Eq.(3), so it can still be correctly given by the steady-state or the transient-state assays (in fact, these assays can measure *f*_{ini} for neighbor effects of any order). The proofreading efficiency *f*_{pro} of *exo*^{+}-DNAP consists of two factors, and , which can be regarded as the first-order and the second-order proofreading efficiency respectively. These two factors both depend on the stability of the primer-template duplex. For naked dsDNA duplex, numerous experiments have shown that a penultimate mismatch leads to much lower stability than a terminal mismatch [38]. This implies that, compared with a terminal mismatch, a penultimate mismatch may more significantly disturb the base stacking of the primer-template junction in the polymerase domain, so that the forward translocation of DNAP will be slower and the *Pol*-to-*Exo* transfer of the primer terminal will be faster. In such cases, the second-order factor may be larger than the first-order factor. This enhancement has been mentioned in Ref.[26], though the steady-state and transient-state assays used in that work, and thus the obtained estimates of the two factors, are all questionable (as pointed out in RESULTS AND DISCUSSION Sec.2). The single-molecule assay can be directly adopted to demonstrate the second-order effects by measuring the effective rates with the same protocols. We hope the analysis and the suggestions presented in this paper will prompt a serious examination of the conventional fidelity assays and inspire single-molecule experimentalists to conduct more accurate fidelity analysis.

## ACKNOWLEDGMENTS

The authors acknowledge financial support from the National Natural Science Foundation of China (No.11675180, 11774358), the CAS Strategic Priority Research Program (No.XDA17010504), the Key Research Program of Frontier Sciences of CAS (No.Y7Y1472Y61), and the Research Fund of Wenzhou Institute CAS (No.WIUCASYJ2020004, WIUCASQD2020009).