A simple method for automated equilibration detection in molecular simulations

Molecular simulations intended to compute equilibrium properties are often initiated from configurations that are highly atypical of equilibrium samples, a practice which can generate a distinct initial transient in mechanical observables computed from the simulation trajectory. Traditional practice in simulation data analysis recommends this initial portion be discarded to equilibration, but no simple, general, and automated procedure for this process exists. Here, we suggest a conceptually simple automated procedure that does not make strict assumptions about the distribution of the observable of interest, in which the equilibration time is chosen to maximize the number of effectively uncorrelated samples in the production timespan used to compute equilibrium averages. We present a simple Python reference implementation of this procedure, and demonstrate its utility on typical molecular simulation data.


Molecular simulations use Markov chain Monte Carlo
techniques [ ] to sample configurations x from an equilibrium distribution π(x), either exactly (using Monte Carlo methods such as Metropolis-Hastings) or approximately (using molecular dynamics integrators without Metropolization) [ ].
Due to the sensitivity of the equilibrium probability density π(x) to small perturbations in configuration x and the di iculty of producing su iciently good guesses of typical equilibrium configurations x ∼ π(x), these molecular simulations are o en started from highly atypical initial conditions. For example, simulations of biopolymers might be initiated from a fully extended conformation unrepresentative of behavior in solution, or a geometry derived from a fit to di raction data collected from a cryocooled crystal; solvated systems may be prepared by periodically replicating a small solvent box equilibrated under di erent conditions, yielding atypical densities and solvent structure; liquid mixtures or lipid bilayers may be constructed by using methods that fulfill spatial constraints (e.g. PackMol [ ]) but create locally aytpical geometries, requiring long simulation times to relax to typical configurations.
As a result, traditional practice in molecular simulation has recommended some initial portion of the trajectory be discarded to equilibration (also called burn-in in the MCMC literature [ ]). While the process of discarding initial samples is strictly unnecessary for the time-average of quantities of interest to eventually converge to the desired expectations [ ], this nevertheless o en allows the practitioner to avoid what may be impractically long run times to eliminate the bias in computed properties in finite-length simulations induced by atypical initial starting conditions. It is worth noting that a similar procedure is not a practice universally recommended by statisticians when sampling from posterior distributions in statistical inference [ ]; the di erences in complexity of probability densities typically encountered in statistics and molecular simulation may explain the difference in historical practice.
As a motivating example, consider the computation of the average density of liquid argon under a given set of reduced temperature and pressure conditions shown in Figure . To initiate the simulation, an initial dense liquid geometry at reduced density ρ * ≡ ρσ 3 = 0.960 was prepared and subjected to local energy minimization. The upper panel of Figure depicts the average relaxation behavior of simulations initiated from the same configuration with di erent random initial velocities and integrator random number seeds (see Simulation Details). The average of realizations of this process shows a characteristic relaxation away from the initial density toward the equilibrium density (Figure , upper panel, black line). As a result, the expectation of the running average of the density significantly deviates from the true expectation (Figure , lower panel, dashed line). This e ect leads to significantly biased estimates of the expectation unless simulations are su iciently long to eliminate starting point dependent bias, which takes a surprisingly long ∼ ns in this example. Note that this bias is present even in the average of many realizations because the same atypical starting condition is used for every realization of this simulation process.
To develop an automatic approach to eliminating this bias, we take motivation from the concept of reverse cumulative averaging from Yang et al. [ ], in which the trajectory statistics over the production region of the trajectory are examined for di erent choices of the end of the discarded equilibration region to determine the optimal production region to use for computing expectations and other statistical properties. We begin by first formalizing our objectives mathematically. To illustrate the bias in expectations induced by relaxation away from initial conditions, replicates of a simulation of liquid argon were initiated from the same energy-minimized initial configuration constructed with initial reduced density ρ * ≡ ρσ 3 = 0.960 but di erent random number seeds for stochastic integration. Top: The average of the reduced density (black line) over the replicates relaxes to the region of typical equilibrium densities over the first ∼ 100 τ of simulation time, where τ is a natural time unit (see Simulation Details). Bottom: If the average density is estimated by a cumulative average from the beginning of the simulation (red dotted line), the estimate will be heavily biased by the atypical starting density even beyond 1000 τ . Discarding even a small amount of initial data-in this case initial samples-results in a cumulative average estimate that converges to the true average (black dashed line) much more rapidly. Shaded regions denote % confidence intervals. The statistical ine iciency g as a function of equilibration time choice t0 is initially very large, but diminishes rapidly a er the system has relaxed to equilibrium. Middle: The number of e ectively uncorrelated samples N eff = (T − t0 + 1)/g shows a maximum at t0 ∼ 100 τ (red vertical lines), suggesting the system has equilibrated by this time. Bottom: The cumulative average density ρ * computed over the span [t0, T ] shows that the bias (deviation from the true estimate, shown as red dashed lines) is minimized for choices of t0 ≥ 100 τ . The standard deviation among replicates (shaded region) grows with t0 because fewer data are included in the estimate. The choice of optimal t0 that maximizes N eff (red vertical line) strikes a good balance between bias and variance. The true estimate (red dashed lines) is computed from averaging over the range [ , ] τ over all replicates.

STATEMENT OF THE PROBLEM
Consider T successively sampled configurations x t from a molecular simulation, with t = 1, . . . , T , initiated from x 0 . We presume we are interested in computing the expectation The estimatorÂ ≈ A constructed from the entire dataset is given by for an infinitely long simulation , the bias inÂ [1,T ] may be significant in a simulation of finite length T . By discarding samples t < t 0 to equilibration, we hope to exclude the initial transient from our sample average, and provide a less biased estimate of A , We can quantify the overall error in an estimatorÂ [t0,T ] in a sample average that starts at x 0 and excludes samples where denotes the expectation over independent realizations of the specific simulation process initiated from configuration x 0 , but with di erent velocities and random number seeds. We can rewrite the expected error δ 2Â by separating it into two components: The first term denotes the variance in the estimatorÂ, while the second term denotes the contribution from the squared bias, We note that this equality only holds for simulation schemes that sample from the true equilibrium density π(x), such as Metropolis-Hastings Monte Carlo or Metropolized dynamical integration schemes such as hybrid Monte Carlo (HMC). Molecular dynamics simulations utilizing finite timestep integration without Metropolization will produce averages that may deviate from the true expectation A [ ].

BIAS-VARIANCE TRADEOFF
With increasing equilibration time t 0 , bias is reduced, but the variance-the contribution to error due to random variation from having a finite number of uncorrelated sampleswill increase because less data is included in the estimate. This can be seen in the bottom panel of Figure , where the shaded region ( % confidence interval of the mean) increases in width with increasing equilibration time t 0 .
To examine the tradeo between bias and variance explicitly, Figure plots the bias and variance (here, shown as standard error) contributions against each other as a function of t 0 (denoted by color) as computed from statistics over all replicates. At t 0 = 0, the bias is large but variance is minimized. With increasing t 0 , bias is eventually eliminated but then variance rapidly grows as fewer uncorrelated samples are included in the estimate. There is a clear optimal choice at t 0 ∼ 100 τ that minimizes variance while also e ectively eliminating bias (where τ is a natural time unit-see Simulation Details).

SELECTING THE EQUILIBRATION TIME
Is there a simple approach to choosing an optimal equilibration time t 0 that provides a significantly improved esti-mateÂ [t0,T ] , even when we do not have access to multiple realizations? At worst, we hope that such a procedure would at least give some improvement over the naive estimate, such that δ 2Â [t0,T ] < δ 2Â [0,T ] ; at best, we hope that we can achieve a reasonable bias-variance tradeo close to the optimal point identified in Figure that minimizes bias without greatly increasing variance. We remark that, for cases in which the simulation is not long enough to reach equilibrium, no choice of t 0 will eliminate bias completely; the best we can hope for is to minimize this bias.
While automated methods for selecting the equilibration time t 0 have been proposed, these approaches have shortcomings that have greatly limited their use. The reverse cumulative averaging (RCA) method proposed by Yang et al. [ ], for example, uses a statistical test for normality to determine the point before which which the observable timeseries deviates from normality when examining the timeseries in reverse. While this concept may be reasonable for experimental data, where measurements o en represent the sum of many random variables such that the central limit theorem's guarantee of asymptotic normality ensures the distribution of the observable will be approximately normal, there is no such guarantee that instantaneous measurements of a simulation property of interest will be normally distributed. In fact, many properties will be decidedly non-normal. For a biomolecule such as a protein, for example, the radius of gyration, end-to-end distance, and torsion angles sampled during a simulation will all be highly nonnormal. Instead, we require a method that makes no assumptions about the nature of the distribution of the property under study.

AUTOCORRELATION ANALYSIS
The set of successively sampled configurations {x t } and their corresponding observables {a t } compose a correlated timeseries of observations. To estimate the statistical error or uncertainty in a stationary timeseries free of bias, we must be able to quantify the e ective number of uncorrelated samples present in the dataset. This is usually accomplished through computation of the statistical ine iciency g, which quantifies the number of correlated timeseries samples needed to produce a single e ectively uncorrelated sample of the observable of interest. While these concepts are well-established for the analysis of both Monte Carlo and molecular dynamics simulations [ -], we review them here for the sake of clarity.
For a given equilibration time choice t 0 , the statistical uncertainty in our estimatorÂ [t0,T ] can be written as, where T t0 ≡ T − t 0 + 1, the number of correlated samples in the timeseries {a t } T t0 . In the last step, we have split the double-sum into two separate sums-a term capturing the variance in the observations a t , and a remaining term capturing the correlation between observations. If t 0 is su iciently large for the initial bias to be eliminated, the remaining timeseries {a t } T t0 will obey the properties of both stationarity and time-reversibility, allowing us to write, where the variance σ 2 , statistical ine iciency g, and integrated autocorrelation time τ (in units of the sampling interval) are given by with the discrete-time normalized fluctuation autocorrelation function C t defined as C t ≡ a n a n+t − a n 2 a 2 n − a n 2 .

( )
In practice, it is di icult to estimate C t for t ∼ T , due to growth in the statistical error, so common estimators of g make use of several additional properties of C t to provide useful estimates (see Practical Computation of Statistical Ine iciencies).
The t 0 subscript for the variance σ 2 , the integrated autocorrelation time τ , and the statistical ine iciency t 0 mean that these quantities are only estimated over the production portion of the timeseries, {a t } T t=t0 . Since we assumed that the bias was eliminated by judicious choice of the equilibration time t 0 , this estimate of the statistical error will be poor for choices of t 0 that are too small.

THE ESSENTIAL IDEA
Suppose we choose some arbitrary time t 0 and discard all samples t ∈ [0, t 0 ) to equilibration, keeping [t 0 , T ] as the dataset to analyze. How much data remains? We can determine this by computing the statistical ine iciency g t0 for the interval [t 0 , T ], and computing the e ective number of uncorrelated samples N eff (t 0 ) ≡ (T − t 0 + 1)/g t0 . If we start at t 0 ≡ T and move t 0 to earlier and earlier points in time, we expect that the e ective number of uncorrelated samples N eff (t 0 ) will continue to grow until we start to include the highly atypical initial data. At that point, the integrated autocorrelation time τ (and hence the statistical ine iciency g) will greatly increase (a phenomenon observed earlier, e.g. Figure of [ ]). As a result, the e ective number of samples N eff will start to plummet.
Figure demonstrates this behavior for the liquid argon system described above, using averages of the statistical ine iciency g t0 and N eff (t 0 ) computed over independent replicate trajectories. At short t 0 , the average statistical ine iciency g (Figure , top panel) is large due to the contribution from slow relaxation from atypical initial conditions, while at long t 0 the statistical ine iciency estimate is much shorter and nearly constant of a large span of time origins. As a result, the average e ective number of uncorrelated samples N eff (Figure ,  analyzed as a function of equilibration time choice t0, with colors denoting the value of t0 (in units of τ ) corresponding to each plotted point. Using replicate simulations, the average bias (average deviation from true expectation) and standard deviation (random variation from replicate to replicate) were computed as a function of a prespecified fixed equilibration time t0, with colors running from violet (0 τ ) to red (1800 τ ). As is readily discerned, the bias for small t0 is initially large, but minimized for larger t0. By contrast, the standard error (a measure of variance, estimated here by standard deviation among replicates) grows as t0 grows above a certain critical time (here, ∼ 100 τ ). If the t0 that maximizes N eff is instead chosen individually for each trajectory based on that trajectory's estimates of statistical ine iciency g [t 0 ,T ] , the resulting bias-variance tradeo (black triangle) does an excellent job minimizing bias and variance simultaneously, comparable to what is possible for a choice of equilibration time t0 based on knowledge of the true bias and variance among many replicate estimates.
Bias-variance tradeo . How will the simple strategy of selecting the equilibration time t 0 using Eq work for cases where we do not know the statistical ine iciency g as a function of the equilibration time t 0 precisely? When all that is available is a single simulation, our best estimate of g t0 is derived from that simulation alone over the span [t 0 , T ]will this a ect the quality of our estimate of equilibration time? Empirically, this does not appear to be the casethe black triangle in Figure shows the bias and variance contributions to the error for estimates computed over the replicates where t 0 is individually determined from each simulation using this simple scheme based on selecting t 0 to maximize N eff for each individual realization. Despite not having knowledge about multiple realizations, this strategy e ectively achieves a near-optimal balance between minimizing bias without increasing variance.
Overall RMS error. How well does this strategy perform in terms of decreasing the overall error δÂ [t0,T ] compared to δÂ [0,T ] ? Figure compares the expected standard error (denoted δÂ) as a function of a fixed initial equilibration time t 0 (black line with shaded region denoting % confidence interval) with the strategy of selecting t 0 to maximize N eff for each realization (red line with shaded region denoting % confidence interval). While the minimum error for the fixedt 0 strategy ( . ± . ) is achieved at ∼ 100 τ -a fact that could only be determined from knowledge of multiple realizations-the simple strategy of selecting t 0 using Eq. achieves a minimum error of .

DISCUSSION
The scheme described here-in which the equilibration time t 0 is computed using Eq. as the choice that maximizes the number of uncorrelated samples in the production region [t 0 , T ]-is both conceptually and computationally straightforward. It provides an approach to determining the optimal amount of initial data to discard to equilibration in order to minimize variance while also minimizing initial bias, and does this without employing statistical tests that require generally unsatisfiable assumptions of normality of the observable of interest. All that is needed is to save the timeseries {a t } T 1 of the observable A(x) of interestthere is no need to store full configurations x t -and postprocess this dataset with a simple analysis procedure, for which we have provided a convenient Python reference implementation (see Simulation Details). As we have seen, this scheme empirically appears to select a practical compromise between bias and variance even when the statistical ine iciency g is estimated directly from the trajectory using Eq. .
To show that this approach is indeed general, we repeated the analysis illustrated above in Figs. -for a different choice of observable A(x) for the same liquid argon system-in this case, the reduced potential energy u * (x) ≡ βU (x). The results of this analysis are collected in Fig. . As can readily be seen, this reduced potential behaves in essentially the same way the reduced density does, and the simple scheme for automated determination of equilibration time t 0 from Eq. does just as well.
A word of caution is necessary. One can certainly envision pathological scenarios where this algorithm for selecting an optimal equilibration time will break down. In cases where the simulation is not long enough to reach equilibrium-let Note that the reduced potential [ ] for the isothermal-isobaric ensemble is generally defined as u * (x) = β[u(x) + pV (x)] to include the pressure-volume term βpV (x), but in order to demonstrate the performance of this analysis on an observable distinct from the density-which depends on V (x)-we omit the βpV (x) term in the present analysis. Using replicate simulations, the root-mean-squared (RMS) error (Eq. ) was computed (black line) along with % confidence interval (gray shading). The RMS error is minimized for fixed equilibration time choices in the range -τ . If the t0 that maximizes N eff is instead chosen individually for each trajectory based on that trajectory's estimated statistical ine iciency g [t 0 ,T ] using Eq. , the resulting RMS error (red line, % confidence interval shown as red shading) is quite close to the minimum RMS error achieved from any particular fixed choice of equilibration time t0, suggesting that this simple automated approach to selecting t0 achieves close to optimal performance. alone collect many uncorrelated samples from it-no choice of equilibration time will bestow upon the experimenter the ability to produce an unbiased estimate of the true expectation. Similarly, in cases where insu icient data is available for the statistical ine iciency to be estimated well, this algorithm is expected to perform poorly. However, in these cases, the data itself should be suspect if the trajectory is not at least an order of magnitude longer than the minimum estimated autocorrelation time.

SIMULATION DETAILS
All molecular dynamics simulations described here were performed with OpenMM . [ ] (available at openmm.org) using the Python API. All scripts used to retrieve the so ware versions used here, run the simulations, analyze data, and generate plots-along with the simulation data itself and scripts for generating figures-are available on GitHub .
To model liquid argon, the LennardJonesFluid model system in the openmmtools package was used with parameters appropriate for liquid argon (σ = . Å, = . kcal/mol). All results are reported in reduced (dimensionless) units. A cubic switching function was employed, with the potential gently switched to zero over r ∈ [σ, 3σ], and a long-range isotropic dispersion correction accounting for this switching behavior used to include neglected contributions. Simulations were performed using a periodic box of N = 500 atoms at reduced temperature T * ≡ k B T / = 0.850 and reduced pressure p * ≡ pσ 3 / = 1.266 using a All Python scripts necessary to reproduce this work-along with data plotted in the published version-are available at: http://github.com/choderalab/automatic-equilibrationdetection available at http://github.com/choderalab/openmmtools Langevin integrator [ ] with timestep ∆t = 0.01τ and collision rate ν = τ −1 , with characteristic oscillation timescale τ = mr 2 0 /72 and r 0 = 2 1/6 σ [ ]. All times are reported in multiples of the characteristic timescale τ . A molecular scaling Metropolis Monte Carlo barostat with Gaussian simulation volume change proposal moves attempted every τ ( timesteps), using an adaptive algorithm that adjusts the proposal width during the initial part of the simulation [ ]. Densities were recorded every τ ( timesteps). The true expectation ρ * was estimated from the sample average over all realizations over [ , ] τ . The automated equilibration detection scheme is also available in the timeseries module of the pymbar package as detectEquilibration(), and can be accessed using the following code: from pymbar.timeseries import detectEquilibration # determine equilibrated region [t0, g, Neff_max] = detectEquilibration(A_t) # discard initial samples to equilibration A_t = A_t[t0:]

PRACTICAL COMPUTATION OF STATISTICAL INEFFICIENCIES
The robust computation of the statistical ine iciency g (defined by Eq. ) for a finite timeseries a t , t = 0, . . . , T deserves some comment. There are, in fact, a variety of schemes for estimating g described in the literature, and their behaviors for finite datasets may di er, leading to different estimates of the equilibration time t 0 using the algorithm of Eq. .
The main issue is that a straightforward approach to estimating the statistical ine iciency using Eqs. -in which the expectations are simply replaced with sample estimates causes the statistical error in the estimated correlation func-tion C t to grow with t in a manner that allows this error to quickly overwhelm the sum of Eq. . As a result, a number of alternative schemes-generally based on controlling the error in the estimated C t or truncating the sum of Eq. when the error grows too large-have been proposed.
For stationary, irreducible, reversible Markov chains, Geyer observed that a function Γ k ≡ γ 2k + γ 2k+1 of the unnormalized fluctuation autocorrelation function γ t ≡ a i a i+t − a i 2 has a number of pleasant properties (Theorem . of [ ]): It is strictly positive, strictly decreasing, and strictly convex. Some or all of these properties can be exploited to define a family of estimators called initial sequence methods (see Section . of [ ] and Section . . of [ ]), of which the initial convex sequence (ICS) estimator is generally agreed to be optimal, if somewhat more complex to implement.
All computations in this manuscript used the fast multiscale method described in Section . of [ ], which we found performed equivalently well to the Geyer estimators (data not shown). This method is related to a multiscale variant of the initial positive sequence (IPS) method of Geyer [ ], where contributions are accumulated at increasingly longer lag times and the sum of Eq. is truncated when the terms become negative. We have found this method to be both fast and to provide useful estimates of the statistical ine iciency, but it may not perform well for all problems.