A field theoretic approach to non-equilibrium population genetics in the strong selection regime

Natural populations are virtually never observed in equilibrium, yet equilibrium approximations comprise the majority of our understanding of population genetics. Using standard tools from statistical physics, a formalism is presented that re-expresses the stochastic equations describing allelic evolution as a partition functional over all possible allelic trajectories (‘paths’) governed by selection, mutation, and drift. A perturbative field theory is developed for strong additive selection, relevant to disease variation, that facilitates the straightforward computation of closed-form approximations for time-dependent moments of the allele frequency distribution across a wide range of non-equilibrium scenarios; examples are presented for constant population size, exponential growth, bottlenecks, and oscillatory size, all of which align well to simulations and break down just above the drift barrier. Equilibration times are computed and, even for static population size, generically extend beyond the order 1/s timescale associated with exponential frequency decay. Though the mutation load is largely robust to variable population size, perturbative drift-based corrections to the deterministic trajectory are readily computed. Under strong selection, the variance of a new mutation’s frequency (related to homozygosity) is dominated by drift-driven dynamics and a transient increase in variance often occurs prior to equilibrating. The excess kurtosis over skew squared is roughly constant (i.e., independent of selection, provided 2Ns ≳ 5) for static population size, and thus potentially sensitive to deviation from equilibrium. These insights highlight the value of such closed-form approximations, naturally generated from Feynman diagrams in a perturbative field theory, which can simply and accurately capture the parameter dependences describing a variety of non-equilibrium population genetic phenomena of interest.


Table of Contents
S1 Supplemental tables: quantity and parameter definitions, p.S2 Table S1: Function, functional, and symbol definitions, p.S2 Table S2: Population parameters definitions, p.S4  Table S1: Function, functional, and symbol definitions

S1 Supplemental tables
Dirac delta function centered at time (t − t ′ ).
The k th functional derivative of a function f (t); acts on a functional, analogous to a partial derivative. See Section S2. p(t) Allele frequency field; time-dependent function treated as a field, a scalar valued function at all time points t, where t ≥ t 0 (= 0) is the generation time in the continuum limit. p 0 Initial allele frequency p 0 ≡ p(t 0 ) (setting t 0 = 0 for convenience); treated as a tunable parameter, but various approximations and simulation comparisons assume the initial frequency p 0 = 1 2N (i.e., a single allele in one haploid individual) for convenience and biological relevance to new mutations. p(t) Frequency dual field; time-dependent function treated as a field. Defined by taking the continuum time limit ofp(t i ) = iz(t i ); here, t i represents discrete generation time and z(t i ) is the Fourier transform of allele frequency p(t i ) at each time point t i . This is the dual field to frequency p(t) such that an allele frequency path between times t 1 and t 2 originates atp(t 1 ) and ends at p(t 2 ); analogous to the relationship between x and p in the forward diffusion equation for φ(p, t 2 x, t 1 ), where p(t 1 ) = x. η(t) Gaussian white noise parameter; stochastic contribution to the allele frequency due to genetic drift, when the dynamics are represented in the form of a stochastic differential equation (i.e., the Langevin equation); defined to have mean E[η(t)] = 0, variance Var[η(t)] = E[η 2 (t)] = 1, and no temporal correlations (i.e., satisfies a Markov condition E[η(t)η(t ′ )] = δ(t − t ′ )).

Dp(t)
Differential element of the path integral; the product of an infinite number of differential elements dp for the allele frequency p at each time point t k : ∏ i=f i=0 dp(t k ), where t f = t is the time of observation. Dp(t) is defined in the same way. See Appendix A1.

J(t)
Sink function for p(t); arbitrary time-dependent function defining an absorbing 'sink' (or 'annihilation' operator) for allele frequency; used in the partition functional to generate moments of the distribution for p(t) via functional derivatives (analogous to the the Laplace transform variable for p(t) in a moment generating function).

J(t)
Source function forp(t); arbitrary time-dependent function defining an emitting 'source' (or 'creation' operator) for allele frequency; used in the partition functional to generate moments of the distribution forp(t) via functional derivatives. S[p(t ′ ), p(t ′′ )] The action; exponentiated in the partition functional; defines the probabilistic weights for allele trajectories (paths); dependent on the functional forms of the functionsp(t) and p(t). A field theory is fully defined by its action.
, p(t ′′ )] The free action; bilinear (inp and p) part of the action used to determine the propagator(s) for a field theory. The free action is usually defined to be fully soluble and is subsequently used extensively in perturbation theory (to approximate propagation between interaction events). Note that the free action for PGFT has a single time derivative (usually indicates diffusion-like or advection-like propagation, as opposed to wave-like propagation). S c [p(t ′ ), p(t ′′ )] The coupling action; summarizes all possible interactions in the theory; generally defined as all non-bilinear parts of the action. In PGFT, the coupling action contains terms describing drift and the quadratic effect of natural selection when frequency is high (i.e., the p 2 dependence in −sp (1 − p) in the diffusion or Langevin equations). The initial condition and mutational influx (derived allele mutation into the frequency bin 1 2N ) have been included in the coupling action for intuitive clarity, but can be incorporated into the free action by redefining the allele frequency, if desired.
Propagator of the 'free' allele frequency (i.e., from the free action) between an earlier time t ′ and a later time t; found by solving the Green's function equation for the operator defined in the free action (i.e., the propagator is the inverse of the linear operator acting on p(t) in the free action S 0 ). PGFT uses the following free propagator: . Note that the decay rate of the exponential in this propagator defines s ′ , the rate of frequency loss due to selection, back mutation, and saturation of the derived allele target size (see Population genetic parameters table). . Setting the arguments to J(t ′ ) and J(t) yields properties of the relationship between time points (e.g., the correlation function E[p(t ′ )p(t)] between the distributions at times t ′ and t).

E[p n (t)]
Time-dependent non-central moments of the allele frequency probability distribution. The first moment (n = 1) defines the time-dependence of the mean p(t) ≡ E[p(t)] frequency and is denoted as p(t) for convenience; k = 2 defines the timedependent homozygosity; higher moments describe the asymmetry and tails. κ n (t) Time-dependent cumulants of the allele frequency probability distribution; first cumulant is equivalent to p(t); second, third, and fourth cumulants represent variance, non-standardized skew, and non-standardized excess kurtosis. Cumulants are the primary object of interest in the theory, as they can be computed as the sum of connected Feynman diagrams (disconnected diagrams are related to lower cumulants). K n (t) Standardized time-dependent cumulants of the allele frequency probability distribution; the skew K 3 = κ3 κ 3 2 2 and excess kurtosis K 4 = κ4 are computed explicitly and compared to simulations. ξ ≡ 2K4 3K 2 3 Equilibrium parameter approximately equal to one when selection is sufficiently strong (2N s ′ ≳ 5); equivalently expressed as ξ = 2κ4κ2 3κ 2 3 . Under mutationselection-drift balance, this ratio of cumulants becomes roughly independent of the parameters s, µ, and N = const.; this is a property of the Gamma distribution for allele frequencies under strong selection described by Nei (1968) [1]. After correcting for sampling, estimation of this parameter may provide a test for consistency with mutation-selection-drift balance in a population of constant size.
Population size in number of diploid individuals (2N (t) haploids); treated throughout as a deterministic function with a parametric definition specified for each demography; see Supplemental table ST3. For constant population size, N ≡ N (t) = const. is used for notational convenience. µ Mutation rate from ancestral to derived allele. µ b Back mutation rate from derived to ancestral allele. L Target size for derived mutations; when not explicitly shown, assumed to represent a single locus (i.e., taken to be L = 1). s Selection coefficient; defined as a positive quantity (s ≥ 0) such that the negative sign is made explicit when selection is deleterious (i.e., purifying selection is represented as −s and beneficial selection is represented as s assuming s > 0). s hom Homozygous and heterozygous selection coefficients, respectively. Additive s het natural selection (s hom = 2s het ) is assumed throughout the manuscript, but input for neqPopDynx allows for specification of a general diploid selection coefficients; see Materials and Methods for description of neqPopDynx and Data availability for a link to download and input parameter instructions. s ′ ≡ (s + µ + µ b ) Effective selection coefficient defined by the decay rate of the allele frequency of the propagator G(t 2 , t 1 ) ∝ e −s(t2−t1) for the free action. Note, also referred to as the 'mass' (e.g., in particle physics). Accounts for all effects that deterministically reduce allele frequency: purifying selection, back mutation, and saturation of the derived mutational target size. θ ≡ 4N (t)µ Population scaled mutation rate. Not to be confused with the Heaviside function Θ(t). γ ≡ 2N (t)s Population scaled selection coefficient (dependent on s). Low mutation rate limit of the population scaled effective selection coefficient γ ′ . Not to be confused with the growth rate Γ. γ ′ ≡ 2N (t)s ′ Population scaled effective selection coefficient (dependent on s ′ ). Generalization of the population scaled selection coefficient γ such that mutation and back mutation rates may be appreciable. Not to be confused with the growth rate Γ. λ ≡ µ s Mutation-selection balance parameter with negligible back mutation rate (µ b ≪ µ) and without recurrent mutations (θ ≪ 1). Low mutation rate limit of the effective mutation-selection balance parameter λ ′ when µ ≪ s. λ ′ ≡ µ s ′ Effective mutation-selection balance parameter for general mutation and back mutation rates. Generalization of the mutation-selection balance parameter λ such that mutation and/or back mutation rates may be appreciable. Note that this parameter is bounded by λ ′ ≤ 1.
Initial population size in number of diploid individuals; used in parameterizations of exponential and bottleneck demographies. N exp (t) ≡ N 0 e Γt Time-dependent diploid population size for exponential demography; treated as a deterministic function (i.e., independent of the state of other population parameters.
Γ Population growth rate for exponential demography.
Modified population-scaled selection coefficient; emergent parameter appearing in moments in the exponential demography. T i Time when population is reduced from initial population size N 0 to bottleneck population size N b .
T b Duration of bottleneck; re-expansion from bottleneck to original population size occurs at time Heaviside functions centered after the bottleneck begins and after the bottleneck Θ i+b ≡ Θ (t − (T i + T b )) ends, respectively; terms proportional to (Θ i − Θ i+b ) are only nonzero during the bottleneck; defined to reduce algebraic complexity of bottleneck results.
Time-dependent diploid population size for cyclical/oscillatory demography; treated as a deterministic function. See definition in Equation S6.
τ Period of oscillation period (i.e., time to complete one cycle).
ϕ ≡ 2πt τ Angular parameterization of oscillation; this enforces correct behavior of cosine parameterization such that the population size is the same when t = nτ for any integer multiple n; defined to reduce algebraic complexity of oscillatory results.
N max Maximum and minimum diploid population size, respectively. Note that the N min definition for N osc (t) implies N osc (t 0 ) = N max after setting t 0 = 0 for convenience (due to the cosine, rather than sine); this is simply an initialization condition and is not meant to hold biological significance.
Linear combinations of N max and N min defined to reduce algebraic complexity N + bn ≡ (N max + N min ) of oscillatory results.

S2 A brief note on evaluating functional derivatives
Readers unfamiliar with functional calculus may want to consult Srednicki (2007) [2] for a more comprehensive discussion and context for the application of functional calculus to field theory. Though functional calculus may be less familiar, the rules are straightforward and analogous to multivariable calculus. In the present context, the application is the same as in [2]: functional derivatives are applied to the partition functional, a path integral defining the probability distribution of all possible allele trajectories, to evaluate moments of the allele frequency probability distribution . The partition functional is analogous to a moment generating function, but instead of applying partial derivatives to the density to evaluate scalar valued moments, functional derivatives are used to evaluate moments in the form of time-dependent functions (e.g., to find the time dependence of the mean frequency). In the manuscript text, it is noted that such moments can be computed from the partition functional Z[J(t a ), J(t b )] as follows.
Here, the functional derivative δ δJ(t) is defined by the following application toJ(t).
Applying the functional derivative to an arbitrary test function f (J(t a ), t a ) uses the functional equivalent of the chain rule.
In the second line, the partial derivative treats the functionJ as if it is a variable rather than a function such that ∂ ∂J f (J) = f ′ (J), which is followed by applying Equation S2. Functional derivatives of δ δJ(t) are defined in the same way. Additionally, functional derivatives commute with both integrals and path integrals. Since the partition functional is a functional integral, the functional derivatives are applied to the exponentiated action directly (i.e., the test function f is the function e −S[p,p] ).
After computing the derivatives, a limit must be taken to remove the source functions:J(t) → 0 and J(t) → 0 Formally, this limit is taken by defining the source functionsJ(t) = εj(t) and J(t) = εj(t), where ε is a scalar valued real number, and taking the limit ε → 0.

S3 Feynman diagrams and corresponding integrals
Here, integrals for all diagrams presented in the accompanying manuscript (repeated here for clarity) are computed using the appropriate Feynman rules. For an accessible overview relevant to the present context, please consult Chow and Buice (2015) [3]; further discussion of the mathematical underpinnings of Feynman diagrams can be found in [2] and [4]. For each integral, the relevant allele frequency propagators ) are shown and expressed in terms of the Heaviside function Θ(x). Results are presented for all relevant expressions that appear in the manuscript text for the four demographies discussed: constant population size, exponential population growth, a population that experiences an instantaneous bottleneck and subsequent re-expansion, and a cyclical population with oscillating size. Note that all expressions below are also presented in Supplemental File SF1 for easier use in any future applications. For simplicity, the integrals and corresponding results below are written using several definitions to shorten notation. First, all results are expressed in terms of the following, using the same notation as in the manuscript text.
For the population of constant size, specifically, the following notation is used.
In expressions for the exponential demography, the following definitions are used when appropriate.
For the bottleneck demography, the integrals are expressed using the following notation.
Integrals for the cyclical demography with oscillating size are expressed in terms of the following variables.
Additionally, the initial time will be set to t 0 = 0 in all cases such that the generation time t is a strictly positive quantity. Thus, all results should be considered as being implicitly multiplied by an overall factor of Θ(t).

S3.1 Diagrams for the mean frequency p(t)
Diagrams D2 and D3 are higher order corrections to the mean frequency p(t), as discussed in the manuscript. Diagram D3 represents the first (i.e., lowest-vertex order) contributions to the mean frequency that are dependent on drift (i.e., on the N 3 vertex). Thus, these diagrams become demography-dependent, and must be evaluated separately for the exponential, bottleneck, and cyclical populations. However, the expressions below make clear that the corresponding contributions to the mean frequency are small; these expressions are suppressed by 1 γ ′ , or the analogous variable defined for each demography (e.g., γ ′ exp , γ ′ 0 , γ ′ b ). Note that the results for nontrivial demography are not presented in the manuscript text, but appear here and in Supplemental File SF1.

S3.1.6 Diagram 3 (left)
Constant size: Exponential growth: Oscillating size: Note that the above expression is proportional to 1 4N max N min s ′ , which is part of the inverse of the population-scaled effective selection coefficient (i.e., the effective drift coefficient 1 2N osc multiplied by 1 s ′ ) averaged over one cycle, γ ′ osc = 2N osc s ′ = 4N max N min s ′ N + osc . I have chosen not to express this result in terms of γ ′ osc , as this can also be expressed in terms of the inverse of the scaled selection coefficient for the initial (or maximum) size γ ′ max = 2N max s ′ , if preferred. Similarly, all results for a cyclical population in this section are left in terms of 1 4N max N min s ′ for generality.

S3.2 Diagrams for the second cumulant
As is the case for the first (lowest-vertex order) corrections (D2 and D3) to the one-vertex approximation for the mean frequency (given by the deterministic equation), Diagrams D4 and D5 represent the first corrections to the variance of the allele frequency probability distribution. These diagrams are only computed for a population of constant size, but are computed for the three specified demographies here and in Supplemental File SF1. Unlike D2 for the mean frequency, all of the following diagrams represent drift-dependent contributions, as drift is inextricably incorporated into the variance of the frequency distribution, and thus the homozygosity.

S17
Bottleneck: Oscillating size: The function f osc D6L in the latter part of the above expression, used only to simplify the result, has the following dependence on time, s ′ , and the parameters that define N osc .

S19
Oscillating size: Here, sinh and cosh are the hyperbolic sine and hyperbolic cosine functions, respectively.

S3.3 Diagrams for the third cumulant
The counting factors for these diagrams are less apparent than for the above diagrams. Here, the total multiplicative factor is 2!×3!×2 2! = 2×3!. The factor of 2! is from the choice of the order of N 3 vertices to connect to the terminal (p 0 or µ) vertex, which is cancelled by a repeated vertex factor of 2! in the denominator.
The factor of 3! denotes the number of ways to connect edges exiting the internal vertices to the three external vertices. The additional factor of two comes from the choice of whichp exiting the first N 3 vertex is connected to the internal (N 3 ) vs. external vertex. This final factor is generally the hardest to identify, as it doubles the number of allowed topologies without looking particularly visually symmetric. In contrast, note that both edges exiting the second N 3 vertex are connected to external vertices, so the symmetry of this feature of the topology is already accounted for in the 3! associated with the choice of three external vertices. This example is also detailed in Appendix A2, which enumerates the allowed Wick contractions for Diagram D7. The counting factor is the same for both the left and right diagrams in D7, as well as in all demographies detailed below.

S3.4 Diagrams for the fourth cumulant
As with the third cumulant κ 3 (t), the counting factors for the above diagrams are less obvious than those for the lower moments. The same principle discussed for Diagram D7 applies to Diagram D8: there is an overall multiplicative factor of 3!×4!×4

3!
, with the counting factor of 3! coming from the number of ways to order the N 3 vertices, along with the cancellation by the repeated vertex factor 1 3!, the factor of 4! comes from the number of ways to connect the internal to the four external vertices, and the factor of 4 comes from the two possible arrangements of the connections exiting first and second N 3 vertices, one of which connects to the subsequent N 3 vertex and the other to one of the external vertices. In contrast, the overall multiplicative factor for Diagram D9 is simpler: 3!×4! 3! . In this case, the counting factors only represent the number of ways to order the N 3 vertices and connect to the external vertices. The symmetry associated with the maximally connected diagram structure does not allow for additional counting factors from the choice of connections between internal and external vertices; the factor of 3! already accounts for the choices associated with internal vertices and all external vertices are connected to the later-time N 3 vertices, with all possible choices of connections to the external vertices covered by the counting factor of 4! (similar to the final N 3 vertex in D7 or D8). Finally, note that the time dependence of the three coupling constants 1 4N (t i ) (for i ∈ {1, 2, 3, 4}) associated with the time of the three N 3 vertices (to be integrated over) in D8 and D9 substantially complicates the integrals for the bottleneck and cyclical demographies.

Constant size
Oscillating size

S3.4.2 Diagram 8 (right)
Constant size The functions f BN D8R and g BN D8R , used to simplify the the above expression, are defined below.

Oscillating size
Note that the following expression is extremely tedious, but is shown here for completeness. Any application of this expression should use the corresponding result in Supplemental File SF1.
The functions A osc D8R , B osc D8R , C osc D8R , and D osc D8R , defined below, are the oscillatory components of the above expression for κ 4D8R in a cyclical population.
Note that, despite the difference in Heaviside functions, due to the different topologies of Diagrams D8 and D9, and an overall multiplicative factor, the following integrals have the same exponentiated time dependence and the same temporal coordinates in the time-dependent population size. The two left (D8L and D9L) and two right (D8R and D9R) diagrams also have identical vertices, but distinct topologies, as discussed in the main text. As a result, after integration, Diagrams D8 and D9 have identical contributions to the fourth cumulant, up to differing overall constants.

S4 Comparison of analytic approximations to simulations over a range of parameters
Throughout this section, which compares the PGFT-based approximations discussed in the main text to neqPopDynx simulations, solid curves represent analytic approximations for the quantity of interest (e.g., third moment, second cumulant) and simulations of the same parameters are shown as a collection of timedependent points in a darker shade of the same color used for the analytic curve. Curves with the strongest selection (s = 10 −1 2 ) are shown in red and those for the weakest selection coefficient (s = 10 −5 ) are shown in purple. Additionally, note that, in all cases, the strongest selection curves deviate slightly from the simulated results because s = 10 −1 2 ≈ 0.316 is large enough that the linear approximation of fitness in the selection coefficient becomes inaccurate (i.e., 1 − e −s = 0.271 < 0.316 = s). In comparison, even at s = 0.1, the linearized approximation is quite accurate: 1 − e −s ≈ 0.095. I have left the corresponding curves in their linearized form with the intention of plotting the functional form of the relevant PGFT approximations as expressed in the manuscript. If, instead, the dependence on s is replaced with 1 − e −s , that analytic approximations in the extremely strong selection regime (s = 10 −1 2 ) are closer to the simulated results in all parameter regimes explored in this analysis (i.e., for any moment in any demography with any combination of demographic parameters).
For each demography, all plots are presented on a log-log scale (in base ten) to allow for easier visualization of dynamics that span many orders of magnitude; the simulations are roughly evenly spaced in generation time such that the density of points is greater at later values due to the logarithmic generation time axis. In presenting plots on a logarithmic scale, the accuracy or inaccuracy of an analytic expression can be judged with respect to the order (or half order) of magnitude due to the logarithmic spacing, rather than with respect to small deviations, which would be difficult to observe due to the stochastic noise in simulation results. Each simulated point plotted is the result of computing expectation values over L = 10 5 independent sites to characterize cumulants and moments of the allele frequency probability distribution (e.g.,

S4.1 Constant population size
For a population with a constant size of 2N haploid individuals, two parameters are varied in this section: the population size N , and the mutation rate µ (while maintaining µ b = µ 10 2 except where otherwise specified). For each parameter combination of (N , µ), the same range of selection coefficients will be presented, s ∈ [10 −5 , 10 −1 2 ], but coefficients near the value where the strong selection approximation breaks down (at roughly 2N s ′ ∼ 5) will be chosen appropriately for the corresponding population size such that 2N s ∈ {3, 5, 10}. Finally, the initial frequency p 0 will be varied while fixing fixing all other parameters (N = 10 4 , µ = 10 −8 , µ b = 10 −10 for a range of selection coefficients) to investigate the regime of validity of the approximations detailed in the main text; a breakdown is expected at some p 0 , but the numerical value is unclear from the perturbation series of the partition functional (but see discussion of the equilibration time for constant N , where p 0 > λ ′ for moments above the mean frequency). All moments will be shown for each parameter choice for comparison. Simulations with lower mutation rates (µ = 10 −8 ) appear to be more noisy at asymptotically late times (i.e., after equilibration); both the variance and mean of each moment are linear in the mutation rate such that the ratio of the standard deviation to the mean is proportional to µ −1 2 .

S4.1.1 Constant demography: time dependence of the population size
For a population of constant size, the time dependence is trivial. Analytic and simulation results in the following figures are shown in the same order as the population size plots below, from top to bottom: N = 10 4 , 10 3 , and 10 2 . Note that the left and right columns in Figure S1 are identical; each row shows results for the same demography but with distinct mutation rates µ = 10 −6 (left column) and µ = 10 −8 (right column). Back mutation rates are set to a fixed fraction of the mutation rate µ b = µ 100 in all cases.

S4.1.2 Constant demography: mean frequency p(t) to one-vertex order
The lowest order approximations to the mean frequency p(t) ≡ E[p(t)], the first cumulant and first moment of the allele frequency probability distribution, are plotted below for simulations with various population size and mutation rate combinations. These solutions are equivalent to the solution to the deterministic equation for allele frequency as a function of the initial frequency p 0 , the mutation rates µ and µ b , and the (purifying) selection coefficient s. Each plot shows the dynamics of a range of selection coefficients, with a higher density of selection coefficients near the breakdown at roughly s 5 2N . The strong selection approximation breaks down at larger selection coefficients for smaller population size. Reduction of the mutation rates (right column) results in more variance of the mean, but the analytic approximation presented remains equally valid. For smaller population size, simulations become discretized under very strong selection when approaching mutation-selection balance at late times.

S4.1.3 Constant demography: mean frequency p(t) to three-vertex order
The lowest-vertex order (3V) corrections to the one-vertex approximation of the mean frequency p(t) are shown below. Note that the corrections are very small in comparison to the leading order expression due to suppression by an additional factor of p 2 0 , λ ′ 2 = µ 2 s ′ 2 , 1 γ ′ = 1 (2N s ′ ) 2 , or some combination of these perturbative parameters. The effect of these corrections are clearest in the plots for the smallest population N = 10 2 , where there is less deviation from simulated values in the curves with 2N s = 3, 5, 10 than seen in the analogous one-vertex mean plots above.

S4.1.4 Constant demography: variance κ 2 (t) to two-vertex order
Plots of the lowest order (two-vertex) approximation to the second cumulant, the variance κ 2 (t) of the allele frequency probability distribution, are presented here for the same range of parameters as above. The variance (computed from simulations as E[(p − p) 2 ]) is very close to the homozygosity E[p 2 ] (shown below), since the square of the mean frequency only meaningfully alters the time-dependence when the variance is very small. The variance is initially zero at the initial delta function (unable to be seen on log-log plots) and equilibrates to values λ ′ 2γ ′ .

S4.1.5 Constant demography: homozygosity E[p 2 (t)] to two-vertex order
The lowest-vertex order (2V) approximation to the homozygosity is shown below using the same parameters as above. The homozygosity, computed as κ 2 (t) − (E[p(t)]) 2 , is initially p 2 0 due to the delta function initial condition and equilibrates to roughly the same asymptotic value as the variance for the parameter combinations presented below.

S4.1.6 Constant demography: homozygosity E[p 2 (t)] to three-vertex order
The lowest-vertex order (3V) corrections to the two-vertex approximation to the homozygosity is shown below using the same parameters as above. The corrections are again small in comparison to the leading order terms, but become meaningfully closer to the simulation results when the population size is small. Importantly, these corrections fail catastrophically when the perturbative parameters (most relevantly, λ ′ and 1 γ ′ ) become greater than one (see curves for the weakest selection coefficients below). This occurs because the three-vertex terms, which are strictly negative, grow larger than the two-vertex terms; this can be controlled by adding higher-vertex corrections, which overwhelm the three-vertex terms, but the approximation remains invalid in this regime unless an infinite series of corrections are computed.

S4.1.7 Constant demography: third cumulant κ 3 (t) to three-vertex order
Below are plots of the lowest-vertex (3V) approximation to the time-dependence of the third cumulant of the allele frequency probability distribution κ 3 (t), the non-standardized skew, for the same parameters.

S4.1.8 Constant demography: third moment E[p 3 (t)] to three-vertex order
Below are plots of the lowest-vertex (3V) approximation to the time-dependence of the third non-central moment E[p 3 (t)] for the same parameters. Like the homozygosity, the third moment is very similar to the third cumulant other than at the initial time when it has a value of p 3 0 from the delta function initial condition.

S4.1.9 Constant demography: skew K 3 (t) to three-vertex order
Here, the time dependence of the lowest-vertex (3V) order approximation to the skew of the allele frequency probability distribution K 3 (t) is plotted. Note that, when the approximation describes the dynamics (for 2N s > 5), the skew is independent of the strength of selection at asymptotically late times. The equilibrium value 1 √ λ ′ γ ′ = 1 √ µN is dependent only on the population scaled mutation rate θ = 4N µ (compare the left to right columns for each row below). Note that, for low mutation rate and very small population size, the skew fails to match analytic expectations due to discretization of the third cumulant, which can be seen in the Figures 9D and 9F below.

S4.1.10 Constant demography: fourth cumulant κ 4 (t) to four-vertex order
Below are plots of the lowest-vertex (4V) approximation to the time-dependence of the fourth cumulant of the allele frequency probability distribution κ 4 (t), the non-standardized excess kurtosis, for the same parameters. This is the first cumulant that differs from the central moment of the same order (i.e., the non-standardized kurtosis); all higher cumulants differ, as well.

S4.1.11 Constant demography: fourth moment E[p 4 (t)] to four-vertex order
The lowest-vertex order approximation to the time dependence of the fourth non-central moment E[p 4 (t)] is plotted below for the same parameter combinations. Again, the fourth moment is initially p 4 0 at the delta function initial condition, but equilibrates to roughly the same value as the fourth cumulant.

S4.1.12 Constant demography: excess kurtosis K 4 (t) to four-vertex order
Here, the time dependence of the lowest-vertex (4V) order approximation to the excess kurtosis of the allele frequency probability distribution K 4 (t) is plotted. Like the skew, when the approximation describes the dynamics (2N s > 5), the excess kurtosis is independent of the strength of selection at asymptotically late times. The equilibrium value 3 λ ′ γ ′ = 3 2N µ is dependent only on the population scaled mutation rate θ = 4N µ (compare the left to right columns for each row below). As with the skew, for low mutation rate and very small population size, the excess kurtosis fails to match analytic expectations due to discretization of the fourth cumulant, which can be seen in the Figures 12D and 12F below.

S4.2 Exponential population growth
For a population undergoing exponential growth as parameterized in Equation S4, the plots in this section are shown for varying values of the initial population size N 0 and exponential growth rate Γ. As seen in the plots for constant sized populations in the above section, the primary effect of reducing the mutation rate µ (again keeping the back mutation rate µ b fixed) is an increase in the relative amount of noise for each moment, since the equilibrated cumulants and moments are linear in the mutation rate at late times and thus equilibrate to lower quantitative values (the variance of each cumulant is also proportional to µ at late times such that the ratio of the standard deviation to the mean of each cumulant is proportional to √ µ).
This can be seen, for example, in the comparison between the fourth cumulants in Figure S13 below, where the mutation rate in Figure S13a, but all other parameters are held fixed.
(a) Example κ 4 with µ = 10 −6 , µ b = 10 −8 (b) Example κ 4 with µ = 10 −8 , µ b = 10 −10 Figure S13: The effect of reducing the mutation rate is an increase the amount of noise relative to an exponentially growing population with the same parameters but increased mutation rate µ (with back mutation rate µ b = µ 10 2 ). Here, the fourth cumulant is compared for different mutation rates in a population with exponential growth. The first, second, and third cumulants and moments and the fourth moment are affected in the same way.
In this section, figures are presented for a variety of parameter combinations of (N 0 , Γ) and the same range of selection coefficients used for populations with constant size; selection coefficients between 10 −5 and 10 −1 2 , along with selection coefficients chosen for each demography with 2N s = 3, 5, and 10. To reduce the number of figures, plots will only be shown comparing simulations to analytic curves with mutation rate µ = 10 −8 for all demographies and µ = 10 −6 for one value of the initial population size, N 0 = 10 4 , for visual comparison; moments computed from all simulations with µ = 10 −6 showed roughly the same shape with reduced noise and a simulated time-series nearly indistinguishable from the analytic curve in the regime of validity for the analytic approximations (2N 0 s ′ > 5). The one exception to this is for selection coefficients very close to the poles at s ′ = Γ (or γ ′ exp = 0), 2s ′ = Γ, etc., which show the clearest deviation in the fourth cumulant and non-central moment. The example parameter combinations shown in Figure S13 were chosen to show the effect of selection coefficients in the neighborhood of the poles in the analytic expression for κ 4 (t N exp ). In both plots, the curves for selection coefficient s = 10 −3 , which is very close in value to s ′ = (s + µ + µ b ) = 10 −3 (i.e., the value when Γ = s ′ in the above plots), deviate wildly from the controlled analytic behavior of other values of strong selection. This behavior is due to the nearby pole where the approximation is no longer well-described. The third cumulant shows similarly pathological behavior in the neighborhood of one of the poles. Although a pole is present in the second cumulant under exponential population growth, the approximation is well controlled even for values close to s = (Γ − µ − µ b ) (see Figure  S16). However, the third and fourth cumulants (along with the corresponding moments, the skew, and the excess kurtosis) appear to be more sensitive to this issue at selection coefficients close to the pole, likely due to the quadratic and cubic dependence on 1 γ ′ exp in the third and fourth cumulants, respectively, rather than the linear dependence appearing in κ 2 (i.e., κ 3 and κ 4 have higher order divergences than κ 2 ). For the following simulations, all moments and cumulants are computed as an average of L = 10 5 independent sites evolving through the same demography.

S4.2.1 Exponential demography: time dependence of the population size
The following log-log plots show the time dependence of the population size under exponential growth for all combinations of the initial population size N 0 = 10 4 (top two rows), 10 3 (third row), 10 2 (fourth row) and growth rate Γ = 10 −4 , 10 −3 , 10 −2 (from left to right). Analytic and simulation results will be presented in the same order as the time-dependent population sizes. The top two rows are identical in the figure below; the first row of all subsequent plots of cumulants and moments will be shown with mutation rates µ = 10 −6 , µ b = 10 −8 in the same demography as the second row with µ = 10 −8 , µ b = 10 −10 . The second, third, and fourth rows of plots will show results for various demographies with mutation rates µ = 10 −8 , µ b = 10 −10 .

S4.2.2 Exponential demography: mean frequency p(t) to three-vertex order
Since the lowest order approximation of the mean frequency p(t) under strong selection is independent of the population size, and thus the demography, plots for an exponentially growing population are identical to those in Figure S2 up to the number of simulated generations. The figure below shows the three-vertex approximation of the mean frequency, with three-vertex corrections to the one-vertex deterministic solution.
The effect of the demography on strong selection is infinitesimal. Simulations for weak selection show a marked difference from the lowest order approximation, but are not well-described by the analytic approximations presented in this manuscript. For all following subfigures in this demography, the mutation rate will be annotated, but the back mutation rate µ b (where µ b = µ 10 2 in all cases) will remain implicit.

S4.2.3 Exponential demography: variance κ 2 (t) to two-vertex order
For the variance, the second cumulant of the allele frequency probability distribution, the analytic approximation remains accurate beyond the strong initial population size regime (2N 0 s ′ > 5) due to rapid growth decreasing the effects of drift over time. This breaks down for slow exponential growth (Γ = 10 −4 ) for small initial population size. If the growth rate is sufficiently high (Γ ≳ 10 −2 ), the analytic approximations under strong selection apply for weak selection even in populations starting at very small sizes (e.g., N 0 = 100). For high mutation rates (µ = 10 −6 ) and relatively large initial size (N 0 = 10 4 ), the analytic approximation for E[p 2 (t)] is extremely accurate independent of the strength of selection and the rate of population growth. The associated dynamics and regime of validity apply equally well to the homozygosity, with differences only in the initial and final states, which vanish for the variance.

S4.2.4 Exponential demography: homozygosity E[p 2 (t)] to two-vertex order
The figure below depicts comparisons between the two-vertex approximation of the homozygosity, the second non-central moment of the allele frequency probability distribution, to simulations of the same moment. Like the variance, the homozygosity is drastically affected by rapid population growth and takes a highly nonmonotonic path through time: initially the homozygosity takes on a value of p 2 0 dictated by the delta function initial condition, then reaches a maximum at early times (see discussion of t κ2 max in the manuscript text), which is followed by reduction to a pseudo-equilibrated state (roughly until the e-folding time t ∼ 1 Γ) prior to eventually reaching an equilibrium value of ( µ s ′ ) 2 at asymptotically late times. The pseudo-equilibrated state is similar to the value of the homozygosity in a population of constant size N = N 0 (the initial size) and only appears for sufficiently strong selection such that the equivalent of the equilibration time in a constant population is faster than t ∼ 1 Γ (i.e., before the population size is dramatically changed).

S4.2.5 Exponential demography: third cumulant κ 3 (t) to three-vertex order
Much of the dynamics of higher cumulants echo the variance. In plots of the third cumulant of the allele frequency probability distribution below, there is a clear maximum at early times, a pseudo-equilibrated state until the e-folding time at t ∼ 1 Γ when the growth rate overtakes this and rapidly drives the third cumulant to zero (corresponding to the deterministic delta function distribution at asymptotically late times). Note that the overall scale of the third cumulant is substantially smaller than that of the second cumulant, as expected. The analytic approximation is extremely accurate for high mutation rates, and applies to weaker selection coefficients due to the exponential decay of the drift constant as the population size grows.

S4.2.6 Exponential demography: third moment E[p 3 (t)] to three-vertex order
The third non-central moment of the allele frequency probability distribution mimics the behavior of the homozygosity, analogous to the discussion of the third cumulant above. Like the second moment, the third moment under exponential growth asymptotes to a finite value, approaching the third moment of a delta function distribution at late times as the effects of drift are reduced.

S4.2.7 Exponential demography: skew K 3 (t) to three-vertex order
The dynamics of the skew in an exponentially growing population drastically differ from the skew in a constant sized population (see Section S4.1.9). The pseudo-equilibrium regime is not quite flat here, indicating that even after rapidly equilibrating under strong selection, the skew slowly decays until the e-folding time, after which it rapidly vanishes. Under many of the demographies shown herein, the dynamics under weak selection decouple and reach a different pseudo-equilibrium at late times. At the end of the plots shown below, the population sizes are so large (e.g., roughly 10 12 diploid individuals for demographies in the right column) that growth likely becomes unsustainable in most organisms; if the population size saturates, the pseudo-equilibrium shown for weak selection likely equilibrate in this state.

S4.2.8 Exponential demography: fourth cumulant κ 4 (t) to four-vertex order
The dynamics of the fourth cumulant under exponential expansion are largely the same as for the third cumulant, mimicking the second moment, but on an overall reduced scale. See description in Section S4.2.7.

S4.2.9 Exponential demography: fourth moment E[p 4 (t)] to four-vertex order
As with the fourth cumulant, the fourth non-central moment of the allele frequency probability distribution under exponential expansion is largely controlled by the dynamics of the second moment, except in overall scale, which is reduced, and a much more rapid decay (see Figure S23 for comparison after standardization).

S4.2.10 Exponential demography: excess kurtosis K 4 (t) to four-vertex order
The dynamics of the excess kurtosis are similar to those of the skew under exponential growth. Note that, like the skew, an intermediate value of selection (e.g., s = 10 −2 in the top right plot) appears to interpolate between the short-term pseudo-equilibration for strong selection and the long-term equilibration for weak selection, maximizing at a later time when the strong selection curves have begun to decay. The weak selection curves are again well-described for sufficiently rapid exponential growth (Γ = 10 −2 ), including the late-time equilibration where the rate of decay of the fourth cumulant matches that of the square of the second cumulant. Note that very small initial population size can limit estimation of the skew and excess kurtosis in simulations due to discretization (also seen for constant population size in Figures S9 and S12); this effect is minimal under rapid growth (Γ ≥ 10 −2 ).

S4.3 A population bottleneck and re-expansion
The effects of a temporary population bottleneck range from very subtle to extreme depending on the difference in drift due to the reduced population size. The parameter controlling the response of variation under selection to a bottleneck is the intensity of the bottleneck I b , which is proportional to both the strength of drift during a bottleneck (i.e., the drift parameter 1 2N b ) and the bottleneck duration T b (see, e.g., [5] for a discussion of the role of I b on the mean frequency).
Bottlenecks with low intensity, i.e., short bottlenecks with a relatively small decrease in population size from the initial, pre-bottleneck size, represent a scenario where the bottleneck provides a small perturbation to the equilibrium frequency distribution [5]. On the other hand, if a bottleneck imposes a large change in population size (e.g., a two order of magnitude difference in population size between N b and the initial size N 0 ), and is prolonged relative to the drift constant (i.e., Since the analytic approximations provided in this manuscript require 2N (t)s ′ ≫ 1 to be valid descriptions of the non-equilibrium dynamics, selection must remain strong within the bottleneck (i.e., 2N b s ′ ≫ 1) if the bottleneck intensity is very strong. As a result, some of the plots below (namely those with N b = 10 2 ) show large deviations from the field theoretic expectations (including for the mean frequency) for many selection coefficients, especially when the bottleneck intensity is large I b ≥ 1. Additionally, even for small bottleneck sizes, if the intensity remains very low ( 2N 0 ), the approximations remain valid due to the perturbative effects of drift. Due to the subtlety of the effects on various moments, I have chosen to show all bottleneck plots with mutation rates µ = 10 −6 and µ b = 10 −8 to limit the variance around each curve. As discussed above, decreasing the mutation rate (e.g., µ = 10 −6 → 10 −8 ) increases the relative noise and reduces the equilibrium values (e.g., the mean µ s ′ is reduced if µ is lower). The plots below, therefore, provide clearer expected values for each cumulant and moment. Since there are three demographic parameters for the bottleneck demography shown in S5, the plots below show all combinations of two values of the bottleneck population size N b ∈ {10 2 , 10 3 } (impacting the strength of drift relative to selection and mutation), the duration of the bottleneck T b ∈ {10 2 , 10 3 } (impacting the bottleneck intensity), and the time when the bottleneck begins T i ∈ {10 3 , 10 4 } (impacting where in the equilibration process of the initial population parameters the bottleneck begins), all while keeping the initial population size N 0 = 10 4 fixed. As with the previous plots, the selection coefficients shown range from s = 10 −1 2 to s = 10 −5 , as well as demography-specific values 2N b s = {3, 5, 10} to illustrate the breakdown of the strong selection approximation. Note that when T i = 10 3 (left column of all plots below), the bottleneck occurs during the equilibration process of all but the strongest selection coefficients (i.e., for selection coefficients of roughly s ≳ 10 −2 ), which impacts both the equilibration itself and the response of the population to the bottleneck demography. In contrast, a bottleneck beginning at generation T i = 10 4 occurs when a slightly larger range of selection coefficients have already reached equilibrium (T i > t eq ). As with the previous simulations, moments and cumulants are computed as an average of L = 10 5 independent sites evolving through the same demography.
As a final note on the bottleneck plots in this section, some of the plots below show a breakdown in the numerical plotting software (Mathematica [6]) when plotting analytic expressions for cumulants or moments in the presence of a population bottleneck in logarithmic space; the breakdown occurs for a small number of generations immediately after the bottleneck. This is due to near-discontinuities associated with the sharp slope immediately after re-expansion from an instantaneous bottleneck (i.e., the square bottleneck) The nearly piecewise nature of the results can be difficult plot for strong selection coefficients, particularly when the bottleneck drift constant 1 2N b is relatively large (e.g., on the order of 1 2 × 10 −2 for N b = 10 2 ).
However, this appears to only be an issue when plotting bottlenecks at late time T i = 10 4 in logarithmic space. To show that the resulting expressions are indeed smooth, except at t = T i and t = T i + T b , and that this is only a plotting error, one can plot the temporal dependence of the same moment starting from a time near immediately before the beginning of the bottleneck. One example of such a discontinuity, appearing in Figure S24a for the time-dependence of the variance of the allele frequency probability distribution over time t ∈ (0, 10 4 ] as a diagonal line immediately after re-expansion for s = 10 2N b (lighter blue curve), is shown below.
(a) Variance κ 2 for N b = 10 2 , T b = 10 3 , T i = 10 4 (b) κ 2 with same parameters starting at t = 9990 The right plot, Figure S24b, shows the time dependence of the variance with the same parameters, but the plot begins 10 generations before the bottleneck (at t = 9990). From this plot, it is clear that the period after re-expansion is nearly flat for s = 10 2N b due to a swift equilibration, with an equilibrium value of κ 2 = λ ′ 2γ ′ . This indicates that the diagonal part of the left plot, Figure S24a is simply a numerical plotting error. The same kind of numerical error appears in several plots below for N b = 10 2 , T i = 10 4 , but all of these plots should be viewed as effectively flat after re-expansion due to the rapid re-equilibration of strong purifying selection.

S4.3.1 Bottleneck demography: time dependence of the population size
The demography associated with a range of parameters for N b , T b , and T i are shown below, where N 0 = 10 4 , µ = 10 −6 , and µ b = 10 −8 remain constant. Results for these demographies are presented in this subsection in the same plot position as the corresponding demography.

S4.3.2 Bottleneck demography: mean frequency p(t) to three-vertex order
Below are plots of the mean of the allele frequency probability distribution p, shown here with the threevertex correction that accounts for a small amount of drift on the order of 1 2N s ′ , but breaks down when 1 2N b ≳ 1. Note that even the deterministic approximation of the mean breaks down at weak selection (2N b s ≤ 3) when the bottleneck intensity is large (I b ≫ 1 in Figures S26g and S26h).  Figure S27 shows plots of the two-vertex approximation of the variance of the allele frequency distribution κ 2 . The variance breaks down for high intensity bottlenecks (Figures S27g and S27h) because the variance equilibrates to the bottleneck population size N b and does not immediately decay for selection coefficients at or below 2N b s = 3. Note that, even for bottlenecks with intensity I b = 1 2, the analytic approximation remains accurate until 2N 0 s ∼ 3, indicating that the bottleneck is a perturbative effect and the regime of validity is determined by the initial population size. Figure S27: Time dependence of variance of the allele frequency probability distribution κ2for simulated populations with a population bottleneck and subsequent re-expansion. Parameter combinations correspond to those shown in Figure S26.

S4.3.4 Bottleneck demography: homozygosity E[p 2 (t)] to two-vertex order
The two-vertex approximation for the homozygosity E[p 2 ] is plotted below for an allele experiencing a population bottleneck and re-expansion. The dynamics are largely the same as for the variance, above, including the regime of validity of the analytic approximation. Note that, in both the variance and homozygosity, there is a numerical breakdown in the plotting software (appearing as diagonal lines after re-expansion) for plots with N b = 10 2 , T i = 10 4 (Figures S28f and S28h) immediately after the bottleneck, but the analytic approximations match simulations before, during, and after the bottleneck (after numerical issues).

S4.3.5 Bottleneck demography: third cumulant κ 3 (t) to three-vertex order
The lowest-vertex order (3V ) approximation to the third cumulant of the allele frequency distribution κ 3 is plotted below. The dynamics are largely controlled by the variance and the regime of validity is the same. Note that, as for the second cumulant and moments, if the bottleneck has sufficiently large intensity, the population equilibrates within the bottleneck for all selection coefficients, and is slow to decay after re-expansion when selection is weak (roughly when 2N 0 s ≲ 3).

S4.3.6 Bottleneck demography: third non-central moment E[p 3 (t)] to three-vertex order
Plots of the third non-central moment of the allele frequency distribution E[p 3 ] are shown below for the same range of bottleneck demographies. Again, the high intensity bottlenecks shown in the last row of plots deviate substantially from the analytic approximation for strong selection, but only when 2N b s ≤ 3.

S4.3.7 Bottleneck demography: skew K 3 (t) to three-vertex order
Unlike the previous moments, plotted above, the skew of the allele frequency probability distribution shows some deviation from the analytic expectation for less severe bottlenecks with N b = 10 3 . However, this happens at and below roughly 2N b s = 3, where the strong selection approximation is expected to fail. The contrast exists only because the previously plotted analytics aligned to simulated moments and cumulants for N b = 10 3 beyond the expected range of validity. The effect of the bottleneck is to increase the skew to a new value of 1 √ N b µ, which is sustained for bottlenecks with high intensity. Figure S31: Time dependence of the skew of the allele frequency probability distribution K4 for simulated populations with a population bottleneck and subsequent re-expansion. Parameter combinations correspond to those shown in Figure S26.

S4.3.8 Bottleneck demography: fourth cumulant κ 4 (t) to four-vertex order
The fourth cumulant of the allele frequency probability distribution κ 4 is plotted below. This is again welldescribed by the analytic approximations, sans numerical issues, in the first three rows of demographies. In the last row, the approximation again breaks down for selection coefficients below 2N b s ∼ 5. Figure S32: Time dependence of the fourth cumulant of the allele frequency probability distribution κ4 for simulated populations with a population bottleneck and subsequent re-expansion. Parameter combinations correspond to those shown in Figure S26.

S4.3.9 Bottleneck demography: fourth non-central moment E[p 4 (t)] to four-vertex order
The fourth non-central moment E[p 4 ] is again very similar to the corresponding cumulant, κ 4 .

S4.3.10 Bottleneck demography: excess kurtosis K 4 (t) to four-vertex order
The excess kurtosis of the allele frequency probability distribution, like the skew, approaches a new value of 3 2N b µ during the bottleneck, which independent of selection coefficients provided 2N b s ≳ 5. This indicates that the tails of the distribution temporarily have more mass such that, even for highly deleterious alleles, the distribution may harbor variation at higher frequency than in the initial population. Figure S34: Time dependence of the excess kurtosis of the allele frequency probability distribution K4 for simulated populations with a population bottleneck and subsequent re-expansion. Parameter combinations correspond to those shown in Figure S26.

S4.4 Cyclical demography with oscillatory population size
Cyclical populations are comparatively less well studied than more ubiquitous and transient features like exponential growth or population bottlenecks. However, such demographies are related to both rapid population growth and rapid decline; populations with regular, periodic oscillations in the number of individuals can be viewed as experiencing serial bottlenecks (see, e.g., [7][8][9]) at regular time intervals with subsequent re-expansion after each bottleneck. In this section, comparisons between simulations and the PGFT-derived closed form analytic approximations are shown for one class of cyclical demographies defined by the deterministic time-dependent oscillatory function in Equation S6 for N osc (t) diploid individuals. This smooth, continuous parameterization alleviates some minor lingering effects from the discontinuous Heaviside functions used to model a square bottleneck (see N BN (t) in Equation S5), such as the numerical plotting errors seen in Figure S24. The cyclical diploid population size N osc (t) is parameterized by three constants: the maximum population size N min , the minimum size N min , and the period of oscillation τ. The cosine dependence in N osc (t) describes populations that are initialized with the maximum number of individuals N (t 0 = 0) = N max and return to this size every τ generations thereafter (i.e., N (t = nτ) = N max for all integer values of n) and reach the minimum population size N min halfway between each maximum value (i.e., at each time point t = τ 2+nτ for all integer values of n). Importantly, this parameterization only meant to serve as one class of examples that aids in evaluating the performance of the analytic approximations; the choice of initial conditions is tunable and not intended to hold general biological significance (e.g., replacing the cosine with a sine function such that N (0) = N min results in more intense drift at early times and potentially different transient dynamics) and, if desired, a phase parameter can be introduced by taking ϕ(t) → ϕ(t) + const. in Equation S6 to generalize the initialization condition. As with the square bottleneck, I have chosen to reduce the parameter space in this section by fixing the initial value of N max = 10 4 , which limits the number of plots presented. In each figure, results are shown for two values of N min = N max 2 = 5 × 10 3 and N min = N max 10 = 10 3 , which describe moderate and significant oscillation amplitudes, respectively. The period is varied over three decades: results are shown for oscillations lasting τ = 10 2 , 10 3 , or 10 4 generations. Additionally, results are shown for mutation rates µ = 10 −8 and 10 −6 (again fixing µ b = µ 100 to limit the parameter space). Each moment and cumulant is computed by averaging over L = 10 5 simulated sites. In the figures below, each subfigure represents one parameter combination (N min , τ, µ) with fixed values (N max = 10 4 , µ b = µ 100 , L = 10 5 ) and shows results for a range of deleterious selection coefficients s ∈ [10 −5 , 10 −1 2 ].
As discussed in the previous sections, a lower mutation rate introduces comparatively more noise for a fixed target size L. Accordingly, the µ = 10 −6 simulations in the following figures provide the clearest comparisons between simulated and theoretical results. In this section, I have chosen to show one row of plots (three parameter combinations) with the lower mutation rate (µ = 10 −8 ) because, unlike the constant size and exponential demographies, the subtlety of some of the oscillatory dynamics can be almost entirely masked when the noise is increased. This suggests that the detection of oscillations in the population size from empirical data may be power-limited in some parameter regimes, resulting in the potential misspecification of the underlying population model unless an observable can be found that is both sensitive to oscillations in the population size and robust to noise due to low target size (or summarizes data from ≳ 10 6 sites).
Finally, as detailed in the manuscript, the late time asymptotic dynamics of the allele frequency probability distribution under cyclical population size are not quite those of steady state mutation-selection-drift balance. instead, for some parameter combinations, the distribution can oscillate around a shape determined by N osc , the effective population size (harmonically) averaged over one period. As one might expect, the allele frequency probability distribution expands and contracts on a time-scale determined by τ in a nonequilibrium waltz, and all cumulants above the mean (κ n for n ≥ 2) respond in turn. To identify the regime of validity of the analytic approximations for each moment, simulations were performed using selection coefficients 2N osc s = 3, 5, 10 (in the regime where s ′ ≈ s), as this is the appropriate scale that emerges from the asymptotic time-dependence of the variance (see manuscript text). Perhaps surprisingly, the speed of oscillations does not appear to affect the accuracy of the PGFT approximations (the analytic time-dependence fully describes the transition to a coarse-grained regime under rapid oscillations with s ′ τ ≪ 1), which faithfully track the dynamics while selection remains strong and break down roughly when s ≳ 5 2N osc . This provides a satisfying and intuitive analogy to the regime of validity for the dynamics of a population with constant size. Note that, as was the case for all previously discussed demographies, each cyclical demography plot is presented in logarithmic time. This may be particularly unintuitive for this demography, as the regular periodicity is obscured. However, this allows for better assessment of the impact of the oscillatory behavior on the transient dynamics at early times (i.e., prior to the waltz), which are much less intuitive than the resulting asymptotic dynamics, but appear to be theoretically well-described in the same regime of validity.

S4.4.1 Cyclical demography: time dependence of the population size
The following log-log plots show the time dependence of the population size in a cyclical populations spanning a range of parameters. These populations experience regular, periodic oscillations in size of varying amplitudes and periods. Analytic and simulation results in the following section are presented in the same order as the time-dependent population sizes in Figure S35. Each demography begins at an initial size of N (0) = N max = 10 4 diploid individuals and returns to this size at each multiple of τ generations; simulations are presented for periods of τ = 10 4 , 10 3 , and 10 2 generations, shown from left to right. The minimum population size occurs at time τ 2 (and thereafter at generations t = nτ + τ 2 for integer values of n) and is shown for two minimum population sizes: N min = 5 × 10 3 (= N max 2) in the top two rows and N min = 10 3 (N max 10) in the bottom row of plots. The first and second rows from the top are identical in the figure below; the first row of all subsequent plots of cumulants and moments will be shown with mutation rates µ = 10 −8 , µ b = 10 −10 in the same demography as the second row with µ = 10 −6 , µ b = 10 −8 . Lower mutation rates are shown for contrast, but obscure the performance of the analytic approximation, as the simulated stochastic noise partially masks the oscillatory behavior.

S4.4.2 Cyclical demography: mean frequency p(t) to three-vertex order
Plots of the first non-zero correction to the deterministic approximation for the mean of the allele frequency probability distribution p(t) are presented below for various cyclical demographies. The top row, representing the same demography as the middle row, but with lower mutation rate µ = 10 −8 , is substantially noisier, as discussed in Section S4.1. The higher order correction to the analytic approximation of the mean changes little; the deterministic approximation is highly accurate, despite oscillating population size. However, as can be seen in Figure S36g, the mean does mimic the oscillatory behavior and change appreciably for weaker selection coefficients; this effect is captured by the analytic approximation for selection coefficients just above and around the lower limit of the strong selection approximation (e.g., for s = 5 2N osc and 10 2N osc in Figure  S36g). This effect is present in Figures S36h and S36i but is less visible due to the speed of oscillations. This magnitude of change in the mean frequency is minimal (this is a subdominant effect captured by the three-vertex corrections) and is completely masked by noise when mutation rates are reduced. Oscillations in population size also transiently increase the variance of the mean frequency, an effect most clearly visible in Figures S36g and S36h due to the more severe population size reduction. This effect is the same as can be seen during a bottleneck, but is more pronounced due to the repetitive population size changes. ].

S4.4.3 Cyclical demography: variance κ 2 (t) to two-vertex order
Plots of the lowest-order approximation to the variance, the second cumulant, of the allele frequency probability distribution κ 2 (t) are presented below for various cyclical demographies. As seen for constant and exponentially growing population sizes, the variance exhibits a transient increase and is maximized at early times. However, unlike these demographies, but similar to a bottleneck, this may not be the globally maximum variance if the reduction in population size is severe. In that case, the maximal variance seems to occur during a trough in population size one or more periods after the first minimum at t = τ 2; this is more likely when s is on the order of or less than than the inverse of the period, resulting in variance maximization once per period during the waltz. The dependence on s ′ τ is evident in the expression for the variance, but has a complicated functional form. In contrast to the mean frequency, the variance is driven by the oscillatory behavior, which is evident from the ϕ(t) dependence at all times, including asymptotically, in the lowest order approximation. Note that, despite drift playing a dominant role in the dynamics of the variance, the perturbative approximation captures the oscillatory behavior impressively well-even at lowest order-lining up to simulations nearly exactly in the regime of validity.

S4.4.4 Cyclical demography: homozygosity E[p 2 (t)] to two-vertex order
Plots of the lowest-order approximation to the homozygosity E[p 2 (t)], the second non-central moment of the allele frequency probability distribution, are presented below for various cyclical demographies. As seen previously, the homozygosity is largely driven by the dynamics of the variance. The notable difference is in the initial value, when the variance is zero due to the delta function describing the initial frequency distribution but the homozygosity is given by a constant value p 2 0 . This moment thus exhibits a transition between the mean-driven and variance-driven regimes, which occurs during the transient period. Note that, while the oscillatory behavior is confounded with the noise for low mutation rates (see, e.g., Figure S38a for µ = 10 −8 , it is still visible. The variance of the homozygosity, dependent on the first four cumulants, also exhibits clear oscillations even when mutation is comparatively rare. Note that, if selection is sufficiently strong, the homozygosity (and higher moments) can briefly pseudo-equilibrate to a metastable value determined by the initial population size, but this only occurs if the timescale of selection is much less than the period (i.e., sτ ≪ 1); this is analogous to pseudo-equilibration for strong selection coefficients under exponential growth, taking the form of a relatively flat trough at early times in cyclical demographies. This is clearest in Figure  S38g (with τ = 10 4 ) for s = 10 −1 2 and 10 −1 , but vanishes as the period is reduced in Figure S38i (with for τ = 10 2 ). ], for populations with oscillating population size. Parameter combinations correspond to those shown in Figure S36.

S4.4.5 Cyclical demography: third cumulant κ 3 to three-vertex order
Plots of the lowest-order approximation to the third cumulant of the allele frequency probability distribution κ 3 (t) are presented below for various cyclical demographies. As discussed previously, the dynamics of the third cumulant are tied to those of the second cumulant such that the time dependence is highly similar to plots of the variance, but on an overall reduced scale. Again, the analytic approximations line up exceptionally well to simulated results, capturing both the transient and oscillatory dynamics for a large range of selection coefficients. The third cumulant is only maximized during the transient period for low mutation rate and very strong selection (see, e.g., selection coefficients s ≥ 10 −2 in Figures S39a-S39c). As seen in the second cumulant and moment, pseudo-equilibration may occur at early times if sτ ≪ 1.

S4.4.6 Cyclical demography: third non-central moment E[p 3 (t)] to three-vertex order
Plots of the lowest-order approximation to the third non-central moment of the allele frequency probability distribution E[p 3 (t)] are presented below for various cyclical demographies. The third moment mimics the behavior of the third cumulant, which is in turn driven by the dynamics of the variance, as discussed in the context of other demographies. Like the homozygosity, the third moment is initially at a fixed value of p 3 0 due to the delta function, after which it exhibits transient behavior and eventually reaches the asymptotic waltz around the value determined by the effective population size N osc . Despite comparatively noisy estimation of this moment relative to lower moments, the analytic approximation captures the oscillatory strong selection behavior extremely well, independent of the periodicity. Again, pseudo-equilibration can occur at early times for strong selection coefficients sτ ≪ 1.

S4.4.7 Cyclical demography: skew K 3 (t) to three-vertex order
Plots of the lowest-order approximation to the skew, the standardized third cumulant, of the allele frequency probability distribution K 3 (t) are presented below for various cyclical demographies. This moment is particularly informative about the oscillatory dynamics of the distribution. Comparing the figures below to those for constant demography (see Figure S9), the skew approaches an oscillatory time-dependent value, rather than a constant, centered around the value that would be associated with 2N osc µ in a constant demography. This indicates that not all of the oscillatory behavior in the third moment is contained in the variance dependence; under strong selection, the skew rapidly pseudo-equilibrates as if in a constant population of size N max (= N osc (0)), but begins to oscillate on the timescale of the period τ due to the asymptotic waltzing of the distribution during the second period. Transient pseudo-equilibration only occurs for selection coefficients that are efficient in the initial population (2N max s ≫ 1), but also requires sufficiently rapid adjustment in a regime dictated by the period (sτ ≫ 1); otherwise, the asymptotic waltz is reached uninterrupted. Asymptotically, the skew oscillates with an appreciable amplitude that is monotonically (and postively correlated) with both the period and the selection coefficient. This indicates that, while the variance of the skew may remain tight and looks unrelated to the period or selection coefficient, time-averaging the skew during the late-time dynamics (i.e., ∫ c+τ c dtK 3 (t) beginning at some late time c ≫ τ) results in an observable sensitive to the amplitude of oscillations and thus correlated to both the selection period; the appreciable amplitude of oscillations over time will be converted into variance of the time-averaged skew. Also, see Section S4.4.10.

S4.4.8 Cyclical demography: fourth cumulant κ 4 (t) to four-vertex order
Plots of the lowest-order approximation to the fourth cumulant of the allele frequency probability distribution κ 4 (t) are presented below for various cyclical demographies. As discussed previously, the dynamics of the fourth cumulant are tied to the second cumulant such that the time dependence is highly similar to plots of the variance, butt, like the third cumulant, on an overall reduced scale. Lowest-order approximations to the forth cumulant line up exceptionally well to simulated results, capturing both the transient and oscillatory dynamics for a large range of selection coefficients. The dynamics of the fourth cumulant are largely the same as those for the third cumulant. As seen in the lower cumulants, pseudo-equilibration may occur at early times if sτ ≪ 1.

S4.4.9 Cyclical demography: fourth non-central moment E[p 4 (t)] to four-vertex order
Plots of the lowest-order approximation to the fourth non-central moment of the allele frequency probability distribution E[p 4 (t)] are presented below for various cyclical demographies. The fourth moment mimics the behavior of the fourth cumulant, which is in turn driven by the dynamics of the variance, as described above. Like the second and third non-central moments, the fourth moment is initially at a fixed value of p 4 0 due to the delta function, exhibits transient behavior, and dances an asymptotic waltz at late times around the value determined by the effective population size N osc . Despite the additional noise inherent in estimating higher moments, the analytic approximation captures the oscillatory strong selection behavior extremely well, independent of the periodicity. Again, a brief pseudo-equilibration occurs in the fourth moment for parameters sτ ≪ 1. ] for populations with oscillating population size. Parameter combinations correspond to those shown in Figure S36.

S4.4.10 Cyclical demography: excess kurtosis K 4 (t) to four-vertex order
Plots of the lowest-order approximation to the excess kurtosis, the standardized fourth cumulant, of the allele frequency probability distribution κ 3 (t) are presented below for various cyclical demographies. The qualitative dynamics of the excess kurtosis are similar to those for the skew, indicating that the discussion in Section S4.4.7) is applicable here (acknowledging that estimation of the excess kurtosis is inherently noisier). Comparing to the skew, the transient pseudo-equilibration (for selection coefficients 2N osc (0)s ≫ 1, sτ ≫ 1) appears to be more pronounced in the excess kurtosis. In addition to the monotonic relations between the time-averaged excess kurtosis ∫ c+τ c dtK 4 (t) (or time-averaged skew ∫ c+τ c dtK 3 (t)) and both the selection coefficient s and period τ, the asymptotic behavior of distinct selection coefficients appears to decouple for both moments. A phase emerges that is dependent on the selection coefficient and describes a temporal delay in oscillations of the excess kurtosis (and skew) relative to oscillations in the population size itself (this behavior is contained in the analytic expression for both moments). This phase can also be ignored, if desired, by time averaging; this allows sites under distinct selection coefficients to be pooled to estimate the average excess kurtosis (or skew). For organism with readily accessible temporal data, the variance of the time average of the excess kurtosis (or skew), after pooling deleterious variants to increase power, may aid in the identification of oscillations in the population size, as this quantity is generically inflated for cyclical populations relative to a constant population (all other parameters remaining equal).

S5 On the computation of the fourth and higher cumulants of the allele frequency distribution
Although higher moments can be computed using the same methodology described in the manuscript text, enumeration of the diagrams and evaluation of the resulting integrals becomes complicated. The fourth cumulant will be computed here as an example for a method of computation for general cumulants κ n (the main text contains a summary of the results for the fourth cumulant presented here). However, this requires the warning that the results here only apply to contributions represented by diagrams analogous to diagrams that appear as the lowest contributions to the fourth cumulant. Additional moments can be analyzed in a similar way, but require the identification of any missing diagrams, including those that do not have topological analogs to diagrams for the first four moments (e.g., there is one new type of diagram for the lowest order contributions to the fifth moment, etc.). The class of diagrams that has analogous contributions to those for both the the third and fourth cumulants are presented, along with an example of a diagram topology that appears in the fourth cumulant but has no analog for lower moments. Thus, this section is intended to provide some generalization of diagrammatic contributions that appear for the general cumulant κ n , but cannot provide full expressions for higher cumulants; they must be constructed by adding the functional forms provided here to all additional diagrams that appear at that moment. Having said that, the presentation here may help in future applications that require analysis of specific higher moments or in an effort to find a general set of closed-form approximations for all moments, which is beyond the scope of this manuscript. Here, I will restrict my analysis to constant population size, but, in principle, the same methodology can be used in conjunction with integral expressions provided in Supplemental Material S3.4 to find generalized forms along the lines of expressions below (in practice, this is likely difficult for most demographies due to the complicated functional form of N (t); the exponential demography may provide an example of where this is possible).

S5.1 Maximally connected diagrams
The fourth cumulant has four contributing diagrams at lowest order, each containing four vertices. The first two are analogous to the diagrams for κ 3 in Diagram D7. These diagrams have a single terminal vertex (one with coupling constant p 0 and another with coupling constant µ) and three copies of the N 3 vertex in a minimally connected (or 'maximally long') chain of drift vertices. (D10) The associated contributions sum to the following expression.
For higher moments, all diagrams with this minimally connected topology have analogous contributions. The overall pre-factor is a product of n! (the number of ways to connect to the external vertices), (n − 1)! (n − 1)! (the number of ways to choose each subsequent N 3 vertex divided by the repeated vertex factor), and a factor of 2 n−2 (the number of N 3 vertices with two availablep connections that can be swapped), for an overall factor of 2 n−2 n!. The factor of 2 n−2 partially cancels the overall factor of 1 2 n−1 (2N ) n−1 from the n − 1 copies of the coupling constant 1 4N , where each 1 2N becomes a factor of γ ′ after each integral of an exponential produces a multiple of 1 s ′ . Additionally, integration over n − 1 vertex times t i for the diagram containing the p 0 results in an overall constant of 1 (n − 1)!. Similarly, the n vertex time integrals in the diagram containing the µ vertex results in an overall constant of 1 n!. Both of these cancel or partially cancel the n! dependence from the counting factor. Finally, integration of the two minimally connected diagrams contributing to the n th cumulant κ n results in the following time dependence and multiplicative factors.
κ n long = n p 0 2(γ ′ ) n−1 e −s ′ t 1 − e −s ′ t n−1 + λ ′ 2(γ ′ ) n−1 1 − e −s ′ t n (S81) Here, 1 < n ∈ Z + (i.e., any positive integer greater than one); contributions to the analogous diagrams for n = 2 and n = 3 are given in the equations for the variance and non-standardized skew in the main text, both of which follow the functional dependence in Equation S81 (also see results presented for Diagrams D4 and D7 in Supplemental Material S3.2 and S3.3, respectively). The contributions for the mean frequency (i.e., n = 1) in the manuscript text follow the same pattern (see results in Supplemental Material S3.1) with the caveat that the factor of 2 n−2 requires n to be greater than one to be valid.

S5.2 Minimally connected diagrams
The final two four-vertex diagrams for κ 4 have a topology with no analogous diagrams for cumulants κ n with n ≤ 3.

(D11)
These diagrams are a series of maximally connected N 3 drift vertices; higher moment analogs are diagrams with one terminal vertex (p 0 or µ) and n − 1 drift vertices contributing to the moment E[p n ] that are maximally connected or 'maximally wide' (i.e., the minimal average number of propagators between the terminal and external vertices in a connected diagram). Note that this is only well-defined for even cumulants κ 2 n (evenly spaced branches, as with the diagrams above), but there are a variety of diagrams that can be considered maximally connected for both odd cumulants and for even cumulants that are not powers of two. This is the first example thus far of multiple diagrams contributing to a single cumulant with identical vertices and distinct topologies (e.g., the two leftmost or two rightmost in Diagrams D10 and D11). This becomes more relevant for higher moments, but can also occur in diagrams with, for example, multiple selection vertices (with coupling constant s) when computing high-vertex order contributions to the mean frequency. Calculating the appropriate integrals results in the following.
These diagrams thus represent exactly half of the contributions from the minimally connected diagrams in Diagram D10, but are of the same order and thus non-negligible when computing κ 4 . The additional factor of two found in Equation S80 can be attributed to a counting factor of 2 2 = 4 associated with the choice of whichp exiting each of the N 3 vertices connects to a subsequent N 3 vertex (as opposed to an external vertex) and an additional factor of 1 2 from integration due to the difference in the propagators. The counting factor for maximally connected diagrams is one (see Appendix A2). This corresponds to the fact that, after laying down the first propagator to connect any two vertices in the above diagrams, the connections are fully determined for all remaining propagators (i.e., they either connect to the terminal vertex G(t i , 0), an external vertex G(t, t i ), or the remaining connection between two N 3 vertices G(t j+1 , t j ) a some specific vertex time t j ).
κ 4D11 = n p 0 e −s ′ t 2 n−1 γ ′ n−1 1 − e −s ′ t n−1 + λ ′ 2 n−1 γ ′ n−1 1 − e −s ′ t n (S83) Again, note that this expression is well-defined only for diagrams representing contributions to even cumulants, but it may also apply to some maximally connected diagrams for odd cumulants, though not necessarily for all (the author admits to not having proved the latter). It is clear from these generalized expressions that, as n increases for successively higher moments, minimally connected diagrams rapidly starts to dominate over maximally connected diagrams corresponding to the same cumulant κ n . This is due to a suppression by a relative factor of 1 2 n−2 in the expression for maximally connected diagrams, which originates from (a lack of cancellation of) the counting factor for the number of allowed sets of Wick contractions (see Appendix A2). Interestingly, contributions from minimally connected diagrams can effectively be ignored (if desired) for cumulants κ 2 n with sufficiently large n (e.g., roughly for n ≥ 3, where the relative contribution of maximally connected diagrams is ≤ 1 2 6 = 1 64). However, contributions from a large number of additional diagrams without analogs for cumulants n ≤ 4 tend to dominate due to the multiplicity of many diagrams with non-unity counting factors. This can be understood by comparing Equation S81 at asymptotically late times to moments of the equilibrium frequency distribution φ eq (p) in the strong selection limit [1] by using the knowledge that minimally connected diagrams have the highest counting number of all contributing diagrams.

S5.3 The sum of minimally and maximally connected diagrams
The time dependence of the fourth cumulant, the non-standardized excess kurtosis, is given by the sum of Diagrams S80 and S82, both of which have the same functional form.
Similarly, the subset of lowest-order contributions to the general cumulant κ n from directly analogous diagrams can be obtained by summing Equations S81 and S83. At the risk of repetition, these do not represent the total contribution to the lowest-order approximation for arbitrary cumulants; this presentation is intended to provide intuition for how general cumulants are computed.