Stable Equilibria under Random Fitness and Recombination Rates

We develop and analyze a two-locus biallelic genetic model with randomized recombination rates and fitness matrices. While deterministic models reveal a trajectory toward the simplex center before stabilizing at polymorphisms, introducing random recombination pinpointed regions of attraction dependent on the fitness matrix and recombination fraction range. As recombination randomness intensifies, transitions in these attraction regions are observed. Analysis with random fitness matrices highlighted genotype attraction towards the tetrahedron vertices with edges guiding genotypic clustering.


Introduction
The interaction between genetic linkage and selection has been a focus of evolutionary genetics since [1] first outlined the progression towards tighter linkage between two loci.
The first attempt at specific analysis of the model was by [2] using a symmetric viability model.Further studies delved into the influence of natural selection and viability on the for ab, where 4 i=1 x i = 1.Let w ij ≤ 0 denote the viability of an individual with genotype ij, these fitnesses are expressed in terms of a symmetric matrix AB Ab aB ab AB w 11 w 12 w 13 w 14 Ab w 21 w 22 w 23 w 24 aB w 31 w 32 w 33 w 34 ab w 41 w 42 w 43 w 44 .
Assuming the absence of sex and position effects, we can rewrite: BB Bb bb AA w 11 w 12 w 22 Aa w 13 w 14 w 24 aa w 33 w 34 w 44 , where each w ij is independently and identically sampled from a uniform distribution over the interval [0, 1].To account for the variability in meiotic recombination rates, the stochastic recombination fraction R is sampled at each generation from a uniform distribution over an interval within the range [0, 0.5].
The marginal mean fitnesses, represented by W i can be expressed as: for i ∈ {1, 2, 3, 4}.The transformation equations dictating the change in gametic frequencies from one generation to the next are: where the linkage disequilibrium function D = x 1 x 4 − x 2 x 3 , mean fitness W = W 1 x 1 + W 2 x 2 + W 3 x 3 + W 4 x 4 , and the choice of the ± operator for x 1 and x 4 is negative, while for x 2 and x 3 is positive.
For ease of presentation, for any given index i ∈ 1, 2, 3, 4, let π := 5 − i represent its corresponding 'twin' index, and let r, s = 1, 2, 3, 4 \ i, π denote the set of non-twin indices.This notation implies that r and s take values in 2, 3 when i ∈ 1, 4, and in 1, 4 when i ∈ 2, 3. Importantly, this establishes that w iπ = w rs remains constant across all values of i. Utilizing this concise notation, we can derive the following compact representation: Using the above representation and the inequality one deduces the following lower bound for the updated frequencies November 10, 2023 2/13 this follows by taking in (3) Inequality (4) plays a key role in studying the dynamics of the frequency process.It shows that, if one component x i is very large compared to the other ones then a lower bound on x ′ i can be established.Even more, assume that 1 from which, one infers, based on (4), that where, for convenience, we let R = 1 − R.This shows that 1 − x ′ i remains within the same order of magnitude as 1 − x i .
We utilize the tetrahedral representation to depict potential chromosome frequency states using barycentric coordinates.This representation provides a geometric interpretation of the model where the distances of a point from the tetrahedron's vertices to the opposite face are directly proportional to the gamete frequencies of that point.

Convergence behavior under fixed recombination
In deterministic scenarios, given any initial state and fixed recombination fraction, the system almost always (see Hastings 1984) deterministically converges to an equilibrium.In the case of symmetric viabilities, trajectories toward equilibrium first approach a curve defined by the pairs vertices AB and ab or Ab and aB and their polymorphisms x * .Subsequent evolution along this curve leads the system trajectories to converge to x * (See Figures 1 and 2).November 10, 2023 3/13  1 Numerically determined cases of polymorphic equilibria previously derived by [3].The trajectory density is visualized through a heatmap with equilibrium points highlighted in green.For these and subsequent figures: roman numerals correspond to parameters in Table 1, initialization involves 10000 uniformly distributed x0, and deterministic simulations are terminated when the difference across 100 consecutive generations is less than 10 −10 .
The trajectories approached a curved symmetric invariant set, representing a transient phase within the state space, after which trajectories then progress towards their respective equilibrium points.Indeed, the vector field for the deterministic case first directs trajectories toward the center of the tetrahedron and then directs them toward their equilibrium points (Figure 2).
November 10, 2023 For this and all subsequent vector field diagrams, the strength of a vector from a given blue sphere x within the parameter space is represented by the length and color of the corresponding vector arrow, with the region of attraction after the indicated number of generations marked by a red × symbol.

Convergence behavior under random recombination
When the recombination fraction, R changes randomly at each generation, the trajectories are stochastic.Given a specific range of uniformly distributed values of R, we find regions of attraction S such that trajectories from all x 0 in S remain restricted with high probability.For this and all following random R diagrams: the model runs until the change across 100 consecutive generations is ⪅ 10 −4 , depending on the range of the recombination rate, R is uniformly chosen from a range such that its mean is the corresponding deterministic R from Table 1, and for all x0 in a given transparent domain, S exists as an opaque convex hull of the same color.
The multicolored regions in Figure 3 denote the basin boundaries.Within these boundaries, the final equilibrium state appears to be determined, not by x 0 , but by the randomly selected R in the initial few generations.
In this system, genotypes, while initially in a basin of attraction, can be perturbed out of it when there's a consistent trend in the sampled R values over multiple generations.In the limit, such perturbations will consistently occur.Thus, when a system exhibits multiple domains of attraction within the tetrahedron's interior, it's more accurate to state that sequences do not truly converge to a region and instead fluctuate around a distribution.Conversely, for systems where genotype frequencies approach fixation, the frequencies of the other genotypes become so infinitesimal that in any finite population, they would be lost.
Moreover, setting progressively stricter convergence criteria by reducing the change November 10, 2023 5/13 threshold tends to prioritize regions with denser distributions.These regions exhibit slower divergence and are therefore more likely to meet stringent change thresholds.
Conversely, regions that, upon stabilization, exhibit consistent albeit lower concentration of points risk being overlooked.Thus a more accurate representation of the system's long-term behavior is to evaluate the distribution after an extensive number of generations.
In our analysis, we considered three genotypic fitness sets (see Figure 4   With random R, a "convergence diagonal" is described by the parametric equations: Distinct transitions in convergence were observed as the range of random recombination fraction, R, expanded.Specifically: 1.For the initial increase in the range of R, the convergence regions grow across the parametric convergence diagonal (marked in dashed green).
November 10, 2023 6/13 2. When R increases beyond a particular range threshold, the genotypic distribution converges to the corners (see Figure 7i).

Single Heterozygote Fittest
In the deterministic model with single heterozygote fittest (Figure 4ii), the genotypic distribution approaches fixation to the corners.As the recombination fraction increases, the domain of x 0 that converges to the corners aligns with the parametric diagonal (see Figure 8).Random recombination does not influence the genotypic outcomes.

Homozygote Fittest
With homozygote fittest (Figure 4iii), the deterministic, random, and no recombination scenarios converge to all non-homozygote edges.With a random matrix without recombination, genotypes exhibit a low but consistent distribution throughout the tetrahedron's interior.Recall from earlier that this spread-out distribution is attributed to a randomly occurring trend from the stochastic fitness elements, occasionally strong enough to drive the genotype toward the center.Genotypes exhibit a higher tendency towards the tetrahedral edges, a distribution caused by linkage disequilibrium.However, the highest concentration of genotypes is at the tetrahedral corners.
As the recombination fraction increases, the genotypes proximal to the AB/ab and Ab/aB edges experience a repulsion.Given that edges represent closely linked alleles, increased recombination disrupts these linkages, driving genotypes away.Random recombination does not influence genotypic outcomes; the dynamics behave analogously to those with a deterministic R equivalent to the mean of such a distribution.
In the stochastic setting, the dynamics of the process can be modeled as a Markov chain on the space of non-degenerate frequency vectors X = {x = (x 1 , x 2 , x 3 , x 4 ) : x 1 , x 2 , x 3 , x 4 > 0, x 1 + x 2 + x 3 + x 4 = 1}, If the fitness values are i.i.d from the uniform distribution on (0, 1), one would expect that the trajectories of the process "cover" the full state space; this is in contrast with the deterministic model, where trajectories are fixed by the fitness matrix and the initial state.An immediate consequence is that fixed equilibria, i.e. frequency vectors which remain invariant over time, do not exist for the stochastic model.In fact, the only global equilibrium points (i.e.equilibria for any possible fitness matrix) in the extended state space (obtained by including the boundary of the tetrahedron) are the four corners; this follows from inequality (4).Therefore, should some equilibrium exist for the stochastic model, it should come as no surprise if that would imply these corners (which, importantly, do not belong to the interior X ).Nevertheless, equilibria for Markov chains should be understood in a probabilistic sense and existence of such equilibria is the main concern of our analysis.
Formally, a Markov chain is a sequence of random variables {X n : n ≥ 0} with a particular type of dependence.It is defined by a transition probability on the state space which gives the conditional probability distribution of the next state, given the current one.We will formally denote that by where x ′ is the random vector generated by (1) and B is a (possibly open) subset of the state space X .The transition operator (5) together with the initial state X 0 determine the probabilistic behavior of the chain, just like the fitness matrix together with the November 10, 2023 9/13 initial state determine the trajectory of the deterministic process.We denote by P x the probability distribution of the chain started in X 0 = x.
For any set B ⊂ X we define the hitting time of B as the minimum random number of steps the process requires to first reach B; in the formula, where the minimum is infinite if the chain never reaches set B. If the initial state X 0 is a generic point in B then T B may be also referred as return time to set B. The set B is called accessible from x if that is, if the chain started in x reaches set B with positive probability.When B is some open ball, which we shall denote by B(z, ϵ) around z, of radius ϵ > 0, we say that state z is accessible from x.The Markov chain is said irreducible if for any pair of states x, z ∈ X there exists some ϵ > 0 such that L(x, B(z, ϵ)) > 0; in words, any state is accessible from any other state.
Irreducibility plays an important role in the analysis of a Markov chain.It ensures that, in principle, the chain will essentially visit any state in the state space, no matter what the initial state is.Indeed, this is the behavior we see in the generational limit in Figure 1.
To establish irreducibility of the Markov chain X we need to check under which circumstances there exist fitness values w ij ∈ (0, 1) to satisfy (2), for some given values x, z = x ′ ∈ X .Furthermore, note that solving the equation z = x ′ amounts to solving the system of equations for i = 1, 2, 3, 4.There are 4 equations and 10 variables, all of them constrained to be in (0, 1).A general solution to this system can be found by assigning random values in (0, 1) to the variables w pq , with p ̸ = q, and then letting (i = 1, 2, 3, 4) where we treat α ∈ (0, 1) as a parameter.Now we only need to impose that the calculated values w ii meet the constraints w ii ∈ (0, 1); this is because the last constraint, α ∈ (0, 1), will be then implicit since z ∈ X entails so, to make sure that the system has at least one solution, we require that there exist fitness values w pq ∈ (0, 1), for p ̸ = q, satisfying max November 10, 2023 10/13 where we denote δ i (x, w) = w ir x i x r + w is x i x s + Rw iπ x i x π + Rw rs x r x s .By a continuity argument, equation (7) gives sufficient conditions for state z being accessible, in one transition, from a state x.To summarize our analysis, for any x, z ∈ X , if there exist w pq ∈ (0, 1), for p ̸ = q, satisfying (7), then ∀ϵ > 0 : P (x, B(z, ϵ)) > 0.
With this result in hand, we can go now further to establish irreducibility.More specifically, we will show first that: (I) the center a is accessible from any other state; (II) any state is accessible from a small neighborhood of the center a.
For part (I) take z = a in (7) and, since z i = 1/4, we need to show that for some w pq ∈ (0, 1).Taking w ij = x1x2x3x4 xixj , for j ∈ {r, s}, this reduces to which obviously holds true for small enough w iπ .
By continuity of δ i (in both x and w), for part (II) it is enough to show that any state z is accessible from the center a; then the conclusion will follow true, as well, for at least some small neighborhood around a. Taking now x = a in (7), we need to find values w pq ∈ (0, 1) such that When w pq = ε, for p ̸ = q, for some ε > 0 (to be chosen later), this reduces to which is obviously true for ε < (min i z i )/(max i z i ); this proves the claim.
Finally, we are able to prove irreducibility: take some arbitrary x, z ∈ X and ϵ > 0.Then, by part (I) P (x, V) > 0, for any neighborhood V of a, and by part (II) and the latter is strictly positive since it is the expectation of some positive random variable over a set of non-null probabilities.This proves that any state z may be reached from any x 0 , with positive probability, in at most 2 steps, hence irreducibility follows true.
The high genotype concentration at the tetrahedral corners can be attributed to the behavior of genotype frequencies when approaching fixed points.To make this intuition precise, let us introduce the following terminology: let Proving the recurrence of the Markov chain extends beyond our Monte Carlo simulations and presents a fruitful avenue for future investigation.The stochastic equilibrium observed in Figure 1 is only possible in the limit of the chain is indeed recurrent.In this context, a stochastic equilibrium is understood as a probability distribution µ on X which is left invariant by the transition operator -that is, if X 0 ∼ µ then X n ∼ µ for any n ≥ 1.Nevertheless, for practical purposes, the generational span of our simulations should faithfully represent the dynamics of an actual population.

Discussion
Our investigation focused on the effect that random recombination rates and fitness matrices have on dynamics in the two-locus biallelic model.For deterministic trajectories, we identified a two-fold dynamic where genotype frequencies would move toward the center of the simplex before approaching stable polymorphisms.
Upon the introduction of randomly varying recombination, we identified regions of attraction S, to which x remains restricted with a high probability.Notably, the quantity and spatial distribution of these regions are contingent upon the fitness matrix and the range of the random recombination fraction.As the range of the random recombination fraction is increased, there are transitions in the location of the regions of attraction with both double and single heterozygote fittest cases from the edges across the parametric diagonal.In the homozygote fittest case, all recombination scenarios lead to fixation of the genotypes AB and ab.Further analysis with random fitness matrices demonstrated that genotype fixation acts potently as an attractor, drawing a high concentration of genotypes toward the tetrahedron's vertices.

Acknowledgments
Some of the computing for this project was performed on the Sherlock cluster.We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results.Special thanks to Professor Marcus Feldman for his invaluable guidance and advice throughout this project.

Fig 1 .
Fig 1. Convergent Trajectories for Numerical Simulations with Deterministic R

Fig 2 .
Fig 2. Vector Field Evolution of Trajectories

Fig 3 .
Fig 3. Regions of Attraction for Numerical Cases with Random R With double heterozygote fittest (Figure 4i) and a low deterministic recombination fraction (e.g., R = 0.05), there are two complementary polymorphisms.However, once the recombination fraction surpasses a distinct R range threshold-dependent on the fitness matrix-the genotype distribution converges to both the corners and the central point [0.25, 0.25, 0.25, 0.25] depending on x 0 (see Feldman and Liberman 1979).

Fig 5 .
Fig 5. Convergence with Double Heterozygote Fittest and Deterministic R

Fig 7 .
Fig 7. Convergence Progression for Expanding R Ranges

Fig 8 .
Fig 8. Convergence with Single Heterozygote Fittest Matrix and Deterministic R

Fig 10 .
Fig 10.Convergent Distribution Density as R Increases

Table 1 .
Numerical Cases of Polymorphic Equilibria with Symmetric Viabilities and Deterministic R , for α < 1, will be generically called central regions.The complement of the central region Ω α can be partitioned as follows: ∁Ω α = ∪ i=1:4 ∆ i,α , where ∆ i,α are disjoint (for α > 1/2), defined as ∆ i,α = {x : x i ≥ α}; these are corner regions, i.e. neighborhoods of the corresponding corners, and are getting small when α → 1.The sets Ω α are increasing w.r.t.α ∈ (1/4, 1), so one can define the (set) limits inf α Ω α and sup α Ω α ; the former is simply the center of the space a = (1/4, 1/4, 1/4, 1/4) and the latter is the standard (closed) simplex in R 4 without the four corners.Thinking of central regions Ω α as open, bounded neighborhoods of the center a, we see that the four corners of the tetrahedron appear as the points that are not covered by any "bounded" region Ω α .The chain consistently interacts with the corner regions, culminating in a high concentration in these areas over time. α