A lower prevalence for recessive disorders in a random mating population is a transient phenomenon during and after a growth phase

Despite increasing data from population-wide sequencing studies, the risk for recessive disorders in consanguineous partnerships is still heavily debated. An important aspect that has not sufficiently been investigated theoretically, is the influence of inbreeding on mutation load and incidence rates when the population sizes change. We therefore developed a model to study these dynamics for a wide range of growth and mating conditions. In the phase of population expansion and shortly afterwards, our simulations show that there is a drop of diseased individuals at the expense of an increasing mutation load for random mating, while both parameters remain almost constant in highly consanguineous partnerships. This explains the empirical observation in present times that a high degree of consanguinity is associated with an increased risk of autosomal recessive disorders. However, it also states that the higher frequency of severe recessive disorders with developmental delay in inbred populations is a transient phenomenon before a mutation-selection balance is reached again.


INTRODUCTION AND MAIN RESULT
The empirical observation that consanguinity is associated with an increased risk of autosomal recessive disorders, has been made in many countries. Martin, et al. recently showed that the contribution of autosomal recessive developmental disorders is 31% in the current British population if the autozygosity is above 0.02. [17] Likewise, in the Iranian population it is estimated that offspring from first-cousin unions have a probability for intellectual disabilities that is four times higher than in non-consanguineous partnerships. [11,13,18] Although most people probably agree that a lower burden of disease and child mortality is a desirable goal in a society, it is also clear that the occurrence and coexistence of different marriage patterns over many centuries cannot be understood by population genetics alone, especially as demographic, social and economic factors interact in a complex manner. [5] However, since there have been repeated attempts to motivate social conventions by genetic reasoning, we took a closer look at the validity of these arguments.
The European Court of Human Rights case of Stübing v. Germany concerned consanguine siblings who had four children following consensual intercourse, whereupon both siblings were charged with incest. [2] One of the siblings lodged a complaint, arguing that the legislature violated his right to sexual self-determination, his private and family life. The Court found that 24 out of 44 European States reviewed, criminalized consensual sexual acts between adult siblings, and all prohibited siblings from getting married. The German government argued that the law against incest partly aimed to protect against the significantly increased risk of genetic damage among children from an incestuous relationship. Motivating a law on avoiding a higher probability of disease can be viewed as eugenic. As the German Ethics Council opined after the judgment, no convincing argument can be derived from there being a risk of genetic damage, concurring with a statement from the German Society of Human Genetics that "The argument that reproduction needs to be thwarted in couples whose children possess an elevated risk for recessively inherited illnesses is an attack on the reproductive freedom of all". [3,1] Furthermore, as our work shows, the argument that there exists an increased risk of genetic damage, requires the definition of a reference population for comparison. However, there is neither agreement about a suitable reference nor an accurate measurement for mutation load.
Previously, the dynamics of mutation load and incidence rates have been analyzed for different demographic models including explosive population growth, but excluding different mating schemes. [10] In this work, we will therefore focus on simulations for random and consanguineous mating. In this framework, the mutation load is defined as the average number of lethal equivalents per individual. The lethal equivalents in the genome are deleterious alleles that are disease-causing if both copies of a gene in an individual harbor at least one such variant. The totality of these mutations could also be regarded as the theoretical superset of an extended carrier screen. [4] By this means, we are able to focus on the incidence rate of severe recessive disorders with early onset that prevent reproduction almost with certainty. Likewise, we can study how the selection of a partner, which we refer to as a mating scheme, influences the disease prevalence and mutation load and we are able to monitor these parameters in the population over time. This is done by counting the number of lethal equivalents that enter the gene pool due to a constant de novo mutation rate, or leave the gene pool due to selection. If the disease prevalence does not change any more, the population is in a steady state, that is a flux balance for lethal equivalents.
In our simulations, affected individuals do not have a different life span from unaffected individuals, therefore prevalence and incidence are equivalent and their rate is proportional to the amount of lethal equivalents removed from the gene pool per generation. In fact, the expected number of lethal equivalents that is lost by an affected individual that is not propagating is two. This is equivalent to the difference in the average mutation load between affected and unaffected individuals and can also be derived from the simulations. An expansion of the population will affect prevalence and mutation load as we will discuss in more detail in the following. However, the prevalence has to approach the same level for both mating schemes in the long run, as it is only constrained by parameters of genome biology. In contrast, the mutation load is affected by the final population size. Consanguineous mating goes along with a lower effective population size and therefore with a reduced capacity for lethal equivalents. [14] To investigate the dynamics of mutation load and incidence rates under different mating schemes in growing populations we developed two models and ran independent simulations in different frameworks. Each model had the advantage to handle certain aspects of nature particularly well. The first is an adaptation of the classical Wright-Fisher model with discrete, non-overlapping generations run in the forward genetic simulation framework SLiM. [7,9,19] The second is a diploid individual-based population dynamics model adapted from Bovier, et al. [6] In contrast to the Wright-Fisher model, individuals die and give birth at independent time points that are exponentially distributed. Both models resulted in the same qualitative and comparable quantitative results which will be described in the following. However, a detailed explanation of both models, the adaption of the "discrete" Wright-Fisher model, and the "continuous" model with adaptive dynamics can be found in the Supplemental Material.
We start our simulations with a population of 500 individuals that have around 500 generations to reach a steady state. After this initialization phase, the population expands to a size of 10 000 in about 130 generations which would correspond to roughly 2 500 years. [9] The population expansion follows a logistic growth curve, which rather looks like a step function (grey curve in Figure 2), because of the length of our entire simulations covering 2 000 generations. All individuals have diploid genomes with 1 000 recessive genes that we deem crucial for reproductive success. Their coding sequence ranges between 500 and 10 000 bp, novel alleles are introduced with a de novo mutation rate of 1.2 · 10 −8 per bp, and one out of nine mutations is expected to be a lethal equivalent. The choice of these parameters can be motivated by the distribution of coding lengths and the deleteriousness scores for known autosomal recessive genes. [15,16] Pairs for procreation are formed either randomly or based on their relatedness that is traced over the two most recent generations. In a highly consanguineous mating scheme, the number of potential partners is hardly affected by the population size, as most marriages happen within families. In our simulations this mating scheme is realized as follows: 50% of all partnerships share two grandparents, 30% share one grandparent, and only 20% share no grandparent. In this scenario the mutation load and prevalence do not change during population growth (Figure 2 A). In contrast, in a randomly mating population, there is a sharp transient drop of incidence rates during expansion at the expense of an increasing mutation load (Figure 2 B). Interestingly, even after the final size of the population is reached, it takes almost another 550 generations until the mutation load reaches its new plateau of approximately three deleterious mutations in 1 000 recessive disease genes. The prevalence, in contrast, returns to the value of the steady state, that is roughly 70 affected individuals per generation in a population of 10 000. The mutation load in the steady state increases in both mating schemes with the number of autosomal recessive genes, but with population size only for random mating (Figure 2 A, B). This is best explained by a limit of the effective number of available partners that the consanguineous mating scheme imposes, regardless of the final population size. In line with that argument is a transition from the dynamics of consanguineous to random when we incrementally increase family size, which would correspond to more potential mating partners ( Figure 2). Although the phase of population growth lasts only 130 generations in our simulations, the time span to reach the new equilibrium for the mutation load lasts much longer.
With respect to the British subpopulations of Pakistani (PABI) and European (EABI) ancestry in Martin, et al., this means that PABI are much closer to their steady state than EABI. This would imply that the disease prevalence for recessive disorders will rather increase in the following generations for EABI, while it will remain constant on a high level for PABI. The subgroups of EABI and PABI that share a similar proportion of developmental delay due to de novo mutations, are best suited to discuss the mutational load: At first glance, the higher proportion of recessive coding mutations in the PABI subgroup with > 2% autozygosity might indicate a higher mutational load. However, our simulations would rather suggest the opposite. Perhaps the higher mutation load in the EABI subgroup with multiple affected individuals is actually the key to understanding the high proportion of unexplained molecular causes in this subgroup. [12] If oligogenic modes of inheritance explain a part of the undiagnosed patients, this share should be higher in the EABI subgroup.

FIGURES FIGURE 1. DYNAMICS OF MUTATION LOAD AND PREVALENCE FOR SEVERE RECESSIVE DISORDERS:
A population expansion from 500 to 10 000 individuals (grey), starting in generation 500 does not affect prevalence (orange) nor mutation load (red) if partners are preferentially chosen within relatives (consanguineous mating scheme) (A). In contrast, in a random mating population, there is a transient drop of prevalence at the expense of an increasing mutation load (B). It takes more than 550 generations after the end of the growth phase, until the steady state is reached and the prevalence for both mating schemes are comparable again. The plots show the average of 50 exact trajectories of the stochastic process simulated with the discrete model. The mating scheme is characterized by the family size and a probability function that describes how many of the partners are chosen within the family. In a preferentially consanguineous mating population the dynamics change when the maximum family size increases (upper left panel to lower right from 10, 25, 50, 100, 500 up to 10 000). The mutation load starts to increase considerably if mating is happening in tribes of 500 individuals. However, at this stage there is still only a minor effect of further population growth. In the lower right the maximum of the allowed family size is equivalent to the population size and thus, dynamics do not differ from a random mating scheme any more. The plots show the average of 10 exact trajectories of the stochastic process simulated with the continuous model.
The capacity of the genome for deleterious mutations is larger in the random mating population. With an increasing number of genes and growing population size, deleterious mutations accumulate (A). In contrast, in the consanguineous mating scheme, family size limits the effective population size, and therefore mutation load is independent of the total number of individuals (B). Prevalence increases linearly in both mating schemes when the number of genes increases and is independent from population size, as regression analysis indicates (C,D).

CODE AVAILABILITY
All scripts to reproduce our simulation results can be found in the following repository: https://github.com/roccminton/Diploid_Model_Two_Loci. A video clip of our simulations can be found at: https://youtu.be/5hOgLyRqWPg.

SUPPLEMENTAL MATERIAL
To investigate the dynamics of mutation load and incidence rates under different mating schemes in growing populations we developed two models and run independent simulations in different frameworks. Each has the advantage to model different aspects of nature better than the other. One is an adaptation of the classical Wright-Fischer model with discrete, non overlapping generations [7,19] run in the forward genetic simulation framework SLiM [9]. The other one is a diploid individual based population dynamics model adapted from Bovier et. al. [6]. In contrast to the Wright-Fischer model individuals die and give birth at independent exponential times on a continuous timescale. Therefore refer to the Wright-Fisher adaptation as "discrete" and to the adaptive dynamics model as "continuous" model. Consider a finite population of individuals where each individual is characterized by a diploid set of N gene segments of different size. Mutations appear at every gene independently with a rate that is proportional to its size. As long as an individual carries a deleterious mutation at only one gene its fitness is unaffected. But as soon as both copies of a gene carry a mutation the individuals reproductive fitness is reduced to zero. Hence it will be excluded from the mating process and is not able to reproduce anymore. One can think of a mutation causing a severe recessive disorder. Besides that all individuals are equally fit, no matter how many recessive disorders they cary. Simulations always start with a small, healthy population. After a period of time in which a mutation selection balance is established a logistic growth phase starts, that settles after a new population equilibrium is reached. We are particularly interested in the evolution of two parameters during and after this growth phase. First the mutation load, which is the average number of mutations of an individual in the population. And second the prevalence of the recessive disorder, hence the fraction of the population that carries a mutation on both genes and thus expresses the disorder. Moreover we investigate changes of the dynamics of the above parameters when the population applies different mating schemes. On the one hand random mating, with an equal probability for each two individuals with non-zero fitness to mate. And on the other hand a consanguineous mating scheme, where individuals prefer to mate with close relatives.

DISCRETE MODEL
In the default setting, the simulation package from Haller and Messer [9] samples a diploid population evolution according to the standard Wright Fisher model. Sexes were added such that each sex is equally represented in the population at any time. In generation n ≥ 1 there is a finite number of individuals M n ≥ 0 with a total of 2M n genomes alive. In the initial burn-in phase the population size is held constant such that M n = M 0 for all generations n ≤ n grow . Afterwards the growth phase begins and the population size of each generation grows logistically with growth rate r > 0 until it approaches the carrying capacity K. Therefore the population size of each generation is determined by the following formula M n = K 1 + C 0 e −rK(n−n grow ) for all n ≥ n grow , The two mating schemes -random and consanguineous mating -are introduced as following. To create generation (n+1) first choose M n+1 women from generation n independently at random with replacement among all women with non-zero fitness. For random mating each woman then selects a man uniformly at random from the pool of potential partners with truly positive fitness. To implement the consanguineous mating scheme use the pedigree information SLiM provides for the last two generations backwards in time. Thus for every individual we know who are its parents and its grandparents. A woman in the consanguineous population now chooses a mate with a weighted uniform distribution on the set of all potential partners. For some weights α, β ∈ [0, 1] with α + β ≤ 1 the individual chooses a man with two common grandparents with probability α one common grandparent with probability β no common grandparents with probability 1 − (α + β) Notice that having two grandparents is similar to being cousins and having one grandparent in common relates to being half-cousins. From the whole human genome pick N gene segments with an independent, uniformly distributed number of base pairs w 1 , . . . , w N ∼ U [a,b] , where a, b > 0 is the minimum resp. maximum segment size. Moreover the whole genome is divided into n c chromosomes. At birth not only mutation alters the offsprings genetic information, but also recombination. For each chromosome start an independent Poisson Process with rate r rec > 0 marking the recombination breakpoints. Here r rec is the overall recombination rate.

CONTINUOUS MODEL
The adaptive dynamics model is continuous in time, hence time is not measured in n ∈ N discrete generations, but on the positive real axis t ∈ R + . No exact pedigree are available for this continuous model, therefore introduce a new diploid family trait, which indicates the ancestry of an individual. Hence every individual is characterized by two diploid traits. The first refers to the family origin whereas the second gives insight in the genetical information of the individual. Introduce F ⊂ N as the finite set of all possible family traits and f = ( f 1 , f 2 ) ∈ F 2 being the family trait of an individual in the current population. Moreover the diploid genetic information of an individual is a finite vector where the entries x i 1 and x i 2 represent the number of mutation at the i th gene segment in the first and second genome. Thus if there are (f 1 , x 1 ), . . . , (f M t , x M t ) individuals alive at time t > 0 in an arbitrary order, define the population state as a point measure on Individuals give birth and die at exponential rates b(f, x) resp. d(f, x) which depend on the family trait and the chromosomal configuration of the individuals. One can think of every individual carrying two independent clocks, a birth and a death clock. If the birth clock rings first the individual makes its mating choice, reproduces and resets its clock. Whereas if the death clock rings the individual disappears from the population. Additional to the intrinsic death rate every individual sense the competition pressure of every other individual in the population. The term C(f, x, g, y) gives the competition pressure executed by an individual of type (g, y) and felt by an individual of type (f, x). Hence the total death rate of an individual in population ν is increased by the term X C(f, x, g, y)dν(g, y) When an individual gives birth it chooses a partner at random from the population according to the partners birth rate and the reproductive compatibility between them. Notice that the reproductive compatibility R f (g) ∈ [0, 1] of two individuals depends only on their family traits f and g. After a mate was chosen the newborns family trait will be a uniform random combination of the four parental family traits unless both parents have the same traits. In this case the child inherits the exact same couple of traits. The genetic configuration of the newborn is not only a random combination of the parental alleles since the effect of mutation and the reshuffling of the parental chromosomes come into play. For an accurate definition of the reshuffling of the diploid parental chromosomes to a mixed haploid set, define first the sections on the genetic information, that form chromosomes. Let n c ∈ N be the number of chromosomes for every individual. Introduce the chromosome breakpoints {c 1 , c 2 , . . . , c n c +1 } ∈ {1, . . . , N} with 0 = c 1 < c 2 < · · · < c n c < c n c +1 = N. Divide the genetic information of every individual x = (x 1 1 , . . . , x 1 N , x 2 1 , . . . , x 2 N ) into chromosomes of the same length in both copies for j ∈ {1, 2} and i ∈ {1, . . . , n c }. Finally get the reshuffled gamete from x via a selection variable τ : {1, 2, . . . , n c } −→ {1, 2} as follows Selecting τ among all possible assignments equals an uniform recombination of the diploid chromosomes into haploid set. At each birth a Poisson distributed number of mutations is added to every offspring. The expectation of these identically distributed and independent Poisson random variables equals the total mutation rate 2µw, where µ is the mutation rate per base pair andw = w 1 + · · · + w N is the sum of the length of the gene segments under consideration. After sampling the number of mutations per birth, these mutations are distributed equally on the 2N gene segments according to their length. Finally a mutation at a given gene increases the genetic value by one. All of this is captured in the mutation with recombination operator M rec f,x,g,y for a mating between individuals (f, x) and (g, y) ((x, y), dh) ,else and the mutation measure m is defined as where 2N k l ∈ N 2N + | l 1 + · · · + l 2N = k is the set of all lattice vectors in N 2N + with one norm equal to k moreover Z k > 0 is a normalizing constant depending on the size of 2N k .
Let x 1 ), . . . , (f n , x n ) ∈ X        be the set of all finite point measures on X. The dynamics of the continuous time, M(X)valued jump process (ν t ) t≥0 can be described by the generator L defined for any bounded measurable function φ : M(X) −→ R as Assume the boundedness of the birth and death rates b and d as well as the boundedness of the competition kernel C. Starting in a initial state ν 0 ∈ X such that E [ν 0 , 1] < ∞ existence and uniqueness in law of the process with infinitesimal generator L and initial condition ν 0 can be derived from [8].
Keep families small To ensure a stable mating scheme during the evolution of the population split families which became too big up into two subfamilies. Therefore introduce the following sequence of stopping times. θ 0 0 θ k+1 inf t > θ k : ∃f ∈ F 2 s.th. ν, 1 f > κ for some fixed κ > 0. Notice that at any stopping time it is always a unique family f ∈ F 2 that exceeds the maximum family size, since at any time there is at most one individual entering or exiting the population. At these random times the big family is split up at random into two subfamilies, where the size of each subfamily is binomial distributed with mean 1 2 . One family keeps the old family trait and the other one gets a completely new, homogeneous one. To make this precise associate a number to each individual in a family. Therefore let where the x i for i = 1, . . . , k are the genetic configurations of the individuals in ((g 1 , x 1 ) , . . . , (g n , x n )) with g i = f and where x σ(1) · · · x σ(k) is the lexicographical order on N 2N and k = ν, 1 f is the family size of f. Then the splitting of a family with trait f ∈ F 2 can be expressed with the following operator for any bounded measurable function φ : where ∆ ν,f (π) executes the splitting of the family f in ν into two with configuration π, hence , ∀g ∈ F } chosen deterministically, is a homogeneous family trait which is entirely new to the population. A possible way of choosing the new family trait at time θ k is to set f new = n F + k where n F is the number of different families in the initial population ν 0 . Hence the dynamics of the evolutionary process with splitting is given as the solution of the following martingale problem. Let ν 0 ∈ M(X) be a initial population then for any real, continuous, bounded function φ on M(X) the process is a martingale. Choices of Parameters Introduce the subset D N ⊆ X of traits having at least one mutation in the same gene segment on both copies of the chromosome as D N (f, x) ∈ X 2 ∃n ∈ {1, . . . , N} s.th. x 1 n > 0 and x 2 n > 0 and set the birth and death rate to constant unless an individual falls in the set of non- To ensure uniform competition among all individuals set the competition pressure constant to C(f, x, g, y) =b −d K for all individuals (f, x) and (g, y). Therefore the population size in equilibrium fluctuates around the carrying capacity K of the system. The different mating schemes are defined as follows. First the random mating, where R rnd f (g) = 1 for all f, g ∈ F 2 and the consanguineous mating scheme with where α ∈ [0, 1) and κ > 0 is the maximum family size. Therefore the probability of mating within their own family that is of size κ 2 in a population that is at its stable equilibrium size K is constant α. Note that for families with family size smaller than κ 2 the probability of mating within the family is slightly lower, whereas it gets bitter when the family size surpasses this size. Furthermore the probability increases at the beginning of the growth phase when the carrying capacity K gets uplifted and the population size starts to grow slowly. During this initial phase of expansion there will be more consanguineous mating overall. This imbalance levels off as soon as the population size approaches K. Start the evolution with a small, healthy population of size M 0 > 0 in population equilibrium, where nobody carries any mutation. The clonal individuals are following divided into n F ∈ N >0 families with homogeneous family traits (1, 1), (2, 2), . . . , (n F , n F ). After an initial phase during which a mutation selection balance is established raise the carrying capacity to generate a natural exponential population growth up to the new equilibrium. The population parameters we are particular interested in, the mutation load and the prevalence rate for the disability can be formulated in terms of the population process. The relative mutation load of a population ν is defined as N n=1 x i n And the relative number of individuals in the population ν belonging to the set D N is To generate stochastically correct trajectories of the population dynamics process we implemented a variation of Gillespie algorithm for the above model in Python.

COMPARING BOTH MODELS
Both models, the discrete generation model implemented with SLiM and the continuous time model with the Gillespie algorithm, fit different aspects of nature better than the other. A key feature of SLiM and the discrete model is the exact pedigree information generated for every individual. Whereas the continuous model can only cluster roughly in family clans, but cannot differ between members of one family. A major drawback of the discrete model are the non overlapping generations. That excludes the possibility of matings between individuals on different pedigree levels such as uncle -niece marriages. This deficit is set aside by the continuous time model. Since individuals give birth and die independently, all individuals are of different age, and thus different generations are alive simultaneously. On the one hand alike the Wright-Fisher model, the discrete model works with constant respectively deterministically increasing population sizes. On the other hand the continuous model has a fluctuating and natural growing population. Notice that for large populations the random fluctuation in population size are of order 1 K and the stochastic process converges in law to the solution of a deterministic, logistic equation [8]. Recombination is also implemented differently. As SLiM works with true inter chromosomal recombination the continuous model only reshuffles the parental chromosomes while producing the gamete. This is due to the fact that SLiM saves the exact base position of mutations on the human genome, whereas the continuous model oxnly knows the number of mutations per gene segment and not their exact location within each segment. Under the assumption that all genes are compound heterozygote the different implementation of recombination has not affect on the fitness of individuals. Besides all differences parameters are chosen to be equal for both simulations. Among these the number of gene segments N, the initial and equilibrium population size M 0 and K, and many more. Moreover family sizes in the continuous model are chosen such that the number of potential partners in both models in the consanguineous setting are approximately equal. Likewise the birth rate in the continuous time model is set to b = 1, such that in generation t there are M t+1 birth events, where M t is the population size at that time.