Pervasive selection pressure in wild and domestic pigs

Animal domestication typically affected numerous polygenic quantitative traits, such as behaviour, development and reproduction. However, uncovering the genetic basis of quantitative trait variation is challenging, since it is probably caused by small allele-frequency changes. To date, only a few causative mutations related to domestication processes have been reported, strengthening the hypothesis that small effect variants have a prominent role. So far, the studies on domestication have been limited to the detection of the global effect of domestication on deleterious mutations and on strong beneficial variants, ignoring the importance of variants with small selective effects. In addition, very often, the study of the effects of selection are conducted on genome sequences that contain a non-negligible fraction of missing data, especially in non-model organisms. Hence, appropriate methods to account for these positions are needed. To overcome these difficulties, here we propose to estimate the proportion of beneficial variants using the asymptotic MacDonald-Kreitman (MK) method based on estimates of variability that summarizes the site frequency spectrum (SFS) while accounting for missing data and use them to perform an Approximate Bayesian Computation (ABC) analysis to infer the Distribution of Fitness Effects (DFE) of each population. We applied this approach to 46 genome sequences of pigs from three different populations, one wild and two domestics, with very different demographic histories and selective pressures. The obtained results showed that domestic and wild pig populations do not differ in nonsynonymous fixed mutations. Therefore, differences in α estimation among breeds are determined by their polymorphisms. The comparison of α for total and exclusive mutations suggests that the different domestic populations have suffered recent divergent changes in their functional versus neutral polymorphisms ratio, while the wild population is compatible with α=0. Besides, the DFE inferred with ABC indicates that both wild and domestic pigs display a large number of deleterious mutations at low frequency and a high number of neutral and/or nearly-neutral mutations that may have a significant effect on the evolution of domestic and wild populations. In addition, models not considering beneficial mutations have higher posterior probabilities, suggesting that beneficial mutations are difficult to detect or are scarce. Indeed, for all three populations, the median proportion of the strong favourable mutations are very low (≤ 0.1%) in those models that includes positive selection, with the average values of weak beneficial mutations around 0.6% for wild boar and 0.8-1.0% for the domestic pigs. Lastly, the analysis based on exclusive mutations showed that recent demographic changes may have severely affected the fitness of populations, especially that of the local Iberian breed.

To overcome these difficulties, here we propose to estimate the proportion of beneficial variants using the  Domestic animal histories are evolutionary experiments that have often lasted for millennia 47 resulting in dramatic phenotypic changes to suit human needs. In addition, domestic species can 48 be structured into subpopulations (breeds) that are partly or completely genetically isolated and 49 can display a wide catalogue of specific phenotypes. Therefore, they offer a very valuable material 50 of utmost interest to study the interplay between demography and accelerated adaptation. 51 However, as their demographic history can be quite complex, many events remain unknown or 52 poorly documented nowadays. 53 The pig (Sus scrofa) is a particularly interesting species because of its domestication history and 54 its relatively well-annotated genome. S. scrofa originated in Southeast Asia ~1-4 MYA and spread 55 throughout Eurasia ~0.2-1.2 MYA, colonizing all climates except the driest (Frantz et al. 2013, 56 functional effects and divergence, Kono et al. 2016, Makino et al. 2018). Kono et al. (2016) and 107 Perez- Enciso et al. (2016) found an excess of deleterious variants affecting phenotypes of interest, 108 suggesting, as we previously mention above, that protein sequence may have a stronger influence 109 than regulatory changes in the domestication process. Kono

Estimation of levels and patterns of variability 194
Genetic diversity and divergence per pig population were estimated using mstatspop software 195 (Nevado, Ramos-Onsins, and Perez-Enciso 2014; Bianco et al. 2015;Guirao-Rico et al. 2018, 196 available from the authors, https://github.com/cragenomica/mstatspop). The multi-VCF file was 197 converted into a tfasta (transposed fasta) file and mstatspop was run on i) the whole genome, ii) 198 windows of 5-Mb size, and iii) gene coding regions. Note that, given the ubiquitous presence of 199 missing data, the SFS with highest sample size that contained more coding SNPs, retained only 200 around 20-25% of total coding variants. As the aim of this work is to detect the presence of (weak) 201 beneficial selective effects and as not to lose power or bias the results of the analysis, we preferred 202 to use an alternative method that consider the whole set of SNPs. We used four different estimators 203 of nucleotide variability that takes into account missing data (Ferretti, Raineri, and Ramos-Onsins 204 2012): Watterson (Watterson 1975), Tajima (Tajima 1983), Fu&Li (Fu and Li 1993) and 205 Fay&Wu's estimators (Fay and Wu 2000). The variability was estimated for total, shared and 206 exclusive variants, being shared and exclusive nucleotide variability counted regarding to the total 207 positions (i.e., total = shared + exclusive). 208 209

Filtering for artefactual effects 210
A preliminary analysis of the variability showed a moderate negative correlation (~ 0.3) between 211 the levels of variability and divergence and the proportion of missing data for each gene. To 212 eliminate this artifactual correlation, we plotted the estimators of variability and divergence versus 213 the ratio of missing data and eliminated those genes that showed a ratio of missing data greater 214 than 0.3. Since this filtering was not enough to completely remove the bias, we also removed genes 215 with extreme values of variability and divergence (higher than 99% quantile of the total genes). 216 The remaining ~13,500 genes (70% of the total annotated genes) showed a low or null correlation 217 with missing data and were used in the present analysis (Table S2).
where !n is the nonsynonymous variability, !s is the synonymous variability, )n is the 231 nonsynonymous divergence, )s is the synonymous divergence and α is the proportion of adaptive 232 variants that have been fixed. To estimate the proportion of nonsynonymous substitutions that are 233 adaptive (α) the previous expression is reordered (e. g., Eyre-Walker 2006): 234 A higher ratio of nonsynonymous to synonymous divergence versus polymorphisms suggests that 237 positive selection has fixed adaptive variants (α > 0) and the opposite case (α < 0) suggests the 238 presence of deleterious mutations segregating in the population. 239 If we consider that weak deleterious mutations are segregating in the population, we expect that 240 their relative proportion will be higher at lower frequency variants and low or zero for fixed 241 deleterious mutations. Following the same notation as in equation 2: 242 where i refers to the frequency at which the calculation of variability is estimated, + # is the 245 proportion of weakly deleterious polymorphic mutations at frequency i, + $ is the proportion of 246 weakly deleterious fixed mutations. + $ < + # was assumed at any frequency. Then, solving for the We see that in case of calculating α without considering the effects of deleterious mutations, this 251 would be underestimated depending on the frequency at which the estimates of variability are 252 calculated. If we assume that the deleterious variants would hardly be fixed, a good estimator of α 253 using equation 2 would be the one that estimates variability based on high frequencies, as it would 254 hardly contain deleterious mutations. This is in agreement with the asymptotic arguments used in 255 Messer and Petrov (2013) and implemented in . 256

257
Similarly, if we also consider that weak positively selected variants are segregating in the 258 population, we expect that their relative proportion, compared to neutral ones, is higher at higher 259 frequencies: 260 where . # is the proportion of weakly advantageous polymorphic mutations at frequency i, and . $ is 263 the proportion of weakly advantageous fixed mutations. Again, solving for the proportion of fixed 264 adaptive variants (α+. $ ): 265 Furthermore, in cases where two populations are from the same species and there are no fixed 282 mutations between them (e.g., they have equal divergence ratios versus the outgroup), we can 283 estimate the possible differential effect of the selection (positive and negative together) at any 284 frequency between populations from the ratios of synonymous to nonsynonymous polymorphisms 285 of the two populations (0_+. # ): 286 In addition, a comparison of the 0_+. # values calculated using different variability estimators 291 (hereafter 0_+. # pattern) can be used to inform about the effects of selection. For example, values 292 over 1 indicate that the population 2 has a higher ratio of nonsynonymous to synomymous 293 polymorphisms compared to population 1, either produced by an accumulation of deleterious or 294 of beneficial polymorphisms. Importantly, different demographic effects (e.g., bottlenecks) 295 together with the presence of mutations with small selective effects may also disturb the ratios of 296 variability and hence must be considered when interpreting the results. We include a couple of 297 possible scenarios that can account for possible patterns: (i) after split of two the two populations, 298 both populations have the same population size, but population 1 is affected by the action of 299 positive selection on a quantitative trait (polygenic effect), which causes an increase in the 300 frequencies of some of its variants without getting fixed. Under this scenario, we expect a R_βγ > 1 when this is calculated based on high frequencies. (ii) after split of two the two populations, the 302 population 2 remains equal population size as before the split and the population 1 suffers a 303 reduction in its effective population size, which causes that the slightly deleterious mutations 304 become effectively neutral. Then, R_βγ is expected to be > 1 when it is calculated based on low 305 frequency variants. 306 The effect of linkage disequilibrium between selective (deleterious or adaptive) and neutral 307 Table S4 shows the parameters of each model and the prior distributions used in the analysis. We 360 used polyDFEv2 (Tataru et al 2019) to obtain the expected unfolded site-frequency spectrum 361 (SFS). The code of polyDFEv2 was slightly modified in order to print the SFS and the parameters 362 for a large number of conditions, which are needed to perform the ABC analysis using summary 363 statistics. For each model, one million iterations were run using different parameter conditions and 364 the resulting SFS for each condition were kept to later calculate the ratios of variability, divergence 365 and the α statistic. The ABC analysis was performed using the R library abc (Csillery et al. 2012). 366 We performed a cross validation analysis to evaluate the ability of the approach to distinguish 367 between models using the cv4postpr() function, as suggested in the abc library documentation. 368 The confusion matrix indicated that these three models were quite distinguishable with a 369  (Table S5). Posterior probabilities of each model given the observed 372 data (i.e., the probability assigned to each model relative to the other models of the analysis), were 373 obtained using the postpr() function and considering a multinomial logistic and a rejection 374 approach. Additionally, a goodness of fit analysis, which evaluates whether the prior distribution 375 for model parameters are realistic, was also performed. The best model was selected based on 376 posterior probabilities. Once the best model was chosen, the ability to infer the parameters of the 377 model was assessed using the cv4abc() function. Prediction errors for the parameter inference of 378 each model are shown in Table S6 and Figure S2. The parameters of the best model were inferred 379 with the abc() function using a local linear regression and a rejection approach. Posterior predictive 380 simulations were performed using the α statistic to determine whether the simulated data generated 381 from the estimated parameter of our best model resembled the observed data (1000 replicates). We have additionally tested whether there is a significant correlation between α and 406 recombination, gene density, missing rate, %GC and CpG islands across genomes. 407 408 Testing the differences in the estimates of α using whole-genome data versus the mean of 409 gene estimates. 410 We have studied the behaviour of the α statistic when it is estimated considering a single large 411 dataset (i.e., genome) or when it is estimated using the mean of many subsets (i.e., genes). To do 412 that, we made an R script (check_ratios_vs_meanratios.R) in which we simulated a hypothetical 413 number of polymorphisms and substitutions per gene, following a Poisson distribution (we 414 considered ~10x more substitutions than polymorphisms, 2x more nonsynonymous positions than 415 synonymous and 10x more functional constraint at nonsynonymous versus synonymous). We 416 estimated α per window and per total. The distribution of α per gene can be strongly skewed to

Predominance of shared variants and global similar selective effects of mutations in 426 genomic sequences of pig populations 427
We found a total of 6,684,142 SNPs in autosomes, with 149,440 SNPs located in coding regions. 428 12.5% of the SNPs in the coding regions are shared among the three populations, 32.2% are shared 429 between at least two populations, 31.2% are exclusive to Large White (LW), 2.2% are exclusive 430 to Iberian (IB) and 34.4% are exclusive to Wild boar (WB) ( Table 1 and Table S7). The proportion 431 of exclusive SNPs in each population is in accordance with its specific demographic history 432 For each breed, coding positions were classified as polymorphic, fixed (i.e., different allele from 452 the outgroup) or ancestral allele (i.e., same allele as in the outgroup), with the aim of identifying 453 those variants that appeared previously or posteriorly to the domestication process (Table 1). 454 Surprisingly, we found very few fixed mutations between populations, indicating that the 455 phenotypic traits of each population are not associated with fixed coding variants. Similarly, we 456 found very few fixed coding variants in domestic (IB or LW) versus wild (WB). There are few 457 variants fixed in the domestic breeds that are polymorphic in the wild population, suggesting that 458 these variants were previously present in wild breeds or, alternatively, were introgressed into WB 459 from domestic breeds. Most of the variants that are exclusive of a single breed are polymorphic, 460 which is in agreement with the recent origin of these variants. We found a large number of fixed 461 variants in the IB that are polymorphic in LW and WB, likely due to a reduction of the effective 462 population size of the IB breed. The ratio of nonsynonymous to synonymous polymorphism was 463 always lower than one and showed similar values for the three populations regardless of the 464 variability estimator used (Table S9). This result suggests that, on average, there are no differential

Limited influence of genomic context and the network topology on selective patterns 485
The heterogeneity in the recombination rate, the gene density, the %GC and the distribution of 486 CpG islands across the genome can affect the local levels of variability. A previous study on the 487 IB breed detected a strong correlation between recombination and variability, although no 488 correlation was observed between variability and gene density or GC content (Esteve-Codina et 489 al. 2013). However, the effect of these factors on the estimation of the proportion of adaptive 490 nonsynonymous mutations (α) has not been previously studied. When we assess whether there is 491 a correlation between the estimated α and the above-mentioned factors, we found that there is no 492 correlation between the estimated α and recombination, gene density, %GC and CpG in any of the 493 three breeds (P-values > 0.01). 494 495 Next, we investigated the effect of gene network topology on the selective patterns. It has been 496 claimed that topology limits the 'evolvability' of genes and that highly connected genes are more 497 constrained and, consequently, less likely to be targets of positive selection. We compared the 498 network topology features (betweenness, out-degree and in-degree) of genes within pathways 499 regarding the estimates of α, grouping genes with positive versus negative α values. We found that 500 genes with negative α values show significant large values of the betweenness statistic in the three 501 pig breeds compared to genes with positive α values (P-value < 0.01; Figure S4). LW and WB 502 showed significant values (P-values < 0.01) of the in-degree statistic for genes with negative α 503 values compared to genes with positive a values. However, we did not observe significant 504 differences in the out-degree values between genes with negative and positive a values in any of 505 the three breeds ( Figure S4). These results suggest that, in the three breeds, genes that are more 506 central in a pathway are more evolutionary constrained compared to peripheral genes. In addition, 507 in LW and WB, the genes that are more constrained tended to have a higher number of upstream 508 genes that regulated them, which is also in agreement with the central position of these genes in 509 the pathway. We did not observe significant differences in in-degree statistic in the IB breed 510 between genes with negative and positive a values, likely because of a relaxation of functional 511 constraints as a consequence of the reduction of its effective population size. To assess the selective effect of domestication, we first studied the pattern of variation at 516 synonymous and nonsynonymous positions using four estimators of variability that differentially 517 weight the SNP frequencies (See Material and Methods). Ideally, it would be more informative to 518 analyze the whole genome Site Frequency Spectrum (SFS). Unfortunately, the relatively high 519 number of positions with missing data discourages their use. A possible alternative would be to 520 obtain the SFS from a reduced number of samples, and therefore use only a partial number of SNPs 521 for subsequent analysis. However, by using this alternative we can either lose power or introduce 522 some sort of bias, hence, we preferred to analyze the whole set of SNPs using those estimates of 523 variability based on different frequencies of the spectrum that account for missing data. 524 Nevertheless, in order to clarify the patterns of the SFS for these populations, we estimated the 525 SFS for a subset of SNPs (around 25-30% of the available coding variants, depending on the breed) 526 for a projection of variants on 38 haploid samples in both LW, WB and on 10 haploid samples in 527 IB ( Figure S6). The SFS profile for both synonymous and nonsynonymous showed a rapid 528 decrease in the number of variants from lower to higher frequencies. We observed a slight increase 529 in the number of polymorphisms at the highest frequencies at both synonymous and 530 nonsynonymous sites and no apparent signals of admixture (i.e., no sudden peaks at specific ranges 531 of frequencies). Estimates of whole-genome variability levels per nucleotide using different 532 estimators are shown in Figure 1 and detailed in Table S8. We have considered the synonymous 533 positions as neutral reference since no strong bias in codon usage has been detected ( Figure S5). 534 We expect that, under the Standard Neutral Model (SNM), the values for the different estimates 535 of variability should be similar whereas differences among them may indicate demographic and/or 536 selective effects. We observed that i) the levels of variability are different for each estimator within 537 breeds and ii) the levels of variability are different for the same estimator for different breeds. 538 However, for each breed, we observed a similar ratio of nonsynonymous to synonymous 539 polymorphisms regardless of the used estimator, suggesting that demographic effects are 540 responsible for the differences in the levels of variability (Figure 1). The less variable population 541 is the IB breed, which shows far fewer singletons compared to WB and LW, probably as a 542 consequence of the known reduction of its population size. Note than in all the three populations, 543 high-frequency variants are proportionally more abundant than those at intermediate frequencies,

α's values and
R_βγ ratios based on all SNPs might reflect a differential effect of selection 553 due to domestication 554 The differential effect of selection in the domestic and wild populations can be studied by 555 comparing their respective α values. Figure 2A and Table S9 show  it is calculated based on high frequencies (Table S9)  We observed a high ratio of nonsynonymous to synonymous singletons (αFu&Li, Figure 2B Figure 2D). This could be due to i) the active elimination of new presence of nonsynonymous variants targeted by the process of domestication that shifts them 625 toward high frequencies (αFay&wu). Nevertheless, we cannot discard that this excess of 626 nonsynonymous variants at high frequencies and the lack of nonsynonymous singletons at low 627 frequency could be due to a more complex and not previously explored demographic scenario.     variants (Table 2). The posterior probability for this model is especially high for the IB breed 718 (0.97). Nevertheless, note that the model D is just below the DN model by less than half of the 719 probability in the case of WB and LW. Finally, the posterior predictive analysis indicated that the 720 observed α values for the three populations are within the range determined by the minimum and 721 maximum simulated α values (i.e., Q1-1.5*IQR, Q3+1.5*IQR, respectively, being IQR the 722 Interquantile Range Q3-Q1) under both models DN and D, although not always inside the Q1 and 723 Q3 quantiles ( Figure 5). The mean parameters of the DFE inferred for each population are shown 724 in Table 3 (see also Table S13). The obtained results indicated that the DFE is quite similar among 725 all three populations, which is not entirely surprising because they share a long-term history. 726 According to model DN, and despite there is a lot of uncertainty in the inferred estimates (Table  727   S13 For models A and C we used the local regression method and tolerance=0.0005. For models DN and D we used lower tolerance (0.0001) but the rejection method, to limit the parameter estimates proportions between 0 and 1.  and IB breeds based on exclusive variants are the same than those for total variants (Table 2). 752

WB
However, for LW and based on exclusive variants, the model D has higher posterior probabilities, 753 in contrast to what was obtained based on total variants ( Table 2). The posterior predictive 754 simulations showed that the models DN and D yielded similar estimated a values to those from 755 the observed data, with the IB breed exhibiting the posterior distributions of a values more distant 756 to the observed data ( Figure 6). The estimates of the parameters of the models indicate that, for all 757 populations, the exclusive segregating variants exhibit less strongly deleterious effects compared 758 to those based on total and shared variants (Table 3). Indeed, the IB breed shows significantly 759 lower proportions of strong deleterious mutations, according to its assumed recent population 760 decline. As in the analysis based on total variants, posterior predictive analysis based on exclusive 761 variants showed that models DN and D are those generating a values more similar to the observed 762 ones, but in this case, the observed a values for the IB breed were slightly closer to the simulated 763 a's, compared to those based on total variants ( Figure 6).The results obtained based on shared 764 variants also show that the DN is the most likely model for all populations, although the model C 765 shows closer probabilities (Table 2), especially in the case of LW, which might suggest that shared 766 variants may have played a significant role as a substrate for adaptive process. However, posterior 767 predictive distribution of a values for this breed under this model resembled less the observed data  For models A and C we used the local regression method and tolerance=0.0005. For models DN and D we used lower tolerance (0.0001) but the rejection method, to limit the parameter estimates proportions between 0 and 1.  For models A and C we used the local regression method and tolerance=0.0005. For models DN and D we used lower tolerance (0.0001) but the rejection method, to limit the parameter estimates proportions between 0 and 1.    The study of the genetic effects produced by domestication can be challenged for many reasons. 790

WB
First, the current domesticated species have been severely manipulated by humans, which means 791 that most of individuals were not crossed randomly, and consequently, different complex 792 domestication scenarios can be found such as a high degree of structuration, a fast creation of new 793 lineages from highly inbreeding crosses, a forced introgression between far related populations or 794 from close species and a very divergent selective events across time and space, among others (e.g., issues. This approach is designed to be used as an alternative in case of the estimation of the full 811 SFS is compromised by large amounts of missing data. Although it can be less precise (we used 812 only four statistics to capture the entire trend of α's across the SFS), it allows analyzing a larger 813 number of positions, helps to reduce the variance (by summarizing the SFS into few statistics) and 814 facilitates the visual interpretation. To illustrate the utility of the proposed approach, we used this 815 methodology to perform an exhaustive comparative study of the observed patterns of functional 816 versus neutral diversity and divergence in domestic pigs and wild boars. In addition, and to delve study, including several diverse evolutionary scenarios and the inference of DFE parameters given 819 different selective models. Note that the DFE was inferred using Bayesian calculations (ABC) 820 instead of exact Bayesian or Likelihood methods since despite ABC requires additional steps and 821 validation analysis and is in general less precise, it allows contrasting models and inferring 822 parameters from complex datasets or data containing missing information (Beaumont et al. 2002). 823 Finally, we have also analysed exclusive and shared polymorphisms separately to extract 824 information about the new and the past domestication events of the demographic history of these 825 populations, but also about the role of population admixture. we found that these genes show little or no nonsynonymous polymorphisms or fixed variants 874 (Table S14) and shared mutations, which suggest that different types of variants (total, shared and exclusive) 898 could be capturing different aspects of the domestication process (demography versus selection). 899 The estimation of the DFE from the ABC analysis showed that the most likely evolutionary model 900 for all three populations based on total variants was that consisting in a discrete DFE without 901 significant positive selection effect (model DN; Table 2), showing a clear genome-wide effect of 902 the action of purifying selection. We also observed a reduced effect of purifying selection in IB 903 and in less extend in WB when the analysis was performed based on exclusive variants, which 904 suggest a reduction of the population size of these two populations. However, for LW and when 905 the analysis was performed based on exclusive positions, the most likely model was that with a 906 discrete distribution that includes the effect of positive selection (model D; Table 2), which may 907 reflect the increase of new Asian variants which increased in frequency by artificial selection.. relatively small, suggesting that, in general, a few proportion of beneficial mutations contributed 910 to the domestication process. In fact, under model D, the estimated global proportion of beneficial 911 mutations (weak and strong) was relatively small and slightly higher when the analysis is based 912 on shared variants (0.1% based on total and exclusive SNPs and 1.4% based on shared SNPs; 913 Table 3) and similar in wild and domestic populations. Nevertheless, this proportion of mutations 914 may be substantial in absolute numbers (i.e., several thousand mutations). 915 Although an excess of nonsynonymous shared variants compared to the synonymous ones can be 916 explained by some demographic scenarios such as bottlenecks, they may also reflect biological 917 constraints at the species level. For instance, the phenotypic variation in a polygenic selective 918 scenario could be caused by subtle changes in the frequencies of many genes (in an infinitesimal 919 scenario) which would result in the observed phenotypic differences among the breeds. On the 920 other hand, exclusive variants may reflect recent and breed-specific selective hallmarks and hence, 921 would be responsible for the observed differences between domesticated and wild breeds. In both 922 cases, shared and exclusive polymorphisms are contributing to the differences in the SFS between 923 functional and non-functional positions. Nevertheless, the differences of the DFE when the 924 analysis was performed based on total or shared variants is very small, suggesting that exclusive 925 variants would be more informative to detect the effects of the change of selective effects. 926 In addition, our simulated domestication scenarios indicate that the effect of positive selection 927 irrespective of being either strong and affecting a small percentage of variants or weak and 928 affecting a large percentage of variants is not reflected as marked changes in the estimated patterns 929 of α. This could be due to the short time since the change in the fitness effects of variants occurred 930 but also by the interaction of positive and negative selection and demographic processes in the 931 case of the complex scenarios, which are the most compatible with the observed data. In fact, the 932 observed α patterns are compatible with the simulated demographic effects (population size 933 reduction in WB and IB and gene flow in LW) but also, as in the ABC analysis results, with the 934 effect of positive selection in LW when the analysis was based on exclusive variants. 935 We are aware that the evolutionary models used here are very simple and contain few parameters 936 and the real observations contain high heterogeneity that could not be fitted to these models. The 937 reasons for this heterogeneity may be technical (e.g., not adequate filtering of raw sequences), models to explain the real data). In any case, the model that assumes a discrete distribution of 940 deleterious mutations (model DN) seems to generally explain better the observed data, together 941 with the model D (model DN but including the effect of beneficial selection) in a minor degree 942 and also that exclusive variants seemed to be more informative to detect the changes of the 943 selective effects.