Bait-ER: a Bayesian method to detect targets of selection in Evolve-and-Resequence experiments

For over a decade, experimental evolution has been combined with high-throughput sequencing techniques in so-called Evolve-and-Resequence (E&R) experiments. This allows testing for selection in populations kept in the laboratory under given experimental conditions. However, identifying signatures of adaptation in E&R datasets is far from trivial, and it is still necessary to develop more efficient and statistically sound methods for detecting selection in genome-wide data. Here, we present Bait-ER – a fully Bayesian approach based on the Moran model of allele evolution to estimate selection coefficients from E&R experiments. The model has overlapping generations, a feature that describes several experimental designs found in the literature. We tested our method under several different demographic and experimental conditions to assess its accuracy and precision, and it performs well in most scenarios. Nevertheless, some care must be taken when analysing trajectories where drift largely dominates and starting frequencies are low. We compare our method with other available software and report that ours has generally high accuracy even for trajectories whose complexity goes beyond a classical sweep model. Furthermore, our approach avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data. We implemented and released our method in a new open-source software package that can be accessed at https://github.com/mrborges23/Bait-ER.

where 1 + σ is the fitness of any A-type offspring and σ the selection coefficient for allele A. If σ = 0, i.e. A is evolving neutrally, then neither allele is preferred at reproduction. Let X_t be the number of copies of A in a population of N individuals; the probability of a given allele trajectory X can be defined using the Markov property as

P(X) = P(X_{t_1}) ∏_{i=2}^{T} P(X_{t_i} | X_{t_{i−1}}),

where T is the total number of time points, measured in generations, at which the trajectory was sampled. This probability can be generalised to R replicates by assuming their independence:

P(X^(1), …, X^(R)) = ∏_{r=1}^{R} P(X^(r)).

The main caveat of pool-seq data is that it provides estimates of allele frequencies, not true frequencies. For that reason, we assume that the allele counts are generated by a binomial or beta-binomial sampling process that depends on the frequency of allele A and the total sequencing depth C obtained by pool-seq. We then recalculate the probability of the Moran states given an observed allele count c, which under binomial sampling becomes

P(X_t = n | c) = P(c | X_t = n) P(X_t = n) / ∑_{m=0}^{N} P(c | X_t = m) P(X_t = m), with P(c | X_t = n) = Binomial(c; C, n/N).

This step is key because it corrects for the sampling noise generated during data acquisition, which is particularly relevant for low-frequency alleles and poorly covered loci.
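The binomial correction of the Moran state probabilities can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions (a plain binomial rather than beta-binomial sampling process); the function names are hypothetical and not part of the Bait-ER software.

```python
from math import comb

def binomial_pmf(c, C, p):
    """Probability of observing c reads carrying allele A out of C total
    reads, given true allele frequency p."""
    return comb(C, c) * p**c * (1 - p)**(C - c)

def correct_state_probs(prior, N, c, C):
    """Re-weight the probabilities of the N + 1 Moran states (0..N copies
    of allele A) by the binomial likelihood of the observed pool-seq count
    c at sequencing depth C, then renormalise."""
    weighted = [prior[n] * binomial_pmf(c, C, n / N) for n in range(N + 1)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Example: uniform prior over 11 states (N = 10); 12 of 60 reads carry A,
# so states near frequency 0.2 should dominate after the correction.
post = correct_state_probs([1 / 11] * 11, N=10, c=12, C=60)
```

With low coverage, the likelihood is flat and the correction changes little; with high coverage, it concentrates the state probabilities sharply around the observed frequency, which is exactly the behaviour the text describes for poorly versus well-covered loci.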
Our algorithm places a gamma prior on σ. The posterior cannot be obtained in closed form, so we approximate it by fitting a gamma density of the form c · σ^(α−1) e^(−βσ), where α and β are the shape and rate parameters, respectively, and c the normalization constant.

The gamma fit represents a good trade-off: it requires only two parameters, yet its density can take many shapes. As further analyses require the values of α and β that best fit the gamma density, we find the least squares estimates of α and β (and c) such that the error is minimal. The estimates are as follows:

α = [−(s₂s₄ + s₄² − s₆ − s₇)(s₁² − s₈) − (s₃ + s₁s₂ + s₁s₄ + s₅)(s₁s₄ − s₅)] / [s₇s₁² − 2s₄s₅s₁ + s₅² + s₄²s₈ − s₇s₈]

and

β = [−s₃s₄² + s₂s₅s₄ + s₁s₆s₄ − s₅s₆ − s₁s₂s₇ + s₃s₇] / [s₇s₁² − 2s₄s₅s₁ + s₅² + s₄²s₈ − s₇s₈].

Bait-ER applies an allele frequency variance filter before performing the inferential step of our algorithm. This filtering process excludes from further analyses any trajectories that do not vary, or vary very little, throughout the experiment. To do so, we assess the trajectories' frequency increments and exclude loci with frequency variance lower than 0.01.

These correspond to cases where trajectories are too flat to perform any statistical inference on.
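The least-squares gamma fit described above can be sketched numerically. Since log f(σ) = log c + (α − 1) log σ − βσ is linear in (log c, α − 1, β), the fit reduces to ordinary least squares on the log-density, solved here via the 3×3 normal equations. This is an illustrative sketch, not the closed-form expressions in terms of the sums s₁…s₈ used by Bait-ER; all names are hypothetical.

```python
from math import log, exp

def fit_gamma_least_squares(xs, fs):
    """Least-squares fit of a scaled gamma density c * x^(alpha-1) * exp(-beta*x)
    to tabulated posterior values fs at points xs, done in log space where the
    model is linear: log f = log c + (alpha - 1) * log x - beta * x."""
    # Design matrix rows [1, log x, -x], response log f.
    A = [[1.0, log(x), -x] for x in xs]
    y = [log(f) for f in fs]
    # Normal equations (A^T A) theta = A^T y, solved by Gauss-Jordan elimination.
    M = [[sum(A[k][i] * A[k][j] for k in range(len(xs))) for j in range(3)]
         for i in range(3)]
    v = [sum(A[k][i] * y[k] for k in range(len(xs))) for i in range(3)]
    for i in range(3):
        piv = M[i][i]
        M[i] = [m / piv for m in M[i]]
        v[i] /= piv
        for j in range(3):
            if j != i:
                f = M[j][i]
                M[j] = [mj - f * mi for mj, mi in zip(M[j], M[i])]
                v[j] -= f * v[i]
    log_c, alpha_minus_1, beta = v
    return alpha_minus_1 + 1.0, beta, exp(log_c)

# Example: tabulated values of an exact Gamma(3, 2) density, whose
# normalization constant is 2^3 / Gamma(3) = 4.
xs = [0.1 * i for i in range(1, 21)]
fs = [4.0 * x**2 * exp(-2.0 * x) for x in xs]
alpha_hat, beta_hat, c_hat = fit_gamma_least_squares(xs, fs)
```

Because the example data satisfy the model exactly, the fit recovers α = 3, β = 2 and c = 4 to numerical precision; on a real tabulated posterior the same procedure returns the best-fitting gamma in the least-squares sense.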

Each of these settings was tested in different population scenarios that we defined by varying population size, starting allele frequency, and selection coefficient. We assessed the error of the estimated selection coefficients in each scenario.

Misspecifying it merely rescales time in terms of Moran events rather than changing the relationship between N_e and the number of Moran events in the process. Further, we observed that the BFs are generally higher when the specified N_e is greater than the true value, suggesting an increased false positive rate. The opposite pattern is observed when the population size one specifies is lower than the real parameter. Additionally, we investigated the relationship between BFs computed with the true N_e and those produced under a misspecified N_e, and found that these BFs are highly correlated (Spearman's correlation coefficients were consistently high).

Regarding computational performance, Bait-ER seems to be the fastest of the four methods, although it is comparable to WFABC (see fig. 6). However, we tested WFABC on the first replicate population data rather than the five experimental replicates used for the remaining methods. Additionally, WFABC does not provide any statistical testing output such as a Bayes Factor. In contrast, CLEAR and LLS are slower than the other two approaches: while CLEAR takes less than 40 seconds on average to analyse 100 sites, LLS is the slowest of the four, averaging around 4 minutes. Overall, these results suggest that Bait-ER is just as accurate as, and potentially faster than, other currently available approaches, which makes it a good resource for testing and inferring selection from genome-wide polymorphism datasets.

In the sweep scenario, Bait-ER's performance is as good as the CMH test's, although it does underperform slightly (fig. 7(a)). A similar picture emerges for the truncating selection scenario (fig. 7(b)): Bait-ER is more conservative, whereas the CMH test is more prone to producing false positives.

To assess why Bait-ER seems to be outperformed by CLEAR, we further investigated CLEAR's selection coefficient estimates. We note that Bait-ER assumes a continuous-time Moran model, whilst CLEAR uses a WF model for inference, much like the simulated data analysed here.

Comparison of selection coefficients estimated by Bait-ER and CLEAR showed that Bait-ER is slightly more accurate at estimating true targets' σ (fig. S7). Overall, Bait-ER and CLEAR perform to a similarly high standard. However, the frequency variance filter implemented in Bait-ER seems to explain our method's slight underperformance shown in fig. 7.

Whilst the two methods' false positive rates seem to be comparable, Bait-ER excluded a few loci at the variance filtering step.

This dataset is particularly suited to demonstrate the relevance of our method, as the highly polygenic basis of adaptation has proved challenging to measure and summarise thus far.
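The variance filter discussed here can be sketched minimally. This is an assumption-laden illustration: Bait-ER's filter is described in terms of frequency increments, whereas this sketch uses a plain per-trajectory variance with the 0.01 cutoff stated in the methods; the function name is hypothetical.

```python
def passes_variance_filter(freqs, threshold=0.01):
    """Keep a trajectory only if its allele frequencies vary enough over
    the experiment to support statistical inference (variance >= threshold)."""
    mean = sum(freqs) / len(freqs)
    variance = sum((f - mean) ** 2 for f in freqs) / len(freqs)
    return variance >= threshold

flat_trajectory = [0.50, 0.51, 0.50, 0.49, 0.50]   # barely moves: filtered out
sweep_trajectory = [0.10, 0.25, 0.45, 0.70, 0.90]  # strong, consistent rise: kept
```

A filter of this kind trades a few missed weak targets (the source of the slight underperformance noted above) for a lower false positive rate and less computation on uninformative, drift-dominated trajectories.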

One of the main aims of E&R studies is to find targets of selection in genome-wide datasets. Replication is key, but large populations are not necessarily required for achieving good results.

We also assessed Bait-ER's performance on a complex simulated chromosome arm dataset. The more polygenic a trait is, the more likely linkage between selected sites is to generate selected haplotypes. Nevertheless, directional selection will cause some proportion of selected haplotypes to rise in frequency.

Figure 2: Impact of the prior on the posterior estimates of the selection coefficients. The posterior distribution of σ was calculated using gamma priors G(α, β), where α and β are the shape and rate parameters. We set α = β and allowed β to vary from 0.001 to 10^5 (i.e. ranging from a very uninformative to a very informative prior, respectively). The different priors were tested under three E&R experiment scenarios: the first was a sparse experimental design (coverage (C) = 20x, number of time points (T) = 2 and number of replicates (R) = 2), while the second mimicked a standard set-up (C = 60x, T = 5 and R = 5). Finally, the third scenario had the most thorough experimental conditions (C = 100x, T = 11 and R = 10). Red lines indicate the true value of σ. Blue lines point to the mean of the prior imposed on σ. Black lines and points correspond to the posterior mean of σ and credibility intervals at 0.95.

Figure 3: Impact of E&R experimental design on the estimated selection coefficients. Each square of the heatmap represents the error of the estimated selection coefficients, i.e., the absolute difference between the estimated and the true σ: |σ̂ − σ|, for a range of population dynamics and E&R experimental conditions. (A) Number, span and distribution of sampled time points. The six time schemes differ according to the following criteria: most time schemes have five sampling events, except for TS1 and TS6, which have two and eleven time points, respectively; all time schemes have a total span of N_e/5 generations, except for TS5, which has double the span (2N_e/5); uniform sampling was used in most scenarios except for TS3, which is more heavily sampled during the first half of the experiment, and TS4, during the second half. The two maximum experiment lengths considered were 0.2N_e and 0.4N_e generations.

Figure 4: Impact of the user-specified population size on the estimation of selection coefficients.
The plots show the distribution of the estimated selection coefficients when the population size is misspecified. Vertical lines and points indicate the interquartile range and median selection coefficient. Each plot represents a specific scenario that was simulated by varying the population size, the true selection coefficient (indicated within brackets as (N_e, N_eσ)) and the starting allele frequency (indicated by the yellow-to-red colour gradient). The numbers next to each bar correspond to Spearman's correlation coefficients, which correlate the BFs of the 100 replicated trajectories between the cases where the population size is under- or overspecified (N_e = 100 or 1000, respectively) and the case where the true population size is used (N_e = 300). Regarding simulated experimental design, we defined a base experiment with five replicates, five uniformly distributed time points (total span of 0.20N_e generations) and a coverage of 60x.

Figure 6: Real computational time for Bait-ER and the other three approaches tested. From left to right, computational time in seconds, including both inference and hypothesis testing, is shown for Bait-ER, CLEAR, LLS and WFABC. Similarly to figs. 5 and S6, these boxplots include estimates for trajectories simulated with starting frequencies of 10% and 50%, as well as both study lengths investigated, i.e. 150 and 75 generations. Four NAs produced by LLS were again removed from these plots.

Variance versus mean sigma on the X chromosome. This graph compares log-transformed variances in σ estimates to average σs. The variance is calculated using the inferred rate and shape parameters of the gamma distribution, and the average σ is the mean value of the posterior distribution estimated by Bait-ER. Orange coloured points correspond to BFs greater than a conservative threshold of log(0.99/0.01) (approx. 4.6) and p-values less than or equal to 0.01, i.e. those considered significant by both tests. Blue coloured points indicate that the computed BF is greater than our threshold, but not significant according to the CMH test. Additionally, dark grey points are significant according to the CMH test, but not to Bait-ER, and light grey points are inferred not significant by both tests.