The asexual genome of Drosophila

The rate of recombination affects the mode of molecular evolution. In high-recombining sequence, the targets of selection are individual genetic loci; under low recombination, selection collectively acts on large, genetically linked genomic segments. Selection under linkage can induce clonal interference, a specific mode of evolution by competition of genetic clades within a population. This mode is well known in asexually evolving microbes, but has not been traced systematically in an obligate sexual organism. Here we show that the Drosophila genome is partitioned into two modes of evolution: a local interference regime with limited effects of genetic linkage, and an interference condensate with clonal competition. We map these modes by differences in mutation frequency spectra, and we show that the transition between them occurs at a threshold recombination rate that is predictable from genomic summary statistics. We find the interference condensate in segments of low-recombining sequence that are located primarily in chromosomal regions flanking the centromeres and cover about 20% of the Drosophila genome. Condensate regions have characteristics of asexual evolution that impact gene function: the efficacy of selection and the speed of evolution are lower and the genetic load is higher than in regions of local interference. Our results suggest that multicellular eukaryotes can harbor heterogeneous modes and tempi of evolution within one genome. We argue that this variation generates selection on genome architecture.

Genetic linkage affects molecular evolution by coupling the selective effects of mutations at different loci. This coupling, which is often called interference selection, generates two basic evolutionary processes: a strongly beneficial mutation can drive linked neutral and deleterious mutations to high frequency; conversely, a strongly deleterious mutation can impede the establishment of a linked neutral or beneficial mutation. The first process is an instance of hitchhiking or genetic draft [1,2,3,4,5,6,7,8,9,10,11,12]; the second is known as background selection [13,14,15,16,17,18,19,20,21]. In both cases, interference selection has the same consequence: by spreading the effect of selected mutations onto neighbouring genomic sites, it reduces the speed and degree of adaptation.

In an evolving population, interference links between genomic loci are established by new mutations, reinforced by selection on these mutants, and reduced by recombination. Hence, the strength and genomic range of interference are set by all three of these evolutionary forces. In sexually reproducing organisms with a sufficiently high recombination rate, interference has limited effects because it remains local; that is, it acts only on mutations at proximal genomic positions but is randomized by recombination at larger distances. Without recombination, however, interference becomes global: it couples the evolution of mutations across an entire chromosome. Under strong selection pressures, chromosome-wide genetic linkage generates a specific mode of evolution in which populations harbour competing clades of closely related individuals, each clade containing a distinct set of beneficial and deleterious mutations. This evolutionary mode, which is commonly called clonal interference [22], is well known in asexually evolving microbial [23] and viral populations [24,25].
Theoretical models suggest that a similar mode of evolution, in which selection acts on extended segments of genetically linked sequence, arises in sexual populations at low recombination rates [26,27]. However, evidence for this mode of evolution in the eukaryotic genome of an obligate sexual organism has remained elusive so far.

Drosophila is an ideal system to map differential effects of interference in one genome. Fruit flies show overall high rates of adaptive genetic mutations [28,29,30], at least in high-recombining parts of the genome (low-recombining parts have been excluded from most previous studies). At the same time, local recombination rates vary by orders of magnitude within the Drosophila genome. In particular, extended segments of low-recombining sequence are located in genomic regions flanking the centromeres and, to a lesser extent, the telomeres [31]. In this paper, we present a systematic analysis of linkage effects across the Drosophila genome. For two populations of D. melanogaster, we map the frequency statistics of mutations and the divergence from the neighboring species D. simulans as functions of the local recombination rate. Consistently in both populations, we find two clearly distinct regimes: a local interference regime in high-recombining regions and an interference condensate regime in extended low-recombining regions, which cover about 20% of the autosomes. We delineate these two regimes by the statistics of synonymous mutations. In the local interference regime, the amount of synonymous mutations decreases with decreasing recombination rate and the frequency spectrum follows an almost perfect inverse-frequency power law, as predicted by classic theory [32]. This indicates that the establishment of synonymous mutations is constrained by background selection [14,15,13,16,18], but established mutations evolve predominantly under genetic drift.
In contrast, the frequency spectrum of the interference condensate regime shows a specific depletion of intermediate- and high-frequency variants, which is consistent with genetic draft over extended genomic distances [6,7].

To corroborate this scenario, we develop a scaling theory of evolution under positive and negative selection and limited recombination, building on recent models of asexual and sexual evolution [22, 11, 27, 33, 34, 35, 20, 36]. This theory provides a unified framework for background selection and genetic draft. It describes how the amount and frequency dependence of mutations vary with the recombination rate, and it predicts the transition point from local interference to the interference condensate. Over a wide range of evolutionary parameters, the transition occurs at a threshold recombination rate that is close to the sum of the rates of deleterious mutations and of beneficial substitutions per unit sequence; these rates can be inferred from genomic summary statistics. In the Drosophila genome, the predicted threshold recombination rate is numerically close to the point mutation rate, in perfect agreement with the transition point observed in mutation frequency spectra.

Our scaling theory also provides the tools to infer key evolutionary features of the condensate regime from genomic data. We use this inference to quantify similarities of the Drosophila interference condensate to asexual evolution, and to understand the likely biological impact of the condensate mode of evolution. Specifically, we show that genes in condensate regions are less evolvable in response to positive selection and have a higher genetic load than genes in the local interference regime. We discuss how these evolutionary differences can impact gene functions in the condensate and generate selective pressure on genome architecture.

Evolutionary modes in recombining genomes

Why there are two distinct evolutionary regimes in recombining genomes can be understood from a remarkably simple scaling theory. We consider genome evolution under deleterious mutations with rate u_d, beneficial mutations generating substitutions with rate v_b, and recombination with rate ρ; all of these rates are measured per base pair of haploid sequence and per generation. In terms of these rates, we estimate the probability that a given selected mutation evolves autonomously (i.e., free of background selection and genetic draft), generalizing a previous argument by Weissman and Barton for evolution solely under beneficial mutations [26]. We first compute the average amount of interference generated by the substitution of a beneficial mutation with selection coefficient s at a given focal site in the genome. This mutation takes an average time of order τ_s ∼ 1/s from establishment to high frequency and generates a linkage correlation interval of size ξ_s = 1/(ρτ_s) ∼ s/ρ around the focal site. Other mutations at a distance r ≲ ξ_s are likely to retain their genetic linkage to one of the alleles at the focal site and are subject to strong interference; more distant mutations are likely to randomize their genetic linkage to the alleles at the focal site by recombination within the time span τ_s. Hence, each beneficial substitution generates an interference domain with an area τ_s × ξ_s ∼ 1/ρ around its focal point in genomic space-time [26]. By exactly the same argument, each deleterious mutation creates background selection in an interference domain with area τ_s × ξ_s ∼ 1/ρ around its focal point. In this case, τ_s ∼ 1/s is the expected time between origination and loss of the deleterious allele.
While genetic draft acts on all other mutations in the genomic neighborhood of the driver mutation, background selection strongly affects only mutations on the genetic background of the deleterious allele; this difference does not affect the scaling of interference domains. To estimate the joint effects of beneficial and deleterious mutations, we combine both kinds of interference interactions into a single interference density parameter,

ω = (u_d + v_b)/ρ.   (1)

This parameter delineates two universal evolutionary modes of recombining genomes:

The local interference mode (ω ≲ 1) has a dilute pattern of interference domains: the domains of deleterious mutations and of beneficial substitutions are randomly distributed with probabilities u_d and v_b per unit sequence and per unit time (Fig. 1a). The space-time shape of a domain, which is given by the scales ξ_s and τ_s, depends on the selection coefficient of its focal mutation, but its area 1/ρ is universal. Because the interference domains are, on average, well separated in the local interference mode, different mutations under selection are statistically independent. Hence, the rate of beneficial substitutions with selection coefficient s is related to the underlying mutation rate u_b and the haploid effective population size N by v_b ≃ 2Nsu_b. A target mutation evolves autonomously if its own interference domain has negligible overlap with any of the other interference domains, which happens with probability p_0 = e^{-ω}. Two target mutations on different genetic backgrounds see the same red domains but different blue domains; however, the no-interference probability p_0 is independent of background.

Figure 1. The interference density ω, equation (1), delineates two regimes that differ in strength and genomic range of interference selection. (a) Local interference (ω ≲ 1). This regime has a dilute pattern of interference domains generated by deleterious mutations (blue) and beneficial substitutions (red). These domains have space-time densities u_d and v_b, respectively; each domain covers a space-time area 1/ρ. Interference selection occurs only within interference domains; mutations under selection evolve in a largely independent way. (b) Interference condensate (ω ≳ 1). In this regime, interference domains generate a densely packed pattern. Interference domains have characteristic sizes ξ = σ/ρ and τ = 1/σ in space and time, as detailed in equation (5). Mutations are subject to strong interference selection, which curbs the speed of evolution.

The interference condensate mode (ω ≳ 1) has densely packed and overlapping interference domains. This indicates strong interference over extended genomic segments: genomic space-time is jammed by mutations under selection (Fig. 1b). By the definition of ω, the condensate is a broad evolutionary regime: it is generated by a sufficiently large supply of substantially beneficial or deleterious mutations, or a combination of both, but it does not depend on details of their effect distribution. Remarkably, the condensate domains have not only a universal area but also a typical shape that is given by universal scales τ and ξ = 1/(ρτ). Below, we will infer these scales from genomic data in Drosophila.
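The scaling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the interference density takes the form ω = (u_d + v_b)/ρ, which is what the threshold described in the text implies. The function names and the numerical values are hypothetical and serve only as illustration.

```python
import math

def interference_domain(s, rho):
    """Scales of the interference domain of a mutation with selection coefficient s."""
    tau = 1.0 / s            # sweep (or loss) time, tau_s ~ 1/s generations
    xi = 1.0 / (rho * tau)   # linkage correlation interval, xi_s ~ s/rho base pairs
    return tau, xi, tau * xi # the area tau_s * xi_s ~ 1/rho does not depend on s

def interference_density(u_d, v_b, rho):
    """Assumed form of the interference density: omega = (u_d + v_b) / rho."""
    return (u_d + v_b) / rho

def autonomy_probability(omega):
    """No-interference probability p_0 = exp(-omega)."""
    return math.exp(-omega)

# Hypothetical rates per base pair and per generation (not values from the paper).
rho = 1e-8
for s in (1e-3, 1e-2):
    tau, xi, area = interference_domain(s, rho)
    print(f"s = {s:g}: tau = {tau:g}, xi = {xi:g}, area = {area:g}")

omega = interference_density(u_d=2e-9, v_b=1e-10, rho=rho)
print(f"omega = {omega:.2f}, p_0 = {autonomy_probability(omega):.2f}")
```

Changing s reshapes the domain (long and narrow versus short and wide) while the area stays at 1/ρ, which is why only the rates u_d, v_b and ρ enter ω.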

A key evolutionary quantity to map these evolutionary modes is the average fitness variance in a population, Σ, which measures the efficacy of selection: by Fisher's fundamental theorem, Σ equals the rate of fitness increase by frequency gains of fitter genetic variants in the population. As detailed in Methods, our scaling theory captures the dependence of Σ on the evolutionary parameters in both interference modes. In the local interference regime, Σ depends linearly on the rates u_d and u_b, which reflects the statistical independence of mutation events. In the interference condensate, Σ is curbed to a sublinear function of u_d and u_b, which signals a reduced efficacy of selection caused by the jamming of genomic space-time (Fig. 1b). To test these results of our scaling theory, we performed simulations of evolving populations (see Methods for simulation details). In Fig. 2a, the average fitness variance per unit sequence, ς^2, is plotted against ω (with a suitable rescaling factor, as explained in Methods). In the regime ω ≲ 1, the rescaled fitness variance data obtained for a wide range of parameters u_d, u_b, ρ collapse onto a single-valued, linear function of ω. In the interference condensate (ω ≳ ω*), the fitness variance is seen to be curbed, and the rescaled data for different evolutionary parameters show some spread. Here we evaluate ω in terms of the mutation rates and the average selection coefficient of beneficial mutations, ω = (u_d + 2Nsu_b)/ρ, which maps the same regimes as equation (1) because v_b ≃ 2Nsu_b for ω ≲ 1.

The reduction in the efficacy of selection and, in particular, the diminishing return of beneficial mutations in the interference condensate are hallmarks of asexual evolution in large populations, which are known from experiments with microbial populations [38,23,39,40] and from theoretical models [22,36,12,33]. They signal the competition between genetic clades in an evolving population, which prevents some beneficial mutations from reaching substantial frequencies because they are outrun by other clades. At the same time, deleterious mutations can reach fixation if they are part of a successful clade; this effect is often referred to as Muller's ratchet [41,42,43,44,45,46]. We conclude that the condensate regime shares important characteristics with asexual processes, in accordance with previous results in refs. [26,27,11]. Our scaling theory expresses this link mathematically: the modes of evolution in recombining systems can be mapped onto corresponding modes of asexual evolution in a specific low-recombination limit (Methods).

In a minimal scaling theory that is based on the interference density, the local interference regime and the interference condensate are separated by a smooth transition at a characteristic value ω* of order 1. The transition occurs when the interference probability 1 − e^{-ω} becomes of order 1; the transition point marks the onset of nonlinearity in the fitness variance. For a broad range of evolutionary parameters, which includes realistic assumptions for the Drosophila genome, this behavior is confirmed by our simulations (Fig. 2a) and is consistent with analytical results for specific cases [26,27], including the low-recombination limit [33, 20, 12]. In Methods, we detail this minimal scaling theory and discuss extensions that cover parameter regimes with systematic shifts of the transition point ω* above or below 1 (Supplementary Figure S1).

Figure 2. […] Simulation results for parameters in the local interference regime (blue dots) and in the interference condensate (orange dots) are shown together with fits to the spectral functions Q_0(x; ν = 0) (blue line) and Q_0(x; ν = 2.7) (orange line). The shape distortion at intermediate frequencies is a universal characteristic of the condensate regime. The secondary branch at high frequencies is a feature of outgroup-directed spectra, which are appropriate for our data analysis; see text below and equations (19) and (25). (c) Shape parameter ν, plotted against interference density ω. Site frequency spectra from simulations are fitted to the spectral function Q_0(x; ν). (d) Neutral sequence diversity π plotted against interference density ω. Simulation data are shown together with theoretical predictions for the local interference regime (ω ≲ 1, black line) and the interference condensate (ω ≳ 1, green line). See equations (11) and (12).
Simulation parameters: N = 2000, ρ = 10^{-7} to 10^{-4}, u_d = 0 to 3.0 × 10^{-6}, […]

Genomic signature of interference selection

The fitness variance is a key summary statistic to map evolutionary modes under interference selection, but it depends on the a priori unknown rates u_d and v_b. The distribution of mutation frequencies x at neutrally evolving sites, the so-called neutral site frequency spectrum q(x), provides an alternative test that can be directly evaluated from population-scale sequencing data. At high recombination rates, the neutral spectrum is dominated by genetic drift and has the universal Kimura form q(x) ∼ 1/x [32] (blue line in Fig. 2b). At low recombination, the spectrum shows a characteristic depletion of intermediate- and high-frequency mutation counts (red line in Fig. 2b). This shape distortion is a result of genetic draft, which generates faster frequency changes and, hence, fewer variants in this frequency range than genetic drift. This distortion turns out to be a robust feature that can be read off from genomic data even if the exact form of the spectrum is hidden by noise and confounding factors.

We can compare the site frequency spectrum inferred from genomic data with spectra derived from specific evolutionary models. All analytically solvable models make strong simplifying assumptions on the evolutionary process, specifically on rate and effect distributions of mutations generating interference selection. The exact form of the spectral function depends on these model details, but broad shape features are universal markers of interference. An important class of models are so-called travelling fitness waves, which describe the asymptotic regime of linked genetic variation generated by multiple coexisting mutations with individually small selection coefficients [33,34,36,47]. Fitness waves generate a steady turnover of sequence variation at a characteristic rate σ.
In the asymptotic wave regime, genetically linked neutral sites have a spectrum depleted at intermediate frequencies, as given by an inverse-square power law q(x) ∼ 1/x^2 for x < 1/2 and a minimum near x = 1/2 [27]. These models underscore an important general point: genetic draft (i.e., mutation frequency trajectories shaped by the substitution dynamics of a beneficial allele at a genetically linked locus) and the associated shape distortion of site frequency spectra are universally generated by a sufficient supply of deleterious or beneficial mutations [45,46]. In the following, we use a specific model with spectral shapes depleted at intermediate frequencies that are tunable to the Drosophila data reported below. The model contains a focal site that evolves by mutations, selection, and genetic drift; the site is also subject to background selection and genetic draft with rate σ. Draft is generated by linked strongly beneficial alleles, each of which occurs on a random genetic background and leads to instantaneous fixation or loss of mutations at the focal site. The resulting spectral function of neutral sites takes a simple approximate form, Q_0(x; ν) = e^{-νx}/x, with a shape parameter ν that is proportional to the draft rate σ (dashed lines in Fig. 2b; details on Q_0(x; ν) are given in Methods). In the following, this model will serve to parametrize the universal shape distortion of empirical spectra, without pretence to resolve details of the underlying selective forces.
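To make the shape distortion concrete, the short sketch below evaluates the approximate spectral function Q_0(x; ν) = e^{-νx}/x quoted above and compares it with the Kimura form 1/x (the case ν = 0). The function names and the value ν = 2 are chosen for illustration only.

```python
import math

def q0(x, nu):
    """Approximate neutral spectral function Q_0(x; nu) = exp(-nu * x) / x."""
    if not 0.0 < x <= 1.0:
        raise ValueError("frequency x must lie in (0, 1]")
    return math.exp(-nu * x) / x

def depletion(x, nu):
    """Ratio of the draft spectrum to the Kimura form 1/x at frequency x."""
    return q0(x, nu) / q0(x, 0.0)

# Compare the Kimura form (nu = 0) with a draft spectrum (illustrative nu = 2).
for x in (0.04, 0.2, 0.5, 0.8):
    print(f"x = {x:4.2f}  Kimura = {q0(x, 0.0):6.2f}  "
          f"draft = {q0(x, 2.0):6.2f}  ratio = {depletion(x, 2.0):.3f}")
```

The ratio equals e^{-νx}, so low-frequency classes are almost unaffected while intermediate and high frequencies are progressively depleted, which is the qualitative signature used in the text.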

In Fig. 2cd, we show that mutation frequency data produce two consistent markers of interference. First, we fit neutral site frequency spectra obtained from numerical simulations to the form Q_0(x; ν) and plot the inferred shape parameter ν against the interference density ω of the underlying evolutionary process. Over a wide range of rates u_d, u_b, and ρ, we find ν ≈ 0 (i.e., neutral spectra of the form q(x) ∼ 1/x) in the local interference regime (ω ≲ ω*) and ν > 1 (i.e., neutral spectra with depletion of intermediate and high frequencies) in the condensate regime (ω ≳ ω*). Below, we will link the shape parameter ν to specific evolutionary characteristics of the condensate regime. Second, we record the sequence diversity at synonymous sites as a function of the interference density ω (Fig. 2d). This quantity shows a strong dependence on ρ in the local interference regime (ω ≲ ω*) and a weaker dependence in the interference condensate (ω ≳ ω*), a pattern that is predicted by our scaling theory and will be described in more detail below. Hence, both genetic draft on neutral mutations and the depletion of the diversity pattern set in at the transition point ω* from local interference to the interference condensate (Fig. 2cd), the same point that is marked by the onset of nonlinearity in the fitness variance (Fig. 2a). The validity range of our inference method is detailed in Methods.

Interference selection in the Drosophila genome

To obtain a genome-wide map of interference in the Drosophila melanogaster genome, we use sequence data from an American [48] and an African population [49]. To equalize coverage, we take a random sample of 25 individuals in each population. At this sampling depth, site frequency spectra are quite insensitive to low-frequency variants (which would arise, for example, from a recent population expansion) but are perfectly suitable for studying intermediate-frequency variants, which are at the center of this study. Based on a published high-resolution recombination map for the Drosophila genome [31], we partition genomic sites in the autosomes (i.e., chromosomes 2L, 2R, 3L, 3R) according to the local recombination rate evaluated in windows of 10^5 base pairs. This partitioning covers a range of rates between 10^{-10} and 10^{-7}, with an average of 2.4 × 10^{-8} per unit sequence and per generation (rates are often reported in centiMorgans per Megabase, i.e., in units of 10^{-8} per unit sequence and per generation).
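The partitioning step can be sketched as follows. This is an illustrative sketch only: the text states the window size and the range of rates, but not how the recombination bins are spaced, so the logarithmic spacing, the number of bins per decade, and all function names below are assumptions.

```python
import math

CM_PER_MB_TO_PER_BP = 1e-8  # 1 cM/Mb corresponds to 1e-8 per base pair and generation

def to_per_bp(rate_cm_per_mb):
    """Convert a recombination rate from cM/Mb to units per base pair and generation."""
    return rate_cm_per_mb * CM_PER_MB_TO_PER_BP

def log_bin_index(rho, rho_min=1e-10, rho_max=1e-7, bins_per_decade=2):
    """Index of the logarithmic bin containing rho (assumed binning scheme).

    Rates outside [rho_min, rho_max] are assigned to the nearest edge bin.
    """
    n_bins = round(math.log10(rho_max / rho_min) * bins_per_decade)
    position = math.log10(max(rho, rho_min) / rho_min) * bins_per_decade
    return min(int(position), n_bins - 1)

def partition_windows(window_rates_cm_per_mb):
    """Group window indices (one rate per 1e5-bp window) by recombination bin."""
    bins = {}
    for window, rate in enumerate(window_rates_cm_per_mb):
        bins.setdefault(log_bin_index(to_per_bp(rate)), []).append(window)
    return bins

# Hypothetical rates for three windows, in cM/Mb.
print(partition_windows([0.01, 2.4, 0.3]))
```

Logarithmic bins are a natural choice here because the rates span three orders of magnitude, but any binning that keeps enough sites per class would serve the same purpose.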

In each recombination rate bin and in different sequence categories, we record the outgroup-directed site frequency spectrum, q̂^o(x), which is defined as the number of sites per unit sequence at which a fraction x = k/n of the sampled individuals have a mutant allele and a fraction 1 − x have the outgroup allele (with k = 0, 1, ..., 25 and n = 25; we disregard sites with more than two alleles). Following common practice, we determine the outgroup allele by alignment with the reference genome of the neighboring species D. simulans. These empirical spectra differ in two ways from the model spectra introduced above. First, the spectra q̂^o(x) are evaluated for discrete frequencies in a small sequence sample, which introduces sampling corrections compared to model distributions derived for larger populations (we use a hat to mark this difference; sampling corrections are detailed in Methods). Second, model spectra are directed from the ancestral allele at the origination time of the mutation. A substitution between the ingroup and the outgroup species reverses the roles of the ancestral and the mutant allele at a fraction of the sequence sites. In a sequence class with a given density d of substitutions, the ancestor-directed and the outgroup-directed spectra are related by a linear map (detailed in Methods); the same map relates the sample spectra q̂^o(x) and q̂(x). Hence, outgroup-directed spectra have a primary branch with a maximum at low frequency and a secondary branch with a maximum at high frequency.
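The construction of a sample spectrum and the origin of the secondary branch can be illustrated with a small sketch. The linear map used here, q^o(x) = (1 − d) q(x) + d q(1 − x), is an assumed form that simply swaps the roles of the two alleles at a fraction d of sites; the map actually used in the paper is given in its Methods and may differ in detail. All names and numbers are hypothetical.

```python
def sample_spectrum(mutant_counts, n, sequence_length):
    """Sites per unit sequence at each mutant-allele count k = 0..n.

    mutant_counts holds, for each biallelic site, the number k of sampled
    individuals that carry the allele differing from the outgroup.
    """
    spectrum = [0.0] * (n + 1)
    for k in mutant_counts:
        if not 0 <= k <= n:
            raise ValueError("mutant count must lie between 0 and n")
        spectrum[k] += 1.0 / sequence_length
    return spectrum

def to_outgroup_directed(ancestor_directed, d):
    """Assumed linear map: at a fraction d of sites the allele roles are swapped."""
    n = len(ancestor_directed) - 1
    return [(1.0 - d) * ancestor_directed[k] + d * ancestor_directed[n - k]
            for k in range(n + 1)]

# Hypothetical example: four polymorphic sites in 100 bp, sample size n = 25.
spectrum = sample_spectrum([1, 1, 2, 24], n=25, sequence_length=100)
redirected = to_outgroup_directed(spectrum, d=0.1)
print(redirected[1], redirected[24])
```

Under this assumed map, weight from the abundant low-frequency classes is transferred to the mirrored high-frequency classes, which produces a secondary maximum at high frequency of the kind described above.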

In Fig. 3ab, we plot the sample spectra of 4-fold synonymous sequence sites, q̂^o_s(x), for three representative bins of high, intermediate, and low recombination rates in the American and the African population. These spectra have a striking common pattern. Across high and intermediate recombination rates, they follow almost perfectly the standard Kimura inverse-frequency form, q(x) ∼ 1/x, which appears as straight lines over most of the frequency range in the double-logarithmic plots of Fig. 3ab. This form indicates that the dominant evolutionary force acting on synonymous genetic mutations at high and intermediate recombination rates is genetic drift. It shows that the average selection at synonymous sites is weak, making this class a good approximation of neutrally evolving sequence. It also excludes strong demographic effects affecting the spectral form at intermediate frequencies (notwithstanding small differences at low frequencies between the American and the African population, which reflect differences in their recent demography [50, 51]). Strong selective sweeps are known to deplete the density and to distort the spectrum of synonymous mutations in the local vicinity of the positively selected site [29,52]; however, the rate of these sweeps is low enough not to affect the aggregate spectra (Fig. 3ab). In contrast, the spectra at low recombination rates show a depletion of intermediate and high frequencies. This depletion signals genetic draft on synonymous mutations, which we attribute to interference selection. The argument for interference selection will be completed below, where we show that the onset of the shape distortion occurs at a value ω* ∼ 1 predicted by the interference scaling model and is accompanied by a consistent ρ-dependence of the synonymous sequence diversity.

Figure 3. […] equations (2) and (25). (c) Spectral shape of synonymous mutations. Inferred shape parameters ν from two populations are plotted against the scaled recombination rate ρ/µ. (d) Sequence diversity at synonymous sites, π_s, plotted against the scaled recombination rate ρ/µ (dots). Data from two populations are shown together with theoretical predictions for the local interference regime (ω ≲ 1, black line) and the interference condensate (ω ≳ 1, green line). See equations (11) and (12). The data of (c,d) consistently map the condensation transition to a threshold value ρ*/µ ∼ 1, as predicted by scaling theory.

Figure 4. […] equations (3) and (27). Our inference uses the shape parameter ν inferred from synonymous mutations (Fig. 3c); other inferred model parameters are reported in Supplementary Table S2.

To map the transition point between local interference and interference condensate, we calibrate draft model spectra for neutral sites, q_s(x) = θ_s Q_0(x; ν), to the empirical spectra q̂^o_s(x) using a consistent Bayesian inference scheme that includes sampling effects and the map from outgroup-directed to ancestor-directed spectra (Methods). This scheme provides maximum-likelihood values and confidence intervals of the shape parameter ν (Fig. 3c) and of the mutation density θ_s (shown below in Fig. 5a) in each recombination bin, without additional fit parameters. The synonymous spectral data signal an onset of interference selection at a threshold recombination rate ρ* ≈ µ, corresponding to a shape parameter or rescaled draft rate ν = 1. Regions with ρ ≳ ρ* are inferred to be in the local interference regime, regions with ρ ≲ ρ* are in the interference condensate. The latter regime is characterized by moderate interference with values of the shape parameter in the range ν ≲ 2. To obtain some insight on the selective effects causing interference, we infer "corrected", ancestor-directed sample spectra q̂_s(x) by an inverse linear map (Methods).
The low-recombination spectra q̂_s(x) are monotonic and well approximated by the spectral functions Q_0(x; ν) of the draft model, but they do not show the minimum at x = 1/2 characteristic of the travelling-wave model (Supplementary Figure S2). This suggests that the underlying interference selection includes drivers under substantial selection and is at some distance from the travelling-wave regime of multiple mutations with individually small effects.
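As a rough illustration of how a shape parameter can be calibrated from a sample spectrum, the sketch below performs a grid-search maximum-likelihood fit of ν in Q_0(x; ν) = e^{-νx}/x. It is a deliberately simplified stand-in for the Bayesian scheme described above: it identifies the sample frequency k/n with the population frequency, ignores sampling corrections and the outgroup map, and uses a multinomial likelihood over polymorphic classes. All function names and the synthetic data are assumptions.

```python
import math

def model_probabilities(nu, n):
    """Normalized Q_0(k/n; nu) over the polymorphic classes k = 1..n-1."""
    weights = [math.exp(-nu * k / n) / (k / n) for k in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(nu, counts):
    """Multinomial log-likelihood (up to a constant) of counts for k = 1..n-1."""
    n = len(counts) + 1
    probabilities = model_probabilities(nu, n)
    return sum(c * math.log(p) for c, p in zip(counts, probabilities))

def fit_shape_parameter(counts, nu_grid):
    """Grid value of nu that maximizes the log-likelihood."""
    return max(nu_grid, key=lambda nu: log_likelihood(nu, counts))

# Synthetic, noise-free expected counts generated with nu = 2 (illustration only).
n = 25
nu_grid = [0.1 * i for i in range(61)]
synthetic = [10000 * p for p in model_probabilities(2.0, n)]
nu_hat = fit_shape_parameter(synthetic, nu_grid)
print(f"recovered shape parameter: {nu_hat:.1f}")
```

Because the model is a one-parameter exponential family in ν, the log-likelihood is concave and a coarse grid is enough for a sketch; a real analysis would add confidence intervals and the corrections described in Methods.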

The partitioning of the Drosophila genome into local interference regions and interference condensate regions can be consistently traced in all sequence categories (Fig. 4, Supplementary Figure S3). At low recombination rates, all of these site frequency spectra show qualitatively the same depletion of intermediate and high frequencies that is characteristic of the condensate regime. Specifically for nonsynonymous mutations, we calibrate a two-component model to the empirical spectra q̂_a(x),

q_a(x) = θ_a Q_0(x; ν) + θ′_a Q(x; ζ_a, ν).   (3)

Here Q(x; ζ, ν) is a spectral function for sequence sites with mean selection coefficient s_a that are subject to genetic draft with rate σ. This function contains branches Q_±(x; ζ, ν) = e^{(±ζ−ν)x} that correspond to beneficial and deleterious mutations, respectively, and depend on the rescaled selection coefficient ζ (Methods). We obtain maximum-likelihood model parameters θ_a, θ′_a, ζ_a in each recombination bin, using an extended Bayesian inference scheme. This scheme includes a model for cross-species evolution under selection and the resulting, more complex linear map between ancestor-directed and outgroup-directed spectra (Methods). Remarkably, the spectra for nonsynonymous mutations (Fig. 4) can be explained across all recombination classes by the two-component model (3) with the same shape parameter as inferred for synonymous sites (Fig. 3c) and, hence, with the same threshold rate ρ*. The maximum-likelihood model includes near-neutral sites with spectral shape Q_0(x; ν), which is of Kimura form for ρ > ρ*, as well as moderately selected sites with spectrum Q(x; ζ_a, ν) and mean effect of order ζ_a ∼ 20, which produce excess frequency counts in the range x ≳ 0.1 (see dashed vs. solid lines in Fig. 4). These excess counts cannot be explained by demographic factors, because they are common to both populations and no comparable excess is observed at synonymous sites.
Across all recombination rates, the inferred mutation densities θ_a and θ′_a are much lower than the density θ_s at synonymous sites (Fig. 5a), indicating that a large fraction of amino acid changes is under strong selection and, hence, suppressed in the frequency range of our spectra. The inferred selective effects of amino acid changes that do appear in our spectra are consistent with the expected fitness landscape of proteins. Important molecular phenotypes of proteins, such as fold stability or enzymatic activity, are quantitative traits encoded by multiple sequence sites. Such traits generically contain weakly and moderately selected constitutive sites, even if the trait itself is under strong stabilizing selection [53]. The maximum-likelihood selection coefficients ζ_a (Supplementary Table S2) are just one order of magnitude higher than the scaled draft rate ν in the lowest recombination classes. This suggests that a fraction of nonsynonymous sites is affected by genetic draft in the condensate regime; this point is discussed further below.

Predicting the condensation transition in Drosophila

The threshold recombination rate marking the onset of interference selection is numerically close to the point mutation rate in the Drosophila genome, $\mu = 2.8 \times 10^{-9}$ per generation and per unit sequence [54] (see Fig. 3b). In light of our scaling theory, this is hardly surprising: the interference density $\omega$, which determines the transition between local interference and interference condensate, is determined by a balance between the rates of local mutations and recombination. We now combine the scaling theory, our evolutionary model, and our Bayesian inference scheme to independently predict the transition point $\rho^*$, as well as the behavior of the sequence diversity in both interference regimes, solely from genomic data in the high-recombination regime. This serves as a stringent consistency test for the scaling theory and provides additional evidence for our inference of interference selection in the Drosophila genome.

First, we estimate the rate of deleterious mutations in protein-coding sequence from the reduction in mutation density of amino acid changes compared to synonymous changes, $u_d/\mu = \alpha_d$, where $\alpha_d = 1 - (\theta_a/\theta_s)$ is the fraction of amino acid mutations that are deleterious. Moderately deleterious and strongly deleterious amino acid changes contribute partial fractions $\theta'_a/\theta_s$ and $(\theta_s - \theta_a - \theta'_a)/\theta_s$, respectively. Second, we estimate the rate of beneficial substitutions in a similar way from the excess of amino acid substitutions compared to the number expected in the near-neutral class, $v_b/\mu = \alpha_b\,(d_a/d_s)$, where $\alpha_b = 1 - (d_s/d_a)(\theta_a/\theta_s)$ is the fraction of amino acid substitutions that are beneficial. In Methods, we derive these expressions from our evolutionary model and show how they can consistently be extrapolated into the condensate regime.
The expression for $\alpha_b$ resembles a McDonald-Kreitman test [55,56], but our mixed model (3) affords an improved estimate of the mutation density $\theta_a$ by discounting moderately deleterious mutations. Equation (1) then gives a simple estimate of the interference density from genomic summary data,

$$\omega = \frac{u_d + v_b}{\rho} = \frac{\mu}{\rho}\left(\alpha_d + \alpha_b\,\frac{d_a}{d_s}\right). \qquad (4)$$

In Fig. 5ab, we collect the relevant data of the Drosophila genome in all recombination classes: the maximum-likelihood mutation densities $\theta_s$ and $\theta_a$ inferred from spectral data, and the corresponding sequence divergence levels $d_s$ and $d_a$, defined as the number of substitutions between each D. melanogaster population and the D. simulans reference genome. From these data, we infer $\rho$-dependent fractions $\alpha_d$ and $\alpha_b$ (Fig. 5cd) and the resulting interference density $\omega$ (Fig. 5e).
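As a rough illustration of how these summary statistics combine, the sketch below computes $\alpha_d$, $\alpha_b$, and the interference density from the definitions $u_d/\mu = \alpha_d$, $v_b/\mu = \alpha_b\,(d_a/d_s)$, and $\omega = (u_d + v_b)/\rho$. The function name and the input values are hypothetical; the inputs are chosen only so that the output lands near the values quoted for the local interference regime and are not data from the study.

```python
def interference_density(theta_s, theta_a, d_s, d_a, mu, rho):
    """Return (alpha_d, alpha_b, omega) from genomic summary statistics.

    theta_s, theta_a: mutation densities at synonymous / near-neutral
                      nonsynonymous sites
    d_s, d_a:         synonymous / nonsynonymous divergence
    mu, rho:          mutation and recombination rate per site and generation
    """
    alpha_d = 1.0 - theta_a / theta_s                  # deleterious fraction
    alpha_b = 1.0 - (d_s / d_a) * (theta_a / theta_s)  # beneficial fraction
    u_d = mu * alpha_d                                 # deleterious mutation rate
    v_b = mu * alpha_b * (d_a / d_s)                   # beneficial substitution rate
    omega = (u_d + v_b) / rho
    return alpha_d, alpha_b, omega


# Hypothetical inputs (illustration only, not values from the paper).
mu = 2.8e-9
alpha_d, alpha_b, omega = interference_density(
    theta_s=1.0, theta_a=0.1, d_s=1.0, d_a=0.2, mu=mu, rho=mu
)
print(alpha_d, alpha_b, omega)
```

With these inputs the two rates come out as roughly $0.9\mu$ and $0.1\mu$, so that $\omega \approx \mu/\rho$.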

For $\rho > \rho^*$, we consistently find a deleterious mutation rate $u_d \approx 0.9\mu$ and a beneficial substitution rate $v_b \approx 0.1\mu$ that are approximately independent of $\rho$ (Fig. 5cd). Hence, the local interference regime has an approximately constant density of interference domains and an interference density that is inversely proportional to the recombination rate, $\omega \sim 1.0\,\mu/\rho$. As inferred above, the Drosophila genome includes a sizeable fraction of moderately selected sites, and genome-wide positive selection is not dominated solely by strong selective sweeps; these characteristics suggest that the minimal scaling theory in terms of the interference density is applicable. This theory makes three quantitative predictions:

[Figure 5 legend: $\theta_s$ (synonymous mutations), $\theta_a$ (nonsynonymous mutations, neutral), $\theta'_a$ (nonsynonymous mutations, weakly selected); axis: mutation density $\theta$.]

(a) In the local interference regime, the mutation densities $\theta_s$ and $\theta_a$, as well as the sequence diversity, are proportional to the probability of no interference, $p_0 = e^{-\omega} \approx e^{-\mu/\rho}$. This formula generalizes the standard model of background selection, which predicts the size of the error-free sequence class to depend exponentially on the rate of deleterious mutations [14]. Indeed, the observed $\rho$-dependence of the sequence diversity at synonymous sites is in good agreement with this theory in the local interference regime, $\pi_s \sim e^{-\mu/\rho}$ (Fig. 3d, see definition in Methods), in broad agreement with previous observations [57,48].

(b) The transition to the interference condensate occurs at a threshold interference density $\omega^*$ of order 1. This determines the threshold recombination rate $\rho^* \sim \mu$ (Fig. 5e), in agreement with the observed onset of interference selection (Fig. 3c).

(c) In the interference condensate, the sequence diversity depends only weakly on the recombination rate. This dependence can be derived from a simple scaling argument based on extremal value statistics: an expected number $\omega/\omega^*$ of beneficial mutations with average selection coefficient $\bar s$ originate in each interference domain, but only the fittest of these mutants reaches fixation. This determines the draft rate $\sigma \approx \bar s\,(1 + \log(\omega/\omega^*))$, which sets the sequence diversity $\pi_s \simeq 2\mu/\sigma$ [6,19,27]. With condensate interference densities bounded in the range $\omega \approx (0.85 - 1.0)\,\mu/\rho$ (Fig. 5e), we obtain the leading $\rho$-dependence $\pi_s \sim (1 + \log(\rho^*/\rho))^{-1}$, which is in agreement with the observed pattern (Fig. 3d). Our scaling argument is consistent with the scaling of $\sigma$ in the numerical simulations (Fig. 2c) and with previous results for evolution solely under beneficial mutations [26]. In a state of stationary fitness, however, beneficial substitutions are a generic feature of the condensate regime. Even in the absence of adaptation, they compensate for deleterious mutations fixed by interference selection [44,58]. Below, we will discuss the likely loci of these dynamics in the Drosophila genome.
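The three predictions can be combined into a single piecewise curve for the relative diversity as a function of recombination rate. The following is a minimal sketch, assuming $\omega = \mu/\rho$ with prefactor 1, $\omega^* = 1$, and continuity of the two branches at the crossover; the continuity condition and the function name are assumptions made for illustration, since the scaling theory fixes the curve only up to factors of order 1.

```python
import math


def relative_diversity(rho, mu, omega_star=1.0):
    """Diversity relative to the interference-free value theta_0.

    Local interference (omega <= omega_star): pi / theta_0 ~ exp(-omega).
    Condensate (omega > omega_star): pi ~ 1 / (1 + log(omega / omega_star)),
    scaled so that both branches agree at omega = omega_star (assumption).
    """
    omega = mu / rho
    if omega <= omega_star:
        return math.exp(-omega)
    return math.exp(-omega_star) / (1.0 + math.log(omega / omega_star))


mu = 2.8e-9
for ratio in (100.0, 10.0, 1.0, 0.1, 0.01):  # rho / mu over four decades
    print(f"rho/mu = {ratio:6.2f}  pi/theta_0 = {relative_diversity(ratio * mu, mu):.3f}")
```

The printed values fall quickly as $\rho$ approaches $\mu$ from above and only logarithmically below it, which is the qualitative pattern described for Fig. 3d.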

Taken together, genetic variation in Drosophila is in remarkable quantitative agreement with our interference scaling theory over two decades of recombination rates. Fig. 6 charts the local interference density $\omega$ in the autosomes of D. melanogaster, based on the high-resolution recombination map of ref. [31] and our genomic inference. Extended condensate regions, shown in orange, are located primarily adjacent to the centromere regions and, to a lesser extent, to the telomeres. The major part of condensate sequence maintains a residual level of recombination, corresponding to interference densities in the range $1 < \omega \le 4$. The remaining 9% of the autosomal genome consists of 38 contiguous segments with no recorded recombination ($\omega > 4$); these segments have an average length of 0.2 Mb and a maximum length of 0.9 Mb. We now turn to inferring key genomic and evolutionary features of the interference condensate.

Although the condensate is a complicated regime of strongly correlated mutations, it has remarkably simple emergent scaling properties. Because interference domains in the condensate are densely packed, the draft rate $\sigma$ becomes similar to the neutral coalescence rate, $\tilde\sigma$, which is also the scale of fitness differences between competing clades. The emergence of a characteristic scale of genetic turnover is a common feature of models of asexual evolution [12,59]. Under finite recombination, the rate $\tilde\sigma$ sets the genomic correlation length $\xi = \tilde\sigma/\rho$; coexisting mutations at a distance $r \lesssim \xi$ are likely to retain their genetic linkage over a mean coalescence time interval $\tau = 1/\tilde\sigma$ (Fig. 1b). We can estimate the coalescence rate from the inferred values of draft rate and synonymous mutation density [12], $\tilde\sigma = \sigma + 2\mu/\theta_s$, or equivalently from the neutral sequence diversity $\pi = 2\mu/\tilde\sigma$ (Methods). Together, we obtain the simple estimates

$$\tau = \frac{1}{\tilde\sigma} \simeq \frac{\pi}{2\mu}, \qquad \xi = \frac{\tilde\sigma}{\rho} \simeq \frac{2\mu}{\rho\,\pi}, \qquad (5)$$

which determine the universal shape of interference domains in the condensate regime (Fig. 1b). In coalescent models under selection, the same scaling $\tilde\sigma \sim \mu/\pi$ links the coalescence rate to the neutral sequence diversity, in some cases with logarithmic corrections [19,27]. In the condensate regime, we find coalescence times $\tau$ about an order of magnitude lower than at high recombination rates (Fig. 7a) and genomic correlations up to $\xi \sim 10^4$ base pairs (Fig. 7b), signalling that neighboring genes are often in common interference domains.

[Figure 7, caption fragment: ... linkage correlation length, $\xi$, plotted against the scaled recombination rate $\rho/\mu$. In the condensate regime, these scales determine the characteristic shape of interference domains (Fig. 1b). See equations (5) and (21). (c) Fitness variance per site, as given by equation (6). This quantity displays a drastically reduced efficacy of selection in the condensate, consistent with the reduced fraction of adaptive substitutions (Fig. 5d).]
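The coalescence time and correlation length follow directly from the relations $\pi = 2\mu/\tilde\sigma$, $\tau = 1/\tilde\sigma$, and $\xi = \tilde\sigma/\rho$ quoted above. The sketch below evaluates them for a single set of inputs; the function name is hypothetical, and the diversity and recombination values are placeholders chosen for illustration rather than measurements from the study.

```python
def condensate_scales(pi, mu, rho):
    """Return (sigma_tilde, tau, xi) from neutral diversity pi.

    sigma_tilde: neutral coalescence rate, from pi = 2 * mu / sigma_tilde
    tau:         mean coalescence time, 1 / sigma_tilde (generations)
    xi:          genomic correlation length, sigma_tilde / rho (sites)
    """
    sigma_tilde = 2.0 * mu / pi
    tau = 1.0 / sigma_tilde
    xi = sigma_tilde / rho
    return sigma_tilde, tau, xi


# Placeholder inputs: pi and rho are illustrative, not inferred values.
sigma_tilde, tau, xi = condensate_scales(pi=1.0e-3, mu=2.8e-9, rho=1.0e-9)
print(f"sigma_tilde = {sigma_tilde:.2e}, tau = {tau:.2e} generations, xi = {xi:.0f} bp")
```

Lower diversity or lower recombination both lengthen $\xi$, which is why interference domains can span neighboring genes in the condensate.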

Speed and cost of evolution in the condensate

The most important asexual feature of the interference condensate is the drastically reduced efficacy of selection. In the Drosophila genome, we can quantify this effect in two ways. First, the fraction of beneficial substitutions, $\alpha_b$, which takes stable values of about 50% in the local interference regime, drops sharply in the condensate to below 10% in the lowest recombination class (Fig. 5d). Second, the fitness variance per unit sequence of the condensate is related to the neutral sequence diversity,

$$\varsigma^2 \simeq \rho\,\sigma \simeq \frac{2\mu\rho}{\pi_s} \qquad (6)$$

(Methods). The $\rho$-dependent values of $\varsigma^2$ inferred from the synonymous sequence diversity $\pi_s$ (Fig. 7c) show a sharp drop in the efficacy of selection within the condensate regime; the fitness variance in the lowest recombination class is lower by a factor of 10 than at the transition point $\rho^*$. The strong dependence of the fitness variance on the recombination rate is in tune with the simulation results (Fig. 2a). We conclude that interference selection curbs the rate and selective effects of adaptive evolution in the condensate regions of the Drosophila genome.

The reduced efficacy of selection has an immediate consequence for genome functionality in the condensate regime: interference selection generates emergent neutrality of sequence sites with selection coefficients $s \lesssim \sigma$; these sites become dysfunctional because their alleles are randomized by interference selection [12]. We can estimate the resulting fitness cost (or genetic load) for a protein, $\Delta F = (\ell/2) \int_0^{\sigma} s\, f(s)\, ds$, where $f(s)$ is the distribution of selection coefficients and $\ell$ is the length of the protein. This cost increases with decreasing $\rho$, because $\sigma$ as inferred through the shape parameter increases (Fig. 3c). Emergent neutrality implies that the genetic load in the two modes differs not only in magnitude, but qualitatively. In the local interference regime, a genetic locus under moderate selection ($s > 1/2N$) incurs classical mutational load, where the beneficial allele is always prevalent and only a small minority fraction of the population, of average $\mu/s$, carries the deleterious allele. In the condensate, deleterious and compensatory beneficial substitutions generate a new equilibrium in which the deleterious allele becomes dominant in the population with probability $1/(1 + e^{s/\sigma})$ [12]. Hence, the functional impact of moderately deleterious alleles ($s < \sigma$) becomes important. Emergent neutrality likely affects part of the nonsynonymous sites in the moderate selection class (Fig. 4), as well as intron, UTR, and synonymous sites under selection for codon usage. Assuming that just a few percent of these sites become effectively neutral, the above estimate predicts a substantial scaled fitness cost $2N\Delta F \sim 10 - 100$ per gene, even if the effect of each individual site is weak. This fitness cost is specific to genes in the condensate regime; its likely consequences for genome architecture are discussed below.
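To make the load estimate concrete, the sketch below evaluates the occupancy probability $1/(1 + e^{s/\sigma})$ and the integral $\Delta F = (\ell/2)\int_0^\sigma s\,f(s)\,ds$ by a midpoint rule. The uniform distribution of selection coefficients, the parameter values, and the function names are assumptions made purely for illustration; the study does not specify $f(s)$ in this form.

```python
import math


def prob_deleterious_dominant(s, sigma):
    """Probability that the deleterious allele is dominant in the condensate."""
    return 1.0 / (1.0 + math.exp(s / sigma))


def protein_load(f, sigma, length, steps=10000):
    """Midpoint-rule estimate of (length / 2) * integral_0^sigma s f(s) ds."""
    h = sigma / steps
    integral = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        integral += s * f(s)
    return 0.5 * length * integral * h


# Assumed example: selection coefficients uniform on [0, s_max].
s_max = 0.01
uniform = lambda s: 1.0 / s_max if 0.0 <= s <= s_max else 0.0

print(prob_deleterious_dominant(s=0.0005, sigma=0.001))
print(protein_load(uniform, sigma=0.001, length=300))
```

A site with $s \ll \sigma$ approaches probability 1/2, i.e. a fully randomized allele, while a site with $s \gg \sigma$ is almost always in its beneficial state.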

The main method development of this paper is a unified scaling theory of genetic draft and background selection. This theory identifies a dominant scaling variable, the interference density $\omega = (u_d + v_b)/\rho$, to discriminate between two evolutionary modes: the local interference regime ($\omega \lesssim \omega^*$) and the interference condensate ($\omega \gtrsim \omega^*$). In the local interference mode, mutations evolve in an approximately independent way by selection and genetic drift; in the condensate, they are locked into clades of genetically linked sequence segments and many are governed by linkage. This mode requires a sufficiently high supply of mutations under substantial (beneficial or deleterious) selection but is insensitive to details of the evolutionary process, in particular to the rate of adaptation. Over a broad range of evolutionary parameters, the transition point $\omega^*$ between local interference and condensate is of order 1. The frequency spectrum of neutral mutations can be used as a marker of the evolutionary mode: the "convective" frequency evolution in the condensate regime is signalled by a characteristic depletion of intermediate and high frequencies.

In the Drosophila genome, we build a case for the joint presence of these evolutionary modes from a number of mutually consistent observations from sequence data. We infer the rate of deleterious amino acid changes, $u_d$, and the rate of beneficial substitutions, $v_b$, in protein-coding sequence. Given these selective building blocks of the interference density $\omega$, our scaling theory predicts how genetic variation depends on recombination: the sequence diversity varies strongly in the local interference regime, $\pi \sim e^{-\omega}$, and weakly in the condensate, $\pi \sim (1 + \log(\omega/\omega^*))^{-1}$; the transition point between these regimes, $\omega^* \sim 1$, marks the onset of genetic draft. Together, amplitude and shape of mutational spectra change in a concerted way. These predictions are in agreement with direct genomic data of synonymous mutations. While any single characteristic of genetic variation could be explained by alternative evolutionary scenarios, the consistent joint pattern of diversity and spectral shape over the entire range of recombination rates provides strong evidence for an interference condensate in the Drosophila genome (Fig. 3cd).

Our results suggest that the established rationale of a strong evolutionary advantage of sex applies to about 80% of the Drosophila genes, which are in the local interference regime. The other 20%, some 3000 genes in the interference condensate, show evolutionary similarities with asexual systems. In the condensate regions, we infer a significantly lower fitness variance per unit sequence, indicating reduced evolvability in response to adaptive pressure. This may signal that condensate genes respond less efficiently to existing positive selection for change or that they are subject to less selection for change in the first place. We also infer a significantly increased fitness cost (genetic load) concentrated in weakly and moderately selected sequence sites, whose alleles are randomized by emergent neutrality [12]. This finding suggests that the evolutionary partitioning of the Drosophila genome is also a functional partitioning. We hypothesize that condensate genes have systematically lower intrinsic fold stability than other genes. They should also have reduced codon usage bias, which may affect speed and efficiency of translation and, hence, increase the cost of protein expression. These hypotheses on the functional impact of interference selection can be tested by experiment and by targeted sequence analysis.

A salient feature of Drosophila is that both evolutionary modes coexist in one genome. This implies that functional and fitness differences between condensate genes and other genes play out in the same individual, the same environment, and the same population. Over macro-evolutionary time scales, these differences can generate feedback effects on genome architecture. First, we expect selection against overly long recombination coldspots. This is qualitatively in line with observations: in 91% of the autosomes, the Drosophila genome maintains a residual level of recombination, keeping interference selection capped to a moderate level ($\omega \le 4$); the remaining zero-recombination sequence is fragmented into short contiguous segments (Fig. 6). Second, a given gene incurs a fitness cost that depends on its target function and on the interference regime it is placed in. Therefore, genes with high requirements on protein stability or translation efficiency should be suppressed in the condensate. Whether there are differences in gene content and gene functions between the local interference regime and the condensate that can be explained as a consequence of differences in interference selection is an interesting question for future research.

Scaling theory

The heuristic scaling approach used in this paper is based on three main ingredients: (i) In the local interference regime, the behavior of an evolutionary observable can be calculated approximately from single-site population genetics. (ii) The crossover to the interference condensate regime can be described by a scaling function that depends only on the variable $\omega$ given by equation (1). Here and in the following, crossover is used as a technical term of scaling theory that is not to be confused with the genetics term. (iii) In the condensate, evolutionary observables follow broad heuristic constraints, and there is a matching condition between both scaling regimes at the crossover point $\omega^*$. We consider long genomic segments that evolve under limited recombination; individual sequence sites have a distribution of selection coefficients with $2N\bar s \gtrsim 1$, where $\bar s$ is the average selection coefficient and $N$ is the effective population size. For simplicity, we neglect prefactors of order 1 and corrections to scaling, which often depend on more specific model assumptions. Our analysis in the main text builds on a minimal model with the following scaling relations:

(a) The average fitness variance per unit sequence, $\varsigma^2$, takes the form

$$\varsigma^2 \simeq \begin{cases} (u_d + v_b)\,\bar s & (\omega \lesssim \omega^*,\ \text{local interference}) \\ \rho\,\sigma & (\omega \gtrsim \omega^*,\ \text{interference condensate}), \end{cases} \qquad (7)$$

where $\sigma$ is a characteristic selection strength in the condensate regime. The local interference expression $\varsigma^2 \simeq (u_d + v_b)\,\bar s$ follows from direct calculation by single-site population genetics, assuming statistical independence of selected alleles at different sites. The leading condensate asymptotics $\varsigma^2 \simeq \rho\sigma$ is then already determined by the scaling properties (ii) and (iii). Specifically, we evaluate the matching condition $\sigma(\omega^*) = \bar s$ at the crossover point $\omega^* \sim 1$ of the minimal scaling theory with the requirement that $\varsigma^2$ depends only weakly on $u_d$ and $v_b$ in the condensate. This is expected from the jamming of genomic space-time shown in Fig. 1b and implies that the recombination rate becomes a limiting factor of $\varsigma^2$ in the condensate. The scaling argument given in the main text suggests a specific functional form in the condensate, which is given by

$$\sigma \simeq \bar s\,(1 + \log(\omega/\omega^*)); \qquad (8)$$

see also ref. [26]. Equation (7) can be rescaled to a dimensionless form (equation (9)), which is confirmed by our simulation results (Fig. 1c). Equations (7) and (9) and, in particular, the minimal crossover scaling $\omega^* \sim 1$ are consistent with previous results for evolution under beneficial mutations [26] and under background selection including moderate effects ($\log(2Ns) \sim 1$) [17,60]. Strong heterogeneities in the effect distribution or background selection by strongly deleterious mutations generate systematic shifts of the crossover point $\omega^*$; the corresponding extensions of scaling theory are discussed below.
Equations (7)-(11) show that $\pi$ measures key characteristics of the condensate regime, the spacetime scaling (equation (5)) and the fitness variance per unit sequence (equation (6)). In the main text, we use the sample sequence diversity at synonymous sites, $\pi_s = 2 \sum_{k=0}^{n} (k/n)(1 - k/n)\, \hat q^{\,o}_s(k/n)$, to infer these characteristics in the Drosophila genome. As discussed in the main text, the Drosophila genome has a distribution of selective effects that includes sites with weak and moderate selection, for which the minimal scaling theory with $\omega^* \sim 1$ should be applicable. We find clear evidence that $\pi_s$ follows the scaling behavior predicted by equations (11) and (12); see Fig. 3d.

Extensions of scaling theory

Evolution by beneficial and deleterious mutations is a complex process whose details depend on their rates and effect distribution. The minimal scaling theory is a coarse approximation of this process. It provides useful approximations of genomic statistics over a wide range of evolutionary parameters, which includes settings appropriate for Drosophila. Here we discuss two extensions that serve to link our scaling theory to existing evolutionary models and to delineate the range of validity of the minimal model.

(a) Background selection with strong effects. Equation (11) predicts the onset of interference selection for sites of selection coefficient $s$ at a characteristic value

$$\omega^* = \log(Ns). \qquad (13)$$

This expression is consistent with known results of background selection theory [15,20,27,43]. If background selection involves only strongly selected sites ($2Ns \gg 1$), we obtain a shift of the crossover point $\omega^*$ observed in aggregate data to values above 1 (Supplementary Figure S1a). The crossover point is still marked by the onset of interference selection on neutral sites and the resulting spectral shape distortion. This regime is not relevant for Drosophila, where we observe $\omega^* \sim 1$ and, consistently, genomic sites under weak and moderate selection.

(b) [...] Under this process, a genomic focal site is subject to linked sweeps at a rate $\sigma_{\rm sweep}$. Focal sites of selection coefficient $s < s_b$ are strongly affected by interference for $\sigma_{\rm sweep} > s$, which sets the crossover point to interference selection, $\omega^* = s/s_b$. In the special case of equal selection coefficients at all sites ($s = s_b$), the crossover point is again $\omega^* = 1$, independently of $s_b$ [26,61] (Supplementary Figure S1b). The onset of interference on neutral sites and the resulting spectral shape distortion occur at a value $\omega_0 = 1/(2Ns_b) = \omega^*/(2Ns)$, which is smaller than $\omega^*$. This regime is not observed in Drosophila; strong sweeps are too rare to distort neutral spectra for $\omega < 1$.

Link to asexual evolution

An explicit expression for $\sigma$ can be obtained if we identify the total fitness variance per correlation interval, $\xi\varsigma^2 = \sigma^2$, with the corresponding quantity in models of asexual populations with a genome of length $\xi$ [26]. The resulting expression has a leading large-$\xi$ asymptotics consistent with our scaling argument given in the main text, $\sigma \simeq \bar s \log \xi \simeq \bar s \log(\rho^*/\rho)$. In ref. [27], an analogous identification is discussed for evolutionary processes dominated by weakly selected alleles.

The theoretical limit of strictly asexual evolution, which is reached at very low recombination rates $\rho \lesssim \sigma/L$ in a genome of length $L$, can be described in terms of our scaling theory by substituting $L$ for the correlation length $\xi = \sigma/\rho$ or $\xi_s = s/\rho$, respectively. In this limit, the interference density (1) for mutations of effect $s$ takes the form

$$\omega = \frac{U_d + V_b}{s},$$

which depends on the genome-wide rates $U_d = Lu_d$ and $V_b = Lv_b$. For $V_b = 0$, the identity (13) for $\omega^*$ becomes the well-known criterion for the onset of Muller's ratchet in asexual populations, $U_d/s \sim \log(Ns)$ [42,62,43,45].

Evolutionary model: mutation frequency distributions

As explained in the main text, we use a specific model of interference selection to parametrize site frequency spectra: individual sites evolve under mutations (with rate $\mu$), selection (with site selection coefficient $s > 0$), genetic drift (in a population of effective size $N$), and periodically recurrent genetic draft (with rate $\sigma$). The draft model generates site frequency spectra that can be estimated analytically by a saddle-point approximation to the path integral of mutation frequency paths [63]. For beneficial mutations (of selection coefficient $s$) and deleterious mutations (of selection coefficient $-s$) at two-allelic sites, we obtain the frequency distributions $q_\pm(x)$ (equation (17)), respectively; these distributions depend on the mutation density $\theta = \theta_0 e^{-\omega} = \mu N e^{-\omega} \ll 1$, the scaled draft rate $\nu = 2N\sigma\theta/\theta_0$, and the scaled site selection coefficient $\zeta = 2Ns\theta/\theta_0$. The function $q_0(x) = x^{-1+\theta}(1 - x)^{-1+\theta}$ denotes the neutral spectrum under genetic drift. The no-sweep probability $p(\tau, \sigma)$ over a time interval $\tau$ is assumed to be strongly suppressed for $\tau \gtrsim 1/\sigma$. The exponential weight involves the maximum-likelihood frequency path with effective selection coefficient $\tilde s$, which is denoted by $x(t, \tilde s)$. This path follows the equation of motion $\dot x(t, \tilde s) = \tilde s\, g(x(t, \tilde s))$ with $g(x) = x(1 - x)$ and has a sojourn time $\tau(\tilde s, x)$ up to frequency $x$. The prefactor $Z_\pm(\theta, \zeta, \nu)$ ensures the normalization $\int_0^1 q(x)\, dx = 1$. An approximate evaluation of the integral in equation (16) results in the remarkably simple spectral function

$$Q_\pm(x; \zeta, \nu) = \frac{1}{x}\, e^{(\pm\zeta - \nu)x}. \qquad (18)$$

This function consistently interpolates between the asymptotic regimes of effectively neutral mutations ($\zeta \ll \nu$, i.e., $s \ll \sigma$), which are dominated by genetic draft ($Q \simeq e^{-\nu x}/x$), and strongly selected mutations ($\zeta \gg \nu$, i.e., $s \gg \sigma$), which evolve in an autonomous way ($Q \simeq e^{\pm\zeta x}/x$). For the spectral function of neutral sites, we use the shorthand

$$Q_0(x; \nu) \equiv Q_\pm(x; 0, \nu) = \frac{1}{x}\, e^{-\nu x}. \qquad (19)$$

The family of spectral functions (18) provides a good parametrization of the spectral data in our simulations (Fig. 1d), as well as in all sequence classes and recombination rate classes of Drosophila (Fig. 3ab).

The spectral functions of the draft model map the crossover from drift-dominated to draft-dominated evolution in analytical form. The neutral site frequency spectrum (2) determines the sequence diversity $\pi$ (equation (20)), which is consistent with the scaling behavior (11). In the local interference regime, $\pi \simeq \theta = \mu N e^{-\omega}$; i.e., background selection reduces diversity but does not affect the shape of the frequency spectrum. In the condensate regime, $\pi \simeq 2\mu/\sigma < \theta$; i.e., diversity and spectral shape are determined by interference selection [6]. These features hold for broad classes of interference selection [46], making the spectral functions a convenient choice for parametrizing the Drosophila site frequency spectra. Specifically, we use the condition $\nu > 1$ on the shape parameter inferred from the spectrum of synonymous sequence sites as a marker of the interference condensate regime.

The draft model also serves to parametrize the sequence evolution between the ingroup species D. melanogaster and the outgroup species D. simulans. In this model, allele substitutions at individual sites take place with Kimura-Ohta rates that depend on the local coalescence rate (or inverse effective population size) $\tilde\sigma$, which is given by equation (21) with separate forms for local interference ($\omega \lesssim 1$) and for the interference condensate ($\omega \gtrsim 1$), where $\tilde\sigma \simeq \sigma$. Consistently with equations (11) and (20), the coalescence rate again maps the crossover between local interference and condensate regime. The beneficial and deleterious substitution rates (equation (22)) depend on the scaled selection coefficient $\zeta$ and the scaled coalescence rate $\tilde\nu \equiv 2N\tilde\sigma\theta/\theta_0 = \nu + 1$. Models of this form have been shown to provide an excellent approximation to the equilibrium substitution dynamics in linked genomes under different scenarios of interference selection [12,58]. The rates (22) consistently determine the equilibrium occupancy probabilities of beneficial and deleterious alleles, $\lambda_\pm(\zeta, \nu)$ (equation (23)), as well as the expected sequence divergence between in- and outgroup species (equation (24)), where $\tau_d$ is the divergence time and $d_0 = \mu\tau_d$ the expected divergence at neutral sites.
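The limiting behavior of the spectral functions is easy to check numerically. The sketch below implements the form $Q_\pm(x;\zeta,\nu) = e^{(\pm\zeta-\nu)x}/x$, which is assumed here from the asymptotic expressions $e^{-\nu x}/x$ and $e^{\pm\zeta x}/x$ quoted in the text; the function names and parameter values are illustrative only.

```python
import math


def spectral_q(x, zeta, nu, sign=+1):
    """Spectral function Q_+ (sign=+1, beneficial) or Q_- (sign=-1, deleterious)."""
    return math.exp((sign * zeta - nu) * x) / x


def spectral_q0(x, nu):
    """Neutral-site shorthand: zero selection coefficient, draft rate nu."""
    return spectral_q(x, 0.0, nu)


# Depletion of intermediate and high frequencies relative to the 1/x form
# (nu = 0), for an illustrative shape parameter nu = 3.
for x in (0.05, 0.2, 0.5, 0.9):
    ratio = spectral_q0(x, nu=3.0) / spectral_q0(x, nu=0.0)
    print(f"x = {x:4.2f}  Q0(nu=3)/Q0(nu=0) = {ratio:.3f}")
```

The ratio equals $e^{-\nu x}$, so a larger shape parameter suppresses high-frequency counts exponentially while leaving the low-frequency end nearly unchanged.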

Ancestor-directed, outgroup-directed, and corrected frequency spectra

As explained in the main text, the evolutionary model specified by equations (22)-(24) serves to relate the outgroup-directed frequency spectra $q^o(x)$ and the basic frequency distributions of the draft model, $q_\pm(x)$ (equation (17)), without additional fitting parameters. Specifically, the ancestor-directed spectrum $q_s(x; \theta, \nu)$ at synonymous sites, which is of the form (2), determines the outgroup-directed spectrum $q^o_s(x)$ (equation (25)), with the spectral function $Q_0(x; \nu)$ given by equation (19). In other sequence classes, we use ancestor-directed spectra of the form (3),

$$q(x; \theta, \theta', \zeta, \nu) = \theta\, Q_0(x; \nu) + \theta'\, Q(x; \zeta, \nu),$$

with the spectral functions $Q_\pm(x; \zeta, \nu)$ given by equation (18) and the allele occupancy probabilities $\lambda_\pm(\zeta, \nu)$ given by equation (23). These determine the outgroup-directed counterparts (equation (27)). We can reconstruct the synonymous spectrum $q_s(x)$ from $q^o_s(x)$ by inverting the linear map (25) (equation (28)). Applying this transformation to the outgroup-polarized spectral data of synonymous mutations, $\hat q^{\,o}_s(x)$ (Fig. 3ab), produces the corrected spectral data $\hat q_s(x)$ shown in Supplementary Figure S2. These provide a bona fide improved approximation to the underlying spectrum $q_s(x)$. However, the reconstruction becomes noisy in the limit $x \to 1$, where $\hat q^{\,o}_s(x)$ is dominated by the component $\hat q_s(1 - x)$.

Bayesian estimation of model parameters

Consider a sequence class with population frequency spectrum $q(x; \theta, \theta', \zeta, \nu)$ given by a two-component model of the form (3); the associated outgroup-polarized spectrum $q^o(x; \theta, \theta', \zeta, \nu)$ is given by equation (27). In that class, a sample of $n$ random individuals contains mutations of discrete outgroup-polarized frequency $x = k/n$ with probability (see also ref. [64])

$$q^o(k/n; \theta, \theta', \zeta, \nu) = \binom{n}{k} \int_0^1 x^k (1 - x)^{n-k}\, q^o(x; \theta, \theta', \zeta, \nu)\, dx; \qquad (29)$$

this expression yields closed analytical expressions involving hypergeometric and Gamma functions.

By calibrating the model distributions $q^o(k/n; \theta, \theta', \zeta, \nu)$ with observed site frequency spectra $\hat q^{\,o}(k/n)$ and divergence data, we can infer parameters of the model (25) for synonymous sites and of the mixed model (27) for other sequence classes. Our inference is based on the total log-likelihood score of the observed frequency counts in a given sequence class,

$$S(\theta, \theta', \zeta, \nu) = L \sum_{k=0}^{n} \hat q^{\,o}(k/n)\, \log q^o(k/n; \theta, \theta', \zeta, \nu), \qquad (30)$$

where $L$ is the total number of sequence sites in the class. We have developed a consistent Bayesian inference scheme that takes into account the allele occupancy (23), the evolutionary dynamics (24), and the sampling statistics (29). This scheme proceeds in a hierarchical way: we first determine a posterior distribution of parameters $(\theta_s, \nu)$ for synonymous sites, using the single-component model (25). Then we obtain the posterior distribution of parameters $(\theta_a, \theta'_a, \zeta_a, \nu)$ for amino-acid changes and the analogous distributions for other sequence classes, using the mixed model (3) with the same value of $\nu$ as for synonymous sites (this constraint does not induce a significant drop in likelihood score). Our inference scheme is implemented in a software package called "hfit" (https://github.com/stschiff/hfit), using special functions and numerical optimization routines from the GNU Scientific Library (http://www.gnu.org/software/gsl/) and a custom MCMC algorithm to obtain maximum-likelihood estimates and confidence intervals for all parameters.
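The sampling step and the likelihood score can be illustrated with a simplified, single-component version of the calculation. The sketch below assumes a neutral-type population spectrum $\theta\, e^{-\nu x}/x$, applies binomial sampling by numerical integration, and evaluates a score of the form $L \sum_k \hat q(k/n) \log q(k/n)$ over segregating classes $k \ge 1$. It is not the hfit implementation: the function names, the midpoint integration, the restriction to $k \ge 1$, and the toy counts are all assumptions made for illustration, and the outgroup polarization map is omitted.

```python
import math


def sampled_spectrum(k, n, theta, nu, steps=20000):
    """Probability of sample count k out of n (k >= 1) for the population
    spectrum theta * exp(-nu * x) / x, via binomial sampling.

    The 1/x factor is absorbed analytically: x**k / x = x**(k - 1).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint rule on (0, 1)
        total += x ** (k - 1) * (1.0 - x) ** (n - k) * math.exp(-nu * x)
    return theta * math.comb(n, k) * total * h


def log_likelihood_score(observed, n, theta, nu, n_sites):
    """Score n_sites * sum_k q_hat(k/n) * log q(k/n) over observed classes."""
    return n_sites * sum(
        q_hat * math.log(sampled_spectrum(k, n, theta, nu))
        for k, q_hat in observed.items()
    )


# Toy observed frequencies per site (illustrative, not data from the study).
observed = {1: 0.010, 2: 0.004, 3: 0.002}
for nu in (0.0, 2.0, 4.0):
    print(nu, log_likelihood_score(observed, n=10, theta=0.01, nu=nu, n_sites=1000))
```

For $\nu = 0$ the sampled spectrum reduces to the familiar $\theta/k$ form, which provides a quick sanity check of the integration.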

The Bayesian inference scheme, together with the substitution model given by equations (22) -(24), 604 allows a direct estimate of the rates u d , v b and of the interference density ω from observed frequency spectra 605 and substitutions at synonymous and non-synonymous sites. First, the rate of deleterious mutations in a 606 given sequence class is simply u d = µλ + θ/θ s . Equation (27) then determines the total rate of deleterious 607 nonsynonymous mutations, which is the sum of contributions from moderately deleterious changes and from strongly deleterious 609 changes. Second, the rate of adaptive amino acid substitutions is given by the excess of nonsynonymous 610 divergence compared to the expectation from the equilibrium model (24), Here we have treated synonymous mutations as (approximately) neutral. In the local interference regime, 612 we have ζ a 1 and, hence, λ + (ζ a , ν) ≈ 1 and d(ζ a , ν) ≈ 0. Equations (31) and (32) then reduce to 613 the expressions given in the main text, u d /µ = α d with α d = 1 − (θ a /θ s ) and v b /µ = α b (d a /d s ) with 614 α b = 1 − (d s /d a )(θ a /θ s ). These expressions are evaluated using measured divergence data d s , d a and 615 maximum-likelihood spectral parameters θ s , θ a . They enter equation (4) for the interference density ω, 616 which serves to estimate the threshold recombination rate ρ * from the condition ω * = 1. To estimate the 617 fraction of adaptive substitutions, α b , in the condensate regime (Fig. 5d), we use the full expression (32). 618 Genomic data and sequence annotation 619 We downloaded the complete genome sequences of 168 lines from the Drosophila Melanogaster Reference 620 Panel (DGRP) from the DGRP website http://dgrp.gnets.ncsu.edu and of 27 lines sampled from 621 Rwanda from the Drosophila Population Genomics Project http://dpgp.org as fasta files. 
We downloaded the reference sequences of Drosophila simulans and Drosophila yakuba, aligned to the reference sequence of Drosophila melanogaster, from the UCSC genome browser (https://genome.ucsc.edu). For both outgroups, we compute outgroup-directed allele frequencies at all sites at which (i) there is a valid outgroup allele, and (ii) at least 150 lines of the DGRP sequences or 25 lines of the DPGP sequences have a called allele. We then downsample all sites to 25 called alleles, using random sampling without replacement (hypergeometric sampling).

We downloaded gene annotations from FlyBase [43]. We define annotation categories as follows. Intergenic: intergenic regions that are at least 5 kb away from genes; Intron: introns of protein-coding genes; UTR: untranslated regions in exons; Synonymous: protein-coding sites of the reference genome at which none of the three possible point mutations changes the encoded amino acid; Nonsynonymous: protein-coding sites of the reference genome at which any of the three possible point mutations changes the encoded amino acid. Most genes have multiple associated transcripts due to alternative splicing. For each gene, we choose the transcript with the longest protein-coding sequence and annotate introns, UTRs, synonymous and nonsynonymous sites according to that transcript. See Supplementary Table S1 for the number of sites in each annotation category on the different chromosomes.
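The downsampling step described above (drawing 25 called alleles without replacement at each retained site) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function name `downsample_site`, the encoding of uncalled alleles as any symbol outside A, C, G, T, and the counting of every non-outgroup allele as derived are assumptions made here.

```python
import random

VALID_ALLELES = {"A", "C", "G", "T"}  # assumed encoding of called alleles


def downsample_site(alleles, outgroup_allele, min_called, target=25, rng=None):
    """Return the derived-allele count in a subsample of `target` alleles.

    alleles         -- one allele per line at this site; anything outside
                       VALID_ALLELES (e.g. "N") is treated as uncalled
    outgroup_allele -- allele of the outgroup, taken as the ancestral state
    min_called      -- minimum number of called alleles required at the site
    Returns None if the site fails the filters.
    """
    rng = rng or random.Random()
    if outgroup_allele not in VALID_ALLELES:
        return None  # no valid outgroup allele
    called = [a for a in alleles if a in VALID_ALLELES]
    if len(called) < max(min_called, target):
        return None  # too few called alleles
    # Drawing without replacement makes the derived count hypergeometric.
    subsample = rng.sample(called, target)
    return sum(1 for a in subsample if a != outgroup_allele)


# Toy site: 100 ancestral, 50 derived, 18 uncalled alleles (made-up data).
site = ["A"] * 100 + ["G"] * 50 + ["N"] * 18
count = downsample_site(site, "A", min_called=150, rng=random.Random(0))
```

Passing a seeded `random.Random` instance, as in the toy call, makes the subsample reproducible.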

Maps of mean recombination rates within 100 kb windows were obtained from Comeron et al. [31] through the website http://www.recombinome.com. We use the recombination map to annotate every site in the Drosophila genome. We then use only synonymous sites on the autosomes (2L, 2R, 3L and 3R) and define quantile boundaries on this set. Specifically, we sort all recombination rate values of this set of sites and determine recombination rate bins by dividing the data set into 21 equally large subsets of values. We then use these quantile boundaries to bin all sites (not just synonymous sites) into bins according to their local recombination rate. The quantile boundaries used in this study for autosomal

Simulations of evolutionary processes in recombining populations

We use the SLiM simulator [65] to simulate a population of sequences evolving under mutations, drift, selection, and recombination; the genome of each individual has 100,000 sites. To mimic the Drosophila phylogeny, we start from a single population of size $N = 1000$ that evolves for 10,000 generations, then splits into ingroup and outgroup populations of size $N = 1000$; these evolve in isolation for another 10,000 generations. Finally, we sample one individual from the outgroup population and 20 individuals from the ingroup population.

We consider three classes of mutations: neutral mutations, beneficial mutations and deleterious mutations, the latter two with fixed selection coefficient $s_{ad} = 0.01$. The rate of neutral mutations is $\mu = 1.5 \times 10^{-6}$, the rate of beneficial mutations varies from $u_b = 0$ to $2.5 \times 10^{-7}$, and the rate of deleterious mutations from $u_d = 0$ to $3 \times 10^{-6}$. We also run simulations with only one class of selected mutations (i.e., $u_d = 0$ or $u_b = 0$). The recombination rate $\rho$ varies in the range $10^{-7}$ to $10^{-4}$.

We use these simulations to display the transition from local interference to the interference condensate and to corroborate our scaling theory (Fig. 2). In particular, the simulations demonstrate that the maximum-likelihood shape parameter $\nu^*$ inferred from the spectral data of synonymous sequence sites can serve as a faithful marker of the interference condensate regime.
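The recombination-rate binning described earlier in this section (21 quantile bins defined on autosomal synonymous sites and then applied to all sites) can be sketched in Python as follows. The function names and the tie-handling convention are assumptions made for illustration; they are not taken from the original analysis code.

```python
from bisect import bisect_right


def quantile_boundaries(reference_rates, num_bins=21):
    """Internal bin boundaries that split the sorted reference values
    (here: rates at autosomal synonymous sites) into num_bins subsets
    of (nearly) equal size."""
    ordered = sorted(reference_rates)
    n = len(ordered)
    if n < num_bins:
        raise ValueError("need at least one reference value per bin")
    # Boundary i is the first value belonging to subset i (i = 1..num_bins-1).
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]


def assign_bin(rate, boundaries):
    """Bin index (0 = lowest recombination) of any site, given its local rate.
    Assumed convention: a rate equal to a boundary goes to the upper bin."""
    return bisect_right(boundaries, rate)


# Toy reference set: 42 distinct rates, so every bin receives 2 values.
reference = [0.1 * i for i in range(42)]
bounds = quantile_boundaries(reference)
bins = [assign_bin(r, bounds) for r in reference]
```

Because recombination rates are given per 100 kb window, many sites share the same value; with such ties the bins are only approximately equal in size under this convention.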

We would like to thank P.W. Messer for comments on an earlier version of the manuscript.

Tables

Supplementary Table S1: Drosophila site frequency spectra. The table lists the site frequency spectrum data that we analyse here. The data is separated into five annotation classes and 21 recombination bins.

Figure S3: Frequency spectra of mutations in UTR, introns and intergenic sequence. Outgroup-polarized sample spectra $\hat q_o(x)$ from two populations of D. melanogaster, together with maximum-likelihood spectra $q_o(x)$. We use the shape parameter $\nu$ inferred from synonymous mutations (Fig. 3c) and a two-component model of the same form as for nonsynonymous mutations; see equations (3) and (27).