Nucleosome positioning stability is a significant modulator of germline mutation rate variation across the human genome

Nucleosome organization is suggested to affect local mutation rates in a genome. However, the lack of de novo mutation and high-resolution nucleosome data have limited investigation. Further, analyses using indirect mutation rate measurements have yielded contradictory and potentially confounded results. Combining >300,000 human de novo mutations with high-resolution nucleosome maps, we reveal substantially elevated mutation rates around translationally stable (‘strong’) nucleosomes. Translational stability is an under-appreciated nucleosomal property, with greater impact than better-known factors like occupancy and histone modifications. We show that the mutational mechanisms affected by strong nucleosomes are low-fidelity replication, insufficient mismatch repair and increased double-strand breaks. Strong nucleosomes preferentially locate within young SINE/LINE transposons; subject to increased mutation rates, transposons are then more rapidly inactivated. Depletion of strong nucleosomes in older transposons suggests frequent re-positioning during evolution, thus resolving a debate about selective pressure on nucleosome-positioning. The findings have important implications for human genetics and genome evolution.

Nucleosome organization is suggested to affect local mutation rates in a genome. 13 However, the lack of de novo mutation and high-resolution nucleosome data have 14 limited investigation. Further, analyses using indirect mutation rate measurements 15 have yielded contradictory and potentially confounded results. Combining >300,000 16 human de novo mutations with high-resolution nucleosome maps, we reveal 17 substantially elevated mutation rates around translationally stable ('strong') 18 nucleosomes. Translational stability is an under-appreciated nucleosomal property, 19 with greater impact than better-known factors like occupancy and histone 20 modifications. We show that the mutational mechanisms affected by strong 21 nucleosomes are low-fidelity replication, insufficient mismatch repair and increased  Here, we focus on the role of nucleosomes in modulating germline mutation rates. 61 Chromatin is considered important because structural constraints could affect the 62 mutability of genomic sequences (Makova and Hardison 2015). Nucleosome 63 organization (including positioning and occupancy) has been reported as a significant 2 Results 86

Datasets used for analysis 87
We used >300,000 human de novo single-nucleotide variants (SNVs) and >30,000 88 short insertions/deletions (INDELs), having removed genomic regions that could 89 confound downstream analysis (Fig. 1a, Supplementary Fig. 1a; see Methods). Using these data, we classified the genome into three groups of regions ( Fig. 1b; sex 106 chromosomes excluded): i) those containing translationally stable, 'strong', 107 nucleosomes (198Mb); ii) those with rotationally but not translationally stable 108 nucleosomes (796Mb); and iii) all other non-N base genomic regions (1,703Mb). 109 West et al. (2014) reported that with the exception of a few specific loci such as 110 transcription start sites, overall nucleosome positioning varies little between cell types. 111 None of the nucleosomal datasets were produced using germ cells, therefore as a 112 precaution we excluded nucleosomes that differ in positioning between cell types 113 (~23Mb; see Methods).   . 1c) and ~15% more de novo INDELs (Fig. 1d) than expected. Similar increases 129 are also apparent for extremely rare variants (Supplementary Fig. 1b,c), though 130 effect sizes are smaller than for de novo mutations, probably due to the fact that 131 highly mutable sites are under-represented among extremely rare variants (Harpak et 132 al. 2016). Restricting the analysis to strong nucleosomes, we found that those with 133 higher translational stability scores also exhibit higher mutation rates (Fig. 1c,d;  134 scores from Gaffney et al., 2012). These results suggest that translational stability is 135 associated with local variation in mutation rates across the genome, a previously 136 unappreciated aspect. Regions containing rotationally stable nucleosomes, in 137 contrast, are slightly depleted of both mutation types; we didn't perform further 138 analysis on this, as effect of rotational positioning has been comprehensively 139 discussed by Pich et al. (2018). A more detailed view with meta-profiles clearly 140 depicts increased SNV and reduced INDEL densities around dyad regions of strong 141 nucleosomes compared with flanking linker regions (Fig. 1e), in line with 142 observations made using polymorphism data (Tolstorukov et al. 2011). 143 Interestingly, ~80% of strong nucleosomes overlap with repeats ( Fig. 1f,  144 Supplementary Fig. 1d), especially SINE/Alu (~44%) and LINE/L1 elements (~26%). 145 Genetic variations in repeats are traditionally hard to detect because of poor 146 mappability and so analyses have tended to be cautious in calling variants, resulting 147 in many false negatives (though, few false positives; Lee and Schatz (2012)). 148 Therefore, the above observations probably underestimate the true enrichment of de 149 novo mutations in strong nucleosomes. We subdivided strong nucleosomes into 150 three groups: i) Alu-associated, ii) L1-associated and iii) others. Alu-associated 151 nucleosomes display increased SNV rates around the dyads, as seen in the 152 metaprofiles for all strong nucleosomes (Supplementary Fig. 1e), whereas non-Alu 153 nucleosomes show increased SNV rates ~60bp away from the dyads, close to the 154 nucleosome edges. Such differences may be due to the different local sequence 155 composition (discussed in next section). In contrast, the patterns of INDEL densities 156 are relatively similar among different groups (Supplementary Fig. 1e). 157

Controlling for potential confounding factors 158
Many factors are associated with mutation rate variation. One of the most important 159 is local sequence context -for example, CpG sites are known to be highly mutable 160 and CpG density profiles correlate well with mutation rate profiles in strong 161 nucleosomes (Supplementary Fig. 1e). Functional factors like DNA methylation, 162 rate are also relevant. Therefore, to systematically assess the contribution of 164 nucleosomes to mutation rate variation, we used a logistic regression framework to 165 control for potential confounding factors (Fig. 2). 166

176
We defined three variables to quantify nucleosomal properties relative to a specific 177 nucleotide position in the genome. Two relate to translational positioning: d mean , the 178 mean distance between the focal position and the midpoints of mapped MNase-seq 179 fragments (maximum distance of 95 bp) and d var , the variance of these distances (Fig.  180   2a). A smaller d mean means that a nucleotide position is closer to nucleosome dyads stable. As the relationship between between d mean and SNV rates is non-linear, we 183 defined d mean a categorical variable binned into five intervals (Methods; Fig. 1e, 184 Supplementary Fig. 1e). The third variable is nucleosome occupancy calculated as 185 a normalised per-base MNase-seq fragment coverage (see Methods). Other factors 186 considered are local nucleotide sequences (±5bp of the focal site) and functional 187 genomic measurements in human germ cells or other cell types if no available germ-188 cell data (see Methods). d var has a relatively weak but statistically significant 189 correlation with many of these factors, suggesting non-independence 190 ( Supplementary Fig. 2). 191 To assess the contribution of each factor to local mutation rates, we compared a full 192 Our statistical framework recapitulates reported observations (Fig. 2b, (measured by ATAC-seq) are also associated with many mutation types. 208 Transcription levels, however, lack any links with local mutation rates here. 209 Turning to nucleosomal properties, translational stability (d var ) is associated with 210 elevated mutation rates at A/T, non-CpG C/G and CpG sites, with the first two 211 showing the greatest effect sizes. INDELs also show similar effects, though the 212 higher p values compared with SNVs could partly be due to the smaller sample size. 213 Examining specific SNV mutation types, d var is significantly associated with all A/T 214 and C/G mutations ( Supplementary Fig. 3), except for CpG>TpG (adjusted p = 215 Table 1), indicating that translational stability is positively associated with mutation 218 rates thus corroborating the patterns observed in Fig. 1. As expected from Fig. 1, the 219 mean distance to dyads, d mean , also displays statistically significant associations with 220 mutations rates at A/T and C/G sites (Fig. 2b,c). Finally, nucleosome occupancy is 221 also statistically significant; in contrast to the positioning variables however, here the 222 effect is much larger for INDELs than SNVs (Fig. 2b,c; INDELs, adjusted p = 5.8e-37; 223 SNVs, adjusted p = 0.21, 1.6e-6 and 2.2e-7). The regression coefficients of 224 occupancy are negative for SNVs at A/T sites, but positive for SNVs at CpG sites 225 Table 1 Since many strong nucleosomes are associated with repeat elements, we added 237 repeat status as a predictor in the regression models (Methods). We still achieved 238 strong statistical significance for translational stability after considering repeat status 239 ( Supplementary Fig. 4c), suggesting that translational stability is independently 240 associated with mutation rate variation. We also tested repeat and non-repeat 241 regions separately, and in most tests (including those for non-repeat regions) 242 translational stability is a significant factor (Supplementary Fig. 4d). 243 Taken together, the logistic regression modeling analysis recapitulated known factors 244 and confirmed the independent contribution of nucleosome translational stability as a 245 new significant factor to local mutation rate variation. We first consider the relative frequencies of the 96 mutation types in the whole 259 genome and strong nucleosomes in different repeat contexts (Fig. 3a). The results 260 account for background differences in trinucleotide frequencies between these 261 regions (Methods). Several mutation types display distinct frequencies in strong 262 nucleosomes, suggesting differences in the underlying mutational processes. For with cancer genomes and so some germline mutational processes may not be well 286 represented. Nevertheless, our analysis identified several candidate mutational 287 processes associated with strong nucleosomes, such as the mutagenesis linked to 288 DNA mismatch repair (Signatures 6, 20 and 26) and DNA double-strand repair 289 (Signatures 3 and 5). Therefore, to gain deeper insights and to obtain independent 290 evidence for these mutational processes, we examined multiple published genomic 291 and functional genomic datasets below.  repeat-based subgroups of mutations (1.6%, 6.7% and 4.3% increase for 'Alu', 'L1' 312 and 'Others', respectively). 313 We analyzed somatic mutations from two sets of ultra-hypermutated cancer 314 genomes (Campbell et al. 2017). The first comprised genomes with driver mutations 315 in the POLE gene encoding the catalytic subunit of DNA polymerase ε (Pol ε, the 316 major replicase for the leading strand) and in one or more of the core MMR genes 317 (MLH1, MSH2, MSH6, PMS1 and PMS2). The second contained cancers with 318 mutated POLE but intact MMR. As it is even more challenging to detect somatic 319 mutations in tumor-derived data than re-sequencing of normal individuals, we 320 focused this analysis on strong nucleosomes found in high-mappability regions of the 321 genome (Methods). 322 We reasoned that differences in mutation distributions between the two sets of 323 genomes could be attributed to the MMR pathway. The overall mutation patterns are 324 similar in both cases, with much higher mutation rates at strong nucleosome 325 boundaries and adjacent linker DNA than the surrounding regions (Fig. 4a). This 326 implies that errors introduced during error-prone replication by a deficient Pol ε 327 escape repair by the MMR pathway when they coincide with strong nucleosomes. 328 Next, we calculated an 'MMR escape ratio' to quantify the relative amount of 329 replication errors that escapes MMR repair in the POLE only mutant cancers 330 compared with the POLE and MMR double mutants. Strong nucleosomal regions 331 (especially boundaries and adjacent linkers) display ~10% higher escape ratios than 332 the genome-wide background (Fig. 4a). Although A/T sites have higher escape ratios 333 than C/G sites around strong nucleosomes, both C/G and A/T sites exhibit similarly 334 elevated escape ratio profiles, suggesting independence of sequence context.   We also studied the effect of strong nucleosomes on replication fidelity by examining 354 data from children with inherited biallelic mismatch repair deficiency (bMMRD; 355 and polymerase δ defects (Pol δ, the major replicase for the lagging strand). We 357 estimated Pol δ and Pol ε escape ratios (escaping the proofreading correction of 358 polymerases) using the same reasoning as above (Fig. 4b). We found that strong 359 nucleosomes have higher escape ratios for both polymerases relative to the genomic 360 background (Fig. 4b), implying that they have lower replication fidelity in these 361 regions. The proofreading escape ratios for both polymerases are even higher than 362 that for MMR (Fig. 4a,b) and A/T sites display higher proofreading escape ratios than 363 C/G sites (Supplementary Fig. 5a). Again, the periodic pattern in the relative escape 364 profiles (Fig. 4b, Supplementary Fig. 5a) suggests that nucleosome positioning 365 contributes to the heterogeneity in replicase fidelity across the genome. 366 The aetiology of Signature 12 is currently unknown. Here, we found that it contributes 367 21.15%~21.99% to mutations in POLD1-mutant bMMRD genomes (inferred by 368 MutationalPatterns, Supplementary Fig. 5b,c), but much less for other bMMRD 369 samples (0~2.88% for POLE-mutant, and 3.32%~10.43% for POLE/POLD1-intact). 370 This suggests that Signature 12 is probably associated with Pol δ and that many de 371 novo mutations around strong nucleosomes arise from errors escaping Pol δ 372 proofreading. Surprisingly, Signature 10, known to be associated with Pol ε 373 deficiency, is absent from strong nucleosomal de novo SNVs (Fig. 3b). This 374 suggests that although both Pol ε and Pol δ have high proofreading escape ratios (i.e. 375 low fidelities) around strong nucleosomes (Fig. 4b), the majority of the replication 376 errors that are eventually converted to de novo mutations are derived from lagging 377 strand replicase Pol δ. humans are more frequently located in the linker regions (Supplementary Fig. 6) 383 rather than the dyads, suggesting that the mutagenic effects of Okazaki junctions are 384 different in the two organisms. This may partly be because yeast lacks the typical H1 385 histone found in human and other eukaryotes. However, the very short reads (single-386 ended 50bp) of OK-seq data restricted our analysis to nucleosomes with high 387 mappability (~10% of strong nucleosomes), limiting the strength of the conclusions 388 here. 389

Tubbs et al. (2018) studied the genome-wide distribution of DSBs using END-seq 393
and suggested that poly(dA:dT) tracts are recurrent sites of replication-associated 394 DSBs. Our analysis of this data revealed a higher frequency of DSBs around strong 395 nucleosomes compared with genomic background (Fig. 4c). The trend holds for 396 experiments with and without hydroxyurea treatment (HU, a replicative stress-397 inducing agent), suggesting that strong nucleosomes are endogenous hotspots (i.e. 398 without HU treatment) of DSBs during replication. It is notable that young Alu and L1 399 elements harbor prominent poly(dA:dT) tracts, which are enriched at the boundary 400 and linker regions of strong nucleosomes (Supplementary Fig. 7a). The patterns of 401 high DSB frequency still hold true when looking at strong nucleosomes associated 402 with different repeats (Supplementary Fig. 7b,c). However, because the END-seq 403 data were sequenced with single-ended 75bp reads and majority of young Alu and 404 L1 elements cannot be assessed with such short reads, we could not pursue further 405 detailed analysis. Since DSB repair can be error-prone (Rodgers and McVey 2016), 406 even using high-fidelity homologous recombination, frequent DSB formation and 407 subsequent error-prone repair likely contribute to the elevated mutation rates around 408 strong nucleosomes. 409

Strong nucleosome positioning is mostly associated with young repeat 410 elements and undergoes frequent turnover 411
Above, we highlighted that ~70% of strong nucleosomes are located in Alu and L1 412 retrotransposons (Supplementary Fig. 1d). Upon examination of the subfamilies 413 ( Fig. 5a,b), we uncovered a strong enrichment for evolutionarily young L1s (e.g. 414 L1PA2 to L1PA11) and Alus (e.g. AluY to AluSx). Since younger repeats have poorer 415 mappability, these observations probably underestimate the true enrichment. This 416 may also explain why several of the youngest L1 subfamilies (L1PA2 to L1PA5) have 417 lower enrichments than the slightly older subfamilies (Fig. 5a). 418 The preference for nucleosomes to occupy specific sections of Alu elements is 419 older elements. We also observed that younger Alus exhibit elevated de novo 424 mutation rates compared with old ones (Fig. 5c), and the weaker translational 425 SNVs and INDELs (Fig. 5c). Thus, there is an intriguing interplay between Alus, 427 strong nucleosomes and mutation rates. 428 The histone octamer is thought to preferentially bind DNA sequences presenting 429 lower deformation energy costs (Tolstorukov et al. 2008). We estimated deformation 430 energies using the nuScore software (Tolstorukov et al. 2008) based on the DNA 431 sequence and nucleosome core particle structure and we found that Alus do indeed 432 exhibit lower deformation energies than surrounding regions (Fig. 5c). Furthermore, 433 the energies of Alu elements tend to increase with age, suggesting that the 434 accumulated mutations in Alu sequences reduced their nucleosome-binding stability. 435 This is also supported by comparing deformation energies of Alu consensus 436 sequences (ancestral states) and those of current genomic sequences 437 (Supplementary Fig. 8a). We further analyzed the 3' end sequences of L1 elements 438 harboring strong nucleosomes and observed similar patterns (Supplementary Fig.  439 8b,c).  Since these results were mainly based on human polymorphisms or inter-species 459 divergence, indirect mutation rate measurements were potentially confounded by 460 selection and non-adaptive processes. The use of de novo mutations helps resolve 461 this debate to some extent. 462 As we showed above, there is considerable de novo mutation rate variation around 463 strong nucleosomes (Fig. 1e, Supplementary Fig. 1), which cannot be ignored in 464 any selection analysis. Furthermore, strong nucleosomes are clearly preferentially 465 present in young SINE/LINE elements and the strength of translational stability 466 decays substantially over time (Fig. 5). These observations support the re-positioning 467 model over a long evolutionary scale. Since a large majority of strong nucleosomes 468 associated with SINE/LINE elements are expected to become non-strong ones in 469 future, selection for preserving positioning might not be as widespread as previously 470 suggested, though it may happen at some particular regions or within a short 471 evolutionary scale. 472

Discussion 473
Though the involvement of nucleosome organization in DNA damage/repair 474 processes was recognised nearly 30 years ago (Smerdon 1991), its genome-wide 475 effects on germline mutation rates (particularly in higher eukaryotes) have remained 476 poorly understood. Our analysis combining large-scale de novo mutation and 477 nucleosome datasets in human provides several important insights into this topic. 478 A major finding is that strong translational positioning of nucleosomes is associated 479 with elevated de novo mutation rates, which is also supported by observations using 480 extremely rare variants in polymorphism data. The ability to use de novo mutations 481 here allowed us to bypass confounding evolutionary factors such as selection, thus 482 allowing direct assessment of the impact on background mutation rates. Importantly, 483 our statistical tests controlling for nucleosome occupancy and other related factors 484 confirmed the significant contribution of translational stability to mutation rate 485 variation. Therefore, we have discovered a novel factor that significantly modulate 486 germline mutation rate variation. 487 Investigating the underlying mutational processes responsible for this association 488 remains challenging. Nevertheless, we obtained several informative results regarding 489 potential mechanisms by leveraging published omics data related to DNA damage 490 and repair. In doing so, we revealed that MMR, replicase fidelity and DSB contribute 491 significantly to elevated mutation rates around strong nucleosomes. In particular, 492 multiple sets of ultra-hypermutated cancer data allowed us to quantify the 493 performance of MMR and replicases by calculating the repair escape ratios. The were considered in our analysis (Supplementary Fig. 1a). Extremely rare variants 554 classified the human genome into three groups based on the nucleosome contexts 564 (Fig. 1b): i) regions covered by translationally stable ('strong') nucleosomes; ii) 565 regions covered by rotationally but not stable translationally nucleosomes; and iii) the 566 remaining genomic regions. Chromosomes X and Y were excluded from analysis as 567 some other datasets used in our work lacked data for these chromosomes. As the 568 nucleosome maps we used were not derived from germ cells, for downstream 569 analysis we excluded the genomic regions in which nucleosome positioning were 570 found to differ between human embryonic stem cells and differentiated fibroblasts 571 we divided the one million strong nucleosomes into three categories of equal sizes 573 with different levels of stability -'high', 'middle' and 'low', which were used for 574 analysis shown in Fig. 1 and Supplementary Fig. 1. 575

Accounting for mappability 576
Sequencing read mappability can significantly affect variant calling results and other 577 aligned read-depth based measurements (e.g. nucleosome occupancy). The 578 sequencing reads for detecting de novo mutations used in our analysis were mainly 579 150bp paired-end reads, with fragment sizes ranging from 300-700bp 580 (Supplementary Fig. 1). We used the Genome Mappability Analyzer (GMA) (Lee and Schatz 2012) to generate the mappability scores for simulated paired-end 150 582 reads with fragment sizes set to be 400bp. Only the regions with GMA mappability 583 scores of >=90 (~2.59Gb) were considered for most analyses, unless specified 584 otherwise. We did not use the mappability tracks from ENCODE for the de novo 585 mutation data, because those tracks were only for single-ended reads. For some 586 analyses, additional filtering were applied if other associated datasets suffered from 587 more severe mappability issues. For measuring nucleosome occupancy, we used the 588 where denotes the probability that a genomic position is mutated (for 628 testing individual SNV mutation types, e.g. A>T, is the probability that a site is 629 mutated to a specific nucleotide), represents the observations for the considered 630 variables (categorical or continuous, e.g. d mean , d var , adjacent nucleotides, etc.), and 631 is the vector of parameters to be estimated. 632 We used the Bayesian logistic regression model implemented in the 'bayesglm' 633 (Gelman et al. 2008) of the R package 'arm', which was reported to perform well in 634 handling the complete separation issue in logistic regression models (Gelman et al. 635 2008). The complete separation issue is common when one class is rare relative to 636 the other and (or) there are many regressors in a model. As we had only ~300,000 637 de novo mutations, the probability for a given site to be mutated in our data is considered variables to a reduced model without one specific variable by performing 641 likelihood-ratio tests in R ('anova' function) to evaluate the significance for each 642 variable. The resulting p values of a set of likelihood-ratio tests were adjusted for 643 multiple testing with Benjamini-Hochberg correction. 644 To perform the regression analysis, we generated the data of all variables for the de 645 novo mutation sites and subsampled a fraction of the non-mutated sites as the 646 control sites. We did not use all the non-mutated sites in the genome as it would lead 647 to a large imbalance in the sizes of two classes ('mutated' and 'non-mutated') and 648 much larger computational burden. For de novo SNVs, we randomly generated For SNVs, we performed logistic regression tests for mutation types at A/T sites and 660 C/G sites separately and distinguished C/G sites in CpG and non-CpG contexts. We 661 also tested for nine individual SNV mutation types (three for A/T sites, three for C/G 662 sites at CpG contexts, and three for non-CpG contexts, Supplementary Fig. 3) The contribution of COSMIC mutational signatures (Alexandrov et al. 2013) to 700 different sets of mutations (de novo SNVs and somatic mutations from bMMRD 701 samples) was predicted using the 'fit_to_signatures' function in the R package 702 'MutationalPatterns' (Blokzijl et al. 2018). For the sets of de novo SNVs associated 703 with strong nucleosomes, the corrected frequencies described above were used for 704 running 'fit_to_signatures'. 705 proofreading during replication, producing unusually large numbers of mutations (so 707 called 'ultra-hypermutation') which facilitated our analysis. POLE mutated genomes 708 from PCAWG project (Campbell et al. 2017) were used to evaluate the differential 709 MMR efficiency between strong and non-strong nucleosome regions. We compared 710 the mutation densities in cancer genomes with POLE mutated and a deficient MMR 711 (4 individual samples) to those with POLE mutated and a proficient MMR (6 samples). 712 The MMR pathway was considered deficient if a driver mutation (annotated by the 713 PCAWG consortium) was found in one of five MMR core genes -MLH1, MSH2, 714 MSH6, PMS1 and PMS2. 715 For a given bin (10bp-size) in the meta-profile, we calculated the relative MMR 716 escape ratio relative to genomic background around strong nucleosomes as 717 described in the following formula, 718 where m i is the mutation density for the ith bin (observed number of mutations in the 719 ith bin divided by the bin size), and is the genome-wide average mutation density 720 of a specific sample group (observed number of mutations in the simulated windows 721 divided by the total window size), estimated by simulating random windows in the 722 genome. A similar logic was used when evaluating relative proofreading escape 723 ratios of Pol ε (mutated POLE) and Pol δ (mutated POLD1) using the somatic 724 mutation data from the bMMRD project (Shlien et al. 2015). 725 When analyzing PCAWG and bMMRD data, to account for potential mappability 726 issues, we focused on the highly mappable regions based on the CrgMapability 727 scores from ENCODE. We used CrgMapability scores here, which are more stringent 728 than GMA ones, because detecting somatic mutations in tumors is more difficult than 729 for ordinary individual re-sequencing data. We considered the strong nucleosomes 730 which have a 100mer CrgMapability score of 1 (meaning any 100-bp read from these 731 regions can be mapped uniquely in the genome) within ±800bp of the dyads. We 732 then simulated a same number of 1600bp-sized regions from the genome that satisfy 733 the mappability requirement to calculate the background mutation density. Note that 734 in theory the mappability issue in the relative escape ratios should be very small 735 because the two sets of samples have the same mappability for a given bin and the 736 ratio calculation normalizes the effects of different mappability among regions. 737 The raw reads of OK-seq data (Petryk et al. 2016) were downloaded from NCBI and 738 mapped to the human genome. We kept only the uniquely mapped reads for inferring 739 Okazaki junctions. The very 5' end sites of aligned reads (separating reads mapped 740 to Watson and Crick strands) were considered putative Okazaki junction signals. 741 To investigate DSBs around strong nucleosmes, we downloaded the genome-wide 742 tracks of human END-seq data (GSM3227951 and GSM3227952) (Tubbs et al. ('hg19.fa.align.gz') and mapped the hg19-based coordinates to the coordinates in the 756 consensus sequences. Strong nucleosomes appear to be under-detected in very 757 young L1 elements, which we think is due to difficulties in mapping short MNase-seq 758 reads (Alus are easier to map because they are much smaller). 759 Nucleosome deformation energies of all sites in the human genome were estimated 760 using nuScore (Tolstorukov et al. 2008). We also used nuScore to estimate the 761 deformation energies of Alu/L1 subfamily consensus sequences. For the L1 analysis 762 shown in Supplementary Fig. 8