Short-range, orientation-reversing template-switching events occur at a high frequency in the human and yeast genomes

The identification of structural variations in genomes using next-generation sequencing approaches greatly facilitates the study of genetic and genomic diseases. The data generated using these approaches also provide interesting new means to examine DNA repair, recombination, and replication to better understand sources of genomic instability. To better utilize these data, we developed SCARR (Systematic Combination of Alignments to Recreate Rearrangements) to identify DNA rearrangements, and used it to examine the occurrence of orientation-reversing events in human and budding yeast genomes. SCARR exceeds the sensitivity of previous genome sequencing approaches, and identifies rearrangements genome-wide with base-pair resolution, which helps provide insights into the mechanisms involved in their formation. We find that short-range orientation-reversing events occur at high rates in both human and yeast genomes. We quantified these rearrangements in yeast strains lacking various DNA repair factors, and propose that these short-range events often occur through template-switching events within a replication fork. We hypothesize that this mechanism may act as an error-prone alternative to fork reversal to restart stalled replication forks.

To avoid a bias for simple SVs, we developed SCARR (Systematic Combination of Alignments to Recreate Rearrangements), which uses no a priori assumptions about the relative positions of DNA sequences around a breakpoint. SCARR instead iterates through all possible genome-wide alignment combinations to find the best match. Using this approach, it is possible to identify deletions, tandem duplications and breakpoints resulting from template-switching events at base-pair resolution with reasonable accuracy. Consequently, SCARR is particularly well suited for the detection of rare events, such as SVs that result in loss of cell viability. This allows for the study of DNA rearrangements based on a number of parameters, such as microhomology usage, base-pair distance between the rearranged sequences, or SV type. In addition to linking specific SVs to phenotypes, our strategy can also examine how specific factors (DNA repair proteins, replication stresses, DNA damage) impact DNA stability in greater detail. In this study, we used SCARR on datasets from healthy human tissues as well as from several yeast mutant strains exposed to a range of stress conditions to examine patterns of SVs and mechanisms linked to short- and long-range orientation-reversing events.

SCARR sensitivity exceeds previous approaches to detect deletions, duplications, and inversions

To determine whether SCARR provides more sensitive detection of DNA rearrangements than previously-published computational methods, we used SVsim and WGSIM to generate simulated next-generation sequencing datasets of a genome containing rearrangements, as previously described [16]. SVsim generates rearrangements in a reference genome, and WGSIM simulates sequencing data from the resulting file. We then identified SVs in the simulated datasets using SCARR and LUMPY to compare their sensitivity and rate of false discovery at low coverage. Even at 1X sequencing coverage, SCARR identifies over 40% of deletions, duplications, and inversions, whereas the false discovery rate remains around 10% (Supplemental Fig S1). In contrast, LUMPY has a much higher sensitivity for inversions (23.7%) than for deletions (4.3%) and duplications (5.2%), but nevertheless only reports less than 25% of total inversions. Since SCARR identifies all rearrangements at base-pair resolution, the data can be used quantitatively. Its actual sensitivity reaches close to 50% when rearrangements that are identified multiple times are taken into consideration. As an additional control, we performed the same analysis on simulated data containing no SVs. These studies revealed that approximately half of the false positives generated by SCARR correspond to genuine SVs being misidentified, whereas the other half arise as a result of mapping ambiguities in the reference genome independently of any SVs (Supplemental Fig S2).

To test the accuracy of SCARR in identifying template switching events, we developed and used SimulateFoSTeS to generate 5X coverage paired-end sequencing in which FoSTeS events occur every 1000 read pairs (See Materials and Methods for details). Unlike SVsim and WGSIM, SimulateFoSTeS does not rearrange the reference genome prior to sequencing, but rather produces unique template-switching events from random positions on the given reference genome at a given rate. Of the 50,188 unique events simulated, SCARR was able to correctly identify 26,846 reads containing SVs, producing no false positives in the process (Supplemental Table S6). This sensitivity of 53.5% is consistent with the results obtained for other types of SVs using SVsim and WGSIM (Supplemental Fig S1). Of the 26,846 SVs identified, 21,349 (79.5%) perfectly matched both fragments of the breakpoint junction, and 5048 (18.8%) perfectly matched one fragment of the breakpoint junction (Supplemental Table S6). Non-matching reads often occurred as a result of repeated sequences in the human genome. This level of accuracy suggests that SCARR can be used to detect patterns of genome instability using rare events at low genome coverage.
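The percentages quoted above follow directly from the reported counts. As a quick check, the arithmetic can be reproduced with a few lines of Python; the variable names are illustrative and only the four counts come from the text.

```python
# Counts reported in the text for the SimulateFoSTeS benchmark.
simulated_events = 50_188       # unique template-switching events simulated
detected = 26_846               # reads containing SVs correctly identified
both_fragments_exact = 21_349   # both junction fragments matched perfectly
one_fragment_exact = 5_048      # exactly one junction fragment matched perfectly

# Sensitivity is the fraction of simulated events that were detected.
sensitivity = detected / simulated_events

# Breakdown of the detected events by junction accuracy.
both_exact_fraction = both_fragments_exact / detected
one_exact_fraction = one_fragment_exact / detected
neither_exact = detected - both_fragments_exact - one_fragment_exact

print(f"sensitivity:             {sensitivity:.1%}")
print(f"both fragments matched:  {both_exact_fraction:.1%}")
print(f"one fragment matched:    {one_exact_fraction:.1%}")
print(f"remaining detected reads: {neither_exact}")
```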

To test SCARR on a real dataset, we analyzed public datasets from whole genome sequencing of healthy human brain and liver tissues as paired-end reads of 101 bases. We identified a surprisingly high number of rearrangements for the sequencing coverage of the initial datasets, with over 17,000 rearrangement junctions per genome copy on average (Table 1 and Supplemental Tables 1-2). Following detection, SCARR classifies rearrangements as deletions, duplications, inversions or translocations. Since inversions are detected as single breakpoints for the most part, they include orientation-reversing events, in which two sequences of opposing orientation from the same chromosome are joined, as well as true inversions, in which a sequence is replaced by its reverse complement. For each dataset, we found that over 7000 SVs per genome copy are either deletions or duplications of less than 50 bp, and this accounts for approximately 40% of the SVs detected. Interestingly, the number of each type of rearrangement relative to coverage is very similar between the two datasets, with the exception of inversions, which are approximately 50% more abundant in the brain dataset.

To determine the effect that read length has on rearrangement detection using SCARR, we sequenced DNA from healthy human brain and spleen tissues as paired-end reads of 151 bases. These datasets yielded approximately twice as many deletions, duplications, and inversions per genome copy as the shorter reads from the publicly available datasets (Table 1 and Supplemental Tables 3-4). This improvement in sensitivity is most likely the result of better sequence alignments in longer reads, which helps compensate for the higher mutation rates found at rearrangement junctions. The added sequence length also increases the chance of successful BLAST+ alignments that do not overlap, with a less pronounced impact on the overlapping reads that result from sequence homology. This leads to a higher proportion of junctions that share little to no homology when longer reads are used (Table 1).

Short-range orientation-reversing events are frequent in the human nuclear genome

Since SCARR is based in part on an earlier approach that proved very useful in identifying replication U-turns in organelle genomes [11], we also examined the occurrence of these events in the human nuclear genome (Fig 1A). To do this with SCARR, we calculated total orientation-reversing events for each distance between 0 and 200 bases and normalized them to genome coverage (Fig 1B). In all samples, we found that a large majority of short-range orientation-reversing events occur at distances under 50 bases, with a maximum peak at distance 0.
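The distance profile described above can be sketched as a simple tally. The following is a minimal illustration, not the script used in this study: the function names, the input format (a flat list of per-event distances in bases) and the single coverage value are assumptions made for the example.

```python
from collections import Counter


def orientation_reversal_profile(distances, coverage, max_distance=200):
    """Count events at each distance 0..max_distance and divide by coverage.

    `distances` is a hypothetical list of per-event distances in bases and
    `coverage` is the sequencing coverage of the dataset (e.g. 6.0 for 6X).
    """
    counts = Counter(d for d in distances if 0 <= d <= max_distance)
    return [counts.get(d, 0) / coverage for d in range(max_distance + 1)]


def short_range_fraction(profile, cutoff=50):
    """Fraction of the profile found at distances strictly under `cutoff`."""
    total = sum(profile)
    return sum(profile[:cutoff]) / total if total else 0.0


# Toy input: six events, one of which (250) falls outside the 0-200 window.
profile = orientation_reversal_profile([0, 0, 3, 250, 49, 120], coverage=2.0)
print(profile[0], profile[3], short_range_fraction(profile))
```

Normalizing to coverage makes profiles comparable between datasets sequenced at different depths, which is the reason the counts are divided rather than reported raw.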

When SCARR fails to explain a read as a combination of two shorter alignments, the script then attempts to explain it as a combination of three alignments. These SVs, which we refer to as paired rearrangements, can provide additional context to rearrangements within a single DNA molecule. Since uninterrupted replication following orientation reversal results in acentric or dicentric chromosomes [9], such events can result in large sequence alterations that would be deleterious to the cell. We therefore investigated paired rearrangements in human datasets with 151-bp reads to determine whether or not orientation-reversing events are followed by a second switch that restores replication in the original direction. Since this requires two rearrangements within one read length, the brain and spleen datasets yielded only 920 and 802 total paired rearrangements (Supplemental Tables 3-4), compared to 343,899 and 273,099 single rearrangements, respectively.

Fifteen of the paired rearrangements in the brain dataset and fourteen in the spleen dataset are paired inversions, in which at least one inversion occurred at a distance of less than 50 bp (Supplemental Tables 3-4). Interestingly, 5 of the total paired orientation-reversing events were identified 2 to 4 times on independent DNA fragments. Considering the sequencing coverage of approximately 6X for each dataset, these independently-identified paired inversions likely correspond to heterozygous alleles in the individuals from which the samples were obtained. In all cases except one, the longest distance of the two events is less than 400 bp, suggesting that either the nascent strand reanneals to its original template following a template switch or that a second orientation-reversing event occurs. The SVs produced can be complex, creating tandem inverted duplications which result in some DNA segments being triplicated (Fig 2A). In other cases, the SVs are almost perfect true inversions of short sequence fragments (Fig 2B). Given their tendency to form hairpins, we have been unable to validate short-range orientation-reversing events by PCR. However, we were able to amplify and sequence three of the paired orientation-reversing events to confirm the alignments obtained with SCARR, which also supports their presence as allelic variants (Supplemental Fig S3).

To explore the proteins and factors involved in short-range orientation-reversing events in eukaryotic

We used SCARR to identify rearrangements in the yeast datasets, and analyzed the patterns for [...] (Fig 4). Since NHEJ is mostly active before S phase, we obtained the most striking results for these two strains in the presence of CPT + αF. Although these datasets showed a decrease in short-range orientation reversal compared to the wild-type (WT) strain, they also yielded some of the lowest numbers for long-range orientation reversal. In contrast, all other conditions showed only minor differences between the WT, dnl4-Δ and yku70-Δ strains, suggesting that HU stress mostly creates DSB ends between S and M phase, when recombination pathways are active.

These results suggest that most long-range events and some short-range events require the end-joining machinery during the G1 phase when DSB end-resection is suppressed. However, the milder decrease in short-range orientation-reversing events in these conditions suggests that they also occur through an alternate mechanism that requires neither end-joining nor end-resection.

Extensive end-resection leads to long-range orientation reversal under HU stress

Since Exo1 and Sgs1 are involved in extensive DSB end resection, which is necessary for homologous recombination (HR) [19], their absence was expected to lead to an increase in rearrangements from error-prone mechanisms after NOC treatment. Indeed, the exo1-Δ and sgs1-Δ strains both showed increases in long-range orientation reversal in the absence of stress. However, these strains presented almost no difference from the WT strain under CPT + NOC stress, with only a modest increase in short-range orientation reversal in the absence of Exo1 (Fig 4). The increase in long-range events observed in the absence of stress therefore seems to occur outside of the G2/M phase. Interestingly, the exo1-Δ and sgs1-Δ strains also showed a dramatic decrease in long-range events under HU stress (Fig 4 and Supplemental Fig S4). The distance distribution for the Exo1- or Sgs1-dependent events observed in the WT strain falls well within the expected resection range of the two nucleases [19]. This is consistent with an increase in snap-back DNA synthesis following long resection involving Exo1 or Sgs1 under HU stress.

The impact of Mre11 on short-range orientation reversal events seems independent of its role in end resection

The Mre11-Rad50-Xrs2 (MRX) complex and Sae2 are known to participate in the initiation of resection by Exo1 [20][21][22]. As such, we expected the mre11-Δ and sae2-Δ strains to show effects similar to those of the exo1-Δ strain. However, the sae2-Δ strain yielded similar levels of orientation reversal to the WT strain under all four conditions, except for a reduction in long-range events with CPT + αF stress, similar to the exo1-Δ and sgs1-Δ strains (Fig 4). In contrast, the mre11-Δ strain presented short-range orientation reversal levels that are noticeably different from all other mutants (Fig 4). The mre11-Δ strain displayed more short-range events than any other yeast deletion strain in the absence of stress, but fewer short-range events than the WT strain under HU and CPT + αF stresses. The mre11-Δ strain also presents a reduction in long-range events comparable to the dnl4-Δ and yku70-Δ strains in the presence of CPT + αF. These differences between mre11-Δ and other mutants involved in end resection suggest that Mre11 might play a distinct role in short-range orientation reversal events.

Short-range orientation-reversing events do not seem to occur through recombination

In a previous study, we found that the single-stranded DNA-binding Whirly proteins and the bacterial-

Mgs1 possesses single-strand annealing activity, and the mgs1-Δ mutation was shown to cause an increase in recombination and genome instability after replication [28]. Surprisingly, our datasets with the mgs1-Δ strain showed a decrease in long-range orientation reversal compared to the WT strain under CPT + NOC and CPT + αF stresses, with little change to short-range events (Fig 4). The opposite was observed in the presence of HU, with long-range events maintaining a level similar to the WT, but short-range events being reduced by more than half. These effects are consistent with different roles for Mgs1 in the presence or absence of DNA damage [29].

In our results, neither the srs2-Δ nor the mph1-Δ strain showed any large variation from the WT strain under CPT + NOC stress, suggesting a limited impact for homologous recombination pathways on orientation reversal events.

Replication stress contributes to short-range orientation reversal events

In the absence of stress, the rad9-Δ strain presented the lowest level of short-range inversions, but the highest level of long-range events (Fig 4). Compared to the WT strain, this strain displayed fewer short-range orientation reversal events under HU stress, but more under CPT + NOC. The effect of Rad9 on short-range events therefore seems to be different between conditions of replication stress and DNA damage.

Pol32 is an error-prone polymerase that was found to be responsible for replication during repair of [...] investigated the length of sequence similarity involved in short-range events in our data. Since we are mostly interested in short-range events that occur through mechanisms that do not produce long-range events, we looked in more detail at yeast deletion mutants and stress conditions that suppress long-range orientation reversal. These conditions are also likely to suppress short-range events occurring through the same mechanisms, and should therefore yield a pattern of sequence similarity requirement more specific to short-range template-switching events. We then compared these mutants to the WT strain without induced stress.

In the WT dataset, we observed two main peaks with similar profiles for both short- and long-range orientation reversal events: at 0 bases of homology, and between 4 and 15 bases of homology (Fig 5A). In each case, the two peaks reach approximately the same maximum values. In contrast, the dnl4-Δ, yku70-Δ, mre11-Δ and rad51-Δ strains in the presence of the CPT + αF stress all yielded the same peaks, but with a higher relative proportion for short-range events with 0 bases of homology compared to homologies between 4 and 15 bases (Fig 5B and Supplemental Fig S5). The same pattern is also observed for the exo1-Δ and sgs1-Δ strains under HU stress (Fig 5C and Supplemental Fig S5). Interestingly, the human datasets with reads of the same length also presented the same pattern for short-range events, in spite of having noticeably different patterns for long-range events (Fig 5D). These results further support a distinct mechanism for the short-range orientation-reversing events that occur in the human nuclear genome, and suggest that sequence homology is not a requirement in this mechanism.

Replication U-turns were initially observed in fission yeast using a reporter system with perfect 2.6 kb inverted repeats, and were detected with repeats as short as 150 bp [9,10]. From these results, it was suggested that shorter homologies may be sufficient for U-turns to occur at an appreciable rate. By analyzing the genome-wide occurrence of short-range orientation reversal events in budding yeast, we observe that a significant proportion of these rearrangements occur independently from end-joining mechanisms and in the absence of end resection. This is consistent with a replication-based template-switching mechanism. We also note that these short-range orientation-reversing template-switching events may be favored when the nascent strand anneals to the new template using inverted repeats, but that they also occur frequently in the absence of sequence homology at the breakpoint junction. These results support a model where replication U-turns form when a template switch occurs before a stalled fork collapses. The availability and proximity of single-stranded DNA within a replication fork could explain their frequent occurrence in the absence of sequence similarity. It is possible that the nascent strand uses a stretch of sequence similarity a few bases away from its extremity to anneal to a new template and resume replication using a low-fidelity polymerase.

We also report the detection of short-range orientation-reversing template-switching events in the human nuclear genome with distance and homology usage patterns that closely resemble those observed in yeast. In some cases, two such orientation reversal events are paired to return the DNA strand to its original orientation. A similar mechanism was recently proposed to explain mutation clusters observed between chimpanzee and human reference genomes, as well as between the assembled genomes of individual humans [33]. The rate at which we identified short-range orientation reversal events in healthy human tissue further supports the idea that they play a role in both the evolution of genomes and the appearance of genetic diversity within species. Interestingly, similar rearrangement patterns as described in Fig 2A, [...] proposed to occur via a recombination event followed either by an end-joining or break-induced replication event. However, considering the large difference in scale between these rearrangements and those identified by SCARR, it is unclear whether they arise from the same mechanism.

The presence of U-turns in the human nuclear genome also suggests a new mechanism by which stalled replication forks can be restarted (Fig 6). A previous study has found that the hypersensitivity of BRCA2-deficient cells to DNA-damaging agents depends on the recruitment of MRE11 to stalled replication forks [37]. BRCA2, together with RAD51 and the SMARCAL1 helicase, participates in the conservative restart of stalled forks through fork reversal [38]. Though replication U-turns result in alterations in the genome sequence, they also allow DNA synthesis to resume without creating DSB intermediates. Combined with our results in yeast, this raises the possibility that this type of mechanism may play a role in the chemoresistance observed in BRCA2-deficient cells. Since HR is impaired in BRCA2-deficient cells, fork restart through U-turns could explain a reduced sensitivity to fork stalling in the absence of fork reversal. The rate at which we observe short-range orientation-reversing template-switching events in healthy tissues and the proposed evolutionary model also suggest that the SVs resulting from U-turns can occur with little to no deleterious effects [33].

Since datasets with longer reads improve the sensitivity of SCARR, next-generation sequencing technologies that produce longer read lengths will be an important avenue to explore in the future. The current version of SCARR checks each read for either one or two rearrangements, but the approach can be extended to any number of rearrangements per read, provided the sequences are long enough to successfully align a sufficient number of fragments. As such, long reads will also help to provide context for rearrangements within single molecules. This possibility will be particularly interesting in the study of orientation-reversing template-switching events, since they require a second template-switching event to occur in order to resume DNA synthesis in the original direction.

SVs were randomly created using SVsim (https://github.com/GregoryFaust/SVsim) to generate 2,500 deletions, duplications, and inversions in the human reference genome (build 38) from which the mitochondrial genome has been removed. The variants each range from 100 bp to 10,000 bp, as described previously [16]. Virtual sequencing was performed using WGSIM (https://github.com/lh3/wgsim) to a coverage of 1X, using paired-end reads of 150 bp and default settings for insert size and error rates, also as per Layer et al. 2014. Rearrangements with lengths that differ by more than 20 bases from the simulated SV lengths were classified as false positives.

The same sequencing was also performed on the unmodified human reference genome GRCh38 (minus the mitochondrial genome), with default error and indel settings. Rearrangements were detected with SCARR 1.0 and LUMPY 0.2.13.
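The 20-base length criterion used to separate true from false positives in the SVsim benchmark can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the evaluation code used here: calls and simulated SVs are represented as hypothetical (type, length) tuples, matching by SV type is an assumption, and any matching on genomic position is deliberately left out.

```python
def classify_calls(called, simulated, tolerance=20):
    """Split called rearrangements into true and false positives.

    `called` and `simulated` are hypothetical lists of (sv_type, length)
    tuples. A call counts as a true positive when a simulated SV of the
    same type has a length within `tolerance` bases of the called length.
    """
    true_pos, false_pos = [], []
    for sv_type, length in called:
        matched = any(
            sim_type == sv_type and abs(length - sim_length) <= tolerance
            for sim_type, sim_length in simulated
        )
        (true_pos if matched else false_pos).append((sv_type, length))
    return true_pos, false_pos


def false_discovery_rate(true_pos, false_pos):
    """Fraction of all calls that are false positives."""
    total = len(true_pos) + len(false_pos)
    return len(false_pos) / total if total else 0.0


# Toy example: the 530-bp deletion differs by 30 bases and is rejected.
simulated = [("deletion", 500), ("inversion", 1200)]
called = [("deletion", 510), ("deletion", 530), ("inversion", 1185)]
tp, fp = classify_calls(called, simulated)
print(tp, fp, false_discovery_rate(tp, fp))
```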

Additional library and sequencing information relating to the datasets was previously described [39].

Human DNA samples

Human DNA was purchased commercially from BioChain, pre-extracted from the following healthy

The bands corresponding to SVs were gel-extracted and TOPO cloned for Sanger sequencing.

[...] sequences or less and the work was parallelized. The output from BLAST+ was given as input to SCARR with default parameters. SCARR is a custom Python script available on GitHub that tests all possible combinations of BLAST+ alignments to determine whether a satisfactory match can be found (Supplemental Fig S6). Additional custom scripts were used to sort and organize the output rearrangement files from SCARR, and are also available on GitHub.
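The idea of testing all combinations of alignments for a single read can be sketched as follows. This is a minimal illustration and not the SCARR implementation: the `Alignment` fields, the coverage-based scoring and the `min_fraction` threshold are assumptions chosen for the example, and only the pairwise case is shown.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Alignment:
    chrom: str
    strand: str
    q_start: int  # first aligned read position (1-based, inclusive)
    q_end: int    # last aligned read position (1-based, inclusive)


def covered_bases(first, second):
    """Number of read positions covered by the union of two alignments."""
    total = (first.q_end - first.q_start + 1) + (second.q_end - second.q_start + 1)
    overlap = max(0, min(first.q_end, second.q_end)
                  - max(first.q_start, second.q_start) + 1)
    return total - overlap


def best_pair(alignments, read_length, min_fraction=0.9):
    """Return the pair of alignments that explains the most of the read.

    Every pair is tested. Pairs in which one alignment is contained in the
    other are skipped, and the best pair is returned only if it covers at
    least `min_fraction` of the read (an assumed threshold).
    """
    best, best_cov = None, 0
    for a, b in combinations(alignments, 2):
        first, second = sorted((a, b), key=lambda x: (x.q_start, x.q_end))
        if second.q_end <= first.q_end:
            continue  # second alignment adds no new read positions
        cov = covered_bases(first, second)
        if cov > best_cov:
            best, best_cov = (first, second), cov
    if best is not None and best_cov >= min_fraction * read_length:
        return best
    return None


# Toy 151-base read with three candidate alignments.
hits = [
    Alignment("chr1", "+", 1, 80),
    Alignment("chr1", "-", 76, 151),
    Alignment("chr2", "+", 1, 40),
]
print(best_pair(hits, read_length=151))
```

Because no pair is excluded on the basis of chromosome, strand or genomic distance, a search of this kind makes no prior assumption about the type of rearrangement that produced the read.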

For each rearrangement, microhomology length is determined by the number of bases from the original read that are aligned to both sides of the breakpoint. When some bases at the breakpoint fail to align to either side, they are counted as inserted bases. When the alignments map to different chromosomes, the rearrangement is labeled as a translocation, and no further analysis is performed.
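The microhomology and inserted-base definitions above reduce to a comparison of read coordinates at the junction. The sketch below assumes 1-based, inclusive read positions for the two alignments; the function names are illustrative and not taken from SCARR.

```python
def junction_features(first_q_end, second_q_start):
    """Return (microhomology, inserted_bases) for a two-alignment junction.

    `first_q_end` is the last read position of the upstream alignment and
    `second_q_start` is the first read position of the downstream alignment
    (both 1-based, inclusive). Read bases covered by both alignments count
    as microhomology; read bases covered by neither count as inserted.
    """
    overlap = first_q_end - second_q_start + 1
    microhomology = max(0, overlap)
    inserted_bases = max(0, -overlap)
    return microhomology, inserted_bases


def is_translocation(chrom_a, chrom_b):
    """Alignments on different chromosomes are labeled as a translocation."""
    return chrom_a != chrom_b


print(junction_features(80, 76))  # alignments share read positions 76-80
print(junction_features(80, 81))  # alignments abut exactly
print(junction_features(80, 84))  # read positions 81-83 align to neither side
```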

When both alignments map to the same chromosome, their relative directions are verified. When they are in opposing directions, the rearrangement is labeled as an inversion, and the distance represents

SimulateFoSTeS simulates rare template-switching events from an unmodified reference genome.

Simulated template-switching events do not take into consideration any sequence similarity at the