Scarless engineering of the Drosophila genome near any site-specific integration site

We describe a simple and efficient technique that allows scarless engineering of Drosophila genomic sequences near any landing site containing an inverted attP cassette, such as a MiMIC insertion. This two-step method combines phiC31 integrase-mediated site-specific integration with homing nuclease-mediated resolution of local duplications, converting the original landing site allele into modified alleles that carry only the desired change(s). Dominant markers incorporated into the method allow correct individual flies to be identified efficiently at each step. In principle, single attP sites and FRT sites are also valid landing sites. Given the large and increasing number of landing site lines available in the fly community, this method provides an easy and fast way to edit the majority of the Drosophila genome in a scarless manner. This technique should also be applicable to other species.


Reverse genetics is a powerful tool to study the functions of genes and proteins. To answer many important biological questions, it is necessary to make precise genomic changes at base-pair resolution, preferably in a scarless manner, such that the final alleles only have the desired mutation(s). It is therefore important to have simple and efficient techniques for scarless genome engineering.

The fruit fly Drosophila melanogaster is well known for its superior genetic tool kit. There have been many efforts to precisely engineer the Drosophila genome. The first successful attempt used so-called ends-in targeting by homologous recombination to generate a local duplication, followed by homing nuclease-mediated resolution of the duplication (Rong et al., 2002). The final mutant alleles [...]

[...] RMCE line has been generated, our method takes less than two months to obtain a final scarless

Test of principle: engineering of the Antp locus by sequential resolution

The Hox gene Antennapedia (Antp) was selected for an initial test of this technique. There is a MiMIC insertion (Antp MI02272) in the intron between the first coding exon and the small second coding exon, where the so-called W-motif is located (Figure 2A) (Merabet and Mann, 2016). The W-motif, also called the hexapeptide, is a protein-protein interaction motif present in nearly all Hox proteins that mediates the interaction between Hox proteins and their shared cofactor, the TALE family [...]

Prior to testing the simultaneous resolution of both sides, we first tested the sequential resolution of each side (Figure 1-figure supplement 1). The right side was resolved first by expressing the homing endonuclease I-SceI (Figure 3A), which has an 18 bp recognition site that is not present in the Drosophila genome (Bellaiche et al., 1999). The hs-I-SceI flies were crossed to the RMCE flies (cross I) (Figure 3A), and their F1 embryos/larvae were heat-shocked at 37°C for 1 hour to induce I-SceI expression. 100 F1 adult males were then individually crossed to a balancer stock (cross II) (Figure 3A). Every fertile cross II produced at least one male progeny that had lost the 3xP3-RFP marker, suggesting a high efficiency. To ensure all resolved lines were independent, only one male that lost 3xP3-RFP from each individual cross II was selected to generate a stock (cross III) (Figure 3A).

In total, 94 independent right side-resolved lines were obtained, and 60 lines were randomly selected [...]

Simultaneous resolution of both sides

Next, we tested the simultaneous resolution of both sides, which would significantly simplify and shorten the entire process (Figure 1B). The overall procedure was similar to left-side or right-side resolution, except that both I-SceI and I-CreI were expressed together. The simultaneous resolution crosses for chromosome III targets are shown in Figure 4A, and those for chromosome II and X targets are shown in Figure 4-figure supplement 1. We tested heat shock at 37°C for 10, 20, 30 and 40 minutes, and found that a 20-minute heat shock gave the highest rate of productive cross II (data not shown), defined as the fraction of cross IIs that lead to a final stock (see Materials and Methods for more details).

To gain a better measure of the efficiency and robustness of this method, 8 different verified RMCE lines were subjected to simultaneous resolution (Figure 4B). After a 20-minute heat shock, essentially normal viability and fertility were observed. On the other hand, not all individual cross IIs generated male progeny that lost both the mini-white and the 3xP3-RFP markers (Figure 4B); as expected, we frequently observed cross II progeny that lost either mini-white or 3xP3-RFP, but not both.

Nevertheless, except for one RMCE line (line F), the rate of productive cross II ranged from 50% to 70%, confirming the high efficiency of simultaneous resolution (Figure 4B).

We selected the final alleles resolved from 3 different RMCE lines for further characterization. PCR was first used to genotype the selected alleles. Because Antp's W-motif is expected to be necessary for viability, only the presence of the 3xFLAG tag was examined for all homozygous viable final alleles. For selected homozygous lethal alleles, the presence of the 3xFLAG tag, the YPWM->AAAA mutation, as well as the potential right-side marker deletion were tested (Figure 4C). We detected some right-side marker deletion events, as expected. One homozygous lethal allele had the apparent genotype of 3xFLAG-Antp+. Presumably, an unwanted mutation occurred during [...]

[...] the initial orientation. We tested this with an RMCE line in the opposite orientation (Figure 4B).

Indeed, this line showed a resolution efficiency that was among the highest of all 8 tested RMCE lines.

To confirm the accuracy of the final alleles, we further characterized all 9 homozygous viable alleles generated from this particular RMCE line. Of these 9 alleles, 3 had the 3xFLAG-Antp genotype, while the other 6 were untagged (Figure 4C). We selected 2 of the 3 3xFLAG-Antp alleles for further

The sequences of these 2 alleles confirmed that there were no additional mutations. [...]

One fully verified Ubx landing site allele was selected as the starting strain for engineering the Ubx locus. A Ubx targeting plasmid was generated, which contained a 7.8 kb fragment with a 3xFLAG tag at the N-terminal end of the Ubx ORF and the YPWM->AAAA mutation (Figure 5B). This targeting plasmid was injected into the F1 progeny of the vas-int(X) females and the Ubx landing site males, and multiple independent RMCE lines were obtained and further verified by Southern blot. One fully verified RMCE line was subjected to simultaneous resolution, following the same procedure as for the Antp locus. From 100 individual cross IIs, we were able to achieve a success rate of ~50% ([...]

All final alleles were homozygous viable and fertile, and the homozygotes were verified in several steps (Figure 7E). First, the presence of the desired deletion was determined by PCR using primers flanking the deletion. Next, those alleles that generated the correctly sized PCR product were [...]

Advances in CRISPR-based techniques have made the engineering of the Drosophila genome much easier, but many custom mutant alleles generated with CRISPR still contain sequence scars.

Although generating scarless custom mutations in Drosophila is feasible, significant effort is required. And regardless of which CRISPR strategy is used, a major uncertainty is that the selected gRNA(s) [...]

[...] insertions with single attP sites, or even FRT sites, are also potential landing sites (see below). Finally, the 5 kb limit for genome modification is also a conservative estimate. Taken together, we estimate that with available landing sites, this method could be used to precisely engineer the majority of the fly genome in a scarless manner. In case there is no suitable landing site near the locus of interest, as in our engineering of the Ubx locus, a custom landing site can be generated to facilitate scarless genome editing.

Sequential resolution vs. simultaneous resolution

We have tested two different resolution strategies, sequential resolution and simultaneous resolution.

Simultaneous resolution is much faster and can generate the desired alleles from the RMCE lines in less than 2 months. Sequential resolution, on the other hand, takes longer because the one-side resolved alleles must be verified before the second side is resolved. The sequential resolution strategy, however, offers higher efficiency. Except for difficult mutations, over 90% of independent cross IIs were successful, and the failures were only due to sterile male flies. Therefore, when difficult mutations, such as large insertions or deletions, are to be generated, a sequential resolution strategy might be preferred. In fact, for the 7.5 kb Gr28b gene, all correct deletion alleles were obtained by sequential resolution, except that the first resolution occurred spontaneously during RMCE. When performing sequential resolution, the starting RMCE lines must have the correct orientation, but RMCE lines with the opposite orientation can be used for simultaneous resolution, without an apparent decrease in efficiency. [...]

[...] Nevertheless, given the high efficiency of this technique, we expect that the desired alleles can still be generated.

In addition, flippase (FLP)-mediated recombination between FRT sites has been used to integrate plasmids into the Drosophila genome in a site-specific manner (Horn and Handler, 2005). In principle, FRT sites could also be used as an initial landing site for this method. However, due to the bidirectional nature of recombination between FRT sites, the plasmid integration efficiency would be expected to be lower than that of the unidirectional attB-attP integration mediated by phiC31 integrase. Once successful integration events are obtained, the resolution step should work as well as it does for attB-attP integration events. Targeting vectors for single attP and FRT landing sites have been generated (Figure 1-figure supplement 2).

The general principle we demonstrate in this study is that any genomic locus can be engineered in a scarless manner if a DNA fragment can be integrated nearby. Due to the highly conserved homologous recombination pathways, we expect this principle to be applicable to other organisms. [...]

The TALEN plasmids were linearized by restriction digestion and gel purified, and were used as templates for in vitro transcription using the AmpliScribe SP6 Transcription Kit (Epicentre AS3106).

The mRNAs were then capped in a subsequent reaction using the ScriptCap m7G Capping System (Cellscript C-SCCE0625).

A vector, pCassette-ubiDsRed, was generated, which has a ubiDsRed-marked inverted attP cassette flanked by two different multiple cloning sites (MCS) for inserting homologous arms. Ubx-N-L and Ubx-N-R homologous arms were cloned into these two MCSs to generate the pCassette-

As mentioned in the Results section, some transformants lost the landing site marker, but only mini-white was present, and no 3xP3-RFP was observed. This class most likely resulted from spontaneous resolution of the right end during phiC31 integrase-mediated RMCE, in which dsDNA breaks were introduced within the attP and attB sites and could have triggered homologous recombination. The RMCE transformants were usually selected by the presence of mini-white, and the presence of 3xP3-RFP and the absence of the landing site marker were confirmed later.

Therefore, it is reasonable to expect that 3xP3-RFP+, white-, yellow-(ubiDsRed-) transformants also existed but were not identified. Spontaneous resolution of both ends during RMCE might also happen at low frequency.

The primer sequences for verifying MiMIC and RMCE alleles are in Supplementary file 2. [...]

For Cross I, several vials of crosses were set up, and the flies were allowed to acclimate for a few days. The adults were then allowed to lay embryos for 72 hours before being transferred to new vials, and the embryos/larvae in the old vials were heat shocked at 37°C. If I-SceI was the only homing nuclease expressed, a 1-hour heat shock was performed. A 20-minute heat shock was performed if I-CreI was involved, either with or without I-SceI (Note: in the sequential resolution reported here, a 40-minute heat shock was performed to induce I-CreI expression, but later results showed that a 20-minute heat shock might give better efficiency [...]

[...] if that Cross II might have produced more males that lost the desired marker(s).
For the purpose of easy scoring and comparison, a productive Cross II was defined as an individual Cross II that eventually generated a final stock. Occasionally, the selected single male from a Cross II was sterile, and this particular Cross II would be scored as non-productive. In some cases, the final stock from a Cross II might not be a correctly resolved allele (for example, it might be a right marker deletion event), but such a Cross II would be scored as productive according to the above definition.
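The scoring rule defined above can be illustrated with a minimal Python sketch. The record type `CrossII`, its field names, and the example data are hypothetical and introduced only for illustration; they are not taken from the study. The sketch assumes that a sterile selected male is recorded simply as a Cross II with no final stock.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CrossII:
    """One individual Cross II (hypothetical record; field names are illustrative)."""
    final_stock_established: bool  # False if, e.g., the selected single male was sterile
    correctly_resolved: bool       # recorded for later verification, ignored for scoring


def is_productive(cross: CrossII) -> bool:
    # By the definition above, only the existence of a final stock matters.
    # A stock carrying, e.g., a right marker deletion still counts as productive.
    return cross.final_stock_established


def productive_rate(crosses: Sequence[CrossII]) -> float:
    """Fraction of individual Cross IIs that eventually generated a final stock."""
    if not crosses:
        raise ValueError("at least one Cross II is required")
    return sum(is_productive(c) for c in crosses) / len(crosses)


# Made-up example data, not results from the study.
example = [
    CrossII(final_stock_established=True, correctly_resolved=True),
    CrossII(final_stock_established=True, correctly_resolved=False),   # marker deletion
    CrossII(final_stock_established=False, correctly_resolved=False),  # sterile male
    CrossII(final_stock_established=False, correctly_resolved=False),  # no marker loss
]
rate = productive_rate(example)
```

Keeping `correctly_resolved` separate from the scoring function mirrors the text: correctness of the allele is established in later verification steps and does not enter the productive Cross II rate.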

In this study, we calculated the fraction of the fly genome that can be accessed by our technique from a mapped MiMIC insertion, assuming 5 kb near a landing site can be reached without difficulty. The [...]

Since the mapped MiMIC lines represent only a subset of all available landing sites, and the estimate that 5 kb flanking a landing site can be engineered is a conservative one, the actual fraction of accessible fly genome is expected to be significantly larger than 50%.
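One way such an accessibility estimate could be computed is sketched below in Python. This is not necessarily the calculation used in this study, whose details are not reproduced here. It assumes that accessibility is measured in base pairs, that each insertion gives access to 5 kb on either side, and that overlapping windows are counted once. The function name, the chromosome names, and all coordinates are hypothetical.

```python
def accessible_fraction(insertions, chrom_lengths, reach=5_000):
    """Fraction of total sequence lying within `reach` bp of any insertion.

    insertions:    dict mapping chromosome name -> list of insertion positions (bp)
    chrom_lengths: dict mapping chromosome name -> chromosome length (bp)
    """
    covered = 0
    for chrom, positions in insertions.items():
        length = chrom_lengths[chrom]
        # Window around each insertion, clipped to the chromosome ends.
        windows = sorted((max(0, p - reach), min(length, p + reach)) for p in positions)
        current_start, current_end = None, None
        for start, end in windows:
            if current_end is None or start > current_end:
                # Disjoint from the running window: close it and start a new one.
                if current_end is not None:
                    covered += current_end - current_start
                current_start, current_end = start, end
            else:
                # Overlapping windows are merged so no base is counted twice.
                current_end = max(current_end, end)
        if current_end is not None:
            covered += current_end - current_start
    return covered / sum(chrom_lengths.values())


# Made-up toy coordinates, not real MiMIC positions.
toy_lengths = {"chrA": 100_000, "chrB": 50_000}
toy_insertions = {"chrA": [10_000, 12_000, 50_000], "chrB": [2_000]}
toy_fraction = accessible_fraction(toy_insertions, toy_lengths)
```

Under these assumptions, adding further landing sites or increasing `reach` can only increase the covered fraction, which is consistent with the argument above that an estimate based on mapped MiMIC lines and a 5 kb reach is conservative.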