Abstract
Here we describe a simple strategy for tagging genes in mammalian cells. The method enables efficient creation of endogenously expressed protein fusions and requires only a PCR to generate a DNA fragment, which avoids the handling of RNAs or recombinant proteins and the cloning of plasmids. The fragment, termed ‘PCR cassette’, is transfected into cells along with a CRISPR/Cas12a helper plasmid and integrates into the target locus specified by sequences provided by the oligonucleotides used for PCR. The method is robust and works in all cell lines tested, with tagging efficiencies of up to 20% without selection and up to 60% when selection markers are used.
Introduction
In mammalian cells, chromosomal ‘knock-ins’ for applications such as gene tagging are typically done using an endonuclease-based strategy to promote integration of the desired fragment via homology directed repair (HDR) or non-homologous end joining (NHEJ). This requires suitable reagents, often including recombinant proteins, RNAs, single-stranded DNA (ssDNA) or tailored, gene-specific plasmids, to provide all the components necessary for integration (Yamamoto and Gerbi, 2018). This makes tagging rather cumbersome, time consuming and costly. In yeast, genomic tagging is done using a PCR-based strategy (Baudin et al., 1993; Wach et al., 1994), now commonly referred to as ‘PCR tagging’. It requires two gene-specific DNA oligonucleotides and a generic ‘template plasmid’ that provides the tag and a selection marker to generate a ‘PCR cassette’. Directed by the oligonucleotide sequences, this PCR cassette integrates into the genome by HDR, owing to the efficient homologous recombination machinery of yeast. Despite various improvements through CRISPR/Cas9 applications, this procedure remains the de facto standard in yeast research for rapid functional analysis using genomic modifications of genes. A similar procedure for gene tagging in mammalian cells would be highly desirable.
Here, inspired by recent improvements of the method in yeast (Buchmuller et al., 2018, bioRxiv), we describe ‘mammalian PCR tagging’, an adaptation of the PCR tagging process for gene tagging in mammalian cells. To enhance site-specific integration of the PCR cassette, we incorporated a CRISPR/Cas12a (Cpf1)-based strategy (Zetsche et al., 2015), while retaining the simplicity and effectiveness of the yeast procedure. As in yeast, mammalian PCR tagging involves the direct transfection of a PCR cassette into cells (Fig. 1a). Generating the PCR cassette is simple: it requires two gene-specific oligonucleotides (termed M1 and M2 tagging oligos) to conduct a PCR on a template plasmid (Fig. 1b). The M1 and M2 tagging oligos provide homology arms that specify the integration site. To promote integration by homologous recombination via a double strand break near the integration site, we incorporated into the M2 oligo a CRISPR RNA (crRNA) sequence for Cas12a. Because the ‘template plasmid’ provides a U6 polymerase III (Pol III) promoter, the PCR generates a cassette that contains a functional crRNA gene. In contrast to the yeast method, a Cas12a helper plasmid therefore needs to be co-transfected to provide the endonuclease (Fig. 1a).
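To make the oligo logic concrete, the following Python sketch assembles M1/M2 tagging oligos and simulates the resulting PCR product as a string. It assumes a top-strand cassette layout of upstream homology arm, template sequence ending in the U6 promoter, crRNA direct repeat, spacer, poly-T Pol III terminator and downstream homology arm; this layout is our simplification, not the published design. The annealing sequences ANNEAL_M1 and ANNEAL_U6 are hypothetical placeholders, and the assumed LbCas12a direct repeat should be checked against the design tool before any use.

```python
"""Sketch: assembling M1/M2 tagging oligos and simulating the PCR cassette.

ANNEAL_M1 and ANNEAL_U6 are HYPOTHETICAL template-plasmid annealing sites,
not the sequences of the published template plasmids.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")

ANNEAL_M1 = "TCAGGTGGAGGTGGATCT"    # hypothetical: anneals at the 5' end of the tag
ANNEAL_U6 = "GACTAGTCCGAAGGTACCAT"  # hypothetical: top strand at the 3' end of the U6 promoter
DIRECT_REPEAT = "TAATTTCTACTAAGTGTAGAT"  # assumed LbCas12a crRNA repeat (DNA); verify
POL3_TERMINATOR = "TTTTTT"               # poly-T run that stops Pol III transcription


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_m1(upstream_arm: str) -> str:
    """M1 (forward primer): genomic sequence 5' of the insertion site + annealing part."""
    return upstream_arm + ANNEAL_M1


def design_m2(downstream_arm: str, spacer: str) -> str:
    """M2 (reverse primer), written 5'->3'.

    Assumed top-strand layout: U6 end | direct repeat | spacer | poly-T | downstream arm.
    The primer is the reverse complement, so its 3' end anneals to ANNEAL_U6.
    """
    top = ANNEAL_U6 + DIRECT_REPEAT + spacer + POL3_TERMINATOR + downstream_arm
    return revcomp(top)


def simulate_pcr(template: str, m1: str, m2: str) -> str:
    """Return the top strand of the product of `template` amplified with M1 and M2."""
    fwd_site = m1[-len(ANNEAL_M1):]       # 3' part of M1 that anneals
    rev_top = revcomp(m2)                 # M2 expressed on the top strand
    rev_site = rev_top[:len(ANNEAL_U6)]   # part of M2 that anneals
    start = template.index(fwd_site)
    end = template.index(rev_site) + len(rev_site)
    # 5' tail of M1 + amplified template region + 5' tail of M2 (top-strand view)
    return m1[:-len(ANNEAL_M1)] + template[start:end] + rev_top[len(ANNEAL_U6):]
```

With a toy template of the form `ANNEAL_M1 + tag + ANNEAL_U6`, the simulated product starts with the upstream arm and ends with direct repeat, spacer, poly-T and downstream arm, mirroring the cassette layout assumed here.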
M1 and M2 tagging oligos can be rapidly designed using an online tool (www.pcr-tagging.com). Template plasmids are generic, since they can be used with any M1/M2 tagging oligo pair (Fig. 1b). A template plasmid provides the desired tag (e.g. GFP) along with a terminator and can contain additional features such as a selection marker. The PCR generates a cassette that contains three essential functional elements (Supplementary Fig. S1): a Cas12a crRNA gene to direct the endonuclease to the target locus, flanking sequences homologous to the target gene, and the tag itself. For integration, the cassette is co-transfected with a plasmid encoding Cas12a. Inside the cell, the crRNA and Cas12a are expressed and assemble into a functional complex that cleaves the target gene (Fig. 1c). The resulting double strand break (DSB) then stimulates DNA damage repair, which can occur via different pathways. In one of them, the DSB is repaired using the transfected PCR cassette, whose homology arms match the regions adjacent to the cleavage site. Only this pathway yields the desired integrants expressing the appropriately tagged protein from the target locus.
Results
Implementation of mammalian PCR tagging and optimization of procedures
To test whether our approach could be used for efficient gene tagging in mammalian cells, we designed template plasmids containing the bright green fluorescent protein mNeonGreen (Shaner et al., 2013). For tagging, we selected 15 genes encoding proteins with diverse cellular localizations (Supplementary Table S1) and with endogenous expression levels high enough (Geiger et al., 2012; Schaab et al., 2012) for easy detection of the corresponding mNeonGreen fusion proteins by fluorescence microscopy. We co-transfected the PCR cassettes together with a Cas12a-encoding plasmid into HEK293T cells and inspected the cells three days later for fluorescence. For all genes, we observed between 0.2% and 13% fluorescent cells with the expected protein-specific localization pattern (Fig. 1d), e.g. endoplasmic reticulum staining for CANX, mitochondrial staining for TOMM20, or diffuse and dotted nuclear staining for HNRNPA1 and PCNA, respectively (Fig. 1e). The appearance of cells with correctly localized fluorescence depended on the presence of Cas12a and on matching combinations of homology arms and crRNA, irrespective of whether these were on the same or on different PCR products (Fig. 1f). In the presence of a crRNA for a locus different from the one targeted by the homology arms, we very rarely found cells in which the cassette had integrated into the foreign locus, indicating that integration pathways other than HDR, such as NHEJ, are also used (Fig. 1f and Supplementary Fig. S2). Together, these results establish that the crRNA is transcribed from the transfected PCR cassette and directs Cas12a to cleave the target locus. Furthermore, we conclude that the Cas12a-mediated double strand break is frequently repaired by HDR.
In addition to cells with the expected localization of green fluorescence, in several transfections we also observed cells with diffuse cytoplasmic fluorescence of variable brightness (Fig. 1d-e, see examples labeled with arrows in 1e). This fluorescence was independent of Cas12a and of matching combinations of crRNA and homology arms (Fig. 1f), indicating that the non-specific cytoplasmic signal resulted from the transfected PCR cassettes alone.
Non-specific cytoplasmic fluorescence is caused by unstable extra-chromosomal fragments
The nature of the diffuse cytoplasmic fluorescence observed in a fraction of the cells was unclear. It could originate either from extra-chromosomal fragments in the nucleo-cytoplasm or from fragments that had integrated into the chromosomes at off-target loci. To investigate the fate of the transfected fragments three days after transfection, we used Anchor-Seq (Meurer et al., 2018). This method amplifies all junctions between the tagging cassettes and their local DNA neighborhoods for analysis by next generation sequencing (Fig. 2a). This analysis detected junctions resulting from correctly inserted PCR cassettes (Fig. 2b), consistent with the observed correctly localized fluorescence. We also inspected the sequences for signatures of off-target integration, but could not detect any. However, the detection sensitivity was limited by a large number of reads that did not extend beyond the sequence of the M1 or M2 tagging oligos (Fig. 2b), suggesting that they derived from transfected PCR cassettes that were still present in the cultured cells. In addition, we detected a substantial fraction of reads originating from ligated ends of transfected cassettes, consistent with their recognition and ‘repair’ by NHEJ. Different types of fusions were detected (Fig. 2b), and their frequencies were not distributed as expected from random joining. In fact, the most frequent fusion type joined the right and the left arm of the PCR fragment (LR homo fusion, Fig. 2b). This is best explained by intramolecular fusion of the two ends of the same PCR cassette, leading to its circularization.
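The statement that fusion types were not distributed as expected from random joining can be made quantitative with a simple null model. Assuming each single-gene cassette contributes one left (L) and one right (R) end and that ends pair at random in the large-number limit, LL, LR and RR junctions are expected at 1/4, 1/2 and 1/4. The sketch below computes observed/expected ratios from junction calls; the input format (pairs of end labels) is a hypothetical simplification of Anchor-Seq output, not the study's analysis pipeline.

```python
from collections import Counter

# Null model: random pairing of L and R ends from single-gene cassettes.
RANDOM_EXPECTATION = {"LL": 0.25, "LR": 0.5, "RR": 0.25}


def junction_type(end_a: str, end_b: str) -> str:
    """Classify a junction by its two cassette ends, ignoring order ('R','L' -> 'LR')."""
    return "".join(sorted((end_a, end_b)))


def observed_over_expected(junctions):
    """Return observed/expected ratios for LL, LR and RR junctions.

    `junctions` is an iterable of (end_a, end_b) pairs labeled 'L' or 'R'.
    Ratios above 1 indicate enrichment relative to random end joining.
    """
    counts = Counter(junction_type(a, b) for a, b in junctions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no junctions given")
    return {kind: (counts[kind] / total) / p for kind, p in RANDOM_EXPECTATION.items()}
```

An LR ratio well above 1 is what intramolecular circularization would add on top of random intermolecular joining.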
To validate the occurrence of cassette fusions, we transfected a mixture of PCR cassettes for >15 different genes into the same cells. This revealed hybrid fusions between PCR cassettes for different genes, validating the idea that the cassettes are ligated together after transfection, e.g. via NHEJ-mediated DNA damage repair (Fig. 2b). Nevertheless, the LR homo fusion remained the most abundant event in the transfection of the mixture as well, which is best explained by a preference for intramolecular ligation. Together, these data support the idea that small mini-circles are the most frequent outcome of DNA repair processes upon transfection of the PCR cassettes.
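A simple null model illustrates why the mixed transfection is informative. With n equimolar cassette species, each carrying one L and one R end, random intermolecular joining pairs two ends of the same species with a probability of roughly 1/n, so LR homo junctions are expected at about 1/(2n). The sketch below is a back-of-the-envelope model under these assumptions, not an analysis from the study.

```python
def expected_fractions(n_species: int) -> dict:
    """Expected junction fractions under random intermolecular joining.

    Assumes n equimolar cassette species, each with one L and one R end,
    the large-number limit, and no intramolecular circularization.
    """
    if n_species < 1:
        raise ValueError("need at least one cassette species")
    same = 1.0 / n_species
    return {
        "homo": same,            # both ends from the same gene
        "hetero": 1.0 - same,    # ends from different genes
        "LR_homo": 0.5 * same,   # L end joined to R end of the same gene
    }
```

For a hypothetical mixture of 16 species, LR homo junctions would be expected at only about 3% of junctions, so finding them as the most abundant class points to intramolecular ligation.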
In such mini-circles, the crRNA gene is fused to the 3’ end of the mNeonGreen sequence, with the homologies of the M1 and M2 tagging oligonucleotides in between. This could yield an mNeonGreen-expressing DNA element driven by the U6 Pol III promoter of the crRNA gene. The U6 Pol III promoter used here has previously been shown to also mediate Pol II-driven expression (Rumi et al., 2006), in which case a translation-competent capped mRNA could be produced. To assess whether the non-specific cytoplasmic fluorescence involves the translation initiation codon of mNeonGreen, we next transfected a PCR cassette in which the ATG codon at position 1 of the mNeonGreen open reading frame (ORF) was deleted and the ATG codon at position 10 was substituted with a valine codon. This largely, but not completely, suppressed the population of cells with non-specific cytoplasmic signal, while the fraction of cells with specific localization indicative of correct gene tagging was similar to that obtained with unaltered mNeonGreen (Fig. 2c). This indicates that the ATG required for this expression is often provided by mNeonGreen itself. Additionally, the crRNA or homology sequences may provide an ATG in frame with the mNeonGreen ORF.
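The start-codon experiment can be illustrated with a short Python sketch that lists in-frame ATG codons, applies the described edit (delete codon 1, change codon 10 to a valine codon), and checks whether an upstream leader, such as crRNA or homology sequence in a mini-circle, contributes an in-frame ATG without an intervening stop codon. The choice of GTG among the valine codons and the toy sequences are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(orf: str):
    """Split an ORF into consecutive complete triplets (frame 0)."""
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def inframe_atg_positions(orf: str):
    """Return 1-based codon positions of in-frame ATG codons."""
    return [i + 1 for i, c in enumerate(codons(orf)) if c == "ATG"]


def remove_start_codons(orf: str, delete_pos: int = 1, valine_pos: int = 10) -> str:
    """Delete the ATG at `delete_pos` and replace the ATG at `valine_pos` with GTG (Val)."""
    cs = codons(orf)
    if cs[delete_pos - 1] != "ATG" or cs[valine_pos - 1] != "ATG":
        raise ValueError("expected ATG codons at both positions")
    cs[valine_pos - 1] = "GTG"  # substitute first so that positions are not shifted
    del cs[delete_pos - 1]
    return "".join(cs)


def upstream_inframe_atgs(leader: str):
    """0-based positions of ATGs in `leader` that are in frame with an ORF
    starting right after the leader and not followed by an in-frame stop."""
    hits = []
    for i in range(len(leader) - 2):
        if leader[i:i + 3] == "ATG" and (len(leader) - i) % 3 == 0:
            frame = [leader[j:j + 3] for j in range(i, len(leader), 3)]
            if not STOP_CODONS.intersection(frame):
                hits.append(i)
    return hits
```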
‘Mini-circles’ are unlikely to be stable over consecutive cell divisions. We tested this hypothesis by growing transfected cells for several days. Over the course of the experiment, the fraction of cells with non-specific cytoplasmic fluorescence gradually decreased, while the fraction of cells with correctly localized fluorescence remained constant (Fig. 2d). These results confirm the transient nature of the cytoplasmic signal and support the general applicability of mammalian PCR tagging for targeted ‘knock-in’ of PCR cassettes.
Parameters influencing tagging efficiency
To explore mammalian PCR tagging methodology further, we determined tagging efficiency as a function of various parameters.
DNA delivery
We first explored basic parameters such as the amount of DNA and the transfection method. We found that equal amounts of Cas12a plasmid DNA and PCR cassette DNA are optimal (Supplementary Fig. S3a), whereas the transfection method did not seem to influence the outcome (Supplementary Fig. S3b). We furthermore found that PCR cassettes can be purified using standard DNA clean-up columns, even though these do not remove long oligos. However, inefficient PCR amplification, which leaves the final product significantly contaminated with M1 and M2 tagging oligos, can lower the yield of integration at the correct loci (data not shown).
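Because the Cas12a plasmid and the PCR cassette differ in size, equal masses do not correspond to equal molar amounts. The following sketch converts nanograms of double-stranded DNA to picomoles using the common approximation of 650 g/mol per base pair; the plasmid and cassette sizes in the example are hypothetical, not the sizes used in the study.

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair of dsDNA (common approximation)


def ng_to_pmol(ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) to an amount (pmol).

    mol = g / (bp * 650 g/mol); pmol = ng * 1e-9 * 1e12 / (bp * 650)
    """
    return ng * 1e3 / (length_bp * AVG_MASS_PER_BP)


def molar_ratio(ng_a: float, bp_a: int, ng_b: float, bp_b: int) -> float:
    """Molar ratio of DNA species A to DNA species B."""
    return ng_to_pmol(ng_a, bp_a) / ng_to_pmol(ng_b, bp_b)
```

With equal masses (e.g. 500 ng each) of a hypothetical 10 kb helper plasmid and a 2.5 kb cassette, `molar_ratio(500, 2500, 500, 10000)` gives a roughly 4-fold molar excess of cassette over plasmid.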
Length of homology arms
From yeast it is known that approx. 28 to 36 nucleotides (nts) of continuous sequence homology are minimally required for homologous recombination of transfected DNA with the genome (Rothstein, 1991). For PCR tagging in yeast, homology arms of 45 to 55 nts are routinely used. To gain insight into the requirements in mammalian cells, we tested the integration efficiency as a function of homology arm length. This revealed that homology arms as short as 30 nts on both sides already allow efficient integration of the cassette (Fig. 3a), and that longer arms further increase integration efficiency.
Dependence on homology arms
Our control experiment (Fig. 1f) suggested that PCR tagging depends on the presence of homology arms. However, a fraction of the productive events could still be mediated not by HDR but by alternative DNA repair pathways. To test this directly, we generated a series of PCR cassettes with different types of ends. In particular, we generated a PCR cassette with ends suitable for direct ligation by using the Type IIS restriction enzyme HgaI. This enzyme generates ends with 5’ overhangs of 5 nts on both sides, which were designed to be compatible with the ends produced by Cas12a in the corresponding genomic locus (Fig. 3b, D). We observed in-frame integration of the HgaI-cut fragment, but at a lower frequency than integration in the presence of homology arms (Fig. 3b). This demonstrates that homology arms are required for efficient integration: insertion of PCR cassettes via NHEJ can be observed, but it is rather inefficient.
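Designing HgaI-compatible ends depends on where the enzyme cuts relative to its site. HgaI recognizes GACGC and cuts 5 nts downstream on the top strand and 10 nts downstream on the bottom strand, GACGC(5/10), leaving a 5-nt 5’ overhang. The sketch below locates forward-oriented GACGC sites and reports the resulting overhang; it ignores sites in the opposite orientation (GCGTC) and does not model the Cas12a-generated genomic end that the overhang should match.

```python
HGAI_SITE = "GACGC"             # HgaI recognition sequence, cuts GACGC(5/10)
TOP_OFFSET, BOTTOM_OFFSET = 5, 10


def hgai_cuts(seq: str):
    """Return (top_cut, bottom_cut, overhang) for each forward-oriented HgaI site.

    Cut positions are 0-based indices into `seq` (the cut lies before that index).
    The overhang is the top-strand sequence between the two cuts, i.e. the
    5' overhang carried by the downstream fragment.
    """
    results = []
    start = seq.find(HGAI_SITE)
    while start != -1:
        site_end = start + len(HGAI_SITE)
        top_cut, bottom_cut = site_end + TOP_OFFSET, site_end + BOTTOM_OFFSET
        if bottom_cut <= len(seq):  # skip sites too close to the end
            results.append((top_cut, bottom_cut, seq[top_cut:bottom_cut]))
        start = seq.find(HGAI_SITE, start + 1)
    return results
```

Because HgaI cuts outside its recognition site, the overhang sequence can be chosen freely in the primer design, which is what allows it to be matched to a Cas12a-generated genomic end.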
Modified oligonucleotides
Multimerization of transfected dsDNA inside cells can be hindered by introducing bulky modifications such as biotin at the 5’ ends of the DNA fragment. This has been reported to enhance targeting efficiency ~2-fold in medaka (Gutierrez-Triana et al., 2018), and the biotin modification may contribute to enhanced targeting efficiency in mouse embryos (Gu et al., 2018), favoring the insertion of a single copy of the donor DNA. We tested M1/M2 tagging oligos with multiple phosphorothioate bonds (to prevent exonuclease degradation), with and without biotin at the 5’ end. Because chemical oligonucleotide synthesis proceeds in the 3’ to 5’ direction, oligo preparations without size selection are contaminated by shorter species lacking the 5’ modifications. We therefore additionally included size-selected (PAGE-purified) oligos. Overall, the results were mixed (Fig. 3c). For TOMM20, tagging efficiency increased with the number of modifications, up to a maximum of 2-fold, irrespective of whether the oligos were size-selected. For HNRNPA1 and CANX, the modifications did not appear to change tagging efficiency, whereas for CLTC and DDX21 we again observed a 2- to 3-fold improvement. Importantly, in all cases the frequency of cells with diffuse cytoplasmic fluorescence was reduced 2- to 3-fold. This is consistent with the idea that this fluorescence results from ligated PCR cassettes, and that the modifications at least partially suppress such ligations.
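The need for size selection follows from the stepwise chemistry: a full-length n-mer requires n - 1 successful couplings, and a 5’ modification is added only to chains that reached full length. The sketch below estimates the full-length fraction from an assumed per-step coupling efficiency; the 99% value and the 100-nt length are illustrative, not measured values for the oligos used here.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Fraction of chains that reach full length in 3'->5' solid-phase synthesis.

    The first nucleotide is attached to the support, so an n-mer needs n - 1
    coupling steps; only full-length chains can carry the final 5' modification.
    """
    if length_nt < 1 or not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("invalid length or coupling efficiency")
    return coupling_efficiency ** (length_nt - 1)
```

For a hypothetical 100-nt oligo at 99% coupling efficiency this gives about 0.37, so most molecules in an unpurified preparation would lack the 5’ modification.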
Taken together, these experiments demonstrate the robustness of the procedure and its dependence on homology arms for efficient recombination with the target locus to yield the tagged gene.
Selection of clones using antibiotic resistance markers and multi-loci tagging
Next, we generated template plasmids that additionally incorporate selection markers for different antibiotics and used them to generate PCR cassettes for some of the genes shown in the previous figures, as well as for five genes that we had not tagged before (Supplementary Table S1). After amplification of the PCR cassettes, we used DpnI or FspEI digestion to selectively destroy the methylated, bacterially derived template plasmid DNA (which also contains the selection marker). Using Zeocin or Puromycin resistance as selection markers yielded cell populations highly enriched in cells exhibiting the correct localization of a fluorescent fusion protein (Fig. 4a). The selected populations still contained cells with non-specific cytoplasmic fluorescence, but their fraction remained constant or sometimes even decreased, indicating the labile nature of the source of this signal.
After enrichment of positive cells by Zeocin selection, single-cell clones for CANX tagging were obtained by limiting dilution of cells transfected with the CANX-specific PCR cassette and analyzed in detail. In all clones, PCR identified the correct insertion junction on the side of the fluorescent protein tag, and in 4 out of 5 clones also at the other end of the PCR cassette. Antibodies detected the corresponding mNeonGreen fusion protein (Supplementary Fig. S4a). HEK293T cells are aneuploid and appear to have up to 5 copies of the CANX gene (Lin et al., 2014). We also detected the wild-type copy of CANX in all clones, indicating that not all copies were tagged (Supplementary Fig. S4a).
We next aimed to amplify the inserted construct using primers that bind outside of the inserted fragment. The inserted construct could be amplified with external primers in only one out of five clones, indicating more extensive genome alterations in the other clones. Indeed, using outward-oriented primers that bind at both ends of the PCR cassette, we detected bands that are best explained by concatenated fragments (Supplementary Fig. S4b). This indicates that frequently not only single cassettes, but two or more ligated cassettes are inserted into the genome. However, since a STOP codon and a transcriptional terminator accompany the inserted tag, these additional copies should not interfere with the function of the tagged gene. Using modified M1/M2 tagging oligos (Fig. 3c), it might be possible to reduce the frequency of multimeric insertions.
Using Anchor-Seq, we next investigated off-target integrations. We found, and validated by PCR, two off-target integrations of the PCR cassette in the 5 positive clones (Supplementary Fig. S4c): one just downstream of the CANX gene, which resides on chromosome 5, and one on chromosome 1, which contained an insertion in two clones. This indicates the occurrence of multiple insertions in the same clone, possibly caused by off-target activity of the crRNA.
To gain insight into the frequency of multiple tagging events, we generated two PCR cassettes each for CANX and HNRNPA1, one for tagging with the red fluorescent protein mScarlet-i and one for tagging with mNeonGreen. The resulting four cassettes were co-transfected pairwise into HEK293T cells, using all four possible red-green and gene-gene combinations. This yielded three types of cells, with green, red, or both green and red fluorescence in the nucleus (HNRNPA1) or the endoplasmic reticulum (ER; CANX), as shown for the example of the HNRNPA1-mScarlet-i/HNRNPA1-mNeonGreen transfection (Supplementary Fig. S5). The three types of cells occurred at roughly equal frequencies, regardless of whether the same gene or two different genes were tagged (Fig. 4b). This indicates efficient double tagging of different loci and demonstrates that more than one allele is often tagged. It also suggests applications of PCR tagging for the analysis of protein-protein interactions using epitope tagging, or of protein co-localization using different fluorescent proteins. We validated this in different double tagging experiments (Fig. 4c), which demonstrated simultaneous detection of various cellular structures within one transfection.
Together, this analysis demonstrates that all positive clones contain insertions by homologous recombination that yield the correct fusion protein. Insertions are not necessarily single copy, but can consist of concatenated segments of ligated tagging cassettes. Since the PCR cassette provides a STOP codon and a transcriptional terminator along with the tag, the generated transcript is nevertheless properly defined.
Applications of PCR tagging: different cell lines
So far, we have established a robust workflow for chromosomal tagging in HEK293T cells. To test the general applicability of PCR tagging, we examined additional human and murine cell lines, targeting genes that we had already tagged successfully in our initial experiments. In each cell line, we identified for most genes cells showing correctly localized green fluorescence, at a frequency of 0.2 to 5% (Fig. 5a-d). Examples of tagged murine myoblast (C2C12) cells are shown in Supplementary Fig. S6a. For HeLa cells, we additionally subjected the cells to selection and found up to 40% of cells exhibiting the correct localization (Supplementary Fig. S6b). In conclusion, these results demonstrate that PCR tagging works in different cell lines and species, including differentiated and stem cells.
crRNA design, PAM site selection and genomic coverage
Next, we asked how well Cas12a-targeted PCR tagging covers the human genome. Our tagging approach relies on relatively short homology arms in the PCR cassette. First, this constrains the target sequence space, since cleavage of the target locus must occur within the region covered by the homology arms, leaving enough sequence for recombination. Second, insertion of the cassette needs to destroy the crRNA cleavage site, in order to prevent re-cleavage of the locus. For C-terminal protein tagging, these criteria confine potentially useful protospacer adjacent motif (PAM) sites to a region spanning the STOP codon and 17 nts on either side of it, with the PAM site or protospacer sequence overlapping the STOP codon (Fig. 6a). So far, we have used Cas12a from Lachnospiraceae bacterium ND2006 (LbCpf1) (Zetsche et al., 2015). However, PAM sites recognized by this Cas12a (TTTV) (Gao et al., 2017) are relatively infrequent in this region of a gene and would allow C-terminal tagging of only about one third of all human genes (Fig. 6b). To increase this number, we first tested different Cas12a variants with altered PAM specificities (Gao et al., 2017). The results demonstrated that other variants and PAM sites are also functional and can be used for PCR tagging (Fig. 6c). Considering these Cas12a variants renders approx. 72% of all human genes accessible for C-terminal PCR tagging (Fig. 6b). To increase this number further, we extended the search space for suitable PAM sites into the 3’-UTR (typically 50 nts) (Fig. 6a) and adjusted the design of the M2 tagging oligo such that a small deletion removes the binding site of the crRNA. Since tagging introduces a generic terminator for proper termination of the tagged gene, this small deletion is unlikely to affect the tagged gene. Considering the extended search space and the currently available palette of Cas12a variants (Fig. 6b), we calculated that potentially 98% of all human ORFs are amenable to C-terminal PCR tagging.
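The PAM search window described above can be sketched as follows for LbCas12a (TTTV, where V is A, C or G). The function scans both strands (TTTV on the reverse strand appears as [CGT]AAA on the top strand) and keeps sites whose 4-nt PAM lies entirely within the STOP codon plus a flank of 17 nts on either side. This is a simplification: it does not enforce the additional requirement that the PAM or protospacer overlaps the STOP codon, and it does not model other Cas12a variants or the 3’-UTR extension.

```python
import re

# TTTV on the top strand; TTTV on the bottom strand reads as [CGT]AAA on the top strand.
TOP_PAM = re.compile(r"(?=(TTT[ACG]))")
BOTTOM_PAM = re.compile(r"(?=([CGT]AAA))")


def pam_sites_near_stop(seq: str, stop_start: int, flank: int = 17):
    """Return sorted (position, strand) tuples for TTTV PAMs near a STOP codon.

    `stop_start` is the 0-based index of the first STOP-codon base in `seq`.
    Positions refer to the leftmost PAM base on the top strand. Lookahead
    patterns are used so that overlapping PAMs are all reported.
    """
    lo = max(0, stop_start - flank)
    hi = min(len(seq), stop_start + 3 + flank)
    sites = []
    for strand, pattern in (("+", TOP_PAM), ("-", BOTTOM_PAM)):
        for match in pattern.finditer(seq):
            pos = match.start()
            if pos >= lo and pos + 4 <= hi:
                sites.append((pos, strand))
    return sorted(sites)
```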
PCR tagging toolkit for mammalian cells
Our results present PCR tagging as a rapid, efficient and cost-effective procedure for chromosomal knock-ins of large DNA fragments in mammalian cells, e.g. for C-terminal tagging or gene disruption. To facilitate application of the method for various purposes, we set up a webpage for oligo design (Fig. 1b). The online tool (www.pcr-tagging.com) requires as input the genomic DNA sequence around the desired insertion site, i.e. the STOP codon of the gene of interest for C-terminal tagging. The software then generates the sequence of the M1 oligo, which specifies the junction between the gene and the tag. Next, the software identifies all PAM sites for the available Cas12a variants and uses these to generate crRNA sequences and to assemble the corresponding M2 tagging oligos. M2 tagging oligos are designed such that integration of the PCR cassette disrupts the crRNA binding site or PAM site, in order to prevent re-cleavage of the locus. M2 tagging oligos are then ranked based on the quality of the PAM site and the presence of motifs that might interfere with crRNA synthesis or function. M1/M2 tagging oligos can be used with template plasmids based on different backbones: without a marker, with the Zeocin resistance gene, or with the Puromycin resistance gene (Fig. 7a). We generated a series of template plasmids containing different state-of-the-art reporter genes (Table 1, examples shown in Fig. 7b).
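Two steps of the oligo design described above, deriving a crRNA spacer from a PAM and discarding candidates with motifs that interfere with crRNA synthesis, can be sketched in Python. For Cas12a the PAM lies 5’ of the protospacer, so the spacer is read 3’ of the PAM on the same strand. The 20-nt spacer length, the use of a TTTT run (which can terminate Pol III transcription from the U6 promoter) as the only filter, and ranking by distance to the STOP codon are assumptions for illustration; the web tool's actual ranking criteria are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_for(seq: str, pam_pos: int, strand: str, length: int = 20):
    """Return the spacer (5'->3') for a 4-nt PAM at top-strand index `pam_pos`.

    '+' PAM: the spacer is the `length` nts immediately 3' of the PAM.
    '-' PAM: seq[pam_pos:pam_pos + 4] is the PAM's reverse complement, so the
    protospacer lies to its left on the top strand.
    Returns None if the sequence is too short.
    """
    if strand == "+":
        start = pam_pos + 4
        return seq[start:start + length] if start + length <= len(seq) else None
    start = pam_pos - length
    return revcomp(seq[start:pam_pos]) if start >= 0 else None


def rank_crrnas(seq: str, sites, stop_start: int, length: int = 20):
    """Filter and rank candidate spacers for (pam_pos, strand) sites."""
    candidates = []
    for pos, strand in sites:
        spacer = spacer_for(seq, pos, strand, length)
        if spacer is None or "TTTT" in spacer:  # poly-T can stop Pol III
            continue
        candidates.append((abs(pos - stop_start), pos, strand, spacer))
    candidates.sort()  # closest to the STOP codon first (assumed criterion)
    return [(pos, strand, spacer) for _, pos, strand, spacer in candidates]
```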
Ongoing efforts aim to improve crRNA prediction and to eliminate crRNAs with potential off-target binding activity. The current version of the server already allows novel Cas12a variants to be added flexibly, by adjusting the PAM site specificity and the sequence of the corresponding constant region of the crRNA.
In conclusion, Cas12a-mediated PCR tagging of mammalian genes using short homology arms is a rapid, robust and versatile method enabling endogenous gene tagging. The versatility of the method suggests many types of applications for functional or analytical gene and protein studies in mammalian cells.
Discussion
In this paper we demonstrate efficient targeted integration of DNA fragments several kb in size into the genome of mammalian cells, guided by short homology arms. Integration is assisted by CRISPR/Cas12a and a crRNA that is expressed from the DNA fragment itself. This enables a PCR-only strategy for producing the gene-specific reagents for tagging of cell lines, allowing quick and low-cost experimentation. We developed a software tool for oligo design and established streamlined procedures for application in several cell lines.
PCR tagging can potentially also be used to disrupt gene function. We tested this by generating a PCR cassette that disrupts genes by inserting STOP codons and a terminator, and found that this also worked (data not shown), making the method applicable to gene knockout (KO) studies as well.
Beyond mammalian cells, there may be other species in which this strategy could boost tagging methodology, e.g. many fungal species that require a DNA double strand break for targeted integration of a foreign DNA fragment.
PCR tagging can easily be scaled up and parallelized, since it needs only two oligonucleotides per gene. In yeast, where PCR tagging is very efficient even in the absence of an endonuclease, the ease of scaling up permitted the creation of many types of genome-wide resources in which all genes were modified in the same manner, e.g. by gene deletion or by tagging with a fluorescent protein or affinity tag (Gavin et al., 2002; Ghaemmaghami et al., 2003; Huh et al., 2003; Meurer et al., 2018; Winzeler et al., 1999).
The use of tagged genes always raises the question of the functionality of the tag fusion. Here, two questions matter: How does tagging affect gene regulation, and how does it affect protein function? Many aspects of protein tagging have been discussed in the literature, e.g. from functional or structural points of view. Ultimately, however, one has to be aware that a cell expressing a tagged gene is a mutant, and that the tag does not necessarily report correctly on the behavior of the untagged protein. Good laboratory practice therefore calls for phenotypic analyses to investigate the functionality of the tagged gene/protein and/or orthogonal experiments to independently validate the conclusions derived from the tagged clone(s).
In yeast, genome-wide C-terminal epitope tagging of haploid cells revealed that >95% of the ~1000 essential yeast genes, when endogenously tagged with a large tag such as a fluorescent protein reporter, retain enough functionality to not cause an obvious growth phenotype under standard growth conditions (Khmelinskii et al., 2014).
When using PCR tagging, it must furthermore be considered that the C-terminally inserted tag is accompanied by a generic transcription termination site that replaces the endogenous 3’ untranslated region (UTR).
Various methods for gene tagging with long DNA fragments in mammalian cells have been developed (Agudelo et al., 2017; He et al., 2016; Lackner et al., 2015; Merkle et al., 2015; Suzuki et al., 2016; Zhang et al., 2017; Zhu et al., 2015). Besides methods tailored to particular DNA repair pathways such as NHEJ, the classical approach of using long homology arms to direct insertion during the repair of a double strand break has also been used in combination with CRISPR/Cas9 or other endonucleases in various implementations, using circular or linear repair templates that can be generated ex vivo, or in vivo upon endonuclease excision of the repair template. Because of low efficiency or the use of alternative repair pathways, a substantial number of clones often needs to be screened in order to obtain a few correct ones (Koch et al., 2018). In non-germline cells, insertion precision is not always satisfactory, and errors such as small indels are frequently observed near one or the other end of the inserted fragment, thus compromising the sequence of the tagged gene. Since PCR tagging relies on a heterologous terminator to terminate transcription of the tagged gene, the insertion precision at the downstream end of the PCR cassette is rather unimportant; if erroneous, it will only affect 3’UTR regions of the gene, which are not used by the tagged allele. Obviously, this constitutes a compromise and bears the possibility that important gene regulatory sequences are omitted from the tagged gene. While no global data set on the regulatory impact of the 3’UTR on gene expression is available for mammalian cells, data from yeast, where seamless tagging was compared with tagging using a generic 3’UTR, demonstrated that the expression of only about 11% of the genes changed by more than 2-fold (Meurer et al., 2018).
If it is essential to retain the endogenous 3’ UTR, the PCR tagging strategy can be modified so that the crRNA gene and the tag are on different PCR fragments (see Fig. 1f). In this case, the PCR product of the tag can be tailored to integrate seamlessly into the target site, without any additional sequence.
PCR tagging is highly efficient, as it is easy to obtain enriched populations containing the correct gene fusion. Nevertheless, high NHEJ activity complicates the situation, as shown in our detailed analysis of CANX-tagged clones, in which we detected inserted tandem fusions (Supplementary Fig. S4). Because enriched populations are composed of many different clones, such populations can be used for a rapid first assessment of an experimental question, for example the localization of an endogenously expressed protein in a specific condition, environment or cell line, by simply scoring multiple cells. Since the cells derive from different clones, clone-specific effects can be spotted rapidly and considered in the analysis. This avoids the need for perfectly characterized cell lines with exactly the intended genomic modification. Depending on experimental requirements, individual lines can then be isolated for detailed characterization prior to further experimentation.
The toolset available for PCR tagging can easily be expanded by constructing new template plasmids. Maintaining a certain level of standardization, such as preserving the primer annealing sites for the M1 and M2 tagging oligos in new template cassettes, makes it possible to re-use already purchased M1/M2 tagging oligos for a gene in many different tagging experiments.
Further improvements of tagging efficiency might be possible. While we found that inhibitors of NHEJ had no positive impact on integration efficiency (data not shown), it might be possible to improve tagging efficiency by targeting the repair template to the CRISPR endonuclease cut site (Gu et al., 2018; Roy et al., 2018) or by further enhancing Cas12a expression.
In conclusion, PCR-mediated gene tagging has the potential to impact how research is done in an entire field, not only because of the simplicity of the method, but also because the required reagents are easy to handle, cost-effective, and freely exchangeable. Moreover, PCR tagging is quicker than the construction of a plasmid for transient transfection, while avoiding the risk of studying overexpression artifacts.
With PCR tagging at hand, many exciting experimental avenues open up, from the rapid assessment of protein localization to high-throughput localization studies of many proteins.
Competing financial interest statement
The authors declare no competing financial interests.
Author contribution
M.K. and M.M. designed the project and, together with M.K.L. and J.F., designed the experiments. J.F., K.H., M.M., B.K., J.D.K. and D.K. performed the experiments. K.H. analyzed the NGS data and K.G. wrote the web tool for primer design. All authors analyzed the data and discussed the results. M.K. wrote the manuscript with input from all authors.
Materials and Methods
Plasmids and oligos
Plasmids are listed in Table 1 and Supplementary Tables S2 and S3. Sequences are provided for download. Plasmids will soon be available from www.addgene.org. All oligos used for cloning, Anchor-Seq and gene tagging are listed in Supplementary Table S4.
Construction of template cassettes
All cloning was performed by standard restriction enzyme digestion or oligo annealing, followed by ligation, using enzymes from NEB. Most of the elements inside the template cassettes (M1-mNeonGreen-SV40polyA-ZeocinR-BGHpolyA-hU6promoter) were custom synthesized (gBlocks, IDT) and cloned via BsiWI and XbaI into a BsiWI- and SpeI-cut pFA6a backbone. The SV40 promoter was cloned separately into the cassette via SalI and EcoRI, since it contains repeats and could not be synthesized together with the other elements. In addition to the ZeocinR marker, we also introduced a PuromycinR marker. Because the standard DNA sequence of this marker is very GC-rich and difficult to amplify by PCR, we synthesized a new version with lower GC content and cloned it via EcoRI and PstI into the cassette. For a cassette without a marker, the SV40promoter-ZeocinR-BGHpolyA sequence was removed by SalI and XhoI digestion and subsequent religation of the backbone. This resulted in three different plasmids based on the pFA6a backbone (see Fig. 7a).
The mNeonGreen ORF of these template plasmids is flanked by unique restriction sites and is therefore easily exchangeable. New tags can be introduced using the BamHI and SpeI sites. For flexibility in cloning, the sticky ends of both restriction sites are compatible with the sticky ends produced by other enzymes (BclI/BglII and AvrII/NheI/XbaI, respectively).
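These ends are compatible because each enzyme leaves the same 4-nt 5′ overhang (GATC for the BamHI group, CTAG for the SpeI group). The short sketch below derives each overhang from the recognition site and the top-strand cut position, which are standard values for these palindromic enzymes. It treats two ends as ligatable when their overhangs are identical. It ignores methylation sensitivity and cutting efficiency, and is meant only as an illustration.

```python
# Recognition site and top-strand cut position (cut after this many bases)
# for the palindromic enzymes named above; all leave 4-nt 5' overhangs.
ENZYMES: dict[str, tuple[str, int]] = {
    "BamHI": ("GGATCC", 1),
    "BclI": ("TGATCA", 1),
    "BglII": ("AGATCT", 1),
    "SpeI": ("ACTAGT", 1),
    "AvrII": ("CCTAGG", 1),
    "NheI": ("GCTAGC", 1),
    "XbaI": ("TCTAGA", 1),
}


def overhang(enzyme: str) -> str:
    """Return the 5' overhang left by a palindromic enzyme with a staggered cut."""
    site, top_cut = ENZYMES[enzyme]
    bottom_cut = len(site) - top_cut  # the bottom-strand cut mirrors the top one
    return site[top_cut:bottom_cut]


def compatible(a: str, b: str) -> bool:
    """Two 5'-overhang ends can ligate if their single-stranded overhangs match."""
    return overhang(a) == overhang(b)


for enzyme in ("BclI", "BglII", "AvrII", "NheI", "XbaI"):
    partner = "BamHI" if compatible(enzyme, "BamHI") else "SpeI"
    print(f"{enzyme} ({overhang(enzyme)}) ligates to {partner} ends")
```

The cloning step described above ligates an XbaI-cut insert into a SpeI-cut backbone. The resulting hybrid junction (ACTAGA or TCTAGT) is no longer recognized by either enzyme.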
All tags listed in Table 1 and Supplementary Table S2 were cloned either by amplification from template plasmids with oligos containing restriction sites or by annealing of two oligos. They were then ligated into BamHI/SpeI-cut backbones of pMaM523/526/541 (for detailed information see Supplementary Table S2) to generate template cassettes called pMaCTag (plasmid for Mammalian C-terminal Tagging), named according to the following scheme:
pMaCTag-xy: Tag xy, no marker, pMaM526 backbone
pMaCTag-Zxy: Tag xy, Zeocin marker, pMaM523 backbone
pMaCTag-Pxy: Tag xy, Puromycin marker, pMaM541 backbone
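The naming scheme is a simple lookup from marker to name prefix and backbone. The sketch below encodes it. The tag abbreviation used in the example ("mNG") is hypothetical.

```python
# Marker -> (prefix inserted before the tag name, backbone plasmid),
# following the pMaCTag naming scheme listed above.
SCHEME: dict[str, tuple[str, str]] = {
    "none": ("", "pMaM526"),
    "zeocin": ("Z", "pMaM523"),
    "puromycin": ("P", "pMaM541"),
}


def pmactag_name(tag: str, marker: str = "none") -> tuple[str, str]:
    """Return (template cassette name, backbone) for a tag and selection marker."""
    prefix, backbone = SCHEME[marker.lower()]
    return f"pMaCTag-{prefix}{tag}", backbone


# "mNG" is a hypothetical tag abbreviation used only for illustration.
print(pmactag_name("mNG", "Zeocin"))  # ('pMaCTag-ZmNG', 'pMaM523')
```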
M1 and M2 tagging oligo design
The online oligo design tool (www.pcr-tagging.com) was implemented using Shiny. The interactive web application was developed in R v3.4.4 (R Core Team, 2014) with the R packages shiny v1.1.0 (Chang et al., 2018) and shinyjs v1.0 (Attali, 2017). The R package Biostrings v2.46.0 (Pagès et al., 2017) is used for searching PAM sites. The latest code is available from our GitHub repository (www.github.com/knoplab). Oligo design principles are as follows:
M1 oligo: The design of the M1 oligo is straightforward, as it contains only two functional elements: the primer annealing site for PCR, which is constant in all template cassettes (TCAGGTGGAGGAGGTAGTG), and the homology arm, which is derived from the target locus.
Example: M1 tagging oligo (for TOMM70)
Description of elements:
5’-homology (90 bases before the insertion site, direct orientation)---primer annealing site for PCR
Sequence: ATGGAGATGGCCCATCTGTATTCACTTTGCGATGCCGCCCATGCCCAGACAGAAGTTGCAAAGAAATACGGATTAAAACCACCAACATTATCAGGTGGAGGAGGTAGTG
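The M1 design rule can be written as a short function. This is a minimal sketch and not the code behind www.pcr-tagging.com. It assumes the input is genomic sequence in the gene's sense orientation that ends exactly at the insertion site. The TOMM70 homology arm is taken from the example above.

```python
M1_ANNEALING_SITE = "TCAGGTGGAGGAGGTAGTG"  # constant in all template cassettes


def design_m1(upstream: str, homology_len: int = 90) -> str:
    """Build an M1 tagging oligo (5'->3').

    `upstream` is genomic sequence in the gene's sense orientation ending
    exactly at the insertion site; its last `homology_len` bases form the
    5' homology arm, followed by the constant primer annealing site.
    """
    upstream = upstream.upper()
    if len(upstream) < homology_len:
        raise ValueError(f"need at least {homology_len} nt upstream of the insertion site")
    return upstream[-homology_len:] + M1_ANNEALING_SITE


# 90-nt TOMM70 homology arm from the example above.
tomm70_arm = ("ATGGAGATGGCCCATCTGTATTCACTTTGCGATGCCGCCCATGCCCAGACAGAAGTTGCAAA"
              "GAAATACGGATTAAAACCACCAACATTA")
m1 = design_m1(tomm70_arm)
print(len(m1))  # 109
```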
M2 tagging oligo
The design of the M2 tagging oligo is more complex. It contains five elements: the annealing site for PCR (GCTAGCTGCATCGGTACC); the crRNA direct repeat sequence, which is specific to the Cas12a variant; the crRNA protospacer sequence, which depends on the PAM sites available at the target locus; a terminator for RNA polymerase III; and the homology arm. These elements are arranged as outlined below.
Example: M2 tagging oligo (for TOMM70)
Description of elements:
3’-homology (55 bases after the insertion site, reverse orientation)---Pol III terminator---crRNA protospacer sequence---crRNA direct repeat sequence---primer annealing site for PCR
Sequence:
CAGTTGAAGAGGGGGTAAACTTTTAAAAAGAGGGTCAGTCTGCTTTCCCCCTGTTAAAAAAAGTCTGCTTTCCCCCTGTTTATCTACAAGAGTAGAAATTAGCTAGCTGCATCGGTACC
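One way to see the element order is to build the crRNA unit in its sense orientation (direct repeat, protospacer, terminator, as transcribed from the cassette's hU6 promoter) and then reverse-complement it together with the downstream homology. This is a minimal sketch, not the authors' tool. Two values are assumptions: the 20-nt AsCas12a direct repeat (as DNA) and a six-T Pol III terminator. The final check only confirms that the TOMM70 example above ends with the reverse complement of this direct repeat followed by the annealing site.

```python
M2_ANNEALING_SITE = "GCTAGCTGCATCGGTACC"  # constant in all template cassettes
AS_CAS12A_DR = "TAATTTCTACTCTTGTAGAT"    # assumed AsCas12a direct repeat (DNA)
POL3_TERMINATOR = "TTTTTT"               # assumed Pol III terminator (run of Ts)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase/lowercase ACGT sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_m2(downstream: str, protospacer: str,
              direct_repeat: str = AS_CAS12A_DR,
              terminator: str = POL3_TERMINATOR,
              homology_len: int = 55) -> str:
    """Build an M2 tagging oligo (5'->3').

    `downstream` starts exactly at the insertion site (gene sense orientation).
    The crRNA unit (DR + protospacer + terminator) lies behind the cassette's
    hU6 promoter, so in this reverse-oriented oligo each element appears as its
    reverse complement: homology, terminator, protospacer, DR, annealing site.
    """
    if len(downstream) < homology_len:
        raise ValueError(f"need at least {homology_len} nt downstream of the insertion site")
    homology_arm = revcomp(downstream[:homology_len])
    crrna_unit = direct_repeat + protospacer + terminator
    return homology_arm + revcomp(crrna_unit) + M2_ANNEALING_SITE


tomm70_m2 = ("CAGTTGAAGAGGGGGTAAACTTTTAAAAAGAGGGTCAGTCTGCTTTCCCCCTGTTAAAAAAAG"
             "TCTGCTTTCCCCCTGTTTATCTACAAGAGTAGAAATTAGCTAGCTGCATCGGTACC")
print(tomm70_m2.endswith(revcomp(AS_CAS12A_DR) + M2_ANNEALING_SITE))  # True
```

For another Cas12a variant, pass that variant's direct repeat as `direct_repeat`.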
Criteria currently implemented in www.pcr-tagging.com for ranking crRNAs, listed in order of priority:
Location of the crRNA binding site: it should lie in a region of the genome where it is destroyed upon cassette integration, to prevent re-cleavage. It can be on either side of, and close to, the insertion site (within 17 nt up- or downstream). If no suitable crRNA binding site is found in this confined search space, the software offers the option to select PAM sites in the region 3′ of the insertion site (extended search space). In this case, the homology arm of the M2 tagging oligo is designed such that the crRNA target site is deleted. This results in a small deletion in the 3′ UTR of the gene, downstream of the cassette insertion site. Since the PCR cassette contains a transcriptional terminator, we deem this non-critical. With these criteria, suitable crRNAs can be designed for C-terminal tagging of the vast majority of mammalian genes (Fig. 6b).
The protospacer sequence should preferably not contain four or more consecutive Ts, since this might cause premature termination of Pol III transcription of the crRNA (Arimbasseri et al., 2013). In practice, however, we observed that crRNAs containing ‘TTTT’ are frequently functional.
PAM sites are ranked according to the literature (Gao et al., 2017; Zetsche et al., 2015). In addition, unconventional PAM sites (MCCC for AsCas12a and RATR for LbCas12a) were considered, based on depositor comments on the Addgene webpage. Conventional PAM sites are preferred when ranking crRNAs.
If multiple crRNAs fulfil these criteria, they are ranked by their position relative to the STOP codon, with sites at a shorter distance after the STOP codon preferred.
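The sketch below illustrates PAM search and ranking under several simplifying assumptions. It is not the Biostrings-based implementation of www.pcr-tagging.com. It searches both strands for the conventional Cas12a PAM TTTV followed by a 20-nt protospacer. It approximates "destroyed upon integration" as "the insertion point falls inside the PAM plus protospacer". It ranks sites by the absence of a TTTT run, then by distance to the insertion site, which stands in for the STOP-codon criterion. Unconventional PAMs and the extended search space are omitted.

```python
import re
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAM_LEN, SPACER_LEN = 4, 20            # assumed 20-nt protospacer
TTTV = re.compile(r"(?=TTT[ACG])")     # lookahead so overlapping PAMs are found


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Site:
    strand: str       # "+" or "-" relative to the input sequence
    start: int        # 0-based start of PAM + protospacer on the + strand
    end: int          # exclusive end on the + strand
    protospacer: str  # 5'->3' on the PAM-containing strand


def find_sites(seq: str) -> list[Site]:
    """All TTTV PAMs with a full-length protospacer 3' of the PAM, both strands."""
    seq = seq.upper()
    n, span = len(seq), PAM_LEN + SPACER_LEN
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in TTTV.finditer(s):
            i = m.start()
            if i + span > n:
                continue
            # Map [i, i + span) on the reverse strand back to + coordinates.
            start, end = (i, i + span) if strand == "+" else (n - i - span, n - i)
            sites.append(Site(strand, start, end, s[i + PAM_LEN:i + span]))
    return sites


def rank_sites(seq: str, insertion: int) -> list[Site]:
    """Keep sites disrupted by an insertion before index `insertion`; rank them."""
    hits = [s for s in find_sites(seq) if s.start < insertion < s.end]
    return sorted(hits, key=lambda s: ("TTTT" in s.protospacer,
                                       abs((s.start + s.end) / 2 - insertion)))


toy = "GCGCGC" + "TTTA" + "ACGT" * 5 + "GCGCGCGC"  # hypothetical toy locus
print(rank_sites(toy, 15))
```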
Synthesis of M1 and M2 tagging oligos
All M1 and M2 tagging oligos were obtained from Sigma-Aldrich at a 0.05 µmol synthesis scale and were RP1 cartridge-purified, unless otherwise stated (as in Fig. 3c).
PCR of template cassettes using M1 and M2 tagging oligos
PCR with long oligos is not always straightforward and requires optimized protocols. We routinely use a self-purified DNA polymerase for PCR (Pfu-Sso7d (Wang et al., 2004)). Alternatively, commercial high-fidelity polymerases can be used for M1/M2 PCR; we have tested Phusion (ThermoFisher) and Velocity polymerase (Bioline). Note that Phusion polymerase with the buffer provided by the manufacturer does not work for PCR cassette amplification with M1 and M2 tagging oligos. In contrast, Velocity polymerase with the manufacturer's buffer yields good amounts of product.
We found that all polymerases work well using the buffer conditions and amplification scheme shown below, yielding similar amounts of PCR cassette.
PCR mixture:
5.0 µl of 10x HiFi-buffer (200 mM Tris-HCl, pH 8.8; 100 mM (NH4)2SO4, 500 mM KCl, 1% (v/v) Triton X-100, 1 mg/ml BSA, 20 mM MgCl2)
5.0 µl of dNTPs (10 mM stock, Bioline, BIO-39026)
1.0 µl of MgCl2 (50 mM stock)
5.0 µl of betaine (5 M stock, Sigma-Aldrich, 61962)
0.3 µl of template DNA (200 ng/µl stock)
2.5 µl of M1 tagging oligo (10 µM stock)
2.5 µl of M2 tagging oligo (10 µM stock)
x µl of H2O up to 50 µl
1 µl self-purified DNA polymerase (1 U/µl), 0.5 µl Phusion or 0.25 µl Velocity polymerase
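The final concentrations in this mixture follow from C1·V1 = C2·V2. The sketch below works them out and computes the water volume. It assumes that "up to 50 µl" includes the polymerase. Only values listed above are used; the per-nucleotide dNTP concentration is not computed because the stock composition is not specified here.

```python
TOTAL_UL = 50.0

# Volumes (µl) of the fixed components listed above.
FIXED_UL = {
    "10x HiFi buffer": 5.0,
    "dNTPs (10 mM)": 5.0,
    "MgCl2 (50 mM)": 1.0,
    "betaine (5 M)": 5.0,
    "template DNA (200 ng/µl)": 0.3,
    "M1 tagging oligo (10 µM)": 2.5,
    "M2 tagging oligo (10 µM)": 2.5,
}
POLYMERASE_UL = {"Pfu-Sso7d": 1.0, "Phusion": 0.5, "Velocity": 0.25}


def final_conc(stock: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


def water_ul(polymerase: str) -> float:
    """Water needed so that all components, polymerase included, sum to 50 µl."""
    return TOTAL_UL - sum(FIXED_UL.values()) - POLYMERASE_UL[polymerase]


# 10x buffer contains 20 mM MgCl2 (2 mM at 1x), plus 1 µl of 50 mM stock.
mg_mM = final_conc(20.0, 5.0) + final_conc(50.0, 1.0)
print(f"MgCl2 {mg_mM:.1f} mM, betaine {final_conc(5.0, 5.0):.1f} M, "
      f"each oligo {final_conc(10.0, 2.5):.1f} µM, template {0.3 * 200:.0f} ng")
for pol in POLYMERASE_UL:
    print(f"{pol}: {water_ul(pol):.2f} µl water")
```

The mixture therefore contains 3 mM MgCl2 in total, 0.5 M betaine, 0.5 µM of each tagging oligo and 60 ng of template.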
The PCR was assembled on ice and carried out in a Biometra TRIO (Analytik Jena) using the following program:
3 min at 95 °C
30 cycles of:
20 s at 95 °C
30 s at 64 °C
XX s at 72 °C (45 s per kb) (see Supplementary Table S2)
5 min at 72 °C
4 °C
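The elongation step scales with product length at 45 s per kb. The helper below rounds up to whole seconds and optionally adds extra time, as suggested in the troubleshooting note in this section. The 2,000 bp product length is a hypothetical value; actual cassette lengths are given in Supplementary Table S2.

```python
import math


def elongation_seconds(product_bp: int, seconds_per_kb: float = 45.0,
                       extra_seconds: int = 0) -> int:
    """Extension time per cycle at 72 °C, rounded up to whole seconds."""
    if product_bp <= 0:
        raise ValueError("product length must be positive")
    return math.ceil(product_bp / 1000 * seconds_per_kb) + extra_seconds


print(elongation_seconds(2000))                     # 90
print(elongation_seconds(2000, extra_seconds=120))  # 210
```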
After PCR, 0.4 µl DpnI or FspEI (plus 1.67 µl Enzyme Activator) was added to the reaction mixture and incubated at 37 °C for 1 h. This digests the template plasmid, which carries a selection marker and would otherwise contaminate the transfection.
PCR products were analyzed by agarose gel electrophoresis and purified using column purification (Macherey-Nagel).
Note: Occasionally, a particular pair of oligos does not yield a product. In this case, it is worth testing whether adding 2 min to the calculated elongation time solves the problem. If not, the primer synthesis may have failed. The faulty primer can be identified by pairing each new oligo with an established M1 or M2 oligo in separate PCRs. Usually, re-ordering the same primer solves the problem; providers may waive the cost of re-ordering.
Preparation of genomic DNA
Genomic DNA was isolated from HEK293T cells according to a protocol adapted from Sambrook et al. (Greene and Sambrook, 2012). After washing with PBS, cells from a confluent well of a 6-well plate were lysed in 600 µl SNET buffer (20 mM Tris pH 8.0, 400 mM NaCl, 5 mM EDTA pH 8.0, 1% SDS). Then 2 µl of RNase A (10 mg/ml RNase A, 10 mM Tris-HCl pH 8.0, 10 mM MgCl2) was added for 30 minutes at room temperature. Proteinase K (20 mg/ml Proteinase K, 50 mM Tris-HCl pH 8.0, 1.5 mM CaCl2, 50% glycerol) was then added for another 30 minutes at room temperature. Proteins were precipitated with 200 µl of 3 M potassium acetate solution. The DNA was then precipitated with isopropanol and washed with 70% ethanol. Finally, the DNA was dried and dissolved in TE buffer (10 mM Tris, 1 mM EDTA).
Next-generation sequencing of genomic DNA with Anchor-Seq
Sequencing libraries for cassette integration sites were prepared based on our previously published Anchor-Seq protocol (Meurer et al., 2018), with modifications to the adapter design to include unique molecular identifiers (UMIs) (Supplementary Table S4; Buchmuller et al., 2018, bioRxiv). Quantified libraries were sequenced on a NextSeq 550 sequencing system (Illumina) with a 20% spike-in of phiX gDNA. Technical sequences (i.e., adapter and cassette sequences) were trimmed from the raw reads using custom scripts (Julia v0.6.0 and BioSequences v0.8.0). The trimmed reads were aligned to the human reference genome (Genome Reference Consortium Human Build 38 for alignment pipelines, ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/) using bowtie2 (v2.3.3.1 (Langmead and Salzberg, 2012)). Template cassette sequences were included in the reference genome as decoys. Aligned reads were grouped with UMI-tools (Smith et al., 2017) based on the UMIs included in the Anchor-Seq adapters. Enriched integration sites were further evaluated and counted using IGV (v2.4.10 (Robinson et al., 2011)).
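The purpose of UMI grouping is to count independent molecules rather than PCR duplicates at each integration site. The sketch below illustrates this idea with exact-match UMIs only and is not the pipeline used here. UMI-tools additionally merges UMIs that differ by sequencing errors (e.g., with its directional method). The coordinates and UMIs below are hypothetical.

```python
from collections import defaultdict

# One aligned read: (chromosome, position, strand, UMI)
Read = tuple[str, int, str, str]


def molecules_per_site(reads: list[Read]) -> dict[tuple[str, int, str], int]:
    """Count distinct UMIs per integration site (exact-match UMIs only)."""
    umis: dict[tuple[str, int, str], set[str]] = defaultdict(set)
    for chrom, pos, strand, umi in reads:
        umis[(chrom, pos, strand)].add(umi)
    return {site: len(u) for site, u in umis.items()}


reads = [
    ("chr3", 1000, "+", "AACCGG"),
    ("chr3", 1000, "+", "AACCGG"),  # PCR duplicate: same site, same UMI
    ("chr3", 1000, "+", "TTGGCA"),
    ("chr7", 5000, "-", "AACCGG"),
]
counts = molecules_per_site(reads)
for site, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(site, n)
```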
Cell counting and Fluorescence microscopy
For Fig. 5b-d, cells were grown on coverslips (No. 1.5, Thermo Fisher Scientific), washed once with PBS and fixed with 3% PFA for 10 minutes at 37 °C. After fixation, coverslips were washed three times with PBS, incubated for 10 minutes in PBS containing 0.1 µg/ml 4′,6-diamidino-2-phenylindole (DAPI) and embedded in Mowiol. For culturing C2C12 and mES cells, coverslips were coated with 0.1% gelatin Type B (Sigma Aldrich) and 0.2% gelatin Type A (Sigma Aldrich), respectively. Images of RPE-1 and C2C12 cells were acquired as Z stacks on a Zeiss Axio Observer Z1 equipped with a 40x NA 1.3 PlanNeo oil immersion objective and an AxioCam MRm CCD camera, using ZEN software. Images of mESC colonies were acquired as Z stacks on a Nikon A1R confocal microscope equipped with a Nikon Plan Apo λ 20x NA 0.75 objective, using NIS-Elements software. Maximum intensity projections of the Z stacks were prepared using Fiji (Schindelin et al., 2012; Schneider et al., 2012).
For cell counting, random fields of view were inspected in the Hoechst/DAPI channel and all nuclei in each field of view were counted. Cells expressing the fluorescent protein encoded by the transfected cassette were then counted in the same fields of view using the appropriate illumination wavelengths. In some experiments, counting was done on images recorded in the same manner.
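Efficiency is the number of fluorescent cells divided by the number of nuclei in the same fields of view. The sketch below pools counts over fields before dividing, so that fields containing more cells weigh proportionally more. This pooling is our assumption about how per-field counts are combined. The counts are hypothetical.

```python
def tagging_efficiency(fields: list[tuple[int, int]]) -> float:
    """Percent fluorescent cells, pooled over fields of view.

    Each field is (nuclei counted, fluorescent cells counted).
    """
    nuclei = sum(n for n, _ in fields)
    positive = sum(p for _, p in fields)
    if nuclei == 0:
        raise ValueError("no nuclei counted")
    if positive > nuclei:
        raise ValueError("more fluorescent cells than nuclei")
    return 100.0 * positive / nuclei


# Hypothetical counts from three fields of view.
print(f"{tagging_efficiency([(120, 14), (95, 9), (110, 12)]):.1f}%")  # 10.8%
```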
For Fig. 4c, images were taken with a Zeiss LSM 780 confocal microscope using a Plan-Apochromat 63x, 1.40 NA oil objective (panels i-iii) or with a Leica DMi8 spinning disk microscope using an HC PL APO 63x, 1.40 NA oil objective (panel iv).
For all other figures, cells for live-cell imaging were split into 8-well µ-slides (Ibidi) 24 h after transfection. Transfected cells were analyzed 3 days after transfection or as described in the figure legends. Cells were stained with Hoechst 33342 (4 µg/ml in PBS, Thermo Fisher Scientific) for 5 minutes. The medium was then changed to FluoroBrite (Thermo Fisher Scientific) supplemented with 10% FBS (Gibco) and 20 mM HEPES-KOH, pH 7.4 (Thermo Fisher Scientific).
Counting and imaging were performed on either a Nikon Ti-E widefield epifluorescence microscope or a DeltaVision microscope, each equipped with a 60x oil immersion objective (1.49 NA for the Nikon, 1.40 NA for the DeltaVision). Z-stacks of 11 planes with 0.5 µm spacing were recorded with 100 ms exposure time. Single-plane images and maximum intensity z-projections are shown. Subcellular localizations were identified and scored visually.
Western Blotting
Cells were solubilized in SDS sample buffer (50 mM Tris-Cl pH 6.8, 10 mM EDTA, 5% glycerol, 2% SDS, 0.01% bromophenol blue) containing 5% β-mercaptoethanol. All samples were incubated for 15 min at 65 °C. Denatured, fully reduced proteins were resolved by Tris-glycine SDS-PAGE, followed by western blot analysis using the following antibodies: rat monoclonal anti-HA (11867423001; Roche), mouse monoclonal anti-V5 (V8012; Sigma), mouse monoclonal anti-S-tag (MA1-981; Thermo Fisher), rabbit polyclonal anti-mNeonGreen Tag (53061S; Cell Signaling) and rabbit anti-Calnexin (ab22595; Abcam).
Tissue culture
hTERT-immortalized retinal pigment epithelial cells (RPE-1, ATCC, CRL-4000, USA) were grown in DMEM/F12 (Sigma Aldrich) supplemented with 10% fetal bovine serum (FBS, Biochrom), 2 mM L-glutamine (Thermo Fisher Scientific) and 0.348% sodium bicarbonate (Sigma Aldrich). Mouse C2C12 myoblasts (gift from Edgar R. Gomes, iMM, Portugal) were grown in DMEM High Glucose (Sigma Aldrich) supplemented with 20% FBS (Biochrom). Mouse embryonic stem cells of line E14 (gift from Frank van der Hoeven, DKFZ, Germany) were grown in Knockout DMEM (Thermo Fisher Scientific) supplemented with 10% ESC-qualified FBS (Thermo Fisher Scientific), 2 mM GlutaMax (Thermo Fisher Scientific), 0.1 mM β-mercaptoethanol and 10³ units/ml murine leukemia inhibitory factor (LIF, ESGRO, Millipore). mES cells were grown under feeder-free conditions on dishes coated with 0.2% gelatin Type B (Sigma Aldrich).
HEK293T, HeLa and U2OS cells were grown in DMEM High Glucose (Life Technologies) supplemented with 10% (vol/vol) fetal bovine serum (Gibco).
All cell lines were grown at 37 °C with 5% CO2, and regularly screened for mycoplasma contamination.
Selection was performed using 1 µg/ml Puromycin (Sigma Aldrich) or 500 µg/ml Zeocin (Invitrogen) for HEK293T cells. For HeLa cells 300 µg/ml Zeocin was used.
Transfection
Chemical transfection - HEK293T, HeLa and U2OS cells were transfected using Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocol in a 24-well format. Unless otherwise stated, 500 ng of Cas12a plasmid and 500 ng of PCR cassette were used per well of a 24-well plate.
Electroporation - Plasmids encoding Cas12a variants and PCR cassettes were electroporated into RPE-1, C2C12 and mES cells using 2 mm gap cuvettes and a NEPA-21 electroporator (Nepa Gene, Japan) according to the manufacturer’s instructions. OPTI-MEM (Thermo Fisher Scientific) was used as electroporation buffer.
For electroporation of HEK293T cells, the Neon Transfection System (Thermo Fisher Scientific) was used according to the manufacturer's protocol, with 2 pulses of 20 ms at 1150 V.
Colony picking and generation of clonal lines
After Zeocin selection, cells from a confluent plate were trypsinized and counted in a Neubauer chamber. The suspension was diluted to an average of three cells per well and seeded into a 96-well plate. After 5 days, wells were checked for single clones. After another 7-10 days, cells were checked for fluorescence, and positive clones were transferred to a 24-well plate.
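The seeding step is a dilution calculation. Each large square of a Neubauer chamber holds 0.1 µl, so the mean count per square × 10⁴ gives cells per ml. The sketch below computes how much cell suspension and medium are needed for a 96-well plate. The 100 µl per well, the 10% overage and the square counts are hypothetical values, not taken from the protocol.

```python
def cells_per_ml(square_counts: list[int], dilution_factor: float = 1.0) -> float:
    """Neubauer chamber: each large square holds 0.1 µl, hence the factor 10^4."""
    mean = sum(square_counts) / len(square_counts)
    return mean * 1e4 * dilution_factor


def seeding_plan(conc_per_ml: float, cells_per_well: float = 3.0, wells: int = 96,
                 ul_per_well: float = 100.0, overage: float = 1.1) -> tuple[float, float]:
    """Return (ml of cell suspension, ml of medium) for the whole plate."""
    total_ml = wells * ul_per_well / 1000 * overage
    target_per_ml = cells_per_well / (ul_per_well / 1000)
    suspension_ml = target_per_ml * total_ml / conc_per_ml
    return suspension_ml, total_ml - suspension_ml


conc = cells_per_ml([50, 48, 52, 50])  # hypothetical counts -> 5e5 cells/ml
print(seeding_plan(conc))              # suspension volume is well below 1 µl
print(seeding_plan(conc / 100))        # after a 1:100 pre-dilution: about 63 µl
```

When the required suspension volume is too small to pipette accurately, a pre-dilution step (here 1:100) brings it into a practical range.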
Acknowledgements
The authors wish to thank Cyril Mongis for help with IT infrastructure, and Anne Schlaitz and Frauke Melchior for critical reading of the manuscript. We acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, grant KN498/12-1), the Collaborative Research Center SFB1036, the state of Baden-Württemberg through bwHPC for high-performance computing and SDS@hd for data storage (grant INST 35/1314-1 FUGG). K.H. is supported by an HBIGS graduate school fellowship. G.P. and B.K. are supported by the Collaborative Research Center SFB873 and the Heisenberg Program of the DFG (granted to G.P.).