Abstract
Here we describe a C-SWAT library for high-throughput tagging of Saccharomyces cerevisiae ORFs. It consists of 5661 strains with an acceptor module inserted after each ORF, which can be efficiently replaced with tags or regulatory elements. We validate the library with targeted sequencing and demonstrate its use by tagging the yeast proteome with bright fluorescent proteins, determining how sequences downstream of ORFs influence protein expression and localizing previously undetected proteins.
Genome-wide libraries of strains where every open reading frame (ORF) is fused to a constant tag are valuable resources for proteome-wide studies in Saccharomyces cerevisiae. Different libraries are available to assess properties such as protein localization, abundance, turnover and protein-protein interactions for a large fraction of the yeast proteome1-6. However, construction of such libraries is costly and time-consuming, which hampers genome-wide endeavors with improved tags such as novel fluorescent proteins, tags bearing sequences for RNA detection7 or regulation of gene expression8.
To overcome these limitations, we recently developed the SWAp-Tag (SWAT) approach for high-throughput tagging of yeast ORFs and used it to N-terminally tag proteins of the endomembrane system9. This approach requires a one-time construction of SWAT strains where individual ORFs are marked with an acceptor module (Fig. 1a). New strains can be rapidly derived from SWAT strains using automated procedures to replace the acceptor module with practically any tag or regulatory element provided on a donor plasmid9 (Fig. 1a).
To apply the SWAT approach to the whole yeast proteome, here we introduce a genome-wide C-SWAT library. This library enables high-throughput genome engineering at the 3’ ends of yeast ORFs and can be used for high-throughput C-terminal protein tagging. We constructed C-SWAT strains using conventional PCR targeting10,11 to insert a C-SWAT acceptor module before the stop codon of individual ORFs at endogenous chromosomal loci (Fig. 1a). The acceptor module consists of homology arms (L3 and L4, for subsequent recombination with the desired tag), a heterologous transcription terminator (T), a restriction site for the I-SceI endonuclease (♦), a selection/counter-selection marker (URA3) and a second, truncated selection marker (hphΔN) (Fig. 1a, Supplementary Information).
To verify correct integration of the acceptor module in each C-SWAT strain, we developed a high-throughput targeted sequencing approach (Anchor-Seq) to sequence the junctions between the 3’ end of each tagged ORF and the 5’ end of the acceptor module (Fig. 1b). In Anchor-Seq, genomic DNA is isolated from a pooled library of strains, where a different ORF is modified in each strain. The junctions of interest are then selectively amplified using vectorette PCR12 and subjected to high-throughput sequencing (Fig. 1b, S1, Supplementary Information). We performed Anchor-Seq on pools of six replicates of the C-SWAT library, corresponding to six independent transformants for each ORF. In total, we obtained validated C-SWAT strains for 94% of verified or uncharacterized S. cerevisiae ORFs and for 238 dubious ORFs (Fig. 1c, Table S1).
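The junction check behind this validation can be illustrated with a minimal sketch. It assumes that a correct integration yields reads in which the last bases of the ORF (without its stop codon) are followed directly by the first bases of the acceptor module. The function names, the toy sequences and the `min_reads` cutoff are hypothetical and are not taken from the Anchor-Seq analysis pipeline, which is described in the Supplementary Information.

```python
def read_supports_junction(read, orf_3prime_end, module_5prime_start):
    """Return True if the read contains the ORF end fused directly to the module start."""
    expected = (orf_3prime_end + module_5prime_start).upper()
    return expected in read.upper()


def validated_orfs(reads_by_orf, orf_ends, module_5prime_start, min_reads=1):
    """List ORFs with at least `min_reads` reads supporting the expected junction.

    reads_by_orf: dict mapping ORF name -> list of read sequences (hypothetical input)
    orf_ends:     dict mapping ORF name -> last bases of the ORF, stop codon excluded
    """
    validated = []
    for orf, reads in reads_by_orf.items():
        n_supporting = sum(
            read_supports_junction(r, orf_ends[orf], module_5prime_start)
            for r in reads
        )
        if n_supporting >= min_reads:
            validated.append(orf)
    return sorted(validated)


# Toy example with made-up sequences (not real yeast or module sequences).
MODULE_START = "GGATCC"
ORF_ENDS = {"ORF_A": "ATGAAA", "ORF_B": "CCGTTT"}
READS = {
    "ORF_A": ["TTATGAAAGGATCCTT"],     # ORF end fused to module start
    "ORF_B": ["AACCGTTTTAAGGATCCAA"],  # stop codon retained, junction incorrect
}
VALIDATED = validated_orfs(READS, ORF_ENDS, MODULE_START)
```

In this toy input only `ORF_A` passes, because the read for `ORF_B` still carries a stop codon between the ORF and the module. A real analysis would additionally need to tolerate sequencing errors, which this exact-match sketch does not.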
To tag ORFs with the C-SWAT library, a construct for conditional expression of the I-SceI endonuclease and a donor plasmid carrying the desired tag can be introduced into C-SWAT strains in high throughput by automated genetic crossing with a donor strain9 (Fig. 1a, Supplementary Information). Three types of donor plasmids with different selection strategies can be used: type I for seamless replacement of the acceptor module with the tag and counter-selection for the loss of the acceptor module; type II for selection of tagging events via reconstitution of the hygromycin resistance marker (hph); and type III, which introduces the tag together with a new selection marker (Fig. S2a). We estimated the tagging efficiency of these strategies using C-SWAT strains for 20 highly expressed genes. We observed an average tagging efficiency of ~98% with a type I donor and >99% with the other two donor types (Fig. S2b). This demonstrates that the C-SWAT library can be used for high-throughput strain construction without the need for subsequent clonal selection. We note that for 1-4% of ORFs, endogenous repetitive sequences surrounding the tag integration site could interfere with seamless tagging using the C-SWAT approach13.
The yeast GFP library1, in which 4159 ORFs are tagged with GFP(S65T)14, has been widely used to study the yeast proteome. However, since the construction of this library, various fluorescent proteins with improved properties have been developed. The C-SWAT library provides a platform to profit from these developments. We found that in yeast the green fluorescent protein mNeonGreen15 and the red fluorescent protein mScarlet-I16 are up to three times brighter than fluorescent proteins used in previous libraries1,4,9 (Fig. S3). Using the C-SWAT library, we tagged the yeast proteome with mNeonGreen and mScarlet-I, generating three new libraries: mNG-I (seamless tagging with mNeonGreen), mNG-II and mSC-II (where mNeonGreen and mScarlet-I are followed by a heterologous terminator) (Fig. 2a). We determined the expression levels of proteins tagged in these libraries using fluorescence measurements of colonies4. Over 4300 proteins were expressed at detectable levels (> 1.2 fold above background) in each library (Fig. 2b, Table S2). This is consistent with the number of proteins detected with mass spectrometry in yeast grown under standard laboratory conditions17. Protein expression levels correlated well between the three libraries (Fig. S4a) and with independent estimates of protein abundance17 (Fig. S4b), demonstrating the reproducible and reliable nature of proteome-wide tagging with the C-SWAT library.
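The detection criterion above (>1.2-fold above background) can be written as a short sketch. The function name, the ORF labels and the intensity values are hypothetical, and a single scalar background is assumed for simplicity; the actual colony fluorescence measurements and their normalization are described in the cited work.

```python
def detected_orfs(intensity_by_orf, background, fold_threshold=1.2):
    """Return ORFs whose fluorescence exceeds `fold_threshold` times the background.

    intensity_by_orf: dict mapping ORF name -> colony fluorescence (arbitrary units)
    background:       fluorescence of an untagged control colony (assumed scalar)
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return sorted(
        orf
        for orf, intensity in intensity_by_orf.items()
        if intensity / background > fold_threshold
    )


# Toy example with made-up values.
INTENSITIES = {"ORF_A": 150.0, "ORF_B": 110.0, "ORF_C": 100.0}
DETECTED = detected_orfs(INTENSITIES, background=100.0)
```

With a background of 100 arbitrary units, only `ORF_A` (1.5-fold) counts as detected; `ORF_B` (1.1-fold) and `ORF_C` (1.0-fold) fall below the cutoff.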
Having libraries with seamless and non-seamless protein tags allowed us to examine how regulatory elements downstream of each ORF contribute to protein expression. We observed that protein levels were on average ~20% higher in mNG-II strains (non-seamless tagging) compared to mNG-I strains (seamless tagging) (Fig. 2c). Protein levels differed by more than two fold for ~11% of the proteome, with 466 and 10 proteins exhibiting lower and higher expression in mNG-I strains, respectively. Moreover, the difference between mNG-I and mNG-II strains correlated with the strength of the endogenous transcription terminator for each ORF18 (Spearman’s rank correlation coefficient = 0.509, Fig. 2d). Consistent with these observations, the heterologous ADH1 terminator used in mNG-II and mSC-II libraries is stronger than terminator sequences of most yeast ORFs18. Together these results demonstrate that tagging modules with heterologous terminators, commonly used for C-terminal protein tagging in yeast19,20, can measurably impact protein expression and suggest applications of the C-SWAT library to study regulation of gene expression.
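The two calculations in this comparison, classifying proteins by a twofold difference between libraries and computing a rank correlation, can be sketched as follows. This is an illustration under stated assumptions and not the analysis code used for Fig. 2c, d: the function names and labels are hypothetical, and the correlation is implemented in plain Python (Pearson correlation of average ranks) rather than with a statistics library.

```python
import math


def classify_fold_change(mng1_level, mng2_level, fold=2.0):
    """Label a protein by its mNG-I / mNG-II expression ratio."""
    ratio = mng1_level / mng2_level
    if ratio < 1.0 / fold:
        return "lower_in_mNG-I"
    if ratio > fold:
        return "higher_in_mNG-I"
    return "similar"


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)
```

In such a sketch, `x` could hold the per-ORF log ratio of mNG-II to mNG-I levels and `y` the corresponding terminator strength estimates; a rank-based measure is a natural choice here because it does not assume a linear relationship between the two quantities.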
We observed expression of 208 proteins in the mNG-I and mNG-II libraries that were previously undetected with various independent approaches17 (Fig. 3a). Compared to the entire C-SWAT library (Fig. 1c), this group is enriched in ORFs annotated as uncharacterized (62 ORFs) or dubious (i.e., unlikely to encode functional proteins based on available data, 133 ORFs). We used fluorescence microscopy to examine the localization of 60 such proteins. Notably, for 9 of them (5 uncharacterized and 4 dubious ORFs) we could detect expression and a specific non-cytosolic localization even in mNG-I strains, where transcription is not influenced by a heterologous terminator (Fig. 3b, c, Table S2), suggesting that these are indeed functional proteins. With updates to the yeast genome annotation, 80 of 238 dubious ORFs in the C-SWAT library were recently reclassified as verified or uncharacterized (Table S2). We note that the reclassified and the remaining dubious ORFs exhibit similar expression levels when tagged with mNeonGreen (Fig. S4c), raising the possibility that more dubious ORFs actually encode functional proteins21.
In conclusion, the C-SWAT library is a versatile resource for exploring the yeast genome and proteome. With this tool at hand, the ORFeome can be efficiently manipulated to generate libraries with a variety of tags for protein or RNA detection, to study regulation of gene expression or to explore genomic position effects. It is our hope that the simplicity and cost-effectiveness of C-SWAT will make construction of custom genome-wide libraries routine and facilitate systematic studies.
Author contributions
MK, MM and AK planned the work. YD and MM, together with ES, constructed the library, with help from BCB, VD, KH, FH, DK, IK, MŠ, KVL, AK and MK. ES and EDL developed Anchor-Seq and EDL analyzed the sequencing data, with input from MM and KH. YD, MM, IK, AK and DK generated and analyzed the mNeonGreen and mScarlet-I libraries. AK and MK wrote the manuscript with EDL and ES, with input from all authors.
Acknowledgments
This work was supported by the Deutsche Forschungsgemeinschaft (DFG) Collaborative Research Center SFB1036 (MK, AK, TPD, MKL), the Weizmann Institute of Science (ES and EDL), the China Scholarship Council (YD), fellowships from the HBIGS graduate school (IK, KH and FH), an SFB1036 travel grant (ES), the Alexander von Humboldt Foundation (MŠ) and, partially, by the DFG grant KN498/11-1 (MK), the I-CORE Program of the Planning and Budgeting Committee grants 1775/12 and 2179/14 (EDL) and the HFSP Career Development Award CDA00077/2015 (EDL). We thank the CellNetworks Deep Sequencing Core Facility (Heidelberg University) and the Genomics Core Facility (EMBL), and acknowledge the generous support of the DFG for data storage (LDSF2).