Abstract
Protein interaction network (PIN) or interactome has been mapped vigorously for the entire genome. We recognize, nonetheless, that such a map could illuminate profound insights had its context been revealed. We describe a scalable protein lableling method that could re-supply natural context back to the map of protein interactome. Genetically encoded fluorescent proteins, position-specific genomic integration and GAL4-responsive expression control enable labeling proteins A, B and C each with a either an eGFP, mCherry or NirFP in specified cells of optically transparent animals such as Drosophila embryos. While following multiple proteins through development and behavior, these labels offer separable pairs of Förster resonance energy transfer between proteins A and B and proteins B and C. We test and observe FRET interactions between specific protein pairs controlling cytoskeleton, nuclear signaling and cell polarity. By using our protein labeling method, it will be possible to map protein interaction network in situ — isPIN.
INTRODUCTION
Protein interaction network (PIN) or interactome has been mapped vigorously for the entire genome of Drosophila (Giot et al., 2003; Guruharsha et al., 2011) and human (Hein et al., 2015; Rolland et al., 2014). We recognize, nonetheless, that such a map could illuminate profound insights had its context been revealed. Since 1994, GFP (green fluorescent protein) (Chalfie et al., 1994) and other genetically encoded fluorescent proteins (Tsien, 1998) have labeled numerous proteins. They are useful in exposing inherent distribution biases of proteins within cells of diverse animals (Lukyanov, 2011), as well as serving as an immunological tag for purification of additional proteins capable of co-complexing (Guruharsha et al., 2011; Hein et al., 2015). In addition to these applications, valuable information regarding the way proteins interact in vivo can be collected using labeled proteins. Proximity attained by a pair of proteins that are labeled with spectrally overlapping fluorophores induces Förster resonance energy transfer (FRET), i.e., transfer of energy at near field from a fluorescence donor to a fluorescence acceptor (Förster, 1948). Therefore, occurrence of FRET as described best through quantum physics (Lakowicz, 2006) serves as a visual proxy for when and where proteins of interest interact with one another (Clegg, 2010). The idea of proximity-based scrutiny of interacting proteins (Sharifai et al., 2014) is not new, as the use of two-domain protein GAL4 transcription factor of yeast (Fields and Song, 1989) has expanded successfully to non-yeast proteins (Giot et al., 2003; Rolland et al., 2014). In contrast to yeast two-hybrid assays that sensitively demonstrate proteins’ binding compatibility, we maintain the native environment of the proteins we assess although, similarly to these assays, for a pair at a time. Our optimization aims for both the quality design in labeling proteins and the tight control in expressing labeled proteins. By using our protein labeling method (see Table1), it will be possible to re-supply natural context back to the map of protein interactome.
RESULTS
To label individual proteins, we chose genetically encoded fluorescent proteins eGFP (Bierhuizen et al., 1997), mCherry (Shaner et al., 2004) and NirFP (Shcherbo et al., 2010) (FIG1). These labels serve as versatile imaging tools to study proteins within live animals. First, each of these fluorophores has structural resemblance to a “cage”. This cage-like configuration is likely the strongest natural means for protecting a set of fluorescing atoms from surrounding stress. More specifically, the extremely rare ß sheet barrels constituting these proteins are proposed to solidly safeguard a small α helix, T65-Y66-G67 in eGFP for instance, capable of exhibiting fluorescent properties (Ormo et al., 1996). Another well-known example of ß sheet barrels protecting delicate internal structures occurs among channel proteins, such as in a tetrameric potassium channel (Zhou et al., 2001), that regulate ion selectivity of the cell surface. Second, each of these fluorescent labels is physically linked to a protein of interest through a “loop.” Unlike synthetic chemicals that need conjugation through much weaker non-covalent bonds, this robustly covalent yet non-structural linkage achieves two consequences: (1) a short and constant distance between the label and the protein that is labeled, and (2) randomization of dipole orientation known as κ2 of the label with respect to the protein it labels. Unlike synthetic fluorescent dyes (Pertz et al., 2006) or fluorescence-absorbing metal atoms (Koch and Larsson, 2005), neither eGFP, mCherry, nor NirFP requires pre-imaging dissection or incubation of animals after attaining full maturation, typically within 2-4 hours. As mentioned elsewhere (Chong et al., 2015; Hudson et al., 2008; Sarov et al., 2012; Trinh le et al., 2011; Venken et al., 2011), these and other genetically encoded fluorescent proteins are extremely valuable for time lapse imaging of proteins during development and behavior of animals. eGFP, mCherry and NirFP are derived from jellyfish Aequoria and sea anemone, Discosoma and Entacmaea, respectively. Unlike their natural forms, however, custom-tuned eGFP and mCherry are each monomeric and do not dimerize by themselves. Also important, these two and NirFP do not bind one another directly when co-expressed in model organisms such as Drosophila (our observation). This is satisfactory because during imaging, both eGFP and mCherry are required to serve as a FRET donor. Any inherent affinity, whether previously known or unknown, between labeled proteins needs to be the sole reason for inducing FRET. Absorption and emission spectra of these fluorescent labels are distinct from each other. Notably, the eGFP-mCherry-NirFP trio offers two excellent yet separable FRET pairs: eGFP-to-mCherry with a Förster distance of 5.2 nm and mCherry-to-NirFP with a Förster distance of 4.4 nm. Multiplexing FRET capabilities using this trio may therefore serve as the basis for exposing the logic governing and/or the extent delimiting proteins interacting with each other (not shown). Although we are not aware of any systematic test on the functional neutrality of protein labeling, we believe that genetic knockouts of any specific proteins, if attempted, may be rescued by using fluorescently labeled but otherwise wild type forms of proteins. In fact, the very first use of GFP in fruit flies reported that genetic knockout of a certain nuclear protein could be rescued by expressing a GFP-fused wild-type form (Wang and Hazelrigg, 1994). More recently, Cdc42 protein, required for initiating the dendrites in neurons (Kamiyama and Chiba, 2009), was shown to rescue animals lacking a cdc42 gene, regardless of whether the wild type transgene was fused to eGFP or not (our observation). We did not adopt split GFP as in the GRASP method (Feinberg et al., 2008) or sequential two-step GFP formation as in the GFP11 knock-in method (Kamiyama et al., 2016) because these strategies do not distinguish transient from permanent interactions. It should be further noted that these fluorescent proteins cannot readily fuse to DNA or RNA covalently, thus limiting their potential usage for exposing binding specificity of certain transcription factors or other DNA-binding/RNA-binding proteins toward particular nucleotide sequences through FRET.
We developed expression vectors, pUAS-C-eGFP-BD-attB, pUAS-C-mCherry-BD-attB, pUAS-N-eGFP-BD-attB and pUAS-N-mCherry-BD-attB for generating fluorescently tagged proteins of interest (POIs) (FIG2). First, we transferred protein open reading frames (ORFs) to the expression vectors using the Cre-Lox recombination cloning system, similar to the strategy used to make FLAG-HA constructs for tandem-affinity purification conducted previously (Guruharsha et al., 2011). The use of 100% sequence-validated XO and XS clone sets (Yu et al., 2011) prevents any unknown or non-intentional modification of the proteins being analysed. For 286 selected genes we used 283 BO clones to produce carboxyl terminal fusions from the XO clone set and 264 BS clones to produce amino terminal fusions from the XS clone set. We generated 953 expression clones in total: 236 UGS (eGFP amino tag), 245 URS (mCherry amino tag), 228 UGO (eGFP carboxyl tag) and 244 URO (mCherry carboxyl tag). Second, the attp sequence added to each expression vector is required for targeted integration of transgenes into specific landing sites in the Drosophila genome. We used phiC31 integrase, as originally characterized in retroviruses (Groth et al., 2004). The landing sites selected, attp40 and vk01 on chromosome 2 and attp2 on chromosome 3, have high integration rates and are also known to induce sufficiently high rates of expression as GAL4-reponsive transgenes (Markstein et al., 2008; Venken et al., 2006a). It is also straightforward to introduce two transgenes onto one chromosome as a double transgene stock though intentional homologous recombination. Third, a tandem UAS sequence added to the vectors allows for expressing transgenes under the control of yeast GAL4 transcription factor (Brand and Perrimon, 1993). This enhancer hijacking in a foreign host genome, i.e., Drosophila in our case, offers total control over the time and place for co-expressing up to three transgenes all at a constant dosage.
Using this position-specific integration in the Drosophila genome, various single transgene, double transgene, and triple transgene stocks can be created (FIG3a-c). Homologous recombination occurs in the female germ line and, therefore, the use of balancer chromosomes during standard genetic crosses suppresses unwanted recombination of two transgenes in a single chromosome, e.g., chromosome ii. As illustrated schematically, intentional uses of balancers afford stress-free combination of multiple transgenes, with all undergoing expression at constant dosage. Incidentally, a single genetic dosage is sufficient for detecting fluorescence from all labeled proteins, as well as detecting FRET quantitatively. The low-dosage expression also minimizes photo-toxicity during repeated and/or time-lapse imaging. The genotype of typical experimental animals has three GAL4-responsive transgenes for co-expressing proteins A, B and C labeled with eGFP, mCherry and NirFP, respectively, allowing for visualization of co-localizing POIs (FIG3d). Similarly, when a protein couple is present, the two may exhibit specific affinity with each other and can be directly demonstrated through FRET (FIG3e). Nevertheless, all FRET quantifications are to be compared to “blank” controls in which the fluorescently labeled POI is co-expressed with un-fused (cytoplasmic) fluorescent labels.
A label may be added to either the amino or carboxyl terminus of any POI, offering independent ways to visualize the same protein in the same microenvironment (FIG4a). The same protein may be labeled by any of the three colors (FIG4b). In our experience, choice in the label position or label color has not altered the intensity or distribution pattern of fluorescence, supporting their interchangeability as labels. The GAL4 driver may be ubiquitously expressed (e.g., tub-GAL4 drives expression in all cells throughout development), specific to a whole tissue (e.g., elav-GAL4 drives expression in all neuronal but not glia cells) or to a single cell (e.g., eve- GAL4 drives expression in aCC motoneurons in addition to frequent pCC interneurons and RP2 motoneurons in all 14 segments of the ventral nerve cord) (FIG4c). In neuroscience, the question of presynaptic versus postsynaptic components of synapses is important as discussed on long-term potentiation, for example (Sanes and Lichtman, 1999). Being able to limit expression to either the presynaptic neurons, postsynaptic neurons or target cells would be valuable. In a general sense, biology of multi-compartmental cells can be scrutinized in ways not easily possible with endogenous or other non-GAL4 controls. The three fluorescent colors may label three different proteins A, B and C in the same cell or cell type defined by a particular GAL4 driver (FIG5). There, proteins may be observed in patterns that are unique to each tissue and/or a single cell within a given animal. Co-expression is necessary but not predictive of their interaction, nevertheless. In fact, patterns of expression for each protein as well as those of co-expression may not correlate at all to when and where FRET occurs (FIG6). Therefore, the true protein interactomic map awaits future characterization though direct imaging of fluorescently labeled proteins within animals, i.e., the natural context.
DISCUSSION
The labeling method described in this paper employs the exogenous GAL4 system. Potential drawbacks of this, or any other methods not using the endogenous expression control, are two-fold. First, the timing and dosage of expression can be inappropriate for assessing the function of the protein of interest. The endogenous expression patterns are often available through mRNA in situ hybridization and/or antibody staining of their gene products. In theory, one can focus only on sites and times that are justifiable. Even so, any difference between the native endogenous expression and controlled exogenous expression can be a cause of concerns. Second, the endogenous gene expression remains untouched. Consequently, labeled protein A would potentially interact with both exogenously expressed and endogenously present labeled protein B and un-labeled protein B, respectively. When looking for evidence of FRET, there is always a possibility of under-estimating the true interaction probability between proteins A and B. (Note: Protein trapping, that in theory circumvent issues arising from endogenous versus exogenous expression controls, has been designed for the Drosophila studies using green and red fluorescent proteins (Quinones-Coello et al., 2007; Venken et al., 2011); but the method described in this paper integrates notable advantages — see below.)
Our method incorporates several useful design concepts (see Table 1). In particular, the use of genetically encoded fluorescent proteins guarantees a one-to-one relationship between the label and the labeled. Also, their sturdy yet flexible linkers are thought to drastically simplify FRET computation. In addition, the GAL4 expression system, despite the discussion above, offers an unparalleled control over the time and place of expression, as well as maintaining it at a constantly low dosage. Furthermore, with position-specific transgene integration, the variability of transgene expression levels — a major source of complication in many genetic experiments — becomes minimal if not a non-issue in practice. As an added advantage, it is relatively straightforward to combine multiple labeled proteins through a series of genetic crosses. An additional detail not to be overlooked is that with an ever-expanding list of available GAL4 drivers (Jenett et al., 2012), the access to single cells becomes increasingly feasible. This would grant a significant advantage to neuroscience where axons, dendrites and synapses of the same cell can manifest distinct properties. Altogether, co-localization of three proteins A, B and C, as well as possible interaction between A and B, independent of the interaction between B and C, may all be documented. Finally, principles from most of these genetic engineering tools may be applicable to other metazoan model organisms such as nematodes, zebra fish and small mammals. Whether or not the concepts used in design of labeling or expression control may vary in detail, proteins that are labeled fluorescently in diverse animal systems will open a way to map freely interacting proteins within their natural context.
METHODS
Acceptor Vector construction
To generate the C-terminus tagged vectors, the pUAST expression vector (Brand and Perrimon, 1993) was cut with NotI and XbaI. The pUAST-C-TAP-BD expression vector (Stapleton) was cut with NotI and XbaI to release a 256 bp fragment containing loxP, chloramphenicol promoter and splice acceptor sequences. This 256 bp C-terminus BD cassette was inserted into the linearized pUAST vector by overnight ligation. The resulting C-terminus BD-adapted pUAST vector was linearized again by cutting it with XbaI. To generate the N-terminus tagged vectors, the pUAST expression vector was cut with EcoRI and XhoI. The pUAST-N-TAP-BD expression vector was cut with EcoRI and XhoI to release a 167 bp fragment containing loxP and chloramphenicol promoter sequences. This 167 bp C-terminus BD cassette was inserted into the linearized pUAST vector by overnight ligation. The resulting N-terminus BD-adapted pUAST vector was linearized again by cutting it with EcoRI. The CDS of the EGFP and mCherry fluorescent tags were obtained by PCR from the pEGFP-C1 (Clontech) and pRSET-B-mCherry (Invitrogen) vectors, respectively. C-terminus fluorescent tags were amplified using XbaI-tailed PCR primers, while N-terminus tags were amplified using EcoRI-tailed PCR primers (see below). The resulting PCR products were agrose gel-purified and cut with either XbaI (C-terminus tag) or EcoRI (N-terminus tag). Following ethanol precipitation, the XbaI digested C-terminus fluorescent tags were inserted into the XbaI linearized BD-adapted pUAST vector by overnight ligation. The EcoRI digested N-terminus tags were inserted into the EcoRI linearized BD-adapted pUAST vector by overnight ligation. The resulting C- or N-terminus fluorescent-tagged and BD-adapted pUAST vectors are cut with BamHI to release the portion of the vector containing the 5x UAS promoter, hsp70, BD cassette, fluorescent tag and SV40 terminator sequences. Agarose gel-purify the BamHI digested fragments. We linearized the pBDP vector by cutting it with BamHI and ethanol precipitated the digested product. We inserted the purified C- or N-terminus fluorescent-tagged and BD-adapted BamHI fragments from the pUAST vectors into the linearized pBDP vector by overnight ligation. The resulting vectors are pUAS-C-EGFP-BD-attB, pUAS-C-mCherry-BD-attB, pUAS-N-EGFP-BD-attB and pUAS-N-mCherry-BD-attB. (Note: Expression vectors were designed similarly for labeling POI with NirFP but are yet unavailable at the DGRC.)
Sequences: 256bp C-terminus BD cassette: 5’-GGCCGCATAACTTCGTATAGCATACATTATACGAAGTTATAGATCCAATATTATTGAAGCAT TTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAAT AGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTGGATCTCGAGCTCAAGCT TCGAATTCAGGGTTTCCTTGACAATATCATACTTATCCTGTCCCTTTTTTTTCCACAGCTACC GGTCGCGT-3’ 167bp N-terminus BD cassette: 5’-AATTCATAACTTCGTATAGCATACATTATACGAAGTTATAGATCCAATATTATTGAAGCATTT ATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATA GGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTC-3’ XbaI-tailed PCR primers: C-mCherry-FWD 5’-cccctctagaGTGAGCAAGGGCGAGGAGGATAACATG-3’ C-mCherry-REV 5’-ggggtctagaTTACTTGTACAGCTCGTCCATGCCGC-3’ C-EGFP-FWD 5’-cccctctagaGTGAGCAAGGGCGAGGAGCTGTTC-3’ C-EGFP-REV 5’-ggggtctagaTTACTTGTACAGCTCGTCCATGCCGAG-3’ EcoRI-tailed PCR primers: mCherry-N-FWD 5’-ccccgaattcacaccATGGTGAGCAAGGGCGAGGAGGAT-3’ mCherry-N-REV 5’-ggggaattcccCTTGTACAGCTCGTCCATGCCGCC-3’ EGFP-N-FWD 5’-ccccgaattcacaccATGGTGAGCAAGGGCGAGGAGCTG-3’ EGFP-N-REV 5’-ggggaattcccCTTGTACAGCTCGTCCATGCCGAG-3’
Expression Clone sets
We transferred ORFs from the BDGP Drosophila melanogaster XO and XS expression-ready clone sets to the pUAS-C-EGFP-BD-attB, pUAS-C-mCherry-BD-attB, pUAS-N-EGFP-BD-attB and pUAS-N-mCherry-BD-attB acceptor vectors (Yu et al., 2011). For recombination reactions, 200 ng of expression-ready Donor clone and acceptor Vector were recombined in a final volume of 10 µl for 15 minutes at 25 °C in a thermal cycler in the presence of Cre recombinase (0.2 units) and recombinase buffer supplemented with BSA (0.1 mg/ml) (Clontech #631614) according to Clontech manual PT3460-1 (now part of Takara Bio). Cre recombinase was inactivated by incubating the reaction at 70 °C for 10 minutes. From this reaction, 5 µl was transformed into chemically competent TAM-1 cells in a 96-well plate format (Active Motif #11096) and selected for Chloramphenicol resistance. Each clone was sequence verified to check for target mismatches using BigDye Terminator v3.1 ready reaction mix (Applied Biosystems #4337457) and the sequencing primer: 5’-GCCAATGTGCATCAGTTGTGGTC-3’. Sequencing samples were analyzed on a conventional capillary electrophoresis instrument (e.g., ABI 3730/3730xl DNA Analyzer). Glycerol stocks were generated and stored for each isolate. Clones can be obtained from the Drosophila Genome Resource Center (DGRC) in Bloomington, IN and sequence from GenBank using the accession numbers provided in Supplemental Table.
Drosophila stocks. tub-GAL4 - source: Bloomington Stock #60298 was used for ubiquitous expression. elav-GAL4 - source: Bloomington Stock # 8760 was used for pan-neuronal expression. eve-GAL4[RN2-GAL4-E] - source: M. Fuijoka was used for expression in aCC/RP2/pCC neurons. Transgenic flies carrying UAS-eGFP::POI (attp40 site at 25C7 - source: www.geneticservices.com), UAS-mCherry:POI (vk01 site at 59D3 - source: Bloomington Stock # 9722) on the second chromosome and UAS-NirFP::POI (attp2 site at 68A4 - source: Bloomington Stock # 8622) on the third chromosome were constructed as described (Sharifai et al., 2014). All attp landing sites were selected because of their integration rates varying as high as 25-35%, i.e., more efficiently than P element transformation of about 1%. P Bac(y+.attp-3b) attp2 and P Bac(y+.attp-3b) att40 are from the collections described in (Groth et al., 2004) and (Szabad et al., 2012). They are located at map positions 25C6 (attp40) on the second chromosome and 68A4 (attp2) on the third chromosome. PBac(y+.attp-3b) vk01 is from the Venken Bac collection (Venken et al., 2006b) and is located at 59D3 of the second chromosome. Besides their high integration rates, these integration sites were used also for high expression rates and low background effects (Markstein et al., 2008; Pfeiffer et al., 2010) (Zusman et al. unpublished observations). For all integration sites the integrase-producing transgene P(nos-phiC31\int.NLS) X (Bischof et al., 2007) was used.
Imaging
UAS-transgene carrying males were crossed with GAL4 driver carrying virgin females in mating cages using grape juice agar plates coated with yeast to collect embryos. Mating cages were placed at 25 °C and collection plates were swapped after one hour to gather embryos for a given developmental stage. Embryos were manually dechorionated and developmental staging was confirmed by gut morphology. Embryos were placed on double-sided tape attached to a glass slide with a silicone well and then immersed in HL3.1 hemolymph buffer for imaging. We used an upright microscope with a x40 or x63 water-immersion objective lens to image the sample. However, when an inverted microscope was used, a coverslip was placed over the buffer-immersed embryos using elastic silicone. Subsequently, the slide was flipped upside-down and mounted on a x40 or x63 oil-immersion objective lens. In some cases, devitillenization, filet dissection, or both, were performed to increase signal intensities of eGFP, mCherry and NirFP. A ratio of mCherry over eGFP being between 0.1 and 10 was found essential for obtaining reliable FRET measurements, consistent with previous findings (Berney and Danuser, 2003). No samples created through our transgenic method produced a sign of “false positive” FRET, i.e., a false decrease in donor fluorescence lifetime. In contrast, experiments using standard retrovirus transfection methods in human and insect cell lines had an expression level of mCherry, the FRET acceptor, beyond the ratio of 100 and produced such a “false positive” FRET (not shown).
FRET probability
The probability of FRET is defined as a product of the concentration and lifetime reduction of a FRET donor, e.g., eGFP or mCherry. Whereas the concentration of a given protein varies from pixel to pixel, the fluorescence lifetime τ is calculated independently of its concentration. Consequently, a polar histogram allows for normalization of the FRET probability irrespective of the size of proteins interacting with each other. This size-free unit of protein interactome is termed “interaction” (Sharifai et al., 2014).
Acknowledgements
We thank TK Harris and Rajeev Probhakar for structural biology of GFP, Konstantin Lukyanov for chemistry of NirFP, Edward Giniger, Francisco Raymo, and Peter Larsson for discussions on FRET-based protein imaging, Grace Zhai for discussion on Drosophila, Laura Bianchhi and Kevin Collins for discussion on C. elegans, Julia Dallman and Sandra Rieger for discussion on zebra fish, and Vance Lemmon, Patelis Tsoulfas and Tom Lisse for discussion on mice.
Supported in part by research awards RC2-NS069488 from American Recovery and Reinvestment Act and National Institutes of Health and MH079432 from National Institutes of Health to AC, and research award P41HG3487 from National Human Genome Research Institute to SC, and performed under U.S. Department of Energy Contracts DE-AC0376SF00098 and DE-AC02-05CH11231.
Footnotes
Contributions: CH, TD, SC and AC conceived the project. CY, KW and SC designed and made the eGFP and mCherry expression vectors and expression clones. TD, MB and NS made NirFP expression clones. SZ and TD made transgene lines. TD and CH established stocks. CH, MD, TD, NS and HS performed microscopy, and TY, JD and MK assisted. AC with inputs from other co-authors wrote the paper.