High-throughput yeast two-hybrid library screening using next generation sequencing

Yeast two-hybrid (Y2H) is a well-established genetics-based system that uses yeast to selectively display binary protein-protein interactions (PPIs). To meet the current need to unravel complex PPI networks, several adaptations have been made to establish medium- to high-throughput Y2H screening platforms, several of which have successfully incorporated next-generation sequencing (NGS) to increase the scale and sensitivity of the method. To date, however, these have been mainly restricted to the use of fully annotated custom-made open reading frame (ORF) libraries and are subject to complex downstream data processing. Here, a streamlined high-throughput Y2H library screening strategy, based on integration of Y2H with NGS and called Y2H-seq, was developed, which allows efficient and reliable screening of Y2H cDNA libraries. As proof of concept, the method was applied to screen for interaction partners of two key components of the jasmonate signaling machinery in the model plant Arabidopsis thaliana, resulting in the identification of several previously reported as well as hitherto unknown interactors. Our Y2H-seq method offers a user-friendly, specific and sensitive screening method that allows high-throughput identification of PPIs without prior knowledge of the organism’s ORFs, thereby extending the method to organisms whose genome has not yet been fully annotated. The quantitative NGS readout and the incorporation of background controls make it possible to increase genome coverage and ultimately to discard recurrent false positives, thereby overcoming some of the bottlenecks of current Y2H technologies, which will further strengthen the value of the Y2H technology as a discovery platform.


Introduction
Disentangling protein-protein interaction (PPI) networks is crucial for our understanding of cellular organization and function. To achieve this, a wide range of technologies to identify PPIs has been developed over the last decade [1,2]. One of the most advanced and commonly used methods to identify PPIs in vivo under near-physiological conditions is affinity purification coupled to mass spectrometry (AP-MS) [3-5]. Equivalent comprehensive assays to specifically identify binary PPIs include protein domain microarrays and in vivo protein fragment complementation assays (PCAs) [6-10]. The principle of PCA is based on the fusion of two hypothetically interacting proteins (bait and prey) to two fragments of a reporter protein.
Interaction between the bait and prey proteins results in the reassembly of the reporter protein, followed by its activation. The signal readout can be bioluminescence, fluorescence or cell survival. In the popular yeast two-hybrid (Y2H) method, the bait protein is fused to the DNA binding domain (DBD) and the prey (or prey library in the case of a comprehensive Y2H screening) is fused to the activation domain (AD) of a transcription factor (TF) [11]. Upon association of the hypothetical interactors, the TF is functionally reconstituted and drives the expression of a reporter gene that can be scored by selective growth. Typically, conventional medium-throughput Y2H library screenings are subject to laborious one-by-one clonal identification of interaction partners, but today, proteome-wide mapping of PPIs demands a high-throughput approach. This led for instance to the development of a matrix-based Y2H method that bypassed the inefficient identification by DNA sequencing [12]. Collections of bait and prey strains were automatically combined and arrayed on fixed matrix positions and PPIs were scored as visual readouts. A major drawback of this strategy is the need for pre-assembled libraries based on defined gene models and expensive robotics that are not accessible to every researcher.

Clonal identification of Y2H screening with DNA sequencing has a tremendous negative effect on the efficiency, cost and labor of the method. Furthermore, given the labor-penalty involved with increasing transformation titers, the clonal identification of Y2H interactions is usually not compatible with quantitative assessment of PPI abundances.
Therefore, replacing the conventional Y2H screening strategy with a pool-based selection and global identification by NGS can have three major implications: (i) cost reduction by high-capacity sequencing, (ii) higher sensitivity and (iii) quantification of the abundance of bait-specific interactions.

[...] bait and prey clone are essential to associate barcodes to ORFs, which may pose a cost restriction for massive screening purposes. The latter was addressed in CrY2H-seq, which introduced a Cre-recombinase interaction reporter that drives fusion of the coding sequences of two interacting proteins, followed by NGS to identify these interactions en masse [16]. This method was employed to uncover the transcription factor interactome of A. thaliana.

All of the above-mentioned Y2H-NGS strategies focus on increased capacity, efficiency and sensitivity, although they may lack some specificity or do not fully exploit the quantification potential of NGS coupled to Y2H. Furthermore, construction of full-length ORF libraries is necessary, thereby restricting these methods to organisms whose genomes are well annotated or to 'defined' gene models, which for instance cannot take alternative splicing, alternative start codon use or transcript processing into account.

Here, we discuss a user-friendly and standardized Y2H-NGS workflow ('Y2H-seq'), complementary to the matrix-Y2H approaches, which allows rapid identification of interaction partners of a bait of interest in the organism of choice without the need for expensive robotics. The Y2H-seq screening method generates a quantitative readout that, through the use of control screens, allows false-positive PPIs to be eliminated, boosting the specificity of the method and thereby avoiding unnecessary downstream experimental verification of binary interactions.
Furthermore, the method is not dependent on predefined and prefabricated ORF libraries but on cDNA libraries, and is therefore in principle applicable to every organism regardless of the annotation status of its genome. The functionality of our [...]

Y2H cDNA library used to perform the Y2H screening
The ProQuest two-hybrid cDNA library was generated by cDNA synthesis from RNA extracted from A. thaliana suspension cells AT7, cloned into pEXP-AD502 vector (ProQuest), equivalent to pDEST™22 vector (Thermo Fisher Scientific) and electroporated in the DH10B-TonA (T1 and T5 phage resistance) cells (Thermo Fisher Scientific). The average insert size was 1.1 kb and the number of primary clones was 5.3 × 10^6 cfu with a 100% insert coverage. [...] Table) and Sanger-sequenced.

Semi-quantitative qPCR
Colonies of the Y2H screening plates were dissolved and pooled in 10-15 mL of ultrapure water and plasmids were collected using the Zymoprep™ Yeast Plasmid Miniprep II kit (Zymo Research, Irvine, CA, USA). Prey constructs were amplified via PCR using Q5® High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, MA, USA) and generic pDEST™22 primers that bind to the GAL4AD and the region flanking the attR1 site (S2 Table). The following program was used: initial denaturation (98°C, 30 s), 35 amplification cycles (denaturation 98°C, 10 s; annealing 55°C, 30 s; elongation 72°C, 2.5 min), final extension (72°C, 5 min). The PCR mixture was purified using the CleanPCR kit (CleanNA, Alphen aan den Rijn, The Netherlands) and 40 ng of the purified PCR product was used for semi-quantitative qPCRs, which were carried out with a Lightcycler 480 (Roche Diagnostics, Brussels, Belgium) and the Lightcycler 480 SYBR Green I Master kit (Roche). Specific primers (S2 Table)

The Y2H-seq flow-chart
An illustration of the general workflow of our Y2H-seq strategy is given in Figure 2. As indicated above, NINJA and TPL-N were used as baits and a Y2H cDNA library originating from A. thaliana AT7 suspension cells was used as prey.

After transformation of the Y2H reporter strain PJ69-4α with the bait plasmids, a first checkpoint was introduced, in which the bait strains were individually co-transformed with positive and negative control prey expression clones to verify functional expression of the baits, to exclude possible auto-activation and to corroborate binding with previously reported interaction partners (Fig 1). Next, the bait strains were used for Y2H-seq screening with the A. thaliana Y2H cDNA prey library. Simultaneously, a control screening was performed with the empty expression vector, which will hereafter be referred to as EMPTY. After five days of selective growth of the transformed yeast cells, the prey cDNA inserts of about ten individual yeast colonies per screen were Sanger-sequenced (Fig 1). This second checkpoint allowed us to confirm the retrieval of reported interactors as preys. Subsequently, all yeast colonies that survived selective growth were pooled per screen and the cDNA inserts of the prey plasmid pools were amplified by PCR. A third checkpoint consisted of a qPCR analysis with specific primers for genes corresponding to known bait interactors, which allows the representation of known interactors in both screens to be assessed in a quantitative manner (Fig 1). Prey abundance was quantified relative to that in the A. [...]

As expected, the binary interaction between the NINJA bait and the preys PPD1, JAZ1, JAZ2 and JAZ4 was confirmed (Fig 3A). Likewise, the TPL-N bait strain showed interaction with the preys auxin/indole-3-acetic acid 17 (IAA17) and NINJA (Fig 3B).
Furthermore, neither of the bait strains exhibited auto-activation, which indicated that NINJA as well as TPL-N were functionally expressed in the bait strains.

Checkpoint 3: semi-quantitative qPCR, a complementary approach to evaluate the quality of a Y2H-seq screening
In a third checkpoint, the quality of the Y2H-seq screening was further assessed. All selectively grown yeast colonies were pooled per screening (Fig 2) and cDNA inserts of the prey plasmid pools were PCR-amplified with vector-specific primers (S2 Table). To examine whether potential interaction partners of the baits were overrepresented relative to the cDNA library control, a qPCR was performed using prey-specific qPCR primers (S2 Table). In the NINJA screen, compared to the control library, the genes encoding JAZ1, JAZ2, JAZ12, TIFY8 and PPD1 were overrepresented (Fig 5), in agreement with previous literature reports [17,34]. Hence, this shows the value of this qPCR assay set-up as a final checkpoint before the actual Y2H-seq analysis, at least for baits with a limited set of known interactors.

In contrast to NINJA, TPL can interact with potentially hundreds of proteins [18]. Of the EAR-motif containing proteins known to interact with TPL and identified in the second checkpoint, only enrichment of IAA30 in the TPL-N pool could be observed (Fig 6, Table 2).

Y2H cDNA library screenings are prone to false negatives, i.e. missing interactions, due to, among other causes, aberrant folding, clones with truncated genes or absence of the gene in the cDNA library. In the case of TPL-N, for example, the NINJA clone that is represented by the A. [...]

First, a quality check was performed on the raw reads. Thereby, adapters, low-quality sequences and partial vector sequences were trimmed. Concomitantly, paired-end and orphan single-end reads were split. The processed reads were then mapped to the reference genome (TAIR10) using TopHat. To avoid overestimation of short genes, only one mate of each read pair was used for mapping. The resulting alignments were used as input for Cufflinks, which generates the raw expression quantification data for each of the analyzed raw sequencing files.
For the subsequent analysis of the raw expression data, a Y2H-seq pipeline was drafted in RStudio.

Mapped genes in the TPL-N and NINJA Y2H screenings with raw read counts less than six were eliminated. Genes in the EMPTY screening that had no raw read counts were given an arbitrary value of 1 and flagged as imputed. After calculating the Fragments Per Kilobase of Exon Per Million Fragments Mapped (FPKM) values, the signal-to-noise ratio (SNR) was defined for NINJA and TPL-N compared to EMPTY. Intuitively, one would expect little NGS data to be derived from the EMPTY screening, given that no yeast cells survived selective growth (Fig 4). However, this was not the case and can be explained by the pooling method employed here: 'scraping' all yeast cells from the selection plates also includes dead or nearly dead cells that may still contain intact prey plasmids. Hence, genes with a high representation in the cDNA library, and thus genes with a high expression level in Arabidopsis suspension cells, are identified in the EMPTY NGS data set.

Next, to allow setting relevant arbitrary thresholds, the 99.5th percentiles of SNR_NINJA/EMPTY and SNR_TPL-N/EMPTY were calculated, leading to thresholds of 7.2 for the NINJA and 6.0 for the TPL-N screenings, respectively (Tables 3 and 4). With this first threshold, overall, of the 71 potential interactors of NINJA, seven were known to be interactors [17,34], whereas for TPL-N, 12 out of the 51 potential interactors had been previously reported [25].

When superimposing a second threshold, in this case of >100 on the FPKM_NINJA and FPKM_TPL-N values, nearly all retained interactors were either reported already or very plausible. Indeed, in the case of NINJA, only TIFY-domain containing proteins were retained (Fig 6, Table 3).
In the case of TPL-N, all but one of the retained proteins using this second threshold contained an EAR-motif [43], the conventional TPL recruitment domain (Fig 7,
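The two-step filtering described in this section can be illustrated with a short Python sketch. It is not the authors' R pipeline: the function names, the simplified FPKM formula, the choice to normalize by the total raw read count of each screen, the definition of SNR as FPKM(bait) divided by FPKM(EMPTY), and the computation of the percentile over all retained genes are assumptions made for illustration only. The minimum read count of six, the imputed value of 1, the 99.5th percentile and the FPKM cut-off of 100 are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    gene: str
    fpkm_bait: float
    fpkm_empty: float
    snr: float            # assumed here: FPKM(bait) / FPKM(EMPTY)
    empty_imputed: bool   # True if the EMPTY count was set to the pseudo value


def fpkm(count, length_bp, total_fragments):
    """Simplified FPKM: fragments per kilobase per million mapped fragments."""
    return count * 1e9 / (length_bp * total_fragments)


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def score_screen(bait_counts, empty_counts, lengths_bp, min_reads=6, pseudo=1):
    """Drop low-count genes, impute missing EMPTY counts, compute FPKM and SNR."""
    bait_total = sum(bait_counts.values())
    empty_total = sum(empty_counts.values())
    if bait_total == 0 or empty_total == 0:
        raise ValueError("both screens need at least one mapped read")
    calls = []
    for gene, count in bait_counts.items():
        if count < min_reads:          # fewer than six raw reads: eliminated
            continue
        raw_empty = empty_counts.get(gene, 0)
        imputed = raw_empty == 0       # no EMPTY reads: use pseudo value, flag it
        empty = pseudo if imputed else raw_empty
        fb = fpkm(count, lengths_bp[gene], bait_total)
        fe = fpkm(empty, lengths_bp[gene], empty_total)
        calls.append(GeneCall(gene, fb, fe, fb / fe, imputed))
    return calls


def select_candidates(calls, snr_percentile=99.5, min_fpkm=100.0):
    """First threshold on SNR percentile, second threshold on bait FPKM."""
    cutoff = percentile([c.snr for c in calls], snr_percentile)
    first = [c for c in calls if c.snr > cutoff]
    second = [c for c in first if c.fpkm_bait > min_fpkm]
    return cutoff, first, second


if __name__ == "__main__":
    # Toy counts, not data from the study.
    bait = {"geneA": 100, "geneB": 5, "geneC": 50}
    empty = {"geneA": 10, "geneC": 0, "geneD": 90}
    lengths = {"geneA": 1000, "geneB": 1000, "geneC": 1000, "geneD": 1000}
    cutoff, first, second = select_candidates(score_screen(bait, empty, lengths))
    print(cutoff, [c.gene for c in first], [c.gene for c in second])
```

Keeping the imputed flag on each call makes it easy to check afterwards how many high-SNR candidates owe their score to the arbitrary EMPTY value of 1 rather than to measured background.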

To assess whether the retrieved preys that did not pass our stringent cut-offs nonetheless represent true potential interactors of NINJA and TPL-N, additional Y2H experiments were carried out. For NINJA, the first four potential interaction partners with SNR_NINJA/EMPTY > 7.2 and FPKM_NINJA < 100 were tested in a binary Y2H assay (Table 3 and

In the retained list of potential interactors using threshold SNR_TPL-N/EMPTY > 6 with FPKM_TPL-N > 100 values, the one candidate, ATCKA2 (AT3G50000), that did not contain an EAR-domain was tested for direct interaction with TPL-N in a Y2H assay, besides five candidates with FPKM_TPL-N < 100 (Table 4 and Figure 9). For the latter set, we specifically avoided picking candidates from the AGL and IAA families, which are most likely true, but less abundant interactors, and chose candidates both with and without an EAR domain. ATCKA2 interaction with TPL-N could not be confirmed with binary Y2H, suggesting it was a false positive caused by the Y2H-seq pipeline. In contrast, the interactions between TPL-N and the five other candidates were all confirmed, demonstrating that they do not represent artefacts of the Y2H-seq methodology and may be true interactors. Hence, in contrast to NINJA, this implies that the arbitrary threshold of SNR_TPL-N/EMPTY > 6 with FPKM_TPL-N > 100 was too stringent for TPL-N. This may be due to the pleiotropic function of TPL, which has an exceptionally high number of protein interactors, often from multigene families. For proteins such as NINJA, with a more defined role and a well-defined set of interactors, a stricter threshold may be justified. For proteins such as TPL, one may need to be more relaxed in determining candidate interactors. As exemplified here, this leads to the identification of potential novel interactors from gene families previously unreported to be capable of interacting with TPL, including EAR-domain containing proteins such as the RING/U-box protein AT3G05670, or proteins that do not contain an EAR domain such as the putative TF AT3G54390, the homeodomain TF AT2G40260 and the bHLH TF AT3G19860 (Table 4 and [...]

[...] manipulation and sequencing of individual yeast clones that survive the screening selection.
Moreover, a higher sensitivity can be achieved in our Y2H-seq strategy through maximal coverage of PPIs by increasing library titers. Consequently, interactions with less abundant proteins that would be masked or lost in conventional Y2H screenings can now be detected. In this regard, a factor that will determine the impact of future Y2H-seq screenings more than ever will be the choice and the quality of the Y2H cDNA library. For instance, full-length protein libraries may mask PPIs by steric hindrance, hence the use of more complex Y2H cDNA libraries encoding protein fragments as well as full-length proteins may now be considered, and screened in one effort, which could lead to a comprehensive coverage of the PPI space. The utility of fragment-based Y2H approaches has previously been demonstrated [44,45]. By varying the sample preparations used to generate cDNA libraries, one could increase the genome coverage with no extra effort in the Y2H screening. For instance, different organs from a single plant, different developmental stages of a single organ, or explants subjected to different environmental cues or chemicals can now be pooled in a single cDNA library. This will allow expanding the number of genes screened in a single event, as well as different versions of the same gene, e.g. following expression after alternative splicing or translation start events. As such, the Y2H-seq strategy will provide an effective way to discover differentially regulated PPIs, allowing further exploration of biological pathways and their regulation. Furthermore, the use of cDNA libraries makes it possible to identify novel interaction partners in organisms whose genome has not been fully annotated yet, unlike the use of ORF libraries based on known and completely fixed gene models.
The Y2H-seq strategy implements a quantitative readout system, with a straightforward and adaptable scoring procedure. The use of background controls allows false positives to be eliminated reliably at an early stage. This involves not only comparing quantitative NGS readouts from Y2H-seq screenings with bait proteins to those of control screenings with 'empty' control vectors, but also comparing the readouts of the screenings with bait proteins with one another. Indeed, as is also the case with other PPI discovery methods, such as tandem affinity purification [46,47], a specific 'blacklist' of recurring Y2H-seq interactors for each cDNA library can be composed by marking common interactors of seemingly unrelated bait proteins. This may allow fine-tuning the thresholds to be set up in the filtering of the Y2H-seq NGS data, and thereby enable determining robust priority lists and reducing laborious and needless downstream validation assays to a minimum.

Finally, this strategy can also easily be extended to Y1H screenings, for which the same cDNA library could be screened, but for which considerably higher false-positive rates are typically obtained as compared to Y2H screenings [48,49]. As such, we anticipate that the cost and labor reduction along with the increased detection and quantification potential of our Y2H-seq strategy can give an important upgrade to this long-existing, but far from fully exploited screening tool.

Supporting information
S1