Abstract
DNA-encoded peptide libraries are ideal functional peptide discovery platforms for their extremely large capacity. However, it’s still difficult to build high content peptide library in intact mammalian cells, which offer advantages associated with appropriate protein modification, proper protein folding, and natural status of membrane protein. Our previous work established linear-double-stranded DNAs (ldsDNAs) as innovative biological parts to implement AND gate genetic circuits in mammalian cell line. In the current study, we employ ldsDNA with terminal NNK degenerate codons as AND gate input to build highly diverse peptide library in mammalian cells. This ldsDNA-based AND gate (LBAG) peptide strategy is easy to conduct, only PCR reaction and cell transfection experiments are needed. High-throughput sequencing (HTS) results reveal that our new LBAG strategy could generate peptide library with both amino acid sequence and peptide length diversities. Moreover, by a mammalian cell two-hybrid system, we pan an MDM2 protein interacting peptide through the LBAG peptide library. Our work establishes ldsDNA as biological parts for building highly diverse peptide library in mammalian cells.
Introduction
Different state-of-the-art display technologies, e.g., phage, ribosome, mRNA, bacterial, and yeast-display, have been developed to identify functional peptides (1–7). Through construction and screen vast peptide libraries, these display technologies physically couple between phenotype (high affinity high selectivity binding) and genotype (DNA sequence). Currently, plasmids and PCR products have been explored as genetic mediums to store highly diverse coding information of the peptide library (8,9). These genetic materials are introduced into phage-host bacteria, yeast, or test tubes to transcript-translate into peptides. However, it’s still a big challenge to build high content peptide library in intact mammalian cells. Some attempts have been made to introduce DNA-encoded peptide library into mammalian cells using episomal-, viral- or transposon-mediated gene transfer (10–17). While, the peptide library capacity is constantly lost during plasmid extraction, lentivirus generation, transfection/infection process. This leads to small library size relative to other display technologies.
We previously applied linear-double-stranded DNA (ldsDNA, or PCR amplicon) as novel biological parts to implement Boolean logic AND gate genetic circuits in mammalian cells (18). Via splitting essential gene expression cassette into signal-mute ldsDNA, our strategy achieves two or three-input AND gate calculation with a low noise-signal ratio both in vitro and in vivo. This split-relink process is similar to V(D)J recombination, which serves as the fundamental molecular mechanism to generate highly diverse repertoires of antigen-receptors (antibodies and T-cell receptors) in mammalian (19,20). In the present study, we employ this ldsDNA-based AND-gate (LBAG) genetic circuit to generate high content peptide library.
Results
Generation of peptide library by ldsDNA-based AND-gate genetic circuit
Our previous study demonstrated that ldsDNAs (PCR amplicons), split from intact gene expression cassettes (containing promoter, gene coding sequence and polyA signal), could undergo reconnection to form one output-signal-generating molecule in cultured cells (18). Here we hypothesize that ldsDNAs with terminal NNK degenerate codons could generate highly diverse peptide library through ldsDNA-based AND-gate (LBAG) calculation in mammalian cells (Figure 1).
Taking pEGFP-C1 plasmid as PCR template, primers with multiple tandem NNK-trinucleotides were used to generate ldsDNAs with terminal degenerate codons (Supplementary Table 1). Three kinds of ldsDNAs, containing CMV promoter, Kozak sequence and different number of NNK-trinucleotides, were produced (namely CMV-Kozak-<NNK>5, CMV-Kozak-<NNK>10, CMV-Kozak-<NNK>15) (for sequences see Supplementary Materials). Meanwhile, two kinds of ldsDNAs, containing terminal NNK-trinucleotides (or not), GFP coding sequence and SV40 polyA signal (PAS), were amplified (namely <NNK>5-GFP-SV40 PAS, <NNK>0-GFP-SV40 PAS). Six different combinations of these two subgroup ldsDNAs were introduced into HEK293T cells to construct the AND-gate genetic circuits. The six combinations are as following: 1, CMV-Kozak-<NNK>5 with <NNK>5-GFP-SV40 PAS (5X+5X for short); 2, CMV-Kozak-<NNK>10 with <NNK>5-GFP-SV40 PAS (10X+5X); 3, CMV-Kozak-<NNK>15 with <NNK>5-GFP-SV40 PAS (15X+5X); 4, CMV-Kozak-<NNK>5 with <NNK>0-GFP-SV40 PAS (5X+0X); 5, CMV-Kozak-<NNK>10 with <NNK>0-GFP-SV40 PAS (10X+0X); 6, CMV-Kozak-<NNK>15 with <NNK>0-GFP-SV40 PAS (15X+0X).
When AND-gate genetic circuits are built up in cells, peptide library would be generated surrounding the junctions formed between these two subtype ldsDNAs (Figure 1). Forty-eight hours after transfection, total RNAs were extracted then cDNAs were synthesized via reverse transcription. Nucleotide sequences surrounding the junctions formed by two ldsDNA-inputs were PCR amplified and then subjected to pair-end high-throughput sequencing (HTS). The HTS statistics for all sequencing libraries are shown in Table 1.
Reading frame phase of ldsDNA-based peptide library
To display or tether library peptides, oligopeptides are usually fused to specific anchor proteins. For example in phage display experiment, the peptide library is fused to an open reading frame (ORF) thereby connecting peptide to phage coat protein (1). Our previous research demonstrated that the ldsDNA could be subject to terminal nucleotide(s) deletion which may lead to reading frame shift. Here we analyze the reading frame phase characteristics of the LBAG peptide libraries (Figure 2). Libraries generated by dual-side (_X+_X) terminal-NNKs-ldsDNAs, e.g., CMV-Kozak-<NNK>5 with <NNK>5-GFP-SV40 PAS (5X+5X), show a similar reading frame phase distribution. The average in-frame (3n) ratio ranging from 45% to 47%. While libraries generated by single-side (_X+0X) terminal-NNKs-ldsDNAs, e.g., CMV-Kozak-<NNK>5 with <NNK>0-GFP-SV40 PAS (5X+0X), show dramatic dynamics of reading frame phase distribution (Figure 2). 5X+0X peptide libraries reach the maximum 69% in-frame rate compared with 63% of 10X+0X and 53% of 15X+0X, respectively. Then, the in-frame nucleotide sequences (3n) are translated into polypeptides for further analysis.
Length dynamic and amino acid composition of ldsDNA-based peptide library
We next analyze the polypeptide length distributions of different LBAG peptide libraries (peptides with pre-stop codon are excluded). Dual-side terminal-NNKs-ldsDNAs strategies, including 5X+5X, 10X+5X and 15X+5X, show a more dynamic length distribution compared with that from single-side terminal-NNKs-ldsDNAs strategies (5X+0X, 10X+0X and 15X+0X) (Figure 3). In addition, for single-side terminal-NNKs-ldsDNAs strategies, most peptides have a length equal to the theoretical polypeptide length generated by direct ligation (supposing no insertion or deletion happened) (Figure 3D-F). For example, in the 5X+0X group, over 81% of the peptide length is five amino acids.
The amino acid composition of different LBAG peptide libraries is in line with the theoretical NNK amino acid composition (pre-stop codon is included in the analysis) (Table 2). However, Proline (P) residue always shows a higher percentage than the theoretical ratio, particularly when the peptide chain getting longer. Meanwhile, the percentage of each nucleotide at each position of peptide library codons is showed in Supplementary Table 2. The nucleotide distributions are in line with NNK degenerate base ratio.
The capacity of the LBAG peptide library
Peptide library capacity could be represented by its coverage ability on amino acid combination possibilities. Here, all eighteen LBAG peptide libraries cover all 400 possible dual-amino-acid combinations (2-mer). For triple-amino-acid (3-mer), the library coverage is ranging from 64% to 98% (Figure 4A). For quadruple-amino-acid (4-mer), the library coverage is ranging from 5.7% to 32.1% (Figure 4B). The peptide library content, defined as the total amino acid number coded by a library, serves as a determinate factor of library capacity (Figure 4).
The asymmetry of input ldsDNA end processing during AND gate calculation
As shown by Figure 2 and 3, single-side (_X+0X) and dual-side (_X+_X) LBAG peptide library strategies show obvious differences in both reading frame phase distributions and peptide length dynamics. To further investigate these differences and figure out whether or not the up-stream and down-stream ldsDNAs have the same chance of end processing, we PCR-generated two pairs of AND gate ldsDNAs: pair 1, CMV-Kozak-FLAG-NNN with GFP-SV40 PAS; and pair 2, CMV-Kozak-FLAG with NNN-GFP-SV40 PAS (N for randomized nucleotides, for ldsDNA sequences see Supplementary Materials). After cell transfection, RNA extraction, cDNA synthesis, and PCR amplification, high-throughput sequencing (HTS) was performed to characterize the oligonucleotide sequences formed by AND gate calculation. About 67% of the sequences formed by CMV-Kozak-FLAG-NNN and GFP-SV40 PAS contains intact random nucleotides (NNN) (Figure 5). While for CMV-Kozak-FLAG and NNN-GFP-SV40 PAS group, about 57% of oligonucleotide contains triple nucleotides (Figure 5). Thus, the down-stream ldsDNAs (NNN-GFP-SV40 PAS) are more likely subject to end processing than up-stream ldsDNAs (CMV-Kozak-FLAG-NNN) in AND gate relink process in cells. This may be the reason why single-side (_X+0X) strategy generate higher in-frame rate and less variability among the peptide length than dual-side (_X+_X) strategy.
In addition, through the above terminal-triple-random-nucleotide-ldsDNA AND gate calculation, we also evaluated the nucleotide preferences of LBAG calculation. In both CMV-Kozak-FLAG-NNN + GFP-SV40 PAS group and CMV-Kozak-FLAG + NNN-GFP-SV40 PAS group, Guanine (G) is always the unfavorable end nucleotide (Supplementary Table 3). Furthermore, up-stream or down-stream ldsDNAs containing two or three terminal Guanines are hard to connect to the corresponding ldsDNAs (Supplementary Table 3).
Using LBAG peptide library to identify MDM2 protein interacting peptide
The E3 ubiquitin ligase MDM2 negatively regulates the activity of the tumor suppressor protein p53 through interacting with the N-terminal transactivation domain of p53. Structurally, the residues F19, W23 and L26 in p53, which insert deep into the MDM2 hydrophobic cleft, are critical in MDM2-p53 interaction (21). Also, the non-peptide inhibitors of the p53-MDM2 interaction have to contain mimics of these amino acids (22–24). Pazgier et al. screened a duodecimal peptide phage display library to identify MDM2 (residues 25-109) interacting peptides (25). They found 14 out of 15 MDM2 binding clones containing the same conserved residues: FXXXWXXL, which in line with the MDM2-p53 binding pattern. Shiheido et al. employed mRNA display to identify MDM2 interacting peptides (26,27). Similarly, they found more than half of all peptides retained the three hydrophobic residues corresponding to F19, W23 and L26 of wild-type p53. Here, to test the performance of our LBAG peptide library strategy, we plan to identify the MDM2 interacting peptides in mammalian cells.
To detect protein-protein interaction in mammalian cells, we employed the two-hybrid system, which has been adapted for use in mammalian cells (28,29). The system contains three plasmids: pACT, which expresses VP16 activation domain, pBIND, which expresses GAL4 DNA-binding domain, pG5egfp, which contains five GAL4 binding sites upstream of a minimal TATA box and EGFP ORF. In the present study, MDM2 (residues 25-109) is subcloned into pACT plasmid (namely pACT-MDM2). By PCR amplification, the LBAG peptide library is fused to GAL4 DNA-binding domain through an XTEN linker (Figure 6A). To achieve high library content, we here used single-side LBAG strategy.
Two plasmids (pACT-MDM2, pG5egfp) and two ldsDNAs (CMV-GAL4-XTEN-<NNK>15, SV40 PAS) were cotransfected into HEK293T cells. The peptide library will be formed through the AND gate calculation between CMV-GAL4-XTEN-<NNK>15 and SV40 PAS (Figure 6A). If the peptide interacts with MDM2, the activation domain VP16 will be recruited to GAL4 binding sites, thereby activating the GFP reporter gene expression (Figure 6B). Here fluorescence-activated cell sorting (FACS) was used to isolate the GFP positive cells. High-throughput sequencing of the peptide library in these cells identified peptide SFVIRWTALWPGPRS among the top hits (23 reads, rank fourth) (Supplementary Table 4). This peptide contains the conserved residues FXXXWXXL, which is critical to interact with MDM2.
Discussion
Affinity-based display technologies, e.g., phage, ribosome, mRNA, bacterial, and yeast-display, have been developed to screen peptide libraries to identify protein-binding peptides. Peptide library size is a determinate factor of a success screen. For phage and yeast-display, the complexity of peptide library stores in plasmid (1,2,7). Currently, the phage display library size could reach 109 (9,30). While for ribosome and mRNA-display, PCR products serve as genetic medium to store library coding information. Ribosome and mRNA-display could achieve extremely large library size (>1013) (5,31). In the current research, we provide a new strategy to build high content peptide library in cultured cell line. The protocol is easy to conduct, only containing two steps: PCR amplification and cell transfection. Since degenerate trinucleotides is present at the 5’ end of the primer, the library complexity keeps on increasing during PCR amplification process. Theoretically, no specific nucleotide sequence is enriched during PCR reaction. PCR products are then transfected into cells to conduct AND-gate linkage to generate full gene expression cassette (promoter-CDS-PAS). For our strategy, no particular attention needs to be paid to maintain the library abundance. The size of ldsDNA-based library needs to be further investigated. Determining the peptide sequence diversity in each cell by single-cell-sequencing will provide crucial information for estimating the whole library size.
For the current protocol, there are still several issues that need to be further addressed to increase library capacity. First of all, the high in-frame ratio would improve library capacity. Our results reveal that peptide libraries generated by single-side NNKs-ldsDNA show higher in-frame ratios than that from dual-side NNKs-ldsDNAs (Figure 2). Besides, chemical modified primers could be used to generate nuclease resistance ldsDNAs to prevent reading frame shift (32,33). Another major obstacle is the pre-stop codon, which causes premature termination (34). For NNK degenerate codon, the percentage of stop-codon in each amino acid position is 3.125% (1/32) and will be accumulated when the peptide is getting longer. When the polypeptide length reaches 20 amino acids, about 47% of the sequences contain pre-stop codon. One potential solution to avoid stop codon is to synthesize PCR primer using trimer phosphoramidites (35,36).
In the present study, we tested the feasibility of our LBAG peptide library strategy through the identification of the MDM2 interacting peptide (Figure 6). Indeed, one peptide (SFVIRWTALWPGPRS), which contains three crucial residues mediating MDM2-p53 interaction, is among the top hits (Supplementary Table 4). However, in phage display and mRNA display experiments, multiple peptides with FXXXWXXL pattern were identified (25–27). This may come from the enrichment of interacting peptides during rounds of panning in phage and mRNA display process. For the current protocol, four components, including two plasmids (pACT-MDM2, pG5egfp) and two ldsDNAs (CMV-GAL4-XTEN-<NNK>15, SV40 PAS), were transient cotransfected into one HEK293T cell to conduct AND gate calculation and assemble the two-hybrid system. In the future application, to save the cell transfection capacity, the plasmids should be preset into cells by generating a stable cell line. Thereby, only ldsDNAs need to be transfected into cells.
Here by terminal-NNKs-ldsDNA-based AND-gate (LBAG) genetic circuit, we develop a novel method to generate highly diverse peptide library in mammalian cells. Our protocol is easy to conduct only PCR-reaction and cell-transfection are needed. High-throughput sequencing results reveal that LBAG peptide library is characterized by both amino acids sequence diversity and peptide length dynamics. We also identified a peptide with conserved residues which are critical to bind to MDM2. Our new method may have great application potential in therapeutic peptide development.
Materials and Methods
Cell culture
The HEK293T cell line was maintained in Dulbecco’s modified Eagle’s medium (DMEM) (Thermo Fisher Scientific, Hudson, NH, USA) containing 10% fetal bovine serum (FBS) with 1% penicillin-streptomycin solution at 37 degree with 5% CO2.
ldsDNA synthesis
KOD-Plus-Ver.2 DNA polymerase (Toyobo, Osaka, Japan) was used to amplify ldsDNAs (PCR amplicons) using the pEGFP-C1 (Clontech, Mountain View, CA, USA) plasmid as the template. PCR products underwent agarose electrophoresis and gel-purification to remove plasmid template and free dNTPs (Universal DNA purification kit, Tiangen, Beijing, China). The amount of PCR products was determined by the OD 260 absorption value. PCR primer sequences and ldsDNA sequences were present in Supplementary Table 1 and Supplementary Materials, respectively.
ldsDNA transfection
HEK293T cells were seeded into 6-well plates the day before transfection (60% − 70% confluency). 500 ng/well of each input amplicons (total 1000 ng/well) were transfected into cells with Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA) reagent following standard protocol.
RNA extraction, cDNA synthesis and PCR amplification
Forty-eight hours after transfection, total RNAs were extracted by RNAiso Plus (TaKaRa, Beijing, China) reagent. cDNAs were synthesized using the PrimeScript™ RT reagent Kit with gDNA Eraser (TaKaRa) following standard protocol. KOD-Plus-Ver.2 DNA polymerase (Toyobo) was used to amplify the corresponding cDNA sequences surrounding ldsDNA junctions.
Two-hybrid system in mammalian cells
The two-hybrid system used in this study was developed from commercial CheckMate™ Mammalian Two-Hybrid System (Promega, Madison, WI, USA) with some modifications. Briefly, MDM2 (residues 25-109) was subcloned into pACT (pACT empty vector, GenBank Accession Number AF264723) vector through XbaI and KpnI restriction sites (namely pACT-MDM2). EGFP sequence was cloned into pG5luc vector (pG5luc empty vector, GenBank Accession Number AF264724) to replace luciferase sequence through HindIII and FseI restriction sites (namely pG5egfp). Two ldsDNAs CMV-GAL4-XTEN-<NNK>15 and SV40 PAS were PCR-generated using the pBIND (pBIND empty vector, GenBank Accession Number AF264722) plasmid as the template. pACT-MDM2 (500ng/well), pG5egfp (500ng/well), ldsDNA CMV-GAL4-XTEN-<NNK>15 (500ng/well) and ldsDNA SV40 PAS (250ng/well) were transient cotransfected into 6-well plated HEK293T cells using Lipofectamine 2000. The experiment could be scaled up by transfected more 6-well plated cells. Forty-eight hours after transfection, FACS was used to isolate the GFP positive cells.
Library construction and high-throughput sequencing
Library construction and high-throughput sequencing were conduct by Novogene (Beijing, China). Sequencing libraries were generated using TruSeq® DNA PCR-Free Sample Preparation Kit (Illumina, San Diego, CA, USA) following the manufacturer’s recommendations and index codes were added. The library quality was assessed on the Qubit Fluorometer and Agilent Bioanalyzer 2100 system. At last, the libraries were sequenced on an Illumina HiSeq 2500 platform and 250 bp paired-end reads were generated.
Data analysis
Paired-end reads from the original DNA fragments were merged using FLASH, a very fast and accurate analysis tool which was designed to merge paired-end reads when there are overlaps between reads 1 and reads 2. Paired-end reads were assigned to each sample according to the unique barcodes. Reads were filtered by QIIME quality filters. Nucleotide sequences generated by ldsDNA-based AND-gate genetic circuits were extracted by in-house Perl scripts. High-throughput sequencing data are deposited on the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) under the accession number GEO: GSE134671 (reviewer access token: kpujmequhnspngx).
Author Contributions
S.L. conceived and supervised the study. S.L. and W.S. designed and performed the experiments. S.L. and W.S. analyzed the data. W.S. and S.L. prepared the figures. S.L. and W.S. wrote the paper. All authors discussed the results and reviewed the manuscript.
Competing Interests
The authors have declared that no competing interest exists.
Supplementary materials
Acknowledgements
This work was supported by the National Natural Science Foundation of China [31870860, 31400673 to S.L., 31971388, 81402407 to W.S.]; the Tianjin Research Program of Application Foundation and Advanced Technology [14JCQNJC09800 to S.L., 15JCQNJC11700 to W.S.]; the State Key Laboratory of Medicinal Chemical Biology (Nankai University) [2018103 to S.L.]; and the Key Project of National Health and Family Planning Commission of Tianjin [2017057 to C.Z.]. The authors have declared that no competing interest exists.