Abstract
The lack of genetic tools to manipulate protozoan pathogens has limited the use of genome-wide approaches to identify drug or vaccine targets and understand these organisms’ biology. We have developed an efficient method to construct genome-wide libraries for yeast surface display (YSD) and developed a YSD fitness screen (YSD-FS) to identify drug targets. We show the robustness of our method by generating genome-wide libraries for Trypanosoma brucei, Trypanosoma cruzi, and Giardia lamblia parasites. Each library has a diversity of ∼105 to 106 clones, representing ∼6 to 30-fold of the parasite’s genome. Nanopore sequencing confirmed the libraries’ genome coverage with multiple clones for each parasite gene. Western blot and imaging analysis confirmed surface expression of the G. lamblia library proteins in yeast. Using the YSD-FS assay, we identified bonafide interactors of metronidazole, a drug used to treat protozoan and bacterial infections. We also found enrichment in nucleotide-binding domain sequences associated with yeast increased fitness to metronidazole, indicating that this drug might target multiple enzymes containing nucleotide-binding domains. The libraries are valuable biological resources for discovering drug or vaccine targets, ligand receptors, protein-protein interactions, and pathogen-host interactions. The library assembly approach can be applied to other organisms or expression systems, and the YSD-FS assay might help identify new drug targets in protozoan pathogens.
Introduction
Protein-ligand interactions are at the core of every cellular and molecular process in biology. They involve protein interactions with small molecules, nucleic acids, lipids, or other proteins. The discovery of protein-ligand interactions is essential for understanding signaling and developmental processes, gene regulation, pathogen-host interactions, and, notably, developing drugs and vaccines1. However, discovering interacting partners is challenging and often relies on screening approaches, including yeast two-hybrid systems, overexpression systems, affinity chromatography coupled to mass spectrometry, or surface display technologies2-4.
Yeast surface display (YSD) or phage display combined with high-throughput sequencing is one of the most successful approaches for protein-ligand discovery, mutational scanning, or protein engineering. Examples of their applications include the identification of glucocorticoid receptors5, phage receptors6, targets of drugs or inhibitors7, vaccine targets8, T-cell receptors9, and antibody targets10. YSD entails the expression of exogenous proteins encoded by DNA libraries on the surface of yeast cells. Each yeast cell expresses ∼105 copies of a single protein on its surface, and thus a large yeast population (∼108) can easily represent a complete genomic library. The exogenous proteins are expressed in-frame with a surface anchoring system, e.g., N-terminal fused anchor proteins (SAG1, SED1), the a-agglutinin display system (Aga1p and Aga2p), or the Flo1p display system11 (Fig 1A). These anchored proteins are expressed on the exterior of the cell wall and expose the exogenous proteins for ligand interaction. While this approach has proven helpful, challenges in constructing genomic libraries have halted its broad application for routine protein-ligand discovery. Generating a eukaryote’s genome-wide library requires cloning 105 to 107 genome fragments into expression vectors, depending on fragment length and the organism’s genome size. DNA fragments are usually generated from genomic or complementary DNA (cDNA) via enzymatic digestion or random physical fragmentation and then cloned into expression vectors through ligation with restriction digested or T-tailed vectors12. Sub-optimal library construction and expression systems can greatly diminish the efficiency of protein-ligand interaction screens. Inefficient library assembly can result in portions of the genome being absent or underrepresented in the library, thus limiting the scope and reliability of downstream screening applications.
The scarcity of genetic tools available to non-model organisms poses significant obstacles to developing genome-wide screenings. This is the case for many protozoan parasites, including kinetoplastids, diplomonads, and apicomplexan, responsible for huge health and economic burden worldwide 1. Trypanosoma cruzi, the causative agent of Chagas disease (ChD), affects ∼8 million people in South and Central America. ChD has spread to Europe, Australia, and Japan and affects over 300,000 people in the United States 1. Trypanosoma brucei causes African Trypanosomiasis, also known as Sleeping Sickness, and affects humans and animals with significant health and agricultural impact in Sub-Saharan Africa 1. Giardia lamblia (also known as G. intestinalis) causes intestinal diseases with nearly 200 million yearly cases, especially among children and immunocompromised people in low and middle-income countries 2. There are no vaccines against these diseases, and the available drugs have limited efficacy and severe side effects 1. While genetic approaches have contributed to understanding T. brucei biology and drug-target discovery 3, 4, much less has been done for other protozoan pathogens, in which genetic manipulation remains a challenge. Hence, the development of parasite genome-wide libraries for YSD may help advance the knowledge of these organisms’ biology and the development of therapeutics, including identifying drug and vaccine targets.
We have developed an efficient approach to construct genome-wide libraries for YSD. Here, we show the robustness of our method by generating genome-wide libraries for T. brucei, T. cruzi, and G. lamblia encoding polypeptides for yeast surface display. The diversity of the libraries ranges from ∼2×105 to 4×106 clones, representing ∼6 to 30-fold of the parasite genome. Nanopore sequencing confirmed the libraries’ genome coverage, and computational analysis predicted the libraries to encode polypeptides in the protein domain range (20-200 amino acids, aa), which is ideal for the identification of ligand binding regions or epitope targeted by antibodies. As a proof-of-concept, we transformed yeast with the G. lamblia library and confirmed its polypeptide expression. We developed a YSD fitness screen (YSD-FS) assay for drug target discovery. Using the YSD-FS assay, we identified known interactors of metronidazole, a drug used to treat giardiasis, and new candidate interacting proteins. Notably, the data revealed enrichment in nucleotide-binding domain sequences conferring yeast increased fitness to metronidazole, implying that metronidazole might target multiple proteins containing nucleotide-binding domains. The libraries are valuable biological resources for discovering drug or vaccine targets, ligand receptors, protein-protein interactions, and investigating pathogen-host interactions. The library assembly method can be applied to other organisms or expression systems, and the YSD-FS assay will pave the way to identifying new drug targets in protozoan pathogens.
Results
Efficient construction of genome-wide libraries for YSD
To generate genome-wide libraries for YSD, we devised an approach to clone parasite genomic fragments into the pYD1 vector (Fig 1A-B). The method entails genome fragmentation followed by fragments end-repairing and A-tailing for adapter ligation. The adapter-ligated fragments are amplified by PCR with primers that pair with adapter sequences and have ∼20 bp vector overlapping sequences used for fragment cloning by Gibson assembly (Fig 1B). Gibson assembly is a method for joining DNA molecules using a single isothermal reaction and requires a ∼20 bp overlap region between joining molecules 5, i.e., fragments and vector. In this approach, a 5’-exonuclease generates 3’-overhangs in the joining molecules, which are then annealed. A DNA polymerase fills gaps between annealed strands, and a DNA ligase seals the nicks in the strands. We initially selected the protozoa G. lamblia for genome-wide library construction because it has a haploid genome of 12.1 Mb 6, thus smaller than most eukaryotes. This organism’s genome is also primarily organized in arrays of exons with an average gene size of ∼1.5 kb and a coding density of 81.5% of the genome 6, 7. The giardia genomic DNA was fragmented in sizes ranging ∼0.5-2 kb by ultra-focused sonication (Fig 1C) to generate an YSD library encoding polypeptides. The genome fragments were end-repaired and A-tailed, followed by ligation of a nanopore forked adapter, which was verified by PCR library amplification (Fig 1B and D). Amplicons from a 16-cycle PCR reaction were used for Gibson assembly reactions followed by bacterial transformation. PCR analysis of randomly selected clones from transformed bacteria confirmed that library clones contained fragments averaging ∼1.5 kb (Fig 1E-F), consistent with the fragmentation, and Sanger sequencing of isolated plasmids confirmed that the sequences originated from the parasite genome. Using Carbon and Clarke equation 8 (Fig 1A), we calculated that 36,880 cloned fragments were required to have each G. lamblia genome fragment represented in the library with a 99% probability. After library assembly, we obtained ∼2.0×105 clones, which represents 5.3-fold the necessary calculated number of clones with an estimated density of 31 clones per gene (Table 1). Restriction analysis of the constructed library indicated fragments ranging on average ∼1.5 kb (Fig 1H), consistent with fragment size selection.
To determine the library coverage of the genome, we sequenced the library using Oxford nanopore sequencing. The long-read technology generates sequences covering the complete DNA sequence of the cloned fragments. Nanopore sequencing of the G. lamblia library (Gl-Lib) resulted in ∼90% of reads mapped to the G. lamblia WB strain reference genome chromosomes and resulted in 27.3x coverage of the genome (Table 2, Fig 2A). Inspection of the parasite chromosomes (chr) showed extensive library representation of the genome with nearly no gap in coverage (Fig 2A and B). The apparent gaps observed in chr 5 relate to the lack of a defined sequence in the reference genome (represented by Ns in the reference genome). About 60% of the library included fragments ranging between 0.4-1.6 kb with an average size of 860 bp (Fig 2C, Table 1). The library read lengths correlated with library fragmentation size, but it also indicated a slight bias towards DNA fragments of small size (Fig 2C, Table 1). Moreover, it also indicated that the library contain fragments to encode predominantly protein motifs or domains 9 compared to full-length proteins. The results also revealed that all G. lamblia annotated genes were represented in the library (Table 1), and essentially all gene sequences were represented by library fragments (Fig 2D). The data show that the method is efficient for constructing genome-wide libraries.
Constructing libraries of organisms with large genomes
To further validate our approach, we generated libraries for T. brucei and T. cruzi, which haploid genomes are 35 Mb and 44 Mb, respectively. These parasite genomes are about 3 to 4 times larger than the G. lamblia genome and thus helped test the method’s efficiency and evaluate potential bias associated with genome size. Trypanosome genomes are organized primarily in segments of exons with genes averaging a size of ∼1.5 kb. After library assembly and transformation, we obtained 7.2×105 and 4.0×106 clones for T. brucei and T. cruzi, respectively (Table 1), representing 6.7 and 29.6-fold the calculated coverage of their genomes at 99% probability, and estimated 41 and 270 clones per gene (Table 1). Restriction enzyme analysis confirmed the library fragment range with an average of ∼1.5 kb (Fig 1G-H). After constructing multiple genome-wide libraries, we found that optimizing the Gibson assembly reaction, which involves defining an optimal ratio between DNA fragments and plasmid vector, is a critical and often necessary step in library construction. After optimization of the Gibson assembly step, we consistently obtained 100% efficiency in library generation using this method. The libraries are hereafter named Tc-Lib and Tb-Lib, respectively, for T. cruzi and T. brucei libraries.
The nanopore sequencing of Tb-Lib and Tc-Lib resulted in 131.9x and 186.1x genome coverage, respectively (Table 2). Both Tb-Lib and Tc-Lib showed a thorough representation of their respective organism genomes covering all chromosomes (Table 1, Fig2 E-F). The sequencing analysis also revealed an average fragment size of ∼0.7 to 1 kb for the libraries, and every single gene of the parasite’s genome was represented in the libraries (Table 1). There was an apparent enrichment bias for large gene families in all libraries (Fig 2A, E-F). The result is unlikely related to sequence selection during library construction, but it may reflect the poor genome assembly of repetitive regions, especially segments containing large gene families such as mucins or mucin-associated proteins (MASPs) in T. cruzi, variant surface glycoproteins (VSGs) in T. brucei, and variant surface proteins (VSPs) in G. lamblia. The library fragments also covered untranslated regions of the genome. The effective assembly of the three organism’s libraries implies that genome size is not a constraint for library construction by this method, and the data show that the libraries are diverse and comprehensive.
Library expression in yeast for surface display
We developed a computational tool that predicts the library’s polypeptide sequences potentially expressed on the yeast surface. The libframe tool uses the fastq data from nanopore sequencing. It identifies library sequences in frame with the sequences of interest; here, the Xpress epitope downstream from the Aga2p coding sequence (Fig 1A). Then, it translates in-frame DNA sequences outputting the predicted amino acid sequences and polypeptide length. Analysis of a pool of library sequences indicated polypeptides ranging from 13-488 amino acids (aa), 13-275 aa, and 13-559 aa for the Gl-Lib, Tb-Lib, and Tc-Lib, respectively (Fig 3A, Table 1). Further analysis of the Gl-Lib indicated that 37% of the in-frame library DNA sequences generated predicted polypeptides ranging from 13-39 aa, whereas 67% of sequences encoded proteins ranging from 40-488 aa, which corresponds to about 70,300 and 127,300 clones, respectively, and thus a calculated average of 12 and 21 polypeptides per gene. Notably, over 90% of protein domains range between 20-200 aa with an average of ∼60 aa, whereas motifs and ligand binding sites are 7-8 aa and ∼18 aa, respectively 9, 10, implying that the libraries can generate polypeptides encompassing protein functional groups. The library polypeptide range reflects the size distribution of cloned genomic fragments (Fig 2C, Table 1), and early stop codons or frameshifts between library sequences and Aga2p. Early stop codons result from DNA fragments originated from untranslated regions of the genome (18.5% of the G. lamblia genome) and coding sequences cloned out of frame, which is common in genomic libraries. Importantly, the large library sizes with hundred thousand to millions of cloned fragments ensure that polypeptides of coding sequences with sufficient length to cover functional protein sequences, motifs, and protein domains are included.
We transformed the Saccharomyces cerevisiae EBY100 strain with the Gl-Lib or the pYD1 vector to validate library expression. To represent a library diversity of ∼106 clones, high-efficiency yeast transformation is required to obtain at least 10-fold of the library size. Thus, we optimized library transfection in yeast by electroporation and obtained a transformation efficiency of ∼107 cfu/µg of library DNA, which resulted in transformed yeast amounting to ∼230-fold the Gl-Lib diversity. The pYD1 vector has an N-terminal Xpress epitope in frame with Aga2p and expressed fused to the library sequences under control of a promoter induced by galactose (gal) or repressed by glucose (glu) (Fig 1A). Western analysis of gal-induced Gl-Lib confirmed the expression of fragments from ∼25-35 kDa (Fig 3B). The size range correlates with our computational prediction of expressed polypeptides, as most library fragments are predicted to generate 2-7 KDa polypeptides, and the predicted molecular weight of Aga2p plus the Xpress epitope is ∼24 kDa 11. We analyzed the expression of the Gl-Lib using anti-Xpress antibodies in a Cytation 5 cell image reader (Fig 3C). The data showed a uniform detection of surface Xpress from pYD1 transformed yeast, and flow cytometry quantification showed that 70% of the yeast uniformly expressed high levels of Xpress epitope. In contrast, the Gl-Lib expressing yeast showed a heterogeneous detection of Xpress (20% strongly positive by flow cytometry), indicating a mixed population of high and low-expressing cells. The low percentage of Xpress positive cells in the Gl-Lib likely reflects interference of the library polypeptides with anti-Xpress antibodies binding to the adjacent Xpress epitope. The data is consistent with other YSD libraries generated using the Aga1p-Aga2p expression system 12. Microscopy analysis further confirmed the surface Xpress exposure, indicating a functional yeast surface display system (Fig 3D). The data show that our methodology generates diverse and functional genome-wide YSD libraries.
Screening for G. lamblia proteins that interact with metronidazole
We posited that YSD libraries are useful tools for drug target screenings. The yeast surface expression exposes library proteins to small ligands outside the cell, favouring ligand-protein interaction. The interaction of library proteins with drugs may lead to drug depletion, drug enzymatic inactivation or drug activation into toxic derived molecules, all of which may affect the growth of interacting yeast clones in the population. Hence, we performed a YSD-FS by growing yeast in the presence or absence of metronidazole to identify clones showing altered fitness in the presence of the drug. Gl-Lib expressing yeasts were grown for three days in vehicle only or in the presence of metronidazole, and the grown yeast population was submitted to three rounds of fitness selection in metronidazole. The library plasmid DNAs were recovered from yeasts for fragment DNA sequencing using Oxford nanopore technology. Sequencing analysis of the vehicle-treated YSD showed a 0.8 Pearson correlation with the non-transformed (original) library, implying only a slight deviation in the yeast expressed library from the generated library (Fig 4B, see No screen). However, drug-treated Gl-Lib expressing yeast showed a low correlation with the vehicle-treated library (Pearson correlation of 0.6), implying that drug treatment altered the fitness of the yeast population expressing the Gl-Lib (Fig 4B, MTZ screen). Analysis of metronidazole treated vs non-treated Gl-Lib expressing yeast revealed 270 genes enriched and 26 depleted (Fig 4C, fold-change ≥ 3, p-value ≤ 0.01). Of these genes, 108 genes (96 enriched and 12 depleted) had annotations other than hypothetical proteins and included bonafide interactors of metronidazole such as thioredoxin reductase, purine nucleoside phosphorylase, and a putative ortholog of metronidazole target protein 1 13. The screen also identified 13 kinases, primarily from the expanded NEK kinase family; 14 protein 21.1 (Ankyring domain-containing proteins), and genes involved in DNA metabolism, antioxidant metabolism, and membrane transporters (Table 3 and Supplementary Table 1). Importantly, analysis of the enriched library fragments revealed specific gene sequences associated with yeast-increased fitness to metronidazole (Fig 4D). The sequences matched primarily to nucleotide binding regions, e.g., nicotinamide adenine dinucleotide (NAD)-binding domains, P-loop containing nucleotide-binding (NTPase) domain, or ATPase domains, indicating that metronidazole may interact with different proteins containing nucleotide-binding domains, i.e., ATP-binding domains or nucleotide cofactor domains (Fig 4D). Gene ontology (GO) analysis of the annotated genes showed enrichment in heterocyclic compound binding, small molecule binding, purine nucleotide binding, transferase, and protein kinase activities (Fig 4E), confirming a bias towards nucleotide-binding proteins in the genes identified. Analysis of the protein domains encoded by the identified genes revealed that 89% had an annotated nucleotide-binding domain (Fig 4E-F). Hence, the screen identified proteins known to interact with metronidazole, potential drug-binding domains, and new candidate proteins to help identify metronidazole targets and mechanisms of action. The data also shows that YSD-FS is a robust approach for drug-target screening without the chemical labelling or modification of drugs.
Discussion
We have developed a robust method to generate genome-wide libraries and demonstrated the method’s efficiency by constructing three genome-wide libraries of protozoan pathogens. We have optimized the libraries’ expression in yeast for surface display and developed an assay combining YSD-FS and high-throughput sequencing to identify potential drug-interacting proteins, demonstrated by using the G. lamblia YSD library to screen for potential metronidazole interacting proteins. The screen identified known metronidazole binding proteins, corroborating the assay strength, and new potential metronidazole binding targets. The assay also revealed specific protein domains associated with yeast-altered fitness to metronidazole, indicating potential metronidazole binding sites. The libraries, screening datasets, and methods are valuable resources for identifying drug targets, especially for organisms with limited genetic tools.
Generating eukaryote genomic libraries require cloning between 105 to 106 fragments, which can cause a bottleneck in library construction. To address this problem, we used Gibson assembly to join DNA fragments 5, which also facilitates library transfer between different vectors. In our approach, we ligated a forked adapter to the genomic fragments and then used it for fragment amplification with primers containing overlapping sequences to the vector. The ligation of the forked adaptor is straightforward and is routinely used by many laboratories to generate DNA or RNA sequencing libraries. We opted for an Oxford nanopore T/A-based forked adapter because it facilitates the preparation of long-read DNA sequencing of the libraries. In this approach, a single PCR step generates barcoded fragments for DNA sequencing, avoiding long protocols associated with nanopore sequencing library preparation. Still, we anticipate that other adapters, including Illumina or customized adapters, can be used to replace the nanopore adapters during library design. The ∼20 bp overlapping sequence matching the receptor vector (e.g., pYD1) can also be replaced by any vector’s sequence, which provides library transfer flexibility for multiple applications. The timeframe to generate a eukaryote genomic library using this method is about two to three weeks. Other methodologies for library generation include DNA cloning based on ligation of T-tailed vectors or restricted digested DNAs. One limitation of DNA or cDNA restriction enzyme digestion and ligation results from bias in DNA digestion depending on the distribution of restriction enzyme sites in the DNA. At earlier stages of this work, we also attempted generating libraries by DNA ligation using T/A cloning vectors. Although efficient for cloning, custom-made T/A-cloning vectors are not as efficient for large-scale library construction, and they also limit library transfer among different vectors.
We generated three genome-wide libraries with protozoan genomes varying between 12.1 to 44 Mb. There was no bias in library construction related to genome size, and libraries were diverse and complete. Still, a minor tendency toward cloning small fragments was observed, which is common to any cloning approach because small DNA fragments can be overly represented in DNA samples compared to large DNA sequences. The libraries’ high genome coverage and diversity increase the chance that clones containing the desired coding region in frame with Aga2p for yeast surface expression are included. Computational analysis based on nanopore reads predicted thousands of polypeptides in frame with Aga2p. The predicted amino acid sequences included polypeptides in the length range of protein motifs, domains, 9 and potential epitopes for protein-ligand binding, including the discovery of antibody targets, protein-domain interaction analysis, and small molecule target screens. We also optimized a protocol for the high-efficiency transformation of yeast and found that the transformed yeast cells preserved the library’s completeness and diversity. We anticipate that the method can help assemble libraries for different applications, including the expression of full-length cDNA sequences, stage-specific transcripts, or gene-specific mutagenesis libraries.
The three libraries generated and validated via nanopore sequencing for G. lamblia, T. brucei, and T. cruzi are powerful resources for exploring basic and translational biology. Advances in T. brucei genetic tools, including RNA interference and genetic screens, have helped identify drug targets and advance our understanding of parasite biology 3, 14. However, much less has been done for T. cruzi and G. lamblia, which have undeveloped genetic tools. RNA interference is present in G. lamblia 15 but not in T. cruzi 16, and CRISPR/Cas9 has been used in T. cruzi 17. However, applying those tools has challenges and limitations, including variability in RNA interference knockdown efficiency, poor parasite transfection efficiency, and lack of regulatable gene expression systems, limiting scaling the approaches for high throughput applications. To date, not a single genetic screen has been performed in T. cruzi or G. lamblia, which attests to the challenges in the genetic manipulations of these organisms. Hence, harnessing yeast genetics for surface display will help to explore the parasite genomes for drug and vaccine target discovery and advance our knowledge of these organisms’ basic biology.
The YSD technology presents several advantages compared to other protein overexpression systems. Yeast has a cell wall that can limit cell permeability to extracellular ligands and thus prevents potential ligand interactions with intracellular protein targets. Moreover, yeast metabolic enzymes can modify intracellular compounds before interacting with parasite proteins, which are significant limitations of intracellular overexpression screens. However, the surface expression of proteins facilitates ligand-target interactions with little, if any, interference from yeast metabolic enzymes. Furthermore, each yeast cell can express ∼105 copies of a single protein on its surface, which provides multiple ligand interacting sites, and facilitates eukaryotic post-translational modifications of expressed proteins, which are not present in prokaryotic systems such as phage display. The surface display approach can also be used for numerous applications, including ligand-target binding screens, e.g., antibody binding, or phenotypic screens, e.g., drug survival screens.
We performed a proof-of-concept drug screen with yeast expressing the G. lamblia library. Although most applications of YSD are ligand-binding, we reasoned that a YSD-FS is more appropriate for drug targets. The YSD-FS does not require drug modifications (e.g., drug labelling with fluorochromes or capturing tags) that can interfere with binding to the unknown target. We used metronidazole because the drug is extensively used to treat G. lamblia, Trichomonas varginalis, and bacterial infections 18. Moreover, data on a few metronidazole interacting proteins were also available to cross-validate the screen 13, 19. The screen identified genes encoding proteins that interact with metronidazole, e.g., thioredoxin reductase and purine nucleoside phosphorylase, thus reproducing described drug targets, and also identified new protein interacting candidates. The annotated candidate interacting proteins’ function was predominantly related to heterocyclic compound binding, with enrichment in proteins with nucleotide-binding domains. The data suggest that metronidazole and perhaps its derived molecules might interact with ATP-binding proteins such as the NEK kinases or enzymes containing nucleotide-binding domains. In agreement with this, metronidazole has been shown to interact with enzymes involved in DNA metabolism 13, 18, 19, many of which we identified on the screen. Several Protein 21.1, which are rich in Ankyrin domains and share similarities with the NEK kinases, were identified, likely due to reminiscent kinase/ATP binding domains 20. Hence, our data suggest that metronidazole might target multiple proteins sharing a common nucleotide-binding domain. The number of enzymes with nucleotide-binding domains is significant in any organism. The identification of the dozens of enzymes on the screen suggests that amino acid differences within the nucleotide-binding domains affect drug binding. Still, there are some apparent preferences, e.g., the P-loop nucleotide-binding domain. Although further validation of the metronidazole interacting protein candidates will be required, the screen points out directions for this drug targets and mechanisms of action. It might also guide structure-activity relationship experiments to design new target-based compounds. The results also indicate the robustness of the YSD-FS in identifying potential drug interactors at nucleotide resolution.
A potential YSD-FS pitfall is that yeast cells may not be sensitive to the studied drugs. For example, yeast is highly resistant to metronidazole, and the concentrations used in the study differ from those used against G. lamblia in vitro. However, the identification of validated metronidazole targets in the screen suggests that the assay tolerates drug ranges different from the concentrations used against the target pathogen. Nevertheless, using mutant yeast strains in multidrug resistance genes may be an alternative in the YSD-FS. Alternating yeast growth conditions from aerobic to anaerobic may also be an alternative to expand the assay application. Notably, the libraries were designed to express polypeptides covering the length of protein domains 9 and thus have the advantage of identifying potential drug-binding sites or antibody epitope targets. However, interactions that might require a full-length protein may be missed, as well as proteins or domains not appropriately folded on the yeast surface. The method developed to generate the libraries combined with the YSD-FS and high-throughput sequencing will help expand drug-target discovery and mechanisms of action to organisms in which genetic manipulations are challenging, and tools are not well developed.
Methods
Cell culture
Saccharomyces cerevisiae strain EBY100 was obtained from the American Type Culture Collection (6). Suspension cultures were grown at 30°C with platform shaking at 225 rpm. Cultures were maintained at 30°C in YPD medium (10 g/L yeast extract; 20g/L peptone; 20 g/L dextrose; pH 6.5) or YPD/agar (YPD with 20 g/L agar). Transformed yeast cells were cultured in a synthetic-defined (SD) medium without tryptophan (SD/-Trp broth, Takara Bio USA Inc.) and induced in SD/-Trp broth with 2% raffinose as an alternative carbon source by the addition of 1 to 3% galactose. T. brucei single marker 427 strain bloodstream forms were grown in HMI-9 medium supplemented with 10% inactivated Fetal Bovine Serum (FBS, Life Technologies) 21 and 2 µg/mL of G418 (Biobasic) at 37°C with 5% CO2. T. cruzi Sylvio X10 strain epimastigotes were grown in Liver Infusion Tryptose medium supplemented with 10% inactivated FBS (Life Technologies)22 at 27°C. G. lamblia WB strain was maintained in modified TY-S-33 medium 23 supplemented with 10% inactivated FBS and 2% Diamond’s vitamin solution at 37°C.
Generation of genome-wide libraries
A detailed protocol for library construction is available 24. Briefly, genomic DNAs were obtained from 1.0×108 parasites. Cells were resuspended in 200µl of lysis buffer (10 mM Tris-HCl pH 8, 10 mM EDTA, 1% sodium dodecyl sulfate) and incubated with 20 units of Proteinase K for 30 minutes at 55°C. Afterwards, 20 ul of RNAse A (10 mg/ml) were added, and the samples were incubated at room temperature for 5 minutes. Lysates were mixed vigorously with 1:1 (v:v) Phenol:Chloroform:Isoamyl Alcohol (25:24:1, pH 6.7), followed by centrifugation at 14,000 x g for 10 minutes. The aqueous phase was mixed 1:1 (v:v) with cold isopropanol and 3μl µg of linear poly acrylamide and centrifuged at 14,000 x g for 15 minutes to precipitate DNA. The DNA pellets were washed with 70% cold ethanol and resuspended in 10 mM Tris-HCl pH 8. Next, 2 µg of genomic DNA was fragmented using an ultra-focused sonicator (Covaris M220, Covaris, Inc.) in microTUBE-50 AFA Fiber Screw-Cap tubes. G. lamblia genomic DNA was sonicated with peak incident power 25W, duty factor 2%, 200 cycles, for 20 sec at 20 °C. T. brucei genomic DNA was sonicated with peak incident power 25W, duty factor 10%, 200 cycles per burst, for 10 sec at 20 °C. T. cruzi genomic DNA was sonicated with peak incident power 25W, duty factor 10%, 200 cycles per burst, for 3 sec at 20 °C. DNA fragments of 0.5-3Kb were size selected using Mag-Bind® TotalPure NGS (Omega Bio-Tek) with a beads:sample ratio of 0.5x following the manufacturer’s instructions. DNA fragments were end-repaired and A-tailed using NEBNext Ultra II End Repair/dA-Tailing Module (New England Biolabs Ltd) according to the manufacturer’s instructions, then purified with a 0.5x beads:sample ratio as described above.. The purified DNA was ligated to forked barcode adapters (Oxford Nanopore Technologies) using the NEBNext Quick Ligation Module (New England Biolabs Ltd) for 2h at 20°C. The reaction product was purified as described above. Then, 20 ng of DNA was used for PCR amplification using forward 5’-CGATGACGATAAGGTACCAGGATCCTTTCTGTTGGTGCTGATATTGC-3’ and reverse 5’-TGCAGAATTCCACCACACTGGATTACTTGCCTGTCGCTCTATCTTC-3’ primers. PCR was performed using Taq DNA Polymerase with ThermoPol Buffer (New England Biolabs Ltd) with denaturation at 95° for 3 minutes, followed by 15 cycles of 95°C for 30 sec, 49°C for 30 sec, and 72°C for 3.5 min. PCR products were size selected with 0.5x beads:sample ratio with Mag-Bind® TotalPure NGS (Omega Bio-Tek). Then, 47.04 fmol of fragmented DNAs were used for Gibson assembly with 25.83 fmol BamHI-digested pYD1 vectors using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs Ltd), and reactions were incubated for 1h at 50°C. One µL of the reaction mix was used to transform 50 µL of chemically competent (∼109 cfu/µg) DH5α Escherichia coli. Ampicillin-resistant colonies were scraped from plates, combined, and the plasmid DNAs were isolated using the NucleoBond® Xtra Midi EF plasmid mid-prep kit (Takara Bio USA Inc.) according to the manufacturer’s instructions.
Nanopore sequencing
Libraries cloned in the pYD1 vector were amplified using forward 5’-TTAAGCTTCTGCAGGCTAGTGGTG-3’ and reverse 5’-CACTGTTGTTATCAGATCAGCGGG-3’ primers with Taq DNA Polymerase and ThermoPol Buffer (New England Biolabs Ltd) for 16 cycles of 95°C for 30 sec, 52°C for 30 sec, and 68°C for 3.5 min. Amplified fragments were purified using 0.85x beads:sample ratio with Mag-Bind® TotalPure NGS (Omega Bio-Tek). Fragments were prepared for Oxford nanopore sequencing using ligation sequencing kit SQK-LSK110 (Oxford Nanopore Technologies) and PCR Barcoding Expansion kit (EXP-PBC001) according to the manufacturer’s instructions, except that quick T4 ligase was used for the BCA ligation instead of the blunt TA enzyme/buffer mix recommended. Fifty fmol of pooled barcoded libraries were sequenced in a MinION using an R9.4.1 flow cell FLO-MIN106 (Oxford Nanopore Technologies) for 24h following the manufacturer’s instructions. A total of 2-8 Gb of DNA were sequenced. All sequences are available at
Computational analysis
All scripts used in this manuscript are included in the Supporting Information. Nanopore sequencing fast5 data from sequenced libraries were basecalled with Guppy (Oxford Nanopore Technologies), according to the manufacturer’s instructions. Fastq sequences were mapped to T. cruzi Sylvio strain reference genome (release 57), or T. brucei 427-2018 genome, or G. lamblia WB reference genome using minimap2 25. Mapping statistics and format conversions were obtained using Samtools 26. Coverage analysis was performed with DeepTools 27. Data were visualized using Integrative Genomics Viewer 28 and circlize 29 following the developer’s instructions. The libframe tool was developed using Python version 3.8 (https://github.com/cestari-lab/Libframe-tool).
Yeast transfection and galactose induction
EBY100 cells were cultured in Petri dishes containing YPD agar and incubated at 30°C for 72h. In 125 ml of YPD medium, between 5-10 colonies were inoculated to an OD of 0.02 and incubation at 30°C and shaking at 225 rpm overnight until an OD of approximately 1.6. To produce electrocompetent yeast, the cell pellet was harvested by centrifugation (3000 rpm, 4 mins, 4°C) and washed two times in ice cold water, then once in ice cold electroporation buffer (1M sorbitol, 1mM CaCl2). Cells were resuspended in 25 ml conditioning buffer (0.1M LiAc/10mM DTT) for 20 mins at 30°C with shaking at 225 rpm. Cells were harvested by centrifugation and washed once in ice cold water and once in ice cold electroporation buffer. Cells were resuspended in electroporation buffer to a final volume of 1.25 ml and kept on ice until electroporation. For each electroporation, 0.1 ug of library DNA and 5 µl of salmon sperm DNA (ThermoFisher Scientific) (250 µg/ml final concentration) were added to 200 µl of electrocompetent cells and mixed by gently pipetting. The DNA/yeast mixture was transferred to a pre-chilled BioRad GenePulser cuvette (0.2 cm electrode gap) and incubated on ice for 5 mins. Cells were then electroporated at 2.5 kV, 25 μF and 200Ω. Cells were immediately transferred to a 4 ml mixture of 1:1 sorbitol:YPD and allowed to recover for 1 hour at 30°C, shaking at 225 rpm. Cells were collected by centrifugation at 4000 rpm for 5’ at 4 ºC, washed three times in water, and resuspended in 4 ml SD/-trp. 10-fold serial dilutions were prepared; 100 µl of 1:100 and 1:1000 dilutions were plated onto SD/-trp agar and grown at 30°C. The transformation efficiency was calculated from the colony counts after 72 hours. The remaining cells were added to 125 ml SD/-trp per every 200 µl electroporation reaction and the culture was expanded for 24 hours at 30°C with shaking at 225 rpm. Transformed yeasts were then kept frozen in SD/-trp media with 25% glycerol. To induce the library expression of parasite peptides, one mL aliquot of transformed yeast was thawed and resuspended in 30 ml of SD/-trp/2% glucose and grown to an OD600 of 1 at 30 ºC while shaking at 225 rpm. Then, the cells were collected by centrifugation at 4,000 rpm for 5 minutes at 4 ºC, washed three times with cold sterile water and resuspended in SD/-trp/ with 2% raffinose (raf) to an OD600 of 0.5-0.8. Cells were incubated for 4 hours at 30°C with shaking at 225rpm, then collected by centrifugation and resuspended in SD/-trp/2%raf with the addition of 1-3% galactose concentrations for induction of protein expression for 16 hours at 30°C, rotating at 225rpm.
Western blotting
Protein extracts were prepared by harvesting cells at 5, 000 × g and then resuspending them in 0.1 M NaOH. The cells were incubated at room temperature for 5 min and centrifuged at 5, 000 × g. The pellet was resuspended in 250 mM Tris (pH 6.8), 140 mM sodium dodecyl sulfate (SDS), 30 mM bromophenol blue, 27 μM glycerol, and 0.1 mM dithiothreitol as previously described 30. After incubation at 95 °C for 10 min, the samples were centrifuged for 5 min at 5000 g to pellet cell debris. Proteins were resolved in 12% SDS/PAGE and transferred to PVDF membranes as previously described 31. Membranes were blocked with 6% non-fat dry milk in PBS 0.05% Tween (PBS-T) for 1h at room temperature (RT). Afterwards, membranes were incubated in mouse anti-Xpress antibodies (Invitrogen) diluted 1:1,000 in blocking buffer (6% non-fat dry milk in PBS-T) for 1h at RT, and then washed 5 times for 8 to 10 minutes at RT. The membranes were incubated in goat anti-mouse IgG-HRP (Santa Cruz) diluted 1:5,000 in blocking buffer for 1h at RT, washed 5 times for 8 to 10 minutes in PBS-T at RT, and then developed with ECL chemiluminescence solution (Life Technologies) in a ChemiDoc Imaging System (Bio-Rad). Membranes were stripped and re-probed, as described above, with rabbit anti-β-actin (1:2000) (ABclonal Technology) and donkey anti-rabbit IgG 1:5000 (Santa Cruz).
Fluorescence-based imaging and flow analysis
One ml aliquots containing approximately 2.0×106 yeast cells were harvested by centrifugation (4,000 xg, 5 minutes) and washed once in PBS. Cells were then fixed in a 1:1 (vol:vol) mix of PBS and 3.7% paraformaldehyde for 20 minutes at RT on a rotator. Cells were washed three times in FACS buffer (PBS, 1% BSA, 0.1% sodium azide) and resuspended in 1ml FACS buffer. 100 µl of cells were transferred to fresh Eppendorfs, pelleted as above, and resuspended in 100 ul of FACS buffer containing mouse anti-Xpress antibodies diluted 1:500 and incubated at RT for 1 hour or at 4°C overnight with rotation. Cells were then washed three times in FACS buffer and incubated with AlexaFluor 488 conjugated anti-mouse IgG (Invitrogen) in FACS buffer for 1 hour at RT in the dark, with rotation. Cells were washed three times in FACS buffer, then resuspended in 1ml FACS buffer for analysis. Analysis was performed using Attune NxT Flow Cytometer (ThermoFisher Scientific). For Cytation 5 imaging analysis and microscopy, cells were fixed and incubated as described above. Imaging analysis was performed using Cytation 5 imaging reader (BioTek). For microscopy, yeast cells were fixed on cover glass pre-soaked with 1% poly-L-lysine to help cell attachment to the cover glass. Cells were mounted on slides with Fluoromount-G™ Mounting Medium (Thermo Scientific). The slides were analyzed using Nikon E800 Upright fluorescence microscopy (Nikon).
YSD-FS screen of metronidazole
Gl-Lib or pYD1 transformed yeast were inoculated into SD/-trp/2%raf/3% gal and incubated at 30°C, 225 rpm for 16h. The cultures were diluted to obtain an OD of 0.2, then 200 ul was inoculated on sterile dish plates (150mm x 25mm) containing SD/-trp/2% raf/3% gal/agar medium and metronidazole (20mg/ml, diluted in dimethyl sulfoxide, DMSO). Cultures were also incubated in plates with drug vehicle only to monitor the effects of DMSO. After a 3-day incubation at 30°C, the colonies from Gl-Lib or pYD1 transformed yeast grown in the presence of metronidazole were scraped from the plates. The cultures were recovered in SD/-trp medium for 16h at 30°C, then washed three times in Milli-Q water and replated as described above in metronidazole-containing plates for an additional three rounds of selection. Afterwards, colonies were scraped from the plates, and their plasmid DNAs were extracted by resuspending the colonies in 400 µL lysis buffer (10mM Tris pH8.0, 0.1M EDTA, 0.5%SDS), 200 μl of acid-washed 0.5 mm glass beads. The mixture was incubated with 20 units of Proteinase K for 30 minutes at 55°C. Then, it was vigorously vortexed for 10 min, with the subsequent addition of another 200µl lysis buffer and vortexed for 10 additional minutes. Afterwards, 20 µl of RNAse A (10mg/ml) were added to the mixture and incubated at RT for 5 minutes. The mixture was boiled at 100 °C for 3 min in a heat block, placed on ice for 1 minute, and then centrifuged for 14,000 × g, 10 min. The supernatant was collected for plasmid DNA extraction using 1:1 (v:v) Phenol:Chloroform:Isoamyl Alcohol (25:24:1) pH 6.7, followed by centrifugation at 14,000 x g for 10 minutes. Cold isopropanol 1:1 (vol:vol) and 3 µl LPA were mixed with the aqueous phase and centrifuged at 14,000 x g for 15 minutes. The precipitated plasmid DNA was washed in 70% cold ethanol and resuspended in 10 mM Tris pH 8.0. The screen was performed in triplicate and samples multiplexed for Oxford nanopore sequencing as described above.
Data and statistical analysis
Data are shown as means ± standard deviation of the mean from at least three biological replicates. Comparisons among groups were made by a two-tailed t-test using GraphPad Prism. P-values ≤ 0.05 with a confidence interval of 95% were considered statistically significant unless otherwise stated. Graphs were prepared using Prism (GraphPad Software, Inc.), Matlab (Mathworks), or RStudio.
Data availability
All data have been included within the article and Supplementary Information. Sequencing data were deposited in the Sequence Read Archive (SRA) with BioProject numbers PRJNA851089, PRJNA851424, and PRJNA851903. The Libframe tool and its code is available at https://github.com/cestari-lab/Libframe-tool. The genome-wide libraries are available upon request (Contact: Igor Cestari, igor.cestari{at}mcgill.ca).
Supporting information
This article contains supporting information.
Conflict of interest
The authors declare that they have no conflicts of interest with the contents of this article.
Funding and additional information
This work was funded by CIHR (CIHR PJT-175222, to IC), NSERC (NSERC RGPIN-2019-05271, to IC), CFI (258389, to IC), and by McGill University (130251, to IC). Andressa Lira received a Mitacs Global-Link fellowship (IT25163). Sahil Rao Sanghi received a fellowship from the Mitacs Research Training Award program (IT19603).
Acknowledgements
We thank Dr. Janet Yee (Department of Biology, Trent University) for providing Giardia Lamblia parasites, sharing cell culture protocols, and providing suggestions to the manuscript. We also thank Dr. Steven Rafferty (Department of Chemistry, Trent University) for providing valuable suggestions to the manuscript.