Abstract
Despite longstanding appreciation of gene expression heterogeneity in isogenic bacterial populations, affordable and scalable technologies for studying single bacterial cells have been limited. While single-cell RNA sequencing (scRNA-seq) has revolutionized studies of transcriptional heterogeneity in diverse eukaryotic systems, application of scRNA-seq to prokaryotic cells has been hindered by their low levels of mRNA, lack of mRNA polyadenylation, and thick cell walls. Here, we present Prokaryotic Expression-profiling by Tagging RNA In Situ and sequencing (PETRI-seq), a high-throughput prokaryotic scRNA-seq pipeline that overcomes these obstacles. PETRI-seq uses in situ combinatorial indexing to barcode transcripts from tens of thousands of cells in a single experiment. We have demonstrated that PETRI-seq effectively captures single cell transcriptomes of Gram-negative and Gram-positive bacteria with high purity and little bias. Although bacteria express only thousands of mRNAs per cell, captured mRNA levels were sufficient to distinguish between the transcriptional states of single cells within isogenic populations. In E. coli, we were able to identify single cells in either stationary or exponential phase and define consensus transcriptomes for these sub-populations. In wild type S. aureus, we detected a rare population of cells undergoing prophage induction. We anticipate that PETRI-seq will be widely useful for studying transcriptional heterogeneity in microbial communities.
Background
Bacterial communities, including genetically homogeneous populations, are typically composed of cells in non-identical gene expression states [1, 2]. Gene expression heterogeneity underlies many fundamental bacterial phenomena including communication [3], pathogenicity [4], competence [2], biofilm formation [5, 6] and antibiotic persistence [7]. Elucidation of these processes at a single-cell level could substantially improve our understanding of bacterial evolution and community structures and guide rational development of anti-microbial strategies. However, conventional bacterial single-cell methodologies, such as in situ hybridization [8, 9] and fluorescent reporters [10], allow only a few genes to be monitored at a time. There is a pressing need to develop methods capable of profiling global molecular signatures of single bacterial cells.
Recent developments in high-throughput single-cell RNA sequencing (scRNA-seq) technology have enabled rapid characterization of cellular diversity within complex eukaryotic tissues [11–22]. Despite these advances, comparable tools to study the transcriptomes of individual bacterial cells remain limited (Figure S1). Existing bacterial techniques are low throughput, involving manual isolation of single cells followed by reverse transcription (RT) and amplification reactions for one cell at a time. In 2011, the first single-cell microarray study was described for a few Burkholderia thailandensis cells [23], each containing 2 pg of RNA, orders of magnitude more than many bacterial species of interest [24]. More recent reports described sequencing of six Synechocystis sp. PCC6803 cells [25] and three Porphyromonas somerae cells [26], each of which contains 1-5 fg of RNA. These methods comprehensively characterized the transcriptomes of a few single cells. However, they are prone to contamination and not equipped to study highly heterogeneous bacterial communities and rare populations like persisters [27] across thousands of cells.
Development of high-throughput bacterial scRNA-seq has lagged behind due to numerous technical challenges. Current massively parallel eukaryotic scRNA-seq methods typically require custom microfluidics to co-encapsulate a single cell with a uniquely barcoded bead in a compartment, often a droplet [15, 16, 18] or microwell [14, 17]. These approaches rely on two key properties of many eukaryotic cells, specifically that they are easily lysed with detergent to release their RNA and that their poly-adenylated mRNAs can be effectively captured by beads coated with poly(T) primers. Adaptation of these approaches for bacteria is thwarted by the thick prokaryotic cell wall [28], which makes lysis challenging, and by the lack of poly-adenylated mRNAs for effective capture.
Given these considerations, we identified in situ combinatorial indexing [29] as an alternative basis upon which to develop a method for high-throughput prokaryotic scRNA-seq. Two conceptually similar eukaryotic methods, single-cell combinatorial indexing RNA sequencing (sci-RNA-seq) [19, 20] and split-pool ligation-based transcriptome sequencing (SPLiT-seq) [21], rely on cells themselves as compartments for barcoding, which abrogates the need for cell lysis in droplets or microwells. These methods are also amenable to RT with random hexamers instead of poly(T) primers [21]. With just pipetting steps and no complex instruments, individual transcriptomes of hundreds of thousands of fixed cells are uniquely labeled by multiple rounds of splitting, barcoding, and pooling in microplates.
Here, we present Prokaryotic Expression-profiling by Tagging RNA In situ and sequencing (PETRI-seq), a high-throughput, affordable, and easy-to-perform scRNA-seq method capable of distinguishing the transcriptional states of tens of thousands of wild type Gram-positive (S. aureus USA300) and Gram-negative (E. coli MG1655) cells. Our approach captures mRNA with little bias, approaching bulk expression levels when cell transcriptomes are aggregated. Although bacteria only express thousands of mRNAs per cell [1, 30, 31] in contrast to hundreds of thousands in mammalian cells [32], our results show that captured transcript levels are sufficient to distinguish sub-populations at different growth stages and gain novel insights into rare cell sub-populations. PETRI-seq has the potential to elucidate various bacterial phenotypes, including persistence, biofilm formation, and host-pathogen interactions. PETRI-seq could also ultimately enable high-resolution capture of transcriptional dynamics in microbial communities, including unculturable components, a major current challenge in microbiology [33].
Results
A Method for Single-Cell RNA Sequencing of Prokaryotic Cells
PETRI-seq (Figure 1) consists of three experimental components: cell preparation, split-pool barcoding, and library preparation, which are detailed in Figure S2 and Methods. Cell preparation includes fixation to maintain cell integrity, cell wall permeabilization to allow reagent diffusion into cells, and DNase treatment to remove genomic background. As cell preparation is critical to the success of PETRI-seq, we had to optimize key parameters to establish a working protocol for E. coli. Cells were briefly pelleted before fixation with 4% formaldehyde, as adding formaldehyde directly to the cell culture without pelleting reduced RT efficiency (Figure S3A), possibly due to excess cross-linking with media components. We confirmed that fixation did not alter the bulk transcriptome (Figure S3B). Cells were next resuspended in 50% ethanol, which has been used previously for prokaryotic in situ PCR as a storage solution [34], though we have yet to test cellular and RNA integrity after long-term storage. Ethanol did not significantly change the cDNA yield from in situ RT (Figure S3C). Lysozyme was subsequently added to permeabilize cells for in situ RT (Figure S3D). Cells were next treated with DNase to remove background genomic DNA, and DNase was inactivated by mild heat treatment. We confirmed in situ DNase activity by qPCR (Figure S3E) and verified DNase inactivation (Figure S3F,G). Before proceeding to RT, cells were imaged to confirm they were intact (Figure S3H) and counted.
In the next stage, we performed split-pool barcoding. Cells were distributed across a microplate for RT, with different short DNA barcodes in each well. After RT, cells were pooled and redistributed across new microplates for two rounds of barcoding by ligation to the cDNA. We reduced the length of the overhang for each ligation relative to the eukaryotic protocol [21], which made it possible to use only 75 cycles of sequencing instead of 150 cycles and decrease the sequencing cost by almost 50% (Table S1B). We demonstrated effective barcode ligation with this modification (Figure S3I). After three rounds of barcoding, cells contained cDNA labeled with one of nearly one million possible three-barcode combinations (BCs). We counted the cells and lysed roughly 10,000 cells for library preparation. The number of cells was chosen to ensure a low multiplet frequency, which is the percent of non-empty BCs containing more than one cell [35]. For a library of 10,000 cells, the expected multiplet frequency based on a Poisson distribution is 0.56%.
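The Poisson expectation quoted above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' code: it assumes 96 barcodes in each of the three rounds (so 96^3 = 884,736 combinations, consistent with "nearly one million"), and that cells are distributed over combinations independently and uniformly. The function name is illustrative.

```python
import math

def expected_multiplet_frequency(n_cells: int, n_barcode_combinations: int) -> float:
    """Fraction of non-empty barcode combinations (BCs) that hold more than one cell,
    assuming cells land on BCs independently and uniformly (Poisson approximation)."""
    lam = n_cells / n_barcode_combinations   # mean number of cells per BC
    p_empty = math.exp(-lam)                 # P(0 cells on a BC)
    p_single = lam * math.exp(-lam)          # P(exactly 1 cell on a BC)
    return 1.0 - p_single / (1.0 - p_empty)  # P(more than 1 cell | BC is non-empty)

# Assumption: 96 barcodes per round, three rounds.
N_COMBINATIONS = 96 ** 3  # 884,736
freq = expected_multiplet_frequency(10_000, N_COMBINATIONS)
print(f"{100 * freq:.2f}%")  # prints 0.56%
```

Under these assumptions the frequency is close to half the mean occupancy (10,000 / 884,736 / 2), which is why lysing fewer cells per library lowers the expected multiplet rate roughly in proportion.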
Finally, cDNA was prepared for Illumina sequencing. We used AMPure XP beads to purify cDNA from cell lysates (Figure S3J). AMPure purification is faster and less expensive than streptavidin purification used previously in eukaryotic SPLiT-seq [21]. Importantly, primer biotinylation is a significant initial expense, which is avoided by AMPure purification (Table S1C). To make double-stranded cDNA, we compared second-strand synthesis [36] and limited-cycle PCR after template switching [12]. We found that the former had a significantly higher yield (Figure S3K,L). We then performed tagmentation followed by PCR using the transposon-inserted sequence and the overhang upstream of the third barcode as primer sequences, thereby preventing amplification of any undigested genomic DNA. The libraries were sequenced and analyzed using the pipeline detailed in Figure S4 and Methods. BCs with at least 40 total transcripts, or unique molecular identifiers (UMIs) [37], were considered for further analysis (Figure S4E,F,G).
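The UMI threshold described above amounts to counting unique molecules per barcode combination and discarding sparse BCs. The sketch below illustrates the idea only; the record layout `(barcode, gene, umi)` and the function name are hypothetical, and the actual pipeline (Figure S4 and Methods) may collapse UMIs differently.

```python
from collections import defaultdict

def barcodes_passing_umi_threshold(records, min_umis=40):
    """Return {barcode: UMI count} for BCs with at least `min_umis` unique UMIs.

    `records` is an iterable of (barcode, gene, umi) tuples. This layout is a
    hypothetical stand-in for the pipeline's real intermediate files.
    """
    umis_per_bc = defaultdict(set)
    for barcode, gene, umi in records:
        # A set collapses repeated reads of the same molecule (same gene and UMI).
        umis_per_bc[barcode].add((gene, umi))
    return {bc: len(u) for bc, u in umis_per_bc.items() if len(u) >= min_umis}

# Toy input with a lowered threshold so the behaviour is visible.
toy = [
    ("BC1", "rpoS", "AAAC"), ("BC1", "rpoS", "AAAC"),  # duplicate read, counted once
    ("BC1", "dps", "GGTA"), ("BC1", "cydA", "TTGC"),
    ("BC2", "rpoS", "CCAT"),
]
kept = barcodes_passing_umi_threshold(toy, min_umis=3)
print(kept)  # {'BC1': 3}
```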
PETRI-Seq Captures Transcriptomes of Single Cells
To demonstrate the ability of PETRI-seq to capture transcriptomes of single cells, we performed a species-mixing experiment involving three populations of cells: GFP- and RFP-expressing E. coli and wild type S. aureus (Figure 2A). From 9,642 sequenced BCs, we observed that BCs were highly species-specific with 99.6% clearly assigned to one species (Figure 2B). We calculated an overall multiplet frequency of 1.8% after accounting for multiplets of the same species and non-equal representation of the two species [35]. Though this frequency exceeds the Poisson expectation of 0.56%, it is comparable to existing eukaryotic methods [18, 20]. Within the E. coli population, we included a population of cells constitutively expressing GFP [10] and another population expressing RFP induced by anhydrotetracycline (aTc) [38]. E. coli BCs were highly strain-specific with 98.6% assigned to a single population (Figure S5A). With this confirmation that PETRI-seq successfully captured single-cell transcriptomes, we were able to quantify the number of transcripts per cell. We captured a mean of 52.6 and median of 41 mRNAs per GFP-containing E. coli cell (Figure 2C). From these same cells, we captured a mean of 384 and median of 292 total RNAs per cell (Figure S5B). We captured fewer mRNA transcripts per RFP-expressing E. coli cell (Figure S5C), likely due to their reduced growth rate during aTc induction (Figure S5D). There were also 204 ambiguous cells, which could not be assigned to the RFP or GFP population because they did not contain any plasmid transcripts. By excluding these ambiguous cells, we risked over-estimating the true levels of mRNA captured per cell in each population. We thus considered the extreme cases where all ambiguous cells were part of one population or the other (Figures 2C,S5C). The broader population of 875 ambiguous and GFP-expressing cells contained a mean of 44.5 and median of 33 mRNAs per cell. Based on estimates that single E. coli cells contain 2000-8000 mRNAs [1, 30, 31], we estimate our capture rate to be roughly 0.5-2%. In the S. aureus population, we captured a mean of 20.6 and median of 18 mRNAs per cell (Figure 2D). S. aureus cells may contain fewer mRNAs than E. coli cells because of their smaller cell size and genome [39], though there may also be technical differences affecting capture.
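The capture-rate estimate is a simple ratio of captured mRNAs to the assumed mRNA content of a cell. As a rough illustration (the pairing of means with the 2000-8000 range is our assumption about how the bounds were obtained, and the function name is illustrative), the two E. coli means reported above give a range of about 0.6% to 2.6%, in line with the quoted "roughly 0.5-2%":

```python
def capture_rate_percent(captured_mrnas: float, mrnas_per_cell: float) -> float:
    """Percentage of a cell's mRNAs that were captured."""
    return 100.0 * captured_mrnas / mrnas_per_cell

# Mean captured mRNAs per cell, taken from the text above.
captured = {
    "GFP-expressing cells": 52.6,
    "GFP-expressing plus ambiguous cells": 44.5,
}

# Literature range of mRNAs per E. coli cell cited in the text.
MRNA_LOW, MRNA_HIGH = 2000, 8000

for label, n in captured.items():
    lowest = capture_rate_percent(n, MRNA_HIGH)   # many mRNAs per cell -> low rate
    highest = capture_rate_percent(n, MRNA_LOW)   # few mRNAs per cell -> high rate
    print(f"{label}: {lowest:.1f}% to {highest:.1f}%")
```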
Performing molecular reactions inside of cells raises the possibility that RNA capture could be biased by specific cellular contexts. Prior results in eukaryotic cells revealed a bias against rRNA transcripts during in situ RT [21], which is mildly recapitulated in our data (Figure S5E). 87% of sense E. coli transcripts were mapped to rRNA, while previous reports [40] and our own bulk data (not shown) found closer to 96% rRNA. Importantly, we observed strong correlations between combined single-cell transcriptomes from PETRI-seq and cDNA libraries prepared by standard RT for both E. coli and S. aureus (Figure 2E,F), despite the capture bias against rRNA. Our single-cell transcriptomes were reproducible, as shown by the strong correlation between the aggregated transcriptomes of GFP-expressing E. coli cells from two independent libraries (Figure 2G).
PETRI-Seq Classifies Single Cells by Growth Stage
We next sought to determine the capacity of PETRI-seq to distinguish between cells in different growth states. As a proof-of-concept, we mixed E. coli cells in two well-characterized growth phases to create a population resembling naturally arising transcriptional heterogeneity. Specifically, we implemented PETRI-seq on a combined population of GFP-expressing exponential and aTc-induced RFP-expressing stationary E. coli (Figure 3A). We applied unsupervised dimensionality reduction (Principal Component Analysis—PCA [41]) to visualize the low-dimensional structure underlying the diversity of transcriptional states. For the PCA calculation, we considered only cells containing at least 15 mRNAs to avoid spurious effects from cells with extremely low mRNA content. Without considering plasmid genes, we observed robust separation of two populations along principal component 1 (PC1). We used the plasmid genes to classify these populations as RFP-containing stationary and GFP-containing exponential cells (Figure 3B, bottom). We assigned a threshold value for PC1 to distinguish between the two populations and found that 99% of plasmid-containing cells below the threshold expressed the GFP plasmid, and 95% of plasmid-containing cells above the threshold expressed the RFP plasmid. Overall, 98% of all plasmid-containing cells were on the expected side of the threshold line. Of the 7374 cells analyzed, 61% did not contain any plasmid transcripts, so their growth state was at first ambiguous (grey points in PCA). However, we used the PC1 threshold to predict the states of the ambiguous cells and found that 89% were stationary cells. Over-representation of stationary cells in the ambiguous population was not surprising as plasmid expression in stationary cells was generally lower than in exponential cells. 
Further analysis of the populations determined by the PC1 threshold revealed a mean of 69.2 and median of 51.0 mRNAs per exponential cell, while each stationary cell contained a mean of 34.9 and median of 29.0 mRNAs (Figure 3C). Previous reports have found that stationary cells express fewer mRNAs than exponential cells [42]. The discrepancy in our data also may be due to reduced mRNA levels upon RFP induction by aTc. Lastly, we showed that separation of the two transcriptional states was similarly robust in another biological replicate (Figure S6A) or when operon counts were normalized using sctransform [43], an alternative method (Figure S6B).
We investigated expression patterns for operons and gene ontology (GO) terms for the two biologically distinct populations. We confirmed that rpoS, the stationary phase sigma-factor [44], and dps, a DNA-binding protein essential for cellular transition into stationary phase [45], were upregulated along PC1, as expected in the direction of stationary cells (Figure 3B, middle). Consistent with induction of the stringent response [46], stationary cells showed a large-scale reduction in ribosomal protein expression as well as an increase in expression of amino acid biosynthetic operons (Figure 3B, Top; Figure 3D). oppABCDF, a highly expressed operon encoding the oligopeptide permease, was most strongly correlated with the transition to stationary phase, based on its PC1 loading (Figure 3D), and has been previously shown to be induced during phosphate starvation [47]. Cytochrome oxidase expression also informed the identification of exponential and stationary phase cells. While stationary E. coli cells expressed higher levels of cytochrome D (cydAB), exponential cells expressed more cytochrome O (cyoABCDE). This shift in cytochrome oxidase expression based on growth phase has been well-characterized [48].
Though PETRI-seq captured ~50 mRNAs per bacterial cell, it was sufficient to identify groups of single cells in the same gene expression state. We hypothesized that transcriptomes from similar cells could be combined to define a consensus state for a particular sub-population and that this characterization could be continuously improved by increasing the number of cells in the library. To test this hypothesis, we generated exponential or stationary phase transcriptomes by aggregating the expression counts for single cells of either type as determined by the first principal component of our PCA (Figure 3B, Bottom) and determined the correlations between these aggregated transcriptomes and independently prepared bulk libraries (Figure S7A,B). We repeated this calculation 1,000 times after sampling different numbers of cells ranging from 50 to 7374 cells (Figure S7C,D). Our analysis confirmed the expectation that as more cells were included in the library, the correlation with an independently prepared bulk library from cells in the same growth state increased. It also appeared that the correlations would continue to increase if more cells were sequenced. Notably, the correlation of either single cell type with both bulk libraries increased as cells were added, but the correlations were stronger and increasing at a greater rate for single-cell/bulk libraries of cells in the same state (colored curves in Figure S7C,D), indicating that the aggregated single cells were approaching a transcriptome reflecting their growth state. This analysis demonstrated that by aggregating many cells with similar expression profiles, PETRI-seq could be used to characterize the transcriptomes of sub-populations that might be otherwise difficult to isolate from bulk RNA-seq.
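The subsampling analysis can be illustrated with a toy simulation. Everything in the sketch below is synthetic and assumed for illustration: the "bulk" profile is random, each simulated cell contributes exactly 40 transcripts drawn in proportion to that profile, and a single draw is made per sample size rather than the 1,000 repetitions used for Figure S7. It shows only the qualitative behaviour, namely that the aggregate of more cells correlates better with the bulk profile.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def aggregate(cells, n_operons):
    """Sum per-operon transcript counts over a collection of cells."""
    totals = [0] * n_operons
    for cell in cells:
        for operon in cell:
            totals[operon] += 1
    return totals

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_OPERONS = 200
# Synthetic stand-in for a bulk expression profile (arbitrary units).
bulk = [rng.uniform(1, 10) for _ in range(N_OPERONS)]
# Each synthetic cell is a list of 40 operon indices sampled in proportion to `bulk`.
cells = [rng.choices(range(N_OPERONS), weights=bulk, k=40) for _ in range(5000)]

results = {}
for n in (50, 500, 5000):
    sample = rng.sample(cells, n)  # subsample n cells without replacement
    results[n] = pearson(aggregate(sample, N_OPERONS), bulk)
    print(n, round(results[n], 3))
```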
PETRI-Seq Discovers A Rare Sub-Population Undergoing Prophage Induction in S. aureus
scRNA-seq enables characterization of rare populations exhibiting distinct gene expression programs and phenotypes. We applied PCA to 5,604 S. aureus single-cell transcriptomes generated by PETRI-seq (Figure S8A) and found that the eight operons most highly correlated with PC1 (Figure S8B) were lytic genes of prophage ϕSA3usa (Figure S8C, red arrows) [49, 50]. Cells expressing these operons diverged from the rest of the population along PC1 (Figure S8A, red points), indicating that PC1 might be capturing rare prophage induction in the S. aureus culture. Within the small population, 3 cells exhibited dramatic upregulation of phage lytic transcripts reaching roughly 80% of these single-cell transcriptomes (Figure S8D). The remaining 18 cells contained fewer than 10% phage transcripts. In further analysis of the heterogeneity in gene expression across the entire S. aureus population, we found that for most operons, transcriptional noise (σ²/μ²) [1] inversely scaled with mean expression (μ) and followed a Poisson distribution (μ = σ²), which has been described in other single cell studies [51, 52]. SAUSA300_1933-1925, a phage lytic operon encoding putative phage tail and structural genes, clearly diverged and exhibited higher noise than expected from the mean (Figure S8E), which recapitulated its hypervariability in expression as found by PCA. As such, we have demonstrated that PETRI-seq can detect rare cells occupying distinct transcriptional states like prophage induction.
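The inverse scaling of noise with mean expression follows directly from the Poisson relationship between variance and mean. A brief derivation, using the symbols from the text and writing η² as shorthand (introduced here, not in the original) for the noise:

```latex
\eta^2 \equiv \frac{\sigma^2}{\mu^2},
\qquad
\sigma^2 = \mu
\;\Longrightarrow\;
\eta^2 = \frac{\mu}{\mu^2} = \frac{1}{\mu},
\qquad
\log \eta^2 = -\log \mu .
```

On log-log axes, Poisson-like operons therefore fall on a line of slope −1, and an operon lying above that line at its mean expression level is noisier than the Poisson expectation.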
Discussion
In this work, we developed PETRI-seq, an affordable method for high-throughput in situ combinatorial indexing and scRNA-seq of bacterial cells. Prokaryotic scRNA-seq tools have lagged behind eukaryotic methods because of the low mRNA content per cell and technical barriers including the thick cell wall and lack of mRNA poly-adenylation. Using PETRI-seq, we characterized single-cell transcriptional states of both Gram-positive and Gram-negative bacterial species. We cost-effectively (Table S1) sequenced ~20,000 single cells, a dramatic improvement in throughput over existing methods, which typically sequence fewer than ten cells. PETRI-seq captured 30-70 mRNAs per average E. coli cell, corresponding to 0.5-2% of total mRNAs. Aggregated transcriptomes from single cells were highly correlated with bulk RNA-seq libraries. Using fluorescently labeled cells, we showed that PETRI-seq assigned >98% of single cells to the correct growth phase (i.e. stationary or exponential) and defined consensus transcriptomes for these growth phases. PETRI-seq also detected rare prophage-induced cells that were present in 0.4% of the S. aureus population. The introduction of PETRI-seq represents a major advance in high-throughput single-cell microbiology.
Further optimization has the potential to increase the capture rate of PETRI-seq and improve its sensitivity. During the library preparation step of PETRI-seq, double-stranded cDNA was subjected to conventional tagmentation with both N5 and N7 adaptors (Illumina Nextera XT). However, only one of the adaptors (N7 in our case) could be subsequently amplified. Modified tagmentation using a commercially available and customizable Tn5 (Lucigen) could increase capture by 2-fold [20]. Capture might be further improved by increasing primer and enzyme concentrations during the RT and ligation steps and/or using a hairpin ligation [20] instead of an inter-molecular linker. Given that rRNAs comprise >95% of total RNA species in many bacteria, we reason that mRNA capture might be additionally improved by designing RT primers with sequences biased against rRNA [53], thereby directing reagents preferentially toward mRNA. Alternatively, in situ 5’-phosphate-dependent exonuclease treatment could be used to preferentially degrade processed RNAs, the majority of these being rRNAs [54], prior to RT. If successful, these modifications would reduce the fraction of sequencing reads mapped to rRNA. Although sequencing depth is not limiting at the current capture rate, it may be necessary to deplete rRNA if overall capture is improved so that libraries can be comprehensively sequenced. For this purpose, abundance-based normalization by melting and rehybridization of double-stranded cDNA followed by duplex-specific nuclease treatment [55] may also be considered.
PETRI-seq detected a rare sub-population undergoing prophage induction in S. aureus, which has important clinical implications, as prophage induction is intimately linked to bacterial pathogenesis. Mobile genetic elements such as temperate phages routinely carry virulence factors, and it has been shown that prophage induction can lead to co-expression of these factors [56, 57]. The high throughput capacity of PETRI-seq was vital for identifying such a rare event, and the dominance of phage lytic transcripts in cells undergoing prophage induction made these cells readily detectable. Future studies could use PETRI-seq to further probe the dynamics of prophage induction and lytic phage infection. It will additionally be of interest to gauge the sensitivity of PETRI-seq to characterize other rare, clinically important populations, such as persisters. Persisters are antibiotic-tolerant cells that typically comprise <1% of an otherwise susceptible bacterial population [58]. The underlying transcriptional state of persisters remains poorly understood. More generally, PETRI-seq could be used to study a wide range of bacterial phenotypes far beyond the examples shown here. We hope that widespread implementation of PETRI-seq to study diverse bacterial species and phenotypes will facilitate greater understanding of single-cell phenomena within bacterial populations.
Methods
Experimental Methods
Bacterial Strains and Growth Conditions
E. coli MG1655 was routinely grown in MOPS EZ Rich defined medium (M2105, Teknova, Hollister, CA). pBbE2A-RFP was a gift from Jay Keasling [38] (Addgene plasmid # 35322). RFP was induced with 20 nM anhydrotetracycline hydrochloride (233131000, Acros Organics, Geel, Belgium). GFP was expressed from prplN-GFP [10]. Plasmid-containing cells were grown in appropriate antibiotics (50 μg/mL kanamycin, 100 μg/mL carbenicillin). S. aureus USA300 [49] was routinely grown in trypticase soy broth (TSB) medium (211825, BD, Franklin Lakes, NJ). All bacterial strains were grown at 37°C and shaken at 300 rpm.
Custom Primers Used in this Study
All single-tube primers are shown in Table S2. All primer sequences for 96-well split-pool barcoding are shown in Table S3. Primers were purchased from Integrated DNA Technologies (IDT, Coralville, IA).
Preparation of Ligation Primers
Round 2 and Round 3 ligation primers (Table S3) were diluted to 20 μM. Linkers SB80 and SB83 were also diluted to 20 μM. To anneal barcodes to linkers, 96-well PCR plates (AB0600, Thermo Scientific) were prepared with 4.4 μL of 20 μM linker, 0.8 μL water, and 4.8 μL of each barcode. Primers were annealed by heating the plate to 95°C for 3 minutes then decreasing the temperature to 20°C at a ramp speed of −0.1°C/second.
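As a quick check on the annealing recipe, the final concentrations in each well follow from the stated volumes. This is a sketch of the arithmetic only, assuming both the linker and the barcode primer are at 20 μM as described above; the function name is illustrative.

```python
import math

def final_concentration_um(stock_um: float, volume_ul: float, total_ul: float) -> float:
    """Concentration after diluting `volume_ul` of stock into `total_ul` total."""
    return stock_um * volume_ul / total_ul

# Per-well volumes from the annealing recipe above (microlitres).
linker_ul, water_ul, barcode_ul = 4.4, 0.8, 4.8
total_ul = linker_ul + water_ul + barcode_ul

linker_final = final_concentration_um(20, linker_ul, total_ul)
barcode_final = final_concentration_um(20, barcode_ul, total_ul)
print(f"total {total_ul:.1f} uL, linker {linker_final:.1f} uM, barcode {barcode_final:.1f} uM")
# prints: total 10.0 uL, linker 8.8 uM, barcode 9.6 uM
```

By this arithmetic the barcode primer is in slight molar excess over the linker in each 10 μL well.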
Primers SB84 and SB81 were also annealed (to form an intramolecular hairpin) prior to blocking by heating 50 μL or 80 μL, respectively, of each 100 μM primer to 94°C and slowly reducing the temperature to 25°C.
Cell Preparation for PETRI-Seq
For sequencing and qPCR measurements, cells were grown overnight then diluted into fresh media (1:100 for S. aureus and prplN-GFP E. coli, 1:50 for pBbE2A-RFP E. coli) with inducer and antibiotics when applicable. For exponential cells, E. coli and S. aureus cultures were grown for approximately 2 hours until reaching an OD600 of 0.4 or 0.9, respectively. Exponential E. coli cells were used for all qPCR optimization experiments. For stationary cells, pBbE2A-RFP E. coli cells were grown an additional 3 hours until the culture reached an OD600 of 4. For the combined exponential E. coli library, 3.5 mL of exponential GFP E. coli was combined with 3.5 mL of exponential RFP E. coli. The S. aureus library was prepared separately from 7 mL of exponential cells. For the 2 libraries of exponential GFP E. coli combined with stationary RFP E. coli, 3 mL of exponential GFP cells was added to ~300 μL of stationary RFP cells. Before fixation, cells were pelleted at 5525xg for 2 minutes at 4°C. Spent media was removed, and cells were resuspended in 7 mL of ice-cold 4% formaldehyde (F8775, Millipore Sigma, St. Louis, MO) in PBS (P0195, Teknova). This suspension was rotated at 4°C for 16 hours on a Labquake Shaker (415110, Thermo Scientific).
Fixed cells were centrifuged at 5525xg for 10 minutes at 4°C. The supernatant was removed, and the pellet was resuspended in 7 mL PBS supplemented with 0.01 U/μL SUPERase In RNase Inhibitor (AM2696, Invitrogen, Carlsbad, CA), hereafter referred to as PBS-RI. Cells were centrifuged again at 5525xg for 10 minutes at 4°C then resuspended in 700 μL PBS-RI. Subsequent centrifugations for cell preparation were all carried out at 7000xg for 8-10 minutes at 4°C. Cells were centrifuged, then resuspended in 700 μL 50% ethanol (2716, Decon Labs, King of Prussia, PA) in PBS-RI. Cells were next washed twice with 700 μL PBS-RI, then resuspended in 105 μL of 100 μg/mL lysozyme (90082, Thermo Scientific, Waltham, MA) or 40 μg/mL lysostaphin (LSPN-50, AMBI, Lawrence, NY) in TEL-RI (100 mM Tris pH 8.0 [AM9856, Invitrogen], 50 mM EDTA [AM9261, Invitrogen], 0.1 U/μL SUPERase In RNase inhibitor [10x more than in PBS-RI]). Cells were permeabilized for 15 minutes at room temperature (~23°C). After permeabilization, cells were washed with 175 μL PBS-RI then resuspended in 175 μL PBS-RI. 100 μL was taken for subsequent steps and centrifuged, while the remaining 75 μL was discarded. Cells were resuspended in 40 μL DNase-RI buffer (4.4 μL 10x reaction buffer, 0.2 μL SUPERase In RNase inhibitor, 35.4 μL water). 4 μL of DNase I (AMPD1, Millipore Sigma) was added, and cells were incubated at room temperature for 30 minutes. To inactivate the DNase I, 4 μL of Stop Solution was added, and cells were heated to 50°C for 10 minutes with shaking at 500 rpm (Multi-Therm, Benchmark Scientific, Sayreville, NJ). After DNase inactivation, cells were pelleted, washed twice with 100 μL PBS-RI, then resuspended in 100 μL 0.5x PBS-RI. Cells were counted using a hemocytometer (DHC-S02 or DHC-N01, INCYTO, Chungnam-do, Korea).
Split-Pool Barcoding for PETRI-Seq
For RT, Round 1 primers (Table S3) were diluted to 10 μM then 2 μL of each primer was aliquoted across a 96-well PCR plate. A mix was prepared for RT with 240 μL 5x RT buffer, 24 μL dNTPs (N0447L, NEB, Ipswich, MA), 12 μL SUPERase In RNase Inhibitor, and 24 μL Maxima H Minus Reverse Transcriptase (EP0753, Thermo Scientific). 3 × 10^7 cells were added to this mix. For species-mixed libraries, E. coli and S. aureus cells were combined at this point. Water was added to bring the volume of the reaction mix to 960 μL. 8 μL of the reaction mix was added to each well of the 96-well plate already containing RT primers. The plate was sealed and incubated as follows: 50°C for 10 minutes, 8°C for 12 seconds, 15°C for 45 seconds, 20°C for 45 seconds, 30°C for 30 seconds, 42°C for 6 minutes, 50°C for 16 minutes, 4°C hold. After RT, the 96 reactions were pooled into one tube and centrifuged at 10,000xg for 20 minutes at 4°C. The supernatant was removed.
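The per-well quantities implied by this recipe can be worked out directly. The sketch below is arithmetic only and rests on stated assumptions: the cell input is read as 3 × 10^7 cells, the cells are evenly suspended in the 960 μL mix, and each well receives exactly the volumes given above. The constant names are illustrative.

```python
# Quantities taken from the RT recipe above.
TOTAL_CELLS = 3 * 10**7       # assumption: cell input read as 3 x 10^7
MIX_VOLUME_UL = 960           # final volume of the RT mix, cells included
MIX_PER_WELL_UL = 8           # mix dispensed per well
PRIMER_PER_WELL_UL = 2        # Round 1 primer already in each well
PRIMER_STOCK_UM = 10          # Round 1 primer stock concentration
N_WELLS = 96

# Assumes cells are evenly suspended in the mix.
cells_per_well = TOTAL_CELLS * MIX_PER_WELL_UL // MIX_VOLUME_UL
mix_used_ul = MIX_PER_WELL_UL * N_WELLS
mix_left_over_ul = MIX_VOLUME_UL - mix_used_ul
reaction_volume_ul = MIX_PER_WELL_UL + PRIMER_PER_WELL_UL
primer_final_um = PRIMER_STOCK_UM * PRIMER_PER_WELL_UL / reaction_volume_ul

print(cells_per_well)      # 250000 cells per well
print(mix_used_ul)         # 768 uL dispensed across the plate
print(mix_left_over_ul)    # 192 uL of overage
print(primer_final_um)     # 2.0 uM primer in each 10 uL reaction
```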
For the first ligation, cells were then resuspended in 600 μL 1x T4 ligase buffer (M0202L, NEB). The following additional reagents were added to make a master mix: 7.5 μL water, 37.5 μL 10x T4 ligase buffer, 16.7 μL SUPERase In RNase Inhibitor, 5.6 μL BSA (B14, Thermo Scientific), and 27.9 μL T4 ligase. 5.76 μL of this mix was added to each well of a 96-well plate containing 2.24 μL of annealed Round 2 ligation primers. Ligations were carried out for 30 minutes at 37°C. After this incubation, 2 μL of blocking mix (37.5 μL 100 μM SB84, 37.5 μL 100 μM SB85, 25 μL 10x T4 ligase buffer, 150 μL water) was added to each well, and reactions were incubated for an additional 30 minutes at 37°C. Cells were then pooled into a single tube.
The following reagents were added to the pooled cells for the third round of barcoding: 15.6 μL water, 48 μL 10x T4 ligase buffer, and 13.2 μL T4 ligase. 8.64 μL of this mix was added to each well of a 96-well plate containing 3.36 μL of annealed round 3 ligation primers. The plate was incubated for 30 minutes at 37°C. 10 μL of round 3 blocking mix (72 μL 100 μM SB81, 72 μL 100 μM SB82, 120 μL 10x T4 ligase buffer, 336 μL water, 600 μL 0.5 M EDTA) was added to each well. Cells were then pooled into a single tube and centrifuged at 7000xg for 10 minutes at 4°C. The supernatant was removed, and the pellet was resuspended in 50 μL TEL-RI to wash the pellet. This suspension was centrifuged at 7000xg for an additional 10 minutes at 4°C, the supernatant was removed, and the cells were resuspended in 30 μL TEL-RI. Cells were counted using a hemocytometer. Aliquots of 10,000 cells were taken and diluted in 50 μL lysis buffer (50 mM Tris pH 8.0, 25 mM EDTA, 200 mM NaCl [AM9759, Invitrogen]). 5 μL of proteinase K (AM2548, Invitrogen) was added to the cells in lysis buffer. Cells were lysed for 1 hour at 55°C with shaking at 750 rpm (Multi-Therm). Lysates were stored at −80°C.
Library Preparation for PETRI-Seq
Lysates were purified with AMPure XP beads (A63881, Beckman Coulter, Brea, CA) at a 1.8x ratio (~99 μL). cDNA was eluted in 20 μL water. 14 μL water, 4 μL NEBNext Second Strand Synthesis Reaction Buffer, and 2 μL NEBNext Second Strand Synthesis Enzyme Mix (E6111S, NEB) were added to the purified cDNA. This reaction was incubated at 16°C for 2.5 hours. The resulting double-stranded cDNA was purified with AMPure XP beads at a 1.8x ratio (~72 μL). cDNA was eluted in 20 μL water and used immediately for tagmentation or stored at −20°C.
cDNA was tagmented and amplified using the Nextera XT DNA Library Preparation Kit (FC-131-1096, Illumina, San Diego, CA). The manufacturer’s protocol was followed with the following modified reagent volumes and primers: 25 μL TD, 20 μL cDNA, 5 μL ATM, 12.5 μL NT, 2.5 μL N70x (Nextera Index Kit v2 Set A, TG-131-2001, Illumina), 2.5 μL i50x (E7600S, NEB), 20 μL water, 37.5 μL NPM. Libraries were amplified for 8 cycles according to the manufacturer’s protocol. After 8 cycles, 5 μL was removed, added to a qPCR mix (0.275 μL EvaGreen [31000, Biotium, Fremont, CA], 0.11 μL ROX Low Reference Dye [KK4602, Kapa Biosystems, Wilmington, MA], 0.115 μL water), and further cycled on a qPCR machine. qPCR amplification was used to determine the exponential phase of amplification, which occurred after 11 cycles for all libraries presented here. The remaining PCR reaction (not removed for qPCR) was thermocycled an additional 11 cycles, resulting in a total of 19 PCR cycles. Products were purified with AMPure XP beads at a 1x ratio and eluted in 30 μL water. The concentration of the library was measured using the Qubit dsDNA HS Assay Kit (Q32854, Invitrogen) and the Agilent Bioanalyzer High Sensitivity DNA kit (5067-4626, Agilent, Santa Clara, CA).
Libraries were sequenced for 75 cycles with the NextSeq 500/550 High Output Kit v2.5 (20024906, Illumina). Cycles were allocated as follows: 58 cycles read 1 (UMI and barcodes), 17 cycles read 2 (cDNA), 8 cycles index 1, 8 cycles index 2.
Modifications Tested to Optimize PETRI-Seq
To test fixing cells immediately from cultures without centrifugation, ice-cold 5% formaldehyde in PBS was added directly to cells in spent media to bring the final concentration of formaldehyde to 4%. Cell preparation with no lysozyme or no DNase was carried out by simply omitting the enzyme and using water to replace that volume.
Template switching was carried out by adding 2.5 μL 100 μM SB14, 20 μL Maxima H Minus 5x Buffer, 10 μL dNTPs, 2.5 μL SUPERase In RNase Inhibitor, 2 μL Maxima H Minus Reverse Transcriptase, 3 μL water, and 20 μL betaine (J77507VCR, Thermo Scientific) to 40 μL of AMPure purified lysate. SB14 was heated to 72°C for 5 minutes prior to combining the above reagents. The reaction was incubated at 42°C for 90 minutes then heat inactivated at 85°C for 5 minutes. The reaction was purified with AMPure XP beads at a 1.8x ratio and eluted in 30 μL. The purified cDNA was then amplified by setting up the following PCR: 10 μL 5x PrimeSTAR GXL Buffer, 0.1 μL 10 μM SB86, 0.1 μL 10 μM SB15, 1 μL PrimeSTAR GXL Polymerase (R050B, Takara Bio, Kusatsu, Japan), 1 μL dNTPs, and 8 μL water. The reaction was heated to 98°C for 1 minute and then thermocycled 10 times (98°C 10 seconds, 60°C 15 seconds, 68°C 6 minutes). The products were purified by AMPure XP beads at a 1.8x ratio and eluted in 30 μL. The DNA concentration was measured using the Qubit dsDNA HS Assay Kit, and tagmentation was performed according to the manufacturer’s protocol using the appropriate primers (described above for standard PETRI-seq).
qPCR Quantification After In Situ DNase or In Situ RT
For qPCR quantification after in situ RT, cells were counted prior to RT, and then the in situ RT reaction described above (scaled to one 50 μL reaction) was set up with equal cell numbers for each condition and technical replicate. A random hexamer (SB94) or a gene-specific primer (SB10) was used as an RT primer. After RT, cells were centrifuged at 7,000xg for 10 minutes then washed in 50 μL PBS-RI. After one wash, cells were resuspended in 50 μL lysis buffer, and 5 μL of proteinase K was added. Cells were lysed for 1 hour at 55°C with shaking at 750 rpm. For qPCR quantification after in situ DNase treatment, cells were washed twice after DNase treatment, as described for PETRI-seq cell preparation, then lysed.
Unpurified lysates were diluted 50x (except for ethanol vs. no ethanol, which were diluted 10x) in water and heated to 95°C for 10 minutes to inactivate proteinase K. Diluted lysates were then used directly in qPCR with either Kapa 2x MasterMix Universal (KK4602, Kapa Biosystems) or Power SYBR Green Master Mix (4368706, Applied Biosystems, Foster City, CA). For quantification of genomic DNA after DNase treatment or quantification of cDNA after RT with random hexamers, qPCR primers SB5 and SB6 were used, and relative abundances were calculated based on an experimentally determined amplification efficiency of 88%, which corresponded to an amplification factor of 1.88. Relative abundance was thus defined as 1.88^(−ΔCt), where ΔCt was the difference between the Ct value of each sample and a calibrator Ct. For RT with the gene-specific primer, qPCR primers SB12 and SB13 were used, as SB12 anneals to the gene-specific primer (SB10). The experimentally determined amplification factor for these primers was 1.73. To quantify cDNA yield, the abundance of a matched sample with no RT (processed equivalently but RT enzyme omitted) was subtracted from each measurement. All replicates were technical replicates, which were treated independently during and after the condition tested.
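The relative-abundance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names and the example Ct values are hypothetical, and only the amplification factors (1.88 and 1.73) and the no-RT subtraction come from the text.

```python
def relative_abundance(ct_sample, ct_calibrator, amplification_factor=1.88):
    """Relative abundance = amplification_factor ** (-(Ct_sample - Ct_calibrator))."""
    delta_ct = ct_sample - ct_calibrator
    return amplification_factor ** (-delta_ct)


def cdna_yield(ct_rt, ct_no_rt, ct_calibrator, amplification_factor=1.88):
    """Abundance of an RT sample minus the abundance of its matched no-RT control."""
    with_rt = relative_abundance(ct_rt, ct_calibrator, amplification_factor)
    without_rt = relative_abundance(ct_no_rt, ct_calibrator, amplification_factor)
    return with_rt - without_rt


# Hypothetical Ct values: the RT sample crosses threshold 5 cycles before its no-RT control.
print(cdna_yield(ct_rt=20.0, ct_no_rt=25.0, ct_calibrator=20.0))
# For the gene-specific primer pair, pass amplification_factor=1.73 instead.
```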
qPCR Quantification of Ligation Efficiency
To test barcode ligation with a 16-base linker relative to a 30-base linker, approximately 1 μg of purified RNA (bulk) was used for RT with either SB110 or SB114 (used as a positive control). RT was carried out as described for in situ RT, scaled to 50 μL. cDNA was then purified with AMPure XP beads. SB113, the primer to be ligated, was annealed either to SB111 (30 bases) or SB83 (16 bases). 2.24 μL of the annealed primers was then used in a 10 μL ligation reaction. The products were purified with AMPure XP beads. To quantify the proportion of ligated product, qPCR was performed with SB86 and SB13, which amplifies only the ligated product, as SB86 anneals to the ligated overhang, or SB115 and SB13, which amplifies all RT product, as SB115 anneals to the RT primer overhang. ΔΔCt was calculated for the two primer sets with RT product from SB114 as a reference [ΔΔCt = ΔCt(experimental, ligated) − ΔCt(control, SB114 RT), ΔCt = Ct(SB86,SB13) − Ct(SB115,SB13)]. SB114 includes primer sites for both SB86 and SB115, so it mimics ligation with 100% efficiency.
Test of DNase Inactivation by Incubating Cells with Exogenous DNA
After DNase treatment, inactivation, and two PBS-RI washes (described above), cells were resuspended in 20 μL PBS-RI. 6 μL was removed and added to 1 μL DNase reaction buffer, 1 μL water, and 2 μL of a 775 bp PCR product (800 ng). As a control, 1 μL DNase I was added instead of 1 μL water. The reactions were incubated for 1 hour, after which 1 μL of stop solution was added. The cells were centrifuged for 10 minutes at 7,000xg. The supernatants were then heated to 70°C for 10 minutes to inactivate DNase. 5 μL of each reaction was run on a gel.
Bulk Library Preparation
For preparation of bulk samples from fixed cells, 25 μL of cells was taken after PETRI-seq cell preparation and just prior to in situ RT. These cells were centrifuged and resuspended in 50 μL lysis buffer supplemented with 5 μL proteinase K. Cells were lysed at 55°C for 1 hour with shaking at 750 rpm (Multi-Therm). RNA was then purified from lysates with the Norgen Total RNA Purification Plus Kit (48300, Norgen Biotek, Ontario, Canada). 300 μL buffer RL was added to the lysate before proceeding to the total RNA purification protocol. Alternatively, the standard bulk RNA sample (shown in Figure S3B) was prepared by centrifuging a cell culture at 5525xg for 2 minutes at 4°C then resuspending cells in 1 mL of PBS-RNAprotect (333 μL RNAprotect Bacteria Reagent [76506, Qiagen, Hilden, Germany], 666 μL PBS). Resuspended cells were then pelleted, and RNA was prepared with the Norgen Total RNA Purification Plus Kit according to the manufacturer’s instructions for Gram-negative bacteria.
Purified RNA from either protocol was treated with DNase I in a 50 μL reaction consisting of 2-5 μg RNA, 5 μL DNase Reaction Buffer, 5 μL DNase, and water. Reactions were incubated at room temperature for 30-40 minutes. Reactions were purified by adding 300 μL buffer RL and proceeding according to the Norgen total RNA purification protocol. Total RNA was depleted of rRNA using the Gram-Negative Ribo-Zero rRNA Removal Kit (MRZGN126, Illumina), purified by ethanol precipitation, and resuspended in 10 μL water. For RT, 6 μL RNA was combined with 4 μL Maxima H Minus 5x Buffer, 2 μL dNTPs, 0.5 μL SUPERase In RNase Inhibitor, 1 μL SB94, 0.5 μL Maxima H Minus Reverse Transcriptase, 4 μL betaine, and 2 μL water. The reaction was thermocycled as follows: 50°C for 10 minutes, 8°C for 12 seconds, 15°C for 45 seconds, 20°C for 45 seconds, 30°C for 30 seconds, 42°C for 6 minutes, 50°C for 16 minutes, 85°C 5 minutes, 4°C hold. For second strand synthesis, 14 μL water, 4 μL NEBNext Second Strand Synthesis Reaction Buffer, and 2 μL NEBNext Second Strand Synthesis Enzyme Mix were added directly to the RT mix. This reaction was incubated at 16°C for 2.5 hours. Double-stranded cDNA was purified with AMPure XP beads at a 1.8x ratio (~72 μL beads) and eluted in 30 μL water. Purified cDNA was used for tagmentation with the Nextera XT kit according to the manufacturer’s protocol. Bulk libraries were purified twice with AMPure XP beads at a 0.9x ratio. The resulting libraries were quantified and sequenced as described for PETRI-seq libraries above.
Growth Curves
Overnight cultures were grown as described above and then diluted 1:100 into 1 mL EZ Rich Defined Media with or without 20 nM aTc. Antibiotics were added for plasmid-containing strains. For each condition, 100 μL of diluted cells were aliquoted into 4 wells of a 96-well plate. The plate was incubated at 37°C with shaking on the plate reader (Synergy Mx, Biotek, Winooski, VT). OD600, GFP, and RFP were measured every 10 minutes.
Computational Methods
Barcode Demultiplexing, Cell Selection and Alignment
Cutadapt [59] was used to trim low-quality read 1 and read 2 sequences with phred score below ten. Surviving read pairs of sufficient length were grouped based on their three barcode sequences using the cutadapt demultiplex feature. FASTQ files were first demultiplexed by barcode 1, requiring that matching sequences were anchored at the end of the read, overlapped at 8 positions (--overlap 8), and had no more than 1 mismatch relative to the barcode assignment (−e 0.2). For barcode 2 and then barcode 3, cutadapt was used to locate barcode sequences with the expected downstream linker, allowing no more than 2 mismatches (−e 0.2 --overlap 20/21). The final output after demultiplexing was a set of read 1 and read 2 FASTQ files where each file corresponded to a three-barcode combination (BC). The “knee” method [15] was used to identify BCs for further processing. Briefly, each BC was sorted by descending total number of reads, and then the cumulative fraction of reads for each BC was plotted. Because the yield per BC could be better assessed later after collapsing reads to UMIs, an inclusive threshold was used at this stage to select BCs for downstream processing, which allowed for more precise cell selection after downstream processing (Figure S4C). For selected BCs, umi_tools [60] was used in paired-end mode to extract the seven base UMI sequence from the beginning of read 1. Cutadapt was then used to trim and discard read 2 sequences containing barcode 1 or the linker sequence. Note that at this point all necessary information was contained in the read 2 FASTQ files, so further processing did not consider the read 1 files. Next, cDNA sequences were aligned to reference genomes using the backtrack algorithm in the Burrows-Wheeler Alignment tool, bwa [61], allowing a maximum edit distance of 1 for assigned alignments.
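The ranking step behind the “knee” plot can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline's code: the function names and the `n_keep` cutoff are hypothetical, and in practice the cutoff would be read off the plotted curve rather than hard-coded.

```python
def cumulative_read_fractions(reads_per_bc):
    """Sort BCs by descending read count; return (sorted BCs, cumulative read fractions)."""
    ranked = sorted(reads_per_bc.items(), key=lambda item: item[1], reverse=True)
    total = sum(count for _, count in ranked)
    bcs, fractions, running = [], [], 0
    for bc, count in ranked:
        running += count
        bcs.append(bc)
        fractions.append(running / total)
    return bcs, fractions


def select_bcs(reads_per_bc, n_keep):
    """Keep the n_keep highest-yield BCs (n_keep is a hypothetical, inclusive cutoff)."""
    bcs, _ = cumulative_read_fractions(reads_per_bc)
    return bcs[:n_keep]


# Toy read counts per three-barcode combination.
toy = {"bc_A": 50, "bc_B": 30, "bc_C": 15, "bc_D": 5}
print(cumulative_read_fractions(toy))
print(select_bcs(toy, n_keep=2))
```

Plotting the cumulative fractions against BC rank gives the curve whose bend separates high-yield BCs from background; an inclusive cutoff simply places `n_keep` to the right of that bend.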
Annotating Features and Grouping PCR Duplicates by Shared UMI
FeatureCounts [62] was used to annotate operons based on the alignment position. Operon sequences were obtained from RegulonDB [63] and ProOpDB [64] for E. coli and S. aureus, respectively. Because featureCounts uses an “XT” sam file tag for annotation, the bwa “XT” tag was first removed from all sam files using a python script. The resulting bam files after featureCounts were used as input for the group function of umi_tools with the “--per-gene” option in directional mode [60]. The directional algorithm is a network-based method that identifies clusters of connected UMI sequences to group as single UMIs. The result was a set of bam files with UMI sequences corrected based on probable errors from sequencing or amplification. A python script was used to collapse reads to UMIs. Reads with the same BC, error corrected UMI, and operon assignment were grouped into a single count. Reads mapping to multiple optimal positions were omitted except rRNA alignments for which multiple alignments were expected. The distribution of number of reads per UMI for all UMI-BC-operon combinations was plotted to establish a threshold below which UMIs were excluded (Figure S4C). Filtered UMIs were used to generate an operon by BC count matrix. Anti-sense transcripts were removed. BCs with fewer than 40 total UMIs were then removed (Figure S4E).
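The collapsing and filtering steps can be illustrated with a short sketch. It assumes reads have already been reduced to (BC, error-corrected UMI, operon) tuples; the function name and the `min_reads_per_umi` parameter are hypothetical (the text does not state the read threshold), while the 40-UMI cutoff per BC is taken from the text. Anti-sense and multi-mapping filters are omitted.

```python
from collections import Counter, defaultdict


def build_count_matrix(reads, min_reads_per_umi=1, min_umis_per_bc=40):
    """Collapse reads to UMI counts and drop low-yield BCs.

    reads: iterable of (bc, corrected_umi, operon) tuples, one per aligned read.
    Returns {bc: {operon: umi_count}}.
    """
    # Reads sharing BC, error-corrected UMI, and operon count as one molecule.
    reads_per_umi = Counter(reads)

    matrix = defaultdict(Counter)
    for (bc, _umi, operon), n_reads in reads_per_umi.items():
        if n_reads >= min_reads_per_umi:  # exclude UMIs below the read threshold
            matrix[bc][operon] += 1

    # Remove BCs with fewer than min_umis_per_bc total UMIs.
    return {
        bc: dict(counts)
        for bc, counts in matrix.items()
        if sum(counts.values()) >= min_umis_per_bc
    }


toy_reads = [
    ("bc1", "AAA", "opA"), ("bc1", "AAA", "opA"),  # PCR duplicates of one molecule
    ("bc1", "CCC", "opA"), ("bc1", "GGG", "opB"),
    ("bc2", "TTT", "opA"),
]
print(build_count_matrix(toy_reads, min_umis_per_bc=2))
```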
Bulk Sequencing Libraries
For bulk sequencing libraries, only read 2 was used for alignment in order to mimic single-cell methods. Bulk sequencing libraries were pre-processed to remove adapters using cutadapt [59]. Trimmomatic [65] was then used to remove leading or trailing bases with Phred quality below 3 (phred33 encoding) and to discard reads shorter than 14 bases. Surviving reads were aligned using the backtrack algorithm in bwa [61] with a maximum edit distance of 1. Reads with more than one optimal alignment position were removed. FeatureCounts [62] was used to generate a matrix of operon counts for the bulk libraries. To compare single-cell libraries generated by PETRI-seq to bulk samples, the UMI counts for a given set of BCs (e.g. GFP-expressing E. coli) were summed for all operons. A count matrix was then generated as described for bulk libraries. To calculate TPM, raw counts were divided by the length of the operon in kilobases. Then, each length-adjusted count was divided by the sum of all adjusted counts divided by 1 million.
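The two-step TPM calculation can be written out as a short sketch. The function name and the toy counts and operon lengths are hypothetical; the arithmetic follows the description above.

```python
def counts_to_tpm(counts, lengths_bp):
    """counts: {operon: raw count}; lengths_bp: {operon: operon length in bases}."""
    # Step 1: divide each raw count by the operon length in kilobases.
    per_kb = {op: counts[op] / (lengths_bp[op] / 1000) for op in counts}
    # Step 2: divide each length-adjusted count by (sum of adjusted counts / 1 million).
    scale = sum(per_kb.values()) / 1_000_000
    return {op: value / scale for op, value in per_kb.items()}


# Toy example: opB has twice the counts of opA but is four times longer.
tpm = counts_to_tpm({"opA": 10, "opB": 20}, {"opA": 1000, "opB": 4000})
print(tpm)
```

By construction the TPM values of a library sum to one million, which makes summed single-cell UMI counts and bulk read counts comparable on the same scale.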
Calculating Multiplet Frequency
The multiplet frequency was defined as the fraction of non-empty BCs corresponding to more than one cell. To calculate the predicted multiplet frequency, the proportion of predicted BCs with 0 cells was calculated based on a Poisson process: $P(0) = e^{-\lambda}$, the proportion of BCs with 1 cell was calculated: $P(1) = \lambda e^{-\lambda}$, and the proportion with greater than 1 cell was calculated: $P(\geq 2) = 1 - P(1) - P(0)$. Finally, the multiplet frequency was calculated: $P(\geq 2) / (1 - P(0))$. Here $\lambda$ was the fraction of cells relative to total possible BCs (the number of cells in the library divided by the number of possible BCs, which is $96^3$ for three rounds of 96 barcodes). The experimental multiplet frequency was computed from the species-mixing experiment as described for populations with unequal representation of two species [35].
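As a worked illustration of the predicted multiplet frequency, the sketch below assumes that the number of cells per BC follows a Poisson distribution whose mean is the number of cells divided by the number of possible BCs. The function name is hypothetical, and the example values (10,000 cells, three rounds of 96 barcodes) are chosen for illustration from the library sizes described in the methods.

```python
import math


def predicted_multiplet_frequency(n_cells, n_barcodes):
    """Fraction of non-empty BCs expected to contain more than one cell."""
    lam = n_cells / n_barcodes      # mean cells per BC
    p0 = math.exp(-lam)             # P(0): empty BCs
    p1 = lam * math.exp(-lam)       # P(1): BCs with exactly one cell
    p_multi = 1 - p1 - p0           # P(>=2): BCs with more than one cell
    return p_multi / (1 - p0)       # condition on the BC being non-empty


# 10,000 cells distributed over 96 * 96 * 96 possible three-barcode combinations.
print(predicted_multiplet_frequency(10_000, 96 ** 3))
```

For small loading densities the result is close to half the mean occupancy, so this example gives a predicted multiplet frequency of roughly 0.6%.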
Principal Component Analysis (PCA)
rRNA and all plasmid genes (RFP, GFP, AmpR, KanR, tetR) were first removed from the count matrix. Operons with 5 or fewer total counts in the library were also removed (except for Figure S7 in which all operons with >0 counts were included). Cells with fewer than 15 mRNAs were removed. Operon counts for each cell were normalized by dividing each count by the total number of counts for that cell and then multiplying the resulting value by the geometric mean [20] of the total mRNA counts across all cells. The scaled values were then log transformed after adding a pseudocount to each. For each operon, expression values were scaled to z-scores [66]. Principal components were computed using scikit-learn in Python.
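The normalization that precedes PCA can be sketched in plain Python. Several details are assumptions rather than statements from the text: the pseudocount of 1, the natural logarithm, and the use of the population standard deviation for z-scoring. The function name is hypothetical, and the filtering steps and the PCA itself (performed with scikit-learn in the original analysis) are left out.

```python
import math
from statistics import mean, pstdev


def normalize_log_zscore(matrix, pseudocount=1.0):
    """matrix: list of per-cell lists of operon counts (cells x operons).

    Returns a cells x operons list of z-scored, log-transformed values.
    """
    totals = [sum(cell) for cell in matrix]
    # Geometric mean of total counts across cells (each cell must have > 0 counts).
    geo_mean = math.exp(mean(math.log(t) for t in totals))

    # Scale each cell to the geometric-mean depth, add a pseudocount, log-transform.
    logged = [
        [math.log(count / total * geo_mean + pseudocount) for count in cell]
        for cell, total in zip(matrix, totals)
    ]

    # Z-score each operon (column) across cells; constant columns are set to 0.
    n_operons = len(matrix[0])
    columns = [[row[j] for row in logged] for j in range(n_operons)]
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(row[j] - stats[j][0]) / stats[j][1] if stats[j][1] > 0 else 0.0
         for j in range(n_operons)]
        for row in logged
    ]


toy_counts = [[10, 0, 5], [2, 8, 5], [4, 4, 12]]  # 3 cells x 3 operons
z_scored = normalize_log_zscore(toy_counts)
```

The resulting matrix has zero mean and unit variance for every operon, which is the form expected as input to PCA.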
To normalize counts using sctransform in Seurat [43], first rRNA and all plasmid genes were removed from the count matrix. Operons with 10 or fewer total counts, and cells with fewer than 15 mRNAs were also removed. A Seurat object was created in R from the resulting matrix, and sctransform was applied. The resulting scaled counts were used as input for PCA.
Computing Moving Averages of Gene Expression Along PC1
Using a custom Python script, the cells in the normalized, log-transformed, z-scored gene matrix were sorted by PC1. The rolling function in the pandas package was then used to compute rolling averages of the size indicated for each figure. Win_type was set to “None”. The corresponding PC1 coordinate was the moving average of the PC1 values. Moving averages for GO terms were computed as described, except the z-scored sum of z-scored counts for all operons in the GO term was used to calculate the moving average instead of expression from a single operon. In cases where multiple genes from the same operon were included in a GO term, only one gene was included. Significance of expression trends was determined by the Spearman rank correlation between the operon or GO term expression and PC1, prior to calculating a moving average. FDR was determined by the Benjamini-Hochberg procedure [67].
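The sort-then-average step can be illustrated without pandas. This sketch mirrors an unweighted rolling mean and keeps only complete windows, which corresponds to dropping the leading undefined values that a pandas rolling mean produces; the function name and toy values are hypothetical.

```python
def moving_average_along_pc1(pc1, expression, window):
    """Sort cells by PC1, then average PC1 and expression over sliding windows."""
    order = sorted(range(len(pc1)), key=lambda i: pc1[i])
    pc1_sorted = [pc1[i] for i in order]
    expr_sorted = [expression[i] for i in order]

    pc1_avg, expr_avg = [], []
    for end in range(window, len(order) + 1):
        start = end - window
        # Each point is the unweighted mean over `window` consecutive cells.
        pc1_avg.append(sum(pc1_sorted[start:end]) / window)
        expr_avg.append(sum(expr_sorted[start:end]) / window)
    return pc1_avg, expr_avg


# Toy data: four cells with PC1 coordinates and z-scored expression of one operon.
x, y = moving_average_along_pc1([3, 1, 2, 4], [30, 10, 20, 40], window=2)
print(x, y)
```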
Computing Operon Noise
Noise was defined as σ²/μ², where σ is standard deviation and μ is mean. Noise and mean were calculated for all operons with at least 5 raw counts (UMIs) in the dataset (either S. aureus or E. coli). Count matrices were normalized by cell (but not log-transformed) before computing noise and mean. To calculate a p-value for the divergence of SAUSA300_1933-1925, a line was fit to the log-scaled noise vs log-scaled mean of the data. The residuals of the experimental data to the best-fit line were calculated and z-scored. The p-value was determined based on a normal distribution of the z-scored residuals.
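The noise calculation and the residual-based p-value can be sketched as follows. The choices of population variance, base-10 logarithms, an ordinary least-squares fit, and a one-sided (upper-tail) p-value are assumptions made for illustration, since the text does not specify them; the function names and toy values are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev, pvariance


def operon_noise(values):
    """Noise = variance / mean**2 for one operon's normalized counts across cells."""
    return pvariance(values) / mean(values) ** 2


def residual_p_values(means, noises):
    """Fit log(noise) vs log(mean), z-score the residuals, return upper-tail p-values."""
    x = [math.log10(m) for m in means]
    y = [math.log10(n) for n in noises]

    # Ordinary least-squares line through the log-log data.
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx

    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    mu, sd = mean(residuals), pstdev(residuals)
    z_scores = [(r - mu) / sd for r in residuals]
    # Assumed one-sided test: probability of a residual at least this far above the line.
    return [1 - NormalDist().cdf(z) for z in z_scores]


# Toy data: the last operon is far noisier than its mean expression predicts.
p = residual_p_values([1, 10, 100, 1000, 10], [1, 0.1, 0.01, 0.001, 1.0])
print(p)
```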
Authors’ contributions
SB, WJ, and ST conceived the study. SB and WJ performed experiments and data analysis. PO assisted with computational analysis. SB, WJ, and ST wrote the paper.
Supplementary Materials
Table S3: 96-Well Oligonucleotides Used for PETRI-Seq Barcoding (Separate File)
Sequences of 96 round 1 RT primers, 96 round 2 ligation primers, and 96 round 3 ligation primers.
Acknowledgements
We thank the Tavazoie laboratory for helpful discussions and comments on early drafts of the manuscript. ST is supported by award 5R01AI077562 from NIH. SB is supported by NSF award DGE-1644869. WJ is supported by a fellowship from the Jane Coffin Childs Fund.