Abstract
Subcellular RNA localization regulates spatially polarized cellular processes, but unbiased investigation of its control in vivo remains challenging. Current hybridization-based methods cannot differentiate small regulatory variants, while in situ sequencing is limited by short reads. We solved these problems using a bidirectional sequencing chemistry to efficiently image transcript-specific barcode in situ, which are then extracted and assembled into longer reads using NGS. In the Drosophila retina, genes regulating eye development and cytoskeletal organization were enriched compared to methods using extracted RNA. We therefore named our method In Situ Transcriptome Accessibility sequencing (INSTA-seq). Sequencing reads terminated near 3’ UTR cis-motifs (e.g. Zip48C, stau), revealing RNA-protein interactions. Additionally, Act5C polyadenylation isoforms retaining zipcode motifs were selectively localized to the optical stalk, consistent with their biology. Our platform provides a powerful way to visualize any RNA variants or protein interactions in situ to study their regulation in animal development.
Differential hybridization and cloning of spatially separated mRNAs obtained from distinct dissected regions of Xenopus oocytes were reported thirty years ago (1–4). It revealed the role of asymmetrically localized mRNAs and 3’ Un-Translated Regions (UTRs) in development, and similar observations were made in other systems (5). Despite advances in molecular methods (6) unbiased investigation of mRNA localization, 3’UTR isoforms, and RNA-binding proteins (RBPs) in situ remains challenging.
In Drosophila embryos a majority of mRNAs examined by fluorescent in situ hybridization (FISH) have been shown to exhibit distinct subcellular localization patterns (7). During development of the Drosophila retina, epithelial cells organize into a precisely ordered structure composed of approximately 800 hexagonal facets known as ommatidia. Each ommatidium consists of several distinct cell types including a core of eight photoreceptor cells capped by four cone cells and surrounded by two large primary (1°) pigment cells, along with 2° and 3° pigment cells and mechanosensory bristle cells in the perimeter (Fig. 1a) (8–10). The emergence of the precise cellular arrangement in ommatidia has been used to investigate basic principles involved in pattern formation and morphogenesis (11), making it an ideal system for studying RNA localization associated with cellular geometry and tissue patterning.
Multiple methods can measure the transcript abundance in situ in situ (12–22). Yet, it remains difficult to spatially map regulatory elements and RBPs in a systematic and unbiased manner. One approach to mapping regulatory motifs, nucleotide variants, or mRNA isoforms is to synthesize and sequence cDNA directly in the tissue (12). Here we hypothesize that if cDNA termination events can be localized in situ it could even be possible to map nucleotide signatures from mRNA-RBP cross-linking (6) within intact cells.
To achieve this, several technical challenges must be addressed. Sequencing needs to be implemented in such a way to identify both cDNA synthesis start and termination sites. The start site is important in mapping sequences adjacent to alternative polyadenylation sites. The stop site is important to identify regulatory events causing cDNA termination. Paired-end reads give the total cDNA length and hence an indication if its length has been truncated. Full-length coverage of the cDNA fragment enables detection of internal nucleotide substitutions or deletions associated with RNAprotein cross-linking (23).
Here we report a novel bidirectional sequencing chemistry called PRICKLi (Paired-end Ribonucleotide-Inosine Cleaved K-mer Ligation) that sequences two bases simultaneously from both ends of the cDNA fragment. It allows for paired-end in situ reads in one imaging cycle and reduces the number of 12-base barcode imaging cycles to six. To generate full-length cDNA reads, we decoupled imaging from sequencing by reading transcript-associated barcodes in situ, followed by cDNA amplicon extraction and Next-Generation Sequencing (NGS). The shared barcode sequences are then used to map assembled NGS reads to a reference tissue atlas bearing barcode coordinates (14, 22).
Cumulatively, these advances enable spatial mapping of major regulatory sequences, factors, and RNA-protein interactions involved in subcellular mRNA localization, especially in 3’ UTRs with alternative cleavage or polyadenylation (APA) isoforms. We refer to this method as in situ transcriptome accessibility sequencing (INSTA-seq).
Results
Paired-end in situ sequencing and decoupling of read length and imaging
Unlike sequencing-by-synthesis (SBS) that extends the sequencing primer in one direction (5’ to 3’), PRICKLi interrogates both ends of the sequencing primer simultaneously using sequencing-by-ligation (SBL) (Fig. 1b). Fluorescently labeled probes containing mixed bases (K: G/T, R: A/G) along with a dark probe (C) utilize the co-localization pattern of two colors from each amplicon to make a base call (Fig. 1c). To cleave the terminal fluorophore, reverse PRICKLi probe incorporates ribonucleotides (rK, rR, or rC) that are nicked by RNase H2 (Fig. 1d) at the 5’ side (24), while the forward PRICKLi probe incorporates inosine for Endonuclease V-dependent cleavage (25). Because the cleavage position of Endonuclease V is variable, phosphorothioate modifications are used to block undesired cleavage sites (Fig. 1e) (26).
After testing the efficiency of ligation and cleavage on cultured cells and beads (Fig. S1), we sequenced cDNA amplicons in the Drosophila retina (Fig. 1c), demonstrating complete cleavage of the terminal fluorophores by RNase H2 or Endonuclease V after 40 or 10 minutes, respectively, prior to the next ligation cycle (Fig. 1f-h). By cycling through ligation, imaging, and cleavage, we sequenced twelve bases using four fluorescent oligonucleotides in six imaging cycles (Fig. 1i).
We then tested the approach in the Drosophila retina. Dissected Drosophila retina specimens were mounted whole (3-6 per well; 9 wells) onto a cover glass (Fig. 2a). After fixation, genomic DNA digestion (Fig. S2), RT, circularization, and rolling circle amplification (RCA), we sequenced 12 bases from cDNA amplicons on a confocal microscope (40 × magnification, 9-12 tiles per retina, 90-200 z-stacks with a 500-nm step size) (Fig. 2a). Because PRICKLi uses dual color base calling, individual amplicons were segmented and registered in 3D across four channels to account for optical aberration, displacement, or variable ligation. Subsequently, base calls from each amplicon were assigned by clustering (Fig. 2b-c). For segmentation we developed a set of wavelet filters (Fig. 2d) that extract the energy from fluorescent spots with a defined period (e.g. single amplicon size) to suppress background noise (Fig. S2). Because cytosine-cytosine base calls are inferred by the disappearance and reappearance of fluorescence across imaging cycles, such calls are only made after 3D registration (Fig. 2d). Non-rigid 3D point set registration by coherent point drift enables (Fig. 2e) tracking of amplicons across cycles despite optical aberrations or sample displacement. The end result is a full 3D reconstruction of transcript-associated barcode reads in the retina whole-mount (Fig. 2f).
Previously, we observed 10-40% of cDNA reads were generated from mRNA when using random hexamer for RT (12). In contrast, we observe over 99.6% rRNA in retina from third instar larva (Fig. 2g). By using a short NNV-oligo(dT)10 primer instead, we enriched mRNA reads by 155-fold (61%), in addition to long (2%) and short non-coding RNAs (6.5%) (Fig. 2g). In total we detected 820 unique protein-coding genes.
Circular cDNAs generate paired sequences corresponding to reverse transcription (RT) termination and start sites. To characterize internal errors caused by cross-linked nucleotides (23), we used NGS to generate longer cDNA reads. Briefly, PRICKLi-sequenced tissues were lysed, followed by hybridization-based enrichment of cDNA amplicons. After second strand synthesis, multiple displacement amplification (MDA), tagmentation, and tissue multiplexing, we generated paired-end reads on MiSeq (Fig. 2h). Compared to DROPseq of the Drosophila eye disc (Fig. 2i) (27), INSTA-seq enriched for genes involved in neurogenesis, axon development, and cytoskeleton organization (Table S1–2). Interestingly, INSTA-seq detected relatively few ribosomal subunit transcripts, although they comprised the largest fraction in the DROP-seq data (Fig. 2j). The reason for this is not clear but suggests that certain biologically relevant transcripts are more accessible by our method. The number of NGS reads per amplicon (443,304 UMIs in total) followed a geometric distribution with no apparent saturation (Fig. 2k), suggesting that deeper sequencing could recapture more in situ amplicons. Since the number of amplicons determines the frequency of non-unique barcodes (‘collision rate’), both tissue size and amplicon density are important considerations. Although each additional PRICKLi imaging rounds can decrease the collision rate by 28% per imaging round (95% CI: 13-63%) (Fig. 2l), the utility of longer UMIs needs to be weighed against increased imaging time. In the end, we settled on a collision rate of 0.62%, which is compatible with six rounds of 3D imaging across Drosophila whole-mount retinas in a single day.
INSTA-seq for unbiased profiling of cis-acting RNA localization elements
To identify putative 3’ UTR cis motifs, we asked whether synthesized cDNA synthesis terminated pre-maturely adjacent to known cis motifs. Indeed, we found that transcript body coverage was skewed towards the end of 3’ UTRs, suggesting incomplete RT (Fig. 2m) with the cDNA size ranging from 8-nt to 4750-nt; however, most amplicons were <100-nt (mean: 86-nt, s.d.: 229-nt) (Fig. 2n). In fact, only 1.74% of cDNAs reached the coding region, while 15% terminated abruptly in the 3’ UTR with a terminal single-nucleotide mismatch (Fig. 3a). Single-nucleotide mismatches were more frequent at the 3’ end compared to the 5’ end of the cDNA (χ2 274.96, p-value < 10−4) (Fig. 3b), and internal mismatches were rare (< 0.06%) with the majority being deletions rather than substitutions. Most mismatches near the 3’ end were due to a single nucleotide (66.22%), and no more than eight nucleotides long (1.73%) (Fig. 3c). Unlike cytosine-tailing by M-MuLV reverse transcriptase (28), the incorporation of cytosine or homopolymer tailing was infrequent, while consecutive mismatches reflected the AU-rich elements in the 3’UTR (Fig. 3c). These results suggest that short cDNA fragment are truncated cD-NAs rather than cDNAs synthesized from fragmented RNAs.
Since RT terminations result when RNA is cross-linked to RBP, we mapped the location of 52 Drosophila RBP recognition motifs (29) defined by a Position Specific Scoring Matrix (PSSM) score (30). By clustering reads using the probability of an RBP motif within the 80-nt window from the RT termination site, we observed a subset of reads (15.1%) with an increase in the RBP motif probability (Fig. 3d). We named cDNAs termination sites with an RBP recognition motif as DORM (Downstream-Of-RBP-Motif). Out of 820 detected genes, we found 19 genes in the DORM cluster, including four ribosomal proteins (RpL28, RpS16, RpS19a, RpS3A), four cytoskeleton-associated proteins (pod1, zip, Act5C, Dhc16F), three proteins related to the Hippo pathway (git, CG45050, GC42788), glec (cell adhesion), stau (3’ UTR binding, A/P-axis specification), and Zip48C (solute carrier) (Table S2). For each gene, we calculated the ratio of cDNA reads terminating at or passing through predicted RBP motifs and ranked them by the frequency of DORM reads (Fig. 3e). We found that DORM cDNAs exhibited a shorter cDNA length (t = −5.25, p-value < 0.0001) than non-DORM reads, containing more single-nucleotide mismatches at the 3’end (OR: 2.16, CI: 1.73 - 2.68, p-value < 0.0001). These properties were seen at the population level as well as at the level of individual genes (Fig. 3h-i). To identify cis-acting motifs at the DORM site in an unbiased manner, we clustered RBP motif z-scores from DORM reads (Fig. 3j) to identify probable RBP motifs (e.g. msi, Sf1, Tra2, CG7804) (Fig. 3k), which were color-coded for spatial analysis (Fig. 3j).
INSTA-seq can detect alternative splicing and polyadenylation (APA) isoforms
Tissue-specific alternative splicing or polyadenylation (APA) of the 3’ UTR is wide-spread (31). In addition, APA serves to generate 3’UTR isoforms with an alternative set of cis-regulatory motifs affecting mRNA localization, degradation, or translation. However, spatially mapping APA isoforms has been challenging using in situ hybridization (ISH) due to a relatively small difference in their size, in addition to the abundance of repetitive or low-complexity sequences in the 3’UTR (32).
To detect APA events in the Drosophila retina, we examined cDNA sequences near annotated polyadenylation sites. The nucleotide composition of the sequence flanking the last aligned position (LAP) was consistent with that of previously reported polyadenylation sites (33, 34) with U-rich Cleavage stimulation factor (CstF)-binding sites downstream and an A-rich polyadenylation signal (PAS) upstream (Fig. 3l). The CstF protein is necessary for the cleavage and polyadenylation, as well as for 3’ end processing of mRNAs. These results indicated that our Drosophila dataset was suitable for mapping sites of cleavage and polyadenylation (CPA). Subsequently, we identified ten genes that had at least two APA isoforms detected in the retina (Act5C, Bacc, CG11360, Con, His4r, Nrg, Oda, ogre, Rtnl1, tws), four of which are known to undergo tissue- or stage-specific extension of the 3’ UTR (35, 36) (Fig. 3m).
To see if APA changes the composition of regulatory elements in shorter 3’UTRs compared to full-length 3’UTRs, we examined proximal and distal 3’UTR sequences flanking the CPA site. The median length for proximal 3’ UTRs was 385-nt (mean: 572-nt, s.d.: 441-nt) and 1,114-nt for distal 3’ UTRs (mean: 1,526-nt, s.d.: 1,177-nt). The median ratio of fragment sizes between proximal vs. distal 3’ UTRs was 46% (Fig. 3n), consistent with the hump observed across oligo(dT)-primed reads in the proximal 3’ UTR (Fig. 3a). Next, we performed motif analysis using k-mer-based differential enrichment for all proximal vs. distal 3’ UTRs pairs. The analysis revealed significant differential enrichment of C/A-rich sequences in the distal Act5C 3’ UTR (‘ACAC’, p-value < 0.01; ‘CACA’, p-value < 0.05) (Fig. 3o). The fraction of alternative isoforms ranged from 25% to over 95% across the ten genes, while Act5C expressed similar levels of short (Act5C-RA, 54%) and long isoforms (Act5C-RD, 46%) (Fig. 3p).
Act5C is one of six actin family genes in Drosophila, and its ortholog is the human cytoplasmic β-actin (ACTB). In Drosophila, the expression of Act5C is tissue- and developmental stage-specific (32, 37). The human ACTB mRNA is localized near the leading edge of migrating fibroblasts (38) as well as the growth cone (39), facilitating local translation and actin polymerization. The sub-cellular localization of the ACTB mRNA is mediated by the IGF2-mRNA binding protein (IMP1/ZBP1) (40, 41). IMPs are a family of homologous RBPs that are conserved from insects to mammals (42). Studies have examined IMP1-binding motifs using cross-linking and immunoprecipitation (CLIP), and such assays have identified C/A-rich sequences, as well as CA dinucleotide motifs recognized by the third KH domain of IMPs (43). The IMP1 Drosophila ortholog dIMP binds two A/C-ACA motifs separated by 0-30 nucleotides (44), which were detected in the distal Act5C 3’ UTR but not in the short Act5C isoform (Fig. 3q). Since dIMP knockout leads to multiple defects in cell migration and growth cone dynamics (44), it implies that APA could regulate Act5C mRNA localization through ZIP/IMP cis-acting motifs during neuronal development (45).
INSTA-seq maps transcripts, regulatory motifs, and APA tissue-wide with subcellular resolution
INSTA-seq is compatible with immunohistochemistry or fluorescent proteins, enabling one to examine localization patterns relative to the protein of interest. In our case, we imaged α-catenin-GFP (Fig. 4a) prior to in situ sequencing, and we subsequently applied a deep Conditional Adversarial Network (46) (Fig. S4) to automatically segment cell-cell boundaries of the three major cell-types that define the ommatidia morphology. This enabled us to assign individual cDNA amplicons within each individually segmented cell (Fig. 4b-c). Cell-cell boundaries in an individual ommatidia was then used to automatically register individual cDNA amplicons onto a subcellular geometric reference atlas coordinate system (Fig. 4d-e).
In total, 774 ommatidium boundaries were segmented, except near the optical stalk where the GFP signal was low. 573 ommatidia were segmented into 2292 cones cells, 1155 primary cells, and 3945 inter-ommatidia cells (IOC) (Fig. 4f). We used the GFP signal in the apical planes to assign planar polarity to each ommatidia as well as tissue-wide developmental axes (dorsal-ventral) and the equator (Fig. 4g). In addition, we used individual cell morphology to compute the first two principal components of the cell contour and generated a two-dimensional subcellular polarity coordinate (Fig. 4h). Multidimensional reduction of cell contours provides an unbiased way to extract average cell morphologies and clusters (Fig. 4i). We validated our computational framework by examining known cell type-specific markers. The adhesion proteins Roughest (Rst) and Hibris (Hbs) control cellular organization through cell sorting by differential adhesion (47). Hbs is preferentially expressed in primary pigment cells, while rst is preferentially expressed in IOCs. Preferential Hbs-Hbs interactions between 1° pigment cells, and Hbs-Rst interactions between 1° cells and IOCs contribute to sorting of 1° cells from IOCs. Although our pilot dataset did not have deep sequencing coverage (Fig. 4j), the relative expression levels of hsb and rst was consistent with the reported protein expression (OR: 0.33, CI: 0.13 - 0.80, p-value < 0.008; Fig. 4j). At the current sequencing depth statistically significant polarization of cis motifs or 3’ UTR-RBP interactions was not found (Fig. 4k); however, we expect more sequencing data to reveal asymmetrically localized regulatory sequences and interactions.
To define the expression patterns of the short and long Act5C isoforms, we first performed an unbiased point pattern analysis. We found that the proximal isoform aggregated at a spatial scale of 30 µm, while the distal isoform aggregated on larger spatial scales (Fig. 4l). At smaller spatial scales (< 50 µm) the two isoforms tended to co-clustered, while on larger spatial scales the two isoforms dispersed away from each other as shown by a negative cross Ripley’s L function (Fig. 4l), which measures if two point patterns are independent or clustered compared with complete spatial randomness. On closer examination this dispersion was due to higher aggregation of the distal isoform in the optic stalk of the retina (OR: 9.52, CI: 7.53-12.10, p-value < 0.0001), Fig. 4m), a region known for supporting retinal glial cells migrating from the optic stalk into the eye disc (48). This is consistent with the zipcode consensus motifs in the distal 3’UTR in RNA localization and subsequent local translation of actin supporting cell migration (49).
Discussion
Cell type- or context-dependent interactions between RNA, and proteins are essential to identifying functional genetic elements and deciphering their function and modes of regulation. To do so, we developed INSTA-seq for mapping mRNA-RBP interactions with sub-cellular resolution in situ. The INSTA-seq method is separated into three parts: 1) UMI sequencing in situ using PRICKLi for scalable imaging, 2) mated-pair NGS for longer reads and 3) spatial mapping of mRNAs with subcellular resolution. Notably, the sequencing chemistry we developed can be customized to a wide range of applications beyond INSTA-seq. Consistent with the previous observations obtained using FISSEQ (12), we discovered that INSTA-seq detects tissue- or pathway-specific gene clusters despite utilizing a minimal number of sequencing reads. It also contains cross-linking signatures for mapping 3’ UTR-RBP interactions across the tissue in situ. By placing sequencing reads into morphology-based reference bins, locally enriched 3’ UTR-RBP interactions can be identified with sub-cellular resolution with more reads. In addition to mRNA-protein binding, we found that alternative polyadenylation (APA) alters zipcode binding protein motifs in the Act5C 3’ UTR, revealing a functional consequence of tissue-specific APA (45). Finally, tissue- or pathway-specific transcripts are selectively amplified while abundant ribosomal protein mRNAs escape detection. We hypothesize that a subset of the transcriptome is more open to dynamic regulation, affecting their accessibility to reagents specific to INSTA-seq. We are now investigating RNA-protein interaction and RNA localization patterns across cell types and perturbations to identify the source of detection bias.
METHODS
PRICKLi probes
All oligos were synthesized by Integrated DNA Technologies, Inc. (IDT). Full design and ordering specifications are outlined in Table S3–6. Reverse set PRICKLi probes due to their single ribonuclotide were ordered as custom RNA oligos. All other oligonucleotides were ordered as custom DNA oligos.
Tissue dissection and mounting
Drosophila retinas expressing UAS-α-Catenin::GFP with the eye-specific GMR-GAL4 driver (50) for highlighting cell outlines were dissected at 26 to 28 hours after puparium formation (APF). Retinas were then fixed, washed in phosphate buffer, and mounted with the retina apical surface facing an acid-washed cover glass coated with Bio-Bond tissue adhesive (Ted Pella Inc.). Each cover glass contained 9 wells and several retinas were mounted per well. The retinas were partially dried on the cover glass to facilitate tight boding with the glass surface. Then, the coverslips were fitted with an adhesive silicon isolator (Grace Bio-Labs) to simplify subsequent liquid handling and tissue imaging.
Cell culture
Immortalized human astrocytes (51) were grown in Dulbecco’s Modified Eagle’s Medium (DMEM) plus 10% fetal bovine serum (FBS; Invitrogen) on fibronectin-coated (F1141, SigmaAldrich) plates. WA09 Fucci hESC (52) were cultured in mTeSR1 (85850, STEM-CELL Technologies) with with 1× RevitaCell ROCK inhibitor (A2644501, Thermo Fisher) and Matrigel Matrix (EMD Millipore).
In situ library preparation
Frozen retinas were thawed and washed in 1x PBS in DEPC H2O (750023, ThermoFisher). Post-fixation was done in 10% formalin (HT5011-1CS, Sigma-Aldrich) for 15 min followed by three washes in 1xPBS in DEPC H2O. Tissues were permeabilized in 0.3% TritonX-100 (93443, Sigma-Aldrich) in DEPC H2O with 0.5 U/µl of SUPERase (AM2696, ThermoFisher Inc) for one hour. After permeabilization the nuclear envelope was permeabilized with 0.1 HCl in DEPC H2O for 45 min followed by three washes in 1xPBS in DEPC H2O.
Genomic DNA (gDNA) was digested by 5 µl of TURBO DNase (AM1907, ThermoFisher) in 175 µl nuclease-free water with 20 µL of 10× Turbo DNA-free buffer (AM1907, ThermoFisher) incubated at 37°C for one hour. After gDNA digestion it is important to inactivate DNase I with by resuspending with DNase Inactivation Reagent (0.2 of the volume) into the reaction mix. Incubate for 30 min at room temperature on a shaker. The sample was rinsed with ice cold ddH2O several times. The inactivation reagent leaves white precipitate which was thoroughly removed by several washes with nuclease-free water before moving to the next step.
Next, primer hybridization was done with 2.5 µM of anchored oligodT10 INSTA-seq reverse transcription primer in DEPC-2xSSC with 0.5 U/µl of SUPERase (AM2696, ThermoFisher Inc) overnight at room temperature. In the morning the sample was placed on ice, and the reverse transcription buffer reaction mix without enzymes was prepared on ice [in DEPC-H2O, 1× M-MuLV RT buffer (P7040L, Enzymatics), 250 µM dNTP, 25 mM (N2050L, Enzymatics), 40 µM Aminoallyl-dUTP, 4 mM (83203, AnaSpec)]. The primer was aspirated on ice, and the sample was preincubated on ice with the RT reaction mix without any enzyme for 15 min. The sample was aspirated on ice, and the revere transcription reaction mix was added with enzymes (same as above but with 5 U/µl of M-MuLV [P7040L, Enzymatics], 0.5 U/µl of SUPERase). The sample was incubated for 15 min on ice, followed by 15 min at 25°C and then at 37°C overnight.
The following day the sample was gently aspirated and 20 µl of BS(PEG)9 (21582, ThermoFisher) dissolved in 980 µl of 1xPBS pH 8 (CHP-300, Boston BioProducts), which was then added to the sample to crosslink newly synthesized cDNA for one hour. BS(PEG)9 was then aspirated and quenched by incubating in 1 M Tris-HCl pH 8.0 (AM9855G, ThermoFisher) for 30 min. To remove RNA in the RNA:cDNA duplex the sample was incubated for one hour at 37°C with 125 mU/µl RNase H (Y9220F, Enzymatics) and 500 ng/µl RNase A (11579681001, Roche) in 1× Rnase H buffer (Y9220F, Enzymatics). After RNA removal the sample was thoroughly rinsed with nuclease-free water three times. The sample was placed on ice and preincubated with the CircLigase II reaction buffer on ice for 5 min (Nuclease-free H2O, 1× CircLigase buffer, 2.5 mM MnCl2, 0.5 M Betaine, [all included in kit from CL9025K, Epicentre]). After aspiration the CircLigase II reaction mix (same buffer composition as previous but with 1U/µl of CircLigase II enzyme, CL9025K, Epicentre) was added to the sample and incubated for two hours at 60°C. The sample was aspirated and prehybridized with 1 µM oligodT10 exonuclease protected RCA primer in 2xSSC with 30% (vol/vol) formamide (221198, SigmaAldrich) at 60°C for two hours. The sample was then aspirated and incubated first for 15 min at 60°C with 10% (vol/vol) formamide in 2xSSC, followed by just 2xSSC for another 15 min at 60°C. The sample was allowed to cool to room temperature and then placed on ice. The sample was then preincubated with the RCA reaction mix without enzyme on ice for 5 min followed by the RCA reaction mix with high concentration phi29 polymerase (1 φ29 buffer, 250 µM dNTPs, 40 µM Aminoallyl-dUTP, 1 U/µl high-concentration φ29 DNA polymerase, P7020-HCL, Enzymatics) incubated at 30°C for no less than 16 hours. After RCA the sample was gently aspirated, and BSPEG9 dissolved in 1xPBS pH 8 was used to fixate rolling circle products (RCPs) for one hour. This is followed by aspiration and quenching with 1 M Tris-HCl pH 8.0 for 30 min followed by wash in 1xPBS. After the fixation in situ cDNA library quality is ready to be assessed by hybridizing a fluorescent sequencing primer onto the RCPs by first heating up 2.5 µM of fluorescent sequencing primer in 5xSASC (0.75 M sodium acetate, 75 mM tri-sodium citrate, adjusted pH to 7.5 using acetic acid) to 80°C and adding the primer solution onto the sample. After cooling to room temperature for 15 min, several washes are performed with 5xSASC, and then the sample is ready to be imaged. The RCP density, roundness, brightness and overall sample quality are examined before in situ sequencing. Endogenous fluorescent proteins are imaged at this step to ensure best signal to noise before any ligation or cleaving of fluorescent sequencing probes is performed.
In situ sequencing
In SBL sequencing is partitioned into ligation cycles and hybridization rounds. In ligation cycles the sequencing primer is extended by ligation. In hybridization rounds a new primer (usually a recessed version of a primer from previous primer round) is hybridized and then extended for a predetermined set of ligation cycles until the entire extended primer is stripped away and a new round is started. PRICKli uses three primer rounds and two ligation cycles in each direction for each primer round. It is important that forward probes are ligated before reverse probes.
First, any remaining fluorescent sequencing primer is stripped away with stripping buffer 80% (vol/vol) formamide (221198, SigmaAldrich) in H2O and 0.01% (vol/vol) Triton X-100 pre-heated to 80°C and incubated for 5 min twice with three washes of constant flow of 10 ml 5xSASC afterwards. Sequencing primers (2.5 µM) are heated to 80°C in 5xSASC and directly applied onto the tissue to anneal at room temperature for 15 min followed by three washes in 5xSASC afterwards. Ligation of forward probes is then done with 2.5 µM of equimolar forward probes in 1× T4 DNA ligase buffer and 6 U/µl T4 DNA ligase (L6030-LC-L, Enzymatics) for minimum of 45 minutes followed by three washes of constant flow of 10 ml 5xSASC afterwards. Then reverse probes are ligated the same way followed by three washes of constant flow of 10 ml 5xSASC afterwards. The sample is then ready to be imaged for a first round. After imaging the probes are cleaved to enable a second ligation cycle.
Cleaving is done with 0.2 U/µl of Endonuclease V (m0305, NEB) and 0.2 U/µl of RNase H2 (M0288L, NEB) in 1× NEBuffer 4 (both enzymes in the same reaction). Fluorophores will be cleaved within 5 minutes but to minimize dephasing it is adviced to let the reaction run to completion of 45 min. Wash extensively with constant flow of 10 ml 5xSASC afterwards three times afterwards. The sample is now ready for a new ligation cycle.
After two ligation cycles on a sequencing primer the extended sequencing primer is cleaved and stripped away with stripping buffer 80% (vol/vol) formamide (221198, SigmaAldrich) in in H2O and 0.01% (vol/vol) Triton X-100 preheated to 80°C and incubated for 5 min two times with three washes of constant flow of 10 ml 5xSASC afterwards. A new set of sequencing primers (2.5 µM in 5xSASC) are then hybridized by heating primers to 80°C and directly apply them onto the tissue to anneal at room temperature for 15 min followed by three washes in 5xSASC afterwards.
Image acquisition
In situ sequencing was performed on a Nikon Ti-E Inverted Microscope with PFS3 Yokogawa CSU-X1 spinning disc confocal (Nikon). FOV pixel size of 1600 × 2048 px and a theoretical pixel resolution of 0.11 µm using a 40 × NA 1.1 CFI Lambda S Apo LWD 4 Water Immersion Objective Lens (MRD77410, Nikon) (9-12 tiles per retina, 90-200 z-stacks with a 500-nm step size). With LUNV NIDAQ lasers (488 nm, 561 nm, 594 nm, 640 nm) with 8900 Sedat Quad dichroic mirrors (425-477, 503-542, 571-628 and 661-728 nm).
In situ base calling
Raw TIFF image stacks were first corrected for chromatic shift (53) and then flat-field corrected and segmented using wavelet filters from WholeBrain R package (54). After segmentation of 3D connected components across z-stacks overlap between channels for individual amplicons was calculated using Manders co-occurrence (55) between all pixels in 3D for a single amplicon. The co-occurrence index was then weighted by the fluorescent intensity using probit regression and the final score was used to cluster individual amplicons into single base calls using K-means clustering in R.
In situ registration
At each imaging cycle all amplicon centroids in 3D detected in all channels were registered back onto previous imaging cycle using the Coherent Point Drift equation (56) implemented in C++. The resulting correspondence index vector was used to match back amplicons across imaging cycles. Amplicons that disappeared and reappeared across imaging cycles were designated as cytosine-cytosine base calls for the image cycle where they disappeared. Auto-correlation function across all amplicons and cycles was used to subtract any remaining dephasing and base calls were re-clustered by K-means clustering. The final set of base call reads for individual amplicons in addition to quality metrics of base calling based on cluster separation was written to paired-end R1 and R2 FASTQ files.
NGS library preparation
After in situ sequencing the amplicon quality was inspected visually, and any fluorescent probes were stripped and thoroughly washed away. The tissue quality was inspected under bright-field microscopy. Since BS(PEG)9 fixation makes the tissue hardened, a pair of small forceps (11255-20, Fine Science Tools) was used to loosen the tissue from the coverslip. Next, the tissue was placed into an Eppendorf tube with 20 µl of lysis buffer (200mM KOH, 40mM DTT, 5mM EDTA). The Eppendorf tube was placed in a 95°C heat block for 10-20 minutes until the tissue was determined to be completely lysed by visual inspection. For cell cultures the lysis buffer was added directly to the well and pipetted up and down, and the pipette tip was used to scratch away cells from the glass bottom. The lysis buffer and cells were added to the Eppendorf tube and placed on a heat block. The well plate was inspected under bright-field to make sure all cells were removed. After lysis the tube was vortexed, and 20 µl of neutralization buffer (400 mM HCL, 600 mM Tris-HCl pH7.5) was added to the tube, followed by 10-minute incubation on ice. When the neutralization buffer has been added, the entire 40 µl solution can be stored at −80°C until further processing.
Next, 250 µl Dynabead M-270 Streptavidin (65306, ThermoFisher Inc) was washed on a magnetic stand three times in 2xSSC. Biotinylated INSTA-seq pulldown oligo was thawed and diluted in 2xSSC (2.5 µM, 5 µl in 195 µl 2xSSC). Dynabead was placed on the magnetic stand for 5 min, and 2xSSC was aspirated on the stand. The pulldown oligo 2xSSC solution was added to the beads and placed on a shaker for 15 min. The tube was then placed on the magnetic stand for 5 min, and any 2xSSC was aspirated on the stand. Immediately, the 40 µl solution containing the lysed in situ amplicon tissue was added to the beads and filled to 200 µl with 2xSSC. The tube was placed on an 80°C heat block for 5 min and then was placed to cool down at room temperature for 15 min and then placed on ice. The tube was placed on the magnetic stand for 5 min, and the supernatant was aspirated on the magnetic stand and washed with 2xSSC two times on the magnetic stand.
Next, second strand synthesis was performed by removing any supernatant on the magnetic stand and by immediately adding the ice cold phi29 second strand synthesis reaction mix (176 µl of nuclease-free water, 20 µl phi29 10x buffer, 2 µl of 25 mM dNTPs, 2 µl of Phi29 DNA polymerase [100 U µl-1]) and placed in a 30°C incubator overnight.
The following day the tube was placed on the magnet for 5 min and the supernatant was aspirated and discarded. While on the magnet 10 µl of 2x annealing buffer (10 mM Tris pH8, 50 mM NaCl, 1 mM EDTA) was added to the beads and gently pipetted up and down several times. Exonuclease-resistant random hexamer primers 1 µl (Thermo SO181) was added into the reaction mix and the beads were filled up with nuclease-free water to 20 µl in total. The tube with the beads, exonuclease-resistant random hexamers and annealing buffer was placed in a 95°C heatblock for 5 min followed by cooling down on ice to 4°C degrees. While on ice the multiple displacement amplification (MDA) reaction mixture was made by supplementing the 20 µl bead solution with 5 µl of 10x phi29 DNA polymerase reaction buffer, 2 µl of 25 mM dNTPs, 21 µl of nuclease-free water, and 2 µl of phi29. The tube was placed in a 30°C incubator, and one microliter was withdrawn from the sample after three hours to check that amplification was successful by measuring the dsDNA concentration using Qubit dsDNA HS Assay Kit (Thermo Q33230) on a Qubit 4 Fluorometer (Thermo Q33238). If amplification was successful the sample was further amplified at 30°C for 10 hours followed by a 3-minute incubation at 65°C to inactivate the phi29 polymerase.
dsDNA concentration of the final sample was measured again using Qubit 4 Fluorometer, and the Nextera DNA FLEX kit (20018704, Illumina) libraries were prepared. Briefly, dsDNA was tagmented by bead-linked transposomes followed by clean up. Index adapters (FC-131-2002, Illumina) were added to multiplex different tissues or samples and tagmented DNA was amplified using five cycles of PCR according to the manufacturer’s instructions. The amplified libraries were double-sided bead purified, followed by quality control using 2100 Bioanalyzer with a High Sensitivity DNA kit (Agilent). The samples were pooled and diluted and prepared according to MiSeq kit specifications (either v3 or v2). We evaluated both 75-bp and 300-bp read length, and the benefit of 300-bp is seen for longer cDNA lengths greater than 85 nt.
Alignment
Reads containing the in situ sequencing adapter were split using removesmartbell.sh from BBmap (57). Any adapters were trimmed, and paired-end reads were then aligned to the Drosophila melanogaster genome version BDGP6 (Ensembl) using either BBMap (57), HISAT2 (58) or STAR (59). For comparison to published single-cell RNA-seq data (27), STAR was used as aligner. Reads aligning to the same position with identical forward and reverse barcode but of different read widths were binned and assigned as individual in situ amplicons. Individual in situ amplicons were assigned to genomic features using featureCounts (60).
DORM reads clustering
cDNA termination site was assigned to each read and 51 RNA binding protein motifs from the RBPmap full list iand dIMP (44) were mapped to the 3’ UTR for each gene using RBPmap (30). The bedtracks of z-score prediction summaries for each RBP recognition motif in a window of 80-nt around the cDNA termination site was used as input for hierarchical clustering. Euclidean dis-similarities were calculated for the resulting matrix and the function hclust with Ward method from R was used to hierarchical cluster the reads.
Segmentation of cell boundaries using alpha-catenin with generative adversial networks
One and a half retina were manually annotated with individual cell boundaries drawn based on alpha-catenin GFP signal. Regions of interests were manually drawn in ImageJ, binarized and saved as 256×256 tiles together with bandpassed filtered raw α-catenin-GFP signal as a training set to pytorch implementation of pix2pix (46).
Registration of ommatidia and amplicons to reference atlas ommatidium
Pixels of segmented cell boundaries from alpha-catenin::GFP were registered to pixels of a standardized reference atlas ommatidia using the Coherent Point Drift equation (56) implemented in C++. The resulting transformation field was then used to transform the location of individual in situ amplicons to the reference atlas. Before registration the planar polarity of each ommatidium was taken into consideration by simple reflection such that both dorsal and ventral ommatidia share the same chirality in the reference atlas.
Multidimensional reduction of cellular morphology
Cell boundaries were segmented based on previous described adversial network (46) on alpha-catenin-EGFP signal. Pixel coordinates for each individual contour was first reduced to 200 pixels with spsample using the sp library in R. Next, the first two principal components of each contour were computed and the intersection of the first principal component was used as zero-index of the contour. For primary pigments cells because of their convex morphology the first principal component was replaced by the medial axis. Both x and y coordinates were appended into a 400 dimensional matrix where each row represents an individual cell boundaries and the first 200 columns represents x-values from the contour and the last 200 columns represents y-values from the contour. Both x and y values were centered by the intersection of the first two principal components and then scaled by the standard deviation. The resulting 7242 x 400 array was then used as an input into the t-SNE algorithm using Rtsne package in R.
In vitro optimization
In vitro reactions of ligation and cleaving was optimized with fluorescent oligos using readout with electrophoresis of denatured ligated or cleaved product on 15% TBE-urea polyacrylamide gel and visualized by Typhoon FLA 7000 imager (GE Lifesciences).
Data analysis
All analysis and plotting was done in R (3.5.3) with statistical tests including Fisher’s exact test, chisquare goodness of fit, student t-test and linear regression. For spatial analysis Ripley’s L was calculated using the spat-stat R package.
ACKNOWLEDGEMENTS
We would like to acknowledge the following individuals for their help T. Chan (MSKCC) for providing cell lines, K. Lao (ThermoFisher) for SOLiD reagents, M. Hammell (CSHL) for computing, and K. Mir (XGenomes) for advice on SBL. We would also like to thank various members of Lee, Wigler, and Hicks labs for their help on MiSeq. This work was supported by NIGMS R35 RFA-GM-003, The V Foundation, STARR Cancer Consortium, CSHL-Northwell translational grant, and Pershing Square Scholarship to J.L. and by the National Institutes of Health NIGMS GM129151 to V.H.