Scalable and deep profiling of mRNA targets for individual microRNAs with chimeric eCLIP

Our expanding knowledge of the roles small regulatory RNAs play across numerous areas of biology, coupled with the promise of RNA-targeted therapies and small RNA-based medicines, creates an urgent need for tools that can accurately identify and quantify small RNA:target interactions at scale. MicroRNAs (miRNAs) are a major class of small RNAs in plants and animals. The experimental capture of miRNA:mRNA interactions by ligation into chimeric RNA fragments in chimeric CrossLinking and ImmunoPrecipitation (CLIP) provides a direct readout of miRNA targets that can be profiled by high-throughput sequencing. Despite the power of this approach, widespread adoption of chimeric CLIP has been slow due to both methodological complexity and limited recovery of chimeric molecules (particularly beyond the most abundant miRNAs). Here we describe chimeric eCLIP, in which we integrate a chimeric ligation step into AGO2 eCLIP to enable chimeric read recovery. We show that the cumbersome polyacrylamide gel and nitrocellulose membrane transfer step common to CLIP techniques can be omitted for chimeric AGO2 eCLIP to create a simplified high-throughput version of the assay that maintains high signal-to-noise. With the increased yield of recovered miRNA:mRNA interactions in no-gel chimeric eCLIP, we show that simple enrichment steps using either PCR or on-bead probe capture can be added to chimeric eCLIP in order to target and enrich libraries for chimeric reads specific to one or more miRNAs of interest in both cell lines and tissue samples, resulting in 30- to 175-fold increases in recovery of chimeric reads for miRNAs of interest. We further demonstrate that the same probe-capture approach can be used to recover miRNA interactions for a targeted gene of interest, revealing both distinct miRNA targeting as well as co-targeting by several miRNAs from the same seed family.
RNA-seq analysis of gene expression following miRNA overexpression confirmed miRNA-mediated repression of chimeric eCLIP-identified targets and indicated that probe-enriched chimeric eCLIP can provide additional sensitivity to detect regulated targets among genes that either contain or lack computationally predicted miRNA target sites. Thus, we believe that chimeric eCLIP will be a useful tool for quantitative profiling of miRNA targets in varied sample types at scale, and for revealing a deeper picture of the regulatory networks of specific miRNAs of biological interest.

Highlights
No-gel chimeric eCLIP improves recovery of miRNA:mRNA interactions by 70-fold
Probe- and PCR-enrichment deeply profiles mRNA targets of miRNAs of interest
Chimeric eCLIP targets experimentally identify non-computationally predicted interactions
Increased depth recovers ∼6 million miRNA:target chimeras in HEK293T


Introduction
MicroRNAs (miRNAs) are small non-coding RNAs that regulate target genes via complementarity to messenger RNAs (mRNAs), resulting in post-transcriptional repression of hundreds of mRNAs. Regulation via miRNA-mediated repression of gene expression has been shown to be involved in nearly every physiological system, and misregulation of miRNA biology has been implicated in a broad spectrum of diseases ranging from cancer to neurodegenerative diseases (Quinlan et al., 2017;Rupaimoole and Slack, 2017). Many miRNAs also display tissue-, cell type-, or condition-specific expression patterns and play key roles in the regulation of developmental programs (DeVeale et al., 2021;Manakov et al., 2009). Consequently, miRNAs have become attractive tools and targets for biomedical advancements. Currently several small molecules and antisense oligos that target miRNA biogenesis as well as miRNA mimics themselves are in clinical trials as candidate therapies for diseases such as non-small cell lung cancer, keloid, chronic hepatitis C, cutaneous T-cell lymphoma and Alport's syndrome (Rupaimoole and Slack, 2017;Zhu et al., 2020). Thus, the repertoire of miRNA targets is a key determinant of the biological role of a given miRNA (Ebert and Sharp, 2012). Active research and development in the area of RNA-targeted therapies creates a need for tools that can accurately profile miRNA:mRNA target interactions in different cell cultures and tissues at scale.
Generally, miRNAs exert their repressive regulatory function by guiding the RNA-induced silencing complex (RISC) to complementary target sites in the 3′ untranslated region (UTR) of target mRNAs resulting in mRNA degradation, translation inhibition, or sequestration (Bartel, 2018). Building upon this principle of sequence complementarity, various algorithms have been developed to predict miRNA:mRNA interactions throughout the transcriptome (Agarwal et al., 2015;Krek et al., 2005). Computational approaches typically focus on a small set of key features, including sequence complementarity particularly in nucleotides 2-8 (commonly referred to as the 'seed' region of the miRNA), and sequence conservation across species. However, many verified targets do not meet these standard criteria (Gebert and MacRae, 2019), and the reliance on conservation limits detection of species-specific interactions.
Experimental identification of direct miRNA interactions on a large scale has proven more challenging. Recent efforts have shown success at identifying candidate miRNA target sites in vitro using purified RISC complex pre-loaded with a miRNA of interest, followed by high-throughput RNA binding assays (Becker et al., 2019;McGeary et al., 2019). Although these studies allow unprecedented depth in exploring the binding kinetics of individual miRNAs, it is important to pair these approaches with methods to experimentally validate the presence of miRNA:target interactions in vivo. Such profiling of miRNA targets in cell culture or tissues typically relies on immunoprecipitation (IP) of Argonaute (AGO) protein components of the RISC, followed by converting associated RNA into libraries that can be subjected to high-throughput sequencing in order to quantify association. Early methods including RNA Immunoprecipitation (RIP) and CrossLinking and ImmunoPrecipitation (CLIP) of AGO proteins provided the first broad view of the miRNA interaction landscape, revealing principles of miRNA regulation mechanisms as well as an overview of mRNAs regulated by miRNAs (Chi et al., 2009;Hafner et al., 2010). Although these approaches do not explicitly identify the miRNA which recruits the RISC complex to identified AGO2 binding sites, further development of computational methods addressed this limitation by utilizing analysis of sequence and altered binding upon miRNA over-expression or knockdown to predict these specific miRNAs (Erhard et al., 2013;Majoros et al., 2013). Although the combination of AGO IP and computational analysis has the advantage of enabling prediction of interactions for all miRNAs in one experiment, there remain many situations where experimental mapping of direct miRNA:target interactions would be preferred.
To enable this direct experimental mapping of microRNA:target interactions, a suite of chimeric CLIP methods including CrossLinking And Sequencing of Hybrids (CLASH), modified iPAR-CLIP, and CLEAR-CLIP (Broughton et al., 2016;Grosswendt et al., 2014;Helwak et al., 2013;Moore et al., 2015) were developed. These methods use computational analysis coupled with experimental modification of the standard CLIP method to generate and identify miRNA:mRNA 'chimeric' reads which reflect ligation of a miRNA with the target RNA that the miRNA is bound to. This snapshot of in vivo miRNA:mRNA interactions provided a unique ability to characterize the principles of miRNA:target binding through both seed and auxiliary region base-pairing, and provided insight into how these rules can impact functional regulation. However, despite the obvious power of direct experimental identification of miRNA targets, widespread adoption of these methods remains limited, as both the poor recovery of AGO-crosslinked RNA and the low fraction of chimeric reads make it difficult to deeply profile the interactome of a miRNA of interest at a reasonable cost. We reasoned that the recent development of improved CLIP methods that increase the recovery of protein-bound RNA by more than a thousand-fold over prior CLIP methods (Lee et al., 2021;Van Nostrand et al., 2016;Zarnegar et al., 2016) represented an opportunity to expand the utility of chimeric CLIP approaches.
Here we describe chimeric eCLIP coupled with AGO2 immunoprecipitation, which builds upon the improvements we described in eCLIP by modifying the standard eCLIP assay with an added ligation step to encourage generation of chimeric reads from both cell lines (HEK293T) and tissues (mouse liver). Further, we find that in the case of chimeric AGO2 CLIP, omitting the SDS-PAGE and nitrocellulose membrane transfer steps dramatically increases recovery of miRNA chimeric reads and simplifies the workflow, and rigorously validate that this approach maintains high signal-to-noise ratio. With this increased recovery, we show that chimeric eCLIP can be combined with PCR-based or antisense oligonucleotide probe capture to enrich libraries for chimeric reads specific to one or more miRNAs or genes of interest, enabling deep profiling of miRNA regulatory networks of interest with a robust and simplified chimeric CLIP protocol.

Design
Here, we combine chimeric CLIP methods, where chimeric fragments directly link miRNA and target RNA transcript within the same sequencing read to unambiguously identify miRNA targets, with the methodological (library preparation) improvements in eCLIP and library capture/enrichment approaches to develop technologies that enable scalable and deep profiling of miRNA targets, particularly for individual miRNAs of interest.

Integration of chimeric ligation with eCLIP
Although chimeric ligation of small RNAs with their targets can occur at low frequency during standard CLIP (Broughton et al., 2016;Grosswendt et al., 2014), chimeric CLIP-seq approaches (including CLASH, CLEAR-CLIP, and modified iPAR-CLIP) have modified this approach by incorporating both a phosphorylation step (using 3' phosphatase minus T4 Polynucleotide Kinase) as well as an additional ligation step (without adapters) to encourage proximity-based ligation and increase the frequency of chimeric fragments to as much as 5.3% (Grosswendt et al., 2014;Helwak et al., 2013;Moore et al., 2015). However, the fact that the number of chimeras recovered per miRNA correlates well with miRNA abundance (Moore et al., 2015) coupled with the generally low efficiency of recovery of protein-crosslinked RNA with prior CLIP methods makes it challenging to obtain sufficient signal for individual miRNAs to perform traditional peak calling, particularly past the few most abundant miRNAs.
Thus, we set out to create a chimeric eCLIP method that combines the improved library preparation steps we developed in the enhanced CLIP (eCLIP) procedure (Van Nostrand et al., 2016) with these optimizations for recovery of miRNA:target chimeras. First, we incorporated the 3' phosphatase minus T4 PNK and no-adapter ligation steps into a standard AGO2 eCLIP experiment. Next, as recovery of chimeric fragments requires that the reverse transcription read through the protein:RNA crosslink site (rather than terminate at this adduct, as often occurs in CLIP (Konig et al., 2010)), we utilized an altered Mn2+ buffer for reverse transcription that is common in RNA structure probing experiments (Siegfried et al., 2014) and that we previously showed to increase crosslink site readthrough in eCLIP (Van Nostrand et al., 2017b). Finally, as successful mapping of chimeric miRNA:target reads requires at least 40 nt total length (including both a 20-22 nt miRNA and a sufficient target sequence to uniquely map), miRNA-only and other undesired smaller fragments could be further depleted from the final sequencing library by increasing the lower bound of the size selection performed at both the nitrocellulose membrane isolation and final library purification steps (either by agarose gel purification or bead cleanup).

Increased recovery of AGO2-associated RNA with no-gel chimeric eCLIP
In addition to being experimentally intricate and difficult to scale or automate, the SDS-PAGE, nitrocellulose transfer, and RNA extraction steps are some of the major points of sample loss during CLIP protocols. Thus, developing approaches to transition away from these steps towards a 'no-gel' approach, while avoiding an increase in background (either in co-immunoprecipitated proteins, or co-purification of non-crosslinked RNA), has been a major point of emphasis in the CLIP field (Ilik et al., 2020). Use of denaturing washes (typically coupled with the use of HIS, biotin, HALO, or other peptide tags that have high-affinity interaction with bead or other support structures) is one such avenue, though these stringent washes often result in decreased yields (Helwak et al., 2013;Moore et al., 2015). An appealing alternative approach as described in the qCLASH method is simply to remove the SDS-PAGE and nitrocellulose transfer steps (Gay et al., 2018). However, this modification will lead to at least some inclusion of additional background signal from non-crosslinked RNA or protein-crosslinked RNA outside of the excised membrane region in with-gel experiments, and to what degree this alteration increases background, particularly among potential post-lysis interactions, was not fully explored. To explore this, we performed a variety of analyses to compare both signal and background observed in no-gel versus with-gel chimeric eCLIP in HEK293T cells, and observed that no-gel chimeric AGO2 eCLIP recovered both non-chimeric peaks as well as chimeric reads that had similar enrichment for 3' UTR and miRNA seed sequence motifs compared to with-gel chimeric eCLIP. Further, we observed an increased recovery of unique cDNA fragments (measured as a decrease in required PCR amplification), making no-gel chimeric eCLIP a robust and scalable way to generate miRNA interactome maps.

Enrichment of miRNA or gene of interest by PCR or probe-based capture
The first discovered microRNA (lin-4) was initially characterized for its key role in C. elegans development (Chalfie et al., 1981), and since then individual microRNAs have been shown to play critical roles in cancer (Peng and Croce, 2016), stem cell self-renewal and differentiation (Gangaraju and Lin, 2009), and obesity and insulin homeostasis (Ying et al., 2017) among numerous other diseases, as well as serve as potential therapeutics (Rupaimoole and Slack, 2017). Thus, deep profiling of regulatory maps for individual miRNAs is needed to study the molecular mechanisms of how miRNA regulation alters physiology and disease. However, as the recovery of chimeras per miRNA was still correlated to miRNA abundance, even with the increased number of unique (non-PCR duplicate) fragments in no-gel chimeric eCLIP the necessary sequencing to generate robust maps for a miRNA of interest would be cost-prohibitive for most miRNAs. Thus, a method to specifically enrich the sequenced library for individual miRNAs of interest would enable us to make full use of this improved yield by allowing deeper profiling for biologically relevant miRNAs in a particular cell line or tissue system under study. Further, enrichment for reads at least 40 nt in length (thus potentially containing both a miRNA of interest and a fragment of target RNA of mappable length (>18nt)) could provide further enrichment for chimeras specifically.
PCR-based enrichment has long been used for targeted genomic sequencing approaches (Tewhey et al., 2009), and has the advantage of experimental simplicity. The most straightforward such approach to enrich for desired miRNA-containing fragments is to simply utilize a two-step PCR process, with the first step utilizing a PCR primer within the miRNA of interest, followed by a second round of PCR to incorporate full sequencing adapters. We found that this could be easily adapted to targeted amplification of chimeric eCLIP libraries, and successfully enabled selective enrichment for miRNAs of interest while only requiring the purchase of a single additional PCR primer per miRNA. However, although PCR enrichment is an easy way to allow for in-depth profiling of miRNA targets, the nature of using PCR introduces limitations. First, many mammalian miRNAs have extremely high or low GC content, making design of primers with reasonable melting temperatures challenging. Next, because of the use of a primer targeted to the miRNA, the final reads reflect amplified products and have lost the original miRNA sequence present in the miRNA:mRNA chimeric molecule, leading to loss of information about differential targeting of miRNA family members (which often only contain a single nucleotide difference), and isomiRs or other variations which have been described to play key roles in the processing and function of microRNAs (Ameres and Zamore, 2013;Cloonan et al., 2011). Additionally, false-positive chimeras can be introduced due to the PCR primer annealing to similar sequences elsewhere in the transcriptome.
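The primer-design constraint mentioned above can be made concrete with a rough calculation. The following is a minimal sketch, not the authors' design procedure: it computes GC fraction and a Wallace-rule melting temperature estimate (2 °C per A/T, 4 °C per G/C, a coarse approximation intended only for short oligos). The candidate sequences and the acceptance window are hypothetical values chosen for illustration.

```python
from typing import Dict, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a DNA or RNA sequence."""
    seq = seq.upper().replace("U", "T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def flag_primers(candidates: Dict[str, str],
                 tm_window: Tuple[int, int] = (52, 68)) -> Dict[str, bool]:
    """Return True for each candidate whose estimated Tm falls inside the window.

    The window is an illustrative assumption, not a value from the text.
    """
    low, high = tm_window
    return {name: low <= wallace_tm(seq) <= high for name, seq in candidates.items()}


# Hypothetical 20-nt candidates spanning extreme and moderate GC content.
candidates = {
    "at_rich": "TAATATTAAATTATAAATAT",
    "gc_rich": "GGCGGCGCCGGGCGCGGCCG",
    "balanced": "CAAAGTGCTTACAGTGCAGG",
}
flags = flag_primers(candidates)
```

With these toy inputs, only the balanced candidate falls inside the window, which illustrates why miRNAs with extreme GC content are harder to target by PCR.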
To address these limitations, we leveraged hybrid capture enrichment as an alternative that preserves the native miRNA and target sequence. Enrichment of desired regions by annealing biotin- or surface-attached antisense oligonucleotide probes followed by stringent washing was a key advance in the development of whole-exome and other targeted genome sequencing approaches (Hodges et al., 2007;Okou et al., 2007). We found that hybrid capture is adaptable to the eCLIP procedure at the post RNA adapter-ligation stage by annealing to commercially synthesized biotinylated DNA oligonucleotides followed by capture on standard streptavidin beads, light washes, and recovery of enriched RNA by DNase treatment. For miRNA-targeted enrichment, we initially designed probes to include multiple copies of the same miRNA to obtain higher yield, but found that concatamers of distinct miRNAs could also be used to enrich for multiple miRNAs simultaneously. Similarly, we designed probes antisense to transcript 3' UTR regions and observed that this could also enrich for chimeric miRNA:mRNA reads that deeply profile a gene of interest. These approaches allow an unparalleled ability to deeply map candidate miRNA interactions for either a miRNA or gene of interest.
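As an illustration of the probe concept described above, the sketch below builds an antisense DNA capture sequence by concatenating the reverse complements of one or more miRNA sequences. This is a simplified assumption about probe layout: spacers, copy number, and the position of the biotin modification are not specified here, and the function names and the toy input sequence are hypothetical.

```python
from typing import Sequence

# Map each RNA (or DNA) base to its DNA complement.
_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def antisense_dna(mirna: str) -> str:
    """Reverse complement of a miRNA sequence, written as DNA."""
    return mirna.upper().translate(_COMPLEMENT)[::-1]


def concatemer_probe(mirnas: Sequence[str]) -> str:
    """Join antisense units head to tail.

    Pass the same miRNA several times for a multi-copy probe, or distinct
    miRNAs for a probe that captures several miRNAs at once.
    """
    return "".join(antisense_dna(m) for m in mirnas)


# Toy 6-nt input used only to keep the example readable.
two_copy_probe = concatemer_probe(["AAAGUG", "AAAGUG"])
```

Here `two_copy_probe` evaluates to `CACTTTCACTTT`; a real probe would use full-length miRNA sequences.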

Chimeric eCLIP recovers miRNA:mRNA chimeras
To confirm that chimeric eCLIP successfully recovers chimeric miRNA:target reads in a manner similar to prior CLASH approaches, we performed chimeric eCLIP on HEK293T cells using an AGO2 antibody previously validated for CLIP (Sternburg et al., 2018) and the additional 3' phosphatase minus T4 PNK and no-adapter ligation steps described above along with standard eCLIP immunoprecipitation, adapter ligation, SDS-PAGE electrophoresis, nitrocellulose membrane transfer and RNA isolation, reverse transcription, and PCR amplification (Fig. 1A) (see Methods). IP-western blotting indicated successful immunoprecipitation of AGO2 (Sup. Fig. 1A), and visualization using a biotin on the RNA adapter indicated pulldown of crosslinked RNA that resolved to the AGO2 size in high RNase conditions (Sup. Fig. 1B). To confirm that we successfully enriched for AGO2 interactions, we first performed standard (non-chimeric) CLIP analysis, including adapter trimming, repetitive element removal, genomic mapping, PCR duplicate removal, and peak calling (see Methods). In replicate experiments sequenced to 144 and 145 million reads respectively, an average of 57.6% of peaks significantly enriched in IP versus paired input were in 3' UTRs (11,282 out of 20,038 total and 10,502 out of 17,819 total in each replicate respectively), with another 17.4% (4,024 and 2,633 in two replicates, respectively) in coding sequence (CDS) (Fig. 1B). Although only an average of 2.9% of peaks overlapped microRNAs due to their limited number, weighting peaks by information content in IP versus input revealed that an average of 38.8% of total peak information was at microRNAs, confirming substantial enrichment (Fig. 1C). Our results indicate successful enrichment of both miRNAs and putative targets in 3' UTR and CDS regions with AGO2 eCLIP.
Next, we developed an analysis workflow to identify chimeric reads based on a previously published 'reverse mapping' strategy (Moore et al., 2015). Although we started with all annotated miRNAs in miRBase, as part of this process we removed 15 miRNAs from analysis (corresponding to an average of 35% of potential chimeras) because both the miRNA and chimeric fragments mapped to rRNA, suggesting they likely represent annotation errors rather than true miRNAs (Supplementary Table 1). Confirming the ability of eCLIP to improve the recovery of unique (non-PCR duplicate) fragments, for the two replicates we identified a total of 403,532 and 428,936 unique chimeric reads (2.5% and 2.5% of uniquely mapped deduplicated reads, or 0.28% and 0.30% of 144 × 10⁶ and 145 × 10⁶ initial sequenced reads respectively for the two replicates), including 132,801 and 146,966 3' UTR chimeras (Fig. 1D), a dramatic increase over non-chimeric eCLIP or prior chimeric CLIP approaches. We observed high correlation between miRNA-only and both miRNA:chimera reads and independent small RNA-seq (Sup. Fig. 1C-E), consistent with previous chimeric CLIP methods (Moore et al., 2015). Separating chimeras by miRNA, these experiments yielded ~7,000 to ~36,000 chimeric reads per miRNA for the top 10 identified miRNAs, rapidly declining to less than 1,000 chimeric reads for the 63rd most abundant miRNA (Fig. 1E). Notably, the six miRNAs processed from the mir-17-92 polycistronic cluster (He et al., 2005) were the six miRNAs with the most chimeric reads: the most abundant miRNA was miR-17-5p, which has previously been shown to regulate cell cycle progression in 293T cells (Cloonan et al., 2008), while miR-20a-5p, miR-92a-5p, miR-19b-3p, miR-18a-5p, and miR-19a-3p were second through sixth most abundant. In the top 25 miRNAs were also three miRNAs from the mir-106b-25 cluster, of which two (miR-106b-5p and miR-93-5p) share seed sequence with miR-17-5p and a third (miR-25-3p) shares seed sequence with miR-92a-3p.
miR-25 was also proposed to be involved in the regulation of cancer (Sarkozy et al., 2018), suggesting that the most abundantly recovered miRNAs likely reflect important functional regulatory RNAs.
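The idea behind the 'reverse mapping' strategy referenced above can be sketched in a few lines: the miRNA is located within the read, and the remaining sequence is kept as the candidate target fragment for genomic mapping. The version below is a deliberately simplified assumption, not the published workflow: it uses exact substring matching rather than alignment, assumes the miRNA lies upstream of the target within the read, and uses an 18 nt minimum target length as an illustrative cutoff. All names and sequences are hypothetical.

```python
from typing import Mapping, NamedTuple, Optional


class Chimera(NamedTuple):
    mirna: str            # name of the miRNA found in the read
    target_fragment: str  # sequence downstream of the miRNA, to be mapped


def split_chimera(read: str,
                  mirnas: Mapping[str, str],
                  min_target_len: int = 18) -> Optional[Chimera]:
    """Return the first miRNA found in the read plus its downstream fragment.

    Reads with no miRNA match, or with too little sequence left after the
    miRNA to map, yield None.
    """
    for name, seq in mirnas.items():
        start = read.find(seq)
        if start == -1:
            continue
        target = read[start + len(seq):]
        if len(target) >= min_target_len:
            return Chimera(name, target)
    return None


# Toy sequences (DNA alphabet, as in sequencing reads); not real miRNAs.
toy_mirnas = {"toy-miR": "ACGTACGTACGTACGTACGTAC"}
toy_target = "GGGTTTCCCAAAGGGTTTCCCAAA"
chimeric = split_chimera(toy_mirnas["toy-miR"] + toy_target, toy_mirnas)
mirna_only = split_chimera(toy_mirnas["toy-miR"], toy_mirnas)
```

A production workflow would tolerate mismatches and trimmed miRNA ends, and would then align `target_fragment` to the genome.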
To confirm whether chimeric reads likely reflect true miRNA targets, we considered a variety of properties. First, we utilized the CLIPper algorithm (Lovci et al., 2013) to find clusters of chimeric reads for the 20 miRNAs with the greatest number of chimeric reads, identifying 15,222 and 15,805 chimeric clusters in two replicates, respectively. Motif analysis indicated that for the top 20 miRNAs an average of 58.7% of clusters contained 6-mers complementary to the cognate miRNA 6-mer seed region in positions [2:7] (versus an average of 1.7% for non-seed 6-mers), and for 19 of the top 20 the miRNA seed was the most commonly observed 6-mer (Fig. 1F). With a more conservative threshold for clusters (CLIPper p ≤ 10⁻⁵) this was further increased to 68.4% of clusters containing the miRNA 6-mer seed (versus 3.5% for non-seed k-mers), albeit with a low number of clusters for some miRNAs (Fig. 1F). One exception, miR-4284, had chimeras that predominantly mapped to mitochondrial transcripts and lacked seed matches, suggesting that this may not reflect a bona fide microRNA in 293T cells. Next, location analysis of chimeric read alignments again indicated an enrichment for expected target regions except for miR-4284, with an average of 33% of chimeric reads mapped to 3' UTRs and an additional 21% to CDS (Fig. 1G). Thus, these results suggest that the properties of chimeric reads obtained with chimeric eCLIP modifications are consistent with previous chimeric CLIP-seq approaches.
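The seed-match tally described above can be sketched as follows: take miRNA positions 2 to 7, reverse-complement them to obtain the 6-mer expected in the target, and count the fraction of cluster sequences containing it. This is an illustrative sketch rather than the analysis code used in the study. The miR-17-5p sequence is given as it is commonly listed and should be checked against miRBase before use, and the cluster sequences are invented toy data.

```python
from typing import Sequence

_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def seed_site(mirna: str, start: int = 2, end: int = 7) -> str:
    """DNA 6-mer complementary to miRNA positions start..end (1-based, inclusive)."""
    seed = mirna.upper()[start - 1:end]
    return seed.translate(_COMPLEMENT)[::-1]


def fraction_with_seed(cluster_seqs: Sequence[str], mirna: str) -> float:
    """Fraction of cluster sequences that contain the seed-match 6-mer."""
    site = seed_site(mirna)
    hits = sum(site in seq.upper().replace("U", "T") for seq in cluster_seqs)
    return hits / len(cluster_seqs)


# Sequence as commonly listed for miR-17-5p (assumption: verify in miRBase).
MIR_17_5P = "CAAAGUGCUUACAGUGCAGGUAG"

# Invented cluster sequences for illustration only.
toy_clusters = ["TTGCACTTTAAC", "GGGGGGGGGG", "ACACTTTG", "TTTTTTTT"]
toy_fraction = fraction_with_seed(toy_clusters, MIR_17_5P)
```

For this toy input the seed-match 6-mer is `CACTTT` and two of the four sequences contain it. Repeating the count for every other 6-mer would give the non-seed background used for comparison.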

No-gel chimeric eCLIP increases depth of miRNA chimeras recovered
The standard eCLIP protocol that chimeric eCLIP is based on includes SDS-PAGE protein gel electrophoresis, Western blot-like nitrocellulose membrane transfer, and manual cutting of the membrane to isolate protein-crosslinked RNA. These steps are performed for two purposes: first, non-crosslinked RNA does not transfer to nitrocellulose and is thus removed (Grosswendt et al., 2014), and second, denaturation allows removal of RNA crosslinked to co-immunoprecipitated unwanted proteins of different sizes than the targeted protein. However, in addition to being complex for novice users and limiting scalability and automated handling, we and others have observed that this transfer and isolation step by itself drives a dramatic reduction in experimental yield, and multiple recent modified RBP immunoprecipitation protocols have been described which leave out this step (Gay et al., 2018;Ilik et al., 2020;Patton et al., 2020). As our experience with other RBPs suggested that co-immunoprecipitation artifacts were heavily protein- and antibody-dependent, we set out to rigorously test whether removing these steps altered the composition of chimeric eCLIP reads. To do this, we tested a simplified protocol that removes the SDS-PAGE and membrane transfer steps and replaces them with a simple Proteinase K treatment to isolate the crosslinked RNA ("no-gel" variant of chimeric eCLIP, Fig. 2A). We observed that removal of the gel transfer steps required on average 6.3 fewer PCR cycles of amplification, suggesting a >70-fold increased experimental yield (Fig. 2B). Using previous estimates of conversion of eCT to non-PCR duplicate reads (Van Nostrand et al., 2020a), this suggests that no-gel AGO2 eCLIP recovers billions of unique RNA fragments.
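The step from "6.3 fewer PCR cycles" to ">70-fold" follows from the exponential nature of PCR. A minimal sketch of that arithmetic, assuming ideal doubling per cycle unless a lower per-cycle efficiency is supplied (the function name and the efficiency parameter are illustrative assumptions):

```python
def fold_from_cycles(delta_cycles: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting material implied by a cycle difference.

    efficiency = 1.0 means perfect doubling each cycle, so the fold change
    is 2 ** delta_cycles; lower efficiencies give a smaller base.
    """
    return (1.0 + efficiency) ** delta_cycles


ideal_fold = fold_from_cycles(6.3)
```

Under perfect doubling, 6.3 cycles corresponds to roughly a 79-fold difference, consistent with the >70-fold figure quoted above.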
To query whether the no-gel approach faithfully recapitulated with-gel targets, we first considered the frequency of miRNA-only (non-chimeric) and miRNA-containing (chimeric) reads. We observed a high correlation between with-gel and no-gel approaches for both miRNA-only (Pearson correlation 0.950, P < 2.2 × 10⁻¹⁶) (Sup. Fig. 2A) and miRNA-chimeric read counts (Pearson correlation 0.953, P < 2.2 × 10⁻¹⁶) (Fig. 2C). Additionally, the correlation between miRNA non-chimeric and chimeric read counts within an experiment was similar between with-gel (R = 0.87 and R = 0.85, P < 6.1 × 10⁻¹⁸³ and < 9.2 × 10⁻¹⁸²) (Sup. Fig. 1B-C) and no-gel variants (Pearson R = 0.78 and R = 0.81, P < 3.8 × 10⁻¹⁴⁵ and < 3.9 × 10⁻¹⁶⁵ for replicates 1 and 2 respectively) (Sup. Fig. 2B-C). These results suggest that the no-gel approach does not alter the observed pattern of miRNA enrichment. However, we did note that information at non-chimeric peaks overlapping miRNAs (as well as the frequency of miRNA-only reads generally) was higher in the no-gel experiments (Sup. Fig. 2D), and we replicated this observation even when explicitly size-separating large (>30 nt) and small (<30 nt) library fractions (Sup. Fig. 2E), suggesting that miRNAs might have lower efficiency of UV crosslinking to AGO2 than target regions due to their smaller size and location buried within AGO2, leading to decreased recovery in with-gel approaches that include denaturing steps that would remove non-crosslinked miRNAs.
Next, we considered candidate interactions, defined from both non-chimeric as well as chimeric analyses. Manual inspection suggested similar read density distributions of non-chimeric as well as chimeric eCLIP reads between with-gel and no-gel libraries at many significantly enriched peaks (Fig. 2D). Expanding this analysis transcriptome-wide among non-chimeric reads, we observed a high correlation in IP versus input enrichment between with-gel and no-gel libraries considering either all 250,339 clusters identified by CLIPper in the no-gel libraries (Pearson correlation 0.72, P < 2.2 × 10⁻¹⁶) (Fig. 2E) or all 353,280 clusters identified in the with-gel libraries (Pearson correlation 0.80, P < 2.2 × 10⁻¹⁶) (Sup. Fig. 2F), suggesting that no-gel chimeric eCLIP generally recapitulates with-gel enrichments (with clusters at miRNA loci showing higher enrichment due to the increased frequency of miRNA-only reads noted above). The enrichment for 3' UTR and CDS regions and increase in frequency of seed-matching k-mers was also preserved between with-gel and no-gel chimeric reads. Analysis of clusters identified from no-gel chimeric reads for the most abundant 20 miRNAs showed an average of 60.0% containing the miRNA seed match relative to an average of 2.5% for non-seed 6-mers. The miRNA position 2-7 seed match was the most commonly found 6-mer for 16 of the 20 miRNAs (Fig. 2F), and the percent of clusters containing the seed match was consistent between with- and no-gel experiments (Sup. Fig. 2G). Similarly, we observed high frequencies of 3' UTR coverage for non-chimeric peaks (33.6% and 36.5% for replicates 1 and 2 respectively) (Sup. Fig. 2H), chimeric reads (18.8% and 21.1%) (Fig. 2G), and clusters called from all chimeric reads (31.5% and 34.1%) (Fig. 2H, Sup. Fig. 2I) in no-gel libraries as in with-gel libraries. Similar results were seen considering no-gel chimeric reads for the most abundant 20 miRNAs individually (Sup. Fig. 2J).
These results indicate that no-gel chimeric eCLIP maintains robust recovery of miRNA targets.
One distinction between the no-gel and with-gel experiments was increased signal in non-mRNA regions, particularly non-coding transcripts (where the percent of chimeric reads doubled from 15.4% and 14.8% to 31.7% and 29.0%) (Fig. 2G). The effect on the number of chimeric clusters was far less dramatic, with only slight increases for non-coding transcripts (average 4.7% in with-gel to 7.4% in no-gel) and intronic regions (average 16.4% of with-gel to 19.3% of no-gel clusters) (Fig. 2H and Sup. Fig. 2I). Although previous AGO2 CLIP and chimeric CLIP studies have sometimes yielded a significant number of peaks in introns or linked to rRNA, tRNA, or other non-coding RNAs (Chu et al., 2021;Helwak et al., 2013), the particular emergence of non-coding signal only in no-gel experiments here suggests that it likely represents false positive signal. Previous CLASH studies utilizing spike-ins of E. coli RNA and Drosophila S2 cells (Moore et al., 2015) or yeast lysate (Helwak et al., 2013) into human cell lysate estimated a low (1-5%) rate of interactions formed in vitro after cell lysis. However, recent work performing in vitro RNA binding assays using pre-formed miRNA-loaded Argonaute complexes has shown that these complexes readily bind RNA with similar targeting principles as in vivo-identified miRNA targets (Becker et al., 2019;McGeary et al., 2019), suggesting that it would not be surprising that such post-lysis interactions could occur.
To query this we performed chimeric eCLIP upon mixing of human (H; HEK293T) and rat (R; C6) cell lysates (Fig. 2I). Although the sequence similarity between human and rat means that only a subset of chimeric reads can be uniquely resolved, the cross-reactivity of the AGO2 antibody for successful IP of AGO2 protein in rat (Sup. Fig. 2K-L) enabled us to perform chimeric eCLIP in both species simultaneously, validating that both samples were optimally fragmented for successful ligation and incorporation into chimeric reads that had similar enrichment for 3' UTR regions (with a higher intergenic fraction likely driven by less accurate bioinformatic removal of rRNA and other repetitive elements as well as poorer annotation of non-coding transcripts) (Sup. Fig. 2M-N). Next, using 3 paired human- versus rat-only with-gel chimeric eCLIP samples we identified 108 human-enriched and 28 rat-enriched miRNAs with at least 10-fold differential expression (Sup. Fig. 2O). Independent no-gel experiments showed similar sample-specific expression for these miRNAs, whereas intermediate miRNA frequencies were observed in mixed lysate samples (Sup. Fig. 2O). Next, we isolated species-specific chimeric fragments by discarding chimeras that mapped equally well to human and rat, identifying thousands which were chimeric for a human-specific microRNA and were uniquely aligned to either the human or rat transcriptomes (including exons and introns for all protein-coding genes) (Sup. Fig. 2P). We observed that in with-gel experiments, an average of 8.6% of human miRNA chimeras aligned to rat in human-rat mixtures, whereas only 1.2% were observed in human-only experiments (reflecting the baseline mapping error rate) (Fig. 2J, Sup. Fig. 2P). This rate was slightly increased in no-gel human-rat mixture experiments, where 12.1% of human miRNA chimeric regions aligned to rat (Fig. 2J). Similar rates were observed if only 3' UTR-mapping chimeras were considered (Sup. Fig. 2Q).
To query whether this was driven by post-lysis interactions of miR:AGO2 complexes or simply cross-ligation of nearby RNAs due to crowding of the beads, we compared the false-positive rate at high (≥10 million cells per 100 uL of beads) and low (≤1.5 million cells per 100 uL of beads) cell-to-bead ratios, and observed a trend towards decreased rates with increased dilution, with diluted ligation decreasing the false-positive rate from 10.7% to 7.5% in with-gel and 15.3% to 10.0% in no-gel (Fig. 2K). Although this did not reach significance, we utilized this lower ratio for the experiments described above and all probe capture experiments below as it suggested that some fraction of these artifacts might be due to on-bead crowding. As the half of potential post-lysis interactions that pair a human miRNA with human target sequence are undetectable by these lysate mixing approaches, these results suggest that a substantial rate of false positives is present even in standard with-gel chimeric approaches and is moderately increased in the no-gel approach. This may explain the significant number of rRNA and other non-coding RNA chimeras often seen for many miRNAs in our and prior chimeric CLIP studies (Helwak et al., 2013), and provides further evidence that (as with any immunoprecipitation-based protocol) it is important to validate chimeric CLIP-based miRNA interactions with orthogonal approaches (see further discussion in transfection section below).
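The reasoning about the undetectable half of post-lysis interactions can be written out as a short back-of-the-envelope sketch. It assumes, as a simplification that is not measured here, that a post-lysis chimera involving a human miRNA is equally likely to pair with a human or a rat fragment in the mixed lysate. The function name and its parameters are hypothetical, and the optional baseline subtraction is an illustrative choice rather than part of the analysis described above.

```python
def estimate_post_lysis_fraction(cross_species_fraction, baseline_error=0.0,
                                 detectable_share=0.5):
    """Rough estimate of the total fraction of chimeras formed after lysis.

    cross_species_fraction: fraction of human-miRNA chimeras aligning to rat
        in a human-rat mixture (the observable part).
    baseline_error: fraction seen in human-only samples (mapping error);
        subtracting it is optional and an assumption of this sketch.
    detectable_share: assumed share of post-lysis chimeras that pair with the
        other species and are therefore detectable (0.5 assumes equal odds).
    """
    if not 0.0 < detectable_share <= 1.0:
        raise ValueError("detectable_share must be in (0, 1]")
    excess = max(cross_species_fraction - baseline_error, 0.0)
    return min(excess / detectable_share, 1.0)


# Cross-species rates reported in the text for the diluted condition (Fig. 2K).
with_gel_estimate = estimate_post_lysis_fraction(0.075)
no_gel_estimate = estimate_post_lysis_fraction(0.100)
print(f"with-gel: {with_gel_estimate:.1%}, no-gel: {no_gel_estimate:.1%}")
```

Under the equal-odds assumption the diluted-condition rates correspond to roughly 15% (with-gel) and 20% (no-gel) of chimeras; a different `detectable_share` would shift these estimates accordingly.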

Targeted enrichment of miRNA chimeras by PCR
As noted above, the number of per-miRNA chimeric reads in total chimeric eCLIP correlates with non-chimeric miRNA abundance (Sup Fig. 1C-D, Sup Fig. 2B-C), leading to low coverage beyond the first few miRNAs (Fig. 1E). As standard sequencing depth of with-gel chimeric eCLIP often saturated sequencing of non-PCR duplicate reads, there would be no benefit to deeper sequencing or miRNA-targeted enrichment. However, the ~70-fold improved yield in the no-gel variant (Fig. 2B) suggested that deeper profiling of individual miRNA interactomes might now be achievable. To use targeted PCR amplification to directly profile individual miRNAs that were not adequately captured by total chimeric eCLIP, we altered the standard PCR amplification in chimeric eCLIP to a two-step approach which first utilizes one universal primer and one primer targeting the miRNA of interest, followed by a second PCR with multiplexing sequencing primers (Fig. 3A). By requiring that this PCR yields at least a 40 nt insert (reflecting miRNA plus 18 nt of additional sequence), this approach not only enriches for a miRNA of interest but also selects for chimeric rather than miRNA- or mRNA-only fragments.
To test this approach, we performed PCR-enriched miRNA targeted no-gel chimeric-eCLIP on the most abundant 94 miRNAs in chimeric eCLIP in HEK293T cells, paired with a separate standard (non-selected) with-gel chimeric eCLIP. Sequencing to an average of 7.8 million reads per miRNA library, we obtained at least 100,000 chimeras for 89 of the 94 targeted miRNAs; extrapolating sequencing the non-enriched library to similar total depth (~736 million reads) would yield 100,000 chimeras for only the most abundant 32 (Fig. 3B). Moreover, if one were interested in an individual miRNA, this approach increased the fraction of uniquely mapped chimeric reads from an average of 0.02% to 4.0% (a 175-fold increase) (Sup. Fig. 3A), which could likely be further improved by optimization of PCR amplification conditions. This leads to an increase in identified clusters (Sup. Fig. 3B), and a dramatic drop in required sequencing, as for example miR-25-3p went from 12,045 chimeric reads (out of 50.4 million) in total chimeric eCLIP to 253,151 (out of 9.0 million) in PCR-enriched chimeric eCLIP. For miR-25-3p, cluster identification on these chimeric reads yielded 2,877 significant clusters (CLIPper p ≤ 10⁻⁵). These clusters showed high enrichment for 3' UTR (30.4%) and CDS (34.8%) regions (Fig. 3C), and motif analysis indicated that 46.0% of clusters contained the canonical miR-25 seed sequence (complementary to miR-25 positions 2-7) (Fig. 3D). Expanding to all 94 miRs showed a similar annotation distribution, with average 39.9% 3' UTR and 31.9% CDS clusters (Sup. Fig. 3C), and the 2-7 seed match sequence was the most abundant 6-mer for 21 (and in the top 10 for 47) out of the 94 profiled miRNAs (Fig. 3E).
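The miR-25-3p read counts quoted above can be turned into a per-miRNA enrichment figure with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, it uses total sequenced reads as the denominator (the 175-fold average above refers to uniquely mapped reads across all targeted miRNAs, so the two numbers are not expected to match), and the depth extrapolation assumes chimeric yield scales linearly with depth, which ignores PCR-duplicate saturation.

```python
def chimeric_fraction(chimeric_reads, total_reads):
    """Fraction of sequenced reads that are chimeras for the miRNA of interest."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return chimeric_reads / total_reads


def reads_needed(target_chimeras, fraction):
    """Sequencing depth needed to reach target_chimeras, assuming linear scaling."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return target_chimeras / fraction


# miR-25-3p counts taken from the text.
total_frac = chimeric_fraction(12_045, 50_400_000)      # total chimeric eCLIP
enriched_frac = chimeric_fraction(253_151, 9_000_000)   # PCR-enriched library
fold = enriched_frac / total_frac

print(f"{total_frac:.4%} -> {enriched_frac:.2%} ({fold:.0f}-fold)")
print(f"reads for 100,000 chimeras, non-enriched: {reads_needed(100_000, total_frac):,.0f}")
print(f"reads for 100,000 chimeras, enriched:     {reads_needed(100_000, enriched_frac):,.0f}")
```

For miR-25-3p this works out to roughly a 118-fold increase in the fraction of sequenced reads that are chimeric for the targeted miRNA.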
However, we noted that many datasets also showed strong enrichment for the 6-mer at the end of the PCR primer (Fig. 3E), and manual inspection confirmed that these regions often contained stretches of homology to the miRNA that likely cause mis-priming of the PCR primer. Thus, although PCR-based enrichment can successfully yield deep profiling for some miRNAs, these primer-specific artifacts require careful consideration during analysis. To experimentally avoid these artifacts, we developed an alternative non-PCR based approach described below. However, we also hypothesized that as these artifacts were due to the miRNA-specific primer amplifying non-chimeric fragments, the false-positive clusters would likely be created in regions that were not enriched in the non-chimeric AGO2 eCLIP analysis. Consistent with this, motif analysis performed only on PCR-enriched clusters that overlapped reproducible AGO2 eCLIP peaks indicated that 70 of the 94 had an increase in the percent of clusters with the miRNA 2-7 seed match and 80 of the 94 had a decrease in the 6-mer at the 3' end of the primer (Sup. Fig.  3D-E), suggesting that overlapping PCR-enriched datasets with standard AGO2 peaks can help remove false-positive signal and can help enable identification of which miRNA is causing a putative AGO2 interaction.
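The seed-match check used in these motif analyses (a 6-mer complementary to miRNA positions 2-7) can be sketched as follows. This is a minimal illustration rather than the motif pipeline used in this study: function names are hypothetical, the miRNA sequence is supplied by the caller (the let-7a-5p sequence in the usage lines is quoted as commonly annotated and should be checked against a current miRBase release), and a simple substring test stands in for a proper enrichment analysis.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Normalize a DNA or RNA string to upper-case RNA."""
    return seq.upper().replace("T", "U")


def seed_match_site(mirna_seq, start=2, end=7):
    """Return the target-site sequence complementary to miRNA positions start..end (1-based)."""
    seed = to_rna(mirna_seq)[start - 1:end]
    if len(seed) != end - start + 1:
        raise ValueError("miRNA sequence is shorter than the requested seed")
    # Reverse complement: the target pairs antiparallel to the miRNA.
    return seed.translate(_COMPLEMENT)[::-1]


def fraction_with_site(cluster_seqs, site):
    """Fraction of cluster sequences containing the seed-match site."""
    if not cluster_seqs:
        raise ValueError("no cluster sequences given")
    hits = sum(1 for seq in cluster_seqs if site in to_rna(seq))
    return hits / len(cluster_seqs)


# Usage with toy cluster sequences (invented for illustration only).
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = seed_match_site(let7a)
toy_clusters = ["AAGCUACCUCAGG", "GGGCCCAAAUUU"]
print(site, fraction_with_site(toy_clusters, site))
```

Restricting `cluster_seqs` to clusters that overlap reproducible AGO2 eCLIP peaks before calling `fraction_with_site` would mirror the filtering comparison described above.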

Targeted enrichment of miRNA chimeras by probe-capture
As discussed in Design above, while PCR enrichment is an easy way to allow for in-depth profiling of miRNA targets, the replacement of the actual RNA fragment sequence with the primer sequence in the final read (and subsequent loss of information about related miRNAs, including miRNA family members which often only contain a single nucleotide difference, and miRNA 5' ends) represents a clear limitation. Further, our results above suggested that PCR amplification artifacts could be a common source of false positives.
To address these concerns, we tested a probe-capture enrichment technique with modified oligonucleotides to increase the depth of chimeric read enrichment (Fig. 4A). First, we tested specificity of enrichment of chimeric reads for miRNAs of interest in HEK293T cells, choosing five miRNAs of interest (miR-221-3p, miR-34a-5p, miR-186-5p, miR-21-5p and miR-222-3p) that span a range of miRNA abundances from 14th to 56th most highly expressed miRNA in HEK293T (according to small RNA-seq profiling, Supplementary Table 2). We applied probe-capture chimeric eCLIP to enrich libraries for chimeras of these miRNAs and compared it to libraries generated using with-gel chimeric eCLIP without enrichment. Probe capture-enriched chimeric eCLIP revealed unique targeting by the distinct miRNAs (Fig. 4B), and successfully enriched chimeras for each of the targeted miRNAs by more than 19-fold (Fig. 4C). The frequency of chimeras for the targeted miRNAs was increased from 0.01% to 0.48% of sequenced reads (Fig. 4D), resulting in a more than 12-fold increase in the number of genes with reproducible 3' UTR clusters (Fig. 4E). Out of all identified chimeras, the 5 targeted miRNAs went from 4.9% in standard chimeric eCLIP to 93.6% in the enriched pool (Fig. 4F), indicating a high specificity in recovering the desired miRNAs. Of note, we observed from 25,441 to 205,723 chimeric reads for each of the enriched miRNAs (Fig. 4C); achieving 25,000 chimeric reads for each would have required sequencing the non-enriched with-gel chimeric library 35-fold deeper (assuming there were sufficient non-PCR duplicates to make this possible). Clusters identified from these chimeric reads showed specific enrichment for the miRNA seed sequence (Fig. 4G), indicating a high signal-to-noise in identifying candidate miRNA targets.
Probe-capture chimeric-eCLIP can enrich for entire miRNA families while preserving the exact sequence of the specific miRNA bound to each target mRNA, enabling deep profiling of miRNA families with highly related sequences. Since many investigators are interested in studying families of miRNAs, we next tested whether probe capture could enable enrichment and subsequent separation of chimeric reads for members of the same miRNA family. First, we targeted six members of the miR-17 family that share the same seed sequence (miR-17-5p, miR-93-5p, miR-20a-5p, miR-20b-5p, miR-106a-5p, miR-106b-5p) along with two miRNAs with related seed sites (miR-18a-5p, miR-18b-5p). This included two miRNAs (miR-17-5p and miR-20a-5p) that were highly over-represented among chimeric reads (Fig. 1E) as well as in small RNA-seq in HEK293T (Supplementary Table 2), and three miRNAs (miR-20b-5p, miR-106a-5p and miR-18b-5p) that were ranked outside of the 80 most abundant miRNAs by chimeric frequency and outside the 200 most abundant miRNAs in small RNA-seq. Even though miRNAs that were selected in the miR-17 family experiment accounted for an average of 26.3% of total chimeric reads without enrichment, probe capture further increased representation of the selected miRNAs by 3.8-fold to 98.9% of chimeric reads and by 32.4-fold among all sequenced reads (from 0.07% to 2.4%) (Sup. Fig. 4A-C). Next, we designed probes against 2 members of the let-7 family (let-7a-5p and let-7g-5p) along with two unrelated miRNAs (miR-26a-5p and miR-26b-5p). Chimeric reads for the targeted miRNAs in the let-7 family experiment were less common, accounting for only 0.01% before enrichment but increasing 49.4-fold to 0.4% following enrichment, with an increase from 3.2% of chimeric reads without enrichment to 82.2% after enrichment (Fig. 4F,H-I). We note that although other non-targeted let-7 members were also enriched, they were distinguishable by sequencing (Fig. 4H).
Again, the increased read coverage led to a dramatic increase in the number of reproducible 3' UTR clusters observed for the targeted miRNAs (Sup. Fig. 4D-E). In each case, clusters identified from chimeric reads showed specific enrichment for sequences matching the seed region of the cognate miRNA (Fig. 4J, Sup. Fig. 4F), indicating that the more extensive set of probe-enriched targets retains the high signal-to-noise observed for non-enriched no-gel chimeric eCLIP (Fig. 2E).
Finally, we tested chimeric CLIP with probe capture enrichment of miRNA:mRNA chimeras in mouse liver tissue. Two sets of enriched libraries were prepared, one enriched for a selection of five miRNAs (miR-26a-5p, miR-21a-5p, let-7a-5p, let-7c-5p, let-7f-5p) and another enriched specifically for miR-122-5p. As above, probe-based enrichment further increased representation of chimeras for miRNAs of interest among chimeric reads, going from 9.6% to 69.2% of chimeras and from 0.2% to 6.4% of total reads for the 5-miRNA pool (Fig. 4F,K-L) and from 34.2% to 66.6% of chimeras and from 0.8% to 5.9% of total reads for miR-122-5p (Sup. Fig. 4G-I). As before, the number of clusters was dramatically increased in probe-capture experiments (Sup. Fig. 4J-K), and seed matching sites for miRNAs were over-represented in clusters called from chimeric reads (Sup. Fig. 4L-M). In summary, probe-enriched chimeric eCLIP libraries enable profiling of hundreds of thousands to several million chimeric reads per miRNAs of interest, allowing deep recovery of interactions for a miRNA of interest at substantially decreased sequencing depth.

Profiling of miRNAs targeting a gene of interest
In addition to focused profiling of genes targeted by miRNAs of interest, it is increasingly important to identify miRNAs that target a specific gene of interest. To do this, we generated enrichment probes complementary to a gene of interest by ordering dsDNA gene fragments and performing T7 transcription with biotinylated nucleotides (see Methods), and performed capture of no-gel chimeric eCLIP cDNA as described above for miRNA-specific probes. As a proof of concept, we tested capture of two 3' UTR regions: Unc-51 Like Autophagy Activating Kinase 1 (ULK1) and Amyloid Beta Precursor Protein (APP). Although the frequency of chimeric reads obtained for a gene of interest was lower than for probe-capture enrichment of miRNAs, we observed 358-fold and 61-fold increased representation of chimeric reads for the gene of interest for ULK1 and APP enrichment respectively, going from less than 200 chimeric reads (out of 144 and 145 million sequenced reads) to over 1000 (out of less than 20 million reads) (Fig. 5A-B).
Despite differences in chimeric read abundance with and without enrichment, counts of chimeric reads per miRNA were highly correlated between gene-enriched (no-gel) chimeric eCLIP and matched with-gel non-enriched chimeric eCLIP (Pearson correlation 0.65 and 0.75 for ULK1 and APP respectively) (Fig. 5C-D), indicating that the probe enrichment did not dramatically bias miRNA representation among gene-specific chimeras. Upon visual inspection, we observed that individual target sites were well separated from each other, identifying numerous putative miRNA target sites in 3' UTRs of ULK1 and APP (Fig. 5E-F). As expected, these sites commonly overlapped sequences complementary to the miRNA seed region. Notably, different miRNAs with the same or related seed sequences often (but not always) showed similar patterns of enrichment (Fig. 5E-F), reflecting the ability of probe capture enrichment to separate interactions of highly related miRNAs.
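A per-miRNA comparison of this kind can be sketched as a Pearson correlation over chimeric read counts keyed by miRNA name. The code below is an assumption-laden illustration: the function names are hypothetical, miRNAs missing from one library are counted as zero, and the optional log2(count + 1) transform is a common convention that is not stated to be the one used for Fig. 5C-D. The counts in the usage lines are invented toy values, not data from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def per_mirna_correlation(enriched_counts, total_counts, log_transform=True):
    """Correlate per-miRNA chimeric counts between two libraries (dicts: miRNA -> count)."""
    mirnas = sorted(set(enriched_counts) | set(total_counts))

    def value(count):
        return math.log2(count + 1) if log_transform else float(count)

    xs = [value(enriched_counts.get(m, 0)) for m in mirnas]
    ys = [value(total_counts.get(m, 0)) for m in mirnas]
    return pearson(xs, ys)


# Toy counts for one captured gene (hypothetical values).
enriched = {"miR-17-5p": 240, "miR-20a-5p": 180, "miR-93-5p": 35, "let-7a-5p": 4}
total = {"miR-17-5p": 31, "miR-20a-5p": 22, "miR-93-5p": 6}
print(round(per_mirna_correlation(enriched, total), 3))
```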

miRNA:mRNA chimeras identify functional miRNA targets
As microRNAs often regulate gene expression by inducing RNA degradation, a common way to validate miRNA targets at scale is to show downregulation following miRNA overexpression (Lim et al., 2005). Indeed, targets identified using CLASH or similar chimeric ligation approaches showed enrichment for miRNA-dependent changes in RNA decay and specific functional categories, confirming that these methods yield high-quality sets of miRNA targets (Moore et al., 2015). To confirm that chimeric eCLIP also identifies functional miRNA targets, we chose two miRNAs with low endogenous expression in HEK293T cells (miR-1 and miR-124, ranked 65th and 265th most expressed miRNAs respectively; Supplementary Table 2) that have previously been used to study miRNA overexpression (Lim et al., 2005) and performed overexpression followed by small RNA-seq to confirm miRNA expression, mRNA-seq to assess the effect of miRNA overexpression on global gene expression, and chimeric eCLIP to identify targets (Fig. 6A).
To identify reproducible targets for miR-124 and miR-1, we first called clusters using miR-124 and miR-1 chimeric reads in each of the two replicates, and then used a modified IDR pipeline to identify reproducible clusters, which identified 549 miR-124 and 44 miR-1 3' UTR clusters (Fig. 6E). We observed a significant shift towards decreased expression for the 484 and 42 genes containing at least 1 chimeric 3' UTR cluster for miR-124 and miR-1, respectively (p = 2.9×10⁻⁸² and p = 1.3×10⁻¹⁸ by two-tailed Kolmogorov-Smirnov test) (Fig. 6F, Sup. Fig. 6I). The magnitude of repression generally increased among chimeric eCLIP targets when more chimeric reads were identified (RPMC≥0, ≥100, or ≥200 chimeric reads per cluster), suggesting that the count of chimeric reads per target provides a metric that correlates with the regulatory impact of a particular miRNA:mRNA interaction.
Next, we compared these results against TargetScan (v 7.2) computationally predicted targets (Agarwal et al., 2015). TargetScan-predicted targets were more numerous than genes with chimeric eCLIP clusters and showed significant repression upon miRNA over-expression for both miR-124 and miR-1 (Fig. 6F, Sup. Fig. 6I), although the magnitude was more similar to only the low-confidence (RPMC≥0 read) chimeric eCLIP set. Notably, we observed that for both genes with or lacking TargetScan-predicted miRNA interactions, the subset of genes with chimeric eCLIP clusters showed greater repression upon miRNA over-expression (Fig. 6G-H, Sup. Fig.  6L-M), suggesting that chimeric eCLIP can both refine computationally predicted targets as well as reveal new targets not captured by prediction alone.
MicroRNA sites have also been validated in coding sequence (CDS) regions, albeit with typically weaker repressive activity (Baek et al., 2008; Grimson et al., 2007). Consistent with these prior findings, we observed that the number of CDS clusters was roughly similar to that of 3' UTR clusters (Fig. 6E), and that genes with chimeric clusters only in CDS for miR-124 (222) and miR-1 (113) showed significant repression in miR over-expression (p = 2.5×10⁻⁵ and p = 4.9×10⁻⁴ respectively) albeit weaker than repression of genes with 3' UTR only clusters (Fig. 6I, Sup. Fig. 6N). This decreased effect for CDS targets versus 3' UTR targets is consistent with prior observations, although notably for miR-124 the median log2 fold-change for CDS-only chimeric targets (-0.089) was approaching the change observed for TargetScan-predicted 3' UTR targets (-0.146) (Fig. 6I). As prediction of CDS targets remains challenging due to the high baseline conservation of coding regions, these results suggest that chimeric eCLIP represents a unique approach to enable exploration of functional CDS-region targeting by miRNAs.

A resource of chimeric eCLIP data in 293T cells
Overall, the experiments described in our study profiled nearly six million non-PCR duplicate, uniquely mapped chimeric reads in HEK293T, including 2.6 million across 6 with-gel and 3.4 million across 12 no-gel experiments, a dramatic increase over prior chimeric CLIP studies (Fig. 7A). To further aid the utilization of this resource, we merged these replicates together, yielding 963,813 and 856,953 3' UTR chimeras for with-gel and no-gel respectively, and performed cluster identification for each miRNA with at least 1,000 chimeric reads, enabling identification of 31,494 putative high-confidence miRNA interaction sites in with-gel and 36,559 in no-gel respectively (CLIPper p ≤ 10⁻⁵). As expected, coverage remained dependent on miRNA abundance, with miR-20a-5p and miR-17-5p yielding over 210,000 with-gel and 340,000 no-gel chimeras each; however, 46 miRNAs in the merged with-gel and 52 in the merged no-gel had at least 10,000 chimeras (Fig. 7B, Sup. Fig. 7A). These typically had hundreds to thousands of significant clusters located within the 3' UTR or CDS of dozens to over a thousand unique genes (Fig. 7B, Sup. Fig. 7), and these clusters were enriched for miRNA seed matches as expected, with the miRNA 2-7 seed being the most frequently observed 6-mer for 44 of the 46 miRNAs (Fig. 7C). Thus, this resource represents a unique opportunity to further explore the regulatory roles and networks of miRNAs in the well-characterized 293T cell line.

Discussion
The experimental identification of miRNA targets with chimeric CLIP-seq approaches provided unique insights into the rules for miRNA:target interactions (Broughton et al., 2016; Grosswendt et al., 2014; Helwak et al., 2013; Moore et al., 2015). However, the limited yield of non-PCR duplicate fragments, coupled with the low number of chimeric reads outside of the most abundant miRNAs, has hindered the widescale adoption of this powerful approach. With the ever-increasing catalog of miRNAs associated with human disease, coupled with the characterization of small RNAs produced by RNA viruses including SARS-CoV-2 that appear to utilize the host miRNA regulatory machinery (Pawlica et al., 2021), the ability to deeply profile targets for individual miRNAs of particular interest represents a novel opportunity to better understand the biological regulatory networks driven by individual microRNAs in a variety of contexts.
In this work, we describe both PCR- and probe-based strategies for targeted enrichment of a miRNA of interest. Although PCR-based enrichment is experimentally simpler and can yield robust profiling for many miRNAs, the common amplification of transcriptomic sites with sequence complementary to the primer coupled with the fact that the PCR primer replaces the sequence of the original miRNA (making it impossible to separate highly similar family members) led us to favor the probe-capture approach for general use. Because this approach simply enriches for (but does not alter) the original chimeric RNA fragments, it is thus possible to separate chimeras for miRNA family members differing only by a single nucleotide in order to reveal novel roles and interactions. Indeed, our analysis of miRNAs targeting two individual genes suggested a large degree of co-targeting among family members, which may help to explain the robustness of miRNA regulatory networks and their resilience to mutations of individual miRNAs in knockout experiments (Miska et al., 2007). Although we only performed analysis on annotated miRNAs in this work, this property may also enable querying of how alternative miRNA processing alters targeting, as recent studies indicated that alternative 3' end tailing in particular can play critical roles (Kingston and Bartel, 2019). We note that although in theory this can also enable characterization of microRNA 5' end isomiRs, the addition of non-templated nucleotides after reverse transcription may complicate this analysis (data not shown), and further work may be required to explore this question.
The utilization of an anti-AGO2 antibody to enrich for miRNA interactions has the benefit of enabling this approach to be utilized across cell lines and tissues with only limited experimental modification (optimization of lysis and RNase fragmentation conditions). However, the development of additional approaches, such as more stringent purification of AGO2:miRNA:mRNA complexes utilizing a transgenic AGO2:HaloTag fusion, could provide further opportunities for optimization and removal of background by enabling denaturing wash conditions (Li et al., 2020). In addition to microRNAs, there exist many other small regulatory RNAs that play critical roles in regulating RNA processing through interaction with complementary sequences, including piRNAs, snoRNAs, and others (Gumienny et al., 2017; Shen et al., 2018). Indeed, early CLASH publications showed the utility of the chimeric analysis approach in mapping snoRNA interactions in yeast (Kudla et al., 2011), and more recently the approach was utilized to generate an extensive catalog of putative snoRNA interactions and reveal novel targets of orphan snoRNAs (Gumienny et al., 2017). Similarly, CLASH with PIWI protein has been used to reveal mechanisms of piRNA targeting (Shen et al., 2018), and CLASH using E. coli Hfq to enrich for sRNA-mRNA interactions revealed principles of small RNAs in bacteria (Iosub et al., 2020). Thus, although we focus this description of chimeric eCLIP on miRNA target identification using immunoprecipitation of AGO2, we anticipate that the principles described here in both increasing experimental yield, as well as targeted enrichment for individual small RNAs of interest, should similarly enable deep profiling of other small RNA interactomes of interest.

Limitations
Chimeric eCLIP, like all CLIP-based approaches, attempts to capture interactions as they occur in living cells. As such, whether observed interactions are truly occurring in vivo or are instead occurring after lysis is a major concern. In a traditional CLIP experiment, the denaturing SDS-PAGE and nitrocellulose transfer removes non-crosslinked RNA, providing a mechanism for removing potential post-lysis interactions. However, the additional chimeric ligation step in chimeric CLIP experiments (including with-gel chimeric eCLIP described here) weakens this safeguard, as only one of the miRNA or target would be required to crosslink. Prior chimeric CLIP experiments using spike-in of E. coli, Drosophila, or yeast lysate (Helwak et al., 2013; Moore et al., 2015) suggested a low (1-5%) rate of interactions formed in vitro after cell lysis; however, our work using fragmentation size-optimized human and rat lysate suggests this rate may be substantially higher (up to nearly 20%, once accounting for human-human post-lysis interactions that are not quantified by this approach). Thus, although it is always beneficial to validate CLIP-identified candidate interactions with orthogonal information (e.g., identifying overlap with in vitro motifs or regulation upon miRNA or RNA binding protein over-expression or knockdown), such orthogonal validation is particularly important if one wants robust confidence in individual microRNA targets identified by chimeric CLIP. Requiring crosslinking of both miRNA and mRNA target would reduce post-lysis interactions, but low efficiency of UV crosslinking currently limits such efforts (Moore et al., 2015). However, it is possible that this may be alleviated with improved crosslinking efficiency and the ability to utilize optimized denaturing wash conditions for Halo or other tag-based approaches.
Although we performed validation with miRNA overexpression, the high level of over-expression may not represent native miRNA targeting as recent studies have suggested that limited abundance of AGO2 itself may drive competition between miRNAs for proper loading and function (Khan et al., 2009; Tan et al., 2019). Further, while the PCR- and probe-based enrichment approaches described here enable deeper profiling of miRNAs of interest, the efficiency of these methods does remain constrained by both sequence properties of the miRNA or gene of interest, as well as the baseline expression level. Thus, in the implementation we describe here we do not directly address the stoichiometry of binding between miRNAs and their targets, and it remains an open question whether certain interactions identified by probe-enriched chimeric eCLIP reflect a high enough fraction of expressed copies of that mRNA to drive functionally relevant expression changes.

Cell culture
Human HEK293xT cells were acquired from Takara Bio and cultured in DMEM media (GIBCO) with 10% FBS and 1% penicillin/streptomycin at 37°C with 5% CO2. C6 cells were purchased from Sigma and cultured in F-12 media (GIBCO) with 10% FBS, 2 mM Glutamax, and 1X penicillin/streptomycin at 37°C with 5% CO2. For each experiment, 10 cm plates (~15 million cells) were washed once with cold 1X phosphate buffered saline (PBS) and overlaid with minimal (3 mL per 10 cm plate) cold 1X PBS, and UV crosslinked (254 nm, 400 mJ/cm²) on ice. After crosslinking, cells were scraped and spun down, supernatant removed, and washed with cold 1X PBS. Cell pellets were flash frozen on dry ice and stored at -80 °C.

Mouse liver tissue
8-week-old C57BL/6J mice were purchased from Jackson Labs. Mice were anesthetized with tribromoethanol and perfused with 0.9% saline. Tissues were collected, snap frozen, and stored at -80 °C for further analysis. Animal experiments were conducted under a protocol approved by the Institutional Animal Care and Use Committee (IACUC) of the University of California San Diego.

miRNA mimic transfection
Human HEK293T cells grown to ~75% confluency in antibiotic-free media were transfected with a final concentration of 100 nM of miR-1 or miR-124 miRNA mimics (IDT) or no mimic (mock) with Lipofectamine™ RNAiMAX (Invitrogen). Cells were incubated with mimics for 16 hours and then harvested (viability 70 -80%). To harvest, cells were washed with cold 1X PBS, and either UV crosslinked as described above (for chimeric-eCLIP), or flash frozen without crosslinking (for RNA-seq and sRNA-seq).

Chimeric-eCLIP
Chimeric eCLIP was based on the previously described seCLIP protocol (Van Nostrand et al., 2017a; Van Nostrand et al., 2016) with modifications to enhance chimera formation described below. As in eCLIP, lysis was performed in eCLIP lysis buffer, followed by sonication and digestion with RNase I (Ambion). Immunoprecipitation of AGO2-RNA complexes was achieved with a primary mouse monoclonal AGO2/EIF2C2 antibody (eIF2C2 4F9, sc-53521, Santa Cruz Biotechnology) overnight at 4°C using magnetic beads pre-coupled to the secondary antibody (M-280 Sheep Anti-Mouse IgG Dynabeads, Thermo Fisher 11202D). Initial experiments used standard eCLIP conditions (10 ug of antibody and 125 uL of Dynabeads for 20×10⁶ cells), but most experiments used decreased antibody and increased bead amounts based on the trend of decreased cross-species chimeras in those conditions (see Fig. 2K and Supplementary Table 3). Where indicated, 2% of each immunoprecipitated (IP) sample was saved as input control. For human/rat mixing experiments, cell pellets were lysed, sonicated, and RNase digested separately, and then mixed during addition of antibody and beads prior to overnight incubation. Western blot visualization used anti-AGO2 primary antibody (50683-RP02, SinoBiological) at 1:2000 dilution, with TrueBlot anti-rabbit secondary antibody (18-8816-31, Rockland) at 1:6000 dilution.
To phosphorylate the cleaved mRNA 5'-ends, beads were washed and treated with T4 polynucleotide kinase (PNK, 3'-phosphatase minus, NEB) and 1 mM ATP. Chimeric ligation was then performed on-bead at room temperature for one hour with T4 RNA Ligase I (NEB) and 1 mM ATP in a 150 µl total volume. As in seCLIP, samples were then dephosphorylated with alkaline phosphatase (FastAP, Thermo Fisher) and T4 PNK (NEB), and an RNA adapter (N9RiT22 or CLASHn10RiL19bio) was ligated to the 3' ends of the mRNA fragments (T4 RNA Ligase, NEB). With-gel chimeric-eCLIP IP and input samples were then denatured with 1X NuPage buffer (Life Technologies) and DTT, run on 4%-12% Bis-Tris protein gels and transferred to nitrocellulose membranes. The region corresponding to bands at the appropriate Ago2 protein size plus 75 kDa was excised and treated with Proteinase K (NEB) to isolate RNA, which was column purified (Zymo). No-gel chimeric eCLIP samples were treated directly with Proteinase K (NEB) to isolate RNA and column purified (Zymo). For both methods, RNA was then reverse transcribed with SuperScript IV Reverse Transcriptase (Invitrogen), 3 mM manganese chloride (to encourage read-through of crosslink sites), and 0.1 M DTT. Following reverse transcription (i16RT or InvAR19 primer), samples were treated as in seCLIP, including treatment with ExoSAP-IT (Affymetrix) to remove excess oligonucleotides, hydrolysis with sodium hydroxide (to degrade RNA) and addition of hydrochloric acid (to balance pH). A 5' Illumina DNA adapter (InvRand3Tr3) was then ligated to the 3' end of cDNA fragments with T4 RNA Ligase (NEB), and after bead purification (Dynabeads MyOne Silane, Thermo Fisher), qPCR was performed on an aliquot of each sample to identify the proper number of PCR cycles (using D501_qPCR and D701_qPCR primers). The remainder of the sample was PCR amplified with barcoded Illumina compatible primers (Q5, NEB) based on qPCR quantification and size selected using AMPure XP beads (Beckman).
Libraries were quantified using Agilent TapeStation and sequenced on the Illumina HiSeq or NovaSeq platform. As previously described, experimental yield was estimated by eCT, defined as the extrapolated number of PCR cycles necessary to obtain 100 femtomoles of library, assuming 2-fold amplification per PCR cycle (Van Nostrand et al., 2016). In this work, the eCT calculation also included extrapolation to normalize for cell number input and percent of cDNA used in order to enable comparison across experiments.
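The eCT definition above can be written out as a short calculation. The sketch below follows the stated definition (cycles extrapolated to 100 femtomoles at 2-fold amplification per cycle), but the exact normalization for cell input and cDNA fraction is not spelled out here, so the linear scaling used for those two terms is an assumption, and the function name, parameter names and reference cell number are hypothetical.

```python
import math

TARGET_FMOL = 100.0  # library amount that defines eCT, per the definition above


def extrapolated_ct(pcr_cycles, library_fmol, cdna_fraction_used=1.0,
                    cells=None, reference_cells=None):
    """Extrapolated PCR cycles needed to reach 100 fmol of library.

    pcr_cycles: number of PCR cycles actually performed.
    library_fmol: measured library yield (femtomoles) after those cycles.
    cdna_fraction_used: fraction of the cDNA that went into the PCR
        (assumed to scale yield linearly).
    cells / reference_cells: optional cell-input normalization, assumed linear;
        reference_cells is a hypothetical common input to compare against.
    """
    if library_fmol <= 0:
        raise ValueError("library_fmol must be positive")
    if not 0.0 < cdna_fraction_used <= 1.0:
        raise ValueError("cdna_fraction_used must be in (0, 1]")
    equivalent_fmol = library_fmol / cdna_fraction_used
    if cells is not None and reference_cells is not None:
        equivalent_fmol *= reference_cells / cells
    # Each cycle doubles the library, so the cycle offset is a log2 ratio.
    return pcr_cycles - math.log2(equivalent_fmol / TARGET_FMOL)


# Example: 12 cycles on half of the cDNA yielding 400 fmol.
print(extrapolated_ct(12, 400.0, cdna_fraction_used=0.5))
```

A lower eCT corresponds to a higher-yield library, since fewer cycles are needed to reach the same amount of material.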
RNA visualization experiments were performed as previously described (Van Nostrand et al., 2020b) with the additional chimeric eCLIP steps added. Briefly, lysis of 3×10⁶ cells was performed with either low (3.3 U) or high (100 U) amounts of RNase I (Ambion), with IP, chimeric (no-adapter) ligation, dephosphorylation, biotinylated adapter ligation, SDS-PAGE electrophoresis, and transfer to nitrocellulose membrane performed as described above. Detection of biotin-labeled RNA was performed using a Chemiluminescent Nucleic Acid Detection Module Kit (Thermo Scientific).

PCR-based miRNA enrichment
MicroRNA-specific amplification primers were designed to minimize overlap with other miRNAs and limit overlap with variable miRNA ends (Supplementary Table 4). To perform enrichment, qPCR was first performed on diluted chimeric-eCLIP cDNA with a miRNA-specific forward primer (Supplementary Table 4) and a reverse primer complementary to the RNA adapter sequence (chimeCLIP-7qL_R), and per-primer Ct values were obtained. Next, PCR amplification was performed in two steps. First, 8 cycles of PCR were performed with the miRNA-specific forward primer and the reverse primer complementary to the RNA adapter sequence (chimeCLIP-7qL_R), using the standard eCLIP PCR conditions except for primer-specific annealing temperatures (see Supplementary Table 4). After bead cleanup with 2X volume AMPure XP beads (Beckman), a second PCR was performed using standard Illumina multiplexing primers and 8 cycles of amplification with standard eCLIP PCR conditions, using a variable amount of first PCR product for each miRNA primer based on the qPCR Ct obtained (Supplementary Table 4). Each library was then purified again with 1.88X volume AMPure XP beads (Beckman), eluted, quantified by TapeStation (Agilent), and pooled for sequencing on the Illumina NovaSeq 6000 platform.
Probe-based miRNA capture: Samples were directly treated with Proteinase K (NEB) in place of the SDS-PAGE and membrane transfer steps described above. Biotinylated DNA probes designed (reverse complement) to the miRNA of interest (IDT) were then hybridized (500 picomoles per sample), washed on Silane beads (Dynabeads MyOne Silane, Thermo Fisher) and treated with DNase (Life Technologies). The remaining reverse transcription and library preparation steps were then performed as described above.
Probe-based gene capture: Samples were directly treated with Proteinase K in place of the SDS-PAGE and membrane transfer described above. Reverse transcription and cDNA adapter ligation steps were performed as above. Prior to PCR amplification, gBlocks Gene Fragments (IDT) designed for the gene of interest were amplified to generate dsDNA templates. Biotinylated RNA probes were generated using T7 RNA Polymerase and biotinylated nucleotides (bio-UTP and bio-CTP). The biotinylated probes were coupled to streptavidin beads (Dynabeads MyOne Streptavidin C1, Thermo Fisher), chimeric library molecules were denatured, and chimeric molecules and probes were hybridized for one hour at 50°C. Beads were then washed, gene-specific probes were degraded with RNase, and enriched DNA fragments were eluted from beads. The remaining PCR amplification and library preparation steps were then performed as described above.

RNA-seq library preparation
15 million HEK293T cells were spun down, the supernatant was removed, and cells were washed with cold PBS. Cell pellets were flash frozen on dry ice and stored at -80°C. Total RNA was isolated using the miRNeasy Mini Kit (Qiagen). Poly(A) RNA was isolated using Dynabeads Oligo (dT)25 (Thermo Fisher). RNA integrity and purity were measured using the Agilent 4200 TapeStation. To prepare RNA-sequencing libraries, 50 ng of poly(A) selected RNA was heat fragmented in 2X FastAP buffer and then treated with alkaline phosphatase (FastAP, Thermo Fisher) and T4 PNK (NEB). An RNA adapter was then ligated to the 3′ ends of the mRNA fragments (T4 RNA Ligase, NEB), after which RNA was column purified (Zymo) and reverse transcribed with SuperScript III Reverse Transcriptase (Invitrogen), then treated with ExoSAP-IT (Affymetrix) to remove excess oligonucleotides. A 5′ Illumina DNA adapter (InvRand3Tr3) was ligated to the 3′ end of cDNA fragments with T4 RNA Ligase (NEB), and after on-bead cleanup (Dynabeads MyOne Silane, Thermo Fisher), qPCR was performed on an aliquot of each sample to identify the proper number of PCR cycles. The remainder of the sample was PCR amplified with barcoded Illumina-compatible primers (Q5, NEB) and size selected using AMPure XP beads (Beckman). Libraries were quantified using Agilent TapeStation and sequenced on the Illumina NovaSeq platform.

sRNA-seq library preparation
HEK293fT cell pellets were prepared, and total RNA isolated as described in the RNA-seq library preparation Methods section. Small RNA-sequencing libraries were prepared using the QIAseq miRNA Library Kit (Qiagen). Briefly, a pre-adenylated DNA adapter was ligated to the 3' ends of the RNA followed by ligation of an RNA adapter to the 5' ends of the RNA. The adapter-ligated RNA was then reverse transcribed, and on-bead cleanup of cDNA was performed. Library amplification and barcoding was achieved using a universal forward primer and indexing 8-base reverse primers (HT Plate Indices 331565). Libraries were quantified using Agilent TapeStation and sequenced on the Illumina NovaSeq platform.

Analysis of chimeric eCLIP sequencing data
The pipeline for analysis of chimeric eCLIP datasets is available at https://github.com/YeoLab/chim-eCLIP. Briefly, final library fragments in chimeric eCLIP libraries contain a sequence of 10 random nucleotides at the 5′ end and a sequence of 9 (N9RiT22), 10 (CLASHn10RiL19bio), or 0 (CLASHRiL19bio) random nucleotides at the 3′ end of the insert sequence. These random sequences serve as unique molecular identifiers (UMIs) and were utilized for removal of PCR duplicates (Kivioja et al., 2011). Only the 5′ end UMI was used for processing of total chimeric and probe capture enriched libraries; the 3′ end UMI was used only for PCR duplicate removal in PCR Targeted Chimeric analysis, where the 5′ UMI is lost. In the first steps of the analysis, the 10 nucleotide UMIs were pruned from the 5′ end of R1 read sequences using umi_tools (v1.0.0) (Smith et al., 2017) and saved in the read name. Next, aggressive adapter trimming was performed to remove not only adapter sequences but also adapter fragment concatemers with cutadapt (v2.5) using options -O 1 --times 3, -e 0.1, --quality-cutoff 6, -m 18, and 10nt fragments that step through the RNA adapter (-a AGATCGGAAG -a GATCGGAAGA -a ATCGGAAGAG -a TCGGAAGAGC -a CGGAAGAGCA -a GGAAGAGCAC -a GAAGAGCACA -a AAGAGCACAC -a AGAGCACACG -a GAGCACACGT -a AGCACACGTC -a GCACACGTCT -a CACACGTCTG -a ACACGTCTGA -a CACGTCTGAA -a ACGTCTGAAC -a CGTCTGAACT -a GTCTGAACTC -a TCTGAACTCC -a CTGAACTCCA -a TGAACTCCAG -a GAACTCCAGT -a AACTCCAGTC -a ACTCCAGTCA). The high adapter concatemer presence that necessitated this aggressive trimming was later observed to be linked to RNA adapters containing random nucleotides on the 5′ end, which can be alleviated with the CLASHRiL19bio adapter (Supplementary Table 4). Remaining reads shorter than 18 nucleotides in length were discarded. The final 9 or 10 nucleotides at the 3′ end of each read were then removed using cutadapt (v2.5) to ensure that any remaining random sequence at the 3′ end of the insert sequence was removed.
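The list of 10nt fragments passed to cutadapt can be regenerated programmatically rather than typed by hand. The sketch below is illustrative only: the full adapter string is reconstructed here by overlapping the fragments listed in the text, and the function names are hypothetical rather than taken from the published pipeline.

```python
# Adapter sequence reconstructed by overlapping the 10nt fragments listed
# in the text (an assumption; the text gives only the fragments).
RNA_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def stepping_fragments(adapter, k=10):
    """Return every k-nt window of the adapter, stepping by one base."""
    return [adapter[i:i + k] for i in range(len(adapter) - k + 1)]


def cutadapt_adapter_args(adapter, k=10):
    """Build the repeated '-a FRAGMENT' arguments for a cutadapt call."""
    args = []
    for fragment in stepping_fragments(adapter, k):
        args.extend(["-a", fragment])
    return args


if __name__ == "__main__":
    print(" ".join(cutadapt_adapter_args(RNA_ADAPTER)))
```

Generating the fragments this way avoids transcription errors in the long option list and makes it straightforward to adapt the trimming step to a different adapter.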
Further analysis was performed separately for analysis of non-chimeric reads and analysis of chimeric reads.

Non-chimeric (standard eCLIP) analysis
Standard eCLIP analysis was performed as previously described (Van Nostrand et al., 2016). Briefly, reads were mapped to a database of species-specific (human, mouse or rat depending on the experiment) repetitive elements from RepBase 18.05 using STAR (v2.7.6a) with the following parameter: --outFilterMultimapNmax 30. Reads that did not align to the repeats database were then mapped to the human (hg38), mouse (mm10), or rat (rn6) genomes using STAR (v2.7.6a) (Dobin et al., 2013) with options requiring unique mapping and forcing end-to-end read alignment (--alignEndsType=EndToEnd). PCR duplicates were removed using umi_tools (v1.0.0) by utilizing the 5′ UMI sequences and mapping positions. Clusters of reads were identified within eCLIP samples using the cluster caller CLIPper (https://github.com/YeoLab/clipper/commit/61d5456) (Lovci et al., 2013) using transcript annotations for human (GENCODE v29; ENCODE accession ID ENCFF159KBI), mouse (GENCODE vM25), or rat (ENSEMBL release 98). For each cluster, IP versus input fold enrichment values were then calculated as a ratio of counts of reads overlapping the cluster region in the IP and the input samples (read counts in each sample were normalized against the total number of genome-mapped reads in the sample remaining after PCR duplicate removal). A p-value was calculated for each cluster by the Yates' Chi-Square test (or Fisher Exact Test if the observed or expected read number was below 5). A pseudocount of 1 read was added to all read counts per cluster for input samples when calculating p-values and IP versus input log2 fold changes.
Clusters were filtered using a list of excluded regions (ENCODE accession ID ENCFF269URO) and annotated using transcript information from GENCODE v29 (ENCODE accession ID ENCFF159KBI) and LNCipedia v5.0 (Volders et al., 2019) for human and vM25 for mouse (Frankish et al., 2019) using the following priority hierarchy to define the final annotation of overlapping features: protein coding transcript (CDS, UTRs, intron), followed by non-coding transcripts (exon, intron). Clusters passing cutoffs of IP vs. input fold enrichment ≥ 8 and p-value ≤ 0.001 were deemed significant and referred to as significantly enriched peaks.
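The per-cluster enrichment and significance calculation can be sketched as follows. This is a simplified illustration rather than the published implementation: the layout of the 2×2 contingency table (cluster reads versus all other reads, in IP versus input) is an assumption, the Fisher Exact Test fallback for low counts is omitted, and all function names are hypothetical. The p-value uses the identity that the survival function of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def fold_enrichment(ip_reads, input_reads, ip_total, input_total,
                    pseudocount=1):
    """IP/input ratio of library-size-normalized cluster read counts.

    The pseudocount is added to the input cluster count only.
    """
    ip_fraction = ip_reads / ip_total
    input_fraction = (input_reads + pseudocount) / input_total
    return ip_fraction / input_fraction


def yates_chi2_pvalue(a, b, c, d):
    """Yates-corrected chi-square p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 1.0
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / denominator
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def is_significant_peak(ip_reads, input_reads, ip_total, input_total,
                        min_fold=8.0, max_p=0.001):
    """Apply the fold-enrichment >= 8 and p <= 0.001 cutoffs."""
    input_adj = input_reads + 1  # pseudocount on the input cluster count
    fold = fold_enrichment(ip_reads, input_reads, ip_total, input_total)
    p = yates_chi2_pvalue(ip_reads, ip_total - ip_reads,
                          input_adj, input_total - input_adj)
    return fold >= min_fold and p <= max_p


if __name__ == "__main__":
    print(is_significant_peak(80, 9, 1_000_000, 1_000_000))
```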
Information content for significant peaks was calculated using the following formula, where c_i = number of IP reads overlapping the peak, i_i = number of input reads overlapping the peak, n_i = total IP reads, and m_i = total input reads.
Information content was summed across peaks to calculate the total information at peaks overlapping various transcript annotation regions.
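The formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes the relative-entropy form used in earlier eCLIP analyses, p_i × log2(p_i / q_i) with p_i = c_i / n_i and q_i = i_i / m_i; this form and the function names are assumptions and should be checked against the original publication before use.

```python
import math


def information_content(c_i, i_i, n_i, m_i):
    """Per-peak information, ASSUMING the form p * log2(p / q).

    p = c_i / n_i is the fraction of IP reads in the peak and
    q = i_i / m_i is the fraction of input reads in the peak.
    """
    p = c_i / n_i
    q = i_i / m_i
    return p * math.log2(p / q)


def total_information(peaks):
    """Sum information over (c_i, i_i, n_i, m_i) tuples for a set of peaks."""
    return sum(information_content(*peak) for peak in peaks)


if __name__ == "__main__":
    # A peak with 8% of IP reads and 1% of input reads.
    print(information_content(80, 10, 1000, 1000))
```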

Comparison of no-gel versus with-gel fold-enrichment
Peak-level fold-enrichment between samples was calculated by first taking all CLIPper clusters in the first (e.g. no-gel) sample and then calculating the fold-enrichment in IP versus input separately for each cluster in both samples (with-gel and no-gel), with a pseudocount of 1 added to the input read counts at each cluster. For the second (e.g. with-gel) sample in the comparison, if no reads in the IP overlapped the cluster, a pseudocount of 1 was used. The fold-enrichments for all clusters across both experiments were then plotted (Fig. 2E). This process was repeated for the inverse comparison (e.g. comparing fold-enrichment for all clusters identified in the with-gel experiment) (Sup. Fig. 2F).

Chimeric read analysis
Identification of chimeric reads was adapted from the method described by Moore et al. (Moore et al., 2015). Reads were first "reverse mapped" to a database of mature human or mouse miRNAs compiled from miRBase (v22) (Kozomara et al., 2019) by mapping the miRNA sequences to the reads using Bowtie (v1.2.2) (Langmead et al., 2009). Alignments of miRNA to sequencing reads were then filtered, keeping only positive strand alignments and prioritizing alignments with the fewest mismatches so that only one miRNA alignment was selected for each read. In each of these reads, sequences flanking the miRNA alignment were identified (the chimeric "target" portion in chimeric reads), and reads with target portions less than 18 nucleotides in length were discarded. As above, target portions were first mapped to a species-specific repeat element database, with mapped reads discarded. The remaining sequences were mapped to the human (hg38), mouse (mm10), or rat (rn6) genomes using STAR (v2.7.6a) and PCR duplicates were removed using umi_tools (v1.0.0). Annotation of target portions of chimeric reads was performed using transcript information for human (GENCODE v29), mouse (GENCODE vM25) or rat (RefGene) following the same hierarchy rules as described above for the annotation of peaks.
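The extraction of the target portion from a candidate chimeric read can be illustrated with a small sketch. Here an exact string search stands in for the Bowtie reverse-mapping step, so mismatches and the selection among competing miRNA alignments are not modeled; the function name is hypothetical.

```python
def target_portions(read, mirna, min_len=18):
    """Return flanks of the miRNA match that are at least min_len long.

    Exact matching is a simplification of the Bowtie-based reverse
    mapping described in the text. Sequences are given as DNA strings.
    """
    start = read.find(mirna)
    if start == -1:
        return []  # no miRNA found: not a candidate chimera
    end = start + len(mirna)
    flanks = [read[:start], read[end:]]
    return [flank for flank in flanks if len(flank) >= min_len]


if __name__ == "__main__":
    mir1 = "TGGAATGTAAAGAAGTATGTAT"  # miR-1-3p written as DNA
    read = mir1 + "ACGT" * 5         # miRNA followed by a 20nt target
    print(target_portions(read, mir1))
```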
For initial analyses, clusters were identified with CLIPper using all chimeric reads as input (with transcript annotations described above); for subsequent analyses, chimeric reads were first separated by miRNA, and clusters were then identified (CLIPper) separately for each miRNA using the miRNA-specific set of chimeric reads. Reproducible clusters were identified with an approach based on the Irreproducible Discovery Rate (IDR) method (Li et al., 2011) by ranking all clusters based on -log10 transformed CLIPper p-values, followed by running the IDR software (v 2.0.2). IDR-identified regions with IDR score ≥ 540 were used as 'reproducible' clusters.
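For orientation, the IDR software reports a scaled score defined as min(int(-125 × log2(IDR)), 1000), so a score of 540 corresponds to an IDR of 0.05. The sketch below assumes that the ≥ 540 cutoff refers to this scaled score; the function name is illustrative.

```python
import math


def scaled_idr_score(idr):
    """Scaled IDR score as reported by the IDR software output."""
    if idr <= 0:
        return 1000  # an IDR of 0 is capped at the maximum score
    return min(int(-125 * math.log2(idr)), 1000)


def is_reproducible(idr, min_score=540):
    """True if a cluster passes the scaled-score cutoff."""
    return scaled_idr_score(idr) >= min_score


if __name__ == "__main__":
    for value in (0.01, 0.05, 0.1):
        print(value, scaled_idr_score(value), is_reproducible(value))
```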

Calculation of miRNA and chimeric read abundance
As some miRNAs have multiple highly homologous genomic copies, quantification of miRNA abundance from only uniquely mapped reads yielded inaccurate estimates of expression. To address this for non-chimeric analyses, multimapping reads were also considered by taking reads that failed to map uniquely, repeating STAR mapping with the '--outFilterMultimapNmax=10000' option, selecting the primary alignment for each read, and then performing PCR duplicate removal as described above for uniquely mapping reads. This approach may miss occasional PCR duplicates that are assigned different primary alignments from multiple equal alignments; analysis indicates that this typically alters expression estimates by <2%.
As appropriate, per-miRNA and per-cluster abundance was normalized using three alternative approaches: RPM (Reads Per Million) calculated versus all uniquely mapped, non-PCR duplicate reads; RPMI (Reads Per Million Initial) calculated versus all sequenced reads (prior to any adapter trimming or other processing); and RPMC (Reads Per Million Chimeras) calculated versus all uniquely mapped, non-PCR duplicate chimeric reads. RPM of chimeric reads was calculated with a denominator of the sum of (uniquely mapped, PCR duplicate removed) chimeric reads and non-chimeric reads. RPM normalization of miRNA read counts in small RNA-seq was performed analogously to the calculation of non-chimeric RPM in chimeric eCLIP experiments.
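The three normalizations differ only in their denominators, as the following minimal sketch shows. Function and argument names are illustrative and not taken from the published pipeline.

```python
def per_million(count, denominator):
    """Scale a read count to reads per million of the given denominator."""
    return count * 1_000_000 / denominator


def normalize_counts(count, usable_reads, sequenced_reads, chimeric_reads):
    """Return RPM, RPMI, and RPMC for one miRNA or cluster.

    usable_reads:    uniquely mapped, non-PCR-duplicate reads
    sequenced_reads: all sequenced reads before any processing
    chimeric_reads:  uniquely mapped, non-PCR-duplicate chimeric reads
    """
    return {
        "RPM": per_million(count, usable_reads),
        "RPMI": per_million(count, sequenced_reads),
        "RPMC": per_million(count, chimeric_reads),
    }


if __name__ == "__main__":
    print(normalize_counts(50, 2_000_000, 10_000_000, 500_000))
```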

Analysis of human/rat mixing experiments
For human/rat mixing experiments, adapter trimming, mapping to human mature miRNAs (miRBase v22), removal of putative rRNA artifact miRNAs (Supplementary Table 1), and identification of potential chimeric reads were performed as described above (in pilot analyses, mapping to human miRNAs gave similar results as separate mapping to both human and rat miRNA annotations). After removing the miRNA region, the remaining putative chimeric fractions were first mapped to a repeat element database containing both human and rat elements (RepBase 18.05) with mapped fragments discarded. Next, remaining reads were separately mapped to human (hg38) and rat (rn6) genomes. For this analysis, genomic mapping was performed allowing multiple mapping (--outFilterMultimapNmax 99). Mapping was then compared between human and rat using STAR alignment score ('AS' flag), with 'species-specific' reads defined as those with an AS score at least 2 greater for one species than the other.
To identify species-specific miRNAs, first the number of putative chimeras (reads containing a miRNA plus at least 18nt of additional sequence on either the 5′ or 3′ end) was counted for all annotated miRNAs. Next, for the 3 with-gel pairs of human-only and rat-only chimeric eCLIP experiments, the fold-change in expression was calculated for each miRNA (after adding a pseudocount of 1 to both human and rat counts and normalizing against the total number for all miRNAs in the dataset). The set of 'species-specific' microRNAs was then defined as those with a fold-change greater than or equal to 10 in all three human-only/rat-only pairs. Analysis of putative false-positive chimeras was performed using only chimeras for these miRNAs for which the chimeric fraction also mapped species-specifically to human or rat (defined as reads that either mapped to only one of the two genomes, or for which the STAR mapping score (AS flag) was at least 2 larger for one species than the other). Annotations for rat transcripts used RefGene annotations from the UCSC Genome Browser (obtained 10/28/2019).
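The two species-assignment rules can be sketched as below. This is an illustration under stated assumptions: totals used for normalization are taken before adding the pseudocount, a miRNA is called rat-specific when the human/rat fold-change is at most 1/10 in all pairs, and all names are hypothetical.

```python
def assign_species(human_as, rat_as, min_diff=2):
    """Classify a target fragment from its STAR AS scores.

    None means the fragment did not map to that genome.
    """
    if human_as is None and rat_as is None:
        return None
    if rat_as is None:
        return "human"
    if human_as is None:
        return "rat"
    if human_as - rat_as >= min_diff:
        return "human"
    if rat_as - human_as >= min_diff:
        return "rat"
    return "ambiguous"


def species_specific_mirnas(pairs, min_fold=10.0):
    """pairs: list of (human_counts, rat_counts) dicts of chimera counts."""
    mirnas = set()
    for human, rat in pairs:
        mirnas |= set(human) | set(rat)
    calls = {}
    for mirna in mirnas:
        folds = []
        for human, rat in pairs:
            # Pseudocount of 1, normalized to each dataset's total.
            h = (human.get(mirna, 0) + 1) / sum(human.values())
            r = (rat.get(mirna, 0) + 1) / sum(rat.values())
            folds.append(h / r)
        if all(f >= min_fold for f in folds):
            calls[mirna] = "human"
        elif all(f <= 1 / min_fold for f in folds):
            calls[mirna] = "rat"
    return calls
```

Requiring the cutoff in every replicate pair keeps miRNAs with inconsistent species bias out of the false-positive estimate.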

Analysis of PCR-enriched chimeric eCLIP
DNA fragments in the targeted chimeric eCLIP libraries contain a 10nt UMI at the 3′ end of the insert sequence. These UMIs were pruned from the 5′ ends of R2 read sequences and saved by incorporating them into the read names in the R1 FASTQ files to be utilized in subsequent analysis steps. All subsequent steps were performed on R1 FASTQ files only. Next, 3′-adapters were trimmed from reads using cutadapt (v2.5) (Martin, 2011), and remaining reads shorter than 18 nucleotides in length were removed. The final 9 nucleotides at the 3′ end of each read were trimmed using cutadapt (v2.5) to remove potential UMI sequence at the 3′ end. The miRNA primer sequence used to select for the miRNA of interest was then used to select for chimeras and trimmed from the 5′ end of reads, and reads with remaining length shorter than 18 nucleotides were removed. Reads were then processed using the same steps as in the "Non-chimeric (standard eCLIP) analysis" section above.

Motif analysis of chimeric eCLIP clusters
First, each cluster was extended by 10nt on the 5′ side (to account for possible clusters that terminate at the protein-RNA crosslink site), and sequences were obtained from the appropriate species genome. Next, the presence or absence of every 6-mer was calculated for each extended cluster sequence, and percent frequency for each 6-mer was determined across either all clusters or subsets of clusters that met significance (CLIPper p ≤ 10^-5 or IDR ≥ 540) as indicated. Seed 6-mer sequences complementary to miRNA positions 2-7, 3-8, 1-6, or A1-6 (an A at position 1 followed by miRNA positions 2-6) were obtained from the major annotated miRNA sequence from miRBase.
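The seed-match 6-mers and their per-cluster frequency can be derived as in the following sketch. The A1-6 site is interpreted here as the reverse complement of miRNA positions 2-6 followed by an A in the target (the base opposite miRNA position 1); this interpretation and the function names are assumptions. Sites are written as DNA to match genomic cluster sequences.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def revcomp_site(mirna_segment):
    """Target-site DNA sequence complementary to an RNA miRNA segment."""
    return mirna_segment.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def seed_sixmers(mirna):
    """Seed-match 6-mers for a mature miRNA given 5' to 3' as RNA."""
    return {
        "2-7": revcomp_site(mirna[1:7]),
        "3-8": revcomp_site(mirna[2:8]),
        "1-6": revcomp_site(mirna[0:6]),
        # Complement of positions 2-6, then an A opposite position 1.
        "A1-6": revcomp_site(mirna[1:6]) + "A",
    }


def sixmer_percent_frequency(cluster_sequences, sixmer):
    """Percent of clusters containing the 6-mer at least once."""
    hits = sum(1 for seq in cluster_sequences if sixmer in seq)
    return 100.0 * hits / len(cluster_sequences)


if __name__ == "__main__":
    print(seed_sixmers("UGGAAUGUAAAGAAGUAUGUAU"))  # hsa-miR-1-3p
```

Because presence or absence is scored once per cluster, a cluster containing several copies of a 6-mer contributes the same as a cluster containing one.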

RNA-seq analysis
Reads in the RNA-seq libraries contain a 10 nucleotide UMI at the 5′ end of each read, which was pruned from the 5′ ends of R1 read sequences using umi_tools (v1.0.1). Next, 3′-adapters were trimmed from reads using cutadapt (v2.7), discarding reads with fewer than 18 nucleotides remaining. Next, reads were mapped to a database of human repetitive elements and rRNA sequences compiled from Dfam (Hubley et al., 2016) and GenBank (Benson et al., 2013). Reads that did not align to the repeats database were then mapped to the human genome (hg38) using STAR (v2.6.0c). PCR duplicates were removed using umi_tools (v1.0.1) by utilizing UMI sequences from the read names and mapping positions. Gene counts per sample were obtained using previously described pipelines for quantifying region-level coverage from eCLIP data (Van Nostrand et al., 2020a) as well as transcript information from GENCODE (v29). Differential expression analysis was performed using the R package DESeq2 (v 1.34.0) (Love et al., 2014).

sRNA-seq analysis
Reads in sRNA-seq libraries contain the sequence "AACTGTAGGCACCATCAAT" followed by a 12 nucleotide UMI at the 3′ end of each read. The "AACTGTAGGCACCATCAAT" sequence was identified within each read using cutadapt (v2.7) and reads that did not contain this sequence were discarded. Next, the UMIs were appended to the read names, and the "AACTGTAGGCACCATCAAT" sequence as well as the following 3′ sequence (this includes the UMI and sequencing adapter) were removed using a custom Python script. Reads were mapped to a database of human repetitive elements and rRNA sequences compiled from Dfam and GenBank. Reads that did not align to the repeat database were then mapped uniquely to the human genome (hg38) using STAR (v2.6.0c). PCR duplicates were removed using umi_tools (v1.0.1) by considering both UMI sequences and mapping positions. miRNA counts were obtained using a custom Python script and miRNA annotations from miRBase (v22). In order to allow relative comparison of abundances of miRNAs that are transcribed from multiple genomic locations, reads that failed to map uniquely were re-mapped allowing multimapping and selecting the primary alignment as described above for miRNA quantification with non-chimeric reads.
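The custom script used for UMI handling is not shown in the text. The sketch below is a hypothetical stand-in that illustrates the described logic: locate the fixed sequence, move the 12nt UMI into the read name, and keep only the insert. Discarding reads with a truncated UMI and the underscore separator in the read name are assumptions.

```python
FIXED_SEQUENCE = "AACTGTAGGCACCATCAAT"


def extract_umi(read_name, sequence, fixed=FIXED_SEQUENCE, umi_len=12):
    """Return (new_read_name, insert) or None if the read is discarded."""
    position = sequence.find(fixed)
    if position == -1:
        return None  # fixed sequence absent: read is discarded
    umi_start = position + len(fixed)
    umi = sequence[umi_start:umi_start + umi_len]
    if len(umi) < umi_len:
        return None  # assumption: drop reads with an incomplete UMI
    # Everything from the fixed sequence onward (UMI, adapter) is removed.
    return f"{read_name}_{umi}", sequence[:position]


if __name__ == "__main__":
    insert = "TGGAATGTAAAGAAGTATGTAT"
    read = insert + FIXED_SEQUENCE + "ACGTACGTACGT" + "AGATCGG"
    print(extract_umi("read1", read))
```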

miRNA seed match analysis for miRNA over-expression
To assess the frequency of miRNA seed matches in the 3′ UTRs of up- and down-regulated genes following miRNA transfections, a seed match was defined as the presence of 6-mers complementing miR-124 or miR-1 mature sequences in positions [2:7] or [3:8]. 3′ UTR sequences were obtained by merging overlapping and bookending regions with 3′ UTR annotations into one contig encompassing all 3′ UTR sequences associated with a gene (with N characters inserted when joining non-adjacent sequences in order to avoid creation of subsequences not present in a real isoform). Next, genes were sorted based on the DESeq2 'stat' value, and the Sylamer tool (van Dongen et al., 2008) was used to calculate a hypergeometric enrichment p-value for each 6-mer in growing bins that increase in size in steps from the beginning of the list to the end of the list, using a k-mer size of 6 nt (-k 6), a bin growth step size of 50 genes (-grow 50), and making evaluation of 6-mer counts conditional on the frequency of k-mers of up to 4 nt in length (-m 4).
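The construction of a per-gene 3′ UTR contig can be sketched as follows. This is a simplified illustration: coordinates are half-open intervals on a single strand, a single N is used as the separator (the text does not state how many N characters were inserted), and the `fetch_sequence` callback and function names are hypothetical.

```python
def merge_utr_regions(regions):
    """Merge overlapping and bookended (start, end) half-open intervals."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps or directly abuts the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def utr_contig(regions, fetch_sequence, separator="N"):
    """Join merged UTR sequences, separating non-adjacent blocks with N."""
    blocks = merge_utr_regions(regions)
    return separator.join(fetch_sequence(start, end) for start, end in blocks)


if __name__ == "__main__":
    genome = "ACGTACGTAC" + "G" * 10 + "TTTTT"
    regions = [(0, 5), (5, 8), (20, 25), (3, 6)]
    print(utr_contig(regions, lambda s, e: genome[s:e]))
```

The N separator prevents a 6-mer from being counted across the junction of two blocks that are never contiguous in a real transcript.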

Comparison of miRNA targets with miRNA over-expression
To connect information about miRNA target sites with changes in the transcriptome, each gene was assessed for presence of miRNA-specific chimeric clusters identified in three non-overlapping features (3' UTR, CDS or intronic) defined based on GENCODE (v29) annotations and feature hierarchy as described above in "Non-chimeric (standard eCLIP) analysis". When miRNA-specific chimeric clusters overlapping a gene were identified, the average chimeric read coverage of the cluster (calculated by taking the mean of two biological replicates) was used to describe chimeric coverage of the cluster. If more than one cluster was present in the same annotation type of the gene, then the site with the greatest chimeric coverage was used. Cutoffs for different analyses were implemented as indicated, typically using genes with either all reproducible clusters (chimeric RPMC ≥ 0 and IDR ≥ 540), or subsets of these with increased chimeric RPMC cutoffs. For analysis of 'probe-enriched only versus total and probe-enriched' targets (Sup. Fig. 7J-K), 'total and probe-enriched' were defined as genes with clusters with chimeric RPMC ≥ 100 and IDR ≥ 540 in both probe-capture and total (non-enriched) experiments, and 'probe-enriched only' were all other genes with clusters with chimeric RPMC ≥ 100 and IDR ≥ 540 in the probe-capture experiment. For analysis of CDS versus 3' UTR and intronic targets, genes were included in the 'CDS only' class if they contained a chimeric cluster (chimeric RPMC ≥ 100 and IDR ≥ 540) in the CDS region and lacked any clusters for both 3' UTR and intronic regions (chimeric RPMC ≥ 0) (with equivalent rules for '3' UTR only' and 'Intronic only').
In addition to annotating genes based on chimeric eCLIP target sites in three different features (3′ UTR, CDS and intronic features), genes were also grouped based on predictions of 896 miR-1-3p targets and 1,820 miR-124-3p targets by TargetScan version 7.2 (Agarwal et al., 2015), of which 862 miR-1-3p and 1,769 miR-124-3p targets overlapped GENCODE v29 gene identifiers. For analysis of overlap between TargetScan and chimeric eCLIP data, genes were first separated based on presence or absence of TargetScan predicted targets, and then based on whether they contained a cluster (chimeric RPMC ≥ 100 and IDR ≥ 540) in the probe-capture chimeric experiment.

Software Availability:
The primary data processing pipeline for chimeric eCLIP data is available at https://github.com/YeoLab/chim-eCLIP. Other custom scripts are available upon reasonable request.