Nanopore direct RNA sequencing maps an Arabidopsis N6 methyladenosine epitranscriptome

Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3’ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.

We previously mapped Arabidopsis mRNA 3′ ends transcriptome-wide using Helicos 121 sequencing 20 . We compared the position of 3′ ends of nanopore DRS read alignments and Helicos 122 data genome-wide. The median genomic distance between nanopore DRS and Helicos 3′ ends was 0 123 ± 13 nt (one standard deviation) demonstrating close agreement between these orthogonal 124 technologies ( Figure 2B). Likewise, the overall distribution of the 3′ ends of aligned nanopore DRS 125 reads resembles the pattern we previously reported with Helicos data 20 . For example, 97% of 126 nanopore DRS 3′ ends (4,152,800 reads at 639,178 unique sites, 93% of all unique sites) mapped to 127 either annotated 3′ untranslated regions (UTRs) or downstream of the current annotation. Mapping 128 of 3′ ends to coding sequences or 5′UTRs was rare (2.8%, 119,524 reads at 39,610 unique sites, 5.8% 129 of all unique sites), and mapping to introns even rarer (0.29%, 12,554 reads at 7,791 unique sites, Since RT-dependent internal priming can result in the misinterpretation of authentic 135 cleavage and polyadenylation sites 3 , we next determined whether nanopore DRS was compromised 136 in this way. To address this issue, we examined whether the 3′ ends of nanopore DRS reads mapped 137 to potential internal priming substrates comprised of six consecutive adenosines within a 138 transcribed coding sequence (according to the Arabidopsis Information Portal Col-0 genome 139 annotation, Araport11 ). Of the 10,116 such oligo (A)6 sequences, only four have read alignments 140 terminating within 13 nt in all four datasets. Of these, two were not detectable after error correction 141 with proovread (suggesting that they resulted from alignment errors) and the other two mapped to 142 the terminal exon of coding sequence annotation, indicating that they may be authentic 3′ ends. 143 Hence, internal priming is rare or absent in nanopore DRS data. Overall, we conclude that nanopore 144 DRS can identify multiple authentic features of RNA 3′ end processing. 145 146 Cap-dependent 5′ RNA detection by nanopore DRS 147 Nanopore DRS reads are frequently truncated prior to annotated transcription start sites, resulting 148 in a 3′ bias of genomic alignments ( Figure 3A) 15 . Consequently, it is impossible to determine which, if 149 any, aligned reads correspond to full-length transcripts. To address this issue, we used cap-150 dependent ligation of a biotinylated 5′ adapter RNA to purify capped mRNAs. We then re-sequenced 151 two biological replicates of Arabidopsis Col-0 incorporating 5′ adapter ligation (Supplementary 152 table 1) and filtered the reads for 5′ adapter RNA sequences using the sequence alignment tool 153 BLASTN and specific criteria (Supplementary table 2 In order to determine whether the 5′ ends we detected reflect full-length mRNAs, we 162 compared them against annotated transcription start sites in datasets derived from full-length 163 Arabidopsis cDNA clones 23 . We found that 41% of adapter-ligated nanopore DRS reads mapped 164 within 5 nt of transcription start sites and 60% mapped within 13 nt ( Figure 3C). We also detected 165 recently defined examples of alternative 5′ transcription start sites 24 at specific Arabidopsis genes 166 (Supplementary figure 3D). We therefore conclude that this approach is effective in detecting 167 authentic mRNA 5′ ends. 168 Reads with adapters had, on average, 11 nt more at their 5′ ends that could be aligned to 169 the genome compared with the most common 5′ alignment position of reads lacking the 5′ adapter 170 RNA ( Figure 3D). This difference may be explained by loss of processive control by the motor protein 171 when the end of an RNA molecule enters the pore. As a result, the 5′ end of RNA is not correctly 172 sequenced. Consistent with these Arabidopsis transcriptome-wide nanopore DRS data, reads 173 mapping to the synthetic ERCC RNA Spike-Ins and in vitro transcribed RNAs also lacked ~11 nt of 174 authentic 5′ sequence (Supplementary figure 3E, F). However, the precise length of 5′ sequence 175 missing from all of these RNAs varied, suggesting that sequence-or context-specific effects on 176 sequence accuracy are associated with the passage of 5′ RNA through the pore ( Figure 3D (Figure 4B, Supplementary table 3). We applied proovread 14,15  196   error correction with the parallel Illumina RNAseq data and then re-analysed the corrected and  197 uncorrected nanopore DRS data. After error correction, only 13% (39,061) of unique splice junctions 198 were unsupported by an orthogonal dataset, consistent with an improvement in alignment accuracy. 199 The four nanopore DRS datasets for Col-0 biological replicates captured 75% (102,486) and 69% 200 104,686) of Araport11 and AtRTD2 splice site annotations, respectively. Most of the canonical 201 GU/AG splicing events (100,450; 81%) detected in the error-corrected nanopore data were found in 202 both annotations and were supported by Illumina RNAseq ( Figure 4B, Supplementary table 3). A 203 total of 3,234 unique canonical splicing events in the error-corrected nanopore DRS data were 204 supported by Illumina RNAseq but absent from both Araport11 and AtRTD2 annotations, 205 highlighting potential gaps in our understanding of the complexity of Arabidopsis splicing annotation 206 (Figure 4B, Supplementary table 3). Consistent with this, we validated three of these splicing events 207 using RT-PCR (Polymerase Chain Reaction) followed by cloning and sequencing (Supplementary 208 figure 4A). In order to examine the features of these unannotated splices, we applied previously 209 determined splice site position weight matrices of the flanking sequences to categorize U2 or U12 210 class splice sites 28 . Of the 3,234 novel GU/AG events found in error-corrected data and supported by 211 Illumina alignments, 74% were classified as canonical U2 or U12 splice sites, suggesting that they are 212 authentic (Supplementary figure 4B). 213 In addition to previously unannotated splicing events, we identified unannotated 214 combinations of previously established splice sites. For example, we identified 19 FLM splicing 215 patterns that adhered to known splice junction sites ( Figure 4A). However, 11 of these transcript 216 isoforms were not previously annotated. In order to investigate this phenomenon transcriptome-217 wide, we analysed the 5′ cap-dependent nanopore DRS datasets of full-length mRNAs 218 (Supplementary table 1). Unique sets of co-splicing events were extracted from error-corrected 219 reads (so as to focus on splicing, we did not consider single exon reads or 5′ and 3′ positions). In 220 total, 13,064 unique splicing patterns were detected that matched annotations in Araport11, 221 AtRTD2 or both ( Figure 4C). Another 8,659 unique splicing patterns were identified that were not 222 present in either annotation (Figure 4C, Supplementary table 3). Of these, 50% (4,293) used only 223 splice donor and acceptor pairs that were already annotated in either Araport11 or AtRTD2. Hence, 224 this approach defines splicing patterns (including retained introns) produced from alternative 225 combinations of known splice sites. 226 Overall, we conclude that nanopore DRS can reveal a greater complexity of splicing in the 227 context of full-length mRNAs compared with short-read data. However, accurate splice pattern 228 detection benefits from error correction with, for example, high-accuracy orthogonal short-read 229 sequencing data. However, even with error-free sequences, accurate splice detection can be 230 confounded by the existence of equivalent alternative junctions 29 . Therefore, improved 231 computational tools are required, not only for error correction but also for splicing-aware long-read 232 alignment. 233 234 Differential error site analysis reveals the m 6 A epitranscriptome 235 The epitranscriptome has emerged recently as a crucial, but relatively neglected, layer of gene 236 regulation 1,2 . m 6 A has been mapped transcriptome-wide using approaches based on antibodies that 237 recognize this mark 6,30 . However, in principle, m 6 A can be detected by nanopore DRS 8 . Since m 6 A is 238 not included in the training data for nanopore base-calling software, we asked whether its incorrect 239 interpretation could be used to identify Arabidopsis m 6 A transcriptome-wide. For this, we applied 240 nanopore DRS to four biological replicates of an Arabidopsis mutant defective in the function of 241 Virilizer (vir-1), a conserved m 6 A writer complex component, and four biological replicates of a line 242 expressing VIR fused to Green Fluorescent Protein (GFP) that restores VIR activity in the vir-1 mutant 243 background 9 (Supplementary figure 5A). In parallel, we sequenced a set of six biological replicates 244 with Illumina RNAseq. We then used a G-test statistical analysis to determine whether there was a 245 differential error profile in alignments at each reference base between the mutant (defective m 6 A) 246 and VIR-complemented lines. We identified 17,491 sites with a more than two-fold higher error rate 247 (compared with the TAIR10 reference base) in the VIR-complemented line with restored m 6 A 248 ( Figure 5A). No VIR-dependent error sites mapped to either chloroplast or mitochondrial-encoded 249 RNAs. In all, 99.8% of the differential error sites mapped within Araport11 annotated protein-coding 250 genes. Motif analysis of these error sites revealed the DRAYH sequence (D=G or U or A, R=G or A, 251 Y=C or U, H=A or C or U; Figure 5B, E value=3.3×10 -191 ), which closely resembles the established m 6 A 252 target consensus 1,2 . In addition, like the established location of m 6 A sites in mRNAs 1,2 , the error sites 253 were preferentially found in 3′UTRs ( Figure 5C these results agree with the established and conserved properties of authentic m 6 A sites 1,2 , 259 suggesting that differential error sites can be used to identify thousands of m 6 A modifications in 260 nanopore DRS datasets. 261 In order to examine the validity of m 6 A sites identified by the differential error site analysis, 262 we used an orthogonal technique to map m 6 A. Previous maps of Arabidopsis m 6 A are based on Me-263 RIP 31,32 and limited by a resolution of around 200 nt 33 . Therefore, to examine Arabidopsis m 6 A sites 264 with a more accurate method, we used miCLIP 30 analysis of three biological replicates of Arabidopsis 265 Col-0. We found that, like the differential error sites uncovered in the nanopore DRS analysis, the 266 Arabidopsis miCLIP reads were enriched in 3′ UTRs but with no enrichment over stop codons 267 ( Figure 5D, Supplementary figure 5C). In all, 77% of the called nanopore DRS differential error sites 268 fell within 5 nt of an miCLIP peak ( Figure 5E, F). We therefore conclude that our analysis of nanopore 269 data can detect authentic VIR-dependent m 6 A sites transcriptome-wide. 270 271

Defective m 6 A perturbs gene expression patterns and lengthens the circadian period 272
The combination of transcript processing and modification data obtained using nanopore DRS 273 enabled us to investigate the impact of m 6 A on Arabidopsis gene expression. We found a global 274 reduction in protein-coding gene expression in vir-1 (using either nanopore DRS or Illumina RNAseq 275 data), corresponding to transcripts that were methylated in the VIR-complemented line ( Figure 6A, 276 Supplementary Figure 6A). These findings are consistent with the recent discovery that m 6 A 277 predominantly protects Arabidopsis mRNAs from endonucleolytic cleavage 32 . Therefore, although 278 the m 6 A writer complex comprises conserved components that target a conserved consensus 279 sequence and distribution of m 6 A is enriched in the last exon, it appears that this modification 280 predominantly promotes mRNA decay in human cells 34 , but mRNA stability in Arabidopsis 32 . 281 The changes in gene expression in vir-1 were wide ranging (Supplementary figure 6B-D). For 282 example, we found that the abundance of mRNAs encoding components of the interlocking 283 transcriptional feedback loops that comprise the Arabidopsis circadian oscillator 35 , such as CCA1, 284 were altered in vir-1 (Supplementary figure 6B, C). This distinction was associated with a biological 285 consequence in that the vir-1 mutant had a lengthened clock period ( Figure 6B, C). We detected m 6 A 286 at mRNAs encoding the clock components CCA1, PRR7, GI and LNK1/2 in both the nanopore DRS and 287 miCLIP data (Supplementary figure 6B). We also detected shifts in the poly(A) tail length 288 distributions of mRNAs transcribed from genes previously shown to be under circadian control. At 289 CAB1 mRNAs, for example, we detected poly(A) tail lengths that peaked at approximately 20, 40 and 290 60 nt ( Figure 6D). vir-1 mutants had reduced abundance of CAB1 mRNAs with 20 and 40 nt poly(A) 291 tails, and an increased abundance of poly(A) tails of 60 nt in length ( Figure 6D). Therefore, nanopore 292 DRS may uncover the circadian control of poly(A) tail length, as previously reported for specific 293 Arabidopsis genes 36 . An output of the circadian clock is the control of flowering time, and we found 294 that not only were photoperiod pathway components differentially expressed but so too were other 295 flowering time genes (Supplementary table 4). Notably, FLOWERING LOCUS C (FLC) expression was 296 reduced by more than 40-fold compared with the wild type (Supplementary figure 6D). 297 Consequently, the proper control of circadian rhythms, flowering time and the regulatory module at 298 FLC ultimately requires the m 6 A writer complex component, VIR. 299 300 Defective m 6 A is associated with disrupted RNA 3′ end formation 301 In addition to measuring RNA expression, we examined the impact of m 6 A loss on pre-mRNA 302 processing. Detectable disruptions to splicing in vir-1 were modest. For example, using the DEX-Seq 303 software tool 37 for analysis of annotated splice sites, we found only weak effect-size changes to 304 cassette exons, retained introns or alternative donor/acceptor sites compared with the VIR-305 complemented line in our Illumina data (Supplementary figure 6E). In contrast, a clear defect in RNA 306 3′ end formation in vir-1 was apparent. Using a Kolmogorov-Smirnov test, we identified 3,579 genes 307 with an altered nanopore DRS 3′ position profile in the vir-1 mutant compared with the VIR-308 complemented line (FDR < 0.05, absolute change in position of >13 nt; Figure 6E). Of these, 3,008 309 displayed a shift to usage of more proximal poly(A) sites in vir-1: 60% of these genes also contain 310 m 6 A sites detectable by nanopore DRS (compared with 32% of all expressed genes, p=1.1×10 -266 ; 311 70% were detectable by miCLIP compared with 43% of all expressed genes, p=3.5×10 -237 ) and 312 correspond to locations of increased cleavage downstream of m 6 A sites in the vir-1 mutant 313 (Supplementary figure 6F). A total of 571 genes showed increased transcriptional readthrough 314 beyond the 3′ end in vir-1 ( Figure 6E): 73% of these loci also contained nanopore-mapped m 6 A sites 315 (p=1.2×10 -90 ; 78% were detectable by miCLIP, p=2.1×10 -66 ). The impacts of altered 3′ processing can 316 be complex and have the potential to change the relative abundance of transcripts processed from 317 the same gene but with different coding potential. For example, we detected increased readthrough 318 of an intronic cleavage site in the Symplekin-like gene TANG1 (AT1G27595; Supplementary 319 figure 6G) and increased readthrough and cryptic splicing at ALG3 (AT2G47760) that also results in 320 chimeric RNA formation with the downstream gene GH3.9 (AT2G4G7750; Figure 6F). The existence 321 of the ALG3-GH3.9 chimeric RNAs was supported by Illumina RNAseq ( Figure 6F) and confirmed by 322 RT-PCR, cloning and sequencing (Supplementary figure 6H). We detected 523 loci with increased 323 levels of chimeric RNAs in vir-1 resulting from unterminated transcription proceeding into 324 downstream genes on the same strand. Chimeric RNAs were recently detected in mutants affecting 325 other components of the Arabidopsis m 6 A writer complex, MTA and FIP37 38 . However, only 33% of 326 upstream genes forming the chimeric RNAs had detectable m 6 A sites in the VIR-complemented line 327 with restored VIR activity. Consequently, these findings might be explained either by an m 6 A-328 independent role for VIR (or the writer complex) in 3′ end formation or an indirect effect on factors 329 required for 3′ processing. m 6 A independent roles for the human m 6 A methyltransferases METTL3 39 330 and METTL16 40 have been found previously, and a role for the writer complex in controlling 331 Arabidopsis RNA processing independent of m 6 A cannot be overlooked 40 . In mammals, recognition 332 of the canonical poly(A) signal AAUAAA involves direct binding by CPSF30 41,42 . Notably, alternative 333 polyadenylation controls the expression of an Arabidopsis CPSF30 isoform that encodes a YT521-B 334 homology (YTH) domain with the potential to bind and read m 6 A 43 . A recent study indicated that this 335 YTH domain-containing isoform is required to supress chimera formation 38 . Consequently, in plants, 336 m 6 A may also contribute to the recognition of specific RNA 3′ ends. 337 338

Concluding remarks 339
We have shown that nanopore DRS has the potential to refine multiple features of Arabidopsis 340 genome annotation and to generate the clearest map to date of m 6 A locations, despite the genome 341 sequence being available since 2000 44 . Modern agriculture is dominated by a handful of intensely 342 researched crops 45 , but to diversify global food supply, enhance agricultural productivity and tackle 343 malnutrition there is a need to focus on crops utilized in rural societies that have received little 344 attention for crop improvement 46 . Based on our experience with Arabidopsis, we anticipate that the 345 combination of nanopore DRS and other sequencing approaches will improve genome annotation. 346 Consistent with this, we recently applied nanopore DRS to refine the annotation of water yam 347 (Dioscorea alata), an African orphan crop. Indeed, we are moving into an era where thousands of 348 genome sequences are available and programmes such as the Earth BioGenome Project aim to 349 sequence all eukaryotic life on Earth 47 . However, genome sequences provide only part of the puzzle: 350 annotating what they encode will be essential for us to fully utilize this information.  Transcript-level counts for Illumina RNA sequencing reads were estimated by pseudoalignment with 420 Salmon version 0.11.2 53 . Counts were aggregated to gene level using tximport 54  Helicos data were prepared as previously described 20,63 . Positions with three or more supporting 522 reads were considered to be peaks of nanopore or Helicos 3′ ends. The distance between each 523 nanopore peak and the nearest Helicos peak was then determined. In all, 37% of nanopore peaks 524 occurred at the same position as a Helicos peak, and the standard deviation in distance was 12.5 nt. 525 To determine the percentage of nanopore DRS 3′ ends mapping within annotated genic features, 526 transcripts were first flattened into a single record per gene. Exonic annotation was given priority 527 over intronic or intergenic annotation and CDS annotation was given priority over UTR annotation. 528 Reads were assigned to genes if they overlapped them by >20% of their aligned length, and the 529 annotated feature type of the 3′ end position was determined. Counts were generated both for all 530 reads and for unique positions per gene. 531 532 Isoform collapsing of nanopore DRS data 533 Error-corrected full-length alignments were collapsed into clusters of reads with identical sets of 534 introns. These clusters were then subdivided by 3′ end location by using a Gaussian kernel with 535 sigma of 100 to find local minima between read ends, which were used as cut points to separate 536 clusters. The read with the longest aligned length in each cluster was used as the representative in 537 To validate novel splice junctions detected in nanopore DRS, five splice sites out of the 20 most 560 highly expressed splice sites were selected for further validation; three of the five selected splice 561 sites were successfully amplified in RT-PCR followed by Sanger sequencing (described above). 562 563 5′ adapter detection analyses using nanopore DRS data 564 To produce positive and negative examples of 5′ adapter-containing sequences, 5′ soft-clipped 565 regions were extracted from aligned reads for the Col-0 replicate 1 datasets (with and without 566 adapter ligation) using pysam 62 . These soft-clipped sequences were then searched for the presence 567 of the GeneRacer adapter sequence using BLASTN version 2.7.1 67 . Two rules were initially applied to 568 filter BLASTN results: a match of 10 nt or more to the 44 nt adapter, and an E value of <100. Reads 569 from the adapter-containing dataset that failed one or both criteria were used as negative training 570 examples. A final rule requiring the match to the adapter sequence to occur directly adjacent to the 571 aligned read was also applied. Reads from the adapter-containing dataset that passed all three rules 572 were used as the positive training set. When comparing the ratio of positive to negative examples 573 between datasets containing the adapter and those generated from the same tissue but without the 574 adapter, we found that these three rules gave a signal-to-noise ratio of >5,000 (Supplementary  575   table 2). 576 In all, 72,083 positive and 123,739 negative training examples from Col-0 tissue replicate 1 577 were collected to train the neural network. We then estimated the amount of raw signal from the 5′ 578 end of the squiggle that was required on average to capture the 5′ adapter. To do this, we used 579 nanopolish eventalign version 0.11.0 68 to identify the interval in the raw read corresponding to the 580 mRNA alignment to the reference in the positive examples of 5′ adapter-containing sequences. Since 581 the adapter can be identified immediately adjacent to the alignment in sequence space for these 582 reads, the signal after the event alignment should correspond to the signal originating from the 583 adapter. The median length of these signals was 1,441 points, and 96% of the signals were <3,000 584 points. Therefore, we used a window size of 3,000 to make predictions. optimizer with an initial learning rate of 0.001, which was reduced by a factor of 10 after three 597 epochs with no reduction in validation loss. Early stopping was used after five epochs with no 598 reduction in validation loss. We found that a number of negative training examples from the ends of 599 reads, but not from internal positions, were likely to be incorrectly labelled by the BLASTN method, 600 because the model predicted them to contain adapters. We therefore filtered these to clean the 601 training data, before repeating the training process. Model performance was evaluated using five-602 fold cross validation and by testing on independently generated datasets from Col-0 replicate 2, 603 produced with and without the adapter ligation protocol (Supplementary figure 3B, C) 69,70 . 604 To evaluate the reduction in 3′ bias of adapter-ligated datasets, we used Araport11 exon 605 annotations to produce per base coverage for each gene in the Col-0 replicate 2 dataset. Coverage 606 was generated separately for reads predicted to contain adapters and for those predicted not to 607 contain adapters. Leading zeros at the extreme 5ʹ and 3ʹ ends of genes were assumed to be caused 608 by mis-annotation of UTRs and so were trimmed. The quartile coefficient of variation (interquartile 609 range / median) was then used as a robust measure of variation in coverage across each gene. To 610 validate the 5ʹ ends of adapter-ligated reads with orthogonal data, full-length cDNA clone sequences 611 were downloaded from RIKEN RAFL (Arabidopsis full-length cDNA clones) 23 . These were mapped 612 with minimap2 13 in spliced mode. The distance from each nanopore alignment 5′ end to the nearest 613 RIKEN RAFL alignment 23 . 5′ end was calculated using bedtools 58 . The amount of 5′ end sequence that 614 is rescued when 5′ adapters are used was estimated by identifying the largest peak in 5′ end 615 locations per gene in the absence of adapter, and then measuring how this peak shifted using reads 616 predicted to contain adapters. 617

Differential error site analysis using nanopore DRS data 619
To detect sites of Virilizer-dependent m 6 A RNA modifications, we developed scripts to test changes 620 in per base error profiles of aligned reads. Pileup columns for each position with coverage of >10 621 reads were generated using pysam 62 and reads in each column were categorized as either A, C, G, T 622 or indel. The relative proportions of each category were counted. Counts from replicates of the 623 same experimental condition were aggregated and a 2×5 contingency table was produced for each 624 base comparing vir-1 and VIR-complemented lines. A G-test was performed to identify bases with 625 significantly altered error profiles. For bases with a p-value of <0.05, G-tests for homogeneity 626 between replicates of the same condition were then performed. Bases where the sum of the G 627 statistic for homogeneity tests was greater than the G statistic for the vir-1 and VIR-complemented 628 line comparison were filtered. Multiple testing correction was carried out using the Benjamini-629 Hochberg method, and an FDR threshold of 0.05 was used. The log2 fold change in mismatch to 630 match ratio (compared with the reference base) between the vir-1 and VIR-complemented lines was 631 calculated using Haldane correction for zero counts. Bases that had a log fold change of >1 were 632 considered to have a reduced error rate in the vir-1 mutant. 633 To identify motifs enriched at sites with a reduced error rate, reduced error rate sites were 634 increased in size by 5 nt in each direction and overlapping sites were merged using bedtools version 635 2.27.1 58 . Sequences corresponding to these sites were extracted from the TAIR10 reference and 636 over-represented motifs were detected in the sequences using MEME version 5.0.2 73 , run in zero or 637 one occurrence mode with a motif size range of 5-7 and a minimum requirement of 100 sequence 638 matches. The presence of these motifs at error sites was then detected using FIMO version 5.0.2 74 . A 639 relaxed FDR threshold of 0.1 was used with FIMO to capture more degenerate motifs matching the 640 m 6 A consensus. 641 642 Differential gene expression analysis using nanopore DRS data 643 Gene level counts were produced for each nanopore DRS replicate using featureCounts version 644 1.6.3 75 in long-read mode with strand-specific counting. Differential expression analysis between the 645 vir-1 and VIR-complemented lines was then performed in R version 3.5 using edgeR version 3.24.3 55 . 646 647 Identification of alternative 3ʹ end positions and chimeric RNA using nanopore DRS data 648 Genes with differential 3′ end usage were identified by producing 3′ profiles of reads which 649 overlapped with each annotated gene locus by >20%. These profiles were then compared between 650 the vir-1 and VIR-complemented lines using a Kolmogorov-Smirnov test to identify changes. 651 Multiple testing correction was performed using the Benjamini-Hochberg method. To approximately 652 identify the direction and distance of the change, the normalized single base level histograms of the 653 VIR-complemented line profile was subtracted from that of the mutant profile, and the minimum 654 and maximum points in the difference profile were identified. These represent the sites of most 655 reduced and increased relative usage, respectively. Results were filtered for an FDR of <0.05 and 656 absolute change of site >13 nt (the measured error range of nanopore DRS 3′ end alignment). The 657 presence of m 6 A modifications at genes with differential 3′ end usage was assessed using bedtools 658 intersect 58 , and significant enrichment of m 6 A at these genes was tested using a hypergeometric test 659 (using all expressed genes as the background population). 660 To identify genes with a significant increase in chimeras in the vir-1 mutant, we used 661 Araport11 annotation 64 to identify reads that overlapped with multiple adjacent gene loci (i.e. 662 chimeric reads) and those originating from a single locus (i.e. non-chimeric reads). To reduce false 663 positives caused by reads mapping across tandem duplicated loci, reads mapping to genes 664 annotated in PTGBase 76 were filtered out. Chimeric reads were considered to originate from the 665 most upstream gene with which they overlapped. We pooled reads from replicates for each 666 experimental condition and used 50 bootstrapped samples (75% of the total data without 667 replacement) to estimate the ratio of chimeric to non-chimeric reads at each gene in each condition. 668 Haldane correction for zero counts was applied. The distributions of chimeric to non-chimeric ratios 669 in the vir-1 and VIR-complemented lines were tested using a Kolmogorov-Smirnov test to detect loci 670 with altered chimera production. All possible pairwise combinations of VIR-complemented and vir-1 671 bootstraps were then compared to produce a distribution of estimates of change in the chimeric to 672 non-chimeric ratio in the vir-1 mutant. Loci that had more than one chimeric read in vir-1, 673 demonstrated at least a two-fold increase in the chimeric read ratio in >50% of bootstrap 674 comparisons and were significantly changed at a FDR of <0.05 were considered to be sites of 675 increased chimeric RNA formation in the vir-1 mutant. 676 677 miCLIP 678

Preparation of miCLIP libraries 679
Total RNA for miCLIP was isolated from 7.5 mg of 14-day old Arabidopsis Col-0 seedlings as 680 previously described 77 . mRNA was isolated from ~1 mg total RNA using oligo(dT) and streptavidin 681 paramagnetic beads (PolyATtract mRNA Isolation Systems, Promega) according to the 682 manufacturer's instructions. miCLIP was carried out using 15 µg mRNA as previously described 78 683 using an antibody against N6-methyladenosine (#202 003 Synaptic Systems), with minor 684 modifications. No-antibody controls were processed throughout the experiment. RNA-antibody 685 complexes were separated by 4-12% Bis-Tris gel electrophoresis at 180 V for 50 min and transferred 686 to nitrocellulose membranes (Protran BA85 0.45µm, GE Healthcare) at 30 V for 60 min. Membranes 687 were then exposed to Medical X-Ray Film Blue (Agfa) at −80°C overnight. Reverse transcription was 688 carried out using barcoded RT primers: RT41, RT48, RT49 and RT50 (Integrated DNA Technologies;  689   Supplementary table 5)