Epigenomic reprogramming of repetitive noncoding RNAs and IFN-stimulated genes by mutant KRAS

RAS genes are the most frequently mutated oncogenes in cancer. However, the effects of oncogenic RAS signaling on the noncoding transcriptome are unclear. We analyzed the transcriptomes of human airway epithelial cells transformed with mutant KRAS to define the landscape of KRAS-regulated noncoding RNAs. We found that oncogenic KRAS upregulates noncoding transcripts throughout the genome, many of which arise from transposable elements. These repetitive noncoding RNAs exhibit differential RNA editing in single cells, are released in extracellular vesicles, and are known targets of KRAB zinc-finger proteins, which are broadly down-regulated in mutant KRAS cells and lung adenocarcinomas. Moreover, mutant KRAS induces IFN-stimulated genes through both epigenetic and RNA-based mechanisms. Our results reveal that mutant KRAS remodels the noncoding transcriptome through epigenomic reprogramming, expanding the scope of genomic elements regulated by this fundamental signaling pathway and revealing how mutant KRAS induces an intrinsic IFN-stimulated gene signature often seen in ADAR-dependent cancers.


INTRODUCTION
Most of the human genome is noncoding and transcribed into RNA(1). About half of the human genome is comprised of transposable elements (TE) (2), whose expression patterns are often altered in cancer (3). Additionally, TEs contribute substantially to the noncoding transcriptome and are present in the exonic sequences of thousands of long noncoding RNAs (lncRNAs) (4,5). Noncoding RNAs become disrupted in cancer (6) and epigenetic reprogramming (7), where activation of RAS signaling leads to repression of specific microRNAs (8) and coordinate regulation of lncRNAs in single cells, respectively.
In lung cancers, RAS mutations are present in a third of lung adenocarcinomas (9) and serve as driver mutations that initiate tumorigenesis (10). Although RAS genes are among the most frequently mutated oncogenes in cancer (11), including pancreatic (12) and colorectal cancers (13), the extent to which oncogenic RAS signaling regulates the noncoding transcriptome remains unknown.
To investigate how mutant KRAS regulates the noncoding transcriptome during the initial stages of cellular transformation, we characterized the composition of proteincoding RNA, lncRNA, transposable-element derived repetitive noncoding RNA, and extracellular RNA, as well as global chromatin accessibility, using human airway epithelial cells (14) that have a constitutively active mutant KRAS(G12D) allele. We show that oncogenic KRAS induces cell-intrinsic interferon (IFN)-stimulated gene (ISG) signatures through epigenetic and RNA-mediated mechanisms, and that KRAB zinc finger (KZNF) genes that repress repetitive noncoding RNA loci are globally down-regulated both in vitro and in vivo in lung adenocarcinoma patients with activating mutations in KRAS. Our data reveal that significant upregulation and extracellular release of repetitive noncoding RNAs and ISGs are early transcriptomic signatures of mutant KRAS signaling.

Transcriptomic signatures of mutant KRAS
To determine the transcriptomic landscape of protein-coding and noncoding RNAs regulated by oncogenic RAS signaling, we performed RNA sequencing (RNA-seq) on human airway epithelial cells (AALE) that undergo malignant transformation upon introduction of mutant KRAS (Supplementary Figure 1a) (14). We compared the transcriptomes of AALE cells transduced with control lentiviral vector to AALEs that were transduced and transformed by mutant KRAS-containing lentiviral vector and performed differential expression analysis. We identified many protein-coding RNAs that were significantly differentially expressed (n=4323 upregulated, n=4711 down-regulated), as well as hundreds of differentially expressed lncRNAs (n=279 upregulated, n=409 down-

Mutant KRAS induces an intrinsic IFN-stimulated gene signature
To explore the biological pathways that are perturbed by oncogenic RAS signaling, we performed gene set enrichment analysis (GSEA) (15) using genes that were differentially expressed in our mutant KRAS AALE cells. GSEA revealed that the most significantly enriched pathway was the interferon (IFN) alpha response, while the third most enriched pathway was the IFN gamma response (Supplementary Figure 1c). We examined all known ISGs that were expressed in our mutant KRAS AALEs and found that most of these genes were significantly upregulated, comprising a 25-gene mutant KRAS ISG signature (Fig. 1a). We then compared our KRAS ISG signature to IFN alpha (INFa) and IFN gamma (IFNg) response genes, along with the ADAR-dependent ISG signature described in Liu et al. (16). While the majority of KRAS ISG signature genes overlapped with the other 3 IFN-related signatures, several genes were unique to mutant KRAS cells (Fig. 1b). These results reveal that mutant KRAS signaling activates an intrinsic ISG response in transformed AALEs.
We then investigated whether this mutant KRAS ISG signature was specific to lung cells or if other cell types responded similarly. We performed RNAseq on human embryonic kidney cells (HA1E) that were primed for oncogenic KRASdriven transformation (17) and analyzed how mutant KRAS altered their transcriptomes.
Similar to transformed AALEs, we also observed that thousands of protein-coding RNAs (n=2635 up, n=2639 down) and hundreds of lncRNAs were upregulated (n=165) or downregulated (n=223) (Supplementary Figure 1d). When we performed GSEA, however, there was no enrichment for any IFN pathways in mutant KRAS-transformed HA1E cells, even though they were significantly enriched for upregulated KRAS signaling. In contrast to the mutant KRAS AALEs, we found that both IFNg and IFNa response pathways were among the most significantly decreased gene sets in mutant KRAS HA1Es (Supplementary Figure 1e), highlighting the cell type-specific differences in how the transcriptome is reprogrammed by mutant KRAS.
To assess the heterogeneity of the KRAS ISG signature in mutant KRAS AALEs, we performed single-cell RNA-seq (scRNA-seq) (n=1503 cells) (Fig. 1c). While KRAS signaling genes were upregulated in each cluster of cells, there was extensive heterogeneity in the IFNa response, Liu ISG, and KRAS ISG signatures, with the highest levels present in cluster 4 ( Fig. 1d). This was also consistent at the gene level for ISGs

Epigenetic reprogramming of ISGs by mutant KRAS
To elucidate the molecular mechanisms involved in inducing intrinsic ISG signatures in mutant KRAS AALE cells, we performed Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) (18). In mutant KRAS AALEs, open chromatin was strongly enriched at gene promoters for upregulated KRAS signaling genes, as well as KRAS ISG signature genes (Fig. 2a). Open chromatin peaks were enriched at transcriptional start sites (TSS) of the ISGs OAS1, CH25H, GBP1, and IFNL1, all of which were significantly and specifically upregulated by mutant KRAS signaling (Fig. 2b). We found that several of these KRAS ISGs, including OAS2 and IFIT1, contained interferon-stimulated response elements (ISRE) that are recognized by the transcription factor IRF9, which also undergoes epigenetic activation in mutant KRAS AALEs (Fig. 2c). Some of the most enriched motifs at ATAC-seq peaks in mutant KRAS cells were for IRF9, IRF1, and AP-1 (FOS) (Fig. 2d), and genes at these open loci were significantly upregulated, as were the transcription factors that bind to these enriched motifs (Fig. 2e). These results reveal that oncogenic KRAS signaling induces intrinsic ISG expression through the epigenetic activation of ISG promoters.

Mutant KRAS upregulates transposable element-derived noncoding RNAs
We next investigated the molecular basis for intrinsic ISG signature activation in mutant KRAS AALE cells by analyzing the abundance of repetitive noncoding RNAs transcribed from TEs, which induce an IFN response in cancer cells when aberrantly expressed (19,20). The LINE-1 element L1MC4a, the Alu elements AluSx, AluSg, AluJo, AluY, and AluSz6, and the hAT-Charlie element MER20 were all significantly upregulated in mutant KRAS AALE cells (Fig. 3a), suggesting that oncogenic KRAS signaling induces an ISG signature in transformed lung cells through the activation of TE-derived noncoding RNAs.
We examined TE expression heterogeneity in our single-cell RNA-seq data from mutant KRAS AALEs and did not observe substantial heterogeneity in ALU, LINE, MER, or LTR class TE expression (Fig. 3b), as well as for the specific TEs L1MC4a and AluSx ( Fig. 3c). When we performed RNA editing analysis on the scRNA-seq data to look for Ato-I editing (21), however, we found that TE-derived double-stranded RNAs (dsRNAs) exhibited significantly lower levels of RNA editing in cluster 4 ( To test whether extracellular RNAs that are released from mutant KRAS cells might also exhibit differential RNA editing, we isolated extracellular vesicles from the culture media of control and mutant KRAS AALEs (22,23).

Broad epigenetic silencing of KRAB zinc-finger genes by mutant KRAS
Given the known roles of KRAB zinc-finger proteins (KZNFs) in TE silencing (24), we examined whether KZNFs were involved in TE regulation in mutant KRAS AALEs. When we examined the differential expression of KZNFs in mutant KRAS AALEs, we observed a broad and significant down-regulation of repressive KRAB domain-containing zinc-finger proteins (Fig. 4a) (Supplementary Figure 3a). Based on our ATAC-seq experiments, we determined that significantly down-regulated KZNF genes exhibited loss of open chromatin at their TSS and intronic regions (Fig. 4b), revealing that mutant KRAS signaling induces epigenetic silencing of many KZNF genes. Several significantly downregulated KZNFs, including ZNF90, ZNF736, and ZNF683, were enriched for motifs in their TSS regions for ETS and ELK transcription factors (Fig. 4c), which are known downstream effectors of the RAS signaling pathway together with AP-1 (FOS) (11) (Fig.   4d).
We then analyzed KZNF chromatin immunoprecipitation sequencing (ChIP-seq) data (24) using the University of California Santa Cruz (UCSC) Repeat Browser platform (25). We found that several of the significantly down-regulated KZNFs in mutant KRAS AALEs bind to the consensus TE sequences of MER20 and L1MC4a elements

DISCUSSION
Collectively, our findings reveal the genomic impact of oncogenic KRAS signaling on repetitive noncoding RNAs and ISGs. Our conclusions are based on deeply sequencing and analyzing the transcriptomes of mutant KRAS-transformed AALE cells at the population, single-cell, and extracellular levels, as well as the epigenomic level, building on our previous work identifying noncoding RNAs that are coordinately regulated with RAS signaling genes during epigenomic reprogramming (7). The molecular basis for the intrinsic ISG signature we observe in mutant KRAS AALE cells is different from TEinduced IFN responses in cancer cells treated with DNA methyltransferase inhibitors (19,20), as we instead find a prominent role for broad KZNF supression during early stages of mutant KRAS-driven cellular transformation. Our studies also suggest that oncogenic RAS signaling contributes to the early induction of intrinsic ISG signatures that are observed across many cancers and cancer cells lines with ADAR dependencies (16,26). ATAC-seq. 100,000 cells were collected and centrifuged at 500xg for 5 minutes at 4C.
Pellets were washed with ice-cold PBS and centrifuged. Pellets were resuspended in icecold lysis buffer. Tagmentation reaction and purification were conducted according to manufacturer's protocol (Active Motif). Libraries were sequenced on a NextSeq500 as 2x75 paired end reads. Extracellular RNA. The exoRNeasy serum/plasma maxi kit (Qiagen) was used to isolate extracellular vesicles, which were quantified using Nanoparticle Tracking Analysis (Malvern, UK). 30 ml of cell culture supernatant was filtered to remove particles larger than 0.8 um. The filtrate was precipitated with kit buffer and filtered through a column to collect extracellular vesicles. These vesicles were then lysed with QIAzol® lysis reagent.
Total RNA was isolated using the indicated phase separation method and used to make libraries for RNA-seq, which were sequenced on a NextSeq500.
Exosomal RNA. Exosomes were isolated using the Exosome Total Isolation Chip (ExoTIC) as previously described (23). The ExoTIC device was first flushed with 2 mL of 1X PBS buffer. Then, the EVs from culture media were isolated as follows: a five millilitervolume of culture medium was drawn up in the same syringe and connected with the ExoTIC device. This syringe along with the ExoTIC device, was fixed onto a syringe pump. A pump flow rate of 5 mL/h was applied to filter the culture media, concentrating EVs in front of the nanoporous membrane. Free proteins, nucleic acids, etc., which are smaller than the membrane pore size (∼50 nm) pass through the filter pores. The EVcontaining retentate was then washed by running 5 mL of 1X PBS through the device using the same syringe. The ExoTIC device was then disconnected from the syringe, and the purified EV solution was collected via the device inlet using a 200μL pipet. RNA extracted from the purified EV sample using the miRNeasy Mini Kit (Qiagen) was used to make libraries for RNA-seq, which were sequenced on a NextSeq500. Statistical Analysis. All quantitative data for functional assays has been reported as means ± standard deviation. Statistical significance for these was calculated using a ttest and p-values <0.05 were considered significant. The resulting .sam files were converted to bam, sorted, and indexed using Samtools with default arguments for all procedures (31).
Sleuth (0.30.0): transcript differential expression was performed using Sleuth (32) and Wasabi (1.0.1) to convert the Salmon output into the proper format. Upon completion, the transcripts with q-values below 0.05 in the likelihood-ratio test were used to filter salmon output from which log2fc was manually calculated and paired to the sleuth output. Sleuth was primarily used for quantifying DE of Transposable Element loci in which case the provided reference was the repeat masked loci sequences from the UCSC Genome Browser.
DESeq2 (1.24.0): Salmon output was imported into a DESeq object using tximport (33) and differential expression analysis was performed with standard arguments (34). All results were filtered to have padj <= 0.01. In the case where R could only generate 0.00 for the padj values, they were reset to the lowest non-zero padj value in the data set.
Where count data was used, it was normalized across samples using DESeq.

Transposable Element Content Analysis. Exon and 5'/3' UTR Overlap: a whole
genome .gtf file was downloaded from the UCSC genome browser table browser utility.
This file was parsed and merged with the GENCODE v.29 reference transcriptome. This modified .gtf (now a .bed file) was passed to bedtools (35) where the overlap function was used with the following arguments: -a modified.gtf.bed -b all.ucsc.rmsk.genes.bed -wao -s > retained.overlap.bed alongside a whole genome .gtf retrieved as described above except generated from the repeat-masked browser track. The resulting overlapped bed file was processed and visualized using custom R scripts.
Differential Expression: Differential transcript abundance was determined using the Salmon and Sleuth procedures described above provided with a custom index comprising both the GENCODE version 29 transcripts and all transcripts extracted from the Hammel lab GTF file as described in the single cell procedures. Sleuth output was filtered and visualized using R and Tidyverse.
Zinc Finger Protein Analysis. ChIP-exo data and supplementary information were extracted from supplementary data provided by Imbeault et al (24). ZNF genes were cross referenced with DESeq2 and RepeatMasker (36) outputs to extract relevant differential expression data of ZNF proteins and Transposable Element transcripts using R.
RepeatMasker output from promoter analyses was cross referenced with ChIP-exo target data to identify potential regulatory targets of differentially expressed KZNFs. Only KZNF targets with 'score' [see Imbeault et al] >= 75 were kept for analysis. Analysis of all data was performed and visualized in R using custom scripts.
Gene Set Enrichment Analysis. Genes determined to be significantly differentially expressed in DESeq2 output were first 'pre-ranked' in R by the following metric: Score metric = sin(log2FoldChange) * -log10(p-value) The resulting ranked files objects were processed using the R package fgsea alongside gene set files downloaded from msigdb using the R package msigdbr. Additional code was written for select vizualizations.
Gene Ontology Analysis. Upregulated gene names were extracted from DESeq2 output using bash command line tools. Name lists were pasted into the Gene Ontology Consortium's Enrichment Analysis tool powered by PANTHER. Output data was exported as .txt files and parsed using bash command line tools. Parsed data was visualized using custom R scripts.
Single Cell Analysis. Cell Ranger: Single cell output data was processed using 10x pipeline CellRanger. The mkfastq functionality was used to generate fastq files for further downstream analysis.
Salmon -Alevin: fastq files generated above were passed to Salmon alevin with the default arguments for CHROMIUM V2 data. alevin was used to psuedoalign the libraries to both the GENCODE v32 combined with the repeat masked loci sequences extracted from Hg38 via the UCSC Genome Browser. A salmon index was built from this reference with standard arguments. These alevin output matrices were imported into R using tximport.

STAR-solo:
This feature within the STAR software was used to generate single cell SAM files for downstream processing. Run with the recommended arguments.
Seurat (3.0): Normalization and UMAP clustering were performed with Seurat following their described approach optimized to our data set (see code notebook). Additional code was written to extract count data from Seurat single cell objects using the SingleCellExperiment R package (37).
TCGA ZNF Analysis. TCGA-LUAD and GTEX lung phenotype and normalized count data were downloaded from the UCSC Xena browser TOIL data repository (https://xenabrowser.net/datapages/?cohort=TCGA%20TARGET%20GTEx&addHub=ht tps%3A%2F%2Fxena.treehouse.gi.ucsc.edu&removeHub=https%3A%2F%2Fxena.tree house.gi.ucsc.edu%3A443). The files were combined and patients were grouped by their KRAS mutation status and identity. These data were compared to and visualized alongside of data generated from our analysis using custom R code. Significance was determined with a one-way t test implemented in the R t.test() function.
RNA Editing. RNAEditingIndexer: Single cell SAM files were subsampled into 3 equal subsets per cluster based on barcode. Each SAM file was then converted into a BAM as described above and used as input for the RNAEditingIndexer script with bed files generated by extracting the locations of detected Transposable Element loci from Sleuth output (21).

ATAC-seq Analysis. ENCODE:
The ENCODE ATAC-seq pipeline was used for alignment, quality control, MACS analysis with default arguments to produce output for downstream analysis.
HOMER: Narrow peaks files produced above were processed using a variety of HOMER tools with default arguments where not explicitly stated: findMotifsGenome was used to find enriched motifs in each ATAC library and their subsets.
annotatePeaks was used to generate detailed annotations of the motifs found above and their context. It was also used to create gene set differential accessibility histograms by using the -hist argument set to 1. mergePeaks allowed us to identify unique and overlapping peaks across the two libraries.
makeTagDirectory was used to generate peak height estimates to be applied on all histogram comparisons (38).
(version 3.6.1) running from the Rocker 'Tidyverse' Docker container (rocker/tidyverse:3.6.1). Unpaired, bi-directional t test was performed with the t.test() function on samples with 3 biological or technical replicates. Linear regression was carried out with the lm() function.
Additional Code. All analysis was performed in the R programming language with supplemental scripts written in Bash.