Global changes in open reading frame dominance of RNAs during cancer initiation and progression

Cancer cells express unique RNA transcripts; however, the factors determining their translation have remained unclear. We recently developed open reading frame (ORF) dominance as a measure that correlates with coding potential of RNAs. Upon calculating the ORF dominance of cancer-specific transcripts across 24 human tumor types, 14 exhibited significantly higher ORF dominance in cancer than in normal tissues. In organoid-based mouse genetic models, ORF dominance increased with carcinogenesis. Gene ontology analysis revealed that gene sets with increased ORF dominance were associated with cell proliferation, while those with decreased ORF dominance were linked to DNA damage response. Translatome analyses demonstrated that elevated ORF dominance during carcinogenesis resulted in higher translation frequencies of ribosome-bound RNAs. As cancer progressed, ORF dominance showed that the boundary between coding and noncoding transcripts became blurred prior to distant metastasis, indicating decreased proliferative cell populations and increased generation of RNA isoforms that potentially translate neoantigens before the development of metastatic tumors. These findings suggest that cancer evolution leads to dynamic changes in ORF dominance, resulting in global translational alterations in transcriptomes.


Introduction
RNAs are traditionally categorized as either coding RNAs, which are translated into proteins, or noncoding RNAs. However, recent findings (Huang Y et al., 2021) have challenged this binary classification by identifying bifunctional RNAs that possess both coding and noncoding functions. The relationship between the processes of carcinogenesis and biological evolution has long been debated owing to their striking similarities. Seminal studies from the 1970s demonstrated that most cancers originate from a single neoplastic cell and evolve through a selection process of somatic mutations, favoring the survival of the most proliferative population (Nowell, 1976).
Subsequent research revealed that tumors consist of genetically distinct subclones with diverse characteristics, challenging the notion of linear clonal evolution (Dexter et al., 1978). The application of population genetics concepts further expanded our understanding of tumors as subjects to natural and artificial selection, including the acquisition of resistance to therapy within different subclones (Heppner, 1984). Recent advancements have introduced macroevolutionary phenomena such as chromothripsis and age-dependent carcinogenesis, which cannot be fully explained by Darwinian evolutionary theory (Vendramin et al., 2021); however, an accurate method for describing cancer evolution remains to be established.
Given that cancer tissues express RNA isoforms distinct from normal cells, which play roles in Therefore, the combination of LR-seq and ORF dominance is expected to enhance transcriptome analysis and offer new insights into cancer evolution. In light of these data, we have reevaluated cancer evolution using ORF dominance, a valuable tool for describing biological evolution.

ORF dominance
In this study, ORFs were defined as sequence segments that start with AUG and end with either UAA, UAG, or UGA in the 5ʹ to 3ʹ direction of the three reading frames in the RNA sequence. Here, lpORF represents the length of primary ORF (pORF), and lsORF represents the length of secondary ORF (sORF). ORF dominance values range from 0 to 1. During the analysis of the LR-seq data, we identified multiple isoforms that mapped to the same Ensemble transcript ID. In such cases, we selected the longest isoform to calculate ORF dominance. Additionally, we computed the average ORF dominance of all the detected isoforms for further analysis. We plotted the relative frequency distribution of ORF dominance calculated from the entire RNA dataset and compared the peak positions and graph shapes.

Obtaining transcriptome data
We obtained transcriptome data for ORF dominance analysis from published and public databases.
Cancer-specific most dominant transcripts (cMDTs) and normal tissue transcripts (nMDTs) corresponding to cMDTs in 24 cancer types were obtained from Kahraman et al. (2020) previously published database (Table S1). To ensure high mapping accuracy, we only included data with a one-to-one correspondence, excluding cases with multiple nMDTs mapped to a cMDT.
The data included mutation information for cMDTs (MutationInfo), which we used to examine the association between mutation status and ORF dominance. The mutations were classified into 6 categories based on their position: 5'UTR, CDS, 3'UTR, Core promoter, Splice site, and Enhancer. However, the data did not provide classification information for coding noncoding transcripts.
For cancer-specific transcript data obtained through LR-seq, we retrieved information on aberrant splicing of isoforms in lung adenocarcinoma specimens and lung adenocarcinoma cell lines from a previous report by Oka et al. (2021). Similarly, the classification of coding noncoding transcripts was not provided in this data. The cell line data included information on driver mutations, which we also incorporated into our analysis. Additionally, we collected data on hepatocellular carcinoma (HCC)-specific isoforms in HCC from a previous study by Chen
The lysates were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis under reducing conditions, followed by transfer to polyvinylidene difluoride membranes (Millipore, Burlington, MA, USA) using a semi-dry transfer system. Membranes were blocked overnight at 4 °C in Tween buffer containing 5% dry milk for 90 min before reaction, followed by incubation with primary antibodies. The primary antibodies used were Raly (A302-070A, Bethyl Lab, 1:2,000) and α-tubulin (T5168, Sigma-Aldrich, 1:10,000).

Creation of a list of oncogenes and tumor suppressor genes
Genes classified as oncogenes or tumor suppressor genes were identified using the UniProt database (https://www.uniprot.org/). Specifically, genes registered under the keywords "protooncogene" (KW-0656) or "tumor suppressor" (KW-0043) were categorized accordingly.
Gene sets were created using transcript IDs, and the Functional Annotation Chart with default settings provided by DAVID was utilized for result analysis. A Benjamini value <0.05 was considered statistically significant.

AHA-mediated ribosome isolation
We employed the AHARIBO technique, which utilizes AHA (an analog of methionine) to incorporate it into proteins during translation. This allowed us to pull down complexes of ribosomes, RNA, and proteins involved in translation by binding them to AHA-coated beads (Minati et al., 2021). The AHARIBO kit protocol (Funakoshi, Tokyo, Japan) was followed for the experiment.
To initiate the procedure, we switched the medium of the organoids in culture to a methioninefree medium (Thermo Fisher Scientific). After 40 min of culture, AHA was added and incubated for 10 min. Following this, the medium was removed and the organoids were washed with PBS.
To collect the cells from the Matrigel, the Matrigel was depolymerized using Cell Recovery Solution (Corning). The collected cells were then lysed using a kit containing lysis buffer. The ligand-coated beads were bound to the lysate, and the resulting bead-cell complex was collected using a magnetic rack. The collected bead suspension was incubated with 10% Sodium Dodecyl Sulfate and Proteinase K (Fujifilm Wako Pure Chemical) at 37 °C for 60 min. The supernatant was collected, and RNA was extracted using RNeasy mini (QIAGEN) for RNA sequencing.
Similarly, for protein analysis, the beads were reacted with the cell lysate and collected using a magnetic rack. The collected bead suspension was subjected to LC-MS analysis following a thorough wash with urea washing solution (AHARIBO kit).

Quantitative real time RT-PCR
RNA was extracted using the AHARIBO kit, and cDNA synthesis was performed using SuperScript II with random primers ( features such as "MBR," "no shared spectra," "unrelated runs," "use isotopologues," and "heuristic protein inference." The protein identification threshold was set at 1% or less for both precursor and protein FDRs.

Changes in ORF dominance during carcinogenesis
To gain insights into the significance of changes in ORF dominance during carcinogenesis, we calculated the ORF dominance of transcriptomes obtained from normal and cancer tissues and compared their distributions.
Using a dataset of 24 cancer types from Kahraman et al. (2020), which included cancer-specific cMDTs and corresponding nMDTs in normal tissues, we initially compared the ORF dominance distributions of all cMDTs and nMDTs regardless of cancer type. We observed a significant shift toward higher ORF dominance values in cMDTs compared to nMDTs ( Figure 1A). Next, we focused on transcripts derived from oncogenes and tumor suppressor genes within cMDTs and compared them to nMDTs. We found that the ORF dominance of cMDTs significantly increased only for oncogenes ( Figure 1B). Additionally, when comparing ORF dominance by cancer type, cMDTs showed significantly higher values in 14 out of 24 cancer types, while three cancer types (bladder transitional cell carcinoma (Bladder-TCC), chronic lymphocytic leukemia (Lymph-CLI), bone and soft tissue leiomyosarcoma (Bone-Leiomyo)) exhibited lower ORF dominance. No significant differences were detected among seven other cancer types ( Figure 1C).
These findings indicate an overall increasing trend in transcript ORF dominance during carcinogenesis, specifically elevated ORF dominance for oncogenes and cancer type-specific transcripts.
Furthermore, GO analysis of gene sets with ORF dominance changes exceeding 0.1 between nMDT and cMDT revealed that genes with increased ORF dominance were associated with the Golgi apparatus, endoplasmic reticulum, cell junctions, and cell cycle (Table S2), while those with decreased ORF dominance were related to the mitochondrion, DNA damage, and DNA repair (Table S3). Similarly, analyzing the 14 cancer types with increased ORF dominance showed that gene sets with elevated ORF dominance were associated with the Golgi apparatus, endoplasmic reticulum, and cell junctions, whereas those with decreased ORF dominance were linked to the mitochondrion, DNA damage, and DNA repair (Table S4, S5). Likewise, for the three cancer types with decreased ORF dominance (Bladder-TCC, Lymph-CLI, and Bone-Leiomyo), the gene sets with decreased ORF dominance were associated with the mitochondrion (Table S6). These results suggest that changes in ORF dominance may be associated with physiological functions involved in carcinogenesis, such as organelles, the cell cycle, and DNA damage response.
To examine the relationship between changes in ORF dominance and genetic mutations, we compared the ORF dominance distributions of nMDT and cMDT among isoforms with and without mutations. The results showed a significant shift toward higher ORF dominance values irrespective of the presence or absence of mutations ( Figure S2A). Additionally, investigating the association between the change in ORF dominance and mutation location revealed a significant increase in ORF dominance when mutations were present in the 5' UTR, 3' UTR, and CDS regions.
However, no change in ORF dominance was detected when mutations occurred in regulatory regions such as the core promoter, enhancer, and splice sites ( Figure S2B).

LR-seq reveals high-resolution ORF dominance changes
The data presented above are derived from SR-seq data obtained from the Pan-Cancer Analysis  Figure S3). In the lung adenocarcinoma cell line data, all cell lines carrying KRAS mutations exhibited a marked shift in ORF dominance, whereas those with NRAS or EGFR mutations did not demonstrate a significant change ( Figure S3A). Conversely, in hepatocellular carcinoma, a significant shift toward lower values in ORF dominance was observed ( Figure S3B).  Figure   S4).

Changes in ORF dominance in a mouse model of organoid carcinogenesis
To examine the changes in ORF dominance during pure carcinogenesis, we compared normal and cancer cells with matched backgrounds derived from the same mouse cells. Specifically, we lentivirally transduced Kras LSL-G12D/+ ; Trp53 flox/flox mouse bile duct organoids in vitro with either Cre or the empty vector pLKO.1 to obtain Kras G12D/+ ; Trp53 -/-(hereafter referred to as Cre organoids) and normal controls (V organoids). When both types of organoids were inoculated into nude mice, subcutaneous (SC) tumors only developed from the Cre organoids. We obtained two clones of subcutaneous tumor-derived organoids (SC1/SC2 organoids) from two independent experiments (Figure 2A-C). Total RNA was extracted from these four organoids, and RNA sequencing (LR-seq) was performed. In the analysis, transcripts were classified as coding RNA or noncoding RNA during annotation, and the distribution of ORF dominance was analyzed for each.
The results showed no significant differences in the ORF dominance distribution of the organoids before (V) and after (Cre) genetic engineering, for both coding and noncoding RNAs. However, the ORF dominance distribution of the transcripts before (Cre) and after tumorigenesis (SC1, SC2) exhibited a significant shift toward higher values for both coding and noncoding RNAs ( Figure 2D and Figure S5). Additionally, GO analysis of gene sets with ORF dominance changes greater than 0.1 revealed that gene sets with increased ORF dominance were associated with Golgi apparatus, endoplasmic reticulum, mitosis, cell division, and cell cycle (Table S7), similar to the results obtained in human cancers. Gene sets related to mRNA processing, DNA damage, and DNA repair were enriched in gene sets with both increased and decreased ORF dominance (Table S7 and S8).
Additionally, we performed similar analysis using mouse pancreatic organoids with the Kras LSL-G12D/+ ; Trp53 flox/flox allele to obtain V, Cre, and SC organoids as previously described in our study (Matsuura et al., 2020). After LR-seq analyses, we examined the ORF dominance distributions.
The ORF dominance distributions after genetic engineering (Cre) were significantly shifted toward higher values in both coding RNA and noncoding RNA, compared to the distributions in V organoids ( Figure S6). Similar to human cancers and cholangiocarcinoma organoid models, the gene sets with increased ORF dominance after genetic engineering (Cre) were associated with Golgi apparatus, endoplasmic reticulum, cell junction, and cell cycle (Table S9). Gene sets related to mRNA processing, mitochondrion, and autophagosomes were enriched in gene sets with both increased and decreased ORF dominance (Table S9 and S10). After tumorigenesis (SC), gene sets related to Golgi apparatus and endoplasmic reticulum were enriched in genes with both increased and decreased ORF dominance (Table S11 and S12), while gene sets related to cell cycle, cell division, and mitosis were predominantly enriched in decreased ORF dominance. These results indicate that the timing of ORF dominance elevation differs among different tissues even when the same genetic mutations are employed for carcinogenesis. Moreover, certain gene sets with altered ORF dominance share common functions (e.g., Golgi apparatus, endoplasmic reticulum, and cell cycle), while others are tissue-specific (e.g., autophagosomes).

RNA sequence changes inducing alternations of ORF dominance
Using the LR-seq method in a mouse bile duct organoid model, we investigated changes in ORF  (Table   S13 and S14).
To explore the potential contributions of genomic mutations to the elevation of ORF dominance after tumorigenesis, whole-genome sequencing was performed on the four organoids ( Figure S8).
The number of mutations increased following tumorigenesis ( Figure S8A); however, no significant changes were observed in the types of mutations ( Figure S8B).  Figure 3A). After tumorigenesis (SCs), the ORF dominances of Raly mRNA increased due to 3'UTR shortening ( Figure 3A), and this was accompanied by elevated protein expression levels ( Figure 3B). To further investigate the translation process, we extracted RNAs bound to ribosomes, which were actively translating nascent peptides, from the organoids using AHARIBO. Subsequently, we quantified the Raly mRNA bound to ribosomes and the newly translated Raly protein using RT-qPCR and LC-MS, respectively. While the amount of Raly mRNA bound to ribosomes in SCs showed a decrease compared to Cre ( Figure 3C), the level of newly translated Raly protein was higher in SCs ( Figure   3D), indicating that increased ORF dominance is associated with augmented translation of Raly.

Elevation of ORF dominance contributes to enhanced translation
To gain a comprehensive understanding of the relationship between ORF dominance and translation, we conducted a thorough analysis of RNAs bound to ribosomes that were actively translating nascent peptides using AHARIBO combined with LR-seq. We also calculated the ORF dominance based on the RNA sequences ( Figure 4A and Figure S9). The distributions of ORF dominance for ribosome-bound RNAs revealed a shift toward lower values in coding transcripts ( Figure 4A, left), while noncoding transcripts showed a shift toward higher values, although the shift in SC2 noncoding RNAs was not statistically significant ( Figure 4A, right) Figure 4B, AHA). Additionally, cancer organoids (SCs) exhibited transcriptomes that approached this optimal transcript length compared to control samples (V and Cre). As previously reported, shorter transcripts demonstrated higher ORF dominance in both coding and noncoding transcripts ( Figure 4C). Shortening transcripts was the primary mechanism for achieving optimal translation ( Figure 4B). However, in cancer organoids, long coding transcripts (>4,000 bp) showed increased ORF dominance, whereas noncoding transcripts did not exhibit the same pattern ( Figure 4C). This difference between coding and noncoding transcripts is likely due to the presence of functional ORFs in coding transcripts, which limits transcript shortening. No relationship was observed between RNA expression levels and ORF dominance or tumorigenesis ( Figure S10 and S11).
Ribosomes translate peptides from both pORFs and sORFs. Our methodology (AHARIBO-LRseq) allowed us to extract and identify all RNAs translated from pORFs and/or sORFs. However, AHARIBO-LC-MS data specifically identified peptides from bona fide ORFs, as peptides from sORFs were either not registered in the database or were undetectable due to low stability.
Consequently, we examined whether ORF dominance-shifts in ribosome-bound RNAs  Figure 5C), reflecting the order of ORF dominance shifts ( Figure 4A) and translational frequencies ( Figure 5A) (Cre < SC1 < SC2). Notably, this elevation was only observed within the defined range of ORF dominance (< 0.7) ( Figure 5B). These findings indicate that shifts in ORF dominance within the transcriptome contribute to enhanced translation in cancer organoids.

Potential of ORF dominance analysis using SR-seq data
To validate the efficacy of LR-seq in ORF dominance analysis, we compared the ORF dominance distributions using SR-seq data from the same RNA samples in a mouse cholangiocarcinoma model experiment ( Figure S12). The SR-seq results revealed no significant difference in ORF dominance distribution between the four organoids for both coding and noncoding RNAs ( Figure   S12A). This led us to consider the possibility that RNAs with unchanged expression levels might introduce noise into the analysis, masking the results of RNAs with altered ORF dominance. To address this, we extracted RNAs with expression levels increased more than two-fold before and after genetic engineering (V→Cre) and tumorigenesis (Cre→SC1, Cre→SC2). We then compared the ORF dominance distributions again. The analysis of noncoding RNAs showed no significant difference, whereas in the analysis of coding RNAs, the ORF dominance distribution of RNAs with expression level changes after tumorigenesis (Cre→SC1, Cre→SC2) was higher than that of RNAs with expression level changes after genetic engineering (V→Cre) for upregulated RNAs ( Figure S12B). These findings suggest that LR-seq can more accurately detect changes in ORF dominance compared to SR-seq. Furthermore, our results indicate that even with SR-seq data, changes in ORF dominance can be detected with high sensitivity by focusing on transcript data with altered RNA expression levels.
To explore the relationship between ORF dominance and intratumor diversity, we calculated ORF dominance using single-cell RNA sequencing data obtained via SR-seq (Ono et al., 2021). In a previous study, mouse colorectal cancer organoids were derived from single cells with APC knockdown and subcutaneously inoculated into nude mice to induce tumor formation. The To investigate the relationship between ORF dominance and cancer progression, we calculated ORF dominance based on TNM classification (Table S16)

Discussion
In this study, we aimed to examine the role of ORF dominance in capturing the process of cancer evolution. In a previous report by Suenaga et al. (2022), we explored the association between ORF dominance calculated from transcriptome data of extant species and evolutionary phylogenetic trees. In this study, we compared ORF dominance between normal and cancerous tissues using publicly available patient-derived transcriptome data. We also investigated changes in ORF dominance during carcinogenesis and cancer progression using multiple mouse carcinogenesis models.
Our findings revealed an overall increasing trend in ORF dominance during carcinogenesis for both coding and noncoding transcripts (Figure 8). Gene ontology analysis of genes with elevated ORF dominance identified associations with the Golgi apparatus, endoplasmic reticulum, cell cycle, and mitosis. Conversely, decreased ORF dominance was associated with the mitochondrion, DNA damage, and DNA repair. Given that ORF dominance correlates with the translation frequency of ribosome-bound RNA, these fluctuations in gene translation likely contribute to carcinogenesis ( Figure 8A). We observed high ORF dominance in noncoding RNAs that translate neoantigens, particularly with a peak around 0.25. Additionally, noncoding transcripts with increased expression in advanced human colorectal cancers exhibited a peak at 0.25 in ORF dominance. Therefore, noncoding RNAs with elevated ORF dominance could serve as promising candidates for RNA-based neoantigen translation and cancer immunotherapy in advanced cancers.
Interestingly, elevations in ORF dominance were detected earlier in pancreatic organoids than in bile duct organoids. In mouse models of colorectal cancer and human colorectal cancers, elevated ORF dominance was associated with subgroups linked to metastasis (cGMP/GC and Dormant).
These results suggest that increased ORF dominance may occur earlier in more aggressive cancer types or subtypes.
Regarding overall transcriptome changes, approximately 50% of transcripts exhibited altered Lastly, contrasting carcinogenesis, a decrease in ORF dominance of coding RNAs was observed during cancer progression in a skin carcinogenesis model and in a human colorectal cancer subgroup (AntiEpi). Conversely, an increasing trend was detected for noncoding RNAs during cancer progression in human colorectal cancers. These changes occurred prior to distant metastasis. In other words, coding and noncoding RNAs exhibited diminishing molecular distinctions during cancer progression before the development of metastatic tumors, blurring the boundary between them ( Figure 8B). In our analysis of biological evolution, we proposed that the blurring of the coding noncoding RNA boundary is associated with a decrease in effective population size and an increased risk of extinction, while also providing opportunities for the emergence of new genes that enable species to adapt to new environments (Suenaga et al., 2022).
The observation that the coding noncoding RNA boundary is also blurred during cancer progression suggests that the number of proliferative cancer cells (effective population size) in cancer tissue may decrease as cancer progresses before the development of metastatic tumors.
This notion aligns with the fact that genes associated with the cell cycle and mitosis exhibited decreased ORF dominance in pancreatic cancer models (SC), and the Dormant subtype, characterized by low proliferative ability (low effective population size), demonstrated the highest metastatic potential with no significant changes in ORF dominance following its emergence.
In summary, our study elucidated the significance of changes in ORF dominance during cancer evolution as follows: (1) ORF dominance undergoes extensive changes during carcinogenesis, influencing the translation efficiency and resulting in increased expression of oncogenes related to cell proliferation and decreased expression of genes involved in DNA damage response.
(2) Changes in ORF dominance during the carcinogenic process are not primarily driven by genetic mutations, but rather by other mechanisms such as epigenetic and/or transcriptional alterations.
(3) The boundary between coding and noncoding RNA becomes blurred prior to the development of metastatic tumors.
However, our study did not demonstrate whether genes affected by changes in ORF dominance actually play a role in carcinogenesis and cancer progression, which is a question that needs to be addressed in future research. The upstream regulatory mechanisms governing alterations in ORF dominance remain unknown. Therefore, accumulating and analyzing experimental data using ORF dominance as a starting point will be crucial for a deeper understanding of carcinogenesis and cancer progression.

Acknowledgments
We gratefully acknowledge the valuable comments from Toshinori Ozaki, as well as the technical assistance provided by Kyoko Takahashi

Additional information
Supplementary information is available.

Competing interests
The authors declare no competing financial interests.

Data availability
List of sequencing sources can be found in