Abstract
Long intergenic non-coding RNAs (lincRNAs) account for a large fraction of transcribed loci in the human genome. Many lincRNAs are retained in the cell nucleus, which prevents their association with ribosomes. Cytosolic lincRNAs, by contrast, have been observed to bind ribosomes, but this binding rarely results in translation. This raises the question of how translation of short open reading frames (ORFs) within cytosolic lincRNAs is hindered. Here, we investigate the nucleotide triplet (“codon”) content of putative lincRNA ORFs and its potential impact on ribosome binding and translation.
We find that lincRNA and mRNA ORFs have distinct codon frequencies that are well conserved between human and mouse. In lincRNAs, codon frequencies are less correlated with the corresponding tRNA abundance measures than in mRNAs. This correlation is weaker for cytoplasmic lincRNAs and weakest for those without experimental evidence of ribosome binding.
Our results suggest that putative lincRNA codons are subject to evolutionary forces that shape them to counteract unwanted ribosome binding and translation. The resulting sequence signatures may help to distinguish bona fide lincRNAs with regulatory roles in the cytoplasm from transcripts coding for peptides.
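
As an illustration of the kind of comparison summarized above, the following minimal Python sketch counts in-frame triplets in a set of putative ORFs and correlates the resulting codon frequencies with a per-codon tRNA abundance measure. It is not the analysis pipeline used in this work: the choice of Spearman rank correlation, the function names, and the assumption that tRNA abundance is already given as a table keyed by codon (`trna_abundance`) are all illustrative assumptions, and any values passed in would need to come from real measurements.

```python
from collections import Counter
from itertools import product

# All 64 nucleotide triplets (DNA alphabet).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(orfs):
    """Relative frequency of each triplet, read in frame from position 0."""
    counts = Counter()
    for orf in orfs:
        seq = orf.upper().replace("U", "T")  # accept RNA or DNA input
        usable = len(seq) - len(seq) % 3     # drop a trailing partial triplet
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):    # skip ambiguous bases such as N
                counts[codon] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


def codon_trna_correlation(orfs, trna_abundance):
    """Correlate codon frequencies with a hypothetical codon-keyed tRNA table."""
    freqs = codon_frequencies(orfs)
    codons = sorted(c for c in trna_abundance if c in freqs)
    return spearman([freqs[c] for c in codons],
                    [trna_abundance[c] for c in codons])
```

Under these assumptions, calling `codon_trna_correlation` separately on mRNA ORFs, on lincRNA ORFs, and on cytoplasmic lincRNA subsets with and without ribosome-binding evidence would give one coefficient per transcript class, which is the form of comparison the abstract describes.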