Abstract
BACKGROUND The daily gene expression oscillations that underlie mammalian circadian rhythms show striking tissue differences and involve post-transcriptional regulation. Both aspects remain poorly understood. We have used ribosome profiling to explore the contribution of translation efficiency to temporal gene expression in kidney, and contrasted our findings with liver data available from the same mice.
RESULTS Rhythmic translation of constantly abundant mRNAs affected largely non-overlapping transcript sets with distinct phase clustering in the two organs. Moreover, tissue differences in translation efficiency modulated the timing and amount of protein biosynthesis from rhythmic mRNAs, consistent with organ-specificity in clock output gene repertoires and rhythmicity parameters. Our comprehensive datasets provided insights into translational control beyond temporal regulation. Between tissues, many transcripts showed differences in translation efficiency, which were, however, of markedly smaller scale than mRNA abundance differences. Tissue-specific changes in translation efficiency were associated with specific transcript features and, intriguingly, globally counteracted and compensated transcript abundance variations, leading to higher similarity at the level of protein biosynthesis between both tissues.
CONCLUSIONS We show that tissue-specificity in rhythmic gene expression extends to the translatome and contributes to define the identities, the phases and the expression levels of rhythmic protein biosynthesis. Moreover, translational compensation of transcript abundance divergence leads to overall higher similarity at the level of protein production across organs. The unique resources provided through our study will serve to address fundamental questions of post-transcriptional control and differential gene expression in vivo.
Background
Circadian clocks serve organisms to synchronize behaviour, physiology and gene expression according to time of day. The mammalian circadian system consists of a master clock in the brain’s suprachiasmatic nuclei (SCN) that receives photic inputs from the retina and synchronizes peripheral clocks present in most cells throughout the body. The molecular timekeeping mechanism – the core clock – consists of a network of transcriptional activators and repressors interacting in negative feedback loops (reviewed in [1, 2]). In the core loop, the heterodimeric transcription factor ARNTL:CLOCK (also known as BMAL1:CLOCK) drives the expression of its own repressors, encoded by the Period (Per1, Per2, Per3) and Cryptochrome (Cry1, Cry2) genes – a configuration also known as the positive and negative limbs of the oscillator. Additional feedback – in particular an interconnecting limb involving nuclear receptors of the REV-ERB (encoded by genes Nr1d1, Nr1d2) and ROR (Rora, Rorb, Rorc) family – intersects with the core loop, and numerous post-translational modifications of clock proteins further add to the complexity of the circuitry. The final outcome is a set of robustly cycling transcriptional activities peaking at different phases around the day that drive the rhythmic expression of hundreds to thousands of other genes, termed the clock output or clock-controlled genes (CCGs). It is noteworthy that, despite the probably (near-)identical molecular makeup of the core clock across cell types, CCGs show considerable tissue-specificity [3]. The co-regulation by core clock and tissue-specific (non-rhythmic) transcription factors may engender such cell type-specific rhythmic expression patterns, as shown to occur in Drosophila [4]. Overall, however, the origins of tissue-specificity in rhythmic gene output (and even in certain core clock parameters [5]) are poorly understood. Mechanisms that act at the post-transcriptional level and that impact daily mRNA and protein accumulation kinetics are plausible players in the generation of cell type differences as well.
Rhythmic gene expression has been mainly investigated at the transcriptome level i.e., using mRNA abundances as a primary readout. However, comparison of mRNA levels with datasets of genome-wide transcriptional activity and of protein abundances that have become available recently, has suggested that a surprisingly large fraction of gene expression oscillations may have post-transcriptional origins (reviewed in [6]). The many cases of protein rhythms that are independent of an underlying oscillating transcript (initially reported in a low-throughput mass-spectrometric study from mouse liver already 10 years ago [7] and recently confirmed at a comprehensive scale [8, 9]) point to important roles for translation, protein degradation and protein secretion in shaping time of day-dependent proteomes. We [10] and others [11] have recently used ribosome profiling, a genome-wide method that assesses translation efficiency through the deep sequencing of ribosome-protected mRNA fragments, to chart the contribution of translational control to daily protein biosynthesis in mouse liver. One conclusion that emerged from the identified cases of translationally generated oscillations was that circadian clock activity and feeding rhythms both contribute to regulating rhythmic gene expression outputs [10, 11]. Notably, the most abundant group of transcripts subject to rhythmic translation, i.e. mRNAs encoding ribosomal proteins and other components of the translation machinery that all contain 5′-terminal oligopyrimidine tract (5′-TOP) sequences regulated by the mammalian target of rapamycin (mTOR) [12], appear to be under the dominant control of feeding [11].
We have now performed ribosome profiling using a second organ from the same cohort of animals, the kidney, which is an emerging circadian model organ with distinct rhythmic functions [13]. By contrasting kidney and liver datasets we comprehensively assessed commonalities and differences in their translatomes, and we evaluated in how far the regulation of translation efficiency contributed to tissue specificity in rhythmic and constitutive protein biosynthesis.
Results
Around-the-clock ribosome profiling datasets from two organs
For our recent study of the liver translatome around-the-clock [10] we had used ribosome profiling [14] (RPF-seq) on a time series of organs collected from mice sacrificed every 2 hours over the 24-hour day (12 timepoints in duplicate; Figure 1A). To generate a complementary dataset from a second organ we chose the kidneys from the same cohort of animals. Liver and kidney express thousands of genes in common [3, 15], thus providing a particularly suitable setting for a cross-organ comparison of gene expression.
Applying the same experimental and computational methods as for liver RPF-seq [10, 16], we obtained comparable high-quality data for kidney (see Supplementary Table S1 and Figure S1A-C for details on sequencing and mapping outcomes). Briefly, ribosome footprints from both organs showed similar enrichment for protein coding sequences (CDS) of mRNAs and depletion of untranslated regions (UTRs) (Figure 1B). Like the footprints from liver, also those from kidney exhibited excellent reading frame preference, which allowed resolving the 3-nt periodicity of coding sequences transcriptome-wide (Figures 1C and S2A-B). Moreover, the high correlation coefficients seen across replicates of the kidney time series for both RNA- and RPF-seq data indicated excellent biological and technical reproducibility (Figure S3A-B). We also used a recently developed tool, termed Ribo-seq Unit Step Transformation (RUST) [17], to confirm high technical similarity of datasets between organs (Figure S2C-D). Finally, principal component (PC) analysis on all available datasets (96 libraries, i.e. RPF-seq and RNA-seq from 2 organs, 12 timepoints, in duplicate) segregated the data according to the main experimental and biological covariates. PC1 (explaining 64.2% of variation) thus separated libraries according to organ, indicating that tissue origin represented the major source of divergence, followed by PC2 (28.4%) that separated RNA-seq (mRNA abundance) and RPF-seq (footprints/translation) (Figure 1D). The cyclic nature of the data was resolved in the representation PC3 vs. PC5 (together 12.5%), in which timepoints assembled to a near-perfect clock (Figure 1E). The larger circular arrangement of the liver vs. kidney time series suggested that rhythmic gene expression from liver contributed more strongly to overall variation than did kidney rhythms. This observation is in line with the notion that there are more, and higher amplitude rhythms in liver than in kidney [3]. Taken together, we concluded that the kidney data were of similarly high quality as our previous liver datasets [10]. Together, they would be suitable for comparative analyses of time of day-dependent and constitutive translation across two tissues.
Cross-organ differences in translation efficiency are widespread, of moderate scale, and partially compensate RNA abundance differences
To what extent do differences in translation efficiency contribute to different gene expression outputs across organs? We addressed this question using the set of 10289 genes whose expression was detectable in kidney and in liver at both RPF and RNA level (Figure 2A). From the ratio of CDS-mapping normalised read counts for RPF-seq relative to RNA-seq we first calculated relative translation efficiencies (TEs) per transcript and for each organ. TEs were overall rather similar between tissues, with 95% of genes falling into a less than 3-fold range for the kidney/liver TE ratio, as compared to a greater than 100-fold range for the transcript abundance ratio (Figure 2B). This observation was coherent with the considerably broader spread of mRNA abundances vs. TEs across genes within each organ (greater than 500-fold vs. just over 10-fold, respectively; Figure S5A-B) and is in line with a dominant role for the regulation of mRNA levels (i.e., transcription and mRNA decay) in controlling quantitative differences in gene output.
Intuitively, we had expected that RNA levels that were widely dissimilar between kidney and liver and subsequently further modulated by organ-specific TEs, would probably give rise to even greater cross-organ divergence at the RPF level. Intriguingly, however, the global correlation between kidney and liver was better for footprint abundances than for transcript abundances (Spearman ρ [RPF]: mean 0.784 vs. ρ [RNA]: mean 0.736; p<2.2e-16) (Figure 2C; Figure S3C-D). This phenomenon was observed irrespective of whether the genes expressed only one dominant protein-coding transcript isoform (“single isoform genes” in the following) that was common to both organs, or whether they gave rise to different (including tissue-specific) mRNA variants (“multiple isoform genes”) (Figure 2D). The observed higher cross-organ concordance of RPFs could have simply had technical reasons, for example if the RPF-seq protocol gave more reproducible results than the RNA-seq protocol. We addressed this caveat by comparing measurement errors (MEs) for RNA and RPF data using a similar approach as in a recent publication [18]. We found that MEs scaled inversely with expression levels, as expected, and showed some variation due to organ (Figure S6A,B,F,G). Especially in liver and among low expressed transcripts, a tendency towards smaller MEs for RPF than for RNA was indeed visible (differences statistically non-significant). In most other cases, however, measurement errors were (in part significantly) higher for the transcripts’ RPF counts than for their RNA counts. Of note, the better cross-organ correlation of RPF vs. RNA levels seen in the full transcript set (Figure 2C) was also evident within various transcript subsets (Figure S6C-E, H-L), including such subsets for which RPF MEs were higher than RNA MEs (Figure S6E, L). It is thus unlikely that technical bias was the reason for the higher RPF correlation. Finally, an analysis that we performed on independent ribosome profiling datasets from rat liver and heart [19] allowed us to confirm the phenomenon of higher concordance of RPF vs. RNA abundance also between these organs(Figure S7A-C). Taken together, these findings are suggestive of a potentially broader biological phenomenon that consists in the partial compensation of differences in a gene’s mRNA expression through counteracting effects exerted through its TE, resulting in the convergence at the level of protein biosynthetic output (footprints, RPF) across tissues.
Transcript features associated with cross-organ differences in translation efficiency
Do particular transcript features have predictive value for organ-specific differences in translation efficiency? To investigate this question we selected the genes with significantly different TEs between tissues (N=5013; Wilcoxon signed rank test; FDR<0.01) and implemented a 1.5-fold cutoff on TE ratio between the organs to retrieve the most pronounced cases (N=960) (Figure 2E; Supplementary Table S2). Of these, 533 represented “single isoform genes” with no (or negligible amounts of) expression of tissue-specific mRNA variants. For these genes, we examined whether a higher TE in kidney (N=193) or in liver (N=340) was associated with specific transcript characteristics. Of several features tested, we found that CDS and transcript lengths showed the most significant association with differential TE (Figure S8A-B). Of note, we had previously seen in liver that shorter coding sequences – i.e., transcripts encoding smaller proteins – are more efficiently translated [10]. Our present analyses suggest that such transcripts are also more prone to tissue-specific regulation at the translational level. Other sequence features showed some bias within the differential TE gene sets as well, although the effects were overall weaker and less consistent. Briefly, the 5′ UTRs of genes with higher TEs in liver were longer and predicted to fold more strongly. By contrast, transcripts with higher kidney TEs were associated with lower 5′ UTR GC content and slightly shorter 3′ UTRs. No association with differential TE was found for the Kozak sequence context score.
We also investigated two functional classes of sequence features, miRNA binding sites and upstream open reading frames (uORFs), for association with differential TE. Of note, the 960 “TE different” transcripts were not enriched for any predicted miRNA binding sites, making it unlikely that this class of post-transcriptional regulators is a major player in establishing tissue-specific TEs (data not shown). We had previously observed that in the liver the presence of a translated uORF in the 5′ UTR was strongly predictive of low TE at the main ORF [10]. An analogousrelationship was evident in kidney as well (Figure S5C). To assess whether uORF translation was associated with TE differences across organs, we compared how the identified uORF-containing transcripts (i.e., single isoform genes showing translated uORFs in at least one organ; N=1377) distributed to the differential vs. non-differential TE gene sets. The group of genes with higher TE in liver was significantly enriched for transcripts with translated uORFs (p=6.08e-04; Fisher’s exact test) and there was slight depletion among genes with higher TE in kidney (not significant) (Figure S8C). Only few differential TE genes exhibited uORF translation that was exclusive to one organ, but there was a tendency for kidney-specific translation of uORFs to be associated with higher TE on the CDS in liver, and vice versa (Figure S8D). For the genes with uORFs translated in both tissues, we expected that cross-organ differences in the strength of uORF usage would negatively correlate with TE differences at the CDS. However, such a trend was only visible for liver differential TE genes (Figure S8E), and globally, uORF and CDS TEs even showed slightly positive correlation. In summary, these analyses suggested that uORF translation contributed to some extent (and especially for genes that were more efficiently translated in the liver) to cross-organ differences in TE; however, the overall impact appeared limited (see Discussion).
We next included the “multiple isoform genes” in the analyses and asked whether transcript isoform diversity between the two organs – i.e., the occurrence of tissue-specific mRNA variants generated by alternative transcriptional start sites, splicing, and 3′ processing – had any relationship to differential TE. Briefly, using our RNA-seq data we first compiled an inventory of all annotated, protein-coding transcript isoforms and their estimated relative expression levels per gene and tissue. We then used the Hellinger distance [20] as a measure of dissimilarity in isoform expression levels between kidney and liver. A value of 0 for this metric indicates that a gene has identical isoform distribution in both tissues (i.e., these are essentially the “single isoform genes” described above), while a value of 1 denotes a lack of overlap in expressed isoforms. Globally, the 960 genes with differential TE showed significantly higher Hellinger distances than the remainder of the expressed genes (p=3.74e-04; Kolmogorov-Smirnov-test) (Figure 2F). Molecularly, the term “transcript isoform” comprises variations affecting 5′ UTR, CDS, and 3′ UTR. By comparing the genes for which all expressed variants affected exclusively one single feature, or for which this particular feature was not affected at all, it became apparent that transcript diversity in the 5′ UTR was particularly strongly associated with differential TE (Figure 2G). By contrast, variation in the CDS showed significantly less association with cross-organ differences in translation efficiency (Figure S9A-B). Although the low number of available transcripts bearing exclusively 3′ UTR differences precluded a rigorous interpretation, 3′ UTR variation did not appear to be associated with differential TE either (Figure S9C). Altogether, we thus concluded that TE differences between tissues may, at least in part, have their origin in tissue-specific transcript variants, especially through alternative 5′ UTRs.
Finally, we were interested in whether cross-organ differences in translation efficiency affected specific pathways. For the 640 “TE different” genes that showed increased TE in liver (Figure 2D), gene ontology (GO) analyses revealed significant enrichment for categories related to transcription (Supplementary Table S2). Conceivably, tissue-specific translational control of transcriptional regulators may thus impact also on the organs’ transcriptomes. The 320 “TE different” genes that were translated better in kidney did not show any significant enrichment.
Translational modulation of phase of oscillation in kidney
We next turned to the analysis of factor time across the datasets. We annotated rhythmic events in kidney with the same methodology as previously for liver, including a 1.5-fold cut-off on peak-to-trough amplitudes [10]. A list of the detected RNA and RPF rhythms and genome-wide gene expression plots are provided in Supplementary Table S3 and Supplementary Dataset S1, respectively. Our analyses yielded 1338 and 977 genes that cycled at the RNA abundance and footprint level, respectively, with an overlap of 542 genes (Figure 3A). As discussed later, this relatively modest overlap (542 genes corresponds to 41% and 55% of all “RNA rhythmic” and “footprints rhythmic” cases, respectively) likely underestimates the full extent of shared rhythmicity and only contains the most robustly oscillating gene expression events, which we further explored in the following.
Interestingly, the analysis of rhythmicity parameters across the 542 genes revealed that the timing of their RPF peaks relative to their RNA peaks had a significantly different and broader distribution than the corresponding set from liver (p<1.0e-04; permutation test) (Figure 3B). This observation suggested that the phase of protein biosynthesis rhythms was subject to marked translational modulation in kidney. In liver, by contrast, RPF peaks were more tightly gated by RNA abundance peaks. Surprisingly, maximal translation tended to precede maximal RNA abundance in kidney (Figure S10A), as globally the mean RPF peak phase was advanced (-0.123 hours) and also RPF rhythms were enriched for phase advances (282) vs. delays (260), albeit neither reaching statistical significance (p=0.16, Wilcoxon rank sum test).
The above analyses used different rhythmic gene sets for kidney than for liver, potentially compromising comparability. The observed differences in the RPF-RNA phase relationships could thus have simply arisen from transcript-specific rather than from tissue-specific differences in the timing of translation. We thus analysed the group of 178 genes whose RNA and RPF profiles were rhythmic in both organs (Figure 3C; Supplementary Table S4; Supplementary Dataset S2). Again, the distribution of RPF-RNA offsets was significantly broader in kidney than in liver (Figure 3D; p=0.007, permutation test) with an RPF peak phase advance in kidney (mean -0.143 h) and a phase delay in liver (mean 0.036 h) (Figure S10B-C). We next calculated the gene-wise RPF-RNA peak phase difference in kidney relative to that in liver. More genes showed their RPF maxima earlier (96) than later (82) in kidney vs. liver, with a mean advance of -0.178 h (Figure S10D), but again without passing statistical significance (p=0.152, Wilcoxon rank sum test).
Conceivably, we lost statistical power and introduced error in the above analyses by restricting the phase comparisons merely to the peaks of the rhythmic curve fits. We thus sought a method that would take into account phase differences between RPF and RNA profiles over all data points. To this end, we used cross-correlation to quantify the similarity between the RPF and RNA time series as a function of sliding one series on the time axis relative to the other. When the time series were not shifted against each other at all (RPF-RNA lag = 0 h), the RPF-RNA cross-correlation values were overall highest, as expected, and they were significantly higher in liver, in line with stronger gating of RPF rhythms relative to RNA oscillations in this organ (Figure 3E). Importantly, when cross-correlation of RNA was calculated with earlier RPF time points (negative RPF-RNA lags; see in particular lags of -2 h to -8 h in Figure 3E), kidneys scored significantly higher than livers. Sliding the series in the other direction, however, rather led to overall better correlations in the liver (see lags of +4 to +8 in Figure 3E; liver-kidney difference non-significant). Taken together, these analyses underscored that there was asymmetry in the data with RPF rhythms preceding RNA rhythms specifically in the kidney.
We confirmed kidney-specific translational phase advances by visual inspection of individual gene expression profiles. Figure 3F shows the profiles for the genes Hlf, Nampt, Slc5a6, Tardbp, Dnajb4, Cgn and Etnk2, which all show an RPF phase advance of up to several hours relative to RNA. Cross-correlation analysis for the individual genes confirmed kidney-specific, phase-advanced translation as well (Figure 3G).
At first sight, translation that is phase-advanced to mRNA abundance is counterintuitive. Conceivably, it may occur when translation efficiency is not constant, but decreases over the lifetime of an mRNA. TEs may be higher on freshly synthesised messages that have long poly(A) tails, and decrease as a result of gradual deadenylation even before transcript stability and abundance are affected as well [21]. In keeping with the hypothesis of cross-organ differences in poly(A) kinetics, we have observed that most subunits of the major cytoplasmic deadenylase complex, CCR4-NOT, are significantly more highly expressed in kidney than in liver (Figure S11A-C). Higher deadenylase activity in kidney could provide an attractive molecular explanation for the observed tissue-specific differences in RPF-RNA phasing, and for RPF rhythms that are phase-advanced to RNA oscillations.
High tissue divergence in translationally driven rhythms
Rhythmicity detection algorithms are sensitive to false-negatives i.e., to classify gene expression profiles as “non-rhythmic” (for example because they fail imposed thresholds on amplitude or FDR) although the underlying temporal patterns may still be more similar to, and more likely to be, rhythmic than invariable. Of note, the lack of canonical methods to reliably determine true absence of rhythms is a common problem in the field (see recent review by [6] for discussion). Venn diagrams that simply overlap rhythmic gene sets hence need to be interpreted with caution. For these reasons, the extent of “RNA only” and of “footprints only” oscillations in Figure 3A is likely not reported reliably and subject to overestimation. The heatmaps of the corresponding RNA and RPF profiles support this notion as well (Figure S12B, D).
In order to identify the true-positive “translation only” cycling transcripts with higher reliability, we implemented the same methodology as in our previous study [10]. Briefly, we used the analytical framework Babel [22] to preselect all transcripts whose translation efficiency changed significantly over the day (and/or whose TEs deviated significantly from the global transcript population). Rhythmicity analyses were then performed on this gene subset and yielded 92 cases with the sought-after temporal profiles of rhythmic translation on non-rhythmic mRNAs (Figure 4A). Comparison with the 142 genes of the analogous set from liver revealed near-perfect tissue specificity of translationally driven oscillations. Only two genes, Abcd4 and Lypla2, were shared between the organs; they were both among the least compelling cases of “translation only rhythms” that our method had identified, as judged by visual inspection (Figure 4B).
Interestingly, not only the identity of rhythmically translated genes, but also the time-of-day at which the majority of rhythmic translation events occurred, was highly tissue-specific. The phase histograms thus showed striking differences in the peak time distributions between the organs (Figure 4C-D; difference in distributions: p=1.66e-04; W = 17.403, df = 2; Watson-Wheeler test for homogeneity of angles). Of note, the enrichment for translational maxima at the light-dark transition (Zeitgeber time, ZT10-16; ZT00 corresponds to lights-on and ZT12 to lights-off) that dominated the distribution in liver (Figure 4D, F) was virtually absent from kidney (Figure 4C, E). Instead, kidney showed enrichment for transcripts with maximal translation occurring around ZT4 and ZT16. Visual inspection of individual examples confirmed the organ specificity of RPF rhythms. The cases of robust translational oscillations that we [10] and others [11, 12] had previously identified in liver were thus absent or severely blunted in kidney. This included mRNAs encoding ribosomal proteins (RPs), which make up the bulk of genes showing a translational surge at the light-dark transition (e.g. Rps25, Rpl23a), as well as transcripts encoding the transcription factors Deaf1 (deformed epidermal autoregulatory factor 1) and Mxi1 (MAX interactor 1), and mRNAs containing iron-responsive elements in their 5′ UTRs (e.g. Ferritin light chain 1, Ftl1) (Figure 4G), all of which we had previously reported as translationally rhythmic in liver [10]. Rhythmic translation exclusive to kidney was not significantly enriched for particular pathways (data not shown), and the temporal profiles were overall of lower amplitude than those seen for liver; Tma7 (Translational machinery associated 7 homolog), Ddb2 (Damage-specific DNA binding protein 2), Actg1 (Actin, gamma, cytoplasmic 1) and Hoxd3 (Homeobox D3; not expressed in liver) were among the most distinct examples (Figure 4H).
In summary, we concluded that temporal changes in TE were strikingly tissue-specific and overall relatively rare in kidney. Specifically for transcripts encoding RPs and other components of the translation machinery, which are the most prominent group of TE rhythmic genes in liver, it has been suggested that feeding-dependent mTOR-signaling underlies the translational upsurge at the light-dark transition via a mechanism involving the 5′-terminal oligopyrimidine (5′-TOP) motifs that these transcripts carry [11, 12]. Interestingly, the TE comparison between both tissues revealed that kidney RP translation occurred at a relatively high level throughout the day (Figure 4I). The lack in rhythmicity for RP genes in this organ may thus result from an absence of translational repression during the light phase rather than a lack in activation in the dark phase. It may indicate that the kidney is less sensitive to systemic cues engendered by feeding and fasting (see Discussion).
Different degrees of tissue specificity in core clock gene expression at the level of RNA abundance and protein biosynthesis
Clocks exhibit functional differences across cell types and organs, for example at the level of rhythmicity parameters (e.g. free-running period and phase [5]), of clock output gene repertoires [3], of oscillator strength and robustness [23, 24], or with regard to clock gene loss-of-function phenotypes [25]. Conceivably, the precise timing and level at which the various clock proteins are produced may modulate properties of the clock circuitry and underlie some of the abovementioned functional variations. In order to investigate these possibilities we compared the expression of core clock components in both organs.
We first investigated transcript and footprint RPKMs as averages over timepoints to assess the cumulative daily production of clock RNAs and proteins. Most core clock genes showed a considerable degree of organ-specificity in their expression levels that was readily appreciable in the footprint vs. transcript abundance representation with both organs overlaid in a single graph (Figure 5A). Two tissue differences caught our particular attention. First, the balance between the transcriptional activators Rora/Rorc and repressors Nr1d1/Nr1d2 differed markedly betweenorgans and was skewed towards repression in kidney (i.e., higher Nr1d1/2 and lower Rora/c RPKMs in kidney, Figure 5A-B). These transcriptional regulators bind to shared sequence elements on DNA and form the “interconnecting limb” within the rhythm-generating clock circuitry. In addition, they also control an output branch of the oscillator [1, 2]. It is hence conceivable that adjusting the relative levels of NR1D1/2 vs. RORs tailors clock-controlled gene expression in a tissue-specific fashion. Our observation of an active state of this output branch in liver and a more repressed state in kidney, is fully consistent with the current knowledge of its target genes and knockout phenotypes, which point to a prominent role in the regulation of hepatic pathways such as lipid, cholesterol and bile acid metabolism [26].
A second tissue difference concerned the main constituents of the negative limb, the Period (Per) and Cryptochrome (Cry) genes. Heterotypic PER:CRY protein complexes inhibit ARNTL:CLOCK-driven transcriptional activity and thus lie at the core of the oscillator’s principal negative feedback loop. We observed a shift to more PER (and slightly less CRY) biosynthesis in kidney (Figure 5B). PERs (and in particular PER2) are considered stoichiometrically rate-limiting components of the PER:CRY complex, and increased PER2 dosage engenders long periods [27, 28]. Interestingly, tissue explant experiments have shown that kidney clocks free-run with almost 1.5 hour longer periods than liver clocks [5], as would be predicted from the increased PER biosynthesis that our analyses revealed. As a more general concept, we deem it conceivable that the modulation of biosynthesis levels for individual clock proteins may be a more general mechanism to engender distinct differences in clock parameters across cell types.
We noted that for the majority of clock genes (Npas2, Cry1, Cry2, Per1, Per2, Per3, Nr1d1, Rorc) the tissue differences were less pronounced at the RPF than at the RNA level (Figure 5B), indicating that translation efficiencies partially counteracted RNA expression differences. Only in four cases (Clock, Arntl, Nr1d2, Rora) TEs exacerbated transcript abundance differences and led to higher tissue differences at the RPF level. Interestingly, this observation could also be made in the time-resolved data. As a measure of similarity between rhythmic profiles, we used the Euclidean distances calculated between the four rhythmic traces of each individual gene (i.e., RNA and RPF in kidney and liver; Figure S13). Hierarchical clustering of the similarities for the ensemble of the 12 main core clock genes showed that RPF rhythms from the two organs grouped together (Figure 5C). Clock protein biosynthesis rhythms between organs were thus more similar than RNA and RPF rhythms within organs. By contrast, the 178 common rhythmic genes identified in Figure 3C – serving as a control set for this analysis – revealed within-organ clustering (Figure 5D). These findings underscored that translational compensation was occurring within the core clock, where it led to more similar rhythms in clock protein biosynthesis than would have been predicted from the rhythmic RNA abundance profiles. This phenomenon was, however, not a general feature of all rhythmic gene expression.
The transcriptome-wide analyses described further above had shown only weak signs of association between cross-organ differences in TE and in uORF usage (Figure S8C-E). However, we knew from our previous work in liver that at least five core clock transcripts (Nr1d1, Nr1d2, Cry1, Clock, Arntl) contained translated, potentially regulatory, AUG-initiated uORFs [10]. We therefore examined whether for any of these concrete cases there was evidence for a connection between uORF translation and cross-organ TE differences. The read distribution along the transcripts (Figure S14A) and the marked frame preference of RPF reads (Figure S14B) confirmed that the footprints mapping to our annotated uORFs likely reflected active translation. However, only in one case, Nr1d2, there was a distinct anticorrelation between uORF usage and TE differences on the CDS (Figure 5E). Nr1d2 contains two translated uORFs in the 5′ UTR (Figure 5F), whose decreased usage in kidney was accompanied with higher TE on the CDS in this organ (Figure 5E). For Nr1d2, differential uORF usage could thus represent a plausible mechanism that contributes to regulating organ-specific gene expression output at the translational level, keeping NR1D2 biosynthesis low in liver and high in kidney.
Discussion
Given that the functionally relevant output of most gene expression is the protein, quantitative and genome-wide analyses of protein biosynthesis are of high interest to complement the wealth of transcriptomics data that is already available. Of note, only the fairly recent development of the ribosome profiling technique [29] has made it possible to analyse translational events in a quantitative, high-throughput fashion. Our study of two paradigms of differential gene expression, i.e. its tissue- and its time of day-dependence, is among the first of its kind and, together with the associated datasets and resources, will likely be of wide interest and utility to researchers working in the chronobiology and gene expression fields.
We have addressed several, rather fundamental questions that go beyond the chronobiological focus of the study: How does the dynamic range of translation efficiency compare to that of transcript abundances across two distinct organs of an animal? Is translation efficiency a default transcript property and comparable across two tissues, or do TEs become reinterpreted depending on cell type or organ? Does cross-tissue variability of TEs come with any direction i.e., is there a global tendency to either reinforce or to counteract transcriptomal differences?
To our knowledge, only one previous study has reported on ribosome profiling datasets from two complementary mammalian tissues, rat liver and heart [19]. This study also included animals with different genetic backgrounds as covariates in the experimental design, and its main focus was on strain differences in translation rather than on tissue differences. Our analyses based on more than 10000 genes commonly expressed in liver and kidney show that cross-organ TE differences are widespread, but of limited magnitude. Across genes in a tissue and for individual genes between tissues, the dynamic range of translation efficiencies is thus about 30-50-fold narrower than that of transcript abundances. These findings are coherent with the view that major differences in gene expression are set up at the level of transcription (possibly with some influence coming from RNA stability as well), whereas differences in translation rate have more of a modulatory role. It is intriguing that this modulation is overall characterised by directionality, with TE differences between tissues globally counteracting some of the mRNA abundance differences. Such translational compensation has previously been observed for divergent transcript expression levels across yeast species [30] and across different rat strains [19]; our study now extends this observation to gene expression across organs. Moreover, the idea of translational compensation is conceptually similar to findings that proteomes are evolutionarily more highly conserved than transcriptomes [31, 32]. As an underlying common principle, these cases may indicate that selective pressure on precise gene expression levels likely acts on protein abundances, whereas a certain degree of variability (even noise) in RNA levels may be tolerated without further consequences. It will be exciting to study the underpinnings of translational compensation further, across tissues and across species.
Maybe not unexpectedly, there was no dominant, distinct sequence feature that could serve as a predictor for cross-organ TE differences. Rather, we found several associations with a number of transcript characteristics. Conceivably, these contribute collectively to modulating TEs in concert with the specific cellular and tissue environment and possible cell type differences in the translation machinery including its regulators and trans-acting factors. While our ribosome profiling studies have allowed us to record the outcome of such regulation at high resolution, understanding its causes represents an exciting challenge for the future. For now, we can only infer that an overarching theme of the identified associations is a connection to 5′ UTRs, which is in also in line with the notion that initiation is rate-limiting for most translation events. We thus observed associations of cross-organ TE differences with 5′ UTR length, with uORF usage, with GC content and folding potential, as well as with transcript isoform diversity that affected the 5′ UTR. We would like to point out that comprehensive uORF annotations remain a bioinformatics challenge that is far from resolved. We have therefore restricted our analyses to AUG-initiated ORFs, inevitably leading to a bias towards false-negatives in uORF annotation. As we will learn how to annotate uORFs more comprehensively and more precisely in the future, it may be worth revisiting the relationship between differential TE and uORF translation in our datasets in order to evaluate whether a clearer role for these regulatory sequence elements will emerge.
Our study has led to novel insights into rhythmic gene expression. The extent to which rhythmicity is generated by the temporal regulation of translation has been the subject of speculation ever since the first report of rhythmic proteins encoded by non-rhythmic mRNAs [7]. Our kidney datasets complement recent time-resolved ribosome profiling data from liver [10, 11] and from a cell line [33]. As compared to liver, the number of transcripts subject to translational rhythms in kidney is slightly lower, but overall in a similar order of magnitude with around 1% of the transcriptome affected. It came as a surprise that translational rhythms were essentially tissue-specific in terms of the affected genes and the phase distributions. A possible explanation could be that these rhythms are driven by rhythmic systemic cues to which tissues do not respond equally. The effects of feeding and mTOR signalling, for example, may be more pronounced in liver due to the dedicated role that this organ plays in energy homeostasis and fasting responses, thus explaining the differences in translational oscillations for RP genes. Beyond the role that translation has in generating rhythms, our analyses have pointed to an additional, rhythmicity-modulating role that appears to affect gene expression quite broadly, i.e. the timing of the phase of protein biosynthesis oscillations relative to that of mRNA abundance rhythms. Consistent with work by the Green lab that showed interactions between polyadenylation status of mRNAs and rhythmic protein expression in the liver [21], it is tempting to speculate that related mechanisms are operative across organs, with tissue-specific deadenylation kinetics tuning the timing of rhythmic protein biosynthesis. Finally, our study is a first attempt to resolve tissue differences in core clock gene expression as a factor contributing to functional differences of the oscillator. It is interesting that the core clock mechanism has a long-standing history of being referred to as a “transcription-translation feedback loop” [34], although the actual feedback occurs at the transcriptional level, and possible mechanistic functions of translational regulation have not been much investigated. Our cross-organ comparison of core clock protein biosynthesis suggests that translational control – including through the activity of uORFs [10, 33] – is of regulatory interest and represents a way by which the identical set of core clock genes could form circuitries with different stoichiometry of its main components. As a result, both clock parameters and output gene repertoires may be organ-specifically tuned.
Conclusions
How translational differences contribute to overall gene expression diversity is still poorly understood. Our study uncovered translational changes that occur across two paradigms of regulated gene expression, i.e. around-the-clock and between tissues. Daily gene expression rhythms generated at the translational level were strongly organ-specific with regard to the identities and phase distributions of affected genes. Moreover, our data indicate that translation efficiency differences between organs can adjust the timing of protein production from rhythmic mRNAs, and the levels of core clock protein production, in agreement with the tissue-specificities observed in clock output gene sets and clock parameters. Together, these results are consistent with an important role of post-transcriptional mechanisms in mammalian circadian gene expression regulation. Beyond the temporal dimension, we have explored constitutive protein biosynthesis across organs. Our quantitative analyses underscore that gene expression divergence is largely programmed at the transcript abundance level. Interestingly, the widespread differences in translational efficiency that we detected between organs even serve to achieve higher concordance in protein production between tissues. Conceivably, such translational compensation reflects a selective pressure to maintain precise protein levels rather than mRNA levels. The high-resolution genome-wide translatome datasets generated in this study will allow further explorations into the mechanisms of post-transcriptional control and differential gene expression in vivo.
Methods
Animals
12-week old male mice (C57BL/6J; Janvier Labs) were entrained for two weeks to light:dark 12:12 cycles with ad libitum access to food and water and were anesthetized (isoflurane) and sacrificed every two hours (ZT0 - ZT22, with ZT0 corresponding to “lights-on”) for two daily cycles. Livers and kidneys were removed and processed either directly or flash-frozen in liquid N2. All experimental procedures were approved by the Veterinary Office of the Canton Vaud (authorisation VD2376).
Ribosome profiling
Generation of liver RPF-seq and RNA-seq libraries using the ARTseq ribosome profiling kit (Epicentre) was described recently [10, 16]. Kidney libraries were prepared in the same manner, with a single modification to the order of steps in RPF library preparation. After RNase treatment and recovery of ribosome-protected fragments, 5μg of material was first rRNA-depleted (Ribo-Zero magnetic kit, Epicentre) and then purified by 15% PAGE. In the formerly prepared liver libraries, Ribo-Zero treatment and PAGE purification had been inverted because at the time we had found that changing the order had a beneficial effect on obtaining highly concentrated libraries. For the kidney samples, however, we noted that this modified order led to higher contamination with reverse-strand rRNA probes bleeding from the Ribo-Zero kit and we thus reverted to ARTseq’s original order. All other steps and materials were identical between liver and kidney samples and followed the ARTseq ribosome profiling kit instructions. RPF and RNA libraries were sequenced on an Illumina HiSeq 2500.
Sequencing data processing, alignment and quantification
Processing, quality assessment, alignment and quantification of sequencing data were performed as described previously [10, 16]. Briefly, after adapter trimming using Cutadapt [35], the length distribution of trimmed reads was used to assess the quality of nuclease digestion and size-selection, which is particularly important for RPF libraries (Figure S1B). Trimmed reads were filtered by size (26-35 nt for RPF; 21-60 nt for RNA) using an in-house Python script, and sequentially mapped to mouse rRNA, human rRNA, mitochondrial tRNA, mouse tRNA, mouse cDNA (Ensemble mouse database release 75) using Bowtie v2.2.1 [36] and mouse genome (GRCm38.p2) using Tophat v2.0.11 [37]. Trimmed and filtered sequences were also directly mapped against the mouse genome (Tophat v2.0.11) in order to estimate expressed transcript models in each organ (using Cufflinks v2.2.1 [38]). Transcriptome-mapping reads in the sequential alignment were counted towards their location into the 5′ UTR, CDS or 3′ UTR of the transcript, based on feature annotation (Ensemble mouse release 75). Mappable and countable feature lengths were not calculated for this study (see “faux reads analysis” in the “Quantification of mRNA and ribosome footprint abundance” section of Supplemental Experimental Procedures of previous study [10]) as its contribution was negligible for further analyses. Therefore RPKM calculations in this study were not corrected with such factor. Read counts in RNA-seq and RPF-seq datasets were normalised with upper quantile method of edgeR [39] and RPKM values were calculated as the number of reads per 1000 bases per geometric mean of normalised read counts per million. Relative translation efficiencies (TE) were calculated as the ratio of RPF-RPKM to RNA-RPKM per gene per sample. Reading frame and nucleotide periodicity analyses were performed as in [10]. Principal Component Analysis (PCA) relied on a combined matrix of CDS counts for RPF and RNA from both liver and kidney and following the same approach as before [10]. Ribo-seq Unit Step Transformation (RUST) analysis was used to assess whether the sequencing libraries were globally of similar quality in terms of their local footprint densities [17]. RUST is a simple normalisation method that reduces the heterogeneous noise in the data and allows identification of mRNA sequence features that affect footprint densities globally. We used the version 1.2 of the published rust_codon.py standalone python script with minor modifications to reflect the experimental settings as closely as possible (i.e., A-site offsetting). RUST codon profile and corresponding Kullback–Leibler (K–L) divergence for each library (RPF and RNA) was generated against a database of 8012 single protein isoform transcript sequences using all mapped reads with a length of 28-32 nt. The K-L divergences from all samples for each combination of tissue (kidney or liver) and read type (RPF or RNA) were used to generate K-L profiles at the 0, 10, 25, 50, 75, 90, and 100th quantiles.
Correlation analyses and assessment of translational compensation across organs
Correlation of RNA-seq and RPF-seq across organs: Kidney vs. liver correlations at the levels of RNA- and RPF-seq (i.e., Figure 2B; Figure S6C-E, H-L) were calculated in a pairwise fashion for each of the 24 samples (12 timepoints, 2 replicates/timepoint), as livers and kidneys of each replicate originated from the same animals. Significance of the difference in the Spearman coefficients between both distributions was assessed by paired t-test on Fisher z-transformed coefficients. Heart vs. liver correlation at the levels of RNA and RPF-seq (Figure S7A) was calculated from the study [19], using the BN-Lx reference rat strain data. Since the five heart and liver replicates in this study did not come from the same animals, we calculated all possible pairwise correlation coefficients between heart and liver (i.e., 25), and compared all possible combinations of 5 coefficients between RNA- and RPF-seq (paired t-test on Fisher z-transformed coefficients).
Measurement error: Measurement errors (Figure S6A,B,F,G) were calculated similarly to [18] using the meas.est() function from smatr R package [40]. Genes were first binned according to average expression level (calculated as the fourth root of the product of liver RNA-seq, liver RPF-seq, kidney RNA-seq and kidney RPF-seq) into 10 groups, each containing 10% of all genes. Within each bin, the measurement error was calculated separately for RNA- and RPF-seq and for liver and kidney, using the two replicates (log of normalised CDS counts) to estimate the error, and the 12 timepoint samples to estimate its variability. For the analyses using a filtered gene set (Figure S6F-G), genes that showed a mean expression ratio (either between organs or between RNA- and RPF-seq) greater than 2 for all timepoints were excluded (9236 genes used in analysis).
Analyses of differential translation efficiency
To test for differential translation efficiency (TE) between liver and kidney we used the Wilcoxon-signed rank paired test, using all 24 samples (12 timepoints; 2 replicates/timepoint) as replicates; resulting p-values were FDR-corrected. A gene was defined as having differential TE when FDR < 0.01 and the inter-organ difference in TE was at least 1.5-fold (Figure 2E).
Analysis of transcript usage diversity across organs: For each gene g, P(g) = (p 1, …, p n) is the vector of the relative expression proportions of its n protein-coding transcripts, as estimated from our RNA-seq analysis (see Sequencing data processing, alignment and quantification). To quantify the dissimilarity in relative transcript isoform expression between liver L and kidney K, the Hellinger distance H is defined as:
In order to detect the transcript features that were associated with tissue specificity in TE, we selected genes whose transcript diversity between both organs originated from, or was excluded from, 5′ UTR, CDS, or 3′ UTR, based on feature annotation information for the detected protein-coding transcripts (Figure 2G and Figure S9).
Study of transcript characteristics: For single-isoform genes, we investigated whether a particular transcript characteristic (length, GC content, Kozak context, structure) could be predictive of differential TE. Length and GC content were determined directly on the whole transcript and/or on the region of interest (5′ UTR, CDS, 3′ UTR). Kozak context was scored according to the consensus sequence GccA/GccAUGG, where upper-case letters denote highly conserved bases (scored +3), lower-case letters indicate the most common nucleotides (scored +1), and bold is the start codon (not scored), giving a maximum score of 13. Minimum free energy secondary structures on the 5′ UTR were predicted with RNAfold from ViennaRNA package with default parameters [41].
Detection and translation efficiency calculation for uORFs
To assess the impact of differential uORF usage on TE differences across organs, uORFs were identified as in our previous study [10]. Briefly, genes expressing a single protein-coding isoform in both organs were used for this analysis (N=5815). We selected uORFs with an AUG start codon and a length of at least 18 nt to the first in-frame stop codon and considered them as translated if the reads showed significant frame bias towards the reading frame of the uORF start codon and if coverage was >10%. uORF translation efficiency was calculated from the ratio of RPF-seq to RNA-seq reads whose predicted A-sites mapped to the annotated uORF regions. If several uORFs partially of completely overlapped on a given 5′ UTR, a composite uORF was considered for read counting. uORFs overlapping with the CDS in the same frame were not considered. When they overlapped in different frames, only reads mapping to the 5′ UTR-specific uORF sequence (but not the overlapping sequence) was considered for quantifications.
Rhythmicity analyses
Rhythmicity detection and rhythmic parameter estimations in each dataset (RNA-seq and RPF-seq, liver and kidney) were done based on Akaike information criterion (AIC) model selection as in our previous study [10]. The Babel computational framework [22] was used to detect rhythmically translated genes from constantly expressed mRNAs within each organ. For cross-correlation of time series to compare the daily profiles of rhythmic genes beyond their peak differences, we used the ccf function in R. As we computed the correlations of the RPF-seq with respect to the RNA-seq profiles, negative lag values correspond to RPF leading RNA.
Hierarchical clustering of rhythmic genes
To evaluate the similarity of the expression profiles for rhythmic genes, a dissimilarity matrix was computed for each gene of interest, based on the Euclidean distance between the RNA-seq and RPF-seq expression profiles within and across organs. A hierarchical clustering tree was constructed on the weighted average of the dissimilarity matrices of genes under consideration (core clock genes in Figure 5C or all rhythmic genes in Figure 5D), using the “average” clustering method. The R functions{packages} dist{stats}, fuse{analogue} and hclust{stats} were used for computing the individual dissimilarity matrices, the weighted mean dissimilarity matrix, and the hierarchical clustering, respectively.
Declarations
Electronic supplementary material
CasteloSzekely_Suppl.pdf: This file contains the Supplementary Figures S1-S14
CasteloSzekely_Suppl_Table_S1.xls: This file contains information on the deep-sequencing data from kidney (raw read counts, mapping summary etc.)
CasteloSzekely_Suppl_Table_S2.xls: This file contains details of the GO-term analysis on the differential TE gene set of Figure 2.
CasteloSzekely_Suppl_Table_S3.xls: This file contains the outcome of the transcriptome-wide rhythmicity analyses on the kidney datasets (related to Figure 3A).
CasteloSzekely_Suppl_Table_S4.xls: This file contains the outcome of the rhythmicity analyses in kidney and liver for the 178 commonly rhythmic genes (related to Figure 3C).
Supp_dataset_S1.zip: This zip-file contains transcriptome-wide, time-resolved expression plots for kidney RPF (blue) and RNA (orange) in the left panels (with “error bars” connecting the two replicates of each timepoint) and TE in the right panels.
Supp_dataset_S2.zip: Similar to Supp_dataset_S2.zip, expression plots for kidney and liver for the 178 common rhythmic genes of Figure 3C.
Competing interests
The authors declare that they have no competing interests.
Author contributions
VCS and PJ performed the experiments. VCS and ABA analysed the data. DG and VCS wrote the manuscript with inputs from the others authors. DG designed the project.
Acknowledgments
We thank Lausanne Genomic Technologies Facility and Vital-IT staff for high-throughput sequencing and computational support. DG acknowledges support by: Swiss National Science Foundation (grants 128399, 157528); National Centre of Competence in Research (NCCR) RNA & Disease; Fondation Pierre Mercier; Fondation Leenaards; Olga Mayenfisch Stiftung; SystemsX.ch StoNets consortium; University of Lausanne. PJ was supported by Human Frontiers Science Program long-term fellowship LT000158/2013-L.