## Abstract

Competition to bind microRNAs induces an effective positive crosstalk between their targets, therefore known as ‘competing endogenous RNAs’ or ceRNAs. While such an effect is known to play a significant role in specific situations, estimating its strength from data and, experimentally, in physiological conditions appears to be far from simple. Here we show that the susceptibility of ceRNAs to different types of perturbations affecting their competitors (and hence their tendency to crosstalk) can be encoded in quantities as intuitive and as simple to measure as correlation functions. We confirm this scenario by extensive numerical simulations and validate it by re-analyzing PTEN’s crosstalk pattern from the TCGA breast cancer database. These results clarify the links between different quantities used to estimate the intensity of ceRNA crosstalk and provide new keys to analyze transcriptional datasets and effectively probe ceRNA networks *in silico*.

## INTRODUCTION

MicroRNAs (miRNAs) are small non-coding RNA (ncRNA) molecules that post-transcriptionally regulate a significant portion of the eukaryotic transcriptome via sequence-specific, protein-mediated binding in the cytoplasm [1]. Their primary effects on coding transcripts consist in inhibiting translation and fostering degradation [2]. Long ncRNAs, instead, can transiently sequester miRNAs, thereby altering their availability and overall repressive potential [3]. Following early observations concerning small regulatory RNAs in plants and bacteria [4, 5], competition to bind miRNAs has been hypothesized to cause an effective positive interaction (‘crosstalk’) between their coding and/or non-coding targets that may directly affect protein levels [6] (see Fig. 1A,B). Several experimental and modeling studies have clarified the conditions under which such a scenario may become biologically relevant, highlighting specifically how molecular levels and kinetic heterogeneities may control it [7–13]. So far, such a ‘ceRNA effect’ (whereby ceRNA stands for ‘competing endogenous RNA’) has been quantitatively validated in cases of differentiation [14], disease [15] or in the presence of unphysiologically large transcriptional inputs [16]. Its significance in standard physiological conditions is therefore subject to scrutiny [17].

A major difficulty in detecting the ceRNA effect unambiguously in experiments or data lies in the fact that it should be disentangled from other mechanisms that may bear a similar impact, i.e. an effective positive coupling, on transcripts. Imagine a network of *N* ceRNA species interacting with *M* miRNA species. ceRNA levels *m*_{i} (*i* = 1*, …, N*) fluctuate stochastically in time due to random synthesis and degradation events and to interactions with miRNAs, whose levels are also subject to random fluctuations. Denoting by 〈·〉 the time-average in the steady state, an effective ceRNA-ceRNA dependence can be signaled by a statistical correlation coefficient such as Pearson’s [8], i.e.

$$
\rho_{ij} = \frac{\langle m_i m_j \rangle - \langle m_i \rangle \langle m_j \rangle}{\sigma_{m_i}\,\sigma_{m_j}}, \tag{1}
$$

with the idea that, if *ρ*_{ij} is large enough, a perturbation altering the level of ceRNA *j* will cause part of the miRNA population to move from one target to the other, effectively broadcasting the perturbation from ceRNA *j* to ceRNA *i* through miRNA-mediated interactions. A more direct description of this mechanism is attained instead via *susceptibilities* like [7]

$$
\chi_{ij} = \frac{\partial \langle m_i \rangle}{\partial b_j}, \tag{2}
$$

where *b*_{j} stands for the transcription rate of ceRNA *j*. *χ*_{ij} quantifies the shift in the mean level of ceRNA *i* caused by a (small) variation in *b*_{j}, and a large *χ*_{ij} (assuming no direct control of ceRNA *i* by ceRNA *j*) points to miRNA-mediated crosstalk between ceRNAs *i* and *j* (see Fig. 1C).

While both *χ*_{ij} and *ρ*_{ij} capture aspects of ceRNA crosstalk seen in experiments, their underlying physical meaning is *a priori* different. Fluctuating miRNA levels naturally correlate co-regulated targets, so that a large *ρ*_{ij} is obtained when both ceRNAs respond to the stochastic dynamics of their regulator. This however does not necessarily imply a large *χ*_{ij}. In fact, *χ*_{ij} can be large even in the absence of fluctuations in miRNA levels, i.e. as a consequence of competition alone. In such conditions, *ρ*_{ij} vanishes. *χ*_{ij} has indeed been found to be asymmetric under exchange of its indices (i.e. *χ*_{ij} ≠ *χ*_{ji} in general) [7], at odds with *ρ*_{ij} which is necessarily symmetric. It would therefore be important to clarify how quantities like (1) and (2) are related in miRNA-ceRNA networks, especially to understand whether responses to perturbations (a central quantity of interest for many potential applications of the ceRNA effect) can be encoded in quantities as intuitive and as simple to measure experimentally or from data as a Pearson correlation coefficient.

Here we show that the information conveyed by *χ*_{ij} is indeed captured by a correlation function similar to *ρ*_{ij}. On the other hand, *ρ*_{ij} is linked to a susceptibility, i.e. to the response of a target to a perturbation altering the level of its competitor, but the perturbation concerns the intrinsic *decay* rate of the competitor rather than its transcription (as is the case for *χ*_{ij}). In the following, we will derive these results and validate them by computer simulations and gene expression data analysis, and explore their consequences.

## MATERIALS AND METHODS

Numerical simulations were performed using the Gillespie algorithm [18], an implementation of which, for a miRNA-ceRNA network, is available from https://github.com/araksm/ceRNA/. Gene expression data analysis was performed starting from 1098 breast cancer samples obtained from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/; project ID: TCGA-BRCA).

## RESULTS

### Theory

We start from the dynamics of molecular populations in a miRNA-ceRNA network, denoting by *m*_{i} the level of ceRNA species *i* (ranging from 1 to *N*), by *µ*_{a} the level of miRNA species *a* (ranging from 1 to *M*), and by *c*_{ia} the levels of miRNA-ceRNA complexes. In the deterministic limit where stochastic fluctuations are neglected, the time evolution of concentration variables is described by
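
$$
\begin{aligned}
\frac{dm_i}{dt} &= b_i - d_i m_i - \sum_a k^{+}_{ia} m_i \mu_a + \sum_a k^{-}_{ia} c_{ia}, \\
\frac{d\mu_a}{dt} &= \beta_a - \delta_a \mu_a - \sum_i k^{+}_{ia} m_i \mu_a + \sum_i \left(k^{-}_{ia} + \kappa_{ia}\right) c_{ia}, \\
\frac{dc_{ia}}{dt} &= k^{+}_{ia} m_i \mu_a - \left(\sigma_{ia} + \kappa_{ia} + k^{-}_{ia}\right) c_{ia},
\end{aligned}
\tag{3}
$$

with the different parameters denoting intrinsic synthesis (*b*_{i}, *β*_{a}) and degradation rates (*d*_{i}, *δ*_{a}), complex association/dissociation rates (*k*^{+}_{ia}, *k*^{−}_{ia}) and complex processing rates (*σ*_{ia} and *κ*_{ia} for stoichiometric and catalytic processing, respectively), while *τ*_{ia} = (*σ*_{ia} + *κ*_{ia} + *k*^{−}_{ia})^{−1} represents the mean lifetime of the complex formed by miRNA *a* and ceRNA *i*. We note that if the mean lifetime of complexes is much shorter than that of free molecular species, i.e. if *τ*_{ia} ≪ 1*/d*_{i} and *τ*_{ia} ≪ 1*/δ*_{a} for each *i* and *a*, miRNA-ceRNA complexes achieve a steady state much faster than miRNA and ceRNA levels. In such conditions, d*c*_{ia}/d*t* ≃ 0 and one can eliminate complexes from (3) by replacing *c*_{ia} with its steady state value

$$
c_{ia} = k^{+}_{ia}\,\tau_{ia}\, m_i\, \mu_a. \tag{4}
$$

For *κ*_{ia} ≪ *σ*_{ia} (i.e. when stoichiometric degradation without miRNA recycling is the dominant channel of complex processing), this allows one to re-cast (3) in the form (see Supplementary Text 1)

$$
\frac{dm_i}{dt} = -m_i \frac{\partial L}{\partial m_i}, \qquad \frac{d\mu_a}{dt} = -\mu_a \frac{\partial L}{\partial \mu_a}, \tag{5}
$$

where *L* is a function of all miRNA levels *µ* = {*µ*_{a}} and all ceRNA levels m = {*m*_{i}} given by

$$
L(\mathbf{m}, \boldsymbol{\mu}) = \sum_i \left(d_i m_i - b_i \log m_i\right) + \sum_a \left(\delta_a \mu_a - \beta_a \log \mu_a\right) + \sum_{i,a} k^{+}_{ia}\,\sigma_{ia}\,\tau_{ia}\, m_i\, \mu_a. \tag{6}
$$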

One easily sees (see Supplementary Text 2) that *L* decreases along trajectories of (5), implying that its minimum describes the physically relevant steady state of (3) with m ≠ 0 and *µ* ≠ 0.
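
The monotonic decrease of *L* can be checked numerically on a toy network. The sketch below is only illustrative: it assumes the reduced (complex-free) dynamics for 2 ceRNA species and 1 miRNA species, with *L* written as the sum of `d*m - b*log(m)` terms for each species plus a bilinear coupling `g_i * m_i * mu`. The coupling constants `g`, the parameter values and the function names are hypothetical choices made for this example, not values taken from the paper.

```python
import math

# Illustrative parameters (assumptions, not taken from the paper):
# two ceRNA species and one miRNA species.
b = [10.0, 6.0]          # ceRNA transcription rates b_i
d = [1.0, 1.0]           # ceRNA intrinsic decay rates d_i
g = [0.5, 0.2]           # assumed effective miRNA-ceRNA couplings
beta, delta = 8.0, 1.0   # miRNA transcription and decay rates


def lyapunov(m, mu):
    """Assumed form of L for the reduced dynamics."""
    value = sum(d[i] * m[i] - b[i] * math.log(m[i]) for i in range(2))
    value += delta * mu - beta * math.log(mu)
    return value + sum(g[i] * m[i] * mu for i in range(2))


def rhs(m, mu):
    """Reduced dynamics: dm_i/dt = -m_i dL/dm_i and dmu/dt = -mu dL/dmu."""
    dm = [b[i] - d[i] * m[i] - g[i] * m[i] * mu for i in range(2)]
    dmu = beta - delta * mu - mu * sum(g[i] * m[i] for i in range(2))
    return dm, dmu


def integrate(m0, mu0, dt=1e-3, steps=40000):
    """Forward-Euler integration; returns the final state and the values of L."""
    m, mu = list(m0), mu0
    values = [lyapunov(m, mu)]
    for _ in range(steps):
        dm, dmu = rhs(m, mu)
        m = [m[i] + dt * dm[i] for i in range(2)]
        mu = mu + dt * dmu
        values.append(lyapunov(m, mu))
    return m, mu, values


m_final, mu_final, L_values = integrate([1.0, 1.0], 1.0)
print("steady state:", m_final, mu_final)
print("L at start and end:", L_values[0], L_values[-1])
```

With a small enough time step, the recorded values of *L* should never increase, and the final state should satisfy the steady-state conditions of the reduced dynamics.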

If intrinsic molecular noise (arising from stochastic transcription and degradation events and from titration due to miRNA-ceRNA interactions) is added to (3), after a transient molecular levels will eventually stabilize and fluctuate over time around the steady state described by the minimum of *L*. We are interested in finding a compact and intuitive mathematical form for the correlations arising between the different components in such conditions. Molecular noise is Poissonian, namely the strength of fluctuations affecting each variable is proportional to the square root of mean molecular levels (see e.g. [13] for an explicit representation in the context of a miRNA-ceRNA network), which makes our goal especially challenging. However, we will see that the effects of molecular noise can be remarkably well approximated by a uniform “effective temperature” *T* representing the strength of fluctuations affecting all molecular species involved. In this case, one can describe fluctuations around the steady state as thermal fluctuations around a Boltzmann-Gibbs equilibrium state. This allows one to compute averages of generic functions of m and *µ* as “thermal averages”, i.e.

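$$
\langle f \rangle = \frac{1}{Z} \int f(\mathbf{m}, \boldsymbol{\mu})\, e^{-L(\mathbf{m}, \boldsymbol{\mu})/T}\, d\mathbf{m}\, d\boldsymbol{\mu}, \tag{7}
$$

where

$$
Z = \int e^{-L(\mathbf{m}, \boldsymbol{\mu})/T}\, d\mathbf{m}\, d\boldsymbol{\mu} \tag{8}
$$

is a normalization factor, the deterministic limit being obtained for *T* → 0. In particular, defining

$$
\omega_{ij} = \frac{\partial \langle m_i \rangle}{\partial d_j}, \tag{9}
$$

$$
\langle x\, y \rangle_c = \langle x\, y \rangle - \langle x \rangle \langle y \rangle, \tag{10}
$$

by straightforward calculations one finds

$$
\omega_{ij} = -\frac{1}{T}\, \langle m_i\, m_j \rangle_c, \tag{11}
$$

$$
\chi_{ij} = \frac{1}{T}\, \langle m_i \log m_j \rangle_c. \tag{12}
$$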

Therefore, in this approximation, the susceptibility *χ*_{ij} [Eq. (2)] is linked to the correlation function [Eq. (12)]

$$
X_{ij} = \langle m_i \log m_j \rangle_c = \langle m_i \log m_j \rangle - \langle m_i \rangle \langle \log m_j \rangle, \tag{13}
$$

which, like *χ*_{ij}, is not symmetric under the exchange of *i* and *j*, while the ceRNA-ceRNA covariance

$$
C_{ij} = \langle m_i\, m_j \rangle_c = \langle m_i\, m_j \rangle - \langle m_i \rangle \langle m_j \rangle \tag{14}
$$

is tied to the susceptibility *ω*_{ij} quantifying the change in 〈*m*_{i}〉 induced by a (small) change of the *intrinsic degradation rate d*_{j} of ceRNA *j* [Eq. (11)]. (Note that *ω*_{ij} ≤ 0.) The constant linking these quantities is the temperature *T* quantifying the strength of the uniform “effective noise”.
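
These fluctuation-response relations can be probed numerically. The following is a minimal sketch under stated assumptions: thermal averages are taken with weight exp(−*L*/*T*) over a toy system of 2 ceRNAs and 1 miRNA, *L* is assumed to contain `d*m - b*log(m)` terms plus a bilinear coupling `g_i * m_i * mu`, the miRNA level is integrated out analytically (a Gamma integral), and the remaining average over the two ceRNA levels is done on a grid. All parameter values and names (`thermal_averages`, `G1`, `G2`, `GRID`) are hypothetical.

```python
import math

T = 1.0                      # effective temperature (illustrative value)
B1, D1 = 10.0, 1.0           # transcription and decay rates of ceRNA 1
BETA, DELTA = 10.0, 1.0      # transcription and decay rates of the miRNA
G1, G2 = 0.5, 0.3            # assumed effective miRNA-ceRNA couplings
GRID = [0.1 + 0.25 * k for k in range(150)]   # ceRNA levels used for quadrature


def thermal_averages(b2, d2):
    """Grid estimates of <m1>, <m2>, <log m2>, <m1 m2> and <m1 log m2>."""
    log_w = []
    for m1 in GRID:
        for m2 in GRID:
            lw = (B1 * math.log(m1) + b2 * math.log(m2) - D1 * m1 - d2 * m2) / T
            # Contribution left after integrating out the miRNA level.
            lw -= (BETA / T + 1.0) * math.log(DELTA + G1 * m1 + G2 * m2)
            log_w.append(lw)
    top = max(log_w)         # subtracted before exponentiating, for stability
    z = s1 = s2 = sl = s12 = s1l = 0.0
    k = 0
    for m1 in GRID:
        for m2 in GRID:
            w = math.exp(log_w[k] - top)
            k += 1
            log_m2 = math.log(m2)
            z += w
            s1 += w * m1
            s2 += w * m2
            sl += w * log_m2
            s12 += w * m1 * m2
            s1l += w * m1 * log_m2
    return s1 / z, s2 / z, sl / z, s12 / z, s1l / z


b2, d2, h = 8.0, 1.0, 1e-3
avg_m1, avg_m2, avg_log_m2, avg_m1m2, avg_m1_log_m2 = thermal_averages(b2, d2)
C12 = avg_m1m2 - avg_m1 * avg_m2                 # covariance <m1 m2>_c
X12 = avg_m1_log_m2 - avg_m1 * avg_log_m2        # <m1 log m2>_c

# Susceptibilities of <m1> by central finite differences in b2 and d2.
chi12 = (thermal_averages(b2 + h, d2)[0] - thermal_averages(b2 - h, d2)[0]) / (2 * h)
omega12 = (thermal_averages(b2, d2 + h)[0] - thermal_averages(b2, d2 - h)[0]) / (2 * h)

print("chi_12   vs  X_12 / T :", chi12, X12 / T)
print("omega_12 vs -C_12 / T :", omega12, -C12 / T)
```

In this toy setting the two printed pairs should agree up to the finite-difference error, with a negative `omega12` and a positive `chi12`.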

Somewhat unexpectedly, the above results suggest that the susceptibility *ω*_{ij} must be symmetric under exchange of *i* and *j*, i.e., for instance, if the level of ceRNA *i* is altered by changing the intrinsic degradation rate of ceRNA *j*, then the reverse is also true. To check this property, one can calculate *ω*_{ij} explicitly for a system formed by *N* ceRNA species interacting with a single miRNA species at steady state by following a different route, specifically along the lines of [7]. Considering the repression strength to which ceRNAs *i* and *j* are subject at a given (mean) level 〈*µ*_{1}〉 of miRNA species 1 (*M* = 1 in this case), one finds, for each ceRNA, a soft “threshold” value of 〈*µ*_{1}〉, denoted by *µ*_{0,i1}, such that *i* is unrepressed (resp. repressed or susceptible to changes in 〈*µ*_{1}〉) if 〈*µ*_{1}〉 ≪ *µ*_{0,i1} (resp. ≫ *µ*_{0,i1} or ≃ *µ*_{0,i1}). A direct calculation (see Supplementary Text 3) shows that *ω*_{ij} can attain large values only if both ceRNAs are susceptible to 〈*µ*_{1}〉, in which case one has (*i ≠ j*)
where *a*_{l} = 0, 1, 1/4 if ceRNA *l* is repressed, unrepressed or susceptible, respectively. Eq. (15) confirms that *ω*_{ij} is indeed symmetric under exchange of *i* and *j*.
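
The symmetry of *ω*_{ij}, and the asymmetry of *χ*_{ij}, can also be checked by brute force in the deterministic limit. The sketch below assumes a reduced steady state with one miRNA species in which each ceRNA level is `b_i / (d_i + g_i * mu)`. The couplings `G`, the parameter values and the helper names are hypothetical, and derivatives are taken by central finite differences rather than analytically.

```python
B = [10.0, 4.0]          # ceRNA transcription rates (illustrative)
D = [1.0, 1.0]           # ceRNA intrinsic decay rates (illustrative)
G = [0.5, 0.5]           # assumed effective miRNA-ceRNA couplings
BETA, DELTA = 8.0, 1.0   # miRNA transcription and decay rates


def steady_state(b, d, g, beta, delta):
    """Steady state of the assumed reduced model with a single miRNA species.

    Solves beta - delta*mu - mu * sum_i g_i*b_i / (d_i + g_i*mu) = 0 by
    bisection and returns the ceRNA levels m_i = b_i / (d_i + g_i*mu) and mu.
    """
    def balance(mu):
        titration = sum(gi * bi / (di + gi * mu) for bi, di, gi in zip(b, d, g))
        return beta - delta * mu - mu * titration

    lo, hi = 0.0, beta / delta       # balance(lo) > 0 and balance(hi) < 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [bi / (di + gi * mu) for bi, di, gi in zip(b, d, g)], mu


def susceptibility(i, j, param, h=1e-5):
    """Central finite difference of m_i with respect to b_j or d_j."""
    def level(shift):
        b, d = list(B), list(D)
        if param == "b":
            b[j] += shift
        else:
            d[j] += shift
        return steady_state(b, d, G, BETA, DELTA)[0][i]
    return (level(h) - level(-h)) / (2 * h)


m_star, mu_star = steady_state(B, D, G, BETA, DELTA)
omega12, omega21 = susceptibility(0, 1, "d"), susceptibility(1, 0, "d")
chi12, chi21 = susceptibility(0, 1, "b"), susceptibility(1, 0, "b")
print("omega_12, omega_21:", omega12, omega21)
print("chi_12,   chi_21  :", chi12, chi21)
```

In this assumed model the two values of `omega` coincide, while `chi12` and `chi21` differ whenever the two ceRNA levels differ, since at steady state *χ*_{ij} *m*_{j} = −*ω*_{ij} here.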

Concerning the approximations under which the above results were obtained, we remark that we started by considering (3) in the limit of (i) fast complex equilibration, and (ii) miRNA-ceRNA complex processing dominated by the stoichiometric channel, with the former playing the key role in deriving the function *L* (see Supplementary Text 1). We note however that the overall scenario just described also holds when complexes evolve over time scales much longer than those of free molecular levels, i.e. for *τ*_{ia} ≫ 1/*d*_{i} and *τ*_{ia} ≫ 1/*δ*_{a}. In particular, (3) can again be re-cast in the form of (5) with *L* given by (6), albeit with re-scaled transcription rates (see Supplementary Text 4 for details).

Therefore we conclude that, as long as molecular noise can be approximated by a uniform effective temperature,

(i) the ceRNA-ceRNA covariance *C*_{ij} = 〈*m*_{i}*m*_{j}〉_{c} is a proxy for the susceptibility *ω*_{ij}, and

(ii) the correlation function *X*_{ij} = 〈*m*_{i} log *m*_{j}〉_{c} is a proxy for the susceptibility *χ*_{ij}.

### Validation

We have validated the above scenario by simulating a small network involving 2 ceRNA species and a single miRNA species via the Gillespie algorithm [18], where molecular noise is accounted for explicitly (see Supplementary Text 5). Results are summarized in Fig. 2, where we compare *ω*_{12}, *ω*_{21} and *C*_{12} ≡ *C*_{21} on the one hand, and *χ*_{12}, *χ*_{21}, *X*_{12} and *X*_{21} on the other, as computed from simulations (i.e. with the actual molecular noise), against the theoretical predictions. We considered three scenarios for the mean lifetime of miRNA-ceRNA complexes, namely those covered by the theory (i.e. complex equilibration much faster and much slower than the equilibration of miRNA and ceRNA levels) as well as the intermediate case where characteristic timescales are comparable for all variables.

One sees that theoretical predictions obtained in the “thermal noise” approximation agree remarkably well with simulations including the actual molecular noise. In particular, the full correspondence between the susceptibilities *ω*_{ij} and *χ*_{ij} and the (re-scaled) correlation functions *C*_{ij} and, respectively, *X*_{ij} is evident. Notice that a single global parameter *T* ≥ 0 has been used to fit all data in each of the conditions. This shows how accurately the assumption of a uniform effective temperature can mimic the effects of intrinsic stochasticity. On the other hand, its limits might be reflected, at least in part, in the discrepancies that occur at high transcription rates.
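
For readers who want to reproduce the flavor of such a test, a stripped-down stochastic simulation can be sketched as follows. This is not the implementation used for Fig. 2: it assumes that complexes are not tracked and that each binding event directly removes one ceRNA and one miRNA molecule (purely stoichiometric titration) at rate `g_i * m_i * mu`. The function name, the parameter values and the use of `log(m2 + 1)` (to avoid taking the logarithm of zero counts) are choices made only for this example.

```python
import math
import random


def gillespie(b, d, g, beta, delta, t_end, seed=1):
    """Gillespie simulation of 2 ceRNAs and 1 miRNA with direct titration.

    Simplifying assumption: complexes are not tracked explicitly.
    Returns time-averaged ceRNA levels and two connected correlations.
    """
    rng = random.Random(seed)
    m, mu, t = [0, 0], 0, 0.0
    sums = {"m1": 0.0, "m2": 0.0, "m1m2": 0.0, "l2": 0.0, "m1l2": 0.0}
    while True:
        rates = [b[0], b[1], beta,                   # transcription events
                 d[0] * m[0], d[1] * m[1], delta * mu,   # decay events
                 g[0] * m[0] * mu, g[1] * m[1] * mu]     # titration events
        total = sum(rates)
        tau = rng.expovariate(total)
        last = t + tau >= t_end
        dt = (t_end - t) if last else tau
        # Time-weighted accumulation of the current state.
        log_m2 = math.log(m[1] + 1)      # pseudocount: an assumption
        sums["m1"] += m[0] * dt
        sums["m2"] += m[1] * dt
        sums["m1m2"] += m[0] * m[1] * dt
        sums["l2"] += log_m2 * dt
        sums["m1l2"] += m[0] * log_m2 * dt
        if last:
            break
        t += tau
        # Choose the next reaction with probability proportional to its rate.
        r, k = rng.random() * total, 0
        while k < len(rates) - 1 and r >= rates[k]:
            r -= rates[k]
            k += 1
        if rates[k] == 0.0:              # guard against rounding artifacts
            continue
        if k < 2:
            m[k] += 1
        elif k == 2:
            mu += 1
        elif k < 5:
            m[k - 3] -= 1
        elif k == 5:
            mu -= 1
        else:
            m[k - 6] -= 1
            mu -= 1
    avg = {key: value / t_end for key, value in sums.items()}
    return {"m1": avg["m1"], "m2": avg["m2"],
            "C12": avg["m1m2"] - avg["m1"] * avg["m2"],
            "X12": avg["m1l2"] - avg["m1"] * avg["l2"]}


stats = gillespie(b=[10.0, 4.0], d=[1.0, 1.0], g=[0.5, 0.5],
                  beta=8.0, delta=1.0, t_end=2000.0)
print(stats)
```

Repeating such runs while varying one transcription or decay rate at a time would give finite-difference estimates of *χ*_{12} and *ω*_{12} to compare with `X12` and `C12`.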

These results confirm that (12) and (14) are indeed good predictors of the response of a ceRNA to a perturbation affecting one of its competitors within a miRNA-ceRNA network. Notably, such correlation functions are easy to estimate from transcription data sets. Our framework therefore has the potential to offer new insight into post-transcriptional regulation, its system-level organization and its impact on cellular functions.

In order to test this idea, we analyzed the ceRNA scenario emerging from 1098 breast cancer samples obtained from TCGA, focusing on the widely studied oncosuppressor PTEN and its immediate competitors (i.e. the ceRNAs sharing at least one miRNA regulator with PTEN). In particular, we computed

$$
C_{\mathrm{PTEN},j} = \langle m_{\mathrm{PTEN}}\, m_j \rangle_c, \tag{16}
$$

$$
X_{j,\mathrm{PTEN}} = \langle m_j \log m_{\mathrm{PTEN}} \rangle_c, \tag{17}
$$

$$
X_{\mathrm{PTEN},j} = \langle m_{\mathrm{PTEN}} \log m_j \rangle_c \tag{18}
$$

for a set of candidate PTEN ceRNAs found in [15] by means of Mutually Targeted miRNA-Response Element Enrichment Analysis. Notice that the average appearing in Eqs. (16–18) is over samples and not over time. We expect however that, if the interaction network is conserved across samples, averages over samples should reproduce statistical averages such as (13), as different samples effectively represent different snapshots of the state of the network. Fig. 3A shows that when (16) (whose value is encoded in the color of markers) is large, both (17) and (18) tend to be large. According to (15), a large *C*_{ij} (or *ω*_{ij}) signals that both PTEN and its competitor are susceptible to changes in the level of at least one of their shared regulators. For such pairs, in addition, it has previously been shown that both *χ*_{ij} and *χ*_{ji} are expected to be large [7]. This implies a fully bi-directional crosstalk, i.e. any perturbation affecting the level of one species should affect the level of the other via miRNA-mediated regulation. Remarkably, this was experimentally shown to be the case in [15] for some of the ceRNAs we tested (e.g. SERINC1, VAPA), all of which are in this regime according to our analysis. Adding to this, we are also able to point to a number of other PTEN competitors, a perturbation of which should trigger a response by PTEN.
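
As an illustration of how such sample averages can be computed, the sketch below evaluates the covariance, the two asymmetric correlation functions and the Pearson coefficient for a pair of transcripts. The function name `crosstalk_proxies` and the toy expression values are hypothetical. The code assumes strictly positive expression levels measured across the same samples and does not reproduce the preprocessing of the TCGA data.

```python
import math


def mean(values):
    """Arithmetic mean over samples."""
    return sum(values) / len(values)


def crosstalk_proxies(x, y):
    """Sample estimates of C_xy, X_xy, X_yx and the Pearson coefficient.

    x, y: strictly positive expression levels of two transcripts,
    measured across the same samples (same order).
    """
    mx, my = mean(x), mean(y)
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    c_xy = mean([a * b for a, b in zip(x, y)]) - mx * my
    x_xy = mean([a * lb for a, lb in zip(x, log_y)]) - mx * mean(log_y)
    x_yx = mean([b * la for b, la in zip(y, log_x)]) - my * mean(log_x)
    sd_x = math.sqrt(mean([a * a for a in x]) - mx * mx)
    sd_y = math.sqrt(mean([b * b for b in y]) - my * my)
    return {"C": c_xy, "X_xy": x_xy, "X_yx": x_yx, "rho": c_xy / (sd_x * sd_y)}


# Toy data (hypothetical): y is exactly twice x in every sample.
x_levels = [1.0, 2.0, 3.0, 4.0]
y_levels = [2.0, 4.0, 6.0, 8.0]
proxies = crosstalk_proxies(x_levels, y_levels)
print(proxies)
```

Even in this perfectly correlated toy case the two asymmetric functions differ, which is the kind of information that the Pearson coefficient alone cannot convey.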

On the other hand, smaller values of (16) (orange markers in Fig. 3A) are associated with strongly asymmetric PTEN-ceRNA pairs for which (18) is much larger than (17). This suggests that PTEN will respond to an increase of its competitor’s bare transcription rate (and not vice versa), while no response of PTEN should be expected upon perturbing the bare decay rate of the same ceRNA as *C*_{ij} is small. Within the steady state theory of [7], ceRNA pairs with strongly different values of *χ*_{ij} and *χ*_{ji} pertain to cases where the responding ceRNA (PTEN here) is susceptible to variations in the miRNA levels while the perturbed one (PTEN’s competitor) is fully repressed. Our data analysis fully confirms both this scenario and the theory presented here in linking such cases to low values of the bare covariance (14).

Finally, note (Fig. 3B) that the above information cannot be retrieved if *C*_{ij} is replaced by the Pearson coefficient *ρ*_{ij}, Eq. (1), which just amounts to normalizing *C*_{ij} by the product of the standard deviations *σ*_{mi} and *σ*_{mj} of *m*_{i} and *m*_{j}. Indeed, using the value of *ρ*_{ij} to color-code PTEN’s ceRNAs, one sees that the Pearson coefficient can mislead into expecting (or not expecting) a response to a perturbation when the actual susceptibilities are small (resp. large).

For instance, *ρ*_{ij} is rather small for the pair formed by PTEN and SLC1A2, which seems to suggest absence of mutual cross-talk between these two transcripts. However, while both *C*_{PTEN,SLC1A2} and *X*_{SLC1A2,PTEN} are small, *X*_{PTEN,SLC1A2} is significant. This suggests that (i) SLC1A2 will not respond to a perturbation affecting the transcription rate of PTEN, and (ii) the pair will be insensitive to changes in each other’s bare decay rate; however, (iii) PTEN *will be affected* by a change in the bare transcription rate of its competitor despite the small statistical correlation that exists between their levels. Likewise, the large value of the Pearson coefficient between PTEN and DTWD2 can mislead into generically expecting a response when instead the susceptibility is strongly perturbation-dependent. In particular, the level of DTWD2 should not be significantly modified by a change in the level of PTEN (as *X*_{DTWD2,PTEN} is rather small) in spite of the large Pearson coefficient. Notice that, remarkably, for this pair, *C*_{ij} and *ρ*_{ij} take on very different values.

## DISCUSSION

To sum up, we have identified [Eqs. (11) and (12)] a set of correlation functions that can serve as proxies for ceRNA susceptibilities to perturbations. Specifically, *C*_{ij} = 〈*m*_{i}*m*_{j}〉_{c} is related to the susceptibility *ω*_{ij} quantifying ceRNA *i*’s response to a change of the bare decay rate of ceRNA *j*, while *X*_{ij} = 〈*m*_{i} log *m*_{j}〉_{c} is related to the susceptibility *χ*_{ij} quantifying ceRNA *i*’s response to a change of the bare transcription rate of ceRNA *j*. These relations are valid at steady state and, within the approximations discussed, are fully confirmed by numerical simulations.

Most importantly, quantities like *C*_{ij} and *X*_{ij} can be easily estimated from data and possibly measured in experiments. An analysis of PTEN’s emergent crosstalk pattern from the TCGA breast cancer dataset using these functions has indeed shown that a map of ceRNA responses to perturbations affecting competitors can be constructed by combining the information provided by each, while the Pearson coefficient *ρ*_{ij} can be inaccurate in this respect. This opens the way to probing the structure and function of ceRNA networks *in silico* by straightforwardly analyzing transcriptional data, and provides a key to obtain testable transcriptome-scale predictions about ceRNA crosstalk.

Notice that our results apply without any modification to ceRNA pairs that do not share miRNA regulators; that is, they can identify long-range crosstalk (i.e. interactions between ceRNAs that are separated by multiple miRNAs along the miRNA-ceRNA network) of the kind discussed in [19]. In this sense, they can provide insight into ceRNA crosstalk both at the local scale and at an extended, network-level scale.

From the viewpoint of physics, results like (11) and (12) are akin to the “fluctuation-response relations” that constitute a cornerstone of statistical mechanics [20]. Their derivation in our context has relied on an equilibrium framework that presupposes stationarity of molecular levels. Since ceRNA crosstalk can be substantially more complex away from the steady state [21], a more refined mathematical study will be required to extend the theory developed here to off-equilibrium dynamical regimes. On the other hand, our results open the way for the application of recently developed inference techniques [22] to the estimation of miRNA levels or kinetic parameters from ceRNA levels. Overall, the approach presented here provides new means to extract information on post-transcriptional regulation from sequencing and/or gene expression data, thereby potentially enhancing our ability to exploit the ceRNA effect for therapeutic purposes by allowing for the identification of better (i.e. more responsive) targets for intervention.

## 2. *L* DECREASES ALONG THE DYNAMICS

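By direct differentiation, and using the fact that d*m*_{i}/d*t* = −*m*_{i} ∂*L*/∂*m*_{i} and d*µ*_{a}/d*t* = −*µ*_{a} ∂*L*/∂*µ*_{a} (see Eq. (5) in the Main Text), one finds

$$
\frac{dL}{dt} = \sum_i \frac{\partial L}{\partial m_i}\frac{dm_i}{dt} + \sum_a \frac{\partial L}{\partial \mu_a}\frac{d\mu_a}{dt} = -\sum_i m_i \left(\frac{\partial L}{\partial m_i}\right)^{2} - \sum_a \mu_a \left(\frac{\partial L}{\partial \mu_a}\right)^{2} \le 0.
$$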

In other words, under the approximations discussed above, *L* decreases along the dynamics of the miRNA-ceRNA network. Therefore the minimum of *L* (which is unique by virtue of the concavity of *L*) describes a steady state of the dynamics (1).

## 3. APPROXIMATE CALCULATION OF *ω*_{ij} FOR A SYSTEM WITH ONE MIRNA AND *N* CERNA SPECIES

Starting from Eq. (3) taken for *N* ceRNA species and a single miRNA species (we suppress its index for simplicity) in the limit , the steady-state level of *m*_{i} reads

Now noting that
and that the steady state miRNA level can be approximated by [1]
so that
we can compute the susceptibility *ω*_{ij} as

One finds
where *W* is a 3 × 3 matrix that only depends on the regime *R*(*i*) (repressed, susceptible or expressed) to which ceRNA *i* belongs. By considering the definitions of the different regimes in terms of the value of *µ*, all elements of *W* are found to be ≪ 1 (for instance, *W*_{Expr,Expr} = *µ/*(*µ*_{0,i}*µ*_{0,j}) ≪ 1 as *µ* ≪ *µ*_{0,i} and *µ* ≪ *µ*_{0,j} if ceRNAs *i* and *j* are both expressed) except for *W*_{Susc,Susc}, which is given by
leading immediately to Eq. (15) of the Main Text.

## 4. CASE OF SLOW COMPLEX PROCESSING

Assuming complex levels *c*_{ia} are roughly stationary over time scales for which *m*_{i} and *µ*_{a} evolve (i.e. *τ*_{ia} ≫ 1/*d*_{i} and *τ*_{ia} ≫ 1*/δ*_{a} for each *i* and *a*), then all terms in (1) that involve the variables *c*_{ia} can be taken to be roughly constant for short enough characteristic times. In such conditions, miRNAs are effectively transcribed at rates
while ceRNAs are effectively transcribed at rates

In this limit, (1) can again be cast as with

The main difference from the previous case lies in the fact that the minimum of *L* should now be computed self-consistently from the asymptotic value of *c*_{ia}: after the (fast) equilibration of *m*_{i}’s and *µ*_{a}’s following (18), a new steady state value for complexes is computed as , leading in turn to new values for the effective transcription rates and and hence to new values for *m*_{i}’s and *µ*_{a}’s from (18), and so on until convergence.

## 5. STOCHASTIC DYNAMICS OF A MIRNA-CERNA NETWORK

The time evolution of our miRNA-ceRNA network with *N* ceRNA species (labeled *i*), *M* miRNA species (labeled *a*) and intrinsic (molecular) noise is described by the system
where while *η*_{a}, *ξ*_{a} and *ζ*_{ia} represent stochastic variables. As each noise source contributes independently to the overall noise level, one has
where *ξ*_{mi}, *η*_{µa}, , , and and are mutually independent zero-average random variables representing, respectively, the intrinsic noise in ceRNA levels, in miRNA levels, in the binding/unbinding dynamics of complexes, in the stoichiometric complex degradation channel and in the catalytic complex degradation channel. Correlations are, for each component, described by
where
denote the mean steady-state molecular levels. To obtain Fig. 2 of the Main Text, we have simulated the above system with *M* = 1, *N* = 2 using the Gillespie algorithm [2].

## Acknowledgments

We gratefully acknowledge Carla Bosia and Andrea Pagnani for useful insight and suggestions.

## Footnotes

* andrea.demartino{at}roma1.infn.it

## References

- [1].
- [2].