PT  - JOURNAL ARTICLE
AU  - Albert Planas
AU  - Xiangfu Zhong
AU  - Simon Rayner
TI  - miRAW: A deep learning approach to predict miRNA targets by analyzing whole miRNA transcripts
AID  - 10.1101/220483
DP  - 2017 Jan 01
TA  - bioRxiv
PG  - 220483
4099  - http://biorxiv.org/content/early/2017/11/16/220483.short
4100  - http://biorxiv.org/content/early/2017/11/16/220483.full
AB  - MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by binding to partially complementary regions within the 3’UTR of their target genes. Computational methods play an important role in target prediction and assume that the miRNA “seed region” (nt 2 to 8) is required for functional targeting, but typically only identify ~80% of known bindings. Recent studies have highlighted a role for the entire miRNA, suggesting that a more flexible methodology is needed. We present a novel approach for miRNA target prediction based on Deep Learning (DL) which, rather than incorporating any knowledge (such as seed regions), investigates the entire miRNA and 3’UTR mRNA nucleotides to learn an uninhibited set of feature descriptors related to the targeting process. We collected more than 150,000 experimentally validated Homo sapiens miRNA:gene targets and cross-referenced them with different CLIP-Seq, CLASH and iPAR-CLIP datasets to obtain ~20,000 validated miRNA:gene exact target sites. Using this data, we implemented and trained a deep neural network - composed of autoencoders and a feed-forward network - able to automatically learn features describing miRNA-mRNA interactions and assess functionality. Predictions were then refined using information such as site location or site accessibility energy. In a comparison using independent datasets, our DL approach consistently outperformed existing prediction methods, recognizing the seed region as a common feature in the targeting process, but also identifying the role of pairings outside this region. Thermodynamic analysis also suggests that site accessibility plays a role in targeting but that it cannot be used as a sole indicator for functionality.
Data and source code available at: https://bitbucket.org/account/user/bipous/projects/MIRAW Author summary: MicroRNAs are small RNA molecules that regulate biological processes by binding to the 3’UTR of a gene, and their dysregulation is associated with several diseases. Computationally predicting these targets remains a challenge as they only partially match their target and so there can be hundreds of targets for a single microRNA. Current tools assume that most of the knowledge defining a microRNA-gene interaction can be captured by analysing the binding produced in the seed region (≈ the first 8 nt in the miRNA). However, recent studies show that the whole microRNA can be important and form non-canonical targets. Here, we use a target prediction methodology that relies on deep neural networks to automatically learn the relevant features describing microRNA-gene interactions for predicting microRNA targets. This means we make no assumptions about what is important, leaving the task to the deep neural network. A key part of the work is obtaining a suitable dataset. Thus, we collected and curated more than 150,000 experimentally verified microRNA targets and used them to train the network. Using this approach, we are able to gain a better understanding of non-canonical targets and to improve the accuracy of state-of-the-art prediction tools.