Site identification in high-throughput RNA-protein interaction data

Bioinformatics. 2012 Dec 1;28(23):3013-20. doi: 10.1093/bioinformatics/bts569. Epub 2012 Sep 28.

Abstract

Motivation: Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the past decade. However, statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays are largely lacking.

Results: We introduce a method for site identification that provides four key advantages over previous methods: (i) it can be applied to all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count), to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions.
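
As an illustration of point (ii) only, the following minimal sketch shows one common way to score binned read counts against a fitted count distribution. It is not Piranha's implementation: the abstract does not name the distribution or the fitting procedure, so the negative binomial model, the method-of-moments fit, the function names (fit_negative_binomial, upper_tail, call_sites) and the threshold alpha are all assumptions made here for illustration. Covariates (point iii) and cross-condition comparison (point iv) are not modelled.

```python
import math


def fit_negative_binomial(counts):
    """Method-of-moments fit of a negative binomial (assumed model) to bin counts.

    Returns (r, p) for the pmf  C(k + r - 1, k) * p**r * (1 - p)**k.
    Simplification: the fit uses every bin, including any true binding sites.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    if var <= mean:
        raise ValueError("counts are not over-dispersed; this fit does not apply")
    r = mean * mean / (var - mean)
    p = r / (r + mean)
    return r, p


def nb_log_pmf(k, r, p):
    """Log-probability of observing exactly k reads under NB(r, p)."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log(1.0 - p))


def upper_tail(k, r, p):
    """P(X >= k): probability of a count at least as large as k."""
    below = sum(math.exp(nb_log_pmf(i, r, p)) for i in range(k))
    return max(0.0, 1.0 - below)


def call_sites(counts, alpha=0.05):
    """Indices of bins whose upper-tail p-value falls below alpha (hypothetical threshold)."""
    r, p = fit_negative_binomial(counts)
    return [i for i, c in enumerate(counts) if upper_tail(c, r, p) < alpha]
```

For example, call_sites([0, 1, 2, 1, 3] * 10 + [60]) scores fifty low-count background bins and one high-count bin against the same fitted distribution. A real analysis would also correct for multiple testing, which this sketch omits.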

Availability and implementation: We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3), are freely available for download from http://smithlab.usc.edu.

Contact: andrewds@usc.edu

Supplementary information: Supplementary data available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Base Sequence
  • Binding Sites
  • Computational Biology / methods
  • HEK293 Cells
  • HeLa Cells
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • RNA / genetics
  • RNA-Binding Proteins / genetics
  • Sequence Analysis, RNA / methods*
  • Software*

Substances

  • RNA-Binding Proteins
  • RNA