New Results

Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA

Muhammed Hasan Çelik, Ali Mortazavi
doi: https://doi.org/10.1101/2022.11.08.515683
Muhammed Hasan Çelik
1 University of California Irvine, Department of Developmental and Cell Biology, Irvine, CA
2 University of California Irvine, Center for Complex Biological Systems, Irvine, CA
Ali Mortazavi
1 University of California Irvine, Department of Developmental and Cell Biology, Irvine, CA
2 University of California Irvine, Center for Complex Biological Systems, Irvine, CA
  • For correspondence: ali.mortazavi@uci.edu

Abstract

Motivation Alternative polyadenylation (APA) is a major mechanism that increases transcriptional diversity and regulates mRNA abundance. Existing computational tools for APA analysis have low precision because they are designed for short-read RNA-seq, which is a suboptimal data source for studying APA. Long-read RNA-seq (LR-RNA-seq) accurately detects complete transcript isoforms with poly(A)-tails, making it an ideal data source for studying APA. However, current computational tools are incompatible with LR-RNA-seq.

Results Here, we introduce LAPA, a computational toolkit to study APA from diverse data sources such as LR-RNA-seq and short-read 3’ sequencing (3’-seq). LAPA counts and clusters reads with poly(A)-tails, then performs peak-calling to detect poly(A)-sites in a data-source-agnostic manner. The resulting peaks are annotated based on genomic features and regulatory sequence elements, such as the presence of a poly(A)-signal. Finally, LAPA can perform robust statistical testing and multiple testing correction to detect differential APA.
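
The count, cluster, and peak-call steps described above can be illustrated with a minimal sketch. This is not LAPA's implementation or API: the function name `call_polya_sites`, the `PolyASite` record, and the `max_gap` and `min_count` parameters are hypothetical, and the rule used here (merge read ends closer than a fixed gap, then take the most frequent position as the peak) is an assumption chosen for simplicity.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class PolyASite(NamedTuple):
    chrom: str
    strand: str
    start: int  # lowest read-end position in the cluster
    end: int    # highest read-end position in the cluster
    peak: int   # position with the most read ends
    count: int  # total read ends in the cluster


def _summarize(key: Tuple[str, str], cluster: List[Tuple[int, int]]) -> PolyASite:
    """Collapse a list of (position, count) pairs into one site record."""
    total = sum(n for _, n in cluster)
    # max() keeps the first maximum, so ties resolve to the lowest position.
    peak = max(cluster, key=lambda item: item[1])[0]
    return PolyASite(key[0], key[1], cluster[0][0], cluster[-1][0], peak, total)


def call_polya_sites(
    read_ends: Iterable[Tuple[str, str, int]],
    max_gap: int = 10,   # hypothetical: largest gap allowed inside a cluster
    min_count: int = 2,  # hypothetical: minimum read ends needed to keep a cluster
) -> List[PolyASite]:
    """Cluster poly(A)-tailed read ends given as (chrom, strand, position)."""
    counts = Counter(read_ends)  # step 1: count read ends per position
    sites: List[PolyASite] = []
    cluster: List[Tuple[int, int]] = []
    current_key = None

    # step 2: walk positions in sorted order and merge nearby ones
    for (chrom, strand, pos), n in sorted(counts.items()):
        key = (chrom, strand)
        if cluster and (key != current_key or pos - cluster[-1][0] > max_gap):
            site = _summarize(current_key, cluster)  # step 3: call the peak
            if site.count >= min_count:
                sites.append(site)
            cluster = []
        current_key = key
        cluster.append((pos, n))

    if cluster:
        site = _summarize(current_key, cluster)
        if site.count >= min_count:
            sites.append(site)
    return sites


example_reads = (
    [("chr1", "+", 100)] * 3
    + [("chr1", "+", 102)] * 5
    + [("chr1", "+", 105)]
    + [("chr1", "+", 500)]      # isolated single read end, filtered out
    + [("chr1", "-", 100)] * 2
)
example_sites = call_polya_sites(example_reads)
```

Because the sketch only needs read-end coordinates, the same function applies whether those coordinates come from long reads or from 3’-seq, which is the sense in which such a procedure can be data-source agnostic.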

We analyzed ENCODE LR-RNA-seq data from human WTC11, mouse C2C12 myoblast, and C2C12-derived differentiated myotube cells using LAPA. Comparing LR-RNA-seq from different platforms and library preparation methods against 3’-seq shows that LR-RNA-seq detects poly(A)-sites with 75% precision at 57% recall. Moreover, LAPA consistently improved transcript end site (TES) validation by at least 25% over the baseline transcriptome annotation generated by TALON, independent of protocol or platform. Differential APA analysis detected 788 statistically significant genes with unique polyadenylation signatures between undifferentiated myoblast and differentiated myotube cells. Among these genes, 3’ UTR elongation is significantly associated with higher expression, while shortening is linked with lower expression. This analysis reveals a link between cell state/identity and APA. Overall, our results show that LR-RNA-seq is a reliable data source for the study of post-transcriptional regulation by providing precise information about alternative polyadenylation.
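
To make the precision and recall figures concrete, the following sketch shows one way such a comparison against a reference set of poly(A)-sites could be computed. It is an illustration under stated assumptions, not the evaluation protocol used in the paper: the matching `window` of 50 nt, the function names, and the simplification to a single chromosome and strand are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def has_match(pos: int, sorted_sites: List[int], window: int) -> bool:
    """Return True if any site lies within +/- window of pos."""
    i = bisect_left(sorted_sites, pos - window)
    return i < len(sorted_sites) and sorted_sites[i] <= pos + window


def precision_recall(
    predicted: Sequence[int],
    reference: Sequence[int],
    window: int = 50,  # hypothetical matching tolerance in nucleotides
) -> Tuple[float, float]:
    """Precision: share of predicted sites near a reference site.
    Recall: share of reference sites near a predicted site."""
    pred_sorted = sorted(predicted)
    ref_sorted = sorted(reference)
    pred_hits = sum(has_match(p, ref_sorted, window) for p in pred_sorted)
    ref_hits = sum(has_match(r, pred_sorted, window) for r in ref_sorted)
    precision = pred_hits / len(pred_sorted) if pred_sorted else 0.0
    recall = ref_hits / len(ref_sorted) if ref_sorted else 0.0
    return precision, recall


# Toy coordinates, not real data.
toy_predicted = [100, 1000, 5000, 9000, 20000]
toy_reference = [120, 980, 3000, 5040, 7000]
toy_precision, toy_recall = precision_recall(toy_predicted, toy_reference)
```

In this toy example three of the five predicted sites fall within 50 nt of a reference site, and three of the five reference sites are recovered, so both values are 0.6. A wider window would raise both numbers, which is why the tolerance needs to be reported alongside any such figures.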

Availability LAPA is publicly available at https://github.com/mortazavilab/lapa and on PyPI.

Contact: ali.mortazavi@uci.edu

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted November 08, 2022.
Subject Area

  • Genomics