New Results

FIRM: Flexible Integration of single-cell RNA-sequencing data for large-scale Multi-tissue cell atlas datasets

Jingsi Ming, Zhixiang Lin, Jia Zhao, Xiang Wan, Can Yang, Angela Ruohao Wu
doi: https://doi.org/10.1101/2020.06.02.129031
Jingsi Ming
1Academy for Statistics and Interdisciplinary Sciences, Faculty of Economics and Management, East China Normal University, Shanghai, China
2Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Zhixiang Lin
3Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
Jia Zhao
2Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Xiang Wan
4Shenzhen Research Institute of Big Data, Shenzhen, China
Can Yang
2Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
  • For correspondence: macyang@ust.hk angelawu@ust.hk
Angela Ruohao Wu
5Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
  • For correspondence: macyang@ust.hk angelawu@ust.hk

Abstract

Single-cell RNA-sequencing (scRNA-seq) is being used extensively to measure the mRNA expression of individual cells from deconstructed tissues, organs, and even entire organisms to generate cell atlas references, leading to discoveries of novel cell types and deeper insight into biological trajectories. These massive datasets are usually collected from many samples using different scRNA-seq technology platforms, including the popular SMART-Seq2 (SS2) and 10X platforms. Inherent heterogeneities between platforms, tissues, and other batch effects make scRNA-seq data difficult to compare and integrate, especially in large-scale cell atlas efforts; yet accurate integration is essential for gaining deeper insights into cell biology. Through comprehensive data exploration, we found that accurate integration is often hampered by differences in cell-type compositions. Herein we describe FIRM, an algorithm that addresses this problem and achieves efficient and accurate integration of heterogeneous scRNA-seq datasets across multiple tissue types, platforms, and experimental batches. We applied FIRM to numerous large-scale scRNA-seq datasets from mouse, mouse lemur, and human, comparing its performance in dataset integration with other state-of-the-art methods. FIRM-integrated datasets show accurate mixing of shared cell type identities and superior preservation of original structure without overcorrection, generating robust integrated datasets for downstream exploration and analysis. FIRM also offers a facile way to transfer cell type labels and annotations from one dataset to another, making it a reliable and versatile tool for scRNA-seq analysis, especially for cell atlas data integration.
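
The observation that differences in cell-type composition hamper integration can be illustrated with a toy calculation. The sketch below is not FIRM and is not taken from the paper's code; it assumes, purely for illustration, that each dataset is z-scored per gene before integration (a common preprocessing convention that the abstract itself does not mention). The function names and the expression values are hypothetical. Two datasets contain the same two cell types with identical raw expression, but in different proportions, and the same cell type ends up at different scaled values.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center and scale one gene's expression within a single dataset."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def make_dataset(n_type_a, n_type_b, expr_a=0.0, expr_b=10.0):
    """Toy gene (hypothetical values): type-A cells express expr_a, type-B cells expr_b."""
    return [expr_a] * n_type_a + [expr_b] * n_type_b


# Same two cell types and same raw expression, but different proportions.
balanced = make_dataset(50, 50)  # 50% type A, 50% type B
skewed = make_dataset(90, 10)    # 90% type A, 10% type B

scaled_balanced = zscore(balanced)
scaled_skewed = zscore(skewed)

# Scaled value of a type-A cell (first element) and a type-B cell (last element).
type_a_balanced, type_b_balanced = scaled_balanced[0], scaled_balanced[-1]
type_a_skewed, type_b_skewed = scaled_skewed[0], scaled_skewed[-1]

print(f"type A: balanced={type_a_balanced:.3f}, skewed={type_a_skewed:.3f}")
print(f"type B: balanced={type_b_balanced:.3f}, skewed={type_b_skewed:.3f}")
```

Under these assumptions the raw expression of each cell type is identical across the two datasets, yet the scaled values disagree solely because the proportions differ. A method that aligns datasets on scaled expression would then see a spurious shift between cells of the same type.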

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted August 09, 2021.
Subject Area

  • Bioinformatics