New Results

scAnnotate: an automated cell type annotation tool for single-cell RNA-sequencing data

Xiangling Ji, Danielle Tsao, Kailun Bai, Min Tsao, Li Xing, Xuekui Zhang
doi: https://doi.org/10.1101/2022.02.19.481159
Xiangling Ji
1Department of Mathematics & Statistics, University of Victoria, Victoria, V8P 5C2, Canada
Danielle Tsao
1Department of Mathematics & Statistics, University of Victoria, Victoria, V8P 5C2, Canada
Kailun Bai
1Department of Mathematics & Statistics, University of Victoria, Victoria, V8P 5C2, Canada
Min Tsao
1Department of Mathematics & Statistics, University of Victoria, Victoria, V8P 5C2, Canada
Li Xing
2Department of Mathematics & Statistics, University of Saskatchewan, Saskatoon, S7N 5C9, Canada
  • For correspondence: xuekui@uvic.ca
Xuekui Zhang
1Department of Mathematics & Statistics, University of Victoria, Victoria, V8P 5C2, Canada
  • For correspondence: xuekui@uvic.ca

Abstract

Motivation: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so that they can be investigated separately. Researchers have recently developed several automated cell type annotation tools that require neither biological knowledge nor subjective human decisions. Dropout is a crucial characteristic of scRNA-seq data and is widely used in differential expression analysis. However, no current cell annotation method explicitly uses dropout information. Making full use of dropout information for cell type annotation motivated this work.

Results: We present scAnnotate, a cell annotation tool that fully utilizes dropout information. We model every gene’s marginal distribution using a mixture model, which describes both the dropout proportion and the distribution of the non-dropout expression levels. Then, using an ensemble machine learning approach, we combine the mixture models of all genes into a single model for cell-type annotation. This combining approach avoids estimating the numerous parameters of the high-dimensional joint distribution of all genes. Using fourteen real scRNA-seq datasets, we demonstrate that scAnnotate is competitive against nine existing annotation methods. Furthermore, because of its distinct modelling strategy, the cells that scAnnotate misclassifies are very different from those misclassified by competitor methods. This suggests that using scAnnotate together with other methods could further improve annotation accuracy.
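
To make the modelling idea concrete, the following is a minimal sketch rather than the scAnnotate implementation. For each cell type and each gene it fits a two-part marginal model: a dropout proportion plus a distribution for the non-zero expression levels. The abstract does not name the non-dropout distribution or the ensemble method, so the sketch assumes a normal distribution for non-zero values and combines genes by summing per-gene log-likelihoods (a naive-Bayes-style stand-in for the ensemble step). All function names (`fit_gene_mixtures`, `log_likelihoods`, `predict`), the smoothing constants, and the toy data are hypothetical.

```python
import numpy as np


def fit_gene_mixtures(X, y):
    """Fit a per-gene two-part model for every cell type.

    X: (cells, genes) array of non-negative expression values.
    y: array of cell-type labels, one per cell.
    Returns {label: (pi, mu, sd)}, each an array of length n_genes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    params = {}
    for label in sorted(set(y.tolist())):
        Xc = X[y == label]
        n_cells = Xc.shape[0]
        nonzero = Xc > 0
        n_nonzero = nonzero.sum(axis=0)
        # Dropout proportion, smoothed so it is never exactly 0 or 1 (assumption).
        pi = (n_cells - n_nonzero + 1.0) / (n_cells + 2.0)
        # Mean and spread of the non-zero values only (normal is an assumption).
        denom = np.maximum(n_nonzero, 1)
        mu = np.where(nonzero, Xc, 0.0).sum(axis=0) / denom
        var = np.where(nonzero, (Xc - mu) ** 2, 0.0).sum(axis=0) / denom
        sd = np.sqrt(var) + 0.1  # small floor keeps the density finite
        params[label] = (pi, mu, sd)
    return params


def log_likelihoods(X, params):
    """Sum per-gene log-likelihoods for each cell under each cell type."""
    X = np.asarray(X, dtype=float)
    labels = sorted(params)
    scores = np.empty((X.shape[0], len(labels)))
    for j, label in enumerate(labels):
        pi, mu, sd = params[label]
        log_zero = np.log(pi)  # the gene dropped out
        log_nonzero = (
            np.log1p(-pi)  # the gene did not drop out
            - 0.5 * np.log(2.0 * np.pi)
            - np.log(sd)
            - 0.5 * ((X - mu) / sd) ** 2
        )
        per_gene = np.where(X > 0, log_nonzero, log_zero)
        scores[:, j] = per_gene.sum(axis=1)
    return labels, scores


def predict(X, params):
    """Assign each cell to the cell type with the highest combined score."""
    labels, scores = log_likelihoods(X, params)
    return np.array(labels)[scores.argmax(axis=1)]


# Toy data (made up): type A expresses gene 0, type B expresses gene 1.
X_train = np.array([
    [5.0, 0.0, 1.0],
    [4.0, 0.0, 0.0],
    [6.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [0.0, 4.0, 0.0],
    [0.0, 2.0, 2.0],
])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

params = fit_gene_mixtures(X_train, y_train)
X_new = np.array([[5.0, 0.0, 1.0], [0.0, 3.0, 0.0]])
preds = predict(X_new, params)
print(preds)
```

Because each gene is modelled marginally, the number of fitted parameters grows linearly with the number of genes, which is the point the abstract makes about avoiding a high-dimensional joint distribution. In this sketch a zero carries information through `pi`: a gene that is almost always absent in one cell type but usually present in another contributes to the score even when its observed value is zero.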

Availability: We implemented scAnnotate as an R package and made it publicly available from CRAN.

Contact: Xuekui Zhang (xuekui@uvic.ca) and Li Xing (li.xing@math.usask.ca)

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Significant update in response to previous reviewers' comments.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted August 07, 2022.
Subject Area

  • Bioinformatics