Abstract
Gene expression-based classification of a biological sample’s cell type is an important step in many transcriptomic analyses, including the annotation of cell types in single-cell RNA-seq datasets. In this work, we explore a novel application to this task of hierarchical classification algorithms that take into account the graph structure of the Cell Ontology. We train these algorithms on a novel curated dataset comprising nearly all public, primary human bulk samples in the NCBI Sequence Read Archive. These algorithms improve on state-of-the-art methods and produce accurate cell type predictions on both bulk and single-cell data across diverse and fine-grained cell types.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.