RT Journal Article
SR Electronic
T1 A systematic evaluation of highly variable gene selection methods for single-cell RNA-sequencing
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2024.08.25.608519
DO 10.1101/2024.08.25.608519
A1 Zhao, Ruzhang
A1 Lu, Jiuyao
A1 Zhou, Weiqiang
A1 Zhao, Ni
A1 Ji, Hongkai
YR 2024
UL http://biorxiv.org/content/early/2024/08/26/2024.08.25.608519.abstract
AB Background: Selecting highly variable features is a crucial step in most analysis pipelines of single-cell RNA-sequencing (scRNA-seq) data. Despite numerous methods proposed in recent years, a systematic understanding of the best solution is still lacking. Results: Here, we systematically evaluate 47 highly variable gene (HVG) selection methods, consisting of 21 baseline methods developed based on different data transformations and mean-variance adjustment techniques and 26 hybrid methods developed based on mixtures of baseline methods. Across 19 diverse benchmark datasets, 18 objective evaluation criteria per method, and 5,358 analysis settings, we observe that no single baseline method consistently outperforms the others across all datasets and criteria. However, hybrid methods as a group robustly outperform individual baseline methods. Based on these findings, a new HVG selection approach, mixture HVG selection (mixHVG), that incorporates top-ranked features from multiple baseline methods is proposed as a better solution to HVG selection.
An open source R package mixhvg is developed to enable convenient use of mixHVG and its integration into users’ data analysis pipelines. Conclusion: Our benchmark study not only provides a systematic comparison of existing methods, leading to a better HVG selection solution, but also creates a pipeline and resource consisting of diverse benchmark data and criteria for evaluating new methods in the future. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: ADT, Antibody-Derived Tags; ARI, Adjusted Rand Index; ASW, Average Silhouette Width; CITE-seq, Cellular Indexing of Transcriptomes and Epitopes by Sequencing; HVG, Highly Variable Gene; LISI, Local Inverse Simpson Index; LOESS, Locally weighted (or estimated) scatterplot smoother; LSI, Latent Semantic Indexing; NMI, Normalized Mutual Information; PBMC, Peripheral Blood Mononuclear Cell; PC, Principal Component; PCA, Principal Component Analysis; scATAC-seq, Single-cell Assay for Transposase-Accessible Chromatin using sequencing; scRNA-seq, Single-cell RNA sequencing; SCT, sctransform; SVD, Singular Value Decomposition; TFIDF, Term Frequency–Inverse Document Frequency.