Abstract
SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific algorithms. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. The pipeline has minimal installation requirements and can be executed with a single command. SPLASH2 enables efficient analysis of massive datasets from a wide range of sequencing technologies and biological contexts at unmatched scale and speed. We showcase this by revealing new biology through rapid analysis of single-cell RNA-sequencing data from human muscle cells, bulk RNA-seq data from the entire Cancer Cell Line Encyclopedia (CCLE), and bulk RNA-seq data from a study of amyotrophic lateral sclerosis.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This preprint has been revised.