RT Journal Article
SR Electronic
T1 The usefulness of sparse k-means in metabolomics data: An example from breast cancer data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.02.05.479235
DO 10.1101/2022.02.05.479235
A1 Goudo, Misa
A1 Sugimoto, Masahiro
A1 Hiwa, Satoru
A1 Hiroyasu, Tomoyuki
YR 2022
UL http://biorxiv.org/content/early/2022/02/08/2022.02.05.479235.abstract
AB In processing metabolomics data, multidimensional quantitative data from thousands of metabolites are often sparse, that is, only a small fraction of metabolites are relevant to the phenotype of interest. Clustering is therefore used to discover subtypes from omics data. Sparse processing, which selects important metabolites from the total omics data, is an effective clustering technique. This study investigated the effectiveness of sparse k-means for metabolomics data. Specifically, sparse k-means was used to cluster blood lipid metabolite data of breast cancer patients in two studies: (1) before and after menopause, and (2) pre- and postoperative chemotherapy. In both cases, sparse k-means showed comparable discrimination accuracy with fewer metabolites than k-means. Furthermore, when the L1 norm values were varied, no significant changes were observed. The mean silhouette coefficients of sparse k-means and k-means were (1) 0.38 ± 0.14 (S.D.) and 0.17 ± 0.01, (2) 0.38 ± 0.07 and 0.17 ± 0.01, indicating that feature selection using sparse k-means can improve clustering results. In addition, metabolite selection using sparse k-means was consistent regardless of the test data or the constrained value of the L1 norm, indicating robustness. Competing Interest Statement: The authors have declared no competing interest.