PT - JOURNAL ARTICLE
AU - Goudo, Misa
AU - Sugimoto, Masahiro
AU - Hiwa, Satoru
AU - Hiroyasu, Tomoyuki
TI - The usefulness of sparse k-means in metabolomics data: An example from breast cancer data
AID - 10.1101/2022.02.05.479235
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.02.05.479235
4099 - http://biorxiv.org/content/early/2022/02/08/2022.02.05.479235.short
4100 - http://biorxiv.org/content/early/2022/02/08/2022.02.05.479235.full
AB - In processing metabolomics data, multidimensional quantitative data from thousands of metabolites are often sparse, that is, only a small fraction of metabolites are relevant to the phenotype of interest. Clustering is used to discover subtypes from omics data. Sparse processing, which selects important metabolites from the total omics data, is an effective clustering technique. This study investigated the effectiveness of sparse k-means for metabolomics data. Specifically, sparse k-means was used to cluster blood lipid metabolite data of breast cancer patients in two studies: (1) before and after menopause, and (2) pre- and postoperative chemotherapy. In both cases, sparse k-means showed comparable discrimination accuracy with fewer metabolites than k-means. Furthermore, when the constraint value of the L1 norm was varied, no significant changes were observed. The mean silhouette coefficients of sparse k-means and k-means were (1) 0.38 ± 0.14 (S.D.) and 0.17 ± 0.01, and (2) 0.38 ± 0.07 and 0.17 ± 0.01, indicating that feature selection using sparse k-means can improve clustering results. In addition, metabolite selection using sparse k-means was consistent regardless of the test data or the constraint value of the L1 norm, indicating robustness. Competing Interest Statement: The authors have declared no competing interest.
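
The abstract does not spell out how the L1 norm constraint selects metabolites. As a minimal sketch, assuming the commonly cited formulation of sparse k-means by Witten and Tibshirani (which the record itself does not name), the feature-weight step can be written as soft-thresholding followed by L2 normalization. The function names, the bound `s`, and the input `bcss` (a hypothetical per-metabolite between-cluster sum of squares) are illustrative and are not taken from the paper.

```python
import math


def soft_threshold(values, delta):
    """Shrink each value toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit L2 norm; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def update_weights(bcss, s, iters=60):
    """Weights maximizing sum(w_j * bcss_j) subject to
    ||w||_2 <= 1, ||w||_1 <= s, and w_j >= 0.

    bcss: per-feature between-cluster sum of squares (assumed input).
    s:    L1 bound, expected to satisfy 1 <= s <= sqrt(len(bcss)).
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    a = [max(v, 0.0) for v in bcss]
    w = l2_normalize(a)
    if sum(w) <= s:
        return w  # the L1 constraint is inactive, no feature is dropped
    # Otherwise bisect on the threshold until the L1 norm meets the bound.
    lo, hi = 0.0, max(a)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold(a, mid))) > s:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(a, hi))


# Illustrative values only: two informative features and two weak ones.
weights = update_weights([4.0, 3.0, 0.5, 0.1], s=1.2)
```

In this sketch a smaller `s` zeroes out more weights, which is the mechanism by which fewer metabolites end up contributing to the clustering; a full implementation would alternate this step with ordinary k-means on the weighted features.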