Abstract
Improving access to growing amounts of biomedical data provides new opportunities for advanced patient stratification and disease subtyping strategies. This requires computational tools that produce uniformly robust results across highly heterogeneous molecular data. Unsupervised machine learning methodologies are able to discover de novo patterns in such data. Biclustering is especially well suited to this task because it simultaneously identifies sample groups and corresponding feature sets across heterogeneous omics data. However, the performance of available biclustering algorithms depends heavily on their individual parameterization and varies with the application. Here, we developed MoSBi (Molecular Signature identification using Biclustering), an automated multi-algorithm ensemble approach that integrates results using a similarity network supported by an error model. We evaluated the performance of MoSBi on transcriptomics, proteomics and metabolomics data, as well as on synthetic datasets covering various data properties. Profiting from multi-algorithm integration, MoSBi identified robust group- and disease-specific signatures across all scenarios, overcoming the specificities of single algorithms. Furthermore, we developed a scalable network-based visualization of bicluster communities that supports biological hypothesis generation. MoSBi is available as an R package and web service to make automated biclustering analysis accessible for application in molecular sample stratification.
Competing Interest Statement
The authors have declared no competing interest.