Abstract
Despite the increasing importance metabolomics approaches, the structural elucidation of metabolites from mass spectral data remains a challenge. Although several reliable tools to identify known metabolites exist, identifying compounds that have not been previously seen is a challenging task that still eludes modern bioinformatics tools. Here, we describe an automated method for substructure recommendation from mass spectra using pattern mining techniques. Based on previously seen recurring substructures our approach succeeds in identifying parts of unknown metabolites. An important advantage of this approach is that it does not require any prior information concerning the metabolites to be identified, and therefore it can be used for the (partial) identification of unknown unknowns. Using rules extracted by pattern mining we are able to recommend valid substructures even for those metabolites for which no match can be found in spectral libraries. We further demonstrate how this approach is complementary to existing metabolite identification tools, achieving improved identification results. The method is called MESSAR (MEtabolite SubStructure AutoRecommender) and is implemented as a free online web service available at http://www.biomina.be/apps/MESSAR.