plantMASST - Community-driven chemotaxonomic digitization of plants
Abstract
Understanding the distribution of hundreds of thousands of plant metabolites across the plant kingdom presents a challenge. To address this, we curated publicly available LC-MS/MS data from 19,075 plant extracts and developed the plantMASST reference database encompassing 246 botanical families, 1,469 genera, and 2,793 species. This taxonomically focused database facilitates the exploration of plant-derived molecules using tandem mass spectrometry (MS/MS) spectra. This tool will aid in drug discovery, biosynthesis, (chemo)taxonomy, and the evolutionary ecology of herbivore interactions.
Competing Interest Statement
P.C.D. is a scientific advisor and holds equity in Cybele and bileOmix, and he is a scientific co-founder and advisor and holds equity in Ometa, Arome, and Enveda, with prior approval by UC San Diego. T.R.N. is a scientific advisor and holds equity in BrightSeed Bio. J.J.J.v.d.H. is a member of the Scientific Advisory Board of NAICONS Srl., Milano, Italy, and consults for Corteva Agriscience, Indianapolis, IN, USA. R.S. and T.P. are co-founders of mzio GmbH, Bremen, Germany.
Data availability
Data used to generate the reference database of plantMASST are publicly available at GNPS/MassIVE (https://massive.ucsd.edu/). A list of all accession numbers (MassIVE IDs) of the studies used to generate this tool is available on GitHub (https://github.com/helenamrusso/plantmasst, plant_masst_table.csv). All the taxonomic trees shown in this manuscript can be explored interactively by downloading the .html files available on GitHub (https://github.com/helenamrusso/plantmasst). To help interpret the results and to establish that small molecules were found only in distinct plant species, the following known molecules already present in the GNPS library (https://library.gnps2.org/) were employed:
- Moroidin (CCMSLIB00005435737)
- Piperlongumine (CCMSLIB00010117596)
- Caffeine (CCMSLIB00006365672)
- Quercetin (CCMSLIB00010118464)
- Morin (CCMSLIB00010122829)
- Reserpine (CCMSLIB00010110971)
- Icaridin (CCMSLIB00000565057)
- Lutein (CCMSLIB00005777353)
- Methoxsalen (CCMSLIB00006417040)
- Cannabidiol (CCMSLIB00009943776)
- Tryptophan (CCMSLIB00003136269)
- Acetylcholine (CCMSLIB00000578035)
- Dopamine (CCMSLIB00006121682)
- GABA (CCMSLIB00000215050)
- Glutamate (CCMSLIB00000081783)
- Norepinephrine (CCMSLIB00000219763)
- Serotonin (CCMSLIB00006114036)
- THC (CCMSLIB00005774204)
- Tryptamine (CCMSLIB00004693658)
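For readers who want to reuse these library spectra programmatically, the list above can be turned into a name-to-accession lookup. The following is a minimal sketch, not part of plantMASST itself: the function name `parse_entries`, the variable names, and the assumed accession pattern ("CCMSLIB" followed by 11 digits, which matches every ID listed here) are illustrative assumptions, and only the first three entries are included for brevity.

```python
import re

# A few entries copied from the list above, in "Name (accession)" form.
ENTRIES = """\
- Moroidin (CCMSLIB00005435737)
- Piperlongumine (CCMSLIB00010117596)
- Caffeine (CCMSLIB00006365672)
"""

# Assumption: an accession is "CCMSLIB" followed by 11 digits, which is
# consistent with every ID in this section but is not an official spec.
ENTRY_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<accession>CCMSLIB\d{11})\)$")


def parse_entries(text):
    """Return {compound name: accession}; raise ValueError on malformed lines."""
    result = {}
    for raw_line in text.splitlines():
        # Drop the leading list marker and surrounding whitespace.
        line = raw_line.strip().lstrip("-").strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized entry: {line!r}")
        result[match.group("name")] = match.group("accession")
    return result


library_ids = parse_entries(ENTRIES)
```

Failing loudly on a malformed line is a deliberate choice here, since a silently dropped accession would be hard to notice later.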
Data used to search for plant-derived molecules (Figure 2c) in fecal samples of vegans and omnivores are publicly available in GNPS/MassIVE under the accession number MSV000086989. Data used to assess plant-derived molecules in fecal samples from people subjected to an American and Mediterranean diet are publicly available in GNPS/MassIVE under the accession number MSV000093005. Data acquired for retention time matching between the piperlongumine standard and plant extracts are available in GNPS/MassIVE under the accession number MSV000094562.