TY  - JOUR
T1  - Interoperable and scalable data analysis with microservices: Applications in Metabolomics
JF  - bioRxiv
DO  - 10.1101/213603
SP  - 213603
AU  - Payam Emami Khoonsari
AU  - Pablo Moreno
AU  - Sven Bergmann
AU  - Joachim Burman
AU  - Marco Capuccini
AU  - Matteo Carone
AU  - Marta Cascante
AU  - Pedro de Atauri
AU  - Carles Foguet
AU  - Alejandra Gonzalez-Beltran
AU  - Thomas Hankemeier
AU  - Kenneth Haug
AU  - Sijin He
AU  - Stephanie Herman
AU  - David Johnson
AU  - Namrata Kale
AU  - Anders Larsson
AU  - Steffen Neumann
AU  - Kristian Peters
AU  - Luca Pireddu
AU  - Philippe Rocca-Serra
AU  - Pierrick Roger
AU  - Rico Rueedi
AU  - Christoph Ruttkies
AU  - Noureddin Sadawi
AU  - Reza M Salek
AU  - Susanna-Assunta Sansone
AU  - Daniel Schober
AU  - Vitaly Selivanov
AU  - Etienne A. Thevenot
AU  - Michael van Vliet
AU  - Gianluigi Zanetti
AU  - Christoph Steinbeck
AU  - Kim Kultima
AU  - Ola Spjuth
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/07/13/213603.abstract
N2  - Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The access point is a virtual research environment which can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, multivariate statistics, and metabolite identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.
ER  - 