Abstract
Microbiome studies increasingly associate geographical features such as rurality and climate type with microbiomes. However, microbiologists and bioinformaticians often struggle to access and integrate rich geographical metadata from sources such as GeoTIFFs, and inconsistent definitions of features such as rurality can hinder cross-study comparisons. To address this, we present OMEinfo, a Python-based tool for automated retrieval of consistent geographical metadata from user-provided location data. OMEinfo leverages open data sources such as the Global Human Settlement Layer, Köppen-Geiger climate classification models, and the Open-Data Inventory for Anthropogenic Carbon dioxide to ensure metadata accuracy and provenance. OMEinfo’s Dash application enables users to visualise their sample metadata on an interactive map and to investigate the spatial distribution of metadata features, complemented by data visualisations for exploring patterns and trends in the geographical data before further analysis. The tool is available as a Docker container, providing a portable, lightweight solution for researchers. Through its standardised metadata retrieval approach and incorporation of FAIR and Open data principles, OMEinfo promotes reproducibility and consistency in microbiome metadata. To demonstrate its utility, we use OMEinfo to replicate the results of a previous study linking population density to soil sample alpha diversity. As the field continues to explore the relationship between microbiomes and geographical features, tools like OMEinfo will prove vital in developing a robust, accurate, and interconnected understanding of these interactions, whilst also being applicable beyond this field to any study that uses location-based metadata.
Finally, we release the OMEinfo annotation dataset, a collection of 5.3 million OMEinfo-annotated samples from the ENA, for use in retrospective analyses of sequencing samples, and we highlight several ways in which researchers and sequencing read repositories can improve the quality of the underlying metadata submitted to these public stores.
Availability: OMEinfo is freely available and released under an MIT licence. OMEinfo source code is available at https://github.com/m-crown/OMEinfo/
Contact: matthew.crown{at}northumbria.ac.uk, matthew.bashton{at}northumbria.ac.uk
Competing Interest Statement
The authors have declared no competing interest.