Abstract
Underutilized sheep and goat breeds have the ability to adapt to challenging environments due to their genetic composition. Integrating publicly available genomic datasets with new data will facilitate genetic diversity analyses; however, this process is complicated by important data discrepancies, such as outdated assembly versions or different data formats. Here we present the SMARTER-database, a collection of tools and scripts to standardize genomic data and metadata, mainly from SNP chip arrays, on global small ruminant populations, with a focus on reproducibility. SMARTER-database harmonizes genotypes for about 12,000 sheep and 6,000 goats to a uniform coding and assembly version. Users can access the genotype data via FTP and interact with the metadata through a web interface or programmatically using their custom scripts, enabling efficient filtering and selection of samples. These tools will empower researchers to focus on the crucial aspects of adaptation and contribute to livestock sustainability, leveraging the rich dataset provided by the SMARTER-database.
Availability & Implementation: The code is available as open source software under the MIT license at https://github.com/cnr-ibba/SMARTER-database.
- sheep
- goat
- genotypes
- adaptation
- REST-API
- standardization
- reproducibility
- database
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* paolo.cozzi{at}ibba.cnr.it
The manuscript was updated based on the reviewers' feedback. The title was revised to more accurately reflect the manuscript's focus, avoiding potential misunderstandings regarding the use of WGS data. In the "Data Composition" section, a new paragraph was added to discuss the advantages of SNP arrays in comparison to WGS data, emphasizing their affordability and higher accuracy for genotype calls. The summary table was reorganized to list information by species, dataset, breed, and country, with redundant abbreviations removed for simplicity. Additionally, the focus on the REST API was strengthened, highlighting its alignment with the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and its importance as the central topic of the manuscript. A clarification was also added regarding future assembly updates, noting that while the current publication does not focus on these updates, automated support for new assemblies may be included in future releases. The manuscript further addresses minimal filtering using IBS to remove duplicate data and advises users to apply their own filtering criteria. Lastly, clarifications were made to explain that the Swagger interface is intended as a documentation tool to help developers understand the API, rather than a tool for direct data retrieval, which should be done using custom scripts or the smarterapi R package.
List of abbreviations
- API: Application Programming Interface
- CI: Continuous Integration
- EBI: European Bioinformatics Institute
- EVA: European Variation Archive
- FAO: Food and Agriculture Organization
- FTPS: File Transfer Protocol Secure
- GNU: GNU’s Not Unix
- GPS: Global Positioning System
- HTTP: HyperText Transfer Protocol
- IBS: Identical By State
- JSON: JavaScript Object Notation
- MAF: Minor Allele Frequency
- ODM: Object-Document Mapper
- PATO: Phenotype And Trait Ontology
- REST: Representational State Transfer
- RESTful: REST-compliant systems
- rsID: Reference SNP cluster ID
- SNP: Single Nucleotide Polymorphism
- UML: Unified Modeling Language
- URL: Uniform Resource Locator
- VCF: Variant Call Format
- WGS: Whole Genome Sequencing