RT Journal Article SR Electronic T1 SpiderSeqR: an R package for crawling the web of high-throughput multi-omic data repositories for data-sets and annotation JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.04.13.039420 DO 10.1101/2020.04.13.039420 A1 Anna M. Sozanska A1 Charles Fletcher A1 Dóra Bihary A1 Shamith A. Samarajiwa YR 2020 UL http://biorxiv.org/content/early/2020/04/14/2020.04.13.039420.abstract AB More than three decades ago, the microarray revolution brought about high-throughput data generation capability to biology and medicine. Subsequently, the emergence of massively parallel sequencing technologies led to many big-data initiatives such as the human genome project and the encyclopedia of DNA elements (ENCODE) project. These, in combination with cheaper, faster massively parallel DNA sequencing capabilities, have democratised multi-omic (genomic, transcriptomic, translatomic and epigenomic) data generation leading to a data deluge in bio-medicine. While some of these data-sets are trapped in inaccessible silos, the vast majority of these data-sets are stored in public data resources and controlled access data repositories, enabling their wider use (or misuse). Currently, most peer reviewed publications require the deposition of the data-set associated with a study under consideration in one of these public data repositories. However, clunky and difficult to use interfaces, subpar or incomplete annotation prevent discovering, searching and filtering of these multi-omic data and hinder their re-purposing in other use cases. In addition, the proliferation of multitude of different data repositories, with partially redundant storage of similar data are yet another obstacle to their continued usefulness. Similarly, interfaces where annotation is spread across multiple web pages, use of accession identifiers with ambiguous and multiple interpretations and lack of good curation make these data-sets difficult to use. We have produced SpiderSeqR, an R package, whose main features include the integration between NCBI GEO and SRA databases, enabling an integrated unified search of SRA and GEO data-sets and associated annotations, conversion between database accessions, as well as convenient filtering of results and saving past queries for future use. All of the above features aim to promote data reuse to facilitate making new discoveries and maximising the potential of existing data-sets.Competing Interest StatementThe authors have declared no competing interest.