Summary
Access to primary research data is vital for the advancement of the scientific enterprise. It facilitates the validation of existing observations and provides the raw material for formulating new hypotheses and making new discoveries. In the life sciences, research communities have repeatedly collaborated to build resources that support the submission, archiving and retrieval of gene sequences, macromolecular structures and data from functional genomics experiments. Added-value databases build on these archives by harmonising and integrating different datasets, which enables simple queries and helps to unravel the underlying biology. To extend the range of data types supported by community repositories, we have built a prototype Image Data Resource (IDR) that collects imaging data acquired with many different modalities, including high-content screening, super-resolution microscopy, time-lapse imaging and digital pathology, and integrates them in a single resource. IDR links experimental perturbations to public genetic or chemical databases, and links cell and tissue phenotypes to controlled vocabularies expressed as ontologies. By integrating the phenotypic and genetic metadata from multiple studies, IDR makes it possible to reveal novel functional networks of genetic interactions linked to specific cell phenotypes. To enhance access to IDR’s integrated datasets, we have built a computational resource based on IPython notebooks that allows remote access to the full complement of IDR data. IDR is built as a platform that others can use to publish their own image data and to extend the sharing and re-analysis of scientific image data.