JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations

Abstract
Image-based profiling has emerged as a powerful technology for various steps in basic biological and pharmaceutical discovery, but the community has lacked a large, public reference set of data from chemical and genetic perturbations. Here we present data generated by the Joint Undertaking for Morphological Profiling (JUMP)-Cell Painting Consortium, a collaboration between 10 pharmaceutical companies, six supporting technology companies, and two non-profit partners. When completed, the dataset will contain images and profiles from the Cell Painting assay for over 116,750 unique compounds, over-expression of 12,602 genes, and knockout of 7,975 genes using CRISPR-Cas9, all in human osteosarcoma cells (U2OS). The dataset is estimated to be 115 TB in size and to capture 1.6 billion cells and their single-cell profiles. File quality control and upload to the Cell Painting Gallery (https://registry.opendata.aws/cellpainting-gallery) are underway and will be completed over the coming months. A portal to visualize a subset of the data is available at https://phenaid.ardigen.com/jumpcpexplorer/.
Competing Interest Statement
The authors gratefully acknowledge a grant from the Massachusetts Life Sciences Center Bits to Bytes Capital Call program for funding the data production and catalyzing this Consortium. We appreciate funding to support data analysis and interpretation from members of the JUMP Cell Painting Consortium (Amgen, AstraZeneca, Bayer AG, Biogen, Eisai, Janssen Pharmaceutica NV, Merck KGaA, Darmstadt, Germany, Pfizer, Servier, Takeda Development Center Americas, Inc. (TDCA)), from the National Institutes of Health (NIH MIRA R35 GM122547 to AEC), and from grant number 2020-225720 to BAC from the Chan Zuckerberg Initiative DAF, an advised fund of the Silicon Valley Community Foundation. We would like to acknowledge the Supporting Partners for their in-kind contributions: Ardigen for their deep learning expertise and the JUMP-CP Data Explorer web application (part of the phenAID platform); Google/Verily for compute support and the configuration and optimization of Terra, which is co-developed by the Broad Institute of MIT and Harvard, Microsoft and Verily (its use is not described in this paper); Horizon Discovery, a PerkinElmer company, for the CRISPR-Cas9 library; Nomic bio for their protein profiling (not described in this paper); and PerkinElmer for the PhenoVue™ Cell Painting JUMP kit. We are also grateful to the Amazon Web Services Registry of Open Data for hosting the public dataset. The authors also gratefully acknowledge the use of the PerkinElmer Opera Phenix High-Content/High-Throughput imaging system at the Broad Institute, funded by the S10 Grant NIH OD-026839.
Footnotes
The incorrect supplementary materials file was replaced.
https://github.com/jump-cellpainting/jump-data-production-paper