RT Journal Article
SR Electronic
T1 Learning representations for image-based profiling of perturbations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.08.12.503783
DO 10.1101/2022.08.12.503783
A1 Nikita Moshkov
A1 Michael Bornholdt
A1 Santiago Benoit
A1 Matthew Smith
A1 Claire McQuin
A1 Allen Goodman
A1 Rebecca A. Senft
A1 Yu Han
A1 Mehrtash Babadi
A1 Peter Horvath
A1 Beth A. Cimini
A1 Anne E. Carpenter
A1 Shantanu Singh
A1 Juan C. Caicedo
YR 2022
UL http://biorxiv.org/content/early/2022/10/03/2022.08.12.503783.abstract
AB Measuring the phenotypic effect of treatments on cells through imaging assays is an efficient and powerful way of studying cell biology, and requires computational methods for transforming images into quantitative data that highlight phenotypic outcomes. Here, we present an optimized strategy for learning representations of treatment effects from high-throughput imaging data, which follows a causal framework for interpreting results and guiding performance improvements. We use weakly supervised learning (WSL) for modeling associations between images and treatments, and show that it encodes both confounding factors and phenotypic features in the learned representation. To facilitate their separation, we constructed a large training dataset with Cell Painting images from five different studies to maximize experimental diversity, following insights from our causal analysis. Training a WSL model with this dataset successfully improves downstream performance, and produces a reusable convolutional network for image-based profiling, which we call Cell Painting CNN-1. We conducted a comprehensive evaluation of our strategy on three publicly available Cell Painting datasets, discovering that representations obtained by the Cell Painting CNN-1 can improve performance in downstream analysis for biological matching by up to 30% with respect to classical features, while also being more computationally efficient. Competing Interest Statement: The authors have declared no competing interest.