Abstract
Large cohorts of human induced pluripotent stem cell (iPSC) lines from healthy donors are a potentially powerful tool for investigating the relationship between genetic variants and cellular phenotypes. Here we integrate high-content imaging, gene expression and DNA sequence datasets for over 100 human iPSC lines to identify the genetic basis of inter-individual variability in cell behaviour. Using a dimensionality reduction approach, Probabilistic Estimation of Expression Residuals (PEER), we identified genes whose expression correlated with intrinsic (genetic) and extrinsic (extracellular matrix, ECM) factors. However, variation in mRNA levels could not account for outlier cell behaviour. Instead, we identified rare, deleterious single-nucleotide variants (SNVs) in the coding sequences of ECM adhesion genes in cell lines that were outliers for one or more phenotypes, such as cell spreading. These variants also correlated with altered germ layer differentiation on micropatterned surfaces. Our study thus establishes a strategy for integrating genetic and cell-biological measurements in high-throughput analyses.