TY  - JOUR
T1  - Phenotype to genotype mapping using supervised and unsupervised learning
JF  - bioRxiv
DO  - 10.1101/2022.03.17.484826
SP  - 2022.03.17.484826
AU  - Vito Paolo Pastore
AU  - Ashwini Oke
AU  - Sara Capponi
AU  - Daniel Elnatan
AU  - Jennifer Fung
AU  - Simone Bianco
Y1  - 2022/01/01
UR  - http://biorxiv.org/content/early/2022/03/19/2022.03.17.484826.abstract
N2  - The relationship between the genotype, the genetic instructions encoded in a genome, and the phenotype, the macroscopic realization of those instructions, remains mostly uncharted. In addition, tools able to uncover the connection between a phenotype and the specific set of genes responsible for it are still being defined. In this work, we focus on yeast organelles called vacuoles, which are membrane-bound cell compartments that vary in size and shape in response to various stimuli, and we develop a framework relating changes in cellular morphology to genetic modification. The method combines a convolutional neural network (CNN) with an unsupervised learning pipeline, which employs deep-learning-based segmentation, classification, and anomaly detection. From live 3D fluorescence images of vacuoles, we observe that different genetic mutations generate distinct vacuole phenotypes and that the same mutation may correspond to more than one vacuole morphology. We trained a U-Net architecture to segment our cellular images and obtain precise, quantitative information from 2D depth-encoded images. We then used an unsupervised learning approach to cluster the vacuole types and to establish a correlation between genotype and vacuole morphology. Using this procedure, we obtained four phenotypic groups. We extracted a set of 131 morphological features from the segmented vacuole images, which was reduced to 50 by tree-based feature selection. Using a fuzzy K-means-based algorithm, we obtained a clustering purity of 85% on a random subset of 880 images containing all the detected phenotypic groups.
Finally, we trained a CNN on the labels assigned during clustering. The CNN was then used to classify a large dataset (6942 images) with 80% accuracy. Our approach can be applied extensively to live fluorescence image analysis and, most importantly, can unveil the basic principles relating genotype to vacuole phenotype in yeast cells. This can be thought of as a first step toward inferring cell design principles for generating organelles with a specific, desired morphology. Competing Interest Statement: The authors have declared no competing interest.
ER  - 