TY  - JOUR
T1  - araDEEPopsis: From images to phenotypic traits using deep transfer learning
JF  - bioRxiv
DO  - 10.1101/2020.04.01.018192
SP  - 2020.04.01.018192
AU  - Patrick Hüther
AU  - Niklas Schandry
AU  - Katharina Jandrasits
AU  - Ilja Bezrukov
AU  - Claude Becker
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/04/01/2020.04.01.018192.abstract
N2  - Linking plant phenotype to genotype, i.e., identifying genetic determinants of phenotypic traits, is the goal of plant breeders and geneticists alike. While the ever-growing genomic resources and the rapid decrease of sequencing costs have greatly facilitated obtaining the critical amount of genomic data, collecting phenotypic data for large numbers of plants remains a bottleneck. Many phenotyping strategies rely on recording images of plants, which makes it necessary to extract phenotypic measurements from these images in a fast and robust way. Common image segmentation tools for plant phenotyping mostly rely on color information, which is error-prone when background or plant color is variable in the experiment or deviates from the underlying expectations. We have developed araDEEPopsis, a versatile, fully open-source pipeline to extract phenotypic measurements from plant images in an unsupervised manner. araDEEPopsis was built around the deep-learning model DeepLabV3+ and re-trained for segmentation of Arabidopsis thaliana rosettes. It uses semantic segmentation to classify leaf tissue into up to three categories: healthy, anthocyanin-rich, and senescent. This makes araDEEPopsis particularly powerful at quantitative phenotyping from early to late developmental stages, of mutants with aberrant leaf color and/or phenotype, and of plants growing in stressful conditions, where leaf color may deviate from green.
Using araDEEPopsis on a panel of 210 natural Arabidopsis accessions, we were able not only to accurately segment phenotypically diverse genotypes but also to map known loci related to anthocyanin production and early necrosis using the araDEEPopsis output in genome-wide association analyses. Our pipeline is able to handle images of diverse origins, image quality, and background composition, and could even accurately segment images of a distantly related Brassicaceae species. Because it can be deployed on virtually any common operating system and is compatible with several high-performance computing environments, araDEEPopsis can be used independently of bioinformatics expertise and computing resources.
ER  - 