RT Journal Article
SR Electronic
T1 Toward Machine Learning-based Data-driven Functional Protein Studies: Understanding Colour Tuning Rules and Predicting the Absorption Wavelengths of Microbial Rhodopsins
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 226118
DO 10.1101/226118
A1 Masayuki Karasuyama
A1 Keiichi Inoue
A1 Hideki Kandori
A1 Ichiro Takeuchi
YR 2017
UL http://biorxiv.org/content/early/2017/11/29/226118.abstract
AB The light-dependent ion-transport function of microbial rhodopsin has been widely used in optogenetics for optical control of neural activity. In order to increase the variety of rhodopsin proteins having a wide range of absorption wavelengths, the light absorption properties of various wild-type rhodopsins and their artificially mutated variants were investigated in the literature. Here, we demonstrate that a machine-learning-based (ML-based) data-driven approach is useful for understanding and predicting the light-absorption properties of microbial rhodopsin proteins. We constructed a database of 796 proteins consisting of microbial rhodopsin wildtypes and their variants. We then proposed an ML method that produces a statistical model describing the relationship between amino-acid sequences and absorption wavelengths and demonstrated that the fitted statistical model is useful for understanding colour tuning rules and predicting absorption wavelengths. By applying the ML method to the database, two residues that were not considered in previous studies are newly identified to be important to colour shift.