TY  - JOUR
T1  - Toward Machine Learning-based Data-driven Functional Protein Studies: Understanding Colour Tuning Rules and Predicting the Absorption Wavelengths of Microbial Rhodopsins
JF  - bioRxiv
DO  - 10.1101/226118
SP  - 226118
AU  - Masayuki Karasuyama
AU  - Keiichi Inoue
AU  - Hideki Kandori
AU  - Ichiro Takeuchi
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/11/29/226118.abstract
N2  - The light-dependent ion-transport function of microbial rhodopsin has been widely used in optogenetics for optical control of neural activity. To increase the variety of rhodopsin proteins with a wide range of absorption wavelengths, the light-absorption properties of various wild-type rhodopsins and their artificially mutated variants have been investigated in the literature. Here, we demonstrate that a machine-learning-based (ML-based) data-driven approach is useful for understanding and predicting the light-absorption properties of microbial rhodopsin proteins. We constructed a database of 796 proteins consisting of microbial rhodopsin wild-types and their variants. We then proposed an ML method that produces a statistical model describing the relationship between amino-acid sequences and absorption wavelengths, and demonstrated that the fitted statistical model is useful for understanding colour tuning rules and predicting absorption wavelengths. By applying the ML method to the database, we newly identified two residues, not considered in previous studies, that are important for colour shifts.
ER  - 