PT - JOURNAL ARTICLE
AU - R. Vicedomini
AU - J.P. Bouly
AU - E. Laine
AU - A. Falciatore
AU - A. Carbone
TI - Multiple probabilistic models extract features from protein sequence data and resolve functional diversity of very different protein families
AID - 10.1101/717249
DP - 2021 Jan 01
TA - bioRxiv
PG - 717249
4099 - http://biorxiv.org/content/early/2021/03/09/717249.short
4100 - http://biorxiv.org/content/early/2021/03/09/717249.full
AB - Sequence functional classification has become a critical bottleneck in understanding the myriad of protein sequences that accumulate in our databases. The great diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Their identification appears critical for a fundamental understanding of living organisms and for biotechnological applications. ProfileView is a sequence-based computational method, designed to functionally classify sets of homologous sequences. It relies on two main ideas: the use of multiple probabilistic models whose construction explores evolutionary information in available databases, and a new definition of a representation space where to look at sequences from the point of view of probabilistic models combined together. ProfileView classifies families of proteins for which functions should be discovered or characterised within known groups. We validate ProfileView on seven classes of widespread proteins, involved in the interaction with nucleic acids, amino acids and small molecules, and in a large variety of functions and enzymatic reactions. ProfileView agrees with the large set of functional data collected for these proteins from the literature regarding the organisation into functional subgroups and residues that characterize the functions.
Furthermore, ProfileView resolves undefined functional classifications and extracts the molecular determinants underlying protein functional diversity, showing its potential to select sequences towards accurate experimental design and discovery of new biological functions. ProfileView proves to outperform three functional classification approaches, CUPP, PANTHER, and a recently developed neural network approach based on Restricted Boltzmann Machines. It overcomes time complexity limitations of the latter. Competing Interest Statement: The authors have declared no competing interest.