RT Journal Article
SR Electronic
T1 RADI (Reduced Alphabet Direct Information): Improving execution time for direct-coupling analysis
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 406603
DO 10.1101/406603
A1 Bernat Anton
A1 Mireia Besalú
A1 Oriol Fornes
A1 Jaume Bonet
A1 Gemma De las Cuevas
A1 Narcís Fernández-Fuentes
A1 Baldo Oliva
YR 2018
UL http://biorxiv.org/content/early/2018/09/03/406603.abstract
AB Motivation: Direct-coupling analysis (DCA) for studying the coevolution of residues in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. Current algorithms for DCA, although efficient, incur a high computational cost when determining Direct Information (DI) values for large proteins or domains. In this paper, we present RADI (Reduced Alphabet Direct Information), a variation of the original DCA algorithm that simplifies the computation of DI values by grouping physicochemically equivalent residues. Results: We compared the 40 top-ranking pairs of DI values with their closest paired contacts in 3D. The ranking is also compared with results obtained using a similar but faster approach based on Mutual Information (MI). When the number of symbols used to describe a protein sequence is reduced to 9, RADI achieves results similar to those of the original DCA (i.e. with the classical alphabet of 21 symbols), while reducing the computation time around 30-fold on large proteins (with length around 1000 residues), and with higher accuracy than predictions based on MI.
Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure and retains a relevant and useful predictive value, while the computation time is reduced between 100- and 2500-fold. Availability: RADI is available at https://github.com/structuralbioinformatics/RADI Contact: baldo.oliva{at}upf.edu Supplementary information: Supplementary data are available in the git repository.