RT Journal Article
SR Electronic
T1 Unsupervised learning reveals landscape of local structural motifs across protein classes
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2023.12.04.569990
DO 10.1101/2023.12.04.569990
A1 Derry, Alexander
A1 Altman, Russ B.
YR 2023
UL http://biorxiv.org/content/early/2023/12/05/2023.12.04.569990.abstract
AB Proteins are known to share similarities in local regions of 3D structure even across disparate global folds. Such correspondences can help to shed light on functional relationships between proteins and identify conserved local structural features that lead to function. Self-supervised deep learning on large protein structure datasets has produced high-fidelity representations of local structural microenvironments, enabling comparison of local structure and function at scale. In this work, we leverage these representations to cluster over 15 million environments in the Protein Data Bank, resulting in the creation of a “lexicon” of local 3D motifs which form the building blocks of all known protein structures. We characterize these motifs and demonstrate that they provide valuable information for modeling structure and function at all scales of protein analysis, from full protein chains to binding pockets to individual amino acids. We devise a new protein representation based solely on its constituent local motifs and show that this representation enables state-of-the-art performance on protein structure search and model quality assessment. We then show that this approach enables accurate prediction of drug off-target interactions by modeling the similarity between local binding pockets. Finally, we identify structural motifs associated with pathogenic variants in the human proteome by leveraging the predicted structures in the AlphaFold structure database. Competing Interest Statement: The authors have declared no competing interest.