TY  - JOUR
T1  - Protein Fold Determination by Assembling Extended Super-Secondary Structure Motifs Using Limited NMR Data
JF  - bioRxiv
DO  - 10.1101/509356
SP  - 509356
AU  - Kala Bharath Pilla
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/01/14/509356.abstract
N2  - 3D fold determination of proteins by computational algorithms guided by experimental restraints is a reliable and efficient approach. However, current algorithms struggle to sample conformational space and scale poorly in performance with increasing protein size. This paper demonstrates a new data-driven, time-efficient, heuristic algorithm that assembles the 3D structure of a protein from its elemental super-secondary structure motifs (Smotifs) using a limited number of nuclear magnetic resonance (NMR) derived restraints. The DINGO-NOE-RDC algorithm (3D assembly of Individual smotifs to Near-native Geometry as Orchestrated by limited nuclear Overhauser effects (NOE) and residual dipolar couplings (RDC)) leverages distance restraints recorded on methyl-methyl (CH3-CH3), methyl-amide (CH3-HN), and amide-amide (HN-HN) NOE contacts, and orientation restraints recorded via RDC on the backbone amide protons, to assemble the target’s Smotifs. Two conceptual advancements were made to bootstrap the structure determination from limited NMR restraints: firstly, the basic definition of a ‘Smotif’ was expanded, and secondly, a data-driven approach for selection, scoring, ranking and clustering of Smotif assemblies is employed. In contrast to existing methods, the DINGO-NOE-RDC algorithm does not use a force field or a physical/empirical scoring function. Additionally, the algorithm employs a universal Smotif library that applies to any target protein and can generate numerically reproducible results. For a benchmark set of ten targets with different topologies, ranging from 100 to 200 residues, the algorithm identified near-native Smotifs for all of them.
ER  - 