TY  - JOUR
T1  - Analysing protein post-translational modform regions by linear programming
JF  - bioRxiv
DO  - 10.1101/456640
SP  - 456640
AU  - Deepesh Agarwal
AU  - Ryan T. Fellers
AU  - Bryan P. Early
AU  - Dan Lu
AU  - Caroline J. DeHart
AU  - Philip D. Compton
AU  - Paul M. Thomas
AU  - Galit Lahav
AU  - Neil L. Kelleher
AU  - Jeremy Gunawardena
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/10/30/456640.abstract
N2  - Post-translational modifications (PTMs) at multiple sites can collectively influence protein function, but the scope of such PTM coding has been challenging to determine. The number of potential combinatorial patterns of PTMs on a single molecule increases exponentially with the number of modification sites, and a population of molecules exhibits a distribution of such “modforms”. Estimating these “modform distributions” is central to understanding how PTMs influence protein function. Although mass spectrometry (MS) has made modforms more accessible, we have previously shown that current MS technology cannot recover the modform distribution of heavily modified proteins. However, MS data yield linear equations for modform amounts, which constrain the distribution within a high-dimensional, polyhedral “modform region”. Here, we show that linear programming (LP) can efficiently determine a range within which each modform value must lie, thereby approximating the modform region. We use this method on simulated data for mitogen-activated protein kinase 1 with the 7 phosphorylations reported on UniProt, giving a modform region in a 128-dimensional space. The exact dimension of the region is determined by the number of linearly independent equations, but its size and shape depend on the data. The average modform range, which is a measure of size, decreases when data from bottom-up (BU) MS, in which proteins are first digested into peptides, are combined with data from top-down (TD) MS, in which whole proteins are analysed. Furthermore, when the modform distribution is structured, as might be expected of real distributions, the modform region for BU and TD combined has a more intricate polyhedral shape and is substantially more constrained than that of a random distribution. These results give the first insights into high-dimensional modform regions and confirm that fast LP methods can be used to analyse them. We discuss the problems of using modform regions with real data, when the actual modform distribution will not be known.
ER  - 