ABSTRACT
Low-complexity regions (LCRs) in proteins are important for higher-order assemblies of organisms, yet we lack a unified view of their sequences, features, relationships, and functions. Here, we use dotplots and dimensionality reduction to systematically define LCR type/copy relationships and create a map of LCR sequence space capable of integrating LCR features and functions. By defining LCR relationships across the proteome, we provide insight into how LCR type and copy number contribute to higher-order assemblies, such as the importance of K-rich LCR copy number for assembly of the nucleolar protein RPA43 in vivo and in vitro. With LCR maps, we reveal the underlying structure of LCR sequence space, and relate differential occupancy in this space to the conservation and emergence of higher-order assemblies, including the metazoan extracellular matrix and plant cell wall. Together, LCR relationships and maps uncover the distribution and prevalence of E-rich LCRs in the nucleolus, and reveal previously undescribed regions of LCR sequence space with properties of higher-order assemblies, including a teleost-specific T/H-rich sequence space. Thus, this unified view of LCRs enables discovery of how LCRs encode higher-order assemblies of organisms.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The following changes were made: (1) corrected figure supplement references in the legends for Figure 2 and Figure 5; (2) expanded the acronym (LCRs) in the title. No other changes were made.