RT Journal Article
SR Electronic
T1 Where Natural Protein Sequences Stand out From Randomness
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 706119
DO 10.1101/706119
A1 Laura Weidmann
A1 Tjeerd Dijkstra
A1 Oliver Kohlbacher
A1 Andrei Lupas
YR 2019
UL http://biorxiv.org/content/early/2019/07/28/706119.abstract
AB Biological sequences are the product of natural selection, raising the expectation that they differ substantially from random sequences. We test this expectation by analyzing all fragments of a given length derived from either a natural dataset or different random models. For this, we compile all distances in sequence space between fragments within each dataset and compare the resulting distance distributions between sets. Even for 100mers, 95.4% of all distances between natural fragments are in accordance with those of a random model incorporating the natural residue composition. Hence, natural sequences are distributed almost randomly in global sequence space. When further accounting for the specific residue composition of domain-sized fragments, 99.2% of all distances between natural fragments can be modeled. Local residue composition, which might reflect biophysical constraints on protein structure, is thus the predominant feature characterizing distances between natural sequences globally, whereas homologous effects are only barely detectable.
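
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure or the agreement statistic, so the sketch assumes Hamming distance between equal-length fragments and uses the overlap of the two normalized distance histograms as a stand-in for the share of distances "in accordance" with the random model. The function names (`fragments`, `distance_histogram`, `random_like`, `overlap`) and the toy sequences are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def fragments(seq, k):
    """Return all overlapping fragments of length k from one sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b))


def distance_histogram(frags, k):
    """Normalized histogram of all pairwise distances within one dataset."""
    counts = Counter(hamming(a, b) for a, b in combinations(frags, 2))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least two fragments")
    return [counts.get(d, 0) / total for d in range(k + 1)]


def random_like(seqs, rng):
    """Random model: residues drawn i.i.d. from the pooled composition,
    keeping the original sequence lengths."""
    pool = "".join(seqs)
    return ["".join(rng.choice(pool) for _ in s) for s in seqs]


def overlap(p, q):
    """Shared mass of two normalized histograms (assumed agreement score)."""
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    k = 5
    rng = random.Random(0)
    # Toy input for illustration only, not real data from the study.
    natural = ["MKTAYIAKQRQISFVKSHFSRQ", "MKVLAAGIVGLLLAQWSHPQFEK"]
    shuffled = random_like(natural, rng)

    nat_frags = [f for s in natural for f in fragments(s, k)]
    rnd_frags = [f for s in shuffled for f in fragments(s, k)]

    p_nat = distance_histogram(nat_frags, k)
    p_rnd = distance_histogram(rnd_frags, k)
    print(f"histogram overlap: {overlap(p_nat, p_rnd):.3f}")
```

Enumerating all pairs scales quadratically with the number of fragments, so a dataset of realistic size would need sampling or a more efficient counting scheme; the sketch only shows the shape of the comparison.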