Abstract
Domain architecture – the arrangement of features in a protein – exhibits syntactic patterns similar to the grammar of a language. This feature enables pattern mining for protein function prediction, comparative genomics, and studies of molecular evolution and complexity. To facilitate such work, here we propose Regular Architecture (RegArch), an expression language to describe syntactic patterns in protein architectures. Like the well-known Regular Expressions for text, RegArchs codify positional and non-positional patterns of elements into nested JSON objects. We describe the standard and provide a reference implementation in JavaScript to parse RegArchs and match annotated proteins.
Footnotes
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.