Abstract
The latest public release of the AlphaFold database provides access to over 200 million predicted protein structures. We describe these structures using “shape-mers”, structural fragments analogous to sequence k-mers, and search them for novelty: proteins with rare or novel structural composition, and possible functional annotations for under-studied proteins. Data and code will be made available at https://github.com/TurtleTools/afdb-shapemer-darkness.
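To make the k-mer analogy concrete, the following minimal sketch shows the sequence-side idea only: a protein is broken into overlapping fragments, summarised as fragment counts, and scored by how much of its composition is absent from a background set. It is not the authors' shape-mer implementation, which operates on structural fragments rather than residue letters. The function names (`kmers`, `kmer_profile`, `rarity_score`), the choice of `k = 3`, and the rarity definition are all assumptions made for illustration.

```python
from collections import Counter


def kmers(sequence: str, k: int = 3) -> list[str]:
    """Return all overlapping substrings of length k (empty if too short)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_profile(sequence: str, k: int = 3) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))


def rarity_score(profile: Counter, background: Counter) -> float:
    """Fraction of k-mer occurrences in `profile` never seen in `background`.

    Hypothetical score for illustration: 0.0 means every fragment is known,
    1.0 means every fragment is new.
    """
    total = sum(profile.values())
    if total == 0:
        return 0.0
    unseen = sum(count for kmer, count in profile.items() if kmer not in background)
    return unseen / total


# Toy sequences, not real proteins.
background = kmer_profile("MKVLA")
query = kmer_profile("MKVWW")
score = rarity_score(query, background)
```

Under this analogy, a shape-mer profile would replace the residue substrings with discrete identifiers for local structural fragments, and the same counting and comparison steps would apply.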
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
janani.durairaj{at}unibas.ch, joana.pereira{at}unibas.ch, mehmet{at}vant.ai
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.