Abstractions, algorithms and data structures for structural bioinformatics in PyCogent

J Appl Crystallogr. 2011 Apr 1;44(Pt 2):424-428. doi: 10.1107/S0021889811004481. Epub 2011 Feb 11.

Abstract

To facilitate flexible and efficient structural bioinformatics analyses, new functionality for three-dimensional structure processing and analysis has been introduced into PyCogent - a popular feature-rich framework for sequence-based bioinformatics, but one which has lacked equally powerful tools for handling stuctural/coordinate-based data. Extensible Python modules have been developed, which provide object-oriented abstractions (based on a hierarchical representation of macromolecules), efficient data structures (e.g.kD-trees), fast implementations of common algorithms (e.g. surface-area calculations), read/write support for Protein Data Bank-related file formats and wrappers for external command-line applications (e.g. Stride). Integration of this code into PyCogent is symbiotic, allowing sequence-based work to benefit from structure-derived data and, reciprocally, enabling structural studies to leverage PyCogent's versatile tools for phylogenetic and evolutionary analyses.