InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

Nucleic Acids Res. 2010 Jan;38(Database issue):D196-203. doi: 10.1093/nar/gkp931. Epub 2009 Nov 5.

Abstract

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and more sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, response times have been improved, and the site now has a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare these resources because of a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
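
The two-pass BLAST idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so the following is an assumption about how such a scheme might be organized, not the InParanoid implementation: a first pass (run with compositional score matrix adjustment) decides which protein pairs count as homologs, and a second pass (run without the adjustment) supplies the untruncated alignments for exactly those pairs. The `Hit` record, the `two_pass_filter` function, and the 40-bit cutoff are all hypothetical names and values chosen for illustration, and the hits are toy data rather than real BLAST output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise hit (hypothetical record, not a BLAST output format)."""
    query: str
    subject: str
    bitscore: float
    aligned_length: int


def two_pass_filter(pass1_hits, pass2_hits, min_bitscore=40.0):
    """Keep pass-2 alignments only for pairs accepted in pass 1.

    pass1_hits: hits from the run with compositional adjustment (assumed
                to be the more precise homology call).
    pass2_hits: hits from the run without adjustment (assumed to give
                longer, untruncated alignments).
    min_bitscore: illustrative cutoff applied to pass-1 scores.
    """
    # Pairs that pass the score cutoff in the first, stricter pass.
    accepted = {
        (h.query, h.subject) for h in pass1_hits if h.bitscore >= min_bitscore
    }
    # Report the second-pass alignment for each accepted pair, in input order.
    return [h for h in pass2_hits if (h.query, h.subject) in accepted]


if __name__ == "__main__":
    # Toy data: pair (a, c) scores too low in pass 1, and pair (a, d)
    # appears only in pass 2, so neither is kept.
    pass1 = [Hit("a", "b", 120.0, 80), Hit("a", "c", 25.0, 30)]
    pass2 = [
        Hit("a", "b", 150.0, 200),
        Hit("a", "c", 60.0, 150),
        Hit("a", "d", 45.0, 90),
    ]
    for hit in two_pass_filter(pass1, pass2):
        print(hit.query, hit.subject, hit.aligned_length)
```

In this sketch the first pass acts purely as a gate, so a pair that looks significant only without compositional adjustment is never reported, while accepted pairs keep the longer second-pass alignment.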

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cluster Analysis
  • Computational Biology / methods*
  • Computational Biology / trends
  • Databases, Genetic*
  • Databases, Nucleic Acid*
  • Escherichia coli / genetics*
  • Escherichia coli / metabolism
  • Eukaryotic Cells / chemistry*
  • Genome, Bacterial
  • Humans
  • Information Storage and Retrieval / methods
  • Internet
  • Protein Structure, Tertiary
  • Proteins / genetics*
  • Proteomics / methods
  • Software

Substances

  • Proteins