TY  - JOUR
T1  - Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance
JF  - bioRxiv
DO  - 10.1101/092742
SP  - 092742
AU  - Toshiyuki Oda
AU  - Kyungtaek Lim
AU  - Kentaro Tomii
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/12/13/092742.abstract
N2  - PSI-BLAST, an extremely popular tool for sequence similarity search, features the utilization of Position Specific Scoring Matrix (PSSM) constructed from a multiple sequence alignment (MSA). PSSM allows the detection of more distant homologs than a general amino acid substitution matrix does. An accurate estimation of the weights of sequences in an MSA is crucially important for PSSM construction. PSI-BLAST divides a given MSA into multiple blocks, for which sequence weights are calculated. When the block width becomes very narrow, the sequence weight calculation can be difficult. We demonstrate that PSI-BLAST indeed generates a significant fraction of blocks having widths less than 5, thereby degrading the PSI-BLAST performance. We revised the code of PSI-BLAST to prevent the blocks from being narrower than a given minimum block width (MBW). We designate the modified application of PSI-BLAST as PSI-BLASTexB. When MBW is 25, PSI-BLASTexB notably outperforms PSI-BLAST consistently for three independent benchmark sets. The performance boost is even more drastic when an MSA, instead of a sequence, was used as a query. Our results demonstrate that the generation of narrow-width blocks during the sequence weight calculation is a critically important factor that restricts the PSI-BLAST search performance. By preventing narrow blocks, PSI-BLASTexB remarkably upgrades the PSI-BLAST performance.
ER  - 