Bong-Hyun Kim, Qian Cong, Nick V Grishin.
Abstract
Profile-based similarity search is an essential step in structure-function studies of proteins. However, inclusion of non-homologous sequence segments into a profile causes its corruption and results in false positives. Profile corruption is common in multidomain proteins, and single domains with long insertions are a significant source of errors. We developed a procedure (HangOut) that, for a single domain with specified insertion position, cleans erroneously extended PSI-BLAST alignments to generate better profiles. AVAILABILITY: HangOut is implemented in Python 2.3 and runs on all Unix-compatible platforms. The source code is available under the GNU GPL license at http://prodata.swmed.edu/HangOut/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2010 PMID: 20413635 PMCID: PMC2881392 DOI: 10.1093/bioinformatics/btq208
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. HangOut method to clean PSI-BLAST profiles. (a) HangOut flowchart. Starting from a query domain (blue–red, with the inserted domain in yellow removed), a PSI-BLAST search of the NCBI non-redundant database (NR) is performed (Step 1) to produce alignments. Erroneously extended regions (yellow) that cross the insertion boundary (vertical dotted line) are removed to produce a ‘cleaned’ alignment (Step 2–1). Remaining contaminants that do not cross the domain boundary are removed using PSI-BLAST profiles built from the long insertion (Step 2–2). The PSI-BLAST result is checked for convergence (Step 3), and the search may continue from Step 1. (b) Structure of the two middle domains of the TrmE GTP-binding protein (PDB ID 1xzp). The α/β P-loop hydrolase domain (yellow, SCOP ID d1xzpa2, chain A: 212–371) is inserted into an α-helical bundle (N- and C-terminal segments are colored blue and red, respectively; SCOP ID d1xzpa1, chain A: 118–211, 372–450). (c and d) HangOut performance test showing the number of corrupted profiles and the number of found homologs, respectively. The performance of PSI-BLAST and RemoveHit is also shown. RemoveHit removes all alignments for hits with two overlapping HSPs, as in Supplementary Figure S1b. The HangOut profiles show high accuracy, with only one case of possible corruption (c), and no loss of sensitivity (d). A color version of the figure is available at Bioinformatics online.
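The boundary-crossing cleanup of Step 2–1 can be illustrated with a minimal Python sketch. This is not the HangOut source code: the `Hsp` type, the function names, and the coordinates are hypothetical, and the rule for which side of a crossing alignment to keep (the longer one) is an assumption made only for illustration. Coordinates refer to the query after the inserted domain has been removed, and `boundary` is the last query residue before the insertion point.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hsp:
    """Query span of one alignment, 1-based and inclusive (hypothetical type)."""
    hit_id: str
    q_start: int
    q_end: int


def crosses_boundary(hsp: Hsp, boundary: int) -> bool:
    """True if the alignment covers query residues on both sides of the insertion point."""
    return hsp.q_start <= boundary < hsp.q_end


def trim_at_boundary(hsp: Hsp, boundary: int) -> Hsp:
    """Cut a crossing alignment at the boundary.

    Assumption for this sketch: the longer side is treated as the genuine
    match and the shorter side as the erroneous extension.
    """
    if not crosses_boundary(hsp, boundary):
        return hsp
    left_len = boundary - hsp.q_start + 1
    right_len = hsp.q_end - boundary
    if left_len >= right_len:
        return Hsp(hsp.hit_id, hsp.q_start, boundary)
    return Hsp(hsp.hit_id, boundary + 1, hsp.q_end)


def clean_alignment(hsps: List[Hsp], boundary: int) -> List[Hsp]:
    """Apply the trimming rule to every alignment in a search result."""
    return [trim_at_boundary(h, boundary) for h in hsps]


# Made-up example: insertion point after query residue 94.
hsps = [Hsp("hitA", 10, 80), Hsp("hitB", 40, 110), Hsp("hitC", 95, 150)]
cleaned = clean_alignment(hsps, boundary=94)
```

In this example only `hitB` spans the insertion point, so it is the only alignment that gets shortened; `hitA` and `hitC` lie entirely on one side and pass through unchanged. A real implementation would also need subject coordinates and the aligned residues themselves, which this sketch leaves out.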