Empirical determination of effective gap penalties for sequence comparison.

Abstract

MOTIVATION: No general theory guides the selection of gap penalties for local sequence alignment. We empirically determined the most effective gap penalties for protein sequence similarity searches with substitution matrices over a range of target evolutionary distances from 20 to 200 Point Accepted Mutations (PAMs).
RESULTS: We embedded real and simulated homologs of protein sequences into a database and searched the database to determine the gap penalties that produced the best statistical significance for the distant homologs. The most effective penalty for the first residue in a gap (q+r) changes as a function of evolutionary distance, while the gap extension penalty for additional residues (r) does not. For these data, the optimal gap penalties for a given matrix scaled in 1/3 bit units (e.g. BLOSUM50, PAM200) are q = 25 - 0.1 × (target PAM distance) and r = 5. Our results provide an empirical basis for selection of gap penalties and demonstrate how optimal gap penalties behave as a function of the target evolutionary distance of the substitution matrix. These gap penalties can improve expectation values by at least one order of magnitude when searching with short sequences, and improve the alignment of proteins containing short sequences repeated in tandem.
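
The empirical rule stated in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names `gap_penalties` and `gap_cost` are hypothetical, the range check assumes the rule applies only over the 20 to 200 PAM range the study covered, and all values are assumed to be in the 1/3-bit units the abstract specifies. The cost of a gap of length k follows from the abstract's definitions: the first residue costs q + r and each additional residue costs r, giving q + r × k.

```python
from typing import Tuple


def gap_penalties(target_pam: float) -> Tuple[float, float]:
    """Return (q, r) in 1/3-bit units using the abstract's empirical rule.

    q = 25 - 0.1 * (target PAM distance), r = 5.
    The 20-200 range check is an assumption based on the range studied.
    """
    if not 20 <= target_pam <= 200:
        raise ValueError("rule was determined for target PAM distances of 20 to 200")
    q = 25.0 - target_pam / 10.0  # same as 25 - 0.1 * target_pam
    r = 5.0                       # extension penalty does not vary with distance
    return q, r


def gap_cost(length: int, q: float, r: float) -> float:
    """Total affine penalty for a gap of `length` residues.

    The first residue costs q + r and each further residue costs r,
    so the total is q + r * length.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return q + r * length


if __name__ == "__main__":
    for pam in (20, 120, 200):
        q, r = gap_penalties(pam)
        print(f"PAM{pam}: q={q:g}, r={r:g}, first-residue penalty={gap_cost(1, q, r):g}")
```

Under this rule, the penalty for opening a gap decreases as the target evolutionary distance grows, while the per-residue extension penalty stays fixed. Matrices scaled in other units (for example 1/2 bit) would need the values rescaled, which this sketch does not attempt.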

Year: 2002 PMID: 12424122 DOI: 10.1093/bioinformatics/18.11.1500

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

21 in total

1. Frequency of gaps observed in a structurally aligned protein pair database suggests a simple gap penalty function.

Authors: Nalin C W Goonesekere; Byungkook Lee
Journal: Nucleic Acids Res Date: 2004-05-20 Impact factor: 16.971

2. Aligning sequences by minimum description length.

Authors: John S Conery
Journal: EURASIP J Bioinform Syst Biol Date: 2007

3. Selecting the Right Similarity-Scoring Matrix.

Authors: William R Pearson
Journal: Curr Protoc Bioinformatics Date: 2013

4. Finding Protein and Nucleotide Similarities with FASTA.

Authors: William R Pearson
Journal: Curr Protoc Bioinformatics Date: 2016-03-24

5. Testing statistical significance scores of sequence comparison methods with structure similarity.

Authors: Tim Hulsen; Jacob de Vlieg; Jack A M Leunissen; Peter M A Groenen
Journal: BMC Bioinformatics Date: 2006-10-12 Impact factor: 3.169

6. Parameters for accurate genome alignment.

Authors: Martin C Frith; Michiaki Hamada; Paul Horton
Journal: BMC Bioinformatics Date: 2010-02-09 Impact factor: 3.169

7. Comparative analysis of the quality of a global algorithm and a local algorithm for alignment of two sequences.

Authors: Valery O Polyanovsky; Mikhail A Roytberg; Vladimir G Tumanyan
Journal: Algorithms Mol Biol Date: 2011-10-27 Impact factor: 1.405

8. A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships.

Authors: Claire Lemaitre; Aurélien Barré; Christine Citti; Florence Tardy; François Thiaucourt; Pascal Sirand-Pugnet; Patricia Thébault
Journal: BMC Bioinformatics Date: 2011-11-24 Impact factor: 3.169

9. MCALIGN2: faster, accurate global pairwise alignment of non-coding DNA sequences based on explicit models of indel evolution.

Authors: Jun Wang; Peter D Keightley; Toby Johnson
Journal: BMC Bioinformatics Date: 2006-06-08 Impact factor: 3.169

10. The effectiveness of position- and composition-specific gap costs for protein similarity searches.

Authors: Aleksandar Stojmirović; E Michael Gertz; Stephen F Altschul; Yi-Kuo Yu
Journal: Bioinformatics Date: 2008-07-01 Impact factor: 6.937