Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.

W R Pearson

Abstract

The sensitivity and selectivity of the FASTA and the Smith-Waterman protein sequence comparison algorithms were evaluated using the superfamily classification provided in the National Biomedical Research Foundation/Protein Identification Resource (PIR) protein sequence database. Sequences from each of the 34 superfamilies in the PIR database with 20 or more members were compared against the protein sequence database. The similarity scores of the related and unrelated sequences were determined using either the FASTA program or the Smith-Waterman local similarity algorithm. These two sets of similarity scores were used to evaluate the ability of the two comparison algorithms to identify distantly related protein sequences. The FASTA program using the ktup = 2 sensitivity setting performed as well as the Smith-Waterman algorithm for 19 of the 34 superfamilies. Increasing the sensitivity by setting ktup = 1 allowed FASTA to perform as well as Smith-Waterman on an additional 7 superfamilies. The rigorous Smith-Waterman method performed better than FASTA with ktup = 1 on 8 superfamilies, including the globins, immunoglobulin variable regions, calmodulins, and plastocyanins. Several strategies for improving the sensitivity of FASTA were examined. The greatest improvement in sensitivity was achieved by optimizing a band around the best initial region found for every library sequence. For every superfamily except the globins and immunoglobulin variable regions, this strategy was as sensitive as a full Smith-Waterman. For some sequences, additional sensitivity was achieved by including conserved but nonidentical residues in the lookup table used to identify the initial region.
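The band strategy described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the FASTA program itself: the match/mismatch/gap values, the function names `sw_score` and `best_diagonal`, and the band width are all hypothetical choices. The diagonal is picked by simply counting ktup word hits, whereas FASTA scores initial regions in a more elaborate way, and a real protein search would use a substitution matrix rather than identity scoring.

```python
from collections import defaultdict

# Toy scoring values (assumptions for illustration only).
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(a, b, center=None, width=None):
    """Best local alignment score of a vs b with a linear gap penalty.

    If center and width are given, only cells on diagonals d = j - i with
    |d - center| <= width are filled in (a band around one diagonal).
    Cells outside the band keep the value 0.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        lo, hi = 1, len(b)
        if center is not None:
            lo = max(1, i + center - width)
            hi = min(len(b), i + center + width)
        for j in range(lo, hi + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            cur[j] = max(0, prev[j - 1] + s, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def best_diagonal(query, target, ktup=2):
    """Return the diagonal (j - i) with the most shared words of length ktup.

    Simplified stand-in for locating the best initial region: a lookup
    table of query words is built, then the target is scanned against it.
    Returns None when no word is shared.
    """
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    hits = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        for i in index.get(target[j:j + ktup], ()):
            hits[j - i] += 1
    return max(hits, key=hits.get) if hits else None


query, target = "ACDEFGHIKL", "MMACDEFGHIKLMM"
diag = best_diagonal(query, target, ktup=2)
full = sw_score(query, target)
banded = sw_score(query, target, center=diag, width=3)
print(diag, full, banded)
```

Because the band only restricts which alignment paths are considered, the banded score can never exceed the full Smith-Waterman score. The two agree whenever the optimal local alignment stays within the band, which is the situation the abstract reports for most superfamilies. Lowering `ktup` to 1 makes the lookup step find more word hits at the cost of more work.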

Year: 1991    PMID: 1774068    DOI: 10.1016/0888-7543(91)90071-l

Source DB: PubMed    Journal: Genomics    ISSN: 0888-7543    Impact factor: 5.736

