Selecting the Right Similarity-Scoring Matrix.

William R Pearson.

Abstract

Protein sequence similarity searching programs like BLASTP, SSEARCH (UNIT 3.10), and FASTA use scoring matrices that are designed to identify distant evolutionary relationships (BLOSUM62 for BLAST, BLOSUM50 for SSEARCH and FASTA). Different similarity scoring matrices are most effective at different evolutionary distances. "Deep" scoring matrices like BLOSUM62 and BLOSUM50 target alignments with 20-30% identity, while "shallow" scoring matrices (e.g., VTML10 to VTML80) target alignments that share 90-50% identity, reflecting much less evolutionary change. While "deep" matrices provide very sensitive similarity searches, they also require longer sequence alignments and can sometimes extend alignments into non-homologous regions (alignment overextension). Shallower scoring matrices are more effective when searching for short protein domains, or when the goal is to limit the scope of the search to sequences that are likely to be orthologous between recently diverged organisms. Likewise, in DNA searches, the match and mismatch parameters set evolutionary look-back times and domain boundaries. In this unit, we discuss the theoretical foundations that drive practical choices of protein and DNA similarity scoring matrices and gap penalties. Deep scoring matrices (BLOSUM62 and BLOSUM50) should be used for sensitive searches with full-length protein sequences, but short domains or restricted evolutionary look-back require shallower scoring matrices.
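
The link between DNA match/mismatch parameters and look-back time can be illustrated with a minimal sketch. It is not taken from the unit itself; it assumes the standard log-odds form of a scoring matrix, s(i,j) = log2(q(i,j) / (p(i) p(j))), with uniform base frequencies (0.25 each) and all three mismatches equally likely. The function name `dna_log_odds_bits` is hypothetical.

```python
import math


def dna_log_odds_bits(target_identity):
    """Return (match, mismatch) log-odds scores in bits.

    Assumptions (illustrative, not from the source abstract):
    - each base has background frequency 0.25;
    - aligned positions are identical with probability target_identity;
    - the three possible mismatches are equally likely.
    """
    if not 0.25 < target_identity < 1.0:
        raise ValueError("target_identity must be between 0.25 and 1.0")
    # Match: q = identity/4 per base, background = 1/16, so the ratio is 4*identity.
    match = math.log2(4.0 * target_identity)
    # Mismatch: q = (1 - identity)/12 per ordered pair, background = 1/16.
    mismatch = math.log2(4.0 * (1.0 - target_identity) / 3.0)
    return match, mismatch


for identity in (0.99, 0.95, 0.75):
    match, mismatch = dna_log_odds_bits(identity)
    print(
        f"{identity:.0%} identity: match={match:+.2f} bits, "
        f"mismatch={mismatch:+.2f} bits, ratio={mismatch / match:.2f}"
    )
```

Under these assumptions, a mismatch penalty about three times the match reward corresponds to a target near 99% identity, while equal match and mismatch magnitudes correspond to about 75% identity. The more severe the mismatch penalty is relative to the match reward, the shallower the look-back.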

Keywords:  BLOSUM matrices; PAM matrices; sequence alignment; similarity scoring matrices

Year:  2013        PMID: 24509512      PMCID: PMC3848038          DOI: 10.1002/0471250953.bi0305s43

Source DB:  PubMed          Journal:  Curr Protoc Bioinformatics        ISSN: 1934-3396