Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment.

Eric L Peterson, Jané Kondev, Julie A Theriot, Rob Phillips.

Abstract

MOTIVATION: Many proteins with vastly dissimilar sequences are found to share a common fold, as evidenced in the wealth of structures now available in the Protein Data Bank. One idea that has found success in various applications is the concept of a reduced amino acid alphabet, wherein similar amino acids are clustered together. Given the structural similarity exhibited by many apparently dissimilar sequences, we undertook this study looking for improvements in fold recognition by comparing protein sequences written in a reduced alphabet.
RESULTS: We tested over 150 of the amino acid clustering schemes proposed in the literature with all-versus-all pairwise sequence alignments of sequences in the Distance mAtrix aLIgnment database. We combined several metrics from information retrieval popular in the literature (mean precision, area under the Receiver Operating Characteristic curve, and recall at a fixed error rate) and found that, in contrast to previous work, reduced alphabets in many cases outperform full alphabets. We find that reduced alphabets can perform at a level comparable to full alphabets in correct pairwise alignment of sequences and can show increased sensitivity to pairs of sequences with structural similarity but low sequence identity. Based on these results, we hypothesize that reduced alphabets may also show performance gains with more sophisticated methods such as profile and pattern searches.
AVAILABILITY: A table of results as well as the substitution matrices and residue groupings from this study can be downloaded from (http://www.rpgroup.caltech.edu/publications/supplements/alphabets).
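
As a rough illustration of the ideas in the abstract, the sketch below rewrites sequences in a reduced alphabet and scores a ranked list of sequence pairs with two of the named metrics. It is not the authors' pipeline: the six-group clustering, the function names, and the toy scores are assumptions made for this example, and the study's actual groupings and substitution matrices are in the supplement linked above.

```python
# Hypothetical six-group clustering, chosen only for illustration.
# It is NOT one of the schemes evaluated in the study.
EXAMPLE_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(groups):
    """Map every residue to the first letter of the group it belongs to."""
    mapping = {}
    for group in groups:
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown symbols become 'X'."""
    return "".join(mapping.get(res, "X") for res in seq.upper())


def identity(a, b):
    """Fraction of identical positions in two equal-length, pre-aligned strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def roc_auc(scores, labels):
    """Area under the ROC curve as the probability that a true pair
    (label True, same fold) outscores a false pair; ties count one half."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one true and one false pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_fixed_errors(scores, labels, max_false_positives):
    """Fraction of true pairs retrieved, walking down the ranked list,
    before the number of false pairs exceeds max_false_positives."""
    total_pos = sum(1 for is_pos in labels if is_pos)
    if total_pos == 0:
        raise ValueError("need at least one true pair")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = false_pos = 0
    for _, is_pos in ranked:
        if is_pos:
            true_pos += 1
        else:
            false_pos += 1
            if false_pos > max_false_positives:
                break
    return true_pos / total_pos


if __name__ == "__main__":
    mapping = build_mapping(EXAMPLE_GROUPS)
    a, b = "MKVLDE", "IRALEE"  # toy, already-aligned fragments
    print(identity(a, b))
    print(identity(reduce_sequence(a, mapping), reduce_sequence(b, mapping)))
```

In this toy case the two fragments agree at only two of six positions in the full alphabet but become identical once conservative substitutions are merged, which is the kind of effect the abstract describes for pairs with low sequence identity. A real evaluation would score alignments with a substitution matrix adapted to the reduced alphabet rather than simple identity.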

Year: 2009 | PMID: 19351620 | PMCID: PMC2732308 | DOI: 10.1093/bioinformatics/btp164

Source DB: PubMed | Journal: Bioinformatics | ISSN: 1367-4803 | Impact factor: 6.937


