
Principal components analysis of protein sequence clusters.

Bo Wang, Michael A Kennedy.

Abstract

Sequence analysis of large protein families can produce sub-clusters even within the same family. In some cases, it is of interest to know precisely which amino acid position variations are most responsible for driving separation into sub-clusters. In large protein families composed of large proteins, it can be quite challenging to assign the relative importance to specific amino acid positions. Principal components analysis (PCA) is ideal for such a task, since the problem is posed in a large variable space, i.e. the number of amino acids that make up the protein sequence, and PCA is powerful at reducing the dimensionality of complex problems by projecting the data into an eigenspace that represents the directions of greatest variation. However, PCA of aligned protein sequence families is complicated by the fact that protein sequences are traditionally represented by single letter alphabetic codes, whereas PCA of protein sequence families requires conversion of sequence information into a numerical representation. Here, we introduce a new amino acid sequence conversion algorithm optimized for PCA data input. The method is demonstrated using a small artificial dataset to illustrate the characteristics and performance of the algorithm, as well as a small protein sequence family consisting of nine members, COG2263, and finally with a large protein sequence family, Pfam04237, which contains more than 1,800 sequences that group into two sub-clusters.
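The abstract does not spell out the conversion algorithm itself, so the following is only a minimal sketch of the general workflow it describes: turn aligned sequences into numbers, run PCA, and read off which alignment positions drive the separation. It assumes a plain one-hot (indicator) encoding as a stand-in for the authors' conversion scheme, and the toy alignment and the helper names `one_hot_encode`, `pca`, and `position_importance` are hypothetical, introduced here for illustration only.

```python
import numpy as np

# 20 standard residues plus the gap symbol used in alignments.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_encode(alignment):
    """Encode equal-length aligned sequences as a 0/1 matrix.

    Assumption: one-hot encoding is a stand-in here, not the paper's
    conversion algorithm. Output shape is (n_seqs, n_pos * 21).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n_seqs, n_pos = len(alignment), len(alignment[0])
    X = np.zeros((n_seqs, n_pos * len(AMINO_ACIDS)))
    for s, seq in enumerate(alignment):
        for p, aa in enumerate(seq):
            X[s, p * len(AMINO_ACIDS) + index[aa]] = 1.0
    return X


def pca(X, n_components=2):
    """PCA via SVD of the column-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sequence coordinates
    loadings = Vt[:n_components]                       # variable weights
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, loadings, explained


def position_importance(loadings_row, n_pos):
    """Sum squared loadings over the 21 columns of each alignment position."""
    return (loadings_row.reshape(n_pos, len(AMINO_ACIDS)) ** 2).sum(axis=1)


# Made-up toy alignment: two sub-clusters that differ at positions 1 and 3
# (0-based), plus unrelated within-cluster variation at position 4.
alignment = [
    "MKVLA",
    "MKVLA",
    "MKVLG",
    "MRVFA",
    "MRVFA",
    "MRVFG",
]

X = one_hot_encode(alignment)
scores, loadings, explained = pca(X)
importance = position_importance(loadings[0], n_pos=len(alignment[0]))
top_positions = sorted(np.argsort(importance)[::-1][:2].tolist())

print("PC1 scores:", np.round(scores[:, 0], 3))
print("Positions with the largest PC1 weight:", top_positions)
```

In this toy case the first principal component separates the two groups of three sequences, and the per-position sums of squared loadings point to the two columns where the groups differ. A real family such as Pfam04237 would need an actual alignment as input, and the encoding choice affects the result, which is the issue the paper's conversion algorithm is meant to address.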

Year: 2014; PMID: 24496727; PMCID: PMC3982804; DOI: 10.1007/s10969-014-9173-2

Source DB: PubMed; Journal: J Struct Funct Genomics; ISSN: 1345-711X


