Principal components analysis of protein sequence clusters.

Abstract

Sequence analysis of large protein families can produce sub-clusters even within the same family. In some cases, it is of interest to know precisely which amino acid position variations are most responsible for driving separation into sub-clusters. In large protein families composed of large proteins, it can be quite challenging to assign the relative importance to specific amino acid positions. Principal components analysis (PCA) is ideal for such a task, since the problem is posed in a large variable space, i.e. the number of amino acids that make up the protein sequence, and PCA is powerful at reducing the dimensionality of complex problems by projecting the data into an eigenspace that represents the directions of greatest variation. However, PCA of aligned protein sequence families is complicated by the fact that protein sequences are traditionally represented by single letter alphabetic codes, whereas PCA of protein sequence families requires conversion of sequence information into a numerical representation. Here, we introduce a new amino acid sequence conversion algorithm optimized for PCA data input. The method is demonstrated using a small artificial dataset to illustrate the characteristics and performance of the algorithm, as well as a small protein sequence family consisting of nine members, COG2263, and finally with a large protein sequence family, Pfam04237, which contains more than 1,800 sequences that group into two sub-clusters.
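The conversion step described above can be illustrated with a minimal sketch. The abstract does not specify the authors' conversion algorithm, so the example below substitutes a generic one-hot encoding (20 indicator columns per alignment position) as a stand-in, and runs PCA through NumPy's singular value decomposition. The function names (`one_hot_encode`, `pca`, `position_weights`) and the six-sequence toy alignment are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

# Standard 20-letter amino acid alphabet; gaps and unknown symbols are left all-zero.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_AA = len(AMINO_ACIDS)


def one_hot_encode(alignment):
    """Turn equal-length aligned sequences into an (n_seqs, n_pos * 20) 0/1 matrix.

    Assumption: plain one-hot encoding, used here as a stand-in for the
    paper's own conversion algorithm, which the abstract does not describe.
    """
    n_pos = len(alignment[0])
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    X = np.zeros((len(alignment), n_pos * N_AA))
    for row, seq in enumerate(alignment):
        if len(seq) != n_pos:
            raise ValueError("all aligned sequences must have the same length")
        for pos, aa in enumerate(seq):
            if aa in index:
                X[row, pos * N_AA + index[aa]] = 1.0
    return X


def pca(X, n_components=2):
    """Return (scores, components) from PCA of X via SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components]


def position_weights(component, n_pos):
    """Sum squared loadings over the 20 residue columns of each alignment position."""
    return (component.reshape(n_pos, N_AA) ** 2).sum(axis=1)


# Hypothetical toy alignment: positions 2 and 5 (0-based) differ between two
# groups of sequences, while position 4 varies inside each group.
alignment = ["MKAVLG", "MKAVLG", "MKAVIG", "MKDVLW", "MKDVLW", "MKDVIW"]

X = one_hot_encode(alignment)
scores, components = pca(X, n_components=2)
weights = position_weights(components[0], n_pos=len(alignment[0]))

print("PC1 scores:", np.round(scores[:, 0], 3))
print("Per-position weight on PC1:", np.round(weights, 3))
```

In this sketch the positions with the largest summed squared loadings on the first component are the ones that contribute most to the separation along that component, which is the kind of per-position ranking the abstract is concerned with. A different numerical encoding would in general give different loadings.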

Substances:
Amino Acids
Proteins

Year: 2014 PMID: 24496727 PMCID: PMC3982804 DOI: 10.1007/s10969-014-9173-2

Source DB: PubMed Journal: J Struct Funct Genomics ISSN: 1345-711X
