Solving the protein sequence metric problem.

William R Atchley, Jieping Zhao, Andrew D Fernandes, Tanja Drüke.

Abstract

Biological sequences are composed of long strings of alphabetic letters rather than arrays of numerical values. Lack of a natural underlying metric for comparing such alphabetic data significantly inhibits sophisticated statistical analyses of sequences, modeling structural and functional aspects of proteins, and related problems. Herein, we use multivariate statistical analyses on almost 500 amino acid attributes to produce a small set of highly interpretable numeric patterns of amino acid variability. These high-dimensional attribute data are summarized by five multidimensional patterns of attribute covariation that reflect polarity, secondary structure, molecular volume, codon diversity, and electrostatic charge. Numerical scores for each amino acid then transform amino acid sequences for statistical analyses. Relationships between transformed data and amino acid substitution matrices show significant associations for polarity and codon diversity scores. Transformed alphabetic data are used in analysis of variance and discriminant analysis to study DNA binding in the basic helix-loop-helix proteins. The transformed scores offer a general solution for analyzing a wide variety of sequence analysis problems.
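
The transformation step described in the abstract, replacing each amino acid letter with its numerical scores, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `transform_sequence` and the table `PLACEHOLDER_SCORES` are hypothetical, and the numbers in the table are made-up placeholders, not the published scores. Only the count of five values per residue, one for each pattern named in the abstract, comes from the source.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL placeholder values, NOT the published scores.
# Each residue maps to five numbers standing in for the five patterns
# named in the abstract: polarity, secondary structure, molecular volume,
# codon diversity, and electrostatic charge.
PLACEHOLDER_SCORES: Dict[str, Sequence[float]] = {
    "A": (0.1, 0.2, 0.3, 0.4, 0.5),
    "R": (1.0, -1.0, 0.5, -0.5, 0.0),
    "K": (-0.2, 0.0, 0.2, 0.4, -0.6),
}


def transform_sequence(
    seq: str, scores: Dict[str, Sequence[float]]
) -> List[List[float]]:
    """Replace each residue letter with its vector of numeric scores.

    Returns one row per sequence position and one column per score,
    so the result can be passed to ordinary numerical methods.
    """
    rows: List[List[float]] = []
    for pos, residue in enumerate(seq):
        if residue not in scores:
            # Fail loudly rather than silently inventing a value.
            raise KeyError(f"no scores for residue {residue!r} at position {pos}")
        rows.append(list(scores[residue]))
    return rows


if __name__ == "__main__":
    # A three-residue toy sequence becomes a 3 x 5 numeric table.
    for row in transform_sequence("ARK", PLACEHOLDER_SCORES):
        print(row)
```

With a real score table substituted for the placeholders, the resulting rows form an ordinary numeric matrix, the kind of input that analysis of variance or discriminant analysis expects.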

Year:  2005        PMID: 15851683      PMCID: PMC1088356          DOI: 10.1073/pnas.0408677102

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205

