AAindex: amino acid index database, progress report 2008.

Shuichi Kawashima, Piotr Pokarowski, Maria Pokarowska, Andrzej Kolinski, Toshiaki Katayama, Minoru Kanehisa.

Abstract

AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to AAindex as a new section. Accordingly, AAindex now consists of three sections: AAindex1 for amino acid indices of 20 numerical values, AAindex2 for amino acid substitution matrices and AAindex3 for statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).

Substances: Amino Acids; Proteins

Year: 2007 PMID: 17998252 PMCID: PMC2238890 DOI: 10.1093/nar/gkm998

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Protein structures and functions are determined by the combined physicochemical and biochemical properties of the 20 naturally occurring amino acids that are the building blocks of proteins. A wide variety of amino acid properties have been investigated in a large number of experimental and theoretical studies. Each such property that can be represented by a set of 20 numerical values, one per amino acid, is referred to as an amino acid index. Nakai et al. (1) collected 222 amino acid indices from the published literature, investigated the relationships among them using hierarchical cluster analysis and released the indices as an online database. In 1996, Tomii and Kanehisa (2) collected further amino acid indices to enrich the database; they also collected 42 amino acid substitution matrices from the literature and released that collection as AAindex2. The AAindex database has been continuously updated by the present authors (3,4). AAindex has been used in wide-ranging bioinformatics research on protein sequences, such as the prediction of protein subcellular localization (5), the immunogenicity of MHC class I binding peptides (6) and protein SUMO modification sites (7), and the analysis of coordinated substitutions in multiple alignments of protein sequences (8). Furthermore, there is a derivative database of AAindex (UMBC AAindex Database: http://www.evolvingcode.net:8080/aaindex/) and a web tool for visualizing relationships among AAindex entries (9). As these examples show, AAindex has become a useful resource in bioinformatics. In 2005, Pokarowski et al. (10) compared 29 published matrices of protein pairwise contact potentials, i.e. energy functions obtained from statistical analysis of protein structures. Such potentials have long been used to predict protein structures in silico. Pokarowski and coworkers showed that each of these contact potentials is similar to one of two popular matrices derived by Miyazawa and Jernigan (11).
Recently, working on 29 mostly new amino acid substitution matrices and 5 contact potentials, the same team (12) obtained a grouping of substitution matrices similar to that of Tomii and Kanehisa (2). Moreover, they found intermediate links between substitution matrices and contact potentials: matrices and potentials that exhibit mutual correlations of at least 0.8. In both works (10,12), Pokarowski and coworkers approximated the matrices by simple functions of amino acid indices, which helps in understanding both the exchangeability of amino acids and the residue–residue interactions in proteins. These relations among substitution matrices, contact potentials and amino acid indices motivated us to extend the AAindex database. In the present work, we have compiled the contact potentials collected in the study by Pokarowski et al. (10) as a new section of the AAindex database, named AAindex3. We believe this extension increases the utility of AAindex for the bioinformatics study of proteins. In this paper we report the current status of the three sections of AAindex.
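As a minimal illustration of how a single amino acid index is commonly turned into sequence features, the Python sketch below averages index values over a sliding window. It assumes the Kyte–Doolittle hydropathy scale as the index (to our knowledge this scale is stored in AAindex1 as KYTJ820101); the function name `window_profile` and the default window of 9 residues are illustrative choices, not part of AAindex.

```python
# Kyte-Doolittle hydropathy values, one per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq: str, index: dict[str, float], window: int = 9) -> list[float]:
    """Return the mean index value of every full window of `window` residues.

    Raises KeyError for residues the index does not cover (e.g. 'X').
    """
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # Map each residue to its index value, then average each window.
    values = [index[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

For example, `window_profile("AILVK", KYTE_DOOLITTLE, window=3)` returns three averages, the middle one being (4.5 + 3.8 + 4.2)/3 ≈ 4.17 for the window ILV. Any other AAindex1 entry can be substituted for the hydropathy dictionary, provided it covers every residue in the sequence.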

THE CURRENT DATABASE

AAindex is released approximately once a year; the latest version is release 9.0. AAindex is a flat-file database that consists of three sections: AAindex1 for amino acid indices, AAindex2 for amino acid substitution matrices and AAindex3 for amino acid contact potentials. The contents of the three sections are as follows.

AAindex1

AAindex1 currently contains 544 amino acid indices. Each entry consists of an accession number, a short description of the index, the reference information and the numerical values of the property for the 20 amino acids; in some instances, values are not reported for all 20 amino acids. Each entry now links to the corresponding PubMed entry instead of to the LitDB literature database (13) that we originally used. In addition, each entry contains cross-links to the other entries whose correlation coefficient with it has an absolute value of 0.8 or larger. These links enable users to identify sets of entries describing similar properties. To give an overview of the relationships among the current amino acid indices, we constructed the minimum spanning tree of amino acid indices by the procedure described by Tomii and Kanehisa (2) (Figure 1). In Figure 1, each rectangle represents an index. The colored rectangles are the 402 indices classified into six groups by Tomii and coworkers. The indices in Tomii's classification still form clusters, whereas newly added indices are distributed across the whole tree; that is, indices for many different kinds of properties have been added to AAindex.
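The cross-linking rule described above can be sketched as follows. This is a hypothetical reimplementation, not the code used to build AAindex: it assumes that the Pearson correlation between two indices is computed only over amino acids for which both indices report a value, and the dictionary layout (index name mapped to a residue-to-value dict) and the `min_shared` cutoff are our own illustrative choices.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant index has no defined correlation; treat as unlinked
    return sxy / math.sqrt(sxx * syy)


def cross_links(
    indices: dict[str, dict[str, float]],
    threshold: float = 0.8,
    min_shared: int = 3,
) -> list[tuple[str, str, float]]:
    """List index pairs whose correlation has absolute value >= threshold."""
    links = []
    for a, b in combinations(sorted(indices), 2):
        # Compare only residues that have a value in both indices (assumption).
        shared = sorted(indices[a].keys() & indices[b].keys())
        if len(shared) < min_shared:
            continue
        r = pearson([indices[a][k] for k in shared], [indices[b][k] for k in shared])
        if abs(r) >= threshold:
            links.append((a, b, r))
    return links
```

Using the absolute value means that strongly anti-correlated indices (for instance, a hydrophobicity scale and a hydrophilicity scale) are linked as well, which matches the "absolute value" wording of the rule.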

Figure 1.

The minimum spanning tree of the amino acid indices stored in AAindex1 release 9.0. Each rectangle is an amino acid index. Colored nodes represent the indices classified by Tomii and Kanehisa (2). Red: alpha and turn propensities; yellow: beta propensity; green: composition; blue: hydrophobicity; cyan: physicochemical properties; gray: other properties. White: indices added to AAindex after release 3.0, the release described by Tomii and Kanehisa (2).
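A minimum spanning tree like the one in Figure 1 can be sketched with Prim's algorithm. The distance used here, d = 1 − |r| for two indices with correlation r, is an assumption for illustration; the procedure of Tomii and Kanehisa (2) may define distances differently. The inputs `names` and `corr` (a square correlation matrix in the same order as `names`) are hypothetical.

```python
import math


def minimum_spanning_tree(names: list[str], corr: list[list[float]]) -> list[tuple[str, str, float]]:
    """Prim's algorithm on the complete graph with distance d(i, j) = 1 - |r_ij|."""
    n = len(names)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each node to the tree
    parent = [-1] * n      # tree node that achieves that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((names[parent[u]], names[u], best[u]))
        # Update the distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = 1.0 - abs(corr[u][v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

This dense O(n²) version suits a complete graph of a few hundred indices; for n indices it returns n − 1 edges, each joining an index to the one it is most strongly correlated with among those already in the tree.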

AAindex2

AAindex2 currently contains 94 amino acid substitution matrices: 67 symmetric and 27 non-symmetric. The entry format is almost the same as that of AAindex1, except that an entry contains 210 numerical values (20 diagonal and 20 × 19/2 = 190 off-diagonal elements) for a symmetric matrix and 400 or more numerical values for a non-symmetric matrix (some matrices include a gap or distinguish two states of cysteine). In the previous release, each symmetric matrix, which is triangular in shape, was folded into a 10 × 21 table to save space, and columns were separated by space characters. In the present release, symmetric matrices are not folded, and the column delimiter has been changed to a tab character for easier parsing of the entry.
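Because symmetric matrices are now stored unfolded with tab-separated columns, they can be read back as a lower triangle in which row i holds i + 1 values, giving 20 × 21/2 = 210 values for 20 amino acids. The sketch below assumes exactly that layout and, by default, the row order ARNDCQEGHILKMFPSTWYV; in a real entry the order should be taken from the entry itself, and non-numeric placeholders for missing values are not handled. The function name `parse_lower_triangle` is hypothetical.

```python
def parse_lower_triangle(
    lines: list[str], order: str = "ARNDCQEGHILKMFPSTWYV"
) -> dict[tuple[str, str], float]:
    """Expand a tab-delimited lower triangle (row i has i + 1 values) to a full symmetric lookup."""
    rows = [line for line in lines if line.strip()]
    if len(rows) != len(order):
        raise ValueError(f"expected {len(order)} rows, got {len(rows)}")
    matrix: dict[tuple[str, str], float] = {}
    for i, line in enumerate(rows):
        values = [float(v) for v in line.split("\t") if v.strip()]
        if len(values) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} values, got {len(values)}")
        for j, value in enumerate(values):
            matrix[(order[i], order[j])] = value
            matrix[(order[j], order[i])] = value  # mirror into the upper triangle
    return matrix
```

Storing both orientations makes lookups order-independent, so `matrix[("W", "A")]` and `matrix[("A", "W")]` return the same score.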

AAindex3

AAindex3 currently contains 47 amino acid contact potential matrices: 44 symmetric and 3 non-symmetric. The entry format is almost the same as that of AAindex2. A sample AAindex3 entry is shown in Figure 2.
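Contact potentials are typically used to score a structure by summing pairwise energies over residues in contact. A minimal sketch, assuming contacts are given as residue index pairs (for example, residues within some distance cutoff) and a symmetric potential that may store each pair in only one orientation; it is therefore not suitable as written for the non-symmetric matrices. The energies in `TOY_POTENTIAL` are hypothetical and are not taken from any AAindex3 entry.

```python
# Hypothetical toy energies for illustration only (not from any AAindex3 entry).
TOY_POTENTIAL = {("L", "L"): -1.5, ("L", "K"): 0.2, ("K", "K"): 0.5}


def contact_energy(
    seq: str,
    contacts: list[tuple[int, int]],
    potential: dict[tuple[str, str], float],
) -> float:
    """Sum the pair energies e(seq[i], seq[j]) over all residue contacts (i, j)."""
    total = 0.0
    for i, j in contacts:
        a, b = seq[i], seq[j]
        # A symmetric potential may list each residue pair in only one orientation.
        key = (a, b) if (a, b) in potential else (b, a)
        total += potential[key]
    return total
```

With the toy values, the sequence `"LKLL"` and contacts `[(0, 2), (0, 3), (1, 2)]` give −1.5 − 1.5 + 0.2 = −2.8; lower totals indicate more favorable contacts under the chosen potential.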

Figure 2.

An example database entry in AAindex3. Each record of an entry is identified by a one-letter code: H, accession number; D, definition of the entry; R, PMID identifier; A, author(s); T, title of the journal article; J, journal citation information; M, the data values in the specified order.
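The record layout in Figure 2 lends itself to a simple line-oriented parser. The sketch below assumes that each record starts with its one-letter code followed by a space, that continuation lines begin with whitespace and that an entry ends with a `//` line; these layout details are assumptions about the flat file rather than a specification. The sample entry `TOY000001` is hypothetical.

```python
def parse_entry(text: str) -> dict[str, list[str]]:
    """Group the lines of one entry by their one-letter record code (H, D, R, A, T, J, M, ...)."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("//"):
            break  # assumed end-of-entry marker
        if not line.strip():
            continue
        if line[0].isspace():
            # Continuation line: belongs to the most recent record code.
            if current is None:
                raise ValueError("continuation line before any record code")
            records[current].append(line.strip())
        else:
            code, _, rest = line.partition(" ")
            current = code
            records.setdefault(code, []).append(rest.strip())
    return records


# Hypothetical sample entry, used only to exercise the parser.
SAMPLE = """H TOY000001
D Hypothetical contact potential
M rows = ARN, cols = ARN
  -1.0
  0.5 -0.3
//
"""
```

Here `parse_entry(SAMPLE)["M"]` collects the header of the M record followed by its data rows, which can then be handed to a matrix reader.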

AVAILABILITY

The AAindex database can be retrieved through the DBGET/LinkDB system (14) of the Japanese GenomeNet service (15) at http://www.genome.jp/dbget-bin/www_bfind?aaindex. The DBGET/LinkDB system integrates most of the major molecular biology databases and is especially suited to following hyperlinks to related entries, both within AAindex and in other databases. Alternatively, the database may be downloaded and used locally; the URL for anonymous FTP is ftp://ftp.genome.jp/pub/db/community/aaindex/. BioRuby, a bioinformatics library for the Ruby programming language, provides useful functions for handling the AAindex database (http://bioruby.org/), and EMBOSS (16) provides a program to extract index data from AAindex entries. Users are requested to cite this article when making use of the AAindex database.
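For local use, a downloaded flat file can be split into entries and indexed by accession number. A minimal sketch, assuming entries are terminated by `//` lines and that each entry's accession appears on its H record; the local file name in the usage comment (`aaindex3`) is an assumption about the contents of the FTP directory, and the function names are hypothetical.

```python
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each entry in a flat file whose entries end with '//' lines."""
    buf: list[str] = []
    for line in lines:
        if line.startswith("//"):
            if buf:
                yield "".join(buf)
            buf = []
        else:
            buf.append(line)
    if any(part.strip() for part in buf):
        yield "".join(buf)  # tolerate a missing final terminator


def index_by_accession(lines: Iterable[str]) -> dict[str, str]:
    """Map each entry's accession (taken from its H record) to the entry text."""
    index: dict[str, str] = {}
    for entry in iter_entries(lines):
        for line in entry.splitlines():
            if line.startswith("H "):
                index[line[2:].strip()] = entry
                break
    return index


# Usage with a local copy downloaded from the FTP site (file name assumed):
# with open("aaindex3") as fh:
#     entries = index_by_accession(fh)
```

Because `iter_entries` consumes lines lazily, an open file handle can be passed directly without reading the whole file into memory first.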

REFERENCES

1. AAindex: amino acid index database.

Authors: S Kawashima; M Kanehisa
Journal: Nucleic Acids Res Date: 2000-01-01

2. Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues.

Authors: S Miyazawa; R L Jernigan
Journal: Proteins Date: 1999-01-01

3. CRASP: a program for analysis of coordinated substitutions in multiple alignments of protein sequences.

Authors: Dmitry A Afonnikov; Nikolay A Kolchanov
Journal: Nucleic Acids Res Date: 2004-07-01

4. Inferring ideal amino acid interaction forms from statistical protein contact potentials.

Authors: Piotr Pokarowski; Andrzej Kloczkowski; Robert L Jernigan; Neha S Kothari; Maria Pokarowska; Andrzej Kolinski
Journal: Proteins Date: 2005-04-01

5. AAindex: Amino Acid Index Database.

Authors: S Kawashima; H Ogata; M Kanehisa
Journal: Nucleic Acids Res Date: 1999-01-01

6. DBGET/LinkDB: an integrated database retrieval system.

Authors: W Fujibuchi; S Goto; H Migimatsu; I Uchiyama; A Ogiwara; Y Akiyama; M Kanehisa
Journal: Pac Symp Biocomput Date: 1998

7. Linking databases and organisms: GenomeNet resources in Japan.

Authors: M Kanehisa
Journal: Trends Biochem Sci Date: 1997-11

8. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins.

Authors: K Tomii; M Kanehisa
Journal: Protein Eng Date: 1996-01

9. Cluster analysis of amino acid indices for prediction of protein structure and function.

Authors: K Nakai; A Kidera; M Kanehisa
Journal: Protein Eng Date: 1988-07

10. Ideal amino acid exchange forms for approximating substitution matrices.

Authors: Piotr Pokarowski; Andrzej Kloczkowski; Szymon Nowakowski; Maria Pokarowska; Robert L Jernigan; Andrzej Kolinski
Journal: Proteins Date: 2007-11-01
