Karel Zimmermann, Jean-François Gibrat.
Abstract
BACKGROUND: Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsic properties consistent with the substitution matrices.
Year: 2010 PMID: 20047649 PMCID: PMC3098074 DOI: 10.1186/1471-2105-11-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Top panel: the blue curve is the plot of the substitution matrix elements (210 elements of the lower triangular BLOSUM62, non-rounded, expressed in bit units) sorted by increasing value; the red curve shows their approximations. Bottom panel: the blue curve is the same as above but with centered matrix elements (i.e., the mean of the shifted BLOSUM62 matrix is zero); the red curve is the approximation computed with the centered vectors, as described in the text. The x-axis corresponds to the sorted 210 lower triangular matrix elements, e.g., the 210th element is the diagonal element corresponding to tryptophan, the largest element in the BLOSUM62 matrix. The y-axis corresponds to the values of the matrix elements. Notice that the correlation coefficients are very similar in both cases (0.989 for the curves of the top panel vs 0.998 for the curves of the bottom panel).
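The comparison in Figure 1 can be illustrated with a minimal sketch. It assumes that the approximation of a matrix element is the scalar product of two amino acid vectors and that centering subtracts the mean taken over the full matrix; neither detail is spelled out in the caption. The 3-letter matrix and the 2-D vectors are made-up toy values, not BLOSUM62 data, and all function names are hypothetical.

```python
import math

def lower_triangle(matrix):
    """Return the elements on and below the diagonal, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i + 1)]

def center(matrix):
    """Shift every element so the mean over the full matrix is zero (assumed definition)."""
    n = len(matrix)
    mean = sum(map(sum, matrix)) / (n * n)
    return [[x - mean for x in row] for row in matrix]

def gram(vectors):
    """Matrix of scalar products between all pairs of vectors (assumed approximation)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy symmetric "substitution matrix" for a 3-letter alphabet (made-up numbers).
toy_matrix = [[2.0, -1.0, 0.5],
              [-1.0, 3.0, -0.5],
              [0.5, -0.5, 1.0]]
# Toy 2-D vectors, one per letter (made-up numbers).
toy_vectors = [[1.4, 0.2], [-0.6, 1.6], [0.4, -0.1]]

exact = lower_triangle(toy_matrix)
approx = lower_triangle(gram(toy_vectors))
# Sort the pairs by the exact value, mirroring the x-axis ordering of the figure.
pairs = sorted(zip(exact, approx))
r = pearson([e for e, _ in pairs], [a for _, a in pairs])
print(f"correlation between matrix elements and dot products: {r:.3f}")
```

For a 20-letter alphabet, `lower_triangle` yields 20 × 21 / 2 = 210 elements, which matches the 210 points on the x-axis.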
Figure 2. Plot of the matrix mean (blue), matrix relative entropy (red) and amino acid galaxy radius. The x-axis corresponds to BLOSUM matrix indices, from 30 to 100 in increments of 5; the y-axis corresponds to the values.
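Two of the quantities plotted in Figure 2 can be sketched as follows. The sketch assumes the standard log-odds form of a substitution matrix, s_ij = log2(q_ij / (p_i p_j)), with relative entropy H = Σ q_ij s_ij, and it assumes that "matrix mean" is the unweighted mean of the elements; the page does not give either definition. The 2-letter target frequencies are made-up toy values, and the function names are hypothetical.

```python
import math

def log_odds_bits(q, p):
    """s_ij = log2(q_ij / (p_i * p_j)): log-odds scores in bit units."""
    n = len(p)
    return [[math.log2(q[i][j] / (p[i] * p[j])) for j in range(n)] for i in range(n)]

def relative_entropy(q, s):
    """H = sum_ij q_ij * s_ij: expected score over target-frequency pairs."""
    n = len(s)
    return sum(q[i][j] * s[i][j] for i in range(n) for j in range(n))

def matrix_mean(s):
    """Unweighted mean of all matrix elements (an assumed definition)."""
    n = len(s)
    return sum(map(sum, s)) / (n * n)

# Made-up target frequencies for a 2-letter alphabet; the entries sum to 1.
q = [[0.4, 0.1],
     [0.1, 0.4]]
# Background frequencies taken as the row sums of q.
p = [sum(row) for row in q]

s = log_odds_bits(q, p)
H = relative_entropy(q, s)
mean = matrix_mean(s)
print(f"relative entropy = {H:.4f} bits, matrix mean = {mean:.4f} bits")
```

In this toy case the relative entropy is positive while the unweighted mean is negative, because mismatches score below zero and are weighted equally with matches in the mean.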
Figure 3. Three-dimensional projection of the (non-rounded) BLOSUM62 amino acid galaxy together with its physicochemical characteristics. Property vectors are projected on the left, bottom and rear faces of the parallelepiped. The values on the X, Y, Z axes correspond to the first 3 components of the 20 amino acid vectors.
Clustering of amino acids for BLOSUM62 matrix by the k-means algorithm.
| Number of clusters | Clusters (separated by •) |
|---|---|
| 2 | C I L M V F W Y • A G P S T D E N Q K R H |
| 3 | F W Y • C I L M V • A G P S T D E N Q K R H |
| 4 | F W Y • C I L M V • A G P S T • D E N Q K R H |
| 5 | F W Y • C I L M V • A G P S T • D E N Q K R • H |
| 6 | F W Y • C I L M V • A T • G P S • D E N Q K R • H |
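As a rough illustration of how such a clustering can be produced, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is not the authors' implementation: the points are made-up 2-D toy data rather than amino acid vectors, and seeding the centroids with the first k points is an assumption chosen only to keep the example deterministic.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the algorithm has converged
        assignment = new_assignment
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up 2-D points with hypothetical labels; two visually separate groups.
labels = ["a", "b", "c", "d", "e", "f"]
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]

assignment, centroids = kmeans(points, k=2)
clusters = [[lab for lab, a in zip(labels, assignment) if a == c] for c in range(2)]
print(" • ".join(" ".join(cluster) for cluster in clusters))
```

Running the same routine with k from 2 to 6 on a set of 20 vectors would give one row per k, in the layout of the table above. Results of k-means generally depend on the initial centroids, so a practical run would typically try several initializations.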
Contributions of the physicochemical properties to BLOSUM matrices.
| Properties | 1rsvol | 2chrg | 3achrg | 4awrat | 5arom | 6hdrp | 7kbulk | 8khdr2 | 9khdr3 |
|---|---|---|---|---|---|---|---|---|---|
| BLOSUM30 | 12.0 | 5.7 | 5.4 | 10.9 | 13.4 | 12.0 | 11.8 | 12.7 | 11.7 |
| BLOSUM62 | 12.7 | 6.0 | 7.6 | 9.6 | 14.2 | 17.7 | 12.1 | 15.6 | 17.1 |
| BLOSUM100 | 12.0 | 6.4 | 7.5 | 8.9 | 13.1 | 16.6 | 11.4 | 14.7 | 16.2 |
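
| Properties | 10khdr4 | 11kbpr1 | 12kbpr2 | 13kaprf | 14 | 15 | 16hydro | 17mass | 18rand |
|---|---|---|---|---|---|---|---|---|---|
| BLOSUM30 | 11.5 | 12.3 | 9.2 | 5.3 | 7.2 | 9.7 | 10.6 | 11.6 | 5.2 ± 1.4 |
| BLOSUM62 | 17.2 | 15.5 | 12.5 | 5.6 | 9.0 | 12.8 | 13.9 | 11.6 | 5.2 ± 1.4 |
| BLOSUM100 | 16.2 | 14.9 | 12.4 | 5.6 | 8.8 | 12.2 | 13.2 | 10.8 | 5.2 ± 1.3 |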
(Contributions are in %; see Eq. 9.) The property categories are: volume (1rsvol), charge (2chrg, 3achrg), aromaticity (5arom), hydrophobicity (6hdrp, 8khdr2, 9khdr3, 10khdr4, 16hydro), bulkiness (7kbulk), mass (17mass), α-propensity (13kaprf), β-propensity (11kbpr1, 12kbpr2). The last column, 18rand, is a simulated "random" property.
Figure 4. Plot of the matrix mean (blue), matrix relative entropy (red) and amino acid galaxy radius. As explained in the text, the observed lack of monotonicity of the matrix mean and galaxy radius curves is probably due to the fact that rounded PAM matrices were used. The x-axis corresponds to PAM matrix indices, from 10 to 500 in increments of 10; the y-axis corresponds to the values.