Martin Hess, Frank Keul, Michael Goesele, Kay Hamacher.
Abstract
BACKGROUND: Since their publication in 1992, BLOSUM matrices have been among the most commonly used substitution matrix series for protein homology search and sequence alignment. In 2008, Styczynski et al. discovered miscalculations in the clustering step of the matrix computation. Still, the RBLOSUM64 matrix based on the corrected BLOSUM code was reported to perform worse than BLOSUM62 at a statistically significant level. Here, we present a further correction of the (R)BLOSUM code and provide a thorough performance analysis of BLOSUM-, RBLOSUM- and the newly derived CorBLOSUM-type matrices. We assess the homology search performance of these matrix types, derived from three different BLOCKS databases, on all versions of the ASTRAL20, ASTRAL40 and ASTRAL70 subsets, resulting in 51 different benchmarks in total. Our analysis focuses on two of the most popular BLOSUM matrices: BLOSUM50 and BLOSUM62.
Keywords: ASTRAL; BLOCKS 13+; BLOCKS 14.3; BLOSUM; CorBLOSUM; Correction; Homologous sequence search; Performance evaluation; RBLOSUM; Substitution matrix
Year: 2016 PMID: 27122148 PMCID: PMC4849092 DOI: 10.1186/s12859-016-1060-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of the matrices assessed in this study and their respective clustering values, relative entropies and corresponding scale in bits per unit
| Matrix | Clust. value | Rel. entropy | Bit units |
|---|---|---|---|
| BLOSUM50 5.0 | 50 | 0.4808 | 1/3 |
| RBLOSUM52 5.0 | 52 | 0.4918 | 1/3 |
| CorBLOSUM49 5.0 | 49 | 0.4849 | 1/3 |
| BLOSUM62 5.0 | 62 | 0.6979 | 1/2 |
| RBLOSUM64 5.0 | 64 | 0.7003 | 1/2 |
| CorBLOSUM61 5.0 | 61 | 0.6939 | 1/2 |
| BLOSUM50 13+ | 50 | 0.2430 | 1/4 |
| RBLOSUM59 13+ | 59 | 0.2410 | 1/4 |
| CorBLOSUM57 13+ | 57 | 0.2479 | 1/4 |
| BLOSUM62 13+ | 62 | 0.3672 | 1/3 |
| RBLOSUM69 13+ | 69 | 0.3601 | 1/3 |
| CorBLOSUM66 13+ | 66 | 0.3653 | 1/3 |
| BLOSUM50 14.3 | 50 | 0.1509 | 1/5 |
| RBLOSUM59 14.3 | 59 | 0.1477 | 1/5 |
| CorBLOSUM57 14.3 | 57 | 0.1515 | 1/5 |
| BLOSUM62 14.3 | 62 | 0.2685 | 1/4 |
| RBLOSUM69 14.3 | 69 | 0.2662 | 1/4 |
| CorBLOSUM67 14.3 | 67 | 0.2636 | 1/4 |
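The relation between the "Rel. entropy" and "Bit units" columns can be illustrated with a short sketch. The rule used here, n = round(2 / sqrt(H)) with a minimum of 2, is an assumption: it is not stated in this record, but it happens to reproduce the scale of every row in the table above. The helper names `scale_denominator` and `scaled_score` are hypothetical and chosen for illustration only.

```python
import math

# (matrix, relative entropy H in bits, denominator n of the 1/n-bit scale),
# copied from the table above.
TABLE = [
    ("BLOSUM50 5.0", 0.4808, 3), ("RBLOSUM52 5.0", 0.4918, 3),
    ("CorBLOSUM49 5.0", 0.4849, 3), ("BLOSUM62 5.0", 0.6979, 2),
    ("RBLOSUM64 5.0", 0.7003, 2), ("CorBLOSUM61 5.0", 0.6939, 2),
    ("BLOSUM50 13+", 0.2430, 4), ("RBLOSUM59 13+", 0.2410, 4),
    ("CorBLOSUM57 13+", 0.2479, 4), ("BLOSUM62 13+", 0.3672, 3),
    ("RBLOSUM69 13+", 0.3601, 3), ("CorBLOSUM66 13+", 0.3653, 3),
    ("BLOSUM50 14.3", 0.1509, 5), ("RBLOSUM59 14.3", 0.1477, 5),
    ("CorBLOSUM57 14.3", 0.1515, 5), ("BLOSUM62 14.3", 0.2685, 4),
    ("RBLOSUM69 14.3", 0.2662, 4), ("CorBLOSUM67 14.3", 0.2636, 4),
]


def scale_denominator(entropy_bits: float) -> int:
    """Assumed heuristic (not taken from the source): n = round(2 / sqrt(H)), at least 2."""
    return max(2, round(2.0 / math.sqrt(entropy_bits)))


def scaled_score(observed: float, expected: float, n: int) -> int:
    """Integer log-odds score in 1/n-bit units: round(n * log2(observed / expected))."""
    return round(n * math.log2(observed / expected))


# Rows for which the assumed rule disagrees with the tabulated scale.
mismatches = [name for name, h, n in TABLE if scale_denominator(h) != n]

if __name__ == "__main__":
    print("rows not reproduced by the assumed rule:", mismatches)
    # A pair observed twice as often as expected scores +1 bit,
    # i.e. 2 in half-bit units and 3 in third-bit units.
    print(scaled_score(0.02, 0.01, 2), scaled_score(0.02, 0.01, 3))
```

Under this reading, a lower relative entropy leads to a finer scale (larger n), so that the integer rounding of the log-odds scores loses less information.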
Fig. 1 Comparison of matrix entries using the same clustering value 62. Shown are the differences of BLOSUM62 and RBLOSUM62 to CorBLOSUM62 for BLOCKS 5, BLOCKS 13+ and BLOCKS 14.3. Blue tiles represent matrix entries where the respective CorBLOSUM62 values are larger than entries of the compared matrix. Red tiles represent the opposite. While differences for BLOCKS 5-based substitution matrices only range from −1 to 1, the range of these differences is substantially larger for newer BLOCKS versions.
Fig. 2 Comparison of CorBLOSUM61 5.0 with BLOSUM62 5.0 and RBLOSUM64 5.0. Differences between CorBLOSUM61 5.0 and BLOSUM62 5.0 are displayed in the lower triangle and those between CorBLOSUM61 5.0 and RBLOSUM64 5.0 in the upper triangle, with CorBLOSUM61 5.0 values shown. Light gray tiles represent entries where the CorBLOSUM61 5.0 matrix is one log-odds score point higher than the compared matrix, whereas dark gray tiles represent entries where it is one point lower. Notably, the CorBLOSUM correction introduces further changes into the RBLOSUM64 5.0 matrix (upper triangle), which results in numerous value adjustments when compared to the BLOSUM62 5.0 matrix (lower triangle).
Fig. 3 Progression of the maximum achieved coverage of CorBLOSUM-, RBLOSUM- and BLOSUM-type matrices for all ASTRAL40 test databases. The upper row shows the results for the respective BLOSUM50 entropy level, the lower row for the BLOSUM62 entropy level. Insignificant coverage differences between CorBLOSUM and BLOSUM are indicated by an O, and those between CorBLOSUM and RBLOSUM by a small X above the bars. The corresponding gap parameter settings are listed in Additional file 2. Notably, the coverage of all tested substitution matrices increases dramatically with the introduction of the semi-automatic database generation of SCOPe. For the BLOSUM50 entropy level, CorBLOSUM-type matrices performed at least as well as their BLOSUM counterparts in ∼84 % of all tested scenarios and showed a similar or better performance than the RBLOSUM-type matrices in ∼49 %. For the BLOSUM62 entropy level, CorBLOSUM matrices performed at least as well as BLOSUM in ∼67 % and improved on RBLOSUM in ∼60 % of all analyzed ASTRAL40 scenarios.
Comparison of CorBLOSUM- with BLOSUM-type matrices
| BLOCKS version | ASTRAL subset | BLOSUM50 entropy level | BLOSUM62 entropy level |
|---|---|---|---|
| BLOCKS 13+ | ASTRAL20 | 94.12 % | 58.82 % |
| | ASTRAL40 | 100 % | 76.47 % |
| | ASTRAL70 | 100 % | 82.35 % |
| BLOCKS 14.3 | ASTRAL20 | 76.47 % | 76.47 % |
| | ASTRAL40 | 76.47 % | 100 % |
| | ASTRAL70 | 88.24 % | 70.59 % |
| BLOCKS 5 | ASTRAL20 | 70.59 % | 23.53 % |
| | ASTRAL40 | 76.47 % | 23.53 % |
| | ASTRAL70 | 100 % | 58.82 % |
Shown is the relative frequency, in percent, with which a CorBLOSUM matrix performed at least as well as its BLOSUM counterpart.
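The percentages in this table can be read as counts of benchmarks. The sketch below assumes 17 ASTRAL versions per subset, inferred from the 51 benchmarks over three subsets mentioned in the abstract; that figure is an inference, not a number stated in the record. It converts each percentage back to a count and pools the ASTRAL40 rows over the three BLOCKS versions, which reproduces the ∼84 % and ∼67 % quoted in the Fig. 3 caption for CorBLOSUM versus BLOSUM.

```python
# Percentages copied from the table above, keyed by (BLOCKS version, ASTRAL subset).
# Values are (BLOSUM50 entropy level, BLOSUM62 entropy level).
PERCENT = {
    ("13+", "ASTRAL20"): (94.12, 58.82),
    ("13+", "ASTRAL40"): (100.0, 76.47),
    ("13+", "ASTRAL70"): (100.0, 82.35),
    ("14.3", "ASTRAL20"): (76.47, 76.47),
    ("14.3", "ASTRAL40"): (76.47, 100.0),
    ("14.3", "ASTRAL70"): (88.24, 70.59),
    ("5", "ASTRAL20"): (70.59, 23.53),
    ("5", "ASTRAL40"): (76.47, 23.53),
    ("5", "ASTRAL70"): (100.0, 58.82),
}

# Assumption: 51 benchmarks / 3 ASTRAL subsets = 17 versions per subset.
N_VERSIONS = 17


def to_count(percent: float, n: int = N_VERSIONS) -> int:
    """Nearest number of benchmarks (out of n) matching a percentage."""
    return round(percent * n / 100)


counts = {key: tuple(to_count(p) for p in pair) for key, pair in PERCENT.items()}

# Every tabulated percentage should be recovered from its count to two decimals.
consistent = all(
    abs(100 * c / N_VERSIONS - p) < 0.006
    for key in PERCENT
    for c, p in zip(counts[key], PERCENT[key])
)

# Pool ASTRAL40 over the three BLOCKS versions (3 * 17 = 51 scenarios per level).
astral40 = [counts[key] for key in counts if key[1] == "ASTRAL40"]
pooled = tuple(
    100 * sum(row[i] for row in astral40) / (N_VERSIONS * len(astral40))
    for i in range(2)
)

if __name__ == "__main__":
    print("all percentages consistent with n = 17:", consistent)
    print("ASTRAL40 pooled share (BLOSUM50 level, BLOSUM62 level):",
          tuple(round(x, 2) for x in pooled))
```

Read this way, the lowest entry (23.53 %) corresponds to 4 of 17 benchmarks and a 100 % entry to all 17.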