Renganayaki Govindarajan, Biji Christopher Leela, Achuthsankar S Nair.
Abstract
OBJECTIVE: BLOSUM matrices serve as the standard substitution matrices for many protein sequence alignment programs. They were constructed using BLOCKS version 5.0 with 27,102 BLOCKS, whereas the latest updated version, 14.3, has 6,739,916 BLOCKS. We read with interest the research article by Hess et al. (BMC Bioinform 17:189, 2016) on CorBLOSUM, which argues that an inaccuracy in the BLOSUM code affects the cluster memberships of sequences. The authors show that replacing the integer-based clustering threshold with a floating-point threshold arguably improves the performance of CorBLOSUM over the BLOSUM and RBLOSUM matrices. They compare BLOSUM62 14.3 against RBLOSUM69, with relative entropies of 0.2685 and 0.2662, respectively. The present work attempts to repeat the computation to verify the respective analog matrices.
Keywords: BLOSUM; CorBLOSUM; RBLOSUM; Sequence similarity search; Substitution matrix
Year: 2018 PMID: 29784028 PMCID: PMC5963171 DOI: 10.1186/s13104-018-3415-5
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Matrices with respective entropy values (i) reported by Hess et al., (ii) present study
| Matrix | (i) Hess et al.: Entropy | (i) Hess et al.: Bit units | (ii) Present study: Entropy | (ii) Present study: Bit units |
|---|---|---|---|---|
| BLOSUM50 5.0 | 0.4808 | 1/3 | 0.4808 | 1/3 |
| RBLOSUM52 5.0 | 0.4918 | 1/3 | 0.4918 | 1/3 |
| BLOSUM62 5.0 | 0.6979 | 1/2 | 0.6979 | 1/2 |
| RBLOSUM64 5.0 | 0.7003 | 1/2 | 0.7003 | 1/2 |
| BLOSUM50 13+ | 0.2430 | 1/4 | 0.1922 | 1/5 |
| RBLOSUM59 13+ | 0.2410 | 1/4 | 0.2411 | 1/4 |
| BLOSUM62 13+ | 0.3672 | 1/3 | 0.3173 | 1/3 |
| RBLOSUM69 13+ | 0.3601 | 1/3 | 0.3601 | 1/3 |
| BLOSUM50 14.3 | 0.1509 | 1/5 | 0.1198 | 1/6 |
| RBLOSUM59 14.3 | 0.1477 | 1/5 | 0.1537 | 1/5 |
| BLOSUM62 14.3 | 0.2685 | 1/4 | 0.2360 | 1/4 |
| RBLOSUM69 14.3 | 0.2662 | 1/4 | 0.2773 | 1/4 |
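The entropy columns above report the relative entropy of each substitution matrix. As a minimal sketch, assuming the usual definition for BLOSUM-style matrices (observed frequencies q_ij of unordered residue pairs compared against the frequencies expected from the residue marginals), the quantity could be computed as below. The function name `relative_entropy` and the toy two-letter alphabet are illustrative assumptions and are not taken from the BLOSUM, RBLOSUM, or CorBLOSUM code.

```python
import math


def relative_entropy(pair_freqs):
    """Relative entropy (in bits) of a set of observed pair frequencies.

    pair_freqs maps an unordered residue pair (a, b) to its observed
    frequency q_ab; each unordered pair appears once and the values sum to 1.
    """
    # Marginal residue frequencies: p_a = q_aa + sum over b != a of q_ab / 2
    p = {}
    for (a, b), q in pair_freqs.items():
        if a == b:
            p[a] = p.get(a, 0.0) + q
        else:
            p[a] = p.get(a, 0.0) + q / 2
            p[b] = p.get(b, 0.0) + q / 2

    h = 0.0
    for (a, b), q in pair_freqs.items():
        if q == 0:
            continue  # a zero-frequency pair contributes nothing
        # Expected frequency of the unordered pair under independence
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        # q * log2(q / e) is the pair frequency times its log-odds score
        h += q * math.log2(q / e)
    return h


# Toy example (hypothetical frequencies, not BLOCKS data)
toy = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.2}
toy_entropy = relative_entropy(toy)
```

When the observed pair frequencies equal the expected ones, every log-odds term is zero and the relative entropy is zero; stronger conservation within pairs raises it. This is consistent with the lower entropy values listed for matrices built from the larger BLOCKS releases.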
Fig. 1 CVE plot showing the performance difference between the matrices for entropy levels 50 and 62 using PSCE under linear normalization. (a) Performance difference between three matrix families for the 50 entropy level using PSCE under linear normalization. (b) Performance difference between three matrix families for the 62 entropy level using PSCE under linear normalization.
Matrix family with coverage under quadratic normalization for the gap opening and extension penalty of 12 and 1 respectively
| Matrix family | Matrix number | Coverage |
|---|---|---|
| BLOSUM | 50 | 0.370823 |
| RBLOSUM | 56 | 0.418540 |
| CorBLOSUM | 57 | 0.413656 |
| BLOSUM | 62 | 0.436900 |
| RBLOSUM | 66 | 0.451182 |
| CorBLOSUM | 67 | 0.438116 |