Fan Yang, Ying-Ying Xu, Hong-Bin Shen.
Abstract
Predicting the subcellular location of human proteins provides critical knowledge for understanding protein function. Given the significant progress in digital microscopy, automated image-based classification of protein subcellular location is urgently needed. In this paper, we investigate more representative image features that can be used effectively on multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features: the completed local binary pattern, the local tetra pattern, and the standard local binary pattern. In our experiments with binary relevance multilabel machine learning models, the completed local binary pattern and the local tetra pattern were more discriminative for describing IHC images than the traditional local binary pattern descriptor. We also studied the combination of these two novel local pattern features with conventional global texture features. The improved performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary and can therefore improve classification accuracy.
Year: 2014 PMID: 25050396 PMCID: PMC4094881 DOI: 10.1155/2014/429049
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1. Illustration of how to calculate LBP and CLBP features for a pixel with an 8-pixel neighborhood. (a) The standard LBP operator (see (1)). (b) The framework of the CLBP operator (see (3) and (5)). (c) Example of obtaining the sign and magnitude pattern components from CLBP (see (1) and (4)).
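The operators named in Figure 1 can be sketched for a single 3×3 patch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighbor ordering, the function names, and the two thresholds passed to `clbp_components` are hypothetical choices made here (in the usual CLBP formulation the magnitude threshold is the mean magnitude over the image and the center threshold is the mean gray level, but equations (1) to (5) are not reproduced in this record).

```python
def neighbors_3x3(patch):
    """Return the 8 neighbors of the center pixel of a 3x3 patch.

    Assumed ordering: start at the right-hand neighbor and walk
    counter-clockwise. Any fixed ordering works if used consistently.
    """
    coords = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    return [patch[r][c] for r, c in coords]


def lbp_code(patch):
    """Standard LBP: threshold each neighbor against the center pixel
    and read the resulting 8 bits as a binary number."""
    center = patch[1][1]
    bits = [1 if g >= center else 0 for g in neighbors_3x3(patch)]
    return sum(bit << p for p, bit in enumerate(bits))


def clbp_components(patch, magnitude_threshold, center_threshold):
    """CLBP decomposition of the local differences into a sign pattern,
    a magnitude pattern, and a center bit.

    Both thresholds are supplied by the caller (assumption of this sketch).
    """
    center = patch[1][1]
    diffs = [g - center for g in neighbors_3x3(patch)]
    sign_bits = [1 if d >= 0 else 0 for d in diffs]
    magnitude_bits = [1 if abs(d) >= magnitude_threshold else 0 for d in diffs]
    center_bit = 1 if center >= center_threshold else 0
    return sign_bits, magnitude_bits, center_bit


# Illustrative patch (values invented for this example).
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)
sign_bits, magnitude_bits, center_bit = clbp_components(patch, 2, 5)
```

The sign pattern returned by `clbp_components` carries the same information as the standard LBP code, which is why CLBP can be read as LBP plus two additional components.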
Figure 2. Illustration of the tetra pattern based on one of the four possible directions of each center pixel. (a) The directions of the center pixel based on (7). (b) Example of calculating the tetra pattern (see (8)-(9)) when the center-pixel direction defined from (7) is “1,” using the directions of the neighbors. Light wheat represents the direction of the center pixel and cyan represents its neighborhood pixels.
Figure 3. Example of obtaining the tetra and magnitude patterns. To generate the tetra pattern, each bit is coded with the direction of the neighbor when the directions of the center pixel and the neighbor differ, and with “0” otherwise (shown in the red rounded rectangle). The tetra pattern for each direction is then converted to three binary patterns, “Pattern 1” to “Pattern 3,” shown in the bottom purple rounded rectangle. To generate the magnitude pattern, defined in (11), each bit is coded with “1” when the magnitude of the center pixel is less than that of its neighbor, and with “0” otherwise; the binary magnitude coding is also given in the bottom purple rounded rectangle.
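The coding rules stated in the Figure 3 caption can be sketched as below. This is a hedged illustration only: the function names and the example directions and magnitudes are invented here, and the computation of the directions themselves (equation (7)) is taken as given input rather than reproduced.

```python
def tetra_pattern(center_direction, neighbor_directions):
    """Code each neighbor with its own direction when it differs from
    the center direction, and with 0 when the two directions agree."""
    return [0 if d == center_direction else d for d in neighbor_directions]


def split_into_binary_patterns(center_direction, tetra):
    """Convert one tetra pattern into three binary patterns, one for each
    of the three directions that differ from the center direction."""
    other_directions = [d for d in (1, 2, 3, 4) if d != center_direction]
    return {d: [1 if t == d else 0 for t in tetra] for d in other_directions}


def magnitude_pattern(center_magnitude, neighbor_magnitudes):
    """Code a bit as 1 when the center magnitude is less than the
    neighbor magnitude, and as 0 otherwise."""
    return [1 if center_magnitude < m else 0 for m in neighbor_magnitudes]


# Hypothetical inputs: center direction 1 and eight neighbor directions.
center_direction = 1
neighbor_directions = [3, 1, 2, 4, 1, 3, 2, 2]
tetra = tetra_pattern(center_direction, neighbor_directions)
binary_patterns = split_into_binary_patterns(center_direction, tetra)

# Hypothetical magnitudes for the center pixel and its eight neighbors.
magnitudes = magnitude_pattern(5.0, [9.2, 4.1, 5.0, 7.3, 2.0, 6.1, 5.5, 1.0])
```

Each of the three binary patterns, and the magnitude pattern, can then be read as an 8-bit code and accumulated into a histogram in the same way as an LBP code.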
Comparison of SDA results derived from combinations of different local pattern features and global features.
| DB | Fold | 1096-dimensional SLFs_LBP feature fed into SDA: selected features | SLFs_LBP: local pattern proportion | 1040-dimensional SLFs_CLBP feature fed into SDA: selected features | SLFs_CLBP: local pattern proportion | 1607-dimensional SLFs_LTrP feature fed into SDA: selected features | SLFs_LTrP: local pattern proportion |
|---|---|---|---|---|---|---|---|
| db1 | fold 1 | 55 (D1-H30-L24) | 44% | 50 (D1-H25-C24) | 48% | 69 (D2-H21-T46) | 67% |
| db1 | fold 2 | 48 (D3-H22-L23) | 48% | 40 (D2-H19-C19) | 48% | 59 (D3-H22-T34) | 58% |
| db2 | fold 1 | 66 (D1-H39-L26) | 39% | 49 (D1-H32-C16) | 33% | 70 (D2-H29-T39) | 56% |
| db2 | fold 2 | 49 (D3-H21-L25) | 51% | 42 (D3-H23-C16) | 38% | 52 (D2-H17-T33) | 63% |
| db3 | fold 1 | 58 (D2-H28-L28) | 48% | 50 (D1-H27-C22) | 44% | 66 (D2-H20-T44) | 67% |
| db3 | fold 2 | 34 (D3-H13-L18) | 53% | 44 (D3-H23-C18) | 41% | 45 (D2-H12-T31) | 69% |
| db4 | fold 1 | 53 (D1-H28-L24) | 45% | 53 (D2-H27-C24) | 45% | 69 (D22-H3-T44) | 64% |
| db4 | fold 2 | 39 (D1-H16-L22) | 56% | 42 (D2-H24-C16) | 38% | 48 (D1-H16-T31) | 65% |
| db5 | fold 1 | 53 (D1-H25-L27) | **51%** | 49 (D2-H27-C20) | 41% | 64 (D1-H24-T39) | 61% |
| db5 | fold 2 | 44 (D3-H15-L26) | 59% | 42 (D3-H19-C20) | 48% | 46 (D1-H11-T34) | 74% |
| db6 | fold 1 | 49 (D1-H22-L26) | 53% | 46 (D2-H23-C21) | 46% | 68 (D3-H20-T45) | **66%** |
| db6 | fold 2 | 40 (D3-H22-L15) | 38% | 46 (D3-H27-C16) | 35% | 45 (D1-H13-T31) | 66% |
| db7 | fold 1 | 56 (D1-H32-L23) | 41% | 49 (D1-H28-C10) | 20% | 65 (D3-H19-T43) | 66% |
| db7 | fold 2 | 39 (D3-H18-L18) | 46% | 41 (D2-H17-C22) | 54% | 42 (D1-H12-T29) | 69% |
| db8 | fold 1 | 53 (D1-H26-L26) | 49% | 44 (D2-H23-C19) | 43% | 61 (D1-H17-T43) | 70% |
| db8 | fold 2 | 40 (D2-H14-L24) | 60% | 42 (D2-H21-C19) | 45% | 51 (D1-H13-T37) | 73% |
| db9 | fold 1 | 54 (D1-H29-L24) | 44% | 51 (D3-H31-C17) | 33% | 67 (D3-H23-T41) | 61% |
| db9 | fold 2 | 43 (D2-H17-L24) | 56% | 43 (D3-H23-C17) | 40% | 54 (D2-H16-T36) | 67% |
| db10 | fold 1 | 55 (D1-H24-L30) | 55% | 45 (D1-H25-C19) | **42%** | 65 (D2-H23-T40) | 62% |
| db10 | fold 2 | 46 (D2-H17-L27) | 59% | 47 (D3-H19-C24) | 51% | 52 (D2-H15-T35) | 67% |
D, H, L, C, and T denote the DNA-protein overlap feature, Haralick feature, LBP, CLBP, and LTrP, respectively. Each count is the total number of features selected by SDA, followed in parentheses by its breakdown by feature type. DB denotes the Daubechies wavelets with 10 different vanishing-moment lengths (db1 to db10). All percentages denote the proportion of local pattern features in the whole selected feature set, and boldface type denotes the value closest to the mean of that column.
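The percentages in this table can be checked directly from the count entries. The snippet below is a small sketch, assuming the entry format `total (D…-H…-X…)` seen in the table and round-half-up rounding; the function name is hypothetical.

```python
import re


def local_feature_share(entry):
    """Return the percentage of local pattern features in one table entry.

    `entry` is a string such as "55 (D1-H30-L24)": the total number of
    selected features, then the counts of DNA-protein overlap (D),
    Haralick (H), and local pattern (L, C, or T) features.
    """
    match = re.fullmatch(r"\s*(\d+)\s*\(D(\d+)-H(\d+)-([LCT])(\d+)\)\s*", entry)
    if match is None:
        raise ValueError(f"unrecognized entry format: {entry!r}")
    total = int(match.group(1))
    local = int(match.group(5))
    # Round half up to a whole percentage (assumed rounding convention).
    return int(100 * local / total + 0.5)


share_lbp = local_feature_share("55 (D1-H30-L24)")    # db1, fold 1, SLFs_LBP
share_clbp = local_feature_share("40 (D2-H19-C19)")   # db1, fold 2, SLFs_CLBP
share_ltrp = local_feature_share("69 (D2-H21-T46)")   # db1, fold 1, SLFs_LTrP
```

For the three db1 entries used here, the computed shares agree with the tabulated 44%, 48%, and 67%.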
An illustration of the feature ranking obtained by SDA for the global features alone and for their combination with each of the three local pattern features, on the db6 training set of each fold of the 2-fold cross-validation.
| Feature set | Fold | Rank 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SLFs | f1 | H | D | H | H | H | H | H | H | H | H | H | H | H | H | H |
| SLFs | f2 | D | H | H | H | H | H | H | H | H | H | H | H | H | H | D |
| SLFs_LBP | f1 | H | D | H | H | L | L | L | L | H | L | H | L | L | L | L |
| SLFs_LBP | f2 | D | H | H | H | L | L | L | H | H | H | H | H | L | L | H |
| SLFs_LTrP | f1 | H | T | D | T | H | H | T | T | T | T | T | T | H | H | T |
| SLFs_LTrP | f2 | D | H | T | H | H | T | T | T | T | T | T | T | T | H | H |
| SLFs_CLBP | f1 | C | H | C | D | C | H | H | C | C | C | C | C | H | H | H |
| SLFs_CLBP | f2 | D | C | H | H | H | H | H | C | C | H | C | H | D | C | H |
f1 and f2 denote the two folds of the 2-fold cross-validation. H denotes Haralick feature, D denotes DNA-protein overlap feature, L denotes LBP feature, T denotes LTrP feature, and C denotes CLBP feature.
Comparison of subset accuracies with different local pattern features.
| Feature combination | db1 | db2 | db3 | db4 | db5 | db6 | db7 | db8 | db9 | db10 |
|---|---|---|---|---|---|---|---|---|---|---|
| SLFs | 0.3715 | | 0.3862 | 0.3864 | 0.3719 | 0.3825 | 0.3678 | 0.3749 | 0.3832 | 0.3699 |
| SLFs_LBP | 0.4170 | | 0.4134 | 0.4194 | 0.4111 | 0.4140 | 0.4065 | 0.4088 | 0.4136 | 0.4210 |
| SLFs_CLBP | 0.4463 | 0.4362 | 0.4394 | 0.4481 | 0.4334 | **0.4555** | 0.4378 | 0.4398 | 0.4513 | 0.4449 |
| SLFs_LTrP | 0.4368 | 0.4399 | 0.4283 | 0.4385 | | 0.4405 | 0.4285 | 0.4323 | 0.4354 | 0.4339 |
Boldface type denotes the maximum value of each row, which corresponds to the best subset accuracy, averaged over the two folds, among the 10 dbs.
Figure 4. The comprehensive evaluation of the entire dataset (348 proteins) based on db6 global features and local pattern features.
Comparison of five evaluation indices on the single-label samples and on the entire dataset, using BR models fed with different combinations of local pattern features and db6 global features.
| Evaluation index | Single label: SLFs | Single label: SLFs_LBP | Single label: SLFs_CLBP | Single label: SLFs_LTrP | Entire: SLFs | Entire: SLFs_LBP | Entire: SLFs_CLBP | Entire: SLFs_LTrP |
|---|---|---|---|---|---|---|---|---|
| Subset accuracy | 0.5058 | 0.5575 | 0.5872 | 0.5745 | 0.3825 | 0.4140 | 0.4555 | 0.4405 |
| Accuracy | 0.5141 | 0.5713 | 0.6153 | 0.5975 | 0.4586 | 0.4956 | 0.5451 | 0.5296 |
| Recall | 0.3855 | 0.4159 | 0.4640 | 0.4198 | 0.3528 | 0.3831 | 0.4240 | 0.4141 |
| Precision | 0.3697 | 0.3944 | 0.4706 | 0.4610 | 0.3555 | 0.3802 | 0.4233 | 0.4552 |
| Label accuracy | 0.8353 | 0.8492 | 0.8508 | 0.8452 | 0.8020 | 0.8124 | 0.8360 | 0.8302 |
The single-label columns cover the 258 proteins with single-label samples in our benchmark dataset; the entire-dataset columns cover all 348 proteins, comprising both single-label and multilabel samples.
All columns are averages over the two folds on db6. SLFs denote the global features, consisting of Haralick features and DNA-protein overlap features. Label accuracy denotes the average prediction accuracy over the six labels.
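Three of the indices in this table can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: subset accuracy is taken as the exact-match rate, "Accuracy" is assumed to be the common example-based (Jaccard) definition, and label accuracy follows the footnote (mean per-label accuracy). The label names and samples are invented for the example.

```python
def subset_accuracy(true_sets, predicted_sets):
    """Fraction of samples whose predicted label set matches exactly."""
    matches = sum(1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p))
    return matches / len(true_sets)


def example_based_accuracy(true_sets, predicted_sets):
    """Mean over samples of |true AND predicted| / |true OR predicted|
    (assumed definition of the 'Accuracy' index)."""
    total = 0.0
    for t, p in zip(true_sets, predicted_sets):
        t, p = set(t), set(p)
        union = t | p
        # Two empty label sets count as a perfect match.
        total += len(t & p) / len(union) if union else 1.0
    return total / len(true_sets)


def label_accuracy(true_sets, predicted_sets, labels):
    """Average, over labels, of the per-label binary prediction accuracy."""
    per_label = []
    for label in labels:
        correct = sum(
            1 for t, p in zip(true_sets, predicted_sets)
            if (label in t) == (label in p)
        )
        per_label.append(correct / len(true_sets))
    return sum(per_label) / len(labels)


# Hypothetical labels and predictions for four samples.
labels = ["A", "B", "C"]
true_sets = [{"A"}, {"A", "B"}, {"C"}, {"B"}]
predicted_sets = [{"A"}, {"A"}, {"C"}, {"C"}]

subset_acc = subset_accuracy(true_sets, predicted_sets)
example_acc = example_based_accuracy(true_sets, predicted_sets)
label_acc = label_accuracy(true_sets, predicted_sets, labels)
```

Subset accuracy is the strictest of the three because a partially correct multilabel prediction scores zero, which is consistent with it being the lowest index in the entire-dataset columns.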
Summary of local pattern features used in bioimage informatics studies.
| Number | Name of local pattern features | Brief description | Types |
|---|---|---|---|
| 1 | *Local binary pattern (LBP)* | A pioneer of local structural model quantization and the concatenated histogram statistics. | Type 1: focus on the gradient changes of the center pixel in specified directions. |
| 2 | Local ternary pattern (LTP) | Uses ternary coding with a given threshold, based on LBP. | Type 1 |
| 3 | Local quinary pattern (LQP) | Two-threshold enhancement based on LTP. | Type 1 |
| 4 | *Completed local binary pattern (CLBP)* | Enhances Type 1 by taking magnitude and center pixel level information into account. | Type 1 |
| 5 | Local ternary cooccurrence pattern (LTCoP) | Encodes the cooccurrence of similar ternary edges calculated between the center pixel and its neighbors based on LTP; a rotation-invariant feature. | Type 1 |
| 6 | Local derivative pattern (LDP) | High-order local pattern descriptor that encodes directional pattern features based on local derivative variations; a rotation-variant feature tied to a specific direction. | Type 2: focus on transformation consistency statistics of directional derivative in specified directions between the center pixel and its neighbors. |
| 7 | *Local tetra pattern (LTrP)* | Encodes the relationship of transformation consistency between the center pixel and its neighbors based on vertical and horizontal directions. | Type 2 |
Italic type denotes the local pattern features investigated in this study.