Lu Cao, Marjo de Graauw, Kuan Yan, Leah Winkel, Fons J Verbeek.
Abstract
BACKGROUND: Endocytosis is regarded as a mechanism for attenuating epidermal growth factor receptor (EGFR) signaling and for receptor degradation. There is increasing evidence that breast cancer progression is associated with a defect in EGFR endocytosis. To find the ribonucleic acid (RNA) regulators involved in this process, high-throughput imaging with fluorescent markers is used to visualize the complex EGFR endocytosis process. A dedicated automatic image and data analysis system is then developed and applied to extract phenotype measurements and to distinguish the different developmental episodes in the large number of images acquired through high-throughput imaging. In the image analysis, phenotype measurement quantifies the important image information as distinct features or measurements. The manner in which prominent measurements are chosen to represent the dynamics of the EGFR process is therefore a crucial step in identifying the phenotype. In the subsequent data analysis, classification is used to categorize each observation using all prominent measurements obtained from the image analysis. A better-constructed classification strategy will therefore help raise the performance of our image and data analysis system.
Keywords: EGFR endocytosis; Hierarchical classification; High throughput; Image analysis; Phenotype measurement; Wavelet-based texture measurement
Year: 2016 PMID: 27142862 PMCID: PMC4855371 DOI: 10.1186/s12859-016-1053-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Sample images of the 3 phenotypic groups. Red channel is a P-ERK expression staining (Cy3); green channel is an EGFR expression staining (Alexa-488); blue channel is a nuclear staining (Hoechst #33258)
Description of phenotype texture measurements
| Feature name | Expression | Description |
|---|---|---|
| std | | The standard deviation of the intensity over all pixels in a region. |
| Smoothness | | The relative smoothness of the intensity in a region. It is 0 for a region of constant intensity and approaches 1 for a region with large excursions in the values of its intensity levels. |
| Skewness | | The third-order moment about the mean, which measures the departure from symmetry about the mean intensity. It is 0 for symmetric histograms, positive for histograms skewed to the right and negative for histograms skewed to the left. |
| Uniformity | | The sum of squared elements in the histogram. It reaches its maximum when all intensity levels are equal and decreases from there. |
| Entropy | | The statistical measure of randomness. |
i represents the intensity value
H(i) is the histogram of intensity
mean symbolizes the average intensity
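The expressions for these measurements are not present in the table. As a minimal sketch, the code below assumes the standard histogram-based definitions on a normalized histogram p(i) = H(i)/N; these match the descriptions given, but they are not confirmed to be the authors' exact formulas. The function name `texture_features` and the `levels` parameter are illustrative.

```python
import numpy as np

def texture_features(region, levels=256):
    """Histogram-based texture measurements (assumed standard definitions).

    `region` holds non-negative integer intensities in the range [0, levels).
    """
    pixels = np.asarray(region).ravel().astype(np.int64)
    hist = np.bincount(pixels, minlength=levels).astype(float)  # H(i)
    p = hist / hist.sum()                                       # normalized histogram
    i = np.arange(levels, dtype=float)                          # intensity values
    mean = np.sum(i * p)                                        # average intensity
    var = np.sum((i - mean) ** 2 * p)
    nonzero = p[p > 0]                                          # skip empty bins in the log
    return {
        "std": float(np.sqrt(var)),
        # Assumption: un-normalized variance, so the value approaches 1 for large variance.
        "smoothness": float(1.0 - 1.0 / (1.0 + var)),
        "skewness": float(np.sum((i - mean) ** 3 * p)),         # third moment about the mean
        "uniformity": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

flat = texture_features(np.full((4, 4), 7))
two_level = texture_features(np.array([[0, 255], [255, 0]]))
```

For a constant region every measure except uniformity is 0, which agrees with the descriptions in the table; a region split evenly between two intensities has an entropy of 1 bit and a uniformity of 0.5.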
Description of Wavelet texture measurements
| Feature name | Description |
|---|---|
| H_mean | The average intensity of Horizontal detail from discrete wavelet transformation. |
| H_std | The intensity variation of Horizontal detail from discrete wavelet transformation. |
| H_entropy | The statistical randomness of Horizontal detail from discrete wavelet transformation. |
| V_mean | The average intensity of Vertical detail from discrete wavelet transformation. |
| V_std | The intensity variation of Vertical detail from discrete wavelet transformation. |
| V_entropy | The statistical randomness of Vertical detail from discrete wavelet transformation. |
| D_mean | The average intensity of Diagonal detail from discrete wavelet transformation. |
| D_std | The intensity variation of Diagonal detail from discrete wavelet transformation. |
| D_entropy | The statistical randomness of Diagonal detail from discrete wavelet transformation. |
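The nine wavelet measurements can be illustrated with a single-level 2D discrete wavelet transform. The sketch below assumes a Haar wavelet, one decomposition level, raw (signed) coefficients, and an entropy computed over a 64-bin histogram; none of these choices is stated in the table, and the function names are illustrative.

```python
import numpy as np

def haar_details(image):
    """Single-level 2D Haar transform; returns the three detail sub-bands.

    Assumed convention: the horizontal detail responds to horizontal edges
    (differences between rows), the vertical detail to vertical edges.
    """
    x = np.asarray(image, dtype=float)
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    a, b = x[0::2, 0::2], x[0::2, 1::2]                  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]                  # bottom-left, bottom-right
    return {
        "H": (a + b - c - d) / 2.0,
        "V": (a - b + c - d) / 2.0,
        "D": (a - b - c + d) / 2.0,
    }

def shannon_entropy(values, bins=64):
    """Entropy in bits of a histogram of the values (the bin count is an assumption)."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def wavelet_features(image, bins=64):
    """Mean, standard deviation and entropy of each detail sub-band."""
    feats = {}
    for name, coeffs in haar_details(image).items():
        feats[f"{name}_mean"] = float(coeffs.mean())
        feats[f"{name}_std"] = float(coeffs.std())
        feats[f"{name}_entropy"] = shannon_entropy(coeffs, bins)
    return feats

# Horizontal stripes: even rows are 0, odd rows are 2.
stripes = np.tile(np.array([[0.0], [2.0]]), (2, 4))
stripe_feats = wavelet_features(stripes)
```

On the striped test image only the horizontal detail is non-zero, which is a quick way to check the orientation convention before applying the features to real data.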
Fig. 2 Ground truth data production
Fig. 3 Hierarchical tree of EGFR endocytosis process
Fig. 4 Hierarchical classification workflow
Prior probability comparison
C-variance (branch & bound)

| Prior | Step | knnc mean | knnc std | ldc mean | ldc std | qdc mean | qdc std | neurc mean | neurc std | svc mean | svc std |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Equal prior | 1st step | 0.0333 | 0 | 0.095 | 0.0224 | 0.0317 | 0.0075 | 0.0633 | 0.0149 | 0.0333 | 0 |
| Equal prior | 2nd step | 0.0575 | 0.0335 | 0.0575 | 0.0335 | 0.15 | 0 | 0.055 | 0.0224 | 0.1025 | 0.0112 |
| No prior | 1st step | 0.0181 | 0.0014 | 0.0365 | 0.0057 | 0.0221 | 0.0037 | 0.01 | 0.0031 | 0.0142 | 0.001 |
| No prior | 2nd step | 0.0292 | 0 | 0.0357 | 0.0088 | 0.043 | 0.0121 | 0.0348 | 0.0065 | 0.0402 | 0.0069 |
| With prior | 1st step | 0.0053 | 0.0011 | 0.0061 | 0.0038 | 0.0059 | 0.0028 | 0.0061 | 0.0029 | 0.0061 | 0.0038 |
| With prior | 2nd step | 0.0214 | 0.0031 | 0.0321 | 0.0011 | 0.056 | 0.0019 | 0.0231 | 0.0048 | 0.0214 | 0.002 |
Fig. 5 First classifier training (membrane-episode vs. the super class of cluster-episode and vesicle-episode). (a) Branch & bound feature selection method with standard variance normalization, (b) branch & bound feature selection method with within-class variance normalization, (c) individual feature selection method with standard variance normalization, (d) individual feature selection method with within-class variance normalization
Fig. 6 Second classifier training (cluster-episode vs. vesicle-episode). (a) Branch & bound feature selection method with standard variance normalization, (b) branch & bound feature selection method with within-class variance normalization, (c) individual feature selection method with standard variance normalization, (d) individual feature selection method with within-class variance normalization
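The two training steps named in these captions suggest a simple two-stage decision at prediction time. The sketch below is only an illustration of that control flow: the super-class label `cluster_or_vesicle`, the feature keys, the thresholds and the two toy classifiers are hypothetical stand-ins for trained models.

```python
def hierarchical_predict(features, step1, step2):
    """Two-step prediction: step 1 separates membrane from the super class,
    step 2 separates cluster from vesicle inside the super class."""
    if step1(features) == "membrane":
        return "membrane"
    return step2(features)

# Hypothetical stand-ins for the trained first and second classifiers.
def toy_step1(features):
    return "membrane" if features["long_axis"] > 40.0 else "cluster_or_vesicle"

def toy_step2(features):
    return "cluster" if features["area"] > 100.0 else "vesicle"

example = hierarchical_predict({"long_axis": 12.0, "area": 30.0}, toy_step1, toy_step2)
```

Because each step is passed in as a callable, a different classifier and a different feature subset can be used at each level of the tree.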
Fig. 7 Scatter plot of training data with LDC and QDC. This plot shows the better performance of LDC over QDC for our dataset with two selected features
Weighted error comparison
| Normalization | Step | Feature selection | knnc mean | knnc std | ldc mean | ldc std | qdc mean | qdc std | neurc mean | neurc std | svc mean | svc std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-V | 1st step | B&B | 0.0053 | 0.0011 | 0.0061 | 0.0038 | 0.0059 | 0.0028 | 0.0061 | 0.0029 | 0.0061 | 0.0038 |
| C-V | 1st step | IND | 0.0058 | 0.0023 | 0.0061 | 0.0038 | 0.0059 | 0.0028 | 0.0055 | 0 | 0.0057 | 0.0014 |
| C-V | 2nd step | B&B | 0.0214 | 0.0031 | 0.0321 | 0.0011 | 0.056 | 0.0019 | 0.0231 | 0.0048 | 0.0214 | 0.002 |
| C-V | 2nd step | IND | 0.0261 | 0 | 0.0267 | 0 | 0.0607 | 0 | 0.0214 | 0.0021 | 0.0261 | 0 |
| VAR | 1st step | B&B | 0.0053 | 0.0011 | 0.0061 | 0.0038 | 0.0059 | 0.0028 | 0.0055 | 0.0018 | 0.0061 | 0.0038 |
| VAR | 1st step | IND | 0.0058 | 0.0023 | 0.0061 | 0.0038 | 0.0059 | 0.0028 | 0.0056 | 0 | 0.0058 | 0.0019 |
| VAR | 2nd step | B&B | 0.0237 | 0.0044 | 0.0319 | 0.0015 | 0.0563 | 0.0027 | 0.0236 | 0.0069 | 0.0218 | 0.0028 |
| VAR | 2nd step | IND | 0.0263 | 0 | 0.0272 | 0.0029 | 0.0598 | 0.004 | 0.0218 | 0.004 | 0.0265 | 0.001 |
C-V represents c-variance
B&B represents branch & bound
IND represents individual
VAR represents variance
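The table reports a weighted error, but how the weights are defined is not restated here. One common definition, assumed in this sketch, weights each class's error rate by its prior probability; with priors equal to the class frequencies this reduces to the ordinary error rate, and with equal priors it becomes a balanced error. The function name and arguments are illustrative.

```python
import numpy as np

def weighted_error(y_true, y_pred, priors=None):
    """Sum over classes of prior(c) * error rate within class c (assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    if priors is None:
        # Default: empirical class frequencies of the test set.
        priors = {c: float(np.mean(y_true == c)) for c in classes}
    total = sum(priors[c] for c in classes)  # renormalize so the weights sum to 1
    error = 0.0
    for c in classes:
        in_class = y_true == c
        class_error = float(np.mean(y_pred[in_class] != c))
        error += priors[c] / total * class_error
    return error

truth = [0] * 8 + [1] * 2
always_zero = [0] * 10
err_empirical = weighted_error(truth, always_zero)
err_equal = weighted_error(truth, always_zero, priors={0: 0.5, 1: 0.5})
```

In this toy case a classifier that ignores the minority class scores 0.2 with empirical priors but 0.5 with equal priors, which shows why the choice of prior changes the reported error on imbalanced classes.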
Feature selection performance
| Features | Step 1 | Features | Step 2 |
|---|---|---|---|
| Long axis | 100 | Closest FA dist | 100 |
| Int Std | 100 | Int entropy | 100 |
| D_entropy | 87 | Area | 96 |
| H_entropy | 7 | Int Std | 73 |
| V_entropy | 5 | Compact factor | 44 |
| Smoothness | 1 | Int uniformity | 34 |
| Area | 0 | Smoothness | 14 |
| Perimeter | 0 | H_entropy | 8 |
| Extension | 0 | Border dist/nucleus dist | 7 |
| Dispersion | 0 | Perimeter | 6 |
| Elongation | 0 | Long axis | 6 |
| Orientation | 0 | Short axis | 5 |
| Compact factor | 0 | D_std | 5 |
| Border dist/nucleus dist | 0 | Skewness | 2 |
| Closest FA dist | 0 | Extension | 0 |
| Short axis | 0 | Dispersion | 0 |
| Skewness | 0 | Elongation | 0 |
| Int uniformity | 0 | Orientation | 0 |
| Int entropy | 0 | H_mean | 0 |
| H_mean | 0 | H_std | 0 |
| H_std | 0 | V_mean | 0 |
| V_mean | 0 | V_std | 0 |
| V_std | 0 | V_entropy | 0 |
| D_mean | 0 | D_mean | 0 |
| D_std | 0 | D_entropy | 0 |
Fig. 8 Step 1 scatter plot. For step 1, the KNNC classifier was chosen. In (a), (b) and (c), the performance of the KNNC classifier for the three major features is plotted
Fig. 9 Step 2 scatter plot. For step 2, the SVC classifier was chosen. In (a), (b) and (c), the performance of the SVC classifier for the three selected features is plotted
Feature selection performance
| | F-25 | F-sel | H-25 | H-sel |
|---|---|---|---|---|
| hF | 0.9603 | 0.9745 | 0.9839 | 0.9889 |
| | 0.0094 | 0.0103 | 0.0066 | 0.0065 |
hF: F-measure for hierarchical classification
F-25: flat classification with total 25 features
F-sel: flat classification with selected features from branch & bound feature selection method
H-25: hierarchical classification with total 25 features
H-sel: hierarchical classification with selected features
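The definition of hF is not spelled out here. A widely used formulation, assumed in this sketch, extends each label with its ancestors in the class tree and computes precision and recall on those extended sets. The tree below follows the membrane / (cluster, vesicle) structure described for the two classification steps; the super-class name `cluster_or_vesicle` is a hypothetical label.

```python
# Each class mapped to itself plus its ancestors (root excluded).
ANCESTORS = {
    "membrane": {"membrane"},
    "cluster": {"cluster_or_vesicle", "cluster"},
    "vesicle": {"cluster_or_vesicle", "vesicle"},
}

def hierarchical_f(y_true, y_pred, ancestors=ANCESTORS):
    """Hierarchical F-measure: harmonic mean of hierarchical precision and recall."""
    overlap = pred_size = true_size = 0
    for t, p in zip(y_true, y_pred):
        t_set, p_set = ancestors[t], ancestors[p]
        overlap += len(t_set & p_set)   # labels shared by truth and prediction
        pred_size += len(p_set)
        true_size += len(t_set)
    h_precision = overlap / pred_size
    h_recall = overlap / true_size
    if h_precision + h_recall == 0:
        return 0.0
    return 2 * h_precision * h_recall / (h_precision + h_recall)

hf_example = hierarchical_f(
    ["membrane", "cluster", "vesicle"],
    ["membrane", "vesicle", "vesicle"],
)
```

Under this formulation, confusing cluster with vesicle still earns partial credit for the shared super class, whereas confusing either with membrane earns none.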
Kolmogorov-Smirnov test result
| | F-25 vs F-sel | F-25 vs H-25 | F-25 vs H-sel | F-sel vs H-25 | F-sel vs H-sel | H-25 vs H-sel |
|---|---|---|---|---|---|---|
| H | 1 | 1 | 1 | 1 | 1 | 1 |
| p-value | 3.3e-17 | 1.3e-38 | 3.9e-41 | 4.4e-15 | 6.7e-22 | 5.2e-08 |
On the basis of hF, every pairwise test shows a significant difference, i.e. the H value is 1 in all cases
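A two-sample Kolmogorov-Smirnov test compares the empirical distributions of two sets of scores, for example the hF values from repeated runs of two strategies. The sketch below uses the usual asymptotic approximation for the p-value and returns a reject flag in the style of the H value; the 0.05 significance level and the function name are assumptions. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import numpy as np

def ks_two_sample(a, b, alpha=0.05, terms=100):
    """Two-sample KS test: returns (statistic D, asymptotic p-value, reject flag H)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    n_eff = a.size * b.size / (a.size + b.size)
    lam = (np.sqrt(n_eff) + 0.12 + 0.11 / np.sqrt(n_eff)) * d
    if lam < 0.3:
        # The alternating series converges slowly here; the p-value is effectively 1.
        p = 1.0
    else:
        j = np.arange(1, terms + 1)
        p = 2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * (j * lam) ** 2))
        p = float(min(max(p, 0.0), 1.0))
    return d, p, int(p < alpha)

same = ks_two_sample([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
apart = ks_two_sample(np.arange(50), np.arange(100, 150))
```

Identical samples give D = 0 and H = 0, while two fully separated samples give D = 1 and a very small p-value with H = 1.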
Confusion matrices for all strategies
| Strategy | Truth | Predicted Membrane | Predicted Vesicle | Predicted Cluster | Number of test objects | Sensitivity |
|---|---|---|---|---|---|---|
| F-25 | Membrane | 34.96 | 2.04 | 0 | 37 | 0.945 |
| F-25 | Vesicle | 1.81 | 166.01 | 0.18 | 168 | 0.988 |
| F-25 | Cluster | 0.12 | 4.39 | 5.49 | 10 | 0.549 |
| F-sel | Membrane | 35.67 | 1.33 | 0 | 37 | 0.964 |
| F-sel | Vesicle | 0.58 | 166.79 | 0.63 | 168 | 0.993 |
| F-sel | Cluster | 0.17 | 2.77 | 7.06 | 10 | 0.706 |
| H-25 | Membrane | 34 | 3 | 0 | 37 | 0.919 |
| H-25 | Vesicle | 0.11 | 166.83 | 1.06 | 168 | 0.993 |
| H-25 | Cluster | 0.07 | 4.51 | 5.42 | 10 | 0.542 |
| H-sel | Membrane | 35.53 | 1.47 | 0 | 37 | 0.960 |
| H-sel | Vesicle | 0.75 | 166.86 | 0.39 | 168 | 0.993 |
| H-sel | Cluster | 0 | 2.58 | 7.42 | 10 | 0.742 |

The Strategy and Truth columns give the strategy and the class labels
The three Predicted columns give the prediction
The number of test objects represents the ground truth
The sensitivity is given per class
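The per-class sensitivity can be recomputed from a confusion matrix as the diagonal entry divided by the row total. The short sketch below does this for the F-25 matrix, using the values reported in the table; the variable and function names are illustrative.

```python
import numpy as np

# Rows are the true classes, columns the predicted classes,
# both in the order Membrane, Vesicle, Cluster (values from the F-25 block).
f25 = np.array([
    [34.96, 2.04, 0.0],
    [1.81, 166.01, 0.18],
    [0.12, 4.39, 5.49],
])

def sensitivity(confusion):
    """Per-class sensitivity: correctly predicted objects / objects truly in the class."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)

f25_sensitivity = sensitivity(f25)
```

The row totals reproduce the number of test objects per class (37, 168 and 10), and the resulting ratios round to the sensitivities listed for F-25.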
Fig. 10 EGFR endocytosis regulator identification results. (a) Pixels of plasma-membrane, (b) number of clusters, (c) number of vesicles
Fig. 11 Dynamic stages of EGFR endocytosis. (a) Number of EGFR localized at plasma-membrane (pixels/nucleus), (b) number of EGFR as vesicle in early endosome (number/nucleus)