Madhav Sigdel, Imren Dinc, Madhu S Sigdel, Semih Dinc, Marc L Pusey, Ramazan S Aygun.
Abstract
BACKGROUND: A large number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers that predict the presence of crystals or the phase of the crystallization process. The excessive number of features, together with the computationally intensive image processing methods needed to extract them, makes automated classification tools inconvenient to use on stand-alone computing systems because of the time required to complete the classification tasks. This study investigates combinations of image feature sets, feature reduction techniques and classification techniques for crystallization images that benefit from trace fluorescence labeling.
Keywords: Feature analysis; Image classification; Protein crystallization; Trace-fluorescent labeling
Year: 2017 PMID: 28465724 PMCID: PMC5408444 DOI: 10.1186/s13040-017-0133-9
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Sample protein crystallization trial images: a-c) non-crystals, d-f) likely-leads, and g-i) crystals. Reprinted with permission from [28]. Copyright 2013 American Chemical Society
Summary of related work
| Research paper | Image categories | Feature extraction | Classification method | Classification accuracy |
|---|---|---|---|---|
| Zuk and Ward (1991) | NA | Edge features | Detection of lines using Hough transform and line tracking | Not provided |
| Walker et al. (2007) | 7 | Radial and angular descriptors from Fourier transform | Learning vector quantization | 14 - 97% for different categories |
| Xu et al. (2006) | 2 | Features from multiscale Laplacian pyramid filters | Neural network | 95% accuracy |
| Wilson (2002) | 3 | Intensity and geometric features | Naive Bayes | Recall 86% for crystals, 77% for unfavourable objects |
| Hung et al. (2014) | 3 | Shape context, Gabor filters and Fourier transforms | Cascade classifier on naive Bayes and random forest | 74% accuracy |
| Spraggon et al. (2002) | 6 | Geometric and texture features | Self-organizing neural networks | 47 to 82% for different categories |
| Cumba et al. (2003) | 2 | Radon transform line features and texture features | Linear discriminant analysis | 85% accuracy with ROC 0.84 |
| Saitoh et al. (2004) | 5 | Geometric and texture features | Linear discriminant analysis | 80 - 98% for different categories |
| Bern et al. (2004) | 5 | Gradient and geometric features | Decision tree with hand-crafted thresholds | 12% FN and 14% FP |
| Cumba et al. (2005) | 2 | Texture features, line measures and energy measures | Association rule mining | 85% accuracy with ROC 0.87 |
| Zhu et al. (2004) | 2 | Geometric and texture features | Decision tree with boosting | 14.6% FP and 9.6% FN |
| Berry et al. (2006) | 2 | NA | Learning vector quantization, self-organizing maps and Bayesian algorithm | NA |
| Pan et al. (2006) | 2 | Intensity stats, texture features, Gabor wavelet decomposition | Support vector machine | 2.94% FN and 37.68% FP |
| Yang et al. (2006) | 3 | Hough transform, DFT, GLCM features | Hand-tuned thresholds | 85% accuracy |
| Saitoh et al. (2006) | 5 | Texture features, differential image features | Decision tree and SVM | 90% for 3-class problem |
| Po and Laine (2008) | 2 | Multiscale Laplacian pyramid filters and histogram analysis | Genetic algorithm and neural network | Accuracy: 93.5% with 88% TP and 99% TN |
| Liu et al. (2008) | Crystal likelihood | Features from Gabor filters, integral histograms, and gradient images | Decision tree with boosting | ROC 0.92 |
| Cumba et al. (2010) | 3 and 6 | Basic stats, energy, Euler numbers, Radon-Laplacian, Sobel-edge, GLCM | Multiple random forest with bagging and feature subsampling | Recall 80% crystals, 89% precipitate, 98% clear drops |
| Sigdel et al. (2013) | 3 | Intensity and blob features | Multilayer perceptron neural network | 1.2% crystal misses with 88% accuracy |
| Sigdel et al. (2014) | 3 | Intensity and blob features | Semi-supervised | 75% - 85% overall accuracy |
| Dinc et al. (2014) | 3 and 2 | Intensity and blob features | 5 classifiers, feature reduction using PCA | 96% on non-crystals, 95% on likely-leads |
| Yann et al. (2016) | 10 | Deep learning on grayscale image | Deep CNN with 13 layers | 90.8% accuracy |
Fig. 2 Image hierarchy
Fig. 3 Sample images in non-crystal category: a-b) Clear drops, c) Phase separation, d-e) Precipitates
Fig. 4 Sample images in likely-lead category: a-c) Microcrystals, d-e) Unclear bright images
Fig. 5 Sample images in crystal category: a) Dendrites/Spherulites, b) Needles, c) 2D plates, d) Small 3D crystals, e) Large 3D crystals
Dataset image distribution
| Category | Total images | Sub-category | No. of images | Percentage |
|---|---|---|---|---|
| Non-crystals | 1600 | Clear drop | 1273 | 46.19% |
| | | Phase separation | 1 | 0.04% |
| | | Precipitate | 204 | 7.4% |
| | | Doubtful | 122 | 4.43% |
| Likely-leads | 675 | Micro-crystals | 122 | 4.43% |
| | | Unclear bright images | 369 | 13.39% |
| | | Doubtful | 184 | 6.68% |
| Crystals | 481 | Dendrites/Spherulites | 63 | 2.29% |
| | | Needles | 153 | 5.55% |
| | | 2D Plates | 8 | 0.29% |
| | | Small 3D crystals | 129 | 4.68% |
| | | Large 3D crystals | 35 | 1.27% |
| | | Doubtful | 93 | 3.37% |
| Total | 2756 | | | |
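Each percentage in the table is a sub-category count divided by the 2756-image total. The following is a minimal Python sketch that recomputes the category totals and percentages from the counts above; the dictionary layout and the function name `percentages` are illustrative and not taken from the paper.

```python
# Sub-category counts copied from the dataset table above.
counts = {
    "Non-crystals": {"Clear drop": 1273, "Phase separation": 1,
                     "Precipitate": 204, "Doubtful": 122},
    "Likely-leads": {"Micro-crystals": 122, "Unclear bright images": 369,
                     "Doubtful": 184},
    "Crystals": {"Dendrites/Spherulites": 63, "Needles": 153, "2D Plates": 8,
                 "Small 3D crystals": 129, "Large 3D crystals": 35,
                 "Doubtful": 93},
}


def percentages(counts):
    """Return {(category, sub_category): percent of all images}."""
    total = sum(n for subs in counts.values() for n in subs.values())
    return {(cat, sub): 100.0 * n / total
            for cat, subs in counts.items() for sub, n in subs.items()}


category_totals = {cat: sum(subs.values()) for cat, subs in counts.items()}
shares = percentages(counts)

for (cat, sub), pct in shares.items():
    print(f"{cat:13s} {sub:22s} {pct:6.2f}%")
```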
Computation time for feature extraction
| Feature group | Description | No. of features | Avg time per feature (s) | Avg time per image (s) |
|---|---|---|---|---|
| Intensity | Intensity features | 6 | 0.009 | 0.052 |
| Region Otsu | Region features using Otsu | 52 | 0.005 | 0.258 |
| Region | Region features using | 52 | 0.010 | 0.495 |
| Region | Region features using | 52 | 0.004 | 0.193 |
| Region Morph | Region features using morph thresh | 52 | 0.006 | 0.311 |
| Graph | Hough features and edge features | 13 | 0.022 | 0.284 |
| Hough | Hough features only | 2 | 0.049 | 0.097 |
| Texture | Texture features | 46 | 0.001 | 0.037 |
| Histogram | Histogram features | 21 | 0.009 | 0.178 |
| DCT | DCT features | 15 | 1.709 | 25.639 |
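The last two columns are related: dividing the average time per image by the number of features approximately reproduces the average time per feature. The sketch below checks this for the rows whose names survive intact (the two "Region" rows with missing threshold names are omitted). It assumes the times are in seconds; small differences from the published per-feature column, for example for Histogram, presumably come from rounding in the reported values.

```python
# (feature group, number of features, avg time per image) from the table above.
# Assumption: times are in seconds.
rows = [
    ("Intensity", 6, 0.052),
    ("Region Otsu", 52, 0.258),
    ("Region Morph", 52, 0.311),
    ("Graph", 13, 0.284),
    ("Hough", 2, 0.097),
    ("Texture", 46, 0.037),
    ("Histogram", 21, 0.178),
    ("DCT", 15, 25.639),
]


def time_per_feature(n_features, time_per_image):
    """Average extraction time attributed to a single feature."""
    return time_per_image / n_features


per_feature = {name: time_per_feature(n, t) for name, n, t in rows}
slowest_group = max(rows, key=lambda row: row[2])[0]

for name, value in per_feature.items():
    print(f"{name:13s} {value:.3f}")
print("Slowest group per image:", slowest_group)
```

On these numbers the DCT group dominates the per-image cost, which is consistent with the much larger "Time per image" values reported for feature sets that include DCT.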
List of classification experiments
| Exp ID | Tasks | No. of experiments |
|---|---|---|
| 1 | Run all classifiers for the 511 feature sets (5 classifiers, with and without normalization) | 2 * 5 * 511 * 1 = 5110 |
| 2 | Run the best classifier 5 times and take the average for the best 70-feature set (RF) | 1 * 1 * 64 * 5 = 320 |
| 3 | Run classifiers with PCA for 10, 20, ..., 50 features | 1 * 5 * 5 * 2 = 50 |
| 4 | Run classifiers using RF feature selection (10, 20, ..., 50 features) | 1 * 5 * 5 * 2 = 50 |
| 5 | Run BYS, DT and RF (with and without normalization, with graph features) for crystal sub-categories | 2 * 3 * 511 * 1 = 3066 |
| 6 | Run RF, DT and BYS classifiers with and without normalization for likely-lead sub-categories | 2 * 3 * 1 * 1 = 6 |
| 7 | Run RF, DT and BYS classifiers with and without normalization for non-crystal sub-categories | 2 * 3 * 1 * 1 = 6 |
| 8 | Calculate training and testing time of the random forest for the largest feature set | 1 * 1 * 1 * 5 = 5 |
| 9 | Calculate timings for feature extraction of an image | 1 * 1 * 11 * 1 = 11 |
| | Total number of experiments | 8624 |
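The experiment counts are products of four factors per row, and the rows sum to 8624. The sketch below recomputes them. It also illustrates one reading of the figure 511: it equals 2^9 - 1, the number of non-empty subsets of nine feature groups. That interpretation, and the placeholder group names, are assumptions rather than statements from the table.

```python
import math
from itertools import combinations

# Experiment ID -> the four factors multiplied in the table above.
factors = {
    1: (2, 5, 511, 1),
    2: (1, 1, 64, 5),
    3: (1, 5, 5, 2),
    4: (1, 5, 5, 2),
    5: (2, 3, 511, 1),
    6: (2, 3, 1, 1),
    7: (2, 3, 1, 1),
    8: (1, 1, 1, 5),
    9: (1, 1, 11, 1),
}

per_experiment = {exp_id: math.prod(f) for exp_id, f in factors.items()}
total_experiments = sum(per_experiment.values())

# Assumption: the 511 feature sets are all non-empty subsets of 9 feature groups.
groups = [f"group_{i}" for i in range(9)]  # placeholder names
n_feature_sets = sum(
    1 for k in range(1, len(groups) + 1) for _ in combinations(groups, k)
)

print(per_experiment)
print("total:", total_experiments, "feature sets:", n_feature_sets)
```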
Classification results for preliminary experiment using random forest classifier (Experiment ID 1)
| Feature set | Norm. | Acc | Pacc | Sensitivity | Adjusted sensitivity |
|---|---|---|---|---|---|
| Intensity, Region Otsu, Region | Yes | 0.963 | 0.942 | 0.867 | 1 |
| Intensity, Region Otsu, Region | No | 0.963 | 0.942 | 0.871 | 1 |
| Intensity, Region Otsu, Region | Yes | 0.963 | 0.941 | 0.863 | 1 |
| Intensity, Region Otsu, Region | No | 0.962 | 0.94 | 0.881 | 1 |
| Intensity, Region Otsu, Region | No | 0.962 | 0.94 | 0.867 | 1 |
| Intensity, Region Otsu, Region | Yes | 0.962 | 0.939 | 0.865 | 1 |
| Intensity, Region Otsu, Region | Yes | 0.962 | 0.939 | 0.871 | 1 |
| Intensity, Region Otsu, Region | No | 0.962 | 0.939 | 0.869 | 1 |
| Intensity, Region | Yes | 0.962 | 0.938 | 0.861 | 1 |
| Intensity, Region Otsu, Region | No | 0.962 | 0.938 | 0.861 | 1 |
Classification results for the best 8 of 64 experiments using random forest classifier
| Feature Set | Norm. | Acc | Pacc | Sensitivity | Adjusted sensitivity | Time per image (sec) |
|---|---|---|---|---|---|---|
| Intensity, Region Otsu, Region | No | 0.961 | 0.938 | 0.87 | 1 | 1.08 |
| Region | ||||||
| Intensity, Region Otsu, Region | No | 0.96 | 0.935 | 0.857 | 1 | 1.31 |
| Region | ||||||
| Region Otsu, Region | Yes | 0.959 | 0.935 | 0.861 | 1 | 1.028 |
| Region Otsu, Region | No | 0.959 | 0.934 | 0.852 | 1 | 26.668 |
| Region Otsu, Region | Yes | 0.959 | 0.934 | 0.858 | 1 | 0.77 |
| Region Otsu, Region | No | 0.959 | 0.934 | 0.859 | 1 | 1.028 |
| Intensity, Region Otsu, Region | No | 0.958 | 0.934 | 0.854 | 1 | 26.756 |
| Histogram, DCT | ||||||
| Region Otsu, Region | No | 0.957 | 0.931 | 0.853 | 1 | 26.409 |
Classification results after feature reduction by PCA
| Classifier | # Features | Norm | Acc | Pacc | Sensitivity | Adjusted sensitivity |
|---|---|---|---|---|---|---|
| RF | 30 | Yes | 0.934 | 0.901 | 0.740 | 0.954 |
| RF | 20 | Yes | 0.934 | 0.905 | 0.744 | 0.944 |
| RF | 40 | Yes | 0.931 | 0.897 | 0.728 | 0.948 |
| RF | 50 | Yes | 0.930 | 0.896 | 0.719 | 0.950 |
| RF | 50 | No | 0.928 | 0.893 | 0.715 | 0.940 |
| SVM | 50 | Yes | 0.918 | 0.870 | 0.761 | 0.990 |
| SVM | 40 | Yes | 0.916 | 0.869 | 0.763 | 0.983 |
| SVM | 30 | Yes | 0.910 | 0.858 | 0.726 | 0.985 |
| RF | 40 | No | 0.909 | 0.880 | 0.688 | 0.861 |
| SVM | 50 | No | 0.909 | 0.858 | 0.765 | 0.983 |
Fig. 6 Principal component variances of the best 50 features
Classification results after feature selection by Random Forest
| Classifier | # Features | Norm | Acc | Pacc | Sensitivity | Adjusted sensitivity |
|---|---|---|---|---|---|---|
| RF | 30 | Yes | 0.960 | 0.936 | 0.863 | 0.998 |
| RF | 40 | No | 0.958 | 0.933 | 0.852 | 0.994 |
| RF | 50 | Yes | 0.957 | 0.932 | 0.859 | 0.996 |
| RF | 50 | No | 0.956 | 0.930 | 0.859 | 0.996 |
| RF | 30 | No | 0.954 | 0.926 | 0.834 | 0.994 |
| RF | 30 | Yes | 0.952 | 0.925 | 0.817 | 0.994 |
| RF | 20 | No | 0.950 | 0.920 | 0.832 | 0.992 |
| RF | 20 | Yes | 0.946 | 0.915 | 0.817 | 0.996 |
| SVM | 30 | Yes | 0.938 | 0.901 | 0.854 | 0.996 |
| SVM | 50 | Yes | 0.934 | 0.895 | 0.844 | 0.996 |
Classification performance with individual feature sets
| Feature Set | Classifier | Norm | Acc | Pacc | Sensitivity | Adjusted sensitivity |
|---|---|---|---|---|---|---|
| Intensity | ID3 | No | 0.877 | 0.836 | 0.701 | 0.950 |
| Region Otsu | BYS | Yes | 0.751 | 0.702 | 0.622 | 0.915 |
| Region | SVM | Yes | 0.864 | 0.818 | 0.676 | 0.944 |
| Region | SVM | Yes | 0.882 | 0.838 | 0.723 | 0.944 |
| Region Morph | BYS | Yes | 0.738 | 0.717 | 0.580 | 0.994 |
| Hough | SVM | Yes | 0.841 | 0.737 | 0.235 | 0.906 |
| Texture | ID3 | Yes | 0.822 | 0.778 | 0.605 | 0.877 |
| DCT | BYS | Yes | 0.691 | 0.647 | 0.480 | 0.775 |
| Histogram | SVM | Yes | 0.908 | 0.852 | 0.705 | 0.996 |
Non-crystal sub-classification
| Classifier | Normalization | Accuracy | Pacc | Sensitivity |
|---|---|---|---|---|
| Naïve Bayes | No | 0.88 | 0.71 | 0.59 |
| Naïve Bayes | Yes | 0.88 | 0.72 | 0.68 |
| Decision Tree | No | 0.96 | 0.79 | 0.85 |
| Decision Tree | Yes | 0.96 | 0.79 | 0.85 |
| Random Forest | No | 0.98 | 0.81 | 0.91 |
| Random Forest | Yes | 0.98 | 0.81 | 0.91 |
Likely-lead sub-classification
| Classifier | Normalization | Accuracy | Pacc | Sensitivity |
|---|---|---|---|---|
| Naïve Bayes | No | 0.59 | 0.62 | 0.86 |
| Naïve Bayes | Yes | 0.58 | 0.63 | 0.93 |
| Decision Tree | No | 0.87 | 0.85 | 0.74 |
| Decision Tree | Yes | 0.88 | 0.86 | 0.76 |
| Random Forest | No | 0.92 | 0.91 | 0.80 |
| Random Forest | Yes | 0.91 | 0.89 | 0.78 |
Crystal sub-classification
| Feature set | Norm | Accuracy | Pacc | Sensitivity | Time (s) |
|---|---|---|---|---|---|
| Intensity, Region Otsu, Region | Yes | 0.742 | 0.667 | 0.909 | 1.267 |
| Region Otsu, Region | Yes | 0.74 | 0.684 | 0.896 | 0.949 |
| Region Otsu, Region | Yes | 0.737 | 0.658 | 0.896 | 1.408 |
| Region | No | 0.735 | 0.659 | 0.902 | 0.779 |
| Intensity, R_ | No | 0.735 | 0.667 | 0.896 | 1.201 |
| Intensity, R_Otsu, R_ | No | 0.735 | 0.657 | 0.89 | 1.267 |
| Intensity, Region Otsu, Region | No | 0.735 | 0.682 | 0.878 | 0.964 |
Confusion matrix of hierarchical classification (FL: the first level, SL: the second level)
| | SL=True | SL=False |
|---|---|---|
| FL=True | 2103 | 147 |
| FL=False | 84 | 23 |
Confusion matrix for the first level
| Prediction (rows) / Actual (columns) | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1474 | 1 | 1 |
| 1 | 2 | 461 | 73 |
| 2 | 2 | 29 | 314 |
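Overall accuracy and per-class recall and precision can be read off this matrix directly. The sketch below computes them from the counts above, with rows as predictions and columns as actual classes. The helper names are illustrative, and the values it prints are derived only from this table, not quoted from the paper.

```python
# First-level confusion matrix: rows are predictions, columns are actual classes 0, 1, 2.
confusion = [
    [1474, 1, 1],
    [2, 461, 73],
    [2, 29, 314],
]


def accuracy(cm):
    """Fraction of all images on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def recall(cm, k):
    """Fraction of actual class-k images that were predicted as class k."""
    actual_k = sum(row[k] for row in cm)
    return cm[k][k] / actual_k


def precision(cm, k):
    """Fraction of class-k predictions that are actually class k."""
    return cm[k][k] / sum(cm[k])


acc = accuracy(confusion)
recalls = [recall(confusion, k) for k in range(3)]
precisions = [precision(confusion, k) for k in range(3)]

print(f"accuracy = {acc:.3f}")
for k in range(3):
    print(f"class {k}: recall = {recalls[k]:.3f}, precision = {precisions[k]:.3f}")
```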
Confusion matrix for non-crystal classification (*: first level misclassification)
| Prediction (rows) / Actual (columns) | Clear drop | Phase separation | Precipitate |
|---|---|---|---|
| Clear drop | 1265 | 0 | 20 |
| Phase separation | 0 | 0 | 0 |
| Precipitate | 8 | 0 | 181 |
| * | 0 | 1 | 3 |
Confusion matrix for likely leads classification (*: first level misclassification)
| Prediction (rows) / Actual (columns) | Micro-crystals | Unclear bright images |
|---|---|---|
| Micro-crystals | 97 | 14 |
| Unclear bright images | 16 | 334 |
| * | 9 | 21 |
Confusion matrix for crystal classification (*: first level misclassification)
| Prediction (rows) / Actual (columns) | Dendrites/Spherulites | Needles | 2D plates | Small 3D | Large 3D |
|---|---|---|---|---|---|
| Dendrites/Spherulites | 11 | 1 | 0 | 4 | 0 |
| Needles | 11 | 99 | 1 | 13 | 0 |
| 2D plates | 0 | 0 | 0 | 0 | 0 |
| Small 3D | 32 | 7 | 2 | 95 | 12 |
| Large 3D | 0 | 0 | 1 | 5 | 21 |
| * | 9 | 46 | 4 | 12 | 2 |
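The three second-level matrices can be tied back to the hierarchical confusion matrix. Summing their diagonals gives the number of images classified correctly at both levels, and summing the "*" rows gives the images already misclassified at the first level. The sketch below performs that bookkeeping with the counts from the tables above; the variable names are illustrative, and the end-to-end accuracy is derived from these counts rather than quoted from the paper.

```python
# Second-level confusion matrices (rows: prediction, columns: actual),
# without the final "*" row of each table.
non_crystal = [[1265, 0, 20], [0, 0, 0], [8, 0, 181]]
likely_lead = [[97, 14], [16, 334]]
crystal = [
    [11, 1, 0, 4, 0],
    [11, 99, 1, 13, 0],
    [0, 0, 0, 0, 0],
    [32, 7, 2, 95, 12],
    [0, 0, 1, 5, 21],
]
# The "*" rows: images misclassified at the first level, per actual sub-category.
first_level_misses = {
    "non_crystal": [0, 1, 3],
    "likely_lead": [9, 21],
    "crystal": [9, 46, 4, 12, 2],
}


def trace(cm):
    return sum(cm[i][i] for i in range(len(cm)))


def total(cm):
    return sum(sum(row) for row in cm)


matrices = [non_crystal, likely_lead, crystal]
correct_both = sum(trace(cm) for cm in matrices)
missed_first = sum(sum(v) for v in first_level_misses.values())
n_images = sum(total(cm) for cm in matrices) + missed_first
end_to_end_accuracy = correct_both / n_images

print(correct_both, missed_first, n_images, round(end_to_end_accuracy, 3))
```

With these counts, `correct_both` matches the FL=True, SL=True cell (2103) and `missed_first` matches the sum of the FL=False row (84 + 23 = 107).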
List of intensity features
| Symbol | Description | Formulation |
|---|---|---|
| | Average image intensity | |
| | Minimum image intensity | |
| | Maximum image intensity | |
| | Standard deviation of intensity | |
| | Otsu’s threshold intensity | |
| | Threshold effectiveness metric | |
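As an illustration of the six intensity features listed above, the following pure-Python sketch computes them for a flat list of 8-bit intensities. Several points are assumptions rather than the paper's definitions: the standard deviation is the population form, the Otsu threshold splits pixels into `<= t` and `> t`, and the "threshold effectiveness metric" is taken to be Otsu's separability measure (maximum between-class variance divided by total variance). Function names are hypothetical.

```python
import math


def otsu_threshold(pixels, levels=256):
    """Return (threshold, effectiveness) for a flat list of integer intensities.

    The threshold t maximizes the between-class variance of the classes
    {<= t} and {> t}. Effectiveness is that maximum divided by the total
    variance, so it lies between 0 and 1.
    """
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    prob = [h / n for h in hist]
    mean_total = sum(i * p for i, p in enumerate(prob))
    var_total = sum((i - mean_total) ** 2 * p for i, p in enumerate(prob))

    best_t, best_between = 0, 0.0
    w0, cum_mean = 0.0, 0.0
    for t in range(levels - 1):
        w0 += prob[t]            # weight of class {<= t}
        cum_mean += t * prob[t]  # first moment of class {<= t}
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue
        between = (mean_total * w0 - cum_mean) ** 2 / (w0 * w1)
        if between > best_between:
            best_t, best_between = t, between
    eta = best_between / var_total if var_total > 0 else 0.0
    return best_t, eta


def intensity_features(pixels):
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)  # population std
    t, eta = otsu_threshold(pixels)
    return {"mean": mean, "min": min(pixels), "max": max(pixels),
            "std": std, "otsu_threshold": t, "otsu_effectiveness": eta}


# Toy "image": dark background with a few bright pixels.
toy = [10] * 12 + [200] * 4
features = intensity_features(toy)
print(features)
```

For a perfectly bimodal toy image like this one, the effectiveness reaches 1 because the threshold separates the two intensity values completely.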
List of histogram features
| Symbol | Description | Formulation |
|---|---|---|
| | Average image intensity | |
| | Standard deviation of intensity | |
| | Skewness | |
| | Kurtosis | |
| v | Entropy | - |
| | GLCM auto-correlation | Eq. |
| | Image auto-correlation | Eq. |
| | GLCM power spectrum magnitude | |
| | Image power spectrum magnitude | |
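The first five histogram features above are standard moments of the normalized intensity histogram. The sketch below shows one common way to compute them. It is not the paper's formulation, which is not recoverable from the table: kurtosis here is the non-excess form (a normal distribution gives 3), entropy uses a base-2 logarithm, and the function name is hypothetical.

```python
import math


def histogram_features(pixels, levels=256):
    """Moments and entropy of the normalized histogram of integer intensities."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    prob = [h / n for h in hist]  # normalized histogram

    mean = sum(i * p for i, p in enumerate(prob))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(prob))
    std = math.sqrt(var)
    # Standardized third and fourth central moments (zero for a flat image).
    third = sum((i - mean) ** 3 * p for i, p in enumerate(prob))
    fourth = sum((i - mean) ** 4 * p for i, p in enumerate(prob))
    skew = third / std ** 3 if std else 0.0
    kurt = fourth / std ** 4 if std else 0.0
    entropy = -sum(p * math.log2(p) for p in prob if p > 0)
    return {"mean": mean, "std": std, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}


# Two equally likely intensity levels: symmetric histogram, 1 bit of entropy.
toy = [0, 0, 255, 255]
feats = histogram_features(toy)
print(feats)
```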
List of texture features
| Feature | Formulation |
|---|---|
| Autocorrelation | |
| Contrast | |
| Correlation (Matlab) | |
| Correlation | |
| Cluster prominence | |
| Cluster shade | |
| Dissimilarity | |
| Energy | |
| Entropy | |
| Homogeneity (Matlab) | |
| Homogeneity | |
| Maximum probability | |
| Sum of squares: Variance | |
| Sum average | |
| Sum entropy | |
| Sum variance | |
| Difference variance | |
| Difference entropy | |
| Information measure of correlation 1 | |
| Information measure of correlation 2 | (1−exp[−2( |
| Inverse difference (INV) | |
| Inverse difference normalized | |
| Inverse difference moment | |
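The texture features above are statistics of a gray-level co-occurrence matrix (GLCM). As a rough illustration, the sketch below builds a normalized, symmetric GLCM for a single pixel offset and computes a handful of the listed features using their commonly cited definitions. The offset, the symmetric counting, the squared-difference form of homogeneity, and energy as the sum of squared entries are all assumptions, since the paper's own formulas did not survive in the table.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """A few GLCM statistics; p is a normalized co-occurrence matrix."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "max_probability": max(v for _, _, v in cells),
    }


# Tiny image with three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 1],
]
matrix = glcm(image, levels=3)
features = texture_features(matrix)
print(features)
```

In practice an image is usually quantized to a small number of gray levels before the GLCM is built, and features are often averaged over several offsets; both choices are left out here for brevity.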
List of global binary image features
| Symbol | Description | Formulation |
|---|---|---|
| | No of white pixels in | |
| | Foreground avg intensity | |
| | Foreground std devn intensity | |
| | Background avg intensity | |
| | Background std devn intensity | |
| | Number of blobs | No. of connected components |
| | Image fullness | |
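Most of the global binary features above follow from thresholding the grayscale image and separating foreground from background. The sketch below is one possible pure-Python reading: pixels strictly above the threshold are foreground, standard deviations are the population form, and blobs are counted as 8-connected components. These conventions and the function names are assumptions. Image fullness is left out because its definition is not recoverable from the table.

```python
import math
from collections import deque


def count_blobs(binary):
    """Count 8-connected components of 1-pixels with a breadth-first flood fill."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                blobs += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return blobs


def mean_std(values):
    """Mean and population standard deviation; (0.0, 0.0) for an empty list."""
    if not values:
        return 0.0, 0.0
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def global_binary_features(gray, threshold):
    binary = [[1 if v > threshold else 0 for v in row] for row in gray]
    fg = [v for row in gray for v in row if v > threshold]
    bg = [v for row in gray for v in row if v <= threshold]
    fg_mean, fg_std = mean_std(fg)
    bg_mean, bg_std = mean_std(bg)
    return {"white_pixels": len(fg), "fg_mean": fg_mean, "fg_std": fg_std,
            "bg_mean": bg_mean, "bg_std": bg_std, "blobs": count_blobs(binary)}


gray = [
    [200, 210, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 220],
]
feats = global_binary_features(gray, threshold=100)
print(feats)
```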
List of blob features
| Symbol | Description | Formulation |
|---|---|---|
| | Average intensity of | |
| | Std devn of intensity of | |
| | No of pixels in | |
| | No of white pixels in | |
| | Perimeter of | |
| | Convex hull area of | |
| | Blob eccentricity of | |
| | Blob extent of | |
| | Equivalent circular diameter of | |
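Two of the blob shape features above have widely used definitions that can be shown in a few lines: extent is the blob area divided by the area of its bounding box, and the equivalent circular diameter is the diameter of a circle with the same area as the blob. The sketch below computes both from a list of pixel coordinates. The boundary-pixel count is included only as a crude stand-in for perimeter; it, the function name, and the choice of 4-neighbours are assumptions and not the paper's formulation.

```python
import math


def blob_shape_features(pixels):
    """pixels: list of (row, col) coordinates belonging to one blob."""
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box_h = max(rows) - min(rows) + 1
    box_w = max(cols) - min(cols) + 1
    pixel_set = set(pixels)
    # Boundary pixels: blob pixels with at least one 4-neighbour outside the blob.
    boundary = sum(
        1 for r, c in pixels
        if any((r + dr, c + dc) not in pixel_set
               for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    )
    return {
        "area": area,
        "extent": area / (box_h * box_w),
        "equivalent_diameter": math.sqrt(4.0 * area / math.pi),
        "boundary_pixels": boundary,
    }


# L-shaped blob: a 3x3 bounding box with only the left column and bottom row set.
blob = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
shape = blob_shape_features(blob)
print(shape)
```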
Graph features
| Feature | Symbol | Description | Formulation |
|---|---|---|---|
| Edge | | No of graphs (connected edges) | |
| | | No of graphs with a single edge | |
| | | No of graphs with 2 edges | |
| | | No of graphs whose edges form a cycle | |
| | | No of line normals | |
| | | Average length of edges in all segments | |
| | | Sum of lengths of all edges | |
| | | Maximum length of an edge | |
| | | 1 if | |
| | | 1 if | |
| | | No of Harris corners | |
| Hough | | No of Hough lines | |
| | | Average length of Hough lines | |
Shape-adaptive DCT features
| Symbol | Description |
|---|---|
| | Maximum of non-zero coefficients of SA-DCT of |
| | Average of non-zero coefficients of SA-DCT of |
| | No. of non-zero coefficients of SA-DCT of |