Syed Saiden Abbas, Tjeerd M H Dijkstra, Tom Heskes.
Abstract
BACKGROUND: High-throughput screening (HTS) produces thousands of images containing millions of cells. Biologists could classify each of these cells into a phenotype by visual inspection, but with millions of cells this visual classification becomes infeasible. Instead, biologists train classification models on a few thousand visually classified example cells and iteratively improve the training data by visually inspecting the important misclassified phenotypes. Classification methods differ in performance and in the time needed to evaluate that performance. We present a comparative study of the computational performance of gentle boosting, joint boosting CellProfiler Analyst (CPA), support vector machines (linear and radial basis function) and linear discriminant analysis (LDA) on two data sets of HT29 and HeLa cancer cells.
Year: 2014 PMID: 25336059 PMCID: PMC4287552 DOI: 10.1186/1471-2105-15-342
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Open source tools for high-throughput screening
| Tool | Language | Classifier | Advantage |
|---|---|---|---|
| WND-CHARM | C++ | Weighted Nearest Neighbor | Many image features |
| Enhanced CellClassifier | Matlab | SVM | Good classifier |
| FARSIGHT | C++ | Supervised Spectral Clustering | Programmer friendly |
| CellMorph, EBImage | R | SVM | Link to machine learning algorithms |
| CellCognition | Python | Hidden Markov Model | Classifies movies |
| CellXpress | C++ | R package for SVM | Phenotypic profiling |
| Ilastik | Python | Random Forest | Interactive segmentation |
| BIOCAT | Java | Nearest Neighbor, Random Forest, SVM and Decision Trees | User friendly and extensible |
HT29 colon cancer cells with 14 phenotypes
| Phenotypes | Cells |
|---|---|
| Actin blebs (AB) | 107 |
| Actin dots (AD) | 111 |
| Anaphase-Telophase (AT) | 182 |
| Angular cell edges (ACE) | 73 |
| Crescent nuclei (CN) | 185 |
| Large spread cells (LSC) | 201 |
| Long projections (LP) | 59 |
| Metaphase (MP) | 563 |
| Motile (M) | 190 |
| Peas in a pod (PIP) | 34 |
| Peripheral actin (PA) | 59 |
| Phospho-Histone H3 dots (PHD) | 264 |
| Prometaphase (PMP) | 345 |
| Prophase (PP) | 153 |
| Total | 2526 |
Each cell has 615 features.
HeLa cancer cells with 10 phenotypes
| Phenotypes | Cells |
|---|---|
| Actin fiber (AF) | 170 |
| Big cells (BC) | 310 |
| Condensed cells (C) | 338 |
| Debris (D) | 219 |
| Lamellipodia (LA) | 258 |
| Metaphase (MP) | 186 |
| Membrane blebbing (MB) | 110 |
| Normal cells (N) | 542 |
| Protrusion and elongation (P) | 315 |
| Telophase (Z) | 97 |
| Total | 2545 |
Each cell has 51 features.
Figure 1. Dendrograms of merging of phenotypes based upon the dissimilarity matrix obtained from the averaged confusion matrix of four classifiers: gentle boosting, SVM (linear), SVM (RBF) and LDA.
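The idea behind Figure 1 can be illustrated with a small sketch. The record does not state how the dissimilarity matrix was computed from the confusion matrix, so the conversion below (row-normalise, symmetrise, subtract from 1) and the average-linkage merging are assumptions, not necessarily the authors' procedure. The function names and the three-phenotype confusion matrix are hypothetical toy values, not data from the study.

```python
def confusion_to_dissimilarity(conf):
    """Assumed conversion: row-normalise, symmetrise, then take 1 - similarity."""
    n = len(conf)
    rates = [[conf[i][j] / sum(conf[i]) for j in range(n)] for i in range(n)]
    return [[0.0 if i == j else 1.0 - 0.5 * (rates[i][j] + rates[j][i])
             for j in range(n)] for i in range(n)]


def merge_order(dissim, labels):
    """Average-linkage agglomerative clustering; returns the merges in order."""
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise dissimilarity between the two clusters.
                d = sum(dissim[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(labels[i] for i in clusters[a]),
                       tuple(labels[i] for i in clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical confusion matrix (rows: true phenotype, columns: predicted).
labels = ["C", "P", "N"]
toy_confusion = [[80, 15, 5],
                 [12, 83, 5],
                 [4, 6, 90]]
merges = merge_order(confusion_to_dissimilarity(toy_confusion), labels)
for left, right, height in merges:
    print(left, "+", right, "at dissimilarity", round(height, 3))
```

Phenotypes that are frequently confused with each other end up with a low dissimilarity and therefore merge early, which is what a dendrogram of this kind visualises.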
Figure 2. A comparison of performance and cross-validation time with all features of the HT29 and HeLa data sets.
Figure 3. A comparison of performance of SVM (RBF) and SVM (linear) on HT29 and HeLa data sets with different sizes of training sets.
Figure 4. Misclassification of condensed (C) cells and protrusion-elongation (P) cells by SVM (RBF). (a) Correctly classified condensed cells; (b) condensed cells misclassified as protrusion-elongation cells; (c) correctly classified protrusion-elongation cells; (d) protrusion-elongation cells misclassified as condensed cells.
Figure 5. Change in performance of SVM (linear) and SVM (RBF) for HT29 and HeLa data sets by removing cells with lower posterior probabilities of phenotypes.
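The filtering described in the Figure 5 caption can be sketched as follows. This is a minimal illustration that assumes each cell comes with a posterior probability per phenotype and that a cell is removed when the probability of its predicted phenotype falls below a threshold. The function name, the threshold values and the toy posteriors are hypothetical and do not come from the study.

```python
def accuracy_after_rejection(posteriors, true_labels, threshold):
    """Accuracy over cells whose top posterior probability is >= threshold.

    Returns (accuracy, number_of_cells_kept).
    """
    kept = 0
    correct = 0
    for probs, truth in zip(posteriors, true_labels):
        predicted = max(probs, key=probs.get)  # most probable phenotype
        if probs[predicted] < threshold:
            continue  # low-confidence cell is removed from the evaluation
        kept += 1
        correct += predicted == truth
    accuracy = correct / kept if kept else float("nan")
    return accuracy, kept


# Hypothetical posteriors for four cells over two phenotypes.
toy_posteriors = [
    {"C": 0.90, "P": 0.10},
    {"C": 0.55, "P": 0.45},
    {"C": 0.20, "P": 0.80},
    {"C": 0.40, "P": 0.60},
]
toy_truth = ["C", "P", "P", "C"]

for t in (0.0, 0.7):
    acc, n = accuracy_after_rejection(toy_posteriors, toy_truth, t)
    print(f"threshold={t}: accuracy={acc:.2f} on {n} cells")
```

Raising the threshold trades coverage for accuracy: fewer cells are classified, but the remaining predictions are the ones the classifier is most confident about.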