Scott J Warchal, John C Dawson, Neil O Carragher.
Abstract
Multiparametric high-content imaging assays have become established to classify cell phenotypes from functional genomic and small-molecule library screening assays. Several groups have implemented machine learning classifiers to predict the mechanism of action of phenotypic hit compounds by comparing the similarity of their high-content phenotypic profiles with a reference library of well-annotated compounds. However, the majority of such examples are restricted to a single cell type, often selected because of its suitability for simple image analysis and intuitive segmentation of morphological features. The aim of the current study was to evaluate and compare the performance of a classic ensemble-based tree classifier trained on extracted morphological features and a deep learning classifier using convolutional neural networks (CNNs) trained directly on images from the same dataset to predict compound mechanism of action across a morphologically and genetically distinct cell panel. Our results demonstrate that application of a CNN classifier delivers equivalent accuracy compared with an ensemble-based tree classifier at compound mechanism of action prediction within cell lines. However, our CNN analysis performs worse than an ensemble-based tree classifier when trained on multiple cell lines at predicting compound mechanism of action on an unseen cell line.
Keywords: cancer and cancer drugs; cell-based assays; high-content screening; machine learning
Year: 2019 PMID: 30694704 PMCID: PMC6484528 DOI: 10.1177/2472555218820805
Source DB: PubMed Journal: SLAS Discov ISSN: 2472-5552 Impact factor: 3.341
Panel of Breast Cancer Cell Lines Chosen for Study.
| Cell Line | Subclass | PTEN Mutation Status | PI3K Mutation Status |
|---|---|---|---|
| MCF7 | ER | WT | E545K |
| T47D | ER | WT | H1047R |
| MDA-MB-231 | TN | WT | WT |
| MDA-MB-157 | TN | WT | WT |
| HCC1569 | HER2 | WT | WT |
| SKBR3 | HER2 | WT | WT |
| HCC1954 | HER2 | ? | H1047R |
| KPL4 | HER2 | ? | H1047R |
PTEN = phosphatase and tensin homolog; PI3K = phosphoinositide 3-kinase; ER = estrogen receptor; TN = triple negative; HER2 = human epidermal growth factor receptor 2; WT = wild-type; ? = lack of consensus regarding the mutational status. The breast cancer cell line mutational status was taken from Dai et al.[19]
Annotated Compounds and Their Associated MoA Label Used in the Classification Tasks.
| Compound | Class | Subclass | Supplier | Cat. No. |
|---|---|---|---|---|
| Paclitaxel | Microtubule disrupting | Microtubule stabilizer | Sigma | T7402 |
| Epothilone B | Microtubule disrupting | Microtubule stabilizer | Selleckchem | S1364 |
| Colchicine | Microtubule disrupting | Microtubule destabilizer | Sigma | C9754 |
| Nocodazole | Microtubule disrupting | Microtubule destabilizer | Sigma | M1404 |
| Monastrol | Microtubule disrupting | Eg5 kinesin inhibitor | Sigma | M8515 |
| ARQ621 | Microtubule disrupting | Eg5 kinesin inhibitor | Selleckchem | S7355 |
| Barasertib | Aurora B inhibitor | Aurora B inhibitor | Selleckchem | S1147 |
| ZM447439 | Aurora B inhibitor | Aurora B inhibitor | Selleckchem | S1103 |
| Cytochalasin D | Actin disrupting | Actin disrupter | Sigma | C8273 |
| Cytochalasin B | Actin disrupting | Actin disrupter | Sigma | C6762 |
| Jasplakinolide | Actin disrupting | Actin stabilizer | Tocris | 2792 |
| Latrunculin B | Actin disrupting | Actin stabilizer | Sigma | L5288 |
| MG132 | Protein degradation | Proteasome | Selleckchem | S2619 |
| Lactacystin | Protein degradation | Proteasome | Tocris | 2267 |
| ALLN | Protein degradation | Cysteine/calpain | Sigma | A6165 |
| ALLM | Protein degradation | Cysteine/calpain | Sigma | A6060 |
| Emetine | Protein synthesis | Protein synthesis | Sigma | E2375 |
| Cycloheximide | Protein synthesis | Protein synthesis | Sigma | 1810 |
| Dasatinib | Kinase inhibitor | Src-EMT | Selleckchem | S1021 |
| Saracatinib | Kinase inhibitor | Src-EMT | Selleckchem | S1006 |
| Lovastatin | Statin | Statin | Sigma | PHR1285 |
| Simvastatin | Statin | Statin | Sigma | PHR1438 |
| Camptothecin | DNA damaging agent | Topoisomerase 1 inhibitor | Selleckchem | S1288 |
| SN38 | DNA damaging agent | Topoisomerase 1 inhibitor | Selleckchem | S4908 |
Figure 1. Workflow diagram. (a) Fluorescent cell images are segmented and morphological features are measured in CellProfiler; these features are used to train an ensemble-based tree classifier to predict compound MoAs. (b) Fluorescent cell images are cropped into 300 × 300 pixel regions around each cell and used as labeled input for a CNN classifier to predict compound MoA.
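The cropping step in panel (b) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `crop_window`, the choice to shift windows back inside the image at the border, and the image dimensions in the usage note are all hypothetical. The caption only states that regions are 300 × 300 pixels around each cell.

```python
def crop_window(cx, cy, width, height, size=300):
    """Return (left, top, right, bottom) of a size x size window around (cx, cy).

    Assumption: a window that would cross the image border is shifted back
    inside the image, rather than padded or discarded.
    """
    if size > width or size > height:
        raise ValueError("window is larger than the image")
    half = size // 2
    # Clamp the top-left corner so the whole window stays inside the image.
    left = min(max(cx - half, 0), width - size)
    top = min(max(cy - half, 0), height - size)
    return left, top, left + size, top + size
```

For a hypothetical 2160 × 2160 pixel image, a cell centred at (1000, 1000) would yield the window (850, 850, 1150, 1150), while a cell near a corner would yield a window flush with that corner.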
Figure 2. Confusion matrices for MoA prediction when trained and predicted on the same cell line. Data were split into 70%/30% training/test sets. (a) Ensemble-based tree classifier. (b) ResNet18 CNN classifier.
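A 70%/30% split of the kind described in this caption might look like the sketch below. The caption does not say what unit was split (images, wells, or individual cells) or whether the split was stratified, so the generic `items` argument, the fixed seed, and the plain shuffle are assumptions made for illustration.

```python
import random


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle items reproducibly and return (train, test) lists.

    Assumption: a simple unstratified random split; the source does not
    specify the splitting unit or any stratification.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `train_test_split(range(100))` returns 70 training items and 30 test items with no overlap between the two lists.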
Figure 3. Confusion matrices for MoA prediction when trained on seven cell lines and tested on an unseen cell line. Titles indicate the unseen cell line. (a) Ensemble-based tree classifier. (b) ResNet18 CNN classifier trained on balanced class sizes by undersampling overrepresented MoA classes.
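The two ideas in this caption, holding out one cell line at a time and undersampling overrepresented MoA classes, can be sketched as follows. The function names are hypothetical, and reducing every class to the size of the smallest class is an assumption: the caption says only that overrepresented classes were undersampled.

```python
import random
from collections import defaultdict

# The eight cell lines listed in the panel table.
CELL_LINES = ["MCF7", "T47D", "MDA-MB-231", "MDA-MB-157",
              "HCC1569", "SKBR3", "HCC1954", "KPL4"]


def leave_one_cell_line_out(cell_lines):
    """Yield (training_lines, held_out_line) pairs, one per cell line."""
    for held_out in cell_lines:
        yield [c for c in cell_lines if c != held_out], held_out


def undersample(samples, seed=0):
    """Balance (item, moa_label) pairs by random undersampling.

    Assumption: every class is reduced to the size of the smallest class.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)
    n = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        balanced.extend((item, label) for item in rng.sample(items, n))
    return balanced
```

Each of the eight folds trains on seven cell lines and tests on the remaining one, and `undersample` would be applied to the training samples of a fold before fitting the CNN.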
Figure 4. The effect of training with additional cell lines when predicting MoAs on a withheld 30% dataset of the MDA-MB-231 cell line. Box plots show testing accuracy when trained with different combinations of the additional cell lines.
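The "different combinations of the additional cell lines" can be enumerated as in the sketch below. The function name is hypothetical, and the caption does not state whether every possible subset was evaluated, so enumerating all subset sizes (including the empty set, i.e., training on MDA-MB-231 alone) is an assumption.

```python
from itertools import combinations


def additional_line_combinations(target, cell_lines):
    """Yield every subset of the non-target cell lines, smallest first.

    Assumption: all subset sizes from 0 to len(others) are of interest.
    """
    others = [c for c in cell_lines if c != target]
    for k in range(len(others) + 1):
        yield from combinations(others, k)
```

With the eight-line panel and MDA-MB-231 as the target, seven other lines remain, which gives 2^7 = 128 possible subsets; grouping results by subset size would give one box per number of additional cell lines.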