Mirae Kim, Soonwoo Hong, Thomas E Yankeelov, Hsin-Chih Yeh, Yen-Liang Liu.
Abstract
MOTIVATION: Motions of transmembrane receptors on cancer cell surfaces can reveal biophysical features of the cancer cells, thus providing a method for characterizing cancer cell phenotypes. Conventional analysis of receptor motions in the cell membrane relies mostly on mean-squared displacement (MSD) plots, but much information is lost when these plots are produced from the trajectories. Here we employ deep learning to classify breast cancer cell types based on the trajectories of epidermal growth factor receptor (EGFR). Our model is an artificial neural network trained on the EGFR motions acquired from six breast cancer cell lines of varying invasiveness and receptor status: MCF7 (hormone receptor-positive), BT474 (HER2-positive), SKBR3 (HER2-positive), MDA-MB-468 (triple-negative, TN), MDA-MB-231 (TN), and BT549 (TN).
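To illustrate why a mean-squared displacement (MSD) plot discards information, the sketch below computes a time-averaged MSD for a single two-dimensional trajectory. Every pair of positions separated by a given lag is collapsed into one average, so the order and local structure of the steps no longer appear in the result. The function name, the track format (a list of `(x, y)` tuples sampled at a fixed frame interval), and the example track are assumptions made for illustration; they are not taken from the paper.

```python
def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames).

    `track` is a list of (x, y) positions; max_lag must be < len(track).
    """
    msd = []
    for lag in range(1, max_lag + 1):
        # Squared displacement for every pair of points `lag` frames apart.
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


# Hypothetical track: steady motion of one unit per frame along x.
track = [(float(i), 0.0) for i in range(6)]
msd_values = mean_squared_displacement(track, 3)
```

Each entry of `msd_values` is a single number per lag, whereas a classifier that reads the raw trajectory sees every position.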
Year: 2021 PMID: 34390568 PMCID: PMC8696113 DOI: 10.1093/bioinformatics/btab581
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic diagram of deep learning classification of breast cancer cells based on TReD. The deep learning model is a 14-layer variant of a residual neural network (ResNet) optimized using stochastic gradient descent (SGD) with categorical cross-entropy as the loss function. The model takes the two-dimensional trajectories of EGFR as inputs and outputs a probability for each class; the class with the highest probability is taken as the model prediction. The predictions were then summarized in a confusion matrix for a quick assessment of model performance.
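The last step of the Fig. 1 pipeline, turning per-class probabilities into predictions and then into a confusion matrix, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the three-class probabilities, and the integer labels are all hypothetical.

```python
def predict_classes(probabilities):
    """Return the index of the highest-probability class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


# Hypothetical model outputs for four trajectories and three classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
]
true_labels = [0, 1, 2, 2]

predicted = predict_classes(probs)
cm = confusion_matrix(true_labels, predicted, n_classes=3)
```

Here the third trajectory (true class 2) is assigned to class 1, so it appears off the diagonal in row 2 of `cm`.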
Receptor status for each of the cell lines used in training and testing the model
| Cell line | ER | PR | HER2 |
|---|---|---|---|
| MCF7 | + | + | − |
| BT474 | + | + | + |
| SKBR3 | − | − | + |
| MDA-MB-468 | − | − | − |
| MDA-MB-231 | − | − | − |
| BT549 | − | − | − |
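The grouping of the six cell lines into receptor-status subtypes can be expressed as a small lookup. The rule below is an assumption inferred from the labels given in the abstract (for example, BT474 is ER+/PR+/HER2+ but is labeled HER2-positive, so HER2 status is checked first); the dictionary layout and function name are hypothetical.

```python
# Receptor status per cell line, transcribed from the table above.
RECEPTOR_STATUS = {
    "MCF7": {"ER": True, "PR": True, "HER2": False},
    "BT474": {"ER": True, "PR": True, "HER2": True},
    "SKBR3": {"ER": False, "PR": False, "HER2": True},
    "MDA-MB-468": {"ER": False, "PR": False, "HER2": False},
    "MDA-MB-231": {"ER": False, "PR": False, "HER2": False},
    "BT549": {"ER": False, "PR": False, "HER2": False},
}


def subtype(status):
    """Assumed rule: HER2 takes precedence, then ER/PR, otherwise TN."""
    if status["HER2"]:
        return "HER2-positive"
    if status["ER"] or status["PR"]:
        return "hormone receptor-positive"
    return "triple-negative"


subtypes = {line: subtype(status) for line, status in RECEPTOR_STATUS.items()}
```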
Fig. 2. Cell line classification results with an overall accuracy of 83%. (A) Normalized confusion matrix showing rates of correct classifications and misclassifications for each cell-line sample (i.e., label). (B) UMAP showing the clustering of the low-dimensional projections according to their cell-line origins. The green circle indicates overlapping clusters of the two HER2+ cell lines (BT474 and SKBR3). The orange circle indicates overlapping clusters of the two TN cell lines (MDA-MB-468 and BT549).
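A normalized confusion matrix and an overall accuracy of the kind reported in this caption can be derived from raw counts as sketched below. The caption does not say how the normalization was done; dividing each row by the number of samples with that true label is an assumption, and the two-class counts are hypothetical.

```python
def normalize_rows(matrix):
    """Divide each row by its total so entries become per-label rates."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts: rows are true labels, columns are predictions.
counts = [[8, 2], [1, 9]]
rates = normalize_rows(counts)
accuracy = overall_accuracy(counts)
```

With row normalization, each diagonal entry of `rates` is the rate of correct classification for that label, and the off-diagonal entries in the same row are its misclassification rates.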
Fig. 3. Receptor status classification results with an overall accuracy of 85%. (A) Normalized confusion matrix showing rates of correct classifications and misclassifications for each receptor status subtype. (B) UMAP showing the clustering of the low-dimensional projections according to their receptor status.
Fig. 4. Validation of the trained models using a newly reassembled dataset. The classification of six cell lines and receptor statuses achieved 82% and 90% overall accuracies, respectively. (A) Confusion matrix on the new dataset, using the pretrained model from Figure 2. (B) Confusion matrix on the new dataset, using the pretrained model from Figure 3.
Fig. 5. Receptor status classification using the trained model before and after the induction of EMT. The rate of TN classification increased by 20% for MCF10A and by 142% for MCF7, and decreased by 22% for MDA-MB-231. (A) Confusion matrix of the three cell lines before EMT induction. (B) Confusion matrix of the three cell lines after EMT induction. After EMT induction, both MCF10A and MCF7 cells are even more likely to be classified as TN cells (red squares).
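The percentage changes in this caption read most naturally as relative changes in the rate of TN classification (a 142% change cannot be a difference in percentage points). The caption does not state this explicitly, so the sketch below is an assumption; the before and after rates are hypothetical values chosen only to show the arithmetic.

```python
def relative_change_percent(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Hypothetical TN classification rates before and after EMT induction.
increase = relative_change_percent(before=0.25, after=0.30)
decrease = relative_change_percent(before=0.50, after=0.39)
```

Under this reading, a rise from 0.25 to 0.30 is reported as a 20% increase, and a drop from 0.50 to 0.39 as a 22% decrease.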