Lei Tong, Adam Corrigan, Navin Rathna Kumar, Kerry Hallbrook, Jonathan Orme, Yinhai Wang, Huiyu Zhou.
Abstract
Cell line authentication is important in the biomedical field to ensure that researchers are not working with misidentified cells. Short tandem repeat (STR) profiling is the gold-standard method but has its own limitations, including being expensive and time-consuming. Deep neural networks have achieved great success in the cost-effective analysis of cellular images. However, because of the lack of centralized, publicly available datasets, whether cell line authentication can be replaced or supported by cell image classification remains an open question. Moreover, the relationship between incubation times and cellular images has not been explored in previous studies. In this study, we automated the process of cell line authentication using deep learning analysis of brightfield cell line images. We propose a novel multi-task framework that simultaneously identifies cell lines from cell images and predicts how long the cell lines have been incubated. Using data from thirty cell lines in the AstraZeneca Cell Bank, we demonstrate that our proposed method identifies cell lines from brightfield images with 99.8% accuracy and predicts incubation durations with a coefficient of determination (R2) score of 0.927. Considering that new cell lines are continually added to the AstraZeneca Cell Bank, we integrated transfer learning into the proposed system to handle data from new cell lines not included in the pre-trained model. Our method achieved excellent performance, with a precision of 97.7% and a recall of 95.8%, in detecting 14 new cell lines. These results demonstrate that our proposed framework can effectively identify cell lines using brightfield images.
Year: 2022 PMID: 35550583 PMCID: PMC9098893 DOI: 10.1038/s41598-022-12099-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Example images of three cell lines at different incubation durations (hours). In the three examples shown here, three discrete time points (24, 48 and 72 h) were taken for A549, A431 and T47D. With increasing incubation time, we observed increased cell counts and confluency and the formation of colonies. Notable single-cell morphology can also be observed, e.g. A549 cells are more elongated in shape than A431 cells, whereas T47D cells are typically larger in size and round.
Figure 2. The framework of the proposed automated cell line authentication system. In the data preparation stage, cell images are collected from 30 cell lines using the high-throughput IncuCyte microscopy technique, and each cell image carries two separate labels: cell line name and incubation time. The deep learning network CLCNet then learns image-level features from the input cell images with their cell line labels and outputs predicted classes for test cell images. Once CLCNet is trained, the convolutional features of the training data are extracted to train CLRNet, which predicts how long the cell lines have been incubated.
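The two-stage idea (train a classifier first, then reuse its learned features to train a regressor for incubation time) can be sketched with simple scikit-learn stand-ins. This is a minimal analogue, not the paper's implementation: the synthetic features, class counts and model choices (`LogisticRegression` in place of CLCNet, `Ridge` in place of CLRNet) are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted image features: 3 hypothetical "cell
# lines", each image also carrying an incubation time that shifts features.
n, d = 300, 16
cell_line = rng.integers(0, 3, size=n)           # classification target
incubation_h = rng.uniform(0, 72, size=n)        # regression target (hours)
features = rng.normal(size=(n, d))
features[:, 0] += cell_line * 4.0                # line-specific signal
features[:, 1] += incubation_h / 10.0            # time-dependent signal

# Stage 1 ("CLCNet" analogue): train a classifier on the features.
clf = LogisticRegression(max_iter=1000).fit(features, cell_line)

# Stage 2 ("CLRNet" analogue): reuse the same shared features (in the
# paper, CLCNet's convolutional features) to fit an incubation-time regressor.
reg = Ridge().fit(features, incubation_h)

print("classification accuracy:", clf.score(features, cell_line))
print("regression R2:", reg.score(features, incubation_h))
```

The key design point mirrored here is that the regression head is trained on features produced by the already-trained classification model, rather than on raw inputs.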
Classification and regression results on the 30 cell lines dataset [mean value ± standard deviation] by four deep networks. Accuracy, Precision, Recall and F1-score are classification metrics; MSE and R2-score are regression metrics.

| Backbones | Accuracy | Precision | Recall | F1-score | MSE | R2-score |
|---|---|---|---|---|---|---|
| ResNet50 | 0.987 | 0.984 | 0.987 | 0.985 | 452.533 | 0.880 |
| VGG19 | 0.975 | 0.968 | 0.972 | 0.972 | 677.637 | 0.821 |
| MobileNet | 0.965 | 0.952 | 0.960 | 0.956 | 769.111 | 0.797 |
| Xception | | | | | | |
R2-score: coefficient of determination regression score. The best results are shown in bold.
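The two regression metrics reported in the table are standard and easy to compute by hand. The sketch below shows their definitions on hypothetical incubation-time values (the example numbers are not from the paper).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical incubation times (hours) and predictions, for illustration.
y_true = [24.0, 48.0, 72.0, 96.0]
y_pred = [26.0, 45.0, 70.0, 98.0]
print(mse(y_true, y_pred))       # mean of squared errors
print(r2_score(y_true, y_pred))  # 1 means a perfect fit
```

An R2-score of 1 indicates a perfect fit, while 0 means the model does no better than predicting the mean incubation time.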
Figure 3. Visualization of the classification and regression results on the 30 cell lines’ dataset. (A) Confusion matrices of the 30-category classification; the cell line names corresponding to the coordinates are shown in the legend on the right. Compared with ResNet50, VGG19 and MobileNet, the Xception model made fewer prediction errors. (B) Scatter plots of predicted vs. actual incubation durations. Here we show the prediction results on test fold-1 as an example; the complete cross-validation results are shown in Fig. S2. Xception outperformed the other three methods in predicting incubation time, with R2-score = 0.939.
Figure 4. t-SNE embedding of our 30 cell lines. The t-SNE tool reduces the dimensionality of CLCNet’s convolutional features and visualizes the processed features in 2D space. Each dot represents one cell image and is colored by its cell line name. There are clear gaps between different cell lines, which validates that the CLCNet model can distinguish the 30 cell lines well. Example images of the 30 cell lines are shown in Fig. S6.
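Producing a plot like Figure 4 follows a simple recipe: take the high-dimensional feature vectors, run t-SNE down to 2D, and color each point by its label. A minimal sketch with scikit-learn, using random synthetic features in place of CLCNet's convolutional features (the dimensions and class counts are assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for CLCNet convolutional features: 3 hypothetical "cell lines",
# 40 images each, 32-dimensional features separated by class.
labels = np.repeat(np.arange(3), 40)
feats = rng.normal(size=(120, 32)) + labels[:, None] * 5.0

# Reduce to 2D for plotting; perplexity must be smaller than the sample count.
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(feats)
print(emb.shape)  # one 2D point per image, ready to color by cell line
```

Each row of `emb` would then be scattered in 2D and colored by `labels`, which is exactly how the per-cell-line clusters in the figure are rendered.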
Figure 5. t-SNE plots of three example cell lines (HT1080, PC3 and KELLY). These plots enlarge the distributions of the three cell lines from Fig. 4. Each dot is colored by its incubation duration range (e.g. 0–24 h, 24–48 h). Interestingly, samples with similar incubation times are located in adjacent areas; for example, the HT1080 samples were well clustered, and samples within 0–24 h lay close to samples within 24–48 h.
Figure 6. Performance of the transfer learning technique in identifying 14 new cell lines. (A) Confusion matrix of the 14-cell-line classification. (B) t-SNE plot of 44 cell lines. The pink dots represent data from the original 30 cell lines, and dots of other colors represent data from the 14 new cell lines. The samples of the 14 new cell lines are mapped to the margin space between the 30 cell lines. Example images of the 14 cell lines are shown in Fig. S7. (C) Regression results for test fold-1 of the cross-validation. (D) Comparison of convergence speed between full model training from scratch and final-layer fine-tuning using transfer learning. “Training from scratch” means that all layers of Xception are re-trained on the dataset. (E) Comparison of the training times.
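The transfer-learning contrast in panel (D) boils down to keeping the pre-trained feature extractor frozen and fitting only a fresh final layer on the new classes. A minimal sketch, assuming a fixed random projection as a stand-in for the pre-trained Xception backbone and a logistic-regression head as the new final layer (all names and data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection standing in for the
# pre-trained convolutional layers (its weights are never updated).
W_backbone = rng.normal(size=(8, 32))
def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)  # fixed ReLU features

# Images from 2 hypothetical *new* cell lines the backbone never saw.
y_new = rng.integers(0, 2, size=200)
x_new = rng.normal(size=(200, 8)) + y_new[:, None] * 3.0

# Transfer learning: extract features with the frozen backbone, then
# fit only a fresh final classification layer on the new classes.
feats = backbone(x_new)
head = LogisticRegression(max_iter=1000).fit(feats, y_new)
print("training accuracy on new cell lines:", head.score(feats, y_new))
```

Because only the small final layer is optimized, this converges much faster than re-training every layer from scratch, which is the speed gap panels (D) and (E) illustrate.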
Classification and regression results of the transfer learning technique for identifying the 44 cell lines.
| Metrics | 14 cell lines | 30 cell lines | Overall |
|---|---|---|---|
| Accuracy | 0.965 | 0.998 | 0.997 |
| Precision | 0.977 | 0.997 | 0.991 |
| Recall | 0.958 | 0.996 | 0.984 |
| F1-score | 0.968 | 0.997 | 0.987 |
| MSE | 526.230 | 232.690 | 263.360 |
| R2-score | 0.853 | 0.939 | 0.932 |
The “14 cell lines” column reports results on the portion of the test set drawn from the 14 new cell lines, and the “30 cell lines” column reports results on the portion drawn from the original 30 cell lines. The “Overall” column shows results for the whole test set (44 cell lines).