Deogratias Mzurikwao, Muhammad Usman Khan, Oluwarotimi Williams Samuel, Jindrich Cinatl, Mark Wass, Martin Michaelis, Gianluca Marcelli, Chee Siang Ang.
Abstract
Although short tandem repeat (STR) analysis is available as a reliable method for determining the genetic origin of cell lines, misauthenticated cell lines remain an important issue. Reasons include the cost, effort and time associated with STR analysis. Moreover, there are currently no methods for discriminating between isogenic cell lines (cell lines of the same genetic origin, e.g. different cell lines derived from the same organism, clonal sublines, or sublines adapted to grow under certain conditions). Hence, complementary methods, ideally low-cost and low-effort, are required that enable (1) the monitoring of cell line identity as part of the daily laboratory routine and (2) the authentication of isogenic cell lines. In this research, we automate cell line identification by image-based analysis using deep convolutional neural networks. Two convolutional neural network models (MobileNet and InceptionResNet V2) were trained to automatically identify four parental cancer cell lines (COLO-704, EFO-21, EFO-27 and UKF-NB-3) and their sublines adapted to the anti-cancer drugs cisplatin (COLO-704rCDDP1000, EFO-21rCDDP2000, EFO-27rCDDP2000) or oxaliplatin (UKF-NB-3rOXALI2000), resulting in an eight-class problem. Our best-performing model, InceptionResNet V2, achieved an average F1-score of 0.91 on tenfold cross-validation, with an average area under the curve (AUC) of 0.95, for the eight-class problem. When the four parental cell lines and the four drug-adapted sublines were each treated as a separate four-class problem, the same model achieved average F1-scores of 0.94 and 0.96, respectively.
These findings provide a basis for developing deep learning-based cell line authentication into a readily available, easy-to-use methodology that enables routine monitoring of cell line identity, including that of isogenic cell lines. Note that this is only a proof of principle that images can also be used to authenticate cancer cell lines; it is not a replacement for the STR method.
Year: 2020 PMID: 33199764 PMCID: PMC7670423 DOI: 10.1038/s41598-020-76670-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
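The evaluation protocol named in the abstract (tenfold cross-validation scored by F1) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names `stratified_folds` and `macro_f1` are hypothetical, the use of stratified folds and of the macro (unweighted) average are assumptions, and the labels are synthetic placeholders rather than real image labels.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Synthetic example: 8 classes with 20 samples each (placeholder data).
labels = [c for c in range(8) for _ in range(20)]
folds = stratified_folds(labels, k=10)
fold_sizes = [len(fold) for fold in folds]
```

In each round, one fold serves as the test set and the remaining nine as the training set; the reported score would then be the mean of the ten per-fold F1 values.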
Training strategies selection.
| Grayscale | RGB | Without data augmentation | With data augmentation | Without transfer learning | With transfer learning | Combination number |
|---|---|---|---|---|---|---|
| ✓ | ✓ | ✓ | I | |||
| ✓ | Nearest | ✓ | II | |||
| ✓ | Constant | ✓ | III | |||
| ✓ | ✓ | ✓ | IV | |||
| ✓ | Nearest | ✓ | V | |||
| ✓ | Constant | ✓ | VI | |||
| ✓ | ✓ | ✓ | VII | |||
| ✓ | Nearest | ✓ | VIII | |||
| ✓ | Constant | ✓ | IX |
Number of images per cell line.
| Cell line | Number of images (n) |
|---|---|
| COLO-704 | 220 |
| COLO-704rCDDP1000 | 270 |
| EFO-21 | 220 |
| EFO-21rCDDP2000 | 220 |
| EFO-27 | 220 |
| EFO-27rCDDP2000 | 220 |
| UKF-NB-3 | 201 |
| UKF-NB-3rOXALI4000 | 170 |
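The counts above imply a mildly imbalanced dataset. The short calculation below is only a worked check of the totals and class shares derived from the table; the variable names are illustrative.

```python
# Image counts copied from the table above.
image_counts = {
    "COLO-704": 220,
    "COLO-704rCDDP1000": 270,
    "EFO-21": 220,
    "EFO-21rCDDP2000": 220,
    "EFO-27": 220,
    "EFO-27rCDDP2000": 220,
    "UKF-NB-3": 201,
    "UKF-NB-3rOXALI4000": 170,
}

total_images = sum(image_counts.values())
largest = max(image_counts, key=image_counts.get)
smallest = min(image_counts, key=image_counts.get)

# Share of the dataset held by each class, and the largest-to-smallest ratio.
class_share = {name: n / total_images for name, n in image_counts.items()}
imbalance_ratio = image_counts[largest] / image_counts[smallest]

print(total_images)  # 1741
```

A classifier that always predicted the largest class would be correct on about 15.5% of images (270 of 1741), which gives a simple baseline against which the reported scores can be read.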
Figure 1. Cancer cell line sample images.
Figure 2. Pilot classification.
Model comparison.
| Model | Test F1-score |
|---|---|
| InceptionResNet V2 | 0.88 |
| MobileNet | 0.82 |
Figure 3. Learning curves for model comparisons.
Four-class authentication.
| Cell type | Mean F1 score | Standard deviation |
|---|---|---|
| Parental cancer cell lines | 0.96 | 0.02 |
| Drug-treated cancer cell lines | 0.91 | 0.03 |
Eight-class authentication.
| Cell type | Mean F1 score | Standard deviation |
|---|---|---|
| Combined cancer cell lines | 0.91 | 0.03 |
Two-class authentication.
| Cell line | Average F1 score | Standard deviation |
|---|---|---|
| COLO-704/ COLO-704rCDDP1000 | 0.90 | 0.01 |
| EFO-21/ EFO-21rCDDP2000 | 0.94 | 0.02 |
| EFO-27/ EFO-27rCDDP2000 | 0.98 | 0.03 |
| UKF-NB-3/ UKF-NB-3rOXALI4000 | 0.98 | 0.03 |
| Average | 0.95 | 0.02 |
Tenfold cross-validation with reduced training sample size.
| Percentage (%) of training sample size | Mean F1-score | Standard deviation |
|---|---|---|
| 100 | 0.91 | 0.03 |
| 80 | 0.88 | 0.04 |
| 60 | 0.80 | 0.12 |
| 40 | 0.72 | 0.13 |
| 20 | 0.57 | 0.16 |
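One way to produce the reduced training sets in the table above is to keep a fixed fraction of each class. The source does not state how the subsampling was done, so the per-class strategy and the helper name `subsample_per_class` below are assumptions made for illustration.

```python
import random


def subsample_per_class(indices_by_class, fraction, seed=0):
    """Keep roughly `fraction` of the training indices of every class (assumed strategy)."""
    rng = random.Random(seed)
    kept = {}
    for label, indices in indices_by_class.items():
        # Always keep at least one sample so no class disappears entirely.
        n_keep = max(1, round(len(indices) * fraction))
        kept[label] = sorted(rng.sample(indices, n_keep))
    return kept


# Placeholder training indices: two classes with 20 samples each.
training_indices = {
    "class_a": list(range(0, 20)),
    "class_b": list(range(20, 40)),
}

# Subsets at the training sizes listed in the table (100% down to 20%).
subsets = {
    percent: subsample_per_class(training_indices, percent / 100)
    for percent in (100, 80, 60, 40, 20)
}
```

Only the training portion of each fold would be reduced in this scheme; the held-out fold stays unchanged so that scores at different training sizes remain comparable.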
Model performance per class.
| Class | Mean F1-score | Standard deviation |
|---|---|---|
| COLO-704rCDDP1000 | 0.95 | 0.03 |
| COLO-704 | 0.96 | 0.03 |
| EFO-21rCDDP2000 | 0.76 | 0.12 |
| EFO-21 | 0.95 | 0.04 |
| EFO-27rCDDP2000 | 0.71 | 0.16 |
| EFO-27 | 0.95 | 0.06 |
| UKF-NB-3rOXALI4000 | 0.98 | 0.04 |
| UKF-NB-3 | 0.99 | 0.03 |
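As a consistency check, the unweighted mean of the per-class scores above reproduces the eight-class mean F1 of 0.91. The snippet below also averages the parental lines and the drug-adapted sublines separately; the grouping by the `rCDDP`/`rOXALI` name suffix is an illustrative choice, not something the source prescribes.

```python
# Per-class mean F1 scores copied from the table above.
per_class_f1 = {
    "COLO-704rCDDP1000": 0.95,
    "COLO-704": 0.96,
    "EFO-21rCDDP2000": 0.76,
    "EFO-21": 0.95,
    "EFO-27rCDDP2000": 0.71,
    "EFO-27": 0.95,
    "UKF-NB-3rOXALI4000": 0.98,
    "UKF-NB-3": 0.99,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def is_adapted(name):
    """Drug-adapted sublines carry an 'rCDDP' or 'rOXALI' suffix in their name."""
    return "rCDDP" in name or "rOXALI" in name


macro_f1 = mean(per_class_f1.values())
parental_mean = mean(f1 for name, f1 in per_class_f1.items() if not is_adapted(name))
adapted_mean = mean(f1 for name, f1 in per_class_f1.items() if is_adapted(name))

print(round(macro_f1, 2))  # 0.91
```

Within the eight-class setting, the drug-adapted sublines average about 0.85 against roughly 0.96 for the parental lines, a gap driven by EFO-21rCDDP2000 and EFO-27rCDDP2000.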
Model confidence.
| Cell line | Correct confidence | Wrong confidence |
|---|---|---|
| COLO-704rCDDP1000 | 0.93 | 0.76 |
| COLO-704 | 0.99 | 0.74 |
| EFO-21rCDDP2000 | 0.78 | 0.70 |
| EFO-21 | 0.96 | 0.73 |
| EFO-27rCDDP2000 | 0.76 | 0.75 |
| EFO-27 | 0.96 | 0.79 |
| UKF-NB-3rOXALI4000 | 0.97 | 0.68 |
| UKF-NB-3 | 1.0 | 0 |
| Average | 0.92 | 0.64 |
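The source does not define "confidence" here. A common reading, assumed in the sketch below, is the softmax probability that the model assigns to its predicted class, averaged separately over correct and wrong predictions. The function name `mean_confidence` and the sample probabilities are hypothetical.

```python
def mean_confidence(probabilities, true_labels):
    """Return (mean confidence on correct predictions, mean confidence on wrong ones).

    `probabilities` holds one list of class probabilities per sample.
    Confidence is taken to be the probability of the predicted class (an assumption).
    """
    correct, wrong = [], []
    for probs, true in zip(probabilities, true_labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        bucket = correct if predicted == true else wrong
        bucket.append(probs[predicted])

    def mean_or_zero(values):
        # Report 0.0 when a bucket is empty, e.g. a class with no wrong predictions.
        return sum(values) / len(values) if values else 0.0

    return mean_or_zero(correct), mean_or_zero(wrong)


# Hypothetical softmax outputs for three samples of a two-class problem.
sample_probabilities = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
sample_labels = [0, 1, 1]
correct_conf, wrong_conf = mean_confidence(sample_probabilities, sample_labels)
```

Under this reading, a wrong-confidence entry of 0 (as for UKF-NB-3) would simply mean that no misclassified images were available to average over.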
Investigation of EFO-21 and EFO-27.
| Cell line | Mean F1-score | Standard deviation |
|---|---|---|
| EFO-21/EFO-27 Parental | 0.94 | 0.01 |
| EFO-21/EFO-27 Drug-treated | 0.60 | 0.05 |