James Hong, Laureen D Hachem, Michael G Fehlings.
Abstract
Application of deep learning methods to transcriptomic data has the potential to enhance the accuracy and efficiency of tissue classification and cell state identification. Herein, we developed a multitask deep learning model for tissue classification combining publicly available whole transcriptomic (RNA-seq) datasets of non-neoplastic, neoplastic and peri-neoplastic tissue to classify disease state, tissue origin and neoplastic subclass. RNA-seq data from a total of 10,116 patient samples processed through a common pipeline were used for model training and validation. The model achieved 99% accuracy for disease state classification (ROC-AUC of 0.98) and 97% accuracy for tissue origin (ROC-AUC of 0.99). Moreover, the model achieved an accuracy of 92% (ROC-AUC 0.95) for neoplastic subclassification. This is the first multitask deep learning algorithm developed for tissue classification employing a uniform pipeline analysis of transcriptomic data with multiple tissue classifiers. This model serves as a framework for incorporating large transcriptomic datasets across conditions to facilitate clinical diagnosis and cell-based treatment strategies.
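To make the multitask idea concrete, the following is a minimal sketch in plain Python of a shared hidden layer feeding two task heads (disease state and tissue origin), trained against a summed cross-entropy loss. It is an illustration only, not the authors' architecture: the layer sizes, the parameter names (`shared_w`, `disease_w`, `tissue_w`), the toy weights and the unweighted sum of task losses are all assumptions, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one sample given its integer class label."""
    return -math.log(softmax(logits)[target])


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def multitask_forward(x, params):
    """Shared trunk followed by one output head per task (hypothetical layout)."""
    hidden = relu(linear(x, params["shared_w"], params["shared_b"]))
    return {
        "disease_state": linear(hidden, params["disease_w"], params["disease_b"]),
        "tissue_origin": linear(hidden, params["tissue_w"], params["tissue_b"]),
    }


def multitask_loss(logits_by_task, targets_by_task):
    """Assumed joint objective: unweighted sum of per-task cross-entropies."""
    return sum(cross_entropy(logits_by_task[task], targets_by_task[task])
               for task in logits_by_task)


# Toy example: 3 expression features, 2 hidden units,
# 3 disease-state classes and 2 tissue classes (all values made up).
toy_params = {
    "shared_w": [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]],
    "shared_b": [0.0, 0.1],
    "disease_w": [[0.5, -0.3], [0.1, 0.2], [-0.4, 0.6]],
    "disease_b": [0.0, 0.0, 0.0],
    "tissue_w": [[0.3, 0.3], [-0.2, 0.1]],
    "tissue_b": [0.0, 0.0],
}
toy_logits = multitask_forward([1.0, 0.5, -0.2], toy_params)
toy_loss = multitask_loss(toy_logits, {"disease_state": 1, "tissue_origin": 0})
```

Because both heads read the same hidden representation, gradients from either task would update the shared layer, which is the usual motivation for a multitask design.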
Year: 2022; PMID: 35690622; PMCID: PMC9188604; DOI: 10.1038/s41598-022-13665-5
Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.996
Figure 1. Bayesian Hyperparameter Tuning of Deep Learning Models. (A) Search space of hyperparameters for Bayesian tuning; (B) Architecture of multitask classifier for disease state and tissue origin along with tuned hyperparameters; (C) Architecture of neoplastic subtype classifier along with tuned hyperparameters.
Disease state classifier.
| Metric | Value |
| --- | --- |
| Accuracy | 0.9882 |
| Balanced accuracy | 0.9675 |
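The gap between accuracy and balanced accuracy is easiest to see from a confusion matrix. The sketch below assumes the common definition of balanced accuracy as the unweighted mean of per-class recall; the record does not state which definition the authors used, and the toy matrix is invented for illustration rather than taken from the paper.

```python
def accuracy(confusion):
    """Fraction of all samples on the diagonal (rows = true class)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def balanced_accuracy(confusion):
    """Unweighted mean of per-class recall; classes with no samples are skipped."""
    recalls = [row[i] / sum(row)
               for i, row in enumerate(confusion)
               if sum(row) > 0]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced two-class problem: 100 majority vs 10 minority samples.
toy_confusion = [
    [90, 10],  # true class 0
    [5, 5],    # true class 1
]
toy_accuracy = accuracy(toy_confusion)
toy_balanced = balanced_accuracy(toy_confusion)
```

In this toy case the majority class dominates plain accuracy, while balanced accuracy gives each class equal weight and is pulled down by the poorly recalled minority class. The same effect would explain a balanced accuracy below the overall accuracy when class sizes are uneven.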
Figure 2. Performance of disease state classifier. (A) Confusion matrix of disease state classifier; (B) Receiver operating characteristic curve (ROC) with area under the curve (AUC); (C) top K accuracy plot.
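Top K accuracy counts a prediction as correct when the true class is among the K highest-scoring classes. A minimal sketch under that standard definition follows; the function name `top_k_accuracy` and the toy scores are illustrative assumptions, not values from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Share of samples whose true label is among the k highest-scoring classes.

    `scores` holds one list of per-class scores per sample and `labels`
    holds the matching integer class indices.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up class probabilities for three samples over three classes.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
toy_labels = [0, 1, 0]
toy_curve = [top_k_accuracy(toy_scores, toy_labels, k) for k in (1, 2, 3)]
```

Plotting such values against K gives a curve that never decreases and reaches 1.0 once K equals the number of classes.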
Tissue origin classifier.
| Metric | Value |
| --- | --- |
| Accuracy | 0.9705 |
| Balanced accuracy | 0.9587 |
Figure 3. Performance of tissue origin classifier. (A) Confusion matrix of tissue origin classifier; (B) Receiver operating characteristic curve (ROC) with area under the curve (AUC); (C) top K accuracy plot.
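For a single class treated one-vs-rest, the area under the ROC curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes that pairwise statistic directly. How the authors aggregated AUC across multiple classes is not stated in this record, so the one-vs-rest framing and the toy values are assumptions.

```python
def roc_auc(scores, labels):
    """Binary ROC-AUC as the fraction of positive/negative pairs ranked correctly.

    `labels` holds 1 for the class of interest and 0 for all other classes.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Made-up scores for one tissue class versus all others.
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation would be the usual choice at the scale of thousands of samples.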
Neoplastic subtype classifier.
| Metric | Value |
| --- | --- |
| Accuracy | 0.9229 |
| Balanced accuracy | 0.8548 |
Figure 4. Performance of neoplastic subtype classifier. (A) Confusion matrix of neoplastic subtype classifier; (B) Receiver operating characteristic curve (ROC) with area under the curve (AUC); (C) top K accuracy plot.
Figure 5. Benchmarking of deep learning classifiers and other machine learning algorithms. (A) Disease state classifier benchmarks; (B) Tissue origin classifier benchmarks; (C) Neoplastic subtype classifier benchmarks. Solid line separates the deep learning models from classic machine learning algorithms, and the dotted line indicates the highest balanced accuracy achieved by machine learning algorithms in each classifier.