Jan-Niklas Eckardt, Martin Bornhäuser, Karsten Wendt, Jan Moritz Middeke.
Abstract
In cancer diagnostics, a considerable amount of data is acquired during routine work-up. Recently, machine learning has been used to build classifiers that are tasked with cancer detection and aid in clinical decision-making. Most of these classifiers are based on supervised learning (SL), which requires time- and cost-intensive manual labeling of samples by medical experts for model training. Semi-supervised learning (SSL), however, works with only a fraction of labeled data by including unlabeled samples for information abstraction and can thus exploit the vast discrepancy between the amount of labeled data and the overall amount of data available in cancer diagnostics. In this review, we provide a comprehensive overview of the essential functionalities and assumptions of SSL and survey key studies in cancer care, differentiating between image-based and non-image-based applications. We highlight current state-of-the-art models in histopathology, radiology and radiotherapy, as well as genomics. Further, we discuss potential pitfalls in SSL study design, such as discrepancies in data distributions and comparison to baseline SL models, and point out future directions for SSL in oncology. We believe that well-designed SSL models can strongly contribute to computer-guided diagnostics in malignant disease by overcoming current hindrances in the form of sparse labeled and abundant unlabeled data.
Keywords: artificial intelligence; cancer; diagnostics; machine learning; semi-supervised learning
Year: 2022 PMID: 35912249 PMCID: PMC9329803 DOI: 10.3389/fonc.2022.960984
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1. Inputs and outputs of supervised, unsupervised and semi-supervised learning. In supervised learning (A), all data is labeled. The labels are used to train a classifier that assigns the learned labels to previously unseen data. Unsupervised learning (B) does not use labels; data is clustered into groups based on inherent patterns. Semi-supervised learning (C) uses both labeled and unlabeled data: labels are used to train a classifier, which is augmented by unlabeled data from the same distribution to derive additional information and thereby boost performance.
Figure 2. How does unlabeled data boost classification performance? Consider n features at the input level, which correspond to an n-dimensional feature space. In this n-dimensional coordinate system, every input is located according to its feature vector of n features and can thus be compared with other inputs by similarity and difference, represented by the proximity of or distance between points in the feature space. For clarity, we consider only two features (x, y) in a two-dimensional feature space. When labeled data is sparse (A), as is often the case in medical data sets, the decision boundary of a classifier is less constrained. This may lead to inaccuracies and poor generalization on external data. If many labels are given (B), the decision boundary is more constrained, yielding a more accurate classifier that can potentially generalize better. However, manual labeling of such large data sets is often time- and cost-intensive. Unlabeled data is often available in abundance (C) and can constrain the decision boundary of a classifier much as a large labeled data set would, without the need for excessive labeling. The decision boundary then lies in a region of low data density. Nevertheless, as can be derived from (B) and (C), the performance gap between supervised and semi-supervised learning shrinks as the amount of labeled data grows if no further unlabeled samples are provided.
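The low-density idea in Figure 2 can be illustrated with a one-dimensional toy example. The sketch below is a hypothetical illustration, not a method taken from the review: it assumes two synthetic Gaussian clusters, a single labeled point per class, and an (unnormalized) Gaussian kernel density estimate with an arbitrarily chosen bandwidth as the measure of density. The labeled-only boundary is simply the midpoint between the labeled points; the semi-supervised boundary is the point of lowest estimated density over all samples, labeled and unlabeled, between the two labeled points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two 1-D Gaussian clusters standing in for two classes.
x0 = rng.normal(0.0, 0.5, 200)  # class 0, centered at 0
x1 = rng.normal(4.0, 0.5, 200)  # class 1, centered at 4

# Sparse labels: one labeled example per class (cf. panel A).
x_lab = np.array([0.5, 5.5])
y_lab = np.array([0, 1])

# All samples (labeled + unlabeled); true classes are used only for scoring.
x_all = np.concatenate([x0, x1, x_lab])
y_true = np.concatenate([np.zeros(200), np.ones(200), y_lab])


def supervised_threshold(x_labeled, y_labeled):
    """Labeled-only boundary: midpoint between the two labeled class means."""
    return (x_labeled[y_labeled == 0].mean() + x_labeled[y_labeled == 1].mean()) / 2.0


def low_density_threshold(x, lo, hi, n_candidates=201, bandwidth=0.3):
    """Semi-supervised boundary: the candidate in [lo, hi] where an unnormalized
    Gaussian kernel density estimate over all samples x is lowest."""
    candidates = np.linspace(lo, hi, n_candidates)
    density = np.exp(-0.5 * ((candidates[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    return candidates[np.argmin(density)]


def accuracy(threshold):
    """Fraction of all samples classified correctly by the rule 'x > threshold -> class 1'."""
    return np.mean((x_all > threshold).astype(int) == y_true)


t_sl = supervised_threshold(x_lab, y_lab)
t_ssl = low_density_threshold(x_all, x_lab.min(), x_lab.max())
acc_sl, acc_ssl = accuracy(t_sl), accuracy(t_ssl)
print(f"labeled-only threshold {t_sl:.2f}, accuracy {acc_sl:.3f}")
print(f"low-density threshold  {t_ssl:.2f}, accuracy {acc_ssl:.3f}")
```

Under these assumptions the labeled-only midpoint lands at 3.0, inside the lower tail of class 1, whereas the density-based boundary moves into the sparsely populated gap between the two clusters, which is the behavior sketched in panel C.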
Overview of Studies on Semi-Supervised Learning in Histopathology.
| Authors and Reference | Entity | Objective | Technique | Publicly Available Code |
|---|---|---|---|---|
| Yu et al. | colorectal and lung cancer as well as lymph nodes | detecting malignant patches in WSI | mean teacher | yes |
| Shaw et al. | colorectal cancer | detecting malignant patches in WSI | student-teacher chain | no |
| Wenger et al. | bladder cancer | detection and grading | consistency regularization and self-ensembling | no |
| Jaiswal et al. | metastasized tumors | detecting metastases in lymph node WSI | pseudo-labeling | no |
| Su et al. | breast cancer | detecting malignant samples in WSI | combination of association cycle consistency loss and maximal conditional association loss | no |
| Das et al. | breast cancer | grading samples | stacked semi-supervised GAN | no |
| Al Azzam et al. | breast cancer | cancer detection from nuclei morphologies | comparison of 9 SL and SSL classifiers | no |
| Kapil et al. | lung cancer | PD-L1 scoring | auxiliary classifier GAN and pixel-based quantification | no |
| Marini et al. | prostate cancer | Gleason scoring | teacher-student chain and pseudo-labeling | yes |
| Li et al. | prostate cancer | Gleason scoring | expectation maximization-based fully convolutional encoder-decoder network | no |
| Masood et al. | melanoma | detecting malignant samples | co-training of Deep Belief Network and advised SVM | no |
GAN, generative adversarial networks; PD-L1, programmed death-ligand 1; SL, supervised learning; SSL, semi-supervised learning; SVM, support vector machines; WSI, whole-slide images.
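Several entries above rely on student-teacher schemes such as the mean teacher. As a rough, hypothetical sketch of that idea (not the code of any cited study), the following trains a logistic-regression "student" on a few labeled points plus a consistency penalty on unlabeled points, while the "teacher" is an exponential moving average (EMA) of the student's weights. The toy data, Gaussian input noise, learning rate and EMA decay are all assumptions; published mean-teacher models use deep networks and image augmentations instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mean_teacher_logreg(X_lab, y_lab, X_unl, steps=300, lr=0.5,
                        cons_weight=1.0, ema_decay=0.99, noise=0.3, seed=0):
    """Toy mean-teacher training; returns the teacher's (EMA) weights and bias."""
    rng = np.random.default_rng(seed)
    w_s, b_s = np.zeros(X_lab.shape[1]), 0.0  # student: updated by gradient descent
    w_t, b_t = w_s.copy(), b_s                # teacher: updated only via EMA
    for _ in range(steps):
        # Supervised binary cross-entropy gradient on the labeled samples.
        p_lab = sigmoid(X_lab @ w_s + b_s)
        g_w = X_lab.T @ (p_lab - y_lab) / len(y_lab)
        g_b = np.mean(p_lab - y_lab)
        # Consistency: student and teacher see differently perturbed unlabeled inputs;
        # the mean squared difference of their predictions is penalized, and the
        # teacher's output is treated as a fixed target (no gradient through it).
        X_s = X_unl + rng.normal(0.0, noise, X_unl.shape)
        X_t = X_unl + rng.normal(0.0, noise, X_unl.shape)
        p_s = sigmoid(X_s @ w_s + b_s)
        p_t = sigmoid(X_t @ w_t + b_t)
        dz = 2.0 * (p_s - p_t) * p_s * (1.0 - p_s) / len(X_unl)  # d(loss)/d(student logit)
        g_w = g_w + cons_weight * (X_s.T @ dz)
        g_b = g_b + cons_weight * dz.sum()
        w_s = w_s - lr * g_w
        b_s = b_s - lr * g_b
        # Teacher weights: exponential moving average of the student weights.
        w_t = ema_decay * w_t + (1.0 - ema_decay) * w_s
        b_t = ema_decay * b_t + (1.0 - ema_decay) * b_s
    return w_t, b_t


# Hypothetical 2-D toy data: two Gaussian clusters, only 4 labeled points per class.
rng = np.random.default_rng(1)
X0 = rng.normal([-2.0, -2.0], 0.7, (200, 2))
X1 = rng.normal([2.0, 2.0], 0.7, (200, 2))
X_lab = np.vstack([X0[:4], X1[:4]])
y_lab = np.array([0] * 4 + [1] * 4, dtype=float)
X_unl = np.vstack([X0[4:], X1[4:]])

w_t, b_t = mean_teacher_logreg(X_lab, y_lab, X_unl)
X_all = np.vstack([X0, X1])
y_all = np.array([0] * 200 + [1] * 200)
acc = np.mean((sigmoid(X_all @ w_t + b_t) > 0.5).astype(int) == y_all)
print(f"teacher accuracy on all samples: {acc:.3f}")
```

The teacher, not the student, is used for prediction here because its averaged weights change more smoothly; this mirrors the general mean-teacher design but is only an illustrative reduction of it.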
Overview of Studies on Semi-Supervised Learning in Radiology and Radiotherapy.
| Authors and Reference | Entity | Objective | Technique | Publicly Available Code |
|---|---|---|---|---|
| Khosravan et al. | lung cancer | detecting malignant nodules in chest CAT scans | SSL-based multi-task network | no |
| Xie et al. | lung cancer | detecting malignant nodules in chest CAT scans | semi-supervised adversarial autoencoders, learnable transition layers, and supervised classification | no |
| Shi et al. | lung cancer | detecting malignant nodules in chest CAT scans | transfer learning and semi-supervised feature matching | no |
| Sun et al. | breast cancer | detecting breast cancer in mammogram images | co-training | no |
| Azary et al. | breast cancer | detecting breast cancer in mammogram images | co-training | no |
| Shin et al. | breast cancer | detecting breast cancer in ultrasound images | joint weakly- and strongly-supervised framework and self-training | yes |
| Wodzinski et al. | breast cancer | identifying target volumes for radiotherapy | semi-supervised multilevel encoder-decoder | yes |
| Ge et al. | brain tumor | glioma grading and IDH-mutation prediction in MRI scans | GAN-augmented networks in a graph-based framework | no |
| Chen et al. | brain tumor, multiple sclerosis, ischemic stroke | detecting pathological samples in MRI scans | student-teacher chain combined with adversarial learning | yes |
| Meier et al. | brain tumor | detecting residual tumor tissue in postoperative brain MRI | semi-supervised decision forest | no |
| Turk et al. | thyroid cancer | detecting thyroid cancer from ultrasound textures and clinical scoring systems | autoencoders and synthetic minority oversampling | no |
CAT, computer-assisted tomography; GAN, generative adversarial networks; IDH, isocitrate dehydrogenase; MRI, magnetic resonance imaging; SSL, semi-supervised learning.
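Self-training and pseudo-labeling, which appear in both tables above, follow a common loop: fit on the labeled data, predict the unlabeled data, adopt only confident predictions as pseudo-labels, and refit. The sketch below is a hypothetical minimal version, assuming a nearest-centroid classifier whose confidences are a softmax over negative squared distances and a confidence threshold of 0.95; the cited studies use deep networks and their own selection rules.

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """One mean vector per class (a deliberately simple base classifier)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(centroids, X):
    """Softmax over negative squared distances to the class centroids."""
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unl, n_classes=2, threshold=0.95, max_rounds=10):
    """Iteratively add confident pseudo-labels; returns final centroids and #pseudo-labels."""
    X_train, y_train, remaining = X_lab.copy(), y_lab.copy(), X_unl.copy()
    for _ in range(max_rounds):
        if len(remaining) == 0:
            break
        proba = predict_proba(fit_centroids(X_train, y_train, n_classes), remaining)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Confident predictions become pseudo-labels and join the training set.
        X_train = np.vstack([X_train, remaining[confident]])
        y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
        remaining = remaining[~confident]
    return fit_centroids(X_train, y_train, n_classes), len(y_train) - len(y_lab)


# Hypothetical toy data: two 2-D Gaussian clusters, 3 labeled points per class.
rng = np.random.default_rng(2)
X0 = rng.normal([-2.0, -2.0], 0.7, (200, 2))
X1 = rng.normal([2.0, 2.0], 0.7, (200, 2))
X_lab = np.vstack([X0[:3], X1[:3]])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([X0[3:], X1[3:]])

centroids, n_pseudo = self_train(X_lab, y_lab, X_unl)
y_all = np.array([0] * 200 + [1] * 200)
acc = np.mean(predict_proba(centroids, np.vstack([X0, X1])).argmax(axis=1) == y_all)
print(f"{n_pseudo} pseudo-labels added, accuracy {acc:.3f}")
```

The threshold is the key design choice: set too low, wrong pseudo-labels are fed back and reinforced; set too high, few unlabeled samples are ever used.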
Overview of Studies on Semi-Supervised Learning using non-image-based data.
| Authors and Reference | Entity | Objective | Technique | Publicly Available Code |
|---|---|---|---|---|
| Chai et al. | breast, lung, gastric and liver cancer | predicting survival | self-paced learning with Cox proportional hazards and accelerated failure time models | no |
| Shi et al. | colorectal and breast cancer | predicting relapse | low density separation | no |
| Park et al. | colorectal and breast cancer | predicting relapse | graph-based regularization | no |
| Hassanzadeh et al. | kidney, ovarian and pancreatic cancer | predicting survival | ensemble learning with robust boost and decision trees | no |
| Cristovao et al. | breast cancer | subtyping, model comparison | comparison of different SL and SSL algorithms | no |
| Ma et al. | lung, kidney, uterus and adrenal gland cancer | predicting primary tumor site | Affinity Network Fusion | yes |
| Sherafat et al. | ovarian cancer | predicting tumor-rejection mediating neoepitopes | positive-unlabeled learning using Auto-ML | no |
| Camargo et al. | acute myeloid leukemia, E. coli, plant leaves | model comparison | root distance boundary sampling | yes |
| Livieris et al. | breast and lung cancer | model comparison | self- and co-training with ensemble learning | no |
SL, supervised learning; SSL, semi-supervised learning.
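Graph-based approaches, such as the regularization used by Park et al., exploit the same assumption as Figure 2: nearby samples should share labels. Without claiming to reproduce that study, the sketch below shows classic iterative label propagation on a hypothetical toy data set: a Gaussian affinity graph is built over all samples, label mass is repeatedly diffused along its edges, and the labeled nodes are clamped to their known classes after every step. The kernel width `sigma`, the iteration count and the data are assumptions chosen for illustration.

```python
import numpy as np


def label_propagation(X, y, n_classes, sigma=1.0, n_iter=200):
    """Iterative label propagation; y holds a class index for labeled samples and -1 otherwise."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian edge weights
    np.fill_diagonal(W, 0.0)                # no self-loops
    P = W / W.sum(axis=1, keepdims=True)    # row-normalized transition matrix
    labeled = y >= 0
    Y0 = np.zeros((len(y), n_classes))
    Y0[labeled, y[labeled]] = 1.0           # one-hot rows for labeled samples
    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F                           # diffuse label mass to neighbors
        F[labeled] = Y0[labeled]            # clamp the known labels
    return F.argmax(axis=1)


# Hypothetical toy data: two 2-D Gaussian clusters, 3 labeled samples per class.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([-2.0, -2.0], 0.7, (150, 2)),
               rng.normal([2.0, 2.0], 0.7, (150, 2))])
y_true = np.array([0] * 150 + [1] * 150)
y = np.full(300, -1)
idx = [0, 1, 2, 150, 151, 152]
y[idx] = y_true[idx]

pred = label_propagation(X, y, n_classes=2)
unlabeled = y < 0
acc = np.mean(pred[unlabeled] == y_true[unlabeled])
print(f"accuracy on unlabeled samples: {acc:.3f}")
```

Because the dense pairwise graph grows quadratically with the number of samples, practical graph-based methods typically sparsify it (for example with k-nearest-neighbor edges); the dense version is used here only for brevity.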