Wencke Walter, Claudia Haferlach, Niroshan Nadarajah, Ines Schmidts, Constanze Kühn, Wolfgang Kern, Torsten Haferlach.
Abstract
Artificial intelligence (AI) is about to make itself indispensable in the health care sector. Examples of successful applications or promising approaches range from the application of pattern recognition software to pre-process and analyze digital medical images, to deep learning algorithms for subtype or disease classification, and digital twin technology and in silico clinical trials. Moreover, machine-learning techniques are used to identify patterns and anomalies in electronic health records and to perform ad-hoc evaluations of gathered data from wearable health tracking devices for deep longitudinal phenotyping. In recent years, substantial progress has been made in automated image classification, even reaching superhuman levels in some instances. Despite the increasing awareness of the importance of the genetic context, diagnosis in hematology is still mainly based on the evaluation of the phenotype, either by the analysis of microscopic images of cells in cytomorphology or by the analysis of cell populations in bidimensional plots obtained by flow cytometry. Here, AI algorithms not only spot details that might escape the human eye, but might also identify entirely new ways of interpreting these images. With the introduction of high-throughput next-generation sequencing in molecular genetics, the amount of available information is increasing exponentially, priming the field for the application of machine learning approaches. The goal of all these approaches is to allow personalized and informed interventions, to enhance treatment success, to improve the timeliness and accuracy of diagnoses, and to minimize technically induced misclassifications. The potential of AI-based applications is virtually endless, but where do we stand in hematology and how far can we go?
Year: 2021 PMID: 34103684 PMCID: PMC8225509 DOI: 10.1038/s41388-021-01861-y
Source DB: PubMed Journal: Oncogene ISSN: 0950-9232 Impact factor: 9.867
Fig. 1 Overview of the different domains and the process of supervised learning.
The left side represents the different domains of supervised learning going from artificial intelligence to machine learning and finally deep learning. The right side depicts the process of supervised learning. At the top right corner, the requirements for a good training data set are listed. The data is used for automated feature extraction, leading to the generation of a model, the performance of which is evaluated by its capability to correctly predict the labels of unseen instances (= test data). Based on the evaluation outcome, the model is retrained to refine the selected features and to optimize the model. After several rounds of retraining the final model emerges. AI artificial intelligence, DL deep learning, ML machine learning.
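The train-then-evaluate step described in this caption can be illustrated with a minimal sketch. It is not taken from the article: the toy two-feature data, the class labels "A" and "B", the 80/20 split, and the nearest-centroid classifier are all assumptions chosen only to keep the example self-contained. It shows a single round of fitting on labeled training data and scoring on unseen test data; the automated feature extraction and the repeated retraining mentioned in the caption are omitted.

```python
import random

def make_toy_data(n_per_class=50, seed=0):
    """Generate hypothetical labeled instances: ((feature_0, feature_1), label)."""
    rng = random.Random(seed)
    data = []
    for label, center in (("A", (0.0, 0.0)), ("B", (3.0, 3.0))):
        for _ in range(n_per_class):
            features = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((features, label))
    rng.shuffle(data)
    return data

def fit_centroids(train):
    """'Training': compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for (x0, x1), label in train:
        total = sums.setdefault(label, [0.0, 0.0])
        total[0] += x0
        total[1] += x1
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}

def predict(model, x):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    return min(
        model,
        key=lambda lab: (x[0] - model[lab][0]) ** 2 + (x[1] - model[lab][1]) ** 2,
    )

def accuracy(model, data):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in data) / len(data)

data = make_toy_data()
split = int(0.8 * len(data))           # assumed 80/20 train/test split
train, test = data[:split], data[split:]

model = fit_centroids(train)           # fit on training data only
test_accuracy = accuracy(model, test)  # evaluate on unseen instances
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In practice the held-out score would feed back into another round of model refinement, and a separate validation set is usually kept so that the final test data stay untouched until the end.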
Overview of the different diagnostic tests, the current challenges and known confounders for the clinical implementation of AI-based methods, and the requirements for a successful implementation.
| | Cytomorphology | Cytogenetics | Immunophenotyping | Molecular genetics |
|---|---|---|---|---|
| Method | Microscopy | Chromosome banding analysis | Multiparameter flow cytometry | Genomic analysis |
| Aim | Identification and characterization of cell populations based on morphology | Identification of cytogenetic abnormalities | Identification and characterization of cell populations based on light-scattering properties and antigen expression patterns | Identification of individual molecular profiles |
| Challenges | Differentiation between artefacts and informative material (= cells) | Identification and selection of individual chromosomes | Accurate representation and transformation of the raw data | Data matrix usually sparse and informative signals might be lost in noise |
| | Correct identification of borders (very dense regions with overlapping cells) | Correct identification of structural abnormalities | | Meaningful combination of various data types and differentiation between absence of information (e.g., insufficient coverage) and true negative results |
| | Extraction of features that allow the differentiation of maturation states of the same cell type | | | Lack of knowledge for annotation and interpretation of variants in coding and especially non-coding regions |
| Confounders | Resolution, image capturing, image cropping are not standardized between laboratories | Different banding and staining methods | Different methods for data pre-processing and data/image transformation | Plethora of methods for the identification of features (CNV, SV, SNV, fusions, etc.), data transformation, and dimensionality reduction with limited concordance and individual biases |
| | Unbalanced data sets for training with rare cell types being underrepresented | Unbalanced data sets for training with rare structural abnormalities being underrepresented | | |
| Requirements for final implementation | Time- and cost-efficient digitization of glass slides | Time- and cost-efficient digitization of glass slides | Harmonization and standardization of used antibodies | Harmonization of gene panels |
| | Standardized, automated systems for the recording of digital microscopic images | Standardized, automated systems for the recording of digital microscopic images | | Standardization of analysis pipelines and variant interpretation |
| | Balanced training data capturing as much biological and technical variety as possible | Balanced training data capturing as much biological and technical variety as possible | Balanced training data capturing as much biological and technical variety as possible | Balanced training data capturing as much biological and technical variety as possible |
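The table lists unbalanced training sets, with rare cell types or abnormalities underrepresented, as a confounder. One common mitigation, shown here only as an illustrative sketch and not as the authors' method, is to weight each class inversely to its frequency during training. The function name `inverse_frequency_weights`, the label names, and the 90/10 split are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1, so that each
    class contributes equally to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical, deliberately unbalanced label set: 90 common cells, 10 rare cells.
labels = ["neutrophil"] * 90 + ["blast"] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

Weighting does not replace the requirement stated in the table for balanced training data that capture biological and technical variety; it only reduces the bias toward the majority class when such data are not yet available.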
Fig. 2 Overview of implemented and potential ML applications in hematology.
The central part of the figure displays current and future diagnostic tests and methods, while the outer part illustrates the various data types and the potential clinical impact of ML-based applications and analyses. DL deep learning, ML machine learning.