Vibhuti Gupta¹, Thomas M Braun², Mosharaf Chowdhury³, Muneesh Tewari⁴⁻⁶, Sung Won Choi¹.
Abstract
Machine learning techniques are now widely used in healthcare for the diagnosis, prognosis, and treatment of diseases. These techniques have applications in hematopoietic cell transplantation (HCT), a potentially curative therapy for hematological malignancies. Herein, we conducted a systematic review of the application of machine learning (ML) techniques in the HCT setting. We examined the types of data streams included, the specific ML techniques used, and the types of clinical outcomes measured. A systematic search of English-language articles was performed in the PubMed, Scopus, Web of Science, and IEEE Xplore databases. Search terms included "hematopoietic cell transplantation (HCT)," "autologous HCT," "allogeneic HCT," "machine learning," and "artificial intelligence." Only full-text studies reported between January 2015 and July 2020 were included. Data were extracted by two authors using predefined data fields. Following PRISMA guidelines, a total of 242 studies were identified, of which 27 met the inclusion criteria. These studies were sub-categorized into three broad topics. The ML techniques used included ensemble learning (63%), regression (44%), Bayesian learning (30%), and support vector machines (30%). The majority of studies examined models to predict HCT outcomes (e.g., survival, relapse, graft-versus-host disease). Clinical and genetic data were the most commonly used predictors in the modeling process. Overall, this review systematically summarizes the ML techniques applied in the context of HCT. The evidence is not sufficiently robust to determine the optimal ML technique for the HCT setting and/or the minimal set of data variables required.
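A single study could use more than one technique. For that reason, the reported technique shares are per-study proportions and need not sum to 100%. Measured against the 27 included studies, they correspond to the counts below.

These counts are back-calculated from the rounded percentages: they are the only whole-number counts out of 27 that round to those values. They are not figures stated in the abstract.

$$
\begin{aligned}
\text{ensemble learning: } & 17/27 \approx 0.63 \\
\text{regression: } & 12/27 \approx 0.44 \\
\text{Bayesian learning; SVM: } & 8/27 \approx 0.30 \ \text{each} \\
\text{sum of shares: } & (17 + 12 + 8 + 8)/27 = 45/27 \approx 1.67 \\
\text{screening yield: } & 242 - 27 = 215 \ \text{excluded}, \quad 27/242 \approx 0.112
\end{aligned}
$$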
Keywords: HSCT; artificial intelligence; hematopoietic stem cell transplantation; mHealth; machine learning; mobile health; sensors
Year: 2020 PMID: 33120974 PMCID: PMC7663237 DOI: 10.3390/s20216100
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. PRISMA workflow for the systematic identification of scientific literature.
Search query for the retrieval of studies.
| (HSCT OR HCT OR GVHD OR acute GVHD OR aGVHD OR leukemia OR lymphoma OR autologous HCT OR allogeneic HCT OR Hematopoietic Cell Transplantation OR Bone marrow transplant OR Hematopoietic cell transplant OR Hematopoietic stem cell transplantation OR Graft-versus-host disease) AND (Machine Learning OR Artificial Intelligence). |
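The query has a two-block Boolean structure:

- the transplant and disease terms are OR-ed together;
- the ML terms are OR-ed together;
- the two blocks are then AND-ed.

A minimal Python sketch that assembles the same string from term lists follows. `build_query` is a hypothetical helper. Each database (PubMed, Scopus, Web of Science, IEEE Xplore) applies its own syntax, phrase quoting, and field tags, and this sketch does not model any of them.

```python
from typing import Sequence


def build_query(*blocks: Sequence[str]) -> str:
    """Hypothetical helper: OR the terms inside each block, then AND the blocks."""
    groups = ["(" + " OR ".join(block) + ")" for block in blocks]
    return " AND ".join(groups)


# Term lists copied from the search query above, in the same order.
HCT_TERMS = [
    "HSCT", "HCT", "GVHD", "acute GVHD", "aGVHD", "leukemia", "lymphoma",
    "autologous HCT", "allogeneic HCT", "Hematopoietic Cell Transplantation",
    "Bone marrow transplant", "Hematopoietic cell transplant",
    "Hematopoietic stem cell transplantation", "Graft-versus-host disease",
]
ML_TERMS = ["Machine Learning", "Artificial Intelligence"]

query = build_query(HCT_TERMS, ML_TERMS)
print(query)
```

Keeping the terms in lists makes it easy to add or drop a synonym without hand-editing a long string.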
A brief summary of reviewed studies.
| Reference | No. of Participants | Data Streams Used | Outcomes | Best ML Technique | Compared ML Techniques | Major Theme Identified |
|---|---|---|---|---|---|---|
| Lu et al., 2019 [ | 637 | Clinical, genomic & demographics | AML 2-year survival and relapse, mortality | Att-BLSTM | SVM, LR | Post-HSCT complications |
| Fuse et al., 2019 [ | 217 | Clinical | Risk of leukemia relapse after 1 year of allo-HSCT | - | ADT | Post-HSCT complications |
| Goswami et al., 2019 [ | 347 | Clinical | Relapse risk within 36 months of autologous-HSCT | - | Stacked ML | Post-HSCT complications |
| Ritari et al., 2018 [ | 161 | Clinical & genomic | Genomic biomarkers for relapse risk of various hematological malignancies in allo-HSCT recipients | - | RF | Post-HSCT complications |
| Marino et al., 2016 [ | 2107 | Clinical | High-risk amino acid substitutions and position types for grade III-IV acute-GVHD, TRM, disease free survival | - | RF, LR | Post-HSCT complications |
| ArabYarmohammadi et al., 2020 [ | 39 | Images | Relapse risk in AML patients post-HSCT | - | Deep learning, LDA | Post-HSCT complications |
| Krakow et al., 2017 [ | 9563 | Clinical | Adaptive treatment strategies | - | RL | Post-HSCT complications |
| Liu et al., 2017 [ | 6021 | Clinical | Optimal dynamic treatment regimes | - | Deep RL | Post-HSCT complications |
| Shouval et al., 2016 [ | 26,266 | Clinical | NRM 100 days post-HCT in acute leukemia | - | NB, ADT, LR, MLP, RF, AdaBoost | Post-HSCT complications |
| Shouval et al., 2015 [ | 28,236 | Clinical | Overall mortality 100 days post-HSCT | - | ADT | Post-HSCT complications |
| Tang et al., 2020 [ | 324 | Clinical | Grade II-IV acute-GVHD risk | - | L2 regularized LR | Post-HSCT complications |
| Arai et al., 2019 [ | 26,695 | Clinical | Grade II-IV & III-IV aGVHD risk | ADT | NB, MLP, RF, AdaBoost | Post-HSCT complications |
| Kuang et al., 2019 [ | 28 | Clinical & sensor | Non-invasive biomarkers for acute-GVHD diagnosis in mice | - | PCA, k-means | Post-HSCT complications |
| Serrano-López et al., 2020 [ | 29 | Genomic | Gene biomarkers for chronic-GVHD diagnosis | - | RF | Post-HSCT complications |
| Sharifi et al., 2020 [ | 66 | Images | Differentiate among pulmonary complications post-HSCT | - | k-means + SVM | Post-HSCT complications |
| Gandelman et al., 2019 [ | 339 | Clinical | Classify patients with chronic-GVHD according to organ scores | - | k-means | Post-HSCT complications |
| Sharafeldin et al., 2020 [ | 277 | Clinical, genomic & demographics | post-BMT cognitive impairment | - | ENR | Post-HSCT complications |
| Cocho et al., 2015 [ | 36 | Clinical & genomic | Genomic biomarkers for GVHD-associated dry eye | SVM | k-NN, SDA | Post-HSCT complications |
| Leclerc et al., 2018 [ | 155 | Clinical & biological | Cyclosporine blood concentrations after the initial dose post-HSCT | BN | NB, SVM, RF | Others |
| Li et al., 2020 [ | 10,258 | Clinical & Demographics | Donor availability | BDT | LR, SVM | Pre-HSCT factors |
| Sivasankaran et al., 2018 [ | Not clear | Demographics & member-related factors | Donor availability | GBM | SVM, LR | Pre-HSCT factors |
| Buturovic et al., 2018 [ | 1255 | Clinical | Selecting appropriate unrelated donor for patients undergoing HSCT | - | SVM | Pre-HSCT factors |
| Sivasankaran et al., 2015 [ | 3035 | Clinical | Selecting appropriate unrelated donor for patients undergoing HSCT | SVM | k-NN, CART | Pre-HSCT factors |
| Brasier et al., 2015 [ | 68 | Clinical | Detection of pre-HSCT infection in patients undergoing chemotherapy | GPS | RF, CART, MARS | Post-HSCT complications |
| Lee et al., 2018 [ | 9651 | Clinical | Grade II-IV aGVHD risk or death within 100 days post-HSCT | SL | LR, BRT, MARS, BART, RR, ENR, ANN | Predictive Tools Development |
| Okamura et al., 2020 [ | 363 | Clinical | 1-year overall survival, PFS, relapse, and NRM | - | RSF | Predictive Tools Development |
| Leclerc et al., 2020 [ | 211 | Clinical & biological | Best first cyclosporine dose | - | BN | Predictive Tools Development |
Abbreviated Terms: BDT: Boosted Decision Tree; LR: Logistic Regression; SVM: Support Vector Machine; HSCT: Hematopoietic stem cell transplantation; GVHD: Graft-versus-host disease; RF: Random Forest; AML: Acute Myeloid Leukemia; Att-BLSTM: Attention Bidirectional Long Short-Term Memory; PCA: Principal Component Analysis; NRM: Non-relapse mortality; NB: Naïve Bayes; MLP: Multilayer Perceptron; AdaBoost: Adaptive Boosting; GPS: Generalized Path Seeker; CART: Classification and Regression Tree; MARS: Multivariate Adaptive Regression Spline; ADT: Alternating Decision Tree; ENR: Elastic Net Regression; BN: Bayesian Network; SL: Super Learner; GBM: Gradient Boosting Machine; BRT: Boosted Regression Trees; BART: Bayesian Additive Regression Tree; RR: Ridge Regression; ANN: Artificial Neural Network; SDA: Shrinkage Discriminant Analysis; LDA: Linear Discriminant Analysis; RSF: Random Survival Forest; SM: Stacked Model; PLR: Penalized Logistic Regression; k-NN: k-Nearest Neighbors; RL: Reinforcement Learning; TRM: Transplant-related mortality; PFS: Progression-free survival; BMT: Bone marrow transplantation.
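The spread of the 27 studies across the "Major Theme Identified" column can be tallied with a short Python sketch. The list below is a manual transcription of that column in row order, so it is only as accurate as the transcription.

```python
from collections import Counter

# "Major Theme Identified" column of the summary table, transcribed in row order.
THEMES = (
    ["Post-HSCT complications"] * 18          # Lu et al., 2019 through Cocho et al., 2015
    + ["Others"]                              # Leclerc et al., 2018
    + ["Pre-HSCT factors"] * 4                # Li et al., 2020 through Sivasankaran et al., 2015
    + ["Post-HSCT complications"]             # Brasier et al., 2015
    + ["Predictive Tools Development"] * 3    # Lee et al., 2018 through Leclerc et al., 2020
)

counts = Counter(THEMES)
total = len(THEMES)
for theme, n in counts.most_common():
    # Share of the included studies assigned to each theme.
    print(f"{theme}: {n}/{total} ({n / total:.0%})")
```

On this transcription, post-HSCT complications account for 19 of the 27 studies. That is in line with the abstract's statement that most studies modeled HCT outcomes. A single study falls under "Others".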
Figure 2. Distribution of studies by publication year.
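Figure 2 itself is not reproduced here. As a rough cross-check, the publication years in the summary table can be tallied with the Python sketch below. It assumes Figure 2 counts the same 27 included studies. Because the search window ended in July 2020, the 2020 count covers only part of that year.

```python
from collections import Counter

# Publication year of each study, transcribed in the row order of the summary table.
YEARS = [
    2019, 2019, 2019, 2018, 2016, 2020, 2017, 2017, 2016, 2015,
    2020, 2019, 2019, 2020, 2020, 2019, 2020, 2015, 2018, 2020,
    2018, 2018, 2015, 2015, 2018, 2020, 2020,
]

counts = Counter(YEARS)
for year in sorted(counts):
    # One '#' per study: a plain-text stand-in for a bar chart.
    print(f"{year} {'#' * counts[year]} {counts[year]}")
```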
Summary of challenges in applying ML techniques in HSCT.
| Challenges | Reasons | Potential Solution |
|---|---|---|
| Limited Data Capture | Complex HSCT procedure with numerous post-transplant complications; lack of continuous, real-time capture of the various data streams involved; mix of automated and manual data capture | Utilizing wearable sensor devices or leveraging mHealth platforms for robust data collection |
| Data Quality Issues | Substantial missingness and inconsistencies due to complex data collection procedures; loss of important variables, which leads to loss of relevant information | Developing autonomous, adaptive, and online preprocessing algorithms that can automatically detect data quality issues and resolve them in real time using appropriate techniques |
| High-Dimensional Data | Large number of clinical and/or genomic variables associated with HSCT outcomes | Developing novel streaming dimension-reduction techniques to efficiently process the large number of features associated with HSCT outcomes |
| Data Privacy Issues | Large amounts of sensitive patient data are required to build predictive models because of the numerous factors involved; combining multiple data streams from dispersed data stores raises potential data privacy issues | Developing appropriate privacy measures, such as data anonymization techniques, to ensure complete privacy of patients' data; using techniques such as “federated learning” [; enabling some form of privacy access control over the different data streams so that only those with proper authorization can access a patient's data streams |
| Obsolete Predictive Models | Dynamic evolution of disease states in patients undergoing HSCT | Developing adaptive ML techniques capable of detecting changes in the data over time and adapting accordingly |
| Diverse Data Types | Captured data are of different modalities and sampled at different rates | Developing multi-modal data integration techniques based on deep learning for effective integration |
| Data Integration Issues | Most captured data are dispersed among various data stores (e.g., cloud storage, EHR, individually managed databases) | Using mHealth platforms could be a potential solution |
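The "Obsolete Predictive Models" row calls for adaptive techniques that detect changes in the data over time. The review does not name a specific method. One classical option, chosen here purely for illustration, is the Page-Hinkley test:

- it accumulates how far each new observation deviates from the running mean;
- it signals a change once that cumulative sum rises more than a threshold above its lowest point so far.

The minimal Python sketch below makes three assumptions:

- the monitored quantity is a per-patient model error;
- an upward shift indicates drift;
- the `delta` and `threshold` values are arbitrary placeholders that would need tuning.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a data stream.

    delta: small tolerance subtracted from every deviation.
    threshold: alarm level (often written lambda).
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0     # running mean of the observations seen so far
        self.cum = 0.0      # cumulative deviation m_t
        self.min_cum = 0.0  # smallest m_t seen so far (M_t)

    def update(self, x: float) -> bool:
        """Add one observation; return True if an upward change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


# Hypothetical monitored stream: a model's per-patient error that jumps after step 100.
stream = [0.0] * 100 + [5.0] * 20
detector = PageHinkley(delta=0.005, threshold=50.0)
alarm_at = None
for t, x in enumerate(stream):
    if detector.update(x):
        alarm_at = t  # a deployed system might retrain here and call detector.reset()
        break
print("change signalled at step", alarm_at)
```

The alarm fires a few steps after the jump, not immediately, because the cumulative sum has to climb past the threshold. Lowering `threshold` trades faster detection for more false alarms.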
Summary of limitations of reviewed studies.
| Limitations | Consequences | Potential Solution |
|---|---|---|
| Lack of interpretable predictive models | Biased results; lack of generalizable models; non-applicability to clinical decision making | Developing or applying more interpretable ML techniques, such as ADT, in the HSCT setting; utilizing methods such as Shapley additive explanations (SHAP) [; better data visualization techniques |
| Lack of model validation | Leads to non-generalizable models | Using validation sets to check the initial errors of the built model and to calibrate it further before applying it to test sets |
| Small sample size | Biased results; models that are not clinically relevant | Using larger, representative samples when applying ML techniques, to produce robust, scalable, and unbiased results |
| Lack of multi-center studies | Leads to non-generalizable models | Using registries with multicenter data from heterogeneous sets of patients |
| Lack of diverse data streams | Leads to non-generalizable models | Conducting studies with diverse data streams, which could help provide personalized healthcare solutions |
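The "Lack of model validation" row recommends holding out a validation set in addition to the test set. Below is a minimal Python sketch of a three-way split at the patient level. Splitting by patient rather than by record keeps one patient's records from landing in both training and test data.

The function name, default fractions, seed, and patient IDs are illustrative assumptions. The split is not stratified by outcome.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    items: Sequence[T], val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items (e.g., patient IDs) and cut them into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical cohort of 100 patient IDs; split by patient so no patient spans two sets.
patients = [f"patient_{i:03d}" for i in range(100)]
train, val, test = train_val_test_split(patients)
print(len(train), len(val), len(test))
# Fit on `train`, tune and calibrate on `val`, and score on `test` only once at the end.
```

With small cohorts like several in the summary table, repeated or cross-validated splits would usually give a more stable error estimate than a single split.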
Figure 3. Schematic workflow of Roadmap 2.0. (a) Large volumes of wearable sensor (i.e., Fitbit) data streams (e.g., heart rate, sleep, activity/steps), electronic health records, and physiological data streams will be captured in real time in the mHealth platform Roadmap 2.0. (b) The captured multi-parameter data streams will be stored on a secure HIPAA-compliant server. It will contain multivariate physiological signals and patient-reported outcomes data (generated from patients' responses to survey questionnaires). (c) The stored data will be processed in a data analytics pipeline. Features will first be extracted from all the diverse data types, and machine learning algorithms will then be used to build a predictive model. This model will be applied to a test set to make predictions on unseen data. Finally, the predictive model will be evaluated using the area under the ROC curve (AUC), and feature importance will be computed. (d) The final results will be stored on the secure server.
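Step (c) evaluates the predictive model with AUC on held-out data. Below is a minimal, dependency-free Python sketch of that metric, using the pairwise (Mann-Whitney) definition of ROC AUC. The labels and scores are made-up toy values, not Roadmap 2.0 data. In practice, a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen positive case (label 1) gets a
    higher score than a randomly chosen negative case (label 0); ties count 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)): fine for a sketch, slow for large cohorts
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test-set outcomes (1 = event) and model risk scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of events from non-events.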