Julia Schaefer, Moritz Lehne, Josef Schepers, Fabian Prasser, Sylvia Thun.
Abstract
BACKGROUND: Emerging machine learning technologies are beginning to transform medicine and healthcare and could also improve the diagnosis and treatment of rare diseases. Currently, there are no systematic reviews that investigate, from a general perspective, how machine learning is used in a rare disease context. This scoping review aims to address this gap and explores the use of machine learning in rare diseases, investigating, for example, in which rare diseases machine learning is applied, which types of algorithms and input data are used, and which medical applications (e.g., diagnosis, prognosis or treatment) are studied.
Keywords: Machine learning; Rare diseases; Scoping review
Year: 2020 PMID: 32517778 PMCID: PMC7285453 DOI: 10.1186/s13023-020-01424-6
Source DB: PubMed Journal: Orphanet J Rare Dis ISSN: 1750-1172 Impact factor: 4.123
Data extracted from the studies
| Variable | Categories | Definition | Example(s) |
|---|---|---|---|
| Rare disease | All rare diseases described at least once in the studies (studies investigating more than one rare disease were categorized as “Diverse”) | Orphanet disorder name | Cystic fibrosis, Sickle cell anemia, Gaucher disease |
| Disease group | All disease groups of the 381 specific diseases included in the search as well as disease groups of other diseases identified in the studies | Orphanet disease group as defined by the preferential parent in the classification hierarchy | Rare neurologic disease, Rare respiratory disease, Rare endocrine disease |
| Publication year | Years from 2010 to 2019 | Year of the publication date of the article | |
| Country | All countries that published at least one article | Country of institution of senior (i.e., last) author of the study | |
| Medical application | Diagnosis | Studies aiming to correctly diagnose patients | Classification of cases and controls or different disease subtypes, Identification of biomarkers, Deep phenotyping, Decision support |
| | Treatment | Studies aiming to improve treatment or develop new therapies | Detection of therapeutic targets, Identification of binding proteins |
| | Prognosis | Prediction of a patient-relevant endpoint | Prediction of complication, disease onset, survival, disease progression, Risk estimation |
| | Basic research | Other basic research not classified into one of the categories above | Exploration of molecular disease mechanisms |
| Number of patients | “< 20”, “20–99”, “100–1000”, “> 1000”, “not applicable / no information” | Number of patients included in the study | |
| Input data | Clinical test score | Data from a clinical test score | Glasgow Coma Scale, ALS Functional Rating Scale |
| | Demographic data | General patient characteristics | Age, Sex, Ethnicity |
| | Functional test data | Data from physiological tests | ECG, EEG, EMG, gait pattern, pulse, blood pressure, eye movements |
| | Images | Data from medical imaging | MRI, PET, CT, retinal images, face photographs |
| | Laboratory data | Data from laboratory tests | Blood glucose, platelet counts, creatinine |
| | Literature | Data extracted from scientific texts | Published literature, NCBI disease corpus |
| | Medication data | Data about medication | Use of antibiotics, medication plan |
| | Omics data | Molecular data | Genomics, Proteomics, Metabolomics, Epigenomics |
| | Patient / Family history | Data from patients’ or relatives’ past medical history | Pre-existing conditions, parental data |
| | Other EHR data | Other data from electronic health records | Diagnoses, procedures, other medical records |
| | Other | Other types of input data | Questionnaire or interview data, donors’ characteristics in HSCT |
| Algorithm | Artificial Neural Network | | Convolutional neural network, Recurrent neural network, Multi-layer perceptron |
| | Bayesian Methods | | Naïve Bayes |
| | Clustering | | k-means clustering, Hierarchical clustering |
| | Decision Tree | | Decision tree |
| | Discriminant Analysis | | Linear discriminant analysis |
| | Ensemble Methods | | AdaBoost, Random forest |
| | Instance-based Learning | | k-nearest neighbor |
| | Regression (logistic) | | Logistic regression |
| | Regression (other) | | Linear regression |
| | Support Vector Machine | | Support vector machine |
| | Other | Algorithms not classified into one of the categories above | Reinforcement learning, Graphical models |
| External validation | yes / no | Performance of algorithm tested on external data or against a human expert | Comparing automated scoring of chest radiographs with scoring by radiologists |
a For these variables, a study could be assigned to more than one category.
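The extraction scheme in the table above can be read as a record with a few single-valued fields and a few multi-valued ones. The following is a minimal sketch of such a record in Python, not the authors' actual tooling. The class name `StudyRecord`, its field names, and the helper `patient_count_category` are hypothetical. Treating a missing count as "not applicable / no information" is likewise an assumption. Only the category labels come from the table.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Patient-count categories exactly as listed in the table above.
PATIENT_BINS = (
    "< 20",
    "20–99",
    "100–1000",
    "> 1000",
    "not applicable / no information",
)


def patient_count_category(n: Optional[int]) -> str:
    """Map a patient count to one of the table's categories.

    Assumption: None stands for a study that reports no patient count.
    """
    if n is None:
        return PATIENT_BINS[4]
    if n < 20:
        return PATIENT_BINS[0]
    if n < 100:
        return PATIENT_BINS[1]
    if n <= 1000:
        return PATIENT_BINS[2]
    return PATIENT_BINS[3]


@dataclass
class StudyRecord:
    # Hypothetical layout; field names are illustrative only.
    rare_disease: str           # Orphanet disorder name, or "Diverse"
    disease_group: str          # Orphanet disease group
    year: int                   # 2010 to 2019
    country: str                # country of the senior author's institution
    medical_application: str    # kept single-valued here for simplicity
    n_patients: Optional[int] = None
    # Sets, because a study can use several input data types and algorithms.
    input_data: Set[str] = field(default_factory=set)
    algorithms: Set[str] = field(default_factory=set)
    externally_validated: bool = False

    @property
    def patient_category(self) -> str:
        return patient_count_category(self.n_patients)
```

Storing input data and algorithms as sets mirrors the note that a study can fall into more than one category, so per-category counts may sum to more than the number of studies.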
Fig. 1. Selection of sources of evidence
Fig. 2. World map showing publications by country (a); countries with more than five publications (b); total number of publications per year (c; for comparison, the inset shows the publication trend for machine learning in general)
Rare diseases most frequently investigated in the studies (all diseases appearing in five or more studies are listed)
| Rare disease | Orpha number | Prevalence | Number of studies |
|---|---|---|---|
| Amyotrophic lateral sclerosis | 803 | 1–9 / 100,000 | 16 (7.6%) |
| Systemic lupus erythematosus | 536 | 1–5 / 10,000 | 14 (6.6%) |
| Moderate and severe traumatic brain injury | 90056 | 1–5 / 10,000 | 12 (5.7%) |
| Cystic fibrosis | 586 | 1–9 / 100,000 | 10 (4.7%) |
| Huntington disease | 399 | 1–9 / 100,000 | 9 (4.3%) |
| Down syndrome | 870 | 1–5 / 10,000 | 7 (3.3%) |
| Preeclampsia | 275555 | 1–5 / 10,000 | 7 (3.3%) |
| Acquired aneurysmal subarachnoid hemorrhage | 90065 | 1–5 / 10,000 | 6 (2.8%) |
| Systemic sclerosis | 90291 | 1–5 / 10,000 | 6 (2.8%) |
| Fragile X syndrome | 908 | 1–5 / 10,000 | 5 (2.4%) |
| Retinopathy of prematurity | 90050 | 1–5 / 10,000 | 5 (2.4%) |
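The percentages in the table are given without their denominator in this section. As a consistency check, the sketch below searches for study totals that reproduce every reported percentage after rounding to one decimal place. The function name `consistent_totals` and the search range are assumptions made for illustration. The counts and percentages are copied from the table.

```python
# (number of studies, reported percentage) pairs copied from the table above.
ROWS = [
    (16, 7.6),
    (14, 6.6),
    (12, 5.7),
    (10, 4.7),
    (9, 4.3),
    (7, 3.3),
    (6, 2.8),
    (5, 2.4),
]


def consistent_totals(rows, max_total=1000):
    """Return every total N for which round(100 * count / N, 1)
    matches the reported percentage in all rows."""
    return [
        n
        for n in range(1, max_total + 1)
        if all(round(100 * count / n, 1) == pct for count, pct in rows)
    ]


print(consistent_totals(ROWS))  # prints [211]
```

Under the assumption that the percentages were rounded to one decimal place, a total of 211 studies is the only value in this range that matches all rows.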
Fig. 3. Distribution across disease groups: The distribution of the 381 diseases included in the literature search is shown in comparison with the distribution of the 74 diseases investigated in the studies (left; disease groups smaller than 3% are not shown); differences between the percentages show disease groups that are over- or underrepresented in the studies (right)
Fig. 4. Types of algorithms used in the studies (a); input data (b); medical application (c); number of patients (d). Studies using more than one type of algorithm or input data are listed in more than one category