Carole Faviez, Xiaoyi Chen, Nicolas Garcelon, Antoine Neuraz, Bertrand Knebelmann, Rémi Salomon, Stanislas Lyonnet, Sophie Saunier, Anita Burgun.
Abstract
INTRODUCTION: Rare diseases affect approximately 350 million people worldwide. Diagnosis is frequently delayed because most clinicians have limited knowledge of these diseases and expert centers are few. Computerized diagnosis support systems have been developed to address these issues; many rely on rare disease expertise and take advantage of the increasing volume of generated and accessible health-related data. Our objective is to perform a review of all initiatives aiming to support the diagnosis of rare diseases.
Keywords: Artificial intelligence; Clinical decision support; Diagnosis; Genetic diseases; Machine learning; Patient similarity; Phenotype; Rare disease; Scoping review
Year: 2020 PMID: 32299466 PMCID: PMC7164220 DOI: 10.1186/s13023-020-01374-z
Source DB: PubMed Journal: Orphanet J Rare Dis ISSN: 1750-1172 Impact factor: 4.123
Database queries
| Database | Query |
|---|---|
| PubMed | |
| Web of Science | ALL = ALL = |
Fig. 1 Flowchart of the screening process
Publication summary
| Material | Knowledge | Machine learning | Articles |
|---|---|---|---|
| Phenotype concepts (22 studies) | Knowledge-based (14 studies) | No | [ |
| | Hybrid (7 studies) | Yes | [ |
| | | No | [ |
| | Data driven (1 study) | Yes | [ |
| Fluids (12 studies) | Hybrid (2 studies) | Yes | [ |
| | Data driven (10 studies) | Yes | [ |
| Images (16 studies) | Hybrid (2 studies) | Yes | [ |
| | Data driven (14 studies) | Yes | [ |
| Questionnaires (3 studies) | Data driven (3 studies) | Yes | [ |
| Family history and combined material (8 studies) | Knowledge-based (5 studies) | No | [ |
| | Hybrid (2 studies) | Yes | [ |
| | Data driven (1 study) | Yes | [ |
References are listed in column “articles” according to the type of material considered and the model used (presence/absence of prior knowledge and of machine learning). The number of studies according to material and knowledge is given in parentheses
Fig. 2 Correlations between the number of targeted diseases and material nature. All studies directed to all rare/genetic diseases were based on phenotype concepts. Studies directed to a class or one specific disease could take advantage of disease-related materials such as images or fluids
Number of patients and controls per dataset
| Group | Number of studies | Number of studies with datasets | Number of datasets | Number of patients: median | Number of patients: mean [min, max] |
|---|---|---|---|---|---|
| Group 1 | 29 studies | 27 studies | 29 datasets | 50 | 291 [7, 5050] |
| Group 2 | 15 studies | 14 studies | 20 datasets | 98 | 730 [5, 10,593] |
| Group 3 | 17 studies | 8 studies | 10 datasets | 161 | 6929 [40, 39,000] |
| Group 1 (controls) | 29 studies | 27 studies | 29 datasets | 70 | 105,491 [10, 2,966,363] |
Studies are grouped according to the number of diseases they address. Group 1: one disease; group 2: a class of diseases; group 3: all rare/genetic diseases. The number of studies, datasets and patients per dataset for each group is given. For group 1, the number of individuals in the control groups is also given. Datasets from studies addressing all rare diseases (group 3) contain more patients on average
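The gap between median and mean in this table (for example, a median of 50 against a mean of 291 patients for group 1) is what one would expect when a few very large datasets sit alongside many small ones. The sketch below illustrates the effect with Python's standard `statistics` module. The `sizes` list is hypothetical and is not taken from the review; `summarize` is an illustrative helper name.

```python
from statistics import mean, median


def summarize(sizes):
    """Return the median, mean, minimum and maximum of a list of dataset sizes."""
    return {
        "median": median(sizes),
        "mean": mean(sizes),
        "min": min(sizes),
        "max": max(sizes),
    }


# Hypothetical dataset sizes (illustration only, not data from the review):
# four small cohorts and one very large one.
sizes = [7, 30, 50, 80, 5050]
summary = summarize(sizes)

# The single large dataset pulls the mean far above the median.
print(summary)
```

Reporting both statistics, as the table does, lets the reader see the typical dataset size (median) separately from the influence of outliers (mean and range).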
Fig. 3 Correlations between the knowledge model and material nature. Knowledge-based models were based on phenotype concepts or combinations of clinical features. Data-driven models were mostly based on images or fluids
Fig. 4 Correlations between the knowledge model and the methods. Data-driven systems were all based on machine learning (combined or not with simple similarity measurement). Knowledge-based systems were based either on simple similarity or on manually generated algorithms
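As a rough illustration of what a "simple similarity" approach can look like, the sketch below ranks candidate diseases by the Jaccard overlap between a patient's phenotype terms and each disease's phenotype profile. This is an assumption made for illustration: the review does not state which similarity measure any given tool uses, and the disease names, term identifiers, and function names here are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of phenotype terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_diseases(patient_terms, disease_profiles):
    """Return (disease, score) pairs sorted by decreasing similarity.

    Ties are broken alphabetically by disease name so the order is stable.
    """
    scored = [
        (name, jaccard(patient_terms, terms))
        for name, terms in disease_profiles.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical disease profiles and phenotype terms (illustration only).
profiles = {
    "disease_A": {"t1", "t2", "t3"},
    "disease_B": {"t2", "t4"},
    "disease_C": {"t5"},
}
ranking = rank_diseases({"t1", "t2"}, profiles)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A set-overlap measure like this treats all terms as equally informative; it is shown only because it is short and needs no external knowledge source.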
Number of publications for different evaluation processes
| Evaluation | Data driven | Knowledge-based | Hybrid |
|---|---|---|---|
| Comparison to other methods | 15 studies | 9 studies | 4 studies |
| Comparison to other tools | 1 study | 8 studies | 3 studies |
| Comparison to experts | 3 studies | 1 study | 1 study |
| External validation | 8 studies | 8 studies | 2 studies |
| Method for imbalance issue | 2 studies | 0 studies | 0 studies |
The number of studies is specified for each evaluation process according to the type of knowledge included. External validation is only specified in 18 studies, and a specific method to address imbalance issues is only specified in two studies
Characteristics of online tools
| Tool name | Date | Data sources | Performances: Top 10 ranking | Related articles | URL |
|---|---|---|---|---|---|
| | 2009 | Phenotype concepts | NA | [ | http://compbio.charite.de/phenomizer |
| | 2012 | Phenotype concepts | NA | [ | http://compbio.charite.de/boqa/ |
| | 2013 | Phenotype concepts | NA | [ | http:// |
| | 2013 | Phenotype concepts | 63% | [ | http://www.findzebra.com/ |
| | 2014 | Phenotype concepts/genes | ~99% | [ | http://compbio.charite.de/PhenIX/ |
| | 2015 | Phenotype concepts/genes | ~85% | [ | http://phenolyzer.usc.edu |
| | 2016, 2017 | Phenotype concepts | 38% | [ | http://diseasediscovery.udl.cat/ |
| | 2018 | Phenotype concepts | 90% | [ | http://www.iembase.org/app |
| | 2018 | Phenotype concepts | 57% | [ | https://pubcasefinder.dbcls.jp/ |
| | 2018 | Phenotype concepts/genes | 95% | [ | |
| | 2019 | Phenotype concepts | ~32% | [ | https://gddp.research.cchmc.org/ |
| | 2019 | Phenotype concepts/genes | ~95% | [ | https://web.stanford.edu/~xm24/Xrare/ |
| | 2017 | Images | NA | [ | https://www.cc-cruiser.com/ |
| | 2019 | Images | NA | [ | https://www.face2gene.com/ |
For each online tool, we listed the publication year, the materials used, the performance indicated in each publication, and the URLs provided in the publications. For the performance, the proportion of accurate diagnoses within the top 10 most relevant diseases for each patient is given for all algorithms based on diagnosis recommendation (i.e., providing for each patient a list of potential diagnoses ranked by relevance). These results were provided by the authors of each tool and thus do not allow a comparison of tool performance, as the nature and volume of each dataset were different.
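The top 10 ranking measure described above can be sketched as follows: for each patient, check whether the confirmed diagnosis appears among the first `k` entries of the ranked list, then take the proportion of patients for whom it does. This is a minimal sketch of the general metric, not the evaluation code of any listed tool; the function name, patient rankings, and diagnosis labels are hypothetical.

```python
def top_k_accuracy(ranked_lists, true_diagnoses, k=10):
    """Proportion of patients whose true diagnosis is in the top k of their ranked list."""
    if len(ranked_lists) != len(true_diagnoses):
        raise ValueError("one ranked list is required per true diagnosis")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, true_diagnoses)
        if truth in ranked[:k]
    )
    return hits / len(ranked_lists)


# Hypothetical ranked outputs for four patients (illustration only).
ranked = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
    ["A", "C", "B"],
]
truth = ["A", "A", "B", "C"]

top2 = top_k_accuracy(ranked, truth, k=2)
top3 = top_k_accuracy(ranked, truth, k=3)
print(top2, top3)
```

Because the value depends on how many candidate diseases are ranked and on the composition of the test set, figures computed on different datasets are not directly comparable, which is consistent with the caveat in the caption.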