Rui Alves, Marc Piñol, Jordi Vilaplana, Ivan Teixidó, Joaquim Cruz, Jorge Comas, Ester Vilaprinyo, Albert Sorribas, Francesc Solsona.
Abstract
Introduction. Most documented rare diseases have a genetic origin. Because of their low individual frequency, an initial diagnosis based on phenotypic symptoms is not always easy, as practitioners might never have been exposed to patients suffering from the relevant disease. It is thus important to develop tools that facilitate symptom-based initial diagnosis of rare diseases by clinicians. In this work we aimed to develop a computational approach to aid in that initial diagnosis, and to implement this approach in a user-friendly web prototype. We call this tool Rare Disease Discovery. Finally, we aimed to test the performance of the prototype.
Methods. Rare Disease Discovery uses the publicly available ORPHANET data set of associations between rare diseases and their symptoms to automatically predict the most likely rare diseases based on a patient's symptoms. We apply the method to retrospectively diagnose a cohort of 187 rare disease patients with a confirmed diagnosis. Subsequently we test the precision, sensitivity, and global performance of the system under different scenarios by running large-scale Monte Carlo simulations. All settings account for situations where absent and/or unrelated symptoms are considered in the diagnosis.
Results. We find that this expert system has high diagnostic precision (≥80%) and sensitivity (≥99%), and is robust to both absent and unrelated symptoms.
Discussion. The Rare Disease Discovery prediction engine appears to provide a fast and robust method for initial assisted differential diagnosis of rare diseases. We coupled this engine with a user-friendly web interface, which can be freely accessed at http://disease-discovery.udl.cat/. The code and most current database for the whole project can be downloaded from https://github.com/Wrrzag/DiseaseDiscovery/tree/no_classifiers.
Keywords: Computer assisted diagnosis; Family doctors; Rare diseases; User-friendly webserver; eHealth
Year: 2016 PMID: 27547534 PMCID: PMC4963223 DOI: 10.7717/peerj.2211
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The web interface for Rare Disease Discovery.
(A) Entry screen. Users can type and select various symptoms. Once all relevant symptoms have been selected, the user can press “Submit Symptoms.” (B) Example of a differential diagnosis provided by the program. (C) List of diseases in the database, with their associated symptoms. (D) List of symptoms in the database, with their associated diseases.
Table 1. Examples of prediction results for a randomly chosen set of ten rare diseases.
Diseases are identified in column 1. Column 2 indicates the total number of symptoms associated with the disease in the ORPHANET dataset. Column 3 presents DS for the disease when one symptom is submitted to RDD, as well as the ranking of the disease in the list of predictions. Column 4 shows DS for the disease when five symptoms are simultaneously submitted to RDD, as well as the ranking of the disease in the list of predictions. Column 5 displays the minimum DS at which the disease is ranked as the most likely prediction, as well as the number of symptoms needed to obtain that value of DS. Finally, column 6 indicates the number of symptoms that make DS ≥ 0.5 for the disease, which is the value above which DS is statistically significant. Details about the symptoms are given in Supporting Table 1 of Appendix S1.
| Disease | Number of associated symptoms | Score at 1 symptom (rank) | Minimum score at rank 1 (number of symptoms) | Number of symptoms for statistically significant score |
|---|---|---|---|---|
| Beta-Thalassemia | 23 | 0.043(67th) | 0.13(3) | 12 |
| Canavan disease | 19 | 0.053(23rd) | 0.26(5) | 10 |
| Down syndrome | 48 | 0.021(244th) | 0.083(4) | 24 |
| Fabry disease | 66 | 0.015(111th) | 0.12(8) | 33 |
| Goldblatt syndrome | 23 | 0.043(81st) | 0.13(3) | 12 |
| Turner syndrome | 26 | 0.038(21st) | 0.077(2) | 13 |
| Uncombable hair syndrome | 7 | 0.14(1st) | 0.14(1) | 4 |
| Williams syndrome | 180 | 0.006(121st) | 0.028(5) | 90 |
| Yunis-Varon syndrome | 66 | 0.015(7th) | 0.14(9) | 33 |
| Zellweger-like syndrome without peroxisomal anomalies | 25 | 0.042(31st) | 0.12(3) | 13 |
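The scores in the table are consistent with a disease score equal to the fraction of a disease's catalogued symptoms that are submitted (for example, 1/23 ≈ 0.043 for Beta-Thalassemia with one symptom, and 3/23 ≈ 0.13 with three). The following is a minimal sketch under that assumption; it is not the authors' implementation, and the toy `DISEASES` catalogue, the symptom names, and the function names are hypothetical stand-ins for the ORPHANET data and the RDD engine.

```python
# Hypothetical toy catalogue: disease name -> set of associated symptoms.
DISEASES = {
    "disease_a": {"s1", "s2", "s3", "s4"},
    "disease_b": {"s1", "s5"},
    "disease_c": {"s6", "s7", "s8", "s9", "s10"},
}


def disease_score(reported, disease_symptoms):
    """Assumed score: fraction of the disease's symptoms that were reported."""
    if not disease_symptoms:
        return 0.0
    return len(set(reported) & set(disease_symptoms)) / len(disease_symptoms)


def rank_diseases(reported, catalogue=DISEASES):
    """Return (disease, score) pairs sorted from most to least likely."""
    scores = [
        (name, disease_score(reported, symptoms))
        for name, symptoms in catalogue.items()
    ]
    # Descending score; ties are broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_diseases({"s1", "s2"})
```

Under this assumed definition, a disease with few catalogued symptoms reaches a high score with very few matches, which would explain why Uncombable hair syndrome (7 symptoms) is already ranked first with a single symptom.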
Table 2. Comparison of predictions between DDX generators.
Here we compare the most likely diagnosis of four well-known and freely available (at least for testing purposes) DDX generators with that provided by Rare Disease Discovery, when considering the joint symptoms used to perform the study summarized in columns 1 and 4 of Table 1.
| Disease | Diagnosis pro | ISABEL | Phenomizer | FindZebra | Rare Disease Discovery |
|---|---|---|---|---|---|
| Beta-Thalassemia | + | + | + | ∗ | + |
| Canavan disease | ∗ | ∗ | + | + | + |
| Down syndrome | ∗ | ∗ | + | + | + |
| Fabry disease | + | + | + | + | + |
| Goldblatt syndrome | ∗ | ∗ | + | + | + |
| Turner syndrome | ∗ | + | + | + | + |
| Uncombable hair syndrome | ∗ | ∗ | + | + | + |
| Williams syndrome | + | ∗ | + | + | + |
| Yunis-Varon syndrome | ∗ | ∗ | + | + | + |
| Zellweger-like syndrome without peroxisomal anomalies | + | ∗ | + | + | + |
Notes.
+ Suggests the appropriate disease in the top 10 ranked list of predictions.
∗ Does not suggest the appropriate disease in any position of the top 10 ranked list of predictions.
Table 3. Comparison of predictions between DDX generators.
Here we compare the most likely diagnosis of four well-known and freely available (at least for testing purposes) DDX generators with that provided by Rare Disease Discovery. Ten patients with different symptoms and/or diseases were randomly selected from the RAMEDIS dataset. All symptoms of each patient were used.
| Disease (Patient ID) | Diagnosis pro | ISABEL | Phenomizer | FindZebra | Rare Disease Discovery |
|---|---|---|---|---|---|
| Classical homocystinuria (5) | + | + + | + + | + + | + + |
| Propionic acidemia (821) | + | + | + | ∗ | + + |
| Glycogen storage disease (1086) | + | + + | + | + + | + + |
| Isovaleric acidemia (1050) | + | ∗ | + | + + | + + |
| Galactosemia (970) | + | + | + + | + + | + + |
| Carnitine palmitoyl transferase II deficiency (1024) | ∗ | + + | + + | + | + + |
| Canavan disease (492) | ∗ | ∗ | ∗ | + + | + + |
| Porphyria (866) | + | ∗ | + + | + + | + + |
| Mitochondrial DNA depletion syndrome (940) | ∗ | + | + | + + | + + |
| Congenital neuronal ceroid lipofuscinosis (830) | + | + | + + | + + | + + |
Notes.
+ Suggests the appropriate disease in the top 100 list of possible diseases.
+ + Suggests the appropriate disease in the top 10 list of predictions.
∗ Does not suggest the appropriate disease in any position of the top 100 list of predictions.
Figure 2. Joint effect of unreported and unrelated symptoms on the predictive accuracy of Rare Disease Discovery.
(A) Plot of F1-Score as a function of the % of patients with a known rare disease where 1, 2, 3, 4, 5, 10, or 20 symptoms were randomly added or deleted. (B) Plot of Precision as a function of the % of patients with a known rare disease where 1, 2, 3, 4, 5, 10, or 20 symptoms were randomly added or deleted. (C) Plot of Sensitivity as a function of the % of patients with a known rare disease where 1, 2, 3, 4, 5, 10, or 20 symptoms were randomly added or deleted. Without noise, the F1-Score is always 1. The F1-Score decreases as noise (% of patients with added or deleted symptoms) increases. This is mainly due to a decrease in precision. Sensitivity is always high because the number of false positives is always orders of magnitude smaller than the number of true negatives. In the worst-case scenario (20 incorrect symptoms in 100% of the patients), the appropriate disease is contained in the set of diseases with the highest score for more than 80% of the patients.
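The noise model described in this caption (randomly adding or deleting a fixed number of symptoms in a given fraction of patients) can be sketched as follows. This is an illustrative assumption about how such a perturbation might be coded, not the authors' simulation: `perturb_symptoms`, `perturb_cohort`, their parameters, and the equal probability of adding versus deleting are all hypothetical.

```python
import random


def perturb_symptoms(symptoms, all_symptoms, n_changes, rng):
    """Randomly add or delete up to n_changes symptoms (hypothetical noise model)."""
    original = set(symptoms)
    noisy = set(original)
    for _ in range(n_changes):
        # True symptoms still present can be deleted (unreported symptoms).
        deletable = sorted(noisy & original)
        # Symptoms never associated with this patient can be added (unrelated symptoms).
        addable = sorted(set(all_symptoms) - original - noisy)
        # Assumption: additions and deletions are equally likely when both are possible.
        if deletable and (not addable or rng.random() < 0.5):
            noisy.remove(rng.choice(deletable))
        elif addable:
            noisy.add(rng.choice(addable))
    return noisy


def perturb_cohort(cohort, all_symptoms, fraction, n_changes, seed=0):
    """Apply the perturbation to a randomly chosen `fraction` of the patients."""
    rng = random.Random(seed)
    n_noisy = round(fraction * len(cohort))
    chosen = set(rng.sample(range(len(cohort)), n_noisy))
    return [
        perturb_symptoms(s, all_symptoms, n_changes, rng) if i in chosen else set(s)
        for i, s in enumerate(cohort)
    ]
```

Repeating `perturb_cohort` over many seeds, and over each combination of `fraction` and `n_changes`, would give one Monte Carlo curve per panel setting.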
Figure 3. Effect of unreported symptoms on the predictive accuracy of Rare Disease Discovery.
(A) Plot of F1-Score as a function of the % of patients with a known rare disease where 25%, 50%, and 75% of the symptoms were randomly deleted. (B) Plot of Precision as a function of the % of patients with a known rare disease where 25%, 50%, and 75% of the symptoms were randomly deleted. (C) Plot of Sensitivity as a function of the % of patients with a known rare disease where 25%, 50%, and 75% of the symptoms were randomly deleted. Without noise (no deleted symptoms), the F1-Score is always 1. The F1-Score decreases as noise (% of patients with deleted symptoms) increases. This is mainly due to a decrease in precision. Sensitivity is always high because the number of false positives is always orders of magnitude smaller than the number of true negatives. In the worst-case scenario (75% deleted symptoms in 100% of the patients), the appropriate disease is contained in the set of diseases with the highest score for more than 90% of the patients.
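For reference, precision, sensitivity, and F1-Score follow their standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The helpers below are a generic sketch of those textbook formulas; how the authors counted TP, FP, and FN across predictions is not specified here, so any counts passed to these functions are hypothetical.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp, fn):
    """Fraction of actual positives that are recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With no false positives and no false negatives, `f1_score` returns 1, matching the noise-free case described in the captions; because F1 is a harmonic mean, a drop in precision alone is enough to pull it down even when sensitivity stays high.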