Lorenz Grigull, Sandra Mehmecke, Ann-Katrin Rother, Susanne Blöß, Christian Klemann, Ulrike Schumacher, Urs Mücke, Xiaowei Kortum, Werner Lechner, Frank Klawonn.
Abstract
BACKGROUND: Rare diseases (RD) result in a wide variety of clinical presentations, and this creates a significant diagnostic challenge for health care professionals. We hypothesized that there exists a set of consistent and shared phenomena among all individuals affected by (different) RD during the time before diagnosis is established.
Year: 2019 PMID: 31600214 PMCID: PMC6786570 DOI: 10.1371/journal.pone.0222637
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Example questions (selection of 6 out of 53 questions; for the complete questionnaire, see S1 File).
| Did you suspect for a period of time prior to your diagnosis that something was wrong with your health? |
| Do you deliberately avoid activities or tasks that make your symptoms obvious to others? |
| Is it difficult for you to describe your complaints / symptoms? |
| Do you notice any special tricks or techniques you have developed to compensate for symptom-related limitations in mastering everyday tasks? |
| Can you recall a situation when your symptoms caused you to feel threatened? |
| Did you attempt to research possible causes for the complaints / symptoms you were experiencing? |
Return rates of questionnaires.
| Diagnostic group | Returned questionnaires (n) |
|---|---|
| Sarcoidosis | 144 |
| PAH | 50 |
| Syringomyelia | 44 |
| SLE | 31 |
| Rare endocrinological diseases | 36 |
| Rare neuromuscular diseases | 90 |
| Rare diseases of the skin | 68 |
| Rare neurological diseases | 93 |
| Rare pain syndromes | 22 |
| Rare autoimmune diseases | 94 |
| Rare metabolic diseases | 52 |
| Rare pulmonary diseases | 34 |
| NRO | 200 |
| CD | 149 |
| PSY | 48 |
| no diagnosis (online questionnaires) | 349 |
| incomplete questionnaires | 225 |
| healthy individuals | 34 |
a Including patients with acromegaly, Addison's disease, adenoma, Cushing's disease
b Including patients with ALS, CIDP (chronic inflammatory demyelinating polyneuropathy), Duchenne muscular dystrophy, FSHD, SMA, PNP
c Including patients with EDS, ectodermal dysplasia, epidermolysis bullosa, lipoedema, mastocytosis
d Including patients with GBS, Ménière's disease, Arnold-Chiari malformation
e Mostly patients with cluster headache
f Including patients with Still's disease, Wegener's granulomatosis, Behçet's disease, dermatomyositis, Moyamoya syndrome
g Including patients with glycogenosis types 1 to 9, Fabry disease, metachromatic leukodystrophy, Niemann-Pick disease type C
h Mostly patients with PCD and cystic fibrosis
i Patients feeling ill but without a conclusive diagnosis despite intensive workup. The rare disease center Bonn added 34 questionnaires from individuals without a diagnosis despite intensive testing and searching
k Including patients with asthma, inflammatory bowel disease
Structures of the 3 data subsets.
| Data set | Classifier class 1 | Classifier class 2 | Questionnaires (class 1 : class 2 → total) |
|---|---|---|---|
| 1 | pulmonary hypertension (PAH), cystic fibrosis | other non-rare diseases | 90 : 90 → 180 |
| 2 | Sarcoidosis, syringomyelia | chronic diseases | 90 : 90 → 180 |
| 3 | CIDP | psychosomatic disorders | 42 : 38 → 80 |
a Systemic lupus erythematosus
b Chronic inflammatory demyelinating polyneuropathy
c A selection of questionnaires was chosen at random
Fig 1. Sensitivity values of a 10-fold stratified cross-validation run.
Data set 1 (RD versus NRO). The individual diagnoses of the four different classifiers and their corresponding probabilities were evaluated by a further classifier, which computed the final diagnosis. For the fusion, a support vector machine (SVM, black line) was selected because it performed best. For better readability, the curves are shifted vertically by a few pixels.
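The stratified 10-fold split named in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `stratified_folds`, the seed, and the round-robin assignment are assumptions, and only the class sizes (90 RD, 90 NRO) come from the data set description.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that fold sizes stay balanced across classes.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


# Class sizes of data set 1 (90 RD versus 90 NRO questionnaires).
labels = ["RD"] * 90 + ["NRO"] * 90
folds = stratified_folds(labels, k=10)

for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    n_rd = sum(labels[j] == "RD" for j in test_idx)
    print(f"fold {i}: test={len(test_idx)} (RD={n_rd}), train={len(train_idx)}")
```

Each fold serves once as the test set while the remaining nine folds are used for training, so every questionnaire is classified exactly once.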
Results of stratified 10-fold cross-validation runs for data sets 1, 2 and 3.
A binary confusion matrix is derived from the cross-validation results by counting the numbers of true positives (TP), false negatives (FN), false positives (FP) and true negatives (TN). In all 3 data sets the TP count refers to the RD class; the TN count refers to the NRO in data set 1, to the CD in data set 2 and to the PSY in data set 3. Sensitivity is defined as TP/(TP+FN) and specificity as TN/(TN+FP).
| Data | Diagnostic groups | Sensitivity | Specificity | Confusion matrix |
|---|---|---|---|---|
| 1 | RD versus NRO | RD 87.7% | NRO 86.6% | 79 TP / 11 FN / 12 FP / 78 TN |
| 2 | RD versus CD | RD 93.3% | CD 87.7% | 84 TP / 6 FN / 11 FP / 79 TN |
| 3 | RD versus PSY | RD 85.7% | PSY 84.2% | 36 TP / 6 FN / 6 FP / 32 TN |
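As a worked check of the definitions above, the following sketch recomputes sensitivity and specificity from the confusion-matrix counts in the table. The helper names are assumptions introduced for illustration. Note that the listed percentages match these ratios when truncated, rather than rounded, to one decimal place (for example, 79/90 = 87.78% is listed as 87.7%).

```python
import math


def sensitivity(tp, fn):
    """Share of RD questionnaires classified as RD: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-RD questionnaires classified as non-RD: TN / (TN + FP)."""
    return tn / (tn + fp)


def truncate_percent(ratio):
    """Convert a ratio to percent with one decimal place, truncating (not rounding)."""
    return math.floor(ratio * 1000) / 10


# Counts copied from the confusion-matrix column of the table.
confusion = {
    "1 (RD versus NRO)": {"tp": 79, "fn": 11, "fp": 12, "tn": 78},
    "2 (RD versus CD)": {"tp": 84, "fn": 6, "fp": 11, "tn": 79},
    "3 (RD versus PSY)": {"tp": 36, "fn": 6, "fp": 6, "tn": 32},
}

results = {}
for name, c in confusion.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    results[name] = (truncate_percent(sens), truncate_percent(spec))
    print(f"data set {name}: sensitivity {sens:.2%}, specificity {spec:.2%}")
```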
Fig 2. ROC curves and AUC values for RD of data set 1 (RD versus NRO).
ROC curves and AUC values indicate that diagnostic sensitivity varies among the different classifier systems in correctly classifying questionnaires of patients with RD in data set 1.
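For readers unfamiliar with how a ROC curve and its AUC are obtained from classifier output, here is a minimal sketch. The scores and labels are made-up illustrative values, not data from the study, and the functions `roc_points` and `auc` are hypothetical helpers; a label of 1 stands for an RD questionnaire and the score for the classifier's RD probability.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for all thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC points, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative values only: 4 RD (label 1) and 4 non-RD (label 0) questionnaires.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

curve = roc_points(scores, labels)
print("ROC points:", curve)
print("AUC:", auc(curve))
```

An AUC of 1.0 would mean every RD questionnaire received a higher score than every non-RD questionnaire, while 0.5 corresponds to chance-level ranking.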
Fig 3. Diagnostic support for a potential professional user.
Results for questionnaires from patients with a) Fabry disease (upper left), b) an unknown diagnosis (upper right), c) a chronic condition (lower left) and d) a somatoform disorder (lower right). The machine learning approach generates these graphics, which visualize the probability values for an RD compared with other diagnoses. In a clinical setting, such a result could then be interpreted by the user in the context of the patient history.