Jessica J Y Lee, Wyeth W Wasserman, Georg F Hoffmann, Clara D M van Karnebeek, Nenad Blau.
Abstract
Purpose: Recognizing individuals with inherited diseases can be difficult because signs and symptoms often overlap those of common medical conditions. Focusing on inborn errors of metabolism (IEMs), we present a method that brings the knowledge of highly specialized experts to professionals involved in early diagnoses. We introduce IEMbase, an online expert-curated IEM knowledge base combined with a prototype diagnosis support (mini-expert) system.
Methods: Disease-characterizing profiles of specific biochemical markers and clinical symptoms were extracted from an expert-compiled IEM database. A mini-expert system algorithm was developed using cosine similarity and semantic similarity. The system was evaluated using 190 retrospective cases with established diagnoses, collected from 15 different metabolic centers.
Results: IEMbase provides 530 well-defined IEM profiles and matches a user-provided phenotypic profile to a list of candidate diagnoses/genes. The mini-expert system matched 62% of the retrospective cases to the exact diagnosis and 86% of the cases to a correct diagnosis within the top five candidates. The use of biochemical features in IEM annotations resulted in 41% more exact phenotype matches than clinical features alone.
Conclusion: IEMbase offers a central IEM knowledge repository for many genetic diagnostic centers and clinical communities seeking support in the diagnosis of IEMs.
Year: 2017 PMID: 28726811 PMCID: PMC5763153 DOI: 10.1038/gim.2017.108
Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822
An example disorder profile extracted from the nascent database
| Disorder name | Sepiapterin reductase deficiency | ||||||
| Disorder abbreviation | SRD | ||||||
| Associated gene | |||||||
| Chromosomal localization | 2p14–p12 | ||||||
| Affected protein | Sepiapterin reductase | ||||||
| MIM number | 182125 | ||||||
| Axial hypotonia | ++ | ++ | ++ | + | ? | No | |
| Cerebral palsy | ? | ? | ± | ± | ± | Yes | |
| Eye movements, abnormal | ± | ± | ± | ? | ? | No | |
| Hypokinesia | + | ++ | ± | ± | ± | Yes | |
| Muscle weakness | + | ± | ± | ± | ? | No | |
| 5-Hydroxyindoleacetic acid, 5HIAA (cerebrospinal fluid) | ↓↓↓ | ↓↓↓ | ↓↓↓ | ↓↓↓ | ↓↓↓ | Yes | |
| Biopterin (cerebrospinal fluid) | ↑ | ↑ | ↑ | ↑ | ↑ | Yes | |
| Biopterin (urine) | n | n | n | n | n | No | |
| Dihydrobiopterin (cerebrospinal fluid) | ↑↑ | ↑↑ | ↑↑ | ↑↑ | ↑↑ | Yes | |
| Homovanillic acid, HVA (cerebrospinal fluid) | ↓↓↓ | ↓↓↓ | ↓↓↓ | ↓↓↓ | ↓↓↓ | Yes | |
| Neopterin (cerebrospinal fluid) | n | n | n | n | n | No | |
| Neopterin (urine) | n | n | n | n | n | No | |
| Phenylalanine (plasma) | n | n | n | n | n | Yes | |
| Prolactin (plasma) | ↑ | ↑ | ↑ | ↑ | ↑ | Yes | |
| Sepiapterin (cerebrospinal fluid) | ↑↑ | ↑↑ | ↑↑ | ↑↑ | ↑↑ | Yes | |
| Sepiapterin (urine) | ? | ↑↑ | ↑↑ | ↑↑ | ? | Yes | |
For clinical symptoms, + denotes their presence and ± denotes occasional absence/presence. For biochemical markers, ↑ denotes elevated values, ↓ decreased values, and n denotes normal values. ? denotes uncertain/unreported presence of biomarkers/symptoms.
The affected biochemical markers and clinical symptoms are selected for brevity.
Vocabulary compatibility assessment results
| Vocabulary | Biochemical phenotypes | Clinical phenotypes | Total |
| HPO | 0 | 450 | 450 |
| ICD 10 | 6 | 92 | 98 |
| SNOMED CT | 371 | 389 | 760 |
| MeSH | 324 | 283 | 607 |
| ChEBI | 301 | 3 | 304 |
| LOINC | 367 | 61 | 428 |
ChEBI, Chemical Entities of Biological Interest; HPO, Human Phenotype Ontology; ICD 10, International Classification of Diseases, 10th revision; LOINC, Logical Observation Identifiers Names and Codes; MeSH, Medical Subject Headings; SNOMED CT, Systematized Nomenclature of Medicine–Clinical Terms.
Total number of biochemical phenotypes in IEMbase is 1,123. Total number of clinical phenotypes in IEMbase is 1,200. Total number of phenotypes in IEMbase is 2,323.
Figure 1. Mini-expert algorithm flowchart. Users enter a list of biochemical/clinical phenotypes into IEMbase’s mini-expert system. The system’s phenotype-matching algorithm first divides the input list into biochemical and clinical categories. The algorithm then ranks the disorders in IEMbase by comparing the biochemical profile of each disorder against the input biochemical profile, using cosine similarity. Subsequently, the algorithm breaks ties in the ranked list by comparing the clinical profiles, using semantic similarity.
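The two-stage ranking described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the IEMbase implementation. Several things here are assumptions made for illustration only: the numeric encoding of markers (+1 elevated, -1 decreased), the names `rank_disorders` and `clinical_overlap`, and the toy disorders and markers. In particular, the tie-breaker below is a plain Jaccard overlap of symptom sets, used as a stand-in for the ontology-based semantic similarity the caption refers to.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def clinical_overlap(a, b):
    """Jaccard overlap of two symptom sets.

    Placeholder (assumption): stands in for semantic similarity, which
    would need an ontology and is not reproduced here.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_disorders(query_biochem, query_clinical, disorders):
    """Rank disorders by biochemical cosine similarity, then break ties
    with the clinical score. Returns (name, biochemical, clinical) tuples."""
    scored = []
    for name, profile in disorders.items():
        bio = cosine_similarity(query_biochem, profile["biochemical"])
        clin = clinical_overlap(query_clinical, profile["clinical"])
        scored.append((name, bio, clin))
    # Round the primary score so floating-point noise does not hide ties.
    scored.sort(key=lambda t: (-round(t[1], 9), -t[2], t[0]))
    return scored


# Toy, entirely hypothetical profiles: +1 = elevated, -1 = decreased.
toy_disorders = {
    "disorder_A": {"biochemical": {"marker_1": 1, "marker_2": -1},
                   "clinical": {"hypotonia", "hypokinesia"}},
    "disorder_B": {"biochemical": {"marker_1": 1, "marker_2": -1},
                   "clinical": {"seizures"}},
    "disorder_C": {"biochemical": {"marker_1": -1, "marker_3": 1},
                   "clinical": {"hypotonia"}},
}

ranking = rank_disorders({"marker_1": 1, "marker_2": -1},
                         {"hypotonia"}, toy_disorders)
for name, bio, clin in ranking:
    print(f"{name}: biochemical={bio:.2f} clinical={clin:.2f}")
```

In this toy input, disorder_A and disorder_B have identical biochemical profiles and therefore tie on cosine similarity; the clinical score alone decides their order, which mirrors the tie-breaking role the caption assigns to the clinical comparison.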
Mini-expert system performance evaluation results
| MRR | 0.72 | 0.70 | 0.72 | 0.68 |
| % success at 1 | 62 | 59 | 63 | 57 |
| % success at 5 | 86 | 85 | 85 | 83 |
| % success at 10 | 90 | 91 | 90 | 89 |
| % success at 20 | 93 | 92 | 92 | 91 |
Mean reciprocal rank (MRR) measures how close the correct match is to the top rank on average. It ranges from 0 to 1, and values close to 1 indicate that correct matches appear closer to the top on average. % success at N = % of cases with correct diagnoses within top N ranks. Combined = combined cosine and semantic similarity. Cosine = cosine similarity only.
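As a worked illustration of the two metrics defined above, the following Python sketch computes MRR as the average of 1/rank over all cases and % success at N as the share of cases whose correct diagnosis ranks at N or better. The function names and the toy rank list are hypothetical and are not taken from the study data; treating a case whose correct diagnosis never appears as contributing 0 to the MRR is an assumption of this sketch.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all cases.

    `ranks` holds the 1-based rank of the correct diagnosis for each case,
    or None when the correct diagnosis was not returned at all
    (assumption: such a case contributes 0).
    """
    if not ranks:
        raise ValueError("at least one case is required")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def percent_success_at(ranks, n):
    """Percentage of cases whose correct diagnosis is within the top n ranks."""
    if not ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for r in ranks if r is not None and r <= n)
    return 100.0 * hits / len(ranks)


# Toy example with five hypothetical cases.
toy_ranks = [1, 2, 1, 4, None]
print(mean_reciprocal_rank(toy_ranks))   # (1 + 0.5 + 1 + 0.25 + 0) / 5
print(percent_success_at(toy_ranks, 1))  # 2 of 5 cases
print(percent_success_at(toy_ranks, 5))  # 4 of 5 cases
```

With these toy ranks the MRR is 0.55, success at 1 is 40%, and success at 5 is 80%.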
Figure 2. Mini-expert system performance using only biochemical/clinical information. The system performance when using only biochemical phenotypes was compared with that when using only clinical phenotypes of 172 retrospective cases. Percentage success at N measures the percentage of cases whose actual diagnoses ranked within the top N ranks. The system performance when using only biochemical phenotypes was significantly better than that when using only clinical phenotypes (P < 2.2e-16; Mann-Whitney U test).