Nicolas Garcelon, Antoine Neuraz, Rémi Salomon, Nadia Bahi-Buisson, Jeanne Amiel, Capucine Picard, Nizar Mahlaoui, Vincent Benoit, Anita Burgun, Bastien Rance.
Abstract
BACKGROUND: Secondary use of data collected in Electronic Health Records opens perspectives for increasing our knowledge of rare diseases. The clinical data warehouse (named Dr. Warehouse) at the Necker-Enfants Malades Children's Hospital contains data collected during normal care for thousands of patients. Dr. Warehouse is oriented toward the exploration of clinical narratives. In this study, we present our method to find phenotypes associated with diseases of interest.
Keywords: Data mining; Data warehouse; Natural language processing; Next generation phenotyping; Rare diseases
Year: 2018 PMID: 29855327 PMCID: PMC5984368 DOI: 10.1186/s13023-018-0830-6
Source DB: PubMed Journal: Orphanet J Rare Dis ISSN: 1750-1172 Impact factor: 4.123
Description of the population of the data warehouse at Necker hospital
| | DrWH |
|---|---|
| Nb patients | 446,481 |
| Sex ratio (M) | 47% |
| Median Nb reports excluding biological reports per patient | 2 [1–6] |
| Median follow up (years) per patient | 0.06 [0–2] |
Values in brackets are the lower and upper quartiles.
Number of documents per Hospital department and per type of records
| Hospital departments | # Documents | Types of records | # Documents |
|---|---|---|---|
| Gyneco-Obstetrics | 433,698 | Laboratory | 1,563,450 |
| Pediatric Cardiology | 253,474 | Consultation | 834,619 |
| Adult Clinical Hematology | 227,520 | Imaging | 379,538 |
| Metabolism-Pediatric Neurology | 207,804 | Discharge letter | 293,342 |
| Nephrology Transplantations Adult | 187,388 | Diagnostic Related Group | 255,312 |
| Pediatric Nephrology | 175,041 | Hospitalization | 226,723 |
| Pediatric Immuno-Hematology | 152,226 | Surgery | 111,598 |
| Pediatric Radiology | 151,811 | Day hospital | 88,244 |
| Adult Radiology | 150,612 | Emergency | 41,515 |
| Pediatric Cardiac Surgery | 136,272 | Exams | 31,042 |
| Pediatric Visceral Surgery | 121,758 | Prescription | 24,859 |
| Pediatric Orthopedic Surgery | 120,287 | Medical certificate | 24,222 |
| Adult Nephrology | 116,602 | Pathology report | 24,215 |
| Anesthesia intensive care unit Adult And Pediatric | 114,773 | Foetopathology | 8858 |
| Pediatric Gastroenterology | 113,857 | Multidisciplinary consultation meeting | 6605 |
| Emergency | 108,367 | Other | 5786 |
| General Pediatrics | 97,831 | Staff meeting reports | 3669 |
| Physiology | 88,981 | Total | 3,923,597 |
| Pediatric ear nose and throat | 82,717 | | |
| Pediatric Intensive Care Unit | 77,599 | | |
| Other | 804,979 | | |
| Total | 3,923,597 | | |
Fig. 1 Overview of the method applied to extract phenotypes from the narrative reports
Fig. 2 Overview of the method applied to perform next generation phenotyping
Fig. 3 Evaluation procedure for the RETT set
Number of phenotypical terms extracted per context and certainty
| Context / Certainty | Negated | Not negated |
|---|---|---|
| Family history | 179,938 | 522,009 |
| Patient | 5,007,517 | 12,988,474 |
| Total number of terms | 5,187,455 | 13,510,483 |
Description and evaluation of the 6 sets of patients
| Sets | RETT | DOCK8 deficiency | LOWE | SILVER RUSSELL | BARDET BIEDL | APDS 1 and 2 |
|---|---|---|---|---|---|---|
| Median age at visit (years) | 8.2 [4.8–12.6] | 11.4 [9.3–14.1] | 12.8 [5.8–20.3] | 2.4 [0.8–5.4] | 15.7 [10.1–41.5] | 12.8 [7.7–18.6] |
| Median follow up (years) | 2.6 [0–4.9] | 3.1 [0.3–9] | 6.6 [3–10.3] | 2 [0.8–4.7] | 2 [0.1–6.6] | 7.5 [4.8–8.6] |
| # Patients | 209 | 15 | 23 | 50 | 53 | 23 |
| # Documents | 5034 | 3296 | 1325 | 1133 | 1317 | 2337 |
| Phenotypes extracted, not negated and in patient context | | | | | | |
| # Phenotypes | 18,538 | 6886 | 5281 | 6563 | 6345 | 9716 |
| # distinct Phenotypes | 1022 | 706 | 577 | 738 | 801 | 710 |
| Evaluation by experts in the Top50 phenotypes | | | | | | |
| Medical Experts | NBB | CP | RS | JA | RS | NM |
| # Phenotypes ranked by Freq | 31 | 36 | 36 | 16 | 17 | 39 |
| # Phenotypes ranked by TF-IDF | 38 | 37 | 41 | 11 | 12 | 37 |
| # Phenotypes Freq union TF-IDF | 42 | 52 | 50 | 16 | 19 | 52 |
| # Phenotypes Freq intersect TF-IDF | 28 | 22 | 28 | 11 | 11 | 25 |
| Average Precision, ranked by Freq | 0.86 | 0.91 | 0.88 | 0.55 | 0.66 | 0.83 |
| Average Precision, ranked by TF-IDF | 0.91 | 0.84 | 0.90 | 0.49 | 0.52 | 0.83 |
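The table above compares two rankings of the extracted phenotypes (raw frequency and TF-IDF) and reports an average precision for each. The following is a minimal sketch of how such a comparison could be computed, not the authors' implementation. It assumes a common TF-IDF form (patient count in the disease set multiplied by the log of the inverse patient frequency in the whole warehouse) and the standard definition of average precision (mean of precision at each rank where an expert judged the phenotype relevant). The function names, phenotype labels, counts, and expert judgments are all made-up toy values.

```python
import math
from typing import Dict, List, Sequence


def tf_idf_scores(
    set_counts: Dict[str, int],
    warehouse_counts: Dict[str, int],
    n_warehouse_patients: int,
) -> Dict[str, float]:
    """Score each phenotype as (patients in the disease set mentioning it)
    times log(warehouse patients / warehouse patients mentioning it).
    This weighting is an assumption, not taken from the source."""
    return {
        term: count * math.log(n_warehouse_patients / warehouse_counts[term])
        for term, count in set_counts.items()
    }


def rank_terms(scores: Dict[str, float]) -> List[str]:
    """Return phenotype labels sorted from highest to lowest score."""
    return sorted(scores, key=lambda term: scores[term], reverse=True)


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k holding a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy data (hypothetical): patient counts per phenotype.
set_counts = {"epilepsy": 40, "fever": 45, "stereotypy": 30}
warehouse_counts = {"epilepsy": 1000, "fever": 50000, "stereotypy": 100}
n_warehouse_patients = 100000

# Hypothetical expert judgments for the toy phenotypes.
expert_relevant = {"epilepsy": True, "fever": False, "stereotypy": True}

by_freq = rank_terms({t: float(c) for t, c in set_counts.items()})
by_tfidf = rank_terms(
    tf_idf_scores(set_counts, warehouse_counts, n_warehouse_patients)
)

ap_freq = average_precision([expert_relevant[t] for t in by_freq])
ap_tfidf = average_precision([expert_relevant[t] for t in by_tfidf])
```

In this toy case the very common but unspecific term drops to the bottom of the TF-IDF ranking, which raises the average precision relative to the frequency ranking. The table shows that on the real sets the effect varies by disease.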
Fig. 4 Screenshot of Dr. Warehouse and the concept tab for “Rett syndrome” query
Comparison with Orphadata
| | RETT | DOCK8 | LOWE | SILVER RUSSELL | BARDET BIEDL | APDS |
|---|---|---|---|---|---|---|
| # Concepts HPO Orphadata (English) | 39 | 18 | 120 | 16 | 25 | – |
| # Concepts HPO Orphadata (French) [A] | 31 | 10 | 76 | 6 | 17 | – |
| # UMLS distinct phenotypes extracted [B] | 1022 | 706 | 577 | 738 | 801 | 710 |
| # [A] intersection [B] (coverage) | 22 | 7 | 50 | 6 | 14 | – |
| [A] intersection [B] / [A] (coverage, as a proportion) | 0.71 | 0.70 | 0.66 | 1.00 | 0.82 | – |
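The coverage figures in the table above divide the number of reference concepts also found among the extracted phenotypes by the number of reference concepts. The sketch below illustrates that calculation under the assumption that both lists are available as sets of concept identifiers. The function name and the identifiers are hypothetical placeholders. The last lines recheck the RETT column using only the counts given in the table.

```python
from typing import Set, Tuple


def coverage(reference: Set[str], extracted: Set[str]) -> Tuple[int, float]:
    """Return (number of shared concepts, shared / reference size)."""
    shared = reference & extracted
    if not reference:
        return 0, 0.0
    return len(shared), len(shared) / len(reference)


# Hypothetical placeholder identifiers, not real concept codes.
reference_concepts = {"concept_a", "concept_b", "concept_c", "concept_d"}
extracted_concepts = {"concept_a", "concept_c", "concept_d", "concept_x"}

n_shared, ratio = coverage(reference_concepts, extracted_concepts)

# Recheck of the RETT column from the table: 22 shared out of 31 reference.
rett_ratio = round(22 / 31, 2)
```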