Literature DB >> 34256808

Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome.

Tommaso Lo Barco1,2, Mathieu Kuchenbuch1,3, Nicolas Garcelon3, Antoine Neuraz4,5,6, Rima Nabbout7,8,9.   

Abstract

BACKGROUND: The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health-care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly initiates in the first year of life with febrile seizures (FS). Age at diagnosis is often delayed after 2 years, as it is difficult to differentiate DS at onset from FS. We aimed to explore if some clinical terms (concepts) are significantly more used in the electronic narrative medical reports of individuals with DS before the age of 2 years compared to those of individuals with FS. These concepts would allow an earlier detection of patients with DS resulting in an earlier orientation toward expert centers that can provide early diagnosis and care.
METHODS: Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing, a computer technology consisting in processing written information. Using Unified Medical Language System Meta-thesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) with confirmed diagnosis after the age of 4 years. A phenome-wide analysis was performed evaluating the statistical associations between the phenotypes of DS and FS, based on concepts found in the reports produced before 2 years and using a series of logistic regressions.
RESULTS: We found significative higher representation of concepts related to seizures' phenotypes distinguishing DS from FS in the first phases, namely the major recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure-types. Some typical early onset non-seizure concepts also emerged, in relation to neurodevelopment and gait disorders.
CONCLUSIONS: Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to decrease time of diagnosis of DS and could be transposed to other rare diseases.
© 2021. The Author(s).

Entities:  

Keywords:  Data mining; Dravet syndrome; Early diagnosis; Natural Language Processing; Rare Diseases

Mesh:

Year:  2021        PMID: 34256808      PMCID: PMC8278630          DOI: 10.1186/s13023-021-01936-9

Source DB:  PubMed          Journal:  Orphanet J Rare Dis        ISSN: 1750-1172            Impact factor:   4.123


Objectives

Electronic health records (EHRs) contain healthcare data of individuals and population electronically-stored in a digital format [1]. In the last decade, the use of EHRs has become part of routine care across the majority of developed countries [2]. Through data mining techniques, this growing use of EHRs is allowing the development of predictive models aimed to individuate high risk patients and support prevention initiatives [3, 4]. As well, models to support diagnosis and treatment of rare diseases are emerging [5, 6]. EHRs consist of structured and unstructured data. Structured data are produced through constrained choices (drop-down menus, check boxes and pre-filled templates as in registries), whereas unstructured clinical data exist in the form of free text narratives and are often used in clinical care for medical reports [7]. Combining Natural Language Processing (NLP) technology and UMLS (Unified Medical Language System), providers’ notes and narratives can be converted into structured, standardized formats, usable for data mining [8-10]. Dravet Syndrome (DS) is a rare disorder, with a worldwide incidence between 1/40,000 and 1/15,700 [11]. DS is a genetic developmental and epileptic encephalopathy with onset in first year of life, characterized at onset by febrile seizures and convulsive status epilepticus in otherwise healthy infants [12]. Starting by the second year, individuals present multiple seizure types (clonic, tonic–clonic, motor and non-motor onset focal seizures, myoclonic, atypical absences), that are often drug resistant, with developmental slowing leading to definite cognitive impairment [13]. Diagnosis is easier after the age of two as more pathognomonic seizure types and other symptoms are present from this age. Genetic testing shows a pathogenic variant in SCN1A in over 85% of cases reinforcing the diagnosis suspicion, but this testing might take months and is not available for all individuals with suspected DS [14]. However there is a need for early diagnosis in order to avoid worsening therapies and to establish best therapy protocol as seizure control might be partly related to cognitive improvement and a better quality of life [15]. Early diagnosis of individuals with DS is often delayed as it is difficult to differentiate at onset from Febrile Seizures (FS) [16]. These two conditions present substantial clinical differences, leading to exclude one on other diagnosis but might be overlapping at onset. Even if physician awareness of Dravet syndrome has markedly improved in last decades [17], time to diagnosis is still over 2 years [18], and it remains underdiagnosed in adult population and in developing countries [19, 20]. Using data mining, we analysed clinical reports produced before the age of 2 years for individuals with confirmed DS and FS with the aim of identifying specific terms (concepts) allowing early DS suspicion and reducing diagnosis delay. We then explored the differences between the concepts in the reports of two subgroups of individuals with DS: patients with suspected diagnosis before the age of 2 years and patients for whom diagnosis was suspected after the age of two.

Materials and methods

Data were collected from Necker Enfants Malades Hospital, a paediatric University hospital belonging to the Assistance Publique Hopitaux de Paris group (400 paediatric beds, 200 adult beds), which is a national and European reference center for rare and undiagnosed diseases, including the reference a centre for rare epilepsies. DrWarehouse® [21] (DrWH) is a document-based open-source data warehouse oriented toward narrative clinical reports from the Electronic health records (EHRs). It contains more than 4.5 million clinical free-text documents produced at Necker Hospital from 2009, for more than 465,000 individuals and more than 20 departments. DrWarehouse® uses UMLS Metathesaurus to recognize phenotype concepts inside narrative medical reports. In this manuscript, the word “concepts” will refer to phenotypes extracted automatically from hospital reports, without a priori, by using a UMLS subset of 20,000 phenotypic words or expressions. By using the appropriate research field in DrWarehouse®, we searched all individuals who presented in their medical reports the word “Dravet” or “Severe Myoclonic Epilepsy of Infancy” at least in one clinical document. We then selected from this group all individuals that had a definite diagnosis of DS based on clinical and genetic criteria, and evaluated after the age of four where the full blown syndrome can be confirmed. We finally included from this group individuals with at least one clinical report before the age of 2 years and this final selection constituted the “Dravet Syndrome Cohort” (DS Cohort). Subsequently, we searched in the data of DrWarehouse all individuals whose medical reports produced before the age of two presented the words “seizure”/“seizures” or “convulsion”/“convulsions” in proximity (max 5 words away) to “fever” or “febrile”. From this group, we excluded the individuals of the DS Cohort and individuals in which febrile seizures was a symptom of a more complex condition (infections involving the central nervous system, other encephalopathies, structural brain injury, detected genetic or metabolic pathologies, or epilepsies). The “Febrile Seizures’cohort” (FS cohort) included the individuals from this group aged over year where we confirmed the diagnosis of febrile seizures based on EHRs or by telephone interviewing of the family (FS Cohort) (Fig. 1).
Fig. 1

Flowchart of the selection procedures and constitution of the cohorts

Flowchart of the selection procedures and constitution of the cohorts The phenome-wide scan consists in comparing the distribution of phenotypes between two groups (cases and controls) and estimates the association between the phenotypes and the groups. These associations are assessed sequentially [22, 23]. We evaluated the statistical associations between the phenotypes and the cohorts DS and FS, using a series of multivariate logistic regressions adjusted on gender and age. For the analysis, we used concepts found in clinical reports with a minimum number of occurrences of three individuals, excluding negations and those associated to family members. The p-values were corrected for multiple testing using a false discovery rate (FDR) methodology. We also compared the phenotype differences in the DS Cohort between the subgroup where diagnosis of DS was confirmed or suspected before the age of 2 years, and the subgroup where DS diagnosis was not reported.

Results

“Dravet Syndrome Cohort” (DS Cohort)

The term “Dravet” and/or “severe myoclonic epilepsy of infancy” appeared in 305 individuals present in the warehouse: 194 of them had a final diagnosis of DS in the last document on the database, 51 had at least one document produced under the age of 2 years. All had a clinical and genetic diagnosis of DS. These individuals constituted the DS Cohort. DS cohort included 28 males and 23 females. The mean age at first seizure was 5.5 months (min 2–max 12). The average age of the first produced document was 1.05 years, median is 1.15 (min 0.25–max 1.98). The average length of the follow-up of these individuals was 5.68 years, median 4.98 (min 3.75–max 13.42). In order to compare early characteristics of this population with a population with FS at the same age, documents produced exclusively before 2 years were selected, for a total of 318 documents (mean: 6.24; median: 3 for each individual). 3484 concepts were extracted from the abovementioned documents (mean: 10.9 per document), 454 of which were unique concepts. Concepts present in almost 10% of the population are listed in a decreasing order in the Table 1. The most prevalent concepts were “Seizures” (found in 48 individuals – 94%), “Fever” (43 individuals – 84%), “Epilepsy” (42 individuals – 82%), “Dravet Syndrome” (37 individuals – 73%), “Convulsions” (31 individuals – 61%).
Table 1

Comparison between concepts found in more than 10% of individuals of DS Cohort (left) and FS Cohort (right)

UMLS CUI codeConceptDS cohortFS cohort
Number of  individualsFrequency (%)Number of individualsFrequency (%)
C0036572Seizures48944483
C0015967Fever43844891
C0014544Epilepsy42823566
C0751122Dravet syndrome3773
C0234972Convulsions31614075
C1419856SCN1A2855
C0009952Febrile seizures21413770
C0026827Hypotonia21411936
C2825055Recurrence21413057
C0234535Clonic1733917
C1705236Deviation1733
C3809175Prolonged seizure1631815
C2830004Drowsiness16311732
C0038220Status epilepticus1529815
C0259972Ketogenic diet1325
C0235195Sedation1325
C3887612Psychomotor agitation1224713
C0009450Infection1224917
C0027066Myoclonia1122
C1698630Virosis11221121
C0030193Pain1020611
C0013144Falling asleep10201223
C0026837Hypertonus1020815
C0029877Otitis10201325
C0522336Rolling of eyes10201834
C0010200Cough1020917
C0004134Ataxia918
C0006271Bronchiolitis9181121
C0009443Rhinitis9181426
C0085639Fall816
C0424927Education816
C0024115Pneumonia816
C0684320Regression816
C0020517Allergies714
C0751495Focal Seizures714
C0015672Fatigue714713
C0026205Myosis714
C0036973Shiverings714
C0035561Side612
C0031350Pharingitis612815
C0424230Psychomotor DELAY612
C0027441Nasopharyngitis612
C0549209Startle612
C0003123Anorexia510
C0034642Rales510
C0270844Tonic Seizures510611
C0010520Cyanosis510917
C0011991Diarrhea510
C0017160Gastroenteritis510
C0019209Hepatomegaly510
C0013384Movement disorder510
C0013604Edema510
C0232483Reflux510
C0035203Respiration510
C0037763Spasms510
C1504405Pyramidal syndrome510
C0008049Chickenpox510713
C0596002Reflex1936
C0271429Acute otitis media917
C0855324Normal pulse pressure815
C0042963Vomiting815
C0034150Purpura713
C0494475Tonic–clonic seizures611

CUI concept unique identifiers

Comparison between concepts found in more than 10% of individuals of DS Cohort (left) and FS Cohort (right) CUI concept unique identifiers

“Febrile Seizure Cohort” (FS Cohort)

The research of the words “seizure” or “convulsion” in individuals’ reports close to the words “febrile” or “fever”, limited to documents produced by the first 2 years of life and excluding individuals of DS Cohort, led to 256 subjects. After exclusion of other aetiologies, we included all 53 subjects with a diagnosis of febrile seizures. Diagnosis was confirmed after age four by reviewing child's medical history, neurological and developmental outcome in the available medical files in addition to a telephone interview with the family. This cohort was constituted of 17 females and 36 males. The mean age of the first document produced was 1.18 years, while median was 1.3 (min 0.30–max 1.96). The mean duration of follow-up was 4.20 years, median 4.02 (min 3.70–max 5.57). The mean age at first seizure was 12.4 months (min 4–max 21) with 1 individual having an onset before 6 months and 23 before 12 months. In order to compare phenotypes of FS Cohort with DS Cohort at the same age (before the age of 2 years), documents produced exclusively before 2 years were selected, for a total of 233 documents (mean 4.4; median 3 for each individual). From these, 2053 concepts have been extrapolated (mean 8.8 concepts per document), 303 of which were unique concepts. The concepts present in more than 10% of individuals are shown in Table 1. The most prevalent concepts were “Fever” (found in 48 individuals—91%), “Seizures” (44 individuals—83%), “Convulsions” (40 individuals—75%), “Febrile Seizures” (37 individuals—70%), “Epilepsy” (35 individuals—66%).

Comparison of DS and FS cohorts

DS cohort was constituted of 54% of males and 46% of females while in FS cohort, gender comparison showed significant difference with 68% of males and 32% of females (p = 0.009). The different length of follow-up at our centre among the two cohorts shows the higher medical needs for individuals with DS (mean 3.99 years, median 3.11) compared to individuals with FS (mean 1.82 years, median 1.37 years). Indeed, the follow-up at our centre often stops when the diagnosis of FS is confirmed, and children are usually referred back to their paediatrician or general practitioner. The mean number of documents per individual produced during the same period (0–2 years), was higher in the population with DS (6.2 vs 4.4), as well as the mean number of concepts extrapolated per document (10.9 vs 8.8). The phenome-wide comparison of both cohorts showed a different representation of a series of concepts (Table 2). Some of these concepts were related to seizures. Concept “Deviation” (p < 0.01), which is found within sentences describing focal seizures, point out to a significant higher occurrence of focal seizures in DS cohort compared to FS cohort. The frequency of “prolonged seizures” concept was also significantly higher in DS cohort (31% compared to 15% in FS cohort, p = 0.05. Another concept, “sedation”, which was used in the medical reports with reference to the post-ictal phase or to the need of rescue medication showed a significant difference (25% in the DS Cohort, 0% in the FS Cohort; p = 0.02). The concept “myoclonia” was not found in the FS Cohort, while was reported in 22% of individuals of DS Cohort (p = 0.02), and the concept “clonic” was reported two folds in the DS Cohort compared to the FS one (33% versus 17%, p = 0.05). The concept “febrile seizures” was significantly higher in the FS Cohort and was found in 70% of individuals compared to 41% of individuals of DS Cohort (p = 0.01). Other non seizures concepts were found only in the DS Cohort, namely “ataxia” (18%; p = 0.02), “regression” (16%; p = 0.03) and “pneumonia” (16%; p = 0.03).
Table 2

Phenome-wide comparison of DS Cohort and FS Cohort

UMLS CUI codeConcept DS individuals with the concept(%)FS individuals with the concept (%)DS individuals without the concept (%) FS individuals without the concept (%)ORp value
C0751122Dravet syndrome36 (70.6)0 (0)15 (29.4)53 (100)129.600.00
C1419856SCN1A28 (54.9)0 (0)23 (45.1)53 (100)65.740.00
C1705236Deviation17 (33.3)4 (7.5)34 (66.7)49 (92.5)6.370.00
C0259972Ketogenic diet13 (25.5)0 (0)38 (74.5)53 (100)18.470.01
C0009952Febrile seizures21 (41.2)37 (69.8)30 (58.8)16 (30.2)0.340.01
C0235195Sedation13 (25.5)4 (7.5)38 (74.5)49 (92.5)4.360.02
C0027066Myoclonia11 (21.6)3 (5.7)40 (78.4)50 (94.3)4.770.02
C0004134Ataxia9 (17.6)1 (1.9)42 (82.4)52 (98.1)11.570.02
C0024115Pneumonia8 (15.7)0 (0)43 (84.3)53 (100)10.050.03
C0684320Regression8 (15.7)0 (0)43 (84.3)53 (100)10.050.03
C0234535Clonic17 (33.3)9 (17)34 (66.7)44 (83)2.560.05
C0085639Fall8 (15.7)2 (3.8)43 (84.3)51 (96.2)4.930.05
C3809175Prolonged seizure16 (31.4)8 (15.1)35 (29.6)45 (84.9)2.870.05
C0014544Epilepsy41 (80.4)35 (66)10 (19.6)18 (34)2.340.06
C0038220Status epilepticus15 (29.4)8 (15.1)36 (70.6)45 (84.9)2.450.07
C0424230Psychomotor delay6 (11.8)0 (0)45 (88.2)53 (100)7.200.07
C0549209Startle6 (11.8)0 (0)45 (88.2)53 (100)7.200.07
C0855324Normal pulse pressure2 (3.9)8 (15.1)49 (96.1)52 (98.1)0.240.08
C0026205Myosis7 (13.7)2 (3.8)44 (86.3)51 (96.2)4.220.08
C0036572Seizures47 (92.9)44 (83)4 (7.8)9 (17)2.940.08
C0271429Acute otitis media3 (5.9)9 (17)48 (94.1)44 (83)0.320.10
C0003123Anorexia5 (9.8)0 (0)46 (90.2)53 (100)5.870.11
C0013604Edema5 (9.8)1 (1.9)46 (90.2)52 (98.1)5.870.11
C0035203Respiration5 (9.8)1 (1.9)46 (90.2)52 (98.1)5.870.11
C0037763Spasms5 (9.8)1 (1.9)46 (90.2)52 (98.1)5.870.11
C0034150Purpura2 (3.9)7 (13.2)49 (96.1)46 (86.8)0.280.12
C0522336Rolling of eyes10 (19.6)18 (34)41 (80.4)35 (66)0.500.13
C0231218Malaise1 (2)5 (9.4)50 (98)48 (90.6)0.200.15
C0427008Stiffness1 (2)5 (9.4)50 (98)48 (90.6)0.200.15
C1336751Flat1 (2)5 (9.4)50 (98)48 (90.6)0.200.15
C3887612Pyschomotor agitation12 (23.5)7 (13.2)39 (76.5)46 (86.8)2.110.15
C0042963Vomiting3 (5.9)8 (15.1)48 (94.1)45 (84.9)0.370.16
C0036973Shiverings7 (13.7)3 (5.7)44 (86.3)50 (94.3)2.760.16
C0751495Focal seizures7 (13.7)3 (5.7)44 (86.3)50 (94.3)2.760.16
C2825055Recurrence21 (41.2)30 (56.6)30 (58.8)23 (43.4)0.580.17
C0013473Eating disorders4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0018989Hemiparesis4 (7.8)1 (1.9)47 (92.9)52 (98.1)4.600.18
C0020649Hypotension4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0032290Aspiration Pneumonia4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0037036Hypersalivation4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0038450Stridor4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0205721Nosocomial infection4 (7.8)1 (1.9)47 (92.9)52 (98.1)4.600.18
C0333641Atrophy4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0349506Photosensitivity4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C0428167FiO24 (7.8)1 (1.9)47 (92.9)52 (98.1)4.600.18
C0865850Acute respiratory insufficiency4 (7.8)0 (0)47 (92.9)53 (100)4.600.18
C1504405Pyramidal syndrome4 (7.8)0 (0)47 (92.9)53 (100)4.600.18

CUI concept unique identifiers

Phenome-wide comparison of DS Cohort and FS Cohort CUI concept unique identifiers In addition, a series of concepts were consistently more represented in the DS Cohort than in FS Cohort, without reaching a statistical significance as “status epilepticus” (29% versus 15%; p = 0.07, OR = 2.4), “startle” (12 versus 0%; p = 0.07, OR = 7.2), “psychomotor delay” (12 versus 0%; p = 0.07, OR = 7.2), “pyramidal syndrome” (10 versus 0%; p = 0.18, OR = 4.6) “hemiparesis” (8 versus 2%; p = 0.18, OR = 4.6) and “photosensitivity” (8 versus 0%; p = 0.18, OR = 4.6).

Analysis of the DS cohort in regard to the early diagnosis

In the DS cohort, we compared the subgroup of individuals who had DS diagnosis confirmed or suspected before the age of 2 years of age (n = 36) versus the subgroup where the diagnosis of DS was not suspected (n = 15). In the first, the term (concept) Dravet syndrome was reported in the clinical reports before the age of 2 years while none of the individuals of the second group had any use of this term suggesting that DS diagnosis was not suspected before the age of 2 years. The mean age at first seizure was 5.3 months (min 2–max 12) in the subpopulation that received a diagnosis or a suspected diagnosis before age 2 and 6.1 months (min 2 – max 9) in the group without an early diagnosis (p = 0.2). Individuals who received diagnosis within 2 years showed a higher rate of concepts as “seizures” (p < 0.01), “fever” (p < 0.01), “epilepsy” (p < 0.01), “prolonged seizures” (p < 0.01), “convulsions” (p = 0.01), “myoclonia” (p = 0.02) and “ataxia” (p = 0.04) compared to the second group (Table 3).
Table 3

Comparison between concepts found in more than 10% individuals of DS Cohort who received the diagnosis/suspicion of DS before (left) and after (right) the age of 2 years

UMLS CUI codeConceptDS cohort diag < 2 yearsDS cohort diag > 2 years
Number of individualsFrequency (%)Number of individualsFrequency (%)
C0751122Dravet syndrome36100
C0036572Seizures361001179
C0015967Fever3494857
C0014544Epilepsy3392857
C0234972Convulsions2672536
C1419856SCN1A2261643
C0026827Hypotonia1747429
C2825055Recurrence1747429
C0009952Febrile seizures1542536
C3809175Prolonged seizure1542
C0234535Clonic1439321
C0038220Status epilepticus1336214
C0596002Reflex1336214
C1705236Deviation1233536
C2830004Drowsiness1233429
C0259972Ketogenic diet1131214
C0235195Sedation1131214
C1698630Virosis1131
C3887612Psychomotor Agitation1028214
C0009450Infection1028
C0027066Myoclonia1028
C0004134Ataxia925
C0030193Pain925
C0013144Falling asleep925
C0029877Otitis925
C0009443Rhinitis925
C0010200Cough925
C0424927Education822
C0522336Rolling of eyes822214
C0020517Allergies719
C0006271Bronchiolitis719214
C0015672Fatigue719
C0026837Hypertonus719321
C0026205Myosis719
C0024115Pneumonia719
C0085639Fall617
C0494475Tonic–clonic seizures617
C0751495Focal seizures617
C0031350Pharingitis617
C0424230Psychomotor delay617
C0684320Regression617214
C0549209Startle617
C0036973Shiverings617
C0003123Anorexia514
C0035561Side514
C0034642Rales514
C0013604Edema514
C0035203Respiration514
C0424230Psychomotor delay514
C0008049Chickenpox514
C0004093Asthenia411
C0270844Tonic seizures411
C0010520Cyanosis411
C0011991Diarrhea411
C0221725Bronchial obstruction411
C0017160Gastroenteritis411
C0021400Flu411
C0037036Hypersalivation411
C0042769Viral infection411
C0349506Photosensitivity411
C0032290Aspiration pneumonia411
C1272641Arterial blood pressure411
C0027441Nasopharyngitis411214
C0037763Spasms411
C0038450Stridor411
C1504405Pyramidal syndrome411
C0018916Angiome214
C0085584Encephalopathy214
C0018989Hemiparesis214
C0018991Hemiplegia214
C0019209Hepatomegaly214
C0232483Reflux214

CUI concept unique identifiers

Comparison between concepts found in more than 10% individuals of DS Cohort who received the diagnosis/suspicion of DS before (left) and after (right) the age of 2 years CUI concept unique identifiers

Discussion

This study shows that narrative medical reports produced before 2 years include several clinical concepts which are significantly associated with individuals with DS compared to FS, this latter condition representing the main differential diagnosis at the onset. These concepts are consistent with the main clinical findings constituting the criteria for differentiating DS from FS in first 2 years of life. FS are usually reported after the first year with some cases initiating before 12 months. They are usually brief and generalized [24]. In our study, concepts referred to prolonged (“status epilepticus”, “prolonged seizures”, “sedation”) and focal seizures (“deviation”) are prominent in the DS cohort, emphasizing the higher tendency of individuals with DS to present at onset long lasting and focal febrile seizures compared to individuals with FS [16, 25, 26]. Importantly, individuals with DS develop different types of seizures as myoclonic or atypical absences in addition to the first seizures mimicking FS. We observed in our DS cohort concepts referring to seizures other than febrile convulsions, including “Myoclonia” and “startle”, which is mostly used in narrative reports to depict myoclonic seizure semiology [16, 27, 28]. The concept “hemiparesis” was more frequent in the DS Cohort compared to FS one. This is consistent with the higher occurrence of transitory hemiplegia after long-lasting hemiclonic seizures, a type of seizure being quite suggestive of DS [16, 27, 29]. Some important non-seizure concepts also emerged, differentiating the two cohorts. Subjects with DS and FS show a normal neurodevelopment at the seizure onset, but then psychomotor trajectories deviate [26, 30]. In accordance, concepts related to psychomotor delay were found only in the DS Cohort (“Regression”, “Psychomotor delay”). In addition, “Ataxia” was significantly more reported DS Cohort, reflecting the peculiar gait disorder commonly observed in individuals with DS, and representing an early motor-marker of this condition [28, 31]. Interestingly, the concept “febrile seizures” was found with significant higher frequency in the FS Cohort probably because it was used for a “diagnostic” purpose in the clinical reports. The study was carried out in a tertiary epilepsy center, so it is plausible that some words have been chosen as a consequence of the clinical suspicion of Dravet Syndrome by highly experienced specialist in epileptology (e.g. “myoclonia”, “ataxia”). However, many of the medical reports were done by physicians without a specific expertise in epilepsy or DS (e.g. emergency care or intensive care physicians), emphasizing the uniformity of expressions used for reporting disease and individuals description, and suggesting that most of key-concepts may have also been found into non-specialists medical reports (e.g. “deviation”, “prolonged seizures”, “startle”). Several studies show a substantial worldwide issue of diagnostic delay of DS, with a mean age at diagnosis that is usually over 2 years, resulting in “unnecessary, costly, and, at times, invasive testing, and use of ineffective therapies, which can exacerbate seizures, increase the risk of status epilepticus, and worsen cognitive outcome” [17, 32–34]. Moreover, DS is certainly less recognized in adult population and in developing countries [19, 20]. Computer-based models using EHRs able to suggest diagnosis and to avoid misdiagnosis are gaining ground [3, 35]. These models are mostly based on structured data, as image-based or laboratory data [36, 37]. Recently, more complex models of artificial intelligence are emerging, which are able to elaborate diagnosis by extracting clinically relevant information from unstructured data in EHRs [38, 39]. On the basis of our findings, further extensive studies might focus on elaborating a specific computer algorithm which combines significative concepts and their age of appearance within narrative specialists and non-specialists reports, in order to automatically produce an alert signal suggesting possible diagnosis of DS. Some results of our analysis set out some additional insights. For example, the major incidence of concept “pneumonia” in DS Cohort compared to FS Cohort appears to be relevant, since it can represent both a facilitator of the seizure onset or  a complication of an inhalation during a long lasting convulsive seizure or a status epilepticus [40]. In addition, a number of concepts related to peri-ictal nosocomial and respiratory complications were found with higher frequency in reports of individuals with DS (“nosocomial infections”, “acute respiratory insufficiency”, “aspiration pneumonia”, “FiO2”, “stridor”) underlying that convulsive status epilepticus might be a life-threatening condition in this population [40, 41]. Furthermore, in this study the concept “Dravet Syndrome” was found in 72% of individuals of DS Cohort before the age of 2 years. This is concordant with the literature showing the early recognition of DS in France [34]. Some clinical concepts were found with higher frequency in the reports of individuals who received the diagnosis/suspicion of DS before the age of 2 years: the “long-lasting seizure” concepts (“Status epilepticus”, “Prolonged seizures”, “Sedation”), the “myoclonic” concepts (“Myoclonia”, “Startles”), the “drug resistance” concepts (“Ketogenic diet”), as well as “Ataxia”, and “Photosensitivity”. Although statistical significance was not reached for all these concepts as sample was small, these findings may support that these clinical concepts are the most DS diagnosis orienting. We can hypothesise that individuals belonging to the sub-group who did not receive a diagnosis within 2 years presented a less “typical” phenotype. The diagnosis was made later than 2 years of age when the full blown syndrome is often complete with pharmacoresistant seizures and developmental plateauing. However, in this subgroup without early diagnosis with individuals presenting “intermediate” features between only FS and the “complete” DS clinical picture, the median age at first seizure was significantly lower than in FS cohort (6.1 months vs 12.4 months). This finding confirms that age at first seizure might be the strongest predictor of DS in infants who experience febrile seizures [25].

Study limitations

Word sense disambiguation poses a challenge in extracting meaningful data from unstructured text. Clinical notes often contain terms or phrases that have more than one meaning [8], or that need for a contextualisation to understand the real clinical meaning. For example, concept “deviation” apparently do not link to a specific clinical feature, but in the narrative reports of individuals of both cohorts it was mostly used within the description of the seizure semeiology, thus referring to a focal seizure. The presence of a clinical concept in a medical report does not necessary implies that the individual presents this clinical feature. For instance, the concept “spasms” that we found in five individuals of the DS Cohort, was used within the clinical description of paroxysmal motor events that could suggest epileptic spasms, but was not confirmed in any of them. Similarly, concept “Dravet Syndrome” could be found in reports of subjects who received the diagnosis, or in which a suspicion was made (i.e.: “We see today patient X for the suspicion of Dravet Syndrome”). The method used by Dr Warehouse automatically classifies concepts according to polarity (negation/affirmation) and the experiencer (patient/family). But there may still be errors in the classification. In addition, the classification does not take into account the notion of hypothesis. In this study, the FS population presents some “atypical” features; for instance, the frequency of the concept “status epilepticus” in these subjects is higher than expected in terms of incidence in individuals with febrile seizures [42, 43]. This might be due to a preferential referral to university hospital of individuals with febrile long lasting seizures or febrile status epilepticus, as they might need further admission to ICU.

Conclusion

Narrative medical reports of individuals younger than 2 years with febrile seizures, contain different words depending if they have or will develop clinical phenotype of DS, or not. The elaboration of algorithm exploiting NLP on the basis of our work, could be useful to early individualize these individuals, in order to establish early diagnosis and adequate therapy that in some instances need to address them to expert epilepsy centres. This methodology would represent an innovative, “cheap”, transposable and sustainable methodology to reduce time of diagnosis for individuals with Dravet Syndrome and other rare conditions. Some “key early symptoms” often identified by the patients/care givers and the non-expert physicians are merely linked to a given known disease causing diagnosis delay. Using these symptoms and signs as alerts and warning signs can help to address patients earlier to expert centres for a definite diagnosis. The future step is to validate the impact of the implementing of these “warnings” in the electronic health records on shortening the patient’s odyssey to diagnosis and therapies.
  42 in total

1.  The Unified Medical Language System (UMLS): integrating biomedical terminology.

Authors:  Olivier Bodenreider
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

Review 2.  Mortality in Dravet syndrome: A review.

Authors:  Sharon Shmuely; Sanjay M Sisodiya; W Boudewijn Gunning; Josemir W Sander; Roland D Thijs
Journal:  Epilepsy Behav       Date:  2016-10-11       Impact factor: 2.937

3.  Technical report: treatment of the child with simple febrile seizures.

Authors:  R J Baumann
Journal:  Pediatrics       Date:  1999-06       Impact factor: 7.124

4.  Perception of impact of Dravet syndrome on children and caregivers in multiple countries: looking beyond seizures.

Authors:  Rima Nabbout; Stephane Auvin; Catherine Chiron; Elizabeth Thiele; Helen Cross; Ingrid E Scheffer; Amy L Schneider; Renzo Guerrini; Nicola Williamson
Journal:  Dev Med Child Neurol       Date:  2019-03-04       Impact factor: 5.449

5.  A screening test for the prediction of Dravet syndrome before one year of age.

Authors:  Junri Hattori; Mamoru Ouchida; Junko Ono; Susumu Miyake; Satoshi Maniwa; Nobuyoshi Mimaki; Yoko Ohtsuka; Iori Ohmori
Journal:  Epilepsia       Date:  2007-12-11       Impact factor: 5.864

Review 6.  Dravet Syndrome: Diagnosis and Long-Term Course.

Authors:  Mary B Connolly
Journal:  Can J Neurol Sci       Date:  2016-06       Impact factor: 2.104

7.  Stiripentol in Dravet syndrome: results of a retrospective U.S. study.

Authors:  Elaine C Wirrell; Linda Laux; David N Franz; Joseph Sullivan; Russell P Saneto; Richard P Morse; Orrin Devinsky; Harry Chugani; Angel Hernandez; Lorie Hamiwka; Mohamad A Mikati; Ignacio Valencia; Marie-Emmanuelle Le Guern; Laurent Chancharme; Marcio Sotero de Menezes
Journal:  Epilepsia       Date:  2013-07-12       Impact factor: 5.864

Review 8.  Axes of a revolution: challenges and promises of big data in healthcare.

Authors:  Smadar Shilo; Hagai Rossman; Eran Segal
Journal:  Nat Med       Date:  2020-01-13       Impact factor: 53.440

9.  A Global Overview of Precision Medicine in Type 2 Diabetes.

Authors:  Hugo Fitipaldi; Mark I McCarthy; Jose C Florez; Paul W Franks
Journal:  Diabetes       Date:  2018-10       Impact factor: 9.461

10.  Encephalopathy in children with Dravet syndrome is not a pure consequence of epilepsy.

Authors:  Rima Nabbout; Nicole Chemaly; Mathilde Chipaux; Giulia Barcia; Charles Bouis; Celia Dubouch; Dorothee Leunen; Isabelle Jambaqué; Olivier Dulac; Georges Dellatolas; Catherine Chiron
Journal:  Orphanet J Rare Dis       Date:  2013-11-13       Impact factor: 4.123

View more
  3 in total

1.  Patient-Patient Similarity-Based Screening of a Clinical Data Warehouse to Support Ciliopathy Diagnosis.

Authors:  Xiaoyi Chen; Carole Faviez; Marc Vincent; Luis Briseño-Roa; Hassan Faour; Jean-Philippe Annereau; Stanislas Lyonnet; Mohamad Zaidan; Sophie Saunier; Nicolas Garcelon; Anita Burgun
Journal:  Front Pharmacol       Date:  2022-03-25       Impact factor: 5.810

2.  Consideration of oral health in rare disease expertise centres: a retrospective study on 39 rare diseases using text mining extraction method.

Authors:  Lisa Friedlander; Marc Vincent; Ariane Berdal; Valérie Cormier-Daire; Stanislas Lyonnet; Nicolas Garcelon
Journal:  Orphanet J Rare Dis       Date:  2022-08-20       Impact factor: 4.303

3.  Natural language processing in clinical neuroscience and psychiatry: A review.

Authors:  Claudio Crema; Giuseppe Attardi; Daniele Sartiano; Alberto Redolfi
Journal:  Front Psychiatry       Date:  2022-09-14       Impact factor: 5.435

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.