Yan Wang, Elizabeth S Chen, Ilo Leppik, Serguei Pakhomov, Indra Neil Sarkar, Genevieve B Melton.
Abstract
Epilepsy is a prevalent chronic neurological disorder afflicting about 50 million people worldwide. There is evidence of a strong relationship between familial risk factors and epilepsy, as well as associations with substance use. The goal of this study was to explore the interactions between familial risk factors and substance use based on structured data from the family and social history modules of an electronic health record system for adult epilepsy patients. A total of 8,957 patients with 38,802 family history entries and 8,822 substance use entries were gathered and mined for associations at different levels of granularity for three age groupings (≥18, 18-64, and ≥65 years old). Our results demonstrate the value of an association rule mining approach to validate knowledge of familial risk factors. The preliminary findings also suggest no significant association between social (substance use) and familial risk factors for epilepsy.
Year: 2016 PMID: 27570679 PMCID: PMC5001738
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. Overview of the association rule mining process.
Examples of family history and substance use status entries.
Examples of family history entries

| Problem | Relation | Side of Family | Degree |
|---|---|---|---|
| Hypertension | Mother | | |
| Diabetes | Neg Hx | - | - |
| Heart Disease | Brother | - | First |

Examples of substance use status entries

| Tobacco Use | Alcohol Use | Drug Use |
|---|---|---|
| Quit | Yes | - |
| Passive | - | - |

“-” indicates not applicable or cannot be inferred from structured relation field
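
The following is a minimal sketch of how structured entries like those above might be turned into items at different levels of granularity before mining. The granularity names `pos_problem`, `pos_problem_relation`, and `pos_problem_degree` appear in the figure captions; everything else here (the `FamilyHistoryEntry` class, the `to_items` helper, the `neg_problem` key, and the `polarity:problem:detail` label format) is a hypothetical illustration, not the encoding used in the study.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FamilyHistoryEntry:
    """One structured family history row (hypothetical representation)."""
    problem: str
    relation: str                  # e.g. "Brother", or "Neg Hx" for a negative history
    degree: Optional[str] = None   # e.g. "First"; None when the table shows "-"


def to_items(entry: FamilyHistoryEntry) -> Dict[str, str]:
    """Map one entry to item labels, keyed by granularity level.

    The label format is an assumption made for illustration only.
    """
    polarity = "neg" if entry.relation == "Neg Hx" else "pos"
    problem = entry.problem.lower().replace(" ", "_")
    items = {f"{polarity}_problem": f"{polarity}:{problem}"}
    if polarity == "pos":
        # Finer-grained items only make sense for positive histories.
        items["pos_problem_relation"] = f"pos:{problem}:{entry.relation.lower()}"
        if entry.degree is not None:
            items["pos_problem_degree"] = f"pos:{problem}:{entry.degree.lower()}"
    return items


# The two fully specified example rows from the table above.
entries = [
    FamilyHistoryEntry("Diabetes", "Neg Hx"),
    FamilyHistoryEntry("Heart Disease", "Brother", "First"),
]
for entry in entries:
    print(to_items(entry))
```

Grouping the items of one granularity level by patient would then give one transaction per patient, which is the usual input shape for association rule mining.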
Distribution of entries, age, and sex for the full dataset, as well as positive and negative family history subsets for major age groupings.
| Age grouping | Dataset | # Entries | # Patients | Entries/Patient Range | Entries/Patient Mean±SD | Age Range | Age Mean±SD | Female | Male |
|---|---|---|---|---|---|---|---|---|---|
| ≥18 years | Full | 38,802 | 5868 | 1 - 67 | 6.1 ± 6.1 | 18 - 100 | 53 ± 9.1 | 3217 | 2651 |
| ≥18 years | Positive family history | 32,997 | 5868 | 0 - 62 | 4.8 ± 4.1 | 18 - 100 | 51 ± 9.0 | 2970 | 2898 |
| ≥18 years | Negative family history | 5805 | 1406 | 0 - 37 | 3.2 ± 3.0 | 18 - 100 | 56 ± 8.6 | 812 | 594 |
| 18-64 years | Full | 30,453 | 4550 | 0 - 61 | 5.8 ± 4.3 | 18 - 64 | 56 ± 7.6 | 2373 | 2177 |
| 18-64 years | Positive family history | 25,946 | 4337 | 0 - 53 | 5.1 ± 4.1 | 18 - 64 | 51 ± 5.2 | 1326 | 3011 |
| 18-64 years | Negative family history | 4507 | 1120 | 0 - 37 | 3.1 ± 2.8 | 18 - 64 | 52 ± 6.1 | 472 | 648 |
| ≥65 years | Full | 8349 | 1366 | 0 - 66 | 5.1 ± 3.7 | 65 - 100 | 76 ± 7.6 | 671 | 695 |
| ≥65 years | Positive family history | 7051 | 1304 | 0 - 66 | 4.4 ± 4.1 | 65 - 100 | 69 ± 5.2 | 596 | 804 |
| ≥65 years | Negative family history | 1298 | 293 | 0 - 37 | 3.3 ± 2.9 | 65 - 94 | 66 ± 6.1 | 135 | 158 |
Distribution of entries, age, and sex for substance use status subsets for each age grouping.
| Age grouping | # Entries | # Patients | Age Range | Age Mean±SD | Female | Male |
|---|---|---|---|---|---|---|
| ≥18 years | 8822 | 8957 | 18 - 100 | 50.2 ± 18.2 | 4915 | 4042 |
| 18-64 years | 6936 | 7408 | 18 - 64 | 41.4 ± 5.3 | 3857 | 4191 |
| ≥65 years | 1886 | 1909 | 65 - 100 | 87.7 ± 6.7 | 1058 | 851 |
Ranking of family history problems and relations by prevalence ([n] indicates ranking by frequency).
| Family History of Problem | Family History of Problem and Relation | Negative Family History of Problem |
|---|---|---|
| Malignant Neoplasms | Hypertensive disease mother | Colorectal Cancer |
| Diabetes | Hypertensive disease father | Diabetes |
| Hypertensive disease | Diabetes mother | Cerebrovascular accident |
| Heart Diseases | Heart Disease father | Malignant neoplasm of breast |
| Cerebrovascular accident | Malignant Neoplasms mother | Coronary Artery Disease |
| Coronary Artery Disease | Diabetes father | Malignant neoplasm of prostate |
| Malignant neoplasm of breast | Malignant Neoplasms father | Hypertensive disease |
| Nervous system disorder | Coronary Artery Disease father | Thyroid Diseases |
| Arthritis | Heart Diseases mother | Malignant Neoplasm |
| Lipids | Arthritis mother | Age related macular degeneration |
Number of rules generated with combinations of minimum support and confidence for pos_problem subset of all adult patients. Highlighted cells are rules generated with “low” (minimum support of 0.01 and confidence of 0.2) and “intermediate” (minimum support of 0.04 and confidence of 0.4).
| Confidence \ Support | 0 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 | 0.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9506 | 590 | 308 | 170 | 108 | 76 | 50 | 32 | 28 | 24 | 22 |
| 0.1 | 1684 | 490 | 271 | 155 | 108 | 76 | 50 | 32 | 28 | 24 | 22 |
| 0.2 | 980 | **283** | 177 | 106 | 72 | 53 | 40 | 30 | 28 | 24 | 22 |
| 0.3 | 622 | 168 | 123 | 87 | 59 | 45 | 33 | 23 | 21 | 19 | 18 |
| 0.4 | 440 | 109 | 86 | 63 | **44** | 37 | 25 | 17 | 15 | 14 | 13 |
| 0.5 | 327 | 43 | 32 | 24 | 16 | 14 | 8 | 4 | 4 | 3 | 3 |
| 0.6 | 134 | 6 | 4 | 4 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 0.7 | 86 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 0.8 | 78 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.9 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
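
To make the threshold grid above concrete, here is a minimal sketch of counting association rules at different minimum support and confidence values. It uses the standard definitions: the support of an itemset is the fraction of transactions containing it, and the confidence of a rule X → y is the support of X ∪ {y} divided by the support of X. The toy transactions, the `mine_rules` helper, the brute-force enumeration, and the restriction to single-item consequents are all assumptions made for illustration; the source does not state which mining implementation or rule form was used.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rules(transactions, min_support, min_confidence, max_size=3):
    """Brute-force rules X -> y (single-item consequent) meeting both thresholds.

    Returns a list of (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            joint = count(itemset, transactions)
            # Skip itemsets that never occur or fall below minimum support.
            if joint == 0 or joint / n < min_support:
                continue
            for y in combo:
                antecedent = itemset - {y}
                confidence = joint / count(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append((tuple(sorted(antecedent)), y, joint / n, confidence))
    return rules


# Toy data: one set of positive family history problems per patient (invented).
transactions = [
    frozenset({"diabetes", "hypertension"}),
    frozenset({"diabetes", "hypertension", "heart_disease"}),
    frozenset({"hypertension", "heart_disease"}),
    frozenset({"diabetes"}),
    frozenset({"hypertension"}),
]

# Rule counts over a small grid, analogous in shape to the table above.
for min_conf in (0.0, 0.3, 0.6):
    row = [len(mine_rules(transactions, s, min_conf)) for s in (0.0, 0.3)]
    print(f"confidence >= {min_conf}: {row}")
```

Because raising either threshold can only remove rules, the counts never increase from left to right or from top to bottom, which matches the pattern in the table.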
Number of rules generated with combinations of minimum support and confidence for pos_problem subset of senior adult patients. Highlighted cells are rules generated with “low” (minimum support of 0.01 and confidence of 0.2) and “intermediate” (minimum support of 0.04 and confidence of 0.4).
| Confidence \ Support | 0 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 | 0.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6480 | 476 | 236 | 152 | 76 | 58 | 36 | 28 | 28 | 26 | 22 |
| 0.1 | 1178 | 377 | 191 | 136 | 75 | 58 | 36 | 28 | 28 | 26 | 22 |
| 0.2 | 747 | **233** | 135 | 92 | 54 | 45 | 33 | 28 | 28 | 26 | 22 |
| 0.3 | 550 | 167 | 114 | 82 | 47 | 38 | 27 | 24 | 24 | 23 | 20 |
| 0.4 | 374 | 107 | 75 | 50 | **28** | 22 | 15 | 12 | 12 | 12 | 12 |
| 0.5 | 290 | 43 | 32 | 17 | 7 | 3 | 2 | 0 | 0 | 0 | 0 |
| 0.6 | 158 | 14 | 12 | 5 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.7 | 116 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.8 | 107 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.9 | 106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 106 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Figure 2. Number of association rules at different support values, at confidence of 0.2 and 0.4, for all age groupings.
Figure 3. Graph-based visualization of rules for pos_problem with minimum support of 0.04 and confidence of 0.4 [42 rules] for 18-64 year old adult patients (a) and rules for pos_problem with minimum support of 0.04 and confidence of 0.3 [43 rules] for senior patients (b).
Figure 4. Grouped matrix-based visualization of rules for pos_problem_relation with minimum support of 0.03 and confidence of 0.2 [11 rules] for 18-64 year old adult patients (a) and rules for pos_problem_relation with minimum support of 0.03 and confidence of 0.2 [18 rules] for senior patients (b).
Figure 5. Grouped matrix-based visualization of rules for pos_problem_degree with minimum support of 0.04 and confidence of 0.4 [13 rules] for 18-64 year old adult patients (a) and rules for pos_problem_degree with minimum support of 0.04 and confidence of 0.3 [29 rules] for senior patients (b).
Figure 6. Graph-based visualization of rules for pos_problem_substance_status with minimum support of 0.04 and confidence of 0.4 [29 rules] for 18-64 year old adult patients (a) and rules for pos_problem_substance_status with minimum support of 0.04 and confidence of 0.3 [50 rules] for senior patients (b).