Jia Su1, Bin He1, Yi Guan2, Jingchi Jiang1, Jinfeng Yang3.
Abstract
BACKGROUND: Cardiovascular disease (CVD) has become the leading cause of death in China, and most cases can be prevented by controlling risk factors. The goal of this study was to build a corpus of CVD risk factor annotations based on Chinese electronic medical records (CEMRs). This corpus is intended to support the development of a risk factor information extraction system, which in turn can serve as a foundation for further study of the progression of risk factors and CVD.
Keywords: Annotation; Cardiovascular disease risk factors; Chinese electronic medical records; Corpus construction; Information extraction
Year: 2017 PMID: 28789686 PMCID: PMC5549299 DOI: 10.1186/s12911-017-0512-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 (a) A sample progress note after preprocessing (original). (b) A sample progress note after preprocessing (English version)
CVD risk factors and their indicators
| CVD risk factors | Indicators |
|---|---|
| Overweight/Obesity (O2) | • Mention: A diagnosis of overweight or obesity, e.g., obesity (“肥胖”) |
| Hypertension | • Mention: A diagnosis or history of hypertension, e.g., a history of hypertension for 20 years (“高血压病史20年”) |
| Diabetes | • Mention: A diagnosis or a history of diabetes, e.g., diabetes (“糖尿病”) |
| Dyslipidemia | • Mention: A diagnosis of dyslipidemia, hyperlipidemia or a history of hyperlipidemia, e.g., a history of hyperlipidemia (“高血脂史”) |
| Chronic Kidney Disease (CKD) | • Mention: A diagnosis of CKD, e.g., chronic nephritis (“慢性肾炎”) |
| Atherosis | • Mention: A diagnosis of atherosclerosis or atherosclerotic plaque, e.g., atherosclerotic plaque (“冠脉粥样斑块”) |
| Obstructive Sleep Apnea Syndrome (OSAS) | • Mention: A diagnosis of OSAS, e.g., OSAS (“阻塞型睡眠呼吸暂停综合症”) |
| Smoking | • Mention: Smoking or a patient history of smoking, e.g., smoking over 40 years (“吸烟40余年”) |
| Alcohol Abuse (A2) | • Mention: Alcohol abuse, e.g., a long history of heavy drinking (“长期大量饮酒史”) |
| Family History of CVD (FHCVD) | • Mention: Patient has a family history of CVD or has a first-degree relative (parents, siblings, or children) who has a history of CVD, e.g., the patient’s brother has a history of CVD (“哥哥有冠心病病史”) |
| Age | • Mention: The age of the patient, e.g., 66 years old (“66岁”) |
| Gender | • Mention: The gender of the patient, e.g., female (“女性”) |
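To make the "Mention" indicator concrete, the following is a minimal sketch of a naive lexicon lookup. It is not the annotation procedure or the extraction system described in the study; the lexicon contains only phrases taken (or shortened) from the examples in the table above, and the names `EXAMPLE_MENTIONS`, `find_mentions`, and the sample sentence are hypothetical.

```python
# Illustrative only: a naive substring lookup, NOT the method used in the study.
# Keys are phrases taken or shortened from the table's examples (assumption).
EXAMPLE_MENTIONS = {
    "肥胖": "Overweight/Obesity (O2)",
    "高血压病史": "Hypertension",
    "糖尿病": "Diabetes",
    "高血脂史": "Dyslipidemia",
    "慢性肾炎": "Chronic Kidney Disease (CKD)",
    "冠脉粥样斑块": "Atherosis",
    "阻塞型睡眠呼吸暂停综合症": "Obstructive Sleep Apnea Syndrome (OSAS)",
    "吸烟": "Smoking",
    "长期大量饮酒史": "Alcohol Abuse (A2)",
}


def find_mentions(text):
    """Return (start, end, phrase, risk_factor) for every lexicon hit, sorted by offset."""
    hits = []
    for phrase, factor in EXAMPLE_MENTIONS.items():
        start = text.find(phrase)
        while start != -1:
            hits.append((start, start + len(phrase), phrase, factor))
            start = text.find(phrase, start + 1)
    return sorted(hits)


# Hypothetical sentence assembled from the table's example phrases.
note = "患者高血压病史20年，吸烟40余年。"
for hit in find_mentions(note):
    print(hit)
```

A lookup like this cannot tell whether a risk factor is present, absent, or possible, which is one reason a manually annotated corpus is needed.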
Fig. 2 The flowchart for the CVD risk factor annotation method
Fig. 3 (a) A sample annotation for CVD risk factors (original). (b) A sample annotation for CVD risk factors (English version)
IAA values achieved during the iterative training process
| | Iteration 1 | Iteration 2 | Iteration 3 | Iteration 4 | Iteration 5 |
|---|---|---|---|---|---|
| Precision | 0.810 | 0.977 | 0.967 | 0.986 | 0.988 |
| Recall | 0.815 | 0.977 | 0.962 | 0.986 | 0.988 |
| F1-measure | 0.812 | 0.977 | 0.964 | 0.986 | 0.988 |
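As a quick consistency check, the F1 values in the table can be recomputed from the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the dictionary `iaa` are illustrative, not taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per iteration, copied from the table above.
iaa = {
    1: (0.810, 0.815),
    2: (0.977, 0.977),
    3: (0.967, 0.962),
    4: (0.986, 0.986),
    5: (0.988, 0.988),
}

for iteration, (p, r) in iaa.items():
    # Rounded to three decimals to match the precision of the table.
    print(iteration, round(f1_score(p, r), 3))
```

Rounded to three decimals, the recomputed values match the F1-measure row of the table.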
Distribution of CVD risk factors, indicators, their occurrence times, and assertions
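| Risk factors | Indicators | Before DHS: P | Before DHS: A | Before DHS: Pb | Before DHS: N | Before DHS: Total | During DHS: P | During DHS: A | During DHS: Pb | During DHS: N | During DHS: Total | After DHS: P | After DHS: A | After DHS: Pb | After DHS: N | After DHS: Total | Continuing: P | Continuing: A | Continuing: Pb | Continuing: N | Continuing: Total | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|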
| O2 | Mention | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 0 | 0 | 0 | 18 | 18 |
| Hypertension | Mention | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 526 | 401 | 471 | 0 | 1398 | 1400 |
| | High Bp | 304 | 0 | 0 | 0 | 304 | 1647 | 0 | 0 | 0 | 1647 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 4 | 1955 |
| | Regulate Bp | 56 | 0 | 0 | 0 | 56 | 244 | 0 | 0 | 0 | 244 | 10 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 310 |
| | Blood pressure drug | 43 | 1 | 0 | 0 | 44 | 17 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 3 | 64 |
| Diabetes | Mention | 2 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 172 | 558 | 138 | 3 | 871 | 874 |
| | High blood glucose | 19 | 0 | 0 | 0 | 19 | 20 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 4 | 43 |
| | Regulate blood glucose | 8 | 0 | 0 | 0 | 8 | 28 | 0 | 0 | 0 | 28 | 6 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 42 |
| | Hypoglycemic drug | 31 | 0 | 0 | 0 | 31 | 8 | 0 | 0 | 0 | 8 | 7 | 0 | 0 | 0 | 7 | 2 | 0 | 0 | 0 | 2 | 48 |
| Dyslipidemia | Mention | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 45 | 4 | 24 | 0 | 73 | 73 |
| | High blood lipids | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 6 |
| | Regulate blood lipids | 2 | 0 | 0 | 0 | 2 | 249 | 0 | 0 | 0 | 249 | 5 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 256 |
| | Lipid-lowering drug | 2 | 0 | 0 | 0 | 2 | 34 | 0 | 0 | 0 | 34 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 37 |
| CKD | Mention | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 17 | 0 | 26 | 26 |
| Atherosis | Mention | 3 | 0 | 0 | 0 | 3 | 4 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 136 | 0 | 1 | 0 | 137 | 144 |
| OSAS | Mention | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| Smoking | Mention | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 231 | 149 | 0 | 0 | 380 | 380 |
| | Smoking cessation | 5 | 2 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 8 |
| | Smoking amount | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 119 | 0 | 0 | 0 | 119 | 120 |
| A2 | Mention | 9 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 52 | 15 | 0 | 0 | 67 | 76 |
| | Drinking amount | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 0 | 0 | 19 | 19 |
| FHCVD | Mention | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 10 | 10 |
| Age | Mention | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1233 |
| | Age group | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 626 |
| Gender | Mention | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 1909 |
DHS: duration of hospital stay; P: Present; A: Absent; Pb: Possible; N: Not associated with the patient
“-” denotes not considered
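The table above implies that each annotation carries a risk factor, an indicator, a time attribute, and an assertion. The following is a minimal sketch of how such records could be represented and tallied; the class names, field names, and sample records are assumptions made for illustration and do not reflect the corpus's actual file format or contents.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Time(Enum):
    """Time attribute relative to the duration of hospital stay (DHS)."""
    BEFORE_DHS = "Before DHS"
    DURING_DHS = "During DHS"
    AFTER_DHS = "After DHS"
    CONTINUING = "Continuing"


class Assertion(Enum):
    """Assertion values, using the abbreviations from the table legend."""
    PRESENT = "P"
    ABSENT = "A"
    POSSIBLE = "Pb"
    NOT_PATIENT = "N"  # not associated with the patient


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record layout; the real corpus format may differ.
    risk_factor: str
    indicator: str
    time: Time
    assertion: Assertion


def tally(annotations):
    """Count annotations per (risk factor, indicator, time, assertion) cell."""
    return Counter(
        (a.risk_factor, a.indicator, a.time, a.assertion) for a in annotations
    )


# Made-up sample records, not data from the corpus.
sample = [
    Annotation("Hypertension", "Mention", Time.CONTINUING, Assertion.PRESENT),
    Annotation("Hypertension", "Mention", Time.CONTINUING, Assertion.PRESENT),
    Annotation("Hypertension", "High Bp", Time.DURING_DHS, Assertion.PRESENT),
    Annotation("Smoking", "Mention", Time.CONTINUING, Assertion.ABSENT),
]
counts = tally(sample)
print(counts[("Hypertension", "Mention", Time.CONTINUING, Assertion.PRESENT)])
```

Age and gender have no time or assertion values in the table ("-"), so a fuller model would need to make those two fields optional.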