Hong-Jie Dai, Shabbir Syed-Abdul, Chih-Wei Chen, Chieh-Chen Wu.
Abstract
An electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or a population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment expressed in an EHR cannot be determined from the recognized concepts alone, without considering their contextual information. To improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. Rather than formulating section heading recognition as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The experimental results showed that the proposed method achieved an F-score of 0.942, outperforming the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presents an integrated solution that does not require additional heuristic rules for isolating the headings from the surrounding section contents.
Year: 2015 PMID: 26380302 PMCID: PMC4563061 DOI: 10.1155/2015/873012
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
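The token-based formulation described in the abstract can be illustrated with a minimal sketch. The BIO tagging scheme, the label name `FamilyHistories`, the regular-expression tokenizer, and the heading text "Family History" are all assumptions made for illustration; the abstract only states that tokens, rather than sentences, are labeled with a CRF. The tags are written by hand here and stand in for the output of a trained model.

```python
import re

# Hypothetical input line: the heading text "Family History" is an assumed
# illustration; only "+HTN: mother/brother" appears in the sample document.
LINE = "Family History: +HTN: mother/brother"


def tokenize(text):
    """Split text into word tokens and single punctuation tokens (assumed tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def heading_from_tags(tokens, tags):
    """Return the tokens tagged as part of a heading (any tag other than 'O')."""
    return [tok for tok, tag in zip(tokens, tags) if tag != "O"]


tokens = tokenize(LINE)

# Assumed BIO labels that a token-level model might emit for this line.
tags = ["B-FamilyHistories", "I-FamilyHistories"] + ["O"] * (len(tokens) - 2)

heading = heading_from_tags(tokens, tags)
```

Because every token carries its own label, the heading boundary falls out of the tag sequence directly. A sentence-level classifier would label the whole line and would need a further rule to separate the heading from the content that follows it.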
A sample discharge summary.
| Discharge summary excerpt |
|---|
| Atenolol 25/50 mg qAM/qPM |
| ASA 325 mg qD |
| |
| +HTN: mother/brother |
| |
| VS: |
| Gen: Well-nourished male, NAD |
| HEENT: MMM, OP clear |
| Neck: JVP about 9 cm. |
| |
| Sodium 140 135–145 mmol/L 07/02/88 11:21 147(H) 10/08/82 13:24 |
| Potassium 4.1 3.4–4.8 mmol/L 07/02/88 11:21 |
| EKG: Sinus brady @ 60, w/LAD, ICVD (QRS 108), NS St/T wave changes. |
| CXR: Pending |
| |
| (1) Vertigo: Clinically peripheral disease. If central, would not expect to be affected by motion, be able to be extinguished, and so forth. |
| Fall precautions |
| R/o cardiac ischemia: Troponins, monitor, and so forth |
| Betty Kaitlin Wood, MD |
Figure 1. An annotated document sample on brat.
Statistics of the section heading recognition corpus. Since the corpus only contained the topmost sections, several different concepts or representations may be included in each section heading category. For instance, “Personal Histories” included the occupation, daily activity amount, substance history, and allergies.
| Section | Description | Number | Percentage |
|---|---|---|---|
| Chief Complaints | A statement describing the symptoms, problems, diagnoses, or other factors that are the reason for a medical encounter. | 803 | 5.7% |
| Present Illness | Separate paragraphs summarizing the history related to the chief complaints. | 843 | 6.0% |
| Personal Histories | A merged concept of individual related histories, including past medical history, past surgical history, social history, and allergy. | 2701 | 19% |
| Family Histories | The health status of parents, children, siblings, and spouse, whether dead or alive. | 486 | 3.4% |
| Physical Examinations | The process by which a medical professional investigates the body of a patient for signs of disease. | 1104 | 7.9% |
| Laboratory Examinations | Biochemical studies performed in clinical laboratory. | 401 | 2.8% |
| Radiology Reports | Image studies. Some examples are X-ray, CT, MRI, and PET. | 87 | <1.0% |
| Data | A merged concept including laboratory examinations and radiology reports. | 103 | <1.0% |
| Impression | Medical diagnoses judged by doctors, also called assessments. | 884 | 6.3% |
| Recommendations | Treatments toward impressions, also called plans. | 468 | 3.3% |
| Others | Other section headings not included in the categories above, for example, patient ID, doctor ID, and hospital ID. | 6081 | 43.6% |
| Total | | 13,962 | 100% |
Orthographic features.
| Feature name | Regular expression |
|---|---|
| ALLCAPS | `^[…` |
| CAPSMIX | `^[…` |
| INITCAP | `^[…` |
| PUNCTUATION | `^[\.:]$` |
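As a rough illustration of how such orthographic features might be computed per token, the following sketch maps a token to a dictionary of boolean features. Only the PUNCTUATION pattern is taken from the table; the ALLCAPS, CAPSMIX, and INITCAP patterns are assumed stand-ins, because the expressions for those features are incomplete in the table. The function name `orthographic_features` is hypothetical.

```python
import re

# Only PUNCTUATION comes from the table. The other three patterns are
# assumptions that follow the usual meaning of the feature names.
ORTHOGRAPHIC_PATTERNS = {
    "ALLCAPS": re.compile(r"^[A-Z]+$"),                    # assumed pattern
    "CAPSMIX": re.compile(r"^(?=.*[a-z])(?=.*[A-Z]).+$"),  # assumed pattern
    "INITCAP": re.compile(r"^[A-Z][a-z]+$"),               # assumed pattern
    "PUNCTUATION": re.compile(r"^[\.:]$"),                 # from the table
}


def orthographic_features(token):
    """Return a feature-name -> bool mapping for a single token."""
    return {
        name: bool(pattern.match(token))
        for name, pattern in ORTHOGRAPHIC_PATTERNS.items()
    }
```

With these assumed patterns, a token such as "HEENT" fires only ALLCAPS and ":" fires only PUNCTUATION. The features are not mutually exclusive: "Neck" fires both INITCAP and CAPSMIX under the definitions used here.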
Occurrence information.
| Description | Feature value |
|---|---|
| The token was not matched. | 000 |
| The token only appeared in the first token among all section headings. | 001 |
| The token only appeared in the middle token among all section headings. | 010 |
| The token only appeared in the last token among all section headings. | 100 |
| The token appeared in both the first and middle tokens among all section headings. | 011 |
| The token appeared in both the middle and last tokens among all section headings. | 110 |
| The token appeared in both the first and last tokens among all section headings. | 101 |
| The token appeared in all places among all section headings. | 111 |
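The three-bit encoding in the table can be sketched as follows, reading the bits from left to right as last, middle, and first position. The heading dictionary below is hypothetical, matching is case-sensitive, and treating a single-token heading as both first and last is an assumption that the table does not address.

```python
def occurrence_feature(token, headings):
    """Return the 3-bit occurrence code for `token`.

    Bit order (left to right): last, middle, first, so "001" means the token
    was only ever seen as the first token of a section heading.
    """
    first = middle = last = False
    for heading in headings:
        n = len(heading)
        for i, tok in enumerate(heading):
            if tok != token:
                continue
            if i == 0:
                first = True
            if i == n - 1:
                last = True  # assumption: a one-token heading is first AND last
            if 0 < i < n - 1:
                middle = True
    return f"{int(last)}{int(middle)}{int(first)}"


# Hypothetical heading dictionary, tokenized on whitespace.
HEADINGS = [
    ["History", "of", "Present", "Illness"],
    ["Family", "History"],
    ["Past", "Medical", "History"],
    ["Physical", "Examination"],
]

codes = {t: occurrence_feature(t, HEADINGS)
         for t in ["History", "Medical", "Family", "Examination", "Vertigo"]}
```

In this toy dictionary "History" appears both first and last, so it receives 101, while a token that never occurs in a heading, such as "Vertigo", receives 000.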
Performance comparison among different methods.
| Dataset | Configuration | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|
| Set 2 | Dict. method 1 (SecTag) | 19.9 | 79.31 | 31.82 |
| | Dict. method 1 (set 1) | 52.18 | 94.04 | 67.12 |
| | Dict. method 1 (SecTag + set 1) | 23.19 | | 33.47 |
| | Dict. method 2 (SecTag) | 41.19 | 79.31 | 54.22 |
| | Dict. method 2 (set 1) | | 94.04 | |
| | Dict. method 2 (SecTag + set 1) | 45.33 | | 61.37 |
| | Sentence-based formulation (ME) | 81.54 | 82.16 | 81.85 |
| | Token-based formulation (CRF) | | 92.66 | |
| Test | Dict. method 1 (SecTag) | 21.15 | 80.23 | 33.47 |
| | Dict. method 1 (set 1 + set 2) | 54.13 | 94.87 | 68.93 |
| | Dict. method 1 (SecTag + set 1 + set 2) | 24.38 | | 38.84 |
| | Dict. method 2 (SecTag) | 41.72 | 80.23 | 54.89 |
| | Dict. method 2 (set 1 + set 2) | | 94.84 | |
| | Dict. method 2 (SecTag + set 1 + set 2) | 45.59 | | 61.71 |
| | Sentence-based formulation (ME) | 85.46 | 85.54 | 85.5 |
| | Token-based formulation (CRF) | | 92.4 | |
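The three numeric columns in each complete row are consistent with precision, recall, and the balanced F-score (the harmonic mean of the first two), all in percent. The following sketch checks that relationship on rows whose three values are all present; the use of the balanced F-score is an assumption inferred from the numbers, and the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) for rows with all three values present.
ROWS = {
    "Set 2, Dict. method 1 (SecTag)": (19.9, 79.31, 31.82),
    "Set 2, Sentence-based (ME)": (81.54, 82.16, 81.85),
    "Test, Sentence-based (ME)": (85.46, 85.54, 85.5),
}

recomputed = {name: round(f_score(p, r), 2) for name, (p, r, _) in ROWS.items()}
```

The dictionary-based rows show why the F-score is the headline figure: the SecTag dictionary reaches a recall near 80% but a precision near 20%, and the harmonic mean penalizes that imbalance heavily.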
Performance comparison for the layout features.
| Dataset | Configuration | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|
| Set 2 | CRF-based without layout features | 94.8 | 90.72 | 92.72 |
| | CRF-based with layout features | | | |
| Test | CRF-based without layout features | 95.13 | 90.5 | 92.76 |
| | CRF-based with layout features | | | |
Performance for EHR data without layout information.
| Dataset | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|
| Set 2 | 97.2 | 84.88 | 90.62 |
| Test | 97.59 | 84.81 | 90.75 |