Yan Gao¹, Lei Gu¹, Yefeng Wang², Yandong Wang¹, Feng Yang³.
Abstract
BACKGROUND: Electronic Medical Records (EMRs) contain a large amount of medical information about patients. Extracting medical named entities from EMRs can provide valuable information to support doctors' decision making. Research on information extraction from Chinese EMRs still lags behind the work done for English.
Keywords: Annotation scheme; Deep neural network; Named entity extraction; Resident admit notes
Year: 2019 PMID: 30961596 PMCID: PMC6454673 DOI: 10.1186/s12911-019-0759-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 An example of an admission record with annotated entities
Fig. 2 Relationship annotation of Example 1
Fig. 3 Relationship annotation of Example 2
Fig. 4 Relationship annotation of Example 3
Fig. 5 Architecture of the neural network
Fig. 6 Illustration of the annotation process
Entity annotation consistency statistics on 5 resident admit notes (RANs) in the first round
| Entity type | Annotator A | Annotator B | Consistent count | P/R/F (%) |
|---|---|---|---|---|
| Medical discovery | 497 | 514 | 385 | 74.90/77.46/76.15 |
| Temporal word | 14 | 13 | 10 | 76.92/71.43/74.03 |
| Inspection | 102 | 110 | 70 | 63.63/68.63/66.04 |
| Laboratory test | 17 | 20 | 10 | 50.00/58.82/54.08 |
| Treatment | 29 | 26 | 22 | 84.62/75.86/80.00 |
| Disease | 85 | 93 | 77 | 82.80/90.59/86.52 |
| Medication | 3 | 2 | 2 | 100/66.67/80.00 |
| Body part | 288 | 276 | 238 | 85.51/81.94/83.69 |
| Total (average) | 1035 | 1053 | 812 | 77.11/78.54/77.82 |
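The P/R/F column can be reproduced from the three count columns if annotator A is treated as the reference: for Medical discovery, 385/514 gives the 74.90% precision and 385/497 gives the 77.46% recall. This reading is inferred from the numbers rather than stated in the table, and a few rows do not match it exactly. A minimal Python sketch under that assumption (the names `agreement_prf` and `Agreement` are hypothetical):

```python
from typing import NamedTuple


class Agreement(NamedTuple):
    precision: float  # percent
    recall: float     # percent
    f1: float         # percent


def agreement_prf(count_a: int, count_b: int, consistent: int) -> Agreement:
    """Pairwise annotator agreement, in percent.

    Assumption (inferred from the tables, not stated in them): annotator B is
    scored against annotator A, so precision = consistent / B and
    recall = consistent / A.
    """
    if count_a <= 0 or count_b <= 0:
        raise ValueError("both annotators must have at least one entity")
    precision = consistent / count_b
    recall = consistent / count_a
    # Harmonic mean of P and R; guarded so zero agreement does not divide by zero.
    f1 = 2 * precision * recall / (precision + recall) if consistent else 0.0
    return Agreement(round(100 * precision, 2), round(100 * recall, 2), round(100 * f1, 2))


if __name__ == "__main__":
    # "Treatment" row of the first-round table: A = 29, B = 26, consistent = 22.
    print(agreement_prf(29, 26, 22))
```

Because P = c/B and R = c/A, the F value simplifies to 2c/(A + B), which is the Dice coefficient between the two annotators' entity sets.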
Entity annotation consistency statistics on 20 RANs in the second round
| Entity type | Annotator A | Annotator B | Consistent count | P/R/F (%) |
|---|---|---|---|---|
| Medical discovery | 2208 | 2175 | 1935 | 88.97/87.18/88.07 |
| Temporal word | 65 | 63 | 56 | 88.89/86.15/87.50 |
| Inspection | 510 | 465 | 426 | 91.61/83.53/87.38 |
| Laboratory test | 93 | 103 | 82 | 79.61/88.17/83.61 |
| Treatment | 97 | 102 | 94 | 92.17/96.91/94.48 |
| Disease | 384 | 382 | 342 | 89.53/93.97/97.70 |
| Medication | 11 | 11 | 11 | 100.00/100.00/100.00 |
| Body part | 982 | 989 | 942 | 95.25/95.93/95.59 |
| Total (average) | 4326 | 4200 | 3888 | 90.63/89.88/90.25 |
Entity annotation consistency statistics on 50 RANs in the third round
| Entity type | Annotator A | Annotator B | Consistent count | P/R/F (%) |
|---|---|---|---|---|
| Medical discovery | 5478 | 5488 | 5465 | 97.76/97.94/97.85 |
| Temporal word | 192 | 186 | 187 | 90.86/88.02/89.42 |
| Inspection | 1255 | 1247 | 1250 | 98.16/97.53/97.84 |
| Laboratory test | 308 | 315 | 310 | 87.94/90.58/89.24 |
| Treatment | 286 | 294 | 294 | 92.86/95.45/94.14 |
| Disease | 1061 | 1058 | 1065 | 99.05/98.77/98.91 |
| Medication | 35 | 34 | 34 | 97.06/94.29/95.65 |
| Body part | 2486 | 2472 | 2458 | 99.43/99.79/99.11 |
| Total (average) | 11,101 | 11,092 | 11,064 | 97.78/97.69/97.73 |
Statistics of the data used in our experiment
| | Sentences | Words | Features | Entities |
|---|---|---|---|---|
| Total | 13,926 | 259,074 | 420,903 | 66,943 |
| Average per text | 54.61 | 1015.98 | 1650.6 | 262.52 |
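The table gives totals and per-text averages but not the number of texts; dividing each total by its average implies the same count, roughly 255 texts, for every column. A short Python check of that arithmetic (the names `TOTALS`, `AVERAGES`, and `implied_text_count` are illustrative, and the 255 figure is derived here rather than stated in the source):

```python
# Totals and per-text averages copied from the statistics table.
TOTALS = {"sentences": 13_926, "words": 259_074, "features": 420_903, "entities": 66_943}
AVERAGES = {"sentences": 54.61, "words": 1015.98, "features": 1650.6, "entities": 262.52}


def implied_text_count(total: int, average: float) -> int:
    """Number of texts implied by total / average, rounded to the nearest integer."""
    return round(total / average)


if __name__ == "__main__":
    counts = {name: implied_text_count(TOTALS[name], AVERAGES[name]) for name in TOTALS}
    # Each column should imply the same number of texts.
    print(counts)
```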
Distribution of entity types in the corpus
| Entity type | Count | Percentage |
|---|---|---|
| Medical discovery | 29,247 | 43.69% |
| Temporal word | 1631 | 2.44% |
| Inspection | 6915 | 10.33% |
| Laboratory test | 2127 | 3.18% |
| Treatment | 2601 | 3.88% |
| Measurement | 2839 | 4.24% |
| Disease | 5286 | 7.90% |
| Medication | 1344 | 2.01% |
| Body part | 14,953 | 22.34% |
| Total | 66,943 | 100% |
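Each percentage is an entity type's count divided by the corpus total of 66,943 entities. A minimal Python sketch that recomputes the shares from the counts (the dictionary copies the table; `percentage_shares` is a hypothetical helper name):

```python
ENTITY_COUNTS = {
    "Medical discovery": 29_247,
    "Temporal word": 1_631,
    "Inspection": 6_915,
    "Laboratory test": 2_127,
    "Treatment": 2_601,
    "Measurement": 2_839,
    "Disease": 5_286,
    "Medication": 1_344,
    "Body part": 14_953,
}


def percentage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of each entity type, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    # The per-type counts add up to the table's Total row.
    assert sum(ENTITY_COUNTS.values()) == 66_943
    for name, share in percentage_shares(ENTITY_COUNTS).items():
        print(f"{name:<18} {share:6.2f}%")
```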
Detailed named entity recognition performance
| Entity type | Precision (%) | Recall (%) | F (%) |
|---|---|---|---|
| Medical discovery | 96.36 | 93.33 | 93.35 |
| Temporal word | 82.54 | 82.54 | 82.54 |
| Inspection | 90.97 | 91.66 | 91.31 |
| Laboratory test | 78.74 | 83.59 | 81.09 |
| Treatment | 89.29 | 82.51 | 82.4 |
| Measurement | 89.12 | 92.15 | 90.61 |
| Disease | 83.76 | 86.09 | 84.91 |
| Medication | 72.05 | 70.94 | 71.49 |
| Body part | 94.26 | 94.68 | 94.47 |
| Total | 90.76 | 91.4 | 91.08 |
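F here appears to be the standard F1 score, the harmonic mean F = 2PR/(P + R), so most rows can be checked from their first two columns. A small Python helper, assuming all three values are percentages as their magnitudes suggest (`f1_score` is a hypothetical name, not a library call):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent), rounded to two decimals."""
    if precision + recall == 0:
        return 0.0
    return round(2 * precision * recall / (precision + recall), 2)


if __name__ == "__main__":
    # "Inspection" and "Total" rows of the NER performance table.
    print(f1_score(90.97, 91.66))
    print(f1_score(90.76, 91.4))
```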
Performance after removing individual techniques
| Configuration | Precision (%) | Recall (%) | F (%) |
|---|---|---|---|
| Best | 90.76 | 91.4 | 91.08 |
| No Dropout | 89.85 | 90.12 | 89.98 |
| No Attention | 90.37 | 91.15 | 90.76 |
| No Char Embeddings | 84.61 | 84.81 | 84.71 |
| LSTM for Char Embeddings | 90.64 | 91.38 | 91.01 |
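The ablation rows are easiest to compare as the drop in F relative to the best configuration; removing character embeddings costs the most (91.08 to 84.71). A minimal Python sketch of that subtraction, with values copied from the table (`f_drops` is a hypothetical helper name):

```python
BEST_F = 91.08
ABLATION_F = {
    "No Dropout": 89.98,
    "No Attention": 90.76,
    "No Char Embeddings": 84.71,
    "LSTM for Char Embeddings": 91.01,
}


def f_drops(best: float, variants: dict[str, float]) -> dict[str, float]:
    """F loss (percentage points) of each variant relative to the best model, largest first."""
    drops = {name: round(best - f, 2) for name, f in variants.items()}
    return dict(sorted(drops.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for name, drop in f_drops(BEST_F, ABLATION_F).items():
        print(f"{name:<26} -{drop:.2f}")
```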