Yonghui Wu, Jianbo Lei, Wei-Qi Wei, Buzhou Tang, Joshua C Denny, S Trent Rosenbloom, Randolph A Miller, Dario A Giuse, Kai Zheng, Hua Xu.
Abstract
The worldwide adoption of Electronic Medical Record (EMR) databases in health care has generated an unprecedented amount of electronically available clinical data. There has been an increasing trend among US and other Western institutions towards collaborating with China on medical research using EMR data. However, few studies have investigated the characteristics of EMR data in China and how they differ from data in US hospitals. As an initial step towards differentiating EMR data in Chinese and US systems, this study attempts to understand the system and cultural differences that may exist between Chinese and English clinical documents. We collected inpatient discharge summaries from one Chinese institution and three US institutions and manually analyzed three major clinical components in the text: medical problems, tests, and treatments. We report comparison results at the document level and the section level and discuss potential reasons for the observed differences. Documenting and understanding differences between clinical reports in US and Chinese EMRs is important for cross-country collaboration. Our study also provides valuable insights for developing natural language processing tools for Chinese clinical text.
Year: 2013 PMID: 23920639 PMCID: PMC4957806
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1. Zipf's distribution of vocabularies
Figure 2. Normalized distribution of annotated entities
Distribution of different types of entities
| Corpus | # of Docs | Type | # of Entities | Average # of entities per note | Relative frequency |
|---|---|---|---|---|---|
| UPMC (English) | 220 | Prob | 5805 | 26.39 | 43.76% |
| | | Test | 2762 | 12.55 | 20.82% |
| | | Treat | 4700 | 21.36 | 35.43% |
| | | All | 13267 | -- | |
| PARTNERS (English) | 235 | Prob | 8542 | 36.35 | 44.69% |
| | | Test | 4884 | 20.78 | 25.55% |
| | | Treat | 5686 | 24.20 | 29.75% |
| | | All | 19112 | -- | |
| BETH (English) | 191 | Prob | 11122 | 58.23 | 38.93% |
| | | Test | 8947 | 46.84 | 31.32% |
| | | Treat | 8499 | 44.50 | 29.75% |
| | | All | 28568 | -- | |
| PUMCH (Chinese) | 400 | Prob | 20159 | 50.40 | 51.25% |
| | | Test | 12114 | 30.29 | |
| | | Treat | 7061 | 17.65 | |
| | | All | 39334 | -- | |

Prob = Problem; Treat = Treatment.
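The derived columns in this table are consistent with two simple ratios: the per-note average is an entity count divided by # of Docs, and the relative frequency is that count divided by the corpus's All total. The short Python sketch below recomputes both from the raw counts. These definitions are inferred from the numbers rather than stated in the source, and the names `CORPORA` and `summarize` are illustrative only.

```python
# Minimal sketch (assumption): "Average # of entities per note" = count / # of Docs,
# and "Relative frequency" = count / All for the same corpus.
# The counts are copied from the "Distribution of different types of entities" table.

TYPES = ("Prob", "Test", "Treat")

# Hypothetical container for the table's raw counts.
CORPORA = {
    "UPMC": {"docs": 220, "Prob": 5805, "Test": 2762, "Treat": 4700},
    "PARTNERS": {"docs": 235, "Prob": 8542, "Test": 4884, "Treat": 5686},
    "BETH": {"docs": 191, "Prob": 11122, "Test": 8947, "Treat": 8499},
    "PUMCH": {"docs": 400, "Prob": 20159, "Test": 12114, "Treat": 7061},
}


def summarize(corpus):
    """Return the All total and, per type, (average per note, relative frequency in %)."""
    total = sum(corpus[t] for t in TYPES)
    stats = {
        t: (corpus[t] / corpus["docs"], 100.0 * corpus[t] / total)
        for t in TYPES
    }
    return total, stats


if __name__ == "__main__":
    for name, corpus in CORPORA.items():
        total, stats = summarize(corpus)
        print(f"{name}: All = {total}")
        for t, (avg, rel) in stats.items():
            print(f"  {t:5s} avg/note = {avg:6.2f}  relative = {rel:6.2f}%")
```

Running it as a script prints the statistics for every corpus. Because it works from the counts alone, it also yields the PUMCH Test and Treat shares that are blank in the extracted table, under the same assumed definition.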
Figure 3. Relative frequency of problems, tests, and treatments in three English-language institutions (UPMC, PARTNERS, and BETH) and one Chinese institution (PUMCH)
Distribution of entities within matched sections
| | UPMCD (English) | | | PARTNERS (English) | | | BETH (English) | | | PUMCH (Chinese) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Section | Doc | Entity | Ave | Doc | Entity | Ave | Doc | Entity | Ave | Doc | Entity | Ave |
| | 131 | 4453 | | 174 | 5259 | | 151 | 6211 | | 389 | 1761 | |
| | 95 | 1224 | | 138 | 1113 | | 123 | 1418 | | 196 | 518 | |
| | 47 | 314 | | 54 | 271 | | 100 | 713 | | 168 | 496 | |
| CC | 33 | 377 | | 34 | 67 | | 77 | 127 | | 398 | 960 | |
| DD | 105 | 1005 | | 35 | 126 | | 136 | 793 | | 387 | 2742 | |
| HOPI | 30 | 486 | | 151 | 3481 | | 159 | 4612 | | 398 | 14713 | |
| | 25 | 479 | | 142 | 2489 | | 157 | 3039 | | 265 | 4000 | |
| | 59 | 659 | | 140 | 2209 | | 166 | 3812 | | 220 | 2365 | |
| PL | 48 | 187 | | 41 | 237 | | 35 | 699 | | 397 | 1800 | |
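The Ave columns are empty in the extracted table. Assuming Ave is Entity divided by Doc, that is, the mean number of annotated entities per document containing the section (a definition inferred here, not stated in the source), a short Python sketch can recompute it. Only the rows whose section label survived extraction are included, the abbreviations are kept exactly as in the table, and the names `SECTIONS` and `section_averages` are hypothetical.

```python
# Minimal sketch (assumption): "Ave" = Entity / Doc for each corpus and section.
# Rows whose section name was lost during extraction are omitted.

CORPUS_NAMES = ("UPMCD", "PARTNERS", "BETH", "PUMCH")

# Hypothetical layout: section -> one (doc, entity) pair per corpus, in CORPUS_NAMES order.
SECTIONS = {
    "CC": ((33, 377), (34, 67), (77, 127), (398, 960)),
    "DD": ((105, 1005), (35, 126), (136, 793), (387, 2742)),
    "HOPI": ((30, 486), (151, 3481), (159, 4612), (398, 14713)),
    "PL": ((48, 187), (41, 237), (35, 699), (397, 1800)),
}


def section_averages(sections):
    """Return {section: {corpus: entities per document}}."""
    return {
        name: {
            corpus: entity / doc
            for corpus, (doc, entity) in zip(CORPUS_NAMES, cells)
        }
        for name, cells in sections.items()
    }


if __name__ == "__main__":
    for name, row in section_averages(SECTIONS).items():
        cells = "  ".join(f"{c}={avg:6.2f}" for c, avg in row.items())
        print(f"{name:5s} {cells}")
```

Under this assumption the average is normalized only over documents that contain the section, so it is not directly comparable with the per-note averages in the first table, which divide by all documents in a corpus.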