Hsin-Min Lu, Hsinchun Chen, Daniel Zeng, Chwan-Chuen King, Fuh-Yuan Shih, Tsung-Shu Wu, Jin-Yi Hsiao.
Abstract
PURPOSE: Syndromic surveillance aims to detect disease outbreaks early. An important data source for syndromic surveillance is free-text chief complaints (CCs), which may be recorded in different languages. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories to facilitate subsequent data aggregation and analysis. Although syndromic surveillance is largely an international effort, existing CC classification systems do not provide adequate support for processing CCs recorded in non-English languages. This paper reports a multilingual CC classification effort, focusing on CCs recorded in Chinese.
Year: 2008 PMID: 18838292 PMCID: PMC7108263 DOI: 10.1016/j.ijmedinf.2008.08.004
Source DB: PubMed Journal: Int J Med Inform ISSN: 1386-5056 Impact factor: 4.046
Chinese chief complaint prevalence in Taiwan hospitals
| Hospital level | # Records | # Hospitals | % Chinese CCs |
|---|---|---|---|
| Medical Center | 222,893 | 10 | 52% |
| Regional Hospital | 484,123 | 39 | 16% |
| District Hospital | 232,008 | 67 | 19% |
| Total | 939,024 | 116 | 25% |
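The 25% in the Total row is consistent with a record-weighted average of the per-level percentages, not their simple mean (which would be 29%). A minimal check, using only the rounded figures from the table (so the recomputed share is approximate):

```python
# Rows of the table above: (hospital level, number of records, share of Chinese CCs).
# Shares are rounded to whole percentages in the table, so results are approximate.
TIERS = [
    ("Medical Center", 222_893, 0.52),
    ("Regional Hospital", 484_123, 0.16),
    ("District Hospital", 232_008, 0.19),
]

def overall_chinese_share(tiers):
    """Share of Chinese CCs across all records, weighting each level by its record count."""
    total_records = sum(n for _, n, _ in tiers)
    chinese_records = sum(n * share for _, n, share in tiers)
    return chinese_records / total_records

total_records = sum(n for _, n, _ in TIERS)
share = overall_chinese_share(TIERS)
simple_mean = sum(s for _, _, s in TIERS) / len(TIERS)  # unweighted, for contrast
print(f"{total_records:,} records; weighted share {share:.1%}; simple mean {simple_mean:.1%}")
```

The weighted figure comes out at about 25%, matching the Total row; the regional hospitals, with the most records and the lowest Chinese share, pull it below the simple mean.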
Categories of Chinese chief complaints
| Averaging method | Symptom-related | Named entity | Chinese punctuation | Other |
|---|---|---|---|---|
| Simple average | 40.79% | 13.97% | 20.32% | 24.92% |
| Weighted average | 53.80% | 7.36% | 14.63% | 14.63% |
Simple average: all hospitals weighted equally.
Weighted average: each hospital weighted by its number of Chinese CC records.
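The two rows differ only in how per-hospital shares are pooled. The sketch below uses made-up counts for three hypothetical hospitals (not data from the study) to show why the averages can diverge: a large hospital dominates the weighted average but counts only once in the simple average.

```python
# Hypothetical hospitals: (number of Chinese CC records, share that are symptom-related).
# These numbers are invented for illustration only.
hospitals = [
    (1000, 0.60),
    (100, 0.20),
    (10, 0.10),
]

def simple_average(rows):
    """Each hospital counts equally, regardless of how many records it has."""
    return sum(share for _, share in rows) / len(rows)

def weighted_average(rows):
    """Each hospital is weighted by its number of Chinese CC records."""
    total = sum(n for n, _ in rows)
    return sum(n * share for n, share in rows) / total

print(f"simple: {simple_average(hospitals):.2%}, weighted: {weighted_average(hospitals):.2%}")
# prints "simple: 30.00%, weighted: 55.95%"
```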
Intermediate results of Chinese key phrase list construction
| Candidate | Included (Yes/No) | Comment |
|---|---|---|
| | Yes | |
| | Yes | |
| | No | Not a phrase |
| | No | Unimportant information |
| | Yes | |
| | Yes | |
| | No | Not a phrase |
| | Yes | |
| | No | Not a phrase |
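Once a key phrase list has been built, it has to be matched against raw CCs, and because Chinese is written without spaces between words, this matching is itself a segmentation step. The record above does not say how matching is done; the sketch below assumes a greedy left-to-right longest-match scan. The phrase list `KEY_PHRASES` and the function `extract_key_phrases` are hypothetical, not the authors' list or code.

```python
# Hypothetical key phrase list (Chinese phrase -> English term); not the study's list.
KEY_PHRASES = {
    "發燒": "fever",
    "咳嗽": "cough",
    "喉嚨": "throat",
    "喉嚨痛": "sore throat",
    "嘔吐": "vomiting",
}

def extract_key_phrases(cc, phrases):
    """Greedy longest-match scan: at each position, try the longest phrase first."""
    max_len = max(len(p) for p in phrases)
    found = []
    i = 0
    while i < len(cc):
        for length in range(min(max_len, len(cc) - i), 0, -1):
            candidate = cc[i:i + length]
            if candidate in phrases:
                found.append(phrases[candidate])
                i += length
                break
        else:
            i += 1  # no listed phrase starts here; skip one character
    return found

print(extract_key_phrases("昨天開始發燒、咳嗽", KEY_PHRASES))  # "started yesterday: fever, cough"
# prints ['fever', 'cough']
```

Trying the longest candidate first keeps "喉嚨痛" (sore throat) from being cut down to "喉嚨" (throat).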
Fig. 1. Chinese chief complaint extraction and classification process. *The system design of the BioPortal CC Classifier is reproduced from Lu et al. [11].
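In the process shown in Fig. 1, translated CCs are passed to an English CC classifier that assigns syndromes. The toy keyword lookup below only illustrates that final step; the rule table `SYNDROME_RULES` and the function `classify` are hypothetical and are not the BioPortal CC Classifier's rules. The fever rule also adds CONST, as the MIM outcome in Example 3 does.

```python
# Hypothetical English-term -> syndrome rules; not the BioPortal classifier's rule set.
SYNDROME_RULES = {
    "vomiting": {"GI"},
    "diarrhea": {"GI"},
    "fever": {"FEVER", "CONST"},
    "sore throat": {"RESP", "URESP"},
    "dyspnea": {"RESP", "LRESP"},
    "rash": {"RASH"},
}

def classify(terms, rules):
    """Union of the syndromes triggered by each translated term; UNKNOWN if none match."""
    syndromes = set()
    for term in terms:
        syndromes |= rules.get(term.lower(), set())
    return syndromes or {"UNKNOWN"}

print(sorted(classify(["Fever", "dyspnea"], SYNDROME_RULES)))
# prints ['CONST', 'FEVER', 'LRESP', 'RESP']
```

A lookup like this only fires on terms it knows, which is one way a literal translation such as "To spit, in the evening begin" (Example 2) can end up as UNKNOWN.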
Performance comparison of MIM and Google Translation
| Syndrome | TP + FN | PPV | Sensitivity | Specificity | F | F2 |
|---|---|---|---|---|---|---|
| Mutual Information-based Mapping (MIM) | ||||||
| GI | 592 | 0.97*** | 0.97*** | 0.98*** | 0.97*** | 0.97*** |
| RASH | 45 | 0.87** | 0.77 | 0.99** | 0.82*** | 0.80*** |
| RESP | 331 | 0.89*** | 0.96*** | 0.97** | 0.93*** | 0.94*** |
| URESP | 132 | 0.86*** | 0.91** | 0.98*** | 0.88*** | 0.89*** |
| LRESP | 272 | 0.93 | 0.98*** | 0.98 | 0.95*** | 0.96*** |
| FEVER | 413 | 0.99** | 0.96 | 0.99** | 0.97 | 0.97 |
| Google Translation | ||||||
| GI | 592 | 0.91 | 0.90 | 0.96 | 0.91 | 0.91 |
| RASH | 45 | 0.76 | 0.73 | 0.99 | 0.75 | 0.74 |
| RESP | 331 | 0.84 | 0.83 | 0.96 | 0.83 | 0.83 |
| URESP | 132 | 0.70 | 0.83 | 0.97 | 0.76 | 0.78 |
| LRESP | 272 | 0.96** | 0.80 | 0.99*** | 0.87 | 0.84 |
| FEVER | 413 | 0.98 | 0.96 | 0.99 | 0.97 | 0.97 |
Significance tests are based on 3,000 bootstrap resamples.
*p-value <0.1; **p-value <0.05; ***p-value <0.01.
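The columns of these tables follow the standard definitions: PPV = TP/(TP+FP), sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and the F-beta score (1+β²)·PPV·sensitivity/(β²·PPV + sensitivity), with β = 1 for F and β = 2 for F2 (which weights sensitivity more heavily). A minimal sketch computing them from confusion-matrix counts; the counts in the usage example are hypothetical, not the study's.

```python
import math

def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall (sensitivity) more than precision (PPV)."""
    if math.isnan(precision) or precision + recall == 0:
        return math.nan
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def syndrome_metrics(tp, fp, fn, tn):
    """Per-syndrome metrics from confusion-matrix counts (TP + FN = gold positives)."""
    ppv = tp / (tp + fp) if tp + fp > 0 else math.nan  # undefined (NA) if nothing is flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "PPV": ppv,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "F": f_beta(ppv, sensitivity, 1),
        "F2": f_beta(ppv, sensitivity, 2),
    }

# Hypothetical counts, chosen only to illustrate the formulas.
m = syndrome_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

When a system flags no records for a syndrome, PPV (and hence F and F2) is undefined, which is how a row can show sensitivity 0.00 and specificity 1.00 alongside NA entries.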
Performance comparison of MIM and Bilingual Dictionary
| Syndrome | TP + FN | PPV | Sensitivity | Specificity | F | F2 |
|---|---|---|---|---|---|---|
| Mutual Information-based Mapping (MIM) | ||||||
| GI | 592 | 0.97 | 0.97 | 0.98 | 0.97 | 0.97 |
| RASH | 45 | 0.87 | 0.77 | 0.99 | 0.82 | 0.80 |
| RESP | 331 | 0.89 | 0.96 | 0.97 | 0.93 | 0.94 |
| URESP | 132 | 0.86 | 0.91 | 0.98 | 0.88 | 0.89 |
| LRESP | 272 | 0.93 | 0.98 | 0.98 | 0.95 | 0.96 |
| FEVER | 413 | 0.99 | 0.96 | 0.99 | 0.97 | 0.97 |
| Bilingual Dictionary | ||||||
| GI | 592 | 0.36 | 0.36 | 0.70 | 0.36 | 0.36 |
| RASH | 45 | 0.54 | 0.77 | 0.98 | 0.64 | 0.68 |
| RESP | 331 | 0.88 | 0.79 | 0.97 | 0.83 | 0.82 |
| URESP | 132 | 0.43 | 0.16 | 0.98 | 0.24 | 0.20 |
| LRESP | 272 | 0.95 | 0.90 | 0.99 | 0.93 | 0.92 |
| FEVER | 413 | NA | 0.00 | 1.00 | NA | NA |
Significance tests are based on 3,000 bootstrap resamples.
*p-value <0.1; **p-value <0.05; ***p-value <0.01.
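The tables report significance from 3,000 bootstrap resamples without spelling out the procedure. One common variant is a paired bootstrap: resample the CC records with replacement, score both systems on the same resample, and take the share of resamples in which system A does not beat system B as a one-sided p-value. The sketch below assumes that variant; the function names, the one-sided form, and the use of sensitivity as the metric are illustrative assumptions, not the authors' exact test.

```python
import math
import random

def sensitivity(pred, gold):
    """Share of gold-standard positives flagged by the system (nan if there are none)."""
    flags_on_positives = [p for p, g in zip(pred, gold) if g]
    if not flags_on_positives:
        return math.nan
    return sum(flags_on_positives) / len(flags_on_positives)

def paired_bootstrap_p(pred_a, pred_b, gold, metric, n_boot=3000, seed=0):
    """One-sided bootstrap p-value for 'system A scores higher than system B'.

    Each resample draws records with replacement, and both systems are scored
    on the same resample, so the comparison is paired by record.
    """
    rng = random.Random(seed)
    n = len(gold)
    valid = not_better = 0
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        g = [gold[i] for i in idx]
        diff = metric([pred_a[i] for i in idx], g) - metric([pred_b[i] for i in idx], g)
        if math.isnan(diff):
            continue  # metric undefined on this resample (e.g. no positives drawn)
        valid += 1
        if diff <= 0:
            not_better += 1
    return not_better / valid if valid else math.nan

# Synthetic labels: 100 positives, 100 negatives (1 = syndrome present / flagged).
gold = [1] * 100 + [0] * 100
system_a = [1] * 95 + [0] * 105  # flags 95 of the 100 positives
system_b = [1] * 70 + [0] * 130  # flags 70 of the 100 positives
print(paired_bootstrap_p(system_a, system_b, gold, sensitivity))
```

Pairing matters because both systems classify the same CCs; resampling them independently would ignore that correlation and typically widen the spread of the difference.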
Example 1: Raw Chinese CC, translations and classification results
| Translation method | Translation outcome | Syndrome outcome | Gold standard |
|---|---|---|---|
| Raw Chinese CC: | |||
| MIM | Soreness, sore throat | UPPER RESP, RESP | CONST, RESP, UPPER RESP |
| Bilingual Dictionary | Ache, today early begin | UNKNOWN | |
| Google Translation | General soreness sore throat this morning before. | UPPER RESP, RESP | |
Example 2: Raw Chinese CC, translations and classification results
| Translation method | Translation outcome | Syndrome outcome | Gold standard |
|---|---|---|---|
| Raw Chinese CC: | |||
| MIM | Vomiting | GI | GI |
| Bilingual Dictionary | To spit, in the evening begin | UNKNOWN | |
| Google Translation | Spit at the beginning | UNKNOWN | |
Example 3: Raw Chinese CC, translations and classification results
| Translation method | Translation outcome | Syndrome outcome | Gold standard |
|---|---|---|---|
| Raw Chinese CC: | |||
| MIM | Fever, dyspnea | RESP, LRESP, FEVER, CONST | RESP, LRESP, FEVER |
| Bilingual Dictionary | Yesterday begin have a high temperature, to gasp | LRESP, RESP | |
| Google Translation | Surge began yesterday fever | FEVER, CONST | |