Haihong Guo¹, Xu Na¹, Li Hou¹, Jiao Li¹.
Abstract
BACKGROUND: In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, which makes question classification more challenging.
Keywords: classification; consumer health information; hypertension; natural language processing
Year: 2017 PMID: 28634156 PMCID: PMC5497072 DOI: 10.2196/jmir.7156
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. A four-round annotation process used to construct and modify the classification schema and the annotated corpus.
Figure 2. Mathematical equations.
| General Topics | Items | Contents |
| Diagnosis | Question | 昨天不知道怎么事,突然感到心慌慌的,四肢发凉,全身冒冷汗,之后老婆扶我到小区医院那里去看,量了一下血压,血压比以往要高,之后医生叫我放松,休息了20分钟左右,又感觉没有什么事了。。 请问突然感觉到心慌,四肢发凉,血压升高,这是啥病啊? (Yesterday, for no apparent reason, I suddenly had palpitations, my limbs went cold, and I broke out in a cold sweat all over. My wife helped me to the community hospital, where my blood pressure was measured; it was higher than usual. The doctor told me to relax, and after resting for about 20 minutes I felt fine again… I suddenly had palpitations and cold limbs, and my blood pressure rose. What disease is this?) |
| | Pattern | 临床发现X1、X2、X3、……,这是啥病?(Clinical findings X1, X2, X3, …: what disease is it?) |
| | Tag | 1.1.4.1 “诊断(Diagnosis)→病因/临床发现的解释(Interpretation of clinical finding)→不具体的发现或多种发现(Uncertain/multiple findings)” |
| Treatment | Question | 65岁老人血压高经常不稳定,吃哪种降压药最好?(A 65-year-old with high blood pressure that is often unstable: which antihypertensive drug is best to take?) |
| | Pattern | 病情y,吃/用/服用哪种药最好?(Condition y: what is the best drug to take or use?) |
| | Tag | 2.1.2.1 “治疗(Treatment)→药物治疗(Drug therapy)→效力/适应症/药物选择(Efficacy/indications/drug choosing)→治疗(Treatment)” |
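Each tag above pairs a dotted code with a path of labels through the classification schema, and the leading component matches the topic numbering used in the distribution table (1 = Diagnosis, 2 = Treatment). A minimal sketch, assuming each dot-separated component selects one child in the schema, expands a code into its ancestor codes and reads off its primary-level topic. The function names and the `PRIMARY_TOPICS` mapping are hypothetical and cover only the two topics shown here.

```python
def tag_ancestors(code: str) -> list[str]:
    """Return every prefix of a dotted tag code, from the primary topic down.

    Assumes (hypothetically) that each dot-separated component picks one
    child in the schema, so "1.1.4.1" sits under "1", "1.1", and "1.1.4".
    """
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical mapping from primary-level codes to the general topics above.
PRIMARY_TOPICS = {"1": "Diagnosis", "2": "Treatment"}


def primary_topic(code: str) -> str:
    """Look up the primary-level topic named by the first code component."""
    return PRIMARY_TOPICS[code.split(".")[0]]


print(tag_ancestors("1.1.4.1"))  # ['1', '1.1', '1.1.4', '1.1.4.1']
print(primary_topic("2.1.2.1"))  # Treatment
```

Under this reading, a question tagged at a fine-grained level also counts as a positive for its primary-level topic.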
Distribution of the 2000 Chinese consumer health questions across the primary-level topics.
| No. | General Topics | Positive | Negative | Total |
| 1 | Diagnosis | 600 | 1400 | 2000 |
| 2 | Treatment | 1167 | 833 | 2000 |
| 3 | Condition management | 136 | 1864 | 2000 |
| 4 | Epidemiology | 233 | 1767 | 2000 |
| 5 | Healthy lifestyle | 278 | 1722 | 2000 |
| 6 | Health provider choice | 45 | 1955 | 2000 |
| 7 | Other | 5 | | |
| | Total | 2000 | 2000 | 2000 |
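Each topic row splits the same 2000 questions into positives and negatives, which reads as one binary (topic vs. rest) task per topic. The sketch below, assuming that reading, recomputes the negative counts and positive rates from the positive counts in the table. It also sums the positives: including the 5 "Other" questions they reach 2464, more than the 2000 questions, so if a positive means "labeled with this topic", some questions must carry more than one topic. The names `POSITIVES` and `binary_split` are illustrative, not from the paper.

```python
# Positive counts per primary-level topic, copied from the table above.
POSITIVES = {
    "Diagnosis": 600,
    "Treatment": 1167,
    "Condition management": 136,
    "Epidemiology": 233,
    "Healthy lifestyle": 278,
    "Health provider choice": 45,
}
OTHER_POSITIVES = 5  # "Other" row; its negative/total cells are blank in the table
N_QUESTIONS = 2000


def binary_split(positives: int, n: int = N_QUESTIONS) -> tuple[int, int, float]:
    """Return (positives, negatives, positive rate) for one topic-vs-rest task."""
    return positives, n - positives, positives / n


for topic, pos in POSITIVES.items():
    p, neg, rate = binary_split(pos)
    print(f"{topic}: {p} positive, {neg} negative, positive rate {rate:.3f}")

total_positive = sum(POSITIVES.values()) + OTHER_POSITIVES
print(total_positive, total_positive > N_QUESTIONS)  # 2464 True
```

The low positive rates for topics such as Health provider choice (45 of 2000) indicate strongly imbalanced binary tasks.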
Number of features and distribution of Φ for each feature type in Chinese consumer health question classification on the topic of Lifestyle.
| Levels | Feature typesᵃ | Avg Φ | σ(Φ) | nAF | n(Φ ≥ avg Φ) |
| Lexical | Bag-of-words | 0.0016 | 0.0067 | 4967 | 1301 |
| | Part-of-speech | 0.0014 | 0.0060 | 6154 | 1490 |
| Grammatical | Interrogative words | 0.0039 | 0.0204 | 97 | 13 |
| | Noun head chunks | 0.0011 | 0.0010 | 48 | 14 |
| | Verb head chunks | 0.0008 | 0.0007 | 19 | 6 |
| | Noun rear chunks | 0.0011 | 0.0019 | 73 | 14 |
| | Verb rear chunks | 0.0010 | 0.0013 | 22 | 3 |
| | Interrogative + noun head chunks | 0.0011 | 0.0013 | 328 | 86 |
| | Interrogative + verb head chunks | 0.0011 | 0.0010 | 312 | 85 |
| | Noun rear chunks + interrogative | 0.0010 | 0.0013 | 315 | 67 |
| | Verb rear chunks + interrogative | 0.0012 | 0.0024 | 318 | 74 |
| Semantic | CMeSH concepts | 0.0016 | 0.0033 | 43 | 9 |
| | CMeSH semantic types | 0.0124 | 0.0101 | 3 | 1 |
| Lexical & Statistical | Keywords (TF) | 0.0008 | 0.0009 | 1510 | 282 |
| | Keywords (IDF) | 0.0007 | 0.0008 | 1137 | 192 |
| | Keywords (TF-IDF) | 0.0008 | 0.0008 | 1208 | 190 |
| Statistical | Statistical features | 0.0073 | 0.0060 | 13 | 5 |
| Total (duplicates removed) | | | | 15349 | 3656 |
ᵃ For each feature type, avg Φ is the mean of Φ, σ(Φ) is the standard deviation of Φ, nAF is the total number of features, and n(Φ ≥ avg Φ) is the number of features with Φ ≥ avg Φ.
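The n(Φ ≥ avg Φ) column, and the fact that its deduplicated total (3656) equals the number of features selected for Healthy lifestyle in the next table, suggest that features were kept when their Φ reached the average Φ of their own feature type. A minimal sketch of that rule follows, under two assumptions: that Φ is the phi coefficient of a 2×2 table of feature presence vs. topic label (the paper's exact definition is in Figure 2, which is not reproduced here, so it may instead use |φ| or φ²), and that duplicate features across types are merged by a set union. All feature names and scores in the example are hypothetical toy values.

```python
import math
from statistics import mean


def phi_coefficient(n11: int, n10: int, n01: int, n00: int) -> float:
    """Phi coefficient of a 2x2 table: feature presence vs. class label.

    n11: feature present, question in the topic (positive)
    n10: feature present, question not in the topic (negative)
    n01: feature absent, positive
    n00: feature absent, negative
    Returns 0.0 when a margin is empty and phi is undefined.
    """
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator if denominator else 0.0


def select_features(scores_by_type: dict[str, dict[str, float]]) -> set[str]:
    """Keep features whose score is >= the mean score of their own feature type.

    The set union drops features that appear under several types.
    """
    selected: set[str] = set()
    for scores in scores_by_type.values():
        threshold = mean(scores.values())
        selected.update(name for name, s in scores.items() if s >= threshold)
    return selected


# Toy scores (hypothetical feature names and values, not from the paper).
toy_scores = {
    "bag_of_words": {"blood_pressure": 0.02, "eat": 0.001, "drug": 0.003},
    "interrogative": {"which": 0.01, "what": 0.01},
}
print(phi_coefficient(10, 0, 0, 10))  # 1.0
print(sorted(select_features(toy_scores)))  # ['blood_pressure', 'what', 'which']
```

Thresholding per type, rather than over all 15349 features at once, keeps small but informative types such as CMeSH semantic types from being swamped by the thousands of bag-of-words and part-of-speech features.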
Feature reduction and the performance of each classifier.
| General topics | N (all features) | N (selected features) | Feature reduction proportion | Avg | σ (…) |
| Diagnosis | 15349 | 5311 | 0.6540 | 0.9855 | 0.0164 |
| Treatment | 15349 | 4216 | 0.7253 | 0.7602 | 0.0482 |
| Condition management | 15349 | 3150 | 0.7948 | 0.9963 | 0.0117 |
| Epidemiology | 15349 | 4194 | 0.7268 | 0.7177 | 0.0798 |
| Healthy lifestyle | 15349 | 3656 | 0.7618 | 0.9913 | 0.0166 |
| Health provider choice | 15349 | 2282 | 0.8513 | 0.9635 | 0.0594 |
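The feature reduction proportion column is consistent with 1 − N(selected features)/N(all features); for Diagnosis, 1 − 5311/15349 ≈ 0.6540. The short check below recomputes the column from the two count columns; the names `SELECTED` and `reduction_proportion` are illustrative.

```python
N_ALL_FEATURES = 15349  # N (all features), identical for every topic in the table

# N (selected features) per topic, copied from the table above.
SELECTED = {
    "Diagnosis": 5311,
    "Treatment": 4216,
    "Condition management": 3150,
    "Epidemiology": 4194,
    "Healthy lifestyle": 3656,
    "Health provider choice": 2282,
}


def reduction_proportion(n_selected: int, n_all: int = N_ALL_FEATURES) -> float:
    """Share of the full feature set discarded by feature selection."""
    return 1 - n_selected / n_all


for topic, n in SELECTED.items():
    print(f"{topic}: {reduction_proportion(n):.4f}")
```

All six printed values match the Feature reduction proportion column to four decimal places.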
Figure 3. Performance of each feature type for Chinese consumer health question classification on the topic of Lifestyle.