Tong Ruan, Mengjie Wang, Jian Sun, Ting Wang, Lu Zeng, Yichao Yin, Ju Gao.
Abstract
BACKGROUND: While many well-known knowledge bases (KBs) in the life sciences have been published as Linked Open Data, few KBs exist in Chinese. However, Chinese KBs are necessary for automatically processing and analyzing electronic medical records (EMRs) written in Chinese. Of these, a symptom KB in Chinese is the most urgently needed, since symptoms are the starting point of clinical diagnosis.
Keywords: Information extraction; Knowledge base; Linked data; Symptoms in Chinese
Year: 2017 PMID: 29297414 PMCID: PMC5763289 DOI: 10.1186/s13326-017-0145-x
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Schema Graph for the Knowledge Base of Symptoms in Chinese. Each rectangle represents a concept, and the bottom of each rectangle shows an instance
Fig. 2 Workflow for Constructing the Knowledge Base of Symptoms in Chinese. It contains three steps: (1) extract data from healthcare websites, encyclopedia sites, and EMRs; (2) align the extraction results; (3) link our symptoms to concepts in UMLS
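Steps (2) and (3) can be illustrated with a minimal sketch. It assumes that alignment means merging records whose entity names match after a simple normalization, and that linking is a dictionary lookup. The helper names (`normalize`, `align`, `link`), the first-source-wins conflict rule, and the placeholder concept IDs are all hypothetical; this is not the authors' method, and the IDs are not real UMLS identifiers.

```python
from collections import defaultdict

def normalize(name):
    # Hypothetical normalization: strip surrounding whitespace and remove
    # ASCII and full-width (U+3000) spaces inside the name.
    return name.strip().replace("\u3000", "").replace(" ", "")

def align(extractions):
    """Step (2): merge records from all sources that share a normalized name.

    `extractions` maps a source name to a list of (entity_name, properties) pairs.
    """
    merged = defaultdict(lambda: {"sources": set(), "properties": {}})
    for source, records in extractions.items():
        for name, props in records:
            entry = merged[normalize(name)]
            entry["sources"].add(source)
            for key, value in props.items():
                # Assumed conflict rule: the first source to supply a property wins.
                entry["properties"].setdefault(key, value)
    return dict(merged)

def link(aligned, concept_lookup):
    """Step (3): attach a concept ID, or None when the lookup has no match."""
    return {name: concept_lookup.get(name) for name in aligned}

# Toy input: "发热" = fever, "头痛" = headache, "内科" = internal medicine, "感染" = infection.
extractions = {
    "Familydoctor": [("发热", {"department": "内科"})],
    "Baidu Baike": [("发热 ", {"cause": "感染"})],
    "EMRs": [("头痛", {})],
}
aligned = align(extractions)
links = link(aligned, {"发热": "CUI_PLACEHOLDER_1"})  # placeholder, not a real UMLS CUI
print(aligned["发热"]["sources"])
print(links)  # {'发热': 'CUI_PLACEHOLDER_1', '头痛': None}
```

Keeping the set of contributing sources on each merged entity makes it easy to count per-source contributions later.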
Basic information of eight healthcare websites
| Website name | URL |
|---|---|
| Familydoctor | |
| JIANKE | |
| 120ask | |
| QQYY | |
| 39Health | |
| 99Health | |
| Fh21 | |
| Pcbaby | |
Fig. 3 Five Fields of an Entity Page on Encyclopedia Sites
Classification features for six entity types
| Page field | Classification features |
|---|---|
| Entity name | Ends with any words in (department, disease, inflammation, tumour, syndrome, examination) |
| Abstract | Contains any words in (symptom, syndrome, symptoms of illness, disease name of TCM) |
| Content | Has more than 3 words in (function, specification, adverse reaction, side effect, component, usage, dosage) |
| | Has more than 3 words in (cause, examination, antidiastole, diagnosis, mitigation, pathogenesis, clinical manifestation) |
| Full-text | Contains any words in (Chinese patent medicine, Chinese herbal medicine) |
| Category | Contains any words in (medicine, disease, TCM, drug, Chinese patent medicine, symptom) |
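The rules in this table can be turned into binary page features roughly as follows. This is a sketch under stated assumptions: the keyword lists are the English glosses from the table (the real pages are in Chinese), "more than 3 words" is read as more than three distinct list words occurring in the field, and the function and field names are hypothetical. How these features feed the classifier is not shown.

```python
# Keyword lists copied from the table (English glosses; real pages are in Chinese).
NAME_SUFFIXES = ("department", "disease", "inflammation", "tumour", "syndrome", "examination")
ABSTRACT_WORDS = ("symptom", "syndrome", "symptoms of illness", "disease name of TCM")
MEDICINE_CONTENT_WORDS = ("function", "specification", "adverse reaction", "side effect",
                          "component", "usage", "dosage")
DISEASE_CONTENT_WORDS = ("cause", "examination", "antidiastole", "diagnosis", "mitigation",
                         "pathogenesis", "clinical manifestation")
FULLTEXT_WORDS = ("Chinese patent medicine", "Chinese herbal medicine")
CATEGORY_WORDS = ("medicine", "disease", "TCM", "drug", "Chinese patent medicine", "symptom")

def contains_any(text, words):
    return any(w in text for w in words)

def count_present(text, words):
    # Assumption: count distinct list words that occur at least once in the text.
    return sum(1 for w in words if w in text)

def page_features(page):
    """Map a page dict (hypothetical keys: name, abstract, content, full_text, category)
    to a dict of boolean features, one group per row of the table."""
    features = {}
    for suffix in NAME_SUFFIXES:
        features[f"name_ends_with_{suffix}"] = page["name"].endswith(suffix)
    features["abstract_has_symptom_word"] = contains_any(page["abstract"], ABSTRACT_WORDS)
    features["content_medicine_words_gt3"] = count_present(page["content"], MEDICINE_CONTENT_WORDS) > 3
    features["content_disease_words_gt3"] = count_present(page["content"], DISEASE_CONTENT_WORDS) > 3
    features["fulltext_has_medicine_word"] = contains_any(page["full_text"], FULLTEXT_WORDS)
    for word in CATEGORY_WORDS:
        features[f"category_has_{word}"] = word in page["category"]
    return features

page = {
    "name": "acute tonsil inflammation",
    "abstract": "An inflammation of the tonsils.",
    "content": "cause, pathogenesis, clinical manifestation, examination and diagnosis",
    "full_text": "",
    "category": "disease",
}
f = page_features(page)
print(f["name_ends_with_inflammation"], f["content_disease_words_gt3"])  # True True
```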
Features of CRF
| Feature type | Feature contents |
|---|---|
| Literal features | Unigram |
| | Bigram |
| | Trigram |
| Position features | Index |
| POS features | Unigram |
| | Bigram |
| | Trigram |
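A per-token feature function covering the feature types in this table might look like the sketch below. The window (previous, current, and next token) is an assumption, since the table lists only the feature types. Tokens are taken as single characters, each character is assumed to carry the POS tag of its word, and all names are hypothetical. The output is a list of string-valued dicts, one per token, which is the shape CRF toolkits such as sklearn-crfsuite accept.

```python
BOS, EOS = "<s>", "</s>"

def _at(seq, i):
    """Return seq[i], or a boundary marker when i falls outside the sequence."""
    if i < 0:
        return BOS
    if i >= len(seq):
        return EOS
    return seq[i]

def token_features(tokens, pos_tags, i):
    feats = {}
    for prefix, seq in (("lit", tokens), ("pos", pos_tags)):
        prev, cur, nxt = _at(seq, i - 1), _at(seq, i), _at(seq, i + 1)
        feats[f"{prefix}_uni"] = cur                          # unigram
        feats[f"{prefix}_bi_left"] = prev + "|" + cur         # bigram with previous token
        feats[f"{prefix}_bi_right"] = cur + "|" + nxt         # bigram with next token
        feats[f"{prefix}_tri"] = prev + "|" + cur + "|" + nxt  # trigram centred on i
    # Position feature kept as a string so the CRF treats it as categorical.
    feats["index"] = str(i)
    return feats

def sentence_features(tokens, pos_tags):
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

tokens = ["反", "复", "头", "痛"]  # "recurrent headache", one character per token
pos = ["d", "d", "n", "n"]         # hypothetical per-character POS tags
X = sentence_features(tokens, pos)
print(X[2]["lit_tri"])  # 复|头|痛
```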
Fig. 4 Classification Results for Encyclopedia Sites. The result for Examination is noticeably lower than those for the other classes because some entity pages on encyclopedia sites are irregular, so few features can be extracted from them. In addition, Chinese Wikipedia has few seed entities for examinations, so its result is worse than those of the other encyclopedias
Fig. 5 Data Distribution in Our KB. (a) Symptom-related facts account for 49.23% of all facts in our KB, reflecting the KB's focus on symptoms. (b) Medicine entities account for 50% of all entities, and 64.4% of them are TCM (traditional Chinese medicine)
Entity evaluation of different data sources
| Entity type | Precision | Family doctor | JIANKE | 120ask | QQYY | 39Health | 99Health | fh21 | PCbaby | Baidu Baike | Hudong Baike | Chinese Wikipedia | EMRs | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symptom | 0.981 | 4775 | 7748 | 7610 | 4123 | 6659 | 5745 | 9780 | 1787 | 2932 | 997 | 393 | 2376 | 26821 |
| Disease | 0.967 | 9998 | 8004 | 7138 | 2546 | 7778 | 344 | 3991 | 1389 | 5688 | 4184 | 587 | – | 32956 |
| Medicine | 0.983 | 893 | 1281 | 3271 | 2175 | 6325 | 1423 | 5121 | 879 | 22152 | 27469 | 365 | – | 67712 |
| Department | 0.976 | 31 | – | 55 | 12 | 37 | 31 | – | – | 192 | 125 | 53 | – | 292 |
| Examination | 0.783 | – | 1403 | 302 | – | 2909 | – | – | – | 230 | 301 | 39 | – | 7704 |
| Aggregatedᵃ | 0.938 | 15697 | 18436 | 18376 | 8856 | 23708 | 7543 | 18892 | 4055 | 31194 | 33076 | 1437 | 2376 | 135485 |

Columns from Family doctor through Result give the numbers of harvested entities.
ᵃ Precision values are averaged and numbers of harvested entities are summed
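The footnote's aggregation rule can be checked against the table. Using only the Precision and Result columns, an unweighted mean of the five precision values reproduces 0.938 and the sum reproduces 135485. The same totals put medicine at about 50% of harvested entities, in line with the figure quoted for Fig. 5(b).

```python
# (precision, Result) pairs copied from the entity evaluation table.
rows = {
    "Symptom": (0.981, 26821),
    "Disease": (0.967, 32956),
    "Medicine": (0.983, 67712),
    "Department": (0.976, 292),
    "Examination": (0.783, 7704),
}

# Unweighted mean of precision, as the footnote describes ("averaged").
avg_precision = sum(p for p, _ in rows.values()) / len(rows)
# Harvested entities are summed.
total = sum(n for _, n in rows.values())
# Share of each entity type among all harvested entities.
shares = {t: n / total for t, (_, n) in rows.items()}

print(f"{avg_precision:.3f}")        # 0.938
print(total)                         # 135485
print(f"{shares['Medicine']:.1%}")   # 50.0%
```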
Fig. 6 Data Distribution of the Linked Results. (a) Semantic type distribution of the linked concepts in UMLS; (b) property distribution of the linked symptoms in our KB; (c) property distribution of the linked concepts in UMLS