| Literature DB >> 20089162 |
Yaqiang Wang, Zhonghua Yu, Yongguang Jiang, Kaikuo Xu, Xia Chen.
Abstract
BACKGROUND: In recent years, Data Mining technology has been applied more than ever before in the field of traditional Chinese medicine (TCM) to discover regularities from the experience accumulated over the past thousands of years in China. Electronic medical records (clinical records) of TCM contain a larger amount of information than the well-structured prescription data extracted manually from TCM literature, including information related to the medical treatment process, and could therefore be an important source for discovering valuable regularities of TCM. However, they are collected by TCM doctors on a day-to-day basis without the support of an authoritative editorial board, and owing to the differing experience and backgrounds of TCM doctors, the same concept may be described in several different terms. Therefore, clinical records of TCM cannot be used directly for Data Mining and Knowledge Discovery. This paper focuses on the phenomenon of "one symptom with different names" and investigates a series of metrics for automatically normalizing symptom names in clinical records of TCM.
Year: 2010 PMID: 20089162 PMCID: PMC3098075 DOI: 10.1186/1471-2105-11-40
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
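The normalization task described in the abstract amounts to mapping variant symptom names onto canonical ones via a similarity metric and a threshold. A minimal illustrative sketch follows, using a Jaccard coefficient over character sets as the literal similarity; the metric choice and the threshold value here are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch: normalizing variant symptom names by literal
# similarity. Jaccard over character sets is one simple literal metric;
# the threshold 0.5 is an arbitrary placeholder, not the paper's value.

def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard coefficient of the character sets of two strings."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def normalize(name: str, standard_names: list[str],
              threshold: float = 0.5) -> str:
    """Map a symptom name to the most similar standard name if the best
    candidate exceeds the threshold; otherwise keep the name unchanged."""
    best = max(standard_names, key=lambda s: jaccard_similarity(name, s))
    return best if jaccard_similarity(name, best) >= threshold else name
```

For example, `normalize("headaches", ["headache", "fever"])` maps the variant to `"headache"`, while a name with no sufficiently similar candidate is left as-is.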
Figure 1 Examples of datasets (SJZSTCMD, CRD, EVALDATA) used in experiments.
Figure 2 Comparison of precisions, recalls and F-Measures obtained by different literal similarity metrics under different thresholds.
Figure 3 Comparison of precisions, recalls and F-Measures obtained by different remedy-based similarity metrics under different thresholds.
Figure 4 Comparison of precisions, recalls and F-Measures obtained by different hybrid similarity metrics with different weights (α, β).
Weights (α, β) that yield the optimal results for the hybrid similarity metrics.
| Hybrid Similarity Metric | Weights (α, β) |
|---|---|
| Set+JD | (0.1, 0.9) |
| Set+JWD | (0.1, 0.9) |
| Set+SWD | (0.1, 0.9); (0.2, 0.8); (0.3, 0.7) |
| Set+SWGD | (0.1, 0.9); (0.2, 0.8); (0.3, 0.7) |
| TFIDF+VSM+JD | (0.1, 0.9) |
| TFIDF+VSM+JWD | (0.1, 0.9) |
| TFIDF+VSM+SWD | (0.1, 0.9); (0.2, 0.8) |
| TFIDF+VSM+SWGD | (0.1, 0.9); (0.2, 0.8) |
| SimRank+JD | (0.1, 0.9); (0.2, 0.8); (0.3, 0.7); (0.4, 0.6); (0.5, 0.5); (0.6, 0.4) |
| SimRank+JWD | (0.1, 0.9); (0.2, 0.8); (0.3, 0.7); (0.4, 0.6); (0.5, 0.5); (0.6, 0.4) |
| SimRank+SWD | (0.1, 0.9); (0.2, 0.8); (0.3, 0.7); (0.4, 0.6); (0.5, 0.5); (0.6, 0.4); (0.7, 0.3); (0.8, 0.2) |
| SimRank+SWGD | (0.1, 0.9); (0.2, 0.8); (0.3, 0.7); (0.4, 0.6); (0.5, 0.5); (0.6, 0.4); (0.7, 0.3); (0.8, 0.2) |
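The table above pairs each hybrid metric with the weight settings (α, β) that optimized its results. A hybrid metric of this form can be sketched as a convex combination of two component scores; note that which weight applies to the remedy-based component versus the literal component is an assumption about the paper's notation, and the component metrics in the usage example are toy placeholders, not the paper's JD/SimRank definitions.

```python
# Hedged sketch: a hybrid similarity as a weighted combination of a
# remedy-based score and a literal score, with alpha + beta = 1.
from typing import Callable

Sim = Callable[[str, str], float]

def hybrid_similarity(a: str, b: str,
                      remedy_sim: Sim, literal_sim: Sim,
                      alpha: float, beta: float) -> float:
    """Convex combination of two component similarity scores."""
    assert abs(alpha + beta - 1.0) < 1e-9, "weights must sum to 1"
    return alpha * remedy_sim(a, b) + beta * literal_sim(a, b)

# Toy component metrics for demonstration only:
literal = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
remedy = lambda a, b: 1.0 if a[0] == b[0] else 0.0  # dummy stand-in
score = hybrid_similarity("cough", "coughing", remedy, literal, 0.1, 0.9)
```

Under this reading, a setting like (0.1, 0.9) means the second component dominates the combined score.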
Figure 5 Comparison of precisions, recalls and F-Measures obtained by JD and its corresponding hybrid similarity metrics under different thresholds.
Figure 6 Comparison of precisions, recalls and F-Measures obtained by JWD and its corresponding hybrid similarity metrics under different thresholds.
Figure 7 Comparison of precisions, recalls and F-Measures obtained by SWD and its corresponding hybrid similarity metrics under different thresholds.
Figure 8 Comparison of precisions, recalls and F-Measures obtained by SWGD and its corresponding hybrid similarity metrics under different thresholds.