Huaiyu Wan1, Marie-Francine Moens2, Walter Luyten3, Xuezhong Zhou4, Qiaozhu Mei5, Lu Liu6, Jie Tang7. 1. Department of Computer Science and Technology, Tsinghua University, Beijing, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China. 2. Department of Computer Science, KU Leuven, Belgium. 3. Department of Biology, KU Leuven, Belgium. 4. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China. 5. School of Information, University of Michigan, Ann Arbor, Michigan, USA. 6. TangoMe Inc, Mountain View, CA, USA. 7. Department of Computer Science and Technology, Tsinghua University, Beijing, China; jietang@tsinghua.edu.cn.
Abstract
OBJECTIVE: Traditional Chinese medicine (TCM) is a unique and complex medical system that has developed over thousands of years. This article studies the problem of automatically extracting meaningful relations between entities from TCM literature, for the purposes of assisting clinical treatment or poly-pharmacology research and promoting the understanding of TCM in Western countries.
METHODS: Instead of extracting each relation separately from a single sentence or document, we propose to extract multiple types of relations (eg, herb-syndrome, herb-disease, formula-syndrome, formula-disease, and syndrome-disease relations) collectively and globally from the entire corpus of TCM literature, from the perspective of network mining. We first constructed heterogeneous entity networks from the TCM literature, in which each edge is a candidate relation, and then used a heterogeneous factor graph model (HFGM) to simultaneously infer the existence of all the edges. We also employed a semi-supervised learning algorithm to estimate the model's parameters.
RESULTS: We applied our method to extract relations from a large dataset consisting of more than 100,000 TCM article abstracts. Our results show that the HFGM performed significantly better than a traditional support vector machine (SVM) classifier at extracting all types of relations from TCM literature, increasing the average precision by 11.09%, the recall by 13.83%, and the F1-measure by 12.47% for the different types of relations.
CONCLUSION: This study exploits the power of collective inference and proposes an HFGM based on heterogeneous entity networks, which significantly improved our ability to extract relations from TCM literature.
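The network-construction step described in METHODS can be illustrated with a minimal sketch. It assumes that candidate edges come from typed entities co-occurring in the same abstract, which the abstract itself does not state; the function name `build_candidate_edges` and the placeholder entity names are hypothetical, and the HFGM inference and semi-supervised parameter estimation are not shown.

```python
from collections import Counter
from itertools import combinations

# Relation types listed in the abstract, as unordered pairs of entity types.
RELATION_TYPES = {
    frozenset(("herb", "syndrome")),
    frozenset(("herb", "disease")),
    frozenset(("formula", "syndrome")),
    frozenset(("formula", "disease")),
    frozenset(("syndrome", "disease")),
}


def build_candidate_edges(documents):
    """Collect candidate relation edges across a whole corpus.

    `documents` is an iterable of sets of (entity_name, entity_type) tuples,
    one set per abstract. Assumption (not from the paper): two entities form
    a candidate edge when they co-occur in the same abstract and their types
    match one of RELATION_TYPES. Returns a Counter mapping each unordered
    entity pair to the number of abstracts in which it co-occurs.
    """
    edges = Counter()
    for entities in documents:
        for a, b in combinations(sorted(entities), 2):
            if frozenset((a[1], b[1])) in RELATION_TYPES:
                edges[frozenset((a, b))] += 1
    return edges


# Toy corpus with placeholder names; these are not real TCM entities.
docs = [
    {("herb_A", "herb"), ("syndrome_X", "syndrome"), ("disease_P", "disease")},
    {("herb_A", "herb"), ("syndrome_X", "syndrome")},
    {("herb_A", "herb"), ("herb_B", "herb")},  # herb-herb is not a listed type
]
edges = build_candidate_edges(docs)
```

In this sketch each entry of `edges` is only a candidate; deciding which candidates are true relations is the job of the model, which the abstract describes as inferring all edges simultaneously rather than classifying each one in isolation.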