Yu-Ching Fang, Hsuan-Cheng Huang, Hsin-Hsi Chen, Hsueh-Fen Juan.
Abstract
BACKGROUND: Traditional Chinese Medicine (TCM), regarded as a complementary and alternative medical system in Western countries, has been used to treat various diseases for thousands of years in East Asian countries. In recent years, many herbal medicines have been found to exert a variety of effects by regulating the expression of a wide range of genes or the activities of proteins. As available TCM data continue to accumulate rapidly, there is an urgent need to explore these resources systematically so that the large volume of literature can be used effectively.
Year: 2008 PMID: 18854039 PMCID: PMC2584015 DOI: 10.1186/1472-6882-8-58
Source DB: PubMed Journal: BMC Complement Altern Med ISSN: 1472-6882 Impact factor: 3.659
Figure 1. The simplified relational scheme of TCMGeneDIT. Each gray box represents an entity, with its major attributes shown as ovals. For instance, each TCM herb may be associated with one or many genes that are involved in several signaling pathways and have many interacting partners. These associations may be related to the therapeutic mechanisms for certain diseases and could be supported by scientific evidence.
Figure 2. The text mining approach and information integration for TCMGeneDIT. A literature corpus about TCMs was collected from PubMed and used for entity annotation and information extraction. Annotated documents were mined using hypothesis testing and collocation analysis to discover entity associations. In parallel, the raw corpus was pre-processed with public tools, and several rules were applied to the processed sentences to extract relations between effectors and effects. The constructed effect set was used to filter candidate relations and for literature annotation. Protein-protein interactions and biological pathways were integrated into the database, and disease candidate genes from PharmGKB were used for transitive inference. Users can access the database via the web interface. Thick arrows indicate the basic workflow of TCMGeneDIT.
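The hypothesis-testing step named in the Figure 2 caption can be illustrated with a standard collocation t-test. The following is a minimal sketch, not the authors' implementation: it assumes that co-occurrence is counted over text units such as sentences or abstracts, that the variance is approximated by the observed co-occurrence rate, and that the minimum confidence levels of 95% and 97.5% map to one-sided critical values of 1.645 and 1.960. The names `collocation_t_score`, `is_associated`, and `CRITICAL_T` are hypothetical.

```python
import math

# Assumption: one-sided critical values of the standard normal distribution,
# used as a large-sample approximation to the t distribution.
CRITICAL_T = {0.95: 1.645, 0.975: 1.960}


def collocation_t_score(n_ab: int, n_a: int, n_b: int, n_total: int) -> float:
    """Return the t-score for entities a and b co-occurring in text units.

    n_ab    -- number of units mentioning both a and b
    n_a     -- number of units mentioning a
    n_b     -- number of units mentioning b
    n_total -- total number of units in the corpus
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_ab == 0:
        # No co-occurrence at all: treated as "no evidence" in this sketch.
        return 0.0
    observed = n_ab / n_total                      # sample mean of co-occurrence
    expected = (n_a / n_total) * (n_b / n_total)   # mean under independence
    # The variance p(1 - p) is approximated by p because p is small.
    return (observed - expected) / math.sqrt(observed / n_total)


def is_associated(t_score: float, confidence: float = 0.95) -> bool:
    """Accept the association if t exceeds the assumed critical value."""
    return t_score > CRITICAL_T[confidence]
```

With hypothetical counts, `collocation_t_score(30, 50, 60, 1000)` gives a t-score of about 4.93, which would pass both assumed thresholds, whereas a pair that co-occurs exactly as often as independence predicts scores about 0.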
Figure 3. TCM and gene associations and their visual representations. (a) Associations between Ganoderma lucidum and genes obtained from text mining. Detailed information about each TCM and gene can be accessed by following its link. The literature evidence supporting an association is available through the link showing the number of papers. Confidence thresholds can be selected. Users with domain knowledge can recommend the associations they consider correct. Moreover, TCM and gene associations can be discovered by transitive inference. (b) TCM and gene association graph. TCMs and genes are represented by green and yellow nodes, respectively. The edges between them are colored according to their t values (see the text). The number on each edge indicates how many papers may support the association. (c) TCM, gene and disease association graph. Red nodes represent diseases.
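The transitive inference mentioned in the captions of Figures 2 and 3 can be sketched as a join between mined TCM-disease associations and disease candidate genes. This is an illustrative sketch under stated assumptions rather than the database's actual procedure: the function name `infer_tcm_gene`, the data layout, and the placeholder identifiers (`herb_A`, `disease_X`, `GENE1`, and so on) are all hypothetical.

```python
from collections import defaultdict


def infer_tcm_gene(tcm_disease, disease_genes):
    """Infer (TCM, gene) pairs linked through a shared disease.

    tcm_disease   -- iterable of (tcm, disease) pairs, e.g. from text mining
    disease_genes -- dict mapping a disease to its set of candidate genes
    Returns a dict mapping (tcm, gene) to the set of diseases that link them.
    """
    inferred = defaultdict(set)
    for tcm, disease in tcm_disease:
        # Diseases without known candidate genes contribute nothing.
        for gene in disease_genes.get(disease, ()):
            inferred[(tcm, gene)].add(disease)
    return dict(inferred)


# Hypothetical placeholder data, not taken from the database.
example_links = infer_tcm_gene(
    [("herb_A", "disease_X"), ("herb_A", "disease_Y"), ("herb_B", "disease_Z")],
    {"disease_X": {"GENE1", "GENE2"}, "disease_Y": {"GENE1"}},
)
```

Keeping the linking diseases alongside each inferred pair makes it possible to show which disease supports an inferred TCM and gene association.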
Evaluation of the associations between TCMs and various entities from collocation analysis
| | (TCM, Gene) | (TCM, Disease) | (TCM, Gene, Disease) | (TCM, Effect) |
| --- | --- | --- | --- | --- |
| Precision (%) | 92.8 | 86.0 | 87.0 | 96.5 |
| Number of associations | 666 | 642 | 131 | 570 |
| Minimum confidence (%) | 95 | 95 | 95 | 97.5 |
| Number of TCMs contributed to the associations | 48 | 47 | 23 | 44 |