Richard Tzong-Han Tsai, Po-Ting Lai, Hong-Jie Dai, Chi-Hsin Huang, Yue-Yang Bow, Yen-Ching Chang, Wen-Harn Pan, Wen-Lian Hsu.
Abstract
BACKGROUND: The genetic factors leading to hypertension have been studied extensively, and a large number of research papers have been published on the subject. A primary task for hypertension researchers is locating the key hypertension-related genes reported in abstracts. However, gathering this information with existing tools is not easy: (1) literature searches often return far too many hits to browse; (2) search results do not highlight the hypertension-related genes discovered in each abstract; and (3) although some text-mining services mark up gene names in abstracts, they do not distinguish the key genes investigated in a paper from the other genes mentioned. To make information gathering easier for hypertension researchers, one solution is to extract the key hypertension-related genes from each abstract. Building such a system involves three major tasks: (1) gene and hypertension named entity recognition, (2) section categorization, and (3) gene-hypertension relation extraction.
Year: 2009; PMID: 19958519; PMCID: PMC2788360; DOI: 10.1186/1471-2105-10-S15-S9
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Figure 1. Section categorizer flowchart.
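Figure 1's flowchart is not reproduced here. As a minimal, hypothetical sketch of only the simplest case of section categorization, the code below handles structured abstracts whose segments already carry capitalized headings such as `BACKGROUND:`. The heading-to-section mapping `HEADING_TO_SECTION` (including folding Background into Objective) and the function `split_structured_abstract` are assumptions made for illustration, not the authors' categorizer; abstracts without such headings would need another method, such as a trained sentence classifier.

```python
import re

# Hypothetical mapping from structured-abstract headings to the four section
# types evaluated in this paper. Folding BACKGROUND into Objective is an
# assumption made for this sketch only.
HEADING_TO_SECTION = {
    "BACKGROUND": "Objective",
    "OBJECTIVE": "Objective",
    "OBJECTIVES": "Objective",
    "PURPOSE": "Objective",
    "METHOD": "Method",
    "METHODS": "Method",
    "MATERIALS AND METHODS": "Method",
    "RESULTS": "Results",
    "CONCLUSION": "Conclusion",
    "CONCLUSIONS": "Conclusion",
}

# An all-caps heading (letters and spaces) followed by a colon and whitespace.
HEADING_RE = re.compile(r"\b([A-Z][A-Z ]+):\s")


def split_structured_abstract(text):
    """Split an abstract at its headings and label each segment.

    Returns (section_type, segment_text) pairs. Segments under unknown
    headings get None; any text before the first heading is ignored.
    """
    matches = list(HEADING_RE.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        heading = m.group(1).strip()
        sections.append((HEADING_TO_SECTION.get(heading), text[m.end():end].strip()))
    return sections


example = ("BACKGROUND: Hypertension is common. METHODS: We genotyped REN. "
           "RESULTS: REN was associated with EHT. CONCLUSIONS: REN matters.")
for section, segment in split_structured_abstract(example):
    print(section, "|", segment)
```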
Section categorization performance (P = precision, R = recall, F = F-score; all values in %)
| Section Type | P (%) | R (%) | F (%) |
|---|---|---|---|
| Objective | 98.50 | 99.37 | 98.93 |
| Method | 98.54 | 97.74 | 98.14 |
| Results | 98.69 | 99.22 | 98.96 |
| Conclusion | 99.76 | 98.76 | 99.25 |
| ALL | 98.77 | 98.87 | 98.82 |
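The F column is consistent with the balanced F-score, the harmonic mean F = 2PR / (P + R) of precision and recall. The short check below recomputes it from the rounded P and R values in the table; `f_score` and `TABLE` are names introduced here for illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) copied from the section categorization table, in percent.
TABLE = {
    "Objective": (98.50, 99.37, 98.93),
    "Method": (98.54, 97.74, 98.14),
    "Results": (98.69, 99.22, 98.96),
    "Conclusion": (99.76, 98.76, 99.25),
    "ALL": (98.77, 98.87, 98.82),
}

for section, (p, r, f_reported) in TABLE.items():
    print(f"{section:<10} computed F = {f_score(p, r):.2f}  reported F = {f_reported:.2f}")
```

Objective, Method, and ALL match exactly. Results and Conclusion differ by 0.01 (98.95 and 99.26 versus the reported 98.96 and 99.25), presumably because the reported F values were computed from unrounded precision and recall.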
Figure 2. Illustration of the parse-tree path NP↑PP↑NP↑S↓VP↓PP↓NP.
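The path in Figure 2 reads as a chain of constituent labels climbing (↑) from one node to the lowest common ancestor and then descending (↓) to another node, presumably between the gene and hypertension mentions. The sketch below computes such a path string on a toy tree built to reproduce the caption's path. The `Node` class and `tree_path` function are hypothetical; a real system would take the tree from a syntactic parser.

```python
class Node:
    """Minimal parse-tree node that only records its label and parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent

    def child(self, label):
        """Create and return a child node under this one."""
        return Node(label, parent=self)


def chain_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def tree_path(src, dst):
    """Label path from src up to the lowest common ancestor, then down to dst."""
    up = chain_to_root(src)
    down = chain_to_root(dst)
    on_dst_chain = set(down)  # plain objects hash by identity
    lca = next(n for n in up if n in on_dst_chain)
    path = "↑".join(n.label for n in up[: up.index(lca) + 1])
    descent = list(reversed(down[: down.index(lca)]))
    if descent:
        path += "↓" + "↓".join(n.label for n in descent)
    return path


# Toy tree: S -> NP -> PP -> NP (first mention) and S -> VP -> PP -> NP (second mention).
root = Node("S")
gene_np = root.child("NP").child("PP").child("NP")
disease_np = root.child("VP").child("PP").child("NP")
print(tree_path(gene_np, disease_np))  # NP↑PP↑NP↑S↓VP↓PP↓NP
```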
Figure 3. Example of paired similar sentences containing hypertension-gene (H-G) pairs.
Example of an annotated sentence
| Sentence | Key relation? |
|---|---|
| In conclusion, <GENE> REN 10631A alleles</GENE> are significantly associated with <DISEASE> EHT</DISEASE> in the Emirati population. | Yes |
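The annotated sentence marks entities with inline `<GENE>` and `<DISEASE>` tags, and the last column labels whether the gene-disease pair is a key relation. Below is a hypothetical sketch of turning such a sentence into candidate pairs for a classifier to label; `extract_entities` and `candidate_pairs` are illustrative names, and pairing every tagged gene with every tagged disease in the sentence is an assumption, not necessarily how the authors generated candidates.

```python
import re
from itertools import product

# Matches <GENE>...</GENE> or <DISEASE>...</DISEASE>; \1 forces the closing
# tag to match the opening one, and surrounding whitespace is trimmed.
TAG_RE = re.compile(r"<(GENE|DISEASE)>\s*(.*?)\s*</\1>")


def extract_entities(sentence):
    """Return (entity_type, text) tuples for each inline tag, in order."""
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(sentence)]


def candidate_pairs(sentence):
    """Pair every tagged gene with every tagged disease in the sentence."""
    entities = extract_entities(sentence)
    genes = [text for kind, text in entities if kind == "GENE"]
    diseases = [text for kind, text in entities if kind == "DISEASE"]
    return list(product(genes, diseases))


s = ("In conclusion, <GENE> REN 10631A alleles</GENE> are significantly "
     "associated with <DISEASE> EHT</DISEASE> in the Emirati population.")
print(candidate_pairs(s))
```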
Performance improvement achieved by each feature set (B = baseline features, T = template features, P = positional features; AUC_B is the AUC of the Baseline configuration)
| Config | Baseline features | Template features | Positional features | AUC | | ΔAUC | | AUC > AUC_B? |
|---|---|---|---|---|---|---|---|---|
| Baseline | + | | | 0.4936 | 0.1261 | N/A | N/A | N/A |
| B+T | + | + | | 0.5133 | 0.1057 | 0.0114 | 0.65 | No |
| B+P | + | | + | 0.8140 | 0.087 | 0.3604 | 11.44 | Yes |
| B+P+T | + | + | + | 0.8184 | 0.084 | 0.3783 | 11.75 | Yes |
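The table compares configurations by AUC, but this section does not say which curve the AUC refers to or how it was computed. Assuming it is the area under the ROC curve, the sketch below computes it from classifier scores and gold labels using the equivalent pairwise (Mann-Whitney) formulation; `roc_auc` is an illustrative name. Under that assumption, 0.5 corresponds to random ranking, so the Baseline's 0.4936 would be at chance level, while the configurations with positional features (B+P, B+P+T) exceed 0.81.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Counts, over every (positive, negative) pair, how often the positive
    example scores higher; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four candidate pairs (1 = key relation).
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The quadratic loop keeps the definition visible; for large test sets a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, is the usual choice. The reported ΔAUC values are not simple differences of the listed AUCs (0.5133 - 0.4936 = 0.0197, versus the reported 0.0114 for B+T), so they were presumably computed another way, for example per fold, and this sketch does not attempt to reproduce them.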
Diabetes-gene pair extraction performance
| Config | Precision | Recall | AUC |
|---|---|---|---|
| (1) | 0.3652 | 0.7925 | 0.77 |
| (2) | 0.5679 | 0.8679 | 0.6866 |
| B+P+T | 0.6522 | 0.8491 | 0.8300 |
List of examined genes extracted from real-world data
| PubMed ID | Entrez Gene ID |
|---|---|
| 16380460 | 2200 |
| 16530037 | 11117 |
| 16540569 | 10371 |
| 16685211 | 7222 |
| 16690767 | 6523 |
| 16801480 | 116985 |
| 16915036 | 3481 |
| 17015768 | 51320 |
| 17182005 | 2952 |
| 17250807 | 3606 |
| 17351372 | 7222 |
| 17351372 | 7224 |
| 17921333 | 27302 |
| 17976639 | 7178 |
| 17986358 | 8490 |
| 18067551 | 50848 |
| 18075463 | 2185 |
| 18097620 | 6098 |
| 18156195 | 5594 |
| 18156195 | 3726 |
| 18156195 | 1385 |
| 18158339 | 2697 |
| 18360038 | 8601 |
| 18360038 | 9630 |
| 18398332 | 7293 |
| 18398344 | 8837 |
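Each row of the list pairs an abstract with one extracted gene, so an abstract with several key genes spans several rows (PubMed ID 18156195 has three), and Entrez Gene ID 7222 appears under two different abstracts. The small sketch below regroups the rows into a per-abstract view, closer to the per-abstract output motivated in the abstract; `PAIRS` is copied from the table and `genes_by_abstract` is an illustrative helper. It reports 26 pairs covering 22 abstracts and 25 distinct genes.

```python
from collections import defaultdict

# (PubMed ID, Entrez Gene ID) pairs copied from the table above.
PAIRS = [
    ("16380460", 2200), ("16530037", 11117), ("16540569", 10371),
    ("16685211", 7222), ("16690767", 6523), ("16801480", 116985),
    ("16915036", 3481), ("17015768", 51320), ("17182005", 2952),
    ("17250807", 3606), ("17351372", 7222), ("17351372", 7224),
    ("17921333", 27302), ("17976639", 7178), ("17986358", 8490),
    ("18067551", 50848), ("18075463", 2185), ("18097620", 6098),
    ("18156195", 5594), ("18156195", 3726), ("18156195", 1385),
    ("18158339", 2697), ("18360038", 8601), ("18360038", 9630),
    ("18398332", 7293), ("18398344", 8837),
]


def genes_by_abstract(pairs):
    """Group Entrez Gene IDs by PubMed ID, preserving table order."""
    grouped = defaultdict(list)
    for pmid, gene_id in pairs:
        grouped[pmid].append(gene_id)
    return dict(grouped)


by_pmid = genes_by_abstract(PAIRS)
# Abstracts that contribute more than one key gene.
multi = {pmid: genes for pmid, genes in by_pmid.items() if len(genes) > 1}
print(len(PAIRS), "pairs,", len(by_pmid), "abstracts,",
      len({gene for _, gene in PAIRS}), "distinct genes")
print(multi)
```

PubMed IDs are kept as strings because they are identifiers, not quantities; gene IDs are kept as integers only to mirror the table.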
Figure 4. Section categorization.
Figure 5. The distribution of key hypertension genes in different sections.
Figure 6. An incorrectly sectioned abstract.