Yi-Tsung Tang, Shuo-Jang Li, Hung-Yu Kao, Shaw-Jenq Tsai, Hei-Chia Wang.
Abstract
BACKGROUND: Gene expression is usually described in the literature in the form “transcription factor X regulates target gene Y”. Previous studies have discovered gene regulation relationships from the biomedical literature, but most of them require human annotators to build the training dataset. Moreover, the amount of textual knowledge recorded in the biomedical literature is growing rapidly, which makes creating patterns manually from the literature increasingly difficult. There is therefore an increasing need to automate the process of establishing patterns. METHODOLOGY/PRINCIPAL
Year: 2011 PMID: 21573008 PMCID: PMC3091867 DOI: 10.1371/journal.pone.0019633
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The overall architecture of AutoPat.
The pattern templates used for pattern generation.
| ID | Pattern Templates |
| 1 | |
| 2 | |
| 3 | |
Top 5 unsupervised patterns and examples of seed patterns of regulation relationships.
| Top N | Unsupervised Patterns | #Occurrence |
| 1 | [TF/TG].*activation.*of.* [TF/TG] | 1452 |
| 2 | [TF/TG].*induction.*of.* [TF/TG] | 1244 |
| 3 | [TF/TG].*activate.* [TF/TG] | 611 |
| 4 | [TF/TG].*regulation.*of.* [TF/TG] | 589 |
| 5 | [TF/TG].*binding.* [TF/TG] | 543 |
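The patterns in the table above read as regular expressions with two `[TF/TG]` slots. The following is a minimal sketch of how such a pattern could be applied to a sentence, not AutoPat's actual implementation. It assumes each slot is filled by any name from a lexicon of transcription factor and target gene names; the function names, the lexicon, and the example sentences (`TF_X`, `GENE_Y`) are hypothetical.

```python
import re

# The five unsupervised patterns exactly as listed in the table.
PATTERNS = [
    "[TF/TG].*activation.*of.* [TF/TG]",
    "[TF/TG].*induction.*of.* [TF/TG]",
    "[TF/TG].*activate.* [TF/TG]",
    "[TF/TG].*regulation.*of.* [TF/TG]",
    "[TF/TG].*binding.* [TF/TG]",
]

SLOT = "[TF/TG]"  # placeholder for a transcription factor or target gene name


def compile_pattern(template, names):
    """Replace every slot with an alternation over the (hypothetical) lexicon."""
    # Longest names first so a longer name is preferred over its own prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "(?:" + "|".join(re.escape(name) for name in ordered) + ")"
    # The text between slots is already regex syntax, so it is kept verbatim.
    return re.compile(alternation.join(template.split(SLOT)))


def matching_patterns(sentence, names):
    """Return the 1-based ranks (Top N) of the patterns found in the sentence."""
    return [
        rank
        for rank, template in enumerate(PATTERNS, start=1)
        if compile_pattern(template, names).search(sentence)
    ]


if __name__ == "__main__":
    lexicon = {"TF_X", "GENE_Y"}  # made-up names for illustration only
    print(matching_patterns("TF_X mediates activation of GENE_Y in these cells.", lexicon))
    print(matching_patterns("TF_X can activate GENE_Y.", lexicon))
```

Because the patterns use greedy `.*` wildcards, a match only says that both names and the keyword co-occur in the stated order. It does not determine which name is the regulator.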
Figure 2. Example of combined weight of extracted sentences.
Figure 3. The frequency distribution of unsupervised patterns.
Figure 4. The extraction precision of different thresholds.
Figure 5. The precision of the Top N ranking results for unsupervised patterns.
Figure 6. Examples of the homonym and abbreviation issues.
The performance comparison with Textpresso.
| Method | Precision | Recall | F-measure |
| Textpresso-Regulation | 42.5% | 22.9% | 29.7% |
| Textpresso-Spatial | 45.5% | 6.2% | 10.9% |
| Textpresso-Action | 32.0% | 9.2% | 14.3% |
| Textpresso-Join | 27.7% | 26.4% | 27.0% |
| AutoPat | | | |
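The record does not define the F-measure. The sketch below assumes the balanced form, F = 2PR / (P + R), and recomputes it from the precision and recall of the Textpresso rows. Under that assumption the recomputed values agree with the reported ones to within 0.1 percentage points. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) taken from the table above.
REPORTED = {
    "Textpresso-Regulation": (42.5, 22.9, 29.7),
    "Textpresso-Spatial": (45.5, 6.2, 10.9),
    "Textpresso-Action": (32.0, 9.2, 14.3),
    "Textpresso-Join": (27.7, 26.4, 27.0),
}

if __name__ == "__main__":
    for method, (p, r, reported) in REPORTED.items():
        print(f"{method}: recomputed {f_measure(p, r):.2f}, reported {reported}")
```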
Top-K Precision of AutoPat.
| Top-K | AP1 | E2F1 | Average |
| 10 | 100% | 70% | 85.0% |
| 20 | 85% | 85% | 85.0% |
| 30 | 70% | 70% | 70.0% |
| 40 | 70% | 57% | 63.5% |
| 50 | 68% | 58% | 63.0% |
| R-Precision | 57.3% | 60.5% | 58.9% |
Note that the precision rates of the baseline “TF-KV-TG” for AP1 and E2F1 are 48.1% and 53.3%, respectively.
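For readers unfamiliar with the ranking metrics in this table, here is a minimal sketch using their standard definitions, which the record itself does not spell out. Precision at K is the fraction of correct items among the top K ranked results, and R-precision is precision at K = R, where R is the total number of correct items. The relevance list below is made up for illustration.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results that are correct.

    `relevance` is a ranked list of booleans (best-ranked first).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for hit in relevance[:k] if hit) / k


def r_precision(relevance, total_relevant):
    """Precision at rank R, where R is the total number of correct items."""
    return precision_at_k(relevance, total_relevant)


if __name__ == "__main__":
    ranked = [True, True, False, True, False]  # hypothetical judgments
    print(precision_at_k(ranked, 4))
    print(r_precision(ranked, 3))
    # The "Average" column is the mean over the two TFs, e.g. for Top-10:
    print((1.00 + 0.70) / 2)
```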
The overall performance comparison.
| Testing Data | Method | Seed | Filter | Precision | Recall | F-measure |
| AP1 (270) | Saric's method | | | 54.3% | 25.0% | 34.2% |
| AP1 (270) | AutoPat | MIX | None | 49.2% | 70.8% | 58.1% |
| AP1 (270) | AutoPat | MIX | AP1 | 46.0% | 64.4% | 53.6% |
| AP1 (270) | AutoPat | MIX | E2F1 | 49.2% | 64.4% | 55.8% |
| AP1 (270) | AutoPat | MIX | HIF1 | 48.0% | 69.1% | 56.7% |
| AP1 (270) | AutoPat | HIF1 | None | 45.7% | 60.7% | 52.2% |
| AP1 (270) | AutoPat | HIF1 | AP1 | 48.3% | 64.0% | 55.1% |
| AP1 (270) | AutoPat | HIF1 | E2F1 | 52.1% | 69.7% | 59.6% |
| AP1 (270) | AutoPat | HIF1 | HIF1 | 47.8% | 85.4% | 61.3% |
| E2F1 (279) | Saric's method | | | 58.5% | 57.9% | 58.2% |
| E2F1 (279) | AutoPat | MIX | None | 60.5% | 70.5% | 65.1% |
| E2F1 (279) | AutoPat | MIX | AP1 | 61.7% | 72.4% | 66.6% |
| E2F1 (279) | AutoPat | MIX | E2F1 | 62.6% | 66.3% | 64.4% |
| E2F1 (279) | AutoPat | MIX | HIF1 | 58.9% | 70.6% | 64.3% |
| E2F1 (279) | AutoPat | HIF1 | None | 57.9% | 62.9% | 60.3% |
| E2F1 (279) | AutoPat | HIF1 | AP1 | 58.5% | 59.6% | 59.0% |
| E2F1 (279) | AutoPat | HIF1 | E2F1 | 56.4% | 54.8% | 55.6% |
| E2F1 (279) | AutoPat | HIF1 | HIF1 | 53.7% | 84.6% | 65.7% |
| HIF1 (619) | Saric's method | | | 50.8% | 24.4% | 33.0% |
| HIF1 (619) | AutoPat | MIX | None | 55.3% | 45.6% | 49.9% |
| HIF1 (619) | AutoPat | MIX | AP1 | 52.8% | 45.0% | 48.6% |
| HIF1 (619) | AutoPat | MIX | E2F1 | 54.3% | 42.2% | 47.5% |
| HIF1 (619) | AutoPat | MIX | HIF1 | 50.6% | 49.0% | 49.7% |
| HIF1 (619) | AutoPat | HIF1 | None | 52.7% | 42.6% | 47.1% |
| HIF1 (619) | AutoPat | HIF1 | AP1 | 56.3% | 38.5% | 45.7% |
| HIF1 (619) | AutoPat | HIF1 | E2F1 | 59.6% | 38.5% | 46.8% |
| HIF1 (619) | AutoPat | HIF1 | HIF1 | 52.7% | 57.7% | 55.1% |
The TFs in the “Filter” column are used to select the related abstracts from the training corpus.
The list of extraction results of HIF-1 TF pathway in PID.
| TF | Status | Method | Target Gene |
| HIF-1 | Found | AutoPat ( | |
| HIF-1 | Found | Saric's method ( | |
| HIF-1 | Not Found | not in abstract ( | |
| HIF-1 | Not Found | in abstract, AutoPat ( | |
| HIF-1 | Not Found | in abstract, Saric's method (1 | |
| E2F1 | Found | AutoPat ( | |
| E2F1 | Found | Saric's method ( | |
| E2F1 | Not Found | not in abstract ( | |
| E2F1 | Not Found | in abstract, AutoPat ( | |
| E2F1 | Not Found | in abstract, Saric's method ( | |
| AP1 | Found | AutoPat ( | |
| AP1 | Found | Saric's method ( | |
| AP1 | Not Found | not in abstract ( | |
| AP1 | Not Found | in abstract, AutoPat ( | |
| AP1 | Not Found | in abstract, Saric's method ( | |
P: precision; R: recall. A bold-faced target gene is one that only one of the two methods extracted.
Figure 7. The global network of HIF-1 TF.
The high frequency of TF-TG relationships in HIF-1 global network.
| Query TF | Middle Nodes | TGs | Relation Type |
| HIF-1 | | | Direct |
| HIF-1 | | | Indirect |
| HIF-1 | | | Indirect |
| HIF-1 | | | Indirect |
| HIF-1 | | | Indirect |
Figure 8. An example of an indirect relationship between HIF-1 and p53.
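The distinction between direct and indirect relationships, with middle nodes in between, can be illustrated on a small directed graph. The record does not describe how AutoPat builds or searches the global network, so the sketch below uses a plain breadth-first search as a stand-in, and every node name in it is a hypothetical placeholder rather than a real gene.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over directed (regulator, regulated) edges."""
    graph = {}
    for regulator, regulated in edges:
        graph.setdefault(regulator, []).append(regulated)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # target is not reachable from source


def classify(edges, query_tf, target_gene):
    """Return (relation type, middle nodes), or None if no path exists."""
    path = shortest_path(edges, query_tf, target_gene)
    if path is None or len(path) < 2:
        return None
    middle_nodes = path[1:-1]
    return ("Direct" if not middle_nodes else "Indirect", middle_nodes)


if __name__ == "__main__":
    # Placeholder edges for illustration only.
    edges = [("TF_Q", "TG_1"), ("TF_Q", "MID_1"), ("MID_1", "TG_2")]
    print(classify(edges, "TF_Q", "TG_1"))
    print(classify(edges, "TF_Q", "TG_2"))
```

Breadth-first search returns a path with the fewest middle nodes, so a pair connected both directly and through intermediates is reported as direct.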