Yifan Peng, Cecilia Arighi, Cathy H Wu, K Vijay-Shanker.
Abstract
There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern the detection of protein-protein interactions (PPIs). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in full-text articles. By adopting the BioC format, the output of the system can be added to the biocuration pipeline with little integration effort. A distinctive feature of our PPI system is that it uses the extended dependency graph (EDG), an intermediate level of representation that attempts to abstract away syntactic variations in text. As a result, we are able to use only a limited set of rules to extract PPI pairs from sentences, and additional rules to detect further passages for PPI pairs. For evaluation, we used the 95 articles that were provided for the BioC annotation task. We retrieved the unique PPIs from the BioGRID database for these articles and show that our system achieves a recall of 83.5%. To evaluate the detection of passages with PPIs, we further annotated the Abstract and Results sections of 20 documents from the dataset and show that an f-value of 80.5% was obtained. To evaluate the generalizability of the system, we also conducted experiments on AIMed, a well-known PPI corpus. We achieved an f-value of 76.1% for sentence detection and an f-value of 64.7% for unique PPI detection.
Database URL: http://proteininformationresource.org/iprolink/corpora
Year: 2016 PMID: 27170286 PMCID: PMC4915133 DOI: 10.1093/database/baw072
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Sample EDGs with (a) active, (b) passive, and (c) nominalized forms of the verb “activate”.
Figure 2. Flowchart of our system in the BioC subtask 4 pipeline.
Constructs with examples
| No. | Type | Explanation | Example |
|---|---|---|---|
| 1 | Active form | Verbs in an active voice | |
| 2 | Passive form | Verbs in a passive voice | |
| 3 | Nominalization | Nominalized verbs | |
| 4 | Adjective | Verbs used as an adjective | |
| 5 | Full relative clause | Relative clauses introduced by relative pronouns, such as “which”, “who”, and “that”. | |
| 6 | Reduced relative clause | Relative clauses that start with a gerund or past participle and have no overt subject. | Structure of |
| 7 | Coordination | Structures that link two or more items (conjuncts) of syntactically equal status. | |
| 8 | Null argument | When the argument is omitted, but implied | |
| 9 | Is-A | Argument X is a hyponym of argument Y, if X is a subtype of Y, or when an instance of X refers to a concept Y | |
| 10 | Appositive | Constructs of two noun phrases next to each other, typically separated by comma and referring to the same entity | |
| 11 | Member-collection | Constructs that link a generic reference to a group of entities that are specified in other places in text. | The basic cleft of |
| 12 | Part-whole | Constructs in which an argument extracted for a trigger is a part of the target entity. | |
| 13 | Combination | Any combination of above types | |
Entities in the extracted relations are marked in bold font.
Figure 3. Sample EDGs with coordination and apposition.
Recall on 95 annotated documents
| | TP | FN | Recall (%) |
|---|---|---|---|
| Unique PPI | 263 | 52 | 83.5 |
Recall on 20 in-house annotated documents (only Abstract and Results sections)
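| Section | TP | FP | FN | Precision (%) | Recall (%) | F-value (%) |
|---|---|---|---|---|---|---|
| Abstract | 20 | 5 | 4 | 80.0 | 83.3 | 81.6 |
| Results | 216 | 79 | 26 | 73.2 | 89.3 | 80.4 |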
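The percentage columns in this table can be recomputed from the raw counts. The following is a minimal sketch, assuming the three count columns are true positives, false positives, and false negatives (an interpretation inferred from the arithmetic, not stated in the table itself); the function name `prf` is illustrative.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f-value) as percentages, rounded to one decimal.

    Assumes tp, fp, fn are true-positive, false-positive, and
    false-negative counts; this column reading is an assumption.
    """
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    # Balanced F-measure: harmonic mean of precision and recall,
    # written directly in terms of the counts.
    f_value = 100.0 * 2 * tp / (2 * tp + fp + fn)
    return round(precision, 1), round(recall, 1), round(f_value, 1)


abstract_scores = prf(20, 5, 4)    # counts from the Abstract row
results_scores = prf(216, 79, 26)  # counts from the Results row
```

Under this reading, both rows reproduce the reported precision, recall, and f-value figures to one decimal place.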
Evaluation results on AIMed
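| | TP | FP | FN | Precision (%) | Recall (%) | F-value (%) |
|---|---|---|---|---|---|---|
| Sentence detection | 370 | 29 | 197 | 92.7 | 64.6 | 76.1 |
| PPI pairs | 557 | 165 | 443 | 77.2 | 55.7 | 64.7 |
| Rule 1a | 458 | 116 | | | | |
| Rule 1b | 86 | 46 | | | | |
| Rule 2 | 13 | 3 | | | | |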
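The three rule rows appear to break down the "PPI pairs" row by the rule that produced each extraction. A short consistency check, assuming the two numbers in each rule row are true positives and false positives (an assumption; the variable names are illustrative):

```python
# (TP, FP) per rule, copied from the table above.
# Reading the two columns as TP and FP is an assumption.
rule_counts = {
    "Rule 1a": (458, 116),
    "Rule 1b": (86, 46),
    "Rule 2": (13, 3),
}

# Sum each column across the rules.
total_tp = sum(tp for tp, _ in rule_counts.values())
total_fp = sum(fp for _, fp in rule_counts.values())
```

The column sums match the first two counts of the "PPI pairs" row (557 and 165), which supports this reading.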