K Bretonnel Cohen, Lawrence Hunter.
Abstract
BACKGROUND: Propositional representations of biomedical knowledge are a critical component of most aspects of semantic mining in biomedicine. However, the proper set of propositions has yet to be determined. Recently, the PASBio project proposed a set of propositions and argument structures for biomedical verbs. This initial set of representations presents an opportunity for evaluating the suitability of predicate-argument structures as a scheme for representing verbal semantics in the biomedical domain. Here, we quantitatively evaluate several dimensions of the initial PASBio propositional structure repository.
Year: 2006 PMID: 17134478 PMCID: PMC1764449 DOI: 10.1186/1471-2105-7-S3-S5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Representation of express in PASBio 1.0. The first three lines give the argument structure. The last three lines give the three examples that violate the θ-criterion: underlined phrases are mapped to Arg3.
| Arg | mnemonic |
| named entity being expressed (gene or gene products) | |
| property of the existing named entity | |
| location referring to organelle, cell or tissue | |
| Example number | Example text |
Arguments of transcribe in PASBio 1.0.
| causer, agent (Comment: protein) | |
| entity transcribed (Comment: gene, DNA) | |
| transcription site (Comment: promoter) | |
| entity after transcription | |
| location as organ or tissue | |
Arity of PASBio predicates. The column headed 2 lists all predicates with two arguments, the column headed 5 lists all predicates with five arguments, etc.
| 2 | 3 | 4 | 5 |
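The arity grouping described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code: the `frames` dictionary and the `group_by_arity` helper are hypothetical names, and the only data used are the argument descriptions for express and transcribe listed earlier on this page (argument labels are omitted because they are not recoverable here).

```python
from collections import defaultdict

# Hypothetical in-memory form of two PASBio 1.0 frames. The argument
# descriptions are copied from the tables above; labels are omitted.
frames = {
    "express": [
        "named entity being expressed (gene or gene products)",
        "property of the existing named entity",
        "location referring to organelle, cell or tissue",
    ],
    "transcribe": [
        "causer, agent (Comment: protein)",
        "entity transcribed (Comment: gene, DNA)",
        "transcription site (Comment: promoter)",
        "entity after transcription",
        "location as organ or tissue",
    ],
}


def group_by_arity(frames):
    """Map each arity (number of arguments) to the predicates that have it."""
    groups = defaultdict(list)
    for predicate, arguments in frames.items():
        groups[len(arguments)].append(predicate)
    # Sort by arity, and sort predicate names within each column.
    return {arity: sorted(preds) for arity, preds in sorted(groups.items())}


arity_table = group_by_arity(frames)
print(arity_table)  # {3: ['express'], 5: ['transcribe']}
```

With the full repository loaded, each key of `arity_table` would correspond to one column of the table.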
Overlap between PASBio and the corpora. For each corpus, we give the percentage of verb tokens that could be accounted for by the PASBio verbs. The Verb tokens column gives the number of tokens covered by PASBio/the total number of verb tokens in the corpus. The Verb types column gives the number of types covered by PASBio/the total number of verb types in the corpus. See the text for why the numerator in the latter is not always 29.
| Corpus | Verb tokens | Verb types |
| BioIE (both) | 8.8% (1,509/17,186) | 3.2% (28/871) |
| BioIE-P450 | 12.1% (1,148/9,455) | 3.7% (24/649) |
| BioIE-Oncology | 4.9% (379/7,731) | 4.3% (26/601) |
| GENIA | 8.5% (4,416/51,879) | 2.6% (28/1,077) |
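The token and type coverage figures above are simple ratios, which can be sketched as below. This is a minimal sketch under stated assumptions: the `coverage` function, the toy corpus, and the toy lexicon are hypothetical, and verbs are assumed to be already lemmatized so that matching is by exact string. The toy lexicon contains one verb that never occurs in the toy corpus, which shows one way the type numerator can be smaller than the lexicon size.

```python
from collections import Counter


def coverage(corpus_verbs, lexicon):
    """Return (token coverage, type coverage) of a verb lexicon over a corpus.

    corpus_verbs: iterable of verb lemmas, one per verb token (assumed input).
    lexicon: set of verb lemmas, e.g. the verbs represented in a repository.
    """
    counts = Counter(corpus_verbs)
    total_tokens = sum(counts.values())
    total_types = len(counts)
    covered_tokens = sum(n for verb, n in counts.items() if verb in lexicon)
    covered_types = sum(1 for verb in counts if verb in lexicon)
    return covered_tokens / total_tokens, covered_types / total_types


# Toy data, invented purely for illustration.
toy_corpus = ["express", "bind", "express", "inhibit", "transcribe", "bind"]
toy_lexicon = {"express", "transcribe", "splice"}  # "splice" is absent from the corpus

token_cov, type_cov = coverage(toy_corpus, toy_lexicon)
print(token_cov, type_cov)  # 0.5 0.5

# Reproducing one reported percentage from its published counts (BioIE-P450 tokens).
p450_token_pct = round(100 * 1148 / 9455, 1)
print(p450_token_pct)  # 12.1
```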
Verb tokens covered by the 29 most frequent verbs in each corpus. These counts reflect filtering some non-biomedical verbs, such as be. Compare these data to those in Table 3.
| Corpus | Percentage | Tokens |
| BioIE (both) | 23.8% | 4,088/17,186 |
| BioIE-P450 | 29.2% | 2,757/9,455 |
| BioIE-Oncology | 21.7% | 1,675/7,731 |
| GENIA | 29.6% | 15,363/51,879 |
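The comparison baseline in this table, the share of verb tokens covered by the k most frequent verbs after filtering, can be sketched as follows. The `top_k_coverage` function, the stoplist, and the toy data are hypothetical. One assumption to note: the denominator here is the count of all verb tokens, including filtered verbs such as be, which matches the totals shown in the table but is not confirmed by the text on this page.

```python
from collections import Counter


def top_k_coverage(corpus_verbs, k, stoplist=frozenset()):
    """Fraction of all verb tokens accounted for by the k most frequent
    verbs, after excluding verbs in the stoplist from the ranking."""
    counts = Counter(corpus_verbs)
    # Assumption: filtered verbs still count toward the total token count.
    total_tokens = sum(counts.values())
    candidates = Counter({v: n for v, n in counts.items() if v not in stoplist})
    covered = sum(n for _, n in candidates.most_common(k))
    return covered / total_tokens


# Toy data, invented purely for illustration.
toy = ["be", "be", "be", "bind", "bind", "express", "inhibit", "express", "bind", "be"]
top2 = top_k_coverage(toy, k=2, stoplist={"be"})
print(top2)  # 0.5

# Reproducing one reported percentage from its published counts (BioIE, both).
bioie_pct = round(100 * 4088 / 17186, 1)
print(bioie_pct)  # 23.8
```

Setting k to the size of a hand-built lexicon gives an upper bound on the token coverage that any lexicon of that size could reach on the same corpus.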