Hagit Shatkay, Fengxia Pan, Andrey Rzhetsky, W John Wilbur.
Abstract
MOTIVATION: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no 'average biologist' client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if they first identify text regions rich in the type of scientific content that interests the user, retrieve documents that have many such regions, and focus fact extraction on these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.
Year: 2008 PMID: 18718948 PMCID: PMC2530883 DOI: 10.1093/bioinformatics/btn381
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Two annotated sentences. The first constitutes a single fragment. The second is fragmented where the statement's polarity changes from positive (P) to negative (N).
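The fragmentation rule in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: it assumes a sentence arrives pre-split into clauses that each carry one value per dimension, and the names `Annotation` and `to_fragments`, the example sentences, and the `"none"` trend code are all hypothetical. The focus, certainty and polarity codes follow the tables in this section.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    focus: str      # "S" (science), "M" (methodology) or "G" (generic)
    evidence: str   # e.g. "E3"; other evidence codes are assumed
    certainty: int  # 0 (lowest) to 3 (highest)
    polarity: str   # "P" (positive) or "N" (negative)
    trend: str      # "+", "-" or "none" (assumed code for no trend)


Clause = Tuple[str, Annotation]


def to_fragments(clauses: List[Clause]) -> List[Clause]:
    """Merge consecutive clauses whose annotations are identical.

    A new fragment starts only where at least one dimension changes.
    """
    fragments = []
    for annotation, group in groupby(clauses, key=lambda pair: pair[1]):
        text = " ".join(clause_text for clause_text, _ in group)
        fragments.append((text, annotation))
    return fragments


positive = Annotation("S", "E3", 3, "P", "none")
negative = Annotation("S", "E3", 3, "N", "none")

# Hypothetical sentences: the first keeps one annotation throughout,
# the second switches polarity part-way through.
single = [("Protein A binds B", positive), ("and activates C.", positive)]
mixed = [("Protein A binds B", positive), ("but does not bind C.", negative)]

print(len(to_fragments(single)))  # one fragment
print(len(to_fragments(mixed)))   # two fragments, split at the polarity change
```

Grouping on the full annotation tuple means a change along any dimension, not only polarity, would start a new fragment.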
The number of sentences and of fragments, for which there is complete agreement in annotation, along each dimension
| | Foc. & Ev. | Focus | Evidence | Certainty | Polarity | Trend |
|---|---|---|---|---|---|---|
| Dataset name | | | | | | |
| Sentences | 1977 | 4068 | 2964 | 5644 | 6430 | 5907 |
| Fragments | 2109 | 4447 | 3133 | 5992 | 6945 | 6330 |
| No. of terms selected | F: 600 +E: 200 | 1500 | 1500 | 100 | 600 | 100 |
The ‘Foc. & Ev.’ column counts the sentences and fragments with agreement along both the Focus and Evidence dimensions; these are used for training a classifier that exploits the dependency between the two. The bottom row lists the optimal number of terms selected to represent text along each of the respective dimensions.
Examples of standard stop-words that serve as stop-words along some dimensions (denoted ✓) but are regarded as content-bearing along the other dimensions (denoted 0). The dimensions are listed as F (focus), P (polarity), C (certainty), E (evidence) and T (trend).
| Word | F | P | C | E | T | Word | F | P | C | E | T | Word | F | P | C | E | T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | ✓ | ✓ | ✓ | ✓ | ✓ | may | ✓ | 0 | 0 | 0 | ✓ | hence | ✓ | ✓ | 0 | 0 | ✓ |
| always | ✓ | 0 | 0 | 0 | ✓ | not | ✓ | 0 | 0 | 0 | ✓ | rather | ✓ | 0 | ✓ | ✓ | ✓ |
| perhaps | ✓ | ✓ | 0 | 0 | ✓ | their | ✓ | ✓ | ✓ | 0 | ✓ | using | 0 | ✓ | ✓ | 0 | ✓ |
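The table above implies that stop-word removal depends on the dimension being classified. The following is a minimal sketch of that idea, covering only the nine words listed in the table; the `STOP_DIMS` mapping, the `content_terms` function and the example tokens are hypothetical names chosen for illustration, and words absent from the table are simply kept.

```python
# Dimensions along which each word is treated as a stop-word,
# transcribed from the table (check mark = stop-word, 0 = content-bearing).
STOP_DIMS = {
    "a": "FPCET",
    "may": "FT",
    "hence": "FPT",
    "always": "FT",
    "not": "FT",
    "rather": "FCET",
    "perhaps": "FPT",
    "their": "FPCT",
    "using": "PCT",
}


def content_terms(tokens, dimension):
    """Drop tokens that are stop-words along the given dimension.

    dimension is one of "F", "P", "C", "E", "T".
    """
    return [t for t in tokens if dimension not in STOP_DIMS.get(t.lower(), "")]


tokens = ["protein", "may", "not", "bind", "using", "a", "linker"]

print(content_terms(tokens, "C"))  # keeps "may" and "not", drops "using" and "a"
print(content_terms(tokens, "F"))  # keeps "using", drops "may", "not" and "a"
```

Along the certainty dimension, hedging words such as "may" survive filtering, while along the focus dimension a method cue such as "using" survives instead.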
Fig. 2. Distribution of annotation values along each of the dimensions in the dataset. The number of fragments in each class is shown next to the class tag. The dataset shown covers only annotations on which all three annotators were in agreement. Notably, along the certainty, polarity and trend dimensions, almost all fragments are annotated with the highest certainty (3), positive polarity and no trend (neither increase nor decrease in measurement), respectively. The focus and evidence distributions are also skewed, with over half of the fragments (2853) discussing science, and over a third (1400) providing experimental evidence (E3).
The results of SVM classification in a 5-fold cross-validation experiment
| Category | No. of Fragments | Precision | Recall | F-measure | Accuracy |
|---|---|---|---|---|---|
| Evidence (FRAG_E) | | | | | |
| | 993 | 0.85 | 0.93 | 0.89 | |
| | 59 | 0.77 | 0.46 | 0.57 | |
| | 681 | 0.94 | 0.94 | 0.94 | |
| E3 | 1400 | 0.93 | 0.89 | 0.91 | |
| Average | | | | | |
| Weighted average | | | | | |
| Focus (FRAG_F) | | | | | |
| Science | 2858 | 0.91 | 0.98 | 0.94 | |
| Methodology | 1406 | 0.95 | 0.86 | 0.91 | |
| Generic | 183 | 0.94 | 0.41 | 0.57 | |
| Average | | 0.93 | 0.75 | 0.83 | |
| Weighted average | | 0.92 | 0.92 | 0.92 | 0.92 |
| Certainty (FRAG_C) | | | | | |
| 0 | 32 | 0.71 | 0.53 | 0.61 | |
| 1 | 83 | 0.95 | 0.63 | 0.75 | |
| 2 | 38 | 0.68 | 0.34 | 0.46 | |
| 3 | 5832 | 0.99 | 1.00 | 0.99 | |
| Average | | | | | |
| Weighted average | | | | | |
| Polarity (FRAG_P) | | | | | |
| P | 6498 | 1.0 | 1.0 | 1.0 | |
| N | 447 | 0.96 | 0.93 | 0.95 | |
| Average | | | | | |
| Weighted average | | | | | |
| Trend (FRAG_T) | | | | | |
| No Trend | 6156 | 0.98 | 0.99 | 0.99 | |
| + | 83 | 0.64 | 0.39 | 0.48 | |
| − | 91 | 0.66 | 0.27 | 0.39 | |
| Average | | | | | |
| Weighted average | | | | | |
Within each of the five dimensions, we show for each of the categories the Precision, the Recall, the F-measure, along with the global measures: average, weighted average and overall accuracy of the classification. The number of fragments in each category within the dataset is also listed to emphasize the large variance in the number of samples among the different categories.
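The relationship between the per-category and global measures can be checked against the Focus rows, the only dimension whose global values survive in the table. The sketch below is an assumption-laden reconstruction rather than the authors' evaluation code: the averaging formulas are not stated in this section, so it assumes that the average is the unweighted mean over categories, that the weighted average weights each category by its fragment count, and that the global F-measure is computed from the averaged precision and recall. The function names are hypothetical. Because the inputs are already rounded to two decimals, recomputed values can differ from the published ones in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (category, number of fragments, precision, recall) from the Focus rows.
FOCUS = [
    ("Science", 2858, 0.91, 0.98),
    ("Methodology", 1406, 0.95, 0.86),
    ("Generic", 183, 0.94, 0.41),
]


def macro_average(rows):
    """Unweighted mean of precision and recall over categories."""
    p = sum(row[2] for row in rows) / len(rows)
    r = sum(row[3] for row in rows) / len(rows)
    return p, r, f_measure(p, r)


def weighted_average(rows):
    """Mean of precision and recall weighted by fragments per category."""
    total = sum(row[1] for row in rows)
    p = sum(row[1] * row[2] for row in rows) / total
    r = sum(row[1] * row[3] for row in rows) / total
    return p, r, f_measure(p, r)


print([round(v, 2) for v in macro_average(FOCUS)])     # [0.93, 0.75, 0.83]
print([round(v, 2) for v in weighted_average(FOCUS)])  # [0.92, 0.92, 0.92]
```

Under these assumptions both global rows of the Focus block are reproduced. The gap between the two recall values (0.75 against 0.92) reflects the small Generic category, which counts equally in the unweighted average but contributes only 183 of 4447 fragments to the weighted one.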
The performance of the Maximum Entropy classifiers along the Focus and Evidence dimensions, classified jointly, on the dataset Frag_FE
| Category | No. of Fragments | Precision (ME_1) | Precision (ME_2) | Recall (ME_1) | Recall (ME_2) | F-measure (ME_1) | F-measure (ME_2) | Accuracy (ME_1) | Accuracy (ME_2) |
|---|---|---|---|---|---|---|---|---|---|
| FOCUS | | | | | | | | | |
| S | 1420 | 0.91 | 0.94 | 0.96 | 0.97 | 0.94 | 0.96 | | |
| M | 620 | 0.90 | 0.92 | 0.84 | 0.90 | 0.87 | 0.91 | | |
| G | 69 | 0.77 | 0.83 | 0.31 | 0.38 | 0.44 | 0.52 | | |
| Average | | | | | | | | | |
| Weighted average | | 0.90 | | 0.90 | | 0.90 | | 0.90 | |
| EVIDENCE | | | | | | | | | |
| | 696 | 0.82 | 0.80 | 0.91 | 0.91 | 0.86 | 0.85 | | |
| | 53 | 0.65 | 0.70 | 0.28 | 0.40 | 0.39 | 0.51 | | |
| | 493 | 0.90 | 0.93 | 0.87 | 0.84 | 0.89 | 0.88 | | |
| | 867 | 0.91 | 0.92 | 0.87 | 0.89 | 0.89 | 0.91 | | |
| Average | | 0.82 | | 0.74 | | 0.78 | | | |
| Weighted average | | 0.87 | | 0.87 | 0.87 | 0.87 | 0.87 | 0.87 | 0.87 |
The dataset contains 1977 sentences (2109 fragments). ME_1 is a basic maximum entropy classifier, while ME_2 incorporates constraints reflecting the correlation between the annotations along the two dimensions. The higher values on the global results (averages and accuracies) are shown in boldface.