W John Wilbur, Andrey Rzhetsky, Hagit Shatkay.
Abstract
BACKGROUND: While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined.
Year: 2006 PMID: 16867190 PMCID: PMC1559725 DOI: 10.1186/1471-2105-7-356
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Annotation characteristics. Each column represents an annotator: Aut1–3 are the authors of this paper and Oth1–9 are the other annotators. The first five rows give, for each annotator, the number of sentences (out of 101) that the annotator broke into one, two, three, four, or five fragments. The values in the remaining rows are counts normalized by the total number of fragments created by that annotator.
| Dimension | Value | Aut1 | Aut2 | Aut3 | Oth1 | Oth2 | Oth3 | Oth4 | Oth5 | Oth6 | Oth7 | Oth8 | Oth9 |
| Number of fragments | 1 | 88 | 80 | 86 | 40 | 60 | 85 | 81 | 40 | 47 | 81 | 68 | 83 |
| | 2 | 10 | 18 | 12 | 42 | 31 | 16 | 17 | 44 | 44 | 19 | 30 | 16 |
| | 3 | 2 | 3 | 3 | 18 | 7 | 0 | 3 | 13 | 8 | 1 | 3 | 2 |
| | 4 | 1 | 0 | 0 | 1 | 2 | 0 | 0 | 4 | 2 | 0 | 0 | 0 |
| | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Focus | | 0.09 | 0.15 | 0.13 | 0.27 | 0.29 | 0.16 | 0.22 | 0.15 | 0.2 | 0.08 | 0.13 | 0.07 |
| | | 0.24 | 0.15 | 0.23 | 0.15 | 0.15 | 0.18 | 0.18 | 0.09 | 0.15 | 0.36 | 0.23 | 0.15 |
| | | 0.71 | 0.8 | 0.82 | 0.63 | 0.63 | 0.8 | 0.63 | 0.75 | 0.65 | 0.58 | 0.71 | 0.79 |
| Polarity | | 0.94 | 0.94 | 0.93 | 0.81 | 0.92 | 0.92 | 0.92 | 0.88 | 0.81 | 0.94 | 0.92 | 0.83 |
| | | 0.06 | 0.06 | 0.07 | 0.19 | 0.08 | 0.08 | 0.08 | 0.12 | 0.19 | 0.05 | 0.08 | 0.17 |
| Certainty | | 0.02 | 0.02 | 0.04 | 0.01 | 0 | 0.02 | 0.03 | 0.04 | 0.01 | 0.03 | 0.01 | 0.08 |
| | | 0.07 | 0.05 | 0.13 | 0.04 | 0.1 | 0.05 | 0.07 | 0.02 | 0.04 | 0.11 | 0.02 | 0.03 |
| | | 0.11 | 0.10 | 0.13 | 0.04 | 0.22 | 0.05 | 0.10 | 0.01 | 0.05 | 0.18 | 0.12 | 0.03 |
| | | 0.81 | 0.83 | 0.69 | 0.91 | 0.68 | 0.88 | 0.81 | 0.92 | 0.9 | 0.68 | 0.86 | 0.87 |
| Evidence | | 0.31 | 0.34 | 0.26 | 0.75 | 0.38 | 0.27 | 0.27 | 0.7 | 0.83 | 0.07 | 0.31 | 0.42 |
| | | 0.11 | 0.09 | 0.09 | 0.03 | 0.07 | 0.09 | 0.14 | 0.03 | 0 | 0.43 | 0.15 | 0.10 |
| | | 0.19 | 0.21 | 0.24 | 0.14 | 0.11 | 0.23 | 0.23 | 0.2 | 0.13 | 0.16 | 0.3 | 0.31 |
| | | 0.41 | 0.36 | 0.43 | 0.08 | 0.44 | 0.4 | 0.4 | 0.07 | 0.05 | 0.34 | 0.28 | 0.19 |
| Trend | + | 0.13 | 0.03 | 0.13 | 0.12 | 0.12 | 0.03 | 0.07 | 0.03 | 0.10 | 0.05 | 0.11 | 0.03 |
| | - | 0.19 | 0.06 | 0.17 | 0.19 | 0.12 | 0.07 | 0.13 | 0.04 | 0.16 | 0.06 | 0.08 | 0.11 |
| | | 0.68 | 0.91 | 0.70 | 0.69 | 0.76 | 0.90 | 0.80 | 0.93 | 0.74 | 0.89 | 0.81 | 0.86 |
Pairwise agreement among the authors. For each pair of authors, the values in rows 3–7 (Focus through Trend) were calculated as (# of annotation agreements)/(# of fragments mutually annotated), counting only those sentences on which the pair agreed on the number of fragments. Row 2 gives the proportion of sentences on which the pair agreed on the number of fragments.
| Dimension | Aut1–Aut2 | Aut2–Aut3 | Aut1–Aut3 |
| # of fragments | 0.851 | 0.881 | 0.891 |
| Focus | 0.803 | 0.731 | 0.733 |
| Polarity | 1.000 | 0.989 | 0.989 |
| Certainty | 0.861 | 0.787 | 0.800 |
| Evidence | 0.757 | 0.787 | 0.844 |
| Trend | 0.803 | 0.775 | 0.866 |
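The agreement ratio described in the caption above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the data layout (a dict from sentence id to a list of per-fragment labels for one dimension) and the function name `pairwise_agreement` are hypothetical, and the sketch assumes that when two annotators split a sentence into the same number of fragments, the fragments correspond position by position.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sentence id -> one label per fragment, for a single dimension.
Annotation = Dict[int, List[str]]


def pairwise_agreement(a: Annotation, b: Annotation) -> Tuple[float, float]:
    """Return (fragment-count agreement, label agreement) for two annotators."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("no sentences annotated by both annotators")

    # Keep only sentences that both annotators split into the same number of fragments.
    same_split = [s for s in shared if len(a[s]) == len(b[s])]
    split_rate = len(same_split) / len(shared)

    # Assumption: fragments are compared position by position.
    pairs = [(x, y) for s in same_split for x, y in zip(a[s], b[s])]
    if not pairs:
        return split_rate, float("nan")
    label_rate = sum(x == y for x, y in pairs) / len(pairs)
    return split_rate, label_rate


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    aut_x = {1: ["S"], 2: ["S", "M"], 3: ["G"]}
    aut_y = {1: ["S"], 2: ["S", "S"], 3: ["G", "M"]}
    print(pairwise_agreement(aut_x, aut_y))
```

In the toy data, sentence 3 is excluded from the label comparison because the two annotators disagree on its fragment count, which mirrors the restriction stated in the caption.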
Comparison of the majority annotation among Aut1–3 with the majority annotation from Oth1–9. Because the numbers in each row total 101, the number of sentences, they can be read approximately as percentages.
| Dimension | No Aut1–3 Majority | Aut1–3 agrees with Oth1–9 Majority | Aut1–3 disagrees with Oth1–9 Majority |
| # of fragments | 2 | 88 | 11 |
| Focus | 15 | 73 | 13 |
| Polarity | 2 | 86 | 13 |
| Certainty | 7 | 77 | 17 |
| Evidence | 6 | 79 | 16 |
| Trend | 9 | 76 | 16 |
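One way to produce the three-way tally above is sketched below in Python. The source does not say how a majority was determined, so the rules here are assumptions: the Aut1–3 majority is taken to be a label chosen by more than half of the three authors (otherwise the sentence counts as having no author majority), and the Oth1–9 majority is taken to be the most frequent label, with ties broken arbitrarily by `Counter.most_common`. All function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence, Tuple


def strict_majority(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Label chosen by more than half of the annotators, or None if there is none."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def plurality(labels: Sequence[Hashable]) -> Hashable:
    """Most frequent label (assumption: ties are broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]


def compare(author_labels: Sequence[Hashable], other_labels: Sequence[Hashable]) -> str:
    """Classify one sentence, for one dimension, into one of the three columns."""
    aut = strict_majority(author_labels)
    if aut is None:
        return "no_author_majority"
    return "agree" if aut == plurality(other_labels) else "disagree"


def tally(rows: Iterable[Tuple[Sequence[Hashable], Sequence[Hashable]]]) -> Counter:
    """Count sentences per column; rows are (author labels, other labels) pairs."""
    return Counter(compare(authors, others) for authors, others in rows)
```

Applied to all sentences for one dimension, `tally` returns three counts that sum to the number of sentences, which is why each row of the table sums to 101.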
Individual scores of Oth1–9 compared with Aut1–3. Each value is the average per-sentence score, with 1 point awarded for each agreement with any one of Aut1–3. The maximum possible score is 5 and the minimum is 0.
| Annotator | Oth1 | Oth2 | Oth3 | Oth4 | Oth5 | Oth6 | Oth7 | Oth8 | Oth9 |
| Average Score | 2.10 | 2.96 | 4.25 | 4.22 | 2.15 | 2.53 | 3.84 | 3.62 | 3.84 |
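The scoring rule in the caption above can be illustrated with a short Python sketch. It rests on assumptions that the source does not spell out: that the five points correspond to the five dimensions Focus, Polarity, Certainty, Evidence, and Trend, and that each annotator's annotation of a sentence can be represented as one comparable value per dimension. The dictionary layout and the names `sentence_score` and `average_score` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

# Assumption: one point is available per annotation dimension, five in total.
DIMENSIONS = ("focus", "polarity", "certainty", "evidence", "trend")

# Hypothetical layout: one dict per sentence, mapping dimension -> annotation.
SentenceAnnotation = Dict[str, Hashable]


def sentence_score(candidate: SentenceAnnotation,
                   authors: Sequence[SentenceAnnotation]) -> int:
    """One point per dimension on which the candidate matches at least one author."""
    return sum(
        any(candidate[dim] == author[dim] for author in authors)
        for dim in DIMENSIONS
    )


def average_score(candidate: List[SentenceAnnotation],
                  authors_per_sentence: List[Sequence[SentenceAnnotation]]) -> float:
    """Mean sentence score over all sentences (0 is the minimum, 5 the maximum)."""
    if len(candidate) != len(authors_per_sentence):
        raise ValueError("candidate and author annotations must cover the same sentences")
    scores = [sentence_score(c, a) for c, a in zip(candidate, authors_per_sentence)]
    return sum(scores) / len(scores)
```

Passing a single reference annotation per sentence instead of three author annotations turns the same function into a score against one consensus annotation.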
Individual scores for all annotators. Each value is the average per-sentence score, with 1 point awarded for each agreement with the majority annotation determined over the whole set of twelve annotators. The maximum possible score is 5 and the minimum is 0.
| Annotator | Aut1 | Aut2 | Aut3 | Oth1 | Oth2 | Oth3 | Oth4 | Oth5 | Oth6 | Oth7 | Oth8 | Oth9 |
| Average Score | 3.96 | 4.25 | 4.08 | 2.06 | 2.95 | 4.00 | 4.00 | 2.07 | 2.50 | 3.59 | 3.59 | 3.57 |