William A Baumgartner, K Bretonnel Cohen, Lawrence Hunter.
Abstract
BACKGROUND: Improved evaluation methodologies have been identified as a necessary prerequisite to improving text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. Examples from the biomedical domain demonstrate three things: the framework's extensibility, its ability to uncover system-wide characteristics by analyzing component parts, and its usefulness for facilitating third-party application integration.
Year: 2008 PMID: 18230184 PMCID: PMC2276192 DOI: 10.1186/1747-5333-3-1
Source DB: PubMed Journal: J Biomed Discov Collab ISSN: 1747-5333
Figure 1. GM system evaluation. Evaluation results for nine gene taggers are shown for two of the five corpora used (PennBioIE Oncology, left; Bio1, right). Each tagger was scored with five evaluation metrics: X, Strict (spans must match exactly); S, Sloppy (spans must overlap); L, LeftMatch (span starts must match); R, RightMatch (span ends must match); and E, EitherMatch (span start or end must match). Each graph therefore contains 45 data points (nine taggers × five metrics). Colors distinguish the taggers. Gray contour lines mark constant F-measure, and each line's value is labeled in gray on the right.
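The five span-matching criteria in Figure 1 can be illustrated with a minimal sketch. It assumes that spans are `(start, end)` character offsets with an exclusive end. It also assumes that precision is the fraction of predicted spans matching any gold span and recall is the fraction of gold spans matched by any predicted span. This simplified counting, without one-to-one alignment, is an assumption and not necessarily how the framework scores mentions. All function names are hypothetical. The `recall_on_contour` helper comes from rearranging $F = 2PR/(P+R)$ into $R = FP/(2P - F)$, which gives the points that lie on one gray F-measure contour.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical span representation: (start, end) offsets, end exclusive.
Span = Tuple[int, int]
Matcher = Callable[[Span, Span], bool]


def strict(g: Span, p: Span) -> bool:
    """X: spans must match exactly."""
    return g == p


def sloppy(g: Span, p: Span) -> bool:
    """S: spans must overlap (share at least one character)."""
    return g[0] < p[1] and p[0] < g[1]


def left_match(g: Span, p: Span) -> bool:
    """L: span starts must match."""
    return g[0] == p[0]


def right_match(g: Span, p: Span) -> bool:
    """R: span ends must match."""
    return g[1] == p[1]


def either_match(g: Span, p: Span) -> bool:
    """E: span start or span end must match."""
    return left_match(g, p) or right_match(g, p)


METRICS: Dict[str, Matcher] = {
    "X": strict, "S": sloppy, "L": left_match, "R": right_match, "E": either_match,
}


def evaluate(gold: List[Span], predicted: List[Span],
             match: Matcher) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under one matching criterion.

    Simplifying assumption: a span counts as correct if it matches any span
    on the other side; no one-to-one alignment is enforced.
    """
    tp_pred = sum(1 for p in predicted if any(match(g, p) for g in gold))
    tp_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f


def recall_on_contour(f: float, precision: float) -> float:
    """Recall needed to reach F-measure f at a given precision.

    Rearranged from F = 2PR / (P + R), i.e. R = F * P / (2P - F).
    """
    if 2 * precision <= f:
        raise ValueError("this F-measure is unreachable at the given precision")
    return f * precision / (2 * precision - f)


if __name__ == "__main__":
    gold = [(0, 5), (10, 15)]
    predicted = [(0, 5), (11, 15), (20, 25)]
    for name, matcher in METRICS.items():
        p, r, f = evaluate(gold, predicted, matcher)
        print(f"{name}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this toy example, the prediction `(11, 15)` counts as correct under Sloppy, RightMatch, and EitherMatch but not under Strict or LeftMatch. This shows how the lenient metrics can move a tagger's data point up and to the right in plots like Figure 1.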
Figure 2. GN system evaluation. (A) GN system performance (F-measure) as it relates to combined gene tagger performance. (B) GN system performance for each of the three methods of combining gene tagger output (Overlapping, Consensus, and Consensus followed by Overlapping). (C) GN system performance, highlighting results from the overlapping filter with and without the dictionary-based GM system; data points generated using the other filters are shown in gray. (D) Same as C, but showing the presence or absence of another representative tagger. (E) and (F) GN system performance as it relates to combined gene tagger precision and recall, respectively.