Nicolas Fiorini, Sylvie Ranwez, Jacky Montmain, Vincent Ranwez.
Abstract
BACKGROUND: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. To keep such applications up to date, new entities need to be annotated frequently to enrich the corpus. However, this task is time-consuming and requires a high level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage of the features of the document.
Year: 2015 PMID: 25887746 PMCID: PMC4367850 DOI: 10.1186/s12859-015-0513-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Illustration of the USI annotation process according to four variants. During a first phase (A), documents similar to the document d to be annotated are identified. Here, this is done using PMRA*. The set of the k nearest neighbours (k-NNs) of d, denoted K, is identified from this ordered list. It supports the computation of the annotation for d. Here k=3. This selection may be done by taking only the k top-ranked documents obtained by PMRA* (B top) or after interaction with the user on a semantic map (B bottom). The set of candidate concepts for annotating d is obtained by taking the concepts that annotate at least one document (A0) or at least two documents (A0+) of K. This candidate set is then processed to find the medial annotation of K, which is proposed to annotate d (C).
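The candidate-set step described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `top_k` and `candidate_concepts`, the `min_support` parameter, and the toy documents and concepts are all hypothetical. It assumes the ranked list of similar documents (for example from PMRA*) is already available, and it does not cover the computation of the medial annotation (C).

```python
from collections import Counter


def top_k(ranked_doc_ids, k=3):
    """Keep the k top-ranked neighbours (the 'B top' variant in the caption)."""
    return ranked_doc_ids[:k]


def candidate_concepts(neighbour_annotations, min_support=1):
    """Return the concepts that annotate at least `min_support` neighbours.

    min_support=1 corresponds to A0 (plain union),
    min_support=2 corresponds to A0+.
    """
    counts = Counter()
    for concepts in neighbour_annotations:
        counts.update(set(concepts))  # count each concept once per document
    return {concept for concept, n in counts.items() if n >= min_support}


# Hypothetical toy data: a ranked list of similar documents and their annotations.
ranked = ["doc4", "doc9", "doc2", "doc7"]
annotations = {
    "doc4": {"Neoplasms", "Humans"},
    "doc9": {"Humans", "Mice"},
    "doc2": {"Humans", "Neoplasms", "Apoptosis"},
    "doc7": {"Zebrafish"},
}

K = top_k(ranked, k=3)
a0 = candidate_concepts([annotations[d] for d in K], min_support=1)
a0_plus = candidate_concepts([annotations[d] for d in K], min_support=2)
```

With this toy data, `a0` contains every concept seen in the three neighbours, while `a0_plus` keeps only the concepts shared by at least two of them, which gives a smaller candidate set.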
Figure 2. Impact of the variation of k with the _ set-up. Computing times are expressed in milliseconds. The highest values of the F-score and the semantic score are obtained with 10≤k≤20. Increasing k beyond 20 would only increase the computation time without improving the results.
F-score, semantic score and processing time for different methods with k=20
| Method | F-score | Semantic score | Processing time |
|---|---|---|---|
| PMRA* + MetaMap + Clustering (MTI) [ | 0.398 | 0.68 | N/A |
| PMRA* + LTR [ | 0.467 | 0.768 | 0.169 |
| PMRA* + algorithm implemented in Equation 5 | 0.474 | 0.785 | 0.791 |
| PMRA* + | 0.474 | 0.785 | 0.015 |
| PMRA* + | 0.521 | 0.776 | 0.003 |
| PMRA* + | 0.509 | 0.807 | 0.014 |
| PMRA* + | 0.546 | 0.802 | 0.004 |
Note that PMRA* is never included in the processing time, since it has already been computed in the benchmark dataset. USI running times were measured on a Linux machine with a 2.7 GHz processor and 16 GB of RAM, whereas LTR running times, kindly provided by Huang et al., were measured on a somewhat comparable configuration but on a different machine.
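For readers unfamiliar with the F-score column, the sketch below shows one way to compute it for a single document. It assumes the standard set-based definition (harmonic mean of precision and recall between the proposed and the reference annotations); the text above does not state the exact definition or how scores are averaged over documents, so this is an assumption. The function name `f_score` and the example concepts are hypothetical, and the semantic score is not covered.

```python
def f_score(predicted, reference):
    """Harmonic mean of precision and recall between two concept sets."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0  # also covers empty predicted or reference sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of 3 proposed concepts are correct,
# and 2 of 4 reference concepts are recovered.
score = f_score({"Humans", "Neoplasms", "Mice"},
                {"Humans", "Neoplasms", "Apoptosis", "Female"})
```

Here precision is 2/3 and recall is 1/2, so `score` equals 4/7.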