Yasunori Yamamoto, Atsuko Yamaguchi, Akinori Yonezawa.
Abstract
BACKGROUND: There is a growing need for efficient and integrated access to databases provided by diverse institutions. Using a linked data design pattern allows the diverse data on the Internet to be linked effectively and accessed efficiently by computers. Previously, we developed the Allie database, which stores pairs of abbreviations and long forms (LFs, or expanded forms) used in the life sciences. LFs define the semantics of abbreviations, and Allie provides a Web-based search service for researchers to look up the LF of an unfamiliar abbreviation. This service has two problems. First, it does not display each LF's definition, which could help the user to disambiguate and learn the abbreviations more easily. Second, there are too many LFs for us to prepare a full dictionary from scratch. On the other hand, DBpedia has made the contents of Wikipedia available in the Resource Description Framework (RDF), and it is expected to contain a significant number of entries corresponding to LFs. Therefore, linking the Allie LFs to DBpedia entries may solve both of Allie's problems. This requires a method that is capable of matching large numbers of string pairs within a reasonable period of time because Allie and DBpedia are frequently updated.
Year: 2013 PMID: 23497538 PMCID: PMC3621846 DOI: 10.1186/2041-1480-4-8
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Link results. Link results for five methods/conditions: exact match, the key collision methods of fingerprint and bi-gram fingerprint, and the combined method with and without using the UMLS resources. The link ratio is the number of LFs with a link to their corresponding DBpedia titles divided by the total number of LFs in each bin.
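The caption names two key collision methods, fingerprint and bi-gram fingerprint, without spelling them out. The following is a minimal sketch of how such keys are commonly built, in the style of the key collision clustering found in OpenRefine. The normalisation steps, the function names, and the `link_by_key` helper are assumptions for illustration and may differ from what the authors actually ran.

```python
import string
import unicodedata

# Translation table that deletes ASCII punctuation (assumed normalisation step).
_DROP_PUNCT = str.maketrans("", "", string.punctuation)


def _normalise(text: str) -> str:
    """Lowercase, strip accents, and drop ASCII punctuation."""
    text = unicodedata.normalize("NFKD", text.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return text.translate(_DROP_PUNCT)


def fingerprint(text: str) -> str:
    """Key made of the sorted, de-duplicated whitespace tokens."""
    tokens = _normalise(text).split()
    return " ".join(sorted(set(tokens)))


def ngram_fingerprint(text: str, n: int = 2) -> str:
    """Key made of the sorted, de-duplicated character n-grams (n=2: bi-grams)."""
    chars = "".join(_normalise(text).split())  # whitespace removed
    grams = {chars[i:i + n] for i in range(len(chars) - n + 1)}
    return "".join(sorted(grams))


def link_by_key(long_forms, titles, key=fingerprint):
    """Hypothetical helper: link each long form to the titles sharing its key."""
    index = {}
    for title in titles:
        index.setdefault(key(title), []).append(title)
    return {lf: index.get(key(lf), []) for lf in long_forms}
```

Because the titles are indexed by key once, each LF is linked with a single dictionary lookup rather than a pairwise comparison, which is what makes key collision attractive for large, frequently updated string sets. In this sketch the bi-gram key is the looser of the two: "T cell receptor" and "T-cell receptor" collide under `ngram_fingerprint` but not under `fingerprint`.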
F-measures for each bin and method
| Bin | Exact match | Fingerprint | Bi-gram fingerprint | Combined |
| 1 | 0.80 | 0.98 | 0.99 | 0.99 |
| 2 | 0.83 | 0.92 | 0.94 | 0.96 |
| 3 | 0.86 | 0.97 | 0.98 | 0.99 |
| 4 | 0.80 | 0.96 | 0.95 | 0.99 |
| 5 | 0.85 | 0.96 | 0.98 | 0.98 |
| 6 | 0.88 | 0.94 | 0.97 | 0.97 |
| 7 | 0.88 | 0.97 | 0.99 | 0.99 |
| 8 | 0.92 | 0.96 | 0.97 | 0.97 |
| 9 | 0.86 | 0.97 | 0.97 | 0.98 |
| 10 | 0.93 | 0.98 | 0.98 | 1.00 |
| 11 | 0.90 | 0.97 | 0.98 | 0.99 |
| 12 | 0.91 | 0.98 | 0.98 | 0.99 |
Each F-measure expresses the accuracy of the string match. Note that all the data were obtained using the UMLS-based term normalisation pre-processes, and the F-measures do not indicate how many LFs have links to their corresponding DBpedia titles.
The numbers of false negatives (FNs), false positives (FPs), and true positives (TPs) for each method
| Method | FNs | FPs | TPs |
| Exact match | 221 | 2 | 773 |
| Fingerprint | 59 | 9 | 928 |
| Bi-gram fingerprint | 40 | 9 | 945 |
| Combined | 28 | 4 | 967 |
Note that all of the data have been obtained using the UMLS-based term normalisation pre-processes.
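As a rough check on how these counts relate to an F-measure, the sketch below computes precision, recall, and their harmonic mean for each method. It assumes the three numeric columns follow the order given in the table title (false negatives, false positives, true positives) and that the counts are pooled over the whole evaluation sample, so the results are overall figures and are not expected to reproduce the per-bin values.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Counts copied from the table above, assumed column order: (FN, FP, TP).
counts = {
    "Exact match": (221, 2, 773),
    "Fingerprint": (59, 9, 928),
    "Bi-gram fingerprint": (40, 9, 945),
    "Combined": (28, 4, 967),
}

overall = {
    method: f_measure(tp=tp, fp=fp, fn=fn)
    for method, (fn, fp, tp) in counts.items()
}

for method, score in overall.items():
    print(f"{method}: F = {score:.3f}")
```

Under these assumptions exact match has near-perfect precision but low recall, which is why its F-measure trails the key collision methods despite producing the fewest false positives.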
Distribution of LF appearances in MEDLINE
| Bin | Frequency range | Number of LFs |
| 1 | 10 - 49 | 69 416 |
| 2 | 50 - 99 | 10 293 |
| 3 | 100 - 199 | 5 691 |
| 4 | 200 - 299 | 2 030 |
| 5 | 300 - 399 | 1 033 |
| 6 | 400 - 499 | 639 |
| 7 | 500 - 599 | 444 |
| 8 | 600 - 699 | 329 |
| 9 | 700 - 799 | 233 |
| 10 | 800 - 899 | 185 |
| 11 | 900 - 999 | 154 |
| 12 | >= 1 000 | 1 126 |
The frequency range indicates that the number of appearances of an LF in MEDLINE falls within that range.
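The bins above, together with the link ratio defined in the Figure 1 caption, can be sketched as follows. The bin boundaries are taken from the frequency ranges in the table; the function names and the `(appearances, has_link)` record layout are hypothetical, and LFs with fewer than 10 appearances are assumed to fall outside every bin.

```python
import bisect
from collections import defaultdict

# Lower bounds of bins 1..12, read off the frequency ranges in the table.
LOWER_BOUNDS = [10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]


def frequency_bin(appearances: int):
    """Return the bin number (1..12), or None if below the first range."""
    idx = bisect.bisect_right(LOWER_BOUNDS, appearances)
    return idx or None


def link_ratio_by_bin(records):
    """Linked LFs divided by all LFs, per bin.

    `records` is an iterable of (appearances, has_link) pairs
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(int)
    linked = defaultdict(int)
    for appearances, has_link in records:
        b = frequency_bin(appearances)
        if b is None:
            continue  # outside the binned ranges
        totals[b] += 1
        linked[b] += bool(has_link)
    return {b: linked[b] / totals[b] for b in sorted(totals)}
```

For example, `frequency_bin(49)` falls in bin 1 and `frequency_bin(1000)` in bin 12, matching the table's boundaries.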