Alex Rudniy, Min Song, James Geller.
Abstract
Duplicate entity detection in biological data is an important research task. In this paper, we propose a novel, context-sensitive Shortest Path Edit Distance (SPED) that extends and supplements our previous work on Markov Random Field-based Edit Distance (MRFED). SPED transforms the edit distance computation into the calculation of the shortest path between two selected vertices of a graph. We produce several modifications of SPED by applying Levenshtein, arithmetic mean, histogram difference and TF-IDF techniques to solve subtasks. We compare the performance of SPED with that of other well-known distance algorithms for biological entity matching. The experimental results show that SPED produces competitive outcomes.
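The idea of casting edit distance as a shortest-path problem can be illustrated with the classic, context-free case. The sketch below is not the SPED algorithm from the paper; it only assumes the standard Levenshtein edit graph, where vertex (i, j) means "the first i characters of a have been aligned with the first j characters of b" and each edge is one edit operation with unit cost (zero for a matching character). The function name and the use of Dijkstra's algorithm are illustrative choices, not taken from the source.

```python
import heapq


def edit_distance_shortest_path(a: str, b: str) -> int:
    """Levenshtein distance computed as a shortest path in the edit graph.

    Illustrative only: unit costs, no context sensitivity (unlike SPED).
    """
    source, target = (0, 0), (len(a), len(b))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist[(i, j)]:
            continue  # stale queue entry, a shorter path was already found
        edges = []
        if i < len(a):
            edges.append(((i + 1, j), 1))  # delete a[i]
        if j < len(b):
            edges.append(((i, j + 1), 1))  # insert b[j]
        if i < len(a) and j < len(b):
            # match costs 0, substitution costs 1
            edges.append(((i + 1, j + 1), 0 if a[i] == b[j] else 1))
        for node, weight in edges:
            new_dist = d + weight
            if new_dist < dist.get(node, float("inf")):
                dist[node] = new_dist
                heapq.heappush(heap, (new_dist, node))
    # Not reached in practice: the target is always reachable in this graph.
    return dist[target]
```

For example, `edit_distance_shortest_path("BRCA1", "BRCA2")` follows four zero-cost match edges and one substitution edge. A context-sensitive variant would presumably replace the fixed edge weights with weights that depend on neighboring characters, which is the part this sketch leaves out.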
Year: 2010 PMID: 20815139 DOI: 10.1504/ijdmb.2010.034196
Source DB: PubMed Journal: Int J Data Min Bioinform ISSN: 1748-5673 Impact factor: 0.667