Comparing Medline citations using modified N-grams.

Rao Muhammad Adeel Nawab, Mark Stevenson, Paul Clough.

Abstract

OBJECTIVE: We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information.
MATERIALS AND METHODS: Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches that take into account the possibility that one document has been altered: (1) deletion, in which an item in the n-gram is removed; and (2) substitution, in which an item in the n-gram is replaced with a similar term obtained from the Unified Medical Language System (UMLS) Metathesaurus. N-grams are also weighted using a score derived from a language model. Evaluation is carried out on a set of 520 Medline citation pairs, including 260 manually verified duplicate pairs obtained from the Deja Vu database.
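The modified n-gram comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the exact matching rule nor the weighting formula, so all function names here are hypothetical, a small hand-written `synonyms` dictionary stands in for UMLS Metathesaurus lookups, and the language-model weight is assumed to be the surprisal of the n-gram under an add-one smoothed unigram model built from background word counts. An n-gram from document A is counted as matched if it occurs in document B exactly, after substituting one word with a listed alternative, or after deleting one word (compared against B's shorter n-grams).

```python
import math

def ngrams(tokens, n):
    """Contiguous word n-grams of length n, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def deletion_variants(gram):
    """Variants of an n-gram with exactly one item removed (n >= 2 only)."""
    if len(gram) < 2:
        return set()
    return {gram[:i] + gram[i + 1:] for i in range(len(gram))}

def substitution_variants(gram, synonyms):
    """Variants with one item replaced by a listed alternative term.
    ASSUMPTION: `synonyms` is a toy dict standing in for UMLS lookups."""
    out = set()
    for i, word in enumerate(gram):
        for alt in synonyms.get(word, ()):
            out.add(gram[:i] + (alt,) + gram[i + 1:])
    return out

def lm_weight(gram, background, total, vocab_size):
    """ASSUMPTION: weight = surprisal of the n-gram under an add-one smoothed
    unigram model, so n-grams made of rarer words weigh more."""
    return -sum(math.log((background.get(w, 0) + 1) / (total + vocab_size))
                for w in gram)

def modified_ngram_score(doc_a, doc_b, n, synonyms, background):
    """Weighted fraction of doc_a's n-grams found in doc_b exactly, after one
    substitution, or after one deletion (compared with doc_b's (n-1)-grams)."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    grams_a = ngrams(a, n)
    b_same = set(ngrams(b, n))
    b_shorter = set(ngrams(b, n - 1)) if n > 1 else set()
    total = sum(background.values())
    vocab_size = len(background) + 1  # one extra slot for unseen words
    matched = weight_sum = 0.0
    for g in grams_a:
        w = lm_weight(g, background, total, vocab_size)
        weight_sum += w
        if (g in b_same
                or substitution_variants(g, synonyms) & b_same
                or deletion_variants(g) & b_shorter):
            matched += w
    return matched / weight_sum if weight_sum > 0 else 0.0
```

For example, with `synonyms = {"heart": ["cardiac"]}`, the unigram score of "patients with acute heart attack" against "patients with cardiac attack" rises, because "heart" now matches "cardiac" by substitution. The background counts should be non-empty so that every weight is positive.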
RESULTS: The approach accurately detects duplicate Medline document pairs, achieving an F1 score of 0.99. Allowing for word deletions and substitutions improves performance. The best results are obtained by combining the scores for n-grams of length 1 to 5 words.
DISCUSSION: The results show that the detection of duplicate Medline citations can be improved by modifying n-grams, and that high performance can also be obtained using unigrams alone (F1 = 0.959), particularly when substitutions of alternative phrases are allowed.
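Combining scores across n-gram lengths can be illustrated with a short sketch. The abstract does not say how the per-length scores are combined, so this assumes a plain unweighted mean of containment scores (the fraction of document A's distinct n-grams that also occur in document B) for n = 1 to 5. The function names are hypothetical, and deletion, substitution, and language-model weighting are left out to keep the example focused on the combination step.

```python
def ngram_set(tokens, n):
    """Distinct contiguous word n-grams of length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def containment(a_tokens, b_tokens, n):
    """Fraction of A's distinct n-grams that also occur in B."""
    a_grams = ngram_set(a_tokens, n)
    if not a_grams:
        return 0.0
    return len(a_grams & ngram_set(b_tokens, n)) / len(a_grams)

def combined_score(doc_a, doc_b, max_n=5):
    """ASSUMPTION: unweighted mean of the containment scores for n = 1..max_n."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    scores = [containment(a, b, n) for n in range(1, max_n + 1)]
    return sum(scores) / len(scores)
```

Setting `max_n=1` reduces this to a unigram-only comparison. Containment is asymmetric, so a symmetric variant might score both directions and keep the larger value; that choice is likewise an assumption.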

Keywords: Natural Language Processing; PubMed

Year: 2013 PMID: 23715801 PMCID: PMC3912705 DOI: 10.1136/amiajnl-2012-001552

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

Review 1. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects.

Authors: Antoine Danchin; Christos Ouzounis; Taku Tokuyasu; Jean-Daniel Zucker
Journal: Microb Biotechnol Date: 2018-05-28 Impact factor: 5.813
