Mounir Errami, Zhaohui Sun, Tara C Long, Angela C George, Harold R Garner.
Abstract
In the scientific research community, plagiarism and covert multiple publication of the same data are considered unacceptable because they undermine public confidence in scientific integrity. Yet little has been done to help authors and editors identify highly similar citations, which may sometimes represent cases of unethical duplication. For this reason, we have released Déjà vu, a publicly available database of highly similar Medline citations identified by the text similarity search engine eTBLAST. Following manual verification, highly similar citation pairs are classified into various categories ranging from duplicates with different authors to sanctioned duplicates. Déjà vu records also contain user-provided commentary and supporting information to substantiate each document's categorization. Déjà vu and eTBLAST are available to authors, editors, reviewers, ethicists and sociologists to study, intercept, annotate and deter questionable publication practices. These tools are part of a sustained effort to enhance the quality of Medline as 'the' biomedical corpus. The Déjà vu database is freely accessible at http://spore.swmed.edu/dejavu. The tool eTBLAST is also freely available at http://etblast.org.
Year: 2008 PMID: 18757888 PMCID: PMC2686470 DOI: 10.1093/nar/gkn546
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Déjà vu content by category and category definitions
| Duplication type | Count | Description |
|---|---|---|
| DISTINCT | 1379 | Citations can be highly similar for a number of reasons, including citations that describe related but clearly distinct publications. For example, a pair of citations flagged by computer text similarity that, on inspection, is clearly a continuation of an evolving study, and whose text presents new information, is categorized as a distinct and unique work. |
| DUPLICATE | 2443 | A pair of citations that was identical or nearly identical. The citations report on a study with the same or very similar results and conclusions. |
| ERRATUM | 188 | Only a fraction of the MEDLINE records that are apparently corrections to previous entries are marked as errata. If a title/abstract pair is either labeled as errata or if it is clear that a correction has been made (author list, spelling, small changes to abstract or title wording, etc.), then the errata classification is used. |
| SANCTIONED | 1619 | There are a number of reasons for different citations to have a high level of similarity, some of which play a special, very important and entirely legitimate role in the reporting of science. Examples include periodic reviews, periodic guidelines, specialized databases and specialized federal register citations. Citation pairs of this type, identified through computer text similarity, have been manually classified as sanctioned. |
| NO ABSTRACT | 16 | In some cases, highly similar titles are flagged as potential duplicates but the non-identity MEDLINE record does not contain an abstract. We designate such a pair as ‘NO ABSTRACT’ to indicate that its status cannot be determined. |
| UNVERIFIED | 69115 | Déjà vu is a database of duplicate publications identified using a number of different techniques, the principal one being text similarity comparison. Putative duplicates identified by any of these techniques are initially loaded into this category, prior to human verification and assignment to another category. Because our software also inspects the author lists, they are loaded into unverified subcategories that have either overlapping authors (SA) or not (DA). |
| TOTAL | 74760 | |
Up-to-date statistics and definitions are available at http://spore.swmed.edu/dejavu/help and http://spore.swmed.edu/dejavu/statistics/.
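The UNVERIFIED row in the table says that unverified pairs are split by whether their author lists overlap (SA) or not (DA). A minimal sketch of such a check follows. It is an illustration only, not the Déjà vu implementation: the function names, the name normalization and the rule that a single shared author counts as overlap are all assumptions.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic normalization)."""
    return " ".join(name.lower().split())


def unverified_subcategory(authors_a, authors_b):
    """Return 'SA' if the two author lists share at least one name, else 'DA'.

    Hypothetical helper: treating a single shared name as overlap is an
    assumption, not a rule stated in the source.
    """
    shared = {normalize(a) for a in authors_a} & {normalize(b) for b in authors_b}
    return "SA" if shared else "DA"


# Illustrative author lists, not records taken from the database.
print(unverified_subcategory(["Errami M", "Garner HR"], ["Garner HR", "Sun Z"]))
print(unverified_subcategory(["Errami M"], ["Long TC"]))
```

Real author matching would have to cope with initials, transliteration and name variants, which this sketch ignores.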
Figure 1. The Déjà vu citation presentation output. (A) Browsing interface for database content. (B) Query box to search duplicate records by author names, title, abstract, year of publication and comment words. (C) List of records in Déjà vu including PMIDs, author names, publication date and links to Medline citations and free full text when available. (D) Category filters to browse records in a particular category. (E) Side-by-side view of a duplicate record highlighting overlapping keywords in blue. (F) Miscellaneous information for each article involved.