Janet R Higgins, Feng-Chang Lin, James P Evans.
Abstract
BACKGROUND: Plagiarism is common and threatens the integrity of the scientific literature. However, its detection is time-consuming and difficult, presenting challenges to editors and publishers who are entrusted with ensuring the integrity of published literature.
Keywords: Optimization; Plagiarism detection; iThenticate
Year: 2016 PMID: 29451552 PMCID: PMC5803627 DOI: 10.1186/s41073-016-0021-8
Source DB: PubMed Journal: Res Integr Peer Rev ISSN: 2058-8615
Fig. 1 Manuscripts by country of origin
Country of origin of the manuscripts with plagiarism
| Country | Manuscripts with plagiarism (%) | iThenticate score, median (range) [a] | Self-plagiarism (%) |
|---|---|---|---|
| Brazil | 3 (38) | 25 (17–38) | 2 (67) |
| China | 23 (43) | 29 (10–53) | 10 (43) |
| France | 1 (12) | | 1 (100) |
| India | 1 (33) | | 0 |
| Islamic Republic of Iran | 2 (67) | | 1 (50) |
| Italy | 6 (33) | 26 (20–37) | 4 (67) |
| Japan | 2 (50) | | 0 |
| Macedonia | 1 (100) | | 0 |
| Netherlands | 1 (10) | | 1 (100) |
| Norway | 1 (50) | | 1 (100) |
| Portugal | 1 (33) | | 1 (100) |
| Republic of Korea | 1 (25) | | 1 (100) |
| Spain | 6 (35) | 24.5 (18–33) | 5 (83) |
| Sri Lanka | 1 (100) | | 0 |
| Taiwan | 3 (60) | 26 (19–43) | 1 (33) |
| Turkey | 3 (100) | 32 (10–44) | 1 (33) |
| United States | 10 (6) | 17 (9–28) | 6 (60) |
| Grand total | 66 | | 35 |
[a] Given for n > 3 only
Fig. 2 Receiver operating characteristics (ROC) of iThenticate scores
Sensitivity and specificity as a function of iThenticate scores
| iThenticate—overall similarity % | Sensitivity | Specificity | Sensitivity + specificity |
|---|---|---|---|
| 0 | 1.000 | 0.003 | 1.003 |
| 2 | 1.000 | 0.018 | 1.018 |
| 3 | 1.000 | 0.045 | 1.045 |
| 4 | 1.000 | 0.096 | 1.096 |
| 5 | 1.000 | 0.156 | 1.156 |
| 6 | 1.000 | 0.201 | 1.201 |
| 7 | 1.000 | 0.276 | 1.276 |
| 8 | 1.000 | 0.333 | 1.333 |
| 9 | 0.985 | 0.429 | 1.414 |
| 10 | 0.955 | 0.483 | 1.438 |
| 11 | 0.939 | 0.559 | 1.498 |
| 12 | 0.924 | 0.670 | 1.594 |
| 13 | 0.894 | 0.727 | 1.621 |
| 14 | 0.879 | 0.769 | 1.648 |
| *15* | *0.848* | *0.805* | *1.653* |
| 16 | 0.818 | 0.814 | 1.632 |
| 17 | 0.758 | 0.847 | 1.604 |
| 18 | 0.742 | 0.862 | 1.604 |
| 19 | 0.712 | 0.889 | 1.601 |
| 20 | 0.697 | 0.907 | 1.604 |
| 21 | 0.697 | 0.919 | 1.616 |
| 22 | 0.636 | 0.934 | 1.570 |
| 23 | 0.621 | 0.955 | 1.576 |
| 24 | 0.561 | 0.967 | 1.528 |
| 25 | 0.470 | 0.967 | 1.437 |
| 26 | 0.394 | 0.970 | 1.364 |
| 27 | 0.364 | 0.982 | 1.346 |
| 28 | 0.318 | 0.985 | 1.303 |
| 29 | 0.303 | 0.988 | 1.291 |
| 30 | 0.258 | 0.988 | 1.246 |
| 31 | 0.242 | 0.988 | 1.230 |
| 32 | 0.197 | 0.991 | 1.188 |
| 33 | 0.167 | 0.991 | 1.158 |
| 34 | 0.167 | 0.994 | 1.161 |
| 35 | 0.167 | 0.997 | 1.164 |
| 37 | 0.152 | 0.997 | 1.149 |
| 38 | 0.136 | 0.997 | 1.133 |
| 39 | 0.121 | 1.000 | 1.121 |
| 40 | 0.106 | 1.000 | 1.106 |
| 41 | 0.091 | 1.000 | 1.091 |
| 42 | 0.076 | 1.000 | 1.076 |
| 43 | 0.061 | 1.000 | 1.061 |
| 44 | 0.045 | 1.000 | 1.045 |
| 45 | 0.030 | 1.000 | 1.030 |
| 48 | 0.015 | 1.000 | 1.015 |
| 53 | 0.000 | 1.000 | 1.000 |
Italic values indicate the score that maximized both sensitivity (at 85 %) and specificity (at 80 %)
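The cutoff selection described in this note can be sketched as a search for the score with the largest sensitivity + specificity (equivalent to maximizing Youden's J, a label not used in the source). This is a minimal illustration, not the authors' code. The rows are copied from the table for scores 12 to 18, and the 15 % row uses the sensitivity and specificity reported for the 15 % cutoff (84.8 % and 80.5 %). The function name `best_cutoff` is hypothetical.

```python
# (score, sensitivity, specificity) for iThenticate scores 12-18.
# The 15 % row uses the reported 84.8 % sensitivity and 80.5 % specificity.
rows = [
    (12, 0.924, 0.670),
    (13, 0.894, 0.727),
    (14, 0.879, 0.769),
    (15, 0.848, 0.805),
    (16, 0.818, 0.814),
    (17, 0.758, 0.847),
    (18, 0.742, 0.862),
]


def best_cutoff(rows):
    """Return the row whose sensitivity + specificity is largest."""
    return max(rows, key=lambda r: r[1] + r[2])


score, sens, spec = best_cutoff(rows)
print(score, round(sens + spec, 3))  # prints: 15 1.653
```

Restricting the search to scores 12 to 18 is only for brevity; the sums elsewhere in the table are all lower.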
Sensitivity and specificity calculations using an iThenticate score of 15 % compared to manually detected plagiarism
| iThenticate overall similarity score (15 % cutoff) | Manual curation (gold standard): no plagiarism | Manual curation (gold standard): plagiarism |
|---|---|---|
| No plagiarism | 268 (80.5 %) | 10 (15.2 %) |
| Plagiarism | 65 (19.5 %) | 56 (84.8 %) |
| Total | 333 | 66 |
The optimal iThenticate score is 15 % where the sensitivity is 84.8 % (66 manuscripts had plagiarism by manual curation and iThenticate correctly identified 56 of those manuscripts) and the specificity is 80.5 % (333 manuscripts had no plagiarism by manual curation and iThenticate correctly identified no plagiarism in 268 of these manuscripts)
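The arithmetic behind these two percentages can be checked with a short sketch. It uses only the four counts reported for the 15 % cutoff; the variable and function names are illustrative and do not come from the source.

```python
# Counts at the 15 % cutoff, with manual curation as the gold standard.
true_pos = 56    # plagiarism by manual curation, flagged by iThenticate
false_neg = 10   # plagiarism by manual curation, not flagged
true_neg = 268   # no plagiarism by manual curation, not flagged
false_pos = 65   # no plagiarism by manual curation, flagged


def sensitivity(tp, fn):
    """Fraction of truly plagiarized manuscripts that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of clean manuscripts that were not flagged."""
    return tn / (tn + fp)


print(round(100 * sensitivity(true_pos, false_neg), 1))  # prints: 84.8
print(round(100 * specificity(true_neg, false_pos), 1))  # prints: 80.5
```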
Plagiarism in USA, Spain, and China manuscripts before and after implementation of Chinese language instructions for authors
| Country of origin | Before Chinese language IFA: no plagiarism | Before Chinese language IFA: plagiarism | After Chinese language IFA: no plagiarism | After Chinese language IFA: plagiarism |
|---|---|---|---|---|
| China | 30 (14.3 %) | 23 (59.0 %) | 18 (11.1 %) | 13 (52.0 %) |
| Spain | 11 (5.2 %) | 6 (15.4 %) | 12 (7.4 %) | 2 (8.0 %) |
| United States | 169 (80.5 %) | 10 (25.6 %) | 132 (81.5 %) | 10 (40.0 %) |
| Total | 210 | 39 | 162 | 25 |
This table shows the column percentages of manuscripts with and without plagiarism from three countries before and after the implementation of Chinese language instructions for authors (IFA). There was no significant reduction in plagiarism, as detected by manual curation, in any of the three countries analyzed (chi-square test, p = 0.821)
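As a reading aid, the column percentages can be recomputed from the raw counts: each cell is divided by its column total, not its row total. The sketch below only illustrates that arithmetic and does not attempt to reproduce the chi-square test. The function name `column_percentages` is hypothetical.

```python
# Counts per country, in column order:
# [before/no plagiarism, before/plagiarism, after/no plagiarism, after/plagiarism]
counts = {
    "China": [30, 23, 18, 13],
    "Spain": [11, 6, 12, 2],
    "United States": [169, 10, 132, 10],
}


def column_percentages(counts):
    """Express each count as a percentage of its column total."""
    totals = [sum(col) for col in zip(*counts.values())]
    return {
        country: [round(100 * n / t, 1) for n, t in zip(row, totals)]
        for country, row in counts.items()
    }


for country, pcts in column_percentages(counts).items():
    print(country, pcts)
```

For example, the 59.0 % for China in the "before, plagiarism" column is 23 of the 39 manuscripts with plagiarism in that period, not 23 of the 53 Chinese manuscripts.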
Fig. 3 GIM’s impact factor and number of manuscripts submitted from China