Nikolay Borissov1,2, Quentin Haas1,2, Beatrice Minder3, Doris Kopp-Heim3, Marc von Gernler4, Heidrun Janka4, Douglas Teodoro5,6, Poorya Amini7,8.
Abstract
BACKGROUND: Identifying and removing reference duplicates when conducting systematic reviews (SRs) remain a major, time-consuming issue for authors who manually check for duplicates using built-in features in citation managers. To address issues related to manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.
Keywords: Artificial intelligence; Bibliographic databases; Deduplication; Duplicate references; Risklick; Systematic review; Systematic review software
Year: 2022 PMID: 35978441 PMCID: PMC9382798 DOI: 10.1186/s13643-022-02045-9
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Description of datasets used for deduplication analysis
| N° | Dataset | Searched databases | References total | Duplicates found by experts | Remaining references |
|---|---|---|---|---|---|
| 1 | Sustainable food | MEDLINE, Embase Ovid, PsycINFO Ovid, Web of Science, Scopus, Lilacs, BDENF, Google Scholar | 7595 | 4438 | 3157 |
| 2 | Healthy aging | MEDLINE, Embase Ovid, PsycINFO Ovid, CINAHL, Web of Science, Cochrane Central | 18,314 | 7958 | 10,356 |
| 3 | Healthy lifestyle | MEDLINE, Embase Ovid, Web of Science, Cochrane Central, Google Scholar | 13,522 | 7992 | 5530 |
| 4 | Menopause onset | MEDLINE, Embase Ovid, Web of Science, Cochrane Central, Google Scholar | 8057 | 4281 | 3776 |
| 5 | Hypertension | MEDLINE, Embase Ovid, CINAHL, Web of Science, Cochrane Central, ClinicalTrials.gov, Epistemonikos | 14,024 | 9478 | 4546 |
| 6 | e3_gsm | MEDLINE, Embase Ovid, CINAHL, Web of Science, Cochrane Central, ClinicalTrials.gov | 1676 | 1270 | 406 |
| 7 | Jugular | MEDLINE, Embase Ovid, Scopus, Cochrane Central | 1394 | 1345 | 49 |
| 8 | Clinical trials | Cochrane Central, ClinicalTrials.gov, WHO ICTRP | 45 | 15 | 30 |
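In the table above, the "Remaining references" column equals the reference total minus the duplicates found by experts. The following Python sketch checks that relationship for every dataset and prints the share of duplicates per dataset. The names `DATASETS`, `remaining_after_dedup`, and `duplicate_share` are illustrative and do not come from the paper; the numbers are copied from the table.

```python
# Values copied from the dataset table:
# (references total, duplicates found by experts, remaining references)
DATASETS = {
    "Sustainable food": (7595, 4438, 3157),
    "Healthy aging": (18314, 7958, 10356),
    "Healthy lifestyle": (13522, 7992, 5530),
    "Menopause onset": (8057, 4281, 3776),
    "Hypertension": (14024, 9478, 4546),
    "e3_gsm": (1676, 1270, 406),
    "Jugular": (1394, 1345, 49),
    "Clinical trials": (45, 15, 30),
}


def remaining_after_dedup(total: int, duplicates: int) -> int:
    """References left once the identified duplicates are removed."""
    return total - duplicates


def duplicate_share(total: int, duplicates: int) -> float:
    """Fraction of the retrieved references that were duplicates."""
    return duplicates / total


for name, (total, duplicates, remaining) in DATASETS.items():
    # The reported "remaining" column should equal total minus duplicates.
    assert remaining_after_dedup(total, duplicates) == remaining, name
    print(f"{name:18s} duplicates: {duplicate_share(total, duplicates):.1%}")
```

The shares vary widely between datasets, which is worth keeping in mind when comparing error counts across them.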
Validated additional duplicates and missing original references in manually deduplicated datasets
| Dataset | Validated true duplicates | Validated missing original references |
|---|---|---|
| Sustainable food | 3 | 0 |
| Healthy aging | 99 | 6 |
| Healthy lifestyle | 104 | 0 |
| Menopause onset | 52 | 2 |
| Hypertension | 364 | 2 |
| e3_gsm | 46 | 1 |
| Jugular | 109 | 0 |
| Clinical trials | 0 | 0 |
Comparative table of deduplication results following experts and Deduklick analysis
| Dataset | Type | Execution time (s) | True + | True − | False + | False − | Recall | Precision | F1 |
|---|---|---|---|---|---|---|---|---|---|
| Sustainable food | Experts | 4200 | 3157 | 4435 | 0 | 3 | 99.91% | 100.00% | 99.95% |
| | Deduklick | 49 | 3148 | 4435 | 0 | 12 | 99.62% | 100.00% | 99.81% |
| Healthy aging | Experts | 4200 | 10,356 | 7853 | 6 | 99 | 99.05% | 99.94% | 99.50% |
| | Deduklick | 109 | 10,394 | 7859 | 0 | 61 | 99.42% | 100.00% | 99.71% |
| Healthy lifestyle | Experts | 4200 | 5530 | 7888 | 0 | 104 | 98.15% | 100.00% | 99.07% |
| | Deduklick | 92 | 5592 | 7888 | 0 | 42 | 99.25% | 100.00% | 99.63% |
| Menopause onset | Experts | 4200 | 3776 | 4227 | 2 | 52 | 98.64% | 99.95% | 99.29% |
| | Deduklick | 24 | 3814 | 4229 | 0 | 14 | 99.64% | 100.00% | 99.82% |
| Hypertension | Experts | 4200 | 4546 | 9112 | 2 | 364 | 92.59% | 99.96% | 96.13% |
| | Deduklick | 106 | 4922 | 9114 | 0 | 5 | 99.90% | 100.00% | 99.95% |
| e3_gsm | Experts | 4200 | 406 | 1223 | 1 | 46 | 89.82% | 99.75% | 94.53% |
| | Deduklick | 19 | 447 | 1224 | 0 | 5 | 98.89% | 100.00% | 99.44% |
| Jugular | Experts | 4200 | 49 | 1236 | 0 | 109 | 31.01% | 100.00% | 47.34% |
| | Deduklick | 29 | 159 | 1236 | 0 | 1 | 99.38% | 100.00% | 99.69% |
| Clinical trials | Experts | 4200 | 30 | 15 | 0 | 0 | 100.00% | 100.00% | 100.00% |
| | Deduklick | 2 | 30 | 15 | 0 | 0 | 100.00% | 100.00% | 100.00% |
| Average | Experts | 4200 | 3481.3 | 4498.6 | 1.4 | 97.1 | 88.65% | 99.95% | 91.98% |
| | Deduklick | 54 | 3563.3 | 4500 | 0 | 17.5 | 99.51% | 100.00% | 99.75% |
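The recall, precision, and F1 columns follow from the true positive, false positive, and false negative counts in each row. As a minimal sketch, assuming the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP), F1 = harmonic mean of the two), the Python snippet below recomputes a few rows. The function name `retrieval_metrics` and the `ROWS` dictionary are illustrative, not from the paper. Reading the table, the positive class appears to be the unique references that are correctly retained, since the experts' true positive count matches the number of remaining references in each dataset.

```python
def retrieval_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (recall, precision, F1) from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# (true positives, false positives, false negatives), copied from the table
ROWS = {
    ("Sustainable food", "Experts"): (3157, 0, 3),
    ("Sustainable food", "Deduklick"): (3148, 0, 12),
    ("Jugular", "Experts"): (49, 0, 109),
}

for (dataset, method), (tp, fp, fn) in ROWS.items():
    recall, precision, f1 = retrieval_metrics(tp, fp, fn)
    print(f"{dataset:17s} {method:10s} "
          f"recall={recall:.2%} precision={precision:.2%} F1={f1:.2%}")
```

For the Jugular dataset, the experts' 109 false negatives against only 49 true positives give a recall of 49 / 158, about 31.01%, which explains the low F1 in that row despite perfect precision.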
Number of deduplicated references ordered by database source
| Dataset | Sources | References (experts) | References (Deduklick) | Difference |
|---|---|---|---|---|
| Sustainable food | MEDLINE | 1582 | 1582 | 0 |
| | Embase Ovid | 291 | 294 | 3 |
| | PsycINFO Ovid | 334 | 335 | 1 |
| | Web of Science | 1508 | 1513 | 5 |
| | Scopus | 477 | 485 | 8 |
| | Lilacs | 97 | 94 | −3 |
| | BDENF | 1 | 1 | 0 |
| | Google Scholar | 39 | 41 | 2 |
| | Other | 109 | 102 | −7 |
| Healthy aging | MEDLINE | 1986 | 4109 | 2123 |
| | Embase Ovid | 2587 | 494 | −2093 |
| | PsycINFO Ovid | 1164 | 1207 | 43 |
| | CINAHL | 650 | 645 | −5 |
| | Web of Science | 1388 | 1284 | −104 |
| | Cochrane Central | 183 | 181 | −2 |
| Healthy lifestyle | MEDLINE | 1961 | 4055 | 2094 |
| | Embase Ovid | 3519 | 1388 | −2131 |
| | Web of Science | 1744 | 1735 | −9 |
| | Cochrane Central | 634 | 621 | −13 |
| | Google Scholar | 100 | 98 | −2 |
| | Other | 34 | 33 | −1 |
| Menopause onset | MEDLINE | 1835 | 1837 | 2 |
| | Embase Ovid | 1167 | 1164 | −3 |
| | Web of Science | 839 | 853 | 14 |
| | Cochrane Central | 213 | 203 | −10 |
| | Google Scholar | 99 | 88 | −11 |
| | Other | 128 | 98 | −30 |
| Hypertension | MEDLINE | 3673 | 3671 | −2 |
| | Embase Ovid | 3011 | 2844 | −167 |
| | CINAHL | 195 | 185 | −10 |
| | Web of Science | 1516 | 1349 | −167 |
| | Cochrane Central | 456 | 447 | −9 |
| | ClinicalTrials.gov | 358 | 360 | 2 |
| | Epistemonikos | 159 | 152 | −7 |
| | Other | 110 | 94 | −16 |
| e3_gsm | MEDLINE | 408 | 409 | 1 |
| | Embase Ovid | 631 | 611 | −20 |
| | CINAHL | 18 | 12 | −6 |
| | Web of Science | 97 | 83 | −14 |
| | Cochrane Central | 47 | 44 | −3 |
| | ClinicalTrials.gov | 59 | 60 | 1 |
| | Other | 10 | 10 | 0 |
| Jugular | MEDLINE | 634 | 633 | −1 |
| | Embase Ovid | 447 | 367 | −80 |
| | Scopus | 155 | 134 | −21 |
| | Cochrane Central | 77 | 76 | −1 |
| | Other | 32 | 25 | −7 |
| Clinical trials | Cochrane Central, ClinicalTrials.gov, WHO ICTRP | 15 | 15 | 0 |
Fig. 1 Example of a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) deduplication flowchart report following Deduklick analysis
Fig. 2 Illustration of a deduplication report record with identified duplicates and the corresponding unique reference