Mohamed Yusuf, Ignacio Atal, Jacques Li, Philip Smith, Philippe Ravaud, Martin Fergie, Michael Callaghan, James Selfe.
Abstract
AIMS: We conducted a systematic review assessing the reporting quality of studies validating models based on machine learning (ML) for clinical diagnosis, with a specific focus on the reporting of information concerning the participants on whom the diagnostic task was evaluated.
Keywords: clinical prediction; machine learning; medical diagnosis; reporting quality
Year: 2020 PMID: 32205374 PMCID: PMC7103817 DOI: 10.1136/bmjopen-2019-034568
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Item list used to extract data from eligible papers
| Item groups | Item list | Detailed items |
| General characteristics | Diagnostic task | What is the target condition? |
| | Study objective | Is the study aiming at the development of a diagnostic method, the evaluation of a diagnostic method, or both? |
| | Target population | What is the population targeted by the diagnostic test? |
| Methods | Data sources | Where and when potentially eligible participants were identified (setting, location and dates). |
| | Data split | Method for partitioning the evaluation set from the training data, to assess whether participants formed a consecutive, random or convenience series (see the sketch after this table). |
| | Test dataset eligibility criteria | On what basis potentially eligible participants were identified within the test dataset (such as symptoms, results from previous tests, or inclusion in a registry). |
| Results | Baseline characteristics | Baseline demographic and clinical characteristics of participants. |
| | Diagnosis/non-diagnosis classification | Classification of the diagnosed and non-diagnosed patients within the test set. |
| | Flow diagram | Flow of participants, using a diagram. |
| | Severity | Distribution of severity of disease in those with the target condition. |
| | Alternative diagnosis | Distribution of alternative diagnoses in those without the target condition. |
| | Difference between reference test and ML test | Is there a time interval between the index test and the reference standard? |
| | Applicability | Does the evaluation population correspond to the setting in which the diagnostic test will be applied? |
ML, machine learning.
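The "Data split" item above distinguishes random partitions from consecutive (temporal) series. As a minimal sketch of that distinction, assuming a hypothetical pandas DataFrame of participants ordered by enrolment (the `patients` frame and its columns are illustrative, not taken from any reviewed study), the two approaches could be contrasted as follows:

```python
# Two ways to carve an evaluation set out of participant data.
# Assumes the DataFrame is ordered by enrolment; all names are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split

patients = pd.DataFrame({
    "patient_id": range(100),
    "enrolment_order": range(100),  # proxy for enrolment date
})

# Random split: each participant has the same chance of landing in the
# evaluation set, so training and test sets share the same case mix.
train_rand, test_rand = train_test_split(patients, test_size=0.2, random_state=42)

# Consecutive split: the most recently enrolled 20% form the evaluation set,
# mimicking a temporal (prospective-style) validation series.
cutoff = int(len(patients) * 0.8)
train_cons = patients.iloc[:cutoff]
test_cons = patients.iloc[cutoff:]
```

Which of the two a study used also bears on the "Applicability" item: a consecutive series drawn from the intended deployment setting approximates prospective use more closely than a random hold-out.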
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. From: Moher D et al.34
Study characteristics
| Items | Total n (%) |
| Year | |
| 2015 | 4 (14) |
| 2016 | 9 (32) |
| 2017 | 12 (43) |
| 2018 | 3 (11) |
| Journals | |
| | 8 (29) |
| | 2 (7) |
| | 2 (7) |
| | 2 (7) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| | 1 (3) |
| Clinical specialty | |
| Oncology | 13 (47) |
| Neurology | 5 (18) |
| Immunology | 2 (7) |
| Ophthalmology | 2 (7) |
| Other specialties* | 6 (21) |
| Task | |
| Development and evaluation | 27 (97) |
| Evaluation | 1 (3) |
*Other clinical specialties include cardiology, gastroenterology, infectious disease, psychiatry, endocrinology and various others.
Reporting quality
| Items | Reported, n (%) | Not reported, n (%) | Unclear, n (%) |
| Methods | |||
| Data source | 24 (86) | 0 (0) | 4 (14) |
| Data split methods | 28 (100) | 0 (0) | 0 (0) |
| Test set eligibility criteria (evaluation set) | 23 (82) | 5 (18) | 0 (0) |
| Results | |||
| Baseline characteristic | 17 (61) | 11 (39) | 0 (0) |
| Diagnosis/non-diagnosis classification | 23 (82) | 4 (14) | 1 (4) |
| Flow diagram | 10 (36) | 18 (64) | 0 (0) |
| Disease severity | 8 (29) | 18 (64) | 2 (7) |
| Alternative diagnosis | 10 (36) | 18 (64) | 0 (0) |
| Use of reporting guideline | 0 (0) | 28 (100) | 0 (0) |
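All percentages are out of the 28 included studies; for example, the data source row works out to 24/28 ≈ 86%.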
Presence of bias
| Items | Yes, n (%) | No, n (%) | Unclear, n (%) |
| Is there a time interval between the reference standard and the ML test? | 23 (82) | 1 (4) | 4 (14) |
| Does the test population correspond to the population/setting in which the diagnostic test will be applied? | 5 (18) | 8 (29) | 15 (54) |