Paula Dhiman, Jie Ma, Constanza L Andaur Navarro, Benjamin Speich, Garrett Bullock, Johanna A A Damen, Lotty Hooft, Shona Kirtley, Richard D Riley, Ben Van Calster, Karel G M Moons, Gary S Collins.
Abstract
BACKGROUND: Prognostic models are used widely in the oncology domain to guide medical decision-making. Little is known about the risk of bias of prognostic models developed using machine learning and the barriers to their clinical uptake in the oncology domain.
Keywords: Machine learning; Prediction modelling; Risk of bias; Systematic review
Year: 2022 PMID: 35794668 PMCID: PMC9261114 DOI: 10.1186/s41512-022-00126-w
Source DB: PubMed Journal: Diagn Progn Res ISSN: 2397-7523
Study characteristics of the 62 included publications, by study type
| Characteristic | All publications (n = 62), n (%) | Development only (n = 48), n (%) | Development with validation (n = 14), n (%) |
|---|---|---|---|
| Lung | 8 (12.9) | 6 (12.5) | 2 (14.3) |
| Breast | 6 (9.7) | 6 (12.5) | - |
| Colon/colorectal/rectal | 6 (9.7) | 3 (6.3) | 3 (21.4) |
| Pancreatic | 3 (4.8) | 1 (2.1) | 2 (14.3) |
| Liver | 2 (3.2) | 2 (4.2) | - |
| Gastric | 3 (4.8) | 3 (6.3) | - |
| Head and neck | 5 (8.1) | 5 (10.4) | - |
| Spinal | 4 (6.5) | 4 (8.3) | - |
| Brain (inc. meningioma, glioblastoma) | 5 (8.1) | 4 (8.3) | 1 (7.1) |
| Oral (inc. nasopharyngeal carcinoma) | 3 (4.8) | 2 (4.2) | 1 (7.1) |
| Gynaecological (inc. cervical, ovarian, endometrial) | 6 (9.7) | 5 (10.4) | 1 (7.1) |
| Prostate/penile | 5 (8.1) | 4 (8.3) | 1 (7.1) |
| Skin (inc. melanoma) | 2 (3.2) | 1 (2.1) | 1 (7.1) |
| Other* | 4 (6.5) | 2 (4.2) | 2 (14.3) |
| Cancer patients | 55 (88.7) | 43 (89.6) | 12 (85.7) |
| General population | 6 (9.7) | 4 (8.3) | 2 (14.3) |
| Unclear | 1 (1.6) | 1 (2.1) | - |
| Binary | 48 (77.4) | 40 (83.3) | 8 (57.1) |
| 11 | 11 | - | |
| 8 | 7 | 1 | |
| 7 | 4 | 3 | |
| 6 | 4 | 2 | |
| 4 | 3 | 1 | |
| 4 | 4 | - | |
| 3 | 2 | 1 | |
| 3 | 3 | - | |
| 1 | 1 | - | |
| 1 | 1 | - | |
| Continuous | 1 (1.6) | - | 1 (7.1) |
| 1 | - | 1 | |
| Multinomial | 2 (3.2) | 2 (4.2) | - |
| 1 | 1 | - | |
| 1 | 1 | - | |
| Time to event | 11 (17.7) | 6 (12.5) | 5 (35.7) |
| 7 | 3 | 4 | |
| 1 | - | 1 | |
| 1 | 1 | - | |
| 1 | 1 | - | |
| 1 | 1 | - | |
*Other includes peritoneal carcinomatosis, incurable cancer (various), leukaemia, malignant peripheral nerve sheath tumour
Fig. 1 PRISMA flow diagram of included studies
Development analysis characteristics of the 62 included publications, by study type
| Characteristic | All publications (n = 62), n (%) | Development only (n = 48), n (%) | Development with validation (n = 14), n (%) |
|---|---|---|---|
| Randomised controlled trial | 1 (1.6) | - | 1 (7.1) |
| Prospective cohort | 9 (14.5) | 9 (18.8) | - |
| Retrospective cohort | 14 (22.6) | 11 (22.9) | 3 (21.4) |
| Registry | 21 (33.9) | 15 (31.3) | 6 (42.9) |
| Routine care database | 9 (14.5) | 7 (14.6) | 2 (14.3) |
| Other** | 3 (4.8) | 2 (4.2) | 1 (7.1) |
| Unclear | 5 (8.1) | 4 (8.3) | 1 (7.1) |
| Primary care | 2 (3.2) | 2 (4.2) | - |
| Secondary care | 36 (58.1) | 29 (60.4) | 7 (50) |
| Tertiary care | 10 (16.1) | 7 (14.6) | 3 (21.4) |
| General population | 5 (8.1) | 3 (6.3) | 2 (14.3) |
| Other**** | 3 (4.8) | 3 (6.3) | - |
| Unclear | 6 (9.7) | 4 (8.3) | 2 (14.3) |
| No | 26 (41.9) | 24 (50) | 2 (14.3) |
| Yes | 13 (21) | 7 (14.6) | 6 (42.9) |
| Unclear | 23 (37.1) | 17 (35.4) | 6 (42.9) |
| South America | 2 (3.2) | 2 (4.2) | - |
| Asia | 8 (12.9) | 6 (12.5) | 2 (14.3) |
| Europe | 13 (21) | 13 (27.1) | - |
| Canada | 3 (4.8) | 3 (6.3) | - |
| USA | 21 (33.9) | 15 (31.3) | 6 (42.9) |
| Europe, North America, Australia | 1 (1.6) | 1 (2.1) | - |
| Europe, South America | 1 (1.6) | - | 1 (7.1) |
| South Asia, USA | 1 (1.6) | 1 (2.1) | - |
| Unclear | 12 (19.4) | 7 (14.6) | 5 (35.7) |
| Health care providers | 34 (54.8) | 27 (56.3) | 7 (50) |
| Public/patients | 2 (3.2) | 2 (4.2) | - |
| Researchers | 1 (1.6) | 1 (2.1) | - |
| Health care providers and patient/public | 4 (6.5) | 1 (2.1) | 3 (21.4) |
| Health care providers and researchers | 2 (3.2) | 2 (4.2) | - |
| Unclear | 19 (30.6) | 15 (31.3) | 4 (28.6) |
| Predict risk | 36 (58.1) | 25 (52.1) | 11 (78.6) |
| Classify patients | 25 (40.3) | 23 (47.9) | 2 (14.3) |
| Predict length of stay (continuous outcome) | 1 (1.6) | - | 1 (7.1) |
*Validation characteristics for data source are: Randomised controlled trial: 2/14 (14.3%); Prospective cohort: 3/14 (21.4%); Retrospective cohort: 4/14 (28.6%); Registry: 2/14 (14.3%); Routine care database: 2/14 (14.3%); Other (survey): 1/14 (7.1%)
**Other includes audit, survey and a combination data source of hospital and research data and a registry
***Validation characteristics for setting are: Secondary care: 7/14 (50%); Tertiary care: 4/14 (28.6%); General population: 2/14 (14.3%); Unclear: 1/14 (7.1%)
****Other includes combination of hospitals, hospices and nursing homes, NTT medical center in Tokyo and combination of primary and tertiary care
*****Validation characteristics for multicentre are: No: 8/14 (57.1%); Yes: 3/14 (21.4%); Unclear: 3/14 (21.4%)
******Validation characteristics for geographical location are: South America: 1/14 (7.1%); Asia: 5/14 (35.7%); USA: 5/14 (35.7%); Unclear: 3/14 (21.4%)
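The percentages in these footnotes are counts divided by the number of validation publications (14), rounded to one decimal place. As a minimal sketch, the geographical location footnote can be recomputed as follows; the helper name `pct` and the dictionary layout are illustrative assumptions, not part of the article.

```python
def pct(count: int, total: int, digits: int = 1) -> float:
    """Return count/total as a percentage rounded to `digits` decimals."""
    return round(100 * count / total, digits)


# Counts copied from the geographical location footnote (validation analyses).
validation_location = {
    "South America": 1,
    "Asia": 5,
    "USA": 5,
    "Unclear": 3,
}

# The denominator is the sum of the category counts.
total = sum(validation_location.values())

# Percentage per category, in the same order as the footnote.
percentages = {name: pct(n, total) for name, n in validation_location.items()}

for name, value in percentages.items():
    print(f"{name}: {validation_location[name]}/{total} ({value}%)")
```

The same check applies to any row in the tables, using 62, 48 or 14 as the denominator depending on the column.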
Fig. 2 Bar charts showing the risk of bias ratings for each domain and the overall judgement, for the development of 152 models (left) and external validation of 37 developed models (right). “Overall” indicates the overall risk of bias; “participants” indicates bias introduced by participants or data sources; “predictors” indicates bias introduced by predictors or their assessment; “outcome” indicates bias introduced by outcomes or their assessment; “analysis” indicates bias introduced by the analysis. Values in the bars are frequency (%). * Values for risk of bias (development models - predictors) are 1 (1). ** Values for risk of bias (validation models - overall) are 1 (3)
PROBAST signalling questions for model development and validation analyses in all 62 studies
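| PROBAST domain and signalling questions | Development analysis (152 models): Y/PY | Development analysis: N/PN | Development analysis: NI | Validation analysis (37 models): Y/PY | Validation analysis: N/PN | Validation analysis: NI |
|---|---|---|---|---|---|---|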
| 1. PARTICIPANTS | ||||||
| 1.1. Were appropriate data sources used, e.g., cohort, randomized controlled trial, or nested case–control study data? | 115 (75.7; 68.1,81.9) | 19 (12.5; 8.1,18.8) | 18 (11.8; 7.6,18.1) | 30 (81.1; 64.7,90.9) | 2 (5.4; 1.3,20) | 5 (13.5; 5.6,29.3) |
| 1.2. Were all inclusions and exclusions of participants appropriate? | 100 (65.8; 57.8,72.9) | 13 (8.6; 5,14.2) | 39 (25.7; 19.3,33.3) | 24 (64.9; 47.9,78.8) | - | 13 (35.1; 21.2,52.1) |
| 2. PREDICTORS | ||||||
| 2.1. Were predictors defined and assessed in a similar way for all participants? | 117 (77; 69.6,83) | 14 (9.2; 5.5,15) | 21 (13.8; 9.2,20.3) | 26 (70.3; 53.3,83.1) | - | 11 (29.7; 16.9,46.7) |
| 2.2. Were predictor assessments made without knowledge of outcome data? | 73 (48; 40.1,56) | 1 (0.7; 0.1,4.6) | 78 (51.3; 43.3,59.2) | 20 (54.1; 37.6,69.7) | - | 17 (46; 30.3,62.4) |
| 2.3. Are all predictors available at the time the model is intended to be used? | 91 (59.9; 51.8,67.4) | - | 61 (40.1; 32.6,48.2) | 22 (59.5; 42.7,74.3) | - | 15 (40.5; 25.7,57.3) |
| 3. OUTCOMES | ||||||
| 3.1. Was the outcome determined appropriately? | 130 (85.5; 78.9,90.3) | 4 (2.6; 1,6.9) | 18 (11.8; 7.6,18.1) | 30 (81.1; 64.7,90.9) | - | 7 (18.9; 9.1,35.3) |
| 3.2. Was a prespecified or standard outcome definition used? | 122 (80.3; 73.1,85.9) | 13 (8.6; 5,14.2) | 17 (11.2; 7,17.3) | 23 (62.2; 45.2,76.6) | 7 (18.9; 9.1,35.3) | 7 (18.9; 9.1,35.3) |
| 3.3. Were predictors excluded from the outcome definition? | 117 (77; 69.6,83) | 6 (4; 1.8,8.6) | 29 (19.1; 13.6,26.2) | 28 (75.7; 58.9,87.1) | - | 9 (24.3; 12.9,41.1) |
| 3.4. Was the outcome defined and determined in a similar way for all participants? | 115 (75.7; 68.1,81.9) | 11 (7.2; 4,12.6) | 26 (17.1; 11.9,24) | 35 (94.6; 80,98.7) | - | 2 (5.4; 1.3,20) |
| 3.5. Was the outcome determined without knowledge of predictor information? | 106 (69.7; 61.9,76.6) | 6 (4; 1.8,8.6) | 40 (26.3; 19.9,33.9) | 28 (75.7; 58.9,87.1) | - | 9 (24.3; 12.9,41.1) |
| 3.6. Was the time interval between predictor assessment and outcome determination appropriate? | 100 (65.8; 57.8,72.9) | 5 (3.3; 1.4,7.7) | 47 (30.9; 24,38.8) | 21 (56.8; 40.1,72) | 5 (13.5; 5.6,29.3) | 11 (29.7; 16.9,46.7) |
| 4. ANALYSIS | ||||||
| 4.1. Were there a reasonable number of participants with the outcome? | 44 (29; 22.2,36.7) | 77 (50.7; 42.7,58.6) | 31 (20.4; 14.7,27.6) | 10 (27; 14.9,44) | 16 (43.2; 28,59.9) | 11 (29.7; 16.9,46.7) |
| 4.2. Were continuous and categorical predictors handled appropriately? | 30 (19.7; 14.1,26.9) | 57 (37.5; 30.1,45.5) | 65 (42.8; 35.1,50.8) | 19 (51.4; 35.1,67.3) | 1 (2.7; 0.4,17.8) | 17 (46; 30.3,62.4) |
| 4.3. Were all enrolled participants included in the analysis? | 43 (28.3; 21.7,36) | 49 (32.2; 25.2,40.1) | 60 (39.5; 32,47.5) | 17 (46; 30.3,62.4) | 9 (24.3; 12.9,41.1) | 11 (29.7; 16.9,46.7) |
| 4.4. Were participants with missing data handled appropriately? | 24 (15.8; 10.8,22.5) | 70 (46.1; 38.2,54.1) | 58 (38.2; 30.7,46.2) | 6 (16.2; 7.3,32.4) | 15 (40.5; 25.7,57.3) | 16 (43.2; 28,59.9) |
| 4.5. Was selection of predictors based on univariable analysis avoided? | 68 (44.7; 37,52.8) | 49 (32.2; 25.2,40.1) | 35 (23; 17,30.4) | NA | ||
| 4.6. Were complexities in the data (e.g., censoring, competing risks, sampling of control participants) accounted for appropriately? | 10 (6.6; 3.6,11.8) | 28 (18.4; 13,25.5) | 114 (75; 67.4,81.3) | 2 (5.4; 1.3,20) | - | 35 (94.6; 80,98.7) |
| 4.7. Were relevant model performance measures evaluated appropriately? | 28 (18.4; 13,25.5) | 87 (57.2; 49.2,64.9) | 37 (24.3; 18.1,31.9) | 10 (27; 14.9,44) | 13 (35.1; 21.2,52.1) | 14 (37.8; 23.4,54.8) |
| 4.8. Were model overfitting and optimism in model performance accounted for? | 52 (34.2; 27.1,42.2) | 84 (55.3; 47.2,63) | 16 (10.5; 6.5,16.5) | NA | ||
| 4.9. Do predictors and their assigned weights in the final model correspond to the results from the reported multivariable analysis? | 24 (15.8; 10.8,22.5) | 8 (5.3; 2.6,10.2) | 120 (79; 71.7,84.7) | NA | ||
Y Yes, PY Probably yes, N No, PN Probably no, NI No information
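Each cell in the PROBAST table appears to give a count, a percentage and an interval around that percentage. This record does not state how the intervals were calculated, so the sketch below is only an assumption: it uses a Wilson score interval at the 95% level to show how an interval of this kind can be obtained from a count and a denominator. The function name `wilson_interval` is illustrative, and the bounds it returns may differ slightly from those printed in the table.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count: int, total: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    # Two-sided standard normal quantile, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = count / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Signalling question 1.1, development analysis: 115 of 152 models.
count, total = 115, 152
lower, upper = wilson_interval(count, total)
print(f"{count} ({100 * count / total:.1f}; {100 * lower:.1f},{100 * upper:.1f})")
```

A different interval method (for example, one based on a logit transformation) would give slightly different bounds, so this sketch is a plausibility check on the reported values, not a reproduction of the authors' analysis.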