Johanna Tolksdorf, Michael W Kattan, Stephen A Boorjian, Stephen J Freedland, Karim Saba, Cedric Poyet, Lourdes Guerrios, Amanda De Hoedt, Michael A Liss, Robin J Leach, Javier Hernandez, Emily Vertosick, Andrew J Vickers, Donna P Ankerst.
Abstract
BACKGROUND: Online clinical risk prediction tools built on data from multiple cohorts are increasingly being utilized for contemporary doctor-patient decision-making and validation. This report outlines a comprehensive data science strategy for building such tools with application to the Prostate Biopsy Collaborative Group prostate cancer risk prediction tool.
Keywords: Calibration; Discrimination; Net benefit; Prostate cancer; Risk prediction; Validation
Year: 2019 PMID: 31615451 PMCID: PMC6792191 DOI: 10.1186/s12874-019-0839-0
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Table 1 Five methods for optimal prediction in the validation. Here $y_i = 1$ denotes high-grade cancer and $y_i = 0$ otherwise, and $x_i$ is the vector of covariates for the $i$th individual, $i = 1, \ldots, n$, across all centers ($n$ is the total number of individuals); $\beta_0$ is a fixed intercept and $\beta = (\beta_1, \ldots, \beta_9)$ is a fixed vector of nine parameters for the covariates log2PSA, age, DRE, African ancestry, family history and prior negative biopsy history, together with the interactions of log2PSA with DRE, age with DRE, and age with African ancestry
| Type of logistic regression | Model form | Risk predictor |
|---|---|---|
| 1. Pooled data, cohort ignored | | |
| 2. Pooled data, cohort as random effect, median prediction | | |
| 3. Pooled data, cohort as random effect, mean prediction | | |
| 4. Meta-analysis, fixed effects by center | | |
| 5. Meta-analysis, random effects by center | | |
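
As a rough sketch under the notation of the Table 1 caption, methods 1 to 3 are commonly written as follows. The random cohort intercept $u_c$, its variance $\sigma^2$, the cohort index $c(i)$, and the normal density $\phi$ are assumptions introduced here for illustration; they are not taken from the source, and the published model forms may differ.

```latex
\begin{align*}
% Method 1: pooled logistic regression, cohort ignored
\operatorname{logit} P(y_i = 1 \mid x_i) &= \beta_0 + \beta^\top x_i,
  & \hat p(x) &= \operatorname{expit}\!\bigl(\hat\beta_0 + \hat\beta^\top x\bigr) \\
% Methods 2 and 3: cohort as random intercept, u_c ~ N(0, sigma^2)
\operatorname{logit} P(y_i = 1 \mid x_i, u_{c(i)}) &= \beta_0 + u_{c(i)} + \beta^\top x_i,
  & u_c &\sim N(0, \sigma^2) \\
% Method 2: median prediction sets the random effect to its median, u = 0
\hat p_{\text{median}}(x) &= \operatorname{expit}\!\bigl(\hat\beta_0 + \hat\beta^\top x\bigr) \\
% Method 3: mean prediction averages over the random-effect distribution
\hat p_{\text{mean}}(x) &= \int \operatorname{expit}\!\bigl(\hat\beta_0 + u + \hat\beta^\top x\bigr)\,
  \phi(u;\, 0, \hat\sigma^2)\, du
\end{align*}
```

Under this reading, the median and mean predictors share fitted coefficients and differ only in how the unknown cohort effect of a new patient is handled.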
Fig. 1 Prevalence of high-grade cancer for the ten PBCG cohorts, ordered from highest to lowest, along with the sample size of each cohort
Fig. 2 Stacked risk factor distributions on the x-axis and number of biopsies on the y-axis with cohorts ordered from top to bottom by overall prevalence of high-grade cancer as in Fig. 1: 1) UTHealth, 2) DurhamVA, 3) SanRaffaele, 4) MayoClinic, 5) Sunnybrook, 6) SanJuanVA, 7) UCSF, 8) MSKCC, 9) ClevelandClinic, 10) Zurich; NA denotes missing values
Fig. 3 Empirical univariate odds ratios for the association between risk factors (age and PSA have been converted to binary factors for the sake of illustration) and high-grade cancer, shown against the prevalence of the risk factor in the cohort. Data are not shown for African ancestry for Zurich, San Raffaele, Mayo Clinic and UCSF, or for family history for UCSF, because numbers were too low to reliably estimate the odds ratios. Bold indicates significance at the 0.05 level; records with unknown risk factors have been excluded
Fig. 4 Medians and 95% percentile intervals (2.5 to 97.5 percentile) for comparing the AUC, negative of HLS, and net benefit at the 15% threshold between the five possible prediction methods (numbering according to Table 1: 1-Pooled data, cohort ignored; 2-Pooled data, cohort as random effect, median prediction; 3-Pooled data, cohort as random effect, mean prediction; 4-Meta-analysis, fixed effects by center; 5-Meta-analysis, random effects by center) computed across all 252 choices of five cohorts as test sets with the remaining cohorts as training sets. Positive differences indicate superiority of the prediction method listed first for the respective operating characteristic
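
The 252 train/test configurations in Fig. 4 correspond to the number of ways of choosing five of the ten cohorts. A minimal sketch of that enumeration follows; the function name `five_vs_five_splits` is hypothetical, and the cohort labels are taken from the Fig. 2 caption. It illustrates the counting only, not the authors' code.

```python
from itertools import combinations
from math import comb

# Cohort labels as listed in the Fig. 2 caption.
COHORTS = [
    "UTHealth", "DurhamVA", "SanRaffaele", "MayoClinic", "Sunnybrook",
    "SanJuanVA", "UCSF", "MSKCC", "ClevelandClinic", "Zurich",
]


def five_vs_five_splits(cohorts, n_test=5):
    """Yield (train, test) tuples for every choice of n_test cohorts as the test set."""
    for test in combinations(cohorts, n_test):
        # Every cohort not chosen for testing goes into the training set.
        train = tuple(c for c in cohorts if c not in test)
        yield train, test


splits = list(five_vs_five_splits(COHORTS))
print(len(splits), comb(10, 5))  # 252 252
```

Each split would then be used to fit the five methods on the training cohorts and compare them on the held-out cohorts.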
Fig. 5 For each PBCG cohort as an individual test set, all other 9 PBCG cohorts were used as a training set to fit a model, which was subsequently evaluated by the AUC, HLS and net benefit at the 15% threshold. The process was then repeated for each test set using the other 8 PBCG cohorts excluding Zurich as a training set. The AUC difference is reported along with 95% confidence intervals. For the negative of HLS and net benefit at 15% threshold, median estimates of difference and 95% percentile intervals (2.5 to 97.5 percentile) are obtained via bootstrapping. Positive values indicate inclusion of Zurich improves the respective performance characteristic for the test set
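
The bootstrap percentile intervals described for Fig. 5 can be illustrated with a small sketch. It uses the standard definition of net benefit at threshold pt, TP/n - FP/n * pt/(1 - pt). Everything else is an assumption made for illustration and is not the source's procedure: resampling individuals with replacement within one test set, treating risk >= threshold as a recommendation to biopsy, the function names, the 1000 replicates, the crude index-based percentile rule, and the toy numbers.

```python
import random
import statistics


def net_benefit(y_true, risk, threshold=0.15):
    """Net benefit = TP/n - FP/n * pt/(1 - pt), counting risk >= pt as 'biopsy'."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def bootstrap_difference(y_true, risk_a, risk_b, metric, n_boot=1000, seed=0):
    """Return (median, 2.5th, 97.5th percentile) of metric(a) - metric(b)."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping outcomes and risks paired.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        a = [risk_a[i] for i in idx]
        b = [risk_b[i] for i in idx]
        diffs.append(metric(y, a) - metric(y, b))
    diffs.sort()
    # Crude percentile rule: pick the nearest lower order statistic.
    lower = diffs[int(0.025 * (n_boot - 1))]
    upper = diffs[int(0.975 * (n_boot - 1))]
    return statistics.median(diffs), lower, upper


# Toy data only: outcomes and predicted risks from two hypothetical models.
y = [1, 0, 1, 0, 0, 1, 0, 0]
risk_with = [0.40, 0.10, 0.30, 0.20, 0.05, 0.60, 0.12, 0.08]
risk_without = [0.35, 0.18, 0.12, 0.25, 0.05, 0.55, 0.20, 0.10]
print(bootstrap_difference(y, risk_with, risk_without, net_benefit))
```

A positive median difference would favour the model listed first, matching the sign convention in the caption.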
Fig. 6 Odds ratios and 95% confidence intervals for the final model fit to the 8492 prostate biopsies from the ten PBCG cohorts. Log2PSA means PSA in ng/ml on the log-base-2 scale, age is in years, DRE is digital rectal exam (0 normal, 1 abnormal), African is African ancestry (1 = yes, 0 = no), Family history is first-degree family history (1 = yes, 0 = no), Prior neg. biopsy is Prior negative biopsy (1 = yes, ever, 0 = never) and colons denote interactions
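
Because PSA enters on the log-base-2 scale, the Log2PSA odds ratio can be read as the change in odds per doubling of PSA. A short worked derivation follows, holding the other covariates fixed. The symbols $\beta_{\text{PSA}}$ and $\beta_{\text{PSA:DRE}}$ are labels assumed here for the Log2PSA main effect and its interaction with DRE; they are not notation from the source.

```latex
\begin{align*}
\log_2(2 \cdot \mathrm{PSA}) &= \log_2(\mathrm{PSA}) + 1 \\
\frac{\operatorname{odds}(2 \cdot \mathrm{PSA})}{\operatorname{odds}(\mathrm{PSA})}
  &= e^{\beta_{\text{PSA}}} && \text{(normal DRE, DRE = 0)} \\
\frac{\operatorname{odds}(2 \cdot \mathrm{PSA})}{\operatorname{odds}(\mathrm{PSA})}
  &= e^{\beta_{\text{PSA}} + \beta_{\text{PSA:DRE}}} && \text{(abnormal DRE, DRE = 1)}
\end{align*}
```

The interaction term therefore means that the effect of doubling PSA is not the same for patients with normal and abnormal DRE.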