Sebastian Szubert¹, Andrzej Wojtowicz², Patryk Zywica². ¹ Division of Gynecological Surgery, Poznan University of Medical Sciences, 33 Polna St., 60-535 Poznan, Poland. ² Faculty of Mathematics and Computer Science, Adam Mickiewicz University, 87 Umultowska St., 61-614 Poznan, Poland.
Dear Editor,

We carefully read the letter by Van Calster et al. (Van Calster et al., 2016a) concerning our article (Szubert et al., 2016). We thank the authors for their valuable feedback on our work, which we greatly appreciate. However, some of the guidelines mentioned by the authors were published only recently, so we could not conform to them when our study was designed. We chose a different methodology, which nevertheless remains mathematically valid. We carefully considered each comment and agree with most of them. We have therefore extended our results to gain deeper insight into the performance of the ADNEX model, and we hope this new approach will facilitate drawing reliable conclusions. We are aware that a larger study group would make the results more valuable. Following the suggestion of Van Calster et al. (Van Calster et al., 2016a), we combined the datasets from the two centers and performed a subsequent analysis.
Study group
The combined dataset consists of 327 patients: 104 with malignant and 223 with benign ovarian tumors. Detailed patient characteristics are presented in the Supplementary materials (Tables 1 and 2). To evaluate both versions of the ADNEX model (with and without CA-125) on the same group of patients, we imputed the 9 missing CA-125 values by predictive mean matching in multiple imputation with 100 datasets. The predictors were age, menopausal status, morphology, number of locules, ascites, shadowing, number of papillary projections, maximum lesion diameter and maximum diameter of the solid part. Statistical analyses were performed in R version 3.3.1 (2016-06-21).
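Predictive mean matching (PMM) fills each missing value with an observed value taken from a "donor" whose predicted value lies close to that of the incomplete case. The Python sketch below illustrates the idea and is not the R code used in the study. It fits a single least-squares model and draws a random donor among the k nearest. Full multiple-imputation PMM implementations typically also draw the regression coefficients from their posterior distribution for each imputed dataset, and this sketch omits that step. The number of donors (5 by default here) and the numeric coding of the predictors are assumptions, and the function names are hypothetical.

```python
import random

def _solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(xs, ys):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return _solve(xtx, xty)

def pmm_impute(xs, y, k=5, rng=None):
    """Return a copy of y with None entries filled by predictive mean matching.

    xs: numeric predictor rows (e.g. age, coded menopausal status, ...), one per patient.
    y: outcome to impute (e.g. CA-125), with None where missing.
    """
    rng = rng or random.Random()
    obs = [i for i, v in enumerate(y) if v is not None]
    beta = ols([xs[i] for i in obs], [y[i] for i in obs])
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in xs]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # The k observed cases with the closest predicted values act as donors.
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out[i] = y[rng.choice(donors)]  # imputed value is always an observed one
    return out

def multiply_impute(xs, y, m=100, k=5, seed=0):
    """Create m completed copies of y (only the donor choice varies in this sketch)."""
    rng = random.Random(seed)
    return [pmm_impute(xs, y, k, rng) for _ in range(m)]
```

Because PMM only ever copies observed values, the imputed CA-125 values stay within the observed range and keep its skewed distribution.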
Calibration
We investigated calibration for both ADNEX models. For ADNEX with CA-125, the calibration intercept and slope were −0.451 (−0.843 to −0.065) and 0.843 (0.684 to 1.028), respectively. This may suggest that predicted risks are, on average, overestimated (intercept < 0). For ADNEX without CA-125, the intercept and slope were −0.274 (−0.641 to 0.089) and 0.805 (0.652 to 0.979), respectively. Such results may imply overfitting of the model (slope < 1). These intercepts and slopes deviate only slightly from the ideal values (intercept 0, slope 1). For this reason the calibration plots should be examined. However, according to Van Calster et al., a graphical assessment requires at least 200 events and 200 nonevents (Van Calster et al., 2016b). Our study group does not fulfil this condition, so we omit a detailed analysis of the plots (Fig. 1).
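Calibration intercepts and slopes such as those above are commonly estimated with two logistic regressions on the logit of the predicted risk of malignancy. The slope is the coefficient of logit(p) in a model with a free intercept. The intercept (calibration-in-the-large) is estimated with logit(p) entered as a fixed offset, so a negative value means that the predicted risks are too high on average. The letter does not state the exact estimation procedure, so this standard-library Python sketch (Newton-Raphson) is an assumption about the method rather than the authors' code. It requires predicted risks strictly between 0 and 1 and does not compute confidence intervals.

```python
import math

def _logit(p):
    return math.log(p / (1.0 - p))

def _expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibration_slope(p, y, iters=50, tol=1e-10):
    """Fit y ~ a + b * logit(p) by Newton-Raphson; returns (a, b), b = calibration slope.

    Needs at least two distinct predicted risks (otherwise the Hessian is singular).
    """
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iters):
        mu = [_expit(a + b * z) for z in lp]
        g0 = sum(yi - m for yi, m in zip(y, mu))              # score for a
        g1 = sum((yi - m) * z for yi, m, z in zip(y, mu, lp))  # score for b
        w = [m * (1.0 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * z for wi, z in zip(w, lp))
        h11 = sum(wi * z * z for wi, z in zip(w, lp))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # inverse-information step
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def calibration_intercept(p, y, iters=50, tol=1e-10):
    """Fit y ~ a + offset(logit(p)); returns a (calibration-in-the-large)."""
    lp = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_expit(a + z) for z in lp]
        step = sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1.0 - m) for m in mu)
        a += step
        if abs(step) < tol:
            break
    return a
```

For example, if every tumor is given a 50% risk but only 2 of 10 are malignant, `calibration_intercept` returns log(0.25), about −1.39. That is the expected sign for systematic overestimation.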
Results and discussion
In the current analysis, the ADNEX model with CA-125 had a significantly higher AUC than ADNEX without CA-125 (0.927, 95% CI 0.888–0.958 vs 0.907, 95% CI 0.868–0.941; DeLong's test p = 0.009; see Table 3). This may suggest that omitting CA-125 leads to a significant drop in performance. AUCs for both models remain high. However, the confidence intervals are wide, so this problem needs to be investigated in further research.

In the primary study we investigated only the 10% cutoff, as proposed in the original report (Van Calster et al., 2014). However, the ADNEX model may perform differently depending on the prevalence of ovarian cancer, and according to the guidelines by Van Calster et al., some clinical centers may prefer different cutoffs (Van Calster et al., 2015). For this reason, we extended our original results and analyzed ADNEX at cutoffs from 3% to 30%. Detailed results of ADNEX performance at the various cutoffs are provided in the Supplementary materials (Tables 4 and 5). These are important findings of our study, showing the highest accuracy of the model at the 30% threshold. This agrees with the guidelines by Van Calster et al., which indicate that a 30% cutoff may be more appropriate for tertiary centers (Van Calster et al., 2015).

When tumor groups were analyzed in pairs, the AUCs indicated good overall performance. In general, the results confirm the good polytomous discrimination reported in the first study by Van Calster et al. and are comparable to later ADNEX evaluation studies (Table 6) (Van Calster et al., 2014, Araujo et al., 2016, Meys et al., 2016, Sayasneh et al., 2016). Both polytomous discrimination indexes (PDI) were above 0.2, the value expected from random performance with five outcome categories. The models had difficulty discriminating between some groups: borderline vs stage I, stage I vs stage II-IV, stage II-IV vs metastatic and, in particular, stage I vs metastatic.
However, the low AUCs for these pairs might be caused by the low prevalence of some malignant subclasses in the study group.

We paid special attention to ‘multiclass’ performance evaluation. Van Calster et al. provided guidelines for applying the ADNEX model in clinical practice. The authors “believe that the aim should not be to classify tumors into a single subgroup of malignancy” and recommend assessing the risk per type of malignancy (Van Calster et al., 2015). However, in our external validation setting we wanted to know how the ADNEX models behave when they have to differentiate between all five categories. This can be done by assessing the relative change in risk, as the authors describe in the aforementioned article (Van Calster et al., 2015). We used the prevalence in the total pooled dataset described by Van Calster et al. (Van Calster et al., 2014) and investigated the performance of the ADNEX models at the 10% cutoff (see Tables 7–10). Using relative rather than absolute risks does improve performance. However, many stage II-IV tumors are misclassified as borderline tumors. Metastatic tumors are also problematic, as they are commonly misclassified as other malignant types. This could be investigated with a larger study group, since our validation includes only 10 such cases. Nevertheless, the results confirm that the ADNEX models should not be used in this manner. We therefore speculate that the best way to evaluate ADNEX performance would be a prospective analysis of clinical decision making based on ADNEX results.
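DeLong's test compares two AUCs estimated on the same patients. It uses the structural components of the Mann-Whitney statistic to estimate the variance of the AUC difference, taking into account that both models are scored on the same tumors. Below is a plain-Python sketch. It assumes that both models' risks are available for the same malignant (events) and benign (non-events) tumors, listed in the same patient order. It is not the software used in the study (the letter states only that R was used), and the name `delong_test` is hypothetical.

```python
import math

def _psi(x, y):
    """Mann-Whitney kernel: 1 if the event scores higher, 0.5 if tied, else 0."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def delong_test(pos1, neg1, pos2, neg2):
    """Two-sided DeLong test for two correlated AUCs.

    pos*/neg*: risks of the same events / non-events under model 1 and model 2.
    Returns (auc1, auc2, z, p).
    """
    m, n = len(pos1), len(neg1)
    models = [(pos1, neg1), (pos2, neg2)]
    # Structural components: one value per event (v10) and per non-event (v01).
    v10 = [[sum(_psi(x, y) for y in neg) / n for x in pos] for pos, neg in models]
    v01 = [[sum(_psi(x, y) for x in pos) / m for y in neg] for pos, neg in models]
    aucs = [sum(v) / m for v in v10]  # the mean of v10 (or v01) is the AUC

    def cov(a, b, mean_a, mean_b):
        return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

    s10 = [[cov(v10[r], v10[c], aucs[r], aucs[c]) for c in range(2)] for r in range(2)]
    s01 = [[cov(v01[r], v01[c], aucs[r], aucs[c]) for c in range(2)] for r in range(2)]
    var = ((s10[0][0] + s10[1][1] - 2 * s10[0][1]) / m
           + (s01[0][0] + s01[1][1] - 2 * s01[0][1]) / n)
    if var <= 0:
        # Degenerate case (e.g. identical models): nothing to test.
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], z, p
```

The covariance terms are what distinguish this test from comparing two independent AUCs. Because the two ADNEX versions are scored on the same tumors, their errors are positively correlated and the difference is estimated more precisely.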
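To evaluate ADNEX over a range of cutoffs, the predicted risk of malignancy is dichotomized at each threshold and a 2×2 table is tabulated for each one. Below is a minimal Python sketch. It assumes that a tumor is called malignant when its risk is at or above the cutoff; the letter does not say how risks exactly at the threshold are treated. The grid of whole-percent cutoffs from 3% to 30% is a hypothetical choice, because the letter gives only the range.

```python
# Hypothetical grid: every whole percent from 3% to 30%.
HYPOTHETICAL_CUTOFFS = [c / 100 for c in range(3, 31)]

def performance_at_cutoffs(risk, malignant, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Sensitivity, specificity and accuracy when risk >= cutoff means 'malignant'."""
    rows = []
    for c in cutoffs:
        tp = sum(1 for r, m in zip(risk, malignant) if m and r >= c)
        fn = sum(1 for r, m in zip(risk, malignant) if m and r < c)
        tn = sum(1 for r, m in zip(risk, malignant) if not m and r < c)
        fp = sum(1 for r, m in zip(risk, malignant) if not m and r >= c)
        rows.append({
            "cutoff": c,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(risk),
        })
    return rows
```

Raising the cutoff trades sensitivity for specificity. Overall accuracy can therefore peak at a high threshold when benign tumors outnumber malignant ones, as they do in this study group (223 vs 104).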
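The PDI generalizes the c-statistic to more than two outcome categories. For each category it is the proportion of sets, each containing one patient from every category, in which the patient from that category has the highest predicted probability of that category. The per-category values are then averaged. With five categories, a model that discriminates at random scores 0.2. The sketch below computes this without enumerating every set. It relies on the fact that, for a fixed patient, the comparisons against the different other categories are independent. Ties are counted as losses here. This is a simplifying assumption that may differ from how published implementations treat ties, and the function name is hypothetical.

```python
from math import prod

def pdi(probs, labels, k):
    """Polytomous discrimination index (ties counted as losses).

    probs[i][c]: predicted probability of category c for patient i.
    labels[i]: true category of patient i, in 0..k-1 (every category must occur).
    Returns (overall PDI, list of per-category values).
    """
    groups = [[p for p, l in zip(probs, labels) if l == c] for c in range(k)]
    total_sets = prod(len(g) for g in groups)  # number of one-per-category sets
    per_cat = []
    for c in range(k):
        hits = 0
        for p in groups[c]:
            # Sets containing this patient in which it wins: every patient drawn
            # from another category must have a lower probability of category c.
            hits += prod(sum(1 for q in groups[j] if q[c] < p[c])
                         for j in range(k) if j != c)
        per_cat.append(hits / total_sets)
    return sum(per_cat) / k, per_cat
```

The per-category values show which subtype drives a low overall PDI. For ADNEX, those would be the low-prevalence malignant subclasses.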
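One way to apply the relative-risk approach to five-category classification is as follows. Each predicted subtype risk is divided by that subtype's baseline prevalence, and the tumor is assigned to the subtype with the largest ratio. The sketch below assumes this rule, combined with the 10% cutoff on the total risk of malignancy to separate benign from malignant tumors. The letter does not spell out the exact decision rule. The prevalence values and example risks are hypothetical placeholders, not the pooled-dataset prevalences from Van Calster et al. (2014).

```python
CATEGORIES = ["benign", "borderline", "stage I", "stage II-IV", "metastatic"]

# Hypothetical placeholder prevalences (sum to 1); NOT the published ADNEX values.
HYPOTHETICAL_PREVALENCE = {"benign": 0.60, "borderline": 0.08, "stage I": 0.08,
                           "stage II-IV": 0.18, "metastatic": 0.06}

def classify_relative_risk(risks, prevalence, cutoff=0.10):
    """Assign one category using relative risk = predicted risk / baseline prevalence.

    Assumed rule: total risk of malignancy below `cutoff` -> benign; otherwise
    the malignant subtype with the largest relative risk.
    """
    if 1.0 - risks["benign"] < cutoff:
        return "benign"
    return max(CATEGORIES[1:], key=lambda c: risks[c] / prevalence[c])

# Hypothetical predicted risks for one tumor.
EXAMPLE_RISKS = {"benign": 0.50, "borderline": 0.15, "stage I": 0.10,
                 "stage II-IV": 0.20, "metastatic": 0.05}
```

With these illustrative numbers, the absolute risk favors stage II-IV (0.20), whereas the relative risk favors borderline (0.15 / 0.08 ≈ 1.9 vs 0.20 / 0.18 ≈ 1.1). This shows how the choice of baseline shifts the assigned subtype.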
References
Authors: K G Araujo; R M Jales; P N Pereira; A Yoshida; L de Angelo Andrade; L O Sarian; S Derchain. Journal: Ultrasound Obstet Gynecol. Date: 2017-04-12.
Authors: Sebastian Szubert; Andrzej Wojtowicz; Rafal Moszynski; Patryk Zywica; Krzysztof Dyczkowski; Anna Stachowiak; Stefan Sajdak; Dariusz Szpurek; Juan Luis Alcazar. Journal: Gynecol Oncol. Date: 2016-06-30.
Authors: B Van Calster; K Van Hoorde; W Froyman; J Kaijser; L Wynants; C Landolfo; C Anthoulakis; I Vergote; T Bourne; D Timmerman. Journal: Facts Views Vis Obgyn. Date: 2015.
Authors: Ben Van Calster; Kirsten Van Hoorde; Lil Valentin; Antonia C Testa; Daniela Fischerova; Caroline Van Holsbeke; Luca Savelli; Dorella Franchi; Elisabeth Epstein; Jeroen Kaijser; Vanya Van Belle; Artur Czekierdowski; Stefano Guerriero; Robert Fruscio; Chiara Lanzani; Felice Scala; Tom Bourne; Dirk Timmerman. Journal: BMJ. Date: 2014-10-15.
Authors: A Sayasneh; L Ferrara; B De Cock; S Saso; M Al-Memar; S Johnson; J Kaijser; J Carvalho; R Husicka; A Smith; C Stalder; M C Blanco; G Ettore; B Van Calster; D Timmerman; T Bourne. Journal: Br J Cancer. Date: 2016-08-02.
Authors: E M J Meys; L S Jeelof; N M J Achten; B F M Slangen; S Lambrechts; R F P M Kruitwagen; T Van Gorp. Journal: Ultrasound Obstet Gynecol. Date: 2017-06.