
Comment on "Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine".

Jamal Rahmani, Roya Karimi, Yousef Khani, Siamak Sabour.

Abstract


Keywords:  AUC; area under the curve; lung cancer; prediction

Year:  2020        PMID: 32930665      PMCID: PMC7525401          DOI: 10.2196/14944

Source DB:  PubMed          Journal:  J Med Internet Res        ISSN: 1438-8871            Impact factor:   5.428


We read the recent article by Wang et al [1] with great interest. This paper was published in 2019 in the Journal of Medical Internet Research. The authors aimed to develop and validate a prospective risk prediction model to identify patients in the general population at risk of incident lung cancer within the next year. They used individual patient electronic health records (EHRs) extracted from the Maine Health Information Exchange network. The Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model, and the authors reported an area under the curve (AUC) of 0.88 (95% CI 0.87-0.88) for model validation based on prospective cohort data. The authors concluded that their model was able to identify high-risk patients statewide.

Risk prediction models are useful because of their role in clinical decision making. However, there are some methodological points we would like to raise.

First, the AUC is an appropriate measure for assessing discrimination, defined as the ability to distinguish events from nonevents. Conceptually, it considers two randomly selected persons, one who will develop the disease and one who will not, and equals the probability that the model assigns the higher predicted risk to the person who will develop the disease. A c-index of 0.5 reflects random chance, while the usual c-index for a prediction model is 0.60 to 0.85; this range can vary under different conditions. What should always be kept in mind about the AUC is that a high value indicates excellent discrimination, but it can also reflect a situation of limited relevance. This can arise when a variable relates to diagnosis or early onset of the disease rather than to prediction [2,3]. Furthermore, the receiver operating characteristic (ROC) curve is a good tool for binary classification, but it is less instrumental for risk stratification.
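To make the pairwise interpretation of the AUC concrete, the c-index can be computed directly as a concordance probability over all event/nonevent pairs. The sketch below is purely illustrative; the risk scores are made up and unrelated to the Wang et al model:

```python
def c_index(scores, labels):
    """c-index: probability that a randomly chosen event receives a
    higher predicted risk than a randomly chosen nonevent.
    Tied scores count as 0.5, per the usual c-statistic convention."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    pairs = concordant = 0.0
    for e in events:
        for n in nonevents:
            pairs += 1
            if e > n:
                concordant += 1
            elif e == n:
                concordant += 0.5
    return concordant / pairs

# Illustrative predicted risks (1 = develops lung cancer, 0 = does not)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(round(c_index(scores, labels), 3))  # 0.833; 0.5 would be chance level
```

Note that this concordance probability says nothing about calibration or clinical relevance, which is exactly why a high AUC alone can be misleading.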
For risk stratification (low- and high-risk bins), sensitivity at low and at high specificity, together with the positive predictive value (PPV) in the high-risk bins, are more discriminating measures of the algorithm's ability.

Second, there are several types of external validation: validation in more recent patients (temporal validation), in other places (geographic validation), or by other investigators at other sites (fully independent validation). Given two exemplary data sets with very large sample sizes, we would suggest testing these forms of external validity. Moreover, internal validation is a necessary part of model development. It determines the reproducibility of a developed prediction model for the derivation sample and prevents overinterpretation of the data. Resampling techniques such as cross-validation and bootstrapping can be applied; bootstrap validation, in particular, appears to be the most attractive option for obtaining stable optimism-corrected estimates [2]. Furthermore, it is important that the authors also validate the model on data produced in the real world after deployment, since this would be more revealing given the unexpected data challenges encountered during real-time use by clinical providers.

Third, a very common mistake concerns reliance on statistically significant P values. A P value reflects statistical rather than clinical logic; researchers should therefore judge outputs based on effect size rather than on the P value alone. A further common issue is missing data, which can influence model development. Missing data often follow a nonrandom pattern, with an explanation and a cause behind them. If all missing values are simply removed, that cause and explanation are lost, which may affect both the conclusions and the model.
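The optimism-corrected bootstrap can be sketched as follows. The "model" here is a deliberately toy one (a single accuracy-maximizing cutoff) so the example stays self-contained; in practice the full fitting procedure (in this paper, XGBoost) would be repeated inside the loop, and a metric such as the c-index would replace accuracy:

```python
import random

def accuracy(xs, ys, cut):
    """Fraction of cases where the rule (x >= cut -> positive) is correct."""
    return sum((x >= cut) == bool(y) for x, y in zip(xs, ys)) / len(xs)

def fit_cutoff(xs, ys):
    """Toy model-fitting step: choose the cutoff that maximizes accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(xs)):
        acc = accuracy(xs, ys, cut)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def optimism_corrected(xs, ys, n_boot=200, seed=0):
    """Harrell-style bootstrap: apparent performance minus the average
    optimism (bootstrap-sample performance minus original-sample
    performance of the model refit on each bootstrap sample)."""
    rng = random.Random(seed)
    apparent = accuracy(xs, ys, fit_cutoff(xs, ys))
    n, optimism = len(xs), 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        cut = fit_cutoff(bx, by)  # refit on the bootstrap sample
        optimism += accuracy(bx, by, cut) - accuracy(xs, ys, cut)
    return apparent - optimism / n_boot
```

The corrected estimate is typically lower than the apparent one, which is precisely the over-optimism that internal validation is meant to expose.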
To generate the model, multivariable regression is usually applied in a stepwise fashion (backward elimination is preferable), and concomitantly checking the Akaike information criterion (AIC) can help decide whether the model fits well enough. Finally, it is important to investigate the interactions between variables in prediction studies. Developing a model, score, or index without considering interactions among variables may change the predictions in the real world and lead to misleading messages [3-5].
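As a minimal sketch of the AIC and interaction points, one can compare a main-effects model against one that includes an interaction term and prefer the lower AIC. The data are synthetic and the model is plain least squares, not the authors' gradient-boosting pipeline:

```python
import math

def ols_rss(X, y):
    """Residual sum of squares from ordinary least squares,
    solving the normal equations by Gaussian elimination."""
    n, k = len(y), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    v = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):            # back substitution
        beta[r] = (v[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, k))) / A[r][r]
    return sum((y[i] - sum(X[i][c] * beta[c] for c in range(k))) ** 2
               for i in range(n))

def aic(X, y):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(y)
    return n * math.log(ols_rss(X, y) / n) + 2 * len(X[0])

# Synthetic data with a genuine x1*x2 interaction plus small noise
X_main = [[1.0, i % 4, i // 4] for i in range(12)]
X_int = [row + [row[1] * row[2]] for row in X_main]
y = [1 + 2 * (i % 4) + 3 * (i // 4) + 4 * (i % 4) * (i // 4)
     + 0.1 * (-1) ** i for i in range(12)]
print(aic(X_int, y) < aic(X_main, y))  # the interaction model wins on AIC
```

Omitting the interaction column here would leave a large systematic residual, which is the kind of real-world prediction error the comment warns about.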
  5 in total

1.  Prognosis and prognostic research: what, why, and how?

Authors:  Karel G M Moons; Patrick Royston; Yvonne Vergouwe; Diederick E Grobbee; Douglas G Altman
Journal:  BMJ       Date:  2009-02-23

2.  Predictive value of confocal scanning laser for the onset of visual field loss.

Authors:  Siamak Sabour; Fariba Ghassemi
Journal:  Ophthalmology       Date:  2013-06       Impact factor: 12.079

3.  Prediction of preterm delivery using levels of VEGF and leptin in amniotic fluid from the second trimester: prediction rules.

Authors:  Siamak Sabour
Journal:  Arch Gynecol Obstet       Date:  2014-12-10       Impact factor: 2.344

4.  Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine.

Authors:  Xiaofang Wang; Yan Zhang; Shiying Hao; Xuefeng B Ling; Le Zheng; Jiayu Liao; Chengyin Ye; Minjie Xia; Oliver Wang; Modi Liu; Ching Ho Weng; Son Q Duong; Bo Jin; Shaun T Alfreds; Frank Stearns; Laura Kanov; Karl G Sylvester; Eric Widen; Doff B McElhinney
Journal:  J Med Internet Res       Date:  2019-05-16       Impact factor: 5.428

5.  How to Develop, Validate, and Compare Clinical Prediction Models Involving Radiological Parameters: Study Design and Statistical Methods. (Review)

Authors:  Kyunghwa Han; Kijun Song; Byoung Wook Choi
Journal:  Korean J Radiol       Date:  2016-04-14       Impact factor: 3.500

