Risk prediction model for lung cancer incorporating metabolic markers: Development and internal validation in a Chinese population.

Zhangyan Lyu1, Ni Li1, Shuohua Chen2, Gang Wang3, Fengwei Tan4, Xiaoshuang Feng1, Xin Li1, Yan Wen1, Zhuoyu Yang1, Yalong Wang4, Jiang Li1, Hongda Chen1, Chunqing Lin1, Jiansong Ren1, Jufang Shi1, Shouling Wu2, Min Dai1, Jie He4.   

Abstract

BACKGROUND: Low-dose computed tomography screening has been shown to reduce lung cancer mortality; however, the problems of high false-positive rates and overdiagnosis remain unsolved. Risk prediction models that accurately identify high-risk populations may help increase screening efficiency. We therefore sought to develop a risk prediction model for lung cancer incorporating epidemiological factors and metabolic markers in a Chinese population.
METHODS: Between 2006 and 2015, 122 497 people were followed prospectively for lung cancer incidence, accumulating 976 663 person-years. Stepwise multivariable-adjusted logistic regression (P entry = .15, P stay = .20) was used to select candidate variables for the prediction model, including demographic factors and metabolic markers such as high-sensitivity C-reactive protein (hsCRP) and low-density lipoprotein cholesterol (LDL-C). We used the C-statistic to evaluate discrimination and the Hosmer-Lemeshow test to evaluate calibration. Tenfold cross-validation was conducted for internal validation to assess the model's stability.
RESULTS: A total of 984 lung cancer cases were identified during follow-up. The epidemiological model, including age, gender, smoking status, alcohol intake status, coal dust exposure status, and body mass index, yielded a C-statistic of 0.731. The full model, which additionally included hsCRP and LDL-C, showed significantly better discrimination (C-statistic = 0.735, P = .033). In stratified analyses, the full model showed better predictive power in terms of the C-statistic in younger participants (<50 years, 0.709), females (0.726), and former or current smokers (0.742). The model was well calibrated across deciles of predicted risk in both the overall population (P HL = .689) and all subgroups.
CONCLUSIONS: We developed and internally validated an easy-to-use risk prediction model for lung cancer among the Chinese population that could provide guidance for screening and surveillance.
© 2020 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

Keywords:  lung cancer; metabolic markers; prospective study; risk prediction model

Year:  2020        PMID: 32253829      PMCID: PMC7286442          DOI: 10.1002/cam4.3025

Source DB:  PubMed          Journal:  Cancer Med        ISSN: 2045-7634            Impact factor:   4.452


INTRODUCTION

Lung cancer remains the leading cause of cancer death worldwide. In China, lung cancer is a major public health problem: according to data from the International Agency for Research on Cancer (IARC), 37.0% of new lung cancer cases and 39.2% of lung cancer-related deaths worldwide in 2018 occurred in China. The survival rate for lung cancer in China is poor (16.1%), but prognosis varies greatly by stage at diagnosis: 5-year survival is <10% for patients with stage IV disease but exceeds 77% for patients diagnosed at stage I. Taken together, early detection and prevention strategies could substantially reduce the overall disease burden attributable to lung cancer. Lung cancer screening has been shown to be beneficial: in 2011, the National Lung Screening Trial showed that low-dose computed tomography (LDCT) screening reduced lung cancer mortality among asymptomatic high-risk smokers. The US Preventive Services Task Force (USPSTF) subsequently recommended annual LDCT screening for adults aged 55-80 years with a smoking history of ≥30 pack-years who are current smokers or former smokers who quit <15 years ago. However, LDCT screening detects a large number of indeterminate nodules, and a substantial proportion of lung cancer cases do not meet the USPSTF screening criteria. Accurate identification of the high-risk subpopulation to be screened is therefore critical to maximizing the efficiency of lung cancer screening, and an accurate lung cancer risk prediction model can contribute effectively to identifying high-risk individuals. Several lung cancer risk prediction models have been developed, primarily on the basis of established risk factors such as smoking, occupational exposures, family history of lung cancer, and respiratory diseases.
Previous studies have shown that lipids and high-sensitivity C-reactive protein (hsCRP) are predictive of lung cancer risk. However, evidence on the predictive performance of these markers beyond smoking-based epidemiological models is limited. Moreover, no lung cancer risk prediction model based on both traditional epidemiological risk factors and biomarkers has been developed for the population of mainland China. Therefore, in the present study, focusing on established lung cancer risk factors routinely available in general clinical settings, we aimed to develop and internally validate a risk prediction model for lung cancer.

METHODS

Study population

The Kailuan cohort is a large, prospective, dynamic cohort study in Tangshan City, China. Details of the study design and procedures have been published previously. In brief, since May 2006, a total of 138 150 employees, including retirees, aged older than 18 years have been invited to participate in questionnaire interviews and clinical examinations every 2 years at 11 hospitals affiliated with the Kailuan Group. Participants who provided informed consent and completed the questionnaire interview were enrolled in the present study. Participants who had been diagnosed with cancer before the baseline survey (n = 555) or who had missing information on covariates included in the models (n = 15 653) were excluded. Ultimately, 122 497 participants were included in the final analysis. The study was approved by the Medical Ethics Committee of the Kailuan Medical Group, and all participants signed written informed consent forms.

Exposure assessment

At baseline, trained staff administered standardized questionnaires and health examinations to all individuals. Information on demographics, lifestyle factors, personal medical history, and family history of common noninfectious chronic diseases (NCDs) was collected as potential predictors. Smoking was defined as smoking ≥1 cigarette per week for at least 12 months, and drinking as drinking ≥1 time per month for at least 6 months. Information on coal dust exposure was derived from each miner's work history. Height and weight were measured with standard stadiometers and scales while participants were not wearing shoes, and body mass index (BMI) was calculated as weight (kg) divided by height squared (m²). Waist circumference (WC) was measured at the midpoint between the upper margin of the iliac crest and the lower edge of the rib cage. Blood pressure (BP) was measured on the left arm with a mercury sphygmomanometer according to standard recommended procedures; systolic blood pressure (SBP) was defined as the point at which the first of two or more Korotkoff sounds was heard, and diastolic blood pressure (DBP) as the disappearance of Korotkoff sounds. Morning fasting venous blood samples were obtained from all participants and processed and analyzed according to a standard operating procedure. Fasting blood glucose (FBG) was measured with the hexokinase method. Details of the measurement of blood lipids, including total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C), have been described in previously published studies.
The potential predictors assessed included age (<45, 45-55, 55-65, or ≥65 years), gender (male or female), educational level (illiterate or primary school, junior high school, senior high school, or college and above), coal dust exposure status (nonexposure or exposure), degree of coal dust exposure (light, moderate, or heavy), smoking status (never, former, or current), pack-years of smoking (continuous), duration of smoking (<15, 15-30, or ≥30 years), age started smoking (<20 or ≥20 years), smoking cessation time (<15 or ≥15 years), family history of cancer (yes or no), family history of lung cancer (yes or no), alcohol intake status (never, former, <1 time per day, or ≥1 time per day), BMI (<18.5, 18.5-23.9, 24.0-27.9, or ≥28.0 kg/m²), abdominal obesity (WC ≥90 cm for men, ≥80 cm for women), FBG (<3.9, 3.9-5.6, 5.6-7.0, or ≥7.0 mmol/L), BP (low: SBP ≤90 mm Hg or DBP ≤60 mm Hg; normal; or high: SBP ≥140 mm Hg or DBP ≥90 mm Hg), TC (quintiles), TG (quintiles), LDL-C (quintiles), HDL-C (quintiles), and hsCRP.
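To make the category definitions concrete, the following minimal Python sketch maps a participant's raw baseline measurements onto the predictor levels used in the models. It is an illustration only, not the authors' code: the function and field names are hypothetical, and we assume each range such as "45-55" includes its lower bound and excludes its upper bound. BMI is computed as weight (kg) divided by height (m) squared; the LDL-C cut points (70, 87, 100, 120 mg/dL) and hsCRP cut points (1.0, 3.0 mg/L) are taken from the rows of Table 2 rather than recomputed as quintiles.

```python
from bisect import bisect_right


def categorize(value, cut_points, labels):
    """Return the label of the interval that contains value.

    Intervals are [cut_i, cut_(i+1)): lower bound inclusive, upper bound
    exclusive. This boundary convention is an assumption for illustration.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def code_predictors(age, sex, weight_kg, height_m, waist_cm, ldl_c_mg_dl, hscrp_mg_l):
    """Map raw baseline measurements to categorical predictor levels.

    Hypothetical helper; the field names and dict layout are illustrative.
    """
    return {
        "age": categorize(age, [45, 55, 65], ["<45", "45-55", "55-65", ">=65"]),
        "gender": sex,
        "bmi": categorize(bmi(weight_kg, height_m), [18.5, 24.0, 28.0],
                          ["<18.5", "18.5-23.9", "24.0-27.9", ">=28.0"]),
        # Sex-specific waist circumference thresholds for abdominal obesity
        "abdominal_obesity": waist_cm >= (90 if sex == "male" else 80),
        # Cut points from the LDL-C rows of Table 2 (mg/dL)
        "ldl_c": categorize(ldl_c_mg_dl, [70, 87, 100, 120],
                            ["<70", "70-87", "87-100", "100-120", ">=120"]),
        # Cut points from the hsCRP rows of Table 2 (mg/L)
        "hscrp": categorize(hscrp_mg_l, [1.0, 3.0], ["<1.0", "1.0-3.0", ">=3.0"]),
    }


print(code_predictors(age=58, sex="male", weight_kg=70, height_m=1.70,
                      waist_cm=92, ldl_c_mg_dl=95, hscrp_mg_l=2.1))
```

Using `bisect_right` keeps each variable's definition to a list of cut points plus labels, so the same helper covers every categorized predictor.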

Ascertainment of lung cancer cases

We followed participants from the baseline examination until the occurrence of cancer, death, or 31 December 2015, whichever came first. Details of cohort follow-up and cancer ascertainment have been published previously [42]. In brief, people with cancer were identified through biennial health examinations and annual searches of the Tangshan medical insurance system and the Kailuan social security system. Outcome information was further confirmed by checking discharge summaries from the hospitals where participants were diagnosed or treated. Incident primary lung cancer diagnoses were confirmed through review of medical records by clinical experts. Information on pathological diagnosis, imaging diagnosis (including ultrasonography, computed tomography, and magnetic resonance imaging), blood biochemical examinations, and alpha-fetoprotein testing was collected for the assessment of incident lung cancer. Cancers were coded according to the International Classification of Diseases, Tenth Revision (ICD-10), and lung cancer was coded as C34.

Statistical methods

Categorical variables are described as percentages and were compared between groups with the chi-squared test; continuous variables are described as mean (standard deviation) and were compared between groups with ANOVA. For each risk factor, the association with lung cancer risk was first assessed by logistic regression adjusted for age group. Stepwise multivariable-adjusted logistic regression (P entry = .15, P stay = .20) was then used to choose the variables included in the prediction model. Odds ratios (ORs) and 95% confidence intervals (CIs) are presented. The predicted risk of lung cancer was calculated as

$$P = \frac{\exp\left(\beta_0 + \sum_i \beta_i X_i\right)}{1 + \exp\left(\beta_0 + \sum_i \beta_i X_i\right)},$$

where $\beta_0$ is the intercept and $\beta_i$ is the regression coefficient for risk factor $X_i$.

Model discrimination was evaluated with receiver operating characteristic (ROC) curves and concordance statistics (C-statistics). Internal validation of model discrimination was performed by 10-fold cross-validation: the total cohort was randomly divided into 10 subsets, the prediction model was fitted in 90% of the population (training set), and lung cancer risk was predicted in the remaining 10% (validation set). This procedure was repeated for each of the 10 subsets, and the C-statistics were averaged. The Hosmer-Lemeshow goodness-of-fit test was used to evaluate model calibration by comparing observed and predicted probabilities; P HL > .05 indicated satisfactory calibration. Subgroup analyses were performed by age (<50 vs ≥50 years), gender (male vs female), and smoking status (never vs former or current smokers). Furthermore, we calculated the integrated discrimination improvement (IDI) and the net reclassification improvement (NRI) to evaluate the added predictive ability of new factors in the risk prediction models.
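The predicted-risk formula, the C-statistic, and the cross-validation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SAS analysis: the C-statistic is computed as the proportion of case-noncase pairs in which the case has the higher predicted risk (ties count one half), which for a binary outcome equals the area under the ROC curve. The `fit` argument of `cross_validated_c` is a hypothetical interface standing in for whatever model-fitting step is used, and the folds are stratified by outcome so that every fold contains cases; that stratification and the default seed are our choices, whereas the text describes a simple random division.

```python
import math
import random


def predicted_risk(intercept, coefs, x):
    """Logistic model: exp(b0 + sum b_i*x_i) / (1 + exp(b0 + sum b_i*x_i)).

    Computed in the algebraically equivalent form 1 / (1 + exp(-eta)).
    """
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))


def c_statistic(risks, outcomes):
    """Probability that a random case has a higher predicted risk than a
    random noncase; ties count one half (equals the ROC AUC)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    noncases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    concordant = 0.0
    for rc in cases:
        for rn in noncases:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


def stratified_folds(outcomes, k, seed):
    """Split indices into k folds, spreading cases and noncases evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cross_validated_c(xs, outcomes, fit, k=10, seed=0):
    """Average held-out C-statistic over k folds.

    fit(train_xs, train_outcomes) must return a function mapping one
    participant's predictors to a predicted risk (hypothetical interface).
    """
    scores = []
    for fold in stratified_folds(outcomes, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(outcomes)) if i not in held_out]
        risk_fn = fit([xs[i] for i in train], [outcomes[i] for i in train])
        scores.append(c_statistic([risk_fn(xs[i]) for i in fold],
                                  [outcomes[i] for i in fold]))
    return sum(scores) / len(scores)


# Toy example with made-up values (not study data)
risks = [0.02, 0.10, 0.12, 0.30, 0.01]
outcomes = [0, 1, 0, 1, 0]
print(c_statistic(risks, outcomes))  # 5 of 6 case-noncase pairs concordant
```

The pairwise loop is quadratic and is written for clarity; at the size of this cohort a rank-based computation would be the practical choice.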
The NRI is based on reclassification tables constructed separately for participants with and without events, and quantifies correct movement between risk categories: upward for events and downward for nonevents. The IDI measures the improvement in the discrimination slope, that is, the difference in mean predicted probability between events and nonevents, from the base model (eg, the epidemiological model) to the new model (eg, the full model). Larger NRI and IDI values indicate greater improvement in model discrimination. In the secondary analysis, we evaluated all potential predictors among participants aged more than 50 years to assess the applicability of our model to the population targeted by LDCT screening. In sensitivity analyses, continuous variables were used instead of categorical variables to examine whether discrimination could be improved. All analyses were conducted using SAS software (version 9.4; SAS Institute). All statistical tests were two-sided, and the significance level was set at P < .05.
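The NRI and IDI definitions above can be illustrated with a short Python sketch of the standard formulas (Pencina et al, 2008). It is not the authors' implementation, and the risk-category cut points passed to `categorical_nri` are hypothetical, because the categories behind the reclassification results are not stated in this section. NRI is returned as a proportion; multiply by 100 to express it as a percentage.

```python
from bisect import bisect_right
from statistics import mean


def categorical_nri(p_old, p_new, outcomes, cut_points):
    """Net reclassification improvement across risk categories.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | nonevent) - P(up | nonevent)]
    """
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for po, pn, y in zip(p_old, p_new, outcomes):
        # Positive move = reclassified into a higher risk category
        move = bisect_right(cut_points, pn) - bisect_right(cut_points, po)
        count[y] += 1
        if move > 0:
            up[y] += 1
        elif move < 0:
            down[y] += 1
    event_part = (up[1] - down[1]) / count[1]
    nonevent_part = (down[0] - up[0]) / count[0]
    return event_part + nonevent_part


def idi(p_old, p_new, outcomes):
    """Integrated discrimination improvement: change in the discrimination
    slope (mean risk in events minus mean risk in nonevents)."""
    def slope(p):
        events = [pi for pi, y in zip(p, outcomes) if y == 1]
        nonevents = [pi for pi, y in zip(p, outcomes) if y == 0]
        return mean(events) - mean(nonevents)
    return slope(p_new) - slope(p_old)


# Toy example with made-up risks; the cut points 1.5% and 4% are hypothetical.
p_old = [0.02, 0.03, 0.02, 0.01]
p_new = [0.05, 0.02, 0.01, 0.01]
outcomes = [1, 1, 0, 0]
print(categorical_nri(p_old, p_new, outcomes, [0.015, 0.04]))
print(idi(p_old, p_new, outcomes))
```

In the toy data one event moves up and one nonevent moves down, so each half of the NRI contributes 0.5.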

RESULTS

Basic characteristics of the study population

A total of 122 497 participants were enrolled in this study, with a mean age of 50.53 years. The mean levels of BMI, WC, FBG, SBP, DBP, TC, TG, LDL-C, HDL-C, and hsCRP were 24.16 kg/m², 86.78 cm, 5.49 mmol/L, 130.34 mm Hg, 83.48 mm Hg, 190.97 mg/dL, 146.90 mg/dL, 93.58 mg/dL, 58.92 mg/dL, and 2.44 mg/L, respectively. The proportions of participants who had ever smoked and ever drunk alcohol were 34.70% and 39.09%, respectively (Table 1).
TABLE 1

Distribution of baseline characteristics by lung cancer status, Kailuan study, 2006‐2015

Characteristics | Total cohort (n = 122 497) | Lung cancer, yes (n = 984) | Lung cancer, no (n = 121 513) | P value
Age (years)ᵃ | 50.53 (13.17) | 60.00 (10.07) | 50.46 (13.16) | <.001
BMI (kg/m²)ᵃ | 24.16 (3.27) | 23.91 (3.26) | 24.16 (3.27) | .019
WC (cm)ᵃ | 86.78 (10.14) | 87.76 (9.90) | 86.78 (10.14) | .003
FBG (mmol/L)ᵃ | 5.49 (1.69) | 5.56 (1.89) | 5.49 (1.69) | .251
SBP (mm Hg)ᵃ | 130.34 (21.05) | 135.50 (21.52) | 130.30 (21.04) | <.001
DBP (mm Hg)ᵃ | 83.48 (11.81) | 84.01 (11.89) | 83.48 (11.81) | .159
TC (mg/dL)ᵃ | 190.97 (44.53) | 193.62 (43.24) | 190.95 (44.54) | .062
TG (mg/dL)ᵃ | 146.90 (131.47) | 145.06 (116.90) | 146.92 (131.58) | .620
LDL-C (mg/dL)ᵃ | 93.58 (38.58) | 91.00 (37.06) | 93.60 (38.59) | .036
HDL-C (mg/dL)ᵃ | 58.92 (18.03) | 59.44 (16.03) | 58.92 (18.05) | .304
hsCRP (mg/L)ᵃ | 2.44 (6.31) | 3.83 (12.27) | 2.43 (6.24) | <.001
Genderᵇ
Female | 25 695 (20.98) | 91 (9.25) | 25 604 (21.07) | <.001
Male | 96 802 (79.02) | 893 (90.75) | 95 909 (78.93) |
Education levelᵇ
Illiterate or primary school | 12 430 (10.15) | 192 (19.51) | 12 238 (10.07) | <.001
Junior high school | 81 169 (66.27) | 674 (68.50) | 80 495 (66.25) |
Senior high school | 18 124 (14.80) | 89 (9.04) | 18 035 (14.84) |
College and above | 10 761 (8.79) | 29 (2.95) | 10 732 (8.83) |
Smoking statusᵇ
Never | 79 995 (65.30) | 520 (52.85) | 79 475 (65.40) | <.001
Former | 4044 (3.30) | 45 (4.57) | 3999 (3.29) |
Current | 38 458 (31.40) | 419 (42.58) | 38 039 (31.30) |
Smoking pack-yearsᵇ
<20 | 18 209 (43.00) | 113 (24.35) | 18 096 (43.21) | <.001
20-40 | 18 088 (42.71) | 205 (44.18) | 17 883 (42.70) |
≥40 | 6051 (14.29) | 146 (31.47) | 5905 (14.10) |
Smoking duration (years)ᵇ
<15 | 6644 (15.69) | 30 (6.47) | 6614 (15.79) | <.001
15-30 | 17 992 (42.49) | 132 (28.45) | 17 860 (42.64) |
≥30 | 17 712 (41.48) | 302 (65.09) | 17 410 (41.57) |
Age started smoking (years)ᵇ
<20 | 15 866 (37.33) | 171 (36.85) | 15 695 (37.34) | .831
≥20 | 26 636 (62.67) | 293 (63.15) | 26 343 (62.66) |
Smoking cessation duration (years)ᵇ
<15 | 3214 (79.48) | 35 (77.78) | 3179 (79.49) | .777
≥15 | 830 (20.52) | 10 (22.22) | 820 (20.51) |
Alcohol intake statusᵇ
Never | 74 610 (60.91) | 556 (56.50) | 74 054 (60.94) | <.001
Former | 3587 (2.93) | 58 (5.89) | 3529 (2.90) |
Current | 44 300 (36.16) | 370 (37.60) | 43 930 (36.15) |
Coal dust exposure statusᵇ
Nonexposure | 57 784 (47.17) | 474 (48.17) | 57 310 (47.16) | .529
Exposure | 64 713 (52.83) | 510 (51.83) | 64 203 (52.84) |
Degree of coal dust exposureᵇ
Light | 33 680 (52.05) | 214 (41.96) | 33 466 (52.13) | <.001
Moderate | 13 570 (20.97) | 120 (23.53) | 13 450 (20.95) |
Heavy | 17 463 (26.99) | 176 (34.51) | 17 287 (26.93) |

Abbreviations: BMI, body mass index; DBP, diastolic blood pressure; FBG, fasting blood glucose; HDL-C, high-density lipoprotein cholesterol; hsCRP, high-sensitivity C-reactive protein; LDL-C, low-density lipoprotein cholesterol; SBP, systolic blood pressure; TC, total cholesterol; TG, triglycerides; WC, waist circumference.

ᵃ Mean (standard deviation); P values from ANOVA.

ᵇ N (%); P values from the chi-squared test.

By December 2015, over a median follow-up of 8.87 (7.09-9.15) years and a total of 976 663 person-years, 984 (0.80%) primary lung cancer cases were identified. Compared with participants without lung cancer, lung cancer cases were typically older, had a lower BMI and a lower educational level (junior high school or below), and were more likely to smoke and drink (all P < .05). Levels of LDL-C (P = .036) and BMI (P = .019) were lower in lung cancer cases, whereas levels of WC (P = .003), SBP (P < .001), and hsCRP (P < .001) were significantly higher in lung cancer cases than in those without lung cancer (Table 1).

Predictors included in models

In the multivariable logistic regression model, older age (45-55 years: OR = 4.36, 95% CI 3.25-5.86; 55-65 years: OR = 7.48, 5.60-10.01; ≥65 years: OR = 13.01, 9.77-17.57), male gender (OR = 1.77, 1.40-2.23), smoking status (former smoker: OR = 1.08, 0.77-1.52; current smoker: OR = 1.77, 1.50-2.07), former alcohol intake (OR = 1.36, 1.01-1.82), and higher hsCRP levels (1.0-3.0 mg/L: OR = 1.16, 1.00-1.35; ≥3.0 mg/L: OR = 1.20, 1.02-1.41) were positively associated with incident lung cancer risk, whereas inverse associations were observed for higher BMI (overweight [24.0 ≤ BMI < 28.0 kg/m²]: OR = 0.83, 0.72-0.95; obesity [BMI ≥28.0 kg/m²]: OR = 0.77, 0.62-0.96), coal dust exposure (OR = 0.89, 0.78-1.01), and higher LDL-C (≥120 mg/dL: OR = 0.72, 0.60-0.88) (Table 2). In addition, age-adjusted logistic regression showed positive associations of smoking duration and age started smoking with lung cancer risk. However, no association was found between lung cancer risk and smoking cessation time, family history of cancer, family history of lung cancer, abdominal obesity, FBG, BP, TC, TG, or HDL-C (Table S1).
TABLE 2

Age and multivariable adjusted ORs and 95% CIs of the predictors with lung cancer risk, Kailuan study, 2006‐2015

Predictors | Case/control | Age-adjusted OR (95% CI)ᵃ | Coefficient | Multivariable-adjusted OR (95% CI)ᵇ
Age, years
<45 | 55/37 447 | 1.00 | | 1.00
45-55 | 252/38 225 | 4.49 (3.35-6.01) | 1.461 | 4.36 (3.25-5.86)
55-65 | 334/29 048 | 7.82 (5.88-10.41) | 1.997 | 7.48 (5.60-10.01)
≥65 | 343/16 793 | 13.90 (10.45-18.48) | 2.562 | 13.01 (9.77-17.57)
P trend | | <.001 | | <.001
Gender
Female | 91/25 604 | 1.00 | | 1.00
Male | 893/95 909 | 2.11 (1.70-2.62) | 0.567 | 1.77 (1.40-2.23)
P | | <.001 | | <.001
Smoking status
Never | 520/79 475 | 1.00 | | 1.00
Former | 45/3999 | 1.29 (0.95-1.75) | 0.134 | 1.08 (0.77-1.52)
Current | 419/38 039 | 1.93 (1.69-2.20) | 0.579 | 1.77 (1.50-2.07)
P trend | | <.001 | | <.001
Smoking pack-years
Never | 520/79 475 | 1.00 | | 1.00
<20 | 93/16 998 | 1.26 (1.07-1.49) | 0.284 | 1.34 (1.08-1.66)
20-40 | 168/14 402 | 1.72 (1.46-2.02) | 0.455 | 1.56 (1.32-1.85)
≥40 | 1456/5894 | 2.60 (2.16-3.12) | 0.850 | 2.33 (1.92-2.82)
P trend | | <.001 | | <.001
Alcohol intake status
Never | 556/74 054 | 1.00 | | 1.00
Former | 58/3529 | 1.77 (1.35-2.32) | 0.281 | 1.36 (1.01-1.82)
Current | 370/43 930 | 1.39 (1.21-1.58) | -0.054 | 0.96 (0.82-1.13)
P trend | | <.001 | | .063
Coal dust exposure status
Nonexposure | 474/57 310 | 1.00 | | 1.00
Exposure | 510/64 203 | 1.07 (0.95-1.22) | -0.110 | 0.89 (0.78-1.01)
P | | .271 | | .082
BMI, kg/m²
<18.5 | 35/3110 | 1.27 (0.90-1.80) | 0.177 | 1.20 (0.85-1.70)
18.5-23.9 | 497/57 981 | 1.00 | | 1.00
24.0-27.9 | 351/46 085 | 0.83 (0.72-0.95) | -0.189 | 0.83 (0.72-0.95)
≥28.0 | 101/14 337 | 0.79 (0.64-0.98) | -0.223 | 0.77 (0.62-0.96)
P trend | | .008 | | .007
LDL-C, mg/dL
<70 | 273/26 031 | 1.00 | | 1.00
70-87 | 176/25 062 | 0.77 (0.63-0.93) | -0.264 | 0.76 (0.63-0.92)
87-100 | 173/23 853 | 0.83 (0.69-1.01) | -0.197 | 0.81 (0.67-0.99)
100-120 | 180/23 046 | 0.88 (0.72-1.06) | -0.158 | 0.84 (0.69-1.02)
≥120 | 182/23 521 | 0.78 (0.64-0.94) | -0.298 | 0.72 (0.60-0.88)
P trend | | .032 | | .009
hsCRP, mg/L
<1.0 | 427/62 813 | 1.00 | | 1.00
1.0-3.0 | 293/33 866 | 1.12 (0.96-1.30) | 0.141 | 1.16 (1.00-1.35)
≥3.0 | 264/24 834 | 1.17 (1.00-1.37) | 0.184 | 1.20 (1.02-1.41)
P trend | | <.001 | | .045
Intercept | | | -6.936 |

Abbreviations: CI, confidence interval; OR, odds ratio.

ᵃ Adjusted for age class (<40, 40-49, 50-59, ≥60 y).

ᵇ Multivariable logistic regression model with additional adjustment for all the other listed variables.

In the present study, we considered two sets of models: the epidemiological model, which included six established predictors of lung cancer (age, gender, smoking status, alcohol intake status, coal dust exposure status, and BMI), and the full model, which, through stepwise logistic regression, additionally included two metabolic markers, hsCRP and LDL-C (Table 2).
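As an illustration of how the coefficients in Table 2 could be turned into an individual risk estimate, the Python sketch below plugs them into the logistic formula given in the Statistical methods. It rests on assumptions the source does not spell out: that the "Coefficient" column and the intercept (-6.936) belong to one jointly fitted model containing every listed predictor (including both smoking status and pack-years), and that reference categories contribute 0. Exponentiating the coefficients does not exactly reproduce every multivariable OR in the table (for example, exp(1.461) ≈ 4.31 versus the reported 4.36), so this is a reconstruction rather than the authors' published calculator. The dictionary keys and the example participant are hypothetical, and the output is a predicted probability over the study follow-up, not a validated clinical estimate.

```python
import math

# Coefficients from the "Coefficient" column and intercept of Table 2.
# Reference categories contribute 0. Assumes all rows belong to one model.
INTERCEPT = -6.936
COEFFICIENTS = {
    "age": {"<45": 0.0, "45-55": 1.461, "55-65": 1.997, ">=65": 2.562},
    "gender": {"female": 0.0, "male": 0.567},
    "smoking_status": {"never": 0.0, "former": 0.134, "current": 0.579},
    "pack_years": {"never": 0.0, "<20": 0.284, "20-40": 0.455, ">=40": 0.850},
    "alcohol": {"never": 0.0, "former": 0.281, "current": -0.054},
    "coal_dust": {"nonexposure": 0.0, "exposure": -0.110},
    "bmi": {"<18.5": 0.177, "18.5-23.9": 0.0, "24.0-27.9": -0.189, ">=28.0": -0.223},
    "ldl_c": {"<70": 0.0, "70-87": -0.264, "87-100": -0.197,
              "100-120": -0.158, ">=120": -0.298},
    "hscrp": {"<1.0": 0.0, "1.0-3.0": 0.141, ">=3.0": 0.184},
}


def full_model_risk(profile):
    """Predicted probability exp(eta) / (1 + exp(eta)) for one profile."""
    eta = INTERCEPT + sum(COEFFICIENTS[name][level] for name, level in profile.items())
    return math.exp(eta) / (1.0 + math.exp(eta))


# Everyone in the reference category of every predictor
REFERENCE = {"age": "<45", "gender": "female", "smoking_status": "never",
             "pack_years": "never", "alcohol": "never", "coal_dust": "nonexposure",
             "bmi": "18.5-23.9", "ldl_c": "<70", "hscrp": "<1.0"}

# Hypothetical participant: 58-year-old male current smoker (20-40 pack-years),
# current drinker, exposed to coal dust, BMI 24.2, LDL-C 95 mg/dL, hsCRP 2.1 mg/L.
example = dict(REFERENCE, age="55-65", gender="male", smoking_status="current",
               pack_years="20-40", alcohol="current", coal_dust="exposure",
               bmi="24.0-27.9", ldl_c="87-100", hscrp="1.0-3.0")

print(f"reference profile: {full_model_risk(REFERENCE):.4%}")
print(f"example profile:   {full_model_risk(example):.2%}")
```

Keeping the coefficients in a nested dictionary keyed by category label makes it easy to check each value against the corresponding table row.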

Predictive performance of the models

The epidemiological risk prediction model yielded a C-statistic of 0.731. Compared with the epidemiological model, the full model showed a significant improvement in the C-statistic (C-statistic = 0.735, P = .033) (Table 3). ROC curves also suggested improved discrimination when metabolic markers were added to the epidemiological model (Figure 1). In analyses stratified by age, the discriminatory performance of the full model was better in participants aged <50 years (C-statistic, 0.709) than in those aged ≥50 years (C-statistic, 0.655). The full model also yielded a higher C-statistic in females (0.726) than in males (0.716). Notably, the C-statistic of the full model in former or current smokers (0.742) was higher than in never smokers and was significantly higher than that of the epidemiological model in former or current smokers (0.735, P = .016) (Table 3).
TABLE 3

Predictive performance (C‐statistics) of the risk prediction models for lung cancer, Kailuan study, 2006‐2015

Subgroup | Case/control | Epidemiological modelᵃ C-statistic | Full modelᵇ C-statistic | P value
Overall | 984/121 513 | 0.729 | 0.735 | .015
By age
<50 years | 142/52 783 | 0.709 | 0.709 | .987
≥50 years | 842/68 730 | 0.649 | 0.655 | .012
By gender
Female | 91/25 604 | 0.730 | 0.726 | .587
Male | 893/95 909 | 0.717 | 0.716 | .046
By smoking status
Never | 520/79 475 | 0.745 | 0.766 | .093
Former or current | 464/42 038 | 0.736 | 0.742 | .107

Abbreviations: BMI, body mass index; hsCRP, high-sensitivity C-reactive protein; LDL-C, low-density lipoprotein cholesterol.

ᵃ Epidemiological model: included age, gender, smoking status, smoking pack-years, alcohol intake status, coal dust exposure status, and BMI.

ᵇ Full model: additionally included hsCRP and LDL-C.

FIGURE 1

Receiver operating characteristic curves for lung cancer risk prediction models, Kailuan study, 2006-2015

Internal validation by 10-fold cross-validation showed that the models' predictive power was stable: the average C-statistics of the epidemiological and full models were 0.728 and 0.735, respectively (Table S2). Table S3 shows the reclassification results. Compared with the epidemiological model, the full model showed a statistically significant NRI (15.4, 95% CI 9.1-21.6; P < .001) and IDI (0.03, 95% CI 0.02-0.05; P < .001). The full model showed good calibration across deciles of predicted risk (P HL = .689). The predicted risk of lung cancer was 2.40% in the highest decile compared with 0.09% in the lowest decile (OR, 38.26; 95% CI, 18.95-77.23) (Table 4). The full model also showed good calibration in all subpopulations.
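The Hosmer-Lemeshow statistic behind the P HL values can be sketched in pure Python. The example below recomputes it from the overall observed and expected decile counts in Table 4. Because Table 4 does not report the number of participants per decile, we assume equal deciles of about 12 250 people (122 497 / 10). With g = 10 groups the statistic is referred to a chi-squared distribution with g - 2 = 8 degrees of freedom, whose survival function has a closed form for even degrees of freedom, so no statistics library is needed. Under these assumptions the result lands close to the reported P HL of .689, but it is a reconstruction, not the authors' computation.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-squared variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(observed, expected, group_sizes):
    """HL = sum over groups of (O - E)^2 / (E * (1 - E / n)), df = g - 2."""
    stat = sum((o - e) ** 2 / (e * (1.0 - e / n))
               for o, e, n in zip(observed, expected, group_sizes))
    df = len(observed) - 2
    return stat, chi2_sf_even_df(stat, df)


# Overall deciles from Table 4; equal decile sizes are an assumption.
observed = [8, 18, 26, 57, 67, 84, 105, 138, 179, 302]
expected = [10.4, 16.1, 25.8, 44.3, 65.7, 89.5, 110.8, 138.0, 187.6, 295.7]
sizes = [12250] * 10

stat, p_value = hosmer_lemeshow(observed, expected, sizes)
print(f"HL statistic = {stat:.2f}, P = {p_value:.3f}")
```

Most of the statistic comes from the fourth decile (57 observed vs 44.3 expected); the other deciles contribute little, which is consistent with the good overall calibration reported.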
TABLE 4

Calibration of risk prediction models for lung cancer overall and by subgroup across deciles of predicted risk, Kailuan study, 2006-2015

Decile of predicted risk | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | P HLᵃ | OR (95% CI)ᵇ
Overall
Observed | 8 | 18 | 26 | 57 | 67 | 84 | 105 | 138 | 179 | 302
Expected | 10.4 | 16.1 | 25.8 | 44.3 | 65.7 | 89.5 | 110.8 | 138.0 | 187.6 | 295.7
Observed risk (%) | 0.07 | 0.15 | 0.21 | 0.47 | 0.55 | 0.69 | 0.86 | 1.13 | 1.47 | 2.45
Predicted risk (%) | 0.09 | 0.13 | 0.21 | 0.36 | 0.54 | 0.73 | 0.90 | 1.13 | 1.54 | 2.40 | .689 | 38.26 (18.95-77.23)
Age <50 years
Observed | 3 | 3 | 4 | 13 | 7 | 11 | 7 | 21 | 31 | 42
Expected | 3.3 | 4.2 | 5.6 | 7.7 | 8.1 | 9.2 | 12.6 | 17.0 | 28.4 | 45.9
Observed risk (%) | 0.06 | 0.06 | 0.08 | 0.22 | 0.15 | 0.20 | 0.15 | 0.37 | 0.59 | 0.79
Predicted risk (%) | 0.06 | 0.08 | 0.10 | 0.14 | 0.16 | 0.18 | 0.24 | 0.32 | 0.52 | 0.85 | .353 | 14.22 (4.04-45.89)
Age ≥50 years
Observed | 35 | 37 | 55 | 49 | 71 | 82 | 72 | 104 | 129 | 208
Expected | 27.0 | 37.8 | 48.2 | 57.4 | 65.5 | 76.8 | 91.6 | 109.7 | 131.7 | 196.5
Observed risk (%) | 0.50 | 0.53 | 0.79 | 0.69 | 1.04 | 1.18 | 1.04 | 1.48 | 1.86 | 2.98
Predicted risk (%) | 0.39 | 0.55 | 0.69 | 0.82 | 0.95 | 1.10 | 1.32 | 1.57 | 1.89 | 2.82 | .217 | 6.08 (4.24-8.71)
Female
Observed | 0 | 1 | 2 | 3 | 8 | 6 | 12 | 10 | 25 | 24
Expected | 0.5 | 1.0 | 1.6 | 3.7 | 7.8 | 9.3 | 10.3 | 12.3 | 17.8 | 26.7
Observed risk (%) | 0.00 | 0.04 | 0.07 | 0.04 | 0.24 | 0.35 | 0.31 | 0.43 | 0.78 | 1.30
Predicted risk (%) | 0.02 | 0.04 | 0.05 | 0.07 | 0.27 | 0.34 | 0.40 | 0.46 | 0.57 | 1.35 | .659 | NA
Male
Observed | 16 | 14 | 30 | 58 | 70 | 76 | 87 | 126 | 153 | 263
Expected | 11.8 | 17.1 | 27.1 | 48.9 | 67.4 | 80.9 | 100.1 | 126.4 | 160.5 | 252.8
Observed risk (%) | 0.17 | 0.15 | 0.31 | 0.60 | 0.72 | 0.81 | 0.89 | 1.33 | 1.58 | 2.66
Predicted risk (%) | 0.12 | 0.18 | 0.28 | 0.51 | 0.70 | 0.86 | 1.03 | 1.31 | 1.68 | 2.56 | .536 | 16.55 (9.98-27.44)
Never smoker
Observed | 4 | 7 | 15 | 33 | 33 | 49 | 70 | 77 | 98 | 134
Expected | 6.0 | 8.5 | 11.7 | 29.0 | 41.6 | 51.7 | 62.8 | 74.1 | 96.2 | 138.3
Observed risk (%) | 0.05 | 0.09 | 0.19 | 0.41 | 0.41 | 0.60 | 0.89 | 0.98 | 1.26 | 1.62
Predicted risk (%) | 0.07 | 0.11 | 0.14 | 0.36 | 0.52 | 0.65 | 0.79 | 0.95 | 1.21 | 1.70 | .709 | 32.99 (12.19-89.24)
Former or current smoker
Observed | 7 | 9 | 12 | 25 | 27 | 39 | 40 | 57 | 103 | 145
Expected | 7.1 | 9.0 | 11.7 | 23.1 | 30.3 | 36.2 | 47.0 | 63.1 | 86.5 | 150.1
Observed risk (%) | 0.16 | 0.20 | 0.30 | 0.59 | 0.62 | 0.88 | 0.97 | 1.38 | 2.30 | 3.54
Predicted risk (%) | 0.16 | 0.21 | 0.27 | 0.55 | 0.70 | 0.85 | 1.09 | 1.46 | 2.00 | 3.66 | .670 | 23.00 (10.76-49.13)

Abbreviations: CI, confidence interval; OR, odds ratio.

ᵃ P HL: P value for the Hosmer-Lemeshow goodness-of-fit test; P > .05 indicates adequate fit.

ᵇ OR of lung cancer for the top decile compared with the bottom decile of predicted risk.

To test the utility of our models in the LDCT screening setting, a secondary analysis was restricted to participants older than 50 years. As shown in Table S4, the predictors selected through stepwise regression and their associations were almost the same as those of the model developed in the whole population, supporting the stability and potential utility of the present models. Finally, in the sensitivity analysis, using continuous instead of categorical variables did not improve the C-statistic of the full model (C-statistic, 0.728).

DISCUSSION

In this study, we developed and internally validated two risk prediction models for lung cancer based on data from routine health check-ups, aiming to provide simple and efficient tools for tailored lung cancer screening by effectively identifying high-risk subpopulations. Our results showed that the model including only demographic and lifestyle information discriminated well between incident lung cancer cases and noncases. Incorporating hsCRP and LDL-C as metabolic markers provided a statistically significant increase in discriminatory performance (C-statistic for the full model, 0.735). Because all the indicators included in this model can be acquired easily in general clinical or screening settings, the model has good potential for translation into practice. Internal validation suggested that the models' discrimination was stable, which may indicate good performance when they are applied to other populations. The evidence base for the included predictors is one important measure of the validity of a risk prediction model. In this study, all of the predictors have previously been shown to be associated with lung cancer risk. Smoking has been known to be causally associated with lung cancer risk since the 1950s. Alcohol intake has also been related to elevated lung cancer risk, and a reduced risk of lung cancer has been reported in men and women with higher BMI. Consistent with this epidemiological evidence, we observed positive associations of smoking and alcohol intake and an inverse association of BMI with lung cancer risk. As for the metabolic markers, we previously reported elevated lung cancer risk among participants with low LDL-C. Furthermore, based on 20 population-based cohort studies in the United States, Asia, Australia, and Europe, Muller et al found that former and current smokers with higher hsCRP had an increased risk of lung cancer.
In addition to credible predictors, a risk prediction model should also meet performance standards for discrimination, defined as the ability to distinguish lung cancer cases from noncases, and calibration, defined as the agreement between observed and predicted lung cancer risk. Several lung cancer prediction models for the general population have been developed in different populations. In terms of study design, they have been developed from case-control studies (eg, the Liverpool Lung Project [LLP] model) and from cohorts or randomized trials (eg, the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial [PLCO]m2014 model). In terms of study population, models have been developed in never smokers (eg, the EPIC model) or in the overall population (eg, the LLPi model). To our knowledge, this is the only study to assess CRP and lipids directly in developing a lung cancer risk prediction model. Direct comparison of discriminatory performance across risk prediction models is difficult because each was developed in a population with different baseline risks or lengths of follow-up. Nevertheless, the discriminative ability of these models was relatively similar, with C-statistics ranging from 0.72 to 0.86, and our model showed predictive performance comparable to that of previous models. A major limitation of our study is that we were not able to validate the risk prediction model externally to assess its general applicability. However, the results of the internal validation suggest that this model may perform well when applied to other populations. Another limitation is that, because of the limited numbers of squamous cell carcinoma (SCC, n = 150), adenocarcinoma (AC, n = 143), and small cell lung carcinoma (SCLC, n = 71) cases, we did not construct separate models for these histologic types.
However, our model is intended for use in the screening setting, and previous studies indicate that many of the commonly cited risk factors for lung cancer are shared across pathological types. Furthermore, competing risks of death and/or development of other cancers were not accounted for in the present model, which may bias the predictive accuracy of the models. Additionally, because logistic regression was used in this study, predicted risks over specific time intervals could not be calculated. Finally, information on lung function, asbestos exposure, history of pneumonia, and history of chronic obstructive pulmonary disease was not collected, so their roles in lung cancer risk prediction could not be evaluated in this study. Nevertheless, this study has unique strengths. To the best of our knowledge, this is the first model to predict lung cancer risk by assessing CRP and lipid levels in a population-based study. The large sample size enabled us to validate the prediction model in independent subsets of the population, and the detailed questionnaire and blood test information, which is easily available in general settings, is particularly important for stratifying the population for screening. In conclusion, we developed and internally validated a risk prediction model for lung cancer that incorporates metabolic markers, based on data from Chinese residents. The model, which consists of predictors that are readily available or easily accessible in general clinical or primary care settings, showed satisfactory performance in terms of both discrimination and calibration. Therefore, this model could serve as a practical tool for identifying high-risk individuals for tailored lung cancer screening.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

AUTHORS’ CONTRIBUTIONS

Z Lyu, N Li, S Chen, G Wang, S Wu, M Dai, and J He were involved in conception and design. N Li, M Dai, and J He were involved in development of methodology. Z Lyu, X Feng, X Li, Y Wen, Z Yang, and Y Wang were involved in acquisition of data. Z Lyu, F Tan, J Li, H Chen, C Lin, J Ren, and J Shi were involved in analysis and interpretation of data. Z Lyu, N Li, M Dai, and J He were involved in writing, review, and/or revision of the manuscript. N Li, F Tan, J Li, H Chen, C Lin, J Ren, J Shi, M Dai, and J He were involved in administrative, technical, or material support. N Li, M Dai, and J He were involved in study supervision. All the authors approved the final version of the manuscript.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Medical Ethics Committee of the Kailuan Medical Group, Kailuan Company. Written informed consent was obtained from all participants.

SUPPORTING INFORMATION

Tables S1‐S4.