Chih-Feng Chian, Yi-Ting Hwang, Harn-Jing Terng, Shih-Chun Lee, Tsui-Yi Chao, Hung Chang, Ching-Liang Ho, Yi-Ying Wu, Wann-Cherng Perng.
Abstract
Peripheral blood mononuclear cell (PBMC)-derived gene signatures were investigated for their potential use in the early detection of non-small cell lung cancer (NSCLC). The study included 187 patients with NSCLC, 310 age- and gender-matched controls, and an independent validation set of 29 patients. Eight genes were significantly associated with NSCLC: DUSP6, EIF2S3, GRB2, MDM2, NF1, POLDIP2, RNF4, and WEE1. A logistic model containing these markers distinguished subjects with NSCLC from controls with 80.7% sensitivity, 90.6% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.924. Repeated random sub-sampling (100 iterations) was used to validate the classification training models and gave an average AUC of 0.92. Additional validation using the independent set yielded a sensitivity of 75.86%. Furthermore, six age/gender-dependent genes (CPEB4, EIF2S3, GRB2, MCM4, RNF4, and STAT2) were identified using an age- and gender-stratification approach. STAT2 and WEE1 were identified as stage-dependent in the stage-stratified subpopulations. We conclude that these logistic models, which use different signatures for the total and stratified samples, are potential complementary tools for assessing the risk of NSCLC.
Keywords: circulating tumor cells; gene expression profiling; non-small cell lung cancer
Year: 2016 PMID: 27418131 PMCID: PMC5226605 DOI: 10.18632/oncotarget.10558
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Characteristics of the study sample (N = 497)
| Characteristic | NSCLC cases, n | NSCLC cases, % | Non-cancer controls, n | Non-cancer controls, % | Total sample, n | Total sample, % | p value |
|---|---|---|---|---|---|---|---|
| Sample size | 187 | 37.63 | 310 | 62.37 | 497 | 100.00 | |
| Gender | | | | | | | |
| Female | 73 | 39.04 | 134 | 43.23 | 212 | 41.73 | 0.3588 |
| Male | 114 | 60.96 | 176 | 56.77 | 296 | 58.27 | |
| Age | | | | | | | |
| 36-65 | 81 | 43.32 | 127 | 40.97 | 216 | 42.52 | 0.6073 |
| 66-95 | 106 | 56.68 | 183 | 59.03 | 292 | 57.48 | |
| Smoking status | | | | | | | |
| No | 84 | 44.92 | 218 | 70.32 | 309 | 60.83 | <0.0001 |
| Yes | 103 | 55.08 | 92 | 29.68 | 199 | 39.17 | |
The p value was obtained from the chi-square test.
Smoking status: No: non-smoker, Yes: current smoker and ever smoker.
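The chi-square p values in the table above can be recomputed from the case and control counts. The following is a minimal sketch, assuming a Pearson chi-square test without continuity correction on each 2×2 table (the source does not state which variant was used); the helper name `chi2_2x2` is illustrative and not taken from the study.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p value) for 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Each call passes (cases, controls) for the first row, then for the second row.
gender_stat, gender_p = chi2_2x2(73, 134, 114, 176)  # female, male
age_stat, age_p = chi2_2x2(81, 127, 106, 183)        # 36-65, 66-95
smoke_stat, smoke_p = chi2_2x2(84, 218, 103, 92)     # non-smoker, smoker

for label, stat, p in [("gender", gender_stat, gender_p),
                       ("age", age_stat, age_p),
                       ("smoking", smoke_stat, smoke_p)]:
    print(f"{label}: chi2 = {stat:.3f}, p = {p:.4g}")
```

Under this assumption the gender and age p values land close to the reported 0.3588 and 0.6073, and the smoking p value falls well below 0.0001.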
Analysis of bivariate association of the relative mean expression of 15 investigated genes between NSCLC cases and non-cancer controls
| Gene | Total sample, mean | Total sample, SD | NSCLC cases, mean | NSCLC cases, SD | Non-cancer controls, mean | Non-cancer controls, SD | p value |
|---|---|---|---|---|---|---|---|
| | 3.56 | 0.53 | 3.46 | 0.65 | 3.61 | 0.43 | 0.0028 |
| | −0.71 | 0.65 | −0.83 | 0.74 | −0.63 | 0.58 | 0.0008 |
| | 2.03 | 0.58 | 1.92 | 0.58 | 2.09 | 0.57 | 0.0017 |
| | −0.59 | 0.67 | −0.86 | 0.71 | −0.42 | 0.59 | 0.0000 |
| | 0.75 | 1.03 | 1.14 | 1.13 | 0.51 | 0.88 | 0.0000 |
| | 1.93 | 0.81 | 2.47 | 0.79 | 1.61 | 0.63 | 0.0000 |
| | 2.16 | 0.79 | 2.44 | 0.84 | 1.99 | 0.71 | 0.0000 |
| | −1.84 | 0.68 | −1.73 | 0.72 | −1.90 | 0.65 | 0.0093 |
| | −0.76 | 0.64 | −0.49 | 0.72 | −0.92 | 0.52 | 0.0000 |
| | 1.61 | 1.06 | 1.79 | 1.22 | 1.50 | 0.93 | 0.0034 |
| | 1.88 | 0.59 | 1.99 | 0.68 | 1.81 | 0.52 | 0.0008 |
| | 0.23 | 0.69 | 0.23 | 0.72 | 0.23 | 0.67 | 0.9488 |
| | 0.86 | 0.47 | 0.82 | 0.57 | 0.88 | 0.40 | 0.2078 |
| | 2.42 | 0.70 | 2.46 | 0.74 | 2.40 | 0.67 | 0.4122 |
| | −2.33 | 0.58 | −2.29 | 0.62 | −2.36 | 0.54 | 0.1998 |
The p value was obtained from the independent two-sample t-test.
The gene expression level between NSCLC cases and non-cancer controls was significantly different (p < 0.05).
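To illustrate how a two-sample t statistic follows from the summary columns, the sketch below recomputes it for one row of the table (cases 2.47 ± 0.79, controls 1.61 ± 0.63). It assumes the pooled-variance (Student) form of the test and that all 187 cases and 310 controls contributed to that row; neither detail is stated in the source. Because the table values are rounded to two decimals, the result is only approximate.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed) computed
    from group means, standard deviations, and sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# NSCLC cases versus non-cancer controls for one table row.
t_stat, df = pooled_t_from_summary(2.47, 0.79, 187, 1.61, 0.63, 310)

# With 495 degrees of freedom the t distribution is close to normal, so a
# normal approximation of the two-sided p value is used here (an assumption).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"t = {t_stat:.2f}, df = {df}, approximate p = {p_approx:.2e}")
```

The statistic is far in the tail, which agrees with the p value of 0.0000 shown for that row.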
Multivariate analysis and selection of significant NSCLC-associated molecular markers in the total sample (N = 497).
| Gene | OR | 95% CI of OR, lower | 95% CI of OR, upper | p value | StdEst |
|---|---|---|---|---|---|
| | 7.71 | 4.20 | 14.13 | 0.0000 | 0.91 |
| | 7.41 | 3.63 | 15.13 | 0.0000 | 0.87 |
| | 5.36 | 2.47 | 11.65 | 0.0000 | 0.59 |
| | 0.22 | 0.10 | 0.48 | 0.0002 | −0.45 |
| | 0.35 | 0.14 | 0.88 | 0.0255 | −0.27 |
| | 0.16 | 0.07 | 0.35 | 0.0000 | −0.71 |
| | 0.22 | 0.09 | 0.55 | 0.0014 | −0.49 |
| | 0.47 | 0.25 | 0.86 | 0.0150 | −0.28 |
| C statistic | 0.924 | | | | |
OR: odds ratio; CI: confidence interval; StdEst: standardized coefficient.
The multiple logistic model contains all 15 expressed genes, controlling for age, gender, and smoking status. The other seven molecular markers (CPEB4, EXT2, IRF4, MCM4, MMD, STAT2, and ZNF264) were not significantly associated with NSCLC.
The performance of this model is presented as the sensitivity and specificity depending on the cut-off value chosen, for instance:
Cut-off value = 0.434, sensitivity = 0.807, specificity = 0.906
Cut-off value = 0.321, sensitivity = 0.861, specificity = 0.855
Cut-off value = 0.226, sensitivity = 0.904, specificity = 0.774
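The trade-off between the three listed cut-offs can be summarized with Youden's J (sensitivity + specificity − 1). This is only one common criterion; the source does not say how its cut-offs were chosen, so the sketch below is illustrative rather than a description of the authors' procedure.

```python
# (cut-off, sensitivity, specificity) exactly as listed above.
operating_points = [
    (0.434, 0.807, 0.906),
    (0.321, 0.861, 0.855),
    (0.226, 0.904, 0.774),
]

def youden_j(sensitivity, specificity):
    """Youden's J statistic: 0 means no discrimination, 1 means perfect."""
    return sensitivity + specificity - 1

for cutoff, sens, spec in operating_points:
    print(f"cut-off {cutoff:.3f}: J = {youden_j(sens, spec):.3f}")

# Cut-off with the largest J among the listed operating points.
best_cutoff, best_sens, best_spec = max(
    operating_points, key=lambda pt: youden_j(pt[1], pt[2])
)
print(f"largest J at cut-off {best_cutoff}")
```

Among these three points J is nearly identical for 0.434 and 0.321, so the choice between them depends mainly on whether specificity or sensitivity matters more for the intended use.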
Figure 1. Histogram of the risk score of samples (proportion)
A. Controls; B. cases with early-stage disease; C. cases with advanced-stage disease. The risk score is calculated using the LCM classification model (Table 3).
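A logistic risk score of this kind is the predicted probability from the fitted model, and the C statistic measures how often a case scores higher than a control. The sketch below shows both steps. The intercept, coefficients, and expression values are hypothetical placeholders chosen for illustration; the fitted coefficients of the study's model are not given in this record.

```python
import math

def risk_score(intercept, coefs, values):
    """Logistic risk score: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1 / (1 + math.exp(-linear))

def c_statistic(case_scores, control_scores):
    """Fraction of case/control pairs in which the case has the higher
    score (ties count as half); this equals the area under the ROC curve."""
    concordant = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                concordant += 1.0
            elif s_case == s_ctrl:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Hypothetical two-gene model and hypothetical expression values.
intercept, coefs = -1.0, [2.0, -1.5]
cases = [[1.2, -0.5], [0.9, 0.1], [0.4, 0.6]]
controls = [[0.2, 0.3], [0.5, 0.9], [-0.1, 0.0]]

case_scores = [risk_score(intercept, coefs, x) for x in cases]
control_scores = [risk_score(intercept, coefs, x) for x in controls]

c_stat = c_statistic(case_scores, control_scores)
flagged_cases = sum(score >= 0.434 for score in case_scores)
print(f"C statistic = {c_stat:.3f}, cases flagged at 0.434: {flagged_cases}")
```

A subject is classified as high risk when the score meets or exceeds the chosen cut-off, which is how the sensitivity and specificity pairs reported for each cut-off arise.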
Multivariate analysis and selection of NSCLC-associated molecular markers based on age- and gender-stratified subpopulations.
| Younger women N=100 | Older women N=107 | Younger men N=108 | Older men N=182 | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N(Case) | 39 | 34 | 42 | 72 | ||||||||||||
| N(Control) | 61 | 73 | 66 | 110 | ||||||||||||
| Gene | 95% | CI | StdEst | 95% | CI | StdEst | 95% | CI | StdEst | 95% | CI | StdEst | ||||
| 5.00 | 2.09 | 11.97 | 0.97 | |||||||||||||
| 4.25 | 1.69 | 10.65 | 0.66 | 13.64 | 4.23 | 44.06 | 1.29 | 4.13 | 1.74 | 9.79 | 0.63 | 34.21 | 6.67 | 175.39 | 1.46 | |
| 20.33 | 4.99 | 82.77 | 1.37 | |||||||||||||
| 6.31 | 1.83 | 21.84 | 0.70 | |||||||||||||
| 4.81 | 1.30 | 17.82 | 0.51 | 7.34 | 1.99 | 27.05 | 0.73 | |||||||||
| 0.21 | 0.05 | 0.97 | −0.45 | |||||||||||||
| 0.10 | 0.02 | 0.47 | −0.55 | 0.10 | 0.02 | 0.46 | −0.63 | |||||||||
| 0.28 | 0.09 | 0.86 | −0.49 | 0.01 | 0.00 | 0.08 | −1.77 | |||||||||
| 0.16 | 0.04 | 0.64 | −0.54 | |||||||||||||
| 0.26 | 0.07 | 0.94 | −0.39 | |||||||||||||
| 0.07 | 0.02 | 0.27 | −0.96 | 0.26 | 0.09 | 0.76 | −0.51 | |||||||||
| C statistic | 0.907 | 0.930 | 0.895 | 0.970 | ||||||||||||
OR: odds ratio; CI: confidence interval; StdEst: standardized coefficient.
Each multiple logistic model contains the significant genes, controlling for smoking status.
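The stratified table lists odds ratios and confidence intervals without p values. An approximate Wald p value can be back-calculated from an OR and its 95% CI, because the interval is symmetric on the log scale. The sketch below assumes Wald-type intervals built with a 1.96 multiplier (not stated in the source), and it works from rounded table values, so the results are rough. It is first checked against a total-sample row with a reported p value (OR 0.35, 95% CI 0.14 to 0.88, p = 0.0255) and then applied to one stratified row (OR 0.21, 95% CI 0.05 to 0.97).

```python
import math

def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back-calculate the log-odds coefficient, its standard error, the
    Wald z statistic, and a two-sided p value from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, z, p

# Check against a total-sample row whose p value is reported (0.0255).
_, _, z_check, p_check = wald_from_or(0.35, 0.14, 0.88)

# Apply to a stratified row for which no p value is listed.
beta_s, se_s, z_s, p_strat = wald_from_or(0.21, 0.05, 0.97)

print(f"check row: z = {z_check:.2f}, p = {p_check:.4f}")
print(f"stratified row: beta = {beta_s:.2f}, z = {z_s:.2f}, p = {p_strat:.4f}")
```

A negative coefficient corresponds to an OR below 1, and a confidence interval that excludes 1 corresponds to a p value below 0.05 under this approximation.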
Multivariate analysis of NSCLC-associated molecular markers based on stage-stratified subpopulations containing early-stage (I–IIIA) NSCLC cases (n = 59) and all non-cancer controls (n = 310).
| Variable | OR | 95% CI of OR, lower | 95% CI of OR, upper | p value | StdEst |
|---|---|---|---|---|---|
| | 5.43 | 2.46 | 12.01 | 0.0000 | 0.67 |
| | 5.52 | 2.28 | 13.36 | 0.0002 | 0.70 |
| | 12.36 | 4.01 | 38.09 | 0.0000 | 0.76 |
| | 0.19 | 0.07 | 0.57 | 0.0028 | −0.42 |
| | 0.16 | 0.06 | 0.44 | 0.0004 | −0.68 |
| | 0.21 | 0.06 | 0.74 | 0.0149 | −0.49 |
| C statistic | 0.883 | | | | |
OR: odds ratio; CI: confidence interval; StdEst: standardized coefficient.
The multiple logistic model contains all 15 expressed genes, controlling for age, gender, and smoking status. The other nine molecular markers (CPEB4, EXT2, IRF4, MCM4, MMD, NF1, STAT2, WEE1, and ZNF264) were not significantly associated with NSCLC.
The performance of this model is presented as the sensitivity and specificity depending on the cut-off value chosen:
Cut-off value = 0.391, sensitivity = 0.661, specificity = 0.952
Cut-off value = 0.224, sensitivity = 0.763, specificity = 0.900
Cut-off value = 0.172, sensitivity = 0.763, specificity = 0.855
Cut-off value = 0.127, sensitivity = 0.797, specificity = 0.800
Cut-off value = 0.100, sensitivity = 0.847, specificity = 0.735
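Because the group sizes are known (59 early-stage cases and 310 controls), each sensitivity and specificity pair can be turned back into approximate counts. The sketch below does this for the first cut-off; the rounding step is an assumption, since the record reports only proportions.

```python
def confusion_counts(sensitivity, specificity, n_cases, n_controls):
    """Recover approximate confusion-matrix counts from reported
    sensitivity and specificity and the known group sizes."""
    tp = round(sensitivity * n_cases)
    tn = round(specificity * n_controls)
    return {"TP": tp, "FN": n_cases - tp, "TN": tn, "FP": n_controls - tn}

# Cut-off 0.391: sensitivity 0.661, specificity 0.952.
counts = confusion_counts(0.661, 0.952, n_cases=59, n_controls=310)

# Positive predictive value within this case-control sample only; it does
# not transfer to a screening population with a different prevalence.
ppv_in_sample = counts["TP"] / (counts["TP"] + counts["FP"])
print(counts, f"in-sample PPV = {ppv_in_sample:.3f}")
```

At this cut-off roughly 20 of the 59 early-stage cases fall below the threshold, which is the cost of holding specificity above 95%.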
Multivariate analysis of NSCLC-associated molecular markers based on stage-stratified subpopulations containing advanced-stage (IIIB–IV) NSCLC cases (n = 128) and all non-cancer controls (n = 310).
| Variable | OR | 95% CI of OR, lower | 95% CI of OR, upper | p value | StdEst |
|---|---|---|---|---|---|
| | 12.49 | 5.55 | 28.12 | 0.0000 | 1.10 |
| | 20.48 | 7.12 | 58.91 | 0.0000 | 1.29 |
| | 4.43 | 1.61 | 12.16 | 0.0039 | 0.52 |
| | 0.23 | 0.08 | 0.67 | 0.0066 | −0.42 |
| | 0.10 | 0.04 | 0.30 | 0.0000 | −0.87 |
| | 0.25 | 0.08 | 0.82 | 0.0228 | −0.45 |
| | 0.32 | 0.12 | 0.85 | 0.0219 | −0.35 |
| | 0.29 | 0.13 | 0.62 | 0.0017 | −0.46 |
| C statistic | 0.953 | | | | |
OR: odds ratio; CI: confidence interval; StdEst: standardized coefficient.
The multiple logistic model contains all 15 expressed genes, controlling for age, gender, and smoking status. The other seven molecular markers (CPEB4, EXT2, IRF4, MCM4, MMD, NF1, and ZNF264) were not significantly associated with NSCLC.
The performance of this model is presented as the sensitivity and specificity depending on the cut-off value chosen:
Cut-off value = 0.514, sensitivity = 0.781, specificity = 0.952
Cut-off value = 0.311, sensitivity = 0.859, specificity = 0.903
Cut-off value = 0.187, sensitivity = 0.914, specificity = 0.852
Cut-off value = 0.125, sensitivity = 0.953, specificity = 0.768