Michael W Marcus, Olaide Y Raji, Stephen W Duffy, Robert P Young, Raewyn J Hopkins, John K Field.
Abstract
Incorporation of genetic variants such as single nucleotide polymorphisms (SNPs) into risk prediction models may account for a substantial fraction of attributable disease risk. Genetic data from 2385 subjects recruited into the Liverpool Lung Project (LLP) between 2000 and 2008 were used, comprising 20 SNPs independently validated in a candidate-gene discovery study. Multifactor dimensionality reduction (MDR) and random forest (RF) were used to explore evidence of epistasis among the 20 replicated SNPs. Multivariable logistic regression was used to identify risk predictors for lung cancer similar to those in the LLP risk model, both for the epidemiological model and for the extended model with SNPs. Both models were internally validated using the bootstrap method, and model performance was assessed using the area under the curve (AUC) and net reclassification improvement (NRI). Using MDR and RF, the overall best classifier of lung cancer status was the combination of SNPs rs1799732 (DRD2), rs5744256 (IL-18) and rs2306022 (ITGA11), with a training accuracy of 0.6592, a testing accuracy of 0.6572 and a cross-validation consistency of 10/10 (permutation testing P<0.0001). The apparent AUC of the epidemiological model was 0.75 (95% CI 0.73-0.77). When epistatic data were incorporated in the extended model, the AUC increased to 0.81 (95% CI 0.79-0.83), which corresponds to an 8% relative increase in AUC (DeLong's test P=2.2e-16) and 17.5% by NRI. After correction for optimism, the AUC was 0.73 for the epidemiological model and 0.79 for the extended model. Our results showed a modest improvement in lung cancer risk prediction when the SNP epistasis factor was added.
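The AUC figures quoted in the abstract can be compared with a few lines of arithmetic. The sketch below is illustrative only: the function name `auc_gain` and the dictionary layout are assumptions, and the "8%" figure is read here as a relative gain over the epidemiological model, which is consistent with the reported numbers.

```python
# AUC values as reported in the abstract.
apparent = {"epidemiological": 0.75, "extended": 0.81}
corrected = {"epidemiological": 0.73, "extended": 0.79}  # after optimism correction


def auc_gain(aucs):
    """Return (absolute, relative) AUC gain of the extended over the epidemiological model."""
    absolute = aucs["extended"] - aucs["epidemiological"]
    relative = absolute / aucs["epidemiological"]
    return absolute, relative


abs_apparent, rel_apparent = auc_gain(apparent)
abs_corrected, rel_corrected = auc_gain(corrected)

# Optimism is the gap between the apparent and the bootstrap-corrected AUC.
optimism = {name: apparent[name] - corrected[name] for name in apparent}

print(f"apparent:  +{abs_apparent:.2f} absolute, {rel_apparent:.1%} relative")
print(f"corrected: +{abs_corrected:.2f} absolute, {rel_corrected:.1%} relative")
```

On these figures the absolute gain is the same before and after optimism correction, and the estimated optimism is 0.02 for both models.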
Year: 2016 PMID: 27121382 PMCID: PMC4902078 DOI: 10.3892/ijo.2016.3499
Source DB: PubMed Journal: Int J Oncol ISSN: 1019-6439 Impact factor: 5.650
Epidemiological, clinical and lifestyle characteristics of the subjects by case-control status.
| Characteristics | Case (n=718) | Control (n=1667) | All subjects (n=2385) |
|---|---|---|---|
| Age (yrs.) | |||
| <60 | 162 (22.6) | 457 (27.4) | 619 (25.9) |
| 60–70 | 264 (36.8) | 647 (38.8) | 911 (38.2) |
| 70+ | 292 (40.7) | 563 (33.8) | 855 (35.9) |
| Gender | |||
| Male | 414 (57.7) | 969 (58.1) | 1383 (58.0) |
| Female | 304 (42.3) | 698 (41.9) | 1002 (42.0) |
| Smoking status | |||
| Never | 43 (6.0) | 575 (34.5) | 618 (25.9) |
| Former | 316 (44.0) | 820 (49.2) | 1136 (47.6) |
| Current | 353 (49.2) | 267 (16.0) | 620 (26.0) |
| Smoking duration (yrs.) | |||
| Never | 43 (6.0) | 575 (34.5) | 618 (25.9) |
| 1–20 | 38 (5.3) | 341 (20.5) | 379 (15.9) |
| 21–40 | 175 (24.4) | 440 (26.4) | 615 (25.8) |
| 41–60 | 399 (55.6) | 278 (16.7) | 677 (28.4) |
| >60 | 51 (7.1) | 27 (1.6) | 78 (3.3) |
| Previous pneumonia | |||
| Yes | 105 (14.6) | 243 (14.6) | 348 (14.6) |
| No | 590 (82.2) | 1420 (85.2) | 2010 (84.3) |
| Previous malignant tumour | |||
| Yes | 183 (26.3) | 38 (2.3) | 221 (9.4) |
| No | 512 (73.7) | 1625 (97.7) | 2136 (90.6) |
| Asbestos exposure | |||
| Yes | 134 (18.7) | 158 (9.5) | 292 (12.2) |
| No | 395 (55.0) | 1505 (90.3) | 1900 (79.7) |
| Family history of lung cancer | |||
| No history | 566 (78.8) | 1348 (80.9) | 1914 (80.3) |
| Early onset | 74 (10.3) | 101 (6.1) | 175 (7.3) |
| Late onset | 78 (10.9) | 218 (13.0) | 296 (12.4) |
| Histology | |||
| Squamous cell carcinoma | 239 (33.3) | - | |
| Adenocarcinoma | 228 (31.8) | - | |
| Small cell | 87 (12.1) | - | |
| NSCLC | 77 (10.7) | - | |
| Other | 87 (12.1) | - | |
Numbers do not add up to total due to missing data; NSCLC, non-small cell lung cancer.
Univariable analysis of associations between 20 candidate SNPs and lung cancer (33).
| SNP | Chromosome | Gene | Wild type, ca/co (%) | Heterozygote, ca/co (%) | Heterozygote, OR (95% CI) | Homozygote, ca/co (%) | Homozygote, OR (95% CI) | P-value for trend (additive model) |
|---|---|---|---|---|---|---|---|---|
| rs2279115 | 18q21.3 | Bcl-2 | 30.1/29.0 | 49.0/50.4 | 0.91 (0.75, 1.11) | 20.1/20.6 | 0.91 (0.71, 1.17) | 0.91 |
| rs10115703 | 9p22.3 | Cerb1 | 86.2/84.7 | 12.7/14.6 | 0.85 (0.66, 1.10) | 1.1/0.7 | 1.66 (0.66, 4.15) | 0.21 |
| rs16969968 | 15q25.1 | α5-nAChR | 40.1/44.9 | 45.7/44.1 | 1.16 (0.96, 1.40) | 14.2/10.9 | 1.46 (1.11, 1.93) | 0.012 |
| rs2031920 | 10q26.3 | CYP2E1 | 94.7/94.7 | 5.2/5.2 | 0.99 (0.67, 1.47) | 0.1/0.1 | 1.16 (0.11, 12.8) | 0.71 |
| rs6413429 | 5p15.33 | DAT1 | 87.2/86.4 | 12.5/13.3 | 0.93 (0.72, 1.21) | 0.3/0.3 | 0.92 (0.18, 4.76) | 0.74 |
| rs1799732 | 11q23.2 | DRD2 | 79.5/79.4 | 13.0/7.4 | 1.74 (1.31, 2.32) | 7.5/13.1 | 0.57 (0.42, 0.78) | 0.30 |
| rs13181 | 19q13.32 | XPD(ERCC2) | 38.6/39.9 | 43.3/47.3 | 0.95 (0.78, 1.15) | 18.1/12.8 | 1.46 (1.13, 1.89) | 0.10 |
| rs763110 | 1q24.3 | FasL | 42.5/40.2 | 43.4/46.6 | 0.88 (0.73, 1.07) | 14.1/13.2 | 1.00 (0.77, 1.32) | 0.27 |
| rs5744256 | 11q23.1 | IL18 | 32.7/47.2 | 43.7/44.5 | 1.42 (1.16, 1.72) | 23.6/8.3 | 4.07 (3.11, 5.31) | <0.0001 |
| rs16944 | 2q13 | IL1B | 42.9/46.1 | 44.7/43.2 | 1.11 (0.92, 1.34) | 12.4/10.7 | 1.24 (0.93, 1.65) | 0.24 |
| rs4073 | 4q13.3 | IL8 | 27.6/29.9 | 51.3/47.4 | 1.17 (0.96, 1.44) | 21.7/22.7 | 1.01 (0.79, 1.30) | 0.50 |
| rs2306022 | 15q23 | ITGA11 | 65.9/83.6 | 30.6/15.4 | 2.53 (2.06, 3.12) | 3.5/1.0 | 4.09 (2.21, 7.56) | <0.0001 |
| rs2317676 | 17q21.32 | ITGB3 | 87.9/87.5 | 11.6/12.2 | 0.95 (0.72, 1.24) | 0.6/0.3 | 1.54 (0.43, 5.48) | 0.88 |
| rs1799930 | 8p22 | NAT2 | 50.3/48.4 | 39.4/42.7 | 0.89 (0.74, 1.07) | 10.3/8.9 | 1.12 (0.82, 1.52) | 0.95 |
| rs3087386 | 2q11.2 | REV1 | 31.6/31.4 | 49.7/49.4 | 0.99 (0.82, 1.22) | 18.7/19.3 | 0.96 (0.75, 1.24) | 0.63 |
| rs4934 | 14q32.13 | SERPINA3 | 26.9/27.3 | 50.3/49.2 | 1.04 (0.84, 1.28) | 22.8/23.5 | 0.99 (0.77, 1.26) | 0.99 |
| rs1799895 | 4p15.2 | SOD3 | 96.7/97.2 | 3.3/2.7 | 1.25 (0.75, 2.06) | 0.0/0.1 | - | 0.44 |
| rs5743836 | 3p21.2 | TLR9 | 71.2/69.0 | 25.4/28.1 | 0.88 (0.72, 1.07) | 3.5/2.9 | 1.15 (0.70, 1.88) | 0.24 |
| rs1139417 | 12p13.31 | TNFR1 | 32.0/31.5 | 49.3/50.8 | 0.96 (0.78, 1.16) | 18.7/17.8 | 1.03 (0.80, 1.34) | 0.96 |
| rs2273953 | 1p36.33 | TP73 | 58.5/62.8 | 35.5/31.7 | 1.20 (0.99, 1.45) | 6.0/5.5 | 1.17 (0.80, 1.70) | 0.11 |
Wild type is the reference genotype; ca, cases; co, controls.
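As a rough check on the table above, a crude odds ratio can be recovered from the genotype percentages alone, taking the wild type as the reference. This is a minimal sketch: the helper name `crude_odds_ratio` is hypothetical, and because the inputs are rounded percentages rather than raw counts, the results only approximate the published ORs.

```python
def crude_odds_ratio(case_pct, control_pct, case_ref_pct, control_ref_pct):
    """Odds of carrying a genotype (vs. the reference genotype) in cases,
    divided by the same odds in controls."""
    odds_cases = case_pct / case_ref_pct
    odds_controls = control_pct / control_ref_pct
    return odds_cases / odds_controls


# rs5744256 (IL18): wild type 32.7/47.2, heterozygote 43.7/44.5,
# homozygote 23.6/8.3 (cases/controls, %), taken from the table.
het_or = crude_odds_ratio(43.7, 44.5, 32.7, 47.2)
hom_or = crude_odds_ratio(23.6, 8.3, 32.7, 47.2)

print(f"heterozygote OR ~ {het_or:.2f}")  # table reports 1.42
print(f"homozygote OR  ~ {hom_or:.2f}")   # table reports 4.07
```

Small differences from the tabulated values are expected from rounding of the percentages.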
Comparison of different multi-locus SNP combinations using MDR.
| Model of inheritance | No. of loci | Selected SNPs in selected best model | Cross Validation consistency (CV) | Balanced training accuracy | Balanced testing accuracy |
|---|---|---|---|---|---|
| Additive effect | 1 | ITGA11_rs2306022 | 10/10 | 0.5886 | 0.5886 |
| | 2 | IL18_rs5744256 | 10/10 | 0.6418 | 0.6418 |
| | 3 | DRD2_rs1799732 | 10/10 | 0.6575 | 0.6538 |
| | 4 | CHRNA3_A5_rs16969968 | 4/10 | 0.6652 | 0.6321 |
| | 5 | DRD2_rs1799732 FASL_rs763110 | 6/10 | 0.6869 | 0.6178 |
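The balanced accuracies in the MDR table come from collapsing multi-locus genotype combinations into high-risk and low-risk groups. The sketch below illustrates only that core step on toy data. It is not the authors' pipeline: the function name, the 0/1/2 genotype coding and the eight toy subjects are all hypothetical, and a full MDR analysis would repeat this within 10-fold cross-validation and permutation testing.

```python
from collections import Counter


def mdr_balanced_accuracy(genotypes, labels):
    """Core MDR step on one data set (no cross-validation).

    genotypes: list of tuples, one multi-locus genotype per subject.
    labels:    list of 1 (case) / 0 (control).
    Returns the set of high-risk genotype cells and the balanced accuracy.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)

    # A cell is high-risk when its case:control ratio exceeds the overall ratio
    # (written as a product so cells with zero controls need no special case).
    high_risk = {
        cell for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }

    sensitivity = sum(cases[c] for c in high_risk) / n_cases
    specificity = sum(n for c, n in controls.items() if c not in high_risk) / n_controls
    return high_risk, (sensitivity + specificity) / 2


# Hypothetical toy data: two loci, genotypes coded 0/1/2.
toy_genotypes = [(1, 1)] * 3 + [(0, 0)] + [(0, 0)] * 3 + [(1, 1)]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

high_risk_cells, balanced_acc = mdr_balanced_accuracy(toy_genotypes, toy_labels)
print(high_risk_cells, balanced_acc)
```

Balanced accuracy is used rather than raw accuracy because cases and controls are unequal in number (718 vs 1667 in this study).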
Importance score results in the random forest.
| SNP | Gene name | Variable importance |
|---|---|---|
| rs5744256 | IL18 | 18.0783 |
| rs2306022 | ITGA11 | 14.2703 |
| rs1799732 | DRD2 | 4.4401 |
| rs4934 | SERPINA3 | 2.8533 |
| rs13181 | XPD(ERCC2) | 2.7543 |
| rs16969968 | α5-nAChR | 2.4906 |
| rs16944 | IL1B | 2.1737 |
| rs1139417 | TNFR1 | 1.5054 |
| rs2273953 | TP73 | 1.4667 |
| rs3087386 | REV1 | 1.4185 |
| rs1799930 | NAT2 | 1.1701 |
| rs10115703 | Cerb1 | 0.9366 |
| rs2279115 | Bcl-2 | 0.8465 |
| rs5743836 | TLR9 | 0.7407 |
| rs4073 | IL8 | 0.6093 |
| rs763110 | FasL | 0.4508 |
| rs2317676 | ITGB3 | 0.0477 |
| rs2031920 | CYP2E1 | −0.048 |
| rs1799895 | SOD3 | −0.1922 |
| rs6413429 | DAT1 | −0.3696 |
Top 3 ranked SNPs using variable importance.
Reclassification of predicted risk for cases and controls using the epidemiological model and extended model with rs1799732 (DRD2), rs5744256 (IL-18) and rs2306022 (ITGA11).
| Epidemiological model | Extended model: <0.91% | Extended model: 0.91 to 2.5% | Extended model: >2.5 to 5.12% | Extended model: >5.12% | Total |
|---|---|---|---|---|---|
| Cases | |||||
| <0.91% | 69 (57.5) | 43 (35.8) | 8 (6.7) | 0 (0) | 120 |
| 0.91 to 2.5% | 15 (12.4) | 46 (38.0) | 46 (38.0) | 14 (11.6) | 121 |
| >2.5 to 5.12% | 0 (0) | 43 (26.7) | 49 (30.4) | 69 (42.9) | 161 |
| >5.12% | 2 (0.6) | 9 (2.8) | 62 (19.1) | 252 (77.5) | 325 |
| Total | 86 | 141 | 165 | 335 | 727 |
| Controls | |||||
| <0.91% | 726 (89.9) | 77 (9.5) | 4 (0.5) | 1 (0.1) | 808 |
| 0.91 to 2.5% | 180 (45.0) | 147 (36.7) | 58 (14.5) | 15 (3.8) | 400 |
| >2.5 to 5.12% | 20 (8.8) | 85 (37.4) | 70 (30.8) | 52 (22.9) | 227 |
| >5.12% | 3 (1.3) | 29 (13.1) | 68 (30.6) | 122 (55.0) | 222 |
| Total | 929 | 338 | 200 | 190 | 1657 |
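The reclassification counts above are enough to recompute a categorical NRI. The sketch below assumes the standard definition (net proportion of cases moved to a higher risk category plus net proportion of controls moved to a lower one). The source does not spell out its exact formula, so this is a plausibility check rather than a reproduction of the authors' method; the function names are hypothetical.

```python
# Rows: risk category under the epidemiological model (lowest to highest).
# Columns: risk category under the extended model (lowest to highest).
cases = [
    [69, 43, 8, 0],
    [15, 46, 46, 14],
    [0, 43, 49, 69],
    [2, 9, 62, 252],
]
controls = [
    [726, 77, 4, 1],
    [180, 147, 58, 15],
    [20, 85, 70, 52],
    [3, 29, 68, 122],
]


def moves(table):
    """Count subjects reclassified upward, downward, and in total."""
    k = len(table)
    up = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    down = sum(table[i][j] for i in range(k) for j in range(k) if j < i)
    total = sum(sum(row) for row in table)
    return up, down, total


def categorical_nri(case_table, control_table):
    """Net reclassification improvement for cases plus that for controls."""
    up_ca, down_ca, n_ca = moves(case_table)
    up_co, down_co, n_co = moves(control_table)
    return (up_ca - down_ca) / n_ca + (down_co - up_co) / n_co


nri = categorical_nri(cases, controls)
print(f"NRI = {nri:.3f}")  # prints "NRI = 0.175"
```

Under this definition the table yields roughly 17.5%, in line with the NRI reported in the abstract; about 6.7 percentage points come from cases and about 10.7 from controls.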
Summary of multivariable risk model for the epidemiological model and the extended model with rs1799732 (DRD2), rs5744256 (IL-18) and rs2306022 (ITGA11).
| Covariates | Epidemiological model, OR (95% CI) | Epidemiological model, P-value | Extended model with SNPs, OR (95% CI) | Extended model with SNPs, P-value |
|---|---|---|---|---|
| Age | 1.01 (0.99–1.02) | 0.312 | 1.00 (0.99–1.02) | 0.610 |
| Gender | 1.24 (0.95–1.63) | 0.107 | 1.14 (0.87–1.52) | 0.340 |
| Smoking duration (years) | ||||
| None | 1.00 | | 1.00 | |
| 1–19 | 1.41 (0.82–2.42) | 0.209 | 1.23 (0.69–2.18) | 0.476 |
| 20–39 | 4.30 (2.81–6.57) | <0.001 | 4.90 (3.10–7.73) | <0.001 |
| 40–59 | 11.12 (5.41–22.86) | <0.001 | 15.70 (7.22–34.14) | <0.001 |
| ≥60 | 13.91 (9.26–20.91) | <0.001 | 18.58 (11.90–29.01) | <0.001 |
| Pneumonia | 1.53 (1.12–2.09) | 0.007 | 1.55 (1.11–2.15) | 0.008 |
| Asbestos | 3.25 (2.34–4.52) | <0.001 | 3.10 (2.19–4.39) | <0.001 |
| Previous tumour | 16.97 (11.25–25.61) | <0.001 | 16.52 (10.79–25.31) | <0.001 |
| Family history of lung cancer | ||||
| None | 1.00 | |||
| Early onset (<60 years) | 1.33 (0.84–2.09) | 0.223 | 1.11 (0.69–1.80) | 0.659 |
| Late onset (≥60 years) | 1.07 (0.76–1.54) | 0.672 | 1.14 (0.78–1.66) | 0.495 |
| rs1799732 | | | 0.78 (0.63–0.97) | 0.028 |
| rs5744256 | | | 2.04 (1.69–2.46) | <0.001 |
| rs2306022 | | | 4.04 (3.10–5.26) | <0.001 |
| Goodness of fit statistic | ||||
| AIC | 2098.42 | | 1930.14 | |
| BIC | 2167.75 | | 2016.80 | |
Figure 1. Performance of lung cancer risk model with and without the SNP epistatic effect.