Jie-Huei Wang, Kang-Hsin Wang, Yi-Hau Chen.
Abstract
BACKGROUND: In biomedical and epidemiological research, gene-environment (G-E) interactions play an important role in the etiology and progression of many complex diseases. For high-dimensional genetic data, two general modeling strategies, marginal and joint models, have been proposed to identify important interaction factors. Most existing approaches for identifying G-E interactions are limited by a lack of robustness to outliers or contamination in the response and predictor data. In particular, right-censored survival outcomes make the associated feature screening even more challenging. In this article, we use the overlapping group screening (OGS) approach to select important G-E interactions related to clinical survival outcomes by incorporating gene pathway information under a joint modeling framework.
Keywords: Gene-environment interaction; Joint model; Lasso; Overlapping group screening; Survival prediction; TCGA
Year: 2022 PMID: 35637439 PMCID: PMC9150322 DOI: 10.1186/s12859-022-04750-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The long-tailed distribution of clinical survival data for the TCGA ESCA and HNSCC
Gene group structure in the simulation study
| Pathway | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gene Size | 3 | 3 | 3 | 6 | 6 | 6 | 9 | 9 | 9 | 15 | 15 | 15 | 24 | 24 | 24 | 36 | 36 | 36 | 45 | 45 | 45 | 60 | 60 | 60 | 38 |
| Overlapping | 1 | 1 | 0 | 2 | 2 | 0 | 3 | 3 | 0 | 5 | 5 | 0 | 8 | 8 | 0 | 12 | 12 | 0 | 15 | 15 | 0 | 20 | 20 | 0 | |
Fig. 2 Gene network structure
The median of the performance measures out of 200 simulation replications for different approaches
| | Oracle | GSIS SCAD | SIS lasso | Ordinary lasso | OGS ridge | OGS lasso |
|---|---|---|---|---|---|---|
| RMSE | 0.3520 | 0.2936 | 0.3629 | 0.3582 | 0.3667 | 0.3194 |
| P.int | 1.0000 | 0.0000 | 0.0000 | 0.1667 | 0.5000 | 0.5000 |
| Sen | 1.0000 | 0.8901 | 0.3736 | 0.7473 | 0.9670 | 0.9670 |
| Spe | 1.0000 | 1.0000 | 0.9962 | 0.9823 | 0.9399 | 0.9875 |
| C.model | 91.0000 | 81.0000 | 46.0000 | 120.0000 | 266.0000 | 124.0000 |
| Deviance | −125.2257 | −113.7699 | −60.0277 | −114.7706 | −70.2329 | −250.4203 |
| C-index | 0.8727 | 0.9244 | 0.7875 | 0.8722 | 0.8969 | 0.9549 |
| AUC | 0.9392 | 0.9730 | 0.8540 | 0.9418 | 0.9650 | 0.9908 |
| RMSE | 0.3437 | 0.3668 | 0.3631 | 0.3618 | 0.3670 | 0.3451 |
| P.int | 1.0000 | 0.0000 | 0.0000 | 0.1667 | 0.5000 | 0.3333 |
| Sen | 1.0000 | 0.8901 | 0.3516 | 0.5934 | 0.9670 | 0.8791 |
| Spe | 1.0000 | 1.0000 | 0.9955 | 0.9825 | 0.9221 | 0.9911 |
| C.model | 91.0000 | 81.0000 | 45.0000 | 104.5000 | 311.0000 | 104.0000 |
| Deviance | −96.4789 | −32.3585 | −43.5888 | −66.1335 | −50.0469 | −132.8876 |
| C-index | 0.8841 | 0.8027 | 0.7915 | 0.8491 | 0.8929 | 0.9222 |
| AUC | 0.9363 | 0.8537 | 0.8433 | 0.8985 | 0.9461 | 0.9668 |
| RMSE | 0.3407 | 0.3671 | 0.3643 | 0.3654 | 0.3674 | 0.3561 |
| P.int | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.1667 |
| Sen | 1.0000 | 0.8901 | 0.2747 | 0.4176 | 0.9670 | 0.6484 |
| Spe | 1.0000 | 1.0000 | 0.9942 | 0.9849 | 0.9224 | 0.9935 |
| C.model | 91.0000 | 81.0000 | 43.0000 | 83.0000 | 294.5000 | 77.0000 |
| Deviance | −0.8763 | −16.1453 | −22.9206 | −31.5497 | −31.7002 | −58.9638 |
| C-index | 0.8617 | 0.7569 | 0.7791 | 0.8174 | 0.8837 | 0.8866 |
| AUC | 0.8996 | 0.7844 | 0.8136 | 0.8430 | 0.9201 | 0.9212 |
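The Sen and Spe rows summarize how well each method recovers the truly active features. A minimal sketch of that calculation follows, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function name, its arguments, and the toy indices are hypothetical and are not taken from the paper.

```python
def selection_sensitivity_specificity(selected, truly_active, n_candidates):
    """Return (sensitivity, specificity) of a selected feature set.

    Assumes the standard definitions; the paper's exact definitions
    may differ. All names here are illustrative.
    """
    selected = set(selected)
    truly_active = set(truly_active)
    tp = len(selected & truly_active)      # active features that were selected
    fn = len(truly_active - selected)      # active features that were missed
    fp = len(selected - truly_active)      # inactive features that were selected
    tn = n_candidates - tp - fn - fp       # inactive features correctly left out
    sen = tp / (tp + fn) if (tp + fn) else float("nan")
    spe = tn / (tn + fp) if (tn + fp) else float("nan")
    return sen, spe


# Toy example: 10 candidate features, 4 truly active, 4 selected.
sen, spe = selection_sensitivity_specificity(
    selected=[0, 1, 2, 7], truly_active=[0, 1, 2, 3], n_candidates=10
)
```

In this toy case three of the four active features are recovered and one of the six inactive features is wrongly selected.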
Information on the selected clinical variables in the TCGA HNSCC data
| Variable | Coding | Missing status | Continuous (EC) / discrete (ED) |
|---|---|---|---|
| AJCC pathologic nodes | n0 = 0, n1 = 1, (n2, n2a, n2b, n2c) = 2, n3 = 3, nx = 4 | Yes | ED |
| AJCC pathologic tumor | t0 = 0, t1 = 1, t2 = 2, t3 = 3, (t4, t4a, t4b) = 4, tx = 5 | Yes | ED |
| age | | No | EC |
| gender | female = 0, male = 1 | No | ED |
| ICD O3 site | (C00.9, C01.9, C02.1, C02.9) = 0, (C03.0, C03.1, C03.9, C04.0, C04.9) = 1, (C05.0, C05.9, C06.0, C06.2, C06.9) = 2, (C09.9, C10.3, C10.9) = 3, (C13.9, C14.8) = 4, and 5 for others | No | ED |
Results: median prediction accuracy of the different methods on the TCGA HNSCC data over 10 random 413:104 training/test splits, based on the GO-BP database
| | GSIS SCAD | SIS lasso | Ordinary lasso | OGS ridge | OGS lasso | PTReg |
|---|---|---|---|---|---|---|
| Cox-test | 0.1842 | 0.0048 | 0.0029 | 0.0002 | 0.0013 | 0.0660 |
| LR-test | 0.2949 | 0.0115 | 0.0117 | 0.0015 | 0.0077 | 0.0580 |
| Deviance | 34.6441 | 8.7698 | 2.8340 | 0.0899 | 2.8927 | 44.9984 |
| C-index | 0.5534 | 0.6323 | 0.6471 | 0.7066 | 0.6618 | 0.5851 |
| AUC | 0.5231 | 0.6505 | 0.6432 | 0.7005 | 0.6660 | 0.6213 |
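The values above are medians over 10 random training/test splits. The sketch below illustrates that evaluation loop under stated assumptions: the 517 subjects (413 + 104) are indexed 0 to 516, and `evaluate` is a hypothetical callback standing in for fitting a model on the training indices and scoring it on the test indices. None of these names come from the paper.

```python
import random
import statistics


def random_split(n_subjects, n_train, rng):
    """Shuffle subject indices and cut them into training and test parts."""
    indices = list(range(n_subjects))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def median_over_splits(n_subjects, n_train, n_splits, evaluate, seed=0):
    """Median of evaluate(train_idx, test_idx) over repeated random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        train_idx, test_idx = random_split(n_subjects, n_train, rng)
        scores.append(evaluate(train_idx, test_idx))
    return statistics.median(scores)


def toy_evaluate(train_idx, test_idx):
    """Placeholder metric (test fraction); a real run would return e.g. a C-index."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


result = median_over_splits(517, 413, 10, toy_evaluate)
```

Reporting the median rather than the mean keeps a single unlucky split from dominating the summary.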
Fig. 3 Kaplan–Meier curves for the 104 subjects in the TCGA HNSCC testing data. Good and poor groups are identified by the median of the PI scores in the test dataset
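The grouping and curves described in this caption can be sketched as follows. This is an illustration only: it assumes that a PI score above the test-set median marks the poor-prognosis group (the direction and the handling of ties are assumptions), and it uses a plain product-limit estimator. The function names and toy data are hypothetical.

```python
import statistics


def split_by_median(pi_scores):
    """Label each subject 'poor' if its PI exceeds the median, else 'good'."""
    cutoff = statistics.median(pi_scores)
    return ["poor" if s > cutoff else "good" for s in pi_scores]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 for an observed event and 0 for right-censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


groups = split_by_median([0.1, 0.9, 0.4, 0.7])
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

Applying `kaplan_meier` separately to the subjects labeled good and poor would give the two curves compared in such a figure.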
Information on the selected clinical variables in the TCGA ESCA data
| Variable | Coding | Missing status | Continuous (EC) / discrete (ED) |
|---|---|---|---|
| Esophageal tumor central location | proximal = 1, mid = 2, distal = 3 | Yes | ED |
| Person neoplasm cancer status | tumor free = 1, with tumor = 2 | Yes | ED |
| Race | white = 1, asian = 2, black or African American = 3 | Yes | ED |
| BMI | weight/height^2 | Yes | EC |
| AJCC pathologic stage | (stage i, stage ia, stage ib) = 1, (stage ii, stage iia, stage iib) = 2, (stage iii, stage iiia, stage iiib, stage iiic) = 3, (stage iv, stage iva) = 4 | Yes | ED |
| Age | days_to_birth | No | EC |
| Gender | female = 0, male = 1 | No | ED |
Results: median prediction accuracy of the different methods on the TCGA ESCA data over 10 random 294:74 training/test splits, based on the GO-BP database
| | GSIS SCAD | SIS lasso | Ordinary lasso | OGS ridge | OGS lasso | PTReg |
|---|---|---|---|---|---|---|
| Cox-test | 0.4685 | 0.0024 | 8.2557e-09 | 6.0168e-10 | 8.0676e-10 | 0.0330 |
| LR-test | 0.4944 | 0.0308 | 6.1948e-08 | 1.8792e-08 | 1.2942e-07 | 0.0244 |
| Deviance | 161.1422 | 11.4386 | −31.7249 | −44.0441 | −41.3946 | 57.3278 |
| C-index | 0.5452 | 0.6400 | 0.8759 | 0.8984 | 0.8862 | 0.7041 |
| AUC | 0.4843 | 0.5968 | 0.9006 | 0.9294 | 0.9109 | 0.7899 |
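To make the C-index row concrete, here is a minimal sketch of a concordance index for right-censored data. It assumes Harrell's definition, in which a pair is comparable when the subject with the shorter follow-up time had an observed event; the source does not say which C-index variant was used, so treat this as an illustration rather than the authors' computation. The function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    and times[i] < times[j]. It is concordant when the subject who
    failed earlier has the higher risk score; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy data: four subjects, the third one censored.
c_index = harrell_c_index(
    times=[2, 4, 6, 8], events=[1, 1, 0, 1], risk_scores=[0.9, 0.3, 0.5, 0.1]
)
```

Here five pairs are comparable and four of them are ordered correctly by the risk score, so the index is 0.8. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking.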
Fig. 4 Kaplan–Meier curves for the 74 subjects in the TCGA ESCA testing data. Good and poor groups are identified by the median of the PI scores in the test dataset