Yinhao Du, Kun Fan, Xi Lu, Cen Wu.
Abstract
Gene-environment (G×E) interactions are critical for understanding the genetic basis of complex disease beyond genetic and environmental main effects. In addition to existing tools for interaction studies, penalized variable selection has emerged as a promising alternative for dissecting G×E interactions. Despite this success, variable selection is limited in accounting for multidimensional measurements: published variable selection methods cannot accommodate structured sparsity when integrating multi-omics data for disease outcomes. In this paper, we develop a novel variable selection method that integrates multi-omics measurements in G×E interaction studies. Extensive studies have already revealed that analyzing omics data across multiple platforms is not only biologically sensible but also results in improved identification and prediction performance. Our integrative model can efficiently pinpoint important regulators of gene expression through sparse dimensionality reduction, and it links disease outcomes to multiple effects in integrative G×E studies by accommodating a sparse bi-level structure. Simulation studies show that the integrative model identifies G×E interactions and regulators better than alternative methods. In two G×E lung cancer studies with high-dimensional multi-omics data, the integrative model leads to improved prediction and to findings with important biological implications.
Keywords: Gene-environment (G×E) interactions; high-dimensional variable selection; integrated analysis; multidimensional data
Year: 2021 PMID: 35822775 PMCID: PMC9245467 DOI: 10.3390/biotech10010003
Source DB: PubMed Journal: BioTech (Basel) ISSN: 2673-6284
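To make the bi-level structure mentioned in the abstract concrete, the following is a minimal sketch of how a G×E design matrix is often laid out: each genetic factor contributes one group holding its main effect and its interactions with every environmental factor, so selection can act on the whole group and on individual members. This layout is an assumption for illustration, not the authors' implementation; the function name `build_gxe_design` and the labels `G1`, `E1`, `G1xE1` are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_gxe_design(
    G: Sequence[Sequence[float]], E: Sequence[Sequence[float]]
) -> Tuple[List[List[float]], List[str], List[List[int]]]:
    """Build a design matrix with E main effects, G main effects, and G×E products.

    G: n x p genetic factors, E: n x q environmental factors (hypothetical inputs).
    Returns (rows, column labels, per-gene groups of column indices).
    """
    if len(G) != len(E):
        raise ValueError("G and E must have the same number of subjects")
    p, q = len(G[0]), len(E[0])

    # Environmental main effects come first and belong to no gene group.
    labels = [f"E{k + 1}" for k in range(q)]
    groups: List[List[int]] = []
    for j in range(p):
        start = len(labels)
        labels.append(f"G{j + 1}")  # main effect of gene j
        labels.extend(f"G{j + 1}xE{k + 1}" for k in range(q))  # its interactions
        groups.append(list(range(start, len(labels))))

    rows: List[List[float]] = []
    for g_row, e_row in zip(G, E):
        row = list(e_row)
        for g in g_row:
            row.append(g)
            row.extend(g * e for e in e_row)
        rows.append(row)
    return rows, labels, groups


if __name__ == "__main__":
    rows, labels, groups = build_gxe_design([[1.0, 2.0], [3.0, 4.0]], [[0.5], [2.0]])
    print(labels)
    print(groups)
```

A group-level penalty would then decide whether a gene enters the model at all, while an individual-level penalty would decide which of its main or interaction terms are retained.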
PAUC: mean (sd) based on 100 replicates.
| Covariance | Signal | Approaches | G and G×E | Regulators |
|---|---|---|---|---|
| AR-1 | weak | IGE | 0.73 (0.07) | 0.76 (0.10) |
| | | S-LASSO | 0.47 (0.04) | 0.46 (0.13) |
| | | J-LASSO | 0.54 (0.04) | 0.32 (0.05) |
| | | ColReg | 0.39 (0.03) | 0.45 (0.15) |
| | strong | IGE | 0.77 (0.07) | 0.85 (0.06) |
| | | S-LASSO | 0.52 (0.05) | 0.48 (0.14) |
| | | J-LASSO | 0.55 (0.04) | 0.33 (0.05) |
| | | ColReg | 0.39 (0.03) | 0.46 (0.15) |
| Banded | weak | IGE | 0.74 (0.06) | 0.74 (0.10) |
| | | S-LASSO | 0.48 (0.03) | 0.44 (0.11) |
| | | J-LASSO | 0.54 (0.05) | 0.32 (0.04) |
| | | ColReg | 0.39 (0.03) | 0.43 (0.12) |
| | strong | IGE | 0.77 (0.08) | 0.84 (0.06) |
| | | S-LASSO | 0.52 (0.04) | 0.46 (0.11) |
| | | J-LASSO | 0.55 (0.05) | 0.32 (0.04) |
| | | ColReg | 0.39 (0.03) | 0.43 (0.12) |
| LUSC | weak | IGE | 0.59 (0.09) | 0.55 (0.15) |
| | | S-LASSO | 0.39 (0.04) | 0.21 (0.06) |
| | | J-LASSO | 0.42 (0.05) | 0.19 (0.06) |
| | | ColReg | 0.28 (0.04) | 0.21 (0.07) |
| | strong | IGE | 0.63 (0.10) | 0.71 (0.13) |
| | | S-LASSO | 0.42 (0.05) | 0.22 (0.07) |
| | | J-LASSO | 0.43 (0.05) | 0.19 (0.06) |
| | | ColReg | 0.28 (0.05) | 0.22 (0.07) |
| LUAD | weak | IGE | 0.64 (0.09) | 0.62 (0.15) |
| | | S-LASSO | 0.45 (0.04) | 0.21 (0.06) |
| | | J-LASSO | 0.47 (0.05) | 0.19 (0.05) |
| | | ColReg | 0.32 (0.03) | 0.22 (0.07) |
| | strong | IGE | 0.70 (0.08) | 0.77 (0.11) |
| | | S-LASSO | 0.47 (0.05) | 0.23 (0.08) |
| | | J-LASSO | 0.48 (0.05) | 0.18 (0.05) |
| | | ColReg | 0.31 (0.04) | 0.23 (0.08) |
PAUC: mean (sd) based on 100 replicates.
| Covariance | Signal | Approaches | G and G×E | Regulators |
|---|---|---|---|---|
| AR-1 | weak | IGE | 0.89 (0.02) | 0.91 (0.02) |
| | | S-LASSO | 0.57 (0.04) | 0.73 (0.09) |
| | | J-LASSO | 0.62 (0.04) | 0.40 (0.04) |
| | | ColReg | 0.50 (0.03) | 0.71 (0.09) |
| | strong | IGE | 0.91 (0.02) | 0.93 (0.02) |
| | | S-LASSO | 0.61 (0.04) | 0.71 (0.08) |
| | | J-LASSO | 0.64 (0.05) | 0.43 (0.04) |
| | | ColReg | 0.52 (0.03) | 0.70 (0.09) |
| Banded | weak | IGE | 0.89 (0.03) | 0.91 (0.03) |
| | | S-LASSO | 0.55 (0.04) | 0.73 (0.07) |
| | | J-LASSO | 0.62 (0.04) | 0.40 (0.05) |
| | | ColReg | 0.50 (0.03) | 0.71 (0.08) |
| | strong | IGE | 0.90 (0.04) | 0.92 (0.02) |
| | | S-LASSO | 0.61 (0.04) | 0.72 (0.08) |
| | | J-LASSO | 0.64 (0.04) | 0.44 (0.06) |
| | | ColReg | 0.53 (0.04) | 0.70 (0.08) |
| LUSC | weak | IGE | 0.82 (0.04) | 0.78 (0.06) |
| | | S-LASSO | 0.51 (0.05) | 0.36 (0.07) |
| | | J-LASSO | 0.56 (0.05) | 0.25 (0.07) |
| | | ColReg | 0.39 (0.04) | 0.35 (0.08) |
| | strong | IGE | 0.83 (0.04) | 0.82 (0.06) |
| | | S-LASSO | 0.57 (0.05) | 0.39 (0.07) |
| | | J-LASSO | 0.58 (0.05) | 0.25 (0.08) |
| | | ColReg | 0.42 (0.04) | 0.38 (0.07) |
| LUAD | weak | IGE | 0.83 (0.04) | 0.80 (0.06) |
| | | S-LASSO | 0.57 (0.04) | 0.43 (0.06) |
| | | J-LASSO | 0.59 (0.04) | 0.25 (0.06) |
| | | ColReg | 0.47 (0.03) | 0.43 (0.06) |
| | strong | IGE | 0.85 (0.03) | 0.84 (0.04) |
| | | S-LASSO | 0.61 (0.04) | 0.46 (0.07) |
| | | J-LASSO | 0.61 (0.04) | 0.26 (0.06) |
| | | ColReg | 0.49 (0.03) | 0.46 (0.07) |
Figure 1. Four cases of receiver operating characteristic (ROC) curves under AR-1 structure. The left panel corresponds to comparison under both weak and strong signals for 500 subjects. The right panel corresponds to comparison under both weak and strong signals for 1000 subjects. IGE, solid red; S-LASSO, dashed blue; J-LASSO, long dashed purple; ColReg, long dashed green.
Figure 2. Four cases of ROC curves under estimated covariance from lung squamous cell carcinoma (LUSC). The left panel corresponds to comparison under both weak and strong signals for 500 subjects. The right panel corresponds to comparison under both weak and strong signals for 1000 subjects. IGE, solid red; S-LASSO, dashed blue; J-LASSO, long dashed purple; ColReg, long dashed green.
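The PAUC summaries and ROC curves reported here can be related through a short sketch: a partial AUC is the area under the ROC curve restricted to false positive rates below a cutoff, and a table entry is the mean and standard deviation of that area over replicates. The code below is an illustrative sketch only. The cutoff `max_fpr`, the use of the sample standard deviation, and all function names are assumptions, since the record does not state how the authors computed PAUC.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], truth: Sequence[int]) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR) from thresholding scores from high to low."""
    pos = sum(1 for t in truth if t)
    neg = len(truth) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one true and one null variable")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        s = scores[order[i]]
        # Tied scores enter together so the curve does not depend on their order.
        while i < len(order) and scores[order[i]] == s:
            if truth[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(scores: Sequence[float], truth: Sequence[int], max_fpr: float) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    pts = roc_points(scores, truth)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the segment that crosses the cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def summarize(values: Sequence[float]) -> str:
    """Format replicate results as 'mean (sd)', using the sample standard deviation."""
    return f"{mean(values):.2f} ({stdev(values):.2f})"
```

In a simulation, `scores` could be the absolute values of estimated coefficients and `truth` the indicator of truly nonzero effects; calling `partial_auc` once per replicate and passing the results to `summarize` gives one table entry.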
Analysis of The Cancer Genome Atlas (TCGA) lung adenocarcinoma (LUAD) data: linear regulatory models (LRMs) and residual effects for gene expression and regulators, with the estimated coefficients or loadings in parentheses.
| LRMs | ||||
|---|---|---|---|---|
| #1 (0.07) | #2 (−0.01) | #3 (−0.02) | #4 (−0.03) | |
| mRNA | PIK3R2 (0.35) | PIK3R2 (0.98) | ECT2 (−0.98) | INTS7 (−0.77) |
| STK3 (−0.74) | STK3 (0.11) | PSMD2 (−0.17) | PIK3R2 (−0.62) | |
| NCKAP5L (0.74) | NCKAP5L (−0.08) | |||
| CUL9 (0.14) | ||||
| CNA | NEK2 (−0.22) | CECR1 (0.65) | KPNA4 (−0.44) | INTS7 (−0.70) |
| LPGAT1 (0.22) | C1QTNF6 (−0.75) | B3GALNT1 (0.43) | DTL (0.70) | |
| INTS7 (0.65) | PSMD2 (−0.55) | |||
| DTL (−0.65) | LIPH (0.55) | |||
| CECR1 (−0.19) | ||||
| #5 (−0.05) | #6 (0.08) | #7 (−0.06) | #8 (0.06) | |
| mRNA | PIK3R2 (0.12) | INTS7 (0.73) | PIK3R2 (−0.10) | PSMD2 (0.31) |
| STK3 (−0.78) | PIK3R2 (0.63) | STK3 (−0.24) | TMOD3 (0.61) | |
| NCKAP5L (0.57) | STK3 (0.18) | CUL9 (−0.96) | DIAPH3 (0.72) | |
| CUL9 (0.16) | NCKAP5L (−0.14) | |||
| CNA | INTS7 (−0.16) | NEK2 (−0.69) | INTS7 (−0.34) | MAPRE3 (0.70) |
| DTL (0.16) | LPGAT1 (0.71) | DTL (0.36) | IFT172 (−0.67) | |
| CECR1 (−0.78) | CECR1 (0.61) | PSMD2 (0.09) | ||
| C1QTNF6 (−0.57) | C1QTNF6 (−0.61) | ITGB1 (0.09) | ||
| ADAM10 (0.14) | ||||
| Residual effects | ||||
| mRNA | MAST3 (0.01) | |||
| DM | ADSS (0.01) | SLC2A1 (0.01) | PTCH2 (0.01) | ECT2 (0.09) |
| TNS4 (0.02) | MUSTN1 (0.05) | DKK1 (0.02) | FSCN1 (0.05) | |
| GNPNAT1 (0.04) | HPS1 (−0.04) | MAPRE3 (−0.02) | ||
| CNA | LAMC2 (−0.01) | CD5 (−0.03) | E2F7 (−0.01) | |
Analysis of the TCGA LUAD data: G×E interactions identified from LRMs and gene expression, with the estimated regression coefficients in parentheses.
| LRMs | AGE | GENDER | SMOKING |
|---|---|---|---|
| #1 | 0.08 | −0.25 | |
| #2 | 0.02 | ||
| #3 | 0.01 | ||
| #4 | 0.01 | 0.01 | |
| #5 | 0.01 | ||
| mRNA Residual | AGE | GENDER | SMOKING |
| MAST3 | 0.27 | ||
| HPS1 | 0.01 | ||
| BBS5 | −0.04 | −0.03 | |
| TLE1 | −0.01 | ||
| ADAM10 | 0.02 | 0.03 | |
| SLC16A3 | 0.07 | ||
| BTN2A2 | −0.02 | −0.06 | |
| FAM71E1 | 0.02 |
Analysis of the TCGA LUSC data: LRMs and residual effects for gene expression and regulators, with the estimated coefficients or loadings in parentheses.
| LRMs | ||||
|---|---|---|---|---|
| #1 (−0.01) | #2 (0.01) | #3 (0.01) | #4 (−0.02) | |
| mRNA | RNF24 (−0.17) | SEC23B (0.23) | REEP3 (−0.76) | AP2A2 (−0.59) |
| ESM1 (−0.53) | RNF24 (−0.97) | FUT11 (−0.64) | PNPLA6 (−0.37) | |
| RASAL2 (−0.39) | RFX1 (−0.55) | |||
| LAMC1 (−0.34) | XRN2 (0.45) | |||
| DLGAP4 (−0.63) | ||||
| DM | DCBLD1 (0.09) | TCF7L2 (0.22) | RGP1 (−0.52) | |
| CHI3L1 (0.18) | NCOR2 (0.27) | |||
| CNA | CD163L1 (−0.16) | ENTPD6 (0.68) | RERE (−0.89) | CD163L1 (0.70) |
| DLGAP4 (−0.96) | ABHD12 (−0.69) | DLGAP4 (−0.43) | PARD6G (−0.39) | |
| #5 (0.16) | #6 (0.05) | #7 (−0.05) | #8 (0.01) | |
| mRNA | COL5A3 (0.45) | MGST3 (0.33) | TPM4 (0.68) | TCTN2 (−0.45) |
| DCBLD1 (0.57) | OSBPL5 (0.31) | UBB (0.59) | ANGPT2 (−0.40) | |
| PDGFA (0.31) | SNX9 (0.56) | NCOR2 (−0.42) | UBE4B (−0.37) | |
| CHST15 (0.45) | MYO1C (0.46) | MBTPS1 (−0.47) | ||
| LGALS1 (0.39) | CCDC68 (0.49) | FAM178B (−0.50) | ||
| DM | DCBLD1 (−0.86) | CHST15 (−0.97) | RGP1 (−0.55) | NCOR2 (0.16) |
| FAM178B (−0.37) | RGP1 (0.13) | |||
| CHST15 (−0.17) | NCOR2 (−0.10) | |||
| LGALS1 (−0.15) | ||||
| CNA | DLGAP4 (0.27) | STK40 (−0.26) | CD163L1 (−0.35) | |
| TCTN2 (−0.78) | DLGAP4 (−0.92) | |||
| Residual effects | ||||
| mRNA | LRAT (−0.02) | PLEKHA6 (−0.02) | ||
| DM | BAMBI (0.01) | PYGB (0.02) | FUT11 (−0.18) | ZNF394 (0.03) |
| CCIN (−0.01) | DEAF1 (−0.10) | ACOT7 (0.04) | KLK6 (−0.12) | |
| LHX8 (−0.01) | PLEKHB1 (0.09) | |||
| CNA | FGFRL1 (−0.05) | DCBLD1 (−0.04) | NEFL (−0.04) | CHST1 (0.02) |
| ULK1 (−0.03) | FPR2 (0.02) | PYGB (−0.10) | ||
Analysis of the TCGA LUSC data: G×E interactions identified from LRMs and gene expression, with the estimated regression coefficients in parentheses.
| LRMs | AGE | GENDER | SMOKING |
|---|---|---|---|
| #1 | 0.02 | 0.03 | |
| #2 | 0.03 | ||
| #4 | −0.02 | ||
| #5 | 0.01 | 0.05 | −0.02 |
| #6 | 0.01 | −0.01 | |
| #7 | −0.36 | ||
| #8 | 0.02 | ||
| mRNA Residual | AGE | GENDER | SMOKING |
| LRAT | −0.17 | ||
| PLEKHA6 | −0.30 | ||
| AP2A2 | 0.02 | ||
| SLC12A7 | −0.10 | 0.07 | |
| TCTN2 | −0.15 | −0.09 | |
| CLEC5A | 0.01 | ||
| RNF24 | −0.06 | 0.04 | |
| PRRX2 | 0.04 | −0.04 | |
| CCDC74A | 0.14 | −0.13 | |
| FGF9 | 0.03 | −0.06 | |
| IGF2R | 0.05 | −0.02 | |
| CHMP4C | 0.24 | 0.13 | −0.01 |
| SLC45A4 | −0.11 | ||
| SULF2 | −0.05 | −0.03 | |
| UBB | −0.11 | ||
| DVL1 | −0.07 | ||
| NID1 | 0.08 | 0.20 | |
| KLK8 | 0.01 | ||
| DOCK6 | 0.26 | −0.10 | |
| FHDC1 | 0.01 | −0.16 | |
| OPLAH | −0.12 | ||
| VSTM1 | −0.02 | ||
| SLC28A1 | −0.07 | ||
| TCF7L2 | 0.12 | ||
| DLGAP4 | −0.04 | ||
| CRNKL1 | −0.25 |