Weiwei Duan1,2,3,4, Ruyang Zhang1,2,3,4, Yang Zhao1,2,3,4, Sipeng Shen1,2,3,4, Yongyue Wei1,2,3,4, Feng Chen5,6,7,8, David C Christiani2,3,9,10.
Abstract
BACKGROUND: Modeling thousands of markers simultaneously has been of great interest in testing associations between genetic biomarkers and disease or disease-related quantitative traits. Recently, an expectation-maximization (EM) approach to Bayesian variable selection (EMVS), which speeds up Bayesian computation with a fast EM algorithm, was developed for continuous or binary outcomes. However, it is not suitable for analyzing the time-to-event outcomes found in many public databases such as The Cancer Genome Atlas (TCGA).
Keywords: Bayesian variable selection; EM algorithm; Non-small cell lung cancer; Omics; Stomach adenocarcinoma; Survival analysis
Year: 2018 PMID: 30400837 PMCID: PMC6218990 DOI: 10.1186/s40246-018-0179-x
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Parameter settings for simulation studies
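| Parameter | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
|---|---|---|---|---|---|---|
| Censoring rate | 40% | 40% | 40% | 40% | 40% | 40% |
| Causal effects | −0.2, 0.2, −0.3, 0.3, −0.4, 0.4 | −0.2, 0.2, −0.3, 0.3, −0.4, 0.4 | −0.2, 0.2, −0.3, 0.3, −0.4, 0.4 | −0.2, 0.2, −0.3, 0.3, −0.4, 0.4 | −0.2, 0.2, −0.3, 0.3, −0.4, 0.4 | −0.2, 0.2, −0.3, 0.3, −0.4, 0.4 |
| Replications | 50 | 50 | 50 | 50 | 50 | 50 |
| Sample size of test dataset | 100 | 100 | 100 | 100 | 100 | 100 |
| Distribution of survival time (shape) | Exponential | Exponential | Weibull (2) | Weibull (2) | Gamma (0.8) | Gamma (0.8) |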
| Sample size ( | 500/1000 | 500/5000 | 500/1000 | 500/5000 | 500/1000 | 500/5000 |
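
The exponential scenarios in the table above can be mimicked with a short simulation. The sketch below is only an illustration under stated assumptions, not the authors' simulation code: markers are drawn as independent standard normals, the six causal markers are placed at random positions, the baseline hazard is fixed at 1, and censoring times are exponential with a rate tuned by bisection to reach roughly 40% censoring. The function name `simulate_survival` and all of its parameters are hypothetical.

```python
import numpy as np

def simulate_survival(n=500, p=1000, target_censoring=0.4, seed=0):
    """Simulate right-censored exponential survival data (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumption: independent N(0, 1) markers; the source does not give the design.
    X = rng.standard_normal((n, p))

    # Six causal effects from the parameter table, at random positions (assumption).
    beta = np.zeros(p)
    causal_idx = rng.choice(p, size=6, replace=False)
    beta[causal_idx] = [-0.2, 0.2, -0.3, 0.3, -0.4, 0.4]

    # Exponential event times with hazard exp(X @ beta), unit baseline hazard assumed.
    hazard = np.exp(X @ beta)
    event_time = rng.exponential(1.0 / hazard)

    # Exponential censoring: fix the random draws, then tune the rate by bisection
    # so that the censored fraction is close to the target.
    base = rng.exponential(1.0, size=n)
    lo, hi = 1e-6, 1e3
    for _ in range(60):
        rate = np.sqrt(lo * hi)
        censored_fraction = np.mean(base / rate < event_time)
        if censored_fraction < target_censoring:
            lo = rate   # too little censoring: shorten censoring times
        else:
            hi = rate
    censor_time = base / np.sqrt(lo * hi)

    time = np.minimum(event_time, censor_time)
    status = (event_time <= censor_time).astype(int)  # 1 = event observed
    return X, time, status, beta
```

The Weibull and Gamma scenarios would need a different generator for `event_time`; that part is left out here because the source lists only the shape parameters.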
Fig. 1 Solution path and iteration path of the proposed SurvEMVS under Scenario 1. Red dots represent the changes of estimated effects for the true signals. a Solution path. b Iteration path (υ0 = 0.05)
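
For readers unfamiliar with EMVS, the role of the spike variance can be illustrated with the E-step of the generic spike-and-slab EM scheme. This is a minimal sketch assuming that υ0 in the caption is the spike variance; the slab variance `v1`, the prior inclusion probability `theta`, and a unit scale parameter are assumptions introduced here, and the survival-specific details of SurvEMVS are not reproduced.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-x ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def emvs_e_step(beta, v0=0.05, v1=100.0, theta=0.5):
    """Generic EMVS-style E-step (illustrative; v1 and theta are assumed values).

    Returns the conditional inclusion probability p_star of each marker and
    the resulting ridge-type penalty weight d_star used in the M-step.
    """
    beta = np.asarray(beta, dtype=float)
    slab = theta * normal_pdf(beta, v1)           # weight of the "signal" component
    spike = (1.0 - theta) * normal_pdf(beta, v0)  # weight of the "noise" component
    p_star = slab / (slab + spike)
    d_star = (1.0 - p_star) / v0 + p_star / v1
    return p_star, d_star
```

Small coefficients receive a low inclusion probability and a heavy penalty, while large coefficients are barely shrunk. This is the kind of behavior that a solution path over υ0 is typically used to inspect.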
TPR, FPR, and FDR in variable selection with 50 replications (exponential distribution)
| Method | Scenario 1 TPR | Scenario 1 FPR | Scenario 1 FDR | Scenario 2 TPR | Scenario 2 FPR | Scenario 2 FDR |
|---|---|---|---|---|---|---|
| LASSO.se | 0.657 | 1.25E−03 | 0.239 | 0.327 | 1.36E−04 | 0.258 |
| LASSO.min | 0.920 | 2.11E−02 | 0.792 | 0.713 | 4.60E−03 | 0.843 |
| EBIC ( | 0.743 | 1.19E−03 | 0.209 | 0.703 | 5.78E−03 | 0.872 |
| EBIC ( | 0.730 | 7.85E−04 | 0.151 | 0.480 | 1.48E−04 | 0.204 |
| EBIC ( | 0.710 | 6.24E−04 | 0.127 | 0.377 | 2.00E−05 | 0.042 |
Abbreviations: TPR, true positive rate; FPR, false positive rate; FDR, false discovery rate
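
The three selection metrics can be computed for a single replication as in the sketch below. It uses the standard definitions (TPR = TP / number of causal markers, FPR = FP / number of non-causal markers, FDR = FP / number of selected markers); the function name and arguments are hypothetical, and the source does not state how empty selections were handled, so returning an FDR of 0 in that case is an assumption.

```python
def selection_metrics(selected, causal, n_markers):
    """Return (TPR, FPR, FDR) for one replication of variable selection."""
    selected, causal = set(selected), set(causal)
    true_pos = len(selected & causal)
    false_pos = len(selected - causal)

    tpr = true_pos / len(causal)
    fpr = false_pos / (n_markers - len(causal))
    # Assumption: an empty selection is counted as FDR = 0.
    fdr = false_pos / len(selected) if selected else 0.0
    return tpr, fpr, fdr
```

Averaging these values over the 50 replications would give entries comparable to those in the table.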
Fig. 2 Averaged estimated effect (black vertical lines) for each marker over 50 replications under Scenario 1. Red triangles label true effect sizes and locations of the causal markers
Fig. 3 MSE of parameter estimation and AUC of prognosis prediction for Scenarios 1 and 2. a, b The results of Scenarios 1 and 2, respectively
Validation analysis of the seven potential SNPs identified by SurvEMVS using external databases
| SNP (Cytoband) | Gene symbol (annotation) | TCGA | KEGG | PubMed |
|---|---|---|---|---|
| rs1506943_G (1q23.3) | LMX1A-RXRG (Intergenic) | RXRG: low expression in LUAD and LUSC tumor samples | RXRG participates in non-small cell lung cancer pathway and other cancer related pathways (hsa05200, 05222) | This gene is expressed at significantly lower levels in non-small cell lung cancer cells [ |
| rs1921660_G (2q37.3) | GBX2-ASB18 (Intergenic) | GBX2: high expression is associated with bad prognosis ( | –a | Enhanced GBX2 expression stimulates growth of human prostate cancer cells [ |
| rs981852_C (3p14.2) | FHIT (Intron) | Low expression in LUSC tumor samples | FHIT participates in non-small cell lung cancer pathway and small cell lung cancer pathway (hsa05222, 05223) | – |
| rs2044831_G (7p14.1) | EPDR1 (Coding) | Low expression in LUSC tumor samples | – | EPDR1 is highly expressed in colorectal tumor cells [ |
| rs263264_G (8q24.2) | ADCY8 (Intron) | – | ADCY8 participates in multiple signaling pathways and pathways in cancer | – |
| rs2074986_G (10q25.3) | GFRA1 (DHS) | Low expression in LUAD and LUSC tumor samples; High expression in LUAD is associated with good prognosis ( | – | GFRA1 released by nerves enhances cancer cell perineural invasion [ |
| rs4885110_A (13q22.1) | LINC00393-KLF12 (Intergenic) | KLF12: low expression in LUAD tumor samples | – | KLF12 is an important regulator of gene expression during carcinogenesis [ |
a Negative results of validation analysis
Fig. 4 Kaplan-Meier survival curve of patients with high, moderate, and low risk. P value is calculated using log-rank test
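
The curves in a plot like Fig. 4 are Kaplan-Meier estimates computed separately for each risk group. The sketch below shows the product-limit estimator for one group using NumPy only. It is a generic illustration rather than the authors' code, and it does not include the log-rank test.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of the survival function.

    time  : follow-up times
    event : 1/True if the event was observed, 0/False if censored
    Returns the distinct event times and the survival probability just after each.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)              # still under observation at t
        deaths = np.sum((time == t) & event)     # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

Calling the function once per risk group (high, moderate, low) and plotting the results as step functions gives curves of the kind shown in the figure.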
Validation analysis of five genes using five GEO datasets
| Gene symbol | GSE14210 ( | GSE15459 ( | GSE29272 ( | GSE51105 ( | GSE62254 ( | Heterogeneity- | Combined (random) |
|---|---|---|---|---|---|---|---|
| CTLA4 (2q33.2) | –a | 0.77 (0.52~1.12) | – | 0.45 (0.25~0.82) | 0.59 (0.41~0.85) | 3.0E−01 | 0.62 (0.48~0.82) |
| PLCXD3 (5p13.1) | – | 0.77 (0.51~1.15) | – | 1.60 (0.98~2.6) | 1.92 (1.32~2.79) | 4.0E−03 | 1.33 (0.75~2.36) |
| NACAD (7p13) | 1.47 (0.97~2.24) | 1.47 (0.93~2.34) | 1.61 (1.04~2.49) | 1.76 (1.06~2.92) | 2.03 (1.41~2.92) | 7.7E−01 | 1.68 (1.39~2.04) |
| SERPINE1 (7q22.1) | 1.46 (0.99~2.16) | 0.83 (0.56~1.24) | 0.73 (0.48~1.10) | 1.27 (0.76~2.12) | 1.57 (1.08~2.28) | 2.2E−02 | 1.12 (0.82~1.53) |
| GAMT (19p13.3) | 1.17 (0.81~1.70) | 0.69 (0.46~1.03) | 1.81 (1.07~3.06) | 0.54 (0.32~0.92) | 1.75 (1.22~2.51) | < 1.0E−03 | 1.08 (0.69~1.68) |
Each cell gives the estimated hazard ratio (HR), the 95% CI of the HR, and the P value from the hypothesis test of the HR. In each dataset, the samples are split into two groups at the best cutoff of expression level. Combined results are derived by meta-analysis with a random-effects model
a Expression data for the gene are not available in the corresponding GEO database
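
The combined column can be approximated from the per-dataset HRs and confidence intervals. The sketch below assumes the DerSimonian-Laird estimator for the between-study variance and recovers standard errors from the 95% CI on the log scale. The source only states that a random-effects model was used, so the choice of estimator and the function name are assumptions.

```python
import numpy as np

def random_effects_hr(hr, lower, upper):
    """Pool hazard ratios with a DerSimonian-Laird random-effects model.

    hr, lower, upper : per-study HR and its 95% CI limits
    Returns (pooled HR, CI lower, CI upper, Cochran's Q, tau^2).
    """
    y = np.log(np.asarray(hr, dtype=float))
    se = (np.log(np.asarray(upper, dtype=float))
          - np.log(np.asarray(lower, dtype=float))) / (2 * 1.96)

    # Fixed-effect weights and Cochran's Q heterogeneity statistic.
    w = 1.0 / se ** 2
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)

    # DerSimonian-Laird between-study variance, truncated at zero.
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights and pooled estimate.
    w_re = 1.0 / (se ** 2 + tau2)
    y_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))

    return (np.exp(y_re), np.exp(y_re - 1.96 * se_re),
            np.exp(y_re + 1.96 * se_re), q, tau2)
```

As a usage example, passing the five NACAD values from the table (`hr=[1.47, 1.47, 1.61, 1.76, 2.03]` with the matching CI limits) gives a pooled estimate that can be compared with the Combined (random) column. Small differences are to be expected because the table values are rounded.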