Chung Chang, Chan-Yu Sung, Han Hsiao, Jiabin Chen, I-Hsuan Chen, Wei-Ting Kuo, Lung-Feng Cheng, Praveen Kumar Korla, Ming-Jhe Chung, Pei-Jhen Wu, Chia-Cheng Yu, Jim Jinn-Chyuan Sheu.
Abstract
Recent advances in high-throughput genomic technologies have nurtured a growing demand for statistical tools that help identify molecular changes as potential prognostic biomarkers or druggable targets for personalized precision medicine. In this study, we developed an interactive, user-friendly web platform for high-dimensional analysis of molecular alterations in cancer (HDMAC) (https://ripsung26.shinyapps.io/rshiny/). HDMAC offers several penalized regression models suited to high-dimensional data (Ridge, Lasso and adaptive Lasso), with Cox regression for survival outcomes and logistic regression for binary outcomes. An optional first-step screening addresses the multiple-comparison issue that often arises with large-volume genomic data. A hazard ratio or estimated coefficient is reported for each selected gene, so that a multivariate regression model may be built from the selected genes. Cross-validation is provided to estimate the predictive power of each regression model. In addition, R code is provided to facilitate download of whole sets of molecular variables from TCGA. We illustrate the use of HDMAC with a data set on gene mutations and a data set on mRNA expression from ovarian cancer patients, and a data set on mRNA expression from bladder cancer patients. The analysis of each data set yielded a list of candidate genes whose mutations or abnormal expression might be associated with ovarian and bladder cancers. HDMAC offers a solution for rigorous analysis and validation of high-dimensional genomic data.
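The first-step screening mentioned in the abstract can be illustrated with a false discovery rate (FDR) filter. The sketch below uses the Benjamini-Hochberg step-up procedure, which is an assumption here: the abstract does not state which FDR procedure HDMAC applies. The gene names, p-values and the `alpha` level are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses kept at FDR level alpha (step-up rule)."""
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Keep every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values from univariate tests (not from the source).
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
p_values = [0.0001, 0.004, 0.02, 0.2, 0.8]

selected = [genes[i] for i in benjamini_hochberg(p_values, alpha=0.05)]
print(selected)
```

Only the genes that pass such a filter would then be passed to the penalized regression step, which shrinks the candidate set before model fitting.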
Year: 2020 PMID: 32127576 PMCID: PMC7054321 DOI: 10.1038/s41598-020-60791-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Flowchart of running HDMAC.
Numbers of genes and c-indices with mutations and mRNA expression abnormalities in response to overall survival of ovarian cancer.
| Data (Cox PH method) | FDR screening | Ridge: number of genes | Ridge: c-index | Lasso: number of genes | Lasso: c-index | Adaptive Lasso: number of genes | Adaptive Lasso: c-index |
|---|---|---|---|---|---|---|---|
| Mutated genes | no FDR | 670 | 0.529 | 1 | 0.501 | 1 | 0.501 |
| Mutated genes | after FDR | 2 | 0.502 | 2 | 0.502 | 2 | 0.497 |
| mRNA expression abnormalities | no FDR | 9548 | 0.591 | 4 | 0.554 | 4 | 0.560 |
| mRNA expression abnormalities | after FDR | 6 | 0.538 | 6 | 0.538 | 6 | 0.540 |
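To make the c-index values above easier to read, here is a minimal sketch of Harrell's concordance index, under the assumption that this is the variant reported (the source does not say). A value of 0.5 corresponds to random ordering of patients and 1.0 to perfect ordering. The survival times, event indicators and risk scores are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable pairs ordered correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk failed earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up time, event flag (1 = death, 0 = censored), model risk score.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk_scores = [2.0, 1.0, 1.5, 0.2]

c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

In this toy data set four of the five comparable pairs are ordered correctly, so the index is well above 0.5.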
Genes selected with FDR penalty to be significantly associated with overall survival in ovarian cancer.
| Mutated genes | Estimated coefficients | Hazard ratio | p-value | Abnormally expressed genes | Estimated coefficients | Hazard ratio | p-value |
|---|---|---|---|---|---|---|---|
| ZSWIM8 | 2.014 | 7.493 | 0.00007 | ASAP3 | 0.09 | 1.094 | 0.0682 |
| PABPC3 | 1.729 | 5.635 | 0.00071 | C10ORF113 | 0.08 | 1.083 | 0.0330 |
| | | | | TIGAR | 0.08 | 1.083 | 0.0001 |
| | | | | KIAA0100 | 0.05 | 1.051 | 0.0188 |
| | | | | REPL4B | 0.007 | 1.007 | 0.0036 |
| | | | | ZFHX4 | 0.08 | 1.083 | 0.0231 |
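In a Cox model the hazard ratio of a covariate is the exponential of its estimated coefficient. The short check below recomputes the hazard ratios of the two mutated genes from the coefficients in the table above; the function name `hazard_ratio` is introduced here for illustration only.

```python
import math


def hazard_ratio(coefficient):
    """Hazard ratio implied by a Cox regression coefficient."""
    return math.exp(coefficient)


# (gene, estimated coefficient, reported hazard ratio) copied from the table above.
rows = [("ZSWIM8", 2.014, 7.493), ("PABPC3", 1.729, 5.635)]

recomputed = {gene: round(hazard_ratio(coef), 3) for gene, coef, _ in rows}
for gene, coef, reported in rows:
    print(gene, recomputed[gene], reported)
```

Both recomputed values agree with the reported hazard ratios to three decimal places.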
Numbers of genes and the test statistics of mRNA expression abnormalities in response to the invasion subtype of bladder cancer and the validation results.
| Logistic regression | Ridge, no FDR | Ridge, with FDR | Lasso, no FDR | Lasso, with FDR | Adaptive Lasso, no FDR | Adaptive Lasso, with FDR |
|---|---|---|---|---|---|---|
| # abnormal expression | 8024 | 461 | 46 | 36 | 27 | 24 |
| Sensitivity | 0.565 | 0.500 | 0.533 | 0.565 | 0.484 | 0.532 |
| Specificity | 0.701 | 0.764 | 0.709 | 0.677 | 0.772 | 0.717 |
| Accuracy | 0.656 | 0.677 | 0.651 | 0.640 | 0.677 | 0.656 |
| AUC (area under curve, %) | 68.107 | 66.515 | 65.864 | 67.020 | 62.442 | 64.300 |
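Sensitivity, specificity and accuracy in the table above all follow from the counts in a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the bladder cancer data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true positives among all actual positives
    specificity = tn / (tn + fp)               # true negatives among all actual negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # correct calls among all samples
    return sensitivity, specificity, accuracy


# Hypothetical counts for an invasive (positive) vs non-invasive (negative) classifier.
sens, spec, acc = classification_metrics(tp=40, fn=20, tn=90, fp=30)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```

Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two, as it does in every column of the table.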
Genes selected with adaptive Lasso logistic regression after FDR penalty whose abnormal expression was associated with invasion in bladder cancer, together with their estimated coefficients, odds ratios and p-values.
| Abnormally expressed genes | Estimated coefficients | Odds ratio (ln) | p-value |
|---|---|---|---|
| SPTSSA | −0.16 | 0.852 | 0.51 |
| ATAT1 | 0.06 | 1.061 | 0.47 |
| CABP4 | 0.26 | 1.296 | 0.11 |
| CCNK | −0.27 | 1.309 | 0.19 |
| CIR1 | 0.55 | 1.733 | 0.50 |
| DPP9 | 0.42 | 1.521 | 0.05 |
| FANCL | 0.01 | 1.010 | 0.92 |
| ICOSLG | −0.66 | 0.516 | 0.004 |
| JOSD1 | −0.35 | 0.704 | 0.54 |
| MED30 | −0.43 | 0.650 | 0.01 |
| NADSYN1 | −0.71 | 0.491 | 0.27 |
| NCOA3 | −0.52 | 0.594 | 0.003 |
| LINC00173 | −0.12 | 0.886 | 0.66 |
| NKIRAS1 | −0.29 | 0.748 | 0.10 |
| NUDT16P1 | 0.24 | 1.271 | 0.15 |
| PDRG1 | −0.69 | 0.501 | 0.49 |
| POLR1D | 0.55 | 1.733 | 0.02 |
| PSORS1C2 | 1.14 | 3.126 | 0.005 |
| RETSAT | −0.32 | 0.726 | 0.18 |
| RPL23AP7 | 0.66 | 1.934 | 0.01 |
| SETMAR | 0.29 | 1.336 | 0.52 |
| SLC14A1 | 0.50 | 1.648 | 0.05 |
| SLC39A4 | 0.14 | 1.150 | 0.65 |
| ZSCAN2 | 0.27 | 1.309 | 0.16 |
Uploading time for both logistic regression and survival analysis (seconds).
| Number of observations | Small (50 variables) | Medium (500 variables) | Large (5000 variables) |
|---|---|---|---|
| Small (50) | 1.1 | 1.8 | 5.1 |
| Medium (200) | 1.5 | 3.5 | 12.4 |
| Large (1000) | 3.4 | 8.4 | 54.9 |
Computing time for logistic regression and survival analysis (seconds).
| Number of observations | Logistic regression: small (50 variables) | Logistic regression: medium (500 variables) | Logistic regression: large (5000 variables) | Survival analysis: small (50 variables) | Survival analysis: medium (500 variables) | Survival analysis: large (5000 variables) |
|---|---|---|---|---|---|---|
| Small (50) | 1.5 | 1.7 | 4.5 | 1.4 | 1.6 | 2.5 |
| Medium (200) | 1.7 | 1.9 | 5.5 | 1.6 | 5.8 | 14.1 |
| Large (1000) | 4.3 | 6.4 | 16.4 | 12.8 | 59.2 | 128.2 |