Ning Zhao, Maozu Guo, Kuanquan Wang, Chunlong Zhang, Xiaoyan Liu.
Abstract
Prognostic biomarkers for cancer treatment are very difficult to identify. Although high-throughput sequencing technology allows prognostic biomarkers to be mined more deeply through omics data analysis, effective methods for comprehensively utilizing multi-omics data are lacking. In this work, we integrated multi-omics data [DNA methylation (DM), gene expression (GE), somatic copy number alteration (SCNA), and microRNA expression (ME)] and proposed a method that ranks genes by defining a "Score." Applying this method, we obtained cancer-specific prognostic biomarkers for 13 cancers. The prognostic power of the biomarkers was further assessed by C-indexes, which ranged from 0.76 to 0.96. Moreover, by comparing the 13 survival-related gene lists, we found seven genes (SLK, API5, BTBD2, PTAR1, VPS37A, EIF2B1, and ZRANB1) associated with prognosis in a variety of cancers. In particular, SLK was more likely to be cancer-related owing to its high missense mutation rate, and it was associated with cell adhesion. Furthermore, network analysis showed that EPRS, HNRNPA2B1, BPTF, LRRK1, and PUM1 were broadly correlated with cancers. In summary, our method achieves better integration of multi-omics data and can be extended to research on other diseases, and the resulting prognostic biomarkers had better prognostic power than those of previous methods. Our results could provide a reference for translational medicine researchers and clinicians.
Keywords: biomarker; multi-omics; pan-cancer; prognosis; survival
Year: 2020; PMID: 32300588; PMCID: PMC7142216; DOI: 10.3389/fbioe.2020.00268
Source DB: PubMed; Journal: Front Bioeng Biotechnol; ISSN: 2296-4185
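Sample sizes of the 13 cancer types. "Total size" is the number of samples shared by all five data sets (clinical data and the four omics data types).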
| Cancer | Clinical | DM (450K) | SCNA (nocnv) | GE (FPKM-UQ) | ME (isoform) | Total size |
| Bladder urothelial carcinoma [BLCA] | 291 | 412 | 410 | 408 | 409 | 283 |
| Breast invasive carcinoma [BRCA] | 735 | 783 | 1092 | 1091 | 1078 | 497 |
| Cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC] | 177 | 307 | 295 | 304 | 307 | 166 |
| Colon adenocarcinoma [COAD] | 186 | 296 | 452 | 456 | 444 | 164 |
| Head and neck squamous cell carcinoma [HNSC] | 416 | 528 | 522 | 500 | 523 | 380 |
| Kidney renal clear cell carcinoma [KIRC] | 452 | 319 | 532 | 530 | 516 | 251 |
| Kidney renal papillary cell carcinoma [KIRP] | 183 | 275 | 289 | 288 | 291 | 172 |
| Brain lower grade glioma [LGG] | 347 | 516 | 515 | 511 | 512 | 340 |
| Liver hepatocellular carcinoma [LIHC] | 259 | 377 | 375 | 371 | 372 | 250 |
| Lung adenocarcinoma [LUAD] | 292 | 460 | 520 | 515 | 515 | 229 |
| Lung squamous cell carcinoma [LUSC] | 299 | 370 | 503 | 501 | 478 | 190 |
| Sarcoma [SARC] | 204 | 261 | 260 | 259 | 259 | 200 |
| Stomach adenocarcinoma [STAD] | 201 | 395 | 441 | 375 | 436 | 157 |
FIGURE 1 Workflow for identifying survival-related genes. (A) Screening of candidate survival-related genes. DNA methylation, gene expression, somatic copy-number alteration, and microRNA (miRNA) expression profiles were extracted from TCGA for the same samples. miRNA expression data were mapped to genes according to miRNA–mRNA interactions, yielding four data types covering the same samples and the same genes. For each omics data type, a univariate Cox proportional hazards model was used to identify survival-related genes. Only genes associated with survival in more than two data types were considered candidate genes. (B) Identification of prognostic biomarkers. For the selected candidate genes, a multivariate Cox proportional hazards model was then applied to obtain risk scores (RS). Scores for ranking genes were then obtained by calculating GS scores, in which A, B, C, and D are binary variables indicating whether the gene was survival-related in each of the four omics data types ("1" for related and "0" for not). The highest-ranked genes were identified as survival-related.
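The screening and risk-score steps of this workflow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-omics univariate Cox results are already reduced to the 0/1 flags A, B, C, and D, reads "more than two data types" literally as at least three, and assumes the multivariate Cox coefficients have already been fitted. The GS formula is not reproduced in this caption, so the sketch stops at the candidate filter and the standard Cox linear predictor RS = Σ βᵢxᵢ. Splitting patients at the median RS into high- and low-risk groups is also an assumption, and the gene names are hypothetical.

```python
from statistics import median

# Order of the binary flags A, B, C, D (assumed mapping to omics types).
OMICS = ("DM", "GE", "SCNA", "ME")


def candidate_genes(flags, min_omics=3):
    """Keep genes flagged survival-related in at least `min_omics` omics types.

    flags: {gene: {"DM": 0/1, "GE": 0/1, "SCNA": 0/1, "ME": 0/1}}
    """
    return sorted(g for g, f in flags.items()
                  if sum(f[o] for o in OMICS) >= min_omics)


def risk_scores(coefs, profiles):
    """Cox linear predictor RS = sum(beta_i * x_i) for each patient.

    coefs:    {gene: beta} from an already-fitted multivariate Cox model
    profiles: {patient: {gene: value}}
    """
    return {pid: sum(coefs[g] * x[g] for g in coefs)
            for pid, x in profiles.items()}


def split_by_median(scores):
    """Label a patient 'high' risk if RS exceeds the median, else 'low'."""
    cut = median(scores.values())
    return {pid: ("high" if rs > cut else "low") for pid, rs in scores.items()}


# Hypothetical example: G2 is flagged in only two omics types and is dropped.
flags = {
    "G1": {"DM": 1, "GE": 1, "SCNA": 1, "ME": 0},
    "G2": {"DM": 1, "GE": 0, "SCNA": 0, "ME": 1},
    "G3": {"DM": 1, "GE": 1, "SCNA": 1, "ME": 1},
}
print(candidate_genes(flags))
```

Keeping `min_omics` as a parameter makes it easy to test the alternative reading ("at least two") if the threshold turns out to be inclusive.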
FIGURE 2 Overview of the pan-cancer samples. (A) Intersections of the sample sets across the multi-omics data. Only intersecting samples were retained, giving 3279 samples in this study. (B) Proportion of each cancer. (C) Distribution of clinical features across the 3279 samples.
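Keeping only samples present in every data set is a set intersection; as a consistency check, the per-cancer "Total size" values in the sample table sum to 3279. The sketch below assumes each data set is given as a collection of TCGA barcodes and matches them at the patient level, using the fact that a TCGA barcode begins with the 12-character `TCGA-<TSS>-<participant>` prefix. The exact matching rule used in the study is not stated, and the barcodes in the example are hypothetical.

```python
def patient_id(barcode):
    """Patient-level part of a TCGA barcode: 'TCGA-<TSS>-<participant>'."""
    return barcode[:12].upper()


def shared_patients(*datasets):
    """Sorted patient IDs present in every data set (needs at least one set)."""
    id_sets = [{patient_id(b) for b in ds} for ds in datasets]
    return sorted(set.intersection(*id_sets))


# Hypothetical barcodes: only patients 0001 and 0002 appear in all three sets.
clinical = {"TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"}
methylation = {"TCGA-AA-0001-01A", "TCGA-AA-0002-01A"}
expression = {"TCGA-AA-0001-01A-11R", "TCGA-AA-0002-01B", "TCGA-AA-0003-01A"}
print(shared_patients(clinical, methylation, expression))
```

Collapsing to patient IDs also merges tumor and normal aliquots from the same patient, so a real pipeline would first filter on the sample-type field of the barcode.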
The prognostic biomarkers of each cancer.
| Cancers | Genes |
| BLCA | |
| BRCA | |
| CESC | |
| COAD | |
| HNSC | |
| KIRC | |
| KIRP | |
| LGG | |
| LIHC | |
| LUAD | |
| LUSC | |
| SARC | |
| STAD | |
FIGURE 3 Kaplan–Meier curves of the top-10 survival-related genes for each cancer. Green lines represent low-risk groups and red lines represent high-risk groups; "+" marks censored follow-ups. (A) BLCA. (B) BRCA. (C) CESC. (D) COAD. (E) HNSC. (F) KIRC. (G) KIRP. (H) LGG. (I) LIHC. (J) LUAD. (K) LUSC. (L) SARC. (M) STAD.
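The curves in this figure are Kaplan–Meier estimates, S(t) = ∏ over event times tᵢ ≤ t of (1 − dᵢ/nᵢ), where dᵢ is the number of deaths at tᵢ and nᵢ the number of patients still at risk. A minimal sketch of the product-limit estimator in plain Python follows; it is illustrative, not the plotting code used for the figure. Separate high- and low-risk curves come from running it on each group.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if censored (the '+' marks)
    Returns [(t, S(t))] at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # patients leaving the risk set at each time
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1 - d / at_risk  # conditional survival past time t
            curve.append((t, surv))
        at_risk -= exits[t]         # remove deaths and censorings at t
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Ties follow the usual convention: a patient censored at an event time is still counted at risk for that event.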
FIGURE 4 C-index comparison of the prognostic power of our prognostic biomarkers across the 13 cancers.
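The C-index reported here (0.76 to 0.96 in the abstract) measures how well a risk score orders patients by survival: 0.5 corresponds to random ordering and 1.0 to perfect ordering. Below is a minimal sketch of Harrell's C-index, assuming a higher risk score means worse prognosis; the study's exact implementation and tie handling are not stated, so this is illustrative only.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for a risk score (higher = worse).

    A pair is comparable when the patient with the shorter follow-up had an
    observed event; it is concordant when that patient also has the higher
    risk. Ties in risk count 0.5. Pairs with equal times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a fails first
        if not events[a]:
            continue  # earlier patient censored: true order unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c_index([1, 2, 3], [1, 0, 1], [1, 1, 2]))
```

In that example, the pair with tied risk contributes 0.5, the discordant pair contributes 0, and the pair whose earlier patient is censored is excluded.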
FIGURE 5 Pan-cancer functional comparison of survival-related genes. (A) Representative KEGG pathways and GO functions enriched by the top-10 prognostic genes of each cancer. (B) Distribution of cancers enriched for each function. Dot size represents the number of enriched genes, and dot color represents the p-value.
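The caption does not say which test produced the enrichment p-values. A common choice for KEGG/GO over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption (without multiple-testing correction). The parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric.

    N: genes in the background
    K: background genes annotated to the pathway or GO term
    n: genes in the query list (e.g. one cancer's top-10 genes)
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, upper + 1)) / total


# Both query genes hit a 2-gene term in a 10-gene background: p = 1/45.
print(hypergeom_enrichment_p(2, 2, 2, 10))
```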
FIGURE 6 Intersections of pan-cancer genes. (A) Intersections of the survival-related gene lists (bottom left) and of the associated functions (top right) across cancers. The total number of survival-associated genes for each cancer is shown on the left, and the total number of associated functions on the right. The color of each block represents the size of the intersection between the corresponding pair of cancers; the darker the color, the larger the intersection. (B) The pan-cancer survival-related genes. Red blocks indicate that the gene is survival-associated in that cancer.
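Both panels reduce to set operations over the per-cancer gene lists: pairwise intersection sizes for the heat map in (A), and genes shared by several cancers for (B). A minimal sketch follows; the minimum number of cancers that makes a gene "pan-cancer" is not stated here, so `min_cancers` is a hypothetical parameter, and the gene names are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(gene_lists):
    """Intersection size of the gene lists for every pair of cancers."""
    return {(a, b): len(set(gene_lists[a]) & set(gene_lists[b]))
            for a, b in combinations(sorted(gene_lists), 2)}


def pan_cancer_genes(gene_lists, min_cancers=2):
    """Genes found in at least `min_cancers` cancers, with those cancers."""
    hits = {}
    for cancer, genes in gene_lists.items():
        for g in set(genes):
            hits.setdefault(g, []).append(cancer)
    return {g: sorted(c) for g, c in hits.items() if len(c) >= min_cancers}


lists = {"BLCA": ["G1", "G2"], "KIRC": ["G2", "G3"], "LUSC": ["G2", "G1"]}
print(pairwise_overlap(lists))
print(pan_cancer_genes(lists, min_cancers=3))
```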
The seven pan-cancer survival-related genes.
| Genes | Functions | PubMed IDs |
| | Cell adhesion, regulation of cell cycle | 26676752, 27849608, 22057237, 27247392, 22699621 |
| | Protein catabolic process | 26676752, 27790711, 25559195, 27006499, 24316982, 27178121, 23685747, 22421440 |
| | Protein catabolic process | 26676752, 22057237 |
| | Cellular protein modification process | 26676752 |
| | Viral process | 27849608, 24316982, 23045694 |
| | Glial cell development | 22057237 |
| | Regulation of cell death | 27790711, 27178121 |
FIGURE 7 Characteristics of SLK. (A) Results from the UCSC Genome Browser. (B) Distribution of mutations in SLK. (C) Functions of SLK. (D) Protein–protein interactions of SLK.
FIGURE 8 Pan-cancer survival-related gene networks. (A) Chromosomal distribution of the genes. The blue, green, red, and purple blocks represent the survival correlation of the genes in each of the four omics data types, respectively. The links in the middle represent interactions between genes. (B) Interaction network among the top-10 survival-related genes. Different colors represent prognostic genes of different cancers; red nodes represent genes related to prognosis in multiple cancers. (C) Degree distribution of the prognosis-related gene network. (D) Betweenness centrality of the prognosis-related gene network.
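Panels (C) and (D) summarize the network by degree distribution and betweenness centrality. The sketch below computes both for an undirected, unweighted edge list using Brandes' algorithm. The interaction source, edge weights, and any normalization used in the study are not stated, so these are assumptions. NetworkX offers an equivalent `networkx.betweenness_centrality`, which normalizes by default.

```python
from collections import Counter, deque


def degree_distribution(edges):
    """Map degree -> number of nodes with that degree (undirected graph)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(Counter(deg.values()))


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes; unweighted, undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {w: [] for w in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


star = [("x", "a"), ("x", "b"), ("x", "c")]
print(degree_distribution(star), betweenness(star))
```

In a star graph, every shortest path between two leaves passes through the hub, so the hub's betweenness equals the number of leaf pairs.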
FIGURE 9 Prognostic biomarkers of KIRC and LUSC. (A,B) C-index comparison of the prognostic power of our 10-gene prognostic biomarkers and those of other work. (C,D) Heatmaps of hierarchical clustering of samples by the expression of the 10-gene prognostic biomarkers. The bar at the top of each heatmap indicates the group to which each sample actually belongs: red for tumor and green for normal.
FIGURE 10 Comparison of survival-related genes and differentially expressed genes. (A) Differentially expressed genes of LUSC. Red represents high expression and green represents low expression; differentially expressed survival-related genes are marked. (B) Copy number variation peaks of LUSC. (C) Chromosomal positions and interactions of the prognostic biomarkers and differentially expressed genes of LUSC. (D) Differentially expressed genes of KIRC. Red represents high expression and green represents low expression; differentially expressed survival-related genes are marked. (E) Copy number variation peaks of KIRC. (F) Chromosomal positions and interactions of the prognostic biomarkers and differentially expressed genes of KIRC.
FIGURE 11 Comparison of results from multi-omics data and single-omics data. (A) Decision curves for the multi-omics data and each single omics data type in KIRC. The thin oblique line represents the assumption that all patients are treated, and the black line represents the assumption that no patients are treated. (B) C-indexes of the multi-omics data and each single omics data type in KIRC. (C) Decision curves for the multi-omics data and each single omics data type in LUSC. The thin oblique line represents the assumption that all patients are treated, and the black line represents the assumption that no patients are treated. (D) C-indexes of the multi-omics data and each single omics data type in LUSC.
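Decision curves plot net benefit against the threshold probability pt, using the standard formula NB = TP/n - (FP/n) * pt/(1 - pt). The "treat all" reference line is the same formula applied when every patient is treated, and the "treat none" line is 0 at every threshold. How the study converted Cox risk scores into predicted probabilities is not stated, so the sketch below assumes probabilities are already available.

```python
def net_benefit(y_true, p_pred, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true: 1 if the outcome occurred, else 0
    p_pred: predicted outcome probabilities
    """
    n = len(y_true)
    treated = [y for y, p in zip(y_true, p_pred) if p >= threshold]
    tp = sum(treated)            # treated patients who had the outcome
    fp = len(treated) - tp       # treated patients who did not
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_benefit(y_true, threshold):
    """Reference curve in which every patient is treated."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
for pt in (0.3, 0.5):
    print(pt, net_benefit(y, p, pt), treat_all_benefit(y, pt))
```

A model is useful at a given threshold when its curve lies above both reference lines.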