Qi-En He¹, Yi-Fan Tong¹, Zhou Ye², Li-Xia Gao², Yi-Zhi Zhang², Ling Wang³, Kai Song¹
Abstract
Radiotherapy is one of the most important cancer treatments, but patient response varies greatly. Predicting radiosensitivity, identifying potential signature genes, and inferring their regulatory networks are therefore important for both clinical practice and oncology research. Here, we propose a novel multiple genomic data fused partial least squares deep regression method (MGPLS) that analyzes multi-genomic data simultaneously. Using the 60 National Cancer Institute cell lines (NCI-60) as examples, we aimed to identify signature genes by optimizing a radiosensitivity prediction model and to uncover the regulatory relationships among them. A total of 113 signature genes were selected from more than 20,000 genes. The root mean square error (RMSE) of the model was only 0.0025, much lower than previously published results, suggesting that our method predicts radiosensitivity more accurately than the published models it was compared with. Additionally, our regulatory network analysis identified 24 highly important "hub" genes. The proposed data analysis workflow provides a unified computational framework for harnessing the full potential of large-scale integrated cancer genomic data for integrative signature discovery. Furthermore, the regression model, signature genes, and their regulatory network should provide a reliable quantitative reference for optimizing personalized treatment options and may aid our understanding of the mechanisms of cancer progression.
Keywords: Multiple genomic data; gene regulatory network; integrated regression method; radiosensitivity; signature genes
Year: 2020 PMID: 32329416 PMCID: PMC7225787 DOI: 10.1177/1533033820909112
Source DB: PubMed Journal: Technol Cancer Res Treat ISSN: 1533-0338
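The abstract describes fusing several genomic data types into a single partial least squares regression. Below is a minimal sketch of that idea. It assumes that "fusion" means autoscaling each block (GE and CNV), weighting each block by 1/sqrt(number of columns), concatenating the blocks, and fitting one ordinary single-response PLS (PLS1) model with the NIPALS algorithm. This is not the authors' MGPLS: its "deep" component is not described here and is not reproduced. The helpers `fuse_blocks` and `pls1_nipals` and all data are hypothetical.

```python
import numpy as np


def fuse_blocks(*blocks):
    """Autoscale every column, then divide each block by sqrt(#columns) so that a
    wide block (e.g. expression) does not dominate a narrower one (e.g. CNV)."""
    scaled = []
    for block in blocks:
        z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
        scaled.append(z / np.sqrt(block.shape[1]))
    return np.hstack(scaled)


def pls1_nipals(X, y, n_components):
    """Ordinary single-response PLS (NIPALS) on column-centred X and centred y."""
    X, y = X.copy(), y.copy()
    n, p = X.shape
    W = np.zeros((p, n_components))  # X weights
    P = np.zeros((p, n_components))  # X loadings
    T = np.zeros((n, n_components))  # X scores
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        W[:, a], T[:, a] = w, t
        X -= np.outer(t, P[:, a])  # deflate X
        y -= q[a] * t              # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients for the undeflated X
    return B, W, P, q, T


# Hypothetical data shaped like the NCI-60 problem: 60 cell lines, two omics blocks.
rng = np.random.default_rng(0)
ge = rng.normal(size=(60, 200))   # stand-in gene-expression block
cnv = rng.normal(size=(60, 50))   # stand-in copy-number block
sf2 = 0.5 + 0.08 * ge[:, 0] - 0.05 * cnv[:, 0] + rng.normal(scale=0.01, size=60)

X = fuse_blocks(ge, cnv)          # already column-centred by the autoscaling
y = sf2 - sf2.mean()
B, W, P, q, T = pls1_nipals(X, y, n_components=3)
sf2_hat = X @ B + sf2.mean()
rmse = np.sqrt(np.mean((sf2 - sf2_hat) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Block weighting is only one simple way to keep the much wider expression block from swamping the CNV block. Other multiblock PLS weightings exist, and a training RMSE says nothing about predictive accuracy without cross-validation.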
Figure 1. The overall structure of this paper. Because current standard approaches rely on separate analyses of each genomic data type followed by manual integration, a multiple genomic data fused regression approach (MGPLS) is proposed to identify signature genes. MGPLS analyzes all data types simultaneously in a single integrated regression model while eliminating noise effects. VIP: variable importance in projection; CV: cross-validation; UVE: uninformative variable elimination.
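Figure 1 lists VIP (variable importance in projection) as a selection criterion. The sketch below applies the standard VIP formula for a single-response PLS model, VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a) with SS_a = q_a^2 * t_a't_a, to synthetic data. The `vip_scores` helper, the two components, and the "VIP > 1" cut-off are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Ordinary PLS1 (NIPALS) on column-centred X and centred y; returns W, T, q."""
    X, y = X.copy(), y.copy()
    W = np.zeros((X.shape[1], n_components))
    T = np.zeros((X.shape[0], n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        q[a] = y @ t / tt
        X -= np.outer(t, X.T @ t / tt)  # deflate X with its loading vector
        y -= q[a] * t                   # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a), SS_a = q_a^2 t_a't_a."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)    # y variation explained per component
    w_norm = W / np.linalg.norm(W, axis=0)   # unit-length weight columns
    return np.sqrt(p * (w_norm ** 2 @ ss) / ss.sum())


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 100))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)  # only column 0 is informative
X -= X.mean(axis=0)
y -= y.mean()
W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)  # common "VIP > 1" rule of thumb
print("variables with VIP > 1:", selected)
```

The squared VIP scores always average to 1. The "VIP > 1" rule therefore keeps variables whose importance is above average.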
Measured and predicted SF2 (surviving fraction after 2 Gy) values for the NCI-60 cell lines
| Cell line | Measured SF2 | Predicted SF2 (MGPLS) | Predicted SF2 (SVM) | Error (MGPLS) | Error (SVM) |
|---|---|---|---|---|---|
| CNS:U251 | 0.57 | 0.568 | 0.57 | 0.002 | 0 |
| OV:OVCAR-4 | 0.29 | 0.293 | 0.291 | -0.003 | -0.001 |
| LE:CCRF-CEM | 0.185 | 0.180 | 0.186 | 0.005 | -0.001 |
| CNS:SNB-19 | 0.43 | 0.425 | 0.43 | 0.005 | 0 |
| RE:SN12C | 0.62 | 0.611 | 0.618 | 0.009 | 0.002 |
| BR:T-47D | 0.52 | 0.531 | 0.52 | -0.011 | 0 |
| RE:ACHN | 0.72 | 0.707 | 0.722 | 0.013 | -0.002 |
| LE:HL-60(TB) | 0.315 | 0.335 | 0.319 | -0.020 | -0.004 |
| ME:MALME-3M | 0.8 | 0.779 | 0.797 | 0.021 | 0.003 |
| ME:SK-MEL-5 | 0.72 | 0.697 | 0.72 | 0.023 | 0 |
| OV:OVCAR-8 | 0.6 | 0.572 | 0.597 | 0.028 | 0.003 |
| ME:UACC-257 | 0.48 | 0.510 | 0.48 | -0.030 | 0 |
| ME:SK-MEL-28 | 0.74 | 0.709 | 0.737 | 0.031 | 0.003 |
| LE:RPMI-8226 | 0.1 | 0.069 | 0.1 | 0.031 | 0 |
| PR:DU-145 | 0.52 | 0.488 | 0.52 | 0.032 | 0 |
| BR:HS 578T | 0.79 | 0.757 | 0.79 | 0.033 | 0 |
| CO:HCT-15 | 0.4 | 0.435 | 0.4 | -0.035 | 0 |
| CO:HCT-116 | 0.38 | 0.418 | 0.38 | -0.038 | 0 |
| PR:PC-3 | 0.484 | 0.445 | 0.486 | 0.039 | -0.002 |
| LC:EKVX | 0.7 | 0.660 | 0.694 | 0.040 | 0.006 |
| OV:OVCAR-5 | 0.408 | 0.452 | 0.409 | -0.044 | -0.001 |
| RE:TK-10 | 0.52 | 0.475 | 0.522 | 0.045 | -0.002 |
| LE:K-562 | 0.05 | 0.100 | 0.054 | -0.050 | -0.004 |
| OV:NCI/ADR-RES | 0.57 | 0.520 | 0.572 | 0.050 | -0.002 |
| CNS:SNB-75 | 0.55 | 0.602 | 0.55 | -0.052 | 0 |
| ME:M14 | 0.42 | 0.477 | 0.42 | -0.057 | 0 |
| ME:UACC-62 | 0.52 | 0.461 | 0.519 | 0.059 | 0.001 |
| OV:OVCAR-3 | 0.55 | 0.491 | 0.548 | 0.059 | 0.002 |
| LC:NCI-H322M | 0.65 | 0.587 | 0.65 | 0.063 | 0 |
| RE:UO-31 | 0.62 | 0.686 | 0.619 | -0.066 | 0.001 |
| CO:COLO 205 | 0.69 | 0.762 | 0.687 | -0.072 | 0.003 |
| OV:IGROV1 | 0.39 | 0.463 | 0.39 | -0.073 | 0 |
| LE:SR | 0.07 | 0.143 | 0.072 | -0.073 | -0.002 |
| CNS:SF-539 | 0.82 | 0.746 | 0.817 | 0.074 | 0.003 |
| BR:MCF7 | 0.576 | 0.500 | 0.574 | 0.076 | 0.002 |
| ME:SK-MEL-2 | 0.66 | 0.737 | 0.66 | -0.077 | 0 |
| RE:RXF 393 | 0.67 | 0.754 | 0.669 | -0.084 | 0.001 |
| LC:NCI-H522 | 0.43 | 0.344 | 0.431 | 0.086 | -0.001 |
| ME:LOX IMVI | 0.68 | 0.588 | 0.68 | 0.092 | 0 |
| LC:HOP-92 | 0.43 | 0.522 | 0.43 | -0.092 | 0 |
| CNS:SF-268 | 0.45 | 0.543 | 0.45 | -0.093 | 0 |
| ME:MDA-MB-435 | 0.1795 | 0.273 | 0.183 | -0.094 | -0.0035 |
| BR:BT-549 | 0.632 | 0.537 | 0.635 | 0.095 | -0.003 |
| ME:MDA-N | 0.45 | 0.352 | 0.449 | 0.098 | 0.001 |
| CNS:SF-295 | 0.73 | 0.631 | 0.73 | 0.099 | 0 |
| LE:MOLT-4 | 0.05 | 0.149 | 0.052 | -0.099 | -0.002 |
| RE:786-0 | 0.66 | 0.551 | 0.659 | 0.109 | 0.001 |
| LC:HOP-62 | 0.164 | 0.277 | 0.166 | -0.113 | -0.002 |
| CO:KM12 | 0.42 | 0.535 | 0.418 | -0.115 | 0.002 |
| LC:A549/ATCC | 0.61 | 0.730 | 0.61 | -0.120 | 0 |
| RE:A498 | 0.61 | 0.734 | 0.62 | -0.124 | -0.001 |
| CO:HCC-2998 | 0.44 | 0.572 | 0.439 | -0.132 | 0.001 |
| OV:SK-OV-3 | 0.9 | 0.767 | 0.894 | 0.133 | 0.006 |
| RE:CAKI-1 | 0.37 | 0.517 | 0.37 | -0.147 | 0 |
| CO:SW-620 | 0.62 | 0.473 | 0.622 | 0.147 | -0.002 |
| LC:NCI-H226 | 0.63 | 0.786 | 0.626 | -0.156 | 0.004 |
| LC:NCI-H460 | 0.84 | 0.671 | 0.835 | 0.169 | 0.005 |
| BR:MDA-MB-231 | 0.82 | 0.613 | 0.82 | 0.207 | 0 |
| CO:HT29 | 0.79 | 0.567 | 0.785 | 0.223 | 0.005 |
| LC:NCI-H23 | 0.086 | 0.315 | 0.0925 | -0.229 | -0.0065 |
* Cell lines are sorted by the absolute prediction error of the multiple genomic data fused partial least squares deep regression (MGPLS) model. Error = measured SF2 - predicted SF2.
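The RMSE values quoted elsewhere can be cross-checked against this table. The minimal sketch below assumes that the two Error columns are measured minus predicted SF2, which matches the listed values up to rounding, and applies RMSE = sqrt(mean(e_i^2)) over the 60 cell lines. The errors are copied in table order. Because they are rounded, the result only approximates whatever the authors computed from unrounded predictions.

```python
import math

# Error columns (measured - predicted SF2) copied from the table, in table order.
err_mgpls = [
    0.002, -0.003, 0.005, 0.005, 0.009, -0.011, 0.013, -0.020, 0.021, 0.023,
    0.028, -0.030, 0.031, 0.031, 0.032, 0.033, -0.035, -0.038, 0.039, 0.040,
    -0.044, 0.045, -0.050, 0.050, -0.052, -0.057, 0.059, 0.059, 0.063, -0.066,
    -0.072, -0.073, -0.073, 0.074, 0.076, -0.077, -0.084, 0.086, 0.092, -0.092,
    -0.093, -0.094, 0.095, 0.098, 0.099, -0.099, 0.109, -0.113, -0.115, -0.120,
    -0.124, -0.132, 0.133, -0.147, 0.147, -0.156, 0.169, 0.207, 0.223, -0.229,
]
err_svm = [
    0, -0.001, -0.001, 0, 0.002, 0, -0.002, -0.004, 0.003, 0,
    0.003, 0, 0.003, 0, 0, 0, 0, 0, -0.002, 0.006,
    -0.001, -0.002, -0.004, -0.002, 0, 0, 0.001, 0.002, 0, 0.001,
    0.003, 0, -0.002, 0.003, 0.002, 0, 0.001, -0.001, 0, 0,
    0, -0.0035, -0.003, 0.001, 0, -0.002, 0.001, -0.002, 0.002, 0,
    -0.001, 0.001, 0.006, 0, -0.002, 0.004, 0.005, 0, 0.005, -0.0065,
]


def rmse(errors):
    """Root mean square error: sqrt(mean(e_i^2)) over all cell lines."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


print(f"MGPLS column RMSE: {rmse(err_mgpls):.4f}")
print(f"SVM column RMSE:   {rmse(err_svm):.4f}")
```

With the rounded table values, this gives about 0.092 for the MGPLS column and about 0.0024 for the SVM column. These can be compared with the linear (0.094) and nonlinear (0.0025) multi-genomics entries of the RMSE comparison table.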
Figure 2. Flow chart of signature gene selection using GE and CNV data. The MGPLS-UVE algorithm with 10 repetitions of 6-fold cross-validation is used to select the genes that contribute most stably to the SF2 prediction model. There are 7622 variables at the start, and 500 remain after the MGPLS rough selection.
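The UVE step in Figure 2 can be sketched as follows, assuming the classic uninformative-variable-elimination recipe:

1. Append artificial random variables scaled by 1e-10 so that they barely influence the model.
2. Refit PLS on every training split of 10 repetitions of 6-fold cross-validation.
3. Compute each coefficient's stability, defined as its mean divided by its standard deviation across all fits.
4. Keep only the real variables whose |stability| exceeds the largest value reached by any artificial variable.

The noise distribution and scale, the number of PLS components, and the helpers `pls1_coef` and `uve_select` are assumptions. The authors' MGPLS-UVE may differ in detail.

```python
import numpy as np


def pls1_coef(X, y, n_components):
    """Ordinary PLS1 (NIPALS) regression coefficients for column-centred X, centred y."""
    X, y = X.copy(), y.copy()
    W = np.zeros((X.shape[1], n_components))
    P = np.zeros_like(W)
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        W[:, a] = w
        X -= np.outer(t, P[:, a])  # deflate X
        y -= q[a] * t              # deflate y
    return W @ np.linalg.solve(P.T @ W, q)


def uve_select(X, y, n_components=2, n_repeats=10, n_folds=6, seed=0):
    """Keep variables whose coefficient stability beats every artificial noise variable."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    noise = rng.uniform(size=(n, p)) * 1e-10   # uninformative variables, tiny scale
    Xa = np.hstack([X, noise])
    coefs = []
    for _ in range(n_repeats):                 # repeat the whole k-fold split
        for held_out in np.array_split(rng.permutation(n), n_folds):
            train = np.setdiff1d(np.arange(n), held_out)
            Xt, yt = Xa[train], y[train]
            coefs.append(pls1_coef(Xt - Xt.mean(axis=0), yt - yt.mean(), n_components))
    C = np.array(coefs)                        # shape (n_repeats * n_folds, 2p)
    stability = C.mean(axis=0) / C.std(axis=0, ddof=1)
    cutoff = np.abs(stability[p:]).max()       # best-looking artificial variable
    return np.flatnonzero(np.abs(stability[:p]) > cutoff), stability[:p]


# Hypothetical data: 60 samples, 30 variables, only columns 0 and 1 informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 30))
y = 1.0 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.05, size=60)
selected, stability = uve_select(X, y)
print("retained variables:", selected)
```

The cut-off is data-driven, so the number of retained variables is not fixed in advance. A few uninformative variables can occasionally pass by chance.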
Figure 3. Gene regulatory network among the 24 “hub” genes. Each gene (circle or triangle node) has the same color as its arrows, so that regulatory relationships between these genes can be traced more easily. There are two types of arrows: pointed arrows indicate promotion of expression, and blunt arrows indicate inhibition of expression. In addition, the CNVs of 12 genes (triangular nodes) have a significant promoting effect on their own expression: SRRM1, PDCD2, RPL9, SNRPD1, ATP5A1, BLM, EWSR1, MIPEP, MCM3, MYB, CLNS1A, and RPL34. Note that genes regulated by these 24 genes that are not themselves “hub” genes are not included in Figure 3.
Details of the 24 “hub” genes
| No. | Gene | Entrez Gene ID | Chromosome | Cytoband | Regression coefficient | Data type |
|---|---|---|---|---|---|---|
| 1 | ATP5A1 | 498 | 18 | 18q21 | 0.0114 | GE |
| 2 | BLM | 641 | 15 | 15q26.1 | -0.0033 | GE |
| 3 | CENPC | 1060 | 4 | 4q13.2 | -0.0009 | GE |
| 4 | CLNS1A | 1207 | 11 | 11q13.5-q14 | -0.0009 | GE |
| 5 | EWSR1 | 2130 | 22 | 22q12.2 | 0.0069 | GE |
| 6 | MCM3 | 4172 | 6 | 6p12 | 0.0007 | GE |
| 7 | MIPEP | 4285 | 13 | 13q12.11 | 0.0121 | GE |
| 8 | MYB | 4602 | 6 | 6q22-q23 | -0.0026 | GE |
| 9 | PDCD2 | 5134 | 6 | 6q27 | 0.0016 | GE |
| 10 | PRCC | 5546 | 1 | 1q21.1 | -0.0118 | GE |
| 11 | RPL9 | 6133 | 4 | 4p13 | -0.0028 | GE |
| 12 | RPL34 | 6164 | 4 | 4q25 | -0.0053 | GE |
| 13 | SMARCC1 | 6599 | 3 | 3p21.31 | -0.0110 | GE |
| 14 | SNRPD1 | 6632 | 18 | 18q11.2 | 0.0011 | GE |
| 15 | ZBTB39 | 9880 | 12 | 12q13.3 | -0.0039 | GE |
| 16 | DENND4B | 9909 | 1 | 1q21 | -0.0066 | GE |
| 17 | SRRM1 | 10250 | 1 | 1p36.11 | -0.0082 | GE |
| 18 | PAICS | 10606 | 4 | 4q12 | -0.0023 | GE |
| 19 | NISCH | 11188 | 3 | 3p21.1 | -0.0151 | GE |
| 20 | LRCH1 | 23143 | 13 | 13q14.11 | 0.0068 | GE |
| 21 | CNOT10 | 25904 | 3 | 3p22.3 | -0.0061 | GE |
| 22 | GAR1 | 54433 | 4 | 4q25 | -0.0041 | GE |
| 23 | MED28 | 80306 | 4 | 4p16 | -0.0066 | GE |
| 24 | SHPRH | 257218 | 6 | 6q24.3 | -0.0021 | GE |
Half of the 24 “hub” genes uncovered by gene regulatory network (GRN) inference showed strong correlations between their own gene expression (GE) and copy number variation (CNV) values (Figure 3). The corresponding Pearson correlation coefficients between GE and CNV are listed in Table S3; for 12 of the 24 genes, the coefficient is above 0.5.
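The count of genes with r > 0.5 comes from per-gene Pearson correlations between GE and CNV across cell lines. The minimal sketch below uses synthetic data: 60 hypothetical cell lines and 24 genes, 12 of which are made CNV-driven by construction. `columnwise_pearson` is a hypothetical helper; only the 0.5 threshold comes from the text.

```python
import numpy as np


def columnwise_pearson(a, b):
    """Pearson r between column j of a and column j of b (rows = cell lines)."""
    za = (a - a.mean(axis=0)) / a.std(axis=0)
    zb = (b - b.mean(axis=0)) / b.std(axis=0)
    return (za * zb).mean(axis=0)  # mean of z-score products = Pearson r


rng = np.random.default_rng(3)
n_lines, n_genes = 60, 24
cnv = rng.normal(size=(n_lines, n_genes))  # hypothetical CNV values
ge = rng.normal(size=(n_lines, n_genes))   # hypothetical GE values
# Make the first 12 genes CNV-driven: expression tracks copy number plus noise.
ge[:, :12] = cnv[:, :12] + rng.normal(scale=0.3, size=(n_lines, 12))
r = columnwise_pearson(ge, cnv)
strong = np.flatnonzero(r > 0.5)
print(f"{strong.size} of {n_genes} genes have r > 0.5")
```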
RMSE comparison of different models.
| Study | Gene set | Linear: only GE | Linear: only CNV | Linear: multi-genomics | Nonlinear: only GE | Nonlinear: only CNV | Nonlinear: multi-genomics |
|---|---|---|---|---|---|---|---|
| This paper | 113 genes | 0.10 | 0.21 | 0.094 | 0.0031 | 0.015 | 0.0025 |
| This paper | 24 hub genes | 0.22 | 0.40 | 0.18 | 1.0 | | |
| Zhang et al | | 0.16 | 0.011 | | | | |
Figure 4. The measured and predicted SF2 values of the 60 cell lines obtained using the current signature genes and other published models.