Reddy Rani Vangimalla1, Hyun-Hwan Jeong1, Kyung-Ah Sohn2.
Abstract
BACKGROUND: The increasing availability of multiple types of genomic profiles measured from the same cancer patients has provided numerous opportunities for investigating the genomic mechanisms underlying cancer. In particular, association studies of gene expression traits with respect to multi-layered genomic features are highly useful for uncovering these mechanisms. Conventional correlation-based association tests are limited because they are prone to revealing indirect associations. Moreover, integrating multiple types of genomic features poses a further challenge.
Keywords: Genomic association; Integrative analysis; Similarity fusion network; Sparse regression; TCGA
Year: 2016 PMID: 27535739 PMCID: PMC4989890 DOI: 10.1186/s12920-016-0192-7
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Overview of the complete workflow process
Data acquisition platforms
| Cancer type | Expression data | Methylation data |
|---|---|---|
| GBM | Broad Institute HT-HG-U133A Platform | JHU-USC-Illumina-DNA-Methylation Platform |
| LSCC | JHU-USC-Human-Methylation-27 Platform | |
| BIC | UNC-Agilent-G4502A-07 Platform | |
| COAD | | |
| KRCCC | UNC-Illumina-Hiseq-RNASeq Platform | |
Details of all cancer profiles before and after the filtering process
| Cancer type | Total samples | DNA methylation (before filtering) | DNA methylation (after filtering) | mRNA expression (before filtering) | mRNA expression (after filtering) |
|---|---|---|---|---|---|
| GBM | 215 | 1,491 | 597 | 12,042 | 385 |
| BIC | 105 | 23,094 | 17,814 | ||
| KRCCC | 124 | 24,532 | 20,532 | ||
| LSCC | 105 | 27,578 | 12,042 | ||
| COAD | 92 | 27,578 | 17,814 | ||
Fig. 2 Illustration of regression network construction from regression coefficients. Two beta coefficients with similar effects for two different genes were fused using a similarity measure
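The idea in the Fig. 2 caption, linking items whose regression coefficients behave similarly, can be illustrated with a minimal sketch. This excerpt does not state which similarity measure the authors used, so cosine similarity is an assumption here, and the names `cosine`, `coefficient_network`, and the toy feature labels and numbers are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def coefficient_network(profiles, cutoff):
    """Return edges (name_i, name_j, similarity) for pairs at or above the cutoff.

    `profiles` maps a feature name to its vector of regression coefficients
    (one entry per expression trait). Cosine similarity is an assumed choice.
    """
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine(profiles[a], profiles[b])
            if s >= cutoff:
                edges.append((a, b, s))
    return edges

# Toy coefficient profiles (made-up numbers, not data from the study).
toy_profiles = {
    "cg_A": [0.8, 0.0, -0.3],
    "cg_B": [0.7, 0.1, -0.2],
    "cg_C": [0.0, 0.9, 0.0],
}
edges = coefficient_network(toy_profiles, cutoff=0.9)
```

With these toy values only `cg_A` and `cg_B` are linked, because their coefficient profiles point in nearly the same direction while `cg_C` affects a different trait.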
Fig. 3 Comparison of regression methods in terms of mean squared error (MSE)
Fig. 4 Venn diagram of methylation features associated with expression traits. Methylation features selected by the top 200 regression coefficients are depicted for (a) Breast, (b) Colon, (c) GBM, (d) Kidney, and (e) Lung cancer
Fig. 5 Common methylation features identified by at least three regression methods for each cancer profile
Fig. 6 Network properties with varying affinity cutoffs on the fused regression network (breast cancer): (a) the number of edges and (b) the size of the largest connected component. The red line shows the result from the real data, and the gray lines show the networks built from 100 random permutations of the dataset
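The two quantities plotted in Fig. 6 can be computed for any weighted edge list by sweeping a cutoff. The sketch below is only an illustration under assumptions: the function names and toy affinities are hypothetical, the edge-retention rule (`weight >= cutoff`) is assumed, and the permutation baseline is not reproduced because this excerpt does not describe how the data were permuted.

```python
def largest_component_size(edges):
    """Size of the largest connected component, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)

def sweep_cutoffs(weighted_edges, cutoffs):
    """For each cutoff, return (cutoff, number of edges kept, largest component size)."""
    rows = []
    for c in cutoffs:
        kept = [(u, v) for u, v, w in weighted_edges if w >= c]
        rows.append((c, len(kept), largest_component_size(kept)))
    return rows

# Toy affinities (made-up values, not data from the study).
toy = [("g1", "g2", 0.9), ("g2", "g3", 0.6), ("g4", "g5", 0.5), ("g3", "g4", 0.2)]
rows = sweep_cutoffs(toy, [0.1, 0.5, 0.8])
```

Raising the cutoff removes weak edges first, so both the edge count and the largest component shrink. Running the same sweep on permuted data would give the kind of baseline that the gray lines represent.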
Number of edges after filtering by the identified cutoff in each individual network and in the fused network
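| Cancer type | DNA methylation: Fused | DNA methylation: GFLasso | DNA methylation: Lasso | DNA methylation: SGL | DNA methylation: SIOL | mRNA expression: Fused | mRNA expression: GFLasso | mRNA expression: Lasso | mRNA expression: SGL | mRNA expression: SIOL |
|---|---|---|---|---|---|---|---|---|---|---|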
| Breast | 238 | 19,283 | 5296 | 237 | 1218 | 742 | 182 | 21,162 | 811 | 69 |
| Colon | 348 | 4496 | 903 | 288 | 3260 | 784 | 15 | 18,287 | 195 | 104 |
| GBM | 584 | 737 | 14,925 | 581 | 5165 | 1272 | 317 | 22,767 | 299 | 480 |
| Kidney | 266 | 4656 | 3299 | 1072 | 2284 | 1089 | 44 | 20,213 | 4 | 393 |
| Lung | 364 | 43,752 | 4497 | 345 | 942 | 696 | 12 | 20,589 | 324 | 232 |
Network properties of methylation features and mRNA expression traits of the lung cancer profile
| Type | Properties | Lasso | GFLasso | SGL | SIOL | Corr. | Fused |
|---|---|---|---|---|---|---|---|
| Methylation network | Number of nodes | 557 | 418 | 338 | 468 | 552 | 394 |
| | Network density | 0.03 | 0.50 | 0.006 | 0.009 | 0.12 | 0.005 |
| | Network diameter | 11 | 5 | 26 | 29 | 7 | 17 |
| | Clustering coefficient | 0.52 | 0.73 | 0.28 | 0.23 | 0.60 | 0.14 |
| | Average number of neighbors | 16.1 | 209.3 | 2.0 | 4.0 | 66.8 | 1.8 |
| | Connected components | 16 | 42 | 79 | 50 | 1 | 85 |
| | | 0.52 | 0.28 | 0.87 | 0.40 | 0.37 | 0.87 |
| mRNA expression network | Number of nodes | 372 | 17 | 239 | 232 | 276 | 342 |
| | Network density | 0.29 | 0.09 | 0.011 | 0.009 | 0.04 | 0.012 |
| | Network diameter | 7 | 3 | 10 | 21 | 9 | 17 |
| | Clustering coefficient | 0.74 | 0.06 | 0.29 | 0.16 | 0.37 | 0.27 |
| | Average number of neighbors | 110.7 | 1.4 | 2.71 | 2.0 | 10.3 | 4.1 |
| | Connected components | 2 | 6 | 39 | 46 | 12 | 6 |
| | | 0.13 | 0.47 | 0.79 | 0.98 | 0.79 | 0.81 |
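Several of the properties in the table above follow directly from an edge list. The sketch below assumes the usual definitions for an undirected simple graph (density = 2E / (N(N-1)), average number of neighbors = 2E / N); the excerpt does not say which tool or definitions the authors used, and `network_properties` is a hypothetical helper. Under these definitions the average number of neighbors equals density × (N - 1), which is consistent with the methylation Lasso column (16.1 / 556 ≈ 0.03).

```python
from collections import defaultdict

def network_properties(edges):
    """Basic properties of an undirected simple graph given as (u, v) pairs.

    Only nodes that appear in at least one edge are counted.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_neighbors = 2 * m / n if n else 0.0

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "avg_neighbors": avg_neighbors,
        "components": components,
    }

# Toy graph: one triangle plus one separate edge.
props = network_properties([("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")])
```

For the toy graph this gives 5 nodes, 4 edges, a density of 0.4, an average of 1.6 neighbors, and 2 connected components.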
Fig. 7 Comparison of R value between the integrative regression network (denoted by “Fused”) and the correlation network. (a) Methylation network, (b) gene expression network
Significantly enriched GO BP terms (top 5) for the largest connected component of integrative regression network of methylation features
| Cancer type | Category | Term | Count | P-value | FDR |
|---|---|---|---|---|---|
| Breast | GOTERM_BP_FAT | GO:0006468 ~ protein amino acid phosphorylation | 11 | 1.32e-07 | 2.05e-04 |
| | INTERPRO | IPR008266:Tyrosine protein kinase, active site | 6 | 2.35e-07 | 2.55e-04 |
| | UP_SEQ_FEATURE | binding site:ATP | 9 | 2.81e-07 | 3.32e-04 |
| | INTERPRO | IPR001245:Tyrosine protein kinase | 6 | 6.25e-07 | 6.80e-04 |
| | GOTERM_MF_FAT | GO:0004672 ~ protein kinase activity | 10 | 6.90e-07 | 7.94e-04 |
| Colon | SP_PIR_KEYWORDS | tyrosine-protein kinase | 12 | 1.86e-12 | 2.33e-09 |
| | INTERPRO | IPR008266:Tyrosine protein kinase, active site | 12 | 1.91e-12 | 2.50e-09 |
| | INTERPRO | IPR001245:Tyrosine protein kinase | 12 | 1.69e-11 | 2.21e-08 |
| | SP_PIR_KEYWORDS | signal | 40 | 2.90e-10 | 3.63e-07 |
| | UP_SEQ_FEATURE | signal peptide | 40 | 3.51e-10 | 4.94e-07 |
| GBM | GOTERM_BP_FAT | GO:0042127 ~ regulation of cell proliferation | 88 | 8.57e-42 | 1.52e-38 |
| | SP_PIR_KEYWORDS | signal | 128 | 5.54e-29 | 7.73e-26 |
| | UP_SEQ_FEATURE | signal peptide | 128 | 1.03e-28 | 1.66e-25 |
| | GOTERM_BP_FAT | GO:0008284 ~ positive regulation of cell proliferation | 54 | 5.23e-28 | 9.28e-25 |
| | INTERPRO | IPR001245:Tyrosine protein kinase | 27 | 3.51e-22 | 5.21e-19 |
| Kidney | GOTERM_BP_FAT | GO:0010033 ~ response to organic substance | 19 | 4.24e-08 | 6.99e-05 |
| | GOTERM_BP_FAT | GO:0043067 ~ regulation of programmed cell death | 18 | 1.30e-06 | 0.00214 |
| | GOTERM_BP_FAT | GO:0010941 ~ regulation of cell death | 18 | 1.37e-06 | 0.00225 |
| | GOTERM_MF_FAT | GO:0032403 ~ protein complex binding | 10 | 1.61e-06 | 0.00215 |
| | KEGG_PATHWAY | hsa05200:Pathways in cancer | 14 | 1.65e-06 | 0.00174 |
| Lung | GOTERM_BP_FAT | GO:0042127 ~ regulation of cell proliferation | 25 | 6.24e-13 | 1.02e-09 |
| | SP_PIR_KEYWORDS | Proto-oncogene | 14 | 4.78e-12 | 5.92e-09 |
| | KEGG_PATHWAY | hsa05200:Pathways in cancer | 22 | 6.02e-12 | 6.56e-09 |
| | GOTERM_BP_FAT | GO:0007169 ~ transmembrane receptor protein tyrosine kinase signaling pathway | 14 | 1.24e-10 | 2.02e-07 |
| | GOTERM_BP_FAT | GO:0007167 ~ enzyme linked receptor protein signaling pathway | 16 | 2.06e-10 | 3.37e-07 |