Bill Andreopoulos, Dimitris Anastassiou.
Abstract
Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA (miRNA) expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated gene expression, miRNA expression and DNA methylation data to uncover correlations between these data types. We performed integrated profiling to discover miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M−) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectal cancer. There is a 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer, and a 60% overlap of genes in R between GBM and ovarian cancer (P = 1.3 × 10⁻¹¹). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M largely reflects membrane proteins involved in cell-cell adhesion. We speculate that the hsa-miR-142-associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have under-expressed hsa-miR-142 and R genes, hypomethylated M+ sites and hypermethylated M− sites, while mesenchymal samples have the opposite profile.
Keywords: cancer; correlation; gene expression; integrated analysis; methylation; microRNA
Year: 2012 PMID: 22570537 PMCID: PMC3306237 DOI: 10.4137/CIN.S9037
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Preprocessing: We discretized the expression and methylation data to determine whether a gene is under-, over- or moderately (non-extremely) expressed or methylated in each sample. To turn a gene (or miRNA or methylation site) into a discrete vector over all tumor samples, we computed its mean and standard deviation over all samples. If its value in a sample was greater than the mean plus one standard deviation, we represented it as 1 (over-expressed or hypermethylated); if its value was lower than the mean minus one standard deviation, we represented it as −1 (under-expressed or hypomethylated); otherwise, it was represented as 0 in that sample. The following steps use these discrete vectors of miRNA expression and methylation over all samples (GBM or ovarian). Step 1: We evaluated the Pearson correlation between all pairs of miRNAs and methylation sites in glioblastoma and ovarian cancer, and ranked all pairwise correlations in descending order, as shown. Step 2: We kept the top-ranked miRNAs for ovarian cancer and GBM, requiring that the Bonferroni-corrected P-value of a two-tailed t-test on the Pearson correlation be less than 0.01. Step 3: We found the miRNAs appearing in the top ranks in both ovarian cancer and GBM and selected the best one as the representative. Step 4: Using this representative miRNA, we found the most strongly correlated methylation sites and genes in GBM and ovarian cancer. We refer to the resulting sets as M_gbm, M_ovarian (methylation sites) and R_gbm, R_ovarian (gene expression).
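The preprocessing and Steps 1 and 2 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the population standard deviation is assumed (the caption does not say which estimator was used), and the two-tailed t-test P-value is obtained through the standard identity p = I₁₋ᵣ²((n − 2)/2, 1/2), where I is the regularized incomplete beta function and n the number of matched samples. This equals the two-tailed P-value of t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom.

```python
import math
from statistics import mean, pstdev


def discretize(values):
    """Code each sample as 1 (> mean + SD), -1 (< mean - SD) or 0 otherwise."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; the caption does not specify
    return [1 if v > mu + sd else -1 if v < mu - sd else 0 for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # convergence checked on the odd step
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def correlation_pvalue(r, n):
    """Two-tailed t-test P-value for a Pearson r over n samples (df = n - 2)."""
    if abs(r) >= 1.0:
        return 0.0
    return reg_inc_beta((n - 2) / 2.0, 0.5, 1.0 - r * r)


def significant_pairs(mirnas, sites, alpha=0.01):
    """Steps 1-2: correlate every (miRNA, site) pair and keep Bonferroni P < alpha.

    mirnas and sites map names to discretized vectors over the same samples.
    Returns (r, mirna, site, p_bonferroni) tuples sorted by r, descending.
    """
    n_tests = len(mirnas) * len(sites)
    kept = []
    for m_name, m_vec in mirnas.items():
        for s_name, s_vec in sites.items():
            r = pearson(m_vec, s_vec)
            p_adj = min(1.0, correlation_pvalue(r, len(m_vec)) * n_tests)
            if p_adj < alpha:
                kept.append((r, m_name, s_name, p_adj))
    kept.sort(key=lambda row: row[0], reverse=True)
    return kept
```

The Bonferroni correction here simply multiplies each P-value by the total number of miRNA/site pairs tested (capped at 1), and the ranking sorts by r in descending order, as the caption describes.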
Figure 2. Left: Histogram of the miRNA-gene expression correlations for the cancer types with both miRNA and gene expression data (RNASeq and miRNASeq) available, matched on the same samples. Because we found no significant negative miRNA-gene correlations, the left panel shows only positive values. Right: The miRNA-methylation correlations for GBM and ovarian cancer (the other TCGA cancer types lacked integrated miRNA-methylation data), matched on the same samples. We plotted only the miRNAs that appear in all cancer types: 680 miRNAs (left) and 421 miRNAs (right). For each miRNA we included the correlation values for all 17,814 genes or 27,578 methylation sites, and averaged the correlations over all cancer types to determine whether a correlation remains consistently high across cancers. As shown, hsa-miR-142 is highly correlated with a larger set of genes or methylation sites than other miRNAs.
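The averaging described in this caption can be sketched as follows. The data layout (one dictionary per cancer type keyed by (miRNA, feature)), the function names and the 0.5 cutoff on the averaged |r| are hypothetical choices for illustration; the caption does not state how "highly correlated" was thresholded, and the toy correlations in the example are made up.

```python
from collections import Counter
from statistics import mean


def average_over_cancers(per_cancer):
    """Average each (miRNA, feature) correlation over all cancer types.

    per_cancer maps a cancer type to {(mirna, feature): r}. Only pairs measured
    in every cancer type are kept, mirroring the restriction to miRNAs that
    appear in all cancer types.
    """
    common = set.intersection(*(set(table) for table in per_cancer.values()))
    return {pair: mean(table[pair] for table in per_cancer.values()) for pair in common}


def rank_by_breadth(averaged, threshold=0.5):
    """Count, per miRNA, the features whose averaged |r| exceeds a hypothetical threshold."""
    counts = Counter(mirna for (mirna, _feature), r in averaged.items() if abs(r) > threshold)
    return counts.most_common()  # miRNAs correlated with the most features first


# Toy data (made up): two cancer types, two miRNAs, two genes.
per_cancer = {
    "GBM": {("hsa-miR-142", "gene1"): 0.8, ("hsa-miR-142", "gene2"): 0.7, ("miR-x", "gene1"): 0.2},
    "OV": {("hsa-miR-142", "gene1"): 0.6, ("hsa-miR-142", "gene2"): 0.5, ("miR-x", "gene1"): 0.4},
}
print(rank_by_breadth(average_over_cancers(per_cancer)))  # [('hsa-miR-142', 2)]
```

Using |r| lets the same count cover negatively correlated methylation sites as well as positively correlated genes; for the gene panel alone, where only positive correlations were found, r and |r| give the same result.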
Figure 3. Results of Step 4 of our analysis method (see Methods section). Left: The methylation sites most correlated with hsa-miR-142, either positively or negatively, in glioblastoma (M_gbm). Right: The corresponding set M_ovarian in ovarian cancer. We split the signature M into the methylation sites positively correlated with hsa-miR-142 (M+) and those negatively correlated with it (M−). The overlap between M+_gbm (236 sites) and M+_ovarian (471) is 76, while the overlap between M−_gbm (259) and M−_ovarian (126) is 63.
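Step 4 and the M+/M− split can be sketched as below. Keeping the top_k sites by absolute correlation, and the value of top_k, are assumptions (the caption does not give the cutoff), and the helper names and toy correlations are hypothetical.

```python
def split_signature(corr, top_k):
    """Keep the top_k sites by |r| and split them into M+ (r > 0) and M- (r < 0)."""
    top = sorted(corr, key=lambda site: abs(corr[site]), reverse=True)[:top_k]
    m_plus = {site for site in top if corr[site] > 0}
    m_minus = {site for site in top if corr[site] < 0}
    return m_plus, m_minus


def signed_overlaps(sig_a, sig_b):
    """Return (|M+_a & M+_b|, |M-_a & M-_b|, |M_a & M_b|) for two cancer types."""
    (plus_a, minus_a), (plus_b, minus_b) = sig_a, sig_b
    all_a, all_b = plus_a | minus_a, plus_b | minus_b
    return len(plus_a & plus_b), len(minus_a & minus_b), len(all_a & all_b)


# Toy correlations with the representative miRNA (made up for illustration).
gbm = split_signature({"s1": 0.9, "s2": -0.8, "s3": 0.1, "s4": -0.7}, top_k=3)
ov = split_signature({"s1": 0.85, "s2": -0.6, "s4": 0.75, "s5": 0.2}, top_k=3)
print(signed_overlaps(gbm, ov))  # (1, 1, 3): site s4 changes sign
```

Because M+ and M− are disjoint within each cancer type, the unsigned overlap equals the two signed overlaps plus the sites whose sign flips between cancers. With the counts in this caption, 76 + 63 = 139, the same value as the M_OV/M_GBM cell of the M-overlap table, so none of the shared GBM/ovarian sites changes sign.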
Overlap of R between cancers.
| | R_GBM | R_COAD | R_BRCA | R_UCEC | R_READ | R_KIRC |
|---|---|---|---|---|---|---|
| R_OV (1338) | 671 | 224 | 279 | 382 | 82 | 629 |
| R_GBM (1106) | – | 155 | 194 | 271 | 61 | 453 |
| R_COAD (289) | – | – | 197 | 224 | 85 | 254 |
| R_BRCA (404) | – | – | – | 305 | 78 | 368 |
| R_UCEC (486) | – | – | – | – | 87 | 441 |
| R_READ (101) | – | – | – | – | – | 88 |
| R_KIRC (1325) | – | – | – | – | – | – |
Notes: The parentheses show the number of genes in R for each cancer type. The cells show the R overlap sizes between different cancer types and the p-values of the overlaps using the hypergeometric cumulative distribution function.
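The notes state that the overlap P-values come from the hypergeometric cumulative distribution function. A minimal exact sketch of the upper tail P(X ≥ k), using only Python's standard `math` module, is shown below. The function name is hypothetical, and the background size (17,814 genes, the gene count given in the Figure 2 caption) is our assumption, since the table does not state which gene universe was used; the result depends strongly on that choice. The function returns log₁₀ P because P-values for overlaps this large fall below the smallest representable float.

```python
import math


def overlap_log10_pvalue(overlap, size_a, size_b, universe):
    """log10 of P(X >= overlap), X ~ Hypergeometric(universe, size_a, size_b).

    X counts the items a random size_b subset shares with a fixed size_a subset
    of a universe of the given size. The tail is summed with exact integers and
    only its logarithm is taken, so very small P-values do not underflow to 0.
    """
    total = math.comb(universe, size_b)
    tail = sum(
        math.comb(size_a, k) * math.comb(universe - size_a, size_b - k)
        for k in range(max(overlap, 0), min(size_a, size_b) + 1)
    )
    if tail == 0:
        return float("-inf")
    return math.log10(tail) - math.log10(total)


# R_GBM (1106 genes) vs R_OV (1338 genes) share 671 genes; a universe of 17,814 genes is assumed.
print(overlap_log10_pvalue(671, 1106, 1338, 17814))
```

The same function applies to the M table by passing the methylation-site counts and, under the same kind of assumption, a universe of 27,578 sites.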
Overlap of M signatures between cancers.
| | M_GBM | M_COAD | M_BRCA | M_UCEC | M_READ | M_KIRC | M_KIRP | M_LUSC |
|---|---|---|---|---|---|---|---|---|
| M_OV (597) | 139 | 341 | 375 | 143 | 68 | 268 | 8 | 351 |
| M_GBM (495) | – | 190 | 241 | 55 | 31 | 133 | 4 | 156 |
| M_COAD (1749) | – | – | 705 | 143 | 131 | 403 | 9 | 628 |
| M_BRCA (1787) | – | – | – | 155 | 98 | 466 | 7 | 709 |
| M_UCEC (184) | – | – | – | – | 39 | 121 | 5 | 149 |
| M_READ (131) | – | – | – | – | – | 73 | 4 | 86 |
| M_KIRC (866) | – | – | – | – | – | – | 9 | 361 |
| M_KIRP (10) | – | – | – | – | – | – | – | 7 |
| M_LUSC (1258) | – | – | – | – | – | – | – | – |
Notes: The parentheses show the number of methylation sites in M for each cancer type. The cells show the M overlap sizes between different cancer types and the P-values of the overlaps using the hypergeometric cumulative distribution function.
The distribution of glioblastoma samples among the proneural, neural, classical and mesenchymal subtypes according to the expression of the R genes and hsa-miR-142.
| R genes and hsa-miR-142 expression | Proneural | Neural | Classical | Mesenchymal | Multinomial probability |
|---|---|---|---|---|---|
| Over-expressed | 2 | 4 | 2 | 25 | 5.49178E-09 |
| Under-expressed | 12 | 2 | 5 | 0 | 0.001937516 |
Notes: We observed a significant difference between the subtypes: R is under-expressed in proneural samples and over-expressed in mesenchymal samples. The multinomial probability in the last column is the total probability, under the null hypothesis, that at least 25 of the 33 over-expressed samples (or at least 12 of the 19 under-expressed samples) would have been classified into any single one of the four classes.
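Assuming that, under the null hypothesis, each of the four subtypes is equally likely (probability 1/4; the notes do not state the class probabilities, so this is our assumption), the probabilities in the last column can be computed as sketched below. Because 25 > 33/2 and 12 > 19/2, at most one subtype can reach the threshold, so the four events are disjoint and their union is four times a single binomial upper tail. Under this assumption the sketch gives values that agree with the table (about 5.49178 × 10⁻⁹ and 0.001937516). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def any_class_tail(k, n, n_classes=4):
    """P(at least k of n samples fall into some single class), equal class probabilities.

    Requires k > n / 2: then at most one class can reach k, the events are
    disjoint, and the union is n_classes times one binomial upper tail.
    """
    if 2 * k <= n:
        raise ValueError("the disjointness argument needs k > n / 2")
    p = Fraction(1, n_classes)
    tail = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))
    return float(n_classes * tail)  # exact rational arithmetic, converted once


print(any_class_tail(25, 33))  # over-expressed row: 25 of 33 samples are mesenchymal
print(any_class_tail(12, 19))  # under-expressed row: 12 of 19 samples are proneural
```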
The known functions of the top-ranked genes in R.
| Gene | Known function | Gene | Known function |
|---|---|---|---|
| SASH3 (CXorf9) | Signaling adapter protein in lymphocytes | CD2 | Cell adhesion molecule found on the surface of T cells and natural killer cells |
| CCL5 | Regulated upon activation, normal T-cell expressed | CD37 | Transmembrane protein, leukocyte antigen, may play a role in T-cell-B-cell interactions |
| FLJ21438 | Proteins in B-cell exosomes | DOK3 | Negative regulator of JNK signaling in B-cells |
| Rgr | Membrane protein, retinal G-protein coupled receptor | PSCD4 | Plasma membrane, regulation of cell adhesion |
| TLR2 | Membrane protein, immune system signaling pathway | ITGB2 | Leukocyte cell adhesion molecule |
| APBB1IP | Peripheral membrane protein, mediates Rap1-induced adhesion | BTK | Peripheral membrane protein, plays a crucial role in B-cell development (mature B lymphocytes) |
| DOCK2 | Peripheral membrane protein, haematopoietic and lymphocyte cell-specific protein | CXCR3 | Membrane protein, expressed primarily on activated T lymphocytes and NK cells, regulates leukocyte trafficking |
| ARHGAP9 | Regulates adhesion of hematopoietic cells to the extracellular matrix | FYB | Signaling transduction in T cells, modulates the expression of interleukin-2 |
| LCP2 | Lymphocyte protein promoting T cell development | PTPRC | Required for T-cell activation. Interleukin-12-dependent in activated lymphocytes |
| CD53 | Leukocyte surface antigen, signal transduction in T cells | LAIR1 | Leukocyte-associated receptor, found on NK cells, T cells, and B cells |
| TRAF3IP3 | Gene expressed in T lymphocytes | URP2 | Cell adhesion in hematopoietic cells. Required for leukocyte adhesion to endothelial cells |
| CYBB | Glycoprotein integral to plasma membrane | AIF1 | Promotes the proliferation of T-lymphocytes. Enhances lymphocyte migration |
| NCKAP1L | Membrane-associated haematopoietic protein | ITGB2 | Leukocyte cell adhesion molecule |
| IL10RA | Interleukin-10 receptor, membrane protein expressed in hemopoietic cells and lymphocytes | HAVCR2 | T-cell membrane protein |
| LAPTM5 | Transmembrane protein associated with lysosomes, may play a role in hematopoiesis | CD48 | Ligand for CD2. Might facilitate interaction between activated lymphocytes. Probably involved in regulating T-cell activation |
| PLEK | Hemopoietic progenitor cell differentiation | SLA | Negatively regulates T-cell receptor (TCR) signaling |
| ARHGAP30 | Rho GTPase activating protein 30 | CCR5 | Integral membrane protein, mainly expressed on T cells |
| ARHGAP25 | Actin remodeling, cell migration | EVI2B | Integral membrane protein, bone marrow and blood expression |
Note: All of the genes are expressed in leukocytes, specifically lymphocytes, or haematopoietic stem cells.
Functional annotations of the methylation sites that overlap between M_GBM and M_ovarian.
| Gene (sign) | Annotation | Gene (sign) | Annotation |
|---|---|---|---|
| AFF3 (M−) | Expressed in the lymphoid system, transcription regulation | LAPTM5 (M−) | Transmembrane receptor associated with lysosomes |
| ALDH3A1 (M+) | Metabolism of neurotransmitters | | Lymphocyte protein promoting T-cell development |
| | Actin remodeling, cell migration | LRP3 (M+) | Integral membrane protein |
| | Breast cancer-associated | LSM7 (M+) | Ribonucleoprotein complex |
| C10orf27 (M−) | Cell differentiation | | Inflammatory response, apoptosis |
| C16orf54 (M−) | Transmembrane protein | MAMSTR (M+) | Transcription regulation |
| C2orf40 (M−) | Cancer-related augurin precursor | MPHOSPH9 (M−) | Peripheral membrane protein |
| C6orf25 (M−) | Plasma membrane-bound cell surface receptor | MTMR11 (M+) | Protein-tyrosine phosphatase |
| CARD8 (M−) | Apoptotic protein | | Superoxide-generating NADPH oxidase activity |
| CCDC80 (M+) | Promotes cell adhesion and matrix assembly | NCOR2 (M+) | Transcriptional repression |
| CD101 (M−) | Leukocyte surface membrane protein | OGG1 (M+) | DNA repair enzyme |
| CD6 (M−) | Plasma membrane protein involved in T-cell activation | OSM (M−) | Tumor inhibitor |
| CD79B (M−) | B lymphocyte receptor | PAQR6 (M+) | Integral membrane protein |
| CHRM1 (M+) | G protein-coupled receptor membrane protein | PHKG1 (M+) | Protein kinase activity |
| CX3CL1 (M+) | T cell leukocyte adhesion and migration process at the endothelium | PLD4 (M−) | Single-pass membrane protein |
| | Peripheral membrane protein, B lymphocyte adapter protein | PLEKHA4 (M+) | Peripheral membrane protein |
| DAPK2 (M+) | Cell apoptosis inducer | POR (M−) | ER membrane oxidoreductase |
| DDAH1 (M+) | Regulator of nitric oxide generation | PPP2R1A (M+) | Protein phosphatase |
| DNAI1 (M+) | Dynein intermediate chain, cytoplasmic | PRELP (M+) | Extracellular matrix, collagen binding in connective tissue |
| FBN3 (M+) | Extracellular matrix structural constituent, fibrillin | PTGFRN (M+) | Integral membrane protein, single-pass type I membrane protein |
| FAM113B (M−) | Hydrolase activity | PTPRCAP (M−) | Transmembrane phosphoprotein, plasma membrane, integral membrane protein |
| | Hypothetical protein | RB1 (M−) | Tumor suppressor, negative regulator of the cell cycle |
| FGR (M−) | Cell migration and adhesion | ROBO4 (M−) | External side of plasma membrane |
| FUT3 (M+) | Membrane protein, tumor metastasis and adhesion | RPE65 (M+) | Plasma membrane protein |
| GGT1 (M+) | Membrane protein | RUNX1 (M−) | Acute myeloid leukemia 1 protein |
| GIPC1 (M+) | Regulator of cell surface receptor trafficking | SEMA3B (M−) | Extracellular membrane, neuronal development, tumor suppression by apoptosis induction |
| GPX2 (M−) | Glutathione peroxidase | SHROOM1 (M−) | Neuronal development |
| | Hemopoietic progenitor cell differentiation | | Negative regulator of T-cell receptor (TCR) signaling |
| GRIP1 (M+) | Glutamate receptor-interacting protein 1 | SLC44A2 (M+) | Plasma membrane protein |
| HKDC1 (M+) | Hexokinase domain-containing protein 1 | SNCG (M+) | Breast cancer-specific gene 1 protein |
| IL17RE (M−) | Membrane protein, interleukin receptor | SSTR3 (M+) | Plasma membrane protein |
| | Extracellular binding | SSTR5 (M+) | Plasma membrane protein |
| IL22RA1 (M+) | Membrane protein, interleukin receptor | TMEM149 (M−) | Transmembrane protein |
| INCA1 (M+) | Inhibitor of CDK | | Tumor necrosis factor, immune homeostasis |
| INPP5J (M+) | Plasma membrane protein | TNKS1BP1 (M+) | Tankyrase-1-binding protein, enzyme binding |
| KCNQ1 (M+) | Potassium voltage-gated channel | TRAF1 (M−) | TNF receptor associated factor |
| KIAA0427 (M+) | Regulation of translational initiation | WFDC2 (M−) | Extracellular region, proteolysis |
| KLHL34 (M−) | Kelch-like protein 34 | ZNF205 (M+) | Zinc finger protein, transcription regulation |
| | Protein binding, B-lymphocyte antigen receptor signaling | ZNF48 (M+) | Zinc finger protein, transcription regulation |
| LAMB2 (M+) | Basement membrane protein, attachment, migration and organization of cells into tissues during embryonic development | ZNF512B (M+) | Zinc finger protein, transcription regulation |
Notes: Many of the genes in R are also found in the M− set and the names of these genes are highlighted in bold.
The top three functional annotation term clusters associated with the list of 671 R genes that overlap between R_ovarian and R_GBM.
| Cluster | Annotation term | Count | P-value |
|---|---|---|---|
| 1 | GO:0006952~defense response | 81 | 3.31E-43 |
| 1 | GO:0006954~inflammatory response | 47 | 4.55E-26 |
| 1 | GO:0009611~response to wounding | 54 | 1.12E-22 |
| 2 | Disulfide bond | 150 | 9.53E-37 |
| 2 | Disulfide bond | 146 | 1.09E-35 |
| 2 | Topological domain:Extracellular | 135 | 1.61E-30 |
| 2 | Glycoprotein | 174 | 1.83E-30 |
| 2 | Glycosylation site:N-linked (GlcNAc...) | 168 | 3.25E-29 |
| 2 | Topological domain:Cytoplasmic | 147 | 1.20E-27 |
| 2 | Signal | 137 | 3.38E-24 |
| 2 | Signal peptide | 137 | 6.20E-24 |
| 2 | Membrane | 202 | 9.05E-24 |
| 2 | Receptor | 88 | 4.19E-22 |
| 2 | GO:0005886~plasma membrane | 162 | 1.80E-21 |
| 2 | Transmembrane region | 170 | 2.00E-21 |
| 2 | Transmembrane | 170 | 4.00E-21 |
| 2 | GO:0031224~intrinsic to membrane | 189 | 5.47E-15 |
| 2 | GO:0016021~integral to membrane | 184 | 1.11E-14 |
| 3 | GO:0005886~plasma membrane | 162 | 1.80E-21 |
| 3 | GO:0005887~integral to plasma membrane | 81 | 3.86E-20 |
| 3 | GO:0031226~intrinsic to plasma membrane | 81 | 1.52E-19 |
| 3 | GO:0044459~plasma membrane part | 113 | 4.82E-19 |
Note: Annotations are clustered together if they have similar gene members; the more genes two annotations share, the more likely they are to be grouped together, and a term can appear in more than one cluster (GO:0005886~plasma membrane appears twice). The count is the number of the 671 genes annotated with the term, and the P-value is the probability of observing that count or a higher one by chance. The P-value of each annotation term within a cluster is computed with the Fisher exact test in the DAVID system.