Richa K Makhijani, Shital A Raut, Hemant J Purohit.
Abstract
Cancer is one of the leading causes of mortality worldwide, particularly breast cancer in women, prostate cancer in men, and lung cancer in both women and men. The present study aimed to identify a common set of genes that may serve as indicators of important molecular and cellular processes in breast, prostate and lung cancer. Six microarray gene expression profile datasets [GSE45827, GSE48984, GSE19804, GSE10072, GSE55945 and GSE26910 (two datasets for each cancer)] and one RNA-Seq expression dataset (GSE62944, which includes all three cancer types) were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) were identified in each individual cancer type using the LIMMA statistical package in R, and the resulting gene lists were then compared to identify DEGs common to all three cancer types. This analysis was performed separately for the microarray and RNA-Seq datasets, revealing sets of 62 and 1,290 DEGs, respectively, which may be associated with the three cancers. Of these genes, 44 were common to both analyses and were hence termed key genes. Gene Ontology functional annotation, Kyoto Encyclopedia of Genes and Genomes pathway mapping and literature citations were used to confirm the role of the key genes in cancer. Finally, the heterogeneity of expression of the key genes was explored using the I² statistic (meta package in R). The results demonstrated non-heterogeneous expression of 6 out of the 44 key genes, whereas the remaining genes exhibited significant heterogeneity in expression across microarray samples. In conclusion, the identified DEGs may play important roles in the pathogenesis of breast, prostate and lung cancer and may be used as biomarkers for the development of novel diagnostic and therapeutic strategies.
Keywords: LIMMA; RNA-Seq; cancer; differentially expressed genes; heterogeneity; meta-analysis; microarray
Year: 2017 PMID: 29434863 PMCID: PMC5776944 DOI: 10.3892/ol.2017.7508
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
Characteristics of the individual datasets used in the present study.
| Type of dataset | Type of cancer | Dataset identification number | Platform | Number of probes/genes | Number of samples (tumor/normal) |
|---|---|---|---|---|---|
| Microarray gene expression | Breast | GSE45827 | GPL570 | 54,675 | 174 (163/11) |
| Microarray gene expression | Breast | GSE48984 | GPL96 | 22,283 | 22 (13/9) |
| Microarray gene expression | Lung | GSE19804 | GPL570 | 54,675 | 120 (60/60) |
| Microarray gene expression | Lung | GSE10072 | GPL96 | 22,283 | 107 (57/50) |
| Microarray gene expression | Prostate | GSE55945 | GPL570 | 54,675 | 19 (12/7) |
| Microarray gene expression | Prostate | GSE26910 | GPL570 | 54,675 | 12 (6/6) |
| RNA-Seq gene expression | Breast | GSE62944 | GPL9052 | 23,368 | 1,230 (1,118/112) |
| RNA-Seq gene expression | Lung squamous cell carcinoma | GSE62944 | GPL9052 | 23,368 | 551 (501/50) |
| RNA-Seq gene expression | Prostate adenocarcinoma | GSE62944 | GPL9052 | 23,368 | 552 (501/51) |
Differential expression analysis results for each microarray dataset.
| Cancer | Breast | Breast | Lung | Lung | Prostate | Prostate |
|---|---|---|---|---|---|---|
| GEO dataset | GSE45827 | GSE48984 | GSE19804 | GSE10072 | GSE26910 | GSE55945 |
| Platform | GPL570 | GPL96 | GPL570 | GPL96 | GPL570 | GPL570 |
| Number of probes | 54,675 | 22,283 | 54,675 | 22,283 | 54,675 | 54,675 |
| Number of samples | 174 | 22 | 120 | 107 | 12 | 19 |
| Number of differentially expressed genes | 7,006 | 3,513 | 2,026 | 829 | 77 | 539 |
| Number of differentially expressed genes (union of the two datasets) | 9,248 | | 2,215 | | 603 | |
Figure 1. Overlap of differentially expressed genes in the three cancer types obtained from (A) microarray, (B) RNA-Seq and (C) combined microarray and RNA-Seq dataset analysis.
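The overlap logic described in the abstract (union of the DEG lists within each cancer type, intersection across cancer types, then intersection of the microarray and RNA-Seq results) can be sketched with plain set operations. This is only a minimal illustration, not the authors' R code: the function names and the placeholder gene identifiers `G1` to `G6` are hypothetical and do not correspond to real genes or to the study's data.

```python
from functools import reduce

def union_per_cancer(deg_sets):
    """Union the DEG sets from all datasets of a single cancer type."""
    return set().union(*deg_sets)

def common_degs(degs_by_cancer):
    """Intersect the per-cancer unions to find DEGs shared by every cancer type."""
    unions = [union_per_cancer(sets) for sets in degs_by_cancer.values()]
    return reduce(set.intersection, unions)

# Hypothetical toy input: two microarray datasets per cancer type.
microarray = {
    "breast": [{"G1", "G2", "G3"}, {"G2", "G4"}],
    "lung": [{"G2", "G3"}, {"G4", "G5"}],
    "prostate": [{"G2", "G4"}, {"G3"}],
}
# Hypothetical toy input: one RNA-Seq dataset covering all three cancer types.
rnaseq = {
    "breast": [{"G2", "G3", "G6"}],
    "lung": [{"G2", "G3", "G6"}],
    "prostate": [{"G3", "G6"}],
}

common_microarray = common_degs(microarray)
common_rnaseq = common_degs(rnaseq)
# "Key genes" are those common to both platform-level analyses.
key_genes = common_microarray & common_rnaseq
print(sorted(key_genes))
```

With real data, each inner set would hold the gene symbols that passed the differential expression thresholds for one dataset.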
Gene symbols of the common differentially expressed genes in breast, lung and prostate cancer.
| Gene symbol | Link to gene summary |
|---|---|
| ACSS3 | |
| ANGPT1 | |
| AOX1 | |
| BIRC5 | |
| CAV1 | |
| CAV2 | |
| CCDC69 | |
| CCDC85A | |
| CELF2 | |
| CFD | |
| CLU | |
| DPT | |
| EFEMP1 | |
| ERG | |
| EZH2 | |
| FAM107A | |
| FERMT2 | |
| FHL1 | |
| FXYD6 | |
| GLDN | |
| GPM6A | |
| GPM6B | |
| HSPB8 | |
| ID4 | |
| INMT | |
| IQGAP3 | |
| ITIH5 | |
| KCNAB1 | |
| KIF4A | |
| MAMDC2 | |
| MCAM | |
| MYH11 | |
| MYL9 | |
| MYLK | |
| NTRK2 | |
| NUSAP1 | |
| PCDH9 | |
| PGM5 | |
| PTRF | |
| SDPR | |
| STIL | |
| SYNPO2 | |
| TCEAL2 | |
| TIMP3 |
Figure 2. Enriched biological processes in differentially expressed genes as predicted by GENECODIS software analysis.
Figure 3. Enriched molecular functions in differentially expressed genes as predicted by GENECODIS software analysis.
Figure 4. Enriched cellular components in differentially expressed genes as predicted by GENECODIS software analysis.
Enriched KEGG pathways in differentially expressed genes as predicted by GENECODIS analysis.
| KEGG pathway | Class | Number of genes | P-value (adjusted) | Gene symbols |
|---|---|---|---|---|
| Regulation of actin cytoskeleton | Cellular processes; cell motility | 3 | 0.016092 | MYLK, IQGAP3, MYL9 |
| Vascular smooth muscle contraction | Organismal systems; circulatory system | 3 | 0.005475 | MYLK, MYH11, MYL9 |
| Focal adhesion | Cellular processes | 4 | 0.003144 | CAV2, MYLK, CAV1, MYL9 |
| Tight junction | Cellular processes | 2 | 0.039699 | MYH11, MYL9 |
| Bacterial invasion of epithelial cells | Human diseases; infectious diseases | 2 | 0.016007 | CAV2, CAV1 |
| Tryptophan metabolism | Metabolism; amino acid metabolism | 2 | 0.01113 | INMT, AOX1 |
| Viral myocarditis | Human diseases; cardiovascular diseases | 2 | 0.015622 | CAV1, MYH11 |
KEGG, Kyoto Encyclopedia of Genes and Genomes.
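Pathway enrichment P-values of the kind listed in the table are commonly obtained from a hypergeometric (over-representation) test, followed by a multiple-testing adjustment. The sketch below shows only the unadjusted hypergeometric tail probability. It assumes that this test applies here; the record does not state which test or adjustment GENECODIS used, and the background and pathway sizes in the usage line are hypothetical, not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(population, pathway_size, list_size, hits):
    """P(X >= hits) when drawing list_size genes without replacement from a
    population that contains pathway_size pathway members."""
    total = comb(population, list_size)
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 44 key genes, 4 of them in a pathway of 200 genes,
# against an assumed background of 20,000 annotated genes.
p_value = hypergeom_enrichment_p(20000, 200, 44, 4)
print(f"unadjusted P = {p_value:.3g}")
```

The adjusted values in the table would additionally depend on how many pathways were tested and on the correction method, neither of which is given here.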
TARGETgene results for differentially expressed gene ranking and their number of citations in all and individual cancer types.
| Rank | Gene symbol | Citation numbers for all cancers | Citation numbers for breast cancer | Citation numbers for prostate cancer | Citation numbers for lung cancer |
|---|---|---|---|---|---|
| 1 | MYLK | 4 | 3 | 1 | 0 |
| 2 | NTRK2 | 23 | 0 | 0 | 5 |
| 3 | CAV1 | 137 | 46 | 24 | 22 |
| 4 | MCAM | 22 | 3 | 6 | 1 |
| 5 | ANGPT1 | 35 | 3 | 0 | 3 |
| 6 | CAV2 | 24 | 6 | 4 | 2 |
| 7 | BIRC5 | 326 | 47 | 18 | 46 |
| 8 | EFEMP1 | 4 | 1 | 0 | 2 |
| 9 | EZH2 | 68 | 37 | 33 | 6 |
| 10 | HSPB8 | 14 | 3 | 1 | 2 |
| 11 | ERG | 35 | 0 | 67 | 2 |
| 12 | MYH11 | 16 | 1 | 1 | 0 |
| 13 | TIMP3 | 34 | 9 | 3 | 2 |
| 14 | MYL9 | 1 | 1 | 0 | 0 |
| 15 | SDPR | 2 | 0 | 0 | 0 |
| 16 | PGM5 | 1 | 0 | 0 | 0 |
| 17 | CLU | 48 | 11 | 18 | 8 |
| 18 | FHL1 | 5 | 1 | 1 | 0 |
| 19 | FXYD6 | 4 | 0 | 0 | 0 |
| 20 | KIF4A | 8 | 0 | 1 | 0 |
| 21 | KCNAB1 | 2 | 0 | 0 | 0 |
| 22 | GPM6A | 3 | 0 | 0 | 1 |
| 23 | CFD | 1 | 0 | 0 | 0 |
| 24 | FAM107A | 9 | 0 | 0 | 1 |
| 25 | PTRF | 3 | 1 | 1 | 0 |
| 26 | DPT | 3 | 0 | 0 | 0 |
| 27 | ID4 | 21 | 4 | 0 | 0 |
| 28 | FERMT2 | 4 | 1 | 0 | 2 |
| 29 | MAMDC2 | 4 | 0 | 0 | 0 |
| 30 | CCDC69 | 2 | 0 | 0 | 0 |
| 31 | IQGAP3 | 1 | 0 | 0 | 0 |
| 32 | PCDH9 | 3 | 1 | 0 | 0 |
| 33 | SYNPO2 | 7 | 0 | 3 | 0 |
| 34 | STIL | 21 | 0 | 0 | 1 |
| 35 | GLDN | 2 | 0 | 0 | 0 |
| 36 | CCDC85A | 1 | 0 | 0 | 0 |
| 37 | GPM6B | 4 | 0 | 0 | 0 |
| 38 | ITIH5 | 5 | 4 | 1 | 1 |
| 39 | AOX1 | 3 | 0 | 0 | 0 |
| 40 | NUSAP1 | 2 | 0 | 0 | 0 |
| 41 | ACSS3 | 1 | 0 | 0 | 0 |
| 42 | TCEAL2 | 1 | 0 | 0 | 0 |
| 43 | INMT | 3 | 0 | 0 | 1 |
Meta-analysis of differentially expressed genes in the six microarray datasets.
| Gene symbol | Probe ID | I² (%) | Q | df | P-value |
|---|---|---|---|---|---|
| ANGPT1 | 205608_s_at | 96.10 | 129.21 | 5 | <0.0001 |
| AOX1 | 205083_at | 86.20 | 36.33 | 5 | <0.0001 |
| BIRC5 | 202095_s_at | 97.00 | 167.98 | 5 | <0.0001 |
| CAV1 | 212097_at | 91.80 | 60.9 | 5 | <0.0001 |
| CAV2 | 203323_at | 90.10 | 50.61 | 5 | <0.0001 |
| CDKN1C | 213348_at | 92.60 | 67.89 | 5 | <0.0001 |
| CFD | 205382_s_at | 95.90 | 120.82 | 5 | <0.0001 |
| CLU | 208791_at | 0.00 | 2.97 | 5 | 0.7051 |
| DPT | 213068_at | 76.10 | 20.93 | 5 | 0.0008 |
| DPT | 207977_s_at | 0.00 | 4.25 | 5 | 0.5133 |
| EFEMP1 | 201843_s_at | 1.10 | 5.05 | 5 | 0.4094 |
| ERG | 213541_s_at | 96.20 | 131.24 | 5 | <0.0001 |
| EZH2 | 203358_s_at | 97.00 | 164.74 | 5 | <0.0001 |
| FAM107A | 209074_s_at | 99.00 | 507.18 | 5 | <0.0001 |
| FERMT2 | 209209_s_at | 89.10 | 46 | 5 | <0.0001 |
| FHL1 | 210299_s_at | 86.80 | 37.87 | 5 | <0.0001 |
| FXYD6 | 217897_at | 27.10 | 6.86 | 5 | 0.2311 |
| GPM6A | 209469_at | 97.90 | 235.98 | 5 | <0.0001 |
| GPM6B | 209168_at | 86.10 | 35.99 | 5 | <0.0001 |
| HSPB8 | 221667_s_at | 65.50 | 14.47 | 5 | 0.0129 |
| ID4 | 209292_at | 0.00 | 3.43 | 5 | 0.6338 |
| KCNAB1 | 210078_s_at | 64.50 | 14.1 | 5 | 0.015 |
| KIF4A | 218355_at | 95.80 | 119.67 | 5 | <0.0001 |
| LAPTM4B | 208767_s_at | 96.90 | 163.18 | 5 | <0.0001 |
| MCAM /// MIR6756 | 210869_s_at | 0.00 | 4.61 | 5 | 0.4657 |
| MYH11 | 201496_x_at | 91.50 | 58.49 | 5 | 0.001 |
| MYL9 | 201058_s_at | 73.80 | 19.12 | 5 | 0.0018 |
| MYLK | 202555_s_at | 90.00 | 49.86 | 5 | <0.0001 |
| NTRK2 | 221796_at | 88.60 | 43.8 | 5 | <0.0001 |
| NUSAP1 | 218039_at | 97.00 | 177.64 | 5 | <0.0001 |
| PCDH9 | 219737_s_at | 89.30 | 46.86 | 5 | <0.0001 |
| PPAP2B | 212226_s_at | 0.00 | 4.65 | 5 | 0.4606 |
| PTRF | 208789_at | 82.20 | 28.16 | 5 | <0.0001 |
| STIL | 205339_at | 95.20 | 103.46 | 5 | <0.0001 |
| TCEAL2 | 211276_at | 76.30 | 21.11 | 5 | 0.0008 |
| TIMP3 | 201147_s_at | 86.80 | 37.77 | 5 | <0.0001 |
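The I² column can be related to the Q and df columns through the standard definition I² = max(0, (Q − df)/Q) × 100. The following is a minimal sketch of that formula in Python rather than a reproduction of the `meta` package in R; the function name `i_squared` is an assumption introduced for illustration. The Q and df values in the usage lines are taken from the table above.

```python
def i_squared(q, df):
    """Heterogeneity statistic I^2 (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0

# Rows from the table: (gene, Q, df)
for gene, q, df in [("ANGPT1", 129.21, 5), ("CLU", 2.97, 5), ("FXYD6", 6.86, 5)]:
    print(gene, round(i_squared(q, df), 1))
# ANGPT1 96.1
# CLU 0.0
# FXYD6 27.1
```

Small differences from the tabulated I² values (for example, for EFEMP1) are expected because Q is reported to only two decimal places.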
Figure 5. Forest plots illustrating the results from heterogeneity analysis for the (A) MCAM and (B) PPAP2B genes. MCAM, melanoma cell adhesion molecule; PPAP2B, phosphatidic acid phosphatase type 2B.
Figure 6. Forest plots illustrating the results from heterogeneity analysis for the (A) EFEMP1 and (B) ID4 genes. EFEMP1, EGF-containing fibulin extracellular matrix protein 1; ID4, inhibitor of DNA-binding 4.