Ram Bhupal Reddy1,2,3, Anupama Rajan Bhat4, Bonney Lee James1,2, Sindhu Valiyaveedan Govindan2, Rohit Mathew1, D R Ravindra1,2, Naveen Hedne2, Jeyaram Illiayaraja5, Vikram Kekatpure2, Samanta S Khora3, Wesley Hicks6,7, Pramila Tata4, Moni A Kuriakose1,2,7, Amritha Suresh1,2.
Abstract
The head and neck squamous cell carcinoma (HNSCC) transcriptome has been profiled extensively; nevertheless, identifying biomarkers that are clinically relevant, and thereby of translational benefit, has been a major challenge. The objective of this study was to use a meta-analysis-based approach to catalog candidate biomarkers with high potential for clinical application in HNSCC. Data from publicly available microarray series (N = 20) profiled using Agilent (4X44K G4112F) and Affymetrix (HGU133A, U133A_2, U133Plus 2) platforms were downloaded and analyzed in a platform/chip-specific manner (GeneSpring software v12.5, Agilent, USA). Principal Component Analysis (PCA) and clustering analysis were carried out iteratively to segregate outliers; 140 normal and 277 tumor samples from 15 series were included in the final analysis. The analyses identified 181 differentially expressed, concordant and statistically significant genes; STRING analysis revealed interactions between 122 of them, with two major gene clusters connected by multiple nodes (MYC, FOS and HSPA4). Validation in the HNSCC-specific database (N = 528) in The Cancer Genome Atlas (TCGA) identified a panel (ECT2, ANO1, TP63, FADD, EXT1, NCBP2) that was altered in 30% of the samples. Validation in treatment-naïve (Group I; N = 12) and post-treatment (Group II; N = 12) patients identified 8 genes significantly associated with the disease (area under the curve > 0.6). Correlation with recurrence/re-recurrence showed that ANO1 had the highest efficacy (sensitivity: 0.8, specificity: 0.6) in predicting failure in Group I. UBE2V2, PLAC8, FADD and TTK showed high sensitivity (1.00) in Group I, while UBE2V2 and CRYM were highly sensitive (>0.8) in predicting re-recurrence in Group II. Further, TCGA analysis showed that ANO1 and FADD, located at 11q13, were co-expressed at the transcript level and significantly associated with overall and disease-free survival (p<0.05).
The meta-analysis approach adopted in this study has identified candidate markers correlated with disease outcome in HNSCC; further validation in a larger cohort of patients will establish their clinical relevance.
Year: 2016 PMID: 26808319 PMCID: PMC4726811 DOI: 10.1371/journal.pone.0147409
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Meta-analysis workflow.
The publicly available raw microarray data of Head and Neck Squamous Cell Carcinoma (HNSCC) series were downloaded, grouped and analyzed in GeneSpring statistical software. The samples were normalized and then grouped into tumor and normal prior to performing the gene-level experiment. Principal Component Analysis (PCA) was performed to remove the discordant samples. Fold change and p-value were calculated to obtain the significant gene entities. These statistically significant entities were used for further analysis: database annotation (Gene Ontology, STRING and miRWALK) and patient validation (experimental validation by quantitative real-time PCR (qPCR) and The Cancer Genome Atlas (TCGA)).
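The fold-change and p-value step can be illustrated with a minimal sketch. It assumes log2-scale intensities for a single probe and substitutes an exact permutation test for the statistical test applied in GeneSpring, which the caption does not name. The sample values and function names are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def fold_change(tumor_log2, normal_log2):
    """Linear fold change (tumor relative to normal) from log2 intensities."""
    return 2 ** (mean(tumor_log2) - mean(normal_log2))


def permutation_p_value(tumor, normal, tol=1e-12):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(tumor) + list(normal)
    observed = abs(mean(tumor) - mean(normal))
    hits = total = 0
    # Enumerate every way of relabelling len(tumor) samples as "tumor".
    for idx in combinations(range(len(pooled)), len(tumor)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - tol:
            hits += 1
    return hits / total


# Hypothetical log2 intensities for one probe (illustrative only).
tumor = [8.1, 8.4, 8.0, 8.6]
normal = [6.9, 7.2, 7.0, 7.1]

fc = fold_change(tumor, normal)
p = permutation_p_value(tumor, normal)
print(f"fold change = {fc:.2f}, permutation p = {p:.4f}")
```

Exhaustive enumeration is only practical for small groups; for series of the size analyzed here, a parametric test or a sampled permutation test would be the usual choice.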
Details of series used in the Meta-analysis.
| S.NO | PUBLIC DATA SETS | ARRAY PLATFORM | SITE DETAILS | GENES IDENTIFIED | GENES VALIDATED | VALIDATION METHOD | PUBMED ID |
|---|---|---|---|---|---|---|---|
| 1 | | HG-U133_Plus_2 | OSCC | - | - | - | - |
| 2 | | HG-U133_Plus_2 | HNSCC | - | - | - | PMID: 22234739 |
| 3 | | HG-U133_Plus_2 | HNSCC | - | - | - | PMID: 19117988, PMID: 16467079 |
| 4 | | HG-U133_Plus_2 | BUCCAL MUCOSA | 41 | S100A7, CYP1B1, CYP1A1, CD207, CHRNA3, NQO1, PTGES, AHRR, CD1a, LEPR, IGF2BP3 | qPCR, IHC, WESTERN BLOTTING | PMID: 20179299 |
| 5 | | HG-U133_Plus_2 | OSCC | 131 | LAMC2, COL4A1, COL1A1, and PADI1 | qPCR | PMID: 18669583 |
| 6 | | HG-U133_Plus_2 | HNSCC | 91 | TAF7L, CDKN2A, SYCP2, RFC4, and NAP1L2 | qPCR | PMID: 16943533, PMID: 16467079 |
| 7 | | HG-U133_Plus_2 | TONGUE | 35 | IL8 and MMP9 | qPCR, IHC | PMID: 18254958 |
| 8 | GSE6791 | HG-U133_Plus_2 | HNSCC, CERVICAL CANCER SAMPLES | - | SYCP2 and TCAM1 | qPCR, IHC, WESTERN BLOT | PMID: 17510386 |
| 9 | GSE7224 | HG-U133_Plus_2 | ORAL EPITHELIA, TONSIL | - | CXCR4, CCR5, CD19, CD3, defensin-β1, defensin-β4, SLPI, ICAM-3, CD4 | IHC | PMID: 17620369 |
| 10 | GSE9600 | HG-U133_Plus_2 | HNSCC | - | - | - | PMID: 20652976 |
| 11 | GSE16149 | HG-U133_Plus_2 | BUCCAL MUCOSA | - | - | - | PMID: 20576139 |
| 12 | GSE31056 | HG-U133_Plus_2 | TONGUE | 139 | MMP1, COL4A1, P4HA2, and THBS2 | qPCR | PMID: 21989116 |
| 13 | GSE45153 | HG-U133_Plus_2 | HNSCC | - | - | - | PMID: 23981300 |
| 14 | | HG-U133A | OSCC | 116 | CXCR4 | qPCR, IHC | PMID: 15558013 |
| 15 | | HG-U133A | LARYNGEAL CANCER | 30 | ACE2, DHTKD1, FLOT1, MAP4K1, NEK2, SFRS8, PRKD1, TBC1D4, TGOLN2, YTHDC2 | qPCR | PMID: 23950933 |
| 16 | | HG-U133A | OSCC | 53 | LGALS1, MMP1, LAGY, and KRT4 | qPCR | PMID: 15381369 |
| 17 | | HG-U133A | BUCCAL MUCOSA, NASAL EPITHELIUM | 314 | CEACAM5, CYP4F11 and S100P | qPCR | PMID: 18513428 |
| 18 | | HG-U133A_2 | HNSCC | | RAB25, THBS1 and DUOX1 | qPCR | PMID: 22696598 |
| 19 | | HG-U133A_2 | BUCCAL MUCOSA, NASAL EPITHELIUM | 314 | CEACAM5, CYP4F11 and S100P | qPCR | PMID: 18513428 |
| 20 | | Agilent-014850 / 4x44K G4112F | OSCC | 315 | SPP1, CA9, HOXC9, TNFRSF12A, LY6K, INHBA, FST, MFAP5 and DHRS2, MAL, TSN, SLC4A1AP, GPX3 | qPCR, IHC | PMID: 22072328 |
| 21 | | Agilent-014850 / 4x44K G4112F | OSCC, NORMAL, DYSPLASIA | - | - | - | PMID: 24035722 |
* Only treatment-naïve HNSCC samples were taken from these datasets for the analysis
# This series has samples analyzed by two different technologies
OSCC: Oral Squamous Cell Carcinoma
IHC: Immunohistochemistry
Fig 2. Identification of protein-protein interactions.
Protein-protein interaction analysis with the STRING network identified two major interconnecting clusters with a high degree of interaction between the genes (N = 122). These two major clusters were interconnected by the nodes MYC, FN1, FOS and HSPA4. The number of lines represents the levels of evidence, as indicated in the color legend. Node size reflects the extent of protein structural information available for each gene, while node color is a visual aid for better representation. The markers from this analysis selected for patient validation are encircled.
List of top 20 genes based on the percentage alterations in TCGA.
| S. No | GENE SYMBOL | Chromosome Location | Alteration in Gene Expression (N = 498): Number of Cases altered | Alteration in Gene Expression (N = 498): % | Agilent: p-value | Agilent: FC (abs) | U133 PLUS 2: p-value | U133 PLUS 2: FC (abs) | U133A: p-value | U133A: FC (abs) | U133A_2: p-value | U133A_2: FC (abs) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | FADD | 11q13.3 | 161 | 32 | 0.000578044 | 2.08 | 2.65787E-11 | 5.49 | 2.28365E-10 | 2.12 | 1.36636E-10 | 2.55 |
| 2 | ECT2 | 3q26.1-q26.2 | 146 | 29 | 8.42812E-12 | 4.27 | 3.40149E-21 | 3.92 | 1.4E-45 | 5.54 | 4.03399E-16 | 14.62 |
| 3 | NCBP2 | 3q29 | 129 | 26 | 7.30899E-11 | 2.08 | 1.25625E-19 | 11.86 | 3.04538E-14 | 4.21 | 3.38459E-18 | 2.68 |
| 4 | ANO1 | 11q13.3 | 122 | 24 | 4.76112E-09 | 17.91 | 1.57666E-06 | 2.78 | 4.68936E-20 | 3.33 | 1.27457E-21 | 15.26 |
| 5 | EXT1 | 8q24.11 | 122 | 24 | 1.41097E-17 | 6.82 | 5.41787E-13 | 5.53 | 3.76146E-12 | 4.03 | 2.8362E-15 | 3.21 |
| 6 | NDRG1 | 8q24.3 | 97 | 19 | 1.4748E-17 | 20.44 | 1.18762E-09 | 31.84 | 0.000021852 | 2.14 | 2.08484E-09 | 4.83 |
| 7 | TP63 | 3q28 | 96 | 19 | 5.15547E-23 | 13.19 | 3.14883E-18 | 8.36 | 9.07143E-32 | 3.79 | 4.49354E-13 | 3.9 |
| 8 | UBE2V2 | 8q11.21 | 87 | 17 | 2.45206E-12 | 3.72 | 1.1042E-12 | 10.24 | 6.6063E-09 | 4.7 | 7.18068E-10 | 5.61 |
| 9 | KLF10 | 8q22.2 | 78 | 16 | 1.08314E-06 | 2.03 | 1.00163E-12 | 17.06 | 1.7561E-07 | 7.28 | 4.66826E-15 | 4.77 |
| 10 | YES1 | 18p11.31-p11.21 | 77 | 15 | 1.40716E-12 | 5.28 | 3.07102E-09 | 3.28 | 1.62405E-05 | 2.85 | 2.87721E-07 | 2.38 |
| 11 | MRPL3 | 3q21-q23 | 70 | 14 | 8.47739E-13 | 2.32 | 2.38555E-29 | 145.69 | 2.53596E-09 | 26.54 | 4.99042E-08 | 2.19 |
| 12 | TRIB1 | 8q24.13 | 70 | 14 | 3.69935E-12 | 3.5 | 2.05752E-06 | 4.96 | 0.005448941 | 2.71 | 2.13018E-09 | 2.54 |
| 13 | WDYHV1 | 8q24.13 | 70 | 14 | 6.85544E-07 | 2.13 | 1.13573E-06 | 3.97 | 4.51362E-13 | 2.28 | 1.40299E-08 | 3.13 |
| 14 | EIF2S1 | 14q23.3 | 65 | 13 | 2.23584E-06 | 2.57 | 1.22031E-09 | 5.45 | 2.22462E-09 | 3.15 | 1.99808E-21 | 2.59 |
| 15 | FSCN1 | 7p22 | 67 | 13 | 4.37413E-18 | 21.28 | 1.91554E-10 | 3.49 | 1.0799E-36 | 6.68 | 2.80616E-15 | 4.66 |
| 16 | CLDN1 | 3q28-q29 | 61 | 12 | 0.02748958 | 2.65 | 3.61899E-38 | 48.07 | 4.25964E-10 | 2.17 | 0.001202747 | 2.07 |
| 17 | FUBP3 | 9q34.11 | 59 | 12 | 7.18473E-14 | 2.19 | 4.82326E-13 | 5.29 | 1.59966E-18 | 4.67 | 1.92657E-07 | 3.26 |
| 18 | SNAI2 | 8q11 | 59 | 12 | 9.67078E-17 | 17.59 | 1.61049E-19 | 23.56 | 6.20397E-17 | 14.27 | 3.2735E-14 | 14.92 |
| 19 | GPR87 | 3q24 | 55 | 11 | 2.89582E-12 | 10.47 | 9.98395E-13 | 24.53 | 7.94468E-07 | 5.87 | 8.74107E-16 | 4.53 |
| 20 | MRPL13 | 8q22.1-q22.3 | 56 | 11 | 9.44509E-15 | 2.94 | 6.90618E-18 | 25.09 | 6.74529E-18 | 8.11 | 6.46009E-09 | 2.56 |
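How a gene qualifies as concordant across platforms can be sketched as a simple filter over the per-platform p-values and absolute fold changes. The three rows below are copied from the table above; the cut-offs (p < 0.05 and absolute fold change of at least 2) are assumptions made for illustration, since the thresholds are not stated in this section, and the function names are hypothetical.

```python
# (p-value, absolute fold change) per platform, copied from the table above.
TABLE_ROWS = {
    "FADD": {
        "Agilent": (0.000578044, 2.08),
        "U133 Plus 2": (2.65787e-11, 5.49),
        "U133A": (2.28365e-10, 2.12),
        "U133A_2": (1.36636e-10, 2.55),
    },
    "TRIB1": {
        "Agilent": (3.69935e-12, 3.5),
        "U133 Plus 2": (2.05752e-06, 4.96),
        "U133A": (0.005448941, 2.71),
        "U133A_2": (2.13018e-09, 2.54),
    },
    "CLDN1": {
        "Agilent": (0.02748958, 2.65),
        "U133 Plus 2": (3.61899e-38, 48.07),
        "U133A": (4.25964e-10, 2.17),
        "U133A_2": (0.001202747, 2.07),
    },
}


def concordant_genes(rows, max_p=0.05, min_fc=2.0):
    """Return genes passing both cut-offs on every platform (assumed thresholds)."""
    return [
        gene
        for gene, platforms in rows.items()
        if all(p < max_p and fc >= min_fc for p, fc in platforms.values())
    ]


def percent_altered(cases_altered, total_cases=498):
    """Percentage of TCGA cases altered, rounded as in the table's % column."""
    return round(100 * cases_altered / total_cases)


print(concordant_genes(TABLE_ROWS))              # all three genes pass
print(concordant_genes(TABLE_ROWS, max_p=0.01))  # CLDN1 drops out (Agilent p = 0.027)
print(percent_altered(161))                      # FADD: 161 of 498 cases
```

Requiring every platform to pass is a deliberately strict reading of "concordant"; a looser rule, such as a majority of platforms, would change which genes survive.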
Fig 3. Validation of the markers in patients.
Quantitative gene expression profiling of the selected markers was carried out in the Group I (primary; A) and Group II (recurrent; B) cohorts. PLAC8 and UBE2V2 were validated in all the samples (100%) of Group I with regard to regulation trends, whereas the other genes showed a similar trend in >60% of the samples. In Group II, >60% of the patients showed concordant regulation trends for four genes. Based on patient follow-up, Group I was sub-categorized into non-recurrent (C) and recurrent (D), and the expression was further evaluated. ROC curve analysis in the Group I patients showed that PLAC8 (E), FOS (F), ANO1 (G) and UBE2V2 (H) had the highest association with the disease (AUC >0.8). The bar represents the median fold change of the normal samples.
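The sensitivity, specificity and AUC figures used to rank the markers can be computed as in the following minimal sketch. The fold-change values and the cut-off of 2.0 are hypothetical, chosen only so that the example yields a 0.8/0.6 split; they are not patient data from either cohort, and the function names are illustrative.

```python
def sensitivity_specificity(labels, predictions):
    """Sensitivity and specificity from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical marker fold changes: 1 = recurrence, 0 = no recurrence.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [4.2, 3.1, 2.8, 2.5, 1.2, 3.0, 2.6, 1.5, 1.1, 0.9]

cutoff = 2.0  # assumed decision threshold
predictions = [s >= cutoff for s in scores]

sens, spec = sensitivity_specificity(labels, predictions)
auc = roc_auc(labels, scores)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc:.2f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes performance over all possible cut-offs, which is why both kinds of figure are reported for the markers.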
Fig 4. Validation with the TCGA database.
The selected markers were analyzed in the HNSCC TCGA provisional study for co-expression and for their association with overall survival and disease-free survival. ANO1 and FADD showed the highest correlation in the co-expression analysis (A) with Pearson’s and Spearman’s correlation (0.68). ANO1 and FADD were further analyzed for overall survival (OS) (B and D) and disease-free survival (DFS) (ANO1; C). Patients with ANO1 over-expression showed low median survival (18.96 vs 56.44 months; p = 0.0003) and low DFS (20.04 vs 53.09 months; p = 0.02) when compared with the cohort without alterations (B and C). FADD showed an association with OS, wherein low median survival (21.48 vs 57.42; p = 0.002) was observed in patients with an upregulation of the gene (D). ANO1 and FADD, when assessed in combination, were associated with low median survival (21.48 vs 57.88; p = 0.0007) (E) and low disease-free survival (25.72 vs 53.09; p = 0.04) (F) in altered cases when compared with cases without alterations.
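The two co-expression statistics reported for ANO1 and FADD can be sketched in plain Python as follows. The expression vectors are hypothetical and do not reproduce the reported coefficient of 0.68; the function names are illustrative and not part of any TCGA tooling.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical transcript levels for five samples (illustrative only).
ano1 = [1.0, 2.0, 3.0, 4.0, 5.0]
fadd = [1.2, 1.9, 3.5, 3.1, 6.0]

print(f"Pearson = {pearson(ano1, fadd):.3f}, Spearman = {spearman(ano1, fadd):.3f}")
```

Pearson measures linear association on the raw values, while Spearman measures monotonic association on ranks and is less sensitive to outlying expression values; agreement between the two supports a consistent co-expression relationship.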