Arturo López Pineda, Henry Ato Ogoe, Jeya Balaji Balasubramanian, Claudia Rangel Escareño, Shyam Visweswaran, James Gordon Herman, Vanathi Gopalakrishnan.
Abstract
BACKGROUND: Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types among lung cancers. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, with tissue obtained from small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the process of distinguishing between the two subtypes. Molecular profiling can also be used to discriminate between the two lung cancer subtypes, provided that the biopsy is composed of at least 50 % tumor cells. In some cases, however, a biopsy might contain a mix of tumor and tumor-adjacent histologically normal tissue (TAHN). When this happens, a new biopsy is required, with associated cost, risks and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue.
Year: 2016 PMID: 26944944 PMCID: PMC4778315 DOI: 10.1186/s12885-016-2223-3
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Datasets and sample distributions
| Dataset Source | Tissue type | ADC | SCC |
|---|---|---|---|
| GEO: GDS3257 (gene expression) | Tumor | 58 | *** |
| | TAHN | 49 | *** |
| TCGA: LUAD+LUSC (gene expression) | Tumor | 32 | 153 |
| | TAHN | *** | *** |
| TCGA: LUAD+LUSC (DNA methylation) | Tumor | 65 | 132 |
| | TAHN | 24 | 27 |
See challenge in Background on lack of TAHN tissue availability (***). GEO gene expression platform: Affymetrix Human Genome U133A Array (22,283 features). TCGA gene expression platform: Agilent 244 K Custom Gene Expression (17,814 features). TCGA methylation platform: Illumina Infinium HumanMethylation 27 k (27,578 features).
Fig. 1 Cross-validation (10-fold) experimental design for a particular classification task, using feature selection and discretization. There are three outcomes: a simple naïve Bayesian model with its test evaluation; clustering of samples based on selected genes; and gene enrichment analysis. Algorithms: ReliefF, Limma, minimum description length principle cut (MDLPC). Evaluation: area under the receiver operating characteristic curve (AUC), 95 % confidence interval (CI), and Brier Skill Score (BSS)
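The fold structure described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `stratified_folds` and `top_features` are hypothetical helper names, the ranker is a plain difference of class means standing in for ReliefF or Limma, and the data are synthetic. The point it shows is that feature selection is repeated inside each training fold, so the held-out samples never influence which features are chosen.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal samples round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def top_features(X, y, train, n):
    """Stand-in ranker (NOT ReliefF or Limma): absolute difference of class means.

    Assumes exactly two classes and uses only the training indices.
    """
    a, b = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        xa = [X[i][j] for i in train if y[i] == a]
        xb = [X[i][j] for i in train if y[i] == b]
        scores.append(abs(sum(xa) / len(xa) - sum(xb) / len(xb)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n]


# Synthetic data (an assumption for illustration): feature 0 separates the classes,
# features 1 and 2 are noise.
rng = random.Random(1)
y = ["ADC"] * 20 + ["SCC"] * 20
X = [[(5.0 if label == "SCC" else 0.0) + rng.gauss(0, 1),
      rng.gauss(0, 1),
      rng.gauss(0, 1)] for label in y]

for train, test in stratified_folds(y, k=10):
    selected = top_features(X, y, train, n=1)
    # A discretizer and classifier would be fit on `train` restricted to
    # `selected`, then evaluated on `test`.
```

In a fuller version, the discretization cut points would also be learned from the training fold only, for the same reason.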
AUC classification performance for different classification tasks
| Classification Task | Omic | ReliefF AUC | ReliefF 95 % C.I. | ReliefF BSS | Limma AUC | Limma 95 % C.I. | Limma BSS |
|---|---|---|---|---|---|---|---|
| TAHNADC vs. TumorADC | G | 0.99 | 0.97–1.0 | 0.89 | 0.94 | 0.82–1.0 | 0.73 |
| | M | 1.0 | 1.0–1.0 | 0.99 | 0.81 | 0.58–0.97 | 0.17 |
| TAHNSCC vs. TumorSCC | M | 1.0 | 0.99–1.0 | 0.94 | 0.99 | 0.96–1.0 | 0.66 |
| TumorADC vs. TumorSCC | G | 0.89 | 0.83–0.96 | 0.29 | 0.90 | 0.89–0.9 | 0.81 |
| | M | 0.97 | 0.94–0.99 | 0.71 | 0.89 | 0.74–1.0 | 0.38 |
| TAHNADC vs. TAHNSCC | M | 1.0 | 1.0–1.0 | 0.92 | 1.0 | 1.0–1.0 | 0.99 |
| TAHN-TumorADC vs. TAHN-TumorSCC | M | 0.92 | 0.89–0.95 | 0.42 | 0.94 | 0.87–1.0 | 0.56 |
G: gene expression, M: DNA methylation. The Brier Skill Score is a measurement of calibration of the classifier. A positive value on the BSS means that the classifier is well calibrated. A baseline classification is the work by Chang and Ramoni [22] which obtained an accuracy of 0.95 in the classification task TumorADC vs. TumorSCC
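The two metrics in the table can be illustrated with a short sketch. It assumes binary labels coded 0/1 and predicted probabilities for class 1, and it assumes the common definition of the Brier Skill Score, BSS = 1 − BS / BS_ref, where the reference forecast always predicts the class prevalence; the source does not state which reference was used. The function names are illustrative.

```python
def auc(y_true, p):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(p, y_true) if y == 1]
    neg = [s for s, y in zip(p, y_true) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_skill_score(y_true, p):
    """BSS = 1 - BS / BS_ref, with the class prevalence as the reference forecast.

    The choice of reference is an assumption; positive values mean the
    probabilities beat that reference.
    """
    n = len(y_true)
    bs = sum((pi - yi) ** 2 for pi, yi in zip(p, y_true)) / n
    prevalence = sum(y_true) / n
    bs_ref = sum((prevalence - yi) ** 2 for yi in y_true) / n
    return 1.0 - bs / bs_ref


labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.1]
print(auc(labels, probs), brier_skill_score(labels, probs))
```

AUC depends only on how the samples are ranked, whereas the BSS depends on the probability values themselves, which is why a task can show a high AUC alongside a modest BSS.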
Fig. 2 Heatmaps for classification tasks (a) TAHNADC vs. TAHNSCC, (b) TumorADC vs. TumorSCC and (c) TAHN-TumorADC vs. TAHN-TumorSCC, using the ReliefF feature selection algorithm. The vertical axis shows the corresponding methylation site and gene symbol (in parentheses). Some methylation sites do not lie in a particular gene; therefore, no symbol is provided. When multiple methylation sites are selected for the same gene, the gene is included only if these sites have similar methylation intensity. The horizontal axis provides a color-coded representation of the tissue samples. Two distinct groups are observed in all three heatmaps. Cluster purity (accuracy by classification using clustering) for each task is calculated to be 1.0, 0.94 and 0.85 respectively
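Cluster purity, as reported in the Fig. 2 caption, can be sketched as below. This assumes the standard definition (each cluster is credited with its majority class, and the credited counts are divided by the total number of samples); the caption does not spell out the exact computation, and the sample assignments here are made up for illustration.

```python
from collections import Counter


def cluster_purity(cluster_ids, labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


# Hypothetical assignment: two clusters, one misplaced sample in each.
clusters = [0, 0, 0, 1, 1, 1, 1]
subtypes = ["ADC", "ADC", "SCC", "SCC", "SCC", "SCC", "ADC"]
print(cluster_purity(clusters, subtypes))
```

A purity of 1.0 means every cluster contains samples of a single class.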
Genes selected for the classification task of TAHN-TumorADC vs. TAHN-TumorSCC
| Gene Symbol | Gene Name | Known Literature Evidence to Cancer |
|---|---|---|
| ST18 | suppression of tumorigenicity 18, zinc finger | Yes [ |
| CSTA | cystatin A (stefin A) | Yes [ |
| LPP | LIM domain containing preferred translocation partner in lipoma | Yes [ |
| CROT | carnitine O-octanoyltransferase | Yes [ |
| BDKRB1 | bradykinin receptor B1 | Yes [ |
| AKR1B10 | aldo-keto reductase family 1, member B10 (aldose reductase) | Yes [ |
| TP73 | tumor protein p73 | Yes [ |
| EFCAB3 | EF-hand calcium binding domain 3 | Yes |
| RREB1 | ras responsive element binding protein 1 | Yes [ |
| HIST1H4G | histone cluster 1, H4g | No |
| STAR | steroidogenic acute regulatory protein | Yes |
| ACSBG2 | acyl-CoA synthetase bubblegum family member 2 | Yes [ |
| DQX1 | DEAQ box RNA-dependent ATPase 1 | Yes [ |
| AQP10 | aquaporin 10 | Yes [ |
| PLEKHA6 | pleckstrin homology domain containing, family A member 6 | Yes [ |
| GCSAM | germinal center-associated, signaling and motility | No |
| WFDC5 | WAP four-disulfide core domain 5 | Yes |
| KRT7 | keratin 7, type II | Yes [ |
| DCST2 | DC-STAMP domain containing 2 | Yes [ |
| CALML3 | calmodulin-like 3 | Yes |
| ACAP3 | ArfGAP with coiled-coil, ankyrin repeat and PH domains 3 | Yes |
| LRRC17 | leucine rich repeat containing 17 | Yes [ |
| TRIM29 | tripartite motif containing 29 | Yes [ |
| CXCR2 | chemokine (C-X-C motif) receptor 2 | Yes [ |
| HOXD9 | homeobox D9 | Yes [ |
| COL17A1 | collagen, type XVII, alpha 1 | Yes [ |
| LMO3 | LIM domain only 3 (rhombotin-like 2) | Yes |
The genes are listed in rank order, as selected by ReliefF for the classification task of TAHN-TumorADC vs. TAHN-TumorSCC. The first two columns give the Entrez gene symbol and the gene name, respectively. The ‘Known Literature Evidence to Cancer’ column indicates whether links to cancer were detected by the IPA® software. Citations are provided to literature indicating links to adenocarcinoma, squamous-cell carcinoma and carcinoma in lung
Fig. 3 Gene interaction network generated by the IPA® software. It shows an analysis of the genes found by ReliefF in the classification task TAHN-TumorADC vs. TAHN-TumorSCC. Three diseases are shown (carcinoma of the lung, adenocarcinoma and squamous cell carcinoma), and the selected genes from our analysis are connected to these diseases via literature evidence that indicates direct interactions (straight line) or indirect interactions (dashed line). Some of those interactions have arrow-heads indicating causation (e.g., BDKRB1). An arrow-head with a bar (e.g., TP73) indicates inhibition