Bin Huang, Ning Zhong, Hongbao Cao, Guiping Yu.
Abstract
Hundreds of genes have been shown to be associated with lung squamous cell carcinoma (LSCC), with varying degrees of association. In the present study, gene vectors were investigated as genetic biomarkers for the diagnosis and personalized treatment of LSCC. An LSCC genetic database (LSCC_GD) was developed through literature-associated data analysis, in which 260 LSCC target genes were curated. Subsequently, numerous associations between these genes and LSCC were studied. Sparse representation-based variable selection (SRVS) was then employed for gene selection from two LSCC gene expression datasets, followed by case/control classification. The results were compared with those of an analysis of variance (ANOVA)-based gene selection approach. Using SRVS, a gene vector was selected from each dataset that yielded a significantly higher classification ratio (CR) than randomly selected genes (for datasets GSE18842 and GSE1987, CR=100% in both cases, with permutation P=5.0×10⁻⁴ and 1.8×10⁻³, respectively). The SRVS method outperformed ANOVA in terms of the classification ratio. The results indicated that, for a given dataset, there may be a gene vector from the 260 curated LSCC genes that possesses significant prediction power. SRVS is effective in identifying the optimum gene subset target for personalized treatment.
Keywords: lung squamous cell carcinoma; non-small-cell lung carcinoma; sparse representation; variable selection
Year: 2018 PMID: 30197682 PMCID: PMC6126348 DOI: 10.3892/ol.2018.9241
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
Figure 1. The LSCC_GD database schema. LSCC-associated genes and drugs were collected through literature data mining. LSCC-associated pathways were collected using Gene Set Enrichment Analysis, and a gene-gene interaction network was generated from these pathways. The LSCC-associated diseases and potential drugs were acquired using Sub-Network Enrichment Analysis. LSCC, lung squamous cell carcinoma.
Statistics of two gene expression datasets.
| NCBI GEO ID | GSE18842 | GSE1987 |
|---|---|---|
| #LSCC case/control | 46/45 | 17/9 |
| #Genes from LSCC_GD | 232 | 702 |
NCBI, National Center for Biotechnology Information; LSCC, lung squamous cell carcinoma.
Figure 2. The gene-gene interaction network composed of 49 of the 260 LSCC target genes from LSCC_GD. The edge weight between two nodes (genes) represents the number of pathways shared by the two genes. The larger the node, the higher the number of pathways (LSCC_GD→Related Pathways) that include the gene. The brighter the color, the higher the Fisher's centrality of the gene (the number of other genes connected to it). The adjacency matrix is presented in the LSCC_GD GGI Network. LSCC, lung squamous cell carcinoma.
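The three quantities named in the Figure 2 caption (edge weight, node size, and number of connected genes) can be derived from a pathway-to-gene mapping. The following is a minimal sketch under stated assumptions: the function name `build_ggi_network`, the input layout, and the toy gene and pathway labels are hypothetical and do not come from LSCC_GD.

```python
from itertools import combinations


def build_ggi_network(pathways):
    """pathways: dict mapping a pathway name to the set of genes it contains."""
    # Node size: number of pathways that include each gene.
    pathway_count = {}
    for genes in pathways.values():
        for gene in genes:
            pathway_count[gene] = pathway_count.get(gene, 0) + 1

    # Edge weight: number of pathways shared by a pair of genes.
    edge_weight = {}
    for genes in pathways.values():
        for a, b in combinations(sorted(genes), 2):
            edge_weight[(a, b)] = edge_weight.get((a, b), 0) + 1

    # Connectivity: number of distinct genes each gene is linked to.
    degree = {gene: 0 for gene in pathway_count}
    for a, b in edge_weight:
        degree[a] += 1
        degree[b] += 1
    return pathway_count, edge_weight, degree


# Hypothetical toy input (placeholder labels, not real LSCC_GD entries).
toy_pathways = {
    "pathway_A": {"G1", "G2", "G3"},
    "pathway_B": {"G1", "G2"},
    "pathway_C": {"G3", "G4"},
}
pathway_count, edge_weight, degree = build_ggi_network(toy_pathways)
```

In this toy input, G1 and G2 share two pathways, so their edge carries weight 2, while G3 is linked to all three other genes.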
Leave-one-out cross validation and permutation results.
| Analysis results | GSE18842: SRVS | GSE18842: ANOVA | GSE1987: SRVS | GSE1987: ANOVA |
|---|---|---|---|---|
| Case/control | 46/45 | 46/45 | 17/9 | 17/9 |
| Maximum CR (%) | 100.00 | 98.90 | 100.00 | 96.15 |
| #Selected genes | 17 | 24 | 10 | 2 |
| P-value | 5.0×10⁻⁴ | 4.9×10⁻² | 1.8×10⁻³ | 2.0×10⁻³ |
| Unique genes from all datasets (%) | 94.11% (16/17) | 91.67% (22/24) | 90.00% (9/10) | 0.00% (0/2) |
| Overlapping genes of two methods (%) | 23.53% (4/17) | 16.67% (4/24) | 10.00% (1/10) | 50.00% (1/2) |
ANOVA, analysis of variance; CR, classification ratio; SRVS, sparse representation-based variable selection.
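The leave-one-out CR and the permutation P-value reported above can be illustrated with a short sketch. Several parts are assumptions rather than the authors' procedure: this excerpt does not name the classifier, so a nearest-centroid rule stands in for it, and the permutation step draws random gene sets of the same size as the selected one, which is one reading of "compared with randomly selected genes". All function names and the toy data are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean profile is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_cr(X, y, genes):
    """Leave-one-out classification ratio (%) using only the given gene columns."""
    sub = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]  # hold out sample i
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, sub[i]) == y[i]
    return 100.0 * hits / len(sub)


def permutation_p(X, y, genes, n_perm=200, seed=0):
    """Share of random same-size gene sets whose CR matches or beats the selected set."""
    rng = random.Random(seed)
    observed = loocv_cr(X, y, genes)
    n_genes = len(X[0])
    at_least = sum(
        loocv_cr(X, y, rng.sample(range(n_genes), len(genes))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical toy data: 6 samples x 4 genes; only gene 0 separates the classes.
X = [
    [5.0, 1.0, 0.3, 2.0],
    [5.2, 0.2, 0.9, 1.0],
    [4.9, 0.8, 0.1, 2.5],
    [1.0, 0.9, 0.4, 1.2],
    [1.1, 0.1, 0.8, 2.2],
    [0.8, 0.7, 0.2, 1.9],
]
y = [1, 1, 1, 0, 0, 0]
observed_cr, p_value = permutation_p(X, y, genes=[0])
```

With real expression data, the number of permutations bounds the smallest attainable P-value, so it would need to be far larger than the default used here to reach values of the order reported in the table.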
Figure 3. Comparison of different metrics through leave-one-out cross-validation. Genes were ranked in ascending order according to SRVSScore or PValueScore for the SRVS or analysis of variance method, respectively. SRVS, sparse representation-based variable selection; CR, classification ratio; SRVSScore, SRVS-generated weights; PValueScore, analysis of variance-generated P-value score.
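As an illustration of the ANOVA-based ranking used as the comparison method, the sketch below scores each gene with a one-way ANOVA F statistic and orders the genes by that score. It is an assumption-laden stand-in: the exact definition of PValueScore is not given in this excerpt, and the function names are hypothetical. When every gene has the same group sizes, ordering by F from largest to smallest is equivalent to ordering by ANOVA P-value from smallest to largest.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene's expression across sample groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n = len(values)
    grand_mean = sum(values) / n
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of samples around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    df_between = len(groups) - 1
    df_within = n - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_genes_by_f(X, y):
    """Return gene column indices ordered from most to least discriminative."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

A CR-versus-gene-count curve like the one described in the caption could then be obtained by evaluating a classifier on the first k genes of the ranking for increasing k.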