
Construction of a 26‑feature gene support vector machine classifier for smoking and non‑smoking lung adenocarcinoma sample classification.

Lei Yang¹, Lu Sun², Wei Wang¹, Hao Xu¹, Yi Li¹, Jia-Ying Zhao¹, Da-Zhong Liu¹, Fei Wang¹, Lin-You Zhang¹.

Abstract

The present study aimed to identify feature genes associated with smoking in lung adenocarcinoma (LAC) samples and to explore the underlying mechanism. Three gene expression datasets of LAC samples were downloaded from the Gene Expression Omnibus database according to pre-set criteria, and the expression data were processed using meta-analysis. Differentially expressed genes (DEGs) between LAC samples of smokers and non-smokers were identified using the limma package in R. The classification accuracy of the selected DEGs was visualized using hierarchical clustering analysis in R. A protein-protein interaction (PPI) network was constructed for the DEGs using gene interaction data from the Human Protein Reference Database. Betweenness centrality (BC) was calculated for each node in the network, and the genes with the greatest BC values were used to construct a support vector machine (SVM) classifier. The dataset GSE43458 was used as the training dataset, and the other two datasets (GSE12667 and GSE10072) were used as validation datasets. The performance of the classifier was assessed using sensitivity, specificity, positive predictive value, negative predictive value and area under the curve, calculated with the pROC package in R. The feature genes in the SVM classifier were subjected to pathway enrichment analysis using Fisher's exact test. A total of 347 genes were identified as differentially expressed between samples of smokers and non-smokers. The PPI network of the DEGs comprised 202 nodes and 300 edges. An SVM classifier comprising 26 feature genes was constructed to distinguish between LAC samples of smokers and non-smokers, with prediction accuracies for the GSE43458, GSE12667 and GSE10072 datasets of 100, 100 and 94.83%, respectively. Furthermore, the 26 feature genes, which were significantly enriched in nine biological pathways, including extracellular matrix-receptor interaction, proteoglycans in cancer, cell adhesion molecules, p53 signaling pathway, microRNAs in cancer and apoptosis, were identified as smoking-related genes in LAC. In conclusion, an SVM classifier with a high prediction accuracy for smoking and non-smoking samples was obtained. The genes in the classifier may be feature genes associated with the development of LAC in patients who smoke.

Year: 2017 PMID: 29257283 PMCID: PMC5783520 DOI: 10.3892/mmr.2017.8220

Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952

Introduction

Lung cancer is the most common cause of cancer-associated fatality in men and the second most common in women (1). The 5-year survival rate following diagnosis of lung cancer is 15.6%, making it one of the malignant tumors with the poorest prognosis (2). This survival rate is lower than that of breast, colon and prostate cancer (2). Cigarette smoking is responsible for ~90% of lung cancer cases and leads to decreased survival rates (3). The major histological types of lung cancer include adenocarcinoma, squamous cell carcinoma, large cell carcinoma and small cell carcinoma. The incidence of lung adenocarcinoma (LAC) has increased gradually, and LAC has been the most frequently occurring histological type in most parts of the world in recent years (4). Adenocarcinomas account for ~40% of all lung cancer cases (5). Smoking is a major cause of lung adenocarcinoma (6). However, the causes of the increase in adenocarcinomas are not clear. Sequencing data from large-scale databases, such as The Cancer Genome Atlas, have aided in the identification of novel factors and potentially targetable alterations in lung adenocarcinomas (7). A number of smoking-associated genes have been revealed in LAC, including cyclin D1 (CCND1), whose A870G polymorphism has been indicated to modulate smoking-induced lung cancer risk (8). Estrogen receptor α promotes smoking carcinogen-induced lung carcinogenesis via cytochrome P450 1B1 (9). The interactions between smoking, polymorphisms of human 8-oxoguanine DNA glycosylase and p53 are associated with the development of lung cancer (10). Interactions between smoking, fragile histidine triad gene alterations (11) and excision repair cross-complementation group 1 polymorphisms (12) have also been reported in lung cancer. However, the genetic changes in patients with LAC who smoke remain to be fully elucidated, and further studies are necessary to determine the underlying molecular mechanism of smoking-induced LAC.
A recent study aimed to identify smoking-associated genes via the differential analysis of RNA sequencing data (13). The study analyzed two datasets with only two samples and identified 1,603 differentially expressed genes (DEGs). The authors also suggested that alternative splicing of the FCGBP gene may have an impact on lung cancer. However, the small sample size could lead to low reliability of the results. In the present study, three gene expression datasets of smokers and non-smokers with LAC (≥50 samples per dataset) were obtained and DEGs were identified using meta-analysis. A protein-protein interaction (PPI) network of the DEGs was constructed, and betweenness centrality (BC) analysis was used to select feature genes. Using the feature genes, a support vector machine (SVM) classifier that is able to distinguish between samples from smokers and non-smokers with a high classification accuracy was constructed. The feature genes in the SVM classifier were considered as the smoking-related genes in LAC, and enrichment analysis was conducted to identify significant pathways.

Materials and methods

Gene expression data

To collect gene expression data from patients with LAC who smoke or do not smoke, the Gene Expression Omnibus (GEO; www.ncbi.nlm.nih.gov/geo/) database was used and the key words ‘lung adenocarcinoma’, ‘Homo sapiens’ and ‘smoke’ were searched. The following inclusion criteria were used to extract the corresponding datasets: i) They were gene expression data; ii) they were from LAC samples; iii) information concerning smoking was described; and iv) ≥50 samples were included in each dataset. A total of three datasets were collected from the GEO database, including GSE43458 (14), GSE10072 (15) and GSE12667 (16) (Table I).

Table I.

Data of the three collected gene expression datasets.

Accession number	Platform	Total samples (n)	Non-smokers (n)	Smokers (n)
GSE43458	HuGene-1_0-st-v1	110	40	40
GSE10072	HG-U133A	107	16	42
GSE12667	HG-U133_Plus_2	75	8	43

Raw data from the three datasets were processed with the affy package in R 3.2.1 (http://bioconductor.org/packages/release/bioc/html/affy.html) (17). Probes were subsequently mapped to genes, and the values of all probes corresponding to the same gene were averaged to give the final expression value of that gene. The data were then normalized with the limma package (18) in R.
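The probe-to-gene averaging step can be illustrated with a minimal Python sketch. This is not the authors' R code; the function name `collapse_probes`, the probe identifiers and the expression values are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene):
    """Average the expression vectors of all probes that map to the same gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without a gene annotation are dropped
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

# Hypothetical probes and values (two samples per probe), for illustration only
probe_values = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 6.0],
    "p3": [1.0, 1.0],
    "p4": [9.0, 9.0],  # has no gene annotation below
}
probe_to_gene = {"p1": "EZH2", "p2": "EZH2", "p3": "DNMT1"}
gene_values = collapse_probes(probe_values, probe_to_gene)
```

Here the two probes assigned to EZH2 are merged into a single per-sample mean, and the unannotated probe is discarded.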

Screening of DEGs

Meta-analysis was used to improve the reliability of the gene expression analysis by combining data from different datasets. DEGs associated with smoking in the three gene expression datasets were screened via meta-analysis using the MetaDE.ES package of R (19). The method tested the heterogeneity of the gene expression values across the three datasets with three statistical parameters: tau2, Q-value (Qval) and the P-value of the Q test (Qpval). Subsequently, differential expression of genes between smoking and non-smoking samples was assessed by determining the P-value and false discovery rate (FDR). Genes with tau2=0, Qpval >0.05 and FDR <0.05 were considered to be DEGs associated with smoking. Bidirectional clustering analysis using the pheatmap package in R (https://cran.r-project.org/web/packages/pheatmap/index.html), based on the Euclidean distances between gene expression values, was also conducted to examine whether the selected DEGs were able to distinguish the samples of smokers from those of non-smokers, as described previously (20).
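The three cut-offs can be read as a simple filter over per-gene meta-analysis statistics. The sketch below assumes the statistics have already been computed elsewhere; the function name `select_degs`, the dictionary layout and the gene entries are hypothetical.

```python
def select_degs(stats, fdr_cut=0.05, qpval_cut=0.05):
    """Keep genes that are homogeneous across datasets (tau2 = 0, Qpval > cut)
    and significantly differentially expressed (FDR < cut)."""
    return [
        gene
        for gene, s in stats.items()
        if s["tau2"] == 0 and s["qpval"] > qpval_cut and s["fdr"] < fdr_cut
    ]

# Hypothetical statistics, for illustration only
stats = {
    "geneA": {"tau2": 0.0, "qpval": 0.90, "fdr": 0.01},   # homogeneous and significant
    "geneB": {"tau2": 0.0, "qpval": 0.90, "fdr": 0.20},   # homogeneous, not significant
    "geneC": {"tau2": 0.3, "qpval": 0.01, "fdr": 0.001},  # significant but heterogeneous
}
degs = select_degs(stats)
```

Only `geneA` passes all three conditions; a gene with a strong overall effect is still rejected when its expression is inconsistent between datasets.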

Construction of PPI network

To investigate the interactions among the DEGs, the DEGs were mapped to the PPI data in the Human Protein Reference Database (21). The interactions obtained were assembled into a PPI network, which also included the proteins that were connected with at least three DEGs. The network was visualized with Cytoscape (22).
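One possible reading of this construction rule is sketched below in Python. It assumes that an interaction is kept when both of its endpoints are either DEGs or non-DEG proteins with at least three distinct DEG partners; the source does not state how interactions between two non-DEG proteins were handled, so that choice, the function name `build_ppi_network` and the toy interaction list are all assumptions.

```python
def build_ppi_network(interactions, degs, min_deg_partners=3):
    """Return (nodes, edges) for DEGs plus non-DEG proteins linked to >= 3 DEGs."""
    degs = set(degs)
    # Count the distinct DEG partners of every non-DEG protein
    partners = {}
    for a, b in interactions:
        for x, y in ((a, b), (b, a)):
            if x not in degs and y in degs:
                partners.setdefault(x, set()).add(y)
    connectors = {p for p, ds in partners.items() if len(ds) >= min_deg_partners}
    allowed = degs | connectors
    # Undirected edges; self-interactions are ignored
    edges = {
        frozenset((a, b))
        for a, b in interactions
        if a != b and a in allowed and b in allowed
    }
    nodes = set().union(*edges) if edges else set()
    return nodes, edges

# Hypothetical DEGs (D1-D4) and interaction pairs, for illustration only
degs = ["D1", "D2", "D3", "D4"]
interactions = [
    ("D1", "D2"),
    ("X", "D1"), ("X", "D2"), ("X", "D3"),  # X touches three DEGs: kept
    ("Y", "D1"), ("Y", "D2"),               # Y touches two DEGs: dropped
    ("D4", "Z"),                            # Z touches one DEG: dropped
]
nodes, edges = build_ppi_network(interactions, degs)
```

In this toy case D4 has no retained interaction and therefore does not appear as a node, which mirrors how a network of 202 nodes can arise from a larger list of DEGs.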

Calculation of BC

Feature genes that function as hub nodes in the PPI network were screened using a BC algorithm (23). BC measures how often a node lies on the shortest paths between other nodes in the network and was calculated as follows:

$$\mathrm{BC}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where σst is the total number of shortest paths from node s to node t, and σst(v) is the number of shortest paths from s to t that pass through node v. BC scores were between 0 and 1, and a higher BC score indicated that the node was more central in the network.
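Betweenness centrality on an unweighted, undirected network can be computed with Brandes' algorithm. The sketch below is a plain Python illustration, not the tool used in the study; the normalization by (n-1)(n-2), which maps scores into the 0 to 1 range, is an assumption about how the reported scores were scaled, and the toy network is hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Normalized betweenness for an unweighted, undirected graph (Brandes' algorithm).

    adj maps each node to a list of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair was visited twice, so dividing by (n-1)(n-2) normalizes
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}

# Hypothetical star-shaped network: every shortest path between leaves crosses "hub"
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = betweenness_centrality(adj)
```

The hub receives the maximum score and the leaves receive zero, which is the behaviour that makes BC useful for ranking candidate feature genes.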

Training and validation of SVM classifier

An SVM classifier uses a set of feature genes to distinguish between different sample types (24,25). To construct the SVM classifier, one of the downloaded datasets, GSE43458 (containing 40 non-smokers and 40 smokers), was selected as the training dataset, and the classifier was trained based on the top 10, 20, 30, 40 and 50 feature genes ranked by BC scores. The feature genes in the SVM classifier that could exactly distinguish between the different samples in GSE43458 were subjected to two-way clustering analysis using the pheatmap package in R 3.1.4 (https://cran.r-project.org/web/packages/pheatmap/index.html). Sample similarity matrices were also obtained by computing Pearson's correlation coefficients for these genes using the cor function in R 3.1.4 (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html), and the top 50 genes were selected for further analysis. The clustering and similarity matrices were visualized as heatmaps using the pheatmap package in R 3.1.4 (https://cran.r-project.org/web/packages/pheatmap/index.html). The SVM classifier was validated with two independent datasets, GSE10072 and GSE12667. Sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and area under curve (AUC) were calculated using the pROC package in R (https://cran.r-project.org/web/packages/pROC/index.html) to examine the classification accuracy of the SVM classifier, as described previously (26,27).
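The AUC used to assess the classifier can be understood as the probability that a randomly chosen smoker sample receives a higher classifier score than a randomly chosen non-smoker sample. The following Python sketch computes it by direct pairwise comparison; it is a stand-in for illustration rather than the pROC routine, and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (assumed here to be smokers), 0 otherwise.
    Tied scores count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical decision scores for three smokers and two non-smokers
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)
```

Five of the six smoker/non-smoker pairs are ordered correctly in this toy example, so the AUC is 5/6; an AUC of 1 corresponds to perfect separation.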

Pathway enrichment analysis

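Kyoto Encyclopedia of Genes and Genomes pathways (http://www.genome.jp/kegg/) associated with the feature genes were identified using Fisher's exact test as follows:

$$p = 1 - \sum_{i=0}^{x-1} \frac{\binom{M}{i}\binom{N-M}{K-i}}{\binom{N}{K}}$$

where N represents the total number of genes, M represents the number of genes in the pathway, K represents the number of feature genes and x denotes the number of feature genes that fall in the pathway.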
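For a single pathway, the one-sided Fisher's exact test reduces to an upper-tail hypergeometric probability. The Python sketch below uses N, M and K as defined in the text; the symbol `x` (the number of feature genes observed in the pathway), the function name and the numbers in the example are assumptions introduced for illustration.

```python
from math import comb

def enrichment_p_value(N, M, K, x):
    """P(X >= x) when K genes are drawn without replacement from N genes,
    M of which belong to the pathway."""
    upper = min(M, K)
    favourable = sum(comb(M, i) * comb(N - M, K - i) for i in range(x, upper + 1))
    return favourable / comb(N, K)

# Hypothetical numbers: 20 genes in total, 5 in the pathway,
# 4 feature genes, 2 of which fall in the pathway
p = enrichment_p_value(N=20, M=5, K=4, x=2)
```

With these toy values, 1,205 of the 4,845 possible draws contain at least two pathway genes, so p is approximately 0.249 and the pathway would not be called enriched.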

Results

DEGs

A total of 12,476 genes were in the three gene expression datasets, and according to the set criteria, 347 DEGs between smoking and non-smoking LAC samples were identified. The top 10 DEGs ranked by FDR are listed in Table II. As indicated in Fig. 1, the 347 DEGs distinguished the samples of smokers from the non-smokers.

Table II.

Top 10 candidate feature genes by FDR.

ID	P-value	FDR	Qpval	Qval	Expression
ABCB11	1.52×10⁻⁰⁵	5.59×10⁻⁰⁴	9.27×10⁻⁰¹	8.28×10⁻⁰³	Up
ABCB6	2.27×10⁻⁰³	2.23×10⁻⁰²	9.63×10⁻⁰¹	2.18×10⁻⁰³	Up
ABCC2	2.11×10⁻⁰³	2.11×10⁻⁰²	9.51×10⁻⁰¹	3.83×10⁻⁰³	Up
ABCG5	4.01×10⁻⁰⁶	2.10×10⁻⁰⁴	9.00×10⁻⁰¹	1.58×10⁻⁰²	Up
ACD	4.81×10⁻⁰⁶	2.38×10⁻⁰⁴	9.40×10⁻⁰¹	5.71×10⁻⁰³	Up
ADAMTS5	1.88×10⁻⁰⁴	3.79×10⁻⁰³	9.10×10⁻⁰¹	1.26×10⁻⁰²	Up
AGT	2.89×10⁻⁰³	2.64×10⁻⁰²	8.47×10⁻⁰¹	3.71×10⁻⁰²	Up
AIM1L	1.00×10⁻²⁰	7.47×10⁻¹⁹	8.26×10⁻⁰¹	4.84×10⁻⁰²	Up
AKAP6	1.15×10⁻⁰⁴	2.69×10⁻⁰³	9.82×10⁻⁰¹	5.08×10⁻⁰⁴	Down
ALPL	5.01×10⁻⁰³	3.88×10⁻⁰²	9.06×10⁻⁰¹	1.40×10⁻⁰²	Down

FDR, false discovery rate; Up, upregulation in smokers; Down, downregulation in smokers.

Figure 1.

Hierarchical clustering results of lung adenocarcinoma samples from smokers and non-smokers according to the 347 differentially expressed genes. x-axis represents samples, in which samples of smokers are in purple whereas samples of non-smokers are in green; y-axis represents differentially expressed genes.

PPI network

A PPI network containing 202 nodes (genes) and 300 edges (connections between nodes) was obtained (Fig. 2). The proteins that were connected with ≥3 DEGs were also included in the PPI network. The degree distribution of genes in the network is indicated in Fig. 3. Similar to other biological networks, the PPI network was scale-free, with the majority of genes (80 genes) exhibiting small degrees (Log transformed degree <1) and few genes (only 5) exhibiting larger degrees (Log transformed degree between 3 and 4). The genes with high degrees were hub genes, suggesting that they have roles in the development of smoking-associated LAC.

Figure 2.

PPI network of differentially expressed genes identified between lung adenocarcinoma samples of smokers and non-smokers. Differential expression refers to samples of smokers compared with samples of non-smokers. Upregulated genes are marked in orange and downregulated genes are marked in blue. Non-differentially expressed genes that interacted with ≥3 differentially expressed genes were also included in the PPI network and are marked in green. PPI, protein-protein interaction.

Figure 3.

Degree distribution of nodes (genes) in the protein-protein interaction network of differentially expressed genes. x-axis indicates the Log transformed degree and y-axis indicates the number of nodes.

Feature genes

BC was calculated for each node in the PPI network. The top 10 genes by BC value were considered as the feature genes, including high mobility group box 1 (HMGB1); dynein light chain LC8-type 1 (DYNLL1); tubulin α 4a (TUBA4A); 14-3-3 protein γ (YWHAG); tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein θ (YWHAQ); spectrin β, non-erythrocytic 1 (SPTBN1); ubiquilin 4 (UBQLN4); DNA methyltransferase 1 (DNMT1); enhancer of zeste 2 polycomb repressive complex 2 subunit (EZH2); and glucocorticoid modulatory element binding protein 1 (GMEB1) (Table III).

Table III.

Top 10 genes ranked using BC.

Gene	BC	Expression	Degree	P-value	FDR	Qpval	Qval
HMGB1	1.98×10⁻⁰¹	Down	11	1.63×10⁻⁰³	1.76×10⁻⁰²	8.95×10⁻⁰¹	1.74×10⁻⁰²
DYNLL1	1.77×10⁻⁰¹	Up	15	1.00×10⁻²⁰	7.47×10⁻¹⁹	8.92×10⁻⁰¹	1.83×10⁻⁰²
TUBA4A	1.37×10⁻⁰¹	Up	10	2.08×10⁻⁰⁵	7.32×10⁻⁰⁴	8.50×10⁻⁰¹	3.56×10⁻⁰²
YWHAG	1.20×10⁻⁰¹	−	11	9.86×10⁻⁰¹	9.95×10⁻⁰¹	5.38×10⁻⁰⁶	2.07×10
YWHAQ	1.07×10⁻⁰¹	Up	10	2.40×10⁻⁰⁴	4.55×10⁻⁰³	8.46×10⁻⁰¹	3.77×10⁻⁰²
SPTBN1	1.04×10⁻⁰¹	Down	7	9.39×10⁻⁰⁴	1.23×10⁻⁰²	8.53×10⁻⁰¹	3.43×10⁻⁰²
UBQLN4	1.00×10⁻⁰¹	−	7	9.31×10⁻⁰¹	9.67×10⁻⁰¹	5.95×10⁻⁰²	3.55×10⁰
DNMT1	9.98×10⁻⁰²	Up	7	1.20×10⁻⁰⁴	2.77×10⁻⁰³	8.61×10⁻⁰¹	3.05×10⁻⁰²
EZH2	8.51×10⁻⁰²	Up	8	2.65×10⁻⁰³	2.48×10⁻⁰²	8.44×10⁻⁰¹	3.89×10⁻⁰²
GMEB1	8.45×10⁻⁰²	Down	8	2.31×10⁻⁰⁴	4.40×10⁻⁰³	9.32×10⁻⁰¹	7.20×10⁻⁰³

BC, betweenness centrality; Up, upregulation in smokers; Down, downregulation in smokers; −, no significant difference in expression; FDR, false discovery rate.

SVM classifier

Feature genes with the greatest BC values were used to construct the SVM classifier based on dataset GSE43458. There were 8, 11, 14, 16, 18, 20, 22 and 26 feature genes in the top 10, 15, 20, 25, 30, 35, 40 and 50 genes, respectively. The training process is indicated in Fig. 4. The accuracy of the classifier reached 100% when the 26 feature genes in the top 50 were included. Therefore, the classifier comprising these 26 feature genes was chosen as the final SVM classifier. These feature genes included Cbl proto-oncogene B (CBLB), DNMT1, EZH2, HMGB1, integrin α-5 (ITGA5), midkine (MDK), protein kinase C ι (PRKCI) and sprouty receptor tyrosine kinase signaling antagonist 2 (SPRY2).

Figure 4.

Predictive accuracy and error ratios of support vector machine classifier with different numbers of feature genes. Accuracy is indicated in light gray whereas error rate is indicated in dark gray.

Hierarchical clustering was performed for samples from the training dataset using the 26 feature genes (Fig. 5). The classifier separated samples of smokers from samples of non-smokers in dataset GSE43458 (Fig. 6A).

Figure 5.

Hierarchical clustering result from samples from smokers and non-smokers with lung adenocarcinoma using the 26 feature genes. x-axis represents samples, in which smokers were marked in orange and non-smokers were marked in purple; y-axis represents the 26 feature genes.

Figure 6.

Scatter plots of the support vector machine classifier on the three microarray datasets: (A) GSE43458, (B) GSE12667 and (C) GSE10072. Smokers are marked in red and non-smokers are marked in green.

The SVM classifier was validated using datasets GSE12667 and GSE10072. The classification accuracy in GSE12667 was 100% (Fig. 6B). In GSE10072, the classifier correctly identified 42 smokers (42/42, 100%) and 13 non-smokers (13/16, 81.25%), giving a total accuracy of 94.83% (55/58) (Fig. 6C; Table IV). The classifier therefore demonstrated high accuracies of 100, 100 and 94.83% in GSE43458, GSE12667 and GSE10072, respectively. Se, Sp, PPV, NPV and AUC were calculated (Table IV) and receiver operating characteristic curves were generated (Fig. 7).

Table IV.

Prediction results of the support vector machine classifier in the three datasets.

Dataset	Samples (n)	Accuracy (%)	Se	Sp	PPV	NPV	AUC
GSE43458	80	100	1	1	1	1	1
GSE12667	51	100	1	1	1	1	1
GSE10072	58	94.83	1	0.813	0.933	1	0.994

Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under curve.
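The GSE10072 row of Table IV can be checked from the counts given in the text (42/42 smokers and 13/16 non-smokers classified correctly). The Python sketch below assumes that smokers are treated as the positive class, which is consistent with the reported Se of 1 and Sp of 0.813; the function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "Se": tp / (tp + fn),   # smokers correctly called smokers
        "Sp": tn / (tn + fp),   # non-smokers correctly called non-smokers
        "PPV": tp / (tp + fp),  # predicted smokers that are smokers
        "NPV": tn / (tn + fn),  # predicted non-smokers that are non-smokers
    }

# GSE10072: 42 smokers all correct; 13 of 16 non-smokers correct (3 called smokers)
m = classification_metrics(tp=42, fn=0, tn=13, fp=3)
```

This gives an accuracy of 55/58 (94.83%), Sp of 13/16 (0.8125, reported as 0.813) and PPV of 42/45 (0.933), in agreement with the table.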

Figure 7.

Receiver operating characteristic curves of the support vector machine classifier for the three microarray datasets: (A) GSE43458, (B) GSE12667 and (C) GSE10072. AUC, area under curve.

Overrepresented biological pathways

The 26 feature genes were indicated to be significantly enriched in nine biological pathways (Table V): Extracellular matrix (ECM)-receptor interaction, proteoglycans in cancer, cell adhesion molecules, pathogenic Escherichia coli infection, p53 signaling pathway, microRNAs in cancer, bacterial invasion of epithelial cells, apoptosis and hematopoietic cell lineage.

Table V.

A total of 9 biological pathways significantly overrepresented by the 26 feature genes.

ID	Term	P-value	Genes
hsa04512	Extracellular matrix-receptor interaction	5.68×10⁻⁰³	SDC4, ITGA5, COL3A1
hsa05205	Proteoglycans in cancer	9.82×10⁻⁰³	SDC4, CBLB, ITGA5, SDC2
hsa04514	Cell adhesion molecules	2.17×10⁻⁰²	SDC4, CD4, SDC2
hsa05130	Pathogenic Escherichia coli infection	2.19×10⁻⁰²	YWHAQ, TUBA4A
hsa04115	p53 signaling pathway	3.21×10⁻⁰²	APAF1, BID
hsa05206	MicroRNAs in cancer	3.29×10⁻⁰²	EZH2, SPRY2, ITGA5, DNMT1
hsa05100	Bacterial invasion of epithelial cells	3.91×10⁻⁰²	CBLB, ITGA5
hsa04210	Apoptosis	4.86×10⁻⁰²	APAF1, BID
hsa04640	Hematopoietic cell lineage	4.96×10⁻⁰²	CD4, ITGA5

Discussion

In the present study, three gene expression datasets were obtained and a total of 347 DEGs were identified in samples from smokers with LAC compared with non-smokers with LAC using meta-analysis. A PPI network including 202 nodes and 300 edges was constructed, from which 26 feature genes were identified. The SVM classifier of these 26 genes separated smokers from non-smokers with an accuracy >94% in all three datasets. Pathway enrichment analysis demonstrated that these feature genes were primarily associated with cancer development- and metastasis-associated pathways, including ECM-receptor interaction, proteoglycans in cancer, cell adhesion molecules, p53 signaling pathway, microRNAs in cancer and apoptosis.

Owing to their generalization ability, SVMs have been widely used for data analysis, including data classification and function approximation (28–30). SVM classifiers have been demonstrated to determine whether one cancer sample type possesses gene expression signatures that are distinct from those of other sample types (31). In the present study, an SVM classifier with 26 feature genes successfully distinguished LAC samples of smokers from those of non-smokers using bioinformatics analysis. Yousef et al (32) previously conducted a similar study for the identification of biomarkers by integrating interaction networks and an SVM classifier, and obtained >90% accuracy in the classification of selected microarray datasets. Furthermore, a previous study also demonstrated that discriminant analysis based on an SVM classifier achieved satisfactory results in the classification of lung cancer samples (33).

Specific genes within the 26 feature genes have been implicated in lung cancer or LAC. CBLB is a regulator of T-cell response (34). It has been reported that single nucleotide polymorphisms of CBLB may predict the outcomes of definitive radiotherapy for non-small cell lung cancer (NSCLC) (34). CBLB is associated with icotinib-induced apoptosis and G1 phase arrest in epidermal growth factor receptor mutation-positive NSCLC (35). DNMT1 is responsible for maintaining methylation patterns following DNA replication and has an important role in the development of various types of cancer (36). DNA methylation alterations are recognized as key epigenetic changes in cancer, influencing chromosomal instability through global hypomethylation and aberrant gene expression via alterations in methylation levels (37). The tobacco-specific carcinogen nicotine-derived nitrosamine ketone induces the accumulation of DNMT1 in patients with lung cancer (38). Furthermore, DNMT1 inhibits the expression of the tumor suppressor Wnt7a in NSCLC (39). EZH2 is a member of the polycomb-group family, which is associated with maintaining the transcriptional repressive state of genes over successive cell generations (40). Yoon et al (41) previously suggested a correlation between genotype variants in EZH2 and reduced lung cancer risk. Additionally, Zhang et al (42) determined that miR-138 inhibited tumor growth through the repression of EZH2 in NSCLC. Notably, a recent study indicated that EZH2 silencing with RNA interference induced G2/M arrest in human lung cancer cells in vitro (43), and Wang et al (44) recently demonstrated that EZH2 overexpression was associated with a poor prognosis for patients with LAC. In the present study, EZH2 was upregulated in the samples of smokers, which suggests that EZH2 upregulation may result from smoking. HMGB1 has a role in tumor cell migration (45). Shen et al (46) indicated that the expression of HMGB1 correlates with the progression of NSCLC. ITGA5 is considered a prognostic indicator in NSCLC (47). MDK promotes cell growth, migration and angiogenesis, in particular during tumorigenesis (48). A previous study indicated that MDK protein overexpression is correlated with the malignant status and prognosis of NSCLC (49). Furthermore, MDK has been targeted as a therapeutic biomarker for lung cancer (50). PRKCI is required for lung tumorigenesis, as genetic loss of PRKCI inhibits Kras-initiated hyperplasia and subsequent lung tumor formation in vivo (51). SPRY2 inhibits cell migration and proliferation in NSCLC (52). In addition, a previous study has indicated that downregulation of SPRY2 in NSCLC contributes to tumor malignancy (53).

Smoking can cause LAC, and the incidence of this disease has increased in recent years (4). However, the reason for this increase and the mechanism underlying the smoking-associated development of LAC remain to be elucidated. The present study identified genes implicated in smoking-associated LAC, including CBLB, DNMT1, EZH2, HMGB1, ITGA5, MDK, PRKCI and SPRY2. Most of these genes have been reported in association with malignancy, and some have been associated with lung cancer. The identification of these characteristic genes may aid in elucidating the mechanism underlying smoking-associated LAC. Although further experiments, such as validation of the gene and protein expression levels in smoking and non-smoking LAC samples, were not performed owing to the limited availability of LAC samples, these results may provide useful information to other researchers in the field. In conclusion, a number of key genes have been revealed in smokers with LAC, and some of these have been implicated in lung cancer. However, the associations between the 26 feature genes, smoking and LAC remain to be fully elucidated with further studies.

48 in total

1. Screening of genes related to lung cancer caused by smoking with RNA-Seq.

Authors: C Zhou; H Chen; L Han; F Xue; A Wang; Y-J Liang
Journal: Eur Rev Med Pharmacol Sci Date: 2014 Impact factor: 3.507

2. Support vector machines coupled with proteomics approaches for detecting biomarkers predicting chemotherapy resistance in small cell lung cancer.

Authors: Mingyong Han; Jianjian Dai; Ying Zhang; Qi Lin; Man Jiang; Xiaoya Xu; Qi Liu; Jihui Jia
Journal: Oncol Rep Date: 2012-09-17 Impact factor: 3.906

Review 3. DNA methylation, chromatin inheritance, and cancer.

Authors: M R Rountree; K E Bachman; J G Herman; S B Baylin
Journal: Oncogene Date: 2001-05-28 Impact factor: 9.867

4. Cyclin D1 (CCND1) A870G gene polymorphism modulates smoking-induced lung cancer risk and response to platinum-based chemotherapy in non-small cell lung cancer (NSCLC) patients.

Authors: Oliver Gautschi; Barbara Hugli; Annemarie Ziegler; Colette Bigosch; Naomi L Bowers; Daniel Ratschiller; Monika Jermann; Rolf A Stahel; Jim Heighway; Daniel C Betticher
Journal: Lung Cancer Date: 2006-01-10 Impact factor: 5.705

5. Differences in epidemiology, histology, and survival between cigarette smokers and never-smokers who develop non-small cell lung cancer.

Authors: Ayesha Bryant; Robert James Cerfolio
Journal: Chest Date: 2007-06-15 Impact factor: 9.410

6. hOGG1, p53 genes, and smoking interactions are associated with the development of lung cancer.

Authors: Zhe Cheng; Wei Wang; Yong-Na Song; Yan Kang; Jie Xia
Journal: Asian Pac J Cancer Prev Date: 2012

7. The tobacco-specific carcinogen NNK induces DNA methyltransferase 1 accumulation and tumor suppressor gene hypermethylation in mice and lung cancer patients.

Authors: Ruo-Kai Lin; Yi-Shuan Hsieh; Pinpin Lin; Han-Shui Hsu; Chih-Yi Chen; Yen-An Tang; Chung-Fan Lee; Yi-Ching Wang
Journal: J Clin Invest Date: 2010-01-19 Impact factor: 14.808

8. The expression of high-mobility group protein box 1 correlates with the progression of non-small cell lung cancer.

Authors: Xiaokun Shen; Lingzhi Hong; Huiming Sun; Minke Shi; Yong Song
Journal: Oncol Rep Date: 2009-09 Impact factor: 3.906

Review 9. Worldwide trend of increasing primary adenocarcinoma of the lung.

Authors: Haruhiko Nakamura; Hisashi Saji
Journal: Surg Today Date: 2013-06-11 Impact factor: 2.549

10. Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method.

Authors: Peng Guan; Desheng Huang; Miao He; Baosen Zhou
Journal: J Exp Clin Cancer Res Date: 2009-07-18