
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

Abstract

The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. 
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.

Year: 2018 PMID: 29328377 PMCID: PMC5802200 DOI: 10.3892/mmr.2018.8398

Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952

Introduction

Breast cancer is one of the most commonly diagnosed types of cancer, accounting for one-third of cancer cases in the USA (1). The survival rate of breast cancer has improved steadily with the development of early diagnosis and adjuvant therapy; however, the overall survival of patients with metastatic disease remains poor (2). It has been estimated that >90% of breast cancer mortalities are associated with tumor metastasis (3,4). Metastasis is associated with poor patient prognosis and an acceleration of carcinoma progression (5). Brain, bone, lungs and liver are the most frequently targeted organs for breast cancer metastasis, and the tumor microenvironment is considered to be a critical regulator of the metastatic process (6). A comprehensive understanding of metastasis progression is very important for identifying novel therapeutic strategies to prevent metastatic disease.

The MetaOmics software in R comprises the MetaDE, MetaQC and MetaPath packages. The MetaDE package primarily contains 12 state-of-the-art genomic meta-analysis methods to detect differentially expressed genes (7). The MetaQC package is a quantitative and objective tool for determining the inclusion/exclusion criteria for meta-analysis (8). The MetaDE and MetaQC packages have been widely used for data mining from microarray profiles. Fc fragment of immunoglobulin G binding protein, for example, has been reported as a candidate metastasis-associated gene using the integrated method of MetaDE and survival analysis (9). As an effective classifier for identification, the support vector machine (SVM) classifier is well suited for signature modeling (10). Guyon et al (11) applied the SVM classifier to select feature genes from DNA microarrays, and the selected genes were shown to yield better classification performance. Fan et al (10) demonstrated that the SVM classifier for feature gene selection was able to speed up the classification process and improve the generalization performance.

In the present study, several microarray profiles of breast cancer samples (including metastatic and non-metastatic samples) were downloaded to investigate the feature genes in metastatic samples. An SVM classifier was constructed to identify feature genes, and was validated using another independent gene expression dataset from The Cancer Genome Atlas (TCGA) database.

Materials and methods

Processing of microarray data

Expression profiles matching the search terms ‘breast cancer’, ‘homo sapiens’ and ‘metastasis’ in the Gene Expression Omnibus (GEO; www.ncbi.nlm.nih.gov/geo/) database were screened on 22nd April 2016. The profiles were selected using the following filtering criteria: i) The data were gene expression microarray data; ii) the data were collected from cancerous tissue samples or cancerous-metastasis samples; and iii) the metastatic statuses of the samples were clearly recorded. A total of 5 microarray profiles were retrieved from the GEO database (Table I). The GSE46928, GSE43837, GSE46826, GSE39494 and GSE29431 profiles had a total of 52, 38, 28, 10 and 31 samples, respectively; these in turn included 11, 19, 22, 5 and 13 metastatic samples, respectively.

Table I.

Basic information of downloaded microarray data.

GEO accession	Chip	Probe number	Total sample number	Non-metastasis samples	Metastasis samples
GSE46928	HG-U133A	22,283	52	41	11
GSE43837	U133_X3P	61,360	38	19	19
GSE46826	Agilent-021924	62,977	28	6	22
GSE39494	Agilent-014850	41,000	10	5	5
GSE29431	HG-U133_Plus_2	54,675	31	18	13

GEO, Gene Expression Omnibus.

For GSE46928, GSE43837 and GSE29431 datasets based on the Affymetrix platform (Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA), the raw data were used to perform background correction via Affymetrix microarray software Affy version 1.42.3 (https://bioconductor.org/packages/release/bioc/html/affy.html) in R version 3.1.0, and normalization via the quantiles method (12). For GSE46826 and GSE39494 datasets based on the Agilent platform (Agilent Technologies, Inc., Santa Clara, CA, USA), the gene names in the microarray data were identified according to Agilent platform. Then, the average values were used as the expression levels of genes corresponding to multiple probes. The Limma package 3.22.1 (13) (https://bioconductor.org/packages/release/bioc/html/limma.html) was used for the normalization of these data.
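The quantile normalization step mentioned above can be illustrated with a minimal plain-Python sketch. It is not the affy implementation: the function name `quantile_normalize` and the toy expression values are assumptions introduced here, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize equally long sample columns (one list per array).

    Simplification: tied values are ranked by their original position
    instead of receiving the average of the tied rank means.
    """
    n = len(columns[0])
    # Sort every sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]
    normalized = []
    for col in columns:
        # order[r] is the index of the value holding rank r in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: three arrays, four probes each.
samples = [[5.0, 2.0, 3.0, 4.0],
           [4.0, 1.0, 4.0, 2.0],
           [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(samples)
```

After normalization every array shares the same set of values (the rank means), so between-array differences in overall intensity distribution are removed while the within-array ordering of probes is kept.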

Screening of feature genes

All of the selected datasets were merged into a single dataset for the screening of feature genes using MetaDE.ES in the MetaDE package 1.0.5 (14). First, principal component analysis and standardized mean rank methods in the MetaQC package (8) were applied for quality control (QC) of the merged dataset from the different profiles. In this process, the following parameters were used: Internal QC, external QC, accuracy QC (AQCg), precision of AQCg, consistency QC (CQCg) and precision of CQCg. Tests for heterogeneity were then performed to determine the differences in gene expression among the different datasets; Qpval >0.05 and tau2=0 were used as the criteria for homogeneous genes. Finally, the differentially expressed genes (DEGs) between metastatic samples and non-metastatic samples in the dataset were identified under the threshold of P<0.05; these were considered as feature genes in the following analysis.
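The heterogeneity criteria (Qpval >0.05 and tau2=0) can be illustrated with a short sketch. It assumes Cochran's Q statistic with inverse-variance weights and the DerSimonian-Laird estimator of tau2; the source does not state which estimator MetaDE.ES applied, and the function names and per-study effect sizes below are hypothetical. The chi-square tail probability uses a closed form that is valid only for even degrees of freedom, which covers the case of 5 datasets (4 degrees of freedom).

```python
import math


def cochran_q(effects, variances):
    """Return Cochran's Q and the inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, weights


def dl_tau2(q, weights):
    """DerSimonian-Laird between-study variance, truncated at zero."""
    k = len(weights)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - (k - 1)) / c)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical effect sizes and variances for one gene in 5 datasets.
effects = [0.5, 0.6, 0.4, 0.55, 0.45]
variances = [0.04] * 5
q, weights = cochran_q(effects, variances)
tau2 = dl_tau2(q, weights)
q_pval = chi2_sf_even_df(q, df=len(effects) - 1)
is_homogeneous = q_pval > 0.05 and tau2 == 0.0
```

As a consistency check, `chi2_sf_even_df(1.4919, 4)` gives approximately 0.828, which agrees with the Qp value of 0.8281 reported for NCAPH (Q=1.4919) in Table III.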

Construction of the protein-protein interaction (PPI) network

The interactions between human genes in the Biological General Repository for Interaction Datasets (thebiogrid.org/, BioGRID Version 3.4.154 Released) (15), Human Protein Reference Database (www.hprd.org/, HPRD Release 9) (16) and Biomolecular Interaction Network Database (BIND 2.0) (17) were downloaded. The screened feature genes were then mapped onto the downloaded interactions to obtain the PPI network, which was visualized using Cytoscape 3.6.0 software (18). The degree (the number of connections with other genes) and the betweenness centrality (BC) value of each feature gene in the network were calculated. The following formula was used for calculating BC: $BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(v)$ is the number of those paths that pass through node v. A high BC value indicates a high degree of centrality of the feature gene in the network.
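To make the BC calculation concrete, the following is a minimal sketch of betweenness centrality on an unweighted, undirected network, using Brandes' algorithm in plain Python. It is not the routine used by Cytoscape; the function name is an assumption, and the toy network contains only the three CDK2 interactions reported in this study (CDKN1A, E2F1 and MYC), not the full 307-node network.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized BC for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}   # number of shortest paths from s
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                        # accumulate path dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair (s, t) was counted from both ends.
    return {v: value / 2.0 for v, value in bc.items()}


# Toy network: CDK2 and its three reported interaction partners.
toy_network = {
    "CDK2": ["CDKN1A", "E2F1", "MYC"],
    "CDKN1A": ["CDK2"],
    "E2F1": ["CDK2"],
    "MYC": ["CDK2"],
}
bc_values = betweenness_centrality(toy_network)
```

In this toy network CDK2 lies on the single shortest path between each of the three pairs of partners, so its raw BC is 3 and the partners have a BC of 0. Dividing by (N-1)(N-2)/2 for a network of N nodes rescales the value to the range 0-1, which is presumably how the values in Table IV were scaled.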

Establishment of the SVM classifier

Feature genes were ranked according to their BC values, and those that were present in the most qualified samples were collected as the training dataset for the establishment of the SVM classifier. The remaining feature genes were used as the verification datasets for the classifier. The feature genes in the SVM classifier were used to perform the two-way clustering of samples and expression levels. The clustering results were visualized using a heatmap (19). The aim of the constructed SVM classifier was to distinguish whether the cancer had metastasized by analyzing the primary cancer samples. A set of microarray data from breast cancer samples (https://cancergenome.nih.gov/) was downloaded from TCGA (tcga-data.nci.nih.gov/docs/publications/tcga/) for further clarification. In total, 597 samples were included in the dataset, among which 178 samples had clinical information regarding metastasis status, follow-up time and the clinical outcomes. There were 35 metastatic samples and 143 non-metastatic samples.

Function and pathway enrichment

Fisher's test was utilized with the ‘runHyperKEGG’ and ‘runHyperGO’ functions of the Easy Microarray Data Analysis package 1.4.4 (20) for the function and pathway enrichment of feature genes. P<0.05 was set as the cut-off criterion.
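The enrichment test can be sketched as a one-sided hypergeometric (Fisher-type) test. The internals of ‘runHyperKEGG’ and ‘runHyperGO’ are not described in this article, so the following is an assumption about the general approach rather than a reproduction of those functions; the function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): probability of at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


# Hypothetical counts: a 30-gene list, 3 of which fall in a pathway that
# contains 120 of 20,000 background genes.
p_value = hypergeom_upper_tail(k=3, n=30, K=120, N=20000)
is_enriched = p_value < 0.05
```

The P<0.05 cut-off is then applied to the resulting tail probability for each term or pathway.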

Results

Feature gene selection

The QC results of all 5 microarray profiles are displayed in Fig. 1 and Table II; the results indicated good quality within all datasets. Next, using the MetaDE package, 541 feature genes were identified, and the top 10 ranked by P-value were non-SMC condensin I complex subunit H, small nuclear ribonucleoprotein U11/U12 subunit 25, cellular retinoic acid binding protein 2, guanosine triphosphate binding protein 2, homer scaffolding protein 2, family with sequence similarity 64 member A, WD repeat domain (WDR) 45, dual specificity tyrosine phosphorylation regulated kinase 4, chromosome 12 open reading frame 10 and H2A histone family member Z (Table III).

Figure 1.

Quality control results of the merged datasets from 5 microarray profiles (marked as 1–5) obtained via MetaQC analysis. The first principal component is presented on the x-axis, while the second principal component is shown on the y-axis. QC, quality control; IQC, internal QC; EQC, external QC; AQCg, accuracy QC; AQCp, precision of AQCg; CQCg, consistency QC; CQCp, precision of CQCg.

Table II.

Results of quality control parameters and standardized mean rank.

Microarray profile	IQC	EQC	CQCg	CQCp	AQCg	AQCp	SMR
GSE46928	4.91	4.78	93.87	148.67	153.83	56.44	2.42
GSE43837	5.12	5.00	52.41	101.36	184.06	39.30	1.57
GSE46826	4.56	4.22	68.15	146.58	106.19	29.43	4.83
GSE39494	2.16	2.92	21.58	64.14	46.61	33.90	7.17
GSE29431	3.19	4.16	43.66	89.52	113.24	31.16	3.36

QC, quality control; IQC, internal QC; EQC, external QC; AQCg, accuracy QC; AQCp, precision of AQCg; CQCg, consistency QC; CQCp, precision of CQCg; SMR, standardized mean rank.

Table III.

Top 10 feature genes selected using the MetaDE package.

Gene	P-value	Q	Qp	Exp
NCAPH	4.17×10⁻⁵	1.4919	0.8281	1
SNRNP25	1.20×10⁻⁴	3.8687	0.4241	1
CRABP2	1.55×10⁻⁴	0.5088	0.9726	1
GTPBP2	3.51×10⁻⁴	0.4245	0.9804	1
HOMER2	3.74×10⁻⁴	3.4071	0.4921	1
FAM64A	3.93×10⁻⁴	2.5196	0.6411	1
WDR45	4.34×10⁻⁴	2.5287	0.6395	1
DYRK4	4.61×10⁻⁴	1.4036	0.8436	1
C12orf10	4.92×10⁻⁴	2.7885	0.5938	1
H2AFZ	5.19×10⁻⁴	3.0197	0.5545	1

NCAPH, non-SMC condensin I complex subunit H; SNRNP25, small nuclear ribonucleoprotein U11/U12 subunit 25; CRABP2, cellular retinoic acid binding protein 2; GTPBP2, guanosine triphosphate binding protein 2; HOMER2, homer scaffolding protein 2; FAM64A, family with sequence similarity 64 member A; WDR45, WD repeat domain 45; DYRK4, dual specificity tyrosine phosphorylation regulated kinase 4; C12orf10, chromosome 12 open reading frame 10; H2AFZ, H2A histone family member Z.

PPI network of feature genes

The PPI network of feature genes comprised 307 nodes (feature genes) and 586 lines (interactions; Fig. 2). There were 220 nodes (shown in green) that exhibited higher expression levels in metastatic samples, as well as 87 nodes (shown in purple) that exhibited lower expression levels in metastatic samples when compared with non-metastatic samples. As shown in Fig. 3, 168 genes exhibited a log (degree) of 0–1 and only 5 genes exhibited a log (degree) of >3 in the network. In addition, the top 30 genes with the highest BC values are listed in Table IV. The top 10 feature genes were Nuclear RNA Export Factor 1 (NXF1), cyclin-dependent kinase 2 (CDK2), myelocytomatosis proto-oncogene protein (MYC), Cullin 5 (CUL5), SHC Adaptor Protein 1 (SHC1), Clathrin heavy chain (CLTC), Nucleolin (NCL), WDR1, proteasome 26S subunit, non-ATPase 2 (PSMD2) and telomeric repeat binding factor 2 (TERF2; Table IV). Among these feature genes, the CDK inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1) and MYC interacted with CDK2.

Figure 2.

Protein-protein interaction network of feature genes. Green nodes are the genes that exhibited higher expression in metastatic samples, while the purple nodes are those that exhibited lower expression in metastatic samples when compared with non-metastatic samples.

Figure 3.

Distribution of node degrees in the protein-protein interaction network of feature genes. The x-axis is the log (degree) value and the y-axis is the corresponding node numbers to the degree.

Table IV.

Top 30 feature genes with the highest betweenness centrality in the protein-protein interaction network.

Gene	BC	EXP	Degree	P-value	Q	Qp
NXF1	0.3864	1	66	3.43×10⁻²	3.7163	0.4458
CDK2	0.2047	0	44	3.33×10⁻²	2.2882	0.6829
MYC	0.1382	1	27	4.91×10⁻²	3.4827	0.4805
CUL5	0.1006	1	21	2.86×10⁻²	3.0080	0.5565
SHC1	0.0974	1	16	1.60×10⁻²	1.1518	0.8860
CLTC	0.0783	0	20	2.66×10⁻²	2.8154	0.5892
NCL	0.0568	1	15	9.12×10⁻⁴	1.3121	0.8593
WDR1	0.0532	1	8	8.49×10⁻³	2.5722	0.6318
PSMD2	0.0476	1	13	8.31×10⁻⁴	3.4061	0.4923
TERF2	0.0460	0	11	1.65×10⁻²	0.3161	0.9888
RUVBL1	0.0450	1	13	2.51×10⁻²	0.8904	0.9259
PRDX1	0.0394	1	10	4.09×10⁻²	2.0057	0.7347
PTEN	0.0334	0	12	1.99×10⁻³	3.5056	0.4770
HDGF	0.0313	1	10	3.93×10⁻²	3.4475	0.4859
RUNX1T1	0.0291	0	4	2.88×10⁻²	0.2956	0.9901
IQCB1	0.0283	1	12	1.20×10⁻³	0.7995	0.9385
AKT1	0.0273	1	15	3.26×10⁻³	2.0318	0.7299
APEX1	0.0268	1	6	1.09×10⁻²	1.8543	0.7625
TSR1	0.0263	0	7	2.06×10⁻²	2.2661	0.6870
TUBB2A	0.0258	1	9	1.18×10⁻²	3.4922	0.4791
ETS1	0.0257	0	5	4.11×10⁻³	3.2520	0.5166
PSMC5	0.0249	1	11	1.85×10⁻²	2.7803	0.5952
RUNX1	0.0248	0	4	4.45×10⁻²	2.3257	0.6761
SMAD9	0.0242	0	6	3.52×10⁻²	1.3518	0.8525
STAU1	0.0239	1	14	1.33×10⁻²	1.7706	0.7779
DBN1	0.0235	1	13	2.31×10⁻³	2.1547	0.7073
SNCA	0.0229	0	10	2.51×10⁻²	2.9088	0.5732
CDKN1A	0.0226	0	12	1.48×10⁻²	3.7775	0.4369
SLC25A1	0.0223	1	2	2.22×10⁻²	1.1438	0.8873
NOS2	0.0222	0	9	4.71×10⁻²	1.0560	0.9012

EXP is the expression value ratio of genes between metastatic samples and non-metastatic samples; values of 1 represent high expression in metastatic samples and values of 0 represent high expression in non-metastatic samples. BC, betweenness centrality.

SVM classifier

Feature genes ranked by BC value were selected in increments of 10, from the top 10 to the top 50, for the construction of the SVM classifier. The dataset GSE46928, which had the largest sample size, was used as the training dataset. As shown in Fig. 4A, the accuracy of the SVM classifier improved with the increasing number of genes and the accuracy stabilized at 100% once the top 30 genes were selected. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from the non-metastatic samples with high accuracy (Fig. 4B). The selected 30 genes were considered to be the critical biomarkers for metastatic breast cancer, and included protein kinase B serine/threonine kinase 1 (AKT1), CDKN1A, ETS proto-oncogene 1 transcription factor (ETS1), runt related transcription factor 1 (RUNX1), RUNX1 translocation partner 1 (RUNX1T1), nitric oxide synthase 2 (NOS2), MYC, phosphatase and tensin homolog (PTEN) and CDK2. Clustering analysis of these 30 feature genes and the samples in GSE46928 demonstrated that these genes have significantly different expression levels between the metastatic and non-metastatic samples (Fig. 5).
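The procedure of training a classifier on an increasing number of top-ranked genes can be sketched as follows. This is an illustration under stated assumptions only: the article does not report the SVM software, kernel or parameters, so a linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss is used here, and the function names, hyperparameters and four-sample expression matrix are hypothetical.

```python
def train_linear_svm(rows, labels, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.

    rows: list of feature vectors; labels: +1 (metastatic) or -1.
    """
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # sample lies inside or beyond the margin
                for j in range(d):
                    grad_w[j] += y * x[j]
                grad_b += y
        w = [wj - lr * (lam * wj - gj / n) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def accuracy(rows, labels, w, b):
    """Fraction of samples whose predicted sign matches the label."""
    preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
             for x in rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical data: rows are samples, columns are genes already ordered by
# decreasing BC. The first gene is uninformative, the second is informative.
expression = [[1.0, 2.0], [-1.0, 3.0], [1.0, -2.0], [-1.0, -3.0]]
labels = [1, 1, -1, -1]

accuracy_by_k = {}
for k in (1, 2):  # the study used k = 10, 20, 30, 40 and 50
    subset = [row[:k] for row in expression]
    w, b = train_linear_svm(subset, labels)
    accuracy_by_k[k] = accuracy(subset, labels, w, b)
```

In this toy case the accuracy rises from 0.5 with the top gene alone to 1.0 once the second gene is included, which mirrors the shape of the curve described for Fig. 4A. Note that the accuracy here is measured on the training samples themselves, so it is an optimistic estimate.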

Figure 4.

Accuracy and efficacy of the support vector machine classifier. (A) The accuracy and error ratio of the classifier at different gene numbers (top 10 to top 50). (B) The classification efficacy of the classifier constructed using the top 30 genes for samples in the GSE46928 dataset. Non-metastatic samples are marked in black and the metastatic samples are marked in red.

Figure 5.

Clustering heatmap of the top 30 genes and samples in the training dataset. The color gradient from red to green represents the changes in expression level from high to low. The bars represent the samples (orange refers to metastatic samples; purple refers to non-metastatic samples). Met, metastatic samples; Non, non-metastatic samples.

The classification efficacy of the constructed classifier was also tested on the other 4 microarray datasets (Fig. 6). All samples in GSE39494 (Fig. 6B) and GSE46826 (Fig. 6D) were correctly distinguished, and only 3 samples in GSE29431 (Fig. 6A) and 4 samples in GSE43837 (Fig. 6C) were misclassified. Overall, the SVM classifier displayed good performance in terms of distinguishing between metastatic and non-metastatic samples. The correct rate, specificity, positive predictive value (PPV) and negative predictive value (NPV) were >0.89, sensitivity was >0.84 and the area under the receiver operating characteristic curve (AUROC) was >0.96 (Table V).

Figure 6.

Classification results on other microarray profiles, including (A) GSE29431, (B) GSE39494, (C) GSE43837 and (D) GSE46826. Non-metastatic samples are marked in black and metastatic samples are marked in red. The receiver operating characteristic curves of the classifier are displayed on the right-hand side. AUC, area under the curve.

Table V.

Classification effect evaluation of the support vector machine classifier.

Dataset	Number of samples	Correct rate	Sensitivity	Specificity	PPV	NPV	AUROC
GSE29431	31	0.903	0.846	0.944	0.917	0.895	0.975
GSE39494	10	1	1	1	1	1	1
GSE43837	38	0.895	0.895	0.895	0.895	0.895	0.965
GSE46826	28	1	1	1	1	1	1

PPV, positive predictive value; NPV, negative predictive value; AUROC, area under the receiver operating characteristic curve.
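The measures in Table V all follow from the four cells of a confusion matrix, as the sketch below shows. The article does not report the confusion matrices; the counts used here are an assumption, chosen for a dataset with 13 metastatic and 18 non-metastatic samples (the composition of GSE29431 in Table I) with 3 misclassified samples, because they reproduce the values 0.903, 0.846, 0.944, 0.917 and 0.895 listed in the table.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard performance measures from confusion-matrix counts.

    tp/fn: metastatic samples classified correctly/incorrectly;
    tn/fp: non-metastatic samples classified correctly/incorrectly.
    """
    return {
        "correct_rate": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed counts (not reported in the source): 2 metastatic samples missed
# and 1 non-metastatic sample called metastatic.
metrics = classification_metrics(tp=11, fn=2, tn=17, fp=1)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

With these counts, `rounded` contains a correct rate of 0.903, sensitivity of 0.846, specificity of 0.944, PPV of 0.917 and NPV of 0.895.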

An independent dataset of breast cancer samples was downloaded from the TCGA database to test the classification effect of the constructed classifier (Fig. 7). The results revealed an accuracy of 94.38% (168/178) in 35 metastatic samples and 143 non-metastatic samples, with an AUROC of 0.958 (Fig. 7B). Based on the 30 feature genes, the survival time of patients with metastatic breast cancer was significantly shorter than that of patients with non-metastatic breast cancer, and the survival status was worse (Fig. 7C).
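The two summary figures reported for the TCGA validation can be illustrated briefly. The accuracy is the fraction of correctly classified samples, and the AUROC equals the probability that a randomly chosen metastatic sample receives a higher classifier score than a randomly chosen non-metastatic sample. The sketch below computes the AUROC by this pairwise definition; the function name and the decision scores are hypothetical, since the per-sample scores are not given in the article.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Reported counts: 168 of 178 TCGA samples were classified correctly.
accuracy_percent = round(100 * 168 / 178, 2)

# Hypothetical scores: 1 = metastatic, 0 = non-metastatic.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = auroc(scores, labels)
```

In the toy example, 8 of the 9 metastatic/non-metastatic pairs are ordered correctly, giving an AUROC of 8/9. Because the TCGA set is imbalanced (35 vs. 143 samples), the AUROC is a useful complement to the raw accuracy, which a classifier could inflate by favoring the majority class.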

Figure 7.

Classification effect of the support vector machine classifier on an independent sample from The Cancer Genome Atlas database. (A) The spot graph of the different samples (non-metastatic samples are marked in black and metastatic samples are marked in red). (B) The receiver operating characteristic curve and (C) the survival curve. AUC, area under the curve.

The 30 feature genes in the SVM classifier were utilized for function and pathway enrichment. The results indicated that cell cycle associated functions and pathways were the most significant terms (Fig. 8; Table VI).

Figure 8.

Enriched functions of the 30 feature genes. Gene numbers are displayed on the x-axis. The color represents the -log (P-value) and the changes from red to blue represents high -log (P-value) to low -log (P-value).

Table VI.

Enriched pathways of the 30 feature genes.

Pathway	P-value	Genes
hsa05200: Pathways in cancer	1.11×10⁻⁵	AKT1, CDKN1A, ETS1, RUNX1T1, NOS2, RUNX1, MYC, PTEN, CDK2
hsa04012: ErbB signaling pathway	3.85×10⁻³	AKT1, CDKN1A, SHC1, MYC
hsa04115: p53 signaling pathway	2.60×10⁻²	CDKN1A, PTEN, CDK2
hsa04110: Cell cycle	2.81×10⁻⁵	CDKN1A, MYC, CDK2

AKT1, protein kinase B serine/threonine kinase 1; CDKN1A, cyclin-dependent kinase inhibitor 1A; ETS1, ETS proto-oncogene 1 transcription factor; RUNX1, runt related transcription factor 1; RUNX1T1, RUNX1 translocation partner 1; NOS2, nitric oxide synthase 2; MYC, myelocytomatosis proto-oncogene protein; PTEN, phosphatase and tensin homolog; ErbB, Erb-B2 receptor tyrosine kinase 2; SHC1, SHC Adaptor Protein 1.

Discussion

As breast cancer metastasis accounts for the majority of breast cancer mortalities, there have been a number of reports analyzing DEGs associated with metastasis in breast cancer. Some previous studies have identified markers associated with metastasis using a protein-network based approach (21–23). Walsh et al (24) identified tripartite motif containing 25 as a key determinant of breast cancer metastasis using an integrated transcriptional interaction network. In the present study, the MetaQC package was first applied to conduct QC tests for the different profiles, as MetaQC is a quantitative and objective tool for determining the inclusion/exclusion criteria for meta-analysis (8). The DEGs between metastatic and non-metastatic samples in the dataset were identified using the MetaDE package, which contains 12 state-of-the-art genomic meta-analysis methods that detect DEGs (7).

In the present study, a total of 541 feature genes were identified between metastatic and non-metastatic samples. The PPI network of DEGs was constructed and comprised 307 feature genes and 586 interactions, among which 220 nodes exhibited higher expression levels in metastatic samples and 87 nodes exhibited lower expression levels in metastatic samples when compared with non-metastatic samples. Feature genes were ranked according to their BC, which quantifies the importance of a vertex within a graph (25,26). The top 10 genes with the highest BC values included NXF1, CDK2, MYC, CUL5, SHC1, CLTC, NCL, WDR1, PSMD2 and TERF2. CDKN1A, E2F1 and MYC were the genes that interacted with CDK2. Then, the SVM classifier of screened feature genes was constructed to evaluate the classification performance. The SVM classifier constructed by the top 30 feature genes (which included AKT1, CDKN1A, ETS1, RUNX1T1, NOS2, RUNX1, MYC, PTEN and CDK2, for example) was able to distinguish metastatic samples from the non-metastatic samples; this was supported by the clustering analysis. Overall, the classifier displayed good performance with a correct rate, specificity, PPV and NPV of >0.89, sensitivity >0.84 and an AUROC of >0.96. The verification on an independent dataset exhibited an accuracy of 94.38% and an AUROC of 0.958 for the 35 metastatic samples and 143 non-metastatic samples. The survival time of the patients with metastatic samples was revealed to be shorter than that of the patients with non-metastatic samples, based on the analysis of these 30 feature genes.

Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. CDK2 is reported to exert important roles in cell cycle regulation and is associated with tumor aggressiveness and poor prognosis (27,28). Kim et al (29) demonstrated that the specific activity of CDK2 could be used as a prognostic indicator for early breast cancer. Roesley et al (30) also identified that CDK2 phosphorylates breast cancer metastasis suppressor 1 (BRMS1) on Serine 237 and that mutation of this residue can prevent BRMS1 from suppressing cell migration. In addition, sirtuin 2 (SIRT2)-mediated inhibition of the migration of fibroblasts can be antagonized by the CDK2-induced SIRT2 phosphorylation (31). CDKN1A (also known as p21), one of the CDK inhibitor genes, contributes to cell cycle progression (32). Variant genotypes of CDKN1A were observed to be associated with an increased risk of breast cancer in the Chinese female population (33). When mammalian cells are exposed to DNA damaging agents, CDKN1A inhibits cyclin/CDK2 complexes and participates in mediating growth arrest (34). The CDK2/CDKN1A ratio is considered to be a predictive factor of major clinical events in patients with oral squamous cell carcinoma (35). E2F1 is a target of cellular (c)-Myc that promotes cell cycle progression (36). The E2F1 mRNA levels are a strong determinant of clinical outcome in primary breast cancer (37). The CDK2-E2F1 signaling pathway exerts a pivotal role in regulating the G1 to S phase transition in the cell cycle (38). The interactions between CDK2/CDKN1A and CDK2/E2F1 identified in the present study indicated that they may influence the metastasis of breast cancer via their effect on the cell cycle.

The proto-oncogene c-MYC encodes a transcription factor that regulates cell growth, proliferation and apoptosis. c-MYC is commonly amplified in breast cancer and promotes the phenotypic transformation of mammary cells by synergistically interacting with transforming growth factor α (39). MYC gene amplification is often acquired in lethal distant breast cancer metastases of unamplified primary tumors (40), and the overexpression of MYC significantly decreased the metastasis of breast cancer cells to lung (41).

In conclusion, in the present study an SVM classifier was constructed to assess the possibility of breast cancer metastasis, which exhibited high accuracy in several independent datasets. The CDK2, CDKN1A, E2F1 and MYC genes were highlighted as the potential feature genes for metastatic breast cancer, which may interact synergistically by influencing the cell cycle. The results provided some potential markers for breast cancer metastasis, which may also be prospective precise treatment targets for metastatic breast cancer. In the group's future studies, the expression levels of the potential feature genes will be validated in clinical samples by reverse transcription-quantitative polymerase chain reaction or immunohistochemical staining.

35 in total

1. BIND: the Biomolecular Interaction Network Database.

Authors: Gary D Bader; Doron Betel; Christopher W V Hogue
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

Review 2. E2F1 and c-Myc in cell growth and death.

Authors: Itaru Matsumura; Hirokazu Tanaka; Yuzuru Kanakura
Journal: Cell Cycle Date: 2003 Jul-Aug Impact factor: 4.534

3. Water extract of Hedyotis Diffusa Willd suppresses proliferation of human HepG2 cells and potentiates the anticancer efficacy of low-dose 5-fluorouracil by inhibiting the CDK2-E2F1 pathway.

Authors: Xu-Zheng Chen; Zhi-Yun Cao; Tuan-Sheng Chen; You-Quan Zhang; Zhi-Zhen Liu; Yin-Tao Su; Lian-Ming Liao; Jian Du
Journal: Oncol Rep Date: 2012-05-25 Impact factor: 3.906

4. Cancer statistics, 2010.

Authors: Ahmedin Jemal; Rebecca Siegel; Jiaquan Xu; Elizabeth Ward
Journal: CA Cancer J Clin Date: 2010-07-07 Impact factor: 508.702

5. Determination of the specific activity of CDK1 and CDK2 as a novel prognostic indicator for early breast cancer.

Authors: S J Kim; S Nakayama; Y Miyoshi; T Taguchi; Y Tamaki; T Matsushima; Y Torikoshi; S Tanaka; T Yoshida; H Ishihara; S Noguchi
Journal: Ann Oncol Date: 2007-10-22 Impact factor: 32.976

Review 6. Microenvironmental regulation of metastasis.

Authors: Johanna A Joyce; Jeffrey W Pollard
Journal: Nat Rev Cancer Date: 2008-03-12 Impact factor: 60.716

7. Expression of the cell cycle regulatory proteins p34cdc2, p21waf1, and p53 in node negative invasive ductal breast carcinoma.

Authors: H P Kourea; A K Koutras; C D Scopa; M N Marangos; E Tzoracoeleftherakis; D Koukouras; H P Kalofonos
Journal: Mol Pathol Date: 2003-12

8. Cytoscape 2.8: new features for data integration and network visualization.

Authors: Michael E Smoot; Keiichiro Ono; Johannes Ruscheinski; Peng-Liang Wang; Trey Ideker
Journal: Bioinformatics Date: 2010-12-12 Impact factor: 6.937

9. Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis.

Authors: Pall F Jonsson; Tamara Cavanna; Daniel Zicha; Paul A Bates
Journal: BMC Bioinformatics Date: 2006-01-06 Impact factor: 3.169

10. Low E2F1 transcript levels are a strong determinant of favorable breast cancer outcome.

Authors: Vincent Vuaroqueaux; Patrick Urban; Martin Labuhn; Mauro Delorenzi; Pratyaksha Wirapati; Christopher C Benz; Renata Flury; Holger Dieterich; Frédérique Spyratos; Urs Eppenberger; Serenella Eppenberger-Castori
Journal: Breast Cancer Res Date: 2007 Impact factor: 6.466
