Soobok Joe, Hojung Nam.
Abstract
BACKGROUND: The survival of patients with breast cancer is highly variable, ranging from a few months to more than 15 years. In recent studies, gene expression profiling of tumors has been used as a promising means of identifying prognostic factors.
Year: 2016 PMID: 27454576 PMCID: PMC4959370 DOI: 10.1186/s12911-016-0292-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Overall workflow. The expression profiles of 24,924 genes were used for the survival log-rank test. High-expressed genes (H1, H2) and low-expressed genes (L1, L2) associated with poor survival were clustered into gene groups according to positive correlations. Negative correlations were then used to pair gene sets across the high- and low-expressed groups. Finally, four paired gene sets were selected. The prognostic score was defined as the ratio of the expression level of high-expressed genes to that of low-expressed genes.
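The ratio-based score from the workflow can be illustrated with a minimal sketch. The caption only states that the score is a ratio of high- to low-expressed gene levels; aggregating each gene set by its arithmetic mean is an assumption made here for illustration, and the function name `prognostic_score` and the expression values are hypothetical.

```python
from statistics import mean


def prognostic_score(expression, high_genes, low_genes):
    """Ratio of high-set to low-set expression for one patient.

    Assumption: each gene set is summarized by its arithmetic mean;
    the source only says the score is a ratio of the two levels.
    """
    high = mean(expression[g] for g in high_genes)
    low = mean(expression[g] for g in low_genes)
    if low == 0:
        raise ValueError("low-expressed gene set has zero mean expression")
    return high / low


# Hypothetical expression values for one patient (illustrative only).
patient = {"FOXM1": 8.0, "CCNB2": 6.0, "ESR1": 2.0, "GATA3": 5.0}
score = prognostic_score(patient, ["FOXM1", "CCNB2"], ["ESR1", "GATA3"])
```

Under this definition a larger score means relatively stronger expression of the genes whose high expression is associated with poor survival.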
The dataset used in this study
| Dataset | Grade 1 | Grade 2 | Grade 3 | Age <40 | Age 40–60 | Age >60 | ER+ | ER- | TNBC | Total | Platform |
|---|---|---|---|---|---|---|---|---|---|---|---|
| METABRIC | 170 | 775 | 952 | 118 | 754 | 1109 | 1505 | 435 | 317 | 1981 | Illumina HT 12v3 |
| GSE25066 | 32 | 180 | 259 | 85 | 327 | 96 | 297 | 205 | 178 | 508 | Affymetrix HG U133A |
| GSE2034 | | | | | | | 209 | 77 | | 286 | Affymetrix HG U133A |
| GSE3494 | 67 | 128 | 54 | 16 | 90 | 145 | 213 | 34 | | 251 | Affymetrix HG U133A |
| GSE2109 | 31 | 113 | 136 | | | | | | 47 | 351 | Affymetrix HG U133A |
Gene expression profiles of 1981, 508, 286, and 251 samples were used from the METABRIC, GSE25066, GSE2034, and GSE3494 datasets, respectively. The METABRIC dataset was used for training, and the three GEO datasets (GSE25066, GSE2034, and GSE3494) were used for validation. The METABRIC, GSE25066, and GSE2109 datasets were used to find differentially expressed genes (DEGs) between TNBC and non-TNBC. The numbers in the table are sample counts for each breast cancer characteristic.
Fig. 2 Kaplan-Meier survival plots in the three GEO datasets. a-c Kaplan-Meier survival plots for module 1 on GSE25066, GSE2034, and GSE3494. Patients were grouped according to their module scores based on gene expression levels, with scores of over 50 % and less than 50 % representing the high- and low-scoring groups, respectively. d-f Kaplan-Meier survival plots for module 1 with three groups. Patients were grouped by module 1 scores based on gene expression levels, with scores of over 25 %, 25–75 %, and 75–100 % representing the high, medium, and low groups, respectively.
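The two-group analysis in panels a-c can be sketched as follows. This is an illustration under assumptions, not the authors' code: the 50 % cut is read here as a split at the median score, the function names are hypothetical, and the survival curve is the standard Kaplan-Meier product-limit estimator written with the Python standard library only.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    Assumption: the 50 % cut in the caption is a median split.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability).

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data (illustrative only).
groups = split_by_median([0.5, 1.2, 2.0, 3.1])
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

In practice one curve would be estimated per score group and the groups compared with a log-rank test.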
Fig. 3 Expression of high- and low-expressed genes associated with poor survival, and boxplots of the prognostic score according to breast cancer characteristics. a The expression levels of the genes used in module 1. The 26 genes in the upper part of the figure are high-expressed genes associated with poor survival, and the 17 genes in the lower part are low-expressed genes associated with poor survival. b Boxplot showing the distribution of module 1 scores in the METABRIC dataset according to PAM50 subtypes. c Boxplot showing the distribution of module 1 scores according to TNBC and non-TNBC status. d Boxplot showing the distribution of module 1 scores according to ER and TP53 mutation status. e Boxplot showing the distribution of module 1 scores according to tumor grade. Scores are expressed as the ratio of high- to low-expressed genes associated with poor survival (see Methods).
The gene list used for module 1
| Group | Gene | Description | METABRIC HR | METABRIC p-value | BreastMark HR | BreastMark p-value |
|---|---|---|---|---|---|---|
| High-expressed genes | CHEK1 a, b | checkpoint kinase 1 | 2.16 | 8.1E-11 | 1.32 | 3.9E-06 |
| | FOXM1 b | forkhead box M1 | 2.58 | 4.4E-16 | 1.58 | 5.5E-13 |
| | CCNA2 a, b | cyclin A2 | 2.53 | 5.8E-15 | 1.47 | 3.0E-09 |
| | CDC20 a, b | cell division cycle 20 | 2.50 | 9.6E-15 | 1.54 | 5.8E-13 |
| | TTK a, b | TTK protein kinase | 2.28 | 1.5E-12 | 1.50 | 1.2E-11 |
| | CENPA a, b | centromere protein A | 2.56 | 4.4E-16 | 1.54 | 6.6E-13 |
| | KIF2C a, b | kinesin family member 2C | 2.49 | 5.6E-15 | 1.64 | 2.2E-16 |
| | BUB1 a | BUB1, mitotic checkpoint serine/threonine kinase | 2.50 | 2.1E-14 | 1.61 | 2.2E-15 |
| | MCM6 | minichromosome maintenance complex component 6 | 2.09 | 3.5E-10 | 1.56 | 8.8E-14 |
| | LMNB2 b | lamin B2 | 2.17 | 3.2E-11 | 1.38 | 2.6E-07 |
| | CDC45 b | cell division cycle 45 | 2.53 | 2.6E-14 | 1.50 | 4.3E-12 |
| | ANLN a | anillin actin binding protein | 2.27 | 6.1E-12 | 1.48 | 1.1E-07 |
| | MCM10 b | minichromosome maintenance 10 replication initiation factor | 2.30 | 1.5E-12 | 1.62 | 9.8E-14 |
| | CDCA8 a, b | cell division cycle associated 8 | 2.28 | 1.0E-12 | 1.55 | 3.8E-13 |
| | MELK b | maternal embryonic leucine zipper kinase | 2.56 | 3.6E-15 | 1.60 | 0 |
| | CCNB2 a | cyclin B2 | 2.79 | 0 | 1.72 | 0 |
| | CEP55 a, b | centrosomal protein 55 kDa | 2.55 | 9.1E-15 | 1.56 | 1.8E-13 |
| | DLGAP5 a, b | discs, large (Drosophila) homolog-associated protein 5 | 2.16 | 3.8E-10 | 1.46 | 3.6E-10 |
| | HJURP b | Holliday junction recognition protein | 2.79 | 0 | 1.61 | 2.3E-15 |
| | CDCA5 a | cell division cycle associated 5 | 2.76 | 0 | 1.29 | 1.3E-03 |
| | TRIP13 a, b | thyroid hormone receptor interactor 13 | 2.18 | 5.0E-11 | 1.44 | 6.6E-09 |
| | GTSE1 a, b | G2 and S-phase expressed 1 | 2.54 | 1.7E-14 | 1.35 | 5.5E-07 |
| | CDCA3 a, b | cell division cycle associated 3 | 2.29 | 5.3E-12 | 1.48 | 8.3E-10 |
| | PRR11 | proline rich 11 | 2.09 | 1.3E-10 | 1.18 | 2.6E-06 |
| | FAM83D a | family with sequence similarity 83 member D | 2.66 | 2.2E-16 | 1.45 | 2.6E-06 |
| | GTPBP4 b | GTP binding protein 4 | 1.73 | 1.6E-06 | 1.36 | 4.2E-07 |
| Low-expressed genes | ESR1 b | estrogen receptor 1 | 0.54 | 2.6E-08 | 0.84 | 2.1E-02 |
| | GATA3 b | GATA binding protein 3 | 0.57 | 4.0E-07 | 0.92 | 1.6E-01 |
| | LRIG1 | leucine-rich repeats and immunoglobulin-like domains 1 | 0.49 | 3.0E-10 | 0.65 | 1.4E-12 |
| | RABEP1 b | rabaptin, RAB GTPase binding effector protein 1 | 0.57 | 5.4E-07 | 0.75 | 2.3E-06 |
| | CIRBP b | cold inducible RNA binding protein | 0.44 | 1.1E-12 | 0.70 | 3.9E-09 |
| | EVL b | Enah/Vasp-like | 0.55 | 1.1E-07 | 0.78 | 1.0E-04 |
| | WDR19 | WD repeat domain 19 | 0.52 | 1.5E-08 | 0.77 | 4.5E-05 |
| | SCUBE2 b | signal peptide, CUB domain, EGF-like 2 | 0.55 | 1.7E-07 | 0.75 | 1.1E-04 |
| | KIF13B b | kinesin family member 13B | 0.55 | 3.8E-07 | 0.64 | 2.1E-11 |
| | TBC1D9 b | TBC1 domain family member 9 | 0.55 | 1.2E-07 | 0.82 | 9.5E-04 |
| | ANKRA2 b | ankyrin repeat family A member 2 | 0.55 | 6.2E-08 | 0.93 | 2.3E-01 |
| | DYNLRB2 | dynein, light chain, roadblock-type 2 | 0.49 | 1.3E-09 | 0.93 | 3.9E-01 |
| | NME5 b | NME/NM23 family member 5 | 0.44 | 3.8E-12 | 0.77 | 4.6E-05 |
| | CAPN8 | calpain 8 | 0.54 | 1.5E-07 | 0.67 | 2.5E-02 |
| | CASC1 b | cancer susceptibility candidate 1 | 0.44 | 1.8E-12 | 0.79 | 1.0E-04 |
| | BBOF1 | basal body orientation factor 1 | 0.46 | 2.1E-11 | 0.78 | 7.5E-05 |
| | RUNDC1 | RUN domain containing 1 | 0.55 | 1.0E-07 | 0.75 | 3.0E-04 |
High-expressed genes: high-expressed gene group associated with poor survival. Low-expressed genes: low-expressed gene group associated with poor survival. a: genes associated with the cell cycle process. b: differentially expressed genes between triple-negative and non-triple-negative breast cancer. HR: hazard ratio. p-value: log-rank test.
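The p-values in the table come from a log-rank test. A minimal sketch of the standard two-group log-rank test is given below for orientation. It is not the authors' implementation: the function name `logrank_test` and the example data are hypothetical, and how patients were split into two groups for each gene is not stated in this excerpt.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 d.f."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for x, _, _ in data if x >= t)               # at risk, both groups
        n_a = sum(1 for x, _, g in data if x >= t and g == 0)  # at risk, group A
        d = sum(1 for x, e, _ in data if x == t and e)         # events at time t
        d_a = sum(1 for x, e, g in data if x == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the group A event count
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: group A has early events, group B late events.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

The hazard ratios in the table would come from a separate model fit, such as Cox regression, which this sketch does not cover.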