Gunhee Lee1,2, Minho Lee3.
Abstract
Transcriptome analysis has been widely used to build biomarker panels for cancer diagnosis. In breast cancer, patient age is known to be associated with clinical features. As a large amount of clinical transcriptome data has accumulated, we classified all human genes by age-specific differential expression between normal and breast cancer cells using public data. We retrieved gene expression levels in breast cancer and matched normal cells from The Cancer Genome Atlas. In the first classification, we divided genes into two classes by paired t test without considering age. In the secondary classification, we divided the genes of each class into eight groups based on the patterns of the p-values calculated for each of the three age groups we defined. Through this two-step classification, genes were eventually grouped into 16 classes. By comparing the performance of prediction models built from different combinations of genes, we showed that this classification method can be applied to establish a more accurate prediction model for diagnosing breast cancer. We expect that our classification scheme can be used for other types of cancer data.
Keywords: biomarkers; breast cancer; differentially expressed genes; gene classification
Year: 2017 PMID: 29307142 PMCID: PMC5769863 DOI: 10.5808/GI.2017.15.4.156
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
Fig. 1. Distribution of age of breast cancer patients analyzed in this work.
Fig. 2. Schematic view of overall procedures of the two-step classification.
Definition of classification of genes based on age-specific significance and the corresponding number of genes
| Secondary class | Young | Intermediate | Older | Primary class A | Primary class B |
|---|---|---|---|---|---|
| 1 | Significant | Significant | Significant | 377 | 0 |
| 2 | Nonsignificant | Nonsignificant | Nonsignificant | 4,495 | 13,548 |
| 3 | Nonsignificant | Nonsignificant | Significant | 781 | 69 |
| 4 | Nonsignificant | Significant | Nonsignificant | 56 | 29 |
| 5 | Significant | Nonsignificant | Nonsignificant | 44 | 18 |
| 6 | Significant | Significant | Nonsignificant | 28 | 9 |
| 7 | Significant | Nonsignificant | Significant | 71 | 8 |
| 8 | Nonsignificant | Significant | Significant | 110 | 3 |
Significance was defined based on the p-value of the paired t test.
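The two-step labeling summarized in this table can be sketched as a simple lookup. This is a minimal illustration, not the authors' code. Several points are assumptions: the cutoff `ALPHA = 0.05` (the source does not state the threshold), the reading that primary class A holds genes significant in the age-independent test and class B the rest, and the assignment of class 7 to the young + older pattern, which is the only combination not covered by the other seven classes. The function name `classify_gene` is hypothetical.

```python
from typing import Tuple

# Significance pattern (young, intermediate, older) -> secondary class number.
# Class 7 = (young, older) is an assumption: it is the one pattern left over
# once the other seven classes are assigned.
PATTERN_TO_SECONDARY = {
    (True, True, True): 1,
    (False, False, False): 2,
    (False, False, True): 3,
    (False, True, False): 4,
    (True, False, False): 5,
    (True, True, False): 6,
    (True, False, True): 7,
    (False, True, True): 8,
}

ALPHA = 0.05  # assumed cutoff; the source does not give the threshold


def classify_gene(
    p_overall: float,
    p_by_age: Tuple[float, float, float],
    alpha: float = ALPHA,
) -> str:
    """Return a label such as 'A3' from precomputed paired t test p-values.

    p_overall: p-value from the test that ignores age (primary class).
    p_by_age:  p-values for the young, intermediate, and older groups.
    """
    # Assumed reading: A = significant without considering age, B = not.
    primary = "A" if p_overall < alpha else "B"
    pattern = tuple(p < alpha for p in p_by_age)
    return f"{primary}{PATTERN_TO_SECONDARY[pattern]}"
```

For example, a gene with `p_overall = 0.001` and age-specific p-values `(0.2, 0.3, 0.01)` would be labeled `A3` under these assumptions, since only the older group passes the cutoff.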
Fig. 3. Patterns of significance of age-specific differential expression of class A (A) and class B (B).
Comparison of the performance of SVM models built from different combinations of genes to distinguish breast cancer from normal cells
| Input genes (N) | Sampling pool | Type I (one gene per class) | Type II (N random genes from the pool) | p-value |
|---|---|---|---|---|
| 3 | Classes A3–A5 | 0.9351 | 0.9215 | 4.732e-16 |
| 3 | Classes A6–A8 | 0.9660 | 0.9662 | 0.8391 |
| 6 | Classes A3–A8 | 0.9842 | 0.9743 | 2.2e-16 |
| 3 | Classes B3–B5 | 0.8701 | 0.8522 | 1.441e-15 |
| 3 | Classes B6–B8 | 0.9269 | 0.9392 | 0.4868 |
| 6 | Classes B3–B8 | 0.9386 | 0.9227 | 6.984e-09 |
N input genes were sampled from each sampling pool by adopting two types of combinations (types I and II).
SVM, support vector machine.
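The two sampling schemes compared in the table can be sketched as follows. This is an illustrative reading of the table header, not the authors' implementation: type I draws one gene from each class in the pool, and type II draws the same number of genes at random from the merged pool regardless of class. The function names, the `pool` dictionary layout, and the gene identifiers are hypothetical, and the SVM training step itself is not shown.

```python
import random
from typing import Dict, List, Sequence


def sample_type_one(
    pool: Dict[str, List[str]], classes: Sequence[str], rng: random.Random
) -> List[str]:
    """Type I: pick exactly one gene from each listed class."""
    return [rng.choice(pool[c]) for c in classes]


def sample_type_two(
    pool: Dict[str, List[str]], classes: Sequence[str], n: int, rng: random.Random
) -> List[str]:
    """Type II: pick n distinct genes from the union of the listed classes."""
    merged = [gene for c in classes for gene in pool[c]]
    return rng.sample(merged, n)


# Hypothetical gene identifiers, for illustration only.
pool = {
    "A3": ["gene_a", "gene_b", "gene_c"],
    "A4": ["gene_d", "gene_e"],
    "A5": ["gene_f", "gene_g"],
}
classes = ["A3", "A4", "A5"]
rng = random.Random(0)  # fixed seed so repeated runs draw the same genes

type_one_genes = sample_type_one(pool, classes, rng)
type_two_genes = sample_type_two(pool, classes, len(classes), rng)
```

Each draw would then serve as the feature set for one SVM model. Repeating both draws many times gives two distributions of model performance to compare, which is presumably what the p-value column reflects.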