Nahim Adnan1, Zhijie Liu2, Tim H M Huang2, Jianhua Ruan3,4.
Abstract
BACKGROUND: Discovering a highly accurate and robust gene signature for predicting breast cancer metastasis from gene expression profiles of primary tumors is one of the most challenging tasks in the effort to reduce the number of deaths in women. Because gene-based features have had limited success in achieving satisfactory prediction accuracy, many methods have been proposed in recent years to build network-based features by integrating network information with gene expression. However, evaluation results have been too inconsistent to confirm the effectiveness of network-based features, because many confounding factors are involved in learning a classification model, such as data normalization, dimension reduction, and feature selection. An unbiased comparative evaluation is essential for uncovering the strength of network-based features.
Keywords: Breast cancer metastasis; Gene expression analysis; Metastasis prediction; Network features
Year: 2020 PMID: 32241278 PMCID: PMC7119280 DOI: 10.1186/s12920-020-0676-3
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Specification of the studies in ACES
| Dataset | GEO accession no. (GSE) | No. of poor | No. of good | Total patients |
|---|---|---|---|---|
| Desmedt | 7390 | 56 | 127 | 183 |
| Hatzis | 25066 | 102 | 48 | 150 |
| Ivshina | 4922 | 30 | 72 | 102 |
| Loi | 6532 | 24 | 33 | 57 |
| Pawitan | 1456 | 33 | 114 | 147 |
| Miller | 3494 | 21 | 68 | 89 |
| Minn | 2603 | 21 | 44 | 65 |
| Schmidt | 11121 | 24 | 145 | 169 |
| Symmans | 17705 | 37 | 187 | 224 |
| WangY | 5327 | 10 | 42 | 52 |
| WangYE | 2034 | 88 | 169 | 257 |
| Zhang | 12093 | 9 | 112 | 121 |
| ACES | | 455 | 1161 | 1616 |
Specification of the feature types
| Name | Details |
|---|---|
| Gene | Using gene expression without integrating any network information. |
| CENO | A gene's feature value is the average expression of its neighbors only. |
| CEMEAN | The mean of the expression of a gene and its neighbors. |
| CEMAX | The maximum of the expression of a gene and its neighbors. |
| CEMIN | The minimum of the expression of a gene and its neighbors. |
| CEMED | The median of the expression of a gene and its neighbors. |
| CEVAR | The variance of the expression of a gene and its neighbors. |
| CEEdge | Each edge is the summation of the expression of its corresponding genes. |
| PPINO | A gene's feature value is the average expression of its neighbors only. |
| PPIMEAN | The mean of the expression of a gene and its neighbors. |
| PPIMAX | The maximum of the expression of a gene and its neighbors. |
| PPIMIN | The minimum of the expression of a gene and its neighbors. |
| PPIMED | The median of the expression of a gene and its neighbors. |
| PPIVAR | The variance of the expression of a gene and its neighbors. |
| PPIEdge | Each edge is the summation of the expression of its corresponding genes. |
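The aggregation rules in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `network_features` and `edge_features`, the dictionary-based inputs, the use of population variance, and the fallback to a gene's own value when it has no neighbors are all assumptions made for the example. The same functions would apply to either network (co-expression or PPI); only the neighbor map changes.

```python
from statistics import mean, median, pvariance


def network_features(expr, neighbors):
    """Aggregate one patient's expression over each gene's network neighborhood.

    expr: dict mapping gene -> expression value (hypothetical input format).
    neighbors: dict mapping gene -> iterable of neighboring genes.
    """
    feats = {}
    for gene, value in expr.items():
        # Keep only neighbors that actually have an expression value.
        nbr_vals = [expr[n] for n in neighbors.get(gene, ()) if n in expr]
        own_and_nbrs = [value] + nbr_vals
        feats[gene] = {
            # Assumption: an isolated gene falls back to its own expression.
            "NO": mean(nbr_vals) if nbr_vals else value,
            "MEAN": mean(own_and_nbrs),
            "MAX": max(own_and_nbrs),
            "MIN": min(own_and_nbrs),
            "MED": median(own_and_nbrs),
            # Assumption: population variance (defined even for one value).
            "VAR": pvariance(own_and_nbrs),
        }
    return feats


def edge_features(expr, edges):
    """Edge feature: sum of the expression of the two genes on each edge."""
    return {(u, v): expr[u] + expr[v] for u, v in edges if u in expr and v in expr}


# Toy example with three genes and a star-shaped network centered on A.
expr = {"A": 1.0, "B": 3.0, "C": 5.0}
neighbors = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
node_feats = network_features(expr, neighbors)
edge_feats = edge_features(expr, [("A", "B"), ("A", "C")])
```

For gene A in this toy network, the neighbors-only value is the mean of B and C, while the mean, maximum, minimum, median, and variance are taken over A together with B and C.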
Fig. 1 Number of significant features for each feature type in 13 datasets
Fig. 2 Patient class distribution in 12 studies
Fig. 3 Classification accuracy of each feature type in 13 datasets. In each dataset, the entry shown in bold and underlined has the highest average AUC score for that dataset
Fig. 4 Comparison of the classification accuracy of different network-based features with the gene-based feature on the ACES dataset. The red line indicates the average AUC score of the gene-based feature. The bars indicate the average AUC score over 10 repetitions of 5-fold cross-validation. The value on top of each bar is the -log10(p-value) of a two-sided paired t-test comparing the cross-validation fold AUC scores of the indicated feature type with those of the gene-based feature
Fig. 5 Classification accuracy on different sub-samples of ACES. A bold entry indicates that the average AUC of that network-based feature is higher than the average AUC of the gene-based feature; an underlined entry has the highest AUC score for that sub-sample
Fig. 6 Feature type stability. Boxplot of the fold change of overlapping gene signatures, computed pairwise across the 12 studies. Fold change values were converted to log scale. The red diamond denotes the geometric mean of the fold change values
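The stability summary described in the Fig. 6 caption can be illustrated with a short sketch. The record does not define the fold change, so the definition used here (observed signature overlap divided by the overlap expected if both signatures were drawn at random from the same feature pool) is an assumption, as are the function names `overlap_fold_change` and `geometric_mean`. The geometric mean is computed as the exponential of the mean log value, which is why it pairs naturally with fold changes shown on a log scale.

```python
import math


def overlap_fold_change(sig_a, sig_b, n_features):
    """Observed overlap of two signatures relative to chance (assumed definition).

    sig_a, sig_b: iterables of selected feature names from two studies.
    n_features: size of the common feature pool both signatures were drawn from.
    """
    a, b = set(sig_a), set(sig_b)
    observed = len(a & b)
    # Expected overlap of two random subsets of sizes |a| and |b|.
    expected = len(a) * len(b) / n_features
    return observed / expected


def geometric_mean(values):
    """Geometric mean of strictly positive values, via the mean of logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Toy example: two 4-gene signatures sharing 2 genes, from a pool of 16 features.
fc = overlap_fold_change({"a", "b", "c", "d"}, {"c", "d", "e", "f"}, 16)
gm = geometric_mean([1.0, 4.0, 16.0])
```

Note that a pair of signatures with no overlap gives a fold change of zero, whose logarithm is undefined, so such pairs would need separate handling before a log-scale summary.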
GO analysis of gene-based feature
| Feature | Term | Count | Benjamini-corrected p-value |
|---|---|---|---|
| Gene-based | Cell cycle | 138 | 2.20E-15 |
| | DNA repair | 42 | 1.90E-01 |
| | Cell-cell adhesion | 49 | 4.40E-01 |
| | p53 signaling pathway | 20 | 1.70E-02 |
GO analysis of network-based feature types
| Feature type | Co-expression network: term | Count | Benjamini-corrected p-value | PPI network: term | Count | Benjamini-corrected p-value |
|---|---|---|---|---|---|---|
| NO | Growth factor | 24 | 4.10E-02 | G-protein coupled receptor signaling pathway | 49 | 9.40E-01 |
| | Jak-stat signaling pathway | 28 | 1.70E-01 | Jak-stat signaling pathway | 19 | 4.30E-01 |
| | Cell junction | 49 | 5.90E-01 | Cell junction | 42 | 1.80E-01 |
| | Olfactory transduction | 13 | 6.40E-01 | Extracellular region | 111 | 6.60E-02 |
| MEAN | RNA transport | 33 | 3.30E-01 | Transmembrane | 479 | 1.60E-24 |
| | Lysosome | 40 | 5.00E-01 | Extracellular matrix | 37 | 2.30E-03 |
| | Antigen processing and presentation | 19 | 3.60E-01 | Bicellular tight junction | 14 | 8.80E-01 |
| | Endothelial cell chemotaxis | 5 | 9.90E-01 | Notch signaling pathway | 6 | 9.20E-01 |
| | Leukocyte transendothelial migration | 10 | 9.70E-01 | | | |
| MAXIMUM | Cell cycle | 132 | 2.40E-12 | Cell cycle | 122 | 2.70E-10 |
| | Antigen processing and presentation | 27 | 7.70E-03 | Positive regulation of canonical Wnt signaling pathway | 36 | 1.60E-05 |
| | DNA repair | 45 | 5.20E-02 | T-Cell receptor signaling pathway | 51 | 1.20E-09 |
| | Cardiac epithelial to mesenchymal transition | 5 | 9.90E-01 | Cell-cell adherens junction | 70 | 3.50E-05 |
| | Positive regulation of blood vessel endothelial cell migration | 7 | 5.70E-01 | | | |
| MINIMUM | Cell cycle | 116 | 2.00E-06 | Cell cycle | 136 | 5.90E-14 |
| | p53 signaling pathway | 5 | 9.00E-01 | p53 signaling pathway | 19 | 4.90E-02 |
| | Rho cell motility signaling pathway | 7 | 9.60E-01 | DNA repair | 41 | 2.80E-01 |
| MEDIAN | Cell cycle | 124 | 2.80E-09 | Sensory transduction | 26 | 9.90E-01 |
| | DNA repair | 47 | 1.70E-02 | Telomere | 4 | 9.80E-01 |
| | Sonic Hedgehog (SHH) Receptor Ptc1 Regulates cell cycle | 3 | 9.90E-01 | Ribosome | 17 | 9.80E-01 |
| VARIANCE | Cell cycle | 138 | 6.30E-18 | Intracellular steroid hormone receptor signaling pathway | 7 | 9.40E-01 |
| | DNA repair | 47 | 8.80E-04 | Extracellular space | 177 | 2.10E-01 |
| | Positive regulation of telomere maintenance | 12 | 8.00E-02 | Immune response | 64 | 9.50E-01 |
| EDGE | Cell cycle | 147 | 1.80E-22 | Cell cycle | 167 | 4.80E-30 |
| | Positive regulation of telomere maintenance | 15 | 1.30E-03 | Cell-cell adherens junction | 84 | 5.00E-10 |
| | DNA repair | 44 | 7.20E-03 | Positive regulation of epithelial to mesenchymal transition | 7 | 9.50E-01 |
| | Cell-cell adhesion | 48 | 1.30E-01 | Regulation of cell motility | 6 | 9.80E-01 |
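The "Benjamini corrected" columns in the GO tables report multiple-testing-adjusted p-values. As a reading aid, the standard Benjamini-Hochberg step-up adjustment can be sketched as follows. This is a generic illustration with a hypothetical function name, `benjamini_hochberg`, and made-up input values; it is not claimed to reproduce the exact computation of the enrichment tool used for these tables.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four terms (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

A term is typically called significant when its adjusted value falls below a chosen threshold such as 0.05, which is why terms with large counts but adjusted values near 1 in the tables above would not be considered enriched.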