Xiangchun Yu, Sha Cao, Yi Zhou, Zhezhou Yu, Ying Xu.
Abstract
A novel method is developed for predicting the stage of a cancer tissue based on the level of consistency between the co-expression patterns in the given sample and those in samples of a specific stage. The basis for the prediction method is that cancer samples of the same stage share common functionalities, as reflected by their co-expression patterns, which are distinct from those of samples in the other stages. Test results reveal that our predictions are as good as, or potentially better than, the stages manually annotated by cancer pathologists. This new co-expression-based capability enables us to study how the functionalities of cancer samples change as they evolve from early to advanced stages. New results are discovered through such functional analyses, which offer insights into which functions tend to be lost at which stage compared to the control tissues and, similarly, which new functions emerge as a cancer advances. To the best of our knowledge, this new capability represents the first computational method for accurately staging a cancer sample. The R source code used in this study is available at GitHub (https://github.com/yxchspring/CECS).
Year: 2020 PMID: 32606385 PMCID: PMC7327081 DOI: 10.1038/s41598-020-67476-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The numbers of DEGs and CEGs in each of the eight cancer types.
| Stage | Count | Direction | BRCA | COAD | HNSC | KIRC | KIRP | LUAD | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|---|
| Stage 1 vs. control | #DEGs | Up | 1,255 | 1,718 | 347 | 1,110 | 640 | 2,004 | 673 | 893 |
| Stage 1 vs. control | #DEGs | Down | 512 | 729 | 975 | 594 | 735 | 129 | 670 | 228 |
| Stage 1 vs. control | #CEGs | Up | 61,638 | 385,089 | 1,428 | 56,850 | 1,113 | 130,324 | 3,717 | 3,651 |
| Stage 1 vs. control | #CEGs | Down | 1,690 | 11,763 | 26,804 | 920 | 2,564 | 121 | 15,338 | 600 |
| Stage 2 vs. control | #DEGs | Up | 1,564 | 1,581 | 547 | 1,399 | 700 | 1,268 | 662 | 662 |
| Stage 2 vs. control | #DEGs | Down | 527 | 681 | 943 | 560 | 784 | 190 | 713 | 488 |
| Stage 2 vs. control | #CEGs | Up | 14,410 | 143,619 | 634 | 62,299 | 14,452 | 3,102 | 1,892 | 7,835 |
| Stage 2 vs. control | #CEGs | Down | 981 | 6,472 | 25,765 | 9,450 | 13,173 | 258 | 9,888 | 9,125 |
| Stage 3 vs. control | #DEGs | Up | 1,040 | 1,607 | 597 | 1,159 | 955 | 1,269 | 597 | 903 |
| Stage 3 vs. control | #DEGs | Down | 553 | 588 | 949 | 739 | 640 | 217 | 744 | 289 |
| Stage 3 vs. control | #CEGs | Up | 3,925 | 104,838 | 1,308 | 8,426 | 5,864 | 2,847 | 572 | 5,109 |
| Stage 3 vs. control | #CEGs | Down | 1,912 | 6,200 | 16,024 | 2,306 | 784 | 1,007 | 7,295 | 2,074 |
| Stage 4 vs. control | #DEGs | Up | 818 | 1,035 | 798 | 1,325 | 657 | 1,097 | 504 | 932 |
| Stage 4 vs. control | #DEGs | Down | 721 | 685 | 872 | 727 | 727 | 156 | 939 | 535 |
| Stage 4 vs. control | #CEGs | Up | 7,386 | 4,597 | 542 | 18,161 | 9,068 | 14,427 | 952 | 3,449 |
| Stage 4 vs. control | #CEGs | Down | 15,550 | 8,372 | 15,892 | 5,343 | 24,119 | 923 | 13,429 | 3,142 |
Figure 1. An illustration of our algorithm. (A) Identification of DEGs between cancer versus control tissues at each stage. (B) Construction of co-expression networks for samples in each of the four stages with the DEGs obtained from step A. (C) Construction of perturbed networks over samples in each stage plus a new sample denoted by T. (D) Representation of each perturbed network as a feature vector needed for training, giving rise to four feature vectors concatenated into a long vector, which will be fed into a trainer as the feature vector for sample T.
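Steps B to D of the figure can be illustrated with a minimal Python sketch. It is not the authors' implementation (their R code is in the repository cited in the abstract). The use of Pearson correlation, the edge threshold of 0.8, the choice of the absolute change in correlation as the feature, and all function names are assumptions made here for illustration only; the data are random placeholders.

```python
import numpy as np

def coexpression_matrix(expr):
    """Gene-by-gene Pearson correlation; expr has shape (samples, genes)."""
    return np.corrcoef(expr, rowvar=False)

def perturbation_features(stage_expr, new_sample, threshold=0.8):
    """Per-stage feature vector (assumed form): absolute change in correlation
    on each edge of the stage network after appending new_sample."""
    base = coexpression_matrix(stage_expr)
    perturbed = coexpression_matrix(np.vstack([stage_expr, new_sample]))
    upper = np.triu_indices(base.shape[0], k=1)   # each gene pair once
    is_edge = np.abs(base[upper]) >= threshold     # assumed network definition
    delta = np.abs(perturbed[upper] - base[upper])
    return np.where(is_edge, delta, 0.0)           # non-edges contribute zero

def sample_features(stage_exprs, new_sample, threshold=0.8):
    """Concatenate the per-stage vectors into one long vector for sample T."""
    return np.concatenate(
        [perturbation_features(e, new_sample, threshold) for e in stage_exprs]
    )

# Placeholder data: 4 stages, 20 samples each, 5 genes (random, not real expression).
rng = np.random.default_rng(0)
stage_exprs = [rng.normal(size=(20, 5)) for _ in range(4)]
sample_t = rng.normal(size=5)
features = sample_features(stage_exprs, sample_t)
```

With 5 genes there are 10 gene pairs per stage, so the concatenated vector has 40 entries. A classifier such as the C5.0 trainer named in the performance table would then be fitted on these vectors.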
Prediction performance of cancer stages using C5.0.
| Stage | Measure | BRCA | COAD | HNSC | KIRC | KIRP | LUAD | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Sensitivity | 0.7852 | 0.9227 | 0.6714 | 0.9519 | 0.9353 | 0.8795 | 0.24 | 0.9753 |
| 1 | Specificity | 0.983 | 0.9927 | 0.8783 | 0.9737 | 0.9 | 0.9227 | 0.9678 | 0.9172 |
| 2 | Sensitivity | 0.9409 | 0.9755 | 0.51 | 0.8938 | 0.6 | 0.74 | 0.8 | 0.9333 |
| 2 | Specificity | 0.9737 | 0.9641 | 0.9393 | 0.977 | 0.9928 | 0.9377 | 0.9257 | 0.9836 |
| 3 | Sensitivity | 0.7932 | 0.9237 | 0.4696 | 0.9639 | 0.7714 | 0.6667 | 0.7955 | 0.8212 |
| 3 | Specificity | 0.9722 | 0.9817 | 0.926 | 0.9933 | 0.9508 | 0.9688 | 0.7983 | 0.9422 |
| 4 | Sensitivity | 0.72 | 0.9667 | 0.8675 | 0.9292 | 0.8 | 0.4714 | 0.7273 | 0.4188 |
| 4 | Specificity | 0.922 | 0.9894 | 0.886 | 0.9809 | 0.9465 | 0.8965 | 0.889 | 0.9692 |
| All | Accuracy | 0.8768 | 0.9504 | 0.7283 | 0.9452 | 0.8707 | 0.7933 | 0.7078 | 0.8772 |
| All | Kappa | 0.7957 | 0.9293 | 0.546 | 0.9175 | 0.7436 | 0.6807 | 0.5696 | 0.7928 |
The confusion matrix for predicted vs. annotated stage of HNSC.
| Predicted/annotated | Stage 1 | Stage 2 | Stage 3 | Stage 4 |
|---|---|---|---|---|
| Stage 1 | 4 | 5 | 2 | 1 |
| Stage 2 | 2 | 14 | 5 | 5 |
| Stage 3 | 1 | 0 | 16 | 9 |
| Stage 4 | 0 | 1 | 0 | 62 |
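The measures reported in the performance table (sensitivity, specificity, accuracy, Cohen's kappa) can all be derived from a confusion matrix like the one above. The Python sketch below shows the standard formulas applied to the HNSC matrix. It assumes rows are predicted stages and columns are annotated stages, which the header suggests but does not state outright. The values computed from this single matrix need not reproduce the reported figures, which may come from a different evaluation protocol.

```python
import numpy as np

# HNSC confusion matrix from the table above.
# Assumed orientation: rows = predicted stage, columns = annotated stage.
cm = np.array([
    [4,  5,  2,  1],
    [2, 14,  5,  5],
    [1,  0, 16,  9],
    [0,  1,  0, 62],
])

def per_class_metrics(cm):
    """One-vs-rest sensitivity and specificity for each stage."""
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=0) - tp   # annotated as this stage, predicted as another
    fp = cm.sum(axis=1) - tp   # predicted as this stage, annotated as another
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa (observed vs. chance agreement)."""
    total = cm.sum()
    observed = np.trace(cm) / total
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    return observed, (observed - expected) / (1 - expected)

sensitivity, specificity = per_class_metrics(cm)
accuracy, kappa = accuracy_and_kappa(cm)
```

Kappa corrects accuracy for the agreement expected by chance, which matters here because the stage distribution is heavily skewed toward Stage 4.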
The confusion matrix for predicted vs. annotated stage of STAD.
| Predicted/annotated | Stage 1 | Stage 2 | Stage 3 | Stage 4 |
|---|---|---|---|---|
| Stage 1 | 8 | 1 | 2 | 0 |
| Stage 2 | 0 | 19 | 3 | 0 |
| Stage 3 | 4 | 11 | 37 | 2 |
| Stage 4 | 3 | 1 | 2 | 9 |
Figure 2. The distributions of the number of DEGs across samples at different “stages” by the manually annotated stages and our predicted stages. (a) The distribution of the numbers of DEGs (y-axis) in each annotated stage of HNSC. (b) The distribution of the numbers of DEGs in each predicted stage of HNSC.
Figure 3. The distributions of the number of DEGs across samples at different “stages” by the manually annotated stages and our predicted stages. (a) The distribution of the numbers of DEGs in each annotated stage of STAD. (b) The distribution of the numbers of DEGs in each predicted stage of STAD.
The numbers of pathways enriched by co-expressed genes in controls and at each stage.
| Stage | #CEPs | BRCA | COAD | HNSC | KIRC | KIRP | LUAD | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|
| Stage 1 vs. control | Control | 223 | 178 | 126 | 195 | 138 | 48 | 84 | 44 |
| Stage 1 vs. control | Up | 120 | 13 | 74 | 22 | 4 | 46 | 53 | 82 |
| Stage 1 vs. control | Down | 71 | 102 | 92 | 34 | 66 | 6 | 66 | 2 |
| Stage 2 vs. control | Control | 323 | 150 | 136 | 181 | 109 | 56 | 102 | 137 |
| Stage 2 vs. control | Up | 166 | 25 | 39 | 135 | 66 | 54 | 18 | 16 |
| Stage 2 vs. control | Down | 83 | 115 | 104 | 2 | 117 | 5 | 65 | 164 |
| Stage 3 vs. control | Control | 251 | 102 | 111 | 201 | 99 | 53 | 102 | 57 |
| Stage 3 vs. control | Up | 143 | 25 | 27 | 265 | 81 | 41 | 35 | 85 |
| Stage 3 vs. control | Down | 73 | 81 | 114 | 17 | 17 | 9 | 74 | 20 |
| Stage 4 vs. control | Control | 326 | 131 | 111 | 226 | 125 | 100 | 205 | 119 |
| Stage 4 vs. control | Up | 135 | 38 | 27 | 313 | 99 | 110 | 34 | 56 |
| Stage 4 vs. control | Down | 109 | 70 | 108 | 33 | 66 | 5 | 100 | 58 |
#CEPs denotes the number of co-expressed gene pairs; Up denotes the number of CEPs formed by up-regulated genes, and Down the number formed by down-regulated genes.
The number of enriched pathways in normal controls.
| #CEPs | Subset | BRCA | COAD | HNSC | KIRC | KIRP | LUAD | STAD | THCA |
|---|---|---|---|---|---|---|---|---|---|
| Total | | 442 | 274 | 168 | 355 | 261 | 133 | 264 | 257 |
| (I) | | 69 | 29 | 78 | 0 | 3 | 5 | 34 | 9 |
| (II) | Stage 1 | 91 | 78 | 20 | 103 | 68 | 21 | 21 | 16 |
| (II) | Stage 2 | 64 | 31 | 7 | 7 | 17 | 3 | 20 | 20 |
| (II) | Stage 3 | 16 | 8 | 1 | 9 | 9 | 0 | 19 | 14 |
| (II) | Stage 4 | 56 | 11 | 0 | 11 | 9 | 7 | 85 | 40 |
| (II) | Total | 227 | 128 | 28 | 130 | 103 | 31 | 145 | 90 |
| (III) | 1–2 | 106 | 52 | 89 | 29 | 118 | 48 | 50 | 99 |
| (III) | 3,4 | 166 | 66 | 40 | 163 | 111 | 36 | 69 | 74 |
Total in the first data row is the number of pathways enriched by CEPs in control samples for each cancer type, while Total under (II) is the number of unique pathways enriched by CEPs across all cancer samples of each type.
The number of tissue samples for eight cancer types.
| Cancer type | Control | Stage 1 | Stage 2 | Stage 3 | Stage 4 |
|---|---|---|---|---|---|
| BRCA | 113 | 182 | 624 | 249 | 20 |
| COAD | 41 | 75 | 179 | 131 | 64 |
| HNSC | 44 | 25 | 70 | 78 | 261 |
| KIRC | 72 | 266 | 57 | 123 | 82 |
| KIRP | 32 | 172 | 22 | 51 | 15 |
| LUAD | 59 | 278 | 121 | 84 | 26 |
| STAD | 32 | 53 | 111 | 150 | 38 |
| THCA | 58 | 286 | 52 | 113 | 57 |