Wen-Hui Wang, Ting-Yan Xie, Guang-Lei Xie, Zhong-Lu Ren, Jin-Ming Li.
Abstract
Identifying molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment. Various studies have identified molecular subtypes of CRC using gene expression data, but their results are inconsistent and further research is necessary. From a methodological point of view, a progressive approach is needed to identify molecular subtypes in human colon cancer using gene expression data. We propose an approach to identify the molecular subtypes of colon cancer that integrates denoising by the Bayesian robust principal component analysis (BRPCA) algorithm, hierarchical clustering by the directed bubble hierarchical tree (DBHT) algorithm, and feature gene selection by an improved differential evolution based feature selection (DEFSW) algorithm. In this approach, a clustering is considered reasonable only if the normal samples are grouped completely and exclusively into one class, and the feature selection accounts for imbalances in sample numbers among subtypes. With this approach, we identified the molecular subtypes of colon cancer on the mRNA gene expression dataset of 153 colon cancer samples and 19 normal control samples from the Cancer Genome Atlas (TCGA) project. The colon cancer samples were clustered into 7 subtypes with 44 feature genes. Our approach could identify finer subtypes of colon cancer with fewer feature genes than two other recent studies, and it provides a generic methodology that might be applied to identify the subtypes of other cancers.
Keywords: Bayesian robust principal component; colon cancer; feature selection; hierarchical clustering; subtypes of cancer
Year: 2018 PMID: 30072645 PMCID: PMC6115727 DOI: 10.3390/genes9080397
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
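The abstract's criterion for a reasonable clustering (all normal samples in one class, and no tumor samples in that class) can be checked mechanically. The following is a minimal sketch, not the authors' code; the function name `normals_form_exclusive_cluster` and the toy labels are hypothetical and are not taken from the TCGA data.

```python
from collections import defaultdict


def normals_form_exclusive_cluster(cluster_ids, is_normal):
    """Return True if every normal sample sits in one cluster that holds no tumor samples."""
    members = defaultdict(list)
    for cid, normal in zip(cluster_ids, is_normal):
        members[cid].append(normal)
    # Clusters that contain at least one normal sample.
    normal_clusters = [cid for cid, flags in members.items() if any(flags)]
    # Complete: the normals occupy exactly one cluster.
    # Exclusive: every member of that cluster is a normal sample.
    return len(normal_clusters) == 1 and all(members[normal_clusters[0]])


# Toy labels (hypothetical): five samples, cluster id and normal/tumor flag per sample.
good = normals_form_exclusive_cluster([0, 0, 1, 1, 2], [True, True, False, False, False])
mixed = normals_form_exclusive_cluster([0, 0, 1, 1, 2], [True, False, False, False, False])
split = normals_form_exclusive_cluster([0, 1, 1, 2, 2], [True, True, False, False, False])
print(good, mixed, split)
```

Here `good` satisfies the criterion, `mixed` fails because a tumor sample shares the normal cluster, and `split` fails because the normals are spread over two clusters.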
Figure 1. Summary of our integrative approach. BRPCA: Bayesian robust principal component analysis; DBHT: directed bubble hierarchical tree; DEFSw: differential evolution based feature selection.
Figure 2. The DEFSW algorithm and its parameter setting in our study.
Figure 3. Sample cluster structure from directed bubble hierarchical tree (DBHT) analysis of the sparse component in the BRPCA (Bayesian robust principal component analysis) model for the 153 colon cancer samples and 19 normal samples downloaded from the Cancer Genome Atlas (TCGA). The labels inside the symbols correspond to the different subtypes identified by Muzny et al. [2]. MSI: microsatellite instability; CIMP: CpG island methylator phenotype; CIN: chromosomal instability.
Confusion matrix of subtypes identified by our approach and subtypes identified by Muzny et al. [2]. MSI: microsatellite instability; CIMP: CpG island methylator phenotype; CIN: chromosomal instability.
| Subtype | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|
| MSI/CIMP | 1 | 3 | 2 | 2 | 19 | 9 | 22 |
| CIN | 24 | 13 | 14 | 2 | 0 | 0 | 2 |
| Invasive | 15 | 1 | 8 | 11 | 1 | 0 | 1 |
| Unknown | 0 | 1 | 2 | 0 | 0 | 0 | 0 |
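To read the confusion matrix above more easily, one can tally each column and find the dominant Muzny et al. label per subtype. The sketch below uses only the counts from the table; the helper name `summarize` and the "purity" summary (the fraction of samples carrying their column's dominant label) are illustrative choices, not a measure reported in the source.

```python
# Counts copied from the confusion matrix above
# (rows: labels from Muzny et al. [2]; columns: subtypes S1..S7).
columns = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
counts = {
    "MSI/CIMP": [1, 3, 2, 2, 19, 9, 22],
    "CIN": [24, 13, 14, 2, 0, 0, 2],
    "Invasive": [15, 1, 8, 11, 1, 0, 1],
    "Unknown": [0, 1, 2, 0, 0, 0, 0],
}


def summarize(counts, columns):
    """Map each column to (dominant row label, its count, column total)."""
    summary = {}
    for j, col in enumerate(columns):
        column = {label: row[j] for label, row in counts.items()}
        top = max(column, key=column.get)
        summary[col] = (top, column[top], sum(column.values()))
    return summary


summary = summarize(counts, columns)
n_samples = sum(total for _, _, total in summary.values())
# Purity: share of samples that carry the dominant label of their column.
purity = sum(hit for _, hit, _ in summary.values()) / n_samples

for col, (label, hit, total) in summary.items():
    print(f"{col}: {label} {hit}/{total}")
print(f"samples={n_samples} purity={purity:.3f}")
```

The column totals sum to 153, which matches the number of colon cancer samples stated in the abstract.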
Confusion matrix of subtypes identified by our approach and subtypes identified by Ren et al. [4].
| Subtype | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|
| | 40 | 17 | 26 | 14 | 3 | 0 | 10 |
| | 0 | 1 | 0 | 1 | 17 | 9 | 15 |
Figure 4. Sample cluster structure obtained by clustering the same data as in Figure 3 directly with the directed bubble hierarchical tree (DBHT). Labels and symbols are the same as in Figure 3.
Figure 5. Clustering by the consensus clustering algorithm for K = 2 to 10. (a) Cluster consensus values and consensus cumulative distribution function (CDF) on the mRNA gene expression dataset; (b) cluster consensus values and consensus CDF on component of the mRNA gene expression dataset.
Overall mean accuracy, overall mean weight accuracy, and mean accuracy for each class over 1000 repetitions of cross-validation using the naive Bayes (NB) algorithm on the feature gene sets.
| Cross Validation (%) | Accuracy (%) | Weight Accuracy (%) | Class 1 (%) | Class 2 (%) | Class 3 (%) | Class 4 (%) | Class 5 (%) | Class 6 (%) | Class 7 (%) | Class 8 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 82.71 | 84.98 | 75.70 | 82.50 | 91.17 | 88.50 | 81.83 | 90.50 | 70.67 | 99.00 |
| 20 | 81.76 | 82.58 | 76.72 | 74.75 | 78.30 | 72.25 | 90.35 | 83.00 | 86.17 | 99.08 |
| 30 | 81.12 | 81.90 | 75.90 | 78.23 | 76.50 | 71.40 | 87.86 | 82.83 | 82.50 | 100.00 |
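This record does not define the "Weight Accuracy" column. One observation that can be verified from the table itself is that, in every row, it equals the unweighted mean of the eight per-class accuracies to two decimals. The sketch below checks this arithmetic; it is a consistency check on the reported numbers, not the authors' stated definition, and the variable names are illustrative.

```python
# Rows copied from the table above:
# first column value -> (accuracy, weight accuracy, per-class accuracies for classes 1..8).
rows = {
    10: (82.71, 84.98, [75.70, 82.50, 91.17, 88.50, 81.83, 90.50, 70.67, 99.00]),
    20: (81.76, 82.58, [76.72, 74.75, 78.30, 72.25, 90.35, 83.00, 86.17, 99.08]),
    30: (81.12, 81.90, [75.90, 78.23, 76.50, 71.40, 87.86, 82.83, 82.50, 100.00]),
}

# Unweighted mean of the per-class accuracies for each row.
mean_per_class = {
    key: sum(per_class) / len(per_class)
    for key, (_, _, per_class) in rows.items()
}

for key, (accuracy, weight_accuracy, _) in rows.items():
    print(
        f"row {key}: accuracy={accuracy:.2f} "
        f"reported weight accuracy={weight_accuracy:.2f} "
        f"mean of class accuracies={mean_per_class[key]:.2f}"
    )
```

Because every class counts equally in such a mean, small classes influence it as much as large ones, which may explain why it differs from the overall accuracy in each row.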