Meng-Yun Wu, Dao-Qing Dai, Xiao-Fei Zhang, Yuan Zhu.
Abstract
In cancer biology, it is important to understand the phenotypic changes of patients and to discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem through gene expression profiles, which may contain outliers due to either chemical or electrical reasons. The undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways, and may be related to only a few interdependent biomarkers. This motivates the need for robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures and identifying cancer-related biomarkers. This study proposes a penalized model-based Student's t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account and providing robustness against outliers. Biomarker identification and network reconstruction are achieved by imposing an adaptive L1 penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm utilizing the graphical lasso. A network-based gene selection criterion is applied that identifies biomarkers not as individual genes but as subnetworks. This makes it possible to implicate weakly discriminative biomarkers that play a central role in a subnetwork by interconnecting many differentially expressed genes, or that have cluster-specific underlying network structures. Experimental results on simulated datasets and one available cancer dataset attest to the effectiveness and robustness of PMT-UC in cancer subtype discovery. Moreover, PMT-UC is able to select cancer-related biomarkers that have been verified in biochemical or biomedical research, and to learn biologically significant correlations among genes.
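The robustness to outliers described in the abstract comes from the heavy tails of the Student's t distribution. In the textbook EM treatment of a multivariate t model (which may differ in detail from the PMT-UC update), each observation receives a weight u = (ν + p) / (ν + δ), where p is the dimension and δ is the squared Mahalanobis distance to the cluster mean, so distant points contribute less to the parameter updates. The following is a minimal sketch of that weight only; the function name `t_weights` and the toy data are hypothetical.

```python
def t_weights(samples, mean, precision, nu):
    """Standard t-distribution EM weights u_i = (nu + p) / (nu + delta_i).

    delta_i is the squared Mahalanobis distance of sample i, computed with
    the inverse scale (precision) matrix. Illustrative sketch only.
    """
    p = len(mean)
    weights = []
    for x in samples:
        d = [xj - mj for xj, mj in zip(x, mean)]
        delta = sum(d[i] * precision[i][j] * d[j]
                    for i in range(p) for j in range(p))
        weights.append((nu + p) / (nu + delta))
    return weights


# Toy data (hypothetical): the last sample is a gross outlier.
samples = [[0.1, -0.2], [0.0, 0.3], [8.0, -7.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
w = t_weights(samples, mean=[0.0, 0.0], precision=identity, nu=6)
print([round(v, 3) for v in w])
```

The outlier receives a weight far below that of the two typical samples, which is the mechanism by which a t-based mixture limits the influence of aberrant expression values.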
Year: 2013 PMID: 23799085 PMCID: PMC3684607 DOI: 10.1371/journal.pone.0066256
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Summary of PMT-UC for discovering cancer subtypes, underlying network structures, and biomarkers.
The effect of the parameter on the performance of PMT-UC.
| Parameter | RI | aRI | SHD1 | SHD2 | FN | TN |
| 10⁻¹⁰ | 0.918 (0.045) | 0.836 (0.090) | 5.000 (0.798) | 4.565 (1.727) | 2.870 (1.792) | 89.609 (0.583) |
| 0.001 | 0.923 (0.048) | 0.846 (0.095) | 4.913 (0.949) | 4.826 (1.072) | 2.174 (1.557) | 89.565 (0.590) |
| 0.01 | 0.914 (0.049) | 0.828 (0.098) | 5.435 (1.472) | 5.043 (1.107) | 2.609 (2.210) | 89.174 (1.029) |
| 0.1 | 0.937 (0.034) | 0.873 (0.068) | 2.652 (1.229) | 2.522 (1.344) | 0.870 (0.968) | 90.000 (0.000) |
| 1 | 0.689 (0.192) | 0.380 (0.383) | 5.000 (0.000) | 5.261 (0.864) | 6.913 (2.575) | 88.478 (1.702) |
The effect of the parameter on the performance of PMT-UC is discussed in terms of the five measures RI, aRI, SHD, FN and TN, where SHD1 and SHD2 are the results for the first and second clusters respectively, FN is the number of informative variables incorrectly selected to be noninformative and TN is the number of noninformative variables correctly selected. In the true case, , .
c(d): c and d are the average and standard deviation of corresponding results in 50 simulations, respectively.
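The FN and TN counts defined above can be computed directly from the sets of truly informative and selected variables. The sketch below assumes variables are identified by integer indices; the function name `selection_counts` and the toy selection (100 variables, the first 10 informative) are hypothetical and not taken from the simulations.

```python
def selection_counts(true_informative, selected_informative, n_variables):
    """Return (FN, TN) for a variable-selection result.

    FN: informative variables that were not selected (wrongly treated as
        noninformative).
    TN: noninformative variables that were correctly left unselected.
    """
    true_set = set(true_informative)
    selected = set(selected_informative)
    fn = len(true_set - selected)
    noninformative = set(range(n_variables)) - true_set
    tn = len(noninformative - selected)
    return fn, tn


# Toy example: variables 8 and 9 are missed, variable 11 is a false selection.
fn, tn = selection_counts(range(10), [0, 1, 2, 3, 4, 5, 6, 7, 11], 100)
print(fn, tn)  # 2 89
```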
The convergence of PMT-UC with respect to different initializations.
| dataset | 1 | 2 | RI | aRI | SHD1 | SHD2 | FN | TN |
| 1 | 4.0 (0.0) | 3.6 (0.5) | 0.91 (0.04) | 0.83 (0.07) | 3.00 (1.58) | 2.00 (1.73) | 0.40 (0.89) | 90.00 (0.00) |
| 2 | 4.0 (0.0) | 4.0 (0.0) | 0.97 (0.01) | 0.94 (0.02) | 3.20 (1.48) | 3.00 (1.58) | 0.80 (0.45) | 90.00 (0.00) |
| 3 | 4.0 (0.0) | 3.8 (0.4) | 0.95 (0.01) | 0.89 (0.02) | 1.80 (0.45) | 3.60 (0.55) | 1.20 (0.45) | 90.00 (0.00) |
| 4 | 4.0 (0.0) | 4.0 (0.0) | 0.95 (0.02) | 0.89 (0.04) | 6.40 (1.34) | 4.00 (0.71) | 1.60 (0.89) | 89.80 (0.45) |
| 5 | 4.0 (0.0) | 4.0 (0.0) | 0.95 (0.00) | 0.90 (0.00) | 1.40 (0.55) | 3.60 (0.55) | 1.00 (0.00) | 90.00 (0.00) |
The convergence of PMT-UC is explored by considering the selected parameters and , and the experiment results RI, aRI, SHD, FN and TN, with respect to different initializations using K-means. SHD1 and SHD2 are the results for the first and second clusters respectively, FN is the number of informative variables incorrectly selected to be noninformative and TN is the number of noninformative variables correctly selected. In the true case, , .
c(d): c and d are the average and standard deviation of corresponding results in 10 experiments with a fixed dataset, respectively.
Comparison of performance of PMT-UC, PMG-UC and PMT-DC applied on binary-clusters simulated datasets.
| ν | Set-up | K | PMT-UC N | PMT-UC RI | PMT-UC aRI | PMT-UC FN | PMT-UC TN | PMG-UC N | PMG-UC RI | PMG-UC aRI | PMG-UC FN | PMG-UC TN | PMT-DC N | PMT-DC RI | PMT-DC aRI | PMT-DC FN | PMT-DC TN |
| 20 | 1 | 2 |  |  |  |  |  |  |  |  |  | 84.36 | 45 |  |  |  |  |
| 20 | 1 | 3 | 0 | – | – | – | – | 0 | – | – | – | – | 5 |  | 0.956 |  | 86.20 |
| 20 | 1 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 0 | – | – | – | – |
| 20 | 1 | A | 50 |  |  |  |  | 50 |  |  |  | 84.36 | 50 |  |  |  |  |
| 20 | 2 | 2 |  |  |  |  |  | 50 | 0.971 | 0.941 | 1.86 | 84.42 | 40 |  | 0.962 | 2.00 |  |
| 20 | 2 | 3 | 0 | – | – | – | – | 0 | – | – | – | – | 10 | 0.960 | 0.919 | 2.00 | 84.80 |
| 20 | 2 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 0 | – | – | – | – |
| 20 | 2 | A | 50 |  |  |  |  | 50 | 0.971 | 0.941 | 1.86 | 84.42 | 50 |  | 0.953 | 2.00 |  |
| 20 | 3 | 2 |  |  |  |  |  |  |  |  |  | 84.72 | 23 |  |  |  |  |
| 20 | 3 | 3 | 0 | – | – | – | – | 0 | – | – | – | – | 22 | 0.929 | 0.857 |  | 85.59 |
| 20 | 3 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 5 | 0.861 | 0.721 |  | 76.60 |
| 20 | 3 | A | 50 |  |  |  |  | 50 |  |  |  | 84.72 | 50 | 0.952 | 0.904 |  | 86.42 |
| 20 | 4 | 2 |  |  |  |  |  |  | 0.734 | 0.470 | 7.22 |  | 39 | 0.883 | 0.767 | 5.51 |  |
| 20 | 4 | 3 | 0 | – | – | – | – | 0 | – | – | – | – | 11 | 0.841 | 0.681 | 5.00 | 87.64 |
| 20 | 4 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 0 | – | – | – | – |
| 20 | 4 | A | 50 |  |  |  |  | 50 | 0.734 | 0.470 | 7.22 |  | 50 | 0.874 | 0.748 | 5.40 |  |
| 10 | 1 | 2 |  |  |  |  |  | 41 | 0.942 | 0.884 |  | 83.24 | 28 |  |  |  |  |
| 10 | 1 | 3 | 0 | – | – | – | – | 9 | 0.881 | 0.761 |  | 76.33 | 19 | 0.943 | 0.885 |  | 79.21 |
| 10 | 1 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 3 | 0.891 | 0.782 |  | 87.33 |
| 10 | 1 | A | 50 |  |  |  |  | 50 | 0.931 | 0.862 |  | 82.00 | 50 | 0.958 | 0.917 |  | 85.12 |
| 10 | 2 | 2 |  |  |  |  |  | 46 | 0.867 | 0.734 | 2.70 | 82.13 | 36 | 0.943 | 0.887 | 2.28 |  |
| 10 | 2 | 3 | 0 | – | – | – | – | 4 | 0.797 | 0.593 | 1.50 | 72.50 | 13 | 0.943 | 0.887 | 1.92 | 80.31 |
| 10 | 2 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 1 | 0.882 | 0.764 | 4.00 | 85.00 |
| 10 | 2 | A | 50 |  |  |  |  | 50 | 0.861 | 0.723 | 2.60 | 81.36 | 50 | 0.942 | 0.885 | 2.22 | 86.74 |
| 10 | 3 | 2 |  |  |  |  |  | 33 | 0.922 | 0.845 |  | 82.52 | 16 | 0.873 | 0.747 | 1.63 |  |
| 10 | 3 | 3 | 0 | – | – | – | – | 17 | 0.853 | 0.706 |  | 55.53 | 28 | 0.942 | 0.884 |  | 82.96 |
| 10 | 3 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 6 | 0.758 | 0.516 | 1.67 | 80.50 |
| 10 | 3 | A | 50 |  |  |  |  | 50 | 0.899 | 0.798 |  | 73.34 | 50 | 0.898 | 0.796 |  | 84.34 |
| 10 | 4 | 2 |  |  |  |  |  |  | 0.499 | 0.000 | 8.80 | 76.70 | 42 | 0.681 | 0.368 | 7.02 | 86.64 |
| 10 | 4 | 3 | 0 | – | – | – | – | 0 | – | – | – | – | 5 | 0.796 | 0.593 | 4.80 | 85.40 |
| 10 | 4 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 3 | 0.644 | 0.284 | 8.33 | 84.33 |
| 10 | 4 | A | 50 |  |  |  |  | 50 | 0.499 | 0.000 | 8.80 | 76.70 | 50 | 0.691 | 0.385 | 6.88 | 86.38 |
| 6 | 1 | 2 | 50 |  |  |  |  | 40 | 0.619 | 0.240 | 6.88 | 74.75 | 32 |  |  |  |  |
| 6 | 1 | 3 | 0 | – | – | – | – | 10 | 0.868 | 0.735 | 0.00 | 81.50 | 13 | 0.872 | 0.743 | 0.92 | 84.77 |
| 6 | 1 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 5 | 0.503 | 0.008 | 4.00 | 90.00 |
| 6 | 1 | A | 50 |  |  |  |  | 50 | 0.669 | 0.339 | 5.50 | 76.10 | 50 | 0.889 | 0.779 |  |  |
| 6 | 2 | 2 |  |  |  |  |  | 45 | 0.550 | 0.101 | 8.82 | 78.20 | 46 | 0.672 | 0.351 | 4.59 | 83.00 |
| 6 | 2 | 3 | 0 | – | – | – | – | 5 | 0.863 | 0.726 | 2.00 | 75.00 | 0 | – | – | – | – |
| 6 | 2 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 4 | 0.501 | 0.001 | 10.00 | 88.50 |
| 6 | 2 | A | 50 |  |  |  |  | 50 | 0.581 | 0.163 | 8.14 | 77.88 | 50 | 0.659 | 0.323 | 5.02 | 83.44 |
| 6 | 3 | 2 |  |  |  |  |  | 45 | 0.502 | 0.006 | 9.89 |  | 27 | 0.542 | 0.094 | 7.85 | 80.93 |
| 6 | 3 | 3 | 0 | – | – | – | – | 5 | 0.500 | 0.004 | 9.00 | 82.00 | 15 | 0.749 | 0.500 | 2.00 | 73.07 |
| 6 | 3 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 8 | 0.617 | 0.232 | 7.50 | 78.75 |
| 6 | 3 | A | 50 |  |  |  |  | 50 | 0.502 | 0.006 | 9.80 | 87.70 | 50 | 0.616 | 0.238 | 6.04 | 78.22 |
| 6 | 4 | 2 |  |  |  |  |  |  | 0.498 | 0.002 | 9.90 | 87.10 | 39 | 0.495 | 0.000 | 8.10 | 76.44 |
| 6 | 4 | 3 | 0 | – | – | – | – | 0 | – | – | – | – | 7 | 0.507 | 0.019 | 5.86 | 67.71 |
| 6 | 4 | 4/5 | 0 | – | – | – | – | 0 | – | – | – | – | 4 | 0.619 | 0.234 | 1.00 | 67.00 |
| 6 | 4 | A | 50 |  |  |  |  | 50 | 0.498 | 0.002 | 9.90 | 87.10 | 50 | 0.506 | 0.021 | 7.22 | 74.46 |
The clustering and gene selection results for the four set-ups with , in terms of the average frequencies (N) of the selected numbers of clusters (K), and the average of RI, aRI, FN, and TN in 50 simulations, where FN is the number of informative variables incorrectly selected to be noninformative and TN is the number of noninformative variables correctly selected. In the true case, , , . The table indicates in bold all results that perform best or that are not significantly different from each other.
4/5: denotes that K can be 4 or 5.
A: denotes that K can be any element of the set of predefined numbers of clusters.
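The "A" rows appear to be frequency-weighted averages of the per-K rows, with the selection counts N as weights. This reading is inferred from the numbers rather than stated in the source. As a check, the sketch below pools the PMT-DC entries for ν = 10, set-up 2 (K = 2, 3 and 4/5) and compares them with the reported "A" row; the helper name `pooled_average` is hypothetical.

```python
def pooled_average(rows):
    """N-weighted mean of a metric; rows is a list of (N, value) pairs."""
    total = sum(n for n, _ in rows)
    return sum(n * v for n, v in rows) / total


# PMT-DC, nu = 10, set-up 2: (N, value) for K = 2, K = 3 and K = 4/5.
ri = pooled_average([(36, 0.943), (13, 0.943), (1, 0.882)])
ari = pooled_average([(36, 0.887), (13, 0.887), (1, 0.764)])
fn = pooled_average([(36, 2.28), (13, 1.92), (1, 4.00)])
print(round(ri, 3), round(ari, 3), round(fn, 2))  # 0.942 0.885 2.22
```

The pooled values match the RI, aRI and FN reported in the corresponding "A" row (0.942, 0.885 and 2.22).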
Figure 2. Boxplots of structural Hamming distance (SHD) between correct and inferred networks.
On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. Results shown for PMT-UC, PMG-UC and PMT-DC in the four set-ups of three cases . SHD1 and SHD2 are the results for the first and second clusters, respectively.
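For undirected networks, a common way to compute the structural Hamming distance is to count the node pairs whose edge status differs between the true and the inferred graph. The sketch below uses that convention, which is an assumption here since the exact definition used for the boxplots is not given in this record; the function name and the 4-node toy graphs are hypothetical.

```python
def structural_hamming_distance(adj_true, adj_est):
    """Count node pairs (i < j) whose edge presence differs.

    Both arguments are symmetric adjacency matrices given as nested lists;
    any nonzero entry is treated as an edge.
    """
    p = len(adj_true)
    return sum(
        1
        for i in range(p)
        for j in range(i + 1, p)
        if bool(adj_true[i][j]) != bool(adj_est[i][j])
    )


# True edges: 0-1, 1-2, 2-3. Estimated edges: 0-1, 2-3, 0-3.
true_net = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
est_net = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
shd = structural_hamming_distance(true_net, est_net)
print(shd)  # 2: one missing edge (1-2) and one extra edge (0-3)
```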
Figure 3. Network reconstruction for simulated datasets with .
TRUE:1 and TRUE:2 are the parts of the original and corresponding to the first informative genes for the first and second clusters, respectively. PMT-UC:1 and PMT-UC:2 are the estimation of those parts of the inverse scale matrices using PMT-UC. PMG-UC:1 and PMG-UC:2 are the estimation of those parts of the inverse covariance matrices using PMG-UC.
Comparison of performance of PMT-UC, PMG-UC and PMT-DC applied on simulated datasets with multiple thin-tailed clusters.
| Method | K | N | FN | TN | RI | aRI | RI1 | aRI1 | RI2 | aRI2 |
| PMT-UC | 2 | – | – | – | – | – | – | – | – | – |
| PMT-UC | 3 | 50 | 0.00 | 90.00 | 0.639 | 0.347 | 1.000 | 1.000 | 0.494 | 0.000 |
| PMT-UC | 4 | – | – | – | – | – | – | – | – | – |
| PMT-UC | 5 | – | – | – | – | – | – | – | – | – |
| PMT-UC | 6 | – | – | – | – | – | – | – | – | – |
| PMT-UC | A | 50 | 0.00 | 90.00 | 0.639 | 0.347 | 1.000 | 1.000 | 0.494 | 0.000 |
| PMG-UC | 2 | – | – | – | – | – | – | – | – | – |
| PMG-UC | 3 | – | – | – | – | – | – | – | – | – |
| PMG-UC | 4 | 8 | 0.00 | 89.00 | 0.794 | 0.445 | 1.000 | 1.000 | 0.495 | 0.000 |
| PMG-UC | 5 | 42 | 0.00 | 89.62 | 0.793 | 0.489 | 0.993 | 0.984 | 0.497 | 0.000 |
| PMG-UC | 6 | – | – | – | – | – | – | – | – | – |
| PMG-UC | A | 50 | 0.00 | 89.52 | 0.793 | 0.482 | 0.995 | 0.987 | 0.497 | 0.000 |
| PMT-DC | 2 | – | – | – | – | – | – | – | – | – |
| PMT-DC | 3 | – | – | – | – | – | – | – | – | – |
| PMT-DC | 4 | 39 | 0.00 | 88.77 | 0.794 | 0.564 | 1.000 | 1.000 | 0.496 | 0.000 |
| PMT-DC | 5 | 11 | 0.00 | 88.00 | 0.797 | 0.506 | 1.000 | 1.000 | 0.502 | 0.000 |
| PMT-DC | 6 | – | – | – | – | – | – | – | – | – |
| PMT-DC | A | 50 | 0.00 | 88.60 | 0.794 | 0.551 | 1.000 | 1.000 | 0.495 | 0.000 |
The comparison of performance of PMT-UC, PMG-UC and PMT-DC applied on simulated datasets with multiple thin-tailed clusters, in terms of the average frequencies (N) of the selected numbers of clusters (K), and the average of RI, aRI, FN, and TN in 50 simulations. RI1 and RI2 are the RI with respect to the first two clusters and the last three clusters, respectively. aRI1 and aRI2 are the aRI with respect to the first two clusters and the last three clusters, respectively. In the true case, , , .
A: denotes that K can be any element of the set of predefined numbers of clusters.
Optimal clustering results for the leukemia dataset.
| Clusters (#Samples) | PMT-UC 1 | PMT-UC 2 | PMT-UC 3 | PMG-UC 1 | PMG-UC 2 | PMG-UC 3 | PMT-DC 1 | PMT-DC 2 | PMT-DC 3 |
| ALL-B(38) | 37 | 1 | 0 | 37 | 2 | 0 | 24 | 14 | 0 |
| ALL-T(9) | 0 | 8 | 1 | 0 | 8 | 1 | 8 | 0 | 1 |
| AML(25) | 1 | 0 | 24 | 2 | 0 | 24 | 0 | 0 | 25 |
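A contingency table like the one above is enough to compute the Rand index (RI) and adjusted Rand index (aRI) used elsewhere in this record. The sketch below applies the standard pair-counting formulas to the PMT-UC columns only; the function name is hypothetical, and the resulting values are illustrative calculations that are not reported in the source.

```python
from math import comb


def rand_indices(table):
    """Return (RI, aRI) from a contingency table of true class vs. cluster."""
    n = sum(sum(row) for row in table)
    total_pairs = comb(n, 2)
    same_both = sum(comb(v, 2) for row in table for v in row)
    same_class = sum(comb(sum(row), 2) for row in table)
    same_cluster = sum(comb(sum(col), 2) for col in zip(*table))
    # Agreements: pairs together in both partitions or apart in both.
    ri = (total_pairs + 2 * same_both - same_class - same_cluster) / total_pairs
    expected = same_class * same_cluster / total_pairs
    max_index = 0.5 * (same_class + same_cluster)
    ari = (same_both - expected) / (max_index - expected)
    return ri, ari


# Rows: ALL-B, ALL-T, AML; columns: PMT-UC clusters 1, 2, 3.
pmt_uc = [[37, 1, 0], [0, 8, 1], [1, 0, 24]]
ri, ari = rand_indices(pmt_uc)
print(round(ri, 3), round(ari, 3))
```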
Figure 4. The subnetworks for ALL-B and AML of the leukemia dataset estimated by PMT-UC.
Nodes represent human genes, and they are connected by a link if their partial correlation derived from is larger than . Each gene is labeled by its Gene Symbol (see Text S5 for the detailed information of the genes in each subnetwork). The shape of each node indicates whether the gene has cluster-specific means (circle) or not (diamond).
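The partial correlation between two genes can be read off an inverse scale (precision) matrix Ω with the standard identity ρ_ij = −ω_ij / sqrt(ω_ii ω_jj). The sketch below links two nodes when the absolute partial correlation exceeds a cutoff. Both the use of the absolute value and the cutoff itself are assumptions, since the threshold is not recoverable from this record; the function name and the 3 × 3 toy matrix are hypothetical.

```python
import math


def partial_correlation_edges(precision, threshold):
    """List (i, j, rho) for pairs whose |partial correlation| > threshold."""
    p = len(precision)
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            rho = -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            if abs(rho) > threshold:
                edges.append((i, j, rho))
    return edges


# Toy precision matrix: genes 0 and 1 are strongly linked, 1 and 2 only weakly.
omega = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -0.1],
         [0.0, -0.1, 1.0]]
edges = partial_correlation_edges(omega, threshold=0.2)
print(edges)  # [(0, 1, 0.5)]
```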
The Gene Ontology results of the subnetwork for ALL-B of leukemia dataset.
| Subnetwork | Elements | GO Number | Ontology Description | P-value |
| ALL-B-1 | HLA-F HLA-DRB1 HLA-DRB5 CD74 HLA-DPB1 HLA-DPA1 HLA-DQA1 HLA-DRB1 | GO:0071556 | integral to lumenal side of endoplasmic reticulum membrane | 1.1×10⁻²⁰ |
| ALL-B-1 | HLA-F HLA-DRB1 HLA-DRB5 CD74 HLA-DPB1 HLA-DPA1 HLA-DQA1 HLA-DRB1 | GO:0012507 | ER to Golgi transport vesicle membrane | 1.1×10⁻¹⁹ |
| ALL-B-1 | CD74 HLA-DMA HLA-DRB1 HLA-DRB5 HLA-DPB1 HLA-DPA1 HLA-DQA1 HLA-DRB1 | GO:0019886 | antigen processing and presentation of exogenous peptide antigen via MHC class II | 2.8×10⁻¹⁹ |
| ALL-B-1 | CD74 HLA-DMA HLA-DRB1 HLA-DRB5 HLA-DPB1 HLA-DPA1 HLA-DQA1 HLA-DRB1 | GO:0005765 | lysosomal membrane | 6.4×10⁻¹⁸ |
| ALL-B-2 | IGKC IGLC3 IGHG3 IGHA1 | GO:0003823 | antigen binding | 6.2×10⁻¹⁰ |
| ALL-B-2 | IGKC IGLC3 IGHG3 | GO:0006958 | complement activation, classical pathway | 1.6×10⁻⁶ |
| ALL-B-3 | The entire subnetwork | GO:0002474 | antigen processing and presentation of peptide antigen via MHC class I | 1.3×10⁻¹⁰ |
| ALL-B-4 | HBB SLC4A1 | GO:0015701 | bicarbonate transport | 6.8×10⁻⁶ |
| ALL-B-4 | ALAS2 BLVRB | GO:0042168 | heme metabolic process | 1.5×10⁻⁵ |
| ALL-B-4 | SLC4A1 NEFL | GO:0008022 | protein C-terminus binding | 8.0×10⁻⁵ |
| ALL-B-5 | EEF1B2 RPL35A | GO:0044444 | cytoplasmic part | 1.3×10⁻⁴ |
| ALL-B-5 | COX7C COX4I1 | GO:0004129 | cytochrome-c oxidase activity | 1.9×10⁻⁵ |
| ALL-B-6 | The entire subnetwork | GO:0050832 | defense response to fungus | 1.5×10⁻⁸ |
| ALL-B-6 | The entire subnetwork | GO:0042742 | defense response to bacterium | 1.1×10⁻⁵ |
| ALL-B-6 | S100A9 S100A8 | GO:0070488 | neutrophil leukocyte aggregation | 1.7×10⁻⁷ |
| ALL-B-6 | S100A9 S100A8 | GO:0002523 | leukocyte migration involved in inflammatory response | 3.7×10⁻⁶ |
| ALL-B-7 | DUSP1 FOS | GO:0051592 | response to calcium ion | 1.6×10⁻³ |
| ALL-B-7 | DUSP1 FOS | GO:0051591 | response to cAMP | 1.6×10⁻³ |
| ALL-B-8 | The entire subnetwork | GO:0015671 | oxygen transport | 2.3×10⁻⁶ |
| ALL-B-8 | The entire subnetwork | GO:0031720 | haptoglobin binding | 2.1×10⁻⁸ |
| ALL-B-8 | The entire subnetwork | GO:0004601 | peroxidase activity | 4.3×10⁻⁵ |
The first column (Subnetwork) reports the name of the subnetwork introduced in Figure 4. The second column (Elements) lists the elements of the subnetwork whose functional and biological relationships are analyzed based on the GO annotation.
The Gene Ontology results of the subnetwork for AML of leukemia dataset.
| Subnetwork | Elements | GO Number | Ontology Description | P-value |
| AML-1 | IGKC IGLC3 IGHG3 | GO:0006958 | complement activation, classical pathway | 1.6×10⁻⁶ |
| AML-2 | FOSB JUNB | GO:0071277 | cellular response to calcium ion | 4.3×10⁻⁵ |
| AML-3 | IFI30 FCER1G | GO:0019886 | antigen processing and presentation of exogenous peptide antigen via MHC class II | 2.0×10⁻⁴ |
| AML-3 | IFI30 FCER1G | GO:0042590 | antigen processing and presentation of exogenous peptide antigen via MHC class I | 3.8×10⁻⁴ |
| AML-3 | FCER1G CTSB | GO:0009897 | external side of plasma membrane | 4.0×10⁻⁴ |
| AML-3 | IFI30 CTSB | GO:0043202 | lysosomal lumen | 6.2×10⁻⁵ |
| AML-4 | The entire subnetwork | GO:0008009 | chemokine activity | 1.4×10⁻⁵ |
| AML-5 | SLC4A1 HBB | GO:0015701 | bicarbonate transport | 6.1×10⁻⁶ |
| AML-6 | CCL3 CCL4 | GO:0031730 | CCR5 chemokine receptor binding | 1.6×10⁻⁷ |
| AML-6 | CCL3 CCL4 | GO:0031726 | CCR1 chemokine receptor binding | 1.6×10⁻⁷ |
| AML-7 | The entire network | GO:0070488 | neutrophil leukocyte aggregation | 1.7×10⁻⁷ |
| AML-7 | The entire network | GO:0002523 | leukocyte migration involved in inflammatory response | 3.7×10⁻⁶ |
The first column (Subnetwork) reports the name of the subnetwork introduced in Figure 4. The second column (Elements) lists the elements of the subnetwork whose functional and biological relationships are analyzed based on the GO annotation.