Sebastian J Teran Hidalgo (1), Mengyun Wu (1, 2), Shuangge Ma (3, 4).
Abstract
BACKGROUND: In biomedical research, gene expression profiling studies have been extensively conducted. The analysis of gene expression data has led to a deeper understanding of human genetics as well as to practically useful models. Clustering analysis has been a critical component of gene expression data analysis and can reveal previously unknown interconnections among genes. Because gene expression data are high-dimensional, many existing clustering methods produce unsatisfactory results. Intuitively, this is caused by "a lack of information". In recent profiling studies, a prominent trend is to collect data on gene expression as well as on its regulators (copy number alteration, microRNA, methylation, etc.) from the same subjects, making it possible to borrow information from other types of omics measurements in gene expression analysis.
Keywords: Assisted analysis; Clustering; Gene expression data
Year: 2017 PMID: 28814280 PMCID: PMC5559859 DOI: 10.1186/s12864-017-3990-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Toy example. A toy example with two clusters, represented using different colors. Circles and squares represent GEs (gene expressions) and CNAs (copy number alterations), respectively. The thickness of a line represents the degree of similarity. Left: the true structure, which is also the structure recovered by the proposed approach. Right: the structure recovered by K-means.
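As a hedged illustration of the similarity-graph view in Fig. 1, the sketch below implements the standard two-way normalized cut relaxation of Shi and Malik, i.e. the plain NCut baseline that appears in the simulation tables, not ANCut itself; how ANCut brings the CNA information into the graph is not described in this record. The Gaussian similarity, its bandwidth `sigma`, and the toy data are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_similarity(features, sigma=1.0):
    """Pairwise Gaussian (RBF) similarity between the rows of `features`."""
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def normalized_cut_two_way(similarity):
    """Split nodes into two groups with the spectral relaxation of NCut.

    Takes the second-smallest eigenvector of the symmetric normalized
    Laplacian I - D^{-1/2} W D^{-1/2}, maps it back with D^{-1/2},
    and assigns nodes to groups by its sign.
    """
    degree = similarity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(degree)) - d_inv_sqrt[:, None] * similarity * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Hypothetical toy data: two well-separated groups of 10 points each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.3, size=(10, 2))
group_b = rng.normal(loc=3.0, scale=0.3, size=(10, 2))
features = np.vstack([group_a, group_b])
labels = normalized_cut_two_way(gaussian_similarity(features))
print(labels)
```

Which group receives label 1 depends on the arbitrary sign of the eigenvector, so only the partition itself is meaningful.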
Fig. 2. Heatmap of simulated GEs. Heatmaps of GEs for one simulated replicate. Left: using the observed GE values. Right: using the predicted GE values, where the two-cluster structure is more clearly seen.
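Fig. 2 suggests that predicted GE values expose the cluster structure more clearly than the observed ones. This record does not say how the predictions are produced, so the following is only a hypothetical sketch: each GE is regressed on all CNAs with ridge regression (an assumed choice; `penalty` is an illustrative tuning parameter), and the fitted values replace the observed ones. The simulated data, with two groups of genes each driven by one CNA, are likewise invented for illustration.

```python
import numpy as np


def ridge_predicted_expression(ge, cna, penalty=1.0):
    """Fitted values from a ridge regression of every GE column on all CNA columns.

    ge:  (n_subjects, n_genes) gene expression matrix
    cna: (n_subjects, n_cnas) copy number alteration matrix
    """
    ge_mean = ge.mean(axis=0)
    # Centering both matrices removes the need for an intercept term.
    ge_c = ge - ge_mean
    cna_c = cna - cna.mean(axis=0)
    gram = cna_c.T @ cna_c + penalty * np.eye(cna_c.shape[1])
    coefs = np.linalg.solve(gram, cna_c.T @ ge_c)  # shape (n_cnas, n_genes)
    return cna_c @ coefs + ge_mean


def mean_within_group_corr(values, group_size):
    """Average off-diagonal correlation among genes in consecutive blocks of columns."""
    corr = np.corrcoef(values, rowvar=False)
    off_diag = ~np.eye(group_size, dtype=bool)
    block_means = [
        corr[i:i + group_size, i:i + group_size][off_diag].mean()
        for i in range(0, corr.shape[0], group_size)
    ]
    return float(np.mean(block_means))


rng = np.random.default_rng(1)
n_subjects, n_cnas, group_size = 60, 4, 5
cna = rng.normal(size=(n_subjects, n_cnas))
# Hypothetical truth: genes 0-4 follow CNA 0, genes 5-9 follow CNA 1, plus noise.
signal = np.hstack([np.repeat(cna[:, [0]], group_size, axis=1),
                    np.repeat(cna[:, [1]], group_size, axis=1)])
ge = signal + rng.normal(scale=1.0, size=signal.shape)
predicted = ridge_predicted_expression(ge, cna, penalty=1.0)

observed_corr = mean_within_group_corr(ge, group_size)
predicted_corr = mean_within_group_corr(predicted, group_size)
print(f"within-group correlation: observed {observed_corr:.2f}, predicted {predicted_corr:.2f}")
```

Ridge is used here only because it stays well defined when there are more CNAs than subjects; a sparse or otherwise penalized regression could be substituted.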
Simulation: mean M measures over 100 replicates

|  |  |  | ANCut | ANCut | K-means |  |  |  | ANCut | ANCut | K-means |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 100 | 0.10 | 0% | 17% | 27.8% | 100 | 100 | 0.20 | 0% | 3.6% | 2.5% |
| 100 | 150 | 0.10 | 0% | 6.5% | 7.2% | 100 | 150 | 0.20 | 0% | 0.2% | 0.1% |
| 100 | 200 | 0.10 | 0% | 1.9% | 1.6% | 100 | 200 | 0.20 | 0% | 0.05% | 0.01% |
| 150 | 100 | 0.10 | 0% | 11.5% | 20.9% | 150 | 100 | 0.20 | 0% | 1.3% | 1.1% |
| 150 | 150 | 0.10 | 0% | 3% | 3.2% | 150 | 150 | 0.20 | 0% | 0.02% | 0.02% |
| 150 | 200 | 0.10 | 0% | 0.4% | 0.4% | 150 | 200 | 0.20 | 0% | 0.01% | 0% |
| 200 | 100 | 0.10 | 0% | 8.3% | 14.3% | 200 | 100 | 0.20 | 0% | 0.06% | 0.08% |
| 200 | 150 | 0.10 | 0% | 1.6% | 1.8% | 200 | 150 | 0.20 | 0% | 0.01% | 0.01% |
| 200 | 200 | 0.10 | 0% | 0.02% | 0.02% | 200 | 200 | 0.20 | 0% | 0% | 0% |

Simulation under coefficient setting C1 with h=0.15 and ρ=0.20: mean values based on 100 replicates

| Parameters |  |  |  | ANCut | NCut | K-means | Spec. | ANCut | NCut | K-means | Spec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 500 | 250 | 3 | 44.8% | 49.8% | 49.8% | 49.7% | 48% | 48.3% | 49.5% | 42.5% |
| 200 | 500 | 250 | 6 | 27.7% | 41.8% | 41.9% | 36.8% | 46.2% | 47.1% | 49.3% | 33.8% |
| 200 | 500 | 500 | 3 | 45% | 49.7% | 49.8% | 49.7% | 28.4% | 44.3% | 47.7% | 42.8% |
| 200 | 500 | 500 | 6 | 28.4% | 42.4% | 41.8% | 37.4% | 17.6% | 40.6% | 42.2% | 34.5% |
| 400 | 500 | 250 | 3 | 38.2% | 49.5% | 49.7% | 49.5% | 47.7% | 45.2% | 49.6% | 43.5% |
| 400 | 500 | 250 | 6 | 16.8% | 25% | 25.5% | 23.9% | 41.6% | 40.7% | 49% | 24% |
| 400 | 500 | 500 | 3 | 38.4% | 40.4% | 49.7% | 49.5% | 25.3% | 30.1% | 48.1% | 43.3% |
| 400 | 500 | 500 | 6 | 16.8% | 23.7% | 25.2% | 24.5% | 12.4% | 25.7% | 24.4% | 24.3% |
| 200 | 800 | 400 | 3 | 45.2% | 49.8% | 49.8% | 49.7% | 48.4% | 46% | 49.8% | 43.3% |
| 200 | 800 | 400 | 6 | 18.2% | 33.3% | 20.4% | 33.9% | 42.9% | 33.7% | 47.3% | 30.1% |
| 200 | 800 | 800 | 3 | 45.4% | 49.7% | 49.8% | 49.7% | 32.8% | 48.2% | 48.1% | 42.4% |
| 200 | 800 | 800 | 6 | 29.3% | 36.5% | 34.2% | 33.7% | 22.8% | 36.6% | 29.6% | 30% |
| 400 | 800 | 400 | 3 | 39.6% | 49.7% | 49.6% | 48.8% | 47.8% | 48.1% | 49.7% | 44.1% |
| 400 | 800 | 400 | 6 | 29.4% | 22.6% | 34.1% | 20.3% | 46.7% | 23.1% | 49.2% | 20.3% |
| 400 | 800 | 800 | 3 | 39.5% | 49.8% | 49.4% | 48.7% | 28.8% | 48.9% | 48.4% | 44.2% |
| 400 | 800 | 800 | 6 | 27.9% | 22.6% | 34.2% | 21.1% | 19.7% | 19.4% | 24.5% | 18.3% |

Simulation under coefficient setting C1 with h=0.15 and ρ=0.40: mean values based on 100 replicates

| Parameters |  |  |  | ANCut | NCut | K-means | Spec. | Aug-K | ANCut | NCut | K-means | Spec. | Aug-K |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 500 | 250 | 3 | 40.6% | 49.1% | 49.3% | 48.3% | 99.7% | 47.6% | 48.1% | 49.5% | 41.7% | 1% |
| 200 | 500 | 250 | 6 | 17.4% | 20.8% | 20.9% | 20.6% | 46.1% | 42.3% | 36.1% | 47.4% | 20.7% | 21.6% |
| 200 | 500 | 500 | 3 | 40.7% | 49.6% | 49.1% | 48.5% | 99.5% | 29.4% | 48.3% | 47.2% | 42.6% | 1% |
| 200 | 500 | 500 | 6 | 17.2% | 20.3% | 20.3% | 20.6% | 58.3% | 12.5% | 14.8% | 15.8% | 20.6% | 22.3% |
| 400 | 500 | 250 | 3 | 30% | 48.1% | 47.5% | 43.1% | 99.9% | 46.7% | 47.8% | 49.6% | 40.2% | 0.1% |
| 400 | 500 | 250 | 6 | 7.4% | 19.6% | 9% | 9% | 24.9% | 33.6% | 34.7% | 39.4% | 11.5% | 20.8% |
| 400 | 500 | 500 | 3 | 30.8% | 48.5% | 47.7% | 43.6% | 99.9% | 20.9% | 37.5% | 47.4% | 39.9% | 0.5% |
| 400 | 500 | 500 | 6 | 7.6% | 9% | 9.4% | 9.4% | 30.7% | 6.9% | 6.7% | 7.7% | 11.8% | 23% |
| 200 | 800 | 400 | 3 | 41% | 49.7% | 48.2% | 45.8% | 99.7% | 47.9% | 49.5% | 49.5% | 41.2% | 0.4% |
| 200 | 800 | 400 | 6 | 18.5% | 20.9% | 18.8% | 18.7% | 52% | 43.2% | 22.5% | 44.7% | 18.6% | 20.7% |
| 200 | 800 | 800 | 3 | 41.4% | 49.3% | 48.4% | 45.8% | 99.6% | 32.6% | 46.8% | 47.1% | 40.5% | 1% |
| 200 | 800 | 800 | 6 | 18.6% | 18.4% | 19% | 18.6% | 62% | 13.4% | 11.9% | 16.4% | 18.8% | 20% |
| 400 | 800 | 400 | 3 | 31.5% | 49.7% | 42% | 37.1% | 99.9% | 47% | 46.8% | 49.6% | 33.9% | 0.1% |
| 400 | 800 | 400 | 6 | 9.4% | 6.9% | 7.9% | 8% | 27.4% | 35.2% | 14.5% | 35.6% | 10.3% | 18.7% |
| 400 | 800 | 800 | 3 | 32.5% | 49.4% | 43% | 39.1% | 99.9% | 11.7% | 43.6% | 7.7% | 10.2% | 0.1% |
| 400 | 800 | 800 | 6 | 9.3% | 7.6% | 8% | 8.2% | 31.3% | 10.9% | 4.1% | 6.6% | 10.2% | 9.1% |

Simulation under coefficient setting C1 with h=0.25 and ρ=0.20: mean values based on 100 replicates

| Parameters |  |  |  | ANCut | NCut | K-means | Spec. | ANCut | NCut | K-means | Spec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 500 | 250 | 3 | 34% | 47.5% | 47.6% | 45.1% | 47.2% | 45.4% | 49.5% | 40.3% |
| 200 | 500 | 250 | 6 | 11.8% | 13% | 13.7% | 13.6% | 39.1% | 14.1% | 43% | 15.6% |
| 200 | 500 | 500 | 3 | 33.9% | 47.5% | 47.6% | 44.9% | 21.2% | 38.5% | 46.5% | 39.9% |
| 200 | 500 | 500 | 6 | 12.3% | 14.2% | 14.2% | 14% | 9.9% | 11.8% | 10.6% | 15.2% |
| 400 | 500 | 250 | 3 | 24% | 38.5% | 40.5% | 34.6% | 44.5% | 37.4% | 49.6% | 32.3% |
| 400 | 500 | 250 | 6 | 4.4% | 4.6% | 4.8% | 4.8% | 28% | 15.3% | 31.3% | 7.1% |
| 400 | 500 | 500 | 3 | 23.8% | 37.8% | 40.6% | 34.7% | 15.4% | 30.3% | 40.9% | 32.3% |
| 400 | 500 | 500 | 6 | 4.4% | 4.7% | 5% | 5.1% | 4.6% | 12.1% | 4.5% | 7.3% |
| 200 | 800 | 400 | 3 | 35% | 45.7% | 43.8% | 39.6% | 47.4% | 43.4% | 49.5% | 34.8% |
| 200 | 800 | 400 | 6 | 14.2% | 12.9% | 12.8% | 12.8% | 40.3% | 43.1% | 39.9% | 14.5% |
| 200 | 800 | 800 | 3 | 35.3% | 45.6% | 44.2% | 40.3% | 27% | 43.1% | 42.9% | 35.3% |
| 200 | 800 | 800 | 6 | 14.4% | 13.8% | 13.1% | 12.8% | 15% | 12% | 9.7% | 14.3% |
| 400 | 800 | 400 | 3 | 25.3% | 44.4% | 30.8% | 29.3% | 45.4% | 43.1% | 49.3% | 26.1% |
| 400 | 800 | 400 | 6 | 6.6% | 3.6% | 4.6% | 4.3% | 30.7% | 14.9% | 29.8% | 6.5% |
| 400 | 800 | 800 | 3 | 25.3% | 32.1% | 30.9% | 29.4% | 19.6% | 21.1% | 25.5% | 26.2% |
| 400 | 800 | 800 | 6 | 6.5% | 3.9% | 4.4% | 4.6% | 8.6% | 8.4% | 3.9% | 6.5% |

Simulation under coefficient setting C1 with h=0.25 and ρ=0.40: mean values based on 100 replicates

| Parameters |  |  |  | ANCut | NCut | K-means | Spec. | ANCut | NCut | K-means | Spec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 500 | 250 | 3 | 24.8% | 29.8% | 32% | 29.5% | 45.1% | 44.8% | 49% | 26.7% |
| 200 | 500 | 250 | 6 | 4.5% | 4.6% | 5% | 4.6% | 29.6% | 21.3% | 30.1% | 6.5% |
| 200 | 500 | 500 | 3 | 25.6% | 28.3% | 31.8% | 29.8% | 15.9% | 17.7% | 27.5% | 26.7% |
| 200 | 500 | 500 | 6 | 4.4% | 4.5% | 5% | 4.7% | 4.2% | 6.8% | 4.8% | 6.6% |
| 400 | 500 | 250 | 3 | 14.3% | 16.5% | 18.3% | 17.3% | 38.9% | 43.5% | 46.5% | 17.4% |
| 400 | 500 | 250 | 6 | 1% | 1.1% | 1.1% | 1.1% | 16% | 11.5% | 16.8% | 1.9% |
| 400 | 500 | 500 | 3 | 14.9% | 20.4% | 18.7% | 17.7% | 10.6% | 23.1% | 14.2% | 17.8% |
| 400 | 500 | 500 | 6 | 1% | 1.3% | 1.3% | 1% | 1.1% | 1.3% | 1.8% | 1.9% |
| 200 | 800 | 400 | 3 | 25.8% | 28.2% | 27.1% | 26.5% | 45.7% | 43.4% | 47.8% | 23.7% |
| 200 | 800 | 400 | 6 | 7.3% | 4.9% | 4.3% | 4% | 30.6% | 16.7% | 28.9% | 6.1% |
| 200 | 800 | 800 | 3 | 26.2% | 26.5% | 27.7% | 26.8% | 20.3% | 23% | 19.6% | 23.7% |
| 200 | 800 | 800 | 6 | 7.1% | 4.4% | 4% | 3.9% | 25.4% | 10.6% | 20.1% | 5.1% |
| 400 | 800 | 400 | 3 | 16.1% | 16.4% | 15.9% | 15.7% | 40.5% | 31.5% | 42.6% | 16% |
| 400 | 800 | 400 | 6 | 4.6% | 1.2% | 1% | 0.9% | 19% | 5% | 15.4% | 1.7% |
| 400 | 800 | 800 | 3 | 16.2% | 17.4% | 15.9% | 16% | 14.7% | 14.1% | 11.2% | 15.8% |
| 400 | 800 | 800 | 6 | 3.6% | 1% | 0.9% | 1% | 5.8% | 5.4% | 0.9% | 1.7% |

Simulation under coefficient setting C2 with ρ=0.40: mean values based on 100 replicates

| Parameters |  |  |  | ANCut | NCut | K-means | Spec. | ANCut | NCut | K-means | Spec. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 500 | 250 | 3 | 49.9% | 49.9% | 50% | 49.9% | 31.2% | 32.6% | 47.9% | 33.7% |
| 200 | 500 | 250 | 6 | 41.6% | 42.4% | 48.5% | 46.2% | 24.1% | 27.1% | 46.3% | 37.9% |
| 200 | 500 | 500 | 3 | 49.2% | 49.5% | 50% | 50% | 31.2% | 32.1% | 47.9% | 33.7% |
| 200 | 500 | 500 | 6 | 41.9% | 43.9% | 48.7% | 44.1% | 24.3% | 20.1% | 46.6% | 17.9% |
| 400 | 500 | 250 | 3 | 46.3% | 47.6% | 49.9% | 48.8% | 28.9% | 30.1% | 47.9% | 41.6% |
| 400 | 500 | 250 | 6 | 33.6% | 34.5% | 39.4% | 40.4% | 15.2% | 19.3% | 17.9% | 16.2% |
| 400 | 500 | 500 | 3 | 46.2% | 48% | 49.8% | 47.1% | 28.3% | 31.3% | 47.9% | 31.4% |
| 400 | 500 | 500 | 6 | 34.2% | 34% | 35.3% | 33.9% | 16.5% | 17.4% | 19.6% | 14.1% |
| 200 | 800 | 400 | 3 | 48.6% | 49.3% | 49.9% | 49.3% | 32.8% | 35.1% | 47.6% | 45.9% |
| 200 | 800 | 400 | 6 | 42.3% | 43.3% | 46.3% | 44.3% | 29.3% | 30.1% | 41.8% | 33.8% |
| 200 | 800 | 800 | 3 | 48.6% | 49% | 49.9% | 48.1% | 32.5% | 36.7% | 47.9% | 29.4% |
| 200 | 800 | 800 | 6 | 42.1% | 39.3% | 46.5% | 43.9% | 29.4% | 24.1% | 42.8% | 26.5% |
| 400 | 800 | 400 | 3 | 46.5% | 48.3% | 49.8% | 47.4% | 33.3% | 45.2% | 48.5% | 47.5% |
| 400 | 800 | 400 | 6 | 37.5% | 34.4% | 40.2% | 37.2% | 23.6% | 22.1% | 27.9% | 20.7% |
| 400 | 800 | 800 | 3 | 46.9% | 48.5% | 49.7% | 48.1% | 31% | 30% | 48.1% | 31.5% |
| 400 | 800 | 800 | 6 | 37.7% | 36.1% | 40% | 36.2% | 24.4% | 23.6% | 27.2% | 29.1% |

Fig. 3. Functional modes. Analysis of TCGA data using ANCut (left) and K-means (right): the functional modes of the clusters.
Fig. 4. All GO processes. Analysis of TCGA data using ANCut: proportions of genes with a certain GO process in the four clusters.
Fig. 5. Selected GO processes. Analysis of TCGA data using ANCut: proportions of genes with a certain GO process in the four clusters.
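Figs. 4 and 5 summarize, for the four ANCut clusters, the proportions of genes carrying a given GO process. One reading of that summary is sketched below with plain Python dictionaries, assuming the denominator is the number of genes in each cluster (the caption could also mean the share of a process's genes that fall in each cluster). The gene identifiers and annotations in the example are hypothetical.

```python
from collections import Counter, defaultdict


def go_proportions(cluster_of_gene, go_terms_of_gene):
    """Return {cluster: {go_term: fraction of the cluster's genes annotated with go_term}}.

    Genes with no annotation still count toward their cluster's size.
    Clusters in which no gene is annotated are omitted from the result.
    """
    cluster_sizes = Counter(cluster_of_gene.values())
    term_counts = defaultdict(Counter)
    for gene, cluster in cluster_of_gene.items():
        # set() so that a term listed twice for one gene is counted once.
        for term in set(go_terms_of_gene.get(gene, ())):
            term_counts[cluster][term] += 1
    return {
        cluster: {term: count / cluster_sizes[cluster] for term, count in counts.items()}
        for cluster, counts in term_counts.items()
    }


# Hypothetical example: four genes in two clusters.
clusters = {"GENE_A": 1, "GENE_B": 1, "GENE_C": 2, "GENE_D": 2}
annotations = {
    "GENE_A": ["apoptotic process"],
    "GENE_B": ["apoptotic process", "cell cycle"],
    "GENE_C": ["cell cycle"],
}
print(go_proportions(clusters, annotations))
# {1: {'apoptotic process': 1.0, 'cell cycle': 0.5}, 2: {'cell cycle': 0.5}}
```

Under the alternative reading, the same counts would instead be divided by the total number of annotated genes for each GO process.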