Zura Kakushadze, Willie Yu.
Abstract
We present the *K-means clustering algorithm and source code by expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means' computational cost is a fraction of NMF's. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
Keywords: Cancer signatures; Clustering; Genome; K-means; Machine learning; Nonnegative matrix factorization; Sample; Somatic mutation; Source code; eRank
Year: 2017 PMID: 29021969 PMCID: PMC5634820 DOI: 10.1016/j.bdq.2017.07.001
Source DB: PubMed Journal: Biomol Detect Quantif
Fig. 1. Horizontal axis: serial standard deviation for N = 96 mutation categories (i = 1, …, N) of cross-sectionally demeaned log-counts across n = 14 cancer types (for samples aggregated by cancer types, so s = 1, …, d, d = n). Vertical axis: density using R function density(). See Section 2.4.1 for details.
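The quantity on the horizontal axis can be sketched in R as follows. This is a minimal illustration on synthetic Poisson counts, not the published data: the matrix `x`, the seed and the Poisson rate are assumptions chosen only to mirror N = 96 and d = 14, and the log(1 + x) transform is borrowed from the source code listed at the end of this page.

```r
set.seed(1)
N <- 96   # mutation categories
d <- 14   # cancer types (samples aggregated by type)

# Synthetic occurrence counts (assumption: Poisson with rate 50).
x <- matrix(rpois(N * d, lambda = 50), N, d)

# Log-counts, demeaned cross-sectionally (each column has zero mean).
y <- log(1 + x)
y <- t(t(y) - colMeans(y))

# Serial standard deviation: one value per mutation category (row).
s <- apply(y, 1, sd)

# Kernel density estimate of the N standard deviations.
dens <- density(s)
if (interactive()) plot(dens, main = "Density of serial standard deviations")
```

With real data, `x` would be the 96 x 14 matrix of aggregated mutation counts.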
Table 1. Weights (in the units of 1%, rounded to 2 digits) for the first 48 mutation categories (this table is continued in Table 2 with the next 48 mutation categories) for the 7 clusters in Clustering-A (see Table S4) based on unnormalized (columns 2–8) and normalized (columns 9–15) regressions (see Section 2.6 for details). Each cluster is defined as containing the mutations with nonzero weights. (The mutations are encoded as follows: XYZW = Y>W: XYZ. Thus, GCGA = C>A: GCG.) For instance, cluster Cl-2 contains 8 mutations: GCGA, TCGA, ACGG, GCCG, GCGG, TCGG, GTCA, GTCG. In each cluster the weights are normalized to add up to 100% (up to 2 digits due to the aforesaid rounding). In Tables 1 through S10 “weights based on unnormalized regressions” are given by (13), (14) and (15), while “weights based on normalized regressions” are given by (17), (14) and (16), i.e., the exposures are calculated based on arithmetic averages (see Section 2.6 for details).
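| Mutation | Cl-1 (unnorm.) | Cl-2 (unnorm.) | Cl-3 (unnorm.) | Cl-4 (unnorm.) | Cl-5 (unnorm.) | Cl-6 (unnorm.) | Cl-7 (unnorm.) | Cl-1 (norm.) | Cl-2 (norm.) | Cl-3 (norm.) | Cl-4 (norm.) | Cl-5 (norm.) | Cl-6 (norm.) | Cl-7 (norm.) |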
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACAA | 0.00 | 0.00 | 0.00 | 6.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.55 | 0.00 | 0.00 | 0.00 |
| ACCA | 0.00 | 0.00 | 0.00 | 0.00 | 5.83 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.08 | 0.00 | 0.00 |
| ACGA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 |
| ACTA | 0.00 | 0.00 | 0.00 | 0.00 | 6.16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.38 | 0.00 | 0.00 |
| CCAA | 0.00 | 0.00 | 0.00 | 0.00 | 7.91 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.10 | 0.00 | 0.00 |
| CCCA | 0.00 | 0.00 | 0.00 | 0.00 | 6.46 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.68 | 0.00 | 0.00 |
| CCGA | 0.00 | 0.00 | 7.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.23 | 0.00 | 0.00 | 0.00 | 0.00 |
| CCTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.75 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.79 | 0.00 |
| GCAA | 4.05 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCCA | 0.00 | 0.00 | 0.00 | 0.00 | 4.56 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.73 | 0.00 | 0.00 |
| GCGA | 0.00 | 13.81 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 13.89 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCTA | 0.00 | 0.00 | 0.00 | 0.00 | 5.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.20 | 0.00 | 0.00 |
| TCAA | 0.00 | 0.00 | 0.00 | 6.26 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.21 | 0.00 | 0.00 | 0.00 |
| TCCA | 0.00 | 0.00 | 0.00 | 0.00 | 8.94 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.29 | 0.00 | 0.00 |
| TCGA | 0.00 | 11.87 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCTA | 0.00 | 0.00 | 0.00 | 8.05 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.00 | 0.00 | 0.00 | 0.00 |
| ACAG | 0.00 | 0.00 | 0.00 | 0.00 | 3.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.18 | 0.00 | 0.00 |
| ACCG | 0.00 | 0.00 | 8.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.17 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACGG | 0.00 | 12.62 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACTG | 0.00 | 0.00 | 0.00 | 0.00 | 4.77 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.03 | 0.00 | 0.00 |
| CCAG | 0.00 | 0.00 | 9.26 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.35 | 0.00 | 0.00 | 0.00 | 0.00 |
| CCCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.91 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.02 |
| CCGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.12 |
| CCTG | 0.00 | 0.00 | 12.46 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.58 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.61 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.57 |
| GCCG | 0.00 | 14.79 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 15.62 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCGG | 0.00 | 15.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 13.92 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.86 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.92 |
| TCAG | 0.00 | 0.00 | 0.00 | 0.00 | 10.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.03 | 0.00 | 0.00 |
| TCCG | 0.00 | 0.00 | 0.00 | 0.00 | 5.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.95 | 0.00 | 0.00 |
| TCGG | 0.00 | 8.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCTG | 0.00 | 0.00 | 0.00 | 0.00 | 14.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.53 | 0.00 | 0.00 |
| ACAT | 0.00 | 0.00 | 0.00 | 7.67 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.71 | 0.00 | 0.00 | 0.00 |
| ACCT | 4.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACGT | 23.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 23.18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACTT | 0.00 | 0.00 | 0.00 | 5.43 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.47 | 0.00 | 0.00 | 0.00 |
| CCAT | 0.00 | 0.00 | 0.00 | 6.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.02 | 0.00 | 0.00 | 0.00 |
| CCCT | 0.00 | 0.00 | 0.00 | 5.59 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.63 | 0.00 | 0.00 | 0.00 |
| CCGT | 17.66 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| CCTT | 0.00 | 0.00 | 0.00 | 7.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.04 | 0.00 | 0.00 | 0.00 |
| GCAT | 0.00 | 0.00 | 0.00 | 5.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.01 | 0.00 | 0.00 | 0.00 |
| GCCT | 5.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.93 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCGT | 20.46 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.80 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCTT | 0.00 | 0.00 | 0.00 | 5.88 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.93 | 0.00 | 0.00 | 0.00 |
| TCAT | 11.42 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCCT | 0.00 | 0.00 | 0.00 | 7.81 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.76 | 0.00 | 0.00 | 0.00 |
| TCGT | 12.42 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCTT | 0.00 | 0.00 | 0.00 | 9.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.29 | 0.00 | 0.00 | 0.00 |
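The four-letter encoding used in the first column (XYZW = Y>W: XYZ) can be unpacked mechanically. The helper below is a small sketch; the function name `decode.mutation` is hypothetical and does not appear in the paper's source code.

```r
# Decode the four-letter code XYZW into "Y>W: XYZ":
# Y is the mutated base, W the base it mutates to, and XYZ the trinucleotide context.
decode.mutation <- function(code) {
  stopifnot(all(nchar(code) == 4))
  x <- substr(code, 1, 1)
  y <- substr(code, 2, 2)
  z <- substr(code, 3, 3)
  w <- substr(code, 4, 4)
  paste0(y, ">", w, ": ", x, y, z)
}

decoded <- decode.mutation(c("GCGA", "TCGA", "ACGG"))
print(decoded)
# "C>A: GCG" "C>A: TCG" "C>G: ACG"
```

The first result reproduces the example given in the table caption (GCGA = C>A: GCG).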
Fig. 2. Cluster Cl-1 in Clustering-A with weights based on unnormalized regressions with arithmetic means (see Section 2.6). See Tables S2, 1, and 2. Here and in all figures, for comparison and visualization convenience, we show all 96 channels on the horizontal axis even though the weights are nonzero only for the mutation categories belonging to a given cluster. Thus, in this cluster, only 8 weights are nonzero, to wit, for GCAA, ACCT, ACGT, CCGT, GCCT, GCGT, TCAT, TCGT.
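A 96-channel vector of this kind can be assembled as in the sketch below. The 8 nonzero weights are the unnormalized Cl-1 entries from Table 1; the helper `make.cats` and the plotting options are assumptions made for illustration, with the channel order following the row order of Tables 1 and 2.

```r
bases <- c("A", "C", "G", "T")

# Enumerate codes XYZW for a mutated base Y = ref and its three possible targets W.
# expand.grid varies its first argument fastest, so Z cycles inside X inside W.
make.cats <- function(ref, alts) {
  g <- expand.grid(Z = bases, X = bases, W = alts, stringsAsFactors = FALSE)
  paste0(g$X, ref, g$Z, g$W)
}
cats <- c(make.cats("C", c("A", "G", "T")), make.cats("T", c("A", "C", "G")))

# Nonzero Cl-1 weights (unnormalized regressions, arithmetic means), in %.
cl1 <- c(GCAA = 4.05, ACCT = 4.78, ACGT = 23.47, CCGT = 17.66,
         GCCT = 5.74, GCGT = 20.46, TCAT = 11.42, TCGT = 12.42)

# All 96 channels, zero outside the cluster.
w <- setNames(rep(0, length(cats)), cats)
w[names(cl1)] <- cl1

if (interactive()) barplot(w, las = 2, cex.names = 0.4, ylab = "weight (%)")
```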
Fig. 15. Cluster Cl-7 in Clustering-A with weights based on normalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Table 3. Weights (in the units of 1%, rounded to 2 digits) for the first 48 mutation categories for the 7 clusters in Clustering-A (see Table S4) based on unnormalized (columns 2–8) and normalized (columns 9–15) regressions with the exposures computed via geometric means (see Section 2.6 for details). Here “weights based on unnormalized regressions” are given by (13), (14) and (19), while “weights based on normalized regressions” are given by (17), (14) and (21). Other conventions are the same as in Table 1.
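| Mutation | Cl-1 (unnorm.) | Cl-2 (unnorm.) | Cl-3 (unnorm.) | Cl-4 (unnorm.) | Cl-5 (unnorm.) | Cl-6 (unnorm.) | Cl-7 (unnorm.) | Cl-1 (norm.) | Cl-2 (norm.) | Cl-3 (norm.) | Cl-4 (norm.) | Cl-5 (norm.) | Cl-6 (norm.) | Cl-7 (norm.) |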
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACAA | 0.00 | 0.00 | 0.00 | 6.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.54 | 0.00 | 0.00 | 0.00 |
| ACCA | 0.00 | 0.00 | 0.00 | 0.00 | 6.16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.20 | 0.00 | 0.00 |
| ACGA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.05 |
| ACTA | 0.00 | 0.00 | 0.00 | 0.00 | 6.38 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.44 | 0.00 | 0.00 |
| CCAA | 0.00 | 0.00 | 0.00 | 0.00 | 8.27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.27 | 0.00 | 0.00 |
| CCCA | 0.00 | 0.00 | 0.00 | 0.00 | 6.73 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.77 | 0.00 | 0.00 |
| CCGA | 0.00 | 0.00 | 7.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.24 | 0.00 | 0.00 | 0.00 | 0.00 |
| CCTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.77 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.76 | 0.00 |
| GCAA | 4.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.68 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCCA | 0.00 | 0.00 | 0.00 | 0.00 | 4.70 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.75 | 0.00 | 0.00 |
| GCGA | 0.00 | 13.79 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 13.76 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCTA | 0.00 | 0.00 | 0.00 | 0.00 | 5.16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.22 | 0.00 | 0.00 |
| TCAA | 0.00 | 0.00 | 0.00 | 6.22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.20 | 0.00 | 0.00 | 0.00 |
| TCCA | 0.00 | 0.00 | 0.00 | 0.00 | 8.86 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.08 | 0.00 | 0.00 |
| TCGA | 0.00 | 11.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCTA | 0.00 | 0.00 | 0.00 | 8.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.01 | 0.00 | 0.00 | 0.00 |
| ACAG | 0.00 | 0.00 | 0.00 | 0.00 | 4.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.16 | 0.00 | 0.00 |
| ACCG | 0.00 | 0.00 | 8.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.17 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACGG | 0.00 | 12.58 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACTG | 0.00 | 0.00 | 0.00 | 0.00 | 4.73 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.88 | 0.00 | 0.00 |
| CCAG | 0.00 | 0.00 | 9.34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.36 | 0.00 | 0.00 | 0.00 | 0.00 |
| CCCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.04 |
| CCGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.24 |
| CCTG | 0.00 | 0.00 | 12.56 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.61 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.68 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.63 |
| GCCG | 0.00 | 14.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 15.53 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCGG | 0.00 | 15.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 14.18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.92 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.94 |
| TCAG | 0.00 | 0.00 | 0.00 | 0.00 | 9.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.99 | 0.00 | 0.00 |
| TCCG | 0.00 | 0.00 | 0.00 | 0.00 | 4.93 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.90 | 0.00 | 0.00 |
| TCGG | 0.00 | 8.53 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCTG | 0.00 | 0.00 | 0.00 | 0.00 | 13.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.56 | 0.00 | 0.00 |
| ACAT | 0.00 | 0.00 | 0.00 | 7.72 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.73 | 0.00 | 0.00 | 0.00 |
| ACCT | 4.86 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACGT | 23.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 23.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ACTT | 0.00 | 0.00 | 0.00 | 5.45 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.47 | 0.00 | 0.00 | 0.00 |
| CCAT | 0.00 | 0.00 | 0.00 | 6.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.02 | 0.00 | 0.00 | 0.00 |
| CCCT | 0.00 | 0.00 | 0.00 | 5.60 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.62 | 0.00 | 0.00 | 0.00 |
| CCGT | 17.45 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| CCTT | 0.00 | 0.00 | 0.00 | 7.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.05 | 0.00 | 0.00 | 0.00 |
| GCAT | 0.00 | 0.00 | 0.00 | 5.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.00 | 0.00 | 0.00 | 0.00 |
| GCCT | 5.85 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCGT | 20.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.63 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCTT | 0.00 | 0.00 | 0.00 | 5.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.92 | 0.00 | 0.00 | 0.00 |
| TCAT | 11.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCCT | 0.00 | 0.00 | 0.00 | 7.77 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.75 | 0.00 | 0.00 | 0.00 |
| TCGT | 12.39 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCTT | 0.00 | 0.00 | 0.00 | 9.35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.27 | 0.00 | 0.00 | 0.00 |
Table 1 continued: weights for the next 48 mutation categories.
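| Mutation | Cl-1 (unnorm.) | Cl-2 (unnorm.) | Cl-3 (unnorm.) | Cl-4 (unnorm.) | Cl-5 (unnorm.) | Cl-6 (unnorm.) | Cl-7 (unnorm.) | Cl-1 (norm.) | Cl-2 (norm.) | Cl-3 (norm.) | Cl-4 (norm.) | Cl-5 (norm.) | Cl-6 (norm.) | Cl-7 (norm.) |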
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATAA | 0.00 | 0.00 | 0.00 | 0.00 | 4.18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.52 | 0.00 | 0.00 |
| ATCA | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10.15 | 0.00 | 0.00 | 0.00 | 0.00 |
| ATGA | 0.00 | 0.00 | 0.00 | 0.00 | 4.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.30 | 0.00 | 0.00 |
| ATTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.66 | 0.00 |
| CTAA | 0.00 | 0.00 | 11.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 11.16 | 0.00 | 0.00 | 0.00 | 0.00 |
| CTCA | 0.00 | 0.00 | 0.00 | 0.00 | 3.79 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.98 | 0.00 | 0.00 |
| CTGA | 0.00 | 0.00 | 0.00 | 0.00 | 4.88 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.02 | 0.00 | 0.00 |
| CTTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.33 | 0.00 |
| GTAA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.35 |
| GTCA | 0.00 | 15.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 15.36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTGA | 0.00 | 0.00 | 9.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.21 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.19 |
| TTAA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.26 | 0.00 |
| TTCA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.64 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.58 |
| TTGA | 0.00 | 0.00 | 8.84 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.55 | 0.00 | 0.00 | 0.00 | 0.00 |
| TTTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.38 | 0.00 |
| ATAC | 0.00 | 0.00 | 0.00 | 7.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.06 | 0.00 | 0.00 | 0.00 |
| ATCC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.39 | 0.00 |
| ATGC | 0.00 | 0.00 | 0.00 | 4.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.98 | 0.00 | 0.00 | 0.00 |
| ATTC | 0.00 | 0.00 | 0.00 | 6.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.34 | 0.00 | 0.00 | 0.00 |
| CTAC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.78 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.81 | 0.00 |
| CTCC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.31 | 0.00 |
| CTGC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.41 | 0.00 |
| CTTC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.92 | 0.00 |
| GTAC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.84 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.96 | 0.00 |
| GTCC | 0.00 | 0.00 | 11.51 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 11.78 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTGC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.43 | 0.00 |
| GTTC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.05 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.23 | 0.00 |
| TTAC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.10 | 0.00 |
| TTCC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.79 | 0.00 |
| TTGC | 0.00 | 0.00 | 11.62 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 11.82 | 0.00 | 0.00 | 0.00 | 0.00 |
| TTTC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.29 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.28 | 0.00 |
| ATAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.09 |
| ATCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.81 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.70 |
| ATGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.99 |
| ATTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.08 |
| CTAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.56 |
| CTCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.52 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.31 |
| CTGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.67 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.83 |
| CTTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.67 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.89 | 0.00 |
| GTAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.58 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.49 |
| GTCG | 0.00 | 7.80 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.11 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.82 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.98 |
| GTTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.97 |
| TTAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.43 |
| TTCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.73 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.75 |
| TTGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.06 |
| TTTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.05 | 0.00 |
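As a quick consistency check on the normalization stated in the caption of Table 1, the 8 unnormalized Cl-2 weights (6 from Table 1 and 2 from its continuation above) can be summed. The snippet only re-adds the published, rounded values; nothing is recomputed from data.

```r
# Unnormalized Cl-2 weights in %, copied from Table 1 and its continuation.
cl2 <- c(GCGA = 13.81, TCGA = 11.87, ACGG = 12.62, GCCG = 14.79,
         GCGG = 15.50, TCGG = 8.40, GTCA = 15.20, GTCG = 7.80)

total <- sum(cl2)
print(total)  # 99.99, i.e. 100% up to the rounding of each entry to 2 digits
```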
The within-cluster cross-sectional correlations Θ (columns 2–8), the overall correlations Ξ (column 11) based on the overall cross-sectional regressions, and multiple R² and adjusted R² of these regressions (columns 9 and 10). See Section 3.3 for details. Cancer types are labeled by X1 through X14 as in Table S2. All quantities are in the units of 1% rounded to 2 digits. The values above 80% are given in bold font.
| Cancer type | Cl-1 | Cl-2 | Cl-3 | Cl-4 | Cl-5 | Cl-6 | Cl-7 | r.sq | adj.r.sq | Overall cor |
|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 57.66 | 31.8 | 75.04 | 41.7 | ||||||
| X2 | 66.35 | 79.64 | 41.42 | −2.87 | 25.43 | |||||
| X3 | −12.6 | 39.19 | 12.59 | 68.65 | 17.06 | 68.74 | ||||
| X4 | 9.88 | 16.97 | 52.94 | 79.11 | 46.74 | 7.34 | 58.18 | 54.9 | 61.53 | |
| X5 | 63.31 | 50.79 | 28.58 | 5.12 | 13.66 | |||||
| X6 | 34.07 | 48.92 | 76.77 | 19.59 | 34.54 | |||||
| X7 | 34.69 | 64.65 | 48.79 | 63.79 | 72.56 | |||||
| X8 | −31.6 | 39.99 | 65.56 | −46.21 | −6.95 | −3.36 | 61.8 | 69.52 | 67.12 | 41.88 |
| X9 | −28.63 | 53.86 | −34.26 | 46.93 | 59.88 | 13.59 | −12.39 | 77.76 | 76.02 | 70.18 |
| X10 | 61.59 | 63.06 | 67.15 | 41.13 | 4.11 | 43.87 | ||||
| X11 | 56.6 | 66.76 | 55.12 | 16.33 | 26.3 | |||||
| X12 | 17.48 | 5.1 | 16.5 | 27.74 | 21.63 | |||||
| X13 | 58.21 | 75.77 | 78.67 | 20.28 | 44.07 | |||||
| X14 | 38.93 | 65.92 | 17.23 | 58.54 | 4.73 | 35.72 | 31.27 | 65.4 |
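The kinds of quantities reported in this table can be illustrated with a rough R sketch. Section 3.3 is not reproduced on this page, so everything here is an assumption rather than the authors' exact definition: the data are synthetic, the cluster labels `cl` are hypothetical, Θ is taken to be the Pearson correlation restricted to a cluster's mutations, and the overall regression is fitted with an intercept.

```r
set.seed(2)
N <- 96
K <- 7
cl <- rep(seq_len(K), length.out = N)   # hypothetical cluster labels

# Block-structured weights matrix: each mutation has a nonzero weight in one cluster only.
W <- matrix(0, N, K)
for (a in seq_len(K)) {
  take <- cl == a
  w <- runif(sum(take))
  W[take, a] <- 100 * w / sum(w)
}

# Synthetic data vector for one cancer type (assumption: weights times exposures plus noise).
g <- as.vector(W %*% runif(K)) + rnorm(N, sd = 0.5)

# Within-cluster cross-sectional correlations, one per cluster.
theta <- sapply(seq_len(K), function(a) cor(g[cl == a], W[cl == a, a]))

# Overall cross-sectional regression on all 7 clusters.
fit <- lm(g ~ W)
r.sq <- summary(fit)$r.squared
adj.r.sq <- summary(fit)$adj.r.squared

# Overall correlation between the data and the fitted values.
xi <- cor(g, fitted(fit))

round(100 * c(theta, r.sq, adj.r.sq, xi), 2)   # in the units of 1%
```

Under these assumptions the squared overall correlation equals the multiple R², which is a property of least squares with an intercept and not necessarily of the paper's definition.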
The within-cluster cross-sectional correlations Δ between the weights for 7 cancer signatures Sig1 through Sig7 of [8] and the weights (using normalized regressions with exposures based on arithmetic averages) for 7 clusters in Clustering-A (see Section 3.3 for details). All quantities are in the units of 1% rounded to 2 digits. The values above 80% are given in bold font.
| Signature | Cl-1 | Cl-2 | Cl-3 | Cl-4 | Cl-5 | Cl-6 | Cl-7 |
|---|---|---|---|---|---|---|---|
| Sig1 | 10.29 | −6.42 | −8.33 | 51.12 | 29.06 | 20.61 | |
| Sig2 | −0.37 | 1.75 | 42.13 | 75.58 | −27.92 | −3.34 | |
| Sig3 | −51.53 | 54.4 | −37.16 | 28.19 | 32.98 | 12.37 | −17.7 |
| Sig4 | 31.56 | 11.97 | 54.43 | 56.83 | −1.17 | 60.41 | |
| Sig5 | −42.53 | 40.31 | 62.96 | −47.62 | −8.34 | −8.39 | 61.61 |
| Sig6 | 47.79 | 40.62 | 17.8 | 27.45 | −27.96 | 16.87 | 16.97 |
| Sig7 | 19.87 | 55.03 | 33.4 | 13.89 | −29.59 | 13.93 |
Fig. 3. Cluster Cl-1 in Clustering-A with weights based on normalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Fig. 10. Cluster Cl-5 in Clustering-A with weights based on unnormalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Fig. 11. Cluster Cl-5 in Clustering-A with weights based on normalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Fig. 8. Cluster Cl-4 in Clustering-A with weights based on unnormalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Fig. 9. Cluster Cl-4 in Clustering-A with weights based on normalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Fig. 12. Cluster Cl-6 in Clustering-A with weights based on unnormalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Fig. 13. Cluster Cl-6 in Clustering-A with weights based on normalized regressions with arithmetic means (see Section 2.6). See Tables S4, 1, and 2.
Table 3 continued: weights for the next 48 mutation categories.
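| Mutation | Cl-1 (unnorm.) | Cl-2 (unnorm.) | Cl-3 (unnorm.) | Cl-4 (unnorm.) | Cl-5 (unnorm.) | Cl-6 (unnorm.) | Cl-7 (unnorm.) | Cl-1 (norm.) | Cl-2 (norm.) | Cl-3 (norm.) | Cl-4 (norm.) | Cl-5 (norm.) | Cl-6 (norm.) | Cl-7 (norm.) |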
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATAA | 0.00 | 0.00 | 0.00 | 0.00 | 4.41 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.51 | 0.00 | 0.00 |
| ATCA | 0.00 | 0.00 | 10.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 10.15 | 0.00 | 0.00 | 0.00 | 0.00 |
| ATGA | 0.00 | 0.00 | 0.00 | 0.00 | 4.15 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.25 | 0.00 | 0.00 |
| ATTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.59 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.64 | 0.00 |
| CTAA | 0.00 | 0.00 | 11.34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 11.10 | 0.00 | 0.00 | 0.00 | 0.00 |
| CTCA | 0.00 | 0.00 | 0.00 | 0.00 | 3.87 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.94 | 0.00 | 0.00 |
| CTGA | 0.00 | 0.00 | 0.00 | 0.00 | 5.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.07 | 0.00 | 0.00 |
| CTTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.31 | 0.00 |
| GTAA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.36 |
| GTCA | 0.00 | 15.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 15.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTGA | 0.00 | 0.00 | 9.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.24 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.22 |
| TTAA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.21 | 0.00 |
| TTCA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.73 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.66 |
| TTGA | 0.00 | 0.00 | 8.62 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.51 | 0.00 | 0.00 | 0.00 | 0.00 |
| TTTA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.35 | 0.00 |
| ATAC | 0.00 | 0.00 | 0.00 | 7.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.08 | 0.00 | 0.00 | 0.00 |
| ATCC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.38 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.40 | 0.00 |
| ATGC | 0.00 | 0.00 | 0.00 | 4.99 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.99 | 0.00 | 0.00 | 0.00 |
| ATTC | 0.00 | 0.00 | 0.00 | 6.34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.36 | 0.00 | 0.00 | 0.00 |
| CTAC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.82 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.81 | 0.00 |
| CTCC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.32 | 0.00 |
| CTGC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.35 | 0.00 |
| CTTC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.01 | 0.00 |
| GTAC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.82 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.90 | 0.00 |
| GTCC | 0.00 | 0.00 | 11.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 11.80 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTGC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.26 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.36 | 0.00 |
| GTTC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.18 | 0.00 |
| TTAC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.09 | 0.00 |
| TTCC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.76 | 0.00 |
| TTGC | 0.00 | 0.00 | 11.69 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 11.81 | 0.00 | 0.00 | 0.00 | 0.00 |
| TTTC | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.31 | 0.00 |
| ATAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.94 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.03 |
| ATCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.83 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.74 |
| ATGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.01 |
| ATTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.00 |
| CTAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.52 |
| CTCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.53 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.37 |
| CTGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.63 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.76 |
| CTTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.36 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.13 | 0.00 |
| GTAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.59 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.51 |
| GTCG | 0.00 | 7.84 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GTGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.87 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.97 |
| GTTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.71 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.77 |
| TTAG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.32 |
| TTCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.76 |
| TTGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.11 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.09 |
| TTTG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 8.12 | 0.00 |

    # Normalize each row of `ret` by its serial (row-wise) standard deviation.
    bio.calc.norm.ret <- function (ret)
    {
        s <- apply(ret, 1, sd)
        x <- ret / s
        return(x)
    }

    # Alias under the qrm. prefix.
    qrm.calc.norm.ret <- bio.calc.norm.ret

    # Cluster the rows of the count matrix `x` and compute within-cluster weights.
    # Relies on bio.erank.pc() and qrm.stat.ind.class(), which are not listed here.
    bio.cl.sigs <- function(x, iter.max = 100,
        num.try = 1000, num.runs = 10000)
    {
        cl.ix <- function(x) match(1, x)

        # Cross-sectionally demeaned log-counts
        y <- log(1 + x)
        y <- t(t(y) - colMeans(y))
        x.d <- exp(y)

        # Target number of clusters
        k <- ncol(bio.erank.pc(y)$pc)

        n <- nrow(x)
        u <- rnorm(n, 0, 1)
        q <- matrix(NA, n, num.runs)
        p <- rep(NA, num.runs)

        # Each run yields a binary indicator matrix z; p[i] fingerprints it via
        # the residual sum of squares of the random vector u regressed on z.
        for(i in 1:num.runs)
        {
            z <- qrm.stat.ind.class(y, k, iter.max = iter.max,
                num.try = num.try, demean.ret = FALSE)
            p[i] <- sum((residuals(lm(u ~ -1 + z)))^2)
            q[, i] <- apply(z, 1, cl.ix)
        }

        # Keep the clustering that occurs most often across the runs
        p1 <- unique(p)
        ct <- rep(NA, length(p1))
        for(i in 1:length(p1))
            ct[i] <- sum(p1[i] == p)
        p1 <- p1[ct == max(ct)]
        i <- match(p1, p)[1]
        ix <- q[, i]

        # Rebuild the indicator matrix and compute the weights
        k <- max(ix)
        z <- matrix(NA, n, k)
        for(j in 1:k)
            z[, j] <- as.numeric(ix == j)
        res <- bio.cl.wts(x.d, z)
        return(res)
    }
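The run-selection step of bio.cl.sigs (fingerprint each clustering, then keep the most frequent one) can be tried in isolation. The sketch below is not the authors' pipeline: base R `kmeans` stands in for qrm.stat.ind.class, which is not listed on this page, the data are synthetic, and rounding the fingerprint with `signif` is an added safeguard against floating-point noise.

```r
set.seed(3)

# Synthetic data: three groups of 20 points in 2 dimensions (assumption).
y <- rbind(matrix(rnorm(40, mean = 0), 20, 2),
           matrix(rnorm(40, mean = 6), 20, 2),
           matrix(rnorm(40, mean = 12), 20, 2))
n <- nrow(y)
k <- 3
num.runs <- 50

u <- rnorm(n)                          # fixed random vector used for fingerprinting
p <- rep(NA_real_, num.runs)           # one fingerprint per run
q <- matrix(NA_integer_, n, num.runs)  # cluster labels per run

for (i in seq_len(num.runs)) {
  ix <- kmeans(y, centers = k, iter.max = 100)$cluster   # stand-in clustering
  z <- outer(ix, seq_len(k), "==") * 1                   # n x k binary indicator matrix
  # The residual sum of squares does not depend on how the clusters are labeled.
  p[i] <- signif(sum(residuals(lm(u ~ -1 + z))^2), 8)
  q[, i] <- ix
}

# Count how often each distinct fingerprint occurs and keep the most frequent clustering.
p1 <- unique(p)
ct <- sapply(p1, function(v) sum(p == v))
i.best <- match(p1[which.max(ct)], p)
ix.best <- q[, i.best]
table(ix.best)
```

Because the fingerprint is invariant under relabeling, two runs that find the same partition with permuted labels are counted as the same clustering.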

    # Compute within-cluster weights for the matrix `x` (rows = mutation
    # categories) given the binary cluster indicator matrix `ind`.
    bio.cl.wts <- function (x, ind)
    {
        first.ix <- function(x) match(1, x)[1]

        # Weights (in %) within one cluster: regress each row on the exposures
        # `fac` without an intercept and normalize the slopes to add up to 100.
        calc.wts <- function(x, use.wts = FALSE, use.geom = FALSE)
        {
            if(use.geom)
            {
                # Exposures via geometric means
                if(use.wts)
                    s <- apply(log(x), 1, sd)
                else
                    s <- rep(1, nrow(x))
                s <- 1 / s / sum(1 / s)
                fac <- apply(x^s, 2, prod)
            }
            else
            {
                # Exposures via arithmetic means
                if(use.wts)
                    s <- apply(x, 1, sd)
                else
                    s <- rep(1, nrow(x))
                fac <- colMeans(x / s)
            }
            w <- coefficients(lm(t(x) ~ -1 + fac))
            w <- 100 * w / sum(w)
            return(w)
        }

        n <- nrow(x)
        w <- w.g <- v <- v.g <- rep(NA, n)

        # Reorder the clusters by a numeric key built from the cluster size
        # and the index of the first member.
        z <- colSums(ind)
        z <- as.numeric(paste(z, ".", apply(ind, 2, first.ix), sep = ""))
        dimnames(ind)[[2]] <- names(z) <- 1:ncol(ind)
        z <- sort(z)
        z <- names(z)
        ind <- ind[, z]
        dimnames(ind)[[2]] <- NULL

        for(i in 1:ncol(ind))
        {
            take <- ind[, i] == 1
            # A single-member cluster gets its weight set to 1.
            if(sum(take) == 1)
            {
                w[take] <- w.g[take] <- 1
                v[take] <- v.g[take] <- 1
                next
            }
            w[take] <- calc.wts(x[take,], FALSE, FALSE)
            w.g[take] <- calc.wts(x[take,], FALSE, TRUE)
            v[take] <- calc.wts(x[take,], TRUE, FALSE)
            v.g[take] <- calc.wts(x[take,], TRUE, TRUE)
        }

        # w, w.g: use.wts = FALSE (arithmetic, geometric exposures)
        # v, v.g: use.wts = TRUE (arithmetic, geometric exposures)
        res <- new.env()
        res$ind <- ind
        res$w <- w
        res$w.g <- w.g
        res$v <- v
        res$v.g <- v.g
        return(res)
    }
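The within-cluster weight computation can be tried on a toy cluster. The sketch below re-implements only the unweighted branches of calc.wts (use.wts = FALSE) on synthetic data; the values of `true.w`, the exposure range and the noise level are assumptions chosen for illustration.

```r
set.seed(4)

# Hypothetical cluster of 5 mutation categories observed in 14 samples:
# each row is a true weight times a common positive exposure, with mild noise.
true.w <- c(0.10, 0.15, 0.20, 0.25, 0.30)
expo <- runif(14, 50, 500)
x <- outer(true.w, expo) * exp(matrix(rnorm(5 * 14, sd = 0.05), 5, 14))

# Exposures per sample: arithmetic and geometric means over the cluster's rows.
fac.arith <- colMeans(x)
fac.geom <- apply(x^(1 / nrow(x)), 2, prod)

# Regress every row on the exposures without an intercept,
# then normalize the slopes to add up to 100%.
wts <- function(fac) {
  w <- coefficients(lm(t(x) ~ -1 + fac))
  as.vector(100 * w / sum(w))
}

w.arith <- wts(fac.arith)
w.geom <- wts(fac.geom)
round(rbind(true = 100 * true.w, arithmetic = w.arith, geometric = w.geom), 2)
```

On such near-proportional data both variants should land close to the true weights, which is consistent with the small differences between the arithmetic-mean and geometric-mean tables above.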