Theodore Roman, Lu Xie, Russell Schwartz.
Abstract
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
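As a rough illustration of the medoidshift idea the abstract builds on, the sketch below runs a single medoidshift pass on a small point set. It is not the 2-stage method described here: the function names, the `bandwidth` parameter, and the kernel form `exp(-d²/bandwidth)` are assumptions made for illustration only.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def medoidshift(points, bandwidth):
    """Single-pass medoidshift sketch (hypothetical helper, not the paper's code).

    Returns, for each point, the index of the data point (medoid) that its
    chain of shifts ends at. Points sharing an index share a cluster.
    """
    n = len(points)
    d = [[sq_dist(p, q) for q in points] for p in points]
    # Assumed negative exponential kernel weights on squared distances.
    w = [[math.exp(-d[i][j] / bandwidth) for j in range(n)] for i in range(n)]

    # One shift per point: move j to the data point k that minimizes the
    # kernel-weighted sum of squared distances to all points.
    nxt = []
    for j in range(n):
        costs = [sum(d[k][i] * w[i][j] for i in range(n)) for k in range(n)]
        nxt.append(min(range(n), key=costs.__getitem__))

    # Follow the shift pointers until a fixed point (a mode) is reached.
    labels = []
    for j in range(n):
        seen = set()
        k = j
        while nxt[k] != k and k not in seen:  # 'seen' guards against tie cycles
            seen.add(k)
            k = nxt[k]
        labels.append(k)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    print(medoidshift(pts, bandwidth=1.0))
```

Because every candidate mode is itself a data point, this kind of procedure needs only pairwise distances, which is one reason it suits point clouds of tumor samples. The choice of bandwidth strongly affects how many clusters emerge.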
Year: 2016 PMID: 26817708 PMCID: PMC4895288 DOI: 10.1186/s12864-015-2302-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Visual representation of geometric structures tested with synthetic data and corresponding evolutionary scenarios
Fig. 2 Adjusted Rand indices (ARIs) for 100 replicates of synthetic data under seven mixture scenarios with varying noise. The first column (panels x.1) shows the performance of medoidshift without a kernel function; the second column (panels x.2) shows the performance of medoidshift with the negative exponential kernel function; and the third column (panels x.3) shows the performance of our new 2-stage medoidshift clustering method. Noise increases from row to row: the first row (panels 1.y) has no noise, the second row has σ=0.05 noise added, the third row has σ=0.1, the fourth row has σ=0.15, and the fifth row has σ=0.2
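For readers unfamiliar with the metric in Fig. 2, the following is a minimal sketch of the standard adjusted Rand index computed from two label lists. The function name and the handling of degenerate inputs are illustrative assumptions, not the evaluation code used for the figure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two items counts as perfect agreement

    # Pair counts within contingency-table cells, rows, and columns.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # assumption: both partitions trivial, treat as agreement
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, relabeled
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # crossed partition
```

An ARI of 1 means the two partitions are identical up to relabeling, values near 0 indicate chance-level agreement, and negative values indicate worse-than-chance agreement.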
Fig. 3 Visual representation of ovarian tumor data (OV) in principal components space (panel (a)), and of lung squamous cell carcinoma (LUSC) (panel (b)). Data are colored based on their cluster membership as determined by 2-stage medoidshift clustering
Summary of DAVID ontology terms most strongly associated with each ovarian (OV) and lung (LUSC) cluster
| Cancer type | Cluster number | Terms |
|---|---|---|
| OV | 1 | Keratinization, small proline-rich, epidermal cell differentiation, epithelial cell differentiation |
| OV | 2 | Antigen processing, MHC class II, asthma, allograft rejection, type I diabetes mellitus, cell adhesion |
| OV | 3 | Keratin, coil 1a/b/2/12, intermediate filament, cytoskeleton, non-membrane-bound organelle |
| LUSC | 1 | Zinc finger, KRAB, C2H2, transcriptional regulation, DNA-binding, metal binding |
| LUSC | 2 | Keratin, peripherin, intermediate filament family orphan 1 |