Jack Kuipers, Katharina Jahn, Niko Beerenwinkel.
Abstract
The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential to open up the design of treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential. The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks. In this review, we present the state-of-the-art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.
Keywords: Cancer evolution; Phylogenetics; Single-cell sequencing; Tumour heterogeneity
Year: 2017 PMID: 28193548 PMCID: PMC5813714 DOI: 10.1016/j.bbcan.2017.02.001
Source DB: PubMed Journal: Biochim Biophys Acta Rev Cancer ISSN: 0304-419X Impact factor: 10.680
Fig. 1. (a) Schematic representation of the clonal expansion that shaped the heterogeneous tumour depicted in (b). The colours of the cells indicate the subclones they belong to. The small stars inside the cells represent the mutations present. (c) Two bulk samples admixed with normal cells (empty grey circles) taken from the tumour in (b). The bar plots depicted next to the samples can be derived from variant allele frequency data obtained by bulk sequencing. Each bar represents the estimated cellular prevalence of one mutation present in the sample. Note that the dark purple mutation on the bottom left of (a) is absent from the frequency plots because its frequency is too low to be detected. (d) Mutation histories compatible with the cellular prevalences of sample 1 or sample 2. (Not all compatible trees are depicted.) The two trees in the intersection are compatible with both samples. It cannot be inferred from the given data that the left one is the true history that matches the clonal expansion in (a).
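The compatibility check behind panel (d) can be sketched in a few lines. This is a minimal illustration, not the method of any published tool. It assumes heterozygous SNVs in copy-neutral diploid regions, so that cellular prevalence is roughly 2 × VAF divided by tumour purity, and it assumes the infinite sites model, under which the prevalences of a mutation's children cannot sum to more than its own prevalence. The function names, the purity value and the example frequencies are hypothetical.

```python
from collections import defaultdict


def cellular_prevalence(vaf, purity=1.0):
    """Estimate the fraction of tumour cells carrying a mutation.

    Assumption: heterozygous SNV in a copy-neutral diploid region, so each
    mutated cell contributes one variant allele out of two.
    """
    return min(1.0, 2.0 * vaf / purity)


def is_compatible(parent, prevalence, tol=1e-9):
    """Check the sum rule: children's prevalences may not exceed the parent's.

    `parent` maps each mutation to its parent mutation (None for the root).
    """
    children = defaultdict(list)
    for mut, par in parent.items():
        if par is not None:
            children[par].append(mut)
    return all(
        sum(prevalence[c] for c in kids) <= prevalence[par] + tol
        for par, kids in children.items()
    )


# Hypothetical VAFs for three mutations in one sample with 80% purity.
vafs = {"A": 0.40, "B": 0.24, "C": 0.12}
prev = {m: cellular_prevalence(v, purity=0.8) for m, v in vafs.items()}

linear = {"A": None, "B": "A", "C": "B"}      # A -> B -> C
branching = {"A": None, "B": "A", "C": "A"}   # B and C are both children of A
inverted = {"B": None, "A": "B", "C": "A"}    # B placed above A

print(is_compatible(linear, prev))     # True
print(is_compatible(branching, prev))  # True
print(is_compatible(inverted, prev))   # False
```

In this toy sample both the linear and the branching history pass the check, which mirrors the caption's point that a single bulk sample often cannot single out the true tree.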
Fig. 2. From the heterogeneous tumour from Fig. 1 depicted in (a), which has evolved following the schematic representation in (b), the 10 single cells shown in (b) are selected for sequencing. One cell is normal tissue, while the remaining nine cells from the tumour contain additional mutations represented by the stars in the cells. The cells belong to a binary genealogical tree as in (c), where they are connected at their common ancestors. The exact nature of the branch points cannot necessarily be determined from the mutations each cell possesses. For example, the three cells on the left can have any arrangement as long as they are all below the purple mutation which distinguishes them from other cells. The representation in (c) is a sample genealogical tree focussing on the relationship between the cells themselves, while an equivalent representation is presented in (d). Here the mutations are encapsulated in nodes on a tree with the samples attached as leaves to create a mutation tree. This representation emphasises the ordering and evolutionary history of the mutations.
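The mutation tree representation in (d) can be made concrete with a small sketch: a cell attached to a node carries that node's mutation and every mutation on the path back to the root. The tree, the cell names and the helper functions below are hypothetical and only illustrate the data structure. They do not reproduce the tree drawn in the figure.

```python
def ancestors_inclusive(node, parent):
    """Return the node and all its ancestors, excluding the mutation-free root."""
    path = []
    while node != "root":
        path.append(node)
        node = parent[node]
    return path


def genotype_matrix(parent, attachment, mutations):
    """Build the binary cell-by-mutation profiles implied by a mutation tree."""
    matrix = {}
    for cell, node in attachment.items():
        carried = set(ancestors_inclusive(node, parent))
        matrix[cell] = [1 if m in carried else 0 for m in mutations]
    return matrix


# Hypothetical mutation tree: root -> M1 -> {M2, M3}, and M3 -> M4.
parent = {"M1": "root", "M2": "M1", "M3": "M1", "M4": "M3"}
mutations = ["M1", "M2", "M3", "M4"]

# Cells are attached as leaves; the normal cell sits at the root.
attachment = {"normal": "root", "cell1": "M2", "cell2": "M3", "cell3": "M4"}

profiles = genotype_matrix(parent, attachment, mutations)
for cell, row in profiles.items():
    print(cell, row)
```

Reading the rows against the tree shows why the two representations are equivalent: the cell profiles determine where each cell attaches, and the attachment points determine the profiles.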
Fig. 3. Left: Overview of the typical workflow for the reconstruction of mutation histories from bulk tumour samples. DNA is extracted from a bulk sample and sequenced to reveal the admixed mutation profile. Clustering mutations by variant allele frequencies reveals possible subclones and their relative frequency in the admixed sample. Based on this information compatible mutation histories are inferred. Right: Overview of the typical workflow for the reconstruction of mutation histories from single-cell samples. The DNA is extracted from the individual cells and amplified due to the limited starting material. This process does not amplify all genomic sites equally well. The amplified DNA material is then sequenced and mutations are called. The mutation profiles of the individual cells are now combined into a single (noisy) character state matrix that is then used for tree inference.
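To illustrate how a noisy character state matrix can be scored against a candidate history, the sketch below computes a log-likelihood under a simple per-entry error model. The model is an assumption of this example: each entry is treated independently, a false positive occurs with probability `alpha`, a true mutation is missed (for instance through allelic dropout) with probability `beta`, and missing entries are skipped. The matrices and rate values are hypothetical.

```python
import math


def log_likelihood(observed, expected, alpha, beta):
    """Log-likelihood of an observed matrix given the error-free one.

    observed: rows of 0, 1 or None (None marks missing data).
    expected: rows of 0 or 1 implied by a candidate tree.
    alpha: assumed false positive rate, P(observe 1 | true 0).
    beta: assumed false negative rate, P(observe 0 | true 1).
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for d, e in zip(obs_row, exp_row):
            if d is None:
                continue  # missing entries carry no information
            if e == 0:
                p = alpha if d == 1 else 1.0 - alpha
            else:
                p = 1.0 - beta if d == 1 else beta
            total += math.log(p)
    return total


# Hypothetical data: two cells, three mutations.
expected = [[1, 1, 0], [1, 0, 1]]
observed = [[1, 0, 0], [1, None, 1]]  # one dropout, one missing entry

ll = log_likelihood(observed, expected, alpha=0.01, beta=0.2)
print(ll)
```

A tree search or MCMC scheme would compare such scores across candidate trees. The sketch covers only the scoring step, not the search itself.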
Clonal reconstruction methods based on SNV bulk data. Abbreviations: EM, expectation maximisation; MCMC, Markov chain Monte Carlo; MILP, mixed integer linear programming; QIP, quadratic integer programming.
| Software | Year | Phylogeny | Multiple samples | Inference |
|---|---|---|---|---|
| TrAp | 2013 | Y | N | Exhaustive search |
| Clomial | 2014 | N | Y | Binomial/EM |
| PhyloSub | 2014 | Y | Y | Tree-structured stick-breaking/MCMC |
| PyClone | 2014 | N | Y | Dirichlet process, beta-binomial/MCMC |
| RecBTP | 2014 | Y | N | Approximation algorithm |
| SciClone | 2014 | N | N | Beta mixture model |
| AncesTree | 2015 | Y | Y | Optimisation/MILP |
| CITUP | 2015 | Y | Y | Optimisation/QIP |
| LICHeE | 2015 | Y | Y | Heuristic |
| BayClone | 2015 | N | Y | Gibbs sampling/Metropolis-Hastings |
| CTPsingle | 2016 | Y | N | Dirichlet process, beta-binomial/MCMC |
| Cloe | 2016 | Y | Y | Metropolis-coupled MCMC |
Clonal reconstruction methods based on SNV and CNA bulk data. Abbreviations: HMM, hidden Markov model; MCMC, Markov chain Monte Carlo.
| Software | Year | Phylogeny | Multiple samples | Inference |
|---|---|---|---|---|
| CHAT | 2014 | N | N | Dirichlet process Gaussian mixture model/MCMC |
| CloneHD | 2014 | N | Y | HMM/local optimisation |
| SubcloneSeeker | 2014 | Y | Y | Exhaustive enumeration |
| PhyloWGS | 2015 | Y | Y | Tree-structured stick-breaking/MCMC |
| SCHISM | 2015 | Y | Y | Likelihood ratio tests/genetic algorithm |
| SPRUCE | 2016 | Y | Y | Exhaustive enumeration |
| CANOPY | 2016 | Y | Y | MCMC |
Characteristics of some single-cell sequencing datasets. The number of samples is per patient. The number of cells, also per patient, only includes those which passed quality control and were used for mutation calling. The false positive and allelic dropout rate estimates are per genomic position. The number of mutations excludes those which occur in only one cell, as these are uninformative for the phylogenetic reconstruction. The counts may, however, include mutations occurring (or with missing data) in all cells, which are also uninformative. These have been removed from the count of [70] and do not occur for the ER+ tumour of [78] or in any of the patient samples from [80].
| Cancer type | Year and reference | Number of patients | Number of samples | Number of mutations | Number of cells | False positive rate | Allelic drop out rate | Missing data |
|---|---|---|---|---|---|---|---|---|
| Myeloproliferative neoplasm | (2012) | 1 | 1 | 712 | 58 | 6.04 × 10⁻⁵ | 0.4309 | 58% |
| Kidney | (2012) | 1 | 1 | 35 | 17 | 2.67 × 10⁻⁵ | 0.1643 | 22% |
| Bladder | (2012) | 1 | 1 | 443 | 44 | 6.7 × 10⁻⁵ | 0.4 | 55% |
| Colon | (2014) | 1 | 1 | 176 | 63 | < 1 × 10⁻⁴ | > 0.5 | – |
| Breast | (2014) | 2 | 1 | 40/519 | 47/16 | 1.24 × 10⁻⁶ | 0.0973 | 1% |
| Leukemia | (2014) | 3 | 1 | ≤ 1953 | 11–12 | – | 0.12 | 28% |
| Leukemia | (2014) | 6 | 1 | 10–105 | 96–150 | – | ≤ 0.3 | – |
| Breast (and xenografts) | (2015) | 2 | 2/3 | 37/45 | 120/90 | – | ≈ 0.2 | 7–12% |
| Ovarian (intraperitoneal) | (2016) | 3 | 4–5 | 23–33 | 420–672 | – | – | – |
The number of mutations listed for [77] refers to the number of loci sequenced.
The number of mutations only indicates those uncovered in targeted panels of 40/45 SNVs for [50] and of 43–84 SNVs for [97].
Overview of single-cell phylogenetic methods. Abbreviation: MCMC, Markov chain Monte Carlo.
| Method | Phylogenetic representation | Inference |
|---|---|---|
| Kim and Simon | Mutation tree | Pairwise ordering and maximum spanning tree |
| BitPhylogeny | Clonal tree | Tree-structured stick-breaking/MCMC |
| OncoNEM | Sample/clonal tree | Greedy structure search |
| SCITE | Mutation tree | MCMC |
SCITE [116] provides the option of using the sample tree representation.