Niko Beerenwinkel, Roland F Schwarz, Moritz Gerstung, Florian Markowetz.
Abstract
Cancer is a somatic evolutionary process characterized by the accumulation of mutations, which contribute to tumor growth, clinical progression, immune escape, and the development of drug resistance. Evolutionary theory can be used to analyze the dynamics of tumor cell populations and to make inferences about the evolutionary history of a tumor from molecular data. We review recent approaches to modeling the evolution of cancer, including population dynamics models of tumor initiation and progression, phylogenetic methods to model the evolutionary relationship between tumor subclones, and probabilistic graphical models to describe dependencies among mutations. Evolutionary modeling helps to understand how tumors arise and will also play an increasingly important prognostic role in predicting disease progression and the outcome of medical interventions, such as targeted therapy.
Keywords: Cancer; cancer progression; evolution; population genetics; probabilistic graphical models
Year: 2014 PMID: 25293804 PMCID: PMC4265145 DOI: 10.1093/sysbio/syu081
Source DB: PubMed Journal: Syst Biol ISSN: 1063-5157 Impact factor: 15.683
Figure: Modeling the population dynamics of cancer cells. a) Schematic illustration of the genetic progression from initially healthy cells (normal cells) to an invasive cancer by accumulating driver mutations. b) Age-incidence curves rise sharply above the age of 50 and are informative about the dynamics of tumor progression. The straight line shows a fit with power 4.8. The log-log-linear dependence of incidence on age is used in multistage theory to estimate the number of rate-limiting steps in cancer progression from incidence data. c) Population genetics models such as the Wright–Fisher model can be used to model the accumulation of driver mutations through multiple clonal expansions and to derive the average waiting times for a given number of alterations. d) Dynamics of a three-strategy game corresponding to cell types A, B, and C. While simple additive fitness models always lead to the survival of the fittest, evolutionary game theory accounts for cellular interactions and allows for more complex dynamics, such as stable coexistence of cell types. Indicated here is a stable equilibrium with strategies A and C, but not B, which is reached from all three starting conditions via the indicated evolutionary paths.
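The log-log fit mentioned in panel b can be sketched as follows. This is a minimal illustration on synthetic, noise-free data (the ages, the scale constant, and the function name `fit_power_law` are assumptions made for the example, not values from the paper). It assumes the classical multistage reading in which incidence grows as age to the power n − 1 for n rate-limiting steps, so that a fitted power of 4.8 would suggest roughly six steps.

```python
import numpy as np

def fit_power_law(ages, incidence):
    """Least-squares fit of log(incidence) = log(c) + k * log(age).

    Returns the exponent k and the scale constant c.
    """
    slope, intercept = np.polyfit(np.log(ages), np.log(incidence), 1)
    return slope, np.exp(intercept)

# Synthetic incidence values (hypothetical, for illustration only).
ages = np.array([40.0, 50.0, 60.0, 70.0, 80.0])
true_power = 4.8
incidence = 1e-9 * ages ** true_power

power, scale = fit_power_law(ages, incidence)

# Assumption: incidence ~ age^(n - 1) for n rate-limiting steps,
# so the fitted exponent plus one estimates the number of steps.
estimated_steps = power + 1
print(f"fitted power: {power:.2f}, estimated steps: {estimated_steps:.2f}")
```

With real incidence data the fit would be restricted to the age range in which the curve is approximately linear on the log-log scale.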
A table of common terms and acronyms used in cancer genomics
| Term | Description |
|---|---|
| aCGH | Array CGH; microarray-based high-resolution CGH method |
| APC | Adenomatous polyposis coli; a human tumor suppressor gene (TSG) |
| Allele frequency | Fraction of cells (in NGS data, of reads) carrying a mutation |
| Aneuploidy | Abnormal number of (parts of) chromosomes |
| BAF | B allele frequency; ratio of the number of |
| BRAF | Gene coding for the B-Raf protein, a signal transduction kinase that can be involved in cancer if mutated |
| Carcinogenesis | The process of cancer development |
| Cellularity | The proportion of tumor cells in a sample |
| CGH | Comparative genomic hybridization; cytogenetic method for analyzing copy number variations (CNVs) |
| Chromothripsis | The shattering of the genome in one catastrophic event |
| Chromoplexy | Chained rearrangement across several chromosomes |
| Clone | A set of tumor cells descending from the same ancestor and hence sharing its mutations |
| Clonal frequency | Percentage of tumor cells carrying an allele |
| CNV | Germline (normal) copy number variation in normal cells of the tissue and in tumor cells |
| CNA | Somatic copy number aberration (or alteration) in the cancer genome of tumor cells |
| Driver mutation | Mutation that confers a selective advantage |
| EGFR | Epidermal growth factor receptor; a cell-surface receptor whose altered expression is involved in cancer |
| Kataegis | Local hypermutation; many SNVs clustered on a short genomic segment |
| LOH | Loss of heterozygosity; loss of one parental allele with or without a copy number change |
| logR | Logarithmic intensity ratio of tumor and control DNA in an array CGH experiment |
| NGS | Next-generation sequencing; high-throughput sequencing technologies based on massively parallel DNA amplification and sequencing |
| Oncogene | Gene that confers a selective advantage if hit by a gain-of-function mutation |
| Oncogenetic model | A model of the dependencies of events (CNAs, SNVs) in cancer development |
| Passenger mutation | Selectively neutral mutation |
| Phasing | The assignment of genomic alterations to specific haplotypes |
| Phylogenetic model | A model of the evolutionary relationship between different tumor samples from the same patient or between clones from the same tumor |
| Segmentation | The process of calling integer copy numbers from noisy logR values |
| Somatic evolution | Evolution within an organism |
| SNP | Single nucleotide polymorphism; single base variant existing in the human population (found in normal tissue and tumor) |
| SNV | Single nucleotide variant; single base change that occurred in the tumor |
| TSG | Tumor suppressor gene; gene that confers a selective advantage if hit by a loss-of-function mutation |
| Tumorigenesis | The process of tumor development |
| Vascularization | The process of establishing blood vessels |
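To make the logR and segmentation entries concrete, here is a deliberately naive sketch. It assumes a base-2 logarithm, a diploid control, and a pure tumor sample (cellularity of 100%); none of these choices is stated in the table, and the intensities are invented for illustration. Established segmentation methods estimate breakpoints jointly across probes rather than rounding each probe on its own.

```python
import numpy as np

def log_ratio(tumor, control):
    """logR per probe: log2 of tumor intensity over control intensity (base 2 is an assumption)."""
    return np.log2(np.asarray(tumor, dtype=float) / np.asarray(control, dtype=float))

def call_copy_numbers(logr, control_ploidy=2):
    """Naive per-probe call: round control_ploidy * 2**logR to the nearest integer."""
    return np.rint(control_ploidy * 2.0 ** np.asarray(logr)).astype(int)

def merge_segments(copy_numbers):
    """Merge runs of equal copy number into (start, end, copy_number) tuples, end inclusive."""
    segments = []
    start = 0
    for i in range(1, len(copy_numbers) + 1):
        if i == len(copy_numbers) or copy_numbers[i] != copy_numbers[start]:
            segments.append((start, i - 1, int(copy_numbers[start])))
            start = i
    return segments

# Hypothetical probe intensities along one chromosome.
tumor = [100, 105, 98, 150, 155, 148, 52, 49]
control = [100, 100, 100, 100, 100, 100, 100, 100]

logr = log_ratio(tumor, control)
cn = call_copy_numbers(logr)
segments = merge_segments(cn)
print(segments)
```

In this toy input the first three probes are called at two copies, the next three at three copies (a gain), and the last two at one copy (a loss).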
Figure: Common aberrations in cancer genomes. These events lead to the abnormal chromosome numbers (aneuploidy) and chromosome structures of a cancer genome. Lines indicate the genome, with the germline genome on top and the cancer genome with somatic aberrations below. Double lines are used when differentiating heterozygous and homozygous changes is useful. Dots represent single nucleotide changes, whereas lines and arrows represent structural changes.
Figure: Phylogenetic versus oncogenetic models. Phylogenetic models of tumor samples (a) and oncogenetic models of cancer drivers (b) use the same type of data: genomic aberrations observed in patient tumor samples. Phylogenetic models (a) mostly use genome-wide data from a small number of evolutionarily related tumor samples, either from the same patient or from different clones within the same tumor. Tumor progression models (b), on the other hand, generally concentrate on a small number of aberrations observed in a larger number of independent tumors from different patients.
Software tools implementing phylogenetic methods for reconstructing within-patient and within-tumor evolutionary tumor histories.
| Software | Data | Model / Inference | URL |
|---|---|---|---|
| PhyloSub | SNV | Tree-stick-breaking process, binomial / MCMC | https://github.com/morrislab/phylosub |
| PyClone | SNV | Dirichlet process, beta-binomial / MCMC | http://compbio.bccrc.ca/software/pyclone |
| SciClone | SNV | Beta mixture model | https://github.com/genome/sciclone |
| Clomial | SNV | Binomial / EM | http://www.bioconductor.org/packages/devel/bioc/html/Clomial.html |
| Trap | SNV | Exhaustive search under constraints | http://sourceforge.net/projects/klugerlab/files/TrAp |
| CloneHD | SNV + CNA | HMM, EM, variational Bayes | https://github.com/andrej-fischer/cloneHD |
| ThetA | CNA | Maximum likelihood | https://github.com/raphael-group/THetA |
| cancerTiming | CNA | Maximum likelihood | http://cran.r-project.org/web/packages/cancerTiming |
| GRAFT | CNA | Partial maximum likelihood | http://www.sanger.ac.uk/genetics/CGP/Software/GRAFT |
| MEDICC | CNA | Finite state transducer, minimum-event distance | https://bitbucket.org/rfs/medicc |
| TuMult | CNA | Breakpoint distance | http://bioserv.rpbs.univ-paris-diderot.fr/~letouze/TuMult |
| TITAN | CNA | HMM / EM | http://compbio.bccrc.ca/software/titan/ |

Notes: SNV, single-nucleotide variant; CNA, copy number aberration; MCMC, Markov chain Monte Carlo; EM, expectation maximization; HMM, hidden Markov model.
Figure: Inferring tumor phylogeny from next-generation sequencing data. a) Subclones are related to each other by an evolutionary process of acquisition of mutations. In this example, the three clones (leaf nodes) are characterized by different combinations of four single nucleotide variant (SNV) sets. The percentages on the edges of the tree indicate the fraction of cells with a particular set of SNVs, e.g., 70% of all cells carry one SNV set, 40% additionally carry a second, and only 7% carry three of the sets. b) The evolutionary history of a tumor gives rise to a heterogeneous collection of normal cells (small discs) and cancer subclones (large discs, triangles, squares). Internal nodes that have been fully replaced by their descendants (like the one carrying two of the SNV sets without the other two) are no longer part of the tumor. c) Sequencing data consist of short reads covering (parts of) the cancer genome. Comparison to the germline DNA of the same patient allows SNVs and other genomic aberrations to be identified. Since reads are short, most will cover only a single SNV. In a few cases, pairs of SNVs are covered, which makes it possible to assess patterns of co-occurrence and mutual exclusivity between SNVs. d) The sets of SNVs distinguishing the subclones cluster in the SNV frequency distribution. The mean of each cluster is the fraction of cells carrying this set of SNVs. The goal of tumor phylogenetics is to infer the evolutionary tree (a) from the mutations observed in the sequencing data (c) and their frequencies (d).
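The clustering step in panel d can be illustrated with a small simulation. The sketch below is an assumption-laden toy: it draws read counts from a binomial distribution around the cell fractions quoted in the caption (70%, 40%, 7%), treats the observed read fraction directly as an estimate of the cell fraction (ignoring ploidy, zygosity, and normal-cell contamination), and uses a plain one-dimensional k-means in place of the mixture models implemented by the published tools. The depth, the number of SNVs per cluster, and the function name `kmeans_1d` are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=50):
    """Plain 1D k-means with deterministic, quantile-based initial centres."""
    values = np.asarray(values, dtype=float)
    centres = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every value to its nearest centre.
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        # Recompute centres; keep the old centre if a cluster is empty.
        new_centres = np.array([
            values[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

rng = np.random.default_rng(0)
depth = 500                      # hypothetical sequencing depth per SNV
true_fractions = [0.70, 0.40, 0.07]
snvs_per_cluster = 30            # hypothetical number of SNVs per set

frequencies = np.concatenate([
    rng.binomial(depth, f, size=snvs_per_cluster) / depth
    for f in true_fractions
])

centres, labels = kmeans_1d(frequencies, k=3)
print(np.sort(centres))
```

The recovered cluster means approximate the simulated cell fractions; the number of clusters is given here, whereas in practice it has to be inferred as well.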
Figure: Two simple principles for tree inference from SNVs. For a given set of subclones and their respective clonal fractions, each illustrated by a triangle with a dot at the top vertex representing the clonal origin, two conditions need to be met for a potential phylogeny to be considered feasible: a) Dirichlet's box (pigeonhole principle): If two SNV frequencies (small triangles inside large triangle) sum to more than 100%, then some cancer cells must contain both SNVs (overlap of the two small triangles). In a tree-like evolutionary process, such an overlap without nesting would require some cells to have acquired the same mutation independently, which is considered highly unlikely in cancer. Hence, one of the two subclones (small triangles) is ancestral to the other. b) Larger ancestor: In this case, if one clonal fraction is larger than the other, the larger must be the ancestor; otherwise cancer cells would have lost the previously gained mutation (nonoverlapping regions between the two small triangles at the bottom), which again is considered highly unlikely. The most likely feasible solution is shown in c), where both principles are met (and the two small triangles are nested).
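Both principles can be written down as simple checks on frequencies. The following sketch is illustrative only: the function names, the dictionary encoding of a tree (child mapped to parent, with `None` for the root level), and the example labels A, B, and C are assumptions, and the frequencies 70%, 40%, and 7% are borrowed from the phylogeny example earlier in this record. The feasibility check uses a sum rule (the children of a node may not together exceed the node's own frequency), which covers both principles at once.

```python
from collections import defaultdict

def relate_subclones(freq_a, freq_b):
    """Apply the two principles to a pair of SNV-set frequencies (fractions of cells)."""
    if freq_a + freq_b <= 1.0:
        # No forced overlap: the pair may be nested or on separate branches.
        return "undetermined"
    # Pigeonhole: the two sets must overlap, so one is ancestral to the other.
    if freq_a > freq_b:
        return "a_ancestor_of_b"  # larger ancestor
    if freq_b > freq_a:
        return "b_ancestor_of_a"
    return "undetermined"         # equal frequencies cannot be ordered

def is_feasible(parent_of, freq, tol=1e-9):
    """Check a candidate tree: children of a node may not sum to more than the node."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    for parent, kids in children.items():
        limit = 1.0 if parent is None else freq[parent]
        if sum(freq[k] for k in kids) > limit + tol:
            return False
    return True

freq = {"A": 0.70, "B": 0.40, "C": 0.07}

chain = {"A": None, "B": "A", "C": "B"}       # A -> B -> C
siblings = {"A": None, "B": None, "C": "B"}   # A and B on separate branches
reversed_chain = {"B": None, "A": "B", "C": "A"}

print(relate_subclones(freq["A"], freq["B"]))
print(is_feasible(chain, freq), is_feasible(siblings, freq), is_feasible(reversed_chain, freq))
```

Only the chain passes: placing A and B on separate branches violates the pigeonhole condition (70% + 40% > 100%), and placing A below B violates the larger-ancestor condition.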
Figure: Phasing copy number profiles. While SNP arrays are capable of determining a major and a minor copy number for the two parental alleles, their assignment (phasing) to the two actual physical alleles A and B is unknown. Because evolutionary events happen on the physical copies, correct phasing is essential for determining evolutionary distances. In this example, the two major copy number profiles of sample 1 and sample 2 (left) have a distance of two events (one amplification at position 1 and one amplification spanning positions 4 and 5), while the minor copy number profiles are identical, yielding a total of two events between the genomes of sample 1 and sample 2. Optimal assignment (right) to the alleles A and B reduces the evolutionary distance to a single amplification event spanning the first five genomic loci. This is also not evident from the total copy number (the sum of major and minor), which would still require two separate events.
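The effect of phasing on an event distance can be reproduced with a brute-force toy. The sketch below is not the method of any tool listed in this record: it assumes a simplified distance that counts the minimum number of +1 or −1 operations on contiguous intervals, it ignores biological constraints such as lost segments (zero copies) not being amplifiable, and it enumerates all per-locus assignments, which is only practical for a handful of loci. The profiles are invented and differ from the figure's example.

```python
from itertools import product

def interval_events(source, target):
    """Minimum number of +1/-1 operations on contiguous intervals turning source into target.

    Equals half the total variation of the zero-padded difference profile.
    """
    diff = [t - s for s, t in zip(source, target)]
    padded = [0] + diff + [0]
    return sum(abs(b - a) for a, b in zip(padded, padded[1:])) // 2

def phased_profiles(major, minor, swaps):
    """Assign major/minor copy numbers to alleles A and B; swap them where swaps is True."""
    allele_a = [mi if s else ma for ma, mi, s in zip(major, minor, swaps)]
    allele_b = [ma if s else mi for ma, mi, s in zip(major, minor, swaps)]
    return allele_a, allele_b

def unphased_distance(sample1, sample2):
    """Compare major with major and minor with minor, without phasing."""
    (maj1, min1), (maj2, min2) = sample1, sample2
    return interval_events(maj1, maj2) + interval_events(min1, min2)

def best_phased_distance(sample1, sample2):
    """Exhaustive search over all per-locus allele assignments in both samples."""
    n = len(sample1[0])
    best = None
    for swaps1 in product([False, True], repeat=n):
        a1, b1 = phased_profiles(sample1[0], sample1[1], swaps1)
        for swaps2 in product([False, True], repeat=n):
            a2, b2 = phased_profiles(sample2[0], sample2[1], swaps2)
            d = interval_events(a1, a2) + interval_events(b1, b2)
            if best is None or d < best:
                best = d
    return best

# Hypothetical (major, minor) profiles over five loci.
sample1 = ([1, 2, 2, 1, 1], [1, 1, 1, 1, 1])
sample2 = ([2, 2, 2, 2, 2], [1, 2, 2, 1, 1])

print(unphased_distance(sample1, sample2), best_phased_distance(sample1, sample2))
```

In this toy case the unphased comparison needs three events, whereas the best phasing explains the difference with a single amplification of one allele across all five loci.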
Software tools implementing probabilistic graphical models for estimating cancer progression.
| Model | Topology | LPD | Constraints | Noise | Learning | Software | URL |
|---|---|---|---|---|---|---|---|
| OT/HI | tree | discrete | monotone | no | ML | oncomodel | http://cran.r-project.org/web/packages/oncomodel/ |
| OT | tree | discrete | monotone | no | MWB | oncotrees | http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/cgh.html |
| OT | tree | discrete | monotone | yes | MWB | oncotree | http://cran.r-project.org/web/packages/Oncotree/ |
| HOT | tree | discrete | monotone | yes | ML via SEM | n.a. | n.a. |
| Mixture of OTs | forest | discrete | monotone | yes | ML via SEM | mtreemix | http://mtreemix.bioinf.mpi-inf.mpg.de/ |
| Mixture of HOTs | forest | discrete | monotone | yes | ML via SEM | hotmix | https://github.com/atofigh/hotmix |
| CBN | DAG | discrete | monotone | yes | ML | cbn | http://www.cbg.ethz.ch/software/cbn/ |
| CT-CBN | DAG | waiting time | monotone | no | ML via EM | ct-cbn | http://www.cbg.ethz.ch/software/ct-cbn/ |
| Hidden CBN | DAG | waiting time | monotone | yes | ML via EM, SA | h-cbn | http://www.cbg.ethz.ch/software/ct-cbn/ |
| Bayesian CBN | DAG | discrete | monotone | yes | MCMC | bayes-cbn | http://www.cbg.ethz.ch/software/bayes-cbn/ |
| NAM | DAG | waiting time | none | no | ML | n.a. | n.a. |
| ON | DAG | discrete | (semi-)monotone | yes | MILP | diprog | https://bitbucket.org/farahani/diprog |
| RESIC | none | RE | none | no | SM | upon request | upon request |

Notes: OT, oncogenetic tree; OT/HI, OT with hidden internal nodes; HOT, hidden oncogenetic tree; CBN, conjunctive Bayesian network; CT-CBN, continuous-time CBN; NAM, network aberration model; ON, oncogenetic network; LPD, local probability distribution; RESIC, Retracing the Evolutionary Steps in Cancer; DAG, directed acyclic graph; ML, maximum likelihood; EM, expectation-maximization algorithm; SEM, structural EM; MCMC, Markov chain Monte Carlo; MWB, maximum weight branching; MILP, mixed integer linear program; RE, rate of evolution (Moran process); SA, simulated annealing; SM, simplex minimization by Nelder and Mead.