Xiaoxiao Zhang, Holger Fröhlich, Dima Grigoriev, Sergey Vakulenko, Jörg Zimmermann, Andreas Günter Weber.
Abstract
We propose a simple three-parameter model that provides very good fits to the incidence curves of 18 common solid cancers, even when variations due to different locations, races, or periods are taken into account. From a data perspective, we use model selection (Akaike information criterion) to show that this model, which is based on the Weibull distribution, outperforms other simple models such as the Gamma distribution. From a modeling perspective, the Weibull distribution can be justified as modeling the accumulation of driver events. This establishes a link to cancer development models based on stem cell divisions and a connection to a recursion formula for intrinsic cancer risk published by Wu et al. We give a closed-form solution for this recursion formula, which will help to simplify future analyses. Additionally, we perform a sensitivity analysis of the parameters, showing that two of the three parameters can vary over several orders of magnitude. However, the shape parameter of the Weibull distribution, which corresponds to the number of driver mutations required for cancer onset, can be robustly estimated from epidemiological data.
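The link between driver-event accumulation and the Weibull distribution can be illustrated with a standard extreme-value argument. This is a sketch under stated assumptions, not necessarily the authors' derivation: assume each of N cells at risk acquires driver events as a Poisson process with a hypothetical rate r per year. One cell has collected at least k events by age t with probability p = P(Poisson(rt) ≥ k) ≈ (rt)^k / k! when rt is small, so the probability that at least one cell has done so is 1 − (1 − p)^N ≈ 1 − exp(−N (rt)^k / k!), a Weibull CDF with shape k. The code compares the exact and approximate values; `n_cells`, `rate`, `k` and all numbers are illustrative, not estimates from the paper.

```python
import math


def poisson_tail(x, k, terms=60):
    """P(Poisson(x) >= k), summed from the tail to avoid cancellation when x is small."""
    term = math.exp(-x) * x**k / math.factorial(k)  # P(X = k)
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= x / (i + 1)  # turn P(X = i) into P(X = i + 1)
    return total


def exact_incidence(t, n_cells, rate, k):
    """Probability that at least one of n_cells has >= k driver events by age t."""
    p = poisson_tail(rate * t, k)
    return -math.expm1(n_cells * math.log1p(-p))  # 1 - (1 - p)^N


def weibull_incidence(t, n_cells, rate, k):
    """Small-rt approximation 1 - exp(-N (r t)^k / k!): a Weibull CDF with shape k."""
    return -math.expm1(-n_cells * (rate * t) ** k / math.factorial(k))


# Hypothetical illustration values (not estimates from the paper).
N_CELLS, RATE, K = 1e8, 1.3e-4, 4
for age in (40, 60, 80):
    exact = exact_incidence(age, N_CELLS, RATE, K)
    approx = weibull_incidence(age, N_CELLS, RATE, K)
    print(f"age {age}: exact {exact:.4e}, Weibull {approx:.4e}")

# Doubling N while shrinking r by 2**(-1/k) leaves N * r**k unchanged,
# so the Weibull curve is identical: N and r are not separately identifiable.
same = weibull_incidence(80, 2 * N_CELLS, RATE * 2 ** (-1 / K), K)
print(f"rescaled parameters at age 80: {same:.4e}")
```

Because N and r enter the approximation only through N r^k / k!, very different (N, r) pairs produce the same curve, which is one way to see why the shape parameter is easier to pin down than the other two.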
Year: 2018 PMID: 29467476 PMCID: PMC5821839 DOI: 10.1038/s41598-018-21734-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Empirical cumulative cancer incidence data are consistent with the Weibull cumulative distribution function in 18 cancers (data for ages up to 85 years). Empirical (blue line) and Weibull-fitted (red line) cumulative incidence curves are shown for 18 tissues, with the goodness of fit reported in each subplot. All 18 cancers show a good fit when R² between the model-predicted and the empirical cumulative incidence is used as the metric.
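A minimal way to fit a Weibull curve to cumulative incidence and score it with R² is the classical Weibull-plot linearization: ln(−ln(1 − F)) = k ln t − k ln θ is linear in ln t. This sketch fits only a two-parameter Weibull CDF by ordinary least squares on that transformed scale and then computes R² on the original cumulative scale. It is not the paper's three-parameter scaled Weibull fit, and the data below are synthetic (k = 5, θ = 160 years, with a ±2% alternating error).

```python
import math


def fit_weibull_plot(ages, cum_inc):
    """Fit F(t) = 1 - exp(-(t/theta)^k) by OLS on ln(-ln(1-F)) = k ln t - k ln theta."""
    xs = [math.log(t) for t in ages]
    ys = [math.log(-math.log1p(-f)) for f in cum_inc]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    intercept = my - k * mx  # equals -k * ln(theta)
    theta = math.exp(-intercept / k)
    return k, theta


def weibull_cdf(t, k, theta):
    return -math.expm1(-((t / theta) ** k))


def r_squared(observed, predicted):
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic cumulative incidence for ages 20..85 (hypothetical parameters).
ages = list(range(20, 90, 5))
observed = [weibull_cdf(t, 5.0, 160.0) * (1 + 0.02 * (-1) ** i) for i, t in enumerate(ages)]
k_hat, theta_hat = fit_weibull_plot(ages, observed)
fitted = [weibull_cdf(t, k_hat, theta_hat) for t in ages]
print(f"k = {k_hat:.2f}, theta = {theta_hat:.1f}, R^2 = {r_squared(observed, fitted):.4f}")
```

Fitting on the linearized scale weights every age equally in log space; a direct nonlinear least-squares fit on the cumulative scale would weight the high-incidence older ages more heavily.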
Figure 2. Sensitivity analysis of parameter estimates using the scaled Weibull function for 14 example cancer types. Whereas the estimates of P (the cell population at risk) and of the scale parameter λ can vary over two orders of magnitude, the estimates of the shape parameter k stay within about ±1. Note that the shape parameter can be interpreted as the number of rate-limiting events.
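Why k can be robust while the other parameters drift can be illustrated with a profile scan, a generic technique assumed here rather than taken from the paper: hold k fixed on a grid, refit the remaining scale parameter, and watch how the fit quality changes. With the shape fixed, the least-squares scale on the Weibull-plot scale has the closed form ln θ = mean(ln t − y/k), where y = ln(−ln(1 − F)). The data are synthetic and noise-free (k = 4, θ = 150 years).

```python
import math


def weibull_cdf(t, k, theta):
    return -math.expm1(-((t / theta) ** k))


def profile_fit(ages, cum_inc, k):
    """Hold the shape k fixed, fit only theta, return (theta, R^2 on the cumulative scale)."""
    xs = [math.log(t) for t in ages]
    ys = [math.log(-math.log1p(-f)) for f in cum_inc]
    # Least squares for y = k * (x - ln theta) with k fixed gives ln theta = mean(x - y / k).
    theta = math.exp(sum(x - y / k for x, y in zip(xs, ys)) / len(xs))
    pred = [weibull_cdf(t, k, theta) for t in ages]
    mean = sum(cum_inc) / len(cum_inc)
    ss_res = sum((o - p) ** 2 for o, p in zip(cum_inc, pred))
    ss_tot = sum((o - mean) ** 2 for o in cum_inc)
    return theta, 1.0 - ss_res / ss_tot


# Hypothetical noise-free data with k = 4 and theta = 150 years.
ages = list(range(20, 90, 5))
data = [weibull_cdf(t, 4.0, 150.0) for t in ages]
scan = {k: profile_fit(ages, data, k) for k in (2.0, 3.0, 4.0, 5.0, 6.0)}
for k, (theta, r2) in scan.items():
    print(f"k = {k:.0f}: theta = {theta:8.1f}, R^2 = {r2:.4f}")
best_k = max(scan, key=lambda k: scan[k][1])
```

The scale θ shifts to compensate for each trial k, but only the true shape reproduces the curvature of the incidence curve, so the fit quality peaks sharply at that k.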
Figure 3. Shape parameters estimated by fitting the Weibull function to empirical cancer incidence data (ages up to 85 years). Patients are grouped by year of diagnosis, race, and registry. Cancers are ordered by the median shape value. Within each cancer, the shape estimates are similar across groups regardless of these risk factors, consistent with the intuitive expectation that race and environmental changes are unlikely to alter the number of driver events required for cancer onset.
Figure 4. Relationship between cancer incidence and stem cell divisions among 30 cancer types. The lifetime cancer risk regression line is conceptually the same as that used by Wu et al. [11].
Figure 5. Relationship between cumulative cancer incidence up to ages 40, 50, 60, 70, and 80 years and lifetime stem cell divisions.
One possible combination of parameters with which the tLIR model of [11] fits the empirical data well. As in [11], r is restricted to the range [10⁻¹⁰, 10⁻⁶].
| Cancer | k | r | R² | Stem cells | Division rate | Generations¹ | Risk |
|---|---|---|---|---|---|---|---|
| AML | 4.8 | 1.000e-06 | 1.00 | 1.35e+08 | 12.000 | 1047.01 | 4.651e-03 |
| BCC | 4.5 | 1.000e-06 | 0.98 | 5.82e+09 | 7.600 | 678.44 | 2.181e-04 |
| CLL | 4.9 | 1.000e-06 | 0.99 | 1.35e+08 | 12.000 | 1047.01 | 6.925e-03 |
| COAD | 5.5 | 5.012e-07 | 1.00 | 2.00e+08 | 73.000 | 6232.58 | 5.677e-02 |
| DUAD | 5.3 | 1.000e-06 | 1.00 | 4.00e+06 | 24.000 | 2061.93 | 3.714e-04 |
| ESCA | 4.5 | 5.012e-07 | 0.99 | 8.64e+05 | 17.400 | 1498.72 | 3.106e-03 |
| GBNPAD | 3.5 | 1.000e-06 | 0.85 | 1.60e+06 | 0.584 | 70.25 | 1.896e-03 |
| GBM* | | | | 1.35e+08 | 0.000 | 27.01 | 3.825e-03 |
| HNSC | 3.8 | 1.995e-07 | 0.99 | 1.85e+07 | 21.500 | 1851.64 | 1.730e-02 |
| LHCA | 3.6 | 1.000e-06 | 0.94 | 3.01e+09 | 0.912 | 109.05 | 7.079e-03 |
| LUAD | 2.8 | 7.943e-08 | 0.83 | 1.22e+09 | 0.070 | 36.13 | 2.304e-02 |
| MBM* | | | | 1.36e+08 | 0.000 | 27.02 | 1.414e-04 |
| SKCM | 3.8 | 1.000e-06 | 1.00 | 3.80e+09 | 2.480 | 242.62 | 3.038e-02 |
| OSARC | 1.0 | 1.585e-08 | 0.96 | 4.18e+06 | 0.067 | 27.69 | 2.696e-04 |
| OSARCA | 1.0 | 6.310e-07 | 0.96 | 6.50e+05 | 0.067 | 25.01 | 2.527e-05 |
| OSARCH | 3.0 | 1.000e-06 | 0.99 | 8.60e+05 | 0.067 | 25.41 | 1.660e-05 |
| OSARCL | 1.0 | 3.981e-07 | 0.96 | 1.59e+06 | 0.067 | 26.30 | 1.312e-04 |
| OSARCP | 3.1 | 1.000e-06 | 0.91 | 4.50e+05 | 0.067 | 24.47 | 3.229e-05 |
| OVGC* | | | | 1.10e+07 | 0.000 | 23.39 | 7.638e-05 |
| PDAD | 3.8 | 1.000e-06 | 0.92 | 4.18e+09 | 1.000 | 116.96 | 1.016e-02 |
| PECA | 3.6 | 1.000e-06 | 0.99 | 7.40e+07 | 1.000 | 111.14 | 1.498e-04 |
| SIAD | 5.0 | 5.012e-07 | 1.00 | 1.00e+08 | 36.000 | 3086.58 | 8.013e-04 |
| TGCC | 1.9 | 7.943e-07 | 0.96 | 7.20e+06 | 5.800 | 515.78 | 2.244e-03 |
| TPFC | 3.1 | 1.000e-06 | 0.98 | 6.50e+07 | 0.087 | 33.35 | 6.922e-03 |
| TMCA | 3.1 | 7.943e-07 | 0.93 | 6.50e+06 | 0.087 | 30.03 | 8.707e-05 |
¹ Number of generations that stem cells go through over a lifetime, assuming a lifetime of 85 years.
* Parameters could not be estimated for these cancers because their division rate is 0.
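Checking the table values suggests that the Generations column equals log2(stem cells) + 85 × division rate: roughly log2 N divisions to expand a single cell into a pool of N stem cells, plus the self-renewal divisions over an 85-year lifetime. This formula is inferred from the numbers rather than stated in the source, and division rates are assumed to be per year. The sketch recomputes a few rows; `stem_cell_generations` is a hypothetical helper name.

```python
import math

LIFETIME_YEARS = 85  # lifetime assumed in the table footnote


def stem_cell_generations(n_stem_cells, divisions_per_year, lifetime=LIFETIME_YEARS):
    """log2(N) divisions to build a pool of N cells, plus self-renewal divisions over the lifetime."""
    return math.log2(n_stem_cells) + divisions_per_year * lifetime


# (stem cells, division rate, Generations) for a few rows of the table above.
ROWS = {
    "AML": (1.35e8, 12.0, 1047.01),
    "COAD": (2.00e8, 73.0, 6232.58),
    "LUAD": (1.22e9, 0.070, 36.13),
    "PDAD": (4.18e9, 1.0, 116.96),
    "GBM": (1.35e8, 0.0, 27.01),
    "OVGC": (1.10e7, 0.0, 23.39),
}

for name, (cells, rate, table_value) in ROWS.items():
    computed = stem_cell_generations(cells, rate)
    print(f"{name:5s} computed {computed:9.2f}  table {table_value:9.2f}")
```

For the three cancers with a division rate of 0 (marked *), only the log2 N expansion term remains, which matches their small Generations values.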
Figure 6. Goodness of fit (R²) of the scaled Weibull function versus (a) the power-law function and (b) the scaled Gamma function. Each dot represents R² for one cancer subtype, defined by the combination of cancer type and one factor such as diagnosis year, race, location, or sex. Cancer types are color-coded.
AIC of the scaled Gamma function and the scaled Weibull function (lower AIC indicates the preferred model).
| Abbr. | Cancer | Gamma AIC | Weibull AIC |
|---|---|---|---|
| LUSC | Lung squamous cell carcinoma | 2467641.74 | 2439830.08 |
| LUAD | Lung adenocarcinoma | 1773103.18 | 1757914.86 |
| KIPAN | Pan-kidney cohort (KICH + KIRC + KIRP) | 1086052.13 | 1074221.46 |
| BLCA | Bladder urothelial carcinoma | 1397493.34 | 1364300.97 |
| THCA | Thyroid carcinoma | 970479.60 | 966651.56 |
| PAAD | Pancreatic adenocarcinoma | 684511.50 | 680353.75 |
| ESCA | Esophageal carcinoma | 426913.10 | 424019.74 |
| OV | Ovarian serous cystadenocarcinoma | 249900.19 | 248282.96 |
| SKCM | Skin cutaneous melanoma | 3200014.23 | 3147559.74 |
| STAD | Stomach adenocarcinoma | 438574.51 | 431801.81 |
| PRAD | Prostate adenocarcinoma | 5411659.76 | 5425983.72 |
| COADREAD | Colorectal adenocarcinoma | 3498130.92 | 3467772.58 |
| GBMLGG | Glioma | 433655.37 | 413968.25 |
| BRCA | Breast invasive carcinoma | 6654036.96 | 6705665.03 |
| SARC | Sarcoma | 248310.46 | 238833.58 |
| TGCT | Testicular germ cell tumors | 128517.41 | 128593.83 |
| HNSC | Head and neck squamous cell carcinoma | 627828.32 | 626670.00 |
| LIHC | Liver hepatocellular carcinoma | 510889.67 | 505268.40 |
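The table reports AIC values only, so this sketch works directly from them: it computes ΔAIC = AIC(Weibull) − AIC(Gamma) for each cancer and counts which model each cancer favors. The generic `aic` helper shows the definition 2p − 2 ln L; if both models have the same number of parameters (an assumption, since the table does not say), ΔAIC reduces to −2 times the log-likelihood difference. `akaike_weights` is the standard normalization of exp(−Δ/2); all helper names are illustrative.

```python
import math

# (Gamma AIC, Weibull AIC) per TCGA cancer type, copied from the table above.
AIC_TABLE = {
    "LUSC": (2467641.74, 2439830.08),
    "LUAD": (1773103.18, 1757914.86),
    "KIPAN": (1086052.13, 1074221.46),
    "BLCA": (1397493.34, 1364300.97),
    "THCA": (970479.60, 966651.56),
    "PAAD": (684511.50, 680353.75),
    "ESCA": (426913.10, 424019.74),
    "OV": (249900.19, 248282.96),
    "SKCM": (3200014.23, 3147559.74),
    "STAD": (438574.51, 431801.81),
    "PRAD": (5411659.76, 5425983.72),
    "COADREAD": (3498130.92, 3467772.58),
    "GBMLGG": (433655.37, 413968.25),
    "BRCA": (6654036.96, 6705665.03),
    "SARC": (248310.46, 238833.58),
    "TGCT": (128517.41, 128593.83),
    "HNSC": (627828.32, 626670.00),
    "LIHC": (510889.67, 505268.40),
}


def aic(log_likelihood, n_params):
    """Akaike information criterion, 2p - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aics):
    """Normalised exp(-delta_i / 2); very large deltas underflow to a weight of 0.0."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


weibull_wins = [c for c, (g, w) in AIC_TABLE.items() if w < g]
gamma_wins = [c for c, (g, w) in AIC_TABLE.items() if g < w]
for cancer, (g, w) in AIC_TABLE.items():
    print(f"{cancer:9s} delta AIC (Weibull - Gamma) = {w - g:10.2f}")
print(f"Weibull preferred in {len(weibull_wins)} of {len(AIC_TABLE)} cancers; Gamma in {gamma_wins}")
```

With differences in the hundreds to tens of thousands, Akaike weights are effectively 0 or 1 for every cancer; by the table values, the Weibull function is preferred for 15 of the 18 cancers and the Gamma function for PRAD, BRCA, and TGCT.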
Figure 7. Number of driver mutations required for cancer onset, estimated by the classical power-law model (red) and by our scaled Weibull model (blue).
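The classical power-law (Armitage–Doll) estimate rests on the observation that, with n rate-limiting stages, the age-specific incidence rate grows roughly as t^(n−1), so n is one plus the slope of ln(rate) against ln(age). A Weibull hazard (k/θ)(t/θ)^(k−1) has the same power-law form, which is why its shape k plays the role of n. The sketch applies one ordinary-least-squares estimator to both; the rates are synthetic and the function names are illustrative, not the paper's code.

```python
import math


def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def power_law_stages(ages, incidence_rate):
    """Armitage-Doll: rate(t) ~ c * t**(n - 1), so n = 1 + slope of ln(rate) vs ln(t)."""
    return 1.0 + ols_slope([math.log(t) for t in ages], [math.log(r) for r in incidence_rate])


def weibull_hazard(t, k, theta):
    """Weibull hazard (k / theta) * (t / theta)**(k - 1): also a power law in t."""
    return (k / theta) * (t / theta) ** (k - 1)


ages = list(range(30, 90, 5))
# Hypothetical rates following c * t**5, i.e. n = 6 rate-limiting stages.
rates = [1e-12 * t**5 for t in ages]
print(f"stages from power-law rates: {power_law_stages(ages, rates):.3f}")
# Applied to a Weibull hazard, the same estimator returns the shape k.
hazard = [weibull_hazard(t, 5.0, 150.0) for t in ages]
print(f"stages from Weibull hazard (k = 5): {power_law_stages(ages, hazard):.3f}")
```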
Manually curated cancer definitions.
| Cancer | Abbr. | Primary site¹ | Histology² |
|---|---|---|---|
| Acute myeloid leukemia | AML | | 9840, 9861, 9865–9867, 9869, 9871–9874, 9895–9897, 9898, 9910–9911, 9920 |
| Basal cell carcinoma | BCC | | 8090–8095, 8097–8098 |
| Chronic lymphocytic leukemia | CLL | | 9823 |
| Colorectal adenocarcinoma | COAD | C180-C189, C199, C209-C212, C218, C260 | 8140–8141, 8143, 8145, 8147, 8210–8211, 8220–8221, 8570–8576 |
| Duodenum adenocarcinoma | DUAD | ICD9 1520 | 8140–8141, 8143, 8145, 8147, 8210–8211, 8220–8221, 8570–8576 |
| Esophageal squamous cell carcinoma | ESCA | C150-C155, C158-C159 | 8070–8076, 8078 |
| Gallbladder non-papillary adenocarcinoma | GBNPAD | C239 | 8000–8005, 8010–8015, 8020–8022, 8041, 8043, 8050–8052, 8070–8076, 8078, 8140–8141, 8143, 8147, 8160–8162, 8255, 8480–8481, 8490, 8500–8501, 8503–8504, 8507–8508, 8560, 8562, 8570–8576, 8896, 8900–8902, 8980–8982, 9590–9591, 9596, 9650–9655, 9659, 9661–9665, 9667, 9670–9671, 9673, 9675, 9680, 9684, 9687–9688, 9690–9691, 9695, 9698–9699, 9701–9702, 9705, 9712, 9714, 9716, 9719, 9724, 9727–9729, 9731, 9734–9735, 9737–9738, 9740–9741, 9750–9751, 9754–9759, 9811–9818, 9823, 9831, 9837, 9965, 9967, 9971, 9975 |
| Glioblastoma | GBM | C710-C725, C753 | 9440–9441, 9442, 9444 |
| Head and neck squamous cell carcinoma | HNSC | ICD9 1400–1419, 1430–1499, 1600–1619 | 8070–8076, 8078 |
| Hepatocellular carcinoma | LHCA | C220-C221 | |
| Lung adenocarcinoma | LUAD | C340-C343, C348-C349 | 8140–8141, 8143, 8147, 8570–8576 |
| Medulloblastoma | MBM | C710-C725, C753 | 9470–9474 |
| Melanoma | SKCM | C440-C449 | 8720–8790 |
| Osteosarcoma | OSARC | ICD9 1700–1709 | 9180–9189 |
| Osteosarcoma of the arms | OSARCA | ICD9 1704–1705 | 9180–9189 |
| Osteosarcoma of the head | OSARCH | ICD9 1700 | 9180–9189 |
| Osteosarcoma of the legs | OSARCL | ICD9 1707–1708 | 9180–9189 |
| Osteosarcoma of the pelvis | OSARCP | ICD9 1706 | 9180–9189 |
| Pancreatic ductal adenocarcinoma | PDAD | C250-C259 | 8140–8141, 8143, 8147, 8210–8211, 8255, 8260–8263, 8310, 8480–8481, 8570–8576 |
| Pancreatic endocrine (islet cell) carcinoma | PECA | C250-C259 | 8150–8157 |
| Small intestine adenocarcinoma | SIAD | C170-C173, C178-C179 | 8140–8141, 8143, 8145, 8147, 8255, 8260–8263, 8480–8481, 8570–8576 |
| Thyroid papillary or follicular carcinoma | TPFC | C739 | 8050, 8260–8263, 8330–8333, 8335, 8337, 8340–8347, 8450 |
| Thyroid medullary carcinoma | TMCA | C739 | 8510 |
| Ovarian germ cell | OVGC | C569 | 9060–9065 |
| Testicular germ cell cancer | TGCC | C620-C621, C629 | 9060–9065 |
¹ Either an ICD-O-3 site code or an ICD-9 code describing the tumor primary site is given.
² ICD-O-3 histology code.
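Each definition above combines a list of primary-site codes with a list of ICD-O-3 histology codes. Below is a minimal sketch of applying such a definition to a single record. It assumes that a record qualifies when both its site and its histology appear in the expanded lists, and that an empty cell means no restriction. `parse_codes` and `matches` are hypothetical helpers, and ICD-9 site specifications (for example `ICD9 1520`) are not handled.

```python
import re


def parse_codes(spec):
    """Expand a spec such as 'C180-C189, C199' or '8140–8141, 8143' into a set of codes."""
    codes = set()
    for token in re.split(r",\s*", spec.strip()):
        if not token:
            continue
        bounds = re.split(r"\s*[-–]\s*", token)  # ranges use a hyphen or an en dash
        if len(bounds) == 1:
            codes.add(bounds[0])
            continue
        lo, hi = bounds
        prefix = lo.rstrip("0123456789")  # e.g. 'C' for ICD-O-3 topography codes
        width = len(lo) - len(prefix)  # keep leading zeros such as C029
        for n in range(int(lo[len(prefix):]), int(hi[len(prefix):]) + 1):
            codes.add(f"{prefix}{n:0{width}d}")
    return codes


def matches(site, histology, site_spec, histology_spec):
    """True if a record falls in a definition; an empty spec is treated as 'no restriction'."""
    site_ok = not site_spec or site in parse_codes(site_spec)
    hist_ok = not histology_spec or histology in parse_codes(histology_spec)
    return site_ok and hist_ok


# Colorectal adenocarcinoma (COAD) row of the curated table.
COAD_SITE = "C180-C189, C199, C209-C212, C218, C260"
COAD_HIST = "8140–8141, 8143, 8145, 8147, 8210–8211, 8220–8221, 8570–8576"
print(matches("C185", "8140", COAD_SITE, COAD_HIST))  # site and histology both listed
print(matches("C340", "8140", COAD_SITE, COAD_HIST))  # lung site: excluded
```

The same helpers apply unchanged to the TCGA definitions, whose site and histology columns use the same code formats.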
TCGA cancer definitions for 18 cancer types.
| Cancer | Abbreviation | Primary site | Histology |
|---|---|---|---|
| Bladder urothelial carcinoma | BLCA | C670-C676, C679 | 8010, 8070, 8120, 8130, 8260 |
| Breast invasive carcinoma | BRCA | C502-C505, C508-C509 | 8010, 8013, 8022, 8050, 8090, 8200–8201, 8211, 8401, 8480, 8500, 8502–8503, 8507, 8510, 8520, 8522–8524, 8541, 8575, 9020 |
| Colorectal adenocarcinoma | COADREAD | C180, C182-C189, C199, C209, C494, C809 | 8010, 8140, 8211, 8255, 8260, 8263, 8480, 8560, 8574 |
| Esophageal carcinoma | ESCA | C151, C153-C155, C159-C160 | 8070–8071, 8083, 8140, 8211, 8480 |
| Glioma | GBMLGG | C710-C714, C718-C719 | 9382, 9400–9401, 9440, 9450–9451 |
| Head and neck squamous cell carcinoma | HNSC | C009, C019, C021-C022, C029-C031, C039-C040, C049-C050, C059-C060, C062, C069, C099, C103, C109, C139, C148, C321, C329, C411 | 8070–8072, 8074, 8083 |
| Pan-kidney cohort (KICH + KIRC + KIRP)* | KIPAN | C649 | 8260, 8310, 8312, 8317 |
| Liver hepatocellular carcinoma | LIHC | C220 | 8170–8171, 8173–8174, 8180, 8310 |
| Lung adenocarcinoma | LUAD | C340-C343, C348-C349 | 8140, 8230, 8250, 8252–8253, 8255, 8260, 8310, 8480, 8490, 8507, 8550 |
| Lung squamous cell carcinoma | LUSC | C340-C343, C348-C349 | 8052, 8070–8073, 8083, 8140 |
| Ovarian serous cystadenocarcinoma | OV | C480-C482, C569 | 8440–8441, 8460 |
| Pancreatic adenocarcinoma | PAAD | C250-C252, C258-C259 | 8020, 8140, 8246, 8255, 8480, 8500 |
| Prostate adenocarcinoma | PRAD | C619 | 8140, 8255, 8480, 8490, 8500, 8550 |
| Sarcoma | SARC | C029, C169, C186, C402-C403, C471, C480-C481, C490-C496, C498-C499, C540, C542, C549, C559, C569, C631, C649, C701 | 8800, 8802, 8805, 8811, 8821–8822, 8830, 8850–8851, 8854, 8858, 8890, 8896, 9040–9041, 9043, 9540 |
| Skin cutaneous melanoma | SKCM | C079, C179, C189, C218, C220, C300, C341, C343, C349, C410, C442-C447, C449, C482, C490-C499, C509, C519, C529, C541, C711, C713, C719-C720, C749, C761-C763, C770, C772-C775, C779 | 8720–8721, 8730, 8742–8744, 8770–8772 |
| Stomach adenocarcinoma | STAD | C160-C163, C165, C169 | 8140, 8144–8145, 8211, 8255, 8260, 8480, 8490 |
| Testicular germ cell tumors | TGCT | C629 | 9061, 9070–9071, 9080–9081, 9085 |
| Thyroid carcinoma | THCA | C739 | 8050, 8260, 8290, 8330, 8340, 8342, 8344, 8350 |
*KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma.