Abstract
The widely accepted multiple-hit hypothesis of carcinogenesis states that cancers arise after several successive events. However, no consensus has been reached on the number and nature of these events, although "driver" mutations or epimutations are considered the most probable candidates. Using the largest publicly available cancer incidence statistics (20 million cases), I show that the incidence of the 20 most prevalent cancer types in relation to patients' age closely follows the Erlang probability distribution (R² = 0.9734–0.9999). The Erlang distribution describes the probability y of k independent random events occurring by the time x, but not earlier or later, with events happening on average every b time intervals. This fits well with the multiple-hit hypothesis and potentially allows prediction of the number k of key carcinogenic events and the average time interval b between them for each cancer type. Moreover, the amplitude parameter A likely predicts the maximal populational susceptibility to a given type of cancer. These parameters are estimated for the 20 most common cancer types and provide numerical reference points for experimental research on cancer development.
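The definitions in the abstract can be made concrete by writing the amplitude-scaled Erlang curve explicitly. The sketch below assumes the standard Erlang density with integer shape k and scale b, multiplied by A. The function names, the example values (k = 10, b = 5, A = 20) and the integration settings are illustrative and are not taken from the paper. Because the unscaled density integrates to 1, the area under the scaled curve equals A, which is consistent with reading A as a cumulative quantity over all ages.

```python
import math

def scaled_erlang(x, k, b, A):
    """A * x^(k-1) * exp(-x/b) / (b^k * (k-1)!), for integer k >= 1 and b > 0."""
    if x < 0:
        return 0.0
    if x == 0:
        # The density at zero is 1/b for k == 1 and 0 for k > 1.
        return A / b if k == 1 else 0.0
    # Work in log space so large k does not overflow the power or the factorial.
    # math.lgamma(k) equals log((k-1)!) for integer k.
    log_pdf = (k - 1) * math.log(x) - x / b - k * math.log(b) - math.lgamma(k)
    return A * math.exp(log_pdf)

def area_under_curve(k, b, A, upper=1000.0, steps=100_000):
    """Trapezoidal integral of the scaled curve from 0 to `upper`."""
    h = upper / steps
    total = 0.5 * (scaled_erlang(0.0, k, b, A) + scaled_erlang(upper, k, b, A))
    total += sum(scaled_erlang(i * h, k, b, A) for i in range(1, steps))
    return total * h

# Illustrative (hypothetical) parameter values, not from the paper.
area = area_under_curve(k=10, b=5.0, A=20.0)
print(f"area under scaled curve: {area:.6f}")
```

The log-space evaluation is a design choice for numerical stability; a direct evaluation with `math.factorial` gives the same values for moderate k.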
Year: 2017 PMID: 28939880 PMCID: PMC5610194 DOI: 10.1038/s41598-017-12448-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Comparison of different statistical distributions with actual distributions of prostate and breast cancer incidence by age. Dots indicate actual data for 5-year age intervals, curves indicate PDFs fitted to the data. The middle age of each age group is plotted. Different colours indicate different years of observation, from 1999 to 2012. The fitting procedure was identical for all distributions. The normal distribution did not converge for prostate cancer. Prostate and breast cancers were selected because they are the highest-incidence gender-specific cancer types.
Figure 2. The Erlang distribution approximates cancer incidence by age for the 20 most prevalent cancer types. Dots indicate actual data for 5-year age intervals, curves indicate the PDF of the Erlang distribution fitted to the data (see Table 1 for R² and estimated parameters). The middle age of each age group is plotted. Cancer types are arranged in order of decreasing incidence.
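The fitting procedure itself is not described in this excerpt. As a rough illustration only, the sketch below fits an amplitude-scaled Erlang curve to age-group midpoints by least squares: it scans a grid of k and b values and, at each grid point, solves for the amplitude A in closed form. The data are synthetic, generated from known parameters, and the function names, grids, age range and the R² definition (1 − SS_res/SS_tot) are all assumptions rather than the author's method.

```python
import math

def erlang_pdf(x, k, b):
    """Unscaled Erlang density x^(k-1) * exp(-x/b) / (b^k * (k-1)!)."""
    if x <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(x) - x / b - k * math.log(b) - math.lgamma(k))

def fit_scaled_erlang(ages, incidence, k_values, b_values):
    """Grid search over k and b; the least-squares A is closed-form per grid point."""
    mean_y = sum(incidence) / len(incidence)
    ss_tot = sum((y - mean_y) ** 2 for y in incidence)
    best = None
    for k in k_values:
        for b in b_values:
            f = [erlang_pdf(x, k, b) for x in ages]
            ff = sum(v * v for v in f)
            if ff == 0.0:
                continue  # curve is numerically zero over the whole age range
            # Minimising sum((y - A*f)^2) over A gives A = sum(f*y) / sum(f*f).
            a = sum(v * y for v, y in zip(f, incidence)) / ff
            ss_res = sum((y - a * v) ** 2 for v, y in zip(f, incidence))
            r2 = 1.0 - ss_res / ss_tot
            if best is None or r2 > best["r2"]:
                best = {"k": k, "b": b, "A": a, "r2": r2}
    return best

# Synthetic data (NOT from the paper): midpoints of 5-year age groups, 2.5 to 82.5.
ages = [2.5 + 5 * i for i in range(17)]
true_k, true_b, true_a = 12, 6.0, 50.0  # hypothetical "true" parameters
incidence = [true_a * erlang_pdf(x, true_k, true_b) for x in ages]

fit = fit_scaled_erlang(
    ages,
    incidence,
    k_values=range(1, 31),
    b_values=[0.5 * j for j in range(2, 41)],  # 1.0, 1.5, ..., 20.0
)
print(fit)
```

A grid search keeps the example dependency-free; a nonlinear least-squares routine would be the more usual choice for real data and would also yield standard errors.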
Table 1. Estimated carcinogenesis parameters for the 20 most prevalent cancer types.
| Cancer type | k (number of carcinogenic events) ± s.e.m. | b (average time between events, years) ± s.e.m. | A (maximal populational susceptibility, %) ± s.e.m. | R² (goodness of fit) |
|---|---|---|---|---|
| Prostate | 41 ± 1 | 1.83 ± 0.00 | 26.40 ± 0.18 | 0.9992 |
| Lung and bronchus | 30 ± 2 | 2.75 ± 0.01 | 16.44 ± 0.24 | 0.9981 |
| Colon and rectum | 10 ± 1 | 13.75 ± 0.17 | 66.93 ± 3.80 | 0.9991 |
| Breast | 9 ± 1 | 10.71 ± 0.09 | 20.44 ± 0.46 | 0.9981 |
| Bladder | 21 ± 1 | 4.59 ± 0.02 | 9.93 ± 0.17 | 0.9995 |
| Non-Hodgkin lymphomas | 8 ± 1 | 19.26 ± 0.58 | 31.21 ± 3.90 | 0.9964 |
| Uterus | 20 ± 1 | 3.67 ± 0.02 | 3.77 ± 0.05 | 0.9954 |
| Pancreas | 15 ± 1 | 7.07 ± 0.01 | 7.15 ± 0.06 | 0.9999 |
| Melanoma | 4 ± 1 | 81.01 ± 7.38 | 100 | 0.9954 |
| Leukaemias | 8 ± 2 | 23.56 ± 1.09 | 49.57 ± 10.93 | 0.9957 |
| Kidney | 15 ± 1 | 5.75 ± 0.04 | 3.69 ± 0.07 | 0.9971 |
| Ovary | 8 ± 1 | 13.66 ± 0.12 | 5.40 ± 0.13 | 0.9989 |
| Stomach | 11 ± 1 | 11.51 ± 0.15 | 7.25 ± 0.42 | 0.9986 |
| Oral cavity | 13 ± 1 | 6.32 ± 0.03 | 2.29 ± 0.03 | 0.9983 |
| Myeloma | 16 ± 1 | 6.14 ± 0.03 | 2.67 ± 0.06 | 0.9992 |
| Oesophagus | 20 ± 0 | 4.25 ± 0.00 | 1.27 ± 0.00 | 0.9999 |
| Liver | 13 ± 2 | 6.67 ± 0.11 | 1.45 ± 0.07 | 0.9863 |
| Brain | 4 ± 1 | 76.69 ± 13.77 | 26.34 ± 14.52 | 0.9777 |
| Thyroid | 5 ± 0 | 14.67 ± 0.24 | 1.52 ± 0.04 | 0.9734 |
| Larynx | 24 ± 1 | 3.15 ± 0.01 | 0.71 ± 0.01 | 0.9989 |
The parameters are determined for the Erlang distribution fitted to actual cancer incidence data (see Fig. 2). Cancer types are listed in the order of decreasing incidence.
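A fitted Erlang curve with parameters k and b has its peak (mode) at age (k − 1)·b and its mean at k·b; these are standard properties of the distribution. The short sketch below applies them to three rows of the table as a sanity check on how the parameters translate into ages. The function names are illustrative, and because the tabulated values are rounded, the results are approximate. For prostate cancer, for example, the peak falls at 40 × 1.83 = 73.2 years.

```python
def erlang_peak_age(k, b):
    """Mode of the Erlang distribution, (k - 1) * b, for integer k >= 1."""
    return (k - 1) * b

def erlang_mean_age(k, b):
    """Mean of the Erlang distribution, k * b."""
    return k * b

# (k, b) pairs copied from the table above; b is in years.
table_rows = {
    "Prostate": (41, 1.83),
    "Lung and bronchus": (30, 2.75),
    "Bladder": (21, 4.59),
}

for name, (k, b) in table_rows.items():
    peak = erlang_peak_age(k, b)
    mean = erlang_mean_age(k, b)
    print(f"{name}: peak at {peak:.1f} years, mean at {mean:.1f} years")
```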
Table 2. Robustness of carcinogenesis parameter estimation for prostate cancer.
| Year of observation | k (number of carcinogenic events) ± s.e.m. | b (average time between events, years) ± s.e.m. | A (maximal populational susceptibility, %) ± s.e.m. | R² (goodness of fit) |
|---|---|---|---|---|
| 1999 | 40.72 ± 1.28 | 1.876 ± 0.063 | 31.79 ± 0.48 | 0.9992 |
| 2000 | 39.56 ± 1.28 | 1.931 ± 0.067 | 32.23 ± 0.50 | 0.9992 |
| 2001 | 40.59 ± 1.16 | 1.873 ± 0.057 | 32.00 ± 0.43 | 0.9993 |
| 2002 | 38.82 ± 0.99 | 1.955 ± 0.053 | 31.57 ± 0.38 | 0.9994 |
| 2003 | 38.37 ± 1.25 | 1.981 ± 0.069 | 28.82 ± 0.45 | 0.9991 |
| 2004 | 38.10 ± 1.41 | 1.992 ± 0.079 | 27.94 ± 0.49 | 0.9988 |
| 2005 | 38.67 ± 1.29 | 1.959 ± 0.070 | 27.33 ± 0.43 | 0.9990 |
| 2006 | 39.85 ± 1.21 | 1.886 ± 0.061 | 28.30 ± 0.39 | 0.9991 |
| 2007 | 40.14 ± 1.46 | 1.863 ± 0.072 | 28.67 ± 0.47 | 0.9987 |
| 2008 | 41.56 ± 1.58 | 1.784 ± 0.072 | 25.49 ± 0.43 | 0.9984 |
| 2009 | 42.91 ± 1.79 | 1.711 ± 0.075 | 23.35 ± 0.42 | 0.9979 |
| 2010 | 44.39 ± 2.16 | 1.651 ± 0.084 | 21.62 ± 0.45 | 0.9971 |
| 2011 | 44.97 ± 2.48 | 1.623 ± 0.094 | 21.14 ± 0.50 | 0.9962 |
| 2012 | 44.19 ± 2.32 | 1.648 ± 0.090 | 16.84 ± 0.38 | 0.9964 |
The parameters are determined for the gamma distribution fitted to actual cancer incidence data (see Fig. 1). The gamma distribution was selected instead of the Erlang distribution to show precise estimates for the number of carcinogenic events. Prostate cancer was selected due to the highest incidence, the highly efficient screening procedure, the highest estimated number of carcinogenic events and the dramatic variation in incidence between the years of observation.
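The gamma distribution generalises the Erlang distribution to non-integer k by replacing the factorial (k − 1)! with the gamma function Γ(k), which is why it can report fractional estimates such as 40.72. The sketch below, with illustrative function names, checks that the two densities coincide at an integer k and then evaluates the gamma density at the non-integer 1999 estimate from the table. The evaluation age of 70 years is an arbitrary choice, and the amplitude A is omitted.

```python
import math

def gamma_pdf(x, k, b):
    """Gamma density x^(k-1) * exp(-x/b) / (b^k * Gamma(k)); k may be non-integer."""
    if x <= 0:
        return 0.0  # exact for k > 1, which covers every value in the table
    return math.exp((k - 1) * math.log(x) - x / b - k * math.log(b) - math.lgamma(k))

def erlang_pdf(x, k, b):
    """Erlang density with an explicit factorial; k must be a positive integer."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / b) / (b ** k * math.factorial(k - 1))

# At integer k the two formulas agree up to floating-point error.
same_at_integer_k = math.isclose(
    gamma_pdf(70.0, 41, 1.83), erlang_pdf(70.0, 41, 1.83), rel_tol=1e-9
)

# Non-integer k from the 1999 row (k = 40.72, b = 1.876); only the gamma form applies.
density_1999 = gamma_pdf(70.0, 40.72, 1.876)

print(same_at_integer_k, density_1999)
```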