Sagi Abelson1, Grace Collord2,3, Stanley W K Ng4, Omer Weissbrod5, Netta Mendelson Cohen5, Elisabeth Niemeyer6, Noam Barda7, Philip C Zuzarte8, Lawrence Heisler8, Yogi Sundaravadanam8, Robert Luben9, Shabina Hayat9, Ting Ting Wang1,10, Zhen Zhao1, Iulia Cirlan1, Trevor J Pugh1,8,10, David Soave8, Karen Ng8, Calli Latimer2, Claire Hardy2, Keiran Raine2, David Jones2, Diana Hoult11, Abigail Britten11, John D McPherson8, Mattias Johansson12, Faridah Mbabaali8, Jenna Eagles8, Jessica K Miller8, Danielle Pasternack8, Lee Timms8, Paul Krzyzanowski8, Philip Awadalla8, Rui Costa13, Eran Segal5, Scott V Bratman1,8,14, Philip Beer2, Sam Behjati2,3, Inigo Martincorena2, Jean C Y Wang1,15,16, Kristian M Bowles17,18, J Ramón Quirós19, Anna Karakatsani20,21, Carlo La Vecchia20,22, Antonia Trichopoulou20, Elena Salamanca-Fernández23,24, José M Huerta24,25, Aurelio Barricarte24,26,27, Ruth C Travis28, Rosario Tumino29, Giovanna Masala30, Heiner Boeing31, Salvatore Panico32, Rudolf Kaaks33, Alwin Krämer34, Sabina Sieri35, Elio Riboli36, Paolo Vineis36, Matthieu Foll12, James McKay12, Silvia Polidoro37, Núria Sala38, Kay-Tee Khaw39, Roel Vermeulen40, Peter J Campbell2,41, Elli Papaemmanuil2,42, Mark D Minden1,10,15,16, Amos Tanay5, Ran D Balicer7, Nicholas J Wareham11, Moritz Gerstung43,44, John E Dick45,46, Paul Brennan47, George S Vassiliou48,49,50, Liran I Shlush51,52,53.
Abstract
The incidence of acute myeloid leukaemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 65. Most cases arise without any detectable early symptoms and patients usually present with the acute complications of bone marrow failure1. The onset of such de novo AML cases is typically preceded by the accumulation of somatic mutations in preleukaemic haematopoietic stem and progenitor cells (HSPCs) that undergo clonal expansion2,3. However, recurrent AML mutations also accumulate in HSPCs during ageing of healthy individuals who do not develop AML, a phenomenon referred to as age-related clonal haematopoiesis (ARCH)4-8. Here we use deep sequencing to analyse genes that are recurrently mutated in AML to distinguish between individuals who have a high risk of developing AML and those with benign ARCH. We analysed peripheral blood cells from 95 individuals that were obtained on average 6.3 years before AML diagnosis (pre-AML group), together with 414 unselected age- and gender-matched individuals (control group). Pre-AML cases were distinct from controls and had more mutations per sample, higher variant allele frequencies (VAFs), indicating greater clonal expansion, and showed enrichment of mutations in specific genes. Genetic parameters were used to derive a model that accurately predicted AML-free survival; this model was validated in an independent cohort of 29 pre-AML cases and 262 controls. Because AML is rare, we also developed an AML predictive model using a large electronic health record database that identified individuals at greater risk. Collectively, our findings provide proof-of-concept that it is possible to discriminate ARCH from pre-AML many years before malignant transformation. This could in future enable earlier detection and monitoring, and may help to inform intervention.
Year: 2018 PMID: 29988082 PMCID: PMC6485381 DOI: 10.1038/s41586-018-0317-6
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962
Figure 1. Prevalence of ARCH, number of mutations and clone size in individuals who developed AML
a, Prevalence of ARCH-PD among pre-AML cases (red) and controls (blue). b, The number of ARCH-PD mutations detected in cases and controls according to age. Box plot centres, hinges and whiskers represent the median, first and third quartiles and 1.5 × interquartile range, respectively. c, VAF of ARCH-PD mutations. Significant differences are defined as P<0.0005 (two-sided Wilcoxon rank sum test with Bonferroni multiple testing correction) and are indicated by asterisks (*). All panels show data for n=800 biologically independent samples.
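The box-plot convention used in these captions (median, first and third quartiles, whiskers at 1.5 × interquartile range) can be made concrete with a short Python sketch. It assumes Tukey's rule, in which each whisker stops at the most extreme observation inside the 1.5 × IQR fences, and uses the standard library's "inclusive" quartile method; the function name `tukey_box_stats` and the input values are hypothetical, and plotting software may compute quartiles slightly differently.

```python
from statistics import median, quantiles


def tukey_box_stats(values):
    """Median, quartiles and Tukey whisker ends for one box plot.

    Assumes the 'inclusive' (linear interpolation) quartile definition.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical VAF values (%), not data from the study.
stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Here the single extreme value lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at the largest remaining observation.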
Extended Data Figure 1. Prevalence of ARCH-PD mutations with VAF ≥ 10% according to age.
Red and blue lines represent the proportion of pre-AMLs and controls, respectively, harbouring ARCH-PD mutations with VAF ≥10%.
Extended Data Figure 2. Serial sampling supports a long-lived HSPC as the cell of origin for most ARCH-PD clones
a,b, VAF trajectory of persistent clones carrying putative driver mutations in pre-AML cases (right panel) and controls (left panel). Age is indicated on the x-axis. In the upper panel, VAF is shown on the y-axis and each persistent mutation is shown in a different colour, with circles denoting individual serial samples and solid lines representing the growth trajectory between serial samples. In the lower panel, dashed lines indicate the time interval between the last sampling and the end of follow-up (controls) or AML diagnosis (cases). c, Clonal growth rates (α) are shown for 27 control clones corresponding to 54 time points and 13 pre-AML clones corresponding to 15 time points. Box plots show median and whiskers represent the lower and upper quartiles.
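The caption does not state how the clonal growth rates (α) were calculated. A minimal sketch, assuming each clone grows exponentially between consecutive samples, estimates one rate per sampling interval as alpha = ln(VAF2 / VAF1) / (t2 - t1). For a heterozygous mutation the clone fraction is roughly 2 × VAF, and that factor cancels in the ratio. The function `growth_rates` and the example values are hypothetical, and this is not necessarily the authors' estimator.

```python
import math


def growth_rates(ages, vafs):
    """Per-interval exponential growth rates (per year) for one clone.

    Hypothetical sketch: assumes exponential growth between consecutive
    samples, so alpha = ln(VAF_2 / VAF_1) / (t_2 - t_1).
    """
    if len(ages) != len(vafs) or len(ages) < 2:
        raise ValueError("need at least two paired (age, VAF) measurements")
    rates = []
    # Pair each sample with the next one to get one rate per interval.
    for (t1, v1), (t2, v2) in zip(zip(ages, vafs), zip(ages[1:], vafs[1:])):
        if t2 <= t1 or v1 <= 0 or v2 <= 0:
            raise ValueError("ages must increase and VAFs must be positive")
        rates.append(math.log(v2 / v1) / (t2 - t1))
    return rates


# Hypothetical clone whose VAF doubles every 5 years.
rates = growth_rates([60, 65, 70], [0.02, 0.04, 0.08])
print(rates)
```

An exponential model cannot capture saturation as VAF approaches 50%; a logistic or other bounded model would be an alternative under different assumptions.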
Figure 2. Acquisition of specific recurrent AML mutations by healthy individuals at young age is associated with progression to AML
a, Relative frequency of mutations in the indicated genes according to age group for pre-AMLs (red) and controls (blue). b, Proportion of pre-AML cases and controls harbouring ARCH-PD mutations in recurrently mutated genes. Asterisks (*) indicate P<0.05 (Fisher’s exact test with Bonferroni multiple testing correction). c, Plot showing the cumulative frequency of recurrent AML mutations (reported in >5 specimens in COSMIC) in pre-AML cases and controls. ARCH-PD mutations are ranked from left to right along the x-axis from low to high recurrence. d, VAF of recurrent mutations in cases and controls. Low, intermediate and highly recurrent COSMIC mutations are defined as those reported in 5-19 samples, 20-300 samples and >300 samples, respectively. Box plots indicate median, first and third quartiles and 1.5 × interquartile range. P-values were calculated by two-sided Wilcoxon rank sum test with Bonferroni multiple testing correction. All panels show data for n=800 unique individuals.
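Panel b compares, gene by gene, the proportion of mutated individuals among pre-AML cases and controls using Fisher's exact test with a Bonferroni correction. The sketch below implements both in plain Python for a single 2 × 2 table (rows: pre-AML, control; columns: mutated, unmutated). The helper names and counts are hypothetical; in practice `scipy.stats.fisher_exact` provides the same two-sided test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: pre-AML cases, controls. Columns: mutated, unmutated.
    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # P(top-left cell == x) with all margins held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance treats floating-point near-ties as equally extreme.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Bonferroni correction: multiply each P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts for one gene: 8/10 pre-AML cases vs 1/6 controls mutated.
p_gene = fisher_exact_two_sided(8, 2, 1, 5)
# Hypothetical raw P values for three genes, corrected together.
print(p_gene, bonferroni([p_gene, 0.2, 0.5]))
```

Bonferroni is conservative when many genes are tested; it controls the family-wise error rate at the cost of power.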
Extended Data Figure 3. Performance of combined model in predicting AML progression.
a, Receiver operating characteristic (ROC) curve for prediction of AML development using model 1 (see Methods). The red dot marks the point on the curve with the highest positive predictive value (PPV), corresponding to a sensitivity of 41.9% and a specificity of 95.7%. b,c, Kaplan-Meier estimates of time to AML diagnosis for individuals predicted to develop AML (red) and not to develop AML (blue) by model 1 (b; HR = 10.38, P = 4.2e-10, Wald test) and model 2 (c; HR = 10.75, P = 1.75e-08, Wald test), from enrolment until the end of follow-up in the EPIC study.
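At a fixed operating point, PPV depends on how common the outcome is in the population being tested, which matters because AML is rare. The sketch applies Bayes' rule to the sensitivity (41.9%) and specificity (95.7%) quoted above; the prevalence values are hypothetical and are not the case proportions used in the study.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity from the operating point above; the
# prevalences are hypothetical and chosen only to show the trend.
for prev in (0.5, 0.1, 0.01, 0.001):
    print(f"prevalence {prev:>6}: PPV = {ppv(0.419, 0.957, prev):.3f}")
```

With these inputs PPV falls from about 0.91 at 50% prevalence to below 0.01 at 0.1% prevalence, so a PPV measured in a case-enriched sample overstates what the same threshold would achieve in an unselected population.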
Extended Data Figure 4. AML predictive models
a-c, Time-dependent receiver operating characteristic curves for Cox proportional hazards models trained on the DC (a), VC (b) and combined cohort (c). d-f, Dynamic AUC for Cox proportional hazards models trained on the DC (d), VC (e) or combined cohort (f). g,h, Red and blue bars indicate the observed and expected VAF (g) and driver frequency (h) for pre-AML cases and controls for each gene indicated on the x-axis. DC, discovery cohort (n=505 unique individuals); VC, validation cohort (n=291 individuals); ROC, receiver operating characteristic; AUC, area under the curve.
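A time-dependent (cumulative/dynamic) AUC at horizon t measures how often a model ranks someone diagnosed with AML by time t above someone still AML-free after t. The sketch below computes the unweighted version and simply drops individuals censored before t; estimators used in practice, such as the inverse-probability-of-censoring weighted estimator in scikit-survival's `cumulative_dynamic_auc`, reweight for censoring. The caption does not state which estimator was used, and the function and data here are hypothetical.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Unweighted cumulative/dynamic AUC at a single time horizon.

    Cases: individuals with the event (AML) at or before `horizon`.
    Controls: individuals still under observation after `horizon`.
    Individuals censored before `horizon` are dropped (no IPCW weighting).
    """
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                concordant += 1.0
            elif sc == sn:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(controls))


# Hypothetical follow-up (years), AML indicator (1 = diagnosed) and risk scores.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.2]
auc_5 = cumulative_dynamic_auc(times, events, scores, horizon=5)
print(auc_5)
```

Evaluating the function over a grid of horizons traces out the dynamic AUC as a function of follow-up time.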
Figure 3. Model of future AML risk
a, Forest plot of AML risk. Purple, orange and green circles indicate hazard ratios (HRs), and horizontal lines denote 95% confidence intervals, for the combined cohort. For each gene, the indicated HR applies to the AML risk conferred by each 5% increase in mutation VAF over a 10-year period. The green vertical line indicates the mean HR across all genes. The HR for RUNX1 must be interpreted with caution because deleterious germline variants in this gene are relatively prevalent and may not be readily distinguishable from somatic mutations in unmatched sequencing assays (see Methods). The proportion of individuals with mutations in each gene and the average VAF are shown to the right of the forest plot; red and blue circles represent pre-AMLs and controls, respectively, with circle sizes scaled to reflect mutation frequency and VAF. b-d, Kaplan-Meier curves of AML-free survival, defined as the time between sample collection and AML diagnosis, death or last follow-up. Survival curves are stratified according to mutation status in selected genes (b), number of driver mutations per individual and largest clone detected (c), and red cell distribution width (RDW) (d). Panels a-c represent data for n=796 unique individuals and panel d includes n=299 individuals for whom RDW measurements were available.
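In a Cox proportional hazards model, the hazard ratio for a Δ-unit change in a covariate with coefficient β is exp(βΔ), so a per-unit HR raised to the fifth power gives the HR per 5 units, and the 95% confidence interval scales the same way. The sketch assumes, for illustration only, a covariate measured in one-unit steps (for example one percentage point of VAF gain over 10 years); β, its standard error and the function name are hypothetical, not values from panel a.

```python
import math


def scaled_hazard_ratio(beta, se, delta, z=1.959963984540054):
    """HR and 95% CI for a `delta`-unit increase in a Cox model covariate.

    beta and se are the log-hazard coefficient and its standard error per
    one unit of the covariate; the HR for delta units is exp(beta * delta).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    hr = math.exp(beta * delta)
    lower = math.exp((beta - z * se) * delta)
    upper = math.exp((beta + z * se) * delta)
    return hr, lower, upper


# Hypothetical coefficient: HR 1.2 per one-unit increase, SE 0.05 on the log scale.
hr5, lo5, hi5 = scaled_hazard_ratio(math.log(1.2), 0.05, delta=5)
print(f"HR per 5 units: {hr5:.2f} (95% CI {lo5:.2f}-{hi5:.2f})")
```

Because the scaling happens on the log-hazard scale, the interval widens multiplicatively with Δ, which is why per-5% HRs in a forest plot can look much larger than per-1% HRs from the same model.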
Extended Data Figure 5. AML-free survival according to mutation status and RDW.
a, Kaplan-Meier curves of AML-free survival, defined as the time between sample collection and AML diagnosis, death or last follow-up. Survival curves are stratified according to mutation status in genes mutated in at least 3 samples across the combined validation and discovery cohorts (n=796 unique individuals). b, Kaplan-Meier curves of AML-free survival stratified by RDW (>14 versus ≤14). The plot represents data for n=128 biologically independent individuals with recorded RDW measurements, including all pre-AMLs regardless of ARCH-PD status and controls with ARCH-PD (controls without detectable mutations were omitted). RDW, red cell distribution width.
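AML-free survival curves such as those in panels a and b are Kaplan-Meier estimates. The sketch below treats AML diagnosis as the event and death or last follow-up as censoring, which is an assumption about how the definition above was coded, and then splits individuals at the RDW cut-off of 14 used in panel b. The function name, times, events and RDW values are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of AML-free survival.

    times: follow-up in years; events: 1 if AML was diagnosed at that time,
    0 if censored (assumed here: death or last follow-up).
    Returns [(time, survival)] at each distinct event time.
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    exits_at = Counter(times)  # events plus censorings at each time
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(exits_at):
        d = events_at.get(t, 0)
        if d:
            # Censored individuals at time t still count as at risk at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits_at[t]
    return curve


# Hypothetical cohort stratified by RDW > 14 versus <= 14.
times = [1, 2, 3, 4, 5, 2, 6, 7]
events = [1, 0, 1, 0, 1, 0, 0, 1]
rdw = [15.1, 14.8, 16.0, 13.2, 14.5, 12.9, 13.8, 13.5]
for label, keep in (("RDW > 14", lambda r: r > 14), ("RDW <= 14", lambda r: r <= 14)):
    idx = [i for i, r in enumerate(rdw) if keep(r)]
    print(label, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
```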
Extended Data Figure 6. Description of the cohort and the EHR-derived measurements
a, Kaplan-Meier curves showing age-stratified survival for 875 individuals who developed AML. b, Line plot of the number of cases per 100,000 control individuals in the EHR database. Centre values and error bars indicate the mean and s.d., respectively.
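The quantity in panel b is a simple rate scaled to 100,000 individuals. A minimal sketch follows, assuming the mean and s.d. are taken across repeated measurements such as calendar years; the caption does not say what the replicates are, and the function name and counts are hypothetical.

```python
from statistics import mean, stdev


def rate_per_100k(cases, population):
    """Number of cases per 100,000 individuals."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * cases / population


# Hypothetical (cases, population) pairs for one age group over three years.
counts = [(12, 40_000), (15, 42_000), (9, 39_000)]
rates = [rate_per_100k(c, n) for c, n in counts]
print(f"mean {mean(rates):.1f} per 100,000, s.d. {stdev(rates):.1f}")
```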
Figure 4. Increased risk for AML development is inferred from electronic health records.
a, Box plots of normalised lab measurements. Increased RDW and reduced monocyte, platelet, red blood cell and white blood cell counts were strongly associated (lower panel) with higher AML risk, and these differences were present at least a year before AML diagnosis. b, Model performance stratified by age and gender. c, Absolute lab values for true-positive (TP) and false-negative (FN) predictions. WBC, white blood cell count; MONO.abs, absolute monocyte count; PLT, platelet count; NEUT.abs, absolute neutrophil count; RBC, red blood cell count; RDW, red cell distribution width. Box plots indicate median, first and third quartiles and 1.5 × interquartile range.
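The captions do not state how the lab measurements were normalised. One common choice, assumed here purely for illustration, is a z-score against a reference group such as controls: subtract the reference mean and divide by the reference standard deviation. The function name and values are hypothetical.

```python
from statistics import mean, stdev


def zscore_against_reference(values, reference):
    """Express lab values as z-scores relative to a reference group.

    Assumption: "normalised" is taken to mean (value - reference mean) /
    reference standard deviation; the source does not specify the method.
    """
    if len(reference) < 2:
        raise ValueError("reference needs at least two values")
    mu, sd = mean(reference), stdev(reference)
    if sd == 0:
        raise ValueError("reference values have zero spread")
    return [(v - mu) / sd for v in values]


# Hypothetical RDW values: a reference (control) group and three individuals.
controls_rdw = [12.8, 13.1, 13.4, 13.0, 13.7]
print(zscore_against_reference([13.2, 14.6, 16.1], controls_rdw))
```

Putting each measurement on a common scale in this way lets tests with different units (cell counts, percentages) share one axis in a box plot.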
Extended Data Figure 7. Laboratory measurements contributing to EHR model
Box plots of normalised lab measurements (upper panels) and their association (lower panel) with higher AML risk. Box plots show the median, and whiskers represent the lower and upper quartiles.
Extended Data Figure 8. Top 50 EHR model parameters
Bar chart showing the relative contribution of the top 50 features incorporated into the EHR prediction model, ranked according to their predictive value (gain).
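Gain-based importance, as reported by gradient-boosted tree libraries (XGBoost, for instance, exposes it through `Booster.get_score(importance_type="gain")`), can be converted into relative contributions by dividing each feature's gain by the total. Whether the EHR model is a boosted-tree model, and whether its bar chart normalises over all features or only the top 50, are assumptions here; the function name, feature names and gains are hypothetical.

```python
def top_features_by_gain(gains, k=50):
    """Return the k highest-gain features with each gain as a share of the total.

    Assumption: shares are computed over all features, not only the top k.
    """
    total = sum(gains.values())
    if total <= 0:
        raise ValueError("gains must sum to a positive value")
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [(name, gain / total) for name, gain in ranked[:k]]


# Hypothetical gains for a handful of EHR features.
gains = {"RDW": 40.0, "PLT": 30.0, "WBC": 20.0, "age": 10.0}
for name, share in top_features_by_gain(gains, k=3):
    print(f"{name}: {share:.0%}")
```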
Extended Data Figure 9. Distribution of EHR model parameters
Heatmap illustrating absolute values of clinical measurements. Blue, white and red represent low, intermediate and high values, respectively. Light grey represents missing data. FN and TP annotations are indicated on the lower bar in dark grey and yellow, respectively. FN, false negative; TP, true positive; EHR, electronic health record.
Genes sequenced by cRNA bait pulldown in the validation cohort.