Michael Lee¹, Erdahl T Teber², Oliver Holmes³, Katia Nones⁴, Ann-Marie Patch⁴, Rebecca A Dagg⁵, Loretta M S Lau⁵, Joyce H Lee⁵, Christine E Napier⁵, Jonathan W Arthur², Sean M Grimmond⁶, Nicholas K Hayward⁷,⁸, Peter A Johansson⁸, Graham J Mann⁷,⁹, Richard A Scolyer⁷,¹⁰,¹¹, James S Wilmott⁷,¹⁰, Roger R Reddel⁵, John V Pearson³, Nicola Waddell⁴, Hilda A Pickett¹.
Abstract
The replicative immortality of human cancer cells is achieved by activation of a telomere maintenance mechanism (TMM). To achieve this, cancer cells utilise either the enzyme telomerase or the Alternative Lengthening of Telomeres (ALT) pathway. How these distinct molecular pathways are activated and propagated, and how they relate to clinical outcomes, remains incompletely understood. We have identified significant differences in the telomere repeat composition of tumours that use ALT compared with tumours that do not. We then employed a machine learning approach to stratify tumours according to telomere repeat content with an accuracy of 91.6%. Importantly, this classification approach is applicable across all tumour types. Analysis of pathway mutations that were under-represented in ALT tumours, across 1,075 tumour samples, revealed that three pathways (autophagy, cell cycle control of chromosomal replication, and the transcriptional regulatory network in embryonic stem cells) are involved in the survival of ALT tumours. Overall, our approach demonstrates that telomere sequence content can be used to stratify ALT activity in cancers, and begins to define the molecular pathways involved in ALT activation.
Year: 2018 PMID: 29718321 PMCID: PMC6007693 DOI: 10.1093/nar/gky297
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
List of tumour types and abbreviations for datasets analysed from TCGA
| Tumour type | Abbreviation |
|---|---|
| Bladder urothelial carcinoma | BLCA |
| Brain lower grade glioma | LGG |
| Breast invasive carcinoma | BRCA |
| Cervical squamous cell carcinoma and endocervical adenocarcinoma | CESC |
| Colon adenocarcinoma | COAD |
| Esophageal carcinoma | ESCA |
| Glioblastoma multiforme (adult) | GBM |
| Head and neck squamous cell carcinoma | HNSC |
| Kidney chromophobe | KICH |
| Kidney renal clear cell carcinoma | KIRC |
| Kidney renal papillary cell carcinoma | KIRP |
| Liver hepatocellular carcinoma | LIHC |
| Lung adenocarcinoma | LUAD |
| Lung squamous cell carcinoma | LUSC |
| Ovarian serous cystadenocarcinoma | OV |
| Prostate adenocarcinoma | PRAD |
| Sarcoma | SARC |
| Skin cutaneous melanoma | SKCM |
| Stomach adenocarcinoma | STAD |
| Thyroid carcinoma (Papillary Thyroid Carcinoma) | THCA |
| Uterine corpus endometrial carcinoma | UCEC |
Figure 1. Estimation of sequencing error rate in telomere repeats using a synthetic substrate. (A) Schematic outlining the experimental design and analysis pipeline used to determine the sequencing error rate in telomere repeats. Document symbol denotes data files generated. (B) Calculated sequencing error rate by base position in the TTAGGG repeat unit using different base quality score filters for trimming. (C) Calculated sequencing error rate for the first base position in the TTAGGG repeat unit along the sequence read using different base quality score filters for trimming. (D) Calculated sequencing error rate for each possible base substitution at each position in the TTAGGG repeat unit, with reads split by strand: G-strand (reads containing predominantly TTAGGG repeats) and C-strand (reads containing predominantly CCCTAA repeats). All error bars show the standard error of the mean, n = 2.
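The per-position error estimate in panels B and D can be illustrated with a short sketch. It assumes that reads from the synthetic substrate should be perfect TTAGGG arrays, so any base disagreeing with the best-fitting repeat frame is counted as a sequencing error. The read format (a sequence paired with a list of Phred scores), the trim-at-first-low-quality-base rule, the default threshold of 30 and all function names are illustrative assumptions, not the published pipeline. C-strand reads are reverse-complemented onto the G-strand frame, and substitutions are pooled per position rather than broken down by base as in panel D.

```python
from collections import Counter

UNIT = "TTAGGG"  # canonical G-strand telomere repeat
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trim_by_quality(seq, quals, min_q):
    """Cut the read at the first base whose Phred quality is below min_q (assumed rule)."""
    for i, q in enumerate(quals):
        if q < min_q:
            return seq[:i]
    return seq


def best_phase(seq):
    """Offset into UNIT that makes seq agree best with a perfect TTAGGG array."""
    def mismatches(offset):
        return sum(base != UNIT[(offset + i) % 6] for i, base in enumerate(seq))
    return min(range(6), key=mismatches)


def error_rate_by_position(reads, min_q=30):
    """Mismatch rate at each of the six TTAGGG positions for reads from a pure-repeat substrate.

    reads: iterable of (sequence, list of Phred scores) pairs (hypothetical format).
    """
    errors, totals = Counter(), Counter()
    for seq, quals in reads:
        seq = trim_by_quality(seq, quals, min_q)
        # Fold C-strand reads (mostly CCCTAA) onto the G-strand frame.
        if seq.count("CCCTAA") > seq.count(UNIT):
            seq = reverse_complement(seq)
        if not seq:
            continue
        offset = best_phase(seq)
        for i, base in enumerate(seq):
            pos = (offset + i) % 6
            totals[pos] += 1
            errors[pos] += base != UNIT[pos]
    return [errors[p] / totals[p] if totals[p] else 0.0 for p in range(6)]
```

Phasing each read before counting matters because a read can start anywhere within a repeat; without it, a read beginning at AGGGTT would appear to be almost entirely errors.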
Figure 2. Quantification of the proportion of variant repeats in telomeres using WGS. (A) Number of samples found to be ALT +ve or ALT –ve using the C-circle assay in a panel of 86 pancreatic neuroendocrine tumours (PanNETs) and a panel of 81 melanomas. (B) Schematic of the analysis pipeline used to extract and analyse telomere sequences from WGS data. Document symbol denotes data files generated. This pipeline was used to determine the variant repeat composition of telomeres in the panels of (C) PanNETs and (D) melanomas. The number of variant repeats is expressed as a percentage of telomeric repeats, with tumours separated into ALT +ve and ALT –ve. (E) Comparison of relative telomere content (rel.TC), calculated as log2(tumour/normal), between ALT +ve and ALT –ve tumours across the panels of PanNETs and melanomas. (F) Receiver operating characteristic (ROC) curve for the use of rel.TC to stratify ALT +ve and ALT –ve tumours. The true positive rate (TPR) was plotted against the false positive rate (FPR), with the calculated area under the curve (AUC) shown. (G) Accuracies for correctly stratifying ALT +ve and ALT –ve tumours across the panels of PanNETs and melanomas using a rel.TC cut-off of 0.33.
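A minimal sketch of the quantities in panels C to G. Assumptions (none taken from the published pipeline): the input reads are already-extracted G-strand telomeric reads; repeats are read as consecutive 6-mers phased on the first TTAGGG; a "variant repeat" is any hexamer differing from TTAGGG at exactly one base, with canonical plus variant hexamers as the denominator; and a rel.TC above the 0.33 cut-off is taken to indicate ALT. The AUC uses the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
import math
from collections import Counter

CANONICAL = "TTAGGG"


def hexamers(read):
    """Split a G-strand telomeric read into consecutive 6-mers, phased on its first TTAGGG."""
    start = read.find(CANONICAL)
    if start < 0:
        return Counter()
    start %= 6  # keep any complete repeats that precede the first canonical one
    return Counter(read[i:i + 6] for i in range(start, len(read) - 5, 6))


def is_single_base_variant(hexamer):
    return len(hexamer) == 6 and sum(a != b for a, b in zip(hexamer, CANONICAL)) == 1


def variant_repeat_percentage(reads):
    """Single-base variant repeats as a percentage of (canonical + variant) repeats."""
    counts = Counter()
    for read in reads:
        counts.update(hexamers(read))
    variants = sum(n for h, n in counts.items() if is_single_base_variant(h))
    total = counts[CANONICAL] + variants
    return 100.0 * variants / total if total else 0.0


def relative_telomere_content(tumour_tc, normal_tc):
    """rel.TC = log2(tumour / normal)."""
    return math.log2(tumour_tc / normal_tc)


def roc_auc(alt_pos_scores, alt_neg_scores):
    """AUC as P(random ALT +ve scores above random ALT -ve); ties count as half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in alt_pos_scores for n in alt_neg_scores)
    return wins / (len(alt_pos_scores) * len(alt_neg_scores))


def call_alt(rel_tc, cutoff=0.33):
    """Threshold rule for panel G; the direction (above cut-off = ALT) is an assumption."""
    return rel_tc > cutoff
```

Hexamers differing from TTAGGG at two or more positions are ignored entirely in this sketch; a real pipeline may count them differently.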
Figure 3. WGS-based classifiers to determine TMM using telomere variant repeats and relative telomere content. TMM classifiers were generated using the random forest (RF) approach, with variant repeat content and relative telomere content (rel.TC) as features, and using as the training dataset: (A) pancreatic neuroendocrine tumours (PanNETs), (B) melanomas or (C) PanNETs and melanomas combined. Left panel: the proportion of ALT votes produced by the generated RF for ALT +ve and ALT –ve tumours in the validated panels of PanNETs and melanomas. Middle panel: receiver operating characteristic curve for the generated RF classifier. The true positive rate (TPR) was plotted against the false positive rate (FPR), with the calculated area under the curve (AUC) shown. Right panel: accuracies for correctly stratifying ALT +ve and ALT –ve tumours using the RF classifier, across the panels of PanNETs and melanomas. (D) Ranked importance of the features used in the RF classifier trained on PanNETs and melanomas combined, showing the mean decrease in accuracy when each feature is randomly permuted.
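To make the random forest terms concrete, the sketch below builds a deliberately tiny random forest whose trees are single-split stumps, each fitted on a bootstrap sample with a random subset of features, and computes the proportion of ALT votes and a permutation-based mean decrease in accuracy. This is an illustration under stated assumptions, not the authors' model: the stumps only vote ALT when a feature exceeds a threshold (real trees learn splits in either direction and grow deeper), feature names such as `rel_tc` are hypothetical, and importance is measured on whatever samples are passed in rather than on out-of-bag samples.

```python
import random


def stump_votes_alt(stump, sample):
    """A depth-1 tree: vote ALT when the chosen feature exceeds its threshold."""
    feature, threshold = stump
    return sample[feature] > threshold


def alt_vote_proportion(forest, sample):
    return sum(stump_votes_alt(s, sample) for s in forest) / len(forest)


def accuracy(forest, X, y):
    preds = [alt_vote_proportion(forest, x) > 0.5 for x in X]
    return sum(p == bool(t) for p, t in zip(preds, y)) / len(y)


def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the highest training accuracy."""
    best = None
    for f in features:
        for t in sorted({x[f] for x in X}):
            correct = sum((x[f] > t) == bool(label) for x, label in zip(X, y))
            if best is None or correct > best[0]:
                best = (correct, f, t)
    return best[1], best[2]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagged stumps, each fitted on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    names = list(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        feats = rng.sample(names, max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def permutation_importance(forest, X, y, feature, n_repeats=20, seed=0):
    """Mean decrease in accuracy when one feature's values are shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(forest, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [x[feature] for x in X]
        rng.shuffle(column)
        shuffled = [{**x, feature: v} for x, v in zip(X, column)]
        drops.append(baseline - accuracy(forest, shuffled, y))
    return sum(drops) / n_repeats
```

A feature the forest never relies on yields an importance of zero, because shuffling it cannot change any vote; this is the intuition behind the ranking in panel D.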
Figure 4. Application of TMM classifier to TCGA datasets. (A) Predicted TMM classifications for 908 tumours from 22 tumour types using the random forest classifier. The proportion of ALT votes produced by the classifier is plotted for each sample; tumours with a proportion >0.5 were classified as ALT +ve and those with <0.5 as ALT –ve. Comparison of relative telomere content (rel.TC), calculated as log2(tumour/normal) using qMotif, between predicted ALT +ve and ALT –ve tumours (B) within each individual tumour-type dataset and (C) across all tumours (including the two validated datasets). The fitted distribution is shown, with black ticks marking individual samples and grey ticks marking the mean rel.TC. The dotted grey line marks the overall mean rel.TC.
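The vote-threshold rule in panel A and the per-group summaries in panels B and C reduce to a few lines. This sketch assumes each sample is a hypothetical (tumour type, ALT vote proportion, rel.TC) tuple and, because the caption does not say how a proportion of exactly 0.5 is handled, leaves such ties unclassified.

```python
from collections import defaultdict
from statistics import mean


def call_tmm(alt_vote_proportion):
    """Map a random-forest ALT vote proportion to a TMM call."""
    if alt_vote_proportion > 0.5:
        return "ALT +ve"
    if alt_vote_proportion < 0.5:
        return "ALT -ve"
    return None  # exact 0.5 ties are left unclassified in this sketch


def mean_rel_tc(samples):
    """samples: iterable of (tumour_type, alt_vote_proportion, rel_tc).

    Returns per-(tumour type, call) means, per-call means pooled across
    tumour types, and the overall mean rel.TC of all called samples.
    """
    by_type, by_call, everything = defaultdict(list), defaultdict(list), []
    for tumour_type, votes, rel_tc in samples:
        call = call_tmm(votes)
        if call is None:
            continue
        by_type[(tumour_type, call)].append(rel_tc)
        by_call[call].append(rel_tc)
        everything.append(rel_tc)
    return ({k: mean(v) for k, v in by_type.items()},
            {k: mean(v) for k, v in by_call.items()},
            mean(everything))
```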
Figure 5. ATRX, DAXX, and TERT mutations across 24 tumour datasets. (A) The prevalence of somatic coding mutations in the genes ATRX and DAXX in ALT +ve and ALT –ve tumours across 23 tumour types. Somatic mutations were classified by impact using the Variant Effect Predictor (VEP), with high and moderate impact mutations shown. (B) The prevalence of activating promoter mutations in the TERT gene (C228T and C250T) in ALT +ve and ALT –ve tumours across 23 tumour types.
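A sketch of how the prevalences in panels A and B could be tabulated. The mutation record layout (dicts with `sample`, `gene`, `impact` and, for TERT, a `variant` label such as `C228T`) is a hypothetical format; only the VEP impact classes HIGH and MODERATE and the two promoter hotspots come from the caption. Prevalence here is the fraction of samples in each TMM group carrying at least one qualifying mutation.

```python
from collections import Counter

QUALIFYING_IMPACTS = {"HIGH", "MODERATE"}  # VEP impact classes shown in panel A
TERT_HOTSPOTS = {"C228T", "C250T"}


def atrx_daxx_mutated(mutations):
    """Samples with a HIGH or MODERATE impact coding mutation in ATRX or DAXX."""
    return {m["sample"] for m in mutations
            if m["gene"] in {"ATRX", "DAXX"} and m["impact"] in QUALIFYING_IMPACTS}


def tert_promoter_mutated(mutations):
    """Samples carrying either recurrent TERT promoter mutation."""
    return {m["sample"] for m in mutations
            if m["gene"] == "TERT" and m.get("variant") in TERT_HOTSPOTS}


def prevalence_by_call(tmm_calls, mutated_samples):
    """Fraction of samples in each TMM group that belong to mutated_samples.

    tmm_calls: {sample_id: "ALT +ve" or "ALT -ve"}.
    """
    totals, hits = Counter(), Counter()
    for sample, call in tmm_calls.items():
        totals[call] += 1
        hits[call] += sample in mutated_samples
    return {call: hits[call] / totals[call] for call in totals}
```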
Figure 6. Genes and pathways associated with TMM across nine tumour datasets. (A) Genes and (B) pathways identified as containing a significant over- or under-representation of mutations (adjusted P-value < 0.05, FDR of 5%, and >2-fold difference) in ALT +ve tumours compared to ALT –ve tumours across all tumour types combined (PAN-CANCER dataset). Graphs plot the proportion of all ALT +ve and ALT –ve tumours that contain a high or moderate impact mutation in the affected gene or pathway.
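The selection criteria in the caption (adjusted P < 0.05 at a 5% FDR and a >2-fold difference) can be sketched as follows. The caption does not name the statistical test, so a two-sided Fisher's exact test on a 2×2 table of mutated versus unmutated ALT +ve and ALT –ve tumours is an assumption, as are the Benjamini-Hochberg procedure used for FDR control and the input layout of per-gene (or per-pathway) counts. For pathways, a tumour would count as mutated if any member gene carries a high or moderate impact mutation; that aggregation step is also assumed and is left outside the sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: ALT +ve, ALT -ve. Columns: mutated, not mutated.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):  # hypergeometric probability of x mutated tumours in row 1
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (step-up, monotone, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def differential_mutations(counts, alpha=0.05, min_fold=2.0):
    """counts: {name: (alt_mutated, alt_total, non_alt_mutated, non_alt_total)}.

    Returns {name: "over" | "under"} for entries with BH-adjusted P < alpha
    and a > min_fold difference in the proportion of mutated tumours.
    """
    names = list(counts)
    pvalues = []
    for name in names:
        am, at, nm, nt = counts[name]
        pvalues.append(fisher_exact_two_sided(am, at - am, nm, nt - nm))
    hits = {}
    for name, q in zip(names, benjamini_hochberg(pvalues)):
        am, at, nm, nt = counts[name]
        f_alt, f_non = am / at, nm / nt
        if q < alpha and max(f_alt, f_non) > min_fold * min(f_alt, f_non):
            hits[name] = "over" if f_alt > f_non else "under"
    return hits
```

Writing the fold criterion as max > 2 × min avoids dividing by zero when one group has no mutated tumours, which is exactly the situation for a strongly under-represented pathway.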