Ludmila Danilova, John Wrangle, James G Herman, Leslie Cope.
Abstract
The contribution of DNA methylation-based gene silencing to carcinogenesis is well established. Increasingly, DNA methylation is examined using genome-wide techniques, with recent public efforts yielding immense data sets of diverse malignancies representing the vast majority of human cancer-related disease burden. Whereas mutation events may group preferentially or in high frequency with a given histology, mutations are poor classifiers of tumour type. Here we examine the hypothesis that cancer-specific DNA methylation reflects the tissue of origin or carcinogenic risk factor, and that these methylation abnormalities may be used to faithfully classify tumours according to histology. We present an analysis of 7427 tumours representing 19 human malignancies and 708 normal samples, demonstrating that tumour-specific changes in methylation can correctly determine site of origin and tumour histology with 86% overall accuracy. Examination of misclassified tumours reveals underlying shared biology as the source of misclassifications, including common cell of origin or risk factors.
Keywords: DNA methylation; TCGA; cancer diagnosis; random forest
Year: 2021 PMID: 33666134 PMCID: PMC8865329 DOI: 10.1080/15592294.2021.1890885
Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528
Figure 1. Universal DNA methylation marks of cancer. A heatmap displays methylation beta values of 73 probes (32 hypermethylated and 41 hypomethylated) in the validation set of 19 tumour types. The probes were selected on five core tumour types by the Boruta algorithm. Dark blue on the heatmap corresponds to fully methylated status (beta value = 1); white corresponds to unmethylated status (beta value = 0). CpG probes in rows are hierarchically clustered. Samples in columns are clustered by tumour type and by sample type (tumour and normal). Rows (probes) are annotated by direction of methylation compared with normal samples (hyper- or hypomethylated) and by proximity of the probe to a CpG island. Columns (samples) are annotated by colour representing histologically confirmed tumour type (tumorType), sample type (tumour/normal), the probability of being a tumour sample as estimated by the random forest model (prob.T), and whether the sample was misclassified (misclass). A list of misclassified samples and the corresponding probabilities is given in Table S2. Tumour type abbreviations can be found in Table S1.
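The hyper/hypo row annotation described in the Figure 1 legend can be illustrated with a minimal sketch. The legend does not say how the direction was computed, so the difference-of-means rule below is an assumption, and the probe names and beta values are hypothetical toy data rather than values from the study.

```python
from statistics import mean

def methylation_direction(tumour_betas, normal_betas):
    """Label a probe by comparing mean beta values in tumour and normal samples.

    Assumption: direction is taken from the sign of the difference of means;
    the source does not state the rule actually used.
    """
    delta = mean(tumour_betas) - mean(normal_betas)
    if delta > 0:
        return "hyper"
    if delta < 0:
        return "hypo"
    return "unchanged"

# Hypothetical toy data: beta values lie in [0, 1],
# where 0 is unmethylated and 1 is fully methylated.
probes = {
    "probe_A": {"tumour": [0.8, 0.9, 0.7], "normal": [0.1, 0.2, 0.1]},
    "probe_B": {"tumour": [0.2, 0.1, 0.3], "normal": [0.9, 0.8, 0.9]},
}

directions = {
    name: methylation_direction(values["tumour"], values["normal"])
    for name, values in probes.items()
}
print(directions)
```

In this toy case `probe_A` gains methylation in tumours and `probe_B` loses it, which corresponds to the two row annotation classes in the heatmap.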
Figure 2. Histology-specific markers for five core tumour types. A heatmap displays methylation beta values of the 305 probes resulting from Boruta analysis of 100 tumours from each of the 5 core tumour histologies to determine a classifier set of probes, as well as beta values of normal samples corresponding to the five core tumour types. CpG probes in rows are hierarchically clustered. Samples in columns are clustered by tumour type and by sample type (tumour and normal). The classification results on the validation set of core tumours are shown in Table S6. Annotation colours of rows and columns and the beta value colours are the same as in Figure 1. See the Figure 1 legend for details.
Figure 3. A 305-probe classifier set derived from five core tumour types, used to classify tumours according to histology for 19 human malignancies. A heatmap displays methylation beta values for 305 probes (rows) in 1900 samples (columns) from the training set of 19 tumour types. Columns are annotated by colour representing histologically confirmed tumour type. CpG probes in rows are hierarchically clustered. Samples in columns are clustered by tumour type and by sample type (tumour and normal). Annotation colours of rows and columns and the beta value colours are the same as in Figure 1. See the Figure 1 legend for details.
Figure 4. Confusion matrix (in per cent) for prediction of 19 cancer types on the validation set. Confusion matrix of cancer type prediction on the validation set (n = 5527), using 305 probes selected by Boruta on five core cancers and applied to predict 19 cancer types. The core cancers are highlighted in grey on the left. The percentage of correctly predicted samples is highlighted in blue; misclassification rates above 5% are highlighted in pink. True histology is in rows; predicted histology is in columns. The error rate is shown in italics. Tumour type abbreviations can be found in Table S1.
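A minimal sketch of how a confusion matrix in per cent, like the one in Figure 4, can be computed from true and predicted labels. Rows are true histology and columns are predicted histology, as in the legend. The per-row normalisation, the function names, and the toy labels are assumptions made for illustration; they are not taken from the study's code or data.

```python
from collections import Counter

def confusion_percent(true_labels, predicted_labels):
    """Row-normalised confusion matrix (per cent) and per-class error rate."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix, error_rate = {}, {}
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        if row_total == 0:
            continue  # class never occurs as a true label
        matrix[t] = {p: 100.0 * counts[(t, p)] / row_total for p in classes}
        error_rate[t] = 100.0 - matrix[t][t]
    return matrix, error_rate

def overall_accuracy(true_labels, predicted_labels):
    """Percentage of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

def flagged_cells(matrix, threshold=5.0):
    """Off-diagonal cells above the threshold (the pink cells in the legend)."""
    return sorted(
        (t, p) for t, row in matrix.items()
        for p, pct in row.items() if t != p and pct > threshold
    )

# Hypothetical toy labels, for illustration only.
true_labels = ["LUAD"] * 4 + ["LUSC"] * 2 + ["BRCA"] * 4
predicted = ["LUAD", "LUAD", "LUAD", "LUSC", "LUSC", "LUAD"] + ["BRCA"] * 4

matrix, error_rate = confusion_percent(true_labels, predicted)
accuracy = overall_accuracy(true_labels, predicted)
flagged = flagged_cells(matrix)
print(matrix, error_rate, accuracy, flagged)
```

Normalising each row by its own total makes classes of different sizes comparable, whereas overall accuracy is weighted by how many samples each class contributes.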