Pascal David Johann, Natalie Jäger, Stefan M Pfister, Martin Sill.
Abstract
BACKGROUND: With the advent of array-based techniques for measuring methylation levels in primary tumor samples, methylomes have been investigated systematically across a large number of tumor entities. Most of these approaches measure methylation not in individual cells but in bulk tumor DNA, which contains a mixture of tumor cells, infiltrating immune cells and other stromal components. This raises the question of how pure a given tumor sample is, given the varying degrees of stromal infiltration in different entities. Previous methods for inferring tumor purity require matching control samples, which are rarely available. Here we present a novel, reference-free method to quantify tumor purity, based on two Random Forest classifiers trained on ABSOLUTE and ESTIMATE purity values from TCGA tumor samples. We subsequently apply this method to a previously published, large dataset of brain tumors, demonstrating that these models perform well in datasets that have not been characterized with respect to tumor purity.
Year: 2019 PMID: 31419933 PMCID: PMC6697926 DOI: 10.1186/s12859-019-3014-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of published methods to infer tumor purity based on WES/SNP arrays, gene expression arrays and methylation arrays
| Publication | Method name | Statistical framework/technique | Datasets used to establish/validate the method | Input data types |
|---|---|---|---|---|
| Carter et al. | ABSOLUTE | Tumor purity inference based on somatic copy number aberrations in SNP arrays | TCGA | WES data/SNP array |
| Yoshihara et al., 2013 | ESTIMATE | Comparison of various published gene sets to delineate (a) an immune signature and (b) a stromal signature; a purity score is calculated from these signatures | TCGA | Affymetrix gene expression array data |
| Aran et al., 2015 | LUMP (leukocytes unmethylation for purity) | Averaging of the methylation values of 44 CpG sites known to be hypomethylated in immune cells | TCGA | 450 K methylation array data |
| Zhang et al., 2017 | InfiniumPurify | Comparison of tumor and normal samples to identify differentially methylated CpG sites (DMCs) between tumors and a universal set of normal samples in the TCGA dataset, followed by kernel density estimation to obtain tumor purity (PMID: 28122605) | TCGA | 450 K methylation array data |
| Benelli et al., 2018 | PAMES (Purity Assessment from clonal MEthylation Sites) | (1) Calculation of average methylation values per CpG island from TCGA entities. (2) Calculation of the area under the ROC curve (AUC) for each CpG island; if AUC < 0.2 or AUC > 0.8, the CpG site was considered discriminatory and included in the model. (3) Tumor purity estimate based on the median of hypomethylated and hypermethylated sites | TCGA (model generation); comparison to other TCGA samples and one additional dataset (333 prostate adenocarcinomas) | 450 K methylation array data |
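
The two methylation-based rows for LUMP and PAMES describe simple arithmetic that a short sketch can make concrete. The Python below is a minimal illustration, not the published implementations. All function names, site identifiers and beta values are hypothetical. Two points are assumptions rather than statements from the table: hypomethylated sites are assumed to contribute `1 - beta` to the PAMES-style median, and any rescaling or capping that the published LUMP method applies after averaging is omitted.

```python
from statistics import mean, median


def lump_style_score(betas, immune_hypo_sites):
    """Average the beta values of CpG sites that are hypomethylated in immune cells.

    betas: dict mapping a CpG site ID to its beta value in one tumor sample.
    immune_hypo_sites: site IDs to average (44 sites in the method described above).
    """
    return mean(betas[site] for site in immune_hypo_sites)


def auc(tumor_values, normal_values):
    """AUC as the probability that a tumor value exceeds a normal value (ties count 0.5)."""
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


def select_sites(tumor, normal, low=0.2, high=0.8):
    """Split sites into hypermethylated (AUC > high) and hypomethylated (AUC < low).

    tumor, normal: dicts mapping a site ID to a list of beta values across samples.
    """
    scores = {site: auc(tumor[site], normal[site]) for site in tumor}
    hyper = [site for site, score in scores.items() if score > high]
    hypo = [site for site, score in scores.items() if score < low]
    return hyper, hypo


def pames_style_purity(betas, hyper, hypo):
    """Median over selected sites; hypomethylated sites enter as 1 - beta (assumption)."""
    values = [betas[s] for s in hyper] + [1.0 - betas[s] for s in hypo]
    return median(values)


# Hypothetical toy data: three sites, three tumor and three normal samples.
tumor = {"cg_a": [0.8, 0.9, 0.7], "cg_b": [0.1, 0.2, 0.15], "cg_c": [0.5, 0.4, 0.6]}
normal = {"cg_a": [0.1, 0.2, 0.1], "cg_b": [0.9, 0.8, 0.85], "cg_c": [0.5, 0.6, 0.4]}
hyper_sites, hypo_sites = select_sites(tumor, normal)

sample = {"cg_a": 0.7, "cg_b": 0.2, "cg_c": 0.5}
print(hyper_sites, hypo_sites, pames_style_purity(sample, hyper_sites, hypo_sites))
print(lump_style_score({"cg_x": 0.6, "cg_y": 0.8}, ["cg_x", "cg_y"]))
```

In the toy data, `cg_c` does not separate tumor from normal samples, so it falls between the two AUC thresholds and is excluded from the estimate.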
Fig. 1. Pearson correlation of the ESTIMATE purity values and RF_Purify_ESTIMATE for the different TCGA tumor entities, split into training and test sets (a-s), and for the whole TCGA set with ESTIMATE values available (t)
Fig. 2. Dot plot visualizing the Pearson correlation of tumor purities assessed by RF_Purify_ESTIMATE, RF_Purify_ABSOLUTE, ESTIMATE, ABSOLUTE and LUMP
Fig. 3. Characterization of RF_Purify_ESTIMATE and RF_Purify_ABSOLUTE. Panel A shows the fraction of CpG sites localized in CpG islands, gene bodies and promoters in the two models compared to all CpG sites on the 450 K array. Panel B shows the fraction of CpG sites that overlap with tumor suppressor genes
Fig. 4. Tumor purities in different entities and their subgroups (Capper et al. [4]) as calculated by RF_Purify_ESTIMATE
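
Figs. 1 and 2 compare purity estimates from different methods by Pearson correlation. As a reminder of what that statistic computes, here is a minimal standard-library sketch. The function name and the purity values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical purity estimates for five samples from two methods.
reference_purity = [0.55, 0.70, 0.82, 0.91, 0.64]
predicted_purity = [0.58, 0.68, 0.80, 0.93, 0.60]
print(round(pearson(reference_purity, predicted_purity), 3))
```

A coefficient near 1 means the two methods rank and scale samples similarly. It does not by itself show that either estimate matches the true purity.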