Sarah Song, Katia Nones, David Miller, Ivon Harliwong, Karin S Kassahn, Mark Pinese, Marina Pajic, Anthony J Gill, Amber L Johns, Matthew Anderson, Oliver Holmes, Conrad Leonard, Darrin Taylor, Scott Wood, Qinying Xu, Felicity Newell, Mark J Cowley, Jianmin Wu, Peter Wilson, Lynn Fink, Andrew V Biankin, Nic Waddell, Sean M Grimmond, John V Pearson.
Abstract
Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however, this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and the tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed a moderate correlation of 0.42 (p-value = 0.0001) between tumour cellularity estimated by qpure and by pathology review. Interestingly, there is a high correlation of 0.87 (p-value < 2.2e-16) between cellularity estimates from qpure and from deep Ion Torrent sequencing of known somatic KRAS mutations, and a weaker correlation of 0.32 (p-value = 0.004) between Ion Torrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/.
Year: 2012 PMID: 23049875 PMCID: PMC3457972 DOI: 10.1371/journal.pone.0045835
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Design of mixing experiments.
| Tumor Cellularity | Sample ID | Mixture |
| 100% | ND_0_CD_100 | 100% cell line tumor DNA |
| 85% | ND_15_CD_85 | 85% cell line tumor DNA |
| 80% | ND_20_CD_80 | 80% cell line tumor DNA |
| 75% | ND_25_CD_75 | 75% cell line tumor DNA |
| 65% | ND_35_CD_65 | 65% cell line tumor DNA |
| 60% | ND_40_CD_60 | 60% cell line tumor DNA |
| 50% | ND_50_CD_50 | 50% cell line tumor DNA |
| 40% | ND_60_CD_40 | 40% cell line tumor DNA |
| 30% | ND_70_CD_30 | 30% cell line tumor DNA |
| 20% | ND_80_CD_20 | 20% cell line tumor DNA |
| 15% | ND_85_CD_15 | 15% cell line tumor DNA |
| 10% | ND_90_CD_10 | 10% cell line tumor DNA |
| 5% | ND_95_CD_5 | 5% cell line tumor DNA |
| 0% | ND_100_CD_0 | 0% cell line tumor DNA |
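The sample IDs in the table appear to encode the mixing proportions directly. The following is a minimal sketch of how such IDs could be parsed, assuming that the number after `ND` is the percentage of normal DNA and the number after `CD` is the percentage of cell line tumor DNA. That reading is inferred from the table and is not confirmed by the source; the function name `parse_mixture` is hypothetical and is not part of qpure.

```python
import re

# Assumed layout, inferred from the table: ND_<normal %>_CD_<cell line %>
SAMPLE_ID_PATTERN = re.compile(r"^ND_(\d+)_CD_(\d+)$")


def parse_mixture(sample_id):
    """Return (normal_pct, tumour_pct) encoded in a mixing-experiment sample ID."""
    match = SAMPLE_ID_PATTERN.match(sample_id)
    if match is None:
        raise ValueError(f"unrecognised sample ID: {sample_id!r}")
    normal_pct = int(match.group(1))
    tumour_pct = int(match.group(2))
    # In every row of the table the two percentages sum to 100.
    if normal_pct + tumour_pct != 100:
        raise ValueError(f"percentages do not sum to 100 in {sample_id!r}")
    return normal_pct, tumour_pct
```

For example, `parse_mixture("ND_15_CD_85")` yields 15% normal DNA and 85% cell line tumor DNA, matching the 85% tumor cellularity row.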
Figure 1. Overview of the qpure method.
Circos plots of the SNP array data for a paired normal (ND) and tumor (TD) sample showing regions of LOH in the tumor sample (A). The chromosome ideograms are shown on the outer wheel; the logR and BAF values are plotted in the middle and inner wheels, respectively. The density plot of the probes in LOH regions (B) is used to calculate the d-score (C). The d-score is compared to the density plots of probes within regions of LOH for the cell line:normal DNA mixtures, which represent different cellularities (D). The d-score and cellularity are highly correlated (E). From left to right, the three plots show the scatter plot alone, the scatter plot with the simple linear model fitted, and the scatter plot with the spline regression model fitted.
Figure 2. B allele frequency (BAF) and log R ratio (LRR) plots for a region of LOH with changing tumor cellularity.
DNA from a cancer cell line and matched normal DNA were mixed in different proportions and assayed using SNP arrays. BAF and LRR plots were generated using GenomeStudio software (Illumina). For illustrative purposes, a region of loss on the p arm of chromosome 7 in the cancer cell line is shown. In the 100% normal sample (0% tumor) the SNPs are either heterozygous (BAF = 0.5) or homozygous (BAF = 0 or 1). In regions of single chromosome loss in the tumor there is LOH. In the 100% cell line sample the BAF shows a homozygous state and there is a clear loss in the LRR. As tumor cellularity decreases, the separation of the BAF bands decreases.
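The shrinking BAF separation can be illustrated with a simple idealized mixture model. The sketch below is not qpure's d-score calculation. It assumes diploid normal cells, a tumor with exactly one copy lost in the region, no copy-neutral LOH, and a noise-free array. The function name `expected_baf_bands` is hypothetical.

```python
def expected_baf_bands(cellularity):
    """Expected BAF of germline-heterozygous SNPs in a region of single-copy loss.

    cellularity is the tumor fraction between 0.0 and 1.0. Returns (lower, upper):
    the lower band holds SNPs whose B allele was lost in the tumor, the upper
    band holds SNPs whose A allele was lost.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    # Normal cells carry two copies, tumor cells carry one copy (assumption).
    total_copies = 2.0 * (1.0 - cellularity) + cellularity
    # B allele lost in tumor: only normal cells contribute a B copy.
    lower = (1.0 - cellularity) / total_copies
    # A allele lost in tumor: normal and tumor cells each contribute one B copy.
    upper = 1.0 / total_copies
    return lower, upper
```

Under these assumptions the two bands coincide at 0.5 for a 0% tumor sample and sit at 0 and 1 for a 100% tumor sample, and the gap between them, `cellularity / (2 - cellularity)`, narrows steadily as cellularity falls.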
The leave-one-out cross-validation results for each model in the qpure method.
| No | Model | Prediction Error |
| 1 | K-Means + Linear regression | 0.5% |
| 2 | K-Means + Spline regression | 0.3% |
| 3 | Mixture clustering (1∶3) + Linear regression | 0.2% |
| 4 | Mixture clustering (1∶3) + Spline regression | 0.16% |
| 5 | Mixture clustering (1∶5) + Linear regression | 3.4% |
| 6 | Mixture clustering (1∶5) + Spline regression | 2.8% |
| 7 | Mixture clustering (1∶x) + Linear regression | 0.2% |
| 8 | Mixture clustering (1∶x) + Spline regression | 0.13% |
In the second column, the number in brackets is the pre-defined number of components. A smaller prediction error indicates a better prediction model.
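Leave-one-out cross-validation of a linear regression, as in models 1, 3, 5 and 7 above, can be sketched as follows. This is a minimal illustration and not the qpure implementation: the source does not say how the prediction error was computed, so mean absolute error is used here as an assumption, and the function names `fit_line` and `loocv_mean_abs_error` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def loocv_mean_abs_error(xs, ys):
    """Leave-one-out cross-validation error of a simple linear regression.

    Each point is held out in turn, the line is fitted to the remaining
    points, and the absolute error of the held-out prediction is recorded.
    Mean absolute error is an assumed metric, not one taken from the source.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        errors.append(abs(predicted - ys[i]))
    return sum(errors) / len(errors)
```

In the setting of the table, `xs` would hold the d-scores of the mixture samples and `ys` their known cellularities, both as Python lists. For points that lie exactly on a line the function returns an error at floating-point zero.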
Figure 3. Correlations of cellularity estimated by different methods in a pancreatic cancer cohort.
Cellularity was predicted in the pancreatic cohort using three methods: pathology review, qpure and deep Ion Torrent sequencing of KRAS. Cellularity predictions are shown in the boxplot (A); the p-value was calculated using an ANOVA test to determine whether, on average, there is a difference between the cellularity scores returned by the different methods. The correlation between each pair of methods was calculated using Spearman's rank correlation (B–D). Scatter plots compare KRAS deep sequencing and qpure estimates (B), qpure and pathology estimates (C), and KRAS deep sequencing and pathology estimates (D).
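Spearman's rank correlation, used in panels B to D, is the Pearson correlation of the ranks of the two sets of estimates. The sketch below is a dependency-free illustration of that definition, with tied values given their average rank. It is not the code used by the authors, and the function names `average_ranks` and `spearman_rho` are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of at least two values")
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5
```

Here `xs` and `ys` would be the per-sample cellularity estimates from two of the three methods. Because only ranks are used, the coefficient measures monotonic agreement and is unaffected by the methods reporting on different scales.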