Casey P Shannon, Robert Balshaw, Virginia Chen, Zsuzsanna Hollander, Mustafa Toma, Bruce M McManus, J Mark FitzGerald, Don D Sin, Raymond T Ng, Scott J Tebbutt.
Abstract
BACKGROUND: Measuring genome-wide changes in transcript abundance in circulating peripheral whole blood is a useful way to study disease pathobiology and may help elucidate the molecular mechanisms of disease or discover useful disease biomarkers. The sensitivity and interpretability of analyses carried out in this complex tissue, however, are significantly affected by its dynamic cellular heterogeneity. It is therefore desirable to quantify this heterogeneity, either to account for it or to better model interactions that may be present between the abundance of certain transcripts, specific cell types and the indication under study. Accurate enumeration of the many component cell types that make up peripheral whole blood can, however, further complicate the sample collection process and result in additional costs. Many approaches have been developed to infer the composition of a sample from high-dimensional transcriptomic and, more recently, epigenetic data. These approaches rely on the availability of isolated expression profiles for the cell types to be enumerated. These profiles are platform-specific, suitable datasets are rare, and generating them is expensive. No such dataset exists on the Affymetrix Gene ST platform.
Year: 2017 PMID: 28061752 PMCID: PMC5219701 DOI: 10.1186/s12864-016-3460-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Schematic representation of the experiment. Cell proportions were estimated from DNA methylation profiles for 172 samples (1; COPD patients). These DNA methylation-derived cell proportions were used as a ‘silver standard’ in the absence of the ground truth. This ‘silver standard’ dataset was used to train a multi-response Gaussian model using the gene expression data (2). Out-of-sample performance was evaluated using a repeated (20x) 10-fold cross-validation (3) and in two independent sets of samples, in different clinical indications: granulocyte, monocyte, and lymphocyte performance was evaluated in a large set of samples (4; 197 samples, heart failure), while monocyte, B, and T cell prediction performance was evaluated in a smaller set (4; 28 samples, asthma)
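The repeated (20x) 10-fold cross-validation in step 3 can be sketched as an index-splitting routine. This is a minimal illustration using only the Python standard library; the function name `repeated_kfold`, the seed, and the interleaved fold assignment are assumptions made for the example, not the authors' implementation, and the model-fitting step itself is omitted.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            # Take every n_folds-th shuffled index as the held-out fold.
            test_idx = sorted(idx[fold::n_folds])
            held_out = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# 172 training samples, as in the caption above.
splits = list(repeated_kfold(172))
```

Each of the resulting train/test pairs would be used to fit the model on `train_idx` and score it on `test_idx`; summarising the scores across all pairs gives the expected out-of-sample performance.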
Description of predicted leukocytes
| Cell name | Abbreviation used | Description |
|---|---|---|
| Granulocytes | Gran | CD15+ granulocytes |
| Monocytes | Mono | CD14+ monocytes |
| B cells | Bcell | CD19+ B-lymphocytes |
| T cells (CD4+) | CD4T | CD3+ CD4+ T-lymphocytes |
| T cells (CD8+) | CD8T | CD3+ CD8+ T-lymphocytes |
| NK cells | NK | CD56+ Natural Killer (NK) cells |
Fig. 2. Assessing model fit. Predicted proportions from the model are plotted against the DNA methylation-derived cell proportions for each sample in the training data (a) or against those obtained from CBC/diffs (b). For (a), a linear best-fit line to the data is plotted (blue line) with the 95% point-wise confidence interval for the fit (grey band) and compared with perfect agreement (red dashed line). For (b), predicted monocyte proportions are compared directly to the CBC/diffs. The predicted granulocyte proportions are compared to the sum of neutrophil, eosinophil and basophil proportions from the CBC/diffs, while the sum of the predicted B, CD4+ T, CD8+ T and NK cell proportions is compared to the total lymphocyte proportions from the CBC/diffs. For each cell type, Pearson’s product–moment correlation (Pearson’s r) and the root mean squared error (RMSE) are reported
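The two agreement metrics reported per cell type, RMSE and Pearson’s r, can be computed as in the following sketch. The function names and the toy proportions are assumptions for illustration only and do not come from the study data.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between paired predictions and reference values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up granulocyte proportions for four hypothetical samples.
predicted = [0.60, 0.55, 0.70, 0.65]
reference = [0.62, 0.50, 0.72, 0.66]
print(rmse(predicted, reference), pearson_r(predicted, reference))
```

RMSE is on the same scale as the proportions themselves, so it reflects absolute error, whereas Pearson’s r only measures how well the predictions track the reference linearly.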
Model performance
| Cell type | 20x 10-fold cross-validation: RMSE (mean ± sd) | 20x 10-fold cross-validation: Pearson’s r (mean ± sd) | Independent test set: RMSE (n) | Independent test set: Pearson’s r (n) |
|---|---|---|---|---|
| Bcell | 0.021 ± 0.007 | 0.755 ± 0.188 | 0.04 (28) | 0.93 (28) |
| CD4T | 0.038 ± 0.01 | 0.813 ± 0.09 | 0.06 (28) | 0.91 (28) |
| CD8T | 0.034 ± 0.006 | 0.683 ± 0.138 | | |
| Gran | 0.054 ± 0.013 | 0.923 ± 0.046 | 0.06 (197) | 0.89 (197) |
| Mono | 0.018 ± 0.003 | 0.842 ± 0.068 | 0.02 (197) | 0.74 (197) |
| NK | 0.027 ± 0.006 | 0.816 ± 0.083 | NA | NA |
Fig. 3. Cross-validation performance. Distribution of root mean squared error (RMSE; (a)) and Pearson’s product–moment correlation (Pearson’s r; (b)) for out-of-sample predictions across repeated (20x) 10-fold cross-validations are visualized using boxplots. The mean and 95% CI are shown as a point and range in the center of each boxplot and represent the expected out-of-sample performance
Fig. 4. Our model accurately predicts the cellular composition of blood samples and outperforms existing approaches in Affymetrix Gene ST data. Predicted cell proportions are plotted against the cell proportions obtained from CBC/diffs ((a); CHFP cohort) or a cell-type specific DNA methylation cell-typing assay ((b); Epiontis asthma cohort). In (a), the sum of the predicted B, CD4+ T, CD8+ T and NK cell proportions is compared to the total lymphocyte proportions from the CBC/diffs. The predicted granulocyte and monocyte proportions are directly compared. In (b), the sum of the predicted CD4+ and CD8+ T cell proportions is compared to the T cell proportion from the Epiontis assay. The predicted monocyte and B cell proportions are directly compared. For each cell type, Pearson’s product–moment correlation (Pearson’s r) and the root mean squared error (RMSE) are reported
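Because the reference assays report coarser categories than the model predicts, the predicted subsets have to be summed before comparison. A minimal sketch of that collapsing step follows; the helper name `collapse` and the sample values are hypothetical, while the abbreviations follow the table of predicted leukocytes.

```python
# Subsets summed for comparison with CBC/diffs lymphocytes (panel a)
# and with the Epiontis T cell measurement (panel b).
LYMPHOCYTE_SUBSETS = ("Bcell", "CD4T", "CD8T", "NK")
T_CELL_SUBSETS = ("CD4T", "CD8T")


def collapse(predicted, subsets):
    """Sum the predicted proportions of the named cell types for one sample."""
    return sum(predicted[name] for name in subsets)


# Made-up predicted proportions for a single hypothetical sample.
sample = {"Gran": 0.60, "Mono": 0.08, "Bcell": 0.04,
          "CD4T": 0.15, "CD8T": 0.08, "NK": 0.05}

lymphocytes = collapse(sample, LYMPHOCYTE_SUBSETS)
t_cells = collapse(sample, T_CELL_SUBSETS)
```

Granulocyte and monocyte predictions need no collapsing in panel (a), and monocyte and B cell predictions need none in panel (b), since the reference assays measure those categories directly.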
Fig. 5. Our model identifies better performing marker genes for use with reference-free approaches in Affymetrix Gene ST data. Surrogate proportion variables obtained from CellCODE are plotted against the cell proportions obtained from CBC/diffs in an independent dataset (CHFP cohort). The sum of the surrogate proportion variables obtained for B, CD4+ T, CD8+ T and NK cells is compared to the total lymphocyte proportions from the CBC/diffs. Marker genes used by CellCODE were derived from the coefficients of the model (a) or using the recommended set of marker genes (b) derived from the IRIS reference dataset. For each cell type, Spearman’s rank correlation (ρ) is reported
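Spearman’s ρ, the statistic reported here, is Pearson’s correlation applied to ranks, so it depends only on the ordering of the values and not on their scale. The sketch below uses average ranks for ties; all function names are assumptions for illustration and no study data are used.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

A rank-based measure suits this comparison because the surrogate variables are not on the scale of actual proportions, so only their ordering across samples can be expected to match the CBC/diffs.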
Fig. 6. Model-predicted cell proportions highlight prednisone-dependent changes in peripheral blood composition. Treatment of acute exacerbations (AE) in COPD with prednisone results in important changes in the cellular composition of peripheral blood. The distributions of granulocyte, monocyte, B, CD4+ T, CD8+ T and NK cell proportions are visualized for patients from the Rapid Transition Program (RTP) cohort that were given prednisone or not (p-value is for the unpaired Student’s t-test comparing the two groups in each case)
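The unpaired Student’s t-test used for each comparison rests on a pooled-variance t statistic, sketched below. The function name and the toy values are assumptions for illustration. Converting the statistic to a p-value needs the t distribution’s cumulative distribution function, which the Python standard library does not provide, so that step is left out here.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired,
    equal-variance Student's t-test."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled estimate of the common variance across both groups.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Made-up proportions for two hypothetical groups of patients.
with_prednisone = [0.72, 0.68, 0.75, 0.70]
without_prednisone = [0.60, 0.63, 0.58, 0.65]
t_stat, dof = student_t(with_prednisone, without_prednisone)
```

The test would be run once per cell type, comparing the predicted proportions of patients given prednisone with those of patients who were not.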