Francesco Vallania, Andrew Tam, Shane Lofgren, Steven Schaffert, Tej D Azad, Erika Bongen, Winston Haynes, Meia Alsup, Michael Alonso, Mark Davis, Edgar Engleman, Purvesh Khatri.
Abstract
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of the deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that the choice of deconvolution method has little to no effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those using IRIS and LM22, across all methods. Our results demonstrate the need for and importance of incorporating biological and technical heterogeneity in a basis matrix for achieving consistently high accuracy.
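To make the role of the basis matrix concrete, the following is a minimal sketch of deconvolution posed as non-negative least squares: the mixed-sample expression vector is modeled as the basis matrix times a vector of non-negative cell-type proportions. This is an illustration only, not one of the methods evaluated in the paper. The function name `deconvolve`, the projected gradient solver, and the toy numbers are all assumptions made for the example.

```python
# Toy deconvolution sketch (hypothetical names and made-up data).
# Model: mixture ~= basis @ proportions, with proportions >= 0.

def deconvolve(basis, mixture, n_iter=5000):
    """Estimate cell-type proportions by projected gradient descent on
    the non-negative least squares objective 0.5 * ||B f - m||^2.

    basis   -- list of rows, one per gene; each row has one value per cell type
    mixture -- list of expression values for the mixed sample, one per gene
    """
    n_genes = len(basis)
    n_types = len(basis[0])

    # Precompute B^T B and B^T m so each iteration is cheap.
    btb = [[sum(basis[g][i] * basis[g][j] for g in range(n_genes))
            for j in range(n_types)] for i in range(n_types)]
    btm = [sum(basis[g][i] * mixture[g] for g in range(n_genes))
           for i in range(n_types)]

    # The trace of B^T B bounds its largest eigenvalue, so 1/trace is a
    # safe (if conservative) step size.
    step = 1.0 / sum(btb[i][i] for i in range(n_types))

    f = [1.0 / n_types] * n_types  # start from uniform proportions
    for _ in range(n_iter):
        grad = [sum(btb[i][j] * f[j] for j in range(n_types)) - btm[i]
                for i in range(n_types)]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[i] - step * grad[i]) for i in range(n_types)]

    # Rescale so the estimates sum to 1 and read as proportions.
    total = sum(f)
    return [x / total for x in f] if total > 0 else f


# Made-up basis matrix: 4 genes (rows) x 3 cell types (columns).
toy_basis = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 2.0],
    [2.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_proportions = [0.5, 0.3, 0.2]

# Simulate a noise-free mixed sample from the known proportions.
toy_mixture = [sum(b * p for b, p in zip(row, true_proportions))
               for row in toy_basis]

estimated = deconvolve(toy_basis, toy_mixture)
print([round(x, 4) for x in estimated])
```

In this noise-free toy case the estimate recovers the simulated proportions. With real data the columns of the basis matrix determine what can be recovered, which is why a matrix built from a narrow set of samples or platforms can bias the result regardless of the solver.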
Year: 2018 PMID: 30413720 PMCID: PMC6226523 DOI: 10.1038/s41467-018-07242-6
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Analysis of platform bias in deconvolution across multiple methods and matrices. a Goodness of fit values across 1071 human PBMC samples as a function of microarray platform using the IRIS signature matrix. Goodness of fit is displayed as a stacked barplot with color indicating corresponding values, starting from a goodness of fit value of 0.5 or lower up to values of 0.9 and above. Barplots are grouped by the method of deconvolution used for the analysis. b Same as in a for LM22. c Same as in a for immunoStates.
Fig. 2 Effect of disease on deconvolution. a ROC curves indicating the ability of IRIS, LM22, and immunoStates (denoted by line color) to distinguish blood-derived samples from tissue biopsies in healthy donors (1383 samples) using goodness of fit across all tested methods (denoted by line type). AUCs indicate mean AUC for an individual signature matrix across all methods. b Same as in a but in disease samples (2684 samples).
Fig. 3 Deconvolution concordance by matrix and method. Boxplots represent the distribution of pairwise correlation coefficients between estimated proportions for all matrices and deconvolution methods. Center lines correspond to the median value of each box, and the lower and upper bounds of each box correspond to the first and third quartiles, respectively. Comparisons were divided into (1) pairs with the same signature matrix but run with different methods, (2) pairs with different signature matrices but run using the same method, and (3) pairs where both matrix and method were different. Significance analysis was performed using the Wilcoxon’s paired rank sum test.
Fig. 4 Correlation with measured cell proportions across 402 human blood samples. a Correlation between measured cell proportions and deconvolution estimates in five different human sample cohorts (denoted by different shapes) across different deconvolution methods (denoted by different colors) using IRIS, LM22, and immunoStates (x-axis). Correlation is measured by Pearson’s correlation coefficient. Center dot represents mean value for each violin plot. Error bars represent standard error of the mean. b Same as in a for RMSE between measured and estimated cell proportions. Significance analysis was performed using the Wilcoxon’s paired rank sum test.
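The two accuracy measures named in the Fig. 4 caption, Pearson's correlation coefficient and RMSE between measured and estimated cell proportions, can be sketched as follows. The function names and the example values are hypothetical and are not taken from the study's data.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(x, y):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Made-up proportions for three cell types in one sample.
measured = [0.50, 0.30, 0.20]    # e.g. from a direct cell count
estimated = [0.45, 0.35, 0.20]   # e.g. from a deconvolution run

r = pearson(measured, estimated)
err = rmse(measured, estimated)
print(f"Pearson r = {r:.3f}, RMSE = {err:.3f}")
```

A higher correlation and a lower RMSE both indicate closer agreement, but they measure different things: correlation is insensitive to a constant offset or scaling of the estimates, whereas RMSE penalizes any absolute deviation.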