Zhiyue Tom Hu, Yuting Ye, Patrick A Newbury, Haiyan Huang, Bin Chen.
Abstract
The inconsistency of open pharmacogenomics datasets produced by different studies limits their use in many tasks, such as biomarker discovery. Investigation of multiple pharmacogenomics datasets confirmed that the correlation of sensitivity data for the same drug (row) between different studies (drug-wise correlation) is relatively low, whereas the correlation for the same cell line (column) between different studies (cell-wise correlation) is considerably strong. This observation, common to multiple pharmacogenomics datasets, suggests that a subtle consistency exists among the different studies (i.e., the strong cell-wise correlation). However, substantial noise is also present (i.e., the weak drug-wise correlation), which has kept researchers from confidently using the data directly. Motivated by this observation, we propose a novel framework for addressing the inconsistency between large-scale pharmacogenomics datasets. Our method can significantly boost the drug-wise correlation and can easily be applied to re-summarized and normalized datasets proposed by others. We also evaluate our algorithm against many different criteria to demonstrate that the corrected datasets are not only consistent but also biologically meaningful. Finally, we propose to extend our main algorithm into a framework so that, as more datasets become publicly available, it can hopefully offer "ground-truth" guidance for reference.
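To make the two correlation directions concrete, the sketch below computes drug-wise and cell-wise Spearman correlations between two sensitivity matrices. It is an illustration under stated assumptions, not the authors' code: both matrices are assumed to be already restricted to the drugs and cell lines the two studies share, in the same order, with drugs as rows, cell lines as columns, and `None` marking a missing measurement. The helper names, the toy values, and the rule of requiring at least three paired values per correlation are hypothetical choices.

```python
import math
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y, min_pairs=3):
    """Spearman's rho: Pearson correlation of ranks, over pairs where both values exist."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_pairs:  # hypothetical cut-off for too few shared values
        return math.nan
    xs, ys = zip(*pairs)
    return pearson(rank(list(xs)), rank(list(ys)))


def drug_wise(a, b):
    """One correlation per drug (row): the drug's profile over shared cell lines."""
    return [spearman(ra, rb) for ra, rb in zip(a, b)]


def cell_wise(a, b):
    """One correlation per cell line (column): its profile over shared drugs."""
    return [spearman(ca, cb) for ca, cb in zip(zip(*a), zip(*b))]


# Hypothetical toy data: 3 shared drugs x 4 shared cell lines per study.
a = [[0.1, 0.5, 0.3, 0.9],
     [0.2, 0.4, 0.6, 0.8],
     [0.7, 0.2, 0.5, 0.1]]
b = [[0.2, 0.6, 0.1, 0.8],
     [0.3, None, 0.7, 0.9],
     [0.6, 0.3, 0.4, 0.2]]

print("drug-wise:", [round(r, 3) for r in drug_wise(a, b)])
print("cell-wise:", [round(r, 3) for r in cell_wise(a, b)])
```

`drug_wise` returns one value per shared drug and `cell_wise` one per shared cell line; in the terms of the abstract, the inconsistency shows up as the first list being low while the second is high. Dropping pairs with a missing value, rather than imputing them, is a simplification chosen for this sketch.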
Year: 2019 PMID: 30864327 PMCID: PMC6417811
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
Alternating Imputation and Correction Method (AICM)
Fig. 1: Percentage change (%) of the medians of the correlations on synthetic datasets under different parameter settings. The x-axis is iter (the number of iterations) and the y-axis is λ.
Fig. 2: Distribution of drug-wise correlations between the synthetic datasets before and after AICM is applied. Darker green bars mark where the histograms of uncorrelated and correlated rows overlap.
Fig. 3: Shift in Spearman's correlation of common drugs between the specified datasets before and after AICM is run, shown both for individual drugs and as a distribution.
Summary statistics of the original and post-correction drug-wise Spearman's correlation
| Datasets | Mean, before | Mean, after | Median, before | Median, after | Significant drugs, before | Significant drugs, after | Drugs (n) | Cell lines (n) |
|---|---|---|---|---|---|---|---|---|
| CTRPv2 & GDSC1000 | 0.261 | 0.410 | 0.249 | 0.411 | 63.33% | 90.00% | 90 | 566 |
| CTRPv2 & FIMM | 0.485 | 0.624 | 0.468 | 0.585 | 70.00% | 93.33% | 30 | 41 |
| GDSC1000 & FIMM | 0.250 | 0.352 | 0.278 | 0.380 | 27.59% | 55.17% | 29 | 47 |
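The table's statistics can be reproduced from per-drug results with a few lines of code. The sketch below assumes that, for each common drug, a Spearman correlation and its p-value are already available; the significance threshold `alpha = 0.05` is an assumption, since the threshold is not stated here, and the function names are hypothetical. It also computes the percentage change of the median, the quantity Fig. 1 reports for synthetic data, using the usual relative-change definition (also an assumption), applied to the medians in the table.

```python
from statistics import mean, median


def summarize(rhos, pvalues, alpha=0.05):
    """Mean and median correlation, plus % of drugs with p < alpha (assumed threshold)."""
    n_sig = sum(1 for p in pvalues if p < alpha)
    return {
        "mean": mean(rhos),
        "median": median(rhos),
        "significant_pct": 100.0 * n_sig / len(pvalues),
    }


def pct_change(before, after):
    """Relative change in percent: 100 * (after - before) / before."""
    return 100.0 * (after - before) / before


# Median drug-wise correlations reported in the table: (before, after).
medians = {
    "CTRPv2 & GDSC1000": (0.249, 0.411),
    "CTRPv2 & FIMM": (0.468, 0.585),
    "GDSC1000 & FIMM": (0.278, 0.380),
}
for pair, (before, after) in medians.items():
    print(f"{pair}: median {before:.3f} -> {after:.3f} "
          f"({pct_change(before, after):+.1f}%)")
```

Run as-is, the loop reports median increases of +65.1%, +25.0% and +36.7% for the three dataset pairs.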
Fig. 4: Individual drugs with respect to individual cell lines before and after AICM is applied. The first five panels show drugs whose correlations improve significantly; the last shows a drug whose correlation improves only slightly.