Kai Wang, Charles A Phillips, Arnold M Saxton, Michael A Langston.
Abstract
BACKGROUND: Differential Shannon entropy (DSE) and differential coefficient of variation (DCV) are effective metrics for the study of gene expression data. They can serve to augment differential expression (DE), and be applied in numerous settings whenever one seeks to measure differences in variability rather than mere differences in magnitude. A general purpose, easily accessible tool for DSE and DCV would help make these two metrics available to data scientists. Automated p value computations would additionally be useful, and are often easier to interpret than raw test statistic values alone.
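
To make the two metrics concrete, here is a minimal Python sketch for a single gene. It is not the EntropyExplorer code. It assumes that expression values are positive, that Shannon entropy is taken over the values normalized to sum to 1, that CV is the sample standard deviation divided by the mean, and that "differential" means case minus control. None of these choices is spelled out in this excerpt, and the function names and data are hypothetical.

```python
import math
import statistics


def shannon_entropy(values):
    """Shannon entropy (base 2) of one gene's expression values.

    Assumption: values are positive and are normalized to sum to 1
    before being treated as a probability distribution.
    """
    total = sum(values)
    probs = [v / total for v in values]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def differential(metric, case, control):
    """Difference of a per-gene metric between conditions (case minus control)."""
    return metric(case) - metric(control)


# Hypothetical expression values for one gene (not taken from the paper).
case = [8.0, 8.0, 8.0, 8.0]      # identical across samples
control = [1.0, 1.0, 1.0, 29.0]  # dominated by a single sample

dse = differential(shannon_entropy, case, control)
dcv = differential(coefficient_of_variation, case, control)
print(f"DSE = {dse:.4f}, DCV = {dcv:.4f}")
```

Both groups have the same mean, so a differential expression test would see nothing here, while DSE and DCV both register the change in variability.
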
Year: 2015 PMID: 26714840 PMCID: PMC4696313 DOI: 10.1186/s13104-015-1786-4
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 The output of EntropyExplorer on breast cancer data. The numerical matrices m1 and m2 have been read into R. The function call specifies “dse” for differential Shannon entropy, “v” for value, and 10 to return the top 10 values.
Fig. 2 Another use of EntropyExplorer on breast cancer data. The function call specifies “dcv” for differential coefficient of variation, “bv” to return both value and p value sorted by value, and 12 to return the top 12 rows.
Correlations between SE and variance, and between SE and , on 16 microarray gene expression datasets
| Datasets | Correlation between SE and variance | | Correlation between SE and | |
|---|---|---|---|---|
| | Case | Control | Case | Control |
| Allergic Rhinitis | −0.5515 | −0.5769 | −0.9703 | −0.9658 |
| Asthma_GSE4302 | −0.4272 | −0.4677 | −0.1924 | −0.2004 |
| BreastCancer_GSE10810 | −0.3942 | −0.3378 | −0.1810 | −0.1265 |
| CLL_GSE8835 | 0.2251 | 0.2522 | −0.0806 | −0.0624 |
| ColorectalCancer_GSE9348 | 0.3122 | 0.4454 | −0.0086 | 0.0206 |
| CrohnsDisease_GSE6731 | −0.2826 | −0.2380 | −0.1664 | −0.4020 |
| LungAdenocarcinoma_GSE7670 | 0.0725 | 0.3360 | −0.0173 | 0.0105 |
| MS_GDS3920 | −0.3615 | −0.3320 | −0.0515 | −0.0559 |
| Obesity_GSE12050 | 0.9998 | 0.9990 | 0.1584 | 0.5420 |
| Pancreas_GDS4102 | −0.4137 | −0.4455 | −0.1331 | −0.0890 |
| ParkinsonsDisease_GSE20141 | −0.1732 | −0.2554 | −0.0024 | −0.0155 |
| ProstateCancer_GSE6919_GPL8300 | 0.2118 | 0.1552 | −0.0562 | −0.0699 |
| Psoriasis_GSE13355 | −0.6386 | −0.6554 | −0.5200 | −0.6779 |
| Schizophrenia_GSE17612 | 0.3632 | 0.3910 | 0.0170 | 0.0235 |
| T2D_GSE20966 | −0.6006 | −0.5550 | −0.4356 | −0.4663 |
| UlcerativeColitis_GSE6731 | −0.3112 | −0.2555 | −0.1799 | −0.1451 |
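
A correlation of this kind can be sketched as follows: compute Shannon entropy and variance for every gene in one condition, then take the Pearson correlation across genes. This is only an illustration. The sum-normalized entropy definition, the choice of Pearson correlation, and the toy matrix are all assumptions, since this excerpt does not say how the table was computed.

```python
import math
import statistics


def shannon_entropy(values):
    """Base-2 Shannon entropy of values normalized to sum to 1 (assumed definition)."""
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical genes-by-samples matrix for one condition (rows are genes).
genes = [
    [5.0, 5.0, 5.0, 5.0],
    [4.0, 5.0, 6.0, 5.0],
    [2.0, 8.0, 3.0, 7.0],
    [1.0, 1.0, 1.0, 17.0],
]

entropies = [shannon_entropy(g) for g in genes]
variances = [statistics.variance(g) for g in genes]
r = pearson(entropies, variances)
print(f"correlation between SE and variance: {r:.4f}")
```

Every toy gene has the same mean, so entropy falls as variance rises and the correlation is negative. The table shows that real datasets differ in both sign and strength.
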
KS test D-statistic results comparing the DSE distribution against several common distributions
| Dataset | Normal | Chi-square | F | t | t (standardized DSE)* |
|---|---|---|---|---|---|
| Allergic Rhinitis | 0.3109 | 1 | 1 | 0.4991 | 0.3526 |
| Asthma_GSE4302 | 0.2795 | 1 | 1 | 0.4895 | 0.3117 |
| BreastCancer_GSE10810 | 0.2115 | 1 | 1 | 0.4797 | 0.3944 |
| CLL_GSE8835 | 0.1506 | 1 | 0.9975 | 0.4519 | 0.1596 |
| ColorectalCancer_GSE9348 | 0.1232 | 1 | 0.9994 | 0.4514 | 0.2142 |
| CrohnsDisease_GSE6731 | 0.2131 | 1 | 0.987 | 0.4691 | 0.2392 |
| LungAdenocarcinoma_GSE7670 | 0.19 | 1 | 0.9999 | 0.4663 | 0.332 |
| MS_GDS3920 | 0.2703 | 1 | 0.9994 | 0.4813 | 0.3397 |
| Obesity_GSE12050 | 0.2352 | 1 | 0.9991 | 0.484 | 0.287 |
| Pancreas_GDS4102 | 0.2606 | 1 | 0.9937 | 0.4532 | 0.3254 |
| ParkinsonsDisease_GSE20141 | 0.0628 | 1 | 0.9361 | 0.3816 | 0.0582 |
| ProstateCancer_GSE6919_GPL8300 | 0.1575 | 1 | 1 | 0.4739 | 0.2522 |
| Psoriasis_GSE13355 | 0.3327 | 1 | 0.9999 | 0.4932 | 0.4195 |
| Schizophrenia_GSE17612 | 0.183 | 1 | 0.9998 | 0.4705 | 0.2138 |
| T2D_GSE20966 | 0.3271 | 1 | 0.9999 | 0.4936 | 0.3562 |
| UlcerativeColitis_GSE6731 | 0.2397 | 1 | 0.998 | 0.4831 | 0.3608 |
* The last column shows the results after first standardizing DSE by dividing each DSE by the standard deviation of all DSEs
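
For readers unfamiliar with the D-statistic, the sketch below computes a one-sample Kolmogorov-Smirnov D against a reference CDF, before and after dividing the values by their standard deviation as described in the footnote. The standard normal reference and the DSE values are assumptions chosen for illustration. The excerpt does not state which parameters were used for each reference distribution.

```python
import statistics


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov D: the largest gap between the
    empirical CDF of `sample` and the reference `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def standardize(values):
    """Divide each value by the sample standard deviation of all values."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


# Hypothetical DSE values (not taken from the paper).
dse_values = [-0.30, -0.12, -0.05, -0.01, 0.00, 0.02, 0.04, 0.09, 0.20, 0.45]

standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
d_raw = ks_statistic(dse_values, standard_normal.cdf)
d_std = ks_statistic(standardize(dse_values), standard_normal.cdf)
print(f"D (raw) = {d_raw:.4f}, D (standardized) = {d_std:.4f}")
```

D ranges from 0 (perfect agreement) to 1 (no overlap), which is why the chi-square and F columns, at or near 1, indicate essentially no fit.
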
Fig. 3 The distribution of differential Shannon entropy. The observed distribution of differential Shannon entropy in sample prostate cancer data is shown. Similar patterns were seen in all 16 data sets. None of the standard distributions tested matched the observed distributions closely enough to serve as a reference distribution for obtaining p values.
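
When no reference distribution fits, one generic way to obtain a p value is a permutation test, which builds the null distribution from the data itself. The sketch below shows that idea for the DSE of a single gene. It is an assumption offered for illustration, not a description of what EntropyExplorer does, since this excerpt does not say how the tool computes p values. The function names, the two-sided statistic, the add-one correction, and the data are all hypothetical.

```python
import math
import random


def shannon_entropy(values):
    """Base-2 Shannon entropy of values normalized to sum to 1 (assumed definition)."""
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)


def permutation_p_value(case, control, n_perm=1000, seed=0):
    """Two-sided permutation p value for the entropy difference of one gene."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(shannon_entropy(case) - shannon_entropy(control))
    pooled = list(case) + list(control)
    n_case = len(case)
    hits = 0
    for _ in range(n_perm):
        # Reassign samples to the two groups at random.
        rng.shuffle(pooled)
        stat = abs(shannon_entropy(pooled[:n_case]) - shannon_entropy(pooled[n_case:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one gene (not taken from the paper).
case = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
control = [1.0, 1.0, 1.0, 1.0, 1.0, 43.0]

p = permutation_p_value(case, control)
print(f"permutation p value: {p:.4f}")
```

The resolution of such a p value is limited by the number of permutations, so small p values require a correspondingly large `n_perm`.
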