Han-Ming Liu, Dan Yang, Zhao-Fa Liu, Sheng-Zhou Hu, Shen-Hai Yan, Xian-Wen He.
Abstract
The assumed probability density distribution of the data strongly influences the design of a new statistical method. Based on an analysis of a group of real gene expression profiles, this study reveals that the primary density distributions of the real profiles are normal/log-normal and t distributions, accounting for 80% and 19% of the profiles, respectively. Using these distributions, we generated a series of simulated datasets to assess a novel statistical method, the maximal information coefficient (MIC), more comprehensively. The results show that MIC not only ranks in the top tier in overall performance at identifying differentially expressed genes, but also shows better adaptability and excellent noise immunity compared with the existing methods.
Year: 2019 PMID: 31314810 PMCID: PMC6636747 DOI: 10.1371/journal.pone.0219551
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Real gene expression profiles.
| Organism | Count |
|---|---|
| Arabidopsis thaliana | 6 |
| Arachis hypogaea | 1 |
| Citrus limonia | 1 |
| Citrus reticulata | 2 |
| Citrus sinensis | 7 |
| Danio rerio | 1 |
| Drosophila melanogaster | 6 |
| Glycine max | 5 |
| Homo sapiens | 32 |
| Mus musculus | 16 |
| Oryctolagus cuniculus | 1 |
| Oryza sativa | 4 |
| Phaseolus coccineus | 1 |
| Rattus norvegicus | 7 |
| Solanum lycopersicum | 1 |
| Staphylococcus aureus | 2 |
| Staphylococcus aureus subsp. aureus str. Newman | 1 |
| Triticum aestivum | 3 |
| Triticum turgidum subsp. durum | 1 |
| Zea mays | 2 |
Parameters of log-normal distribution in simulation.
| Group | Non-differential expression, case | | Non-differential expression, control | | Differential expression, case | | Differential expression, control | |
|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 1.5 | 5 | 1.5 | 4.5 | 1 | 5 | 2 |
| 2 | 5 | 1.5 | 5 | 1.5 | 5 | 1 | 6 | 1.1 |
| 3 | 5 | 1.5 | 5 | 1.5 | 6 | 0.8 | 6.5 | 1.2 |
| 4 | 5.5 | 1.3 | 5.5 | 1.3 | 4.5 | 1 | 5 | 2 |
| 5 | 5.5 | 1.3 | 5.5 | 1.3 | 5 | 1 | 6 | 1.1 |
| 6 | 5.5 | 1.3 | 5.5 | 1.3 | 6 | 0.8 | 6.5 | 1.2 |
| 7 | 7 | 1 | 7 | 1 | 4.5 | 1 | 5 | 2 |
| 8 | 7 | 1 | 7 | 1 | 5 | 1 | 6 | 1.1 |
| 9 | 7 | 1 | 7 | 1 | 6 | 0.8 | 6.5 | 1.2 |
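To make the table concrete, here is a minimal sketch of drawing simulated values for Group 1. It assumes that each pair of columns holds the mean and standard deviation of the underlying normal on the log scale (the extracted table does not name its parameters), and the sample size per condition is hypothetical.

```python
import random

# Group 1 from the table above.
# ASSUMPTION: each pair is (mu, sigma) on the log scale; the source table
# does not label its parameter columns.
GROUP_1 = {
    "non_de_case": (5.0, 1.5),
    "non_de_control": (5.0, 1.5),
    "de_case": (4.5, 1.0),
    "de_control": (5.0, 2.0),
}

N_SAMPLES = 10  # hypothetical sample size per condition


def simulate_lognormal(params, n_samples, rng):
    """Draw n_samples log-normal values for one gene in one condition."""
    mu, sigma = params
    return [rng.lognormvariate(mu, sigma) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed, for reproducibility only
simulated = {
    name: simulate_lognormal(params, N_SAMPLES, rng)
    for name, params in GROUP_1.items()
}
```

Under this reading, non-differentially expressed genes share one parameter pair across case and control, while differentially expressed genes use different pairs.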
Parameters of Cauchy distribution in simulation.
| Group | Non-differential expression, case | | Non-differential expression, control | | Differential expression, case | | Differential expression, control | |
|---|---|---|---|---|---|---|---|---|
| 1 | 1000 | 10 | 1000 | 10 | 1000 | 10 | 950 | 9 |
| 2 | 100 | 5 | 100 | 5 | 100 | 5 | 95 | 4 |
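As an illustration, Cauchy values can be drawn with the inverse-CDF method using only the standard library. The sketch assumes each pair of columns is a location and a scale (the extracted table does not say so), and the sample size is hypothetical.

```python
import math
import random


def sample_cauchy(location, scale, n, rng):
    """Inverse-CDF sampling: x = location + scale * tan(pi * (u - 1/2))."""
    return [
        location + scale * math.tan(math.pi * (rng.random() - 0.5))
        for _ in range(n)
    ]


rng = random.Random(0)  # fixed seed, for reproducibility only

# Group 1, differential expression.
# ASSUMPTION: pairs are (location, scale); n is a hypothetical sample size.
de_case = sample_cauchy(1000.0, 10.0, 10001, rng)
de_control = sample_cauchy(950.0, 9.0, 10001, rng)
```

Because a Cauchy distribution has no finite mean or variance, the sample median is the natural check that the location parameter came out as intended.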
Probability density distributions of real data.
| Distribution | Count |
|---|---|
| Log-normal | 43 |
| Normal | 37 |
| t | 19 |
| Cauchy | 1 |
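The percentages quoted in the abstract can be recovered directly from these counts. The short check below uses only the numbers in the table.

```python
# Counts copied from the table above.
counts = {"Log-normal": 43, "Normal": 37, "t": 19, "Cauchy": 1}

total = sum(counts.values())
share = {name: 100 * c / total for name, c in counts.items()}

# Normal and log-normal profiles are reported together in the abstract.
normal_family = share["Normal"] + share["Log-normal"]
```

With 100 profiles in total, the normal/log-normal share is 80% and the t share is 19%, matching the abstract.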
Fig 1. Four typical density distributions of real data.
Fig 2. Bootstrap-AUC curves.
The upper-left, upper-right, lower-left, and lower-right curves correspond to data with a normal, log-normal, t, and Cauchy distribution, respectively. The 20 points on each curve correspond to bootstrap counts of 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, and 100.
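For readers unfamiliar with the quantities on these axes, the following sketch shows one way to compute an AUC and average it over bootstrap resamples. It is not the authors' procedure: the record does not describe how the bootstrap was applied, so resampling (score, label) pairs and the function names here are assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative.

    Ties count as one half. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_boot, rng):
    """Mean AUC over n_boot resamples drawn with replacement.

    ASSUMPTION: resampling (score, label) pairs; the source does not say
    what was resampled.
    """
    pairs = list(zip(scores, labels))
    values = []
    while len(values) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        ys = [y for _, y in sample]
        if 0 < sum(ys) < len(ys):  # both classes must be present
            values.append(auc([s for s, _ in sample], ys))
    return sum(values) / n_boot


# The 20 bootstrap counts listed in the caption: 5, 10, ..., 100.
BOOTSTRAP_COUNTS = list(range(5, 101, 5))
```

Evaluating `bootstrap_auc` at each entry of `BOOTSTRAP_COUNTS` would give one point per count, which is the shape of curve the caption describes.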
Fig 3. AUC boxplots on normal data.
The ‘×’ marks are the means. Bidirectional arrows represent ±1σ.
Fig 6. AUC boxplots on Cauchy data.
The ‘×’ marks are the means. Bidirectional arrows represent ±1σ.
Counts of AUC ≤ 0.5.
| Distribution | MIC | DESeq2 | Limma | ROTS | SAM |
|---|---|---|---|---|---|
| Normal | 0 | 378 | 896 | 33 | 300 |
| Log-Normal | 0 | 141 | 438 | 0 | 1 |
| t | 0 | 0 | 490 | 0 | 0 |
| Cauchy | 0 | 44 | 73 | 0 | 0 |
| Total | 0 | 563 | 1897 | 33 | 301 |
| Ratio (%) | 0 | 22.52 | 75.88 | 1.32 | 12.04 |
Note: The counts are taken over the 2,500 simulation datasets, with one AUC value per dataset.
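The Total and Ratio rows can be reproduced from the per-distribution counts and the 2,500 datasets mentioned in the note. The check below uses only the numbers in the table.

```python
N_DATASETS = 2500  # from the note under the table

# Per-method counts in row order: Normal, Log-Normal, t, Cauchy.
counts = {
    "MIC": [0, 0, 0, 0],
    "DESeq2": [378, 141, 0, 44],
    "Limma": [896, 438, 490, 73],
    "ROTS": [33, 0, 0, 0],
    "SAM": [300, 1, 0, 0],
}

totals = {method: sum(values) for method, values in counts.items()}
ratios = {method: round(100 * t / N_DATASETS, 2) for method, t in totals.items()}
```

The recomputed values agree with the printed rows, for example 1897 / 2500 = 75.88% for Limma.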
Fig 7. AUC boxplots on normal noisy data.
The ‘×’ marks are the means. Bidirectional arrows represent ±1σ.
Fig 10. AUC boxplots on Cauchy noisy data.
The ‘×’ marks are the means. Bidirectional arrows represent ±1σ.
Counts of fitted lines with a slope greater than 0.
| Distribution | MIC | DESeq2 | Limma | ROTS | SAM |
|---|---|---|---|---|---|
| Normal | 2 | 79 | 823 | 88 | 56 |
| Log-Normal | 0 | 38 | 0 | 0 | 8 |
| t | 0 | 0 | 100 | 3 | 3 |
| Cauchy | 0 | 47 | 12 | 50 | 24 |
| Total | 2 | 164 | 935 | 141 | 91 |
| Ratio (%) | 0.08 | 6.56 | 37.40 | 5.64 | 3.64 |
Note: The counts are taken over the 2,500 simulation datasets, with one fitted line per dataset. Approximately horizontal lines were excluded.
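A minimal sketch of how a fitted line's slope could be computed and classified is given below. The record does not state which variables were fitted or how "approximately horizontal" was defined, so the ordinary least-squares fit and the tolerance `tol` are assumptions for illustration only.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_slope(slope, tol=1e-3):
    """Label a slope; tol is a HYPOTHETICAL cutoff for 'approximately horizontal'."""
    if abs(slope) <= tol:
        return "flat"
    return "positive" if slope > 0 else "negative"
```

Counting the datasets whose line is labelled "positive" for each method and distribution would produce a table with the same layout as the one above.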
Algorithm runtimes by method (unit: seconds).
| Distribution | MIC | DESeq2 | Limma | ROTS | SAM |
|---|---|---|---|---|---|
| Normal | 0.72 | 6.37 | 0.30 | 9.59 | 1.06 |
| Log-Normal | 0.60 | 6.39 | 0.41 | 9.08 | 1.46 |
| Student | 0.89 | 5.36 | 0.28 | 9.08 | 1.06 |
| Cauchy | 0.71 | 9.00 | 0.34 | 9.06 | 1.13 |
| Total | 2.92 | 27.11 | 1.34 | 36.81 | 4.71 |
Parameters of t distribution in simulation.
| Group | Non-differential expression, case | | Non-differential expression, control | | Differential expression, case | | Differential expression, control | |
|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 0 | 3 | 0 | 3 | 1 | 4 | 0 |
| 2 | 4 | 0 | 4 | 0 | 4 | 1 | 3 | 0 |
| 3 | 3 | 3 | 3 | 3 | 3 | 2 | 4 | 1 |
| 4 | 3 | 2 | 3 | 2 | 3 | 1 | 4 | 0 |
| 5 | 1.5 | 0 | 1.5 | 0 | 1.5 | 0 | 1.3 | 0.5 |
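One plausible reading of this table is that each pair of columns gives the degrees of freedom and a noncentrality parameter. The record does not label the columns, so that reading is an assumption. Under it, values can be drawn with the standard construction of a noncentral t variate, sketched here with the standard library only.

```python
import math
import random


def sample_noncentral_t(df, ncp, n, rng):
    """Draw n values of T = (Z + ncp) / sqrt(V / df).

    Z is standard normal and V is chi-square with df degrees of freedom,
    generated as Gamma(shape=df/2, scale=2). ncp = 0 gives the ordinary
    Student t distribution.
    """
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2.0, 2.0)
        draws.append((z + ncp) / math.sqrt(v / df))
    return draws


rng = random.Random(0)  # fixed seed, for reproducibility only

# Group 1, differential expression.
# ASSUMPTION: pairs are (df, ncp); n is a hypothetical sample size.
de_case = sample_noncentral_t(3.0, 1.0, 20001, rng)
de_control = sample_noncentral_t(4.0, 0.0, 20001, rng)
```

Non-integer degrees of freedom, as in Group 5, pose no problem for this construction because the gamma shape parameter can be any positive real number.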