Michele Tumminello, Giorgio Bertolazzi, Gianluca Sottile, Nicolina Sciaraffa, Walter Arancio, Claudia Coronnello.
Abstract
Statistical tests of differential expression usually suffer from two problems. First, their statistical power is often limited when they are applied to small and skewed data sets. Second, gene expression data are usually discretized by applying arbitrary criteria to limit the number of false positives. In this work, a new statistical test obtained from a convolution of multivariate hypergeometric distributions, the Hy-test, is proposed to address these issues. The Hy-test has been applied to transcriptomic data from breast and kidney cancer tissues and compared with other differential expression analysis methods. The Hy-test allows implicit discretization of the expression profiles and is more selective in retrieving both differentially expressed genes and Gene Ontology terms. The Hy-test can be adopted together with other tests to retrieve information that would otherwise remain hidden, e.g., terms of (1) cell cycle deregulation for breast cancer and (2) "programmed cell death" for kidney cancer.
Year: 2022 PMID: 35585166 PMCID: PMC9117296 DOI: 10.1038/s41598-022-12246-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Venn diagrams of the differentially expressed genes and significant terms found in each of the three analysis steps by the three methods: Hy-test, moderated t-test, and SAM. The upper panels (A, B, C) refer to the breast tissue and the lower panels (D, E, F) to the kidney. The first column (A and D) refers to the DE analysis, the second column (B and E) to the enrichment analysis, and the third column (C and F) to the PubMed research. A result is considered significant when its Bonferroni-corrected p-value is below the 5% level.
GO-terms significantly associated with breast cancer among significant GO-terms found using Hy-test, moderated t-test and both procedures.
| Sign. GO-term | GO ID | Analysis | Term size | BR term size | p-value |
|---|---|---|---|---|---|
| Cell cycle checkpoint signaling | GO:0000075 | Hy-test | 167 | 34 | < 1.11E−16 |
| Mitotic spindle checkpoint signaling | GO:0071174 | Hy-test | 38 | 14 | < 1.11E−16 |
| Regulation of cell cycle | GO:0051726 | Hy-test | 951 | 134 | < 1.11E−16 |
| Regulation of cell cycle process | GO:0010564 | Hy-test | 594 | 102 | < 1.11E−16 |
| Spindle assembly checkpoint signaling | GO:0071173 | Hy-test | 37 | 15 | < 1.11E−16 |
| Cell surface receptor signaling pathway | GO:0007166 | Mod t-test | 2485 | 643 | < 1.11E−16 |
| Cell–cell signaling | GO:0007267 | Mod t-test | 1545 | 436 | < 1.11E−16 |
| Regulation of signal transduction | GO:0009966 | Mod t-test | 2734 | 619 | < 1.11E−16 |
| Regulation of signaling | GO:0023051 | Mod t-test | 3107 | 719 | < 1.11E−16 |
| Signal transduction | GO:0007165 | Mod t-test | 5175 | 1210 | < 1.11E−16 |
| Angiogenesis | GO:0001525 | Both | 493 | 171 | < 1.11E−16 |
| Cell communication | GO:0007154 | Both | 5681 | 1342 | < 1.11E−16 |
| Cell population proliferation | GO:0008283 | Both | 1835 | 473 | < 1.11E−16 |
| Mitotic cell cycle | GO:0000278 | Both | 833 | 217 | < 1.11E−16 |
| Tissue development | GO:0009888 | Both | 1749 | 483 | < 1.11E−16 |
Term size is the number of genes that compose a GO-term; BR term size is the number of GO-term genes associated with breast cancer; p-value is computed by using the hypergeometric distribution.
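The hypergeometric p-value mentioned in the table note can be illustrated with a minimal sketch. It assumes the usual over-representation setup: the GO-term is treated as a draw of `draws` genes from a gene universe of size `population`, of which `successes` genes are associated with the disease, and the p-value is the probability of observing at least `observed` associated genes in the term. The universe size (20000) and the number of disease-associated genes (1500) below are hypothetical placeholders, since the source does not report them; only the term size (167) and the BR term size (34) come from the table.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """Return P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    The sum is carried out in exact integer arithmetic, so very small
    tail probabilities are not lost to floating-point rounding.
    """
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return Fraction(numerator, comb(population, draws))


# Term size and BR term size are taken from the first table row;
# the universe size and the number of disease genes are ASSUMED values.
p_example = hypergeom_upper_tail(
    population=20000,  # hypothetical gene universe
    successes=1500,    # hypothetical number of breast-cancer-associated genes
    draws=167,         # term size (from the table)
    observed=34,       # BR term size (from the table)
)
print(float(p_example))
```

Exact arithmetic is used here because the tail probabilities in these tables are tiny. The reported bound `< 1.11E−16` is numerically equal to 2^-53, the resolution of double-precision numbers just below 1, which may indicate (this is an inference, not something the source states) that the original values were obtained as one minus a cumulative probability.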
GO-terms significantly associated with kidney cancer among significant GO-terms found using Hy-test, moderated t-test and both procedures.
| Sign. GO-term | GO ID | Analysis | Term size | KIRC term size | p-value |
|---|---|---|---|---|---|
| Apoptotic process | GO:0006915 | Hy-test | 1761 | 363 | < 1.11E−16 |
| Cell death | GO:0008219 | Hy-test | 1951 | 396 | < 1.11E−16 |
| Programmed cell death | GO:0012501 | Hy-test | 1808 | 371 | < 1.11E−16 |
| Cell differentiation | GO:0030154 | Mod t-test | 3844 | 1159 | < 1.11E−16 |
| Kidney development | GO:0001822 | Mod t-test | 283 | 115 | < 1.11E−16 |
| Kidney epithelium development | GO:0072073 | Mod t-test | 133 | 61 | < 1.11E−16 |
| Regulation of cell differentiation | GO:0045595 | Mod t-test | 1432 | 459 | 1.98E−05 |
| Renal system development | GO:0072001 | Mod t-test | 292 | 118 | < 1.11E−16 |
| Antigen processing and presentation | GO:0019882 | Both | 102 | 54 | 2.37E−09 |
| Cell killing | GO:0001906 | Both | 173 | 79 | 6.80E−15 |
| Immune system development | GO:0002520 | Both | 881 | 301 | 6.86E−04 |
| Leukocyte mediated cytotoxicity | GO:0001909 | Both | 117 | 62 | 7.01E−09 |
| Lymphocyte proliferation | GO:0046651 | Both | 276 | 133 | 4.07E−07 |
| Regulation of signaling | GO:0023051 | Both | 3110 | 924 | < 1.11E−16 |
Term size is the number of genes that compose a GO-term; KIRC term size is the number of GO-term genes associated with kidney cancer; p-value is computed by using the hypergeometric distribution.
Figure 2. Correlation structure of breast cancer expression genes. The top-left panel refers to all genes, the top-right panel to the set of genes selected by the moderated t-test, and the bottom panel to the set of genes selected by the Hy-test. The block average correlation is also reported.
Figure 3. Correlation structure of kidney cancer expression genes. The top-left panel refers to all genes, the top-right panel to the set of genes selected by the moderated t-test, and the bottom panel to the set of genes selected by the Hy-test. The block average correlation is also reported.
Results of simulation block (b), where two vectors of paired synthetic expression profiles have to satisfy two conditions, (1) and (2).
| Method | Log-normal | Power-law | Log-normal | Power-law | Log-normal | Power-law |
|---|---|---|---|---|---|---|
| Hy-test | 0.89 | 0.94 | 0.94 | 0.97 | 1.00 | 0.99 |
| Mod t-test | 0.85 | 0.70 | 0.96 | 0.88 | 1.00 | 0.98 |
| Hy-test | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Mod t-test | 0.90 | 0.76 | 0.98 | 0.93 | 1.00 | 1.00 |
Average rejection rates over 250 Monte Carlo replicates are reported for two different sample sizes and two distributions (log-normal and power-law), after adjusting the p-values with the Bonferroni correction.
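A Bonferroni-adjusted rejection rate of the kind reported above can be estimated as in the following sketch. It is not the authors' simulation. The Hy-test and the moderated t-test are replaced by an exact two-sided sign test as a simple stand-in, and the data model (paired log-normal profiles with a multiplicative shift), the noise level, the sample size and the number of simultaneous tests `n_tests` are all hypothetical choices. Only the 250 replicates, the 5% level and the Bonferroni correction follow the text.

```python
import math
import random


def sign_test_p_value(x, y):
    """Exact two-sided sign test on paired samples (stand-in, NOT the Hy-test)."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    tail = min(positives, n - positives)
    # Two-sided p-value from the Binomial(n, 1/2) null distribution.
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def rejection_rate(shift, n_samples, n_tests, n_replicates=250, alpha=0.05, seed=0):
    """Fraction of replicates rejected at the Bonferroni threshold alpha / n_tests."""
    rng = random.Random(seed)
    threshold = alpha / n_tests  # Bonferroni-corrected significance level
    rejections = 0
    for _ in range(n_replicates):
        # Hypothetical data model: log-normal baseline, multiplicative effect.
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(n_samples)]
        y = [xi * math.exp(shift + rng.gauss(0.0, 0.25)) for xi in x]
        if sign_test_p_value(x, y) <= threshold:
            rejections += 1
    return rejections / n_replicates


power = rejection_rate(shift=1.0, n_samples=20, n_tests=1000)       # effect present
false_rate = rejection_rate(shift=0.0, n_samples=20, n_tests=1000)  # no effect
print(power, false_rate)
```

With a strong shift the rejection rate estimates the power of the stand-in test, while with `shift=0.0` it estimates the false-positive rate, which the Bonferroni threshold keeps far below the nominal 5% level.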