Pablo Flores, Miquel Salicrú, Alex Sánchez-Pla, Jordi Ocaña.
Abstract
BACKGROUND: In integrative bioinformatic analyses, it is of great interest to establish the equivalence between gene or (more generally) feature lists, up to a given level and in terms of their annotations in the Gene Ontology (GO). The aim of this article is to present an equivalence test based on the proportion of GO terms that are declared enriched in both lists simultaneously.
Keywords: Bootstrap; Delta method; Gene lists; Irrelevance of dissimilarity; Simulation; Type I error
Year: 2022 PMID: 35641928 PMCID: PMC9158181 DOI: 10.1186/s12859-022-04739-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Contingency table for frequencies of enriched and non-enriched GO terms in two gene lists, $L_1$ and $L_2$
| | Enriched in $L_2$ | Non-enriched in $L_2$ |
|---|---|---|
| Enriched in $L_1$ | $n_{11}$ | $n_{10}$ |
| Non-enriched in $L_1$ | $n_{01}$ | $n_{00}$ |
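As an illustration of how such a table feeds the test, the sketch below computes the Sorensen–Dice dissimilarity from the three counts that matter. The names `n11` (terms enriched in both lists), `n10` and `n01` (terms enriched in only one list) are notation assumed here, the counts are hypothetical, and the formula is the standard Sorensen–Dice one rather than code taken from the article.

```python
def sorensen_dissimilarity(n11, n10, n01):
    """Sorensen-Dice dissimilarity between two enrichment profiles.

    n11: GO terms enriched in both lists
    n10, n01: GO terms enriched in only one of the lists
    Terms enriched in neither list (n00) do not enter the formula.
    """
    denominator = 2 * n11 + n10 + n01
    if denominator == 0:
        # No enriched term in either list: the index is undefined.
        raise ValueError("no enriched GO terms in either list")
    return (n10 + n01) / denominator


# Hypothetical counts, chosen only for illustration.
d = sorensen_dissimilarity(n11=25, n10=10, n01=10)
print(round(d, 4))  # 0.2857
```

A dissimilarity of 0 means the two lists enrich exactly the same GO terms; a value of 1 means they share no enriched term.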
Fig. 1 Probability of rejecting the null hypothesis of non-equivalence in the Sorensen test (i.e., to declare biological similarity) with
Fig. 2 Probability of rejecting the null hypothesis of non-equivalence in the Sorensen test (i.e., to declare biological similarity) with
Fig. 3 The N(0,1) density compared with the “true” distribution of the statistic and a bootstrap estimate of its distribution
Probability of declaring equivalence (pr(Rej) for the normal test and for the bootstrap test) when the simulated dissimilarity equals the equivalence limit. The total number of GO terms is held fixed, and each row lists the probabilities of enrichment.
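| nSim | Simulated dissimilarity | Equivalence limit | P(enriched in both lists) | P(enriched in one list only) | P(enriched in the other list only) | pr(Rej), normal test | pr(Rej), bootstrap test | Expected number of enriched terms |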
|---|---|---|---|---|---|---|---|---|
| 99433 | 0.2857 | 0.2857 | 0.01250 | 0.005 | 0.005 | 0.0807 | 0.0251 | 22.50 |
| 99994 | 0.2857 | 0.2857 | 0.01875 | 0.005 | 0.010 | 0.0741 | 0.0371 | 33.75 |
| 100000 | 0.2857 | 0.2857 | 0.02500 | 0.010 | 0.010 | 0.0706 | 0.0417 | 45.00 |
| 100000 | 0.2857 | 0.2857 | 0.06875 | 0.005 | 0.050 | 0.0617 | 0.0485 | 123.75 |
| 100000 | 0.2857 | 0.2857 | 0.07500 | 0.010 | 0.050 | 0.0614 | 0.0483 | 135.00 |
| 100000 | 0.2857 | 0.2857 | 0.12500 | 0.050 | 0.050 | 0.0591 | 0.0500 | 225.00 |
| 100000 | 0.2857 | 0.2857 | 0.13125 | 0.005 | 0.100 | 0.0584 | 0.0493 | 236.25 |
| 100000 | 0.2857 | 0.2857 | 0.13750 | 0.010 | 0.100 | 0.0583 | 0.0496 | 247.50 |
| 100000 | 0.2857 | 0.2857 | 0.18750 | 0.050 | 0.100 | 0.0568 | 0.0499 | 337.50 |
| 100000 | 0.2857 | 0.2857 | 0.25000 | 0.100 | 0.100 | 0.0558 | 0.0497 | 450.00 |
| 100000 | 0.2857 | 0.2857 | 0.25625 | 0.005 | 0.200 | 0.0558 | 0.0498 | 461.25 |
| 100000 | 0.2857 | 0.2857 | 0.26250 | 0.010 | 0.200 | 0.0559 | 0.0497 | 472.50 |
| 100000 | 0.2857 | 0.2857 | 0.31250 | 0.050 | 0.200 | 0.0548 | 0.0494 | 562.50 |
| 100000 | 0.2857 | 0.2857 | 0.37500 | 0.100 | 0.200 | 0.0541 | 0.0490 | 675.00 |
| 100000 | 0.2857 | 0.2857 | 0.50000 | 0.200 | 0.200 | 0.0538 | 0.0495 | 900.00 |
The last column is the expected total number of enriched terms. nSim is the number of effective simulation replicates used to obtain the bootstrap rejection probability; pr(Rej) for the normal test was computed from its own initial set of simulation replicates. In some scenarios with low enrichment frequencies, the generated tables contained zeros that made the Sorensen–Dice computations impossible, so the effective number of simulation replicates was lower than initially planned.
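To make the "normal test" concrete, here is a minimal sketch of a one-sided equivalence test on the Sorensen–Dice dissimilarity using a delta-method standard error. It assumes that, given the number of GO terms enriched in at least one list, the count of jointly enriched terms is binomial; the function name, argument names and example counts are hypothetical, and this is not the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist


def sorensen_equivalence_test(n11, n10, n01, d0=0.2857, alpha=0.05):
    """One-sided test of H0: d >= d0 (non-equivalence) vs H1: d < d0.

    n11: terms enriched in both lists; n10, n01: enriched in one list only.
    """
    n = n11 + n10 + n01
    if n11 == 0 or n10 + n01 == 0:
        # Tables with zeros give a null standard error (or no data at all).
        raise ValueError("degenerate table: the statistic cannot be computed")
    p = n11 / n                                     # share of jointly enriched terms
    d_hat = (1 - p) / (1 + p)                       # equals (n10 + n01) / (2*n11 + n10 + n01)
    se = 2 * sqrt(p * (1 - p) / n) / (1 + p) ** 2   # delta method on d = (1 - p) / (1 + p)
    upper = d_hat + NormalDist().inv_cdf(1 - alpha) * se
    return {"d_hat": d_hat, "se": se, "upper_limit": upper, "reject": upper < d0}


# Hypothetical counts: most enriched terms are shared, so equivalence is declared.
result = sorensen_equivalence_test(n11=200, n10=10, n01=10)
print(result["reject"])  # True
```

Equivalence is declared when the upper limit of the one-sided confidence interval falls below the equivalence limit `d0`. The degenerate-table check mirrors the note above about simulated tables containing zeros.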
Probability of declaring equivalence (pr(Rej) for the normal test and for the bootstrap test) when the simulated dissimilarity is equal to the equivalence limit.
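| nSim | Simulated dissimilarity | Equivalence limit | P(enriched in both lists) | P(enriched in one list only) | P(enriched in the other list only) | pr(Rej), normal test | pr(Rej), bootstrap test | Expected number of enriched terms |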
|---|---|---|---|---|---|---|---|---|
| 100000 | 0.2857 | 0.2857 | 0.01250 | 0.005 | 0.005 | 0.0590 | 0.0498 | 225.00 |
| 100000 | 0.2857 | 0.2857 | 0.01875 | 0.005 | 0.010 | 0.0570 | 0.0499 | 337.50 |
| 100000 | 0.2857 | 0.2857 | 0.02500 | 0.010 | 0.010 | 0.0560 | 0.0502 | 450.00 |
| 100000 | 0.2857 | 0.2857 | 0.06875 | 0.005 | 0.050 | 0.0534 | 0.0485 | 1237.50 |
| 100000 | 0.2857 | 0.2857 | 0.07500 | 0.010 | 0.050 | 0.0532 | 0.0500 | 1350.00 |
| 100000 | 0.2857 | 0.2857 | 0.12500 | 0.050 | 0.050 | 0.0527 | 0.0503 | 2250.00 |
| 100000 | 0.2857 | 0.2857 | 0.13125 | 0.005 | 0.100 | 0.0524 | 0.0499 | 2362.50 |
| 100000 | 0.2857 | 0.2857 | 0.13750 | 0.010 | 0.100 | 0.0524 | 0.0502 | 2475.00 |
| 100000 | 0.2857 | 0.2857 | 0.18750 | 0.050 | 0.100 | 0.0522 | 0.0503 | 3375.00 |
| 100000 | 0.2857 | 0.2857 | 0.25000 | 0.100 | 0.100 | 0.0519 | 0.0502 | 4500.00 |
| 100000 | 0.2857 | 0.2857 | 0.25625 | 0.005 | 0.200 | 0.0518 | 0.0501 | 4612.50 |
| 100000 | 0.2857 | 0.2857 | 0.26250 | 0.010 | 0.200 | 0.0516 | 0.0501 | 4725.00 |
| 100000 | 0.2857 | 0.2857 | 0.31250 | 0.050 | 0.200 | 0.0513 | 0.0499 | 5625.00 |
| 100000 | 0.2857 | 0.2857 | 0.37500 | 0.100 | 0.200 | 0.0513 | 0.0499 | 6750.00 |
| 100000 | 0.2857 | 0.2857 | 0.50000 | 0.200 | 0.200 | 0.0512 | 0.0501 | 9000.00 |
The total number of GO terms is held fixed, and each row lists the probabilities of enrichment. The last column is the expected total number of enriched terms. nSim is the number of effective simulation replicates used to obtain the bootstrap rejection probability; pr(Rej) for the normal test was computed from its own set of simulation replicates.
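A rejection probability of this kind can be approximated by Monte Carlo simulation. The sketch below draws multinomial enrichment tables whose true dissimilarity equals the equivalence limit and counts how often a delta-method normal test declares equivalence. The test construction, the function names, the seed and the small number of replicates are all assumptions made for illustration. The last column of the table is consistent with 10000 GO terms (for instance 10000 × (0.0125 + 0.005 + 0.005) = 225); the example uses 1000 terms only to keep the run short.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejects_non_equivalence(n11, n10, n01, d0, z_crit):
    """Normal-approximation test; returns None when the table is degenerate."""
    n = n11 + n10 + n01
    if n11 == 0 or n10 + n01 == 0:
        return None
    p = n11 / n
    d_hat = (1 - p) / (1 + p)
    se = 2 * sqrt(p * (1 - p) / n) / (1 + p) ** 2
    return d_hat + z_crit * se < d0


def simulate_rejection_rate(n_terms, p11, p10, p01, d0,
                            alpha=0.05, n_sim=2000, seed=1):
    """Estimate pr(Rej) and report the number of effective replicates."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    cells = ("11", "10", "01", "00")
    weights = (p11, p10, p01, 1 - p11 - p10 - p01)
    effective = rejected = 0
    for _ in range(n_sim):
        # One simulated table: classify each GO term into one of four cells.
        draws = rng.choices(cells, weights=weights, k=n_terms)
        outcome = rejects_non_equivalence(
            draws.count("11"), draws.count("10"), draws.count("01"), d0, z_crit
        )
        if outcome is None:
            continue  # table with zeros: replicate discarded
        effective += 1
        rejected += outcome
    rate = rejected / effective if effective else float("nan")
    return rate, effective


# True dissimilarity (0.05 + 0.05) / (2 * 0.125 + 0.05 + 0.05) = 2/7, set equal to d0.
rate, n_eff = simulate_rejection_rate(1000, 0.125, 0.05, 0.05, d0=2 / 7)
print(rate, n_eff)
```

Because the simulated dissimilarity sits exactly on the equivalence limit, the estimated rate is an empirical type I error and should be close to the nominal 0.05 when the test is well calibrated.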
Fig. 4 Equivalences between gene lists
Fig. 5 Average proportion of enriched GO terms in the Kidney rejection PBTs and Cancer allOnco gene lists, displayed along GO ontologies and GO levels
Degree of coincidence between the equivalence test described here and the equivalence test based on the goProfiles approach.
| Onto Level | AllOnco gene lists: correlation | AllOnco gene lists: p-value | PBT’s gene lists: correlation | PBT’s gene lists: p-value |
|---|---|---|---|---|
| BP-3 | 0.6507 | 0.0022 | 0.2711 | 0.0013 |
| BP-4 | 0.6895 | 0.0008 | 0.3781 | 0 |
| BP-5 | 0.6943 | 0.0004 | 0.372 | 0 |
| BP-6 | 0.6703 | 0.0006 | 0.3214 | 0.0004 |
| BP-7 | 0.66 | 0.0004 | 0.2777 | 0.0019 |
| BP-8 | 0.6409 | 0.0004 | 0.238 | 0.0082 |
| BP-9 | 0.704 | 0.0002 | 0.2026 | 0.0229 |
| BP-10 | 0.7178 | 0.0002 | 0.196 | 0.0282 |
| CC-3 | 0.5199 | 0.0036 | 0.151 | 0.0683 |
| CC-4 | 0.54 | 0.0022 | 0.1753 | 0.0386 |
| CC-5 | 0.5648 | 0.001 | 0.3057 | 0 |
| CC-6 | 0.4052 | 0.006 | 0.2354 | 0.0017 |
| CC-7 | 0.3964 | 0.0089 | 0.2127 | 0.0071 |
| CC-8 | 0.4671 | 0.0073 | 0.1795 | 0.0318 |
| CC-9 | 0.5888 | 0.0085 | 0.2083 | 0.0046 |
| CC-10 | 0.7008 | 0.0032 | 0.2878 | 0 |
| MF-3 | 0.3878 | 0.0556 | 0.1088 | 0.1825 |
| MF-4 | 0.6514 | 0.0018 | 0.1303 | 0.1051 |
| MF-5 | 0.6437 | 0.002 | 0.1906 | 0.0208 |
| MF-6 | 0.7292 | 0.0002 | 0.1929 | 0.0103 |
| MF-7 | 0.7539 | 0.0002 | 0.0735 | 0.3016 |
| MF-8 | 0.601 | 0.0018 | 0.2117 | 0.0193 |
| MF-9 | 0.4453 | 0.0167 | 0.1629 | 0.0244 |
| MF-10 | 0.1874 | 0.0667 | 0.4846 | 0.0476 |
The correlations were computed over the upper limits of the one-sided confidence intervals defining the tests. These upper limits were organized as triangular matrices (one matrix per test, with entry (i, j) holding the upper limit obtained when testing list i vs. list j) for the kidney transplantation rejection and cancer datasets. Their significance was established by means of Mantel’s test.
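For readers unfamiliar with Mantel's test, the following sketch shows the usual permutation scheme: correlate the off-diagonal entries of two matrices, then relabel the rows and columns of one matrix at random to build the null distribution. It is a generic illustration with hypothetical data and function names, not the procedure or software used in the article. It expects full symmetric matrices, so a triangular matrix would first have to be mirrored.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Entries above the diagonal after relabelling rows/columns by `order`."""
    k = len(order)
    return [matrix[order[i]][order[j]] for i in range(k) for j in range(i + 1, k)]


def mantel_test(a, b, n_perm=999, seed=0):
    """Correlation between two symmetric matrices and a permutation p-value."""
    identity = list(range(len(a)))
    x = upper_triangle(a, identity)
    r_obs = pearson(x, upper_triangle(b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # same permutation applied to rows and columns of b
        if pearson(x, upper_triangle(b, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical 5 x 5 symmetric matrices of upper confidence limits.
a = [[0.00, 0.10, 0.40, 0.50, 0.90],
     [0.10, 0.00, 0.30, 0.60, 0.80],
     [0.40, 0.30, 0.00, 0.20, 0.70],
     [0.50, 0.60, 0.20, 0.00, 0.35],
     [0.90, 0.80, 0.70, 0.35, 0.00]]
b = [[value * 0.9 + 0.02 * ((i + j) % 3) if i != j else 0.0
      for j, value in enumerate(row)] for i, row in enumerate(a)]
r, p_value = mantel_test(a, b)
print(round(r, 3), p_value)
```

Permuting row and column labels together keeps the dependence between entries that share a gene list, which is why an ordinary correlation test on the stacked entries would not be appropriate.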
Comparing the data structures to compute the goProfiles test and the one based on enrichment contingency tables.
| | Non-enriched in both lists | | Enriched only in list 1 | | Enriched only in list 2 | | Enriched in both lists | |
|---|---|---|---|---|---|---|---|---|
| GO term number | 1 | | | | | | | |
| Annotation frequency in gene list 1 | | | | | | | | |
| Annotation frequency in gene list 2 | | | | | | | | |
| Enrichment in list 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| Enrichment in list 2 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
In the latter test, the annotation frequencies are replaced by 0 and 1 (i.e., “non-enriched” and “enriched” GO term) and, if the test is based on the Sorensen–Dice similarity, the first set of GO terms (non-enriched in both lists) is ignored. The GO terms are arbitrarily ordered: from left to right, first all those non-enriched in both lists ($n_{00}$ in total), next those enriched in the first list but not in the second ($n_{10}$), then those enriched in the second list but not in the first ($n_{01}$), and finally those enriched in both lists ($n_{11}$).
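The reduction from per-term 0/1 enrichment indicators to the four counts can be sketched as follows. The function name and the dictionary keys (`n11`, `n10`, `n01`, `n00`) are illustrative choices, and the input vectors simply reuse the 0/1 patterns shown in the table, two GO terms per group.

```python
def enrichment_contingency(enriched_1, enriched_2):
    """Cross-tabulate two 0/1 enrichment vectors defined over the same GO terms."""
    if len(enriched_1) != len(enriched_2):
        raise ValueError("both vectors must cover the same GO terms")
    counts = {"n11": 0, "n10": 0, "n01": 0, "n00": 0}
    for e1, e2 in zip(enriched_1, enriched_2):
        # Key digits: enrichment status in list 1, then in list 2.
        counts[f"n{int(bool(e1))}{int(bool(e2))}"] += 1
    return counts


list_1 = [0, 0, 1, 1, 0, 0, 1, 1]
list_2 = [0, 0, 0, 0, 1, 1, 1, 1]
table = enrichment_contingency(list_1, list_2)
print(table)  # {'n11': 2, 'n10': 2, 'n01': 2, 'n00': 2}

# Sorensen-Dice dissimilarity: terms non-enriched in both lists (n00) are ignored.
d = (table["n10"] + table["n01"]) / (2 * table["n11"] + table["n10"] + table["n01"])
print(d)  # 0.5
```

Because only the cross-classification matters, the arbitrary ordering of GO terms has no effect on the result.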