Miquel Salicrú, Jordi Ocaña, Alex Sánchez-Pla.
Abstract
BACKGROUND: How to compare studies on the basis of their biological significance is a problem of central importance in high-throughput genomics. Many methods for performing such comparisons are based on the information in databases of functional annotation, such as those that form the Gene Ontology (GO). Typically, they consist of analyzing gene annotation frequencies in some pre-specified GO classes, in a class-by-class way, followed by p-value adjustment for multiple testing. Enrichment analysis, where a list of genes is compared against a wider universe of genes, is the most common example.
Year: 2011 PMID: 21999355 PMCID: PMC3747174 DOI: 10.1186/1471-2105-12-401
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flow diagram for the basic algorithm. Flow diagram to illustrate the method of combining a general profile comparison test and class-by-class analyses.
Figure 2. Basic vs expanded profiles. A schematic view of basic and expanded functional profiles associated with a list of 4 genes projected at the second level of the MF ontology.
Figure 3. Relations between lists of genes. Possible relationships between gene lists to be compared: one list includes the other; two intersecting lists; two non-intersecting lists.
Dominant vs recessive diseases.
| | MF | BP | CC |
|---|---|---|---|
| squared Euclidean distance | 0.1029440 | 0.4138672 | 0.02482656 |
| p-value | < 2.2 × 10⁻¹⁶ | < 2.2 × 10⁻¹⁶ | 1 × 10⁻⁴ |
| 95% CI lower limit | 0.07004932 | 0.2715809 | 0.00894685 |
| 95% CI upper limit | 0.13583861 | 0.5561534 | 0.04070628 |
Global analysis at level 2.
Results of performing a difference test between functional profiles produced from lists of dominant and recessive genes, at the second level of each ontology. The null hypothesis of equality of profiles can be rejected for all ontologies at a 5% significance level.
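The first row of the table above is a squared Euclidean distance between two functional profiles. As a minimal sketch of that quantity only (not of the test, p-value, or confidence interval), the following assumes each profile is expressed as relative frequencies, that is, the number of genes annotated in each GO class divided by the size of the gene list. The function names and the counts are hypothetical and chosen for illustration.

```python
def relative_profile(class_counts, n_genes):
    """Turn per-class annotation counts into relative frequencies.

    Assumption: frequencies are counts divided by the list size, so they
    need not sum to 1 when a gene is annotated in several classes.
    """
    if n_genes <= 0:
        raise ValueError("gene list must not be empty")
    return [count / n_genes for count in class_counts]


def squared_euclidean_distance(profile_a, profile_b):
    """Sum of squared differences between two profiles over the same classes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same GO classes")
    return sum((p - q) ** 2 for p, q in zip(profile_a, profile_b))


# Hypothetical counts for three GO classes in two gene lists.
profile_a = relative_profile([30, 10, 5], n_genes=50)   # 0.6, 0.2, 0.1
profile_b = relative_profile([20, 20, 10], n_genes=40)  # 0.5, 0.5, 0.25
distance = squared_euclidean_distance(profile_a, profile_b)
print(distance)  # approximately 0.1225
```

A distance of zero means the two profiles coincide on every class considered; the table reports how far the observed distance is from zero, together with its confidence limits.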
Figure 4Dominant vs recessive genes. Comparison of functional profiles at the second level of the MF ontology based on the lists associated with dominant and recessive diseases.
Dominant vs recessive diseases.
| Description | GOID | p-value |
|---|---|---|
| binding | GO:0005488 | 1.591855 × 10⁻² |
| catalytic activity | GO:0003824 | 2.847567 × 10⁻²⁰ |
| electron carrier activity | GO:0009055 | 2.535709 × 10⁻³ |
| sequence-specific DNA binding transcription factor activity | GO:0003700 | 5.082308 × 10⁻¹⁴ |
| structural molecule activity | GO:0005198 | 2.727443 × 10⁻⁸ |
| transcription regulator activity | GO:0030528 | 3.685094 × 10⁻⁶ |
Analysis at level 2.
Significant MF level 2 GO classes after a class-by-class analysis based on Fisher's test and correction for multiple testing.
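A class-by-class analysis of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes two non-overlapping gene lists, a two-sided Fisher exact test on a 2×2 table per GO class, and Holm's step-down procedure as the multiplicity correction (the caption does not name the correction used). All function names, class labels, and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two gene lists; columns are annotated / not annotated.
    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical data: genes annotated in each class for lists A and B.
size_a, size_b = 40, 50
annotated = {"class_1": (30, 12), "class_2": (8, 11), "class_3": (5, 20)}

raw = [fisher_exact_two_sided(a, size_a - a, b, size_b - b)
       for a, b in annotated.values()]
adjusted = holm_adjust(raw)
significant = [name for name, p in zip(annotated, adjusted) if p < 0.05]
```

The tables in this section list only the classes that remain significant after such a correction, which is why each shows a subset of the GO classes at the given level.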
Dominant vs recessive diseases.
| Description | GOID | p-value |
|---|---|---|
| biological regulation | GO:0065007 | 1.011094 × 10⁻¹³ |
| cell proliferation | GO:0008283 | 1.148909 × 10⁻¹⁰ |
| death | GO:0016265 | 4.620938 × 10⁻⁹ |
| developmental process | GO:0032502 | 9.242509 × 10⁻⁹ |
| growth | GO:0040007 | 3.916801 × 10⁻⁴ |
| immune system process | GO:0002376 | 1.032981 × 10⁻³ |
| locomotion | GO:0040011 | 3.610015 × 10⁻⁴ |
| metabolic process | GO:0008152 | 1.654187 × 10⁻⁴ |
| multi-organism process | GO:0051704 | 3.214156 × 10⁻² |
| multicellular organismal process | GO:0032501 | 2.839762 × 10⁻⁷ |
| negative regulation of biological process | GO:0048519 | 1.206870 × 10⁻¹⁶ |
| pigmentation | GO:0043473 | 3.834365 × 10⁻³ |
| positive regulation of biological process | GO:0048518 | 3.273178 × 10⁻¹³ |
| regulation of biological process | GO:0050789 | 2.141995 × 10⁻²¹ |
| signaling | GO:0023052 | 9.023421 × 10⁻¹⁴ |
| signaling process | GO:0023046 | 1.113202 × 10⁻¹⁰ |
Analysis at level 2.
Significant BP level 2 GO classes after a class-by-class analysis based on Fisher's test and correction for multiple testing.
Dominant vs recessive diseases.
| Description | GOID | p-value |
|---|---|---|
| negative regulation of transcription from RNA polymerase II promoter | GO:0000122 | 1.271933 × 10⁻² |
| negative regulation of transcription, DNA-dependent | GO:0045892 | 4.613832 × 10⁻³ |
| positive regulation of transcription from RNA polymerase II promoter | GO:0045944 | 3.114127 × 10⁻⁷ |
| positive regulation of transcription, DNA-dependent | GO:0045893 | 5.356749 × 10⁻⁷ |
| regulation of calcium ion transport | GO:0051924 | 4.333597 × 10⁻³ |
| regulation of transcription from RNA polymerase II promoter | GO:0006357 | 2.291754 × 10⁻⁹ |
| regulation of transcription, DNA-dependent | GO:0006355 | 9.753411 × 10⁻¹⁴ |
| transcription from RNA polymerase II promoter | GO:0006366 | 8.714318 × 10⁻¹¹ |
Analysis at level 10.
Significant BP level 10 GO classes after a class-by-class analysis based on Fisher's test and correction for multiple testing.
Comparison of two prostate cancer studies.
| | MF | BP | CC |
|---|---|---|---|
| squared Euclidean distance | 0.001028538 | 0.004627587 | 0.003136238 |
| p-value | 0.1108498 | 0.07159675 | 0.004018912 |
| 95% CI lower limit | -5.921965 × 10⁻⁵ | -0.0001544709 | 0.0004614338 |
| 95% CI upper limit | 2.116296 × 10⁻³ | 0.0094096442 | 0.0058110419 |
Global analysis at level 2. Results of performing a difference test between functional profiles produced from two studies of prostate cancer ([27] and [28]), at the second level of each ontology. The null hypothesis of equality of profiles can be rejected only for the CC ontology at a 5% significance level.
Comparison of two prostate cancer studies at level 2.
| Description | GOID | p-value |
|---|---|---|
| organelle | GO:0043226 | 0.03825311 |
| macromolecular complex | GO:0032991 | 0.04459121 |
Significant CC level 2 GO classes after a class-by-class analysis based on Fisher's test and correction for multiple testing.
Comparison of two prostate cancer studies at level 10.
| Description | GOID | p-value |
|---|---|---|
| cytosolic large ribosomal subunit | GO:0022625 | 0.0002794424 |
| cytosolic small ribosomal subunit | GO:0022627 | 0.0028788683 |
| large ribosomal subunit | GO:0015934 | 0.0027483186 |
| small ribosomal subunit | GO:0015935 | 0.0027483186 |
Significant CC level 10 GO classes after a class-by-class analysis based on Fisher's test and correction for multiple testing.
Simulation results.
| Onto. | | | | A and B gene lists | Testing procedure | | |
|---|---|---|---|---|---|---|---|
| Reference: [ | | | | | | | |
| MF | 88 | 69 | 52 | | Class-by-class | 0.0012 | 0.3903 |
| | | | | | Chi-square | 0.0334 | 1 |
| | | | | | New global | 0.0469 | 1 |
| | | | | | Additional signif. classes | 0.04585 | 0.697 |
| BP | 1602 | 372 | 328 | | Class-by-class | 0.002 | 1 |
| | | | | | Chi-square | 0.162 | 1 |
| | | | | | New global | 0.042 | 1 |
| | | | | | Additional signif. classes | 0.042 | 0 |
| CC | 298 | 305 | 336 | | Class-by-class | 0.0042 | 1 |
| | | | | | Chi-square | 0.0775 | 1 |
| | | | | | New global | 0.0389 | 1 |
| | | | | | Additional signif. classes | 0.0374 | 0 |
| References: [ | | | | | | | |
| MF | 88 | 110 | 99 | | Class-by-class | 0.0028 | 0.0729 |
| | | | | | Chi-square | 0.0341 | 0.998 |
| | | | | | New global | 0.0428 | 0.7281 |
| | | | | | Additional signif. classes | 0.0409 | 0.659 |
| BP | 1722 | 858 | 651 | | Class-by-class | 0.003 | 0.351 |
| | | | | | Chi-square | 0.152 | 1 |
| | | | | | New global | 0.056 | 0.997 |
| | | | | | Additional signif. classes | 0.055 | 0.646 |
| CC | 394 | 897 | 679 | | Class-by-class | 0.0076 | 0.9982 |
| | | | | | Chi-square | 0.0883 | 1 |
| | | | | | New global | 0.0625 | 0.9999 |
| | | | | | Additional signif. classes | 0.0599 | 0.0018 |
Probability of rejecting the null hypothesis of equality of profiles at a nominal 5% significance level in different scenarios associated with real case studies at level 10 in the GO. In the column "Testing procedure", "Class-by-class" stands for declaring global differences (i.e. rejecting the null hypothesis of profile equality) if at least one significant class is detected in a class-by-class analysis with correction for multiple testing; "Chi-square" stands for the classical chi-square test of homogeneity; "New global" stands for the global test presented in this paper; and "Additional signif. classes" stands for step 3 of the algorithm proposed in the Methods section, i.e. the proportion of simulation replicates in which additional significant classes were detected when the class-by-class analysis detected none.
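The "Class-by-class" decision rule and the reported rejection probabilities can be illustrated with a short sketch. It assumes each simulation replicate has already produced a list of multiplicity-adjusted class p-values; the function names and the p-values are hypothetical, and the sketch covers neither the chi-square test, the new global test, nor step 3 of the algorithm.

```python
def class_by_class_rejects(adjusted_p_values, alpha=0.05):
    """Global rejection under the "Class-by-class" rule: profile equality is
    rejected if at least one adjusted class p-value falls below alpha."""
    return any(p < alpha for p in adjusted_p_values)


def rejection_rate(decisions):
    """Proportion of simulation replicates in which the null was rejected."""
    decisions = list(decisions)
    if not decisions:
        raise ValueError("need at least one replicate")
    return sum(decisions) / len(decisions)


# Hypothetical adjusted p-values from four simulation replicates.
replicates = [
    [0.20, 0.61, 0.03],
    [0.90, 0.75, 0.44],
    [0.01, 0.02, 0.50],
    [0.30, 0.08, 0.12],
]
rate = rejection_rate(class_by_class_rejects(r) for r in replicates)
print(rate)  # 0.5
```

When the null hypothesis holds in the simulated scenario, this proportion estimates the type I error rate of the procedure; when it does not hold, the same proportion estimates its power.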