Eric Yang, Ioannis P Androulakis.
Abstract
BACKGROUND: One of the challenges in modeling the temporal progression of biological signals is dealing with the effect of noise and the limited number of replicates at each time point. Given the rising interest in using predictive mathematical models to describe the biological response of an organism, and in analyses such as clustering and gene ontology enrichment, it is important to determine whether the dynamic progression of the data has been accurately captured despite the limited number of replicates, so that one can have confidence that the results of the analysis capture the salient dynamic features.
Year: 2009 PMID: 19208252 PMCID: PMC2653486 DOI: 10.1186/1471-2105-10-55
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Given the limited number of replicates for a given experiment, it is difficult to determine whether sufficient replicates have been captured. Had three replicates rather than four replicates been used, the captured signal could have been significantly different. In this case, the sample with three replicates randomly drops one of the replicates in the.
Figure 2. The improvements observed over three different datasets when using the LOOCV quality assessment metric vs. EDGE. In all cases there is a consistent improvement with the LOOCV method, but advances in technology or an increase in the number of replicates will close the gap between the two methods, as evidenced by GDS972, a dataset measuring chronic infusion, run on the RAE230A array with 4 replicates per time point.
Tabulates the total number of ontologies that were found for a given number of clusters for each dataset, as well as the number of significantly enriched ontologies.
| | | EDGE Only | | LOOCV + EDGE | |
| Gene Set | Clusters | Significant Ontologies | All Ontologies | Significant Ontologies | All Ontologies |
| GDS253 | 2 | 42 | 1696 | 136 | 1049 |
| | 3 | 133 | 1696 | 202 | 1049 |
| | 4 | 183 | 1696 | 288 | 1049 |
| | 5 | 222 | 1696 | 304 | 1049 |
| | 6 | 261 | 1696 | 352 | 1049 |
| | 7 | 330 | 1696 | 392 | 1049 |
| | 8 | 441 | 1696 | 438 | 1049 |
| | 9 | 455 | 1696 | 491 | 1049 |
| | 10 | 470 | 1696 | 499 | 1049 |
| | 11 | 497 | 1696 | 517 | 1049 |
| | 12 | 537 | 1696 | 540 | 1049 |
| | 13 | 584 | 1696 | 544 | 1049 |
| | 14 | 598 | 1696 | 558 | 1049 |
| | 15 | 620 | 1696 | 585 | 1049 |
| | 16 | 642 | 1696 | 593 | 1049 |
| | 17 | 673 | 1696 | 612 | 1049 |
| | 18 | 721 | 1696 | 619 | 1049 |
| | 19 | 733 | 1696 | 632 | 1049 |
| GDS599 | 2 | 78 | 1292 | 96 | 140 |
| | 3 | 129 | 1292 | 109 | 140 |
| | 4 | 172 | 1292 | 117 | 140 |
| | 5 | 219 | 1292 | 121 | 140 |
| | 6 | 282 | 1292 | 121 | 140 |
| | 7 | 344 | 1292 | 122 | 140 |
| | 8 | 420 | 1292 | 128 | 140 |
| | 9 | 444 | 1292 | 128 | 140 |
| | 10 | 456 | 1292 | 131 | 140 |
| | 11 | 484 | 1292 | 131 | 140 |
| | 12 | 544 | 1292 | 131 | 140 |
| | 13 | 569 | 1292 | 131 | 140 |
| | 14 | 600 | 1292 | 132 | 140 |
| | 15 | 632 | 1292 | 133 | 140 |
| | 16 | 644 | 1292 | 133 | 140 |
| | 17 | 658 | 1292 | 133 | 140 |
| | 18 | 684 | 1292 | 133 | 140 |
| | 19 | 708 | 1292 | 134 | 140 |
| GDS972 | 2 | 189 | 2372 | 98 | 1696 |
| | 3 | 211 | 2372 | 133 | 1696 |
| | 4 | 230 | 2372 | 183 | 1696 |
| | 5 | 265 | 2372 | 222 | 1696 |
| | 6 | 310 | 2372 | 261 | 1696 |
| | 7 | 330 | 2372 | 330 | 1696 |
| | 8 | 374 | 2372 | 411 | 1696 |
| | 9 | 413 | 2372 | 455 | 1696 |
| | 10 | 450 | 2372 | 470 | 1696 |
| | 11 | 471 | 2372 | 497 | 1696 |
| | 12 | 521 | 2372 | 537 | 1696 |
| | 13 | 558 | 2372 | 584 | 1696 |
| | 14 | 582 | 2372 | 598 | 1696 |
| | 15 | 609 | 2372 | 620 | 1696 |
| | 16 | 656 | 2372 | 642 | 1696 |
| | 17 | 707 | 2372 | 673 | 1696 |
| | 18 | 714 | 2372 | 721 | 1696 |
| | 19 | 773 | 2372 | 733 | 1696 |
In all cases, after applying the LOOCV filter in addition to selection via EDGE, the total number of ontologies selected is lower, whereas the number of statistically significantly enriched ontologies remains relatively constant. Thus, the reduction in the total number of ontologies appears to be due primarily to the removal of genes that do not show significant co-functionality.
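One way to read the table is to compare, for each dataset and method, the share of all ontologies that are significantly enriched. The short sketch below does this for the 2-cluster rows only. The helper name `significant_fraction` and the choice of this ratio as a summary are illustrative assumptions, not a metric defined by the authors; the counts are copied from the table above.

```python
# (significant, all) ontology counts at 2 clusters, copied from the table above.
counts_at_two_clusters = {
    "GDS253": {"EDGE only": (42, 1696), "LOOCV + EDGE": (136, 1049)},
    "GDS599": {"EDGE only": (78, 1292), "LOOCV + EDGE": (96, 140)},
    "GDS972": {"EDGE only": (189, 2372), "LOOCV + EDGE": (98, 1696)},
}


def significant_fraction(significant, total):
    """Share of all selected ontologies that are significantly enriched.

    Hypothetical helper for illustration; not part of the original analysis.
    """
    return significant / total


for dataset, methods in counts_at_two_clusters.items():
    for method, (significant, total) in methods.items():
        fraction = significant_fraction(significant, total)
        print(f"{dataset:7s} {method:13s} {fraction:.3f}")
```

The same loop can be extended to the other cluster counts by adding the corresponding rows from the table.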
Figure 3. For the GDS523 dataset associated with an acute administration of corticosteroids, the early time points are associated with rapid dynamics which, due to the greater number of samples, may adversely affect the correlation coefficient, despite the fact that the signal has been accurately measured for the majority of the experimental duration.
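The effect described in this caption can be illustrated with a small synthetic example. The sketch below compares an ordinary Pearson correlation, where every sample counts equally, with a variant in which each sample is weighted by the span of time it represents. The sampling times, the two signals, and the time-weighted variant are all assumptions made for illustration; the source does not state that the authors used a weighted correlation.

```python
import numpy as np


def interval_weights(times):
    """Weight each sample by half the gap to each neighbouring time point."""
    times = np.asarray(times, dtype=float)
    gaps = np.diff(times)
    weights = np.zeros_like(times)
    weights[:-1] += gaps / 2
    weights[1:] += gaps / 2
    return weights / weights.sum()


def weighted_pearson(x, y, weights):
    """Pearson correlation with per-sample weights that sum to 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x = np.sum(weights * x)
    mean_y = np.sum(weights * y)
    cov = np.sum(weights * (x - mean_x) * (y - mean_y))
    var_x = np.sum(weights * (x - mean_x) ** 2)
    var_y = np.sum(weights * (y - mean_y) ** 2)
    return cov / np.sqrt(var_x * var_y)


# Hypothetical sampling times in hours: dense early, sparse late.
times = np.array([0, 1, 2, 3, 4, 24, 48, 72], dtype=float)
# Two synthetic signals that disagree only at the early time points 1-3 h.
signal_a = np.array([0, 4, 1, 3, 2, 2, 1, 0], dtype=float)
signal_b = np.array([0, 1, 3, 4, 2, 2, 1, 0], dtype=float)

uniform = np.full(len(times), 1.0 / len(times))
r_per_sample = weighted_pearson(signal_a, signal_b, uniform)
r_per_time = weighted_pearson(signal_a, signal_b, interval_weights(times))
print(f"per-sample correlation: {r_per_sample:.3f}")
print(f"time-weighted correlation: {r_per_time:.3f}")
```

With uniform weights the function reduces to the usual Pearson coefficient, so the difference between the two printed values comes only from how much influence the densely sampled early window is given.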
Enriched ontologies that were not present in the original EDGE selection, but appeared when examining the subset that did not pass LOOCV after EDGE, and the subset that passed both LOOCV and EDGE.
| a | |
| GDS599 | |
| Rejected by LOOCV | Accepted by LOOCV |
| actin polymerization or depolymerization | cellular carbohydrate metabolic process |
| amino acid biosynthetic process | cellular alcohol metabolic process |
| coenzyme biosynthetic process | response to external stimulus |
| ER to Golgi vesicle-mediated transport | response to stress |
| fat-soluble vitamin metabolic process | response to stimulus |
| nucleobase, nucleoside, nucleotide and nucleic acid transport | generation of precursor metabolites and energy |
| | defense response |
| | inflammatory response |
| | response to wounding |
| | acute inflammatory response |
| | acute-phase response |
| b | |
| GDS253 | |
| Rejected by LOOCV | Accepted by LOOCV |
| blood vessel remodeling | acute-phase response |
| cholesterol transport | alcohol biosynthetic process |
| endothelial cell proliferation | coenzyme biosynthetic process |
| keratinocyte differentiation | cofactor biosynthetic process |
| positive regulation of epithelial cell proliferation | DNA damage response, signal transduction |
| regulation of heart contraction | gluconeogenesis |
| regulation of muscle contraction | hexose biosynthetic process |
| response to hydrogen peroxide | monosaccharide biosynthetic process |
| sterol transport | purine ribonucleotide biosynthetic process |
| | regulation of circadian rhythm |
| | ribonucleotide biosynthetic process |
| | translational initiation |
| c | |
| GDS972 | |
| Rejected by LOOCV | Accepted by LOOCV |
| ameboidal cell migration | activation of immune response |
| base-excision repair | activation of plasma proteins during acute inflammatory response |
| DNA-dependent DNA replication | aging |
| ER to Golgi vesicle-mediated transport | alcohol catabolic process |
| fatty acid beta-oxidation | ATP synthesis coupled electron transport |
| fatty acid oxidation | B cell mediated immunity |
| germ cell migration | bile acid metabolic process |
| Golgi vesicle transport | carbohydrate catabolic process |
| I-kappaB kinase/NF-kappaB cascade | cellular aromatic compound metabolic process |
| modification-dependent macromolecule catabolic process | cellular carbohydrate catabolic process |
| modification-dependent protein catabolic process | cofactor biosynthetic process |
| proteasomal ubiquitin-dependent protein catabolic process | complement activation |
| protein amino acid N-linked glycosylation | complement activation, classical pathway |
| regulation of cellular biosynthetic process | DNA damage response, signal transduction |
| regulation of DNA replication | DNA damage response, signal transduction by p53 class mediator |
| regulation of protein import into nucleus | gas transport |
| ubiquitin-dependent protein catabolic process | glucose catabolic process |
| | glutamine family amino acid catabolic process |
| | glycolysis |
| | heterocycle metabolic process |
| | hexose catabolic process |
| | humoral immune response mediated by circulating immunoglobulin |
| | immunoglobulin mediated immune response |
| | lipid biosynthetic process |
| | lymphocyte mediated immunity |
| | mitochondrial ATP synthesis coupled electron transport |
| | monosaccharide catabolic process |
| | oxidative phosphorylation |
| | protein targeting to mitochondrion |
| | response to toxin |
| | S-adenosylhomocysteine metabolic process |
| | steroid biosynthetic process |
| | sulfur compound biosynthetic process |
It is evident that in all cases the subset that passed both LOOCV and EDGE introduced more additional ontologies than the subset rejected by LOOCV. Furthermore, by rejecting a population of genes, we are better able to see evidence of biological processes associated with inflammation, the immune response, metabolism, and injury (marked in red in the original table), which are hallmarks of our experiments associated with the anti-inflammatory effects of corticosteroids or the response to a significant burn injury.
Figure 4. The method for generating all of the different sub-sampled signals. For each time point, the maximum or the minimum replicate is randomly removed. The ensemble average of the remaining replicates is then taken to form a sub-sampled signal. A population of these sub-sampled signals is then generated for a similarity comparison.
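The sub-sampling procedure in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the input is assumed to be an array of shape (time points, replicates), and the similarity comparison is assumed here to be the mean pairwise Pearson correlation across the population.

```python
import numpy as np


def subsample_signal(replicates, rng):
    """Drop the max or the min replicate at each time point, then average.

    replicates: array of shape (n_timepoints, n_replicates).
    """
    replicates = np.asarray(replicates, dtype=float)
    n_time, n_rep = replicates.shape
    # Coin flip per time point: True drops the maximum, False the minimum.
    drop_max = rng.integers(0, 2, size=n_time).astype(bool)
    dropped = np.where(drop_max, replicates.max(axis=1), replicates.min(axis=1))
    # Average of the remaining n_rep - 1 replicates at each time point.
    return (replicates.sum(axis=1) - dropped) / (n_rep - 1)


def subsample_population(replicates, n_signals=100, seed=0):
    """Generate a population of sub-sampled signals (one per row)."""
    rng = np.random.default_rng(seed)
    return np.stack([subsample_signal(replicates, rng) for _ in range(n_signals)])


def mean_pairwise_correlation(population):
    """Assumed similarity score: average Pearson correlation over all pairs."""
    corr = np.corrcoef(population)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper].mean()


# Synthetic example: 10 time points, 4 replicates, a trend plus noise.
demo_rng = np.random.default_rng(1)
trend = np.linspace(0.0, 5.0, 10)[:, None]
demo_replicates = trend + demo_rng.normal(scale=0.5, size=(10, 4))
demo_population = subsample_population(demo_replicates, n_signals=50)
print(mean_pairwise_correlation(demo_population))
```

A score near 1 would indicate that removing one extreme replicate per time point barely changes the shape of the signal; how such a score is thresholded is left open here.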