Duanchen Sun, Yinliang Liu, Xiang-Sun Zhang, Ling-Yun Wu.
Abstract
BACKGROUND: High-throughput experimental techniques have improved dramatically and have been widely applied over the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, remains a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified by current functional enrichment analysis tools often include direct parent or descendant terms in the GO hierarchical structure. Such highly redundant terms make it difficult for users to analyze the underlying biological processes.
Keywords: Complex diseases; Enrichment analysis; Gene ontology; Integer programming; Network-based probabilistic generative model
Year: 2017 PMID: 28950861 PMCID: PMC5615262 DOI: 10.1186/s12918-017-0456-7
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 The tripartite graph for an intuitive interpretation of the generative process. Rectangular nodes represent the GO terms, and rounded nodes represent all genes annotated in one species. An edge is introduced between a GO term and a gene node if and only if the gene is annotated by that term. The edges between two gene nodes are the corresponding edges in a biological network. The different line types represent different connection types: active-active, active-inactive, inactive-inactive
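The graph described in the caption can be illustrated with a minimal Python sketch. The function names, the toy annotations and the toy network below are hypothetical and serve only to show how term-gene edges follow from annotations and how gene-gene edges fall into the three connection types.

```python
from collections import Counter

def annotation_edges(annotations):
    """Return one (term, gene) edge per annotation, as in the caption."""
    return [(term, gene)
            for term, genes in sorted(annotations.items())
            for gene in sorted(genes)]

def classify_gene_edges(network_edges, active_genes):
    """Count gene-gene edges by the activity state of their two endpoints."""
    active = set(active_genes)
    labels = ("inactive-inactive", "active-inactive", "active-active")
    counts = Counter()
    for u, v in network_edges:
        n_active = (u in active) + (v in active)  # 0, 1 or 2 active endpoints
        counts[labels[n_active]] += 1
    return counts

# Toy data (hypothetical): two GO terms, four genes, three network edges.
annotations = {"GO:A": {"g1", "g2"}, "GO:B": {"g2", "g3"}}
network_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
active_genes = {"g1", "g2"}

term_gene_edges = annotation_edges(annotations)
edge_types = classify_gene_edges(network_edges, active_genes)
```

In this toy case each of the three connection types occurs exactly once.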
Fig. 2 The workflow of NetGen. The active gene list G is the model input. We want to identify the most enriched GO term set C, which provides a reasonable biological explanation of G, as the final output. A greedy heuristic algorithm was used to maximize the log-likelihood function
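A greedy search of this kind can be sketched as follows. The log-likelihood used here is an assumed, simplified stand-in and not the NetGen objective, which is not given in this record: genes annotated by a selected term are taken to be active with probability p1, their network neighbours with probability p2, and all other genes with probability q. The function names and the toy data are hypothetical.

```python
import math

def log_likelihood(selected, annotations, neighbors, all_genes, active, p1, p2, q):
    """Assumed stand-in objective: each gene is active with p1, p2 or q."""
    covered = set().union(*(annotations[t] for t in selected))
    adjacent = set()
    for g in covered:
        adjacent |= neighbors.get(g, set())
    adjacent -= covered
    total = 0.0
    for g in all_genes:
        prob = p1 if g in covered else p2 if g in adjacent else q
        total += math.log(prob if g in active else 1.0 - prob)
    return total

def greedy_select(annotations, neighbors, all_genes, active, p1=0.5, p2=0.05, q=0.001):
    """Add the term that most improves the objective until none improves it."""
    args = (annotations, neighbors, all_genes, active, p1, p2, q)
    selected = set()
    best = log_likelihood(selected, *args)
    while True:
        candidates = [(log_likelihood(selected | {t}, *args), t)
                      for t in sorted(annotations) if t not in selected]
        if not candidates:
            break
        score, term = max(candidates)
        if score <= best:  # no candidate improves the objective
            break
        selected.add(term)
        best = score
    return selected, best

# Toy data (hypothetical): six genes, three terms, one network edge g2-g3.
all_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
annotations = {"T1": {"g1", "g2"}, "T2": {"g3", "g4"}, "T3": {"g1", "g2", "g5"}}
neighbors = {"g2": {"g3"}, "g3": {"g2"}}
active = {"g1", "g2"}

chosen, score = greedy_select(annotations, neighbors, all_genes, active)
```

On the toy data the search keeps only T1: adding T3 would cover the inactive gene g5 and adding T2 would cover two inactive genes, and both lower the stand-in objective.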
Fig. 3 The performance of NetGen and alternative methods on the biological process (BP) domain. Each panel corresponds to one setting of the generating parameters. The performance of NetGen, GenGO and Fisher’s exact test is shown in red, blue and orange, respectively. The active gene lists were simulated under the assumption of NetGen
The enrichment analysis result of NetGen on lung cancer dataset
| Rank | GO ID | Description | Fisher’s exact test p-value |
|---|---|---|---|
| 1 | GO:0006491 | N-glycan processing | 1.32e-3 |
| 2 | GO:0006662 | glycerol ether metabolic process | 1.93e-3 |
| 3 | GO:0006175 | dATP biosynthetic process | 5.54e-3 |
| 4 | GO:0060149 | negative regulation of posttranscriptional gene silencing | 5.54e-3 |
| 5 | GO:0006043 | glucosamine catabolic process | 5.54e-3 |
| 6 | GO:0035772 | interleukin-13-mediated signaling pathway | 5.54e-3 |
| 7 |  | negative regulation of steroid hormone secretion | 1.11e-2 |
| 8 | GO:0021648 | vestibulocochlear nerve morphogenesis | 1.11e-2 |
| 9 | GO:0072318 | clathrin coat disassembly | 1.11e-2 |
| 10 | GO:1900748 | positive regulation of vascular endothelial growth factor signaling pathway | 1.65e-2 |
| 11 | GO:0015014 | heparan sulfate proteoglycan biosynthetic process, polysaccharide chain biosynthetic process | 1.65e-2 |
| 12 | GO:0000710 | meiotic mismatch repair | 1.65e-2 |
| 13 | GO:0030070 | insulin processing | 1.65e-2 |
| 14 |  | ureter urothelium development | 1.65e-2 |
| 15 | GO:0034154 | toll-like receptor 7 signaling pathway | 2.20e-2 |
| 16 |  | negative regulation of female gonad development | 1 |
| 17 |  | synaptic vesicle priming | 1 |
| 18 |  | positive regulation of clathrin-dependent endocytosis | 1 |
| 19 |  | tRNA wobble position uridine thiolation | 1 |
| 20 |  | angiogenesis involved in coronary vascular morphogenesis | 1 |
Best parameter setting: p1=0.5, p2=0.05, q=0.001. Term combination p=1.50e-24
An appropriate parameter combination identified via a mixed parameter selection strategy was shown at the bottom of the table. The Fisher’s exact test p-values for single term and term combination were listed. The GO terms in bold were particularly identified by NetGen
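As an illustration of how such p-values can be obtained, the sketch below computes a one-sided Fisher’s exact test for enrichment as a hypergeometric upper tail. Treating a term combination as the union of the genes annotated by its terms is an assumption made here for illustration; the function names and the toy data are hypothetical.

```python
from math import comb

def fisher_enrichment_p(n_universe, n_active, n_annotated, n_overlap):
    """One-sided p-value: P(X >= n_overlap) for a hypergeometric X."""
    upper = min(n_active, n_annotated)
    tail = sum(comb(n_annotated, k) * comb(n_universe - n_annotated, n_active - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_universe, n_active)

def term_set_p(terms, annotations, active, universe):
    """p-value for one term or, by assumption, for the union of several terms."""
    annotated = set().union(*(annotations[t] for t in terms)) & universe
    active_in_universe = active & universe
    return fisher_enrichment_p(len(universe), len(active_in_universe),
                               len(annotated), len(annotated & active_in_universe))

# Toy data (hypothetical): ten genes, three of them active.
universe = {f"g{i}" for i in range(1, 11)}
annotations = {"T1": {"g1", "g2", "g3", "g4"}, "T2": {"g5"}}
active = {"g1", "g2", "g3"}

p_single = term_set_p(["T1"], annotations, active, universe)
p_combination = term_set_p(["T1", "T2"], annotations, active, universe)
p_no_overlap = term_set_p(["T2"], annotations, active, universe)
```

Under this formulation a term that shares no gene with the active list receives a p-value of exactly 1, because the upper tail then covers the whole distribution.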
The enrichment analysis result of NetGen on ulcerative colitis dataset
| Rank | GO ID | Description | Fisher’s exact test p-value |
|---|---|---|---|
| 1 |  | positive regulation of transcription elongation from RNA polymerase II promoter | 1.10e-3 |
| 2 | GO:0018874 | benzoate metabolic process | 3.83e-3 |
| 3 | GO:0010900 | negative regulation of phosphatidylcholine catabolic process | 3.83e-3 |
| 4 | GO:1900402 | regulation of carbohydrate metabolic process by regulation of transcription from RNA polymerase II promoter | 3.83e-3 |
| 5 | GO:0006294 | nucleotide-excision repair, preincision complex assembly | 3.83e-3 |
| 6 | GO:0038193 | thromboxane A2 signaling pathway | 3.83e-3 |
| 7 | GO:0031119 | tRNA pseudouridine synthesis | 7.65e-3 |
| 8 | GO:0007439 | ectodermal digestive tract development | 7.65e-3 |
| 9 | GO:0009240 | isopentenyl diphosphate biosynthetic process | 1.15e-2 |
| 10 | GO:0045196 | establishment or maintenance of neuroblast polarity | 1.15e-2 |
| 11 | GO:0006154 | adenosine catabolic process | 1.15e-2 |
| 12 | GO:0002254 | kinin cascade | 1.15e-2 |
| 13 | GO:2000681 | negative regulation of rubidium ion transport | 1.15e-2 |
| 14 |  | hematopoietic stem cell migration | 1.52e-2 |
| 15 | GO:0008612 | peptidyl-lysine modification to peptidyl-hypusine | 1.52e-2 |
| 16 |  | negative regulation of hydrogen peroxide-mediated programmed cell death | 1 |
| 17 |  | regulation of high voltage-gated calcium channel activity | 1 |
| 18 |  | positive regulation of GTPase activity | 1 |
| Best parameter setting: | Term combination | ||
a GO:0032850 updated to alternate term GO:0043547
An appropriate parameter combination identified via a mixed parameter selection strategy was shown at the bottom of the table. The Fisher’s exact test p-values for single term and term combination were listed. The GO terms in bold were particularly identified by NetGen
The enrichment analysis result of NetGen on cervical carcinogenesis dataset
| Rank | GO ID | Description | Fisher’s exact test p-value |
|---|---|---|---|
| 1 | GO:0006271 | DNA strand elongation involved in DNA replication | 3.42e-11 |
| 2 |  | regulation of spindle organization | 1.78e-3 |
| 3 | GO:0001927 | exocyst assembly | 6.43e-3 |
| 4 | GO:0038016 | insulin receptor internalization | 6.43e-3 |
| 5 |  | intralumenal vesicle formation | 6.43e-3 |
| 6 | GO:0086042 | cardiac muscle cell-cardiac muscle cell adhesion | 6.43e-3 |
| 7 |  | regulation of muscle hyperplasia | 1.28e-2 |
| 8 | GO:2000393 | negative regulation of lamellipodium morphogenesis | 1.28e-2 |
| 9 |  | regulation of ubiquitin homeostasis | 1.28e-2 |
| 10 | GO:0006050 | mannosamine metabolic process | 1.28e-2 |
| 11 |  | regulation of mitotic centrosome separation | 1.92e-2 |
| 12 | GO:0072708 | response to sorbitol | 1.92e-2 |
| 13 | GO:0001992 | regulation of systemic arterial blood pressure by vasopressin | 1.92e-2 |
| 14 | GO:1902498 | regulation of protein autoubiquitination | 1.92e-2 |
| 15 | GO:0048388 | endosomal lumen acidification | 1.92e-2 |
| 16 |  | vesicle fusion with Golgi apparatus | 2.55e-2 |
| 17 | GO:0097264 | self proteolysis | 3.18e-2 |
| 18 | GO:0045329 | carnitine biosynthetic process | 3.18e-2 |
| 19 |  | kinetochore assembly | 7.45e-2 |
| 20 |  | karyogamy | 1 |
| 21 |  | regulation of apolipoprotein binding | 1 |
| 22 |  | negative regulation of cellular pH reduction | 1 |
| 23 |  | endoplasmic reticulum membrane organization | 1 |
Best parameter setting: p1=0.5, p2=0.05, q=0.001. Term combination p=1.10e-37
An appropriate parameter combination identified via a mixed parameter selection strategy was shown at the bottom of the table. The Fisher’s exact test p-values for single term and term combination were listed. The GO terms in bold were particularly identified by NetGen
The enrichment analysis result of NetGen on renal cell carcinoma dataset
| Rank | GO ID | Description | Fisher’s exact test p-value |
|---|---|---|---|
| 1 | GO:0090259 | regulation of retinal ganglion cell axon guidance | 7.56e-7 |
| 2 |  | transferrin transport | 1.05e-6 |
| 3 | GO:0072017 | distal tubule development | 3.03e-5 |
| 4 | GO:2000054 | negative regulation of Wnt signaling pathway involved in dorsal/ventral axis specification | 3.34e-5 |
| 5 |  | glomerular visceral epithelial cell development | 1.17e-3 |
| 6 | GO:2000287 | positive regulation of myotome development | 5.81e-3 |
| 7 | GO:0006113 | fermentation | 5.81e-3 |
| 8 | GO:0051460 | negative regulation of corticotropin secretion | 5.81e-3 |
| 9 | GO:0060720 | spongiotrophoblast cell proliferation | 5.81e-3 |
| 10 | GO:0043438 | acetoacetic acid metabolic process | 5.81e-3 |
| 11 | GO:0032972 | regulation of muscle filament sliding speed | 5.81e-3 |
| 12 |  | negative regulation of protein kinase C signaling | 1.16e-2 |
| 13 | GO:0035425 | autocrine signaling | 1.16e-2 |
| 14 | GO:0010760 | negative regulation of macrophage chemotaxis | 1.16e-2 |
| 15 |  | positive regulation of dopamine receptor signaling pathway | 1.73e-2 |
| 16 | GO:0097411 | hypoxia-inducible factor-1alpha signaling pathway | 1.73e-2 |
| 17 | GO:0060435 | bronchiole development | 1.73e-2 |
| 18 |  | amino acid neurotransmitter reuptake | 1.73e-2 |
| 19 | GO:0046598 | positive regulation of viral entry into host cell | 2.31e-2 |
| 20 | GO:0015015 | heparan sulfate proteoglycan biosynthetic process, enzymatic modification | 2.87e-2 |
| 21 | GO:0006572 | tyrosine catabolic process | 2.87e-2 |
| 22 | GO:0019532 | oxalate transport | 2.87e-2 |
| 23 |  | mesonephric tubule morphogenesis | 3.44e-2 |
| 24 |  | glucose 6-phosphate metabolic process | 5.12e-2 |
| 25 |  | Factor XII activation | 1 |
| 26 |  | negative regulation of sodium ion transport | 1 |
| 27 |  | positive regulation of skeletal muscle cell proliferation | 1 |
Best parameter setting: p1=0.5, p2=0.05, q=0.001. Term combination p=8.68e-50
An appropriate parameter combination identified via a mixed parameter selection strategy was shown at the bottom of the table. The Fisher’s exact test p-values for single term and term combination were listed. The GO terms in bold were particularly identified by NetGen
Fig. 4 Comparison of the averaged semantic similarity score in the identified term set. The light green distribution represents the semantic similarity score at the random level. The blue, orange and red bars represent NetGen, GenGO and Fisher’s exact test, respectively. The semantic similarity score was computed using the GOSemSim package [24] in R
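The comparison in the caption can be sketched in Python as an average over all term pairs, set against a distribution from randomly drawn term sets of the same size. The scores in the figure come from GOSemSim in R; the Jaccard similarity over toy ancestor sets used below is only a stand-in, and all names and data are hypothetical.

```python
import random
from itertools import combinations

def average_pairwise_similarity(terms, sim):
    """Mean similarity over all unordered pairs of terms in the set."""
    pairs = list(combinations(sorted(terms), 2))
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def random_baseline(all_terms, set_size, sim, n_samples=1000, seed=0):
    """Averaged scores of random term sets of the same size (the random level)."""
    rng = random.Random(seed)
    pool = sorted(all_terms)
    return [average_pairwise_similarity(rng.sample(pool, set_size), sim)
            for _ in range(n_samples)]

# Toy ancestor sets (hypothetical), used by the stand-in similarity.
ancestors = {"t1": {"root", "a"}, "t2": {"root", "a", "b"}, "t3": {"root", "c"}}

def jaccard(a, b):
    """Stand-in similarity: Jaccard index of the terms' ancestor sets."""
    x = ancestors[a] | {a}
    y = ancestors[b] | {b}
    return len(x & y) / len(x | y)

observed = average_pairwise_similarity(["t1", "t2", "t3"], jaccard)
baseline = random_baseline(ancestors, 2, jaccard, n_samples=5)
```

A lower observed score than the bulk of the baseline would indicate a less redundant term set under the chosen similarity measure.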