Zuoshuang Xiang, Tingting Qin, Zhaohui S Qin, Yongqun He.
Abstract
BACKGROUND: The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level.
Year: 2013 PMID: 24555475 PMCID: PMC3852244 DOI: 10.1186/1752-0509-7-S3-S9
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. The GenoMesh algorithm.
Figure 2. ROC curve comparison of different methods for MeSH term weighting and gene-to-gene dissimilarity calculations.
Analysis of the relationships between E. coli hfq, dsrA, and cpxR genes
| # | MeSH ID | MeSH term | hfq | dsrA | cpxR |
|---|---|---|---|---|---|
| 1 | D035001 | host factor 1 protein | 170 | 28 | 0 |
| 2 | D015964 | Gene Expression Regulation, Bacterial | 121 | 36 | 49 |
| 3 | D012333 | RNA, Messenger | 84 | 22 | 2 |
| 4 | D022661 | RNA, Untranslated | 55 | 38 | 0 |
| 5 | D012808 | Sigma Factor | 52 | 38 | 11 |
| 6 | D011485 | Protein binding | 39 | 12 | 4 |
| 7 | D014176 | Protein Biosynthesis | 37 | 16 | 1 |
| 8 | D016601 | RNA-binding Proteins | 34 | 3 | 0 |
| 9 | D014158 | Transcription, Genetic | 33 | 11 | 19 |
| 10 | D001425 | Bacterial Outer Membrane Proteins | 31 | 11 | 20 |
| 11 | D014157 | Transcription Factors | 24 | 6 | 13 |
| 12 | D004268 | DNA-Binding Proteins | 22 | 10 | 3 |
| 13 | D018832 | Molecular Chaperones | 17 | 3 | 24 |
| 14 | D015536 | Down-Regulation | 11 | 0 | 1 |
| 15 | D012270 | Ribosomes | 9 | 3 | 0 |
| 16 | D006360 | Heat-Shock Proteins | 7 | 1 | 16 |
| 17 | D033903 | Periplasmic Proteins | 1 | 0 | 13 |
GenoMesh results: hfq vs dsrA: dissimilarity 0.0845; p-value 0.0003; co-published papers 39.
hfq vs cpxR: dissimilarity 0.2901; p-value 0.0215; co-published papers 0.
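To make the idea of a gene-to-gene dissimilarity concrete, the sketch below computes a plain cosine dissimilarity between the per-gene MeSH profiles in the table above. It assumes the three numeric columns correspond to hfq, dsrA, and cpxR in the order the caption lists them. Cosine dissimilarity on raw values is a stand-in chosen for illustration. It is not the GenoMesh term weighting or scoring, so it is not expected to reproduce the reported values of 0.0845 and 0.2901.

```python
import math

# Values copied from the table above, rows 1-17, in row order.
# Assumption: the three numeric columns are hfq, dsrA, cpxR (caption order).
hfq = [170, 121, 84, 55, 52, 39, 37, 34, 33, 31, 24, 22, 17, 11, 9, 7, 1]
dsrA = [28, 36, 22, 38, 38, 12, 16, 3, 11, 11, 6, 10, 3, 0, 3, 1, 0]
cpxR = [0, 49, 2, 0, 11, 4, 1, 0, 19, 20, 13, 3, 24, 1, 0, 16, 13]


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity of two equal-length MeSH profiles.

    Illustrative stand-in only; not the GenoMesh dissimilarity.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no shared information if either profile is empty
    return 1.0 - dot / (norm_u * norm_v)


d_hfq_dsrA = cosine_dissimilarity(hfq, dsrA)
d_hfq_cpxR = cosine_dissimilarity(hfq, cpxR)
```

Even with this simplified measure, the hfq/dsrA pair comes out less dissimilar than the hfq/cpxR pair, which matches the ordering of the reported GenoMesh scores.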
Top five selected E. coli gene pairs predicted using literature data before 2004 and verified by literature data afterwards.
| Index | Gene1 | Gene2 | Dissimilarity score | p-value | PMIDs | MeSH terms |
|---|---|---|---|---|---|---|
| 1 | | | 0.073 | 3.83E-05 | 15778224, 17660416, 18411271 | Polyisoprenyl Phosphates; Bacitracin; Phosphoric Monoester Hydrolases; Fosfomycin; Periplasm |
| 2 | | | 0.075 | 4.25E-05 | 15683249, 16645316, 16807239, 17489563 | Electron Transport Complex I; NADH Dehydrogenase; Iron-Sulfur Proteins; NADH, NADPH Oxidoreductases; Electron Spin Resonance Spectroscopy |
| 3 | | | 0.098 | 5.84E-05 | 15778224, 17660416, 18411271 | Polyisoprenyl Phosphates; Bacitracin; Fosfomycin; Phosphoric Monoester Hydrolases; Periplasm |
| 4 | | | 0.110 | 7.53E-05 | 17668201, 17938909, 18335216 | Hydrogenase; Hydrogen; Genetic Enhancement; Formate Dehydrogenases; Paraquat |
| 5 | | | 0.144 | 1.18E-04 | 17785472 | L-Serine Dehydratase; Serine; Amino Acid Transport Systems; Urinary Tract; Transcription Factors |
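The before/after validation described in the table caption can be sketched as a split of co-publications around a cutoff year. The sketch below is a minimal illustration, not the authors' pipeline. The function names, the toy gene names, the paper identifiers, and the rule that a "verified" pair has no co-publication before the cutoff and at least one afterwards are all assumptions made for this example.

```python
from itertools import combinations


def split_copublications(gene_to_papers, cutoff_year):
    """Count shared papers per gene pair before and from the cutoff year.

    gene_to_papers maps gene -> {paper_id: publication_year}.
    Returns {(gene1, gene2): (n_before, n_after)} with genes sorted by name.
    """
    counts = {}
    for g1, g2 in combinations(sorted(gene_to_papers), 2):
        shared = gene_to_papers[g1].keys() & gene_to_papers[g2].keys()
        before = sum(1 for p in shared if gene_to_papers[g1][p] < cutoff_year)
        counts[(g1, g2)] = (before, len(shared) - before)
    return counts


def verified_predictions(predicted_pairs, counts):
    """Keep predicted pairs with no early co-publication and one or more later."""
    return [p for p in predicted_pairs if counts[p][0] == 0 and counts[p][1] > 0]


# Hypothetical toy data: gene names, paper IDs, and years are invented.
toy = {
    "geneA": {101: 1999, 102: 2006},
    "geneB": {103: 2001, 102: 2006},
    "geneC": {101: 1999, 104: 2002},
}
counts = split_copublications(toy, cutoff_year=2004)
verified = verified_predictions([("geneA", "geneB"), ("geneB", "geneC")], counts)
```

In this toy case only the geneA/geneB pair is kept, because its single shared paper appears after the cutoff.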
Figure 3. Clusters of E. coli flagellar genes. (A) Thirty-two E. coli flagellar genes were clustered together; (B) Six E. coli flagellar genes were clustered together. The neighbour branch of the six-gene branch includes five E. coli genes.
Figure 4. A cluster of .
GenoMesh analysis of 31 E. coli pathways containing at least 10 genes.
| Index | Pathway name | # of genes | Average dissimilarity score | SD | Z value | *p-value |
|---|---|---|---|---|---|---|
| 1 | superpathway of chorismate | 50(61) | 0.077 | 0.134 | -10.98 | 0 |
| 2 | superpathway of histidine, purine, and pyrimidine biosynthesis | 42(58) | 0.080 | 0.117 | -10.67 | 2.91E-275 |
| 3 | superpathway of glycolysis, pyruvate dehydrogenase, TCA, and glyoxylate bypass | 35(45) | 0.074 | 0.140 | -8.39 | 3.19E-146 |
| 4 | aspartate superpathway | 26(29) | 0.080 | 0.133 | -8.06 | 2.03E-103 |
| 5 | respiration (anaerobic) | 24(30) | 0.086 | 0.170 | -8.57 | 1.87E-108 |
| 6 | respiration (anaerobic)-- electron donors reaction list | 21(31) | 0.209 | 0.260 | -25.72 | 0 |
| 7 | mixed acid fermentation | 21(28) | 0.102 | 0.171 | -10.32 | 5.00E-138 |
| 8 | superpathway of glyoxylate bypass and TCA | 21(24) | 0.123 | 0.190 | -11.86 | 9.88E-182 |
| 9 | superpathway of lysine, threonine, methionine, and S-adenosyl-L-methionine biosynthesis | 21(23) | 0.103 | 0.140 | -10.45 | 1.71E-141 |
| 10 | tRNA charging pathway | 21(23) | 0.073 | 0.107 | -6.21 | 2.18E-51 |
| 11 | superpathway of threonine metabolism | 20(26) | 0.133 | 0.208 | -14.37 | 8.72E-253 |
| 12 | superpathway of arginine and polyamine biosynthesis | 19(22) | 0.124 | 0.135 | -11.32 | 1.46E-152 |
| 13 | superpathway of phenylalanine, tyrosine, and tryptophan biosynthesis | 18(25) | 0.148 | 0.162 | -15.52 | 1.15E-269 |
| 14 | superpathway of leucine, valine, and isoleucine biosynthesis | 17(30) | 0.215 | 0.247 | -23.38 | 0 |
| 15 | aerobic respiration -- electron donors reaction list | 17(21) | 0.270 | 0.286 | -30.45 | 0 |
| 16 | TCA cycle | 17(20) | 0.143 | 0.209 | -14.37 | 2.47E-221 |
| 17 | respiration (anaerobic)-- electron acceptors reaction list | 16(25) | 0.194 | 0.212 | -20.18 | 0 |
| 18 | superpathway of lipopolysaccharide biosynthesis | 15(26) | 0.093 | 0.127 | -7.47 | 1.20E-54 |
| 19 | superpathway of glycolysis and Entner-Doudoroff | 15(22) | 0.114 | 0.126 | -9.92 | 8.82E-95 |
| 20 | superpathway of fatty acid biosynthesis | 12(24) | 0.223 | 0.221 | -19.90 | 0 |
| 21 | glycolysis I | 12(18) | 0.113 | 0.135 | -8.61 | 1.11E-59 |
| 22 | formylTHF biosynthesis I | 12(15) | 0.060 | 0.079 | -3.04 | 4.90E-09 |
| 23 | methionine and methyl-donor-molecule biosynthesis | 11(13) | 0.115 | 0.145 | -8.36 | 1.92E-52 |
| 24 | superpathway of sulfate assimilation and cysteine biosynthesis | 11(12) | 0.176 | 0.225 | -14.26 | 2.72E-148 |
| 25 | tetrahydrofolate biosynthesis I | 11(12) | 0.081 | 0.153 | -4.95 | 1.13E-19 |
| 26 | de novo biosynthesis of pyrimidine ribonucleotides | 11(12) | 0.119 | 0.142 | -8.67 | 3.51E-56 |
| 27 | peptidoglycan biosynthesis I | 11(11) | 0.294 | 0.225 | -25.91 | 0 |
| 28 | arginine biosynthesis I | 11(11) | 0.181 | 0.156 | -14.93 | 3.35E-162 |
| 29 | de novo biosynthesis of pyrimidine deoxyribonucleotides | 10(18) | 0.150 | 0.220 | -11.06 | 4.00E-83 |
| 30 | chorismate biosynthesis | 10(11) | 0.210 | 0.202 | -16.58 | 7.45E-184 |
| 31 | colanic acid building blocks biosynthesis | 10(11) | 0.114 | 0.135 | -7.78 | 3.94E-42 |
Note: All permutation p-values are <0.001. *A p-value of 0 means less than 1.00E-323.
Figure 5. Histogram analyses of average dissimilarity scores of random networks. The peaks and shapes of the curves are affected by the number of genes included in the random networks.
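A comparison of a pathway against random networks of the same size can be sketched as a permutation test on the average pairwise dissimilarity. The code below is a generic illustration under stated assumptions, not the procedure used to produce the Z values and p-values in the table above. The function names, the two-sided empirical p-value with a +1 correction, and the toy dissimilarity matrix are all choices made for this example.

```python
import random
import statistics
from itertools import combinations


def mean_pairwise(genes, dissim):
    """Average dissimilarity over all pairs in a gene set (keys are sorted tuples)."""
    pairs = list(combinations(sorted(genes), 2))
    return sum(dissim[p] for p in pairs) / len(pairs)


def random_network_test(pathway, all_genes, dissim, n_perm=1000, seed=0):
    """Compare a pathway's mean dissimilarity with same-sized random gene sets.

    Returns (observed mean, z-score against the random sets, empirical p-value).
    """
    rng = random.Random(seed)
    observed = mean_pairwise(pathway, dissim)
    null = [
        mean_pairwise(rng.sample(all_genes, len(pathway)), dissim)
        for _ in range(n_perm)
    ]
    mu = statistics.mean(null)
    sd = statistics.stdev(null)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    # Two-sided empirical p-value; the +1 terms avoid reporting exactly zero.
    extreme = sum(1 for m in null if abs(m - mu) >= abs(observed - mu))
    return observed, z, (extreme + 1) / (n_perm + 1)


# Hypothetical toy data: eight genes, three of which form a tight "pathway".
genes = [f"g{i}" for i in range(8)]
pathway = ["g0", "g1", "g2"]
dissim = {
    pair: (0.05 if set(pair) <= set(pathway) else 0.5)
    for pair in combinations(genes, 2)
}
observed, z, p = random_network_test(pathway, genes, dissim)
```

Sampling random sets with the same number of genes as the pathway matters because, as the caption notes, the shape of the random distribution depends on network size.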
Figure 6. Analysis of the term “Neutrophil Activation” from the GenoMesh MeSHBrowse website. Browsing the MeSH hierarchical tree from “Phenomena and Processes” → “Immune System Phenomena” → “Immune System Processes” → “Neutrophil Activation” shows that 23 E. coli genes are associated with the MeSH term “Neutrophil Activation”. The related genes and gene pairs are listed next to the hierarchical tree, and a network of these 23 E. coli genes is generated automatically (note: the network image is only generated if the number of genes is less than 100). Gray edges represent interactions and red edges represent predicted interactions. The GenoMesh annotation of the gene pair ytjC and yjhR is shown when a user moves the mouse cursor over the red line (edge) linking these two genes. Clicking this link opens a detailed analysis of the gene pair (not shown).
Five example homologous E. coli and Brucella genes and their associated genes
| Gene Name | Associated | Associated |
|---|---|---|
*Note: To be included as an associated gene of one of the five selected target genes, a gene must share at least one co-publication with the target gene, or the gene pair must have a p-value < 0.05 based on the GenoMesh dissimilarity calculation.
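The inclusion rule in the note can be written as a single predicate. This is a minimal sketch. The function name `is_associated` and its parameter names are hypothetical, and the first two checks reuse the hfq/dsrA and hfq/cpxR values reported earlier in this record.

```python
def is_associated(co_publications, p_value, alpha=0.05):
    """Apply the note's rule: one or more co-publications, or p-value below alpha."""
    return co_publications >= 1 or p_value < alpha


# Values reported above for hfq vs dsrA and hfq vs cpxR.
hfq_dsrA = is_associated(co_publications=39, p_value=0.0003)
hfq_cpxR = is_associated(co_publications=0, p_value=0.0215)
# Hypothetical pair with no co-publication and a non-significant p-value.
unrelated = is_associated(co_publications=0, p_value=0.2)
```

Under this rule the hfq/cpxR pair qualifies on its p-value alone, even though the two genes share no co-published papers.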