Junguk Hur, Zuoshuang Xiang, Eva L Feldman, Yongqun He.
Abstract
BACKGROUND: Vaccine literature indexing is poorly performed in PubMed due to the limited hierarchy of Medical Subject Headings (MeSH) annotation in the vaccine field. The Vaccine Ontology (VO) is a community-based biomedical ontology that represents various vaccines and their relations. SciMiner is an in-house literature mining system that supports literature indexing and gene name tagging. We hypothesize that applying VO in SciMiner will aid vaccine literature indexing and the mining of vaccine-gene interaction networks. As a test case, we examined vaccines for Brucella, the causative agent of brucellosis in humans and animals.
Year: 2011 PMID: 21871085 PMCID: PMC3180695 DOI: 10.1186/1471-2172-12-49
Source DB: PubMed Journal: BMC Immunol ISSN: 1471-2172 Impact factor: 3.615
Figure 1. Overall VO-SciMiner workflow.
Figure 2. (A) Asserted hierarchy; (B) Inferred hierarchy. These are Protégé screenshots of VO without (A) or with reasoning using HermiT 1.3.2 (B).
Performance of VO-SciMiner literature mining
| Testing set | Gold-Standard VO-Paper association | Identified by VO-SciMiner | True Identification | Recall | Precision | F-measure |
|---|---|---|---|---|---|---|
| Positive 50 | 89 | 81 | 81 | 91% | 100% | 95% |
| Negative 50 | 0 | 1 | 0 | | | |
| Total | 89 | 82 | 81 | 91% | 99% | 95% |
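The recall, precision and F-measure columns follow from the three count columns. The sketch below recomputes them, assuming the F-measure is the balanced harmonic mean of precision and recall (the table does not say which variant was used); the function name `prf` is illustrative only.

```python
def prf(gold: int, identified: int, true_pos: int) -> tuple[float, float, float]:
    """Return (recall, precision, F-measure) from raw association counts."""
    recall = true_pos / gold            # share of gold-standard associations found
    precision = true_pos / identified   # share of reported associations that are correct
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# Counts taken from the "Total" row of the table above.
recall, precision, f_measure = prf(gold=89, identified=82, true_pos=81)
print(f"recall={recall:.0%} precision={precision:.0%} F={f_measure:.0%}")
```

With the counts from the "Positive 50" row (89, 81, 81) the same function gives a precision of exactly 100%.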
VO-based indexing results using VO-SciMiner and PubMed (as of April 20, 2011)
| | | VO-SciMiner without child VOs | | VO-SciMiner with child VOs | |
|---|---|---|---|---|---|
| PubMed Search Keywords | PubMed | Total (only by | Common | Total (only by | Common |
| | 1,379 | 1,359 (0) | 1,359 | 2,155 (790) | 1,365 |
| Live attenuated | 74 | 85 (12) | 73 | 922 (849) | 73 |
| Live attenuated | 52 | 50 (1) | 49 | 736 (647) | 49 |
| Live attenuated | 36 | 37 (1) | 36 | 204 (168) | 36 |
| Live attenuated | 5 | 5 (0) | 5 | 23 (18) | 5 |
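The "Total", parenthesized and "Common" columns can be read as set operations on two lists of paper identifiers. The sketch below assumes the parenthesized number counts papers retrieved only by VO-SciMiner and not by the PubMed keyword search; the function name and the PMIDs are hypothetical and do not come from the study.

```python
def compare_hits(pubmed: set[str], vo_sciminer: set[str]) -> dict[str, int]:
    """Summarize the overlap between two sets of paper identifiers."""
    return {
        "pubmed": len(pubmed),
        "total": len(vo_sciminer),
        "only_by_vo_sciminer": len(vo_sciminer - pubmed),  # set difference
        "common": len(vo_sciminer & pubmed),               # set intersection
    }


# Hypothetical identifiers, for illustration only.
pubmed_hits = {"101", "102", "103"}
vo_hits = {"102", "103", "104", "105"}
summary = compare_hits(pubmed_hits, vo_hits)
print(summary)
```

Under this reading, "Total" minus the parenthesized count equals "Common", which is the relation seen in, for example, the 2,155 (790) and 1,365 entries.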
Figure 3. Comparison between the Entrez system and VO-based literature search approaches. An example of searching "live attenuated Brucella abortus vaccine" is illustrated.
Brucella genes in the gene-VO network
| Locus Tag | Gene symbol | Gene name | # of associated VO terms | # of papers - All VO terms (Live Attenuated vaccines) | Virulence factor or Protective antigen | Sub-cellular localization (PSORTb 3.0 score) |
|---|---|---|---|---|---|---|
| BMEI0296 | purE | phosphoribosylaminoimidazole carboxylase, catalytic subunit | 11 | 4 (4) | V | U (2.00) |
| BMEI0295 | purK | phosphoribosylaminoimidazole carboxylase ATPase subunit | 4 | 1 (1) | | CM (7.88) |
| BMEI2035 | bvrS | sensor histidine kinase BvrS, putative | 10 | 3 (1) | V | CM (10.00) |
| BMEI0190 | ptsP | phosphoenolpyruvate-protein phosphotransferase | 5 | 1 (1) | | C (9.97) |
| BMEI2036 | bvrR | DNA-binding response regulator BvrR, putative | 10 | 4 (2) | V | C (9.97) |
| BMEII0150 | fliC | flagellin family protein | 5 | 3 (0) | V | E (9.71) |
| BMEII0665 | rnr | exoribonuclease, VacB/RNase II family | 6 | 1 (1) | | C (9.97) |
| BMEII0427 | eryD | erythritol transcriptional regulator | 2 | 1 (0) | | C (8.96) |
| BMEII1116 | vjbR | transcriptional regulator, LuxR family | 10 | 3 (2) | V | U (2.00) |
| BMEI0749 | rpoB | DNA-directed RNA polymerase subunit beta | 6 | 5 (1) | | C (9.97) |
| BMEI1649 | ureG-1 | urease accessory protein UreG | 2 | 1 (0) | | C (9.97) |
| BMEI2036 | bvrR | DNA-binding response regulator BvrR, putative | 10 | 4 (2) | V | C (9.97) |
| BMEII0404 | leuB | 3-isopropylmalate dehydrogenase | 5 | 1 (1) | | C (9.97) |
| BMEI0101 | cysK | cysteine synthase A | 6 | 1 (1) | V | C (9.26) |
| BMEI1653 | ureB-1 | urease subunit beta | 3 | 2 (0) | | C (9.26) |
| BMEII0561 | gcvP | glycine dehydrogenase | 5 | 1 (0) | V | C (9.97) |
| BMEI1654 | ureA-1 | urease subunit gamma | 3 | 2 (0) | | C (9.26) |
| BMEII0193 | potA | ABC transporter | 6 | 3 (2) | | C (9.12) |
| BMEII1054 | hisG | ATP phosphoribosyltransferase catalytic subunit | 2 | 1 (0) | | C (9.97) |
| BMEII0205 | dppF | ABC transporter | 6 | 3 (2) | | CM (7.88) |
| BMEI1324 | pepN | aminopeptidase N | 5 | 1 (1) | V | C (9.12) |
| BMEII0407 | asd | aspartate-semialdehyde dehydrogenase | 6 | 1 (0) | | C (9.97) |
| BMEI1652 | ureC-1 | urease subunit alpha | 3 | 3 (0) | | C (9.97) |
| BMEI0933 | cysK | cysteine synthase A | 6 | 1 (1) | V | C (9.97) |
| BMEI1111 | acpXL | acyl carrier protein | 5 | 2 (0) | | C (9.97) |
| BMEI1475 | acpP | acyl carrier protein | 5 | 1 (0) | | C (9.26) |
| BMEI0546 | pncA | pyrazinamidase/nicotinamidase | 2 | 1 (0) | V | C (9.97) |
| BMEI1829 | ropB | outer membrane protein, putative | 23 | 37 (17) | | OM (10.00) |
| BMEI1237 | galE | epimerase/dehydratase family protein, putative | 6 | 3 (1) | V | C (8.96) |
| BMEI1249 | omp25 | outer-membrane protein Omp25 | 11 | 17 (4) | V, P | OM (10.00) |
| BMEI0402 | omp31-1 | outer membrane protein Omp31 | 23 | 22 (9) | P | OM (10.00) |
| BMEII0844 | omp31-2 | outer membrane protein, 31 kDa | 19 | 17 (7) | | OM (10.00) |
| BMEI1413 | gmd | GDP-mannose 4,6-dehydratase | 5 | 2 (0) | V | C (9.97) |
| BMEI0997 | wbdA | glycosyl transferase, group 1 family protein | 16 | 3 (1) | V | U (2.00) |
| BMEI1416 | rfbE | O-antigen export system ATP-binding protein RfbE | 6 | 3 (2) | | CM (7.88) |
| BMEII0847 | wbjE | putative glycosyltransferase | 5 | 1 (0) | | U (2.00) |
| BMEI1393 | wbpZ | glycosyl transferase, group 1 family protein | 5 | 1 (0) | V | C (9.26) |
| BMEI1335 | omp | outer membrane lipoprotein-related protein | 12 | 5 (2) | | U (2.00) |
| BMEI0998 | wboA | glycosyl transferase WboA | 17 | 16 (10) | V | C (8.96) |
| BMEI0340 | pal | lipoprotein, Pal family | 12 | 7 (3) | P | OM (10.00) |
| BMEI0830 | yaeT | bacterial surface antigen | 23 | 37 (17) | | OM (10.00) |
| BMEI1417 | wbkB | wbkB protein | 5 | 2 (0) | V | U (2.00) |
| BMEI0509 | lpcC | lipopolysaccharide core biosynthesis mannosyltransferase LpcC | 9 | 2 (1) | | C (9.97) |
| BMEI1414 | perA | perosamine synthase, putative | 10 | 3 (1) | V | C (9.97) |
| BMEII0837 | hyaD | glycosyl transferase, group 2 family protein | 5 | 1 (0) | | CM (9.82) |
| BMEI1404 | wbkA | mannosyltransferase, putative | 16 | 7 (3) | V | C (9.26) |
| BMEII0253 | mepA | penicillin-insensitive murein endopeptidase | 5 | 1 (1) | | P (9.76) |
| BMEII0404 | leuB | 3-isopropylmalate dehydrogenase | 5 | 1 (1) | | C (9.97) |
| BMEI0474 | petB | ubiquinol-cytochrome c reductase, cytochrome b | 12 | 4 (2) | | CM (10.00) |
| BMEII0429 | eryB | glycerol-3-phosphate dehydrogenase | 2 | 1 (0) | V | C (9.97) |
| BMEI0137 | mdh | malate dehydrogenase | 5 | 1 (1) | | U (4.99) |
| BMEI0140 | kgd | alpha-ketoglutarate decarboxylase | 11 | 8 (4) | | C (9.97) |
| BMEII0076 | tycC | enterobactin synthetase, component F, putative | 2 | 1 (0) | | C (9.97) |
| BMEI1547 | atpI | ATP synthase protein I, putative | 6 | 1 (1) | | U (2.00) |
| BMEI0147 | xerC | site-specific tyrosine recombinase XerC | 5 | 1 (0) | | C (9.26) |
| BMEI0215 | ialA | dinucleoside polyphosphate hydrolase | 1 | 1 (0) | | C (9.97) |
| BMEI0884 | gyrA | DNA gyrase subunit A | 2 | 1 (0) | | C (9.97) |
| BMEI0040 | xerD | site-specific tyrosine recombinase XerD | 5 | 2 (0) | | C (9.97) |
| BMEI0880 | ssb | single-stranded DNA-binding protein family | 5 | 2 (1) | | C (9.26) |
| BMEI1823 | gyrB | DNA gyrase subunit B | 2 | 1 (0) | | C (9.97) |
| BMEI1200 | parC | DNA topoisomerase IV subunit A | 2 | 1 (0) | | C (9.12) |
| BMEI0878 | uvrA | excinuclease ABC subunit A | 5 | 2 (1) | V | C (9.97) |
| BMEI1307 | xerC | site-specific recombinase, phage integrase family | 5 | 1 (0) | | C (9.97) |
| BMEII0676 | parE | DNA topoisomerase IV subunit B | 2 | 1 (0) | | C (9.97) |
| BMEI1946 | mutM | formamidopyrimidine-DNA glycosylase | 1 | 1 (0) | V | C (9.97) |
| BMEII0739 | alkB | alkylated DNA repair protein AlkB | 6 | 1 (1) | | U (2.00) |
| BMEII0184 | insN | IS3 family element, transposase orfA | 6 | 1 (1) | | U (2.00) |
| BR1202 | recA | recombinase A | 2 | 1 (0) | V | C (10.00) |
| BMEI1650 | ureF | urease accessory protein UreF, putative | 2 | 1 (0) | | U (2.00) |
| BMEII1047 | groES | co-chaperonin GroES | 1 | 1 (0) | | C (9.97) |
| BMEI1041 | sufC | ABC transporter, ATP-binding protein | 6 | 3 (2) | | C (9.97) |
| BMEI2001 | dnaJ | chaperone protein DnaJ | 1 | 1 (0) | | C (9.97) |
| BMEI1330 | htrA | serine protease | 7 | 6 (3) | V | P (9.76) |
| BMEI2002 | dnaK | molecular chaperone DnaK | 12 | 7 (2) | V, P | C (9.97) |
| BMEII1048 | groEL | chaperonin GroEL | 10 | 9 (5) | | C (9.97) |
| BMEI1655 | ureD-1 | urease accessory protein UreD | 3 | 2 (0) | | C (9.26) |
| BMEI1649 | ureG-1 | urease accessory protein UreG | 2 | 1 (0) | | C (9.97) |
| BMEII0401 | trx-2 | thioredoxin | 5 | 2 (1) | | C (9.26) |
| BMEI1069 | tig | trigger factor | 10 | 2 (0) | V, P | C (8.96) |
| BMEI1651 | ureE | urease accessory protein UreE | 2 | 1 (0) | | C (9.97) |
| BMEI1060 | dsbA | outer membrane protein, putative | 23 | 37 (17) | V | U (2.00) |
| BMEI2022 | trx-1 | thioredoxin | 5 | 2 (1) | | C (9.26) |
| BMEI1265 | surA | peptidyl-prolyl cis-trans isomerase, putative | 9 | 1 (1) | P | P (9.76) |
| BMEI1492 | exsA | ABC transporter, ATP-binding/permease protein | 6 | 1 (1) | V | CM (10.00) |
| BMEI2010 | infC | translation initiation factor IF-3 | 2 | 2 (0) | | C (9.97) |
| BMEI0826 | frr | ribosome recycling factor | 5 | 2 (1) | | C (9.97) |
| BMEI0748 | rplL | 50S ribosomal protein L7/L12 | 16 | 17 (6) | P | U (6.49) |
| BMEI0752 | rpsL | 30S ribosomal protein S12 | 5 | 1 (1) | | C (9.26) |
| BMEI1497 | tlyA | hemolysin A | 1 | 2 (0) | | C (8.96) |
| BMEII0003 | modC | molybdenum ABC transporter, ATP-binding protein | 6 | 3 (2) | | C (9.12) |
| BMEII0893 | katA | catalase | 6 | 2 (0) | | P (10.00) |
| BMEII0177 | znuC | zinc ABC transporter, ATP-binding protein | 6 | 3 (2) | V | CM (7.88) |
| BMEI0790 | phoA | bacterial alkaline phosphatase | 9 | 7 (2) | | P (10.00) |
| BMEII0581 | sodC | superoxide dismutase, Cu-Zn | 15 | 14 (8) | V, P | P (10.00) |
| BMEI1292 | fsr | fosmidomycin resistance protein | 2 | 1 (0) | | CM (10.00) |
| BMEII0108 | tauB | taurine ABC transporter, ATP-binding protein | 6 | 3 (2) | | CM (9.98) |
| BMEI0635 | cbiO | cobalt ABC transporter, ATP-binding protein | 6 | 3 (2) | | CM (9.82) |
| BMEII0704 | bfr | bacterioferritin | 12 | 4 (2) | P | C (9.97) |
| BMEII0178 | znuA | zinc ABC transporter, periplasmic zinc-binding protein | 9 | 1 (1) | V | P (9.76) |
| BMEII0589 | ribH | riboflavin synthase subunit beta | 12 | 8 (0) | | C (9.97) |
| BMEI2029 | ahcY | S-adenosyl-L-homocysteine hydrolase | 4 | 1 (1) | | C (9.97) |
| BMEI1099 | cobT | nicotinate-nucleotide--dimethylbenzimidazole phosphoribosyltransferase | 2 | 1 (0) | | C (9.97) |
| BMEI1187 | ribH | riboflavin synthase subunit beta | 12 | 8 (0) | | C (9.97) |
| BMEII0470 | crcB | crcB family protein | 5 | 1 (0) | | CM (10.00) |
| BMEII0288 | oppF | peptide ABC transporter, ATP-binding protein | 6 | 3 (2) | | CM (9.99) |
| BMEI1081 | surE | stationary phase survival protein SurE | 8 | 5 (2) | | C (8.96) |
| BMEI0215 | ialA | dinucleoside polyphosphate hydrolase | 1 | 1 (0) | | C (9.97) |
| BMEI0920 | mazG | nucleoside triphosphate pyrophosphohydrolase | 5 | 1 (0) | | C (8.96) |
| BMEI1584 | ialB | invasion protein B | 1 | 2 (0) | P | U (2.50) |
| BMEII0355 | gal | D-galactose 1-dehydrogenase, putative | 5 | 1 (0) | | C (9.97) |
| BMEI1111 | acpXL | acyl carrier protein | 5 | 2 (0) | | C (9.97) |
| BMEI1553 | bacA | transport protein | 9 | 5 (4) | V | CM (10.00) |
| BMEI1475 | acpP | acyl carrier protein | 5 | 1 (0) | | C (9.26) |
| BMEII0983 | chvE | sugar ABC transporter, periplasmic sugar-binding protein, putative | 9 | 1 (1) | | U (5.02) |
| BMEI1394 | manA | mannose-6-phosphate isomerase | 5 | 1 (0) | | C (8.96) |
| BMEI1237 | galE | epimerase/dehydratase family protein, putative | 6 | 3 (1) | V | C (8.96) |
| BMEII0430 | eryA | erythritol kinase | 5 | 2 (1) | | C (9.26) |
| BMEI0310 | gap | glyceraldehyde-3-phosphate dehydrogenase | 5 | 1 (1) | | C (9.97) |
| BMEII0750 | smoK | sugar ABC transporter, ATP-binding protein | 6 | 3 (2) | | CM (9.99) |
| BMEI1416 | rfbE | O-antigen export system ATP-binding protein RfbE | 6 | 3 (2) | | CM (7.88) |
| BMEII0940 | smoK | sugar ABC transporter, ATP-binding protein | 6 | 3 (2) | | CM (9.99) |
| BMEII0625 | ugpB | glycerol-3-phosphate ABC transporter, periplasmic glycerol-3-phosphate-binding protein | 1 | 1 (0) | V | P (9.76) |
| BMEI0309 | pgk | phosphoglycerate kinase | 6 | 1 (1) | | C (9.97) |
| BMEII0899 | manB | phosphoglucomutase, putative | 10 | 3 (1) | V | C (9.26) |
| BMEII0428 | eryC | D-erythrulose-1-phosphate dehydrogenase | 9 | 2 (1) | V | C (8.96) |
| ABM67295 | P39 | immunogenic 39-kDa protein | 15 | 7 (3) | P | P (9.44) |
| BMEII0982 | rbsA | sugar ABC transporter, ATP-binding protein, putative | 5 | 1 (1) | | CM (9.82) |
| BMEII0355 | gal | D-galactose 1-dehydrogenase, putative | 5 | 1 (0) | | C (9.97) |
| BMEI1396 | pmm | phosphomannomutase, putative | 10 | 5 (1) | V | C (9.97) |
| BMEII0145 | xylG | D-xylose ABC transporter, ATP-binding protein | 6 | 3 (2) | | CM (7.88) |
| BMEI1886 | pgm | phosphoglucomutase | 9 | 2 (1) | V | C (8.96) |
| BMEII0251 | glk | glucokinase | 5 | 1 (1) | | C (9.97) |
| BMEI1077 | yajC | preprotein translocase, YajC subunit | 6 | 1 (1) | | CM (9.82) |
| BMEI1076 | secD | protein-export membrane protein, SecD/SecF family | 6 | 1 (1) | | CM (10.00) |
| BMEII0026 | virB2 | type IV secretion system protein VirB2 | 9 | 3 (1) | V | CM (9.46) |
| BMEII0028 | virB4 | type IV secretion system protein VirB4 | 9 | 4 (1) | V | CM (10.00) |
| BMEII0029 | virB5 | type IV secretion system protein VirB5 | 9 | 1 (1) | V | U (2.00) |
| BMEII0032 | virB8 | type IV secretion system protein VirB8 | 9 | 1 (0) | V | U (2.00) |
| BMEII0025 | virB1 | type IV secretion system protein VirB1 | 5 | 1 (0) | V | E (9.64) |
| BMEII0034 | virB10 | type IV secretion system protein VirB10 | 5 | 1 (1) | V | U (4.90) |
| CAA86936 | BLS | Brucella lumazine synthase | 19 | 12 (2) | P | C (9.97) |
| BMEI1305 | omp2b | porin Omp2b | 16 | 21 (9) | | OM (9.93) |
| BMEI1306 | omp2a | porin Omp2a | 13 | 17 (8) | | OM (9.93) |
| BMEI0135 | omp19 | lipoprotein Omp19 | 12 | 10 (5) | V, P | OM (10.00) |
| BMEII0017 | omp10 | lipoprotein Omp10 | 9 | 6 (2) | V | OM (10.00) |
| BMEI0536 | omp28 | immunoreactive 28 kDa outer membrane protein | 19 | 15 (11) | P | P (10.00) |
| BMEI0330 | opgC | opgC protein, putative | 5 | 1 (1) | | CM (10.00) |
| BMEI0634 | crcB | crcB family protein | 5 | 1 (0) | | CM (10.00) |
| BMEI0545 | pncA | hypothetical protein | 5 | 1 (0) | V | CM (10.00) |
Note: Sub-cellular localization was predicted using PSORTb 3.0 implemented in Vaxign. PSORTb score ranges from 0 (0% probability) to 10 (100% probability). Proteins with a score below 7.5 are classed as unknown. The abbreviations in this table are C (Cytoplasmic), CM (CytoplasmicMembrane), E (Extracellular), OM (OuterMembrane), P (Periplasmic) and U (Unknown).
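Cells in the localization column combine an abbreviation and a score, such as `CM (7.88)`. The following is a minimal sketch of how such a cell could be split and expanded using the abbreviations listed in the note; the function name and the regular expression are illustrative and are not part of PSORTb or Vaxign.

```python
import re

# Abbreviations as listed in the table note.
LOCALIZATION = {
    "C": "Cytoplasmic",
    "CM": "CytoplasmicMembrane",
    "E": "Extracellular",
    "OM": "OuterMembrane",
    "P": "Periplasmic",
    "U": "Unknown",
}

# Matches cells of the form "<abbreviation> (<score>)".
CELL = re.compile(r"^\s*([A-Z]+)\s*\((\d+(?:\.\d+)?)\)\s*$")


def parse_localization(cell: str) -> tuple[str, float]:
    """Split a cell like 'CM (7.88)' into (full localization name, score)."""
    match = CELL.match(cell)
    if match is None or match.group(1) not in LOCALIZATION:
        raise ValueError(f"unrecognized localization cell: {cell!r}")
    return LOCALIZATION[match.group(1)], float(match.group(2))


print(parse_localization("CM (7.88)"))
print(parse_localization("U (2.00)"))
```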
COG functional analysis of Brucella genes associated with vaccine research
| COG Description | VO-associated Genes: # of genes (p-value) | Virulent Genes: # of genes (p-value) |
|---|---|---|
| Amino acid transport and metabolism | 12 (0.771) | 4 (1.000) |
| Carbohydrate transport and metabolism | 19 (0.000*) | 6 (0.043*) |
| Cell cycle control, cell division, chromosome partitioning | 1 (1.000) | |
| Cell motility | 1 (1.000) | 1 (0.341) |
| Cell wall/membrane/envelope biogenesis | 22 (0.000*) | 9 (0.001*) |
| Coenzyme transport and metabolism | 4 (0.820) | |
| Energy production and conversion | 5 (0.843) | 1 (0.728) |
| Inorganic ion transport and metabolism | 10 (0.149) | 3 (0.475) |
| Intracellular trafficking, secretion, and vesicular transport | 7 (0.003*) | 5 (0.000*) |
| Lipid transport and metabolism | 3 (0.627) | 1 (1.000) |
| Nucleotide transport and metabolism | 2 (0.769) | 1 (1.000) |
| Posttranslational modification, protein turnover, chaperones | 16 (0.000*) | 5 (0.025*) |
| Replication, recombination and repair | 14 (0.001*) | 3 (0.428) |
| Secondary metabolites biosynthesis, transport and catabolism | 3 (0.750) | 1 (0.601) |
| Signal transduction mechanisms | 3 (1.000) | 2 (0.357) |
| Transcription | 6 (0.704) | 2 (1.000) |
| Translation, ribosomal structure and biogenesis | 5 (0.684) | |
Note: * indicates p < 0.05.
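The table does not state which statistical test produced the p-values. One common choice for this kind of category enrichment is a hypergeometric (one-sided Fisher) test, sketched below as an assumption rather than the authors' method. The background counts in the example are hypothetical toy numbers, not Brucella genome statistics.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K belong to the category.

    k: genes in the list that fall in the category
    n: size of the gene list
    K: genes in the background that fall in the category
    N: size of the background
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper category genes.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy example: 2 of 2 listed genes fall in a category holding 5 of 10 background genes.
p_value = hypergeom_upper_tail(k=2, n=2, K=5, N=10)
print(f"{p_value:.4f}")
```

A two-sided test, or a correction for multiple categories, would give different values; this sketch shows only the basic tail computation.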
Figure 4. Networks between genes and live attenuated Brucella vaccines. Searches of "live attenuated Brucella vaccine" using (A) the PubMed Entrez system and (B) the VO-SciMiner system. Edge color represents the type of association: blue for gene-vaccine association and red for gene-gene association.
Figure 5. All Brucella vaccine terms in VO were used. This co-citation network was generated without (A) or with (B) the VO hierarchy. VO terms are shown in green and Brucella genes in red. The line depth between a VO vaccine term and a Brucella gene symbol represents the relative number of documents in which that pair is co-cited. The terms and edges in yellow were inferred from the VO hierarchy.
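The difference between panels (A) and (B) can be illustrated with a small co-citation counter. In this sketch, each document carries the VO terms and genes tagged in it; with the hierarchy enabled, a document that mentions a child term also counts toward every ancestor term. The parent links, documents and function names below are hypothetical and are not taken from VO or from VO-SciMiner.

```python
from collections import Counter

# Hypothetical child -> parent links, for illustration only.
PARENT = {
    "live attenuated Brucella abortus vaccine": "live attenuated Brucella vaccine",
    "live attenuated Brucella vaccine": "Brucella vaccine",
}


def ancestors(term: str) -> list[str]:
    """Walk up the hierarchy and collect every ancestor of a term."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def cocitation(docs: list[tuple[list[str], list[str]]], use_hierarchy: bool) -> Counter:
    """Count documents in which each (VO term, gene) pair is co-cited."""
    counts: Counter = Counter()
    for vo_terms, genes in docs:
        terms = set(vo_terms)
        if use_hierarchy:
            for term in vo_terms:
                terms.update(ancestors(term))  # inferred associations
        for term in terms:
            for gene in set(genes):
                counts[(term, gene)] += 1
    return counts


# Two hypothetical abstracts.
docs = [
    (["live attenuated Brucella abortus vaccine"], ["wboA"]),
    (["Brucella vaccine"], ["wboA", "omp25"]),
]
flat = cocitation(docs, use_hierarchy=False)
inferred = cocitation(docs, use_hierarchy=True)
print(flat[("Brucella vaccine", "wboA")], inferred[("Brucella vaccine", "wboA")])
```

The pairs that appear only in `inferred` correspond to the kind of edges the caption describes as inferred from the VO hierarchy.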
Figure 6. Screenshot of the VO-SciMiner webpage. Two panels are illustrated in this web page with "VO_0001136: live attenuated Brucella suis vaccine" selected. The left panel shows the hierarchical structure of the ontology term 'Brucella vaccine' and its child terms in VO. Two numbers are displayed next to each VO term. The first number is the total number of search hits, including the search results from all child terms (if any), and the second number is the number of hits for the current term alone. When either number is clicked, the abstract details are displayed in the right panel. The identified VO terms and Brucella genes are highlighted.
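The two numbers shown next to each term can be sketched as a recursive aggregation over the term tree. This example assumes a paper that matches several terms in a subtree is counted once in the total; the caption does not say how such overlaps are handled. The tree fragment, paper identifiers and function names are hypothetical.

```python
# Hypothetical parent -> children links and per-term hits, for illustration only.
CHILDREN = {
    "Brucella vaccine": ["live attenuated Brucella vaccine"],
    "live attenuated Brucella vaccine": ["live attenuated Brucella suis vaccine"],
}
OWN_HITS = {
    "Brucella vaccine": {"p1", "p2"},
    "live attenuated Brucella vaccine": {"p2", "p3"},
    "live attenuated Brucella suis vaccine": {"p4"},
}


def subtree_hits(term: str) -> set[str]:
    """Papers matching the term itself or any of its descendants."""
    hits = set(OWN_HITS.get(term, set()))
    for child in CHILDREN.get(term, []):
        hits |= subtree_hits(child)
    return hits


def hit_counts(term: str) -> tuple[int, int]:
    """Return (hits including all child terms, hits for the term alone)."""
    return len(subtree_hits(term)), len(OWN_HITS.get(term, set()))


for name in OWN_HITS:
    print(name, hit_counts(name))
```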