Roberto Vera, Yasset Perez-Riverol, Sonia Perez, Balázs Ligeti, Attila Kertész-Farkas, Sándor Pongor.
Abstract
The Java BioWareHouse (JBioWH) project is an open-source, platform-independent programming framework that allows users to build their own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and for creating streamlined, task-specific data sets on local PCs. JBioWH is based on a MySQL relational database schema and includes Java API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG). It also includes a desktop client application that lets non-programmer users query the data. In addition, JBioWH can be tailored to specific circumstances, including the handling of massive queries for high-throughput analyses or CPU-intensive calculations. The framework comes with complete documentation and application examples, and it can be downloaded from the project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh.
Year: 2013 PMID: 23846595 PMCID: PMC3708619 DOI: 10.1093/database/bat051
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. The JBioWH architecture.
Table 1. Data sources included in JBioWH
| Data Type | Data Source | Data Format |
|---|---|---|
| Taxonomy | NCBI Taxonomy | Delimited text |
| Ontology | GO | OBO XML |
| Gene | Gene | Delimited text |
| Gene | KEGG Gene | Text |
| Gene | GenBank | Text |
| Gene | RefSeq | Text |
| Chromosome | Genomes | Delimited text |
| Protein | UniProt | XML |
| Enzyme | KEGG Enzyme | Text |
| PPI | IntAct | PSI-MI 2.5 XML |
| PPI | MINT | PSI-MI 2.5 XML |
| PPI | DIP | PSI-MI 2.5 XML |
| PPI | BioGRID | PSI-MI 2.5 XML |
| Protein cluster | UniRef | XML |
| Drug | DrugBank | XML |
| Drug | KEGG Compound | Text |
| Pathway | KEGG Pathway | Text |
| Reaction | KEGG Reaction | Text |
| Disease | OMIM | Text |
| Protein domain | Pfam | SQL |
The databases were accessed in October 2012.
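Several of these sources are distributed as delimited flat files, and a parser must split each line into fields before the data can be loaded. The sketch below illustrates this step for the NCBI Taxonomy `nodes.dmp` dump, whose columns are separated by `\t|\t` and whose lines end in `\t|`. It keeps only the first three columns: tax ID, parent tax ID, and rank. The names `TaxNode` and `NodesDmpParser` are hypothetical. This is a minimal illustration under those assumptions, not JBioWH's own parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical row type: the first three columns of an NCBI Taxonomy nodes.dmp line. */
record TaxNode(int taxId, int parentTaxId, String rank) {}

/** Hypothetical parser for nodes.dmp; not part of the JBioWH API. */
final class NodesDmpParser {
    // Columns are separated by "\t|\t"; Pattern.quote escapes the '|' metacharacter.
    private static final Pattern FIELD_SEPARATOR = Pattern.compile(Pattern.quote("\t|\t"));

    private NodesDmpParser() {}

    /** Parses one line such as "2\t|\t131567\t|\tsuperkingdom\t|\t...\t|". */
    static TaxNode parseLine(String line) {
        // Drop the trailing "\t|" terminator before splitting.
        String trimmed = line.endsWith("\t|") ? line.substring(0, line.length() - 2) : line;
        String[] fields = FIELD_SEPARATOR.split(trimmed, -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        return new TaxNode(Integer.parseInt(fields[0].trim()),
                           Integer.parseInt(fields[1].trim()),
                           fields[2].trim());
    }

    /** Parses every non-blank line; blank lines are skipped. */
    static List<TaxNode> parseAll(List<String> lines) {
        List<TaxNode> nodes = new ArrayList<>();
        for (String line : lines) {
            if (!line.isBlank()) {
                nodes.add(parseLine(line));
            }
        }
        return nodes;
    }
}
```

A real loader would stream the file line by line, for example with `java.nio.file.Files.lines`, instead of holding every line in memory.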
Figure 2. The JBioWH relational schema with the main tables and their relationships.
Figure 3. The structure of the Java API packages. The Java API contains (i) Core classes that define the data modules, (ii) the Desktop Client for non-programmer users and (iii) the Tool package with command-line programs and examples.
Figure 4. The contents of the modules in the Java API. Each module defines (i) the JPA (Java Persistence API) entities that map the relational schema to Java, (ii) Parser and Loader functions that load data from the source databases into JBioWH and (iii) Search classes that execute queries.
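The three-part module layout (entity, parser/loader, search) can be pictured with the small sketch below. It makes several assumptions. An in-memory list stands in for the JPA/MySQL layer. `ProteinEntry`, `SourceParser`, and `InMemoryStore` are hypothetical names, not JBioWH classes. The sketch only shows how the responsibilities divide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a JPA entity (in JBioWH, a JPA-mapped class plays this role). */
record ProteinEntry(String accession, int taxId, String sequence) {}

/** Parser role: turns one line of a source file into an entity (hypothetical interface). */
interface SourceParser<T> {
    T parse(String line);
}

/** Hypothetical in-memory stand-in for the relational store behind Loader and Search classes. */
final class InMemoryStore<T> {
    private final List<T> rows = new ArrayList<>();

    /** Loader role: parses every line, stores the entities, and returns the total row count. */
    int load(List<String> lines, SourceParser<T> parser) {
        for (String line : lines) {
            rows.add(parser.parse(line));
        }
        return rows.size();
    }

    /** Search role: returns all stored entities that satisfy the condition. */
    List<T> search(Predicate<T> condition) {
        List<T> hits = new ArrayList<>();
        for (T row : rows) {
            if (condition.test(row)) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```

Keeping the parser separate from the store means each data source only has to supply its own `parse` logic, while loading and querying stay generic.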
Figure 5. A screenshot of the JBioWH Desktop Client. The left panel shows the open relational schemas. The top right panel lists the databases loaded into the relational schema, while the bottom left panel shows the tables in the selected database.
Table 2. Two simple tasks and their solutions using SQL and the Java API
| Task | SQL solution | Java solution |
|---|---|---|
| Retrieve the sequence of protein Q8DR59 from UniProt. | | `SearchProtein sProt = new SearchProtein(); for(` … `System.out.println(p.getSeq());` |
| Retrieve the sequences of all human proteins. | | `SearchTaxonomy sTax = new SearchTaxonomy(); SearchProtein sProt = new SearchProtein(); List c = new ArrayList(); List o = new ArrayList(); c.add(taxs); o.add("` … `for(` … `System.out.println(p.getSeq());` |
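The second task is a two-step query. First, the taxonomy entry for human is looked up. Then the proteins annotated with that tax ID are selected, which amounts to a join on the tax ID. The sketch below expresses the same logic over hypothetical in-memory `Taxon` and `Protein` rows. It does not use the JBioWH `SearchTaxonomy`/`SearchProtein` API, whose method signatures are not reproduced here.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical rows standing in for the taxonomy and protein tables. */
record Taxon(int taxId, String scientificName) {}
record Protein(String accession, int taxId, String sequence) {}

final class HumanProteinQuery {
    private HumanProteinQuery() {}

    /** Step 1: tax IDs whose scientific name matches exactly (the taxonomy lookup). */
    static Set<Integer> taxIdsNamed(List<Taxon> taxa, String name) {
        return taxa.stream()
                .filter(t -> t.scientificName().equals(name))
                .map(Taxon::taxId)
                .collect(Collectors.toSet());
    }

    /** Step 2: sequences of proteins carrying one of those tax IDs, in input order (the join). */
    static List<String> sequencesFor(List<Protein> proteins, Set<Integer> taxIds) {
        return proteins.stream()
                .filter(p -> taxIds.contains(p.taxId()))
                .map(Protein::sequence)
                .collect(Collectors.toList());
    }
}
```

For example, `sequencesFor(proteins, taxIdsNamed(taxa, "Homo sapiens"))` returns the sequences of all proteins linked to NCBI tax ID 9606.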
Figure 6. The solution of the second task in Table 2 using the JBioWH Desktop Client. A step-by-step guide to obtaining this answer is available on the project Web site.
Table 3. Use of the TaxonomyGraph class to build the hierarchical structure of a taxonomic group
| Taxon | Tax ID | Graph vertices | Graph edges | Time (s) |
|---|---|---|---|---|
| Bacteria | 2 | 283 371 | 283 370 | 121 |
| | 32008 | 3790 | 3789 | 4 |
| | 1313 | 303 | 302 | 3 |
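In every row of Table 3 the edge count is exactly one less than the vertex count. This is what a tree should give: every vertex except the root has exactly one edge to its parent. The sketch below counts the vertices and edges of the subtree under a given tax ID from child-to-parent links, such as those in `nodes.dmp`, using a breadth-first walk. `TaxonomySubtree` is a hypothetical name, and the code is not the TaxonomyGraph implementation. It assumes the links form a tree, apart from the NCBI root pointing to itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not the JBioWH TaxonomyGraph class. */
final class TaxonomySubtree {
    private TaxonomySubtree() {}

    /** Vertex and edge counts of a subtree. */
    record Counts(int vertices, int edges) {}

    /** @param parentOf child tax ID -> parent tax ID; the root's self-link is ignored */
    static Counts countSubtree(Map<Integer, Integer> parentOf, int rootTaxId) {
        // Invert parent links into child lists, skipping self-loops such as 1 -> 1.
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : parentOf.entrySet()) {
            if (!e.getKey().equals(e.getValue())) {
                children.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Breadth-first walk from the root: each child reached adds one vertex and one edge.
        int vertices = 1;
        int edges = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(rootTaxId);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int child : children.getOrDefault(node, List.of())) {
                vertices++;
                edges++;
                queue.add(child);
            }
        }
        return new Counts(vertices, edges);
    }
}
```

Because the walk adds one vertex and one edge per child, `edges == vertices - 1` holds for any subtree it returns, which matches the pattern in the table.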
Table 4. Genes encoding drug target proteins that lie on the same chromosome less than a specified number of base pairs apart
| Family | Genes | Found gene IDs | Species | Time (s) |
|---|---|---|---|---|
| | 41 576 | 930805-930802 | | 112 |
| | 224 568 | 4010698-4010703; 4010703-4010704 | | 249 |
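The proximity search behind Table 4 can be sketched as follows. The genes are grouped by chromosome and sorted by position. Then, for each gene, the scan moves forward only while the gap stays below the threshold. This sketch rests on several assumptions. Distance is measured between start coordinates, since the exact definition used by the authors is not stated here. The input is taken to be already restricted to genes encoding drug targets. `GeneLocus`, `GenePair`, and `NearbyGenes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical gene record: gene ID, chromosome label, start coordinate in bp. */
record GeneLocus(long geneId, String chromosome, long start) {}

/** A pair of nearby genes, ordered by start position. */
record GenePair(long firstGeneId, long secondGeneId) {}

final class NearbyGenes {
    private NearbyGenes() {}

    /** Pairs on the same chromosome whose start positions differ by less than maxDistanceBp. */
    static List<GenePair> closePairs(List<GeneLocus> genes, long maxDistanceBp) {
        List<GenePair> pairs = new ArrayList<>();
        Map<String, List<GeneLocus>> byChromosome =
                genes.stream().collect(Collectors.groupingBy(GeneLocus::chromosome));
        for (List<GeneLocus> sameChromosome : byChromosome.values()) {
            List<GeneLocus> sorted = new ArrayList<>(sameChromosome);
            sorted.sort(Comparator.comparingLong(GeneLocus::start));
            // Sorted sweep: the inner loop stops once the gap reaches the threshold.
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size()
                        && sorted.get(j).start() - sorted.get(i).start() < maxDistanceBp; j++) {
                    pairs.add(new GenePair(sorted.get(i).geneId(), sorted.get(j).geneId()));
                }
            }
        }
        return pairs;
    }
}
```

As in the second row of Table 4, a single gene can appear in two pairs when it lies close to neighbours on both sides. The order of pairs across different chromosomes is not guaranteed.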