Marco Mesiti, Ernesto Jiménez-Ruiz, Ismael Sanz, Rafael Berlanga-Llavori, Paolo Perlasca, Giorgio Valentini, David Manset.
Abstract
BACKGROUND: Today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, biomedical and bioinformatics research, but also raising new problems for their integration and computational processing.
Year: 2009 PMID: 19828083 PMCID: PMC2762072 DOI: 10.1186/1471-2105-10-S12-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. XML languages for the representation of biological data types
| Category | Language | Application scope | Version/Year | Comments |
|---|---|---|---|---|
| Molecular entities | BSML | Biological sequences and sequence annotation | v.3.1/2005 | Uses DTD. Included in EMBLxml. |
| | ProXML | Protein sequences, structures and families | v.1.0/2006 | Uses XSD. Included within HOBIT formats. |
| | RNAML | RNA sequence, structure and experimental data | v.1.1/2002 | Uses XSD. |
| | AGAVE | Biological sequences and sequence annotation | 2003 | Uses XSD. Included in EMBLxml. |
| | UniProt XSD | Representation of UniProt records | 2004 | Uses XSD. Successor of the SwissProt (SP) ML format. |
| | EMBLxml | Biological sequences and sequence annotation | v.1.1/2007 | Uses XSD. Currently includes BSML and AGAVE. |
| | GAME | Genome and sequence | v.0.3/1999 | Uses DTD. |
| | SequenceML | Sequence information | v.2.1/2006 | Designed to replace FASTA. Belongs to HOBIT XML formats. |
| Biological Expression | GeneXML | Gene expression data | - | Uses DTD. |
| | MAGE-ML | Microarray expression data | v.1.0/2006 | Uses DTD. |
| Systems Biology | CellML | Models of biochemical reaction networks | v.1.1/2006 | Uses DTD. Available conversion to BioPAX. |
| | SBML | Models of biochemical reaction networks | Lev. 2/2007 | Uses XSD. Available conversion to BioPAX. |
| | PSI-MI | Protein interactions | v.2.5/2005 | Uses XSD and OBO. Linked with OBO vocabularies. |
| | BioPAX | Metabolic pathways, molecular interactions | Lev. 3/2008 | Uses OWL. Linked to OBO vocabularies. |
| | CML | Description of molecules and reactions | v.2.1/2003 | Uses XSD. |
This table summarizes some of the characteristics of a subset of existing XML languages. In particular, we note the application scope, the number and year of the current version, and comments such as the kind of schema it relies on, or the interaction with other standards.
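To illustrate how data encoded in such XML languages can be accessed with a path-based query model, the following minimal sketch extracts reactions from a small XML fragment using Python's standard `xml.etree.ElementTree` module. The fragment is a hypothetical, heavily simplified SBML-like document written for this example; it omits namespaces and most required elements, so it is not a valid SBML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified SBML-like fragment (illustration only, not valid SBML).
DOC = """
<model id="toy_model">
  <listOfSpecies>
    <species id="glc" name="glucose"/>
    <species id="g6p" name="glucose-6-phosphate"/>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="hexokinase">
      <reactant species="glc"/>
      <product species="g6p"/>
    </reaction>
  </listOfReactions>
</model>
"""


def species_names(root):
    """Map each species id to its human-readable name."""
    return {s.get("id"): s.get("name")
            for s in root.findall("./listOfSpecies/species")}


def reactions(xml_text):
    """Return (reaction id, reactant names, product names) for each reaction."""
    root = ET.fromstring(xml_text)
    names = species_names(root)
    result = []
    # ".//reaction" is an XPath-style expression selecting all descendants.
    for r in root.findall(".//reaction"):
        reactants = [names[x.get("species")] for x in r.findall("reactant")]
        products = [names[x.get("species")] for x in r.findall("product")]
        result.append((r.get("id"), reactants, products))
    return result


if __name__ == "__main__":
    for rid, reactants, products in reactions(DOC):
        print(rid, reactants, "->", products)
```

`ElementTree` supports only a limited subset of XPath; full XPath or XQuery processing would require a dedicated engine.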
Figure 1. Data integration architectures.
Table 2. Summary of the integration aspects analyzed in this paper
| BioData | Sequences, Biological Expressions, Pathways, etc. |
| Instantiation | Materialized vs. Virtual integration |
| Integration | Common data storage, data access or data interface |
| Global View | Local As View, Global As View or Both As View |
| Global Model | Relational-based, Tree-based, Graph-based |
| Query Model | Ad-Hoc, SQL, XPath, XQuery, SPARQL, etc. |
| Semantics | Dictionaries, Thesauri or Domain Ontologies |
| Scalability | Low ( |
This table represents the aspects around which biological data integration approaches are compared.
Figure 2. Approaches to obtaining a global view.
Figure 3. Schema matching example between BioPAX and SBML formats.
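As a rough illustration of what schema matching involves, the sketch below proposes correspondences between two schemas by comparing element names with Python's standard `difflib.SequenceMatcher`. The element lists are illustrative assumptions, not the actual BioPAX or SBML schemas, and purely name-based matching is only one simple technique; it is not claimed to be the method depicted in the figure.

```python
from difflib import SequenceMatcher

# Illustrative element names only (assumptions); the real schemas are far larger.
BIOPAX_ELEMENTS = ["pathway", "biochemicalReaction", "protein", "smallMolecule"]
SBML_ELEMENTS = ["model", "reaction", "species", "compartment"]


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source, target, threshold=0.5):
    """For each source element, keep its best target if it is similar enough."""
    result = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            result[s] = best
    return result


if __name__ == "__main__":
    print(match(BIOPAX_ELEMENTS, SBML_ELEMENTS))
```

A lexical matcher of this kind can only find correspondences between similarly named elements. Correspondences that depend on meaning rather than spelling require additional evidence, such as structure, instances, or shared vocabularies.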
Figure 4. Samples of mapping expressions.
Table 3. Data warehouse approaches
| BioData | Sequences | All Types | Genes | All Types | All Types |
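| Instantiation | Materialized | Materialized | Materialized | Materialized | Materialized |
| Integration | Common Storage/Access | Common Storage/Access | Common Storage/Access | Common Storage/Access | Common Storage/Access |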
| Global View | LAV | GAV (I) | LAV | ||
| Global Model | Relational | Graph | RDF/OWL | ||
| Query Model | SQL | SQL/AdHoc | SPARQL | ||
| Semantics | - | Thesaurus | - | - | Ontologies |
| Scalability | Low | Medium | Medium | Medium | Medium |
This table compares the data warehouse approaches with respect to the aspects introduced in Table 2.
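The materialized style of integration compared in this table can be sketched as follows: records from heterogeneous sources are translated once into a common relational store and are then queried with SQL. This is a minimal illustration using Python's standard `sqlite3` module; the two sources, their field names, the `gene` table and all identifiers are hypothetical.

```python
import sqlite3

# Hypothetical records from two made-up sources with different field names.
SOURCE_A = [{"acc": "A0001", "gene": "geneX"}]
SOURCE_B = [{"id": "B0001", "symbol": "geneY"}]


def build_warehouse(conn):
    """Copy both sources into one common table (materialized integration)."""
    conn.execute(
        "CREATE TABLE gene (accession TEXT PRIMARY KEY, symbol TEXT, source TEXT)"
    )
    # Translate each source's local field names into the common schema.
    rows = [(r["acc"], r["gene"], "A") for r in SOURCE_A]
    rows += [(r["id"], r["symbol"], "B") for r in SOURCE_B]
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", rows)
    conn.commit()


def symbols(conn):
    """Query the warehouse with plain SQL, without contacting the sources."""
    cursor = conn.execute("SELECT symbol FROM gene ORDER BY symbol")
    return [row[0] for row in cursor]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_warehouse(connection)
    print(symbols(connection))
```

Because the data are copied, queries are answered locally, but the warehouse must be refreshed whenever a source changes.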
Table 4. Mediator-based approaches
| BioData | Genes | All Types | Genes | All Types | All Types |
| Instantiation | Virtual | Virtual | Virtual | Virtual | Virtual |
| Integration | Common Access | Common Access | Common Access | Common Access | Common Access |
| Global View | GAV (S/I) | GAV (S) | LAV | LAV | N.A. |
| Global Model | RDF/OWL | XML | RDF/OWL | XML | |
| Query Model | Boolean | CPL | XQuery | SPARQL | XQuery |
| Semantics | Ontologies | - | - | - | |
| Scalability | Medium | Low | Medium | High | High |
This table compares the mediator-based approaches with respect to the aspects introduced in Table 2.
(Biomed. = Biomediator, WS = Web Services approaches, P2P = peer-to-peer approaches).
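In contrast to a warehouse, a mediator keeps the data at the sources and answers queries at request time. The sketch below illustrates the idea in a Global-As-View style, where the global relation is defined as a query over the sources. The wrapper functions, field names and identifiers are hypothetical and stand in for real source access.

```python
# Each wrapper exposes one hypothetical source in its own native shape.
def source_a():
    return [{"acc": "A0001", "gene": "geneX"}]


def source_b():
    return [{"id": "B0001", "symbol": "geneY"}]


def global_gene():
    """GAV-style definition: the global relation gene(accession, symbol)
    is expressed as a union of queries over the individual sources."""
    for r in source_a():
        yield {"accession": r["acc"], "symbol": r["gene"]}
    for r in source_b():
        yield {"accession": r["id"], "symbol": r["symbol"]}


def query(predicate):
    """Answer a query on the global view; sources are read at query time."""
    return [g for g in global_gene() if predicate(g)]


if __name__ == "__main__":
    print(query(lambda g: g["symbol"] == "geneY"))
```

Nothing is stored by the mediator, so answers reflect the current state of the sources, at the cost of contacting them for every query.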
Figure 5. Integration through Web Services.