Maria Victoria Schneider, Rafael C Jimenez.
Abstract
This article aims to introduce the nature of data integration to life scientists. The subject is rarely discussed outside the field of computational science, and in teaching and training it is covered in little detail or neglected altogether. End users (defined here as wet-lab trainees, clinicians, and lab researchers) mostly interact with bioinformatics resources and tools through web interfaces that hide the data integration processes from them. However, a lack of formal training in, or acquaintance with, even simple database concepts and terminology often becomes a real obstacle to fully understanding the resources and tools that end users wish to access. Understanding how data integration works is fundamental to empowering trainees to see the limitations as well as the possibilities when exploring, retrieving, and analysing biological data from databases. Here we introduce a game-based learning activity for training and teaching the topic of data integration that trainers and educators can adopt and adapt for their classroom. In particular, we provide an example using DAS (Distributed Annotation System) as a method for data integration.
Year: 2012 PMID: 23300402 PMCID: PMC3531283 DOI: 10.1371/journal.pcbi.1002789
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Popular data integration approaches, with examples of resources that have implemented them.
Some of the major successful and recognized examples of data integration.
| System and Key Reference | Description | Data Types |
| --- | --- | --- |
| Biomart | Federated database system that provides unified access to disparate distributed data sources | Genome, gene annotation, protein sequence, protein structure, pathways, gene expression, protein identifications |
| DAS | Client-server system in which a single client integrates information from multiple servers | Genome, gene annotation, protein sequence, protein structure, molecular interactions, gene expression, protein identifications |
| SRS | Data integration, analysis, and display tool for bioinformatics, genomic, and related data | Genome, gene annotation, protein sequence, protein structure, molecular interactions, pathways, gene expression |
| Ondex | Data integration platform that enables data from diverse biological data sets to be linked, integrated, and visualised through graph analysis techniques | Genome, gene annotation, protein sequence, protein structure, pathways, gene expression |
| Bio2RDF | Integrates diverse biological information and enables powerful queries across multiple data types using semantic data integration techniques | Genome, gene annotation, protein sequence, protein structure, pathways, chemical compounds |
| InterMine | Used to create databases from a single data set or by integrating multiple sources of data; used especially for model organism databases | Genome, gene annotation, protein sequence, protein structure, molecular interactions, pathways, gene expression, protein expression, metabolites |
Figure 2. Schema of the four major roles in data integration in a federated system.
Figure 3. Example of a genome reference source representing a database that includes a list of genome features.
Figure 4. Example of a genome reference source representing a database that includes a list of nucleotide sequences.
Figure 5. Example of the card for the “Registry” after genome sources have been registered.
Figure 6. Example of the graphical representation that someone playing the “Client” role could come up with.
Figure 7. Game card used to play the role of the “User”, including a selection of genome queries.
Figure 8. Example of a DAS genome client (Dalliance) displaying positional annotations from several sources (e.g., Ensembl, HGNC, Agilent probesets, and Toronto CNVs).
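
The roles described in the captions above can be mimicked in a few lines of code, which may help trainers who want a digital companion to the card game. The sketch below is only an illustration under stated assumptions: it reads the four roles as source, registry, client, and user (the last three are named in the captions; "source" is inferred from the genome reference source cards), and every class name, field, and data value is hypothetical. It does not implement the actual DAS protocol, which exchanges XML over HTTP; here everything runs in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Feature:
    """A positional annotation on a reference sequence (hypothetical shape)."""
    chromosome: str
    start: int
    end: int
    label: str


@dataclass
class Source:
    """Role 1 (assumed): a server holding the features of one data set."""
    name: str
    features: List[Feature]

    def query(self, chromosome: str, start: int, end: int) -> List[Feature]:
        # Return every feature that overlaps the requested region.
        return [
            f for f in self.features
            if f.chromosome == chromosome and f.start <= end and f.end >= start
        ]


@dataclass
class Registry:
    """Role 2: a directory in which sources announce themselves."""
    sources: Dict[str, Source] = field(default_factory=dict)

    def register(self, source: Source) -> None:
        self.sources[source.name] = source

    def list_sources(self) -> List[Source]:
        return list(self.sources.values())


class Client:
    """Role 3: discovers sources via the registry and merges their answers."""

    def __init__(self, registry: Registry) -> None:
        self.registry = registry

    def integrate(self, chromosome: str, start: int, end: int) -> List[Tuple[str, Feature]]:
        merged = []
        for source in self.registry.list_sources():
            for feature in source.query(chromosome, start, end):
                merged.append((source.name, feature))
        # Order by genomic position so the combined view reads left to right.
        merged.sort(key=lambda pair: (pair[1].start, pair[1].end, pair[0]))
        return merged


# Role 4: the user, who only poses a region query and reads the merged view.
# All data below are made up for illustration.
registry = Registry()
registry.register(Source("genes", [Feature("1", 100, 500, "geneA"),
                                   Feature("1", 900, 1200, "geneB")]))
registry.register(Source("probes", [Feature("1", 450, 470, "probe7"),
                                    Feature("2", 10, 50, "probe9")]))

client = Client(registry)
result = client.integrate("1", 400, 1000)
for source_name, feature in result:
    print(f"{feature.start}-{feature.end}\t{feature.label}\t({source_name})")
```

The point of the design is that the user never contacts a source directly: the client asks the registry which sources exist, queries each one for the same region, and presents a single merged list. The data stay with the sources and are combined only at query time.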