GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity Database.

Paolo Pannarale, Domenico Catalano, Giorgio De Caro, Giorgio Grillo, Pietro Leo, Graziano Pappadà, Francesco Rubino, Gaetano Scioscia, Flavio Licciulli.

Abstract

BACKGROUND: The scientific biodiversity community increasingly perceives the need to bridge molecular and traditional biodiversity studies. We believe that information technology could play a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work aims primarily to build a bioinformatics infrastructure for integrating public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes the sequence and annotation data contained in the GenBank primary database ontologically and stores them locally.
METHODS: The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema (the Molecular Biodiversity Database) is designed to manage biodiversity information and is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by the Chado relational schema, an established standard in Generic Model Organism Databases. The distinctive feature of Chado, and also its strength, is the adoption of an ontological schema that makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software able to parse data, discover hidden information in GenBank entries and populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, which parses GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. Semantic Web technologies have been adopted in GIDL because of their advantages in data representation, integration and processing.
RESULTS AND CONCLUSIONS: Entries from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel. 180 have been loaded into the Molecular Biodiversity Database by GIDL. By combining the Sequence Ontology and the Chado schema, our system allows more expressive queries than the most commonly used sequence retrieval systems, such as Entrez or SRS.
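
The three-stage flow described under METHODS (Parser, Reasoner, DBFiller) can be illustrated with a minimal sketch. This is not the GIDL implementation. The reasoning step is plain Python standing in for CLIPS rules. The single containment rule, the function names, the sample record and the two-table layout are all assumptions made for illustration. Only simple `start..end` locations are handled.

```python
import re
import sqlite3
from typing import NamedTuple

# Hypothetical, heavily simplified feature-table excerpt (not a real GenBank entry).
SAMPLE = """\
     gene            1..200
                     /gene="abc"
     CDS             10..190
                     /gene="abc"
"""

class Feature(NamedTuple):
    key: str
    start: int
    end: int

def parse_features(text):
    """Parser stand-in: keep lines of the form '<key> <start>..<end>'."""
    features = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0].startswith("/"):
            continue  # qualifier lines and anything unexpected are skipped
        match = re.fullmatch(r"(\d+)\.\.(\d+)", parts[1])
        if match:
            features.append(Feature(parts[0], int(match.group(1)), int(match.group(2))))
    return features

def derive_part_of(features):
    """Reasoner stand-in (assumed rule): a CDS lying inside a gene is part_of it."""
    facts = []
    for i, child in enumerate(features):
        for j, parent in enumerate(features):
            if (child.key == "CDS" and parent.key == "gene"
                    and parent.start <= child.start and child.end <= parent.end):
                facts.append((i, "part_of", j))
    return facts

def to_sql(features, facts):
    """DBFiller stand-in: emit feature rows before the relationships that reference them."""
    statements = [
        ("INSERT INTO feature (id, type, fmin, fmax) VALUES (?, ?, ?, ?)",
         (i, f.key, f.start, f.end))
        for i, f in enumerate(features)
    ]
    statements += [
        ("INSERT INTO feature_relationship (subject_id, type, object_id) VALUES (?, ?, ?)",
         fact)
        for fact in facts
    ]
    return statements

def load(text):
    """Run the three stages against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # makes statement order matter
    # Simplified illustrative tables; not the actual Chado or GIDL schema.
    conn.execute("CREATE TABLE feature (id INTEGER PRIMARY KEY, type TEXT, "
                 "fmin INTEGER, fmax INTEGER)")
    conn.execute("CREATE TABLE feature_relationship ("
                 "subject_id INTEGER REFERENCES feature(id), type TEXT, "
                 "object_id INTEGER REFERENCES feature(id))")
    features = parse_features(text)
    for sql, params in to_sql(features, derive_part_of(features)):
        conn.execute(sql, params)
    conn.commit()
    return conn

conn = load(SAMPLE)
for row in conn.execute("SELECT subject_id, type, object_id FROM feature_relationship"):
    print(row)
```

With foreign keys enforced, the relationship rows can only be inserted after the feature rows they point to. This is one plausible reason a loader of this kind would need its SQL statements to be ordered.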

Year:  2012        PMID: 22536971      PMCID: PMC3303717          DOI: 10.1186/1471-2105-13-S4-S4

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169

