
IntergenicDB: a database for intergenic sequences.

Daniel Luis Notari¹, Aurione Molin², Vanessa Davanzo², Douglas Picolotto², Helena Graziottin Ribeiro², Scheila de Avila E Silva³.

Abstract

A whole genome contains not only coding regions but also non-coding regions. These are located between the end of a given coding region and the beginning of the following coding region. Information about the gene regulation process therefore resides in intergenic regions. Currently available databases offer no easy way to obtain intergenic regions. IntergenicDB was developed to integrate data on intergenic regions and their related gene information from NCBI databases. The main goal of IntergenicDB is to offer a user-friendly database of intergenic sequences from bacterial genomes. AVAILABILITY: http://intergenicdb.bioinfoucs.com/

Keywords: database; gene regulatory elements; intergenic sequences

Year: 2014 PMID: 25097383 PMCID: PMC4110431 DOI: 10.6026/97320630010381

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Locating coding regions and their regulatory sequences is an important task in the theoretical analysis of a given genome [1]. The gene regulation process is essential for understanding cellular responses to environmental perturbations and the virulence mechanisms of pathogens [2, 3]. Information about gene regulation resides in sequences such as promoters, transcription factor binding sites and terminators, which are located in non-coding regions [1]. In the simplest terms, a non-coding region in bacteria can be called an intergenic region: the part of the genome located between the last nucleotide of one coding region and the first nucleotide of the subsequent coding region. Large-scale genome sequencing methods and high-throughput technologies have increased the amount of genomic data in recent years [4]. NCBI is considered one of the largest and most complete biological databases. In addition, there are specialized databases dedicated to storing elements involved in prokaryotic gene regulation, such as RegulonDB [5] and EcoGene [6], among others. The available databases have provided input data for both motif discovery [7] and genomic element prediction [2, 8, 9, 10] approaches. However, none of these databases provides an easy-to-use way to download only intergenic regions together with their associated biological information. For instance, downloading intergenic regions from NCBI requires complex queries or a custom computer program. Not all researchers who need to run bioinformatic analyses have the same computational background. Hence, a dedicated intergenic database remains an important gap. In this context, IntergenicDB, a publicly available database, was developed for studying intergenic sequences. It contains intergenic regions from 20 bacterial genomes, together with information on the coding regions related to them.
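The definition of an intergenic region given above can be illustrated with a minimal Python sketch. It is not the pipeline used to build IntergenicDB: the names `Gene`, `IntergenicRegion` and `intergenic_regions` are hypothetical, the genome and gene coordinates are made-up toy data, and the sequence is treated as linear (the wrap-around region of a circular chromosome is ignored).

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based position of the first nucleotide of the coding region
    end: int    # 1-based position of the last nucleotide of the coding region


class IntergenicRegion(NamedTuple):
    left_gene: str
    right_gene: str
    start: int
    end: int
    sequence: str


def intergenic_regions(genome: str, genes: List[Gene]) -> List[IntergenicRegion]:
    """Return the gaps between consecutive coding regions on a linear sequence."""
    regions = []
    previous = None  # coding region with the right-most end seen so far
    for gene in sorted(genes, key=lambda g: g.start):
        if previous is not None and gene.start > previous.end + 1:
            start, end = previous.end + 1, gene.start - 1
            # Convert 1-based inclusive coordinates to a Python slice.
            regions.append(
                IntergenicRegion(previous.name, gene.name, start, end, genome[start - 1:end])
            )
        # Overlapping or nested genes must not create a spurious gap.
        if previous is None or gene.end > previous.end:
            previous = gene
    return regions


# Toy data (hypothetical): geneB and geneC overlap, so no region lies between them.
genome = "AAACCCGGGTTTAAACCC"
genes = [Gene("geneA", 1, 3), Gene("geneB", 7, 12), Gene("geneC", 12, 15), Gene("geneD", 17, 18)]
for region in intergenic_regions(genome, genes):
    print(region)
```

Tracking the right-most end seen so far, rather than simply the previous gene, keeps overlapping or nested coding regions from being reported as intergenic sequence.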

Methodology

IntergenicDB was developed as a free, structured and searchable source of intergenic sequences.

Dataset:

The intergenic regions of microbial genomes were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/).

Database design:

The data were organized using MySQL, a relational database management system that serves as the backend for storing data.
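The paper does not describe the actual schema, so the following is only a sketch of how such data might be laid out relationally. All table and column names are assumptions chosen to mirror the search options described for the database, and Python's built-in `sqlite3` module is used in place of MySQL purely so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: not the real IntergenicDB tables.
SCHEMA = """
CREATE TABLE organism (
    id      INTEGER PRIMARY KEY,
    species TEXT NOT NULL,
    family  TEXT
);
CREATE TABLE intergenic_region (
    id          INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(id),
    gene_name   TEXT,     -- gene located downstream of the region
    gene_symbol TEXT,
    main_role   TEXT,
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL,
    strand      TEXT CHECK (strand IN ('+', '-')),
    sequence    TEXT NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create an empty database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def regions_for_species(conn: sqlite3.Connection, species: str):
    """Return (gene_name, start_pos, end_pos, sequence) rows for one species."""
    query = """
        SELECT r.gene_name, r.start_pos, r.end_pos, r.sequence
        FROM intergenic_region AS r
        JOIN organism AS o ON o.id = r.organism_id
        WHERE o.species = ?
        ORDER BY r.start_pos
    """
    return conn.execute(query, (species,)).fetchall()
```

Separating organisms from regions avoids repeating species and family names on every row, which is one common reason to choose a relational backend for this kind of data.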

Development:

IntergenicDB was developed using the ASP.NET MVC (Model-View-Controller) website architecture, with C# as the programming language and IIS (Internet Information Services) as the web server hosting the IntergenicDB portal. Global updates to the database take place every three months, and individual changes are made when necessary. These updates are carried out by an administrator.

Database interface:

The IntergenicDB user interface is organized around three user types: public, common and administrator. The administration area is restricted to the manager of the database. An overview of IntergenicDB functionalities is provided in Figure 1.

Figure 1

IntergenicDB scientific workflow.

The initial page presents the aims of the database; the search page allows users to choose the information needed for intergenic sequence queries; the pages with updates and publications related to intergenic sequences, as well as other pages, must be accessed using a login and password.

Utility to the Biological Community

IntergenicDB gives users an easy way to access information related to intergenic sequences. This initial version of IntergenicDB supports internationalization for Portuguese and English, and it provides the following search options: (i) all intergenic regions belonging to a particular bacterial species or family; (ii) an intergenic region upstream of a given gene identified by its name or symbol; (iii) all intergenic regions upstream of genes with a specific GC nucleotide content or a given main role; (iv) all intergenic regions of a given nucleotide length; (v) one or more intergenic regions within a specific position range in the genome; (vi) all intergenic regions located on the forward or reverse DNA strand. Additionally, the user can run queries that combine the search options described above. To carry out a search, it is not necessary to complete all the fields. If one or more of them are left empty, the returned result shows all the available data for those fields.

After registering, users can download query results in .txt, .xml or .csv format. User registration is free of charge and is required only for statistical and database usage tracking purposes. Besides searching, common users can upload intergenic sequences under the supervision of the database administrator. In addition, the administrator can handle activities related to user registration and management, as well as database population and upgrades.
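The search behaviour described above, where an empty field places no restriction on the result, can be sketched in Python as follows. This is an illustration only and not the IntergenicDB implementation: the record layout, the function names `gc_content`, `search` and `to_csv`, and the example records are all hypothetical, and the GC content is computed on the intergenic sequence itself as a simplifying assumption.

```python
import csv
import io
from typing import Dict, List, Optional

FIELDS = ["species", "gene", "strand", "sequence"]


def gc_content(sequence: str) -> float:
    """Percentage of G and C nucleotides in a sequence (0.0 if it is empty)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def search(
    records: List[Dict[str, str]],
    *,
    species: Optional[str] = None,
    gene: Optional[str] = None,
    strand: Optional[str] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    min_gc: Optional[float] = None,
    max_gc: Optional[float] = None,
) -> List[Dict[str, str]]:
    """Keep records matching every filled-in field; a None field matches everything."""
    hits = []
    for rec in records:
        seq = rec["sequence"]
        gc = gc_content(seq)
        checks = [
            species is None or rec["species"] == species,
            gene is None or rec["gene"] == gene,
            strand is None or rec["strand"] == strand,
            min_length is None or len(seq) >= min_length,
            max_length is None or len(seq) <= max_length,
            min_gc is None or gc >= min_gc,
            max_gc is None or gc <= max_gc,
        ]
        if all(checks):
            hits.append(rec)
    return hits


def to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize records as CSV text, one of the download formats mentioned above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up example records; they are not taken from IntergenicDB.
records = [
    {"species": "Escherichia coli", "gene": "lacZ", "strand": "-", "sequence": "GGCCAATT"},
    {"species": "Escherichia coli", "gene": "araB", "strand": "+", "sequence": "ATATAT"},
    {"species": "Bacillus subtilis", "gene": "spo0A", "strand": "+", "sequence": "GCGCGCGCGC"},
]
print(to_csv(search(records, species="Escherichia coli", strand="+")))
```

Calling `search(records)` with no fields filled in returns every record, which matches the behaviour described for unfilled search fields.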

Future Developments

In future versions, we intend to improve the search and download areas so that users can further refine the results returned. Another aim is to integrate the database with tools available online that require a specific input format for intergenic sequences.

10 in total

1. Study of DNA binding sites using the Rényi parametric entropy measure.

Authors: A Krishnamachari; Vijnan moy Mandal
Journal: J Theor Biol Date: 2004-04-07 Impact factor: 2.691

2. Novel sequence-based method for identifying transcription factor binding sites in prokaryotic genomes.

Authors: Gurmukh Sahota; Gary D Stormo
Journal: Bioinformatics Date: 2010-08-31 Impact factor: 6.937

3. Structure and evolution of gene regulatory networks in microbial genomes.

Authors: Sarath Chandra Janga; J Collado-Vides
Journal: Res Microbiol Date: 2007-10-15 Impact factor: 3.992

4. N4: a precise and highly sensitive promoter predictor using neural network fed by nearest neighbors.

Authors: Amjad Askary; Ali Masoudi-Nejad; Roozbeh Sharafi; Amir Mizbani; Sobhan Naderi Parizi; Malihe Purmasjedi
Journal: Genes Genet Syst Date: 2009-12 Impact factor: 1.517

5. BacPP: bacterial promoter prediction--a tool for accurate sigma-factor specific assignment in enterobacteria.

Authors: Scheila de Avila E Silva; Sergio Echeverrigaray; Günther J L Gerhardt
Journal: J Theor Biol Date: 2011-08-03 Impact factor: 2.691

6. Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability.

Authors: Vetriselvi Rangannan; Manju Bansal
Journal: J Biosci Date: 2007-08 Impact factor: 1.826

7. RNIE: genome-wide prediction of bacterial intrinsic terminators.

Authors: Paul P Gardner; Lars Barquist; Alex Bateman; Eric P Nawrocki; Zasha Weinberg
Journal: Nucleic Acids Res Date: 2011-04-07 Impact factor: 16.971

8. Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress.

Authors: Huiquan Wang; Craig J Benham
Journal: BMC Bioinformatics Date: 2006-05-05 Impact factor: 3.169

9. RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more.

Authors: Heladia Salgado; Martin Peralta-Gil; Socorro Gama-Castro; Alberto Santos-Zavaleta; Luis Muñiz-Rascado; Jair S García-Sotelo; Verena Weiss; Hilda Solano-Lira; Irma Martínez-Flores; Alejandra Medina-Rivera; Gerardo Salgado-Osorio; Shirley Alquicira-Hernández; Kevin Alquicira-Hernández; Alejandra López-Fuentes; Liliana Porrón-Sotelo; Araceli M Huerta; César Bonavides-Martínez; Yalbi I Balderas-Martínez; Lucia Pannier; Maricela Olvera; Aurora Labastida; Verónica Jiménez-Jacinto; Leticia Vega-Alvarado; Victor Del Moral-Chávez; Alfredo Hernández-Alvarez; Enrique Morett; Julio Collado-Vides
Journal: Nucleic Acids Res Date: 2012-11-29 Impact factor: 16.971

10. EcoGene 3.0.

Authors: Jindan Zhou; Kenneth E Rudd
Journal: Nucleic Acids Res Date: 2012-11-28 Impact factor: 16.971
