
Junker: an intergenic explorer for bacterial genomes.

Jayavel Sridhar, Radhakrishnan Sabarinathan, Shanmugam Siva Balan, Ziauddin Ahamed Rafi, Paramasamy Gunasekaran, Kanagaraj Sekar.

Abstract

In the past few decades, scientists from all over the world have taken a keen interest in novel functional units within bacterial intergenic regions (IGRs), such as small regulatory RNAs, small open reading frames, pseudogenes, transposons, integrase binding attB/attP sites and repeat elements, and in the analysis of those "junk" regions for genomic complexity. Here we have developed a web server, named Junker, to facilitate the in-depth analysis of IGRs by examining their length distribution, four-quadrant plots, GC percentage and repeat details. Upon selection of a particular bacterial genome, the physical genome map is displayed as multiple loci with options to view any locus of interest in detail. In addition, an IGR statistics module has been created and implemented in the web server to analyze the length distribution of the IGRs and to understand the disordered grouping of IGRs across the genome by generating four-quadrant plots. The proposed web server is freely available at the URL http://pranag.physics.iisc.ernet.in/junker/.

Substances:
DNA, Intergenic

Year: 2011 PMID: 22196361 PMCID: PMC5054447 DOI: 10.1016/S1672-0229(11)60021-1

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

The genomic era has witnessed the sequencing of over 1,400 prokaryotic genomes, which enables scientists to analyze these genomes and gain clear insight into their functional aspects. Prokaryotic intergenic regions (IGRs) are a natural home to a variety of functional elements; thus, the annotation of IGRs is essential for a complete understanding of bacterial physiology. In past years, bacterial IGRs were routinely analyzed to identify structural non-coding RNAs (tRNA, rRNA and sRNA), which have multiple roles in the survival of the cell (1, 2). IGRs were found to carry important functional units such as transposons, integrase binding sites, transcription factor binding sites, small open reading frames (ORFs), pseudogenes and inverted repeats. Recently, traces of potential coding genes were also detected in IGRs. Accordingly, a few qualitative and quantitative studies were performed to characterize the dynamics of bacterial IGRs. One such study on the Escherichia coli K12-MG1655 genome compared the cumulative length distribution of IGRs between the two replicores (left and right) to identify the impact of IGRs on the distribution of sRNA-encoding genes. It found that most of the sRNA genes were located in the left replicore, although the proportions of IGRs were equal in both segments. It also pointed out that a high number of sRNAs resided within IGRs between 300 and 900 nucleotides in length. On the other hand, the total non-coding DNA or IGR content was found to be associated with the increased biological complexity of organisms. Although a few computational methods were developed to retrieve genes and their intergenic contexts (9, 10), no specific tool is available for identifying the distribution pattern and performing statistical analysis of IGRs at the genome level. Thus, we have developed a web-based tool, named Junker, to identify the length distribution pattern of IGRs in a complete genome.
The proposed server can also be used to calculate the cumulative intergenic content of the four equal segments of the genome (quadrants) or of the left and right replicores (2, 7). The flanking distance between neighboring genes provides a measurement of local gene density (LGD), which indicates that the quadrant-specific intergenic content has an inverse relationship with the LGD and is positively correlated with variable segments of the genome (12, 13). The proposed web server is freely available at the URL http://pranag.physics.iisc.ernet.in/junker/.

Web Server

Implementation and utilities

The proposed web server integrates and reports information about the IGRs present in bacterial genomes. To create a local intergenic database, all the available bacterial genomes were downloaded from the NCBI portal (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/). Next, the corresponding IGRs were extracted from the bacterial genomes by excluding the protein- and RNA-encoding regions. Accordingly, two options are implemented in the server: one based on protein annotations only, and the other on both protein and RNA annotations. The web interface thus enables users to search for distinct IGRs based on their length and location. By default, Junker searches for IGRs with a minimum length of 20 nucleotides, but an option is also provided to increase the minimum length. In addition, the nucleotide sequences extracted from IGRs are subjected to various analyses. For example, the experimentally determined pseudogenes are mapped using the annotated gene information from the GenBank file. The presence of functional protein-coding regions or ORFs in the IGRs is predicted using the gene prediction tools GeneMark2.5, Glimmer3 and Prodigal2.50. In addition, the identical, tandem and inverted repeats in the IGR sequences are identified using FAIR and the “etandem” and “einverted” programs from the EMBOSS suite. All the IGR extraction and file handling modules and the search engine were designed and implemented using Perl/CGI scripts. The histograms and circular maps presented were created using the GD graph module (v1.43) (http://search.cpan.org/~mverb/GDGraph-1.43/). The web server runs under the Solaris (v10.0) operating system on a 64-bit quad-core Intel Xeon 5430 processor of 2.67 GHz with 4 GB of random access memory. The web server is implemented with user-friendly options to give explicit results. Presently, the local genome database of Junker contains 1,023 bacterial genomes.
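The extraction step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the server's Perl/CGI code; the function name `extract_igrs`, the 1-based inclusive coordinate convention, and the treatment of the genome as linear (no IGR wrapping around the origin) are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; assumed convention


def extract_igrs(features: List[Interval], genome_length: int,
                 min_length: int = 20) -> List[Interval]:
    """Return gaps between annotated features that are at least min_length long.

    Hypothetical helper: the genome is treated as linear, so a gap spanning
    the origin of a circular chromosome is reported as two separate pieces.
    """
    igrs: List[Interval] = []
    covered_until = 0  # last position covered by any feature seen so far
    for start, end in sorted(features):
        gap_start, gap_end = covered_until + 1, start - 1
        if gap_end - gap_start + 1 >= min_length:
            igrs.append((gap_start, gap_end))
        covered_until = max(covered_until, end)  # copes with overlapping features
    # Trailing gap after the last annotated feature
    if genome_length - covered_until >= min_length:
        igrs.append((covered_until + 1, genome_length))
    return igrs


# Toy coordinates, invented for illustration only
features = [(100, 400), (350, 600), (630, 900), (910, 1000)]
print(extract_igrs(features, genome_length=1200))
# [(1, 99), (601, 629), (1001, 1200)]
```

Under this sketch, the two annotation options would correspond to passing either the protein-coding features alone or the protein-coding and RNA features together as `features`.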

Features

Users can select their genome of interest from the list provided in the index page of the server. Additional options are provided for the users to change the minimum length of IGRs and their location.

Selection of IGR of interest

The web server enables users to select a particular region of the whole genome using the physical genome map viewer. In general, the selected region covers an interval of 200,000 base pairs and is used to list the IGRs present in that region. The detailed report of IGRs extracted from the selected map position contains their start and end positions, the IDs of the adjacent flanking genes with their length information, the different types of repeat elements and the known pseudogenes present in the IGR sequence (Figure S1). In addition, options are provided for users to download (in FASTA format) or display the IGR sequence of interest.
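As a rough illustration of the region selection and FASTA export described above, the following Python sketch filters IGRs by a map window and formats one IGR as a FASTA record. The function names, the rule that an IGR is listed when it overlaps the window at all, and the header layout are all assumptions for the example and are not taken from the server.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; assumed convention


def igrs_in_window(igrs: List[Interval], window_start: int,
                   window_size: int = 200_000) -> List[Interval]:
    """List IGRs that overlap the selected map window (assumed selection rule)."""
    window_end = window_start + window_size - 1
    return [(s, e) for s, e in igrs if s <= window_end and e >= window_start]


def igr_to_fasta(genome_seq: str, igr: Interval,
                 genome_id: str = "genome", width: int = 60) -> str:
    """Format one IGR as a FASTA record; the header layout is hypothetical."""
    start, end = igr
    seq = genome_seq[start - 1:end]  # convert 1-based inclusive to a Python slice
    header = f">{genome_id}_IGR_{start}_{end} length={len(seq)}"
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines)


# Toy data, invented for illustration only
toy_igrs = [(1, 99), (150_000, 150_500), (250_000, 250_100)]
print(igrs_in_window(toy_igrs, window_start=1))
print(igr_to_fasta("ACGT" * 5, (3, 10), genome_id="demo"))
```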

IGR statistics module

The IGR statistics module has two major utilities: one calculates the length distribution and the other generates the four-quadrant plots. The length distribution of the IGRs across different length intervals is represented as an interactive histogram, which also enables users to retrieve the IGR sequences in FASTA format. Similarly, the cumulative lengths of the IGRs within the four quadrants of the genome are displayed as a pie chart, referred to as a four-quadrant plot. Four scale points are used in the pie chart to divide the complete genome into four quadrants.
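The two statistics described above can be approximated with a short Python sketch. It is illustrative only: the bin width of 100 nucleotides, the function names, and the decision to split an IGR that straddles a quadrant boundary between the two quadrants are assumptions, since the text does not specify how the server handles these details.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; assumed convention


def length_histogram(igrs: List[Interval], bin_width: int = 100) -> Dict[int, int]:
    """Count IGRs per length bin; each key is the lower bound of its bin."""
    counts = Counter(((end - start + 1) // bin_width) * bin_width
                     for start, end in igrs)
    return dict(sorted(counts.items()))


def quadrant_totals(igrs: List[Interval], genome_length: int) -> List[int]:
    """Sum the IGR bases falling in each of four equal genome segments.

    An IGR straddling a quadrant boundary is split between both quadrants
    (an assumption; the server may assign it differently).
    """
    totals = [0, 0, 0, 0]
    bounds = [genome_length * q // 4 for q in range(5)]  # 0, L/4, L/2, 3L/4, L
    for start, end in igrs:
        for q in range(4):
            lo, hi = bounds[q] + 1, bounds[q + 1]  # quadrant q spans lo..hi
            overlap = min(end, hi) - max(start, lo) + 1
            if overlap > 0:
                totals[q] += overlap
    return totals


# Toy IGR coordinates, invented for illustration only
toy_igrs = [(1, 99), (601, 629), (1001, 1200)]
print(length_histogram(toy_igrs))        # {0: 2, 200: 1}
print(quadrant_totals(toy_igrs, 1200))   # [99, 0, 29, 200]
```

The four totals are the quantities a four-quadrant pie chart would display, and the histogram dictionary corresponds to the bar heights of the length distribution.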

Application

The genome of Sodalis glossinidius str. Morsitans (NC_007712) is reported to have the least coding capacity among the prokaryotes. Analysis of the S. glossinidius genome using the method indicated by Taft et al. shows that the genome has an ncDNA/tgDNA ratio of only 50.91%. This was confirmed in our study by comparing the S. glossinidius genome with other Gammaproteobacteria genomes (Figure S2). We analyzed the S. glossinidius genome sequence using Junker with the default options and found a total of 1,837 IGRs. Moreover, the length and positional distributions of these IGRs were analyzed using the IGR statistics module (Figure S3). Figure S3A indicates that the genome contains many IGRs of different lengths, with a maximum of 16 kb. In addition, the four-quadrant plot of the genome indicates that the disordered grouping of IGRs accumulated mostly in the fourth quadrant compared with the others (Figure S3B). Furthermore, a similar analysis of other genomes has shown that Orientia tsutsugamushi Boryong (NC_009488) and Thermocrinis albus DSM14484 (NC_013894) have the highest (51.21%) and the lowest (3.98%) IGR ratios, respectively. The calculated percentages of known CDS (percentage of the genome coding for proteins) and IGR (percentage of IGR in the genome) for 1,023 bacterial genomes are available as a table in the web server (http://pranag.physics.iisc.ernet.in/cgi-bin/junker/table.pl).
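An IGR percentage of the kind tabulated above can be computed as the share of genome positions lying in IGRs. The sketch below is a simplified Python illustration with invented toy numbers; the function name is hypothetical, and the exact definitions used by the server and by the ncDNA/tgDNA method of Taft et al. may differ (for example, in how RNA genes or short gaps are counted).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive; assumed convention


def igr_percentage(igrs: List[Interval], genome_length: int) -> float:
    """Percentage of the genome covered by non-overlapping IGRs (assumed definition)."""
    igr_bases = sum(end - start + 1 for start, end in igrs)
    return 100.0 * igr_bases / genome_length


# Toy IGR coordinates, invented for illustration only
toy_igrs = [(1, 99), (601, 629), (1001, 1200)]
print(round(igr_percentage(toy_igrs, 1200), 2))  # 27.33
```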

Conclusion

Junker is a web-based tool designed to efficiently access and analyze the IGRs in bacterial genomes. The selected query genome is represented as a physical genome map, which allows users to select a genome region of interest. In addition, the IGR sequences are checked for the presence of known pseudogenes, probable coding regions and repetitive elements. Moreover, the length distribution of IGRs over the whole genome is shown as histograms, and their disordered grouping is plotted on a four-quadrant pie chart. We believe that Junker will be helpful for the in-depth analysis of IGRs.

Authors’ contributions

JS conceived and coordinated the construction of the web server. JS and RS drafted the manuscript. RS and SSB developed the web interface and the scripts for prediction. ZAR and KS improved the web server and revised the manuscript. PG conceived the idea of the study and helped revise the manuscript. All authors read and approved the final manuscript.

18 in total

1. EMBOSS: the European Molecular Biology Open Software Suite.

Authors: P Rice; I Longden; A Bleasby
Journal: Trends Genet Date: 2000-06 Impact factor: 11.639

2. Improved microbial gene identification with GLIMMER.

Authors: A L Delcher; D Harmon; S Kasif; O White; S L Salzberg
Journal: Nucleic Acids Res Date: 1999-12-01 Impact factor: 16.971

Review 3. Insertion sequences in prokaryotic genomes.

Authors: Patricia Siguier; Jonathan Filée; Michael Chandler
Journal: Curr Opin Microbiol Date: 2006-08-28 Impact factor: 7.934

4. Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host.

Authors: Hidehiro Toh; Brian L Weiss; Sarah A H Perkin; Atsushi Yamashita; Kenshiro Oshima; Masahira Hattori; Serap Aksoy
Journal: Genome Res Date: 2005-12-19 Impact factor: 9.043

5. The complete genome sequence of Escherichia coli K-12.

Authors: F R Blattner; G Plunkett; C A Bloch; N T Perna; V Burland; M Riley; J Collado-Vides; J D Glasner; C K Rode; G F Mayhew; J Gregor; N W Davis; H A Kirkpatrick; M A Goeden; D J Rose; B Mau; Y Shao
Journal: Science Date: 1997-09-05 Impact factor: 47.728

6. Prodigal: prokaryotic gene recognition and translation initiation site identification.

Authors: Doug Hyatt; Gwo-Liang Chen; Philip F Locascio; Miriam L Land; Frank W Larimer; Loren J Hauser
Journal: BMC Bioinformatics Date: 2010-03-08 Impact factor: 3.169

7. The Orientia tsutsugamushi genome reveals massive proliferation of conjugative type IV secretion system and host-cell interaction genes.

Authors: Nam-Hyuk Cho; Hang-Rae Kim; Jung-Hee Lee; Se-Yoon Kim; Jaejong Kim; Sunho Cha; Sang-Yoon Kim; Alistair C Darby; Hans-Henrik Fuxelius; Jun Yin; Ju Han Kim; Jihun Kim; Sang Joo Lee; Young-Sang Koh; Won-Jong Jang; Kyung-Hee Park; Siv G E Andersson; Myung-Sik Choi; Ik-Sang Kim
Journal: Proc Natl Acad Sci U S A Date: 2007-05-02 Impact factor: 11.205

8. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans.

Authors: Brian J Haas; Sophien Kamoun; Michael C Zody; Rays H Y Jiang; Robert E Handsaker; Liliana M Cano; Manfred Grabherr; Chinnappa D Kodira; Sylvain Raffaele; Trudy Torto-Alalibo; Tolga O Bozkurt; Audrey M V Ah-Fong; Lucia Alvarado; Vicky L Anderson; Miles R Armstrong; Anna Avrova; Laura Baxter; Jim Beynon; Petra C Boevink; Stephanie R Bollmann; Jorunn I B Bos; Vincent Bulone; Guohong Cai; Cahid Cakir; James C Carrington; Megan Chawner; Lucio Conti; Stefano Costanzo; Richard Ewan; Noah Fahlgren; Michael A Fischbach; Johanna Fugelstad; Eleanor M Gilroy; Sante Gnerre; Pamela J Green; Laura J Grenville-Briggs; John Griffith; Niklaus J Grünwald; Karolyn Horn; Neil R Horner; Chia-Hui Hu; Edgar Huitema; Dong-Hoon Jeong; Alexandra M E Jones; Jonathan D G Jones; Richard W Jones; Elinor K Karlsson; Sridhara G Kunjeti; Kurt Lamour; Zhenyu Liu; Lijun Ma; Daniel Maclean; Marcus C Chibucos; Hayes McDonald; Jessica McWalters; Harold J G Meijer; William Morgan; Paul F Morris; Carol A Munro; Keith O'Neill; Manuel Ospina-Giraldo; Andrés Pinzón; Leighton Pritchard; Bernard Ramsahoye; Qinghu Ren; Silvia Restrepo; Sourav Roy; Ari Sadanandom; Alon Savidor; Sebastian Schornack; David C Schwartz; Ulrike D Schumann; Ben Schwessinger; Lauren Seyer; Ted Sharpe; Cristina Silvar; Jing Song; David J Studholme; Sean Sykes; Marco Thines; Peter J I van de Vondervoort; Vipaporn Phuntumart; Stephan Wawra; Rob Weide; Joe Win; Carolyn Young; Shiguo Zhou; William Fry; Blake C Meyers; Pieter van West; Jean Ristaino; Francine Govers; Paul R J Birch; Stephen C Whisson; Howard S Judelson; Chad Nusbaum
Journal: Nature Date: 2009-09-09 Impact factor: 49.962

9. A survey of small RNA-encoding genes in Escherichia coli.

Authors: Ruth Hershberg; Shoshy Altuvia; Hanah Margalit
Journal: Nucleic Acids Res Date: 2003-04-01 Impact factor: 16.971

10. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

Authors: Hélène Chiapello; Annie Gendrault; Christophe Caron; Jérome Blum; Marie-Agnès Petit; Meriem El Karoui
Journal: BMC Bioinformatics Date: 2008-11-27 Impact factor: 3.169
