
DiProGB: the dinucleotide properties genome browser.

Maik Friedel, Swetlana Nikolajewa, Jürgen Sühnel, Thomas Wilhelm.

Abstract

MOTIVATION: DiProGB is a new, easy-to-use genome browser that encodes the primary nucleotide sequence by thermodynamic and geometric dinucleotide properties. The nucleotide sequence is thus converted into a sequence graph. This visualization, supported by various graph manipulation options, facilitates genome analyses, because the human brain can process visual information better than textual information. In addition, DiProGB can identify genomic regions where certain physical properties are more conserved than the nucleotide sequence itself. Most of the DiProGB tools can be applied to both the primary nucleotide sequence and the sequence graph. They include motif and repeat searches as well as statistical analyses. DiProGB adds a new dimension to common genome analysis approaches by taking into account the physical properties of DNA and RNA.
AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available for download at http://diprogb.fli-leibniz.de, implemented in C++ and supported on MS Windows and Linux (using e.g. WineHQ).

Year:  2009        PMID: 19605418      PMCID: PMC2752610          DOI: 10.1093/bioinformatics/btp436

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Computational genome analysis aims at understanding the information encoded in genomes. So far this has been done almost exclusively by analyzing the character string of the primary sequence (Cline and Kent, 2009). However, the conservation of certain physical properties can be more important than conservation of the nucleotide sequence itself, especially for non-coding DNA (Babbitt and Kim, 2008). It has been shown that sequence-constraint algorithms often fail to identify non-coding functional elements because these methods neglect the 3D structure of DNA. By incorporating hydroxyl radical cleavage patterns, which interrogate the solvent-accessible surface area of DNA, it was recently found that 12% of the bases in the human genome are evolutionarily constrained (Parker et al., 2009). This is twice the fraction detected by common sequence-based algorithms.

We have developed the new genome browser DiProGB, which considers physico-chemical properties of nucleotide sequences. It helps to detect functionally relevant motifs that cannot be found by analyzing the primary nucleotide sequence alone. More specifically, DiProGB encodes a DNA or RNA sequence by thermodynamic or geometric dinucleotide properties. In addition, character-based sequence information such as GC or purine content is also encoded on the dinucleotide level. Plotting the dinucleotide property values versus the sequence position leads to a graphical representation of the sequence that we call a sequence graph (Fig. 1). The rationale for exploiting dinucleotide properties is the widely accepted nearest-neighbor model, which states that thermodynamic properties of nucleic acids can be understood and predicted by considering dinucleotide contributions (Turner, 1996; Zimm and Bragg, 1959). This is also the basis for RNA secondary structure predictions (Yoon et al., 1975).

By default, DiProGB offers 10 dinucleotide property sets. Further properties can be downloaded from the dinucleotide property database DiProDB (http://diprodb.fli-leibniz.de) (Friedel et al., 2009), which contains more than 100 such sets. A corresponding editor also enables manual input of new properties.
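
The dinucleotide encoding described above can be illustrated with a minimal Python sketch. It is not DiProGB's implementation (the program itself is written in C++). The names `GC_CONTENT` and `sequence_graph` are hypothetical, and the property table simply counts the G or C bases in each dinucleotide; that 0 to 2 scaling is an assumption chosen for illustration and does not reproduce values stored in DiProGB or DiProDB.

```python
from itertools import product

# Illustrative property table: number of G or C bases in each dinucleotide.
# ASSUMPTION: the 0/1/2 scaling is for demonstration only; it is not taken
# from DiProGB or DiProDB.
GC_CONTENT = {
    a + b: (a in "GC") + (b in "GC")
    for a, b in product("ACGT", repeat=2)
}


def sequence_graph(seq, prop=GC_CONTENT):
    """Map every overlapping dinucleotide of seq to its property value.

    A sequence of n nucleotides yields n - 1 values. Letters outside
    A, C, G, T are not handled here and raise a KeyError.
    """
    seq = seq.upper()
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


print(sequence_graph("ATGCGC"))  # [0, 1, 2, 2, 2]
```

Plotting the returned values against their index gives the kind of curve the text calls a sequence graph. Any other property set would be used the same way by passing a different 16-entry dictionary as `prop`.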
Fig. 1.

The full genome of the chloroplast of Euglena gracilis is shown in the main window of DiProGB. The sequence graph in the middle is encoded by the physicochemical dinucleotide property stacking energy (Pérez et al., 2004) and smoothed with a shifting window of 800 nt. The color coding of annotated features on both strands is explained in the color list (right window). Note the markedly different graph shape for the rRNA genes in green. The bottom panel shows the feature graph.

DiProGB loads sequences and annotated feature information from GenBank or FASTA files and from corresponding feature files (e.g. GFF or PTT) (Leonard et al., 2007). It is also possible to open multiple FASTA files or raw sequence data, and there is an option for downloading sequence information from the NCBI web site http://www.ncbi.nlm.nih.gov/. All annotated features such as genes, exons, introns or repeat regions and the corresponding qualifiers such as gene name, product and function can be addressed separately and colored specifically. All or parts of the annotated information can be displayed for either a single strand or for both strands together. Overlapping features are visualized by stacked bars in the so-called feature graph below the sequence graph (Fig. 1).

The sequence graph can be smoothed by a shifting window technique. Using the mouse wheel, the shifting window size as well as the graph amplitude and the zoom status can be changed in real time. DiProGB offers four lists for information handling: (i) the ‘Sequence list’ allows switching between pre-selected sequences; (ii) the ‘DiPro list’ contains the loaded dinucleotide property sets; (iii) the ‘Feature list’ allows searching for features and qualifiers and highlighting them in the sequence graph; and (iv) the ‘Color list’ displays all colors used for indicating the annotated information.
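
The shifting-window smoothing mentioned above can be sketched as a plain moving average. This is an illustrative Python sketch, not DiProGB's code: the function name `smooth` is hypothetical, and the treatment of the sequence ends (only full windows are averaged, so the output is shorter than the input) is an assumption, since the text does not say how DiProGB handles them.

```python
def smooth(values, window):
    """Moving average: mean of every run of `window` consecutive values.

    ASSUMPTION: only complete windows are averaged, so the result has
    len(values) - window + 1 entries.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])      # sum of the first window
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


print(smooth([0, 1, 2, 2, 2], 2))  # [0.5, 1.5, 2.0, 2.0]
```

Keeping a running sum makes each step constant-time regardless of the window size, which matters when a window of several hundred nucleotides is moved along a whole genome.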
DiProGB combines sequence analyses based on both the primary sequence character string and the sequence graph representation. The latter often allows pattern recognition by visual inspection (e.g. identification of large repeats or of certain nucleotide distributions). In addition, DiProGB offers tools for systematic motif and repeat searches in both the character string and the sequence graph. The motif search algorithms are based on Gusfield's Z-boxes (Gusfield, 1997). The repeat finder searches for maximal and supermaximal repeats. It is based on suffix arrays, which are among the most efficient methods for repeat search (Abouelhoda et al., 2004). DiProGB also offers a fast Fourier transformation for the identification of sequence periodicities.

DiProGB provides two types of statistical analyses. First, the user can calculate mean values for either a partial or the complete sequence. This allows, for example, comparing the mean value of a physical property (e.g. entropy) for a given feature (e.g. a gene) with the corresponding mean value of the whole genome. Second, there are so-called position-specific statistics. Here, selected sub-sequences, for example coding sequences, are aligned relative to a specific sequence position (e.g. 100 nt upstream of the start) and mean values are calculated for each position in the alignment. Position-specific statistics are a powerful tool for detecting common motifs in annotated features.

DiProGB is a standalone computer program written in VC++. It has been optimized to cope with large genomes. The program has been developed under the Microsoft Windows operating system. It can, however, also be used under Linux, Mac, BSD and Solaris after installing, for example, the program WineHQ (http://winehq.org). A more detailed description of DiProGB is available at http://diprogb.fli-leibniz.de.

In summary, DiProGB is a new genome browser for enhanced genome analysis. Its application will lead to deeper insight into the organization and functioning of the genome.
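
The Z-box idea behind the motif search can be illustrated with a textbook Z-algorithm for exact matching. The sketch below is in Python and is not DiProGB's implementation; the names `z_array` and `find_motif` are hypothetical. It works on any sequence of comparable items, so the same function can search a nucleotide string or a list of property values. Comparing numeric values for exact equality is an assumption made for simplicity; real-valued sequence graphs may need a tolerance, which the text does not specify.

```python
def z_array(s):
    """Z[i] = length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # current Z-box covers s[left:right]
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the current Z-box.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def find_motif(motif, text):
    """Return 0-based start positions of exact occurrences of motif in text."""
    motif, text = list(motif), list(text)
    if not motif:
        return []
    sentinel = object()  # compares unequal to every sequence element
    z = z_array(motif + [sentinel] + text)
    m = len(motif)
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] == m]


print(find_motif("GC", "ATGCGC"))             # [2, 4]
print(find_motif([2, 2], [0, 1, 2, 2, 2]))    # [2, 3]
```

The first call searches the character string; the second searches a list of dinucleotide property values, mirroring the statement that the tools apply to both representations. The running time is linear in the combined length of motif and text.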
  8 in total

1.  The relative flexibility of B-DNA and A-RNA duplexes: database analysis.

Authors:  Alberto Pérez; Agnes Noy; Filip Lankas; F Javier Luque; Modesto Orozco
Journal:  Nucleic Acids Res       Date:  2004-11-23       Impact factor: 16.971

2.  Understanding genome browsing.

Authors:  Melissa S Cline; W James Kent
Journal:  Nat Biotechnol       Date:  2009-02       Impact factor: 54.908

3.  Common file formats.

Authors:  Shonda A Leonard; Timothy G Littlejohn; Andreas D Baxevanis
Journal:  Curr Protoc Bioinformatics       Date:  2007-01

4.  Thermodynamics of base pairing.

Authors:  D H Turner
Journal:  Curr Opin Struct Biol       Date:  1996-06       Impact factor: 6.809

5.  The kinetics of codon-anticodon interaction in yeast phenylalanine transfer RNA.

Authors:  K Yoon; D H Turner; I Tinoco
Journal:  J Mol Biol       Date:  1975-12-25       Impact factor: 5.469

6.  DiProDB: a database for dinucleotide properties.

Authors:  Maik Friedel; Swetlana Nikolajewa; Jürgen Sühnel; Thomas Wilhelm
Journal:  Nucleic Acids Res       Date:  2008-09-19       Impact factor: 16.971

7.  Local DNA topography correlates with functional noncoding regions of the human genome.

Authors:  Stephen C J Parker; Loren Hansen; Hatice Ozel Abaan; Thomas D Tullius; Elliott H Margulies
Journal:  Science       Date:  2009-03-12       Impact factor: 47.728

8.  Inferring natural selection on fine-scale chromatin organization in yeast.

Authors:  G A Babbitt; Y Kim
Journal:  Mol Biol Evol       Date:  2008-05-29       Impact factor: 16.240
