
DensityMap: a genome viewer for illustrating the densities of features.

Sébastien Guizard, Benoît Piégu, Yves Bigot.

Abstract

BACKGROUND: Several tools are available for visualizing genomic data. Some, such as GBrowse and JBrowse, are very efficient for small genomic regions, but they are not suitable for entire genomes. Others, like PhenoGram and CViT, can be used to visualise whole genomes, but are not designed to display very dense genomic features (e.g. interspersed repeats). We have therefore developed DensityMap, a lightweight Perl program that can display the densities of several features (genes, ncRNA, CpG, etc.) along chromosomes on the scale of the whole genome. A critical advantage of DensityMap is that it uses GFF annotation files directly to compute the densities of features without needing additional information from the user. The resulting picture is readily configurable, and the colour scales used can be customized to best fit the data plotted.
RESULTS: DensityMap runs on Linux with few requirements, so that users can easily and quickly visualize the distributions and densities of genomic features for an entire genome. The input is GFF3-formatted data representing chromosomes (linkage groups or pseudomolecules) and sets of features, which are used to calculate the density maps. In practice, DensityMap uses a tiling window to compute the density of one or more features and the number of bases covered by these features along chromosomes. The densities are represented by colour scales that can be customized to highlight critical points. DensityMap can compare the distributions of features; it calculates several chromosomal density maps in a single image, each of which describes a different genomic feature. It can also use the genome nucleotide sequence to compute and plot a density map of the GC content along chromosomes.
CONCLUSIONS: DensityMap is a compact, easy-to-use tool for displaying the distribution and density of all types of genomic features within a genome. It is flexible enough to visualize the densities of several types of features in a single representation. The images produced are readily configurable and their SVG format ensures that they can be edited.

Keywords:  Annotation; GFF; Genome; Visualization

Year:  2016        PMID: 27153821      PMCID: PMC4858867          DOI: 10.1186/s12859-016-1055-0

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

Visualizing the ever-increasing amounts of DNA sequence data for genomic purposes is becoming a great challenge [1]. One solution is to develop genome browsers. The first, and probably the most popular, was the UCSC Genome Browser, which was released in 2002 and used to display human genomic data [2]. Several others, including GBrowse, JBrowse, Abrowse and Annot-J [3], are now available. They are ergonomically more efficient than the original and include new functions, such as collaborative annotation with Web Apollo [4]. These browsers are useful for displaying discrete chromosome regions but are not suitable for visualizing whole chromosomes. Other tools have been developed for that purpose. One of the most widely used is Circos [5, 6], which represents chromosomes by arranging them on a circle. It can also be used to plot annotations, quantitative data and relationships between parts of different chromosomes or genomes [7]. However, Circos representations become dense as their complexity increases, which makes them harder to read.

Two new programs designed to simplify visualization of whole chromosome sequences were released recently. PhenoGram [8] represents chromosomes and uses ideograms, lines, and different coloured symbols to locate information like phenotypes, genes, CNVs, SNPs, etc. While the PhenoGram web interface is user-friendly, it requires the input files to be in a specific tabulated format rather than a standard format like Generic Feature Format (GFF), the most common format for annotation files. It also cannot display the density of a specific feature at a given position in a chromosome. CViT (Chromosome Visualization Tool) [9] circumvents these limitations. It can represent chromosome contents from a GFF file, is readily configurable and the output image can be customized. CViT can also plot the densities of some features along chromosomes using histograms placed beside the chromosome representation. This tool produces reliable images when the features are not too dense but becomes limited when the density of a feature like interspersed repeats or DNA motifs is high. CViT must also use a GFF file that contains the density of a feature for a given set of windows along a chromosome. As CViT is not designed to compute these densities, the GFF file must be revised each time the window width is changed. We have therefore developed a program, DensityMap.pl, inspired by CViT, which produces maps of the densities of one or more types of features, chromosome by chromosome, for the whole genome.

Implementation

DensityMap is a Perl script that is run from the command line and uses the GD::SVG Perl package to produce SVG pictures. DensityMap computes a representation of the density of a feature on chromosomes using as input one GFF file (GFF2, GFF2.5 or GFF3) describing a chromosome. The program plots as many density maps along a chromosome as there are features specified. For each feature, it can plot a density map for the plus strand, the minus strand, both strands separately, the two strands combined, or the plus, minus and combined strands together. Density is computed using a non-overlapping tiling window whose length is either fixed by the user or computed automatically so that the output image fits within the maximum image size. All this information can be set by the user on the command line. DensityMap also automatically calculates the density of a feature for each pixelized region of a chromosome, whatever the representation scale used. The way the density of a feature varies along a chromosome is represented using a colour scale from 0 to 100 %. A single colour scale can be used for all the features investigated, or each feature can have its own colour scale. Like CViT, DensityMap.pl produces visualizations that are fully configurable, in Scalable Vector Graphics (SVG) format. This makes it easy to edit high-quality images for publication. The program also includes graphical options for configuring almost all elements (margins, map width, scale, etc.) of the image. The options are shown in Table 1.
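The automatic choice of window length can be illustrated with a short Python sketch. It assumes that each window is drawn as one pixel along the chromosome axis and ignores margins, title and scale; this mapping is an assumption made for illustration and is not taken from the DensityMap source. The function names are hypothetical.

```python
import math


def auto_window_length(chrom_lengths, max_height_px):
    """Return the smallest window length (bp) for which the longest
    chromosome needs at most max_height_px windows.

    Assumption: one window is drawn as one pixel along the chromosome;
    margins, title and scale are ignored.
    """
    if max_height_px <= 0:
        raise ValueError("max_height_px must be positive")
    longest = max(chrom_lengths)
    return max(1, math.ceil(longest / max_height_px))


def n_windows(chrom_length, window):
    """Number of non-overlapping windows needed to tile a chromosome."""
    return math.ceil(chrom_length / window)


if __name__ == "__main__":
    lengths = [25_000_000, 1_300_000]  # made-up chromosome lengths in bp
    window = auto_window_length(lengths, max_height_px=1000)
    print(window, [n_windows(length, window) for length in lengths])
```

With a user-fixed window (the -sc option), the same `n_windows` arithmetic gives the number of windows per chromosome for a given window length.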
Table 1. DensityMap options

Short | Long | Type | Description

Mandatory options
-i | --input | string | GFF file name
-re | --region_file | string | A BED file describing the sequence regions to plot. It allows specific regions to be plotted rather than the whole sequence. Example of file content: 2L[TAB]100000[TAB]200000 (line 1), 2R[TAB]300000[TAB]450000 (line 2)
-o | --output_img_name | string | Output image name
-ty | --type_to_draw | string | Type (column 3 of GFF) to draw, strand(s) to plot and colour scale to use. Type: Match, gene, CDS, etc. Strand: "-" -> strand - (1 density map, or DM); "+" -> strand + (1 DM); "both" -> strand - and strand + (2 DM); "fused" -> combination of strand - and strand + (1 DM); "all" -> strand -, strand + and fused (3 DM). Format: "Type1 = Strand = colour_scale", e.g. "match = all = 7;gene = both = 4;CDS = fused = 10"

Generic options
-for | --force | none | Automatically answers yes to picture size validation
-v | --verbose | none | Activate verbose mode
-h | --help | none | Print help

Density options
-c | --colour_scale | integer | Number of the colour scale to use
-sc | --scale_factor | integer | Window length (in base pairs) to use
-a | --auto_scale_factor | integer | Maximum picture height in pixels
-ro | --rounding_method | string | Rounding of densities with floor or ceiling
-gc | --gc | integer | Colour scale number for the density map of the GC % of the chromosome. Requires the presence of the sequence in the ##FASTA section of the GFF file

Graphical options
-ti | --title | string | Picture title
-w | --win_size | integer | Picture height in pixels
-sh | --show_scale | integer | Draw scale, with the integer indicating the maximum number of ticks to print on the scale
-str_w | --strand_width | integer | Strand width in pixels
-str_s | --strand_space | integer | Space between strands in pixels
-sp | --space_chr | integer | Space between chromosomes
-lm | --lmargin | integer | Left margin in pixels
-rm | --rmargin | integer | Right margin in pixels
-tm | --tmargin | integer | Top margin in pixels
-bm | --bmargin | integer | Bottom margin in pixels
-ba | --background | integer | Picture background colour
-la | --label_strand_rotation | integer | Rotation (in degrees) of strand label
-ft_f | --ft_family | string | Text font
-ft_s | --ft_size | integer | Font size
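To make the -ty format concrete, the following Python sketch parses a "Type = Strand = colour_scale" string and counts the density maps each entry would produce, using the strand keywords listed for -ty. It is an illustration only: the function name, the returned dictionary layout and the tolerance for spaces around "=" are assumptions, not DensityMap's actual Perl parsing code.

```python
# Number of density maps drawn for each strand keyword of the -ty option.
MAPS_PER_STRAND = {"-": 1, "+": 1, "both": 2, "fused": 1, "all": 3}


def parse_type_spec(spec, default_scale=None):
    """Parse 'Type=Strand[=colour_scale];...' into a list of dicts.

    Hypothetical helper: whitespace around '=' and ';' is ignored, and a
    missing colour scale falls back to default_scale.
    """
    parsed = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        fields = [field.strip() for field in item.split("=")]
        if len(fields) not in (2, 3):
            raise ValueError(f"expected Type=Strand[=scale], got {item!r}")
        feature, strand = fields[0], fields[1]
        if strand not in MAPS_PER_STRAND:
            raise ValueError(f"unknown strand keyword {strand!r}")
        scale = int(fields[2]) if len(fields) == 3 else default_scale
        parsed.append({
            "type": feature,
            "strand": strand,
            "colour_scale": scale,
            "maps": MAPS_PER_STRAND[strand],
        })
    return parsed


if __name__ == "__main__":
    entries = parse_type_spec("match = all = 7;gene = both = 4;CDS = fused = 10")
    for entry in entries:
        print(entry)
    print("density maps per chromosome:", sum(e["maps"] for e in entries))
```

For the example string from the table, the three entries correspond to 3, 2 and 1 density maps respectively, so six maps would be drawn per chromosome.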
The program computes the size of the output image according to the number of chromosomes (GFF files), the number of features to represent, the number of strands to plot and the window size. If the user chooses automatic scale computation, the program calculates a window size that gives an image that lies within the maximum image size defined by the user. The program asks the user to check the output picture size before processing the data. It then builds the image by adding the various graphical elements (background, title, scale) and processes the data for plotting the chromosome strands. It opens the GFF files sequentially and filters the features (third column of the GFF file) selected by the user with the -ty option (types). The intervals are collected, sorted by start position and merged to remove overlaps. Lastly, the program computes the densities, as (number of bases covered by the feature / window size) × 100, and draws them in the image. A synopsis of the main algorithm and functions is supplied in Additional file 1 and a manual in Additional file 2. Although the main purpose of DensityMap is to plot whole-genome data, it can be useful to compare specific loci of several sequences. This can be done using the --region_file option. The user has to provide a BED file describing the region of interest on each sequence; this is a tabular file with three columns, in which the first column names the sequence, the second gives the region start position and the third the region end position. In addition to the density map, the program produces a CSV file (a tabular file) that contains the densities computed for all features, windows and sequences.
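The merge-then-count procedure described above can be sketched in Python as follows. This is a minimal illustration of the stated steps (sort intervals, merge overlaps, then apply covered bases / window size × 100), not a translation of the DensityMap Perl code; the function names and the 1-based inclusive coordinate convention (as in GFF) are assumptions.

```python
def merge_intervals(intervals):
    """Sort 1-based inclusive (start, end) intervals and merge those that
    overlap or touch, so that no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def window_densities(intervals, chrom_length, window):
    """Percentage of each non-overlapping window covered by the features.

    Every window, including a shorter final one, is divided by the full
    window size, following the formula given in the text.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    covered = [0] * n_windows
    for start, end in merge_intervals(intervals):
        start, end = max(start, 1), min(end, chrom_length)
        if start > end:
            continue  # interval lies entirely outside the chromosome
        for w in range((start - 1) // window, (end - 1) // window + 1):
            w_start = w * window + 1
            w_end = (w + 1) * window
            covered[w] += min(end, w_end) - max(start, w_start) + 1
    return [100.0 * bases / window for bases in covered]


if __name__ == "__main__":
    features = [(1, 50), (40, 100), (151, 200)]  # made-up coordinates
    print(merge_intervals(features))
    print(window_densities(features, chrom_length=300, window=100))
```

Merging before counting is what keeps densities at or below 100 % when annotations overlap, as is common for repeats.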

Results

We used DensityMap to examine two examples based on data on the genome of Drosophila melanogaster (available at http://flybase.org). The first (Fig. 1) illustrates the capacity of DensityMap to represent features that occur very frequently in a genome. It covers the genes, exons, regions coding ncRNAs and the GC content of the D. melanogaster chromosomes. The image produced shows that genes cover very large regions of the chromosomes, are absent from the centromeres and are less frequent on the Y chromosome. As expected, the distribution of exons agrees with that of the genes. The representation of the GC content shows that the centromeres are GC-poor while the regions covered by genes are GC-enriched. The terminal regions of the X chromosome differ from the rest of the chromosome in that they are very GC-rich. The image also shows that ncRNAs are evenly distributed throughout the chromosomes, except for the centromeres, chromosome Y and a few regions where the ncRNA density is over 10 %.
Fig. 1

Density map of genes, exons, ncRNA and GC % in chromosomes of D. melanogaster. The command line was: DensityMap.pl -i dmel.gff3 -o egn -ty 'gene = fused;exon = fused;ncRNA = fused = 10' -gc 12 -sc 40000 -ba white -str_s 15 -str_w 25 -sp 35 -sh 50 -title "Density Map of Gene, Exon, ncRNA and GC%" -la -15 -ro ceil. The arms of chromosomes 2 and 3 are split into two annotation files each: 2L, 2R and 3L, 3R. Four density maps were drawn for each chromosome, one each for genes, exons, ncRNA and GC %. Tiling windows were 40,000 bp long. Densities are represented with three colour scales. The first, for genes and exons, runs from blue to red, with a change of colour tone for each 10 % change in density. The second, for ncRNA, shows 0 % density as grey, densities of 1 to 9 % as a blue-to-red colour gradient, and densities of 10 % or greater as dark red. The third, for GC content, shows densities below 30 % in grey, 30–49 % as a green-to-red colour gradient, and densities of 50 % and above in dark red. The scale is in Mbp.

The second example illustrates the ability of DensityMap to produce images describing features that occur at extreme (high or low) densities. We looked at the distributions and densities of three kinds of transposable elements (TEs): LTR and LINE retrotransposons and rolling-circle transposons. Rolling-circle transposons like helitrons are present in this genome, but they are much less abundant than LTR or LINE retrotransposons. These features were visualized with colour scales that were appropriate for features present at low density (Fig. 2). By default, the program rounds values down using a floor method, which transforms values between 0 and 1 to 0. In this case, we selected the ceiling method instead, which rounds values between 0 and 1 up to 1 so that they are visualized. The densities of the LTR and LINE retrotransposons can also be visualized. Their distributions in the D. melanogaster genome are similar, except that LTRs are very dense in the inner regions of the Y chromosome while most LINEs are present at one end. The TEs in chromosomes 2 and 3 are clustered in the telomeres. A large intra-chromosomal region is devoid of repeated elements. Rolling-circle transposons are concentrated at the ends of chromosomes 2 and 3 and the arms of the Y chromosome. The red windows seem to indicate helitron hotspots. Helitrons are also present in the inner regions of chromosomes but their densities are very low. There are two hotspots of these TEs on the X chromosome, one in each telomere; they are absent from most of the other regions. The density of helitrons in most regions of chromosome 4 is over 10 %.
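The effect of the rounding method on sparse features can be shown with a few lines of Python. The sketch assumes that rounding is applied to the percentage density of each window before it is mapped to a colour; the function name and the "floor"/"ceil" keywords are illustrative and are not taken from the DensityMap source.

```python
import math


def round_density(percent, method="floor"):
    """Round a window density (in %) to an integer colour-scale step."""
    if method == "floor":
        return math.floor(percent)
    if method == "ceil":
        return math.ceil(percent)
    raise ValueError(f"unknown rounding method {method!r}")


if __name__ == "__main__":
    # A 40,000 bp window containing a single 150 bp element: 0.375 % density.
    density = 100.0 * 150 / 40000
    print(density, round_density(density, "floor"), round_density(density, "ceil"))
```

With floor rounding, such a window falls into the 0 % class and is indistinguishable from an empty window; with ceiling rounding it moves into the 1 % class, while truly empty windows stay at 0 %.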
Fig. 2

Density map of LINE and LTR retrotransposons and rolling-circle transposons (RC) in D. melanogaster. The command line was: DensityMap.pl -i dmel.gff3 -o te -ty 'LINE = fused;LTR = fused;RC = fused = 10' -sc 40000 -ba white -str_s 15 -str_w 25 -sp 35 -sh 50 -title "Density Map of Gene, Exon, ncRNA and GC%" -la -15 -ro ceil. The arms of chromosomes 2 and 3 are shown in two annotation files each: 2L, 2R and 3L, 3R. Two density maps were drawn for each chromosome, one for LINE retrotransposons and one for LTR retrotransposons. Tiling windows were 40,000 bp long. The densities of LTR or LINE are shown as a blue-to-red gradient with 10 % intervals. Zero % RC is shown in grey. Densities of 1 to 9 % are shown in dark blue to red, and those over 10 % are in dark red. The scale is in Mbp.

Conclusion

The development of sequencing technologies has led to improvements in genome sequence models, which have become better adapted and much more varied. This, in turn, has led to the development of tools for analysing the genome models, such as genome browsers. While these tools are most useful for viewing small regions of chromosomes, very few provide an overall view of the complete genome. CViT and PhenoGram provide two solutions, but they also have limitations: they require non-standard annotation file formats or are not designed to deal with very dense annotations such as repeated sequences. DensityMap automatically computes the densities of features in a series of windows along chromosomes, and does so for a complete genome. It is very flexible; it can be used to analyse not just very dense annotations but also low-density annotations by applying the computing and graphical options provided. It is also very efficient: plotting density maps of the total repeats (satellites, TEs, simple sequence repeats) of the human genome (5,295,850 features) took 2 min 14 s on a computer equipped with an Intel(R) Xeon(R) W3670 CPU @ 3.20GHz and 16 GB of RAM. DensityMap is very simple to install and run, and so is a good way to obtain a global view of genomic data. To make DensityMap easier to use for people unfamiliar with the Linux command line, we developed a web graphical user interface for online DensityMap analysis.

Availability and requirements

Project name: DensityMap.pl
Project home page: https://github.com/sguizard/DensityMap
Graphical user interface: http://chicken-repeats.inra.fr/launchDM_form.php
Operating system(s): Linux
Programming language: Perl
Other requirements: Perl module GD::SVG
License: GNU GPL v3
Restrictions on its non-academic use: None

1.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

Review 2.  Genome sequence data: management, storage, and visualization.

Authors:  Jacqueline Batley; David Edwards
Journal:  Biotechniques       Date:  2009-04       Impact factor: 1.993

3.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

4.  J-Circos: an interactive Circos plotter.

Authors:  Jiyuan An; John Lai; Atul Sajjanhar; Jyotsna Batra; Chenwei Wang; Colleen C Nelson
Journal:  Bioinformatics       Date:  2014-12-24       Impact factor: 6.937

Review 5.  A brief introduction to web-based genome browsers.

Authors:  Jun Wang; Lei Kong; Ge Gao; Jingchu Luo
Journal:  Brief Bioinform       Date:  2012-07-03       Impact factor: 11.622

6.  Wheat syntenome unveils new evidences of contrasted evolutionary plasticity between paleo- and neoduplicated subgenomes.

Authors:  Caroline Pont; Florent Murat; Sébastien Guizard; Raphael Flores; Séverine Foucrier; Yannick Bidet; Umar Masood Quraishi; Michael Alaux; Jaroslav Doležel; Tzion Fahima; Hikmet Budak; Beat Keller; Silvio Salvi; Marco Maccaferri; Delphine Steinbach; Catherine Feuillet; Hadi Quesneville; Jérôme Salse
Journal:  Plant J       Date:  2013-11-29       Impact factor: 6.417

7.  Chromosome visualization tool: a whole genome viewer.

Authors:  Ethalinda K S Cannon; Steven B Cannon
Journal:  Int J Plant Genomics       Date:  2011-12-19

8.  Web Apollo: a web-based genomic annotation editing platform.

Authors:  Eduardo Lee; Gregg A Helt; Justin T Reese; Monica C Munoz-Torres; Chris P Childers; Robert M Buels; Lincoln Stein; Ian H Holmes; Christine G Elsik; Suzanna E Lewis
Journal:  Genome Biol       Date:  2013-08-30       Impact factor: 13.583

9.  Visualizing genomic information across chromosomes with PhenoGram.

Authors:  Daniel Wolfe; Scott Dudek; Marylyn D Ritchie; Sarah A Pendergrass
Journal:  BioData Min       Date:  2013-10-16       Impact factor: 2.522
