Georgios A Pavlopoulos, Parveen Kumar, Alejandro Sifrim, Ryo Sakai, Meng Lay Lin, Thierry Voet, Yves Moreau, Jan Aerts.
Abstract
The introduction of next generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome; therefore, direct comparison of multiple individuals still remains a challenge. Therefore, the implementation of efficient approaches to explore and visualize SVs and directly compare two or more individuals is desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and pair-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander.
Year: 2013 PMID: 23605045 PMCID: PMC3675473 DOI: 10.1093/nar/gkt254
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Folding levels of a Hilbert curve. The number of the edges of the Hilbert curve is 4^N, where N denotes the fold level. For a canvas of 2^9 × 2^9 = 512 × 512 = 262 144 pixel dimension, the fold level N = 9 covers every pixel of the plane.
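As an illustration of the folding described in Figure 1, the following minimal Python sketch maps a position along a Hilbert curve of fold level N to pixel coordinates on a 2^N × 2^N canvas. It uses the standard iterative index-to-coordinate conversion and is not taken from Meander; the function names `hilbert_d2xy` and `position_to_pixel`, and the equal-width binning of genomic positions, are assumptions made for this example.

```python
import math


def hilbert_d2xy(order, d):
    """Map index d (0 .. 4**order - 1) along a Hilbert curve of the given
    fold level to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so that sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def position_to_pixel(pos, chrom_length, order):
    """Hypothetical helper: bin a 0-based genomic position into one of
    4**order equal-width bins and return that bin's pixel coordinates."""
    bins = 4 ** order
    bin_size = math.ceil(chrom_length / bins)
    return hilbert_d2xy(order, pos // bin_size)


if __name__ == "__main__":
    # Fold level 1 visits the four cells of a 2 x 2 grid in order.
    print([hilbert_d2xy(1, d) for d in range(4)])
```

Because consecutive indices always land on neighbouring pixels, a run of adjacent genomic bins forms a compact block on the canvas rather than a thin line, which is the property the visualization relies on.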
Figure 2. Space filling curves in genomic data. (A) Resolution gain comparing the linear with the Hilbert representation. (B) Colour mapping: The transparency of the colour is adjusted according to the signal value. Dark areas indicate high coverage; light grey areas lower coverage. White areas indicate zero coverage or absence of data. The red arrows show the coordinate system of the system curve. (C) Comparison of a sample against a reference: Left: The sample and reference human chromosome 1 in both a Hilbert and a linear representation. Right: The log2 ratio between the reference and the signal. Blue signals indicate possible tandem duplications as reference < sample, and yellow blocks indicate possible deletions as reference > sample.
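The comparison in panel (C) can be sketched in Python as a per-bin log2 ratio of read depths. This is a hedged illustration rather than Meander's implementation: the orientation of the ratio (sample over reference), the pseudocount, the threshold of 0.5 and the function names `log2_ratio` and `classify_bin` are all assumptions chosen for the example.

```python
import math


def log2_ratio(sample_depth, reference_depth, pseudocount=1.0):
    """log2 of sample over reference read depth for one bin.
    The pseudocount (an assumption) avoids division by zero."""
    return math.log2((sample_depth + pseudocount) / (reference_depth + pseudocount))


def classify_bin(sample_depth, reference_depth, threshold=0.5):
    """Label one bin using the colour convention of the caption.
    The threshold is an illustrative value, not one from the article."""
    if sample_depth == 0 and reference_depth == 0:
        return "no data (white)"
    ratio = log2_ratio(sample_depth, reference_depth)
    if ratio > threshold:
        # reference < sample
        return "possible tandem duplication (blue)"
    if ratio < -threshold:
        # reference > sample
        return "possible deletion (yellow)"
    return "no call"


if __name__ == "__main__":
    for sample, reference in [(40, 20), (5, 20), (20, 20), (0, 0)]:
        print(sample, reference, classify_bin(sample, reference))
```

Applying such a classification to each bin and colouring the pixel at that bin's Hilbert coordinates would give a picture in the spirit of the right-hand panel.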
Figure 3. Comparison of chromosome 1 between strain ICE153 from central Asia and strain ICE97 from southern Italy. (A) An example of a deletion and a tandem duplication supported by both pair-end and read-depth information. (B) The advantage of the Hilbert representation. Left: A tandem duplication that is not visible in the linear representation (1 pixel length) but very clear in the Hilbert representation as a bigger block. Right: The same tandem duplication at zoom level 5 supported both by read-depth and pair-end evidence.
Figure 4. The unstable nature of HCC38. (A) Hierarchy of the single-cell derived subclones and comparison with the PD4198b reference genome. (B) Comparison between the four subclones against the PD4198b reference genome. Subclone B8FF4C demonstrates a de novo tandem duplication and flanking deletion not present in the other subclones. (C) Visualization of an inter-chromosomal variation (linked to the q-arm of chromosome 17), a unique deletion and tandem duplication around position 15 200 000 for chromosome 20 not present in the other subclones.