Geòrgia Escaramís, Cristian Tornador, Laia Bassaganyas, Raquel Rabionet, Jose M C Tubio, Alexander Martínez-Fundichely, Mario Cáceres, Marta Gut, Stephan Ossowski, Xavier Estivill.
Abstract
Next-generation sequencing technologies expedited research to develop efficient computational tools for the identification of structural variants (SVs) and their use to study human diseases. As deeper data is obtained, the existence of higher complexity SVs in some genomes becomes more evident, but the detection and definition of most of these complex rearrangements is still in its infancy. The full characterization of SVs is a key aspect for discovering their biological implications. Here we present a pipeline (PeSV-Fisher) for the detection of deletions, gains, intra- and inter-chromosomal translocations, and inversions, at very reasonable computational costs. We further provide comprehensive information on co-localization of SVs in the genome, a crucial aspect for studying their biological consequences. The algorithm uses a combination of methods based on paired-reads and read-depth strategies. PeSV-Fisher has been designed with the aim to facilitate identification of somatic variation, and, as such, it is capable of analysing two or more samples simultaneously, producing a list of non-shared variants between samples. We tested PeSV-Fisher on available sequencing data, and compared its behaviour to that of frequently deployed tools (BreakDancer and VariationHunter). We have also tested this algorithm on our own sequencing data, obtained from a tumour and a normal blood sample of a patient with chronic lymphocytic leukaemia, on which we have also validated the results by targeted re-sequencing of different kinds of predictions. This allowed us to determine confidence parameters that influence the reliability of breakpoint predictions. AVAILABILITY: PeSV-Fisher is available at http://gd.crg.eu/tools.
Year: 2013 PMID: 23704902 PMCID: PMC3660373 DOI: 10.1371/journal.pone.0063377
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Combination of different types of anomalous read-pair alignments together with read depth (RD) pattern to define four different categories of structural variants (SVs).
Case a.1 plus a decrease in RD represents a deletion. Cases a.2, a.3, a.4, a.5 and a.6, together with an increase in RD, represent different types of copy number gains, distinguished by where the copy is located. Cases a.2 and a.3 represent a copy inserted within the same chromosome, where in case a.3 the copy is inserted in an inverted orientation. Analogously, cases a.4 and a.5 represent straight or inverted insertions of copies in another chromosome. Case a.6 represents tandem duplications. Case b1 corresponds to an inversion. Cases b2 and b3 correspond to an intra-chromosomal translocation, but in case b3 the translocated region is inserted in inverted orientation. Similarly, cases b4 and b5 correspond to inter-chromosomal translocations.
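The mapping described in this caption can be written as a small lookup. The following is a minimal sketch, not PeSV-Fisher code: the function name `classify_sv`, the string labels and the `rd_change` values are hypothetical. The caption states no read-depth requirement for the b cases, so the sketch ignores read depth for them.

```python
from typing import Optional

# Read-pair cases that, per the Figure 1 caption, need an INCREASE in read depth.
GAIN_CASES = {
    "a.2": "gain: copy inserted in the same chromosome",
    "a.3": "gain: inverted copy inserted in the same chromosome",
    "a.4": "gain: copy inserted in another chromosome",
    "a.5": "gain: inverted copy inserted in another chromosome",
    "a.6": "gain: tandem duplication",
}

# Cases for which the caption gives no read-depth condition.
OTHER_CASES = {
    "b1": "inversion",
    "b2": "intra-chromosomal translocation",
    "b3": "intra-chromosomal translocation (inverted)",
    "b4": "inter-chromosomal translocation",
    "b5": "inter-chromosomal translocation (inverted)",
}


def classify_sv(case: str, rd_change: str) -> Optional[str]:
    """Return an SV label, or None if the evidence does not match the caption.

    rd_change is one of 'decrease', 'increase' or 'none' (hypothetical encoding).
    """
    if case == "a.1":
        # A deletion needs the a.1 read-pair pattern plus a drop in read depth.
        return "deletion" if rd_change == "decrease" else None
    if case in GAIN_CASES:
        return GAIN_CASES[case] if rd_change == "increase" else None
    # b cases: read depth is not consulted in this sketch.
    return OTHER_CASES.get(case)
```

For example, `classify_sv("a.6", "increase")` yields the tandem-duplication label, while `classify_sv("a.1", "increase")` returns `None` because the read-depth signal contradicts the read-pair pattern.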
Whole-genome sequencing data statistics.
| Read length | Insert size, 1st percentile | Insert size, median | Insert size, 99th percentile | #Reads (unpaired + paired) | Seq. coverage (unpaired + paired) | #Reads (paired) | Seq. coverage (paired) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ∼95 | 112 | 271 | 400 | 1.178.606.714 | 38.7x | 1.138.687.234 | 37.4x |
| ∼95 | 78 | 261 | 370 | 1.161.111.680 | 38.1x | 1.120.131.088 | 36.8x |
| ∼95 | 84 | 264 | 405 | 1.377.406.890 | 45.2x | 1.355.526.562 | 44.5x |
| ∼95 | 68 | 247 | 402 | 1.364.685.267 | 44.8x | 1.322.094.298 | 43.4x |
| ∼40 | 42 | 186 | 966 | 2.328.156.503 | 32.2x | 2.178.558.262 | 30.3x |
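The source does not state how the sequence coverage values were derived. A common definition is number of reads times read length divided by genome size, and the sketch below applies it to the first row of the table. The genome size of 2.89 Gb is an assumption chosen for illustration, not a figure from the paper, and the read length is only approximate (∼95).

```python
def sequence_coverage(n_reads: int, read_length: int, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Assumption: effective genome size in bases (not given in the source).
ASSUMED_GENOME_SIZE = 2.89e9

# First table row: unpaired + paired reads, read length of about 95.
coverage = sequence_coverage(1_178_606_714, 95, ASSUMED_GENOME_SIZE)
print(f"{coverage:.1f}x")
```

Under these assumptions the result lands close to the 38.7x reported in the table; a different genome size or exact read length would shift it.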
Figure 2. Performance of PeSV-Fisher based on the analysis of a high-coverage sequenced blood-cancer genome.
Results for different scenarios, each defined by a set of confidence parameters that influence the reliability of breakpoint predictions for clusters of anomalously aligned read-pairs. These parameters are the phred-scaled quality score Q, the number of read-pairs supporting the aberrant cluster, and the length of the potential variant call. For each scenario and cluster type, the figure shows the number of breakpoints captured by targeted re-sequencing of PeSV-Fisher calls from whole-genome sequencing (n), the percentage of breakpoints validated by targeted re-sequencing using the paired-reads strategy (%PR), and the percentage of breakpoints validated by targeted re-sequencing using split-reads analysis (%SR).
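A scenario of this kind amounts to a threshold filter over the three confidence parameters. The sketch below is illustrative only: the `Cluster` class, the `passes_scenario` function, the threshold values and the choice of lower bounds are all hypothetical, since the caption does not give the cut-offs used. The phred conversion is the standard definition, Q = -10 log10(p).

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    quality: float      # phred-scaled quality score Q
    n_read_pairs: int   # read-pairs supporting the aberrant cluster
    length: int         # length of the potential variant call, in bases


def phred_to_error_prob(q: float) -> float:
    """Standard phred scale: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_scenario(c: Cluster, min_quality: float,
                    min_read_pairs: int, min_length: int) -> bool:
    """Keep a cluster only if it meets every threshold of the scenario.

    Treating all three parameters as lower bounds is an assumption.
    """
    return (c.quality >= min_quality
            and c.n_read_pairs >= min_read_pairs
            and c.length >= min_length)


# Hypothetical clusters and thresholds, for illustration only.
clusters = [Cluster(35.0, 12, 1500), Cluster(10.0, 3, 200)]
kept = [c for c in clusters if passes_scenario(c, 20.0, 5, 500)]
```

Here only the first cluster survives the filter; tightening or loosening the thresholds corresponds to moving between scenarios.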
Figure 3. Sensitivity analysis and comparison with results from other SV prediction methods based on the analysis of the Yoruba daughter (NA19240) from the high-coverage trio from the 1000 Genomes Project dataset.
(a) Overlap, at a 90% threshold, of PeSV-Fisher calls with the non-redundant set of deletions from the publication of Mills et al. [5] that were validated by PCR and/or assembly. "Breakpoints overlapping" indicates distance clusters whose predicted breakpoint ranges overlap at each end of the variant; (b) three-way comparison of the deletion calls made by PeSV-Fisher, BreakDancer and VariationHunter. The analysis is carried out using 99% reciprocal overlap.
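Reciprocal overlap, as used in this comparison, is commonly computed as the shared length divided by the length of each call, taking the smaller of the two fractions. The sketch below follows that common definition. The function name and the half-open coordinate convention are assumptions, and the source does not specify how its tools computed the overlap.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Smaller of the two overlap fractions for intervals [start, end).

    Returns 0.0 when the intervals do not overlap.
    """
    if a_end <= a_start or b_end <= b_start:
        raise ValueError("intervals must have positive length")
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def same_call(a: tuple, b: tuple, threshold: float) -> bool:
    """Treat two calls as the same variant if reciprocal overlap meets the threshold."""
    return reciprocal_overlap(a[0], a[1], b[0], b[1]) >= threshold
```

With this definition, two deletion calls at (100, 200) and (150, 250) have a reciprocal overlap of 0.5, so they would not be matched at either the 90% or the 99% threshold.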