SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data.

Bruno Zeitouni, Valentina Boeva, Isabelle Janoueix-Lerosey, Sophie Loeillet, Patricia Legoix-né, Alain Nicolas, Olivier Delattre, Emmanuel Barillot.

Abstract

SUMMARY: We present SVDetect, a program designed to identify genomic structural variations from paired-end and mate-pair next-generation sequencing data produced by the Illumina GA and ABI SOLiD platforms. Applying both sliding-window and clustering strategies, we use anomalously mapped read pairs provided by current short read aligners to localize genomic rearrangements and classify them according to their type, e.g. large insertions-deletions, inversions, duplications and balanced or unbalanced inter-chromosomal translocations. SVDetect outputs predicted structural variants in various file formats for appropriate graphical visualization. AVAILABILITY: Source code and sample data are available at http://svdetect.sourceforge.net/

Year:  2010        PMID: 20639544      PMCID: PMC2905550          DOI: 10.1093/bioinformatics/btq293

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The identification of genomic structural variations is a key step in understanding human genetic diversity and evolution as well as disease etiology. Numerous genetic diseases, including cancer, have been associated with structural variants (SVs; Futreal et al., 2004). Although array-based techniques have detected SVs successfully in many studies, their relatively low resolution makes it difficult to localize breakpoints and to characterize small SVs. With the arrival of high-throughput sequencing technologies such as the Illumina Genome Analyzer and the Applied Biosystems SOLiD system, short-insert paired-end and mate-pair reads (referred to here as paired-ends) have improved our ability to detect SVs (Korbel et al., 2007). When a priori information about the pairs, such as their order, orientation and insert size, is used as a constraint during read alignment to the reference genome, anomalously mapped pairs indicate potential differences from the reference. The need for algorithms specifically designed for SV prediction from paired-end mapping (PEM) data has recently led to the development of new software packages, including GASV (Sindi et al., 2009), BreakDancer (Chen et al., 2009) and others (see review, Medvedev et al., 2009). Here, we present a new freely available program called SVDetect for SV detection and type prediction from PEM data. SVDetect identifies different types of SVs, e.g. large insertions–deletions and inversions, with both clustering and sliding-window strategies, and helps to visualize them at the genomic scale.
Compared to other tools, the novelty of our method lies in its ability to: (i) analyze both paired-end and mate-pair sequencing data; (ii) use unique PEM constraints to improve SV detection; (iii) predict various types of tandem duplication and distinguish between balanced and unbalanced rearrangements; (iv) compare SVs across multiple samples; (v) construct copy-number profiles; and (vi) create various output file formats for graphical views of SVs.

2 METHODS

The first step in SVDetect is to group all pairs that are suspected to originate from the same SV. The input consists of paired-ends mapped to the reference genome such that the orientation of the pair is incorrect and/or the distance between its ends is outside the typical range. Starting from a list of such anomalously mapped pairs, SVDetect uses a sliding-window strategy to identify all groups of pairs sharing a similar genomic location. The reference genome is divided into overlapping windows of fixed size, and any two windows form a link if at least one pair anchors them by its ends (Fig. 1A). To each link connecting two genomic fragments, we assign a number of features, such as chromosomal location, number of pairs, and orientation and order of the paired-ends involved. After redundant links are removed, these features are used in the filtering step to call clusters of anomalous paired-end reads and to determine the type of the corresponding SVs.
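The window-linking step described above can be illustrated with a minimal Python sketch. It is not SVDetect's implementation: the tuple layout of a pair, the function names `windows_covering` and `build_links`, and the default window and step sizes are all assumptions made for illustration, and the removal of redundant links is not shown. The sketch assumes the step is no larger than the window size, so that windows overlap.

```python
from collections import defaultdict


def windows_covering(pos, window_size, step):
    """Start coordinates of every overlapping window [s, s + window_size) containing pos."""
    starts = []
    s = (pos // step) * step  # last window that starts at or before pos
    while s >= 0 and s + window_size > pos:
        starts.append(s)
        s -= step
    return sorted(starts)


def build_links(pairs, window_size=1000, step=500):
    """Group anomalous pairs by the two windows that their ends fall into.

    Each pair is a hypothetical tuple (chrom1, pos1, strand1, chrom2, pos2, strand2).
    Two windows are linked as soon as one pair anchors them by its ends.
    """
    links = defaultdict(list)
    for pair in pairs:
        chrom1, pos1, _, chrom2, pos2, _ = pair
        for w1 in windows_covering(pos1, window_size, step):
            for w2 in windows_covering(pos2, window_size, step):
                links[((chrom1, w1), (chrom2, w2))].append(pair)
    return dict(links)


# Two made-up anomalous pairs joining chr5 and chr11.
pairs = [
    ("chr5", 1200, "+", "chr11", 200, "-"),
    ("chr5", 1300, "+", "chr11", 300, "-"),
]
for (end1, end2), members in sorted(build_links(pairs).items()):
    print(end1, end2, len(members))
```

Because the windows overlap, the same pairs can support several neighbouring links, which is why a step that removes redundant links is needed before filtering.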
Fig. 1.

Overview of SVDetect algorithm and output. (A) The workflow. (B) Graphical visualization of predicted SVs. Genomic locations of inter- and intra-chromosomal links are shown using the Circos software. Starting from outside of the circle, the following features are displayed: chromosome ideograms, scatter plot of the copy-number profile and color-coded spans of chromosomal links.

The filtering procedure of SVDetect takes as input all links previously identified and uses user-defined filtering parameter values to call PEM clusters. The minimum number of paired-ends is one of the most important filtering parameters for calling a cluster, since such a threshold improves confidence in the detected SVs. Another option is to filter out pairs whose ends are not oriented in the same way as the ends of the majority of pairs in the two linked regions. If one of the two ends of the remaining pairs has an unexpected strand orientation, the cluster is annotated as a potential inversion.
The order of paired-ends is used to annotate an inter-chromosomal cluster as either a balanced or an unbalanced translocation, and to estimate a genomic coordinate range for the predicted breakpoints (the resolution depending on the insert size). To achieve this, SVDetect filters out any pair for which the order of a read and its mate is inconsistent with the majority of pairs in the cluster. The read order is also used to characterize balanced SVs affecting only one chromosome, e.g. to predict the two breakpoints of an inversion (see Supplementary Material for details of PEM signatures and the order filtering procedure).
Constraints on the distance between mapped ends in a pair are used to filter intra-chromosomal PEM clusters when calling insertion–deletion events. Assuming that the insert sizes of aligned reads are normally distributed, we generally use a threshold of 2 to 3 SD from the mean insert size to detect potential deletions and insertions. The mean insert size for pairs in a called cluster is then reported.
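As a rough illustration of the filtering logic for an intra-chromosomal cluster, the following Python sketch applies a majority-orientation filter, a minimum-pair threshold and a mean ± n SD insert-size test. It is a sketch under stated assumptions, not the SVDetect code: the `Pair` record, the `call_cluster` function, its parameter names and the default expected orientation `("+", "-")` are hypothetical, and the order-based filtering is left out. It uses the usual PEM reading in which pairs mapping farther apart than expected suggest a deletion in the sample and pairs mapping closer together suggest an insertion.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Pair:
    strand1: str      # strand of the first read, "+" or "-"
    strand2: str      # strand of its mate
    insert_size: int  # mapped distance between the two ends, in bp


def call_cluster(pairs, lib_mean, lib_sd, min_pairs=2, n_sd=3.0,
                 expected=("+", "-")):
    """Filter one intra-chromosomal link and return a tentative call, or None."""
    if not pairs:
        return None
    # Keep only pairs that share the most common strand orientation.
    counts = Counter((p.strand1, p.strand2) for p in pairs)
    orientation = counts.most_common(1)[0][0]
    kept = [p for p in pairs if (p.strand1, p.strand2) == orientation]
    # Require a minimum number of supporting pairs.
    if len(kept) < min_pairs:
        return None
    cluster_mean = mean(p.insert_size for p in kept)
    # Exactly one end on an unexpected strand: potential inversion.
    mismatched_ends = sum(a != b for a, b in zip(orientation, expected))
    if mismatched_ends == 1:
        call = "potential inversion"
    elif cluster_mean > lib_mean + n_sd * lib_sd:
        call = "potential deletion"
    elif cluster_mean < lib_mean - n_sd * lib_sd:
        call = "potential insertion"
    else:
        call = "unclassified"
    return {
        "n_pairs": len(kept),
        "orientation": orientation,
        "mean_insert_size": cluster_mean,
        "call": call,
    }
```

For a hypothetical library with a mean insert size of 3000 bp and an SD of 300 bp, a cluster of three concordantly oriented pairs spanning about 5000 bp would be reported as a potential deletion, while a single stray pair would be discarded by the `min_pairs` threshold.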
By combining the information about strand, order and insert size of paired-end reads for each cluster, the type of rearrangement can be predicted. Lastly, predicted SVs coming from multiple paired-end datasets can be compared to identify common or sample-specific variants.
SVDetect provides additional functionality to analyze paired-end coverage, complementary to the previously described strategy, to improve SV characterization. From all pairs correctly mapped with the expected insert size, we calculate the log-ratio of depth-of-coverage between a sample and a control dataset in a sliding window along the genome. This ratio is used to characterize copy-number profiles by identifying potential loss or gain events, and can be compared to the previously predicted SV positions.
SVDetect offers different output formats to facilitate the subsequent analysis of reported paired-end clusters. Data conversion to the BED format or the Circos link format (Krzywinski et al., 2009) is available for the graphical representation of pairs and/or copy-number profiles (Fig. 1B). A user-defined color-code related to the number of pairs can be chosen for better visualization of potentially significant SVs.
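The depth-of-coverage log-ratio can be sketched in Python as follows. The text does not specify the logarithm base, the normalization or the handling of empty windows, so the base-2 logarithm, the scaling by total pair count and the pseudocount used here are assumptions, as are the function names and the representation of each correctly mapped pair by a single coordinate.

```python
import math
from bisect import bisect_left


def window_counts(positions, chrom_length, window_size, step):
    """Count positions falling in each window [start, start + window_size)."""
    ordered = sorted(positions)
    counts = []
    for start in range(0, max(chrom_length - window_size, 0) + 1, step):
        lo = bisect_left(ordered, start)
        hi = bisect_left(ordered, start + window_size)
        counts.append((start, hi - lo))
    return counts


def log2_ratio_profile(sample, control, chrom_length,
                       window_size=1000, step=500, pseudocount=1.0):
    """Per-window log2(sample / control) coverage ratio for one chromosome."""
    if not sample or not control:
        raise ValueError("both datasets need at least one mapped pair")
    scale = len(control) / len(sample)  # assumed library-size normalization
    s_counts = window_counts(sample, chrom_length, window_size, step)
    c_counts = window_counts(control, chrom_length, window_size, step)
    profile = []
    for (start, s), (_, c) in zip(s_counts, c_counts):
        # The pseudocount (an assumption) avoids division by zero in empty windows.
        ratio = (s * scale + pseudocount) / (c + pseudocount)
        profile.append((start, math.log2(ratio)))
    return profile
```

Windows with a clearly positive ratio are candidate gains and windows with a clearly negative ratio are candidate losses; the cut-offs that separate them from noise depend on the data and are not modelled here.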

3 RESULTS

To illustrate the use of SVDetect, we tested the program to predict SVs from two different types of mate-pair sequencing data: Illumina GAII 50 bp reads of neuroblastoma cell lines, and Applied Biosystems SOLiD v2 25 bp reads of wild-type and mutant pif1Δ yeast strains. First, from 2070 anomalous Illumina read pairs in the chromosome arms 5q and 11p, we found 37 clusters specific to the neuroblastoma sample compared with a reference sample. Only one significant cluster (seven pairs) suggests an inter-chromosomal SV, predicted to be an unbalanced translocation that is also supported by the corresponding copy-number profile (Fig. 1B). Close to the predicted breakpoint location in 5q, we found an intra-chromosomal cluster predicted to be an inverted duplication, suggesting a more complex genome rearrangement.
The yeast mate-pair datasets were used to compare SVDetect with the variant detection tool GASV. From approximately 1 million aberrantly mapped read pairs in the wild-type and mutant strains, both tools retrieved all five known SVs in the mutant strain. Compared with GASV, the specific filtering procedures introduced by SVDetect discard hypothetical rearrangements supported by read pairs with inconsistent orientation or order (see Supplementary Material for tool comparison and data analysis).

Funding: ‘Projet Incitatif Collaboratif Bioinformatique et Biostatistiques’ of the Institut Curie; Ligue Nationale Contre le Cancer; Institut National du Cancer.

Conflict of Interest: none declared.

REFERENCES

1.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

2.  Computational methods for discovering structural variation with next-generation sequencing.

Authors:  Paul Medvedev; Monica Stanciu; Michael Brudno
Journal:  Nat Methods       Date:  2009-11       Impact factor: 28.547

3.  A census of human cancer genes.

Authors:  P Andrew Futreal; Lachlan Coin; Mhairi Marshall; Thomas Down; Timothy Hubbard; Richard Wooster; Nazneen Rahman; Michael R Stratton
Journal:  Nat Rev Cancer       Date:  2004-03       Impact factor: 60.716

4.  A geometric approach for classification and comparison of structural variants.

Authors:  Suzanne Sindi; Elena Helman; Ali Bashir; Benjamin J Raphael
Journal:  Bioinformatics       Date:  2009-06-15       Impact factor: 6.937

5.  Paired-end mapping reveals extensive structural variation in the human genome.

Authors:  Jan O Korbel; Alexander Eckehart Urban; Jason P Affourtit; Brian Godwin; Fabian Grubert; Jan Fredrik Simons; Philip M Kim; Dean Palejev; Nicholas J Carriero; Lei Du; Bruce E Taillon; Zhoutao Chen; Andrea Tanzer; A C Eugenia Saunders; Jianxiang Chi; Fengtang Yang; Nigel P Carter; Matthew E Hurles; Sherman M Weissman; Timothy T Harkins; Mark B Gerstein; Michael Egholm; Michael Snyder
Journal:  Science       Date:  2007-09-27       Impact factor: 47.728

6.  BreakDancer: an algorithm for high-resolution mapping of genomic structural variation.

Authors:  Ken Chen; John W Wallis; Michael D McLellan; David E Larson; Joelle M Kalicki; Craig S Pohl; Sean D McGrath; Michael C Wendl; Qunyuan Zhang; Devin P Locke; Xiaoqi Shi; Robert S Fulton; Timothy J Ley; Richard K Wilson; Li Ding; Elaine R Mardis
Journal:  Nat Methods       Date:  2009-08-09       Impact factor: 28.547
