SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution.

Li Charlie Xia, Dongmei Ai, Hojoon Lee, Noemi Andor, Chao Li, Nancy R Zhang, Hanlee P Ji.

Abstract

Background: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, only a few data simulators provide structural variants in silico, and even fewer provide variants with different allelic fractions and haplotypes.

Findings: We developed SVEngine, an open-source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) together with an external variant file, a variant distribution file, and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs), and/or post-alignment files (BAMs). All of these files contain the desired variants and are accompanied by BED files containing the ground truth. SVEngine's flexible design enables users to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions, and translocations. SVEngine can also simulate sequence data that replicate the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce simulation time.

Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime compared with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift, and neighboring hanging read pairs for representative variant types. The structural variant callers Lumpy and Manta and the tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use.
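To make the idea of embedding a variant at a chosen allelic fraction concrete, the following is a minimal sketch in Python. It is not SVEngine's code or API: the `Variant` record, the function names, and the BED column layout are all hypothetical and chosen only for illustration. It applies a deletion, inversion, or tandem duplication to a template sequence, mixes altered and unaltered contig copies to approximate a requested allelic fraction, and writes a BED-style ground-truth line.

```python
from dataclasses import dataclass

# Hypothetical record type for illustration; not SVEngine's variant file format.
@dataclass
class Variant:
    kind: str        # "DEL", "INV", or "DUP"
    start: int       # 0-based inclusive start on the template
    end: int         # 0-based exclusive end on the template
    fraction: float  # desired allelic fraction, between 0 and 1

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def apply_variant(template: str, v: Variant) -> str:
    """Return a copy of the template carrying the variant."""
    segment = template[v.start:v.end]
    if v.kind == "DEL":
        new = ""                                      # drop the segment
    elif v.kind == "INV":
        new = segment.translate(_COMPLEMENT)[::-1]    # reverse complement
    elif v.kind == "DUP":
        new = segment + segment                       # tandem duplication
    else:
        raise ValueError(f"unsupported variant kind: {v.kind}")
    return template[:v.start] + new + template[v.end:]

def contig_copies(template: str, v: Variant, n_copies: int) -> list[str]:
    """Mix altered and unaltered contigs so that roughly
    fraction * n_copies of them carry the variant."""
    n_alt = round(v.fraction * n_copies)
    alt = apply_variant(template, v)
    return [alt] * n_alt + [template] * (n_copies - n_alt)

def bed_line(chrom: str, v: Variant) -> str:
    """Ground-truth record: chrom, start, end, type, allelic fraction
    (an assumed column layout, tab-separated)."""
    return f"{chrom}\t{v.start}\t{v.end}\t{v.kind}\t{v.fraction:g}"

if __name__ == "__main__":
    template = "AAACGTTT"
    deletion = Variant("DEL", 2, 5, 0.25)
    for contig in contig_copies(template, deletion, 4):
        print(contig)
    print(bed_line("chr1", deletion))
```

In a real simulator, reads would then be sampled from the mixed contigs, so that the fraction of reads supporting the variant tracks the requested allelic fraction.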

Year:  2018        PMID: 29982625      PMCID: PMC6057526          DOI: 10.1093/gigascience/giy081

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


