Marwan A Hawari1, Celine S Hong2, Leslie G Biesecker1. 1. Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. 2. Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. celine.hong@nih.gov.
Abstract
BACKGROUND: Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The necessity to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, the use of simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking performance. However, many existing somatic variant simulation tools are limited because they rely on generating entirely synthetic reads derived from a reference genome or because they do not allow for the precise customizability that would enable a more focused understanding of single nucleotide variant calling performance. RESULTS: SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation, variant simulation, where reads are selected and mutated, and variant evaluation, where SomatoSim summarizes the simulation results. CONCLUSIONS: SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim .
BACKGROUND: Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The necessity to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, the use of simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking performance. However, many existing somatic variant simulation tools are limited because they rely on generating entirely synthetic reads derived from a reference genome or because they do not allow for the precise customizability that would enable a more focused understanding of single nucleotide variant calling performance. RESULTS: SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation, variant simulation, where reads are selected and mutated, and variant evaluation, where SomatoSim summarizes the simulation results. CONCLUSIONS: SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim .
Entities:
Keywords:
Mosaicism; SNV; Simulation; Single nucleotide variants; Somatic variants
Authors: Justin M Zook; David Catoe; Jennifer McDaniel; Lindsay Vang; Noah Spies; Arend Sidow; Ziming Weng; Yuling Liu; Christopher E Mason; Noah Alexander; Elizabeth Henaff; Alexa B R McIntyre; Dhruva Chandramohan; Feng Chen; Erich Jaeger; Ali Moshrefi; Khoa Pham; William Stedman; Tiffany Liang; Michael Saghbini; Zeljko Dzakula; Alex Hastie; Han Cao; Gintaras Deikus; Eric Schadt; Robert Sebra; Ali Bashir; Rebecca M Truty; Christopher C Chang; Natali Gulbahce; Keyan Zhao; Srinka Ghosh; Fiona Hyland; Yutao Fu; Mark Chaisson; Chunlin Xiao; Jonathan Trow; Stephen T Sherry; Alexander W Zaranek; Madeleine Ball; Jason Bobe; Preston Estep; George M Church; Patrick Marks; Sofia Kyriazopoulou-Panagiotopoulou; Grace X Y Zheng; Michael Schnall-Levin; Heather S Ordonez; Patrice A Mudivarti; Kristina Giorda; Ying Sheng; Karoline Bjarnesdatter Rypdal; Marc Salit Journal: Sci Data Date: 2016-06-07 Impact factor: 6.444
Authors: Adam D Ewing; Kathleen E Houlahan; Yin Hu; Kyle Ellrott; Cristian Caloian; Takafumi N Yamaguchi; J Christopher Bare; Christine P'ng; Daryl Waggott; Veronica Y Sabelnykova; Michael R Kellen; Thea C Norman; David Haussler; Stephen H Friend; Gustavo Stolovitzky; Adam A Margolin; Joshua M Stuart; Paul C Boutros Journal: Nat Methods Date: 2015-05-18 Impact factor: 28.547
Authors: Jacob F Degner; John C Marioni; Athma A Pai; Joseph K Pickrell; Everlyne Nkadori; Yoav Gilad; Jonathan K Pritchard Journal: Bioinformatics Date: 2009-10-06 Impact factor: 6.937
Authors: Adam Frankish; Mark Diekhans; Anne-Maud Ferreira; Rory Johnson; Irwin Jungreis; Jane Loveland; Jonathan M Mudge; Cristina Sisu; James Wright; Joel Armstrong; If Barnes; Andrew Berry; Alexandra Bignell; Silvia Carbonell Sala; Jacqueline Chrast; Fiona Cunningham; Tomás Di Domenico; Sarah Donaldson; Ian T Fiddes; Carlos García Girón; Jose Manuel Gonzalez; Tiago Grego; Matthew Hardy; Thibaut Hourlier; Toby Hunt; Osagie G Izuogu; Julien Lagarde; Fergal J Martin; Laura Martínez; Shamika Mohanan; Paul Muir; Fabio C P Navarro; Anne Parker; Baikang Pei; Fernando Pozo; Magali Ruffier; Bianca M Schmitt; Eloise Stapleton; Marie-Marthe Suner; Irina Sycheva; Barbara Uszczynska-Ratajczak; Jinuri Xu; Andrew Yates; Daniel Zerbino; Yan Zhang; Bronwen Aken; Jyoti S Choudhary; Mark Gerstein; Roderic Guigó; Tim J P Hubbard; Manolis Kellis; Benedict Paten; Alexandre Reymond; Michael L Tress; Paul Flicek Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971
Authors: John G Tate; Sally Bamford; Harry C Jubb; Zbyslaw Sondka; David M Beare; Nidhi Bindal; Harry Boutselakis; Charlotte G Cole; Celestino Creatore; Elisabeth Dawson; Peter Fish; Bhavana Harsha; Charlie Hathaway; Steve C Jupe; Chai Yin Kok; Kate Noble; Laura Ponting; Christopher C Ramshaw; Claire E Rye; Helen E Speedy; Ray Stefancsik; Sam L Thompson; Shicai Wang; Sari Ward; Peter J Campbell; Simon A Forbes Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971