
GemSIM: general, error-model based simulator of next-generation sequencing data.

Kerensa E McElroy, Fabio Luciani, Torsten Thomas.

Abstract

BACKGROUND: GemSIM, or General Error-Model based SIMulator, is a next-generation sequencing simulator capable of generating single or paired-end reads for any sequencing technology compatible with the generic formats SAM and FASTQ (including Illumina and Roche/454). GemSIM creates and uses empirically derived, sequence-context based error models to realistically emulate individual sequencing runs and/or technologies. Empirical fragment length and quality score distributions are also used. Reads may be drawn from one or more genomes or haplotype sets, facilitating simulation of deep sequencing, metagenomic, and resequencing projects.
RESULTS: We demonstrate GemSIM's value by deriving error models from two different Illumina sequencing runs and one Roche/454 run, and comparing the resulting error profiles of each run. Overall error rates varied dramatically between individual Illumina runs, between the first and second reads in each pair, and between datasets from Illumina and Roche/454 technologies. Indels were markedly more frequent in Roche/454 data than in Illumina data, and both technologies suffered from an increase in error rates near the end of each read. The effects of these different profiles on low-frequency SNP-calling accuracy were investigated by analysing simulated sequencing data for a mixture of bacterial haplotypes. In general, SNP-calling using VarScan was only accurate for SNPs with frequency > 3%, independent of which error model was used to simulate the data. Variation between error profiles interacted strongly with VarScan's 'minimum average quality' parameter, resulting in different optimal settings for different sequencing runs.
CONCLUSIONS: Next-generation sequencing has unprecedented potential for assessing genetic diversity; however, analysis is complicated because error profiles can vary noticeably even between different runs of the same technology. Simulation with GemSIM can help overcome this problem by providing insights into the error profiles of individual sequencing runs and allowing researchers to assess the effects of these errors on downstream data analysis.
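
The idea of an empirically derived, sequence-context based error model can be illustrated with a minimal Python sketch. This is not GemSIM's code, file format, or algorithm: the function names (`build_error_model`, `simulate_read`), the context length `k`, and the restriction to ungapped alignments and substitution errors are all assumptions made for illustration. Indels, quality scores, read-position effects, and paired-end fragment lengths, which the abstract also mentions, are left out.

```python
import random

BASES = "ACGT"


def build_error_model(aligned_pairs, k=2):
    """Estimate a substitution rate for each k-base reference context.

    aligned_pairs: iterable of (reference, read) strings of equal length,
    i.e. ungapped alignments (a simplifying assumption of this sketch).
    The context is the k reference bases ending at the current position.
    """
    counts = {}  # context -> [observations, mismatches]
    for ref, read in aligned_pairs:
        for i in range(k - 1, len(ref)):
            ctx = ref[i - k + 1 : i + 1]
            seen = counts.setdefault(ctx, [0, 0])
            seen[0] += 1
            if read[i] != ref[i]:
                seen[1] += 1
    return {ctx: mism / total for ctx, (total, mism) in counts.items()}


def simulate_read(reference, start, length, model, rng, k=2, default_rate=0.0):
    """Copy a fragment of the reference, adding context-dependent substitutions.

    Contexts missing from the model (including the truncated context at the
    very start of the reference) fall back to default_rate.
    """
    out = []
    for i in range(start, start + length):
        base = reference[i]
        ctx = reference[max(0, i - k + 1) : i + 1]
        rate = model.get(ctx, default_rate)
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen uniformly.
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)
```

A model estimated from one run's alignments could then be passed to `simulate_read` with a seeded `random.Random` instance to generate reproducible reads that carry that run's context-specific substitution rates.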

Year:  2012        PMID: 22336055      PMCID: PMC3305602          DOI: 10.1186/1471-2164-13-74

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


