
A broad survey of DNA sequence data simulation tools.

Shatha Alosaimi, Armand Bandiang, Noelle van Biljon, Denis Awany, Prisca K Thami, Milaine S S Tchamga, Anmol Kiran, Olfa Messaoud, Radia Ismaeel Mohammed Hassan, Jacquiline Mugo, Azza Ahmed, Christian D Bope, Imane Allali, Gaston K Mazandu, Nicola J Mulder, Emile R Chimusa.

Abstract

In silico DNA sequence generation is a powerful technology for evaluating and validating bioinformatics tools, and accordingly more than 35 DNA sequence simulation tools have been developed. With such a diverse array of tools to choose from, an important question is: which tool should be used for a desired outcome? This question is largely unanswered, as documentation for many of these DNA simulation tools is sparse. To address this, we reviewed the DNA sequence simulation tools developed to date and evaluated 20 state-of-the-art DNA sequence simulation tools on their ability to produce accurate reads based on their implemented sequence error model. We provide a succinct description of each tool and suggest which tool is most appropriate for different scenarios. Given the multitude of similar yet non-identical tools, researchers can use this review as a guide to inform their choice of DNA sequence simulation tool. This paves the way towards assessing existing tools in a unified framework, as well as enabling the analysis of different simulation scenarios within the same framework.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Keywords:  DNA sequence; bioinformatics tools; genomics; next generation sequence; simulation


Year:  2020        PMID: 31867604      PMCID: PMC7030445          DOI: 10.1093/bfgp/elz033

Source DB:  PubMed          Journal:  Brief Funct Genomics        ISSN: 2041-2649            Impact factor:   4.241


