Genetic Simulation Resources: a website for the registration and discovery of genetic data simulators.

Bo Peng, Huann-Sheng Chen, Leah E Mechanic, Ben Racine, John Clarke, Lauren Clarke, Elizabeth Gillanders, Eric J Feuer.

Abstract

SUMMARY: Many simulation methods and programs have been developed to simulate genetic data of the human genome. These data have been widely used, for example, to predict properties of populations retrospectively or prospectively according to mathematically intractable genetic models, and to assist the validation, statistical inference and power analysis of a variety of statistical models. However, owing to the differences in type of genetic data of interest, simulation methods, evolutionary features, input and output formats, terminologies and assumptions for different applications, choosing the right tool for a particular study can be a resource-intensive process that usually involves searching, downloading and testing many different simulation programs. Genetic Simulation Resources (GSR) is a website provided by the National Cancer Institute (NCI) that aims to help researchers compare and choose the appropriate simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with well-defined attributes, thus allowing site users to search and compare simulators according to specified features. AVAILABILITY: http://popmodels.cancercontrol.cancer.gov/gsr.

Year:  2013        PMID: 23435068      PMCID: PMC3624809          DOI: 10.1093/bioinformatics/btt094

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Owing to the cost and availability of genetic samples, lack of knowledge of causal variants that contribute to observed phenotypes and mathematical intractability of complex evolutionary models, computer simulations have been widely used, among many applications, to predict outcomes under realistic genetic scenarios (e.g. Peng and Kimmel, 2007), to compare and verify analytical methods or tools (e.g. Spencer et al.) and to estimate parameters of evolutionary models (e.g. Peter et al.). With increasing power of personal computers and the availability of computer clusters, novel simulation methods and sophisticated simulation programs have been and continue to be developed to simulate genetic data for new application areas such as large-scale genomic studies (Dalquen et al.). Despite the availability of a large number of simulation programs, choosing appropriate simulation programs for a particular research topic can be a time-consuming process that usually involves studying, downloading and testing many different tools of varying quality. Adding to the difficulties is the fact that many software applications lack comprehensive documentation, and use implicit assumptions and terminologies that are familiar only to researchers in particular research areas. As a result, at an NCI-sponsored conference, meeting participants recommended creating a web resource that summarizes available genetic simulation programs (Mechanic et al.). Genetic Simulation Resources (GSR) is a website provided by NCI that aims to help researchers compare and choose the right simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with standardized attributes that are understandable to researchers in diverse research areas. Visitors to this website can browse a catalogue of genetic data simulators, review simulators of interest and search and compare simulators according to specified features.
This pre-sorting allows researchers to focus on the most applicable simulators before starting the time-consuming process of downloading and testing the packages themselves.

2 METHODS

We searched scientific journals such as Bioinformatics, BMC Bioinformatics, Genetics and Molecular Biology and Evolution for published articles describing software applications that simulate genetic data for the human genome. We selected simulators that can simulate genetic markers, haploid and diploid DNA sequences and RNA and protein sequences of the human genome. We excluded simulators without an accessible web page or download link and those that are designed for teaching purposes and are limited in their ability to simulate usable genetic data. We also excluded packages that have been replaced by newer or updated packages from the same authors. We collected basic information about the selected simulators, including short and long descriptions, URL of the package web page, project start date and the version and date of the most recent release. We went through publications and documentation of these simulators and summarized their features with 167 attributes in 8 categories and 25 subcategories. These attributes range from key features such as type of genetic variations that can be simulated (e.g. single nucleotide polymorphism, insertion and deletion and microsatellite) and simulation methods (e.g. coalescent, forward time, resampling based and phylogenetic), to development features such as programming language, supported platform and license information. Because not all aspects of packages will be captured using these standard attributes, we allow package owners to annotate existing attributes with package-specific comments and define package-specific attributes. We entered attributes of selected simulators and characterized them to the best of our knowledge. To ensure the accuracy of data, we sent a questionnaire to all package authors and received responses from approximately half of the authors, which may suggest that some packages have been left unmaintained for various reasons. We revised attributes of packages according to feedback from authors.
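As an illustration of the attribute scheme described above, the following Python sketch shows one way a registered simulator could be represented: standard attributes grouped by category, optional package-specific comments and package-specific attributes. The class, its field names, the category labels and the two example packages are hypothetical; they are not taken from the GSR implementation, whose internal schema is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Simulator:
    """Hypothetical record for one registered package (not the GSR schema)."""
    name: str
    # category -> standard attribute values the package supports
    attributes: Dict[str, Set[str]] = field(default_factory=dict)
    # (category, attribute) -> free-text comment from the package owner
    comments: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # package-specific attributes outside the standard vocabulary
    custom: Dict[str, str] = field(default_factory=dict)

    def has(self, category: str, value: str) -> bool:
        """True if the package lists `value` under `category`."""
        return value in self.attributes.get(category, set())


def search(catalogue: List[Simulator], required: Dict[str, str]) -> List[str]:
    """Return names of packages that have every requested category/value pair."""
    return [s.name for s in catalogue
            if all(s.has(cat, val) for cat, val in required.items())]


# Two made-up packages described with attribute values named in the text.
catalogue = [
    Simulator(
        name="ExampleCoalSim",
        attributes={
            "Simulation method": {"Coalescent"},
            "Type of variation": {"Single nucleotide polymorphism"},
        },
    ),
    Simulator(
        name="ExampleForwardSim",
        attributes={
            "Simulation method": {"Forward time"},
            "Type of variation": {"Single nucleotide polymorphism",
                                  "Microsatellite"},
        },
        comments={("Type of variation", "Microsatellite"):
                  "stepwise mutation model only"},
        custom={"Checkpointing": "supported"},
    ),
]

forward_time = search(catalogue, {"Simulation method": "Forward time"})
snp_capable = search(catalogue,
                     {"Type of variation": "Single nucleotide polymorphism"})
```

Keeping comments and custom attributes separate from the standard vocabulary means a search only ever matches standardized values, while package-specific detail remains available for display.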
The GSR website currently provides an interface to a catalogue of 80 registered packages (Fig. 1), with a global search box, a list view of all software resources and interfaces to rank packages according to selected attributes and compare attributes of selected packages. Packages in this catalogue are continuously being added and updated by authors and users of simulation programs. GSR does not host or maintain individual packages and is not responsible for the accuracy and timely update of information related to these packages. We plan to evaluate the activity of packages regularly, based on factors including, but not limited to, availability of website and download links, number of updates and web visitors to package pages on GSR, number of applications (citations) and feedback from users of GSR. Packages that are no longer used by the research community will be phased out and eventually removed from GSR.
Fig. 1. Illustration of the Genetic Simulation Resources website
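The ranking and comparison interfaces mentioned above can be pictured with a small Python sketch. It assumes each package is reduced to a flat set of attribute labels, ranks packages by how many user-selected attributes they match and builds a side-by-side comparison table. The package names, the attribute labels and the match-count ranking rule are illustrative assumptions; the text does not specify how GSR computes its rankings.

```python
def rank(catalogue, selected):
    """Order package names by number of selected attributes matched.

    Ties are broken alphabetically so that the result is deterministic.
    """
    scores = {name: len(attrs & selected) for name, attrs in catalogue.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


def compare(catalogue, names):
    """Map each attribute to one True/False flag per package in `names`."""
    all_attrs = sorted(set().union(*(catalogue[n] for n in names)))
    return {attr: [attr in catalogue[n] for n in names] for attr in all_attrs}


# Hypothetical catalogue: package name -> set of attribute labels.
catalogue = {
    "SimA": {"coalescent", "SNP"},
    "SimB": {"forward time", "SNP", "microsatellite"},
    "SimC": {"resampling based", "SNP"},
}

selected = {"forward time", "microsatellite", "SNP"}
ranking = rank(catalogue, selected)
table = compare(catalogue, ["SimA", "SimB"])

for attr, flags in table.items():
    print(attr, flags)
```

A real ranking would probably weight attributes or treat some as mandatory; the plain match count is used here only to keep the sketch short.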

3 DISCUSSION

GSR provides a catalogue of genetic data simulators with detailed descriptions and a list of features for each package, which makes it easier for users of GSR to search and compare simulators and identify the most appropriate simulators for particular research topics. Package authors will also benefit from this service because a centralized catalogue would increase the visibility of their software, and a clear list of features would help with documentation of their packages. GSR complements existing review articles (e.g. Hoban et al.; Liu et al.) on genetic simulation programs by providing a comprehensive up-to-date list of programs, with links to web pages and searchable attributes, in a user-friendly format. GSR is still under active development. Features that will be provided in the near future include an automated revision proposal and approval process, a citation management interface to track the applications of packages and a user-feedback system. We encourage all authors of genetic data simulators to register their packages in GSR and place a link to GSR on their websites, which would turn individually hosted packages into a web of simulators that could greatly facilitate the application, development and dissemination of genetic simulators.
REFERENCES

1.  Next generation analytic tools for large scale genetic epidemiology studies of complex diseases.

Authors:  Leah E Mechanic; Huann-Sheng Chen; Christopher I Amos; Nilanjan Chatterjee; Nancy J Cox; Rao L Divi; Ruzong Fan; Emily L Harris; Kevin Jacobs; Peter Kraft; Suzanne M Leal; Kimberly McAllister; Jason H Moore; Dina N Paltoo; Michael A Province; Erin M Ramos; Marylyn D Ritchie; Kathryn Roeder; Daniel J Schaid; Matthew Stephens; Duncan C Thomas; Clarice R Weinberg; John S Witte; Shunpu Zhang; Sebastian Zöllner; Eric J Feuer; Elizabeth M Gillanders
Journal:  Genet Epidemiol       Date:  2011-12-06       Impact factor: 2.135

2.  Computer simulations: tools for population and evolutionary genetics.

Authors:  Sean Hoban; Giorgio Bertorelle; Oscar E Gaggiotti
Journal:  Nat Rev Genet       Date:  2012-01-10       Impact factor: 53.242

3.  Simulations provide support for the common disease-common variant hypothesis.

Authors:  Bo Peng; Marek Kimmel
Journal:  Genetics       Date:  2006-12-06       Impact factor: 4.562

4.  Distinguishing between population bottleneck and population subdivision by a Bayesian model choice procedure.

Authors:  Benjamin M Peter; Daniel Wegmann; Laurent Excoffier
Journal:  Mol Ecol       Date:  2010-08-23       Impact factor: 6.185

5.  ALF--a simulation framework for genome evolution.

Authors:  Daniel A Dalquen; Maria Anisimova; Gaston H Gonnet; Christophe Dessimoz
Journal:  Mol Biol Evol       Date:  2011-12-08       Impact factor: 16.240

6.  A survey of genetic simulation software for population and epidemiological studies.

Authors:  Youfang Liu; Georgios Athanasiadis; Michael E Weale
Journal:  Hum Genomics       Date:  2008-09       Impact factor: 4.639

7.  Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip.

Authors:  Chris C A Spencer; Zhan Su; Peter Donnelly; Jonathan Marchini
Journal:  PLoS Genet       Date:  2009-05-15       Impact factor: 5.917

