Abstract
Computer simulations are routinely conducted to evaluate new statistical methods, to compare the properties of different methods, and to mimic the observed data in genetic epidemiology studies. Conducting simulation studies can be complicated because of challenges such as selecting an appropriate simulation tool and specifying the parameters of the simulation model. Although abundant simulated data have been generated for human genetic research, there is currently no public database designed specifically as a repository for these simulated data. Without such a database, similar simulations may have been repeated for similar studies, resulting in redundant work. Thus, we created an online platform, the Genetic Epidemiology Simulation Database (GESDB), for sharing simulation data and discussing simulation techniques for genetic epidemiology studies. GESDB consists of a database for storing simulation scripts, simulated data and documentation from published articles, as well as a discussion forum for discussing the simulated data and exchanging simulation ideas. Moreover, summary statistics, such as the most commonly used simulation tools and the most frequently downloaded datasets, are provided. These statistics will help researchers choose an appropriate simulation tool or select a common dataset for method comparisons. GESDB can be accessed at http://gesdb.nhri.org.tw.
Database URL: http://gesdb.nhri.org.tw
Year: 2016 PMID: 27242038 PMCID: PMC4885602 DOI: 10.1093/database/baw082
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. The hardware architecture of GESDB.
Figure 2. Flowchart for accessing GESDB.
Information form for the author
| Entry | Example | Description |
|---|---|---|
| Journal name | | The name of the journal where the article is published. Fill in ‘Under review’ for unpublished articles. |
| Year | 2011 | The year when the article was published. Fill in the current year for unpublished articles. |
| Article title | Rare-variant association testing for sequencing data with the sequence kernel association test | The title of the article. |
| Author | Michael C. Wu, Seunggeun Lee, Tianxi Cai, Michael Boehnke, Xihong Lin | List of author names in the article |
| Keywords | NA | Keywords in the article |
| Simulated data type | Sequence | Genotype or sequence |
| Simulation tool name | SeqSIMLA2 | Name(s) of simulation tools used to generate the data |
| Certification | NA | Certification for the simulation tool, such as GSR certification |
| Sample type | Case-control | Random or independent; sibpairs, trios and nuclear families; extended or complete pedigrees; case-control; longitudinal |
| Trait type | Multiple | Binary or qualitative; quantitative; multiple |
| Determinants of the trait | Multiple genetic markers | Single genetic marker; multiple genetic markers; sex-linked; gene-gene interaction; environmental factors; gene-environment interaction |
| Brief description of the uploaded data | We followed the descriptions in ‘Numerical Experiments and Simulation’ in the SKAT article (Wu | A brief description of the uploaded data |
Note that the simulated data used in the original article were generated with the tool developed by the article authors. The datasets on GESDB were the replicated datasets generated by our group using SeqSIMLA2 (12).
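The information form above can be read as a small metadata schema. The following is a minimal sketch of how such a record might be represented and checked against the categories listed in the Description column. The class name `SimulationRecord`, its field names, and the `problems` method are hypothetical and are not part of GESDB; the journal name and the description text are placeholders, because the form as reproduced here does not give them in full.

```python
from dataclasses import dataclass

# Allowed categories, copied (lower-cased) from the Description column of the form.
DATA_TYPES = {"genotype", "sequence"}
SAMPLE_TYPES = {
    "random or independent",
    "sibpairs, trios and nuclear families",
    "extended or complete pedigrees",
    "case-control",
    "longitudinal",
}
TRAIT_TYPES = {"binary or qualitative", "quantitative", "multiple"}
DETERMINANTS = {
    "single genetic marker",
    "multiple genetic markers",
    "sex-linked",
    "gene-gene interaction",
    "environmental factors",
    "gene-environment interaction",
}


@dataclass
class SimulationRecord:
    """Hypothetical in-memory form of one information-form submission."""
    journal_name: str
    year: int
    article_title: str
    authors: list[str]
    simulated_data_type: str
    simulation_tools: list[str]
    sample_type: str
    trait_type: str
    determinants: str
    description: str
    keywords: str = "NA"
    certification: str = "NA"

    def problems(self) -> list[str]:
        """Return one message per categorical field whose value is not listed in the form."""
        checks = {
            "simulated_data_type": (self.simulated_data_type, DATA_TYPES),
            "sample_type": (self.sample_type, SAMPLE_TYPES),
            "trait_type": (self.trait_type, TRAIT_TYPES),
            "determinants": (self.determinants, DETERMINANTS),
        }
        return [
            f"{name}: unexpected value {value!r}"
            for name, (value, allowed) in checks.items()
            if value.strip().lower() not in allowed
        ]


# Values taken from the Example column, except the two placeholders noted below.
record = SimulationRecord(
    journal_name="NA",  # placeholder: no example is given in the form
    year=2011,
    article_title=(
        "Rare-variant association testing for sequencing data "
        "with the sequence kernel association test"
    ),
    authors=["Michael C. Wu", "Seunggeun Lee", "Tianxi Cai",
             "Michael Boehnke", "Xihong Lin"],
    simulated_data_type="Sequence",
    simulation_tools=["SeqSIMLA2"],
    sample_type="Case-control",
    trait_type="Multiple",
    determinants="Multiple genetic markers",
    description="Replicated datasets generated with SeqSIMLA2",  # placeholder text
)
print(record.problems())
```

Checking the categorical fields against fixed sets keeps entries comparable across submissions, which is what makes summaries such as "most frequently used tools" meaningful.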
Comparison between GESDB and other public data repositories
| | GESDB | Dryad | figshare |
|---|---|---|---|
| Data type | Any files related to genetic simulations | Any | Any |
| Targeted research field | Genetic epidemiology | General | General |
| Space limit | 50 GB free space per study | $120 for the first 20 GB and $50 for each additional 10 GB | 20 GB free space |
| Statistics for each dataset | | | |
| Number of views | Yes | Yes | Yes |
| Number of downloads | Yes | Yes | Yes |
| Number of votes | Yes | No | No |
| Summary statistics | | | |
| Most frequently downloaded data | Yes | Yes | No |
| Most viewed data | Yes | No | Yes |
| Most voted data | Yes | No | No |
| Most frequently used tools | Yes | No | No |
| User comment | Yes | No | Yes |
| Share on social media | No | Yes | Yes |
| Discussion forum | Yes | No | No |
| Unique identifier | Yes (GESDB) | Yes (DOI) | Yes (DOI) |
Number of votes: the number of votes given by users.
User comment: whether users can leave comments on the dataset.
Unique identifier (GESDB): the identifier is self-defined by GESDB.
DOI: digital object identifier.
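To make the "Space limit" row concrete, the sketch below computes the Dryad fee implied by the rates quoted in the table. The function name `dryad_fee_usd` is hypothetical, and the rounding rule (a partial additional 10 GB block is billed as a full block) is an assumption that the table does not state. The rates are those reported in the table and may not reflect current pricing.

```python
import math


def dryad_fee_usd(size_gb: float) -> int:
    """Fee under the table's rates: $120 for the first 20 GB, $50 per additional 10 GB.

    Assumption: a partial additional 10 GB block is charged as a full block.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    extra_gb = max(0.0, size_gb - 20)
    return 120 + 50 * math.ceil(extra_gb / 10)


# A study that fills the 50 GB GESDB allows for free:
print(dryad_fee_usd(50))  # 120 + 3 * 50 = 270
```

Under these assumptions, a 50 GB study that is free to host on GESDB would cost $270 on Dryad, and it would exceed the 20 GB of free space on figshare.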