Ren-Hua Chung, Chung-Chin Shih.
Abstract
BACKGROUND: Association studies based on next-generation sequencing (NGS) technology have become popular, and statistical association tests for NGS data have been developed rapidly. A flexible tool for simulating sequence data in either unrelated case-control or family samples with different disease and quantitative trait models would be useful for evaluating the statistical power for planning a study design and for comparing power among statistical methods based on NGS data.
Year: 2013 PMID: 23782512 PMCID: PMC3693898 DOI: 10.1186/1471-2105-14-199
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Family structure simulated in SeqSIMLA.
Run time of SeqSIMLA for generating family and case–control data
| Prevalence¹ | 0.11 | 0.012 | 0.0014 | 0.11 | 0.012 | 0.0014 |
| Family² | 4.88 s (94%)⁷ | 6.09 s (76%) | 16.6 s (30%) | 3.85 min (82%)⁸ | 4.06 min (76%) | 7.58 min (46%) |
| Family (T)³ | 4.58 s (98%) | 5.33 s (91%) | 7.52 s (57%) | 3.35 min (87%) | 3.45 min (84%) | 4.51 min (67%) |
| Case–control⁴ | 3.17 s (64%) | 4.86 s (33%) | 26.35 s (6%) | 3.10 min (34%) | 3.75 min (29%) | 11.01 min (11%) |
| Case–control (T)⁵ | 2.02 s (88%) | 2.57 s (74%) | 7.56 s (20%) | 1.96 min (34%) | 2.76 min (37%) | 3.68 min (22%) |
| Case–control (F)⁶ | 2.42 s (91%) | 2.44 s (91%) | 2.27 s (92%) | 1.95 min (58%) | 1.87 min (58%) | 1.87 min (58%) |
¹ The estimated prevalence based on 10,000 prospective cohorts generated under Model 2.
² 500 families generated with 1 thread.
³ 500 families generated with 12 threads.
⁴ 1000 cases and 1000 controls generated with 1 thread.
⁵ 1000 cases and 1000 controls generated with 12 threads.
⁶ 1000 cases and 1000 controls generated with the conditional probability of multilocus genotypes given the disease status.
⁷ Run time (seconds) and the percentage of run time spent on I/O.
⁸ Run time (minutes) and the percentage of run time spent on I/O.
Power and type I error rates for linkage and association tests
| Region | Test | Proportion rejected¹ |
| REGION1 | MERLIN | 0.833 |
| REGION2 | MERLIN | 0.806 |
| REGION3 | SKAT | 0.948 |
| REGION4 | SKAT | 0.040 |
¹ The proportion of tests rejected at the 0.05 significance level over 1000 replicates.
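The footnote defines each table entry as an empirical rejection rate. A minimal sketch of that calculation follows, together with the Monte Carlo standard error of such a proportion. The function names `rejection_rate` and `binomial_se` are hypothetical and are not part of SeqSIMLA, MERLIN, or SKAT. The uniform p-values are placeholder data, not results from the study.

```python
import math
import random


def rejection_rate(p_values, alpha=0.05):
    """Proportion of replicates whose p-value is at or below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p <= alpha for p in p_values) / len(p_values)


def binomial_se(rate, n_replicates):
    """Monte Carlo standard error of a proportion estimated from n replicates."""
    return math.sqrt(rate * (1.0 - rate) / n_replicates)


if __name__ == "__main__":
    # Placeholder data (assumption): uniform draws stand in for the p-values
    # a test would produce on 1000 replicates simulated under the null.
    rng = random.Random(0)
    p_values = [rng.random() for _ in range(1000)]
    rate = rejection_rate(p_values)
    print(f"rejection rate = {rate:.3f}, SE = {binomial_se(rate, 1000):.4f}")
```

With 1000 replicates and a true rejection probability of 0.05, the standard error is about 0.007. An estimate within roughly 0.014 of 0.05 is therefore consistent with a correctly calibrated test.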
Comparisons of features between SeqSIMLA and SIMRARE
| Feature | SeqSIMLA | SIMRARE |
| Language | C++/Java | Python/C++ |
| Sequence simulator | GENOME (default) or a sequence pool generated by a simulator | srv in SimuPOP |
| Simulated region | Multiple genes on multiple chromosomes | A gene or multiple independent genes |
| Simulated data type | Families/unrelated cases and controls | Unrelated cases and controls |
| Recombination rate | Variable/Fixed | Fixed |
| Number of disease/trait models | 3 | 6 |
| User interface | Command line | Graphical |