Abstract
BACKGROUND: With the recent advances in high-throughput genotyping technologies that allow for large-scale association mapping of human complex traits, promising statistical designs and methods have been emerging. Efficient simulation software is a key element in evaluating the properties of new statistical tests. SLINK is a flexible simulation tool that has been widely used to generate the segregation and recombination processes of markers linked to, and possibly associated with, a trait locus, conditional on trait values in arbitrary pedigrees. In practice, its most serious limitation is the small number of loci that can be simulated, since the complexity of the algorithm scales exponentially with this number.
Year: 2006 PMID: 16803631 PMCID: PMC1524809 DOI: 10.1186/1471-2156-7-40
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
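To see why the number of loci is the bottleneck, consider any exact method that enumerates each person's joint multilocus genotypes: that state space grows as a power of the number of loci. The count below is an illustration only. It assumes phase-known genotypes and a fixed number of alleles per locus. It does not model SLINK's actual algorithm; the abstract states only that complexity scales exponentially with the number of loci.

```python
def joint_state_counts(n_loci: int, alleles_per_locus: int = 2) -> tuple[int, int]:
    """Count multilocus haplotypes and phased (ordered) genotypes for one person.

    Illustrative assumption: every locus has the same number of alleles.
    """
    haplotypes = alleles_per_locus ** n_loci
    phased_genotypes = haplotypes ** 2  # one haplotype inherited from each parent
    return haplotypes, phased_genotypes


for n in (1, 2, 6, 12):
    h, g = joint_state_counts(n)
    print(f"{n:>2} loci: {h} haplotypes, {g} phased genotypes")
```

Adding six biallelic markers multiplies the per-person haplotype count by 64. This is why simulating many markers jointly with the trait locus quickly becomes impractical.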
Figure 1. A two-step implementation. (A) First step: simulating genotypes at the trait locus (with two alleles, D and d, the latter being the high-risk allele) conditional on the observed trait values, together with genotypes at a perfectly informative marker (the descent marker) located at the same genetic position as the trait locus. Genotypes are phased. Affected individuals are shown in black; unaffected individuals are shown in white. (B) Second step: simulating haplotypes in the founders, allowing markers to be in LD with the trait locus (warmer colors denote high-risk alleles). The chromosomes then segregate according to the descent marker, with recombination on both sides of the trait locus (at the position of the descent marker), optionally under sex-specific maps.
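Step (B) can be sketched for a single parent-to-child transmission. The sketch assumes that step (A) has already determined which of the parent's two chromosomes was passed on at the trait-locus position; the descent marker provides exactly this. It also assumes the Haldane map function (no crossover interference) and marker positions in Morgans sorted along the chromosome. The function names are hypothetical and this is not SUP's code. Passing the female or male marker positions, depending on the transmitting parent's sex, is one simple way to honor sex-specific maps.

```python
import math
import random
from typing import Sequence


def haldane_r(d_morgans: float) -> float:
    """Recombination fraction for a genetic distance in Morgans (Haldane map)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def transmit(parent_haps: tuple[Sequence[int], Sequence[int]],
             origin_at_trait: int,
             positions: Sequence[float],
             trait_pos: float,
             rng: random.Random) -> list[int]:
    """Return the marker haplotype one parent passes to a child.

    origin_at_trait: 0 or 1, the parental chromosome transmitted at the
        trait-locus position (read off the descent marker in step A).
    positions: sorted marker positions (Morgans) from the map of the
        transmitting parent's sex, to mimic sex-specific maps.
    """
    gamete = [0] * len(positions)
    right = [i for i, p in enumerate(positions) if p >= trait_pos]
    left = [i for i, p in enumerate(positions) if p < trait_pos][::-1]  # nearest first
    # Walk outward from the trait locus on each side; each interval may switch origin.
    for side in (right, left):
        origin, prev = origin_at_trait, trait_pos
        for i in side:
            if rng.random() < haldane_r(abs(positions[i] - prev)):
                origin = 1 - origin  # a crossover occurred in this interval
            gamete[i] = parent_haps[origin][i]
            prev = positions[i]
    return gamete


# Usage: hypothetical mother and female-map positions, trait locus at 0.10 M.
rng = random.Random(1)
mother = ([1, 1, 1, 1], [2, 2, 2, 2])
female_map = [0.00, 0.05, 0.15, 0.30]
print(transmit(mother, origin_at_trait=1, positions=female_map, trait_pos=0.10, rng=rng))
```

Anchoring the walk at the trait locus mirrors the figure. The descent marker fixes the origin at that point, and recombination is then simulated independently toward each chromosome end.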
Characteristics of three software packages that simulate marker data under non-null genetic models.
| Feature | SLINK/SUP | SIMLA | ALLEGRO |
| Simulates on pedigrees as they have been collected | Yes | No | Yes |
| Simulates marker data conditional on: | | | |
| 1) observed affection status | Yes | Yes | Yes |
| 2) observed affection status and observed exposure/liability class | Yes | No | No |
| 3) observed quantitative trait values | Yes | No | No |
| Simulates LD between: | | | |
| 1) marker loci | Yes | Yes | No |
| 2) marker and trait loci | Yes | Yes | No |
| Simulates values for: | | | |
| 1) affection status | Yes | Yes | No |
| 2) environmental exposure | No | Yes | No |
| 3) quantitative trait | Yes | Yes | No |
| 4) covariates | No | Yes | No |
| Simulates multi-locus susceptibility between: | | | |
| 1) unlinked loci | Yes (indirectly) | Yes | No |
| 2) linked loci | No/Future | Yes | No |
| Simulates under sex-specific maps | Yes | No | Yes |
| Simulates X-linked genetic data | Yes/Future | No | Yes |
| Simulates upon pedigrees with loops | Few | No | Yes |
| Pedigree restrictions | 16 founders | 4 founders | < 31 bits† |
Some features are planned future extensions of SUP. †The number of bits is defined as twice the number of non-founders minus the number of founders.
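The ALLEGRO limit in the footnote can be checked directly from a pedigree listing. The sketch below assumes a hypothetical pedigree format of (individual, father, mother) tuples. It treats an individual as a founder when both parents are unknown (`None`), which is the usual linkage-file convention and an assumption here. It then applies the footnote's definition: bits = 2 × non-founders − founders, compared against the "< 31 bits" restriction in the table.

```python
from typing import Optional

# Hypothetical pedigree format: (individual_id, father_id, mother_id); None = unknown parent.
Pedigree = list[tuple[str, Optional[str], Optional[str]]]


def pedigree_bits(ped: Pedigree) -> int:
    """Twice the number of non-founders minus the number of founders."""
    founders = sum(1 for _, father, mother in ped if father is None and mother is None)
    nonfounders = len(ped) - founders
    return 2 * nonfounders - founders


def fits_allegro_limit(ped: Pedigree, limit: int = 31) -> bool:
    """True if the pedigree stays under the bit limit listed in the table."""
    return pedigree_bits(ped) < limit


# Usage: a nuclear family with two children (2 founders, 2 non-founders).
nuclear = [("fa", None, None), ("mo", None, None), ("s1", "fa", "mo"), ("s2", "fa", "mo")]
print(pedigree_bits(nuclear), fits_allegro_limit(nuclear))
```

Because each non-founder adds two bits, large sibships push a pedigree over the limit. Extra founders, by contrast, lower the count.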
Figure 2. One large pedigree and three sub-pedigrees used for simulations. Pedigrees are named after the number of founders they contain. Individuals in black are affected; individuals in white are of unknown disease status.
Haplotypes and their frequencies for simulations and validation. When simulating under a non-null genetic model, the high-risk allele lies exclusively on the second haplotype (1-1-1-1-1-2). For validation, the haplotype frequencies obtained with SLINK/SUP and SIMLA are compared in samples of 1000 affected sib-pair families, averaged over 25 runs, under a non-null genetic model.
| Marker haplotype | Population frequency | Estimated frequency of haplotype in LD with trait locus (SLINK/SUP) | Estimated frequency of haplotype in LD with trait locus (SIMLA) |
| 1-1-1-1-1-1 | 0.5 | 0.463 | 0.461 |
| 1-1-1-1-1-2 | 0.25 | 0.305 | 0.308 |
| 1-1-1-1-2-2 | 0.125 | 0.115 | 0.116 |
| 1-1-1-2-2-2 | 0.0625 | 0.0592 | 0.0578 |
| 1-1-2-2-2-2 | 0.0312 | 0.0293 | 0.0292 |
| 1-2-2-2-2-2 | 0.0156 | 0.0143 | 0.0143 |
| 2-1-1-1-1-1 | 0.0078 | 0.00764 | 0.00716 |
| 2-2-1-1-1-1 | 0.0040 | 0.00352 | 0.00376 |
| 2-2-2-1-1-1 | 0.0020 | 0.00188 | 0.00208 |
| 2-2-2-2-1-1 | 0.0010 | 0.00048 | 0.00060 |
| 2-2-2-2-2-1 | 0.0005 | 0.00016 | 0.00008 |
| 2-2-2-2-2-2 | 0.0004 | 0.00001 | 0.00004 |
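The founder step of Figure 1B needs marker haplotypes drawn conditional on the trait allele each founder chromosome carries. Below is a minimal sketch of that conditioning. It uses the population frequencies from the table and the stated rule that the high-risk allele d occurs only on the second haplotype. The frequency of d (`p_d`) is not given in this table, so it is a hypothetical parameter. The function names are illustrative, and this is not SUP's code.

```python
import random

# Population frequencies of the six-marker haplotypes, copied from the table above.
HAPLOTYPE_FREQS = {
    "1-1-1-1-1-1": 0.5,
    "1-1-1-1-1-2": 0.25,
    "1-1-1-1-2-2": 0.125,
    "1-1-1-2-2-2": 0.0625,
    "1-1-2-2-2-2": 0.0312,
    "1-2-2-2-2-2": 0.0156,
    "2-1-1-1-1-1": 0.0078,
    "2-2-1-1-1-1": 0.0040,
    "2-2-2-1-1-1": 0.0020,
    "2-2-2-2-1-1": 0.0010,
    "2-2-2-2-2-1": 0.0005,
    "2-2-2-2-2-2": 0.0004,
}
RISK_HAPLOTYPE = "1-1-1-1-1-2"  # the only haplotype that carries the high-risk allele d


def conditional_freqs(p_d: float) -> dict[str, dict[str, float]]:
    """P(marker haplotype | trait allele on the same chromosome).

    p_d is a hypothetical frequency of allele d; because d lies only on
    RISK_HAPLOTYPE, it cannot exceed that haplotype's population frequency.
    """
    f_risk = HAPLOTYPE_FREQS[RISK_HAPLOTYPE]
    if not 0.0 < p_d <= f_risk:
        raise ValueError("p_d must lie in (0, frequency of the risk haplotype]")
    given_d = {h: 1.0 if h == RISK_HAPLOTYPE else 0.0 for h in HAPLOTYPE_FREQS}
    # D-bearing chromosomes: remove the d share of the risk haplotype, then renormalize.
    given_D = {
        h: ((f - p_d) if h == RISK_HAPLOTYPE else f) / (1.0 - p_d)
        for h, f in HAPLOTYPE_FREQS.items()
    }
    return {"d": given_d, "D": given_D}


def sample_founder_haplotype(trait_allele: str, probs, rng: random.Random) -> str:
    """Draw a founder chromosome's marker haplotype given its trait allele ('D' or 'd')."""
    table = probs[trait_allele]
    return rng.choices(list(table), weights=list(table.values()), k=1)[0]


# Usage with a hypothetical allele frequency.
probs = conditional_freqs(p_d=0.1)
rng = random.Random(7)
print(sample_founder_haplotype("D", probs, rng), sample_founder_haplotype("d", probs, rng))
```

Conditioning on the trait allele is what creates LD. This is consistent with the table's affected sib-pair samples showing the second haplotype at about 0.31, above its population frequency of 0.25.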
Time in seconds to generate a sample of families of identical structure. The pedigree structures are named after the number of founders they contain (see Figure 2). Markers and the trait locus are in linkage equilibrium (LE) or linkage disequilibrium (LD). A dash indicates that the software cannot simulate on that pedigree structure or under that model. K is the prevalence of the disease.
| Number of markers | Pedigree structure (sample size) | FastSLINK/SUP | SIMLA | SIMLA | ALLEGRO |
| 2 × 6 LE | 2F (1000) | 1.4 | 1.1 | 1.4 | 0.4 |
| | 4F (1000) | 10.9 | 25.4 | 196.7 | 1.1 |
| | 8F (100) | 12.3 | - | - | 0.8 |
| | 16F (10) | 27.3 | - | - | - |
| 2 × 6000 LE | 2F (1000) | 64.5 | 160.8 | 189.8 | 51.5 |
| | 4F (1000) | 163.2 | 2494.0 | 18403.1 | 125.5 |
| | 8F (100) | 44.8 | - | - | 28.4 |
| | 16F (10) | 33.6 | - | - | - |
| 2 × 6 LD | 2F (1000) | 1.3 | 1.1 | 1.3 | - |
| | 4F (1000) | 11.0 | 26.1 | 197.7 | - |
| | 8F (100) | 11.8 | - | - | - |
| | 16F (10) | 27.1 | - | - | - |
| 2 × 6 LD multi-locus susceptibility | 2F (1000) | 7.5 | 1.1 | 1.4 | - |
| | 4F (1000) | 50.0 | 25.3 | 197.1 | - |
| | 8F (100) | 25.4 | - | - | - |
| | 16F (10) | 53.5 | - | - | - |
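The sample sizes differ across pedigree structures (1000, 100 or 10 families), so raw times are not directly comparable between rows. Dividing by the number of families puts them on a per-family scale. The small sketch below copies the FastSLINK/SUP times for the LE settings from the table. The helper name is hypothetical.

```python
# FastSLINK/SUP times from the table: (marker setting, pedigree, families, seconds).
SUP_TIMES = [
    ("2 x 6 LE", "2F", 1000, 1.4),
    ("2 x 6 LE", "4F", 1000, 10.9),
    ("2 x 6 LE", "8F", 100, 12.3),
    ("2 x 6 LE", "16F", 10, 27.3),
    ("2 x 6000 LE", "2F", 1000, 64.5),
    ("2 x 6000 LE", "4F", 1000, 163.2),
    ("2 x 6000 LE", "8F", 100, 44.8),
    ("2 x 6000 LE", "16F", 10, 33.6),
]


def seconds_per_family(seconds: float, n_families: int) -> float:
    """Normalize a total simulation time to a single family."""
    return seconds / n_families


for markers, ped, n, secs in SUP_TIMES:
    print(f"{markers:<12} {ped:>3}: {seconds_per_family(secs, n):.4f} s/family")
```

On this scale, the 16F pedigree costs roughly 2.7 to 3.4 s per family, compared with about 0.0014 to 0.065 s per family for the 2F pedigree.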