Abstract
This paper reviews the theoretical basis for single nucleotide polymorphism (SNP) tagging and considers the use of current software made freely available for this task. A distinction between haplotype block-based and non-block-based approaches yields two classes of procedures. Analysis of two different sets of SNP genotype data from the HapMap is used to judge the practical aspects of using each of the programs considered, as well as to make some general observations about the performance of the programs in finding optimal sets of tagging SNPs. Pairwise R² methods, while the simplest of those considered, do tend to pick more tagging SNPs than are strictly needed to predict unmeasured (non-tagging) SNPs, since a combination of two or more tagging SNPs can form a prediction of SNPs that have no direct (pairwise) surrogate. Block-based methods that exploit the linkage disequilibrium structure within haplotype blocks exploit this sort of redundancy, but run a risk of over-fitting if used without some care. A compromise approach which eliminates the need first to analyse block structure, but which still exploits simple relationships between SNPs, appears promising.
Year: 2005 PMID: 16004730 PMCID: PMC3525260 DOI: 10.1186/1479-7364-2-2-144
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
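The pairwise idea described in the abstract can be illustrated with a small sketch. It is not the algorithm of LDSelect, Tagger or any other program reviewed here; it is a generic greedy selection under stated assumptions: genotypes are coded 0/1/2 with individuals in rows, R² is taken as the squared Pearson correlation between SNP columns, every SNP is polymorphic, and the threshold of 0.8 is an arbitrary illustrative value. The function names `pairwise_r2` and `greedy_pairwise_tags` and the toy genotype matrix are hypothetical.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared Pearson correlation between SNP columns (rows = individuals).

    Assumes every column is polymorphic; a constant column would give NaN.
    """
    r = np.corrcoef(genotypes, rowvar=False)
    return r ** 2


def greedy_pairwise_tags(r2, threshold=0.8):
    """Pick tags greedily until each SNP has R^2 >= threshold with some tag."""
    covers = r2 >= threshold           # covers[i, j]: SNP i can stand in for SNP j
    np.fill_diagonal(covers, True)     # a SNP always tags itself (guarantees termination)
    untagged = np.ones(r2.shape[0], dtype=bool)
    tags = []
    while untagged.any():
        # Number of still-untagged SNPs each candidate would capture.
        gain = (covers & untagged).sum(axis=1)
        best = int(np.argmax(gain))
        tags.append(best)
        untagged &= ~covers[best]
    return sorted(tags)


# Toy data (hypothetical): SNP 1 duplicates SNP 0, SNP 3 mirrors SNP 2.
G = np.array([
    [0, 0, 0, 2],
    [1, 1, 0, 2],
    [2, 2, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 2, 0],
    [2, 2, 2, 0],
])

print(greedy_pairwise_tags(pairwise_r2(G)))  # [0, 2]
```

Because each SNP is required to have a single surrogate, a selection like this cannot use the fact that two tags together may predict a third SNP, which is why pairwise methods tend to return larger tag sets than haplotype-based criteria.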
Features of the tag single nucleotide polymorphism (SNP) programs evaluated
| Program | Block-based? | Estimates blocks itself? | Reads HapMap data? | PLEM incorporates trio data? | Tag criteria (a) | Allows 'force in' of special SNPs | Platforms/required software |
|---|---|---|---|---|---|---|---|
| Haploview | YES | YES | YES | YES | Unknown | NO | Java |
| Hapblock | YES | YES | NO | YES | Haplotype | YES | Compiled C program for Win/DOS, Unix and Linux |
| LDSelect | NO | N/A | NO | N/A | Pairwise | YES | Perl |
| htSNP | YES | NO | NO | NO | Allelic | NO | Stata + compiled C program (SNPHAP) |
| TagIT | YES | NO | NO | YES (b) | | YES | Matlab |
| Tagger | NO | N/A | YES | YES | Pairwise | YES | Web application |
| tagSNPs | YES | NO | YES | YES | Haplotype | YES | Compiled Fortran program for Win/DOS, Unix and Linux |
a See text for definition of notation.
b Performs a standard EM rather than PLEM algorithm (see text for more information).
Tag single nucleotide polymorphisms (SNPs) selected for TGFBR1
| Program | Block definition | Tag criteria | Tag SNPs in block 1, as defined by Haploview | Tag SNPs in block 2, as defined by Haploview | htSNPs not in blocks, as defined by Haploview |
|---|---|---|---|---|---|
| LDSelect | N/A | Pairwise | 1, 2 | 5, 6, 7, 13 | 4, 15 |
| Haploview | Gabriel default (a) | Not stated | 1, 2 | 5, 6, 7, 13 | N/A |
| htSNP | Used results of Haploview | | 1, 2 | 5, 7, 8, 13 | N/A |
| TagIT | Used results of Haploview | | 1, 2 | 5, 6, 13, 14 | N/A |
| tagSNPs | Used results of Haploview | | 1, 2 | 5, 6, 13 | N/A |
| | | | 1, 2 | 5, 7, 8, 13 | N/A |
| Hapblock | Empirical LD option (b) | | 1, 2 | 5, 9, 13 | 4 |
| | | | 1, 2 | 5, 7, 13 | 4 |
| | | Pairwise | 1, 2 | 5, 6, 7, 13 | 4, 15 |
| Tagger | N/A | Pairwise | 2, 3 | 5, 6, 10, 13 | 4, 15 |
| | | Restricted | 2, 3 | 5, 10, 13 | 4, 15 |
a See reference 2.
b Hapblock found two blocks (SNPs 1-3, and SNPs 4-15).
Tag single nucleotide polymorphisms (SNPs) when all SNPs in TGFBR1 are treated as being in one block
| Program | Tag criteria | htSNPs |
|---|---|---|
| LDSelect | Pairwise | 1, 2, 4, 5, 6, 7, 13, 15 |
| Tagger | Pairwise | 2, 3, 5, 6, 10, 13, 15 |
| | Restricted | 2, 3, 4, 5, 10, 13, 15 |
| Haploview | Not stated | 1, 2, 4, 5, 6, 13 |
| htSNP | | 2, 3, 5, 10, 13, 14, 15 |
| TagIT | | Failed |
| tagSNPs | | Criterion unreachable |
| | | 1, 2, 5, 6, 13, 14, 15 |
| Hapblock | | Criterion unreachable |
| | Entropy | 1, 2, 4, 5, 6, 7, 13, 15 |
| | Pairwise | 1, 2, 4, 5, 6, 7, 13, 15 |
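As a reading aid, the single-block results above can be tallied with a few lines of Python. The SNP sets are copied from the table; the dictionary layout, the variable names and the use of `None` for criteria whose labels are not recoverable from the table are illustrative assumptions. Runs reported as failed or with an unreachable criterion are left out.

```python
# Tag sets copied from the single-block TGFBR1 table.
# Keys are (program, criterion); None marks a criterion label
# that is not recoverable from the table as extracted.
tag_sets = {
    ("LDSelect", "Pairwise"): {1, 2, 4, 5, 6, 7, 13, 15},
    ("Tagger", "Pairwise"): {2, 3, 5, 6, 10, 13, 15},
    ("Tagger", "Restricted"): {2, 3, 4, 5, 10, 13, 15},
    ("Haploview", "Not stated"): {1, 2, 4, 5, 6, 13},
    ("htSNP", None): {2, 3, 5, 10, 13, 14, 15},
    ("tagSNPs", None): {1, 2, 5, 6, 13, 14, 15},
    ("Hapblock", "Entropy"): {1, 2, 4, 5, 6, 7, 13, 15},
    ("Hapblock", "Pairwise"): {1, 2, 4, 5, 6, 7, 13, 15},
}

# Number of tag SNPs chosen by each run.
sizes = {run: len(snps) for run, snps in tag_sets.items()}

# SNPs selected by every run that produced a tag set.
common = set.intersection(*tag_sets.values())

for (program, criterion), n in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{program:10s} {criterion or '(unlabelled)':12s} {n} tags")
print("Chosen by all runs:", sorted(common))
```

The tally gives tag sets of six to eight SNPs, with SNPs 2, 5 and 13 appearing in every set.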