Peisen Zhang, Huitao Sheng, Ryuhei Uehara.
Abstract
BACKGROUND: In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. In order to reduce costs and work, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, requiring brute-force algorithms that are not feasible for large data sets.
Year: 2004 PMID: 15238162 PMCID: PMC476734 DOI: 10.1186/1471-2105-5-89
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. A data sample to show the algorithm.
| Haplotype1 | ACAGATG |
| Haplotype2 | ACGAATG |
| Haplotype3 | ATGGGTG |
| Haplotype4 | GTAAGTG |
| Haplotype5 | GTGGGCA |
| Haplotype6 | GTAGACA |
| Haplotype7 | ATAAGCA |
| Haplotype8 | GTGGACA |
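The index SNP problem stated in the abstract can be illustrated on this sample with a plain exhaustive search. The sketch below is not the classification tree algorithm of the paper; it is the brute-force baseline the abstract describes as infeasible for large data sets. The function names `distinguishes` and `min_index_snps_bruteforce` are hypothetical, and SNP columns are assumed to be numbered from left to right.

```python
from itertools import combinations

# Haplotypes copied from the sample table above (one string per haplotype).
HAPLOTYPES = [
    "ACAGATG",
    "ACGAATG",
    "ATGGGTG",
    "GTAAGTG",
    "GTGGGCA",
    "GTAGACA",
    "ATAAGCA",
    "GTGGACA",
]


def distinguishes(haplotypes, columns):
    """Return True if the chosen SNP columns give each distinct haplotype a unique pattern."""
    patterns = {tuple(h[c] for c in columns) for h in haplotypes}
    return len(patterns) == len(set(haplotypes))


def min_index_snps_bruteforce(haplotypes):
    """Try column subsets in order of increasing size and return the first one that works.

    The full column set always distinguishes distinct haplotypes, so the
    search terminates, but it may examine up to 2**n subsets for n SNPs.
    """
    n_snps = len(haplotypes[0])
    for size in range(n_snps + 1):
        for columns in combinations(range(n_snps), size):
            if distinguishes(haplotypes, columns):
                return columns


index_snps = min_index_snps_bruteforce(HAPLOTYPES)
# Report 1-based SNP positions, counted from the left of each haplotype.
print([c + 1 for c in index_snps])
```

On the eight sample haplotypes this search returns SNPs 1, 3, and 5. Three is the fewest possible here, because every column in the sample takes only two alleles and two such SNPs can separate at most four haplotypes.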
Figure 1. Classification tree search algorithm for the data in Table 1.
Figure 2. Flow chart of the algorithm.
Table 2. A data sample showing that a second-round search is needed.
| | 1 | 2 | 3 | 4 | 5 |
| A | 1 | 1 | 1 | 1 | 1 |
| B | 1 | 1 | 1 | 1 | 0 |
| C | 1 | 1 | 1 | 0 | 1 |
| D | 0 | 1 | 1 | 0 | 0 |
| E | 1 | 1 | 0 | 1 | 1 |
| F | 1 | 1 | 0 | 1 | 0 |
| G | 0 | 1 | 0 | 0 | 1 |
| H | 0 | 1 | 0 | 0 | 0 |
| I | 1 | 0 | 1 | 1 | 1 |
| J | 0 | 0 | 1 | 1 | 0 |
| K | 0 | 0 | 1 | 0 | 1 |
| L | 0 | 0 | 0 | 1 | 1 |
Figure 3. A test data set downloaded from the UW-FHCRC Variation Discovery Resource (SeattleSNPs). The SNP locations are listed at the top of the table; for example, the first SNP is located at the 31st base. The last number in each haplotype row is its frequency; for example, haplotype one (hap1) has frequency 1087. Our program selected 10 index SNPs, numbered from left to right 1, 10, 13, 14, 15, 20, 21, 27, 29, and 36; their locations are shown in bold type. This is a minimum index SNP set. We also tried to run the Best program, but cancelled the process after one night without any results.