Jingwu He, Alexander Zelikovsky.
Abstract
The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs, i.e., tag SNPs, that accurately represent the rest of the SNPs. Tag SNP selection can achieve: 1) considerable budget savings, by genotyping only a limited number of SNPs and computationally inferring all other SNPs, or 2) the necessary reduction of huge SNP sets (obtained, e.g., from Affymetrix) for further fine haplotype analysis. In this paper, we show that tag SNP selection strongly depends on how the chosen tags will be used: the advantage of one tag set over another can only be assessed with respect to a particular prediction method. We show how to separate tag selection from SNP prediction and propose greedy and local-minimization algorithms for tag SNP selection. We give two novel approaches to SNP prediction based on multiple linear regression (MLR) and support vector machines (SVMs). An extensive experimental study on various datasets, including ten regions from the HapMap project, shows that MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. The MLR-based method also uses, on average, 30% fewer tags than IdSelect for statistically covering all SNPs. Tag selection based on SVM SNP prediction uses fewer tags to achieve the same prediction accuracy as the methods of Halldorsson et al.
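The abstract does not spell out the selection procedure, so what follows is only a minimal sketch, under stated assumptions, of how greedy tag selection can be paired with MLR-based SNP prediction. It assumes phased haplotypes encoded as a 0/1 matrix (rows are haplotypes, columns are SNPs). Each non-tag SNP is predicted by ordinary least squares on the tag columns plus an intercept, with the prediction rounded at 0.5. At each step the loop adds the SNP that most reduces the total misprediction rate, and it stops once every SNP is predicted within a tolerance. The function names `mlr_error` and `greedy_tags` and the `max_error` parameter are hypothetical. This plain forward-selection loop is not the authors' stepwise or local-minimization algorithm.

```python
import numpy as np


def mlr_error(H, tags, target):
    """Fraction of haplotypes whose allele at column `target` is mispredicted
    by least-squares regression on the `tags` columns plus an intercept.

    Assumption (not from the paper): H is a 0/1 haplotype matrix and the
    regression output is rounded to the nearest allele at threshold 0.5.
    """
    n = H.shape[0]
    X = np.column_stack([np.ones(n)] + [H[:, t] for t in tags])
    y = H[:, target].astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # tolerates rank-deficient X
    pred = (X @ coef >= 0.5).astype(int)
    return float(np.mean(pred != H[:, target]))


def greedy_tags(H, max_error=0.0):
    """Hypothetical greedy forward selection: repeatedly add the SNP whose
    inclusion minimizes the summed MLR error of the other non-tag SNPs,
    until every non-tag SNP is predicted with error <= max_error."""
    tags = []
    while True:
        untagged = [j for j in range(H.shape[1]) if j not in tags]
        if not untagged:
            return tags
        if max(mlr_error(H, tags, j) for j in untagged) <= max_error:
            return tags

        def total_error(c):
            # Summed error of the remaining SNPs if candidate c becomes a tag.
            return sum(mlr_error(H, tags + [c], j) for j in untagged if j != c)

        tags.append(min(untagged, key=total_error))


H = np.array([[0, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])
print(greedy_tags(H))  # → [0]
```

In this toy matrix the second SNP copies the first and the third is its complement, so a single tag predicts both exactly. Replacing `mlr_error` with a different predictor, such as an SVM classifier, can change which tags are chosen. This mirrors the abstract's point that a tag set is only "better" relative to a given prediction method.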
Year: 2007 PMID: 17393851 DOI: 10.1109/tnb.2007.891901
Source DB: PubMed Journal: IEEE Trans Nanobioscience ISSN: 1536-1241 Impact factor: 2.935