Daekwan Seo, Cizhong Jiang, Zhongming Zhao.
Abstract
BACKGROUND: The local environment of single nucleotide polymorphisms (SNPs) contains abundant genetic information for studying mechanisms of mutation, genome evolution, and causes of disease. Recent studies revealed that neighboring-nucleotide biases on SNPs are strong and that the genome-wide bias patterns can be represented by a small subset of the total SNPs. How to estimate the effective SNP size, the number of SNPs sufficient to represent the bias patterns observed in the whole SNP data set, remains unsolved.
Year: 2006 PMID: 17196097 PMCID: PMC1769377 DOI: 10.1186/1471-2164-7-329
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flowchart of SNPKS. This figure illustrates the integrated two-step procedure of the SNPKS method. KS test: Kolmogorov-Smirnov test; C.I.: confidence interval; N: intermediate effective SNP size; N: effective SNP size.
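The caption names the Kolmogorov-Smirnov test but does not say how SNPKS applies it. As a minimal sketch of the test statistic only, the following computes the two-sample KS distance, the largest gap between two empirical cumulative distribution functions. The function name `ks_statistic` and the sample values are hypothetical and are not taken from SNPKS.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS distance: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every point that occurs in either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical values, used only to exercise the function.
identical = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
print(identical, disjoint)
```

A distance near 0 means the two samples have similar distributions, and a distance of 1 means they do not overlap at all.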
Figure 2. Annotation of a SNP and its flanking sites. SNPKS uses ten sites immediately adjacent to the polymorphic site (A/G) on the 5' side and the 3' side. A minus sign indicates a flanking site on the 5' side and a plus sign indicates a flanking site on the 3' side.
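The signed-offset annotation can be sketched as follows, assuming ten flanking sites are taken on each side of the polymorphic site. The function name `flanking_sites`, the example sequence, and the use of `R` to mark the A/G site are illustrative assumptions, not part of SNPKS.

```python
def flanking_sites(sequence, snp_index, width=10):
    """Map signed offsets (-width..-1 and +1..+width) to the flanking bases."""
    if snp_index - width < 0 or snp_index + width >= len(sequence):
        raise ValueError("SNP is too close to the end of the sequence")
    sites = {}
    for offset in range(-width, width + 1):
        if offset == 0:
            continue  # offset 0 is the polymorphic site itself
        sites[offset] = sequence[snp_index + offset]
    return sites


# Hypothetical 21-base context; 'R' marks the A/G polymorphic site at index 10.
context = "ACGTACGTACRTGCATGCATG"
sites = flanking_sites(context, snp_index=10)
print(sites[-1], sites[1])
```

Negative keys correspond to the 5' side and positive keys to the 3' side, matching the sign convention in the caption.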
Estimation of the effective SNP size
| SNP data | Total # of test SNPs | Effective SNP size | 95% C.I. |
| Human | 5,200,425 | 38,200 | 35,700 – 40,700 |
| Chimpanzee | 1,470,501 | 39,300 | 37,200 – 41,400 |
| Dog | 2,690,084 | 38,000 | 35,700 – 40,300 |
| Mouse (Build 126) | 7,832,159 | 38,700 | 36,400 – 41,000 |
| Mouse (Build 123) | 376,146 | 39,100 | 36,800 – 41,400 |
| Human HapMap phase I | 861,498 | 38,400 | 35,800 – 41,000 |
| Human HapMap phase II | 2,435,362 | 39,100 | 36,900 – 41,300 |
| Human intergenic region^a | 2,422,730 | 39,100 | 36,800 – 41,400 |
| Human gene^a | 744,987 | 39,600 | 37,400 – 41,800 |
| Human intron^a | 889,956 | 39,200 | 37,000 – 41,400 |
| Human CpG island^a | 95,561 | 42,200 | 39,500 – 44,900 |
^a The average nucleotide frequencies in the genomic regions were used.
Figure 3. Neighboring-nucleotide bias patterns for dog SNPs and interval evaluation. The colored lines show the neighboring-nucleotide biases relative to the dog genome sequence average, using 2,690,084 dog SNPs. For 30 random sample sets of size 38,000, we obtained the average bias (x̄) and standard deviation (s) for each nucleotide at each position. The vertical bars represent the interval x̄ ± s. The figure shows that the intervals at all positions cover the corresponding biases observed from the whole set of dog SNPs. On the x axis, a minus sign indicates the 5' side and a plus sign indicates the 3' side of the SNPs.
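The interval evaluation described in this caption can be sketched for a single nucleotide at a single flanking position. The caption does not give the bias formula, so this sketch assumes bias is the relative deviation of the observed frequency from the genome average. The names `bias` and `interval_covers`, the synthetic bases, and the genome frequency of 0.25 are all hypothetical.

```python
import random
import statistics


def bias(bases, nucleotide, genome_freq):
    """Assumed definition: relative deviation of observed frequency from the genome average."""
    observed = bases.count(nucleotide) / len(bases)
    return (observed - genome_freq) / genome_freq


def interval_covers(bases, nucleotide, genome_freq, sample_size, n_sets=30, seed=0):
    """Check whether mean +/- s over random sample sets covers the whole-data bias."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_biases = [
        bias(rng.sample(bases, sample_size), nucleotide, genome_freq)
        for _ in range(n_sets)
    ]
    mean = statistics.mean(sample_biases)
    s = statistics.stdev(sample_biases)
    whole = bias(bases, nucleotide, genome_freq)
    return (mean - s <= whole <= mean + s), mean, s, whole


# Synthetic bases observed at one flanking position across 1,000 hypothetical SNPs.
bases = list("A" * 300 + "C" * 200 + "G" * 200 + "T" * 300)
covered, mean, s, whole = interval_covers(bases, "A", genome_freq=0.25, sample_size=200)
print(covered, round(mean, 3), round(s, 3), round(whole, 3))
```

In the figure this check is repeated for every nucleotide at every flanking position, with 30 sample sets of 38,000 SNPs each.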
Performance comparison with SNPNB
| SNP data | Process^a | SNPNB [8], 1 round | SNPNB [8], 5 rounds | SNPNB [8], 10 rounds | SNPKS |
| Human | Preprocessing data | 2 h 50 m 25 s (all rounds) | | | 2 h 24 m 49 s |
| | Estimation of effective SNP size | 24 h 56 m 1 s | 82 h 48 m 1 s | 147 h 39 m 40 s | 5 h 7 m 35 s |
| | Total elapsed time | 27 h 46 m 26 s | 85 h 38 m 26 s | 151 h 8 m 56 s | 7 h 32 m 24 s |
| Mouse (Build 123) | Preprocessing data | 0 h 2 m 51 s (all rounds) | | | 0 h 2 m 52 s |
| | Estimation of effective SNP size | 7 h 27 m 55 s | 37 h 53 m 53 s | 75 h 18 m 28 s | 2 h 4 m 50 s |
| | Total elapsed time | 7 h 30 m 46 s | 37 h 56 m 44 s | 75 h 21 m 19 s | 2 h 7 m 42 s |
^a The tests were performed on a Linux workstation (2 × 3.0 GHz CPUs, 4 GB memory).
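To compare the two methods, the elapsed times can be converted to seconds and divided. The sketch below uses the total elapsed times reported in the table for one round of SNPNB and for SNPKS. The helper `to_seconds` and the dictionary layout are illustrative assumptions.

```python
import re


def to_seconds(text):
    """Convert a duration written as 'H h M m S s' into seconds."""
    match = re.fullmatch(r"(\d+) h (\d+) m (\d+) s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Total elapsed times copied from the table: (SNPNB, 1 round) versus SNPKS.
totals = {
    "Human": ("27 h 46 m 26 s", "7 h 32 m 24 s"),
    "Mouse (Build 123)": ("7 h 30 m 46 s", "2 h 7 m 42 s"),
}

speedups = {
    name: to_seconds(snpnb) / to_seconds(snpks)
    for name, (snpnb, snpks) in totals.items()
}
for name, ratio in speedups.items():
    print(f"{name}: SNPNB (1 round) took {ratio:.2f} times as long as SNPKS")
```

The same helper can be applied to the 5-round and 10-round columns to see how the gap widens as SNPNB is repeated.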
Estimation of the effective SNP size in random subsets of human SNPs
| Sample size | Effective SNP size | 95% C.I. |
| 1.0 × 10^6 | 37,000 | 34,700 – 39,300 |
| 1.5 × 10^6 | 39,600 | 37,200 – 42,000 |
| 2.0 × 10^6 | 38,600 | 36,000 – 41,200 |
| 2.5 × 10^6 | 38,100 | 35,800 – 40,400 |
| 3.0 × 10^6 | 36,200 | 33,700 – 38,700 |
| 3.5 × 10^6 | 38,600 | 36,500 – 40,700 |
| 4.0 × 10^6 | 37,300 | 35,000 – 39,600 |
| 4.5 × 10^6 | 39,200 | 36,900 – 41,500 |
| 5.0 × 10^6 | 39,800 | 37,500 – 42,100 |
The effective SNP sizes of these 9 subsets are not statistically significantly different (ANOVA, P = 0.40).