Abstract
BACKGROUND: The allele frequencies of single-nucleotide polymorphisms (SNPs) are needed to select an optimal subset of common SNPs for use in association studies. Sequence-based methods for finding SNPs with allele frequencies may need to handle thousands of sequences from the same genome location (sequences of deep coverage).
Year: 2005 PMID: 16146575 PMCID: PMC1239908 DOI: 10.1186/1471-2105-6-220
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Results by PolyFreq and PolyBayes on eighteen data sets of EST sequences
| Data set | Size | PolyBayes (trimmed) TP | PolyBayes (trimmed) FP | PolyBayes (trimmed) FN | PolyBayes (trimmed) NSU | PolyFreq (full length) TP | PolyFreq (full length) FP | PolyFreq (full length) FN | PolyFreq (full length) NSU | PolyBayes (full length) TP | PolyBayes (full length) FP | PolyBayes (full length) FN | PolyBayes (full length) NSU |
| Hs.119589 | 4403 | 12 | 170 | 50 | 1491 | 5 | 24 | 57 | 4391 | 12 | 152 | 50 | 1531 |
| Hs.129673 | 1665 | 7 | 48 | 11 | 1457 | 5 | 11 | 13 | 1662 | T/A | T/A | T/A | T/A |
| Hs.148340 | 1603 | 3 | 36 | 9 | 1563 | 4 | 9 | 8 | 1583 | 3 | 67 | 9 | 1560 |
| Hs.170622 | 1514 | 1 | 37 | 12 | 429 | 3 | 18 | 10 | 1507 | 2 | 73 | 11 | 365 |
| Hs.178551 | 1685 | 5 | 62 | 12 | 1632 | 6 | 14 | 11 | 1676 | T/A | T/A | T/A | T/A |
| Hs.180909 | 1017 | 3 | 50 | 6 | 983 | 3 | 20 | 6 | 1012 | 3 | 164 | 6 | 996 |
| Hs.187199 | 2041 | T/A | T/A | T/A | T/A | N/A | N/A | N/A | 1997 | T/A | T/A | T/A | T/A |
| Hs.198281 | 3156 | 9 | 110 | 22 | 3077 | 15 | 54 | 16 | 3149 | T/A | T/A | T/A | T/A |
| Hs.350927 | 1017 | 5 | 42 | 11 | 976 | 9 | 21 | 7 | 1015 | 6 | 139 | 10 | 995 |
| Hs.356331 | 1441 | 2 | 55 | 9 | 318 | 1 | 14 | 10 | 1436 | 2 | 85 | 9 | 239 |
| Hs.356572 | 2822 | 0 | 46 | 2 | 2534 | 0 | 17 | 2 | 2821 | T/A | T/A | T/A | T/A |
| Hs.439552 | 7163 | T/A | T/A | T/A | T/A | N/A | N/A | N/A | 6873 | T/A | T/A | T/A | T/A |
| Hs.444467 | 4033 | N/A | N/A | N/A | 805 | N/A | N/A | N/A | 4028 | N/A | N/A | N/A | 679 |
| Hs.446628 | 1490 | 4 | 32 | 12 | 1338 | 5 | 12 | 11 | 1486 | T/A | T/A | T/A | T/A |
| Hs.520640 | 4120 | T/A | T/A | T/A | T/A | 9 | 52 | 27 | 4099 | T/A | T/A | T/A | T/A |
| Hs.522463 | 8294 | T/A | T/A | T/A | T/A | N/A | N/A | N/A | 8280 | T/A | T/A | T/A | T/A |
| Hs.524390 | 4462 | T/A | T/A | T/A | T/A | 9 | 48 | 24 | 4454 | T/A | T/A | T/A | T/A |
| Hs.544577 | 7537 | 14 | 60 | 43 | 1716 | 10 | 17 | 47 | 7517 | 18 | 175 | 39 | 1556 |
N/A means that no SNP from dbSNP was mapped to the anchor sequence because no RefSeq sequence was available. T/A means that PolyBayes terminated abnormally without producing an output file. A candidate SNP reported by a program is counted as a true positive (TP) if it is in dbSNP and as a false positive (FP) otherwise. A dbSNP SNP that occurs in the data set is counted as a false negative (FN) if the program does not report it as a candidate SNP. NSU is the number of sequences the program used to generate candidate SNPs.
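The TP, FP and FN definitions in the table note amount to three set operations. The sketch below is only an illustration under simplifying assumptions: SNPs are identified by their position on the anchor sequence alone (alleles are ignored), and the function name `score_candidates` and the example positions are hypothetical, not taken from PolyFreq, PolyBayes or the reported data.

```python
def score_candidates(candidates, dbsnp_in_dataset):
    """Count TP, FP and FN for candidate SNP positions against dbSNP.

    candidates: anchor positions reported by a program (hypothetical input).
    dbsnp_in_dataset: anchor positions of dbSNP SNPs that occur in the data set.
    """
    reported = set(candidates)
    known = set(dbsnp_in_dataset)
    tp = reported & known   # reported and present in dbSNP
    fp = reported - known   # reported but absent from dbSNP
    fn = known - reported   # in dbSNP and in the data set, but not reported
    return {"TP": len(tp), "FP": len(fp), "FN": len(fn)}


# Made-up positions for illustration only.
counts = score_candidates([101, 250, 377, 512], [250, 512, 640])
print(counts)  # {'TP': 2, 'FP': 2, 'FN': 1}
```

Under this definition an FP is simply a candidate absent from dbSNP, so it may still be a real polymorphism that has not been catalogued.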
Figure 1. Acceptable and unacceptable substitutions in a pairwise alignment and a candidate SNP from a real data set. (A) The line shows an alignment of query and anchor sequences with thick parts indicating highly similar regions. The large rectangular box gives a detailed view of the small box. On the left is an unacceptable substitution that is flanked by a block of low-quality bases, and on the right is an acceptable substitution that is flanked by a perfect block on each side. The quality value of each query base in the large box is shown next to the base. (B) Shown is a candidate SNP with allele frequencies from PolyFreq on a real data set (Hs.119589) in Table 1.
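The distinction drawn in panel (A) can be sketched as a simple window test around a mismatch. This is a rough illustration, not the published procedure: the function name `is_acceptable_substitution`, the flank length of 5 bases and the quality cutoff of 20 are all assumed values chosen for the example, and the alignment stretch is assumed to be ungapped.

```python
def is_acceptable_substitution(query, anchor, quals, pos, flank=5, min_qual=20):
    """Return True if the mismatch at pos sits between perfect, high-quality blocks.

    query, anchor: equal-length strings from an ungapped stretch of the alignment.
    quals: one quality value per query base.
    flank, min_qual: hypothetical thresholds, not taken from the source.
    """
    if len(query) != len(anchor) or len(query) != len(quals):
        raise ValueError("query, anchor and quals must have equal length")
    lo, hi = pos - flank, pos + flank
    if lo < 0 or hi >= len(query):
        return False  # not enough flanking sequence to judge
    if query[pos] == anchor[pos]:
        return False  # no substitution at this position
    for i in range(lo, hi + 1):
        if i != pos and query[i] != anchor[i]:
            return False  # a flanking block is not a perfect match
        if quals[i] < min_qual:
            return False  # low-quality base inside the window
    return True


anchor = "ACGTACGTACGTA"
query = "ACGTACATACGTA"        # G -> A substitution at index 6
good_quals = [40] * len(query)
poor_quals = list(good_quals)
poor_quals[8] = 10             # one low-quality base in the right flank

print(is_acceptable_substitution(query, anchor, good_quals, 6))  # True
print(is_acceptable_substitution(query, anchor, poor_quals, 6))  # False
```

Requiring clean flanks on both sides is a conservative choice: it discards some real differences near read ends in exchange for fewer substitutions caused by sequencing errors.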