Andreia J Amaral, Hendrik-Jan Megens, Hindrik H D Kerstens, Henri C M Heuven, Bert Dibbits, Richard P M A Crooijmans, Johan T den Dunnen, Martien A M Groenen.
Abstract
BACKGROUND: Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate the impact of the quality level of the sequenced bases on mapping quality and identification of true SNPs on a large scale.
Year: 2009 PMID: 19674453 PMCID: PMC2739861 DOI: 10.1186/1471-2164-10-374
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequence production and filtering for the three strategies used to identify SNPs.
| Parameter | Data 12 | Data 15 | Data 20 |
| Total sequences after filtering | 45,498,558 | 41,610,684 | 34,061,918 |
| Total number of SNPs | 16,768 | 17,047 | 17,489 |
| Mapping quality (a) | 61.76 (0.027) | 61.78 (0.027) | 62.02 (0.0237) |
| Consensus quality (a) | 59.04 (0.257) | 60.23 (0.259) | 63.30 (0.263) |
| Target coverage (a) | 29.37 (0.164) | 29.19 (0.163) | 28.60 (0.155) |
| MAF (a, b) | 0.36 (0.0007) | 0.36 (0.0007) | 0.36 (0.0007) |
(a) Mean (s.e.)
(b) MAF, minor allele frequency.
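The "Mean (s.e.)" entries in the table pair an average with its standard error. The following is a minimal sketch of that calculation, assuming the standard error is the sample standard deviation divided by the square root of the number of observations; the source does not state which estimator was used. The helper name `mean_and_se` and the input values are illustrative and are not taken from the study.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error) for a sequence of numbers.

    Hypothetical helper. Assumes s.e. = sample standard deviation / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


# Illustrative mapping-quality values, not data from the study.
example_qualities = [60, 62, 61, 63, 64]
mean, se = mean_and_se(example_qualities)
print(f"{mean:.2f} ({se:.3f})")
```

The printed string follows the same "mean (s.e.)" layout as the table cells.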
Figure 1. Maximum mapping quality (MMQ) (mapping quality of the best mapped sequence of a cluster) on an SNP position versus target coverage. Box plots show the data distribution for each parameter. Red dots show MMQ values for the best mapped sequence on an SNP position versus target coverage. The black solid line shows the smooth-fit line.
Figure 2. Venn diagram showing the number of identical SNPs between the analyzed data sets with different levels of sequence quality.
Figure 3. Number of identified SNPs per position in a short read for Data 20.
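The counts in a three-set Venn diagram like the one described can be obtained with plain set operations. This is a sketch only: it assumes each SNP is identified by a (chromosome, position) pair, which the source does not specify, and the function name `venn_counts` and the example SNPs are hypothetical.

```python
def venn_counts(a, b, c):
    """Count items in each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical SNPs keyed by (chromosome, position); not data from the study.
data12 = {("1", 100), ("1", 250), ("2", 75)}
data15 = {("1", 100), ("2", 75), ("3", 40)}
data20 = {("1", 100), ("3", 40), ("X", 9)}

counts = venn_counts(data12, data15, data20)
for region, n in counts.items():
    print(region, n)
```

The seven region counts always sum to the size of the union of the three sets, which is a quick sanity check on any such diagram.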
Summary of in silico digest of reference genome and analysis of consensus sequences.
| 1 | 1,942,512 | 2,167,060 | 8,738.15 | 204.29 | 0.0013 | 0.00007 | 31.72 | 0.11 |
| 2 | 564,894 | 704,052 | 8,912.05 | 430.80 | 0.0017 | 0.00015 | 31.82 | 0.21 |
| 3 | 396,990 | 510,151 | 7,971.11 | 337.50 | 0.0013 | 0.00011 | 32.60 | 0.22 |
| 4 | 931,194 | 988,690 | 7,724.14 | 286.62 | 0.0012 | 0.00009 | 32.26 | 0.16 |
| 5 | 541,200 | 589,399 | 7,805.12 | 417.18 | 0.0019 | 0.00012 | 32.29 | 0.24 |
| 6 | 263,538 | 354,332 | 7,874.04 | 508.06 | 0.0015 | 0.00013 | 32.65 | 0.24 |
| 7 | 263,538 | 831,673 | 6,253.18 | 283.95 | 0.0015 | 0.00010 | 32.62 | 0.17 |
| 8 | 590,700 | 732,338 | 10,930.42 | 338.24 | 0.0011 | 0.00013 | 31.36 | 0.22 |
| 9 | 582,384 | 763,287 | 9,541.09 | 406.33 | 0.0014 | 0.00014 | 32.06 | 0.18 |
| 10 | 292,842 | 367,327 | 8,959.20 | 380.01 | 0.0021 | 0.00019 | 32.44 | 0.21 |
| 11 | 527,274 | 584,831 | 9,137.98 | 370.76 | 0.0015 | 0.00013 | 31.85 | 0.30 |
| 12 | 135,894 | 174,581 | 6,020.03 | 462.41 | 0.0018 | 0.00015 | 33.71 | 0.36 |
| 13 | 924,396 | 1,177,791 | 9,897.40 | 293.47 | 0.0014 | 0.00009 | 31.64 | 0.14 |
| 14 | 874,500 | 952,048 | 6,476.52 | 265.24 | 0.0014 | 0.00008 | 32.83 | 0.17 |
| 15 | 822,822 | 974,572 | 10,185.12 | 326.63 | 0.0010 | 0.00009 | 31.47 | 0.14 |
| 16 | 402,270 | 500,390 | 10,007.80 | 481.24 | 0.0016 | 0.00013 | 31.97 | 0.26 |
| 17 | 280,434 | 303,111 | 5,511.11 | 367.98 | 0.0015 | 0.00011 | 33.22 | 0.24 |
| 18 | 256,806 | 314,098 | 9,518.12 | 593.62 | 0.0007 | 0.00010 | 32.45 | 0.29 |
| X | 495,726 | 386,932 | 5,300.44 | 181.11 | 0.0005 | 0.00009 | 31.77 | 0.20 |
Figure 4. SNP map of each chromosome based on Data 20. The colored vertical lines represent the location of each SNP.
Figure 5. Sequence coverage, nucleotide diversity, and SNP occurrence along chromosome 1. Each bar represents a window of 1 Mb. Red bars show the length of the aligned consensus sequence, blue bars show the estimated level of nucleotide diversity, and green bars show the number of SNPs found in each window. The red triangle designates the position of the centromere. The blue triangle designates a position where nucleotide diversity is high although coverage is low.
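The per-window SNP counts shown as green bars can be illustrated with a simple binning step. The sketch below assumes 1-based base-pair coordinates on a single chromosome and non-overlapping 1 Mb windows; the function name `snps_per_window` and the positions are hypothetical, and the source does not describe how the windows were actually computed.

```python
from collections import Counter

WINDOW_SIZE = 1_000_000  # 1 Mb windows, matching the figure caption


def snps_per_window(positions, window_size=WINDOW_SIZE):
    """Map window index -> number of SNPs whose position falls in that window.

    Assumes 1-based coordinates, so window 0 covers positions 1..window_size.
    """
    return Counter((pos - 1) // window_size for pos in positions)


# Hypothetical SNP positions on one chromosome; not data from the study.
positions = [1, 999_999, 1_000_000, 1_000_001, 2_500_000]
counts = snps_per_window(positions)
for window in sorted(counts):
    start = window * WINDOW_SIZE + 1
    end = (window + 1) * WINDOW_SIZE
    print(f"{start}-{end}: {counts[window]} SNPs")
```

Windows with no SNPs are simply absent from the `Counter`, and looking them up returns 0.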
Percentage of monomorphic SNPs and average minor allele frequencies (MAF) by breed for 3,142 SNPs.
| Duroc | 82 | 34 | 0.13 | 28 | 0.16 | 17 | 0.17 | 29 | 0.13 |
| Large White | 136 | 13 | 0.23 | 7 | 0.24 | 8 | 0.28 | 5 | 0.24 |
| Landrace | 80 | 12 | 0.22 | 12 | 0.20 | 17 | 0.20 | 11 | 0.21 |
| Pietrain | 90 | 16 | 0.19 | 12 | 0.22 | 10 | 0.26 | 12 | 0.22 |
| Berkshire | 67 | 32 | 0.14 | 31 | 0.14 | 21 | 0.18 | 28 | 0.14 |
| Hampshire | 59 | 28 | 0.15 | 28 | 0.16 | 23 | 0.17 | 27 | 0.16 |
| Wild boar | 20 | 34 | 0.17 | 25 | 0.19 | 17 | 0.20 | 23 | 0.18 |
| PW | 6 | 13 | 0.27 | 6 | 0.33 | 6 | 0.37 | 4 | 0.33 |
*SNPs identified in Data 12, Data 15, and Data 20.
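The two quantities summarized per breed, the percentage of monomorphic SNPs and the average MAF, can be derived from allele counts as sketched below. Several assumptions apply: SNPs are biallelic, a SNP counts as monomorphic when its minor allele is not observed in the breed, and monomorphic SNPs are included in the average MAF. The source does not confirm any of these choices, and the function names and counts are hypothetical.

```python
def maf(count_a, count_b):
    """Minor allele frequency of a biallelic SNP from its two allele counts."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed")
    return min(count_a, count_b) / total


def summarize_breed(allele_counts):
    """Return (percent monomorphic SNPs, mean MAF) for one breed.

    allele_counts: list of (count_a, count_b) pairs, one per SNP.
    Assumption: monomorphic SNPs (MAF == 0) are kept in the mean.
    """
    if not allele_counts:
        raise ValueError("no SNPs given")
    mafs = [maf(a, b) for a, b in allele_counts]
    monomorphic = sum(1 for m in mafs if m == 0)
    return 100 * monomorphic / len(mafs), sum(mafs) / len(mafs)


# Hypothetical allele counts for four SNPs in one breed; not study data.
example_counts = [(10, 0), (6, 4), (5, 5), (8, 2)]
percent_monomorphic, mean_maf = summarize_breed(example_counts)
print(f"{percent_monomorphic:.0f}% monomorphic, mean MAF {mean_maf:.2f}")
```

Excluding monomorphic SNPs from the mean would raise the average MAF, so the choice should be stated alongside any such table.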