Lakshmi K Matukumalli, Cynthia T Lawley, Robert D Schnabel, Jeremy F Taylor, Mark F Allan, Michael P Heaton, Jeff O'Connell, Stephen S Moore, Timothy P L Smith, Tad S Sonstegard, Curtis P Van Tassell.
Abstract
The success of genome-wide association (GWA) studies in detecting sequence variation that affects complex traits in humans has spurred interest in using large-scale, high-density single nucleotide polymorphism (SNP) genotyping to identify quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach is described for developing a custom genotyping assay that interrogates 54,001 SNP loci to support GWA applications in cattle. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with a median interval of 37 kb and a maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Its utility for GWA was demonstrated by localizing known variation for coat color and for the presence/absence of horns to the correct genomic locations. The combination of SNP selection and the novel spacing algorithm provides an efficient approach for developing high-density genotyping platforms in species with a full or even moderate-quality draft sequence. Aspects of the approach can also be exploited in species that lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.
Year: 2009 PMID: 19390634 PMCID: PMC2669730 DOI: 10.1371/journal.pone.0005350
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
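The abstract summarizes marker spacing as a median inter-marker interval of 37 kb and a maximum predicted gap of <350 kb. As a minimal sketch (not the authors' code), these two statistics can be computed from a SNP map as below, assuming positions are in base pairs and that intervals are measured only between adjacent markers on the same chromosome; the function `interval_stats` and its dictionary input are hypothetical.

```python
from statistics import median


def interval_stats(positions_by_chrom):
    """Return (median interval, maximum gap) in bp over all chromosomes.

    positions_by_chrom: hypothetical input mapping a chromosome name to a
    list of SNP positions in base pairs (any order).
    """
    gaps = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        # Only adjacent markers on the same chromosome define an interval.
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least two SNP on one chromosome")
    return median(gaps), max(gaps)


snp_map = {"chr1": [100, 40_000, 70_000], "chr2": [5_000, 60_000]}
print(interval_stats(snp_map))  # (39900, 55000)
```

Gaps across chromosome boundaries are deliberately excluded, since the distance between the last SNP of one chromosome and the first SNP of the next has no physical meaning.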
Summary of performance of SNP by source.
| SNP source | Number of SNP available for selection | Estimated conversion rate (%) | Number of SNP selected for the BovineSNP50 | Number of SNP producing calls (%) | Number confirmed SNP (%) | Average MAF |
| Draft | 235,725 | 85 | 10,244 (17.6) | 9,361 (91.4) | 9,284 (99.1) | 0.24 |
| Interbreed | 73,127 | 84 | 6,035 (10.3) | 5,493 (91.0) | 5,244 (95.5) | 0.24 |
| BAC | 36,387 | 82 | 1,526 (2.6) | 1,409 (92.3) | 1,239 (87.9) | 0.24 |
| RRL | 65,180 | 92 | 25,833 (44.3) | 23,840 (92.3) | 21,914 (91.9) | 0.25 |
| HapMap | 29,853 | 100 | 13,236 (22.7) | 12,613 (95.3) | 12,503 (99.1) | 0.26 |
| Parentage | 121 | 100 | 121 (0.2) | 116 (95.9) | 116 (100) | 0.31 |
| Various | 4399 | 100 | 1341 (2.3) | 1,169 (87.2) | 1,083 (92.6) | 0.26 |
| Total | 444,792 | NA | 58,336 | 54,001 (92.6) | 51,383 (95.1) | 0.26 |
Sources of SNP are defined in Materials and Methods.
Number of SNP available for selection: the number of SNP input to the spacing/selection algorithm.
Estimated conversion rate: percent of markers detected as polymorphic in validation studies that tested from 48 to 25,125 SNP.
BovineSNP50 is the name of the developed high-density genotyping assay.
Number of SNP producing calls: markers that produced genotype calls (>90% call rate) among the 556 tested animals.
Number confirmed SNP: SNP for which at least one of the 556 tested animals was heterozygous; the percentage is relative to the number of markers producing genotype calls.
Average MAF: average minor allele frequency among the 556 animals in the validation panel.
SNP selected from these sources had previously been shown to be informative in some populations.
NA = not applicable.
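The footnotes define two quality filters: a marker "produces calls" if its call rate exceeds 90% among the tested animals, and it is "confirmed" if at least one animal is heterozygous. A minimal Python sketch of those two filters follows; the genotype encoding (0, 1 or 2 copies of one allele, `None` for a missing call) and the function names are assumptions for illustration, not the assay's output format.

```python
def call_rate(genotypes):
    """Fraction of animals with a genotype call (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def classify_marker(genotypes, min_call_rate=0.90):
    """Classify one SNP as 'failed', 'called' or 'confirmed'.

    genotypes: one entry per animal, coded 0, 1 or 2 (copies of one allele)
    or None; this encoding is a hypothetical choice.
    """
    if call_rate(genotypes) <= min_call_rate:
        return "failed"  # does not exceed the 90% call-rate threshold
    if any(g == 1 for g in genotypes):
        return "confirmed"  # at least one heterozygous animal
    return "called"  # called, but no heterozygote observed


def summarize(markers):
    """Return (number producing calls, number confirmed) for a list of SNP."""
    labels = [classify_marker(g) for g in markers]
    n_called = sum(label != "failed" for label in labels)
    return n_called, labels.count("confirmed")
```

Dividing the first count by the number of SNP selected, and the second by the first, gives percentages in the form used in the table (for example, 54,001 / 58,336 ≈ 92.6%).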
Relationship between Illumina design scores and SNP performance.
| Design score | No. SNP submitted | No. SNP passing assay production pipeline (%) | No. high quality SNP called (%) |
| ≥0.9 | 41,715 | 40,748 (97.7) | 39,390 (96.7) |
| 0.8–0.9 | 11,855 | 11,574 (97.6) | 10,705 (92.5) |
| 0.7–0.8 | 3146 | 3061 (97.3) | 2699 (88.2) |
| 0.6–0.7 | 1216 | 1170 (96.2) | 923 (78.9) |
| 0.5–0.6 | 382 | 372 (97.4) | 262 (70.4) |
| ND* | 22 | 22 (100.0) | 22 (100.0) |
| Total | 58,336 | 56,947 (97.6) | 54,001 (92.6) |
ND*: design scores were not determined for 5 gene-coding SNP and for 17 Parentage SNP that failed in the design step.
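The table groups SNP by Illumina design score and reports, per bin, how many passed the production pipeline and how many yielded high-quality calls; in the per-bin rows the second percentage is relative to the SNP that passed the pipeline (e.g. 39,390 / 40,748 ≈ 96.7%). A minimal sketch of that tabulation follows, assuming hypothetical per-SNP records of (design score, passed pipeline, high-quality call) with `None` for an undetermined score; bin labels follow the table, and treating each lower bound as inclusive is an assumption.

```python
from collections import defaultdict

# (inclusive lower bound, label), checked from the highest bin down.
BINS = [(0.9, ">=0.9"), (0.8, "0.8-0.9"), (0.7, "0.7-0.8"),
        (0.6, "0.6-0.7"), (0.5, "0.5-0.6")]


def score_bin(score):
    """Map a design score to its table label; None means not determined."""
    if score is None:
        return "ND"
    for lower, label in BINS:
        if score >= lower:
            return label
    return "<0.5"


def tally(records):
    """records: iterable of (design_score, passed_pipeline, high_quality).

    Returns {bin label: (submitted, passed, high quality,
    % passed, % high quality among passed)}.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for score, passed, high_quality in records:
        c = counts[score_bin(score)]
        c[0] += 1
        c[1] += int(passed)
        c[2] += int(high_quality)
    summary = {}
    for label, (n, n_pass, n_hq) in counts.items():
        pct_pass = round(100 * n_pass / n, 1)
        pct_hq = round(100 * n_hq / n_pass, 1) if n_pass else None
        summary[label] = (n, n_pass, n_hq, pct_pass, pct_hq)
    return summary
```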
Figure 1. The BovineSNP50 assay has a compact gap distribution, ideal for genome-wide association studies, compared with the Affymetrix 25K SNP panel, which has an excess of adjacent markers either too close together or too far apart, leaving large sections of the genome unrepresented on the assay.
Figure 2. Distribution of SNP by call rate on the BovineSNP50 assay.
The overall call rate across all markers exceeded 99.1%, and more than 90% of markers had call rates above 99.98%.
Comparison of overall SNP performance between Infinium I and Infinium II assays.
| Assay | Design score | No. SNP submitted | No. SNP passing assay production pipeline | % of submitted | No. high quality SNP called | % of submitted |
| Infinium I | 0.9–1.0 | 1689 | 1557 | 92.2 | 1493 | 88.4 |
| Infinium I | 0.8–0.9 | 448 | 419 | 93.5 | 391 | 87.3 |
| Infinium I | 0.7–0.8 | 641 | 597 | 93.1 | 551 | 86.0 |
| Infinium I | 0.6–0.7 | 131 | 124 | 94.7 | 107 | 81.7 |
| Infinium I | 0.5–0.6 | 0 | NA | NA | NA | NA |
| Infinium II | 0.9–1.0 | 40,676 | 39,826 | 97.9 | 38,444 | 94.5 |
| Infinium II | 0.8–0.9 | 10,909 | 10,666 | 97.8 | 9884 | 90.6 |
| Infinium II | 0.7–0.8 | 13,810 | 13,499 | 97.8 | 12,377 | 89.6 |
| Infinium II | 0.6–0.7 | 989 | 954 | 96.5 | 743 | 75.1 |
| Infinium II | 0.5–0.6 | 339 | 329 | 97.1 | 232 | 68.4 |
Percent success of Infinium I and Infinium II assays derived from the same SNP pools.
| Waves Compared | # Infinium II | # Infinium I | % Successful Infinium II | % Successful Infinium I |
| 7 vs 6 | 1570 | 188 | 91.34 | 82.45 |
| 3 vs 9 | 6464 | 487 | 92.95 | 87.47 |
| 11 vs 13 | 9863 | 186 | 91.68 | 86.02 |
| Total | 48,630 | 2312 | | |
See Table S5 for wave definition.
Minor allele frequencies.
| Group | Breed | Avg MAF | Informative | Heterozygous | MAF 0.01–0.1 | MAF 0.1–0.2 | MAF 0.2–0.3 | MAF 0.3–0.4 | MAF 0.4–0.5 |
| Taurine | Hereford | 0.27 | 0.89 | 0.29 | 0.15 | 0.17 | 0.15 | 0.21 | 0.21 |
| Taurine | Charolais | 0.26 | 0.91 | 0.31 | 0.15 | 0.20 | 0.18 | 0.20 | 0.19 |
| Taurine | Holstein | 0.26 | 0.90 | 0.31 | 0.16 | 0.17 | 0.18 | 0.20 | 0.20 |
| Taurine | Piedmontese | 0.26 | 0.89 | 0.31 | 0.15 | 0.18 | 0.19 | 0.17 | 0.20 |
| Taurine | Norwegian Red | 0.26 | 0.88 | 0.31 | 0.13 | 0.20 | 0.17 | 0.20 | 0.18 |
| Taurine | Limousin | 0.25 | 0.90 | 0.30 | 0.17 | 0.19 | 0.17 | 0.19 | 0.19 |
| Taurine | Romagnola | 0.25 | 0.84 | 0.28 | 0.16 | 0.18 | 0.17 | 0.16 | 0.18 |
| Taurine | Angus | 0.25 | 0.89 | 0.30 | 0.16 | 0.18 | 0.16 | 0.19 | 0.19 |
| Taurine | Red Angus | 0.26 | 0.84 | 0.30 | 0.12 | 0.19 | 0.15 | 0.20 | 0.18 |
| Taurine | Guernsey | 0.25 | 0.80 | 0.27 | 0.13 | 0.19 | 0.15 | 0.17 | 0.16 |
| Taurine | Jersey | 0.24 | 0.78 | 0.26 | 0.17 | 0.16 | 0.14 | 0.16 | 0.15 |
| Taurine | Brown Swiss | 0.25 | 0.80 | 0.27 | 0.16 | 0.16 | 0.16 | 0.15 | 0.16 |
| Taurine | Simmental | 0.3 | 0.62 | 0.30 | 0.00 | 0.24 | 0.00 | 0.25 | 0.12 |
| Taurine | Gelbvieh | 0.3 | 0.65 | 0.30 | 0.00 | 0.25 | 0.00 | 0.26 | 0.13 |
| Composite | Beefmaster | 0.26 | 0.92 | 0.32 | 0.16 | 0.19 | 0.19 | 0.17 | 0.20 |
| Composite | Santa Gertrudis | 0.25 | 0.91 | 0.30 | 0.18 | 0.19 | 0.18 | 0.17 | 0.19 |
| African | Sheko | 0.24 | 0.75 | 0.25 | 0.15 | 0.17 | 0.13 | 0.16 | 0.14 |
| African | N'dama | 0.24 | 0.64 | 0.20 | 0.14 | 0.15 | 0.11 | 0.13 | 0.11 |
| Indicine | Brahman | 0.18 | 0.76 | 0.19 | 0.28 | 0.19 | 0.11 | 0.10 | 0.08 |
| Indicine | Gir | 0.19 | 0.59 | 0.16 | 0.20 | 0.14 | 0.10 | 0.07 | 0.08 |
| Indicine | Nelore | 0.19 | 0.59 | 0.15 | 0.21 | 0.13 | 0.09 | 0.08 | 0.07 |
Avg MAF: average MAF calculated across all loci within a given breed, including monomorphic SNP.
Informative: the fraction of SNP that are informative (MAF ≥0.01).
Heterozygous: the fraction of heterozygous SNP, averaged across all animals within a breed.
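The three footnoted per-breed statistics can be computed from a breed's genotype matrix. The sketch below is an illustration under stated assumptions, not the authors' pipeline: genotypes are coded 0, 1 or 2 with no missing calls, and the five MAF-bin columns are assumed to count informative SNP (MAF ≥0.01) in bins with upper edges 0.1 to 0.5, which is consistent with each row's bin fractions summing (up to rounding) to its Informative value. All function names are hypothetical.

```python
def minor_allele_freqs(genotypes):
    """MAF per SNP; genotypes is a list of animals, each a list of 0/1/2 codes."""
    mafs = []
    for column in zip(*genotypes):  # one column per SNP
        p = sum(column) / (2 * len(column))  # frequency of the counted allele
        mafs.append(min(p, 1 - p))
    return mafs


def breed_summary(genotypes):
    """Return (average MAF, informative fraction, heterozygous fraction)."""
    mafs = minor_allele_freqs(genotypes)
    n_snp = len(mafs)
    avg_maf = sum(mafs) / n_snp  # monomorphic SNP enter with MAF = 0
    informative = sum(m >= 0.01 for m in mafs) / n_snp
    # Fraction of heterozygous SNP per animal, averaged over animals.
    het = sum(row.count(1) / n_snp for row in genotypes) / len(genotypes)
    return avg_maf, informative, het


def maf_bin_fractions(mafs, edges=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Fraction of all SNP falling in each MAF bin (informative SNP only)."""
    counts = [0] * len(edges)
    for m in mafs:
        if m < 0.01:
            continue  # uninformative SNP are left out of the bins
        for i, upper in enumerate(edges):
            if m <= upper:
                counts[i] += 1
                break
    return [c / len(mafs) for c in counts]
```

Including monomorphic loci with MAF = 0 in the average is what the first footnote describes; it is why breeds with fewer informative SNP (e.g. Gir, Nelore) can show lower average MAF even if their polymorphic SNP are not unusually rare.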
Figure 3. Average MAF by SNP source (see Methods) demonstrates the utility of the assay in taurine, composite, African and indicine cattle.
Figure 4. Distribution of SNP minor allele frequency by SNP source.
Figure 5. Distribution of SNP MAF by group.
(A) Waves 1, 4, 5 and 6, with high MAF; (B) waves 2 and 7, with low MAF; (C) waves 3, 8, 9 and 10, with no MAF available; (D) waves 11, 12, 13 and 14, comprising Draft SNP. Trend lines are drawn only for illustration.
Figure 6. Genome-wide association analyses for (a) coat color, based on Fisher's exact test applied to allele frequencies, and (b) the POLL locus genotypes, based on a likelihood ratio test for the extent of linkage disequilibrium (r²) between each SNP and the POLL locus.
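Panel (a) applies Fisher's exact test to allele frequencies. The sketch below implements the standard two-sided Fisher's exact test for a 2×2 table of allele counts (rows: two phenotype groups; columns: the two alleles at one SNP, an assumed layout) using only the standard library. It also computes the squared Pearson correlation between genotype dosages, a common proxy for LD r² when haplotype phase is unknown; this is not the likelihood ratio test used for panel (b), and both function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are two phenotype groups (e.g. black vs red coat),
    columns are counts of the two alleles at one SNP.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def genotype_r2(x, y):
    """Squared Pearson correlation of two 0/1/2 genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return sxy * sxy / (sxx * syy)
```

For a genome scan, the test would be run once per SNP and the resulting p-values plotted against genomic position.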
Figure 7. Scheme used to produce a weighting factor for the assay selection of each candidate SNP, depending on its location within a chromosomal interval.
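The caption does not specify the weighting function, so the sketch below is purely hypothetical: it assumes a triangular weight that is highest at the midpoint of a target interval and falls linearly to zero at its boundaries, multiplies it by a per-SNP quality score in [0, 1] (for example an Illumina design score), and keeps the best candidate per interval. It illustrates the general idea of location-dependent weighting, not the authors' actual scheme.

```python
def location_weight(pos, start, end):
    """Hypothetical triangular weight: 1.0 at the interval midpoint,
    falling linearly to 0.0 at either boundary (and outside the interval)."""
    if end <= start:
        raise ValueError("interval must have positive length")
    if not start <= pos <= end:
        return 0.0
    half = (end - start) / 2
    return 1.0 - abs(pos - (start + half)) / half


def pick_per_interval(candidates, start, end):
    """candidates: list of (snp_id, position, quality), quality in [0, 1].

    Returns the id with the highest weight * quality, or None if empty.
    """
    scored = [(location_weight(p, start, end) * q, snp)
              for snp, p, q in candidates]
    return max(scored)[1] if scored else None


cands = [("a", 50_000, 0.8), ("b", 10_000, 1.0), ("c", 60_000, 0.95)]
print(pick_per_interval(cands, 0, 100_000))  # a
```

Multiplying the two terms means a centrally placed SNP with a modest score can beat a slightly off-center SNP with a better score; thresholds or additive combinations would be equally plausible alternatives.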