Matthew C McClure1, John McCarthy1, Paul Flynn2, Jennifer C McClure1, Emma Dair1, D K O'Connell1, John F Kearney1.
Abstract
A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels provide an increased level of parentage accuracy over microsatellite markers (MS), they can validate the wrong parent at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. With rapidly increasing numbers of cattle being genotyped in Ireland, representing 61 B. taurus breeds from a wide range of farm types (beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd size), the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities and determined that at least 500 SNP are needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction, ICBF uses 800 SNP (ICBF800) selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require sample and SNP quality control (QC). Most publications deal only with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines to deal with the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise. We divide the pipeline into two parts: a Genotype QC and an Animal QC pipeline. The Genotype QC identifies samples with low call rate, missing or mixed genotype classes (no BB genotype or ABTG alleles present), and low genotype frequencies. The Animal QC handles situations where the genotype might not belong to the listed individual by identifying: >1 non-matching genotypes per animal, SNP duplicates, sex and breed prediction mismatches, parentage and progeny validation results, and other situations. The Animal QC pipeline makes use of the ICBF800 SNP set where appropriate to identify errors in a computationally efficient yet still highly accurate manner.
Keywords: ISAG200; SNP; parentage; parentage prediction; quality control
Year: 2018 PMID: 29599798 PMCID: PMC5862794 DOI: 10.3389/fgene.2018.00084
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Genotype quality control pipeline.
Count of animals by genotype frequency for the AA, AB, and BB genotypes, from 846,868 genotyped individuals with a genotype CR >0.9 and all 3 genotypes present.
| Genotype frequency | AA | AB | BB |
| 0 | – | – | – |
| 0.005 | – | – | – |
| 0.01 | – | – | – |
| 0.02 | – | – | – |
| 0.03 | 1 | – | – |
| 0.04 | – | – | – |
| 0.05 | – | – | – |
| 0.06 | 1 | – | – |
| 0.07 | – | – | – |
| 0.08 | 2 | – | – |
| 0.09 | – | – | – |
| 0.1 | – | – | – |
| 0.15 | – | 1 | – |
| 0.2 | 2 | 68 | 1 |
| 0.25 | 1,977 | 2,745 | 6 |
| 0.3 | 620,489 | 57,372 | 44,438 |
| 0.35 | 223,499 | 564,809 | 153,114 |
| 0.4 | 885 | 78,550 | 637,289 |
| 0.45 | 2 | 138,927 | 12,002 |
| 0.5 | – | 4,385 | 9 |
| >0.5 | 1 | 11 | 9 |
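The per-animal quantities tabulated above can be computed along these lines. This is a sketch under stated assumptions: calls arrive as the strings "AA", "AB", "BB", with anything else treated as a no-call; the call-rate cut-off follows the caption (CR >0.9); and the function name and flag labels are hypothetical. The source does not give the cut-off used for "low genotype frequencies", so none is applied here.

```python
from collections import Counter

CLASSES = ("AA", "AB", "BB")


def genotype_summary(calls, min_call_rate=0.9):
    """Return (call_rate, genotype_frequencies, flags) for one animal's SNP calls."""
    counts = Counter(c for c in calls if c in CLASSES)  # anything else is a no-call
    called = sum(counts.values())
    call_rate = called / len(calls) if calls else 0.0
    # Frequencies are expressed relative to called SNP only.
    freqs = {g: (counts[g] / called if called else 0.0) for g in CLASSES}
    flags = []
    if call_rate <= min_call_rate:  # the caption requires CR > 0.9 to pass
        flags.append("low_call_rate")
    if any(counts[g] == 0 for g in CLASSES):  # e.g. no BB genotype present
        flags.append("missing_genotype_class")
    return call_rate, freqs, flags


if __name__ == "__main__":
    sample = ["AA"] * 3 + ["AB"] * 4 + ["BB"] * 3
    print(genotype_summary(sample))
```

Animals passing both checks are the ones that would be binned by frequency as in the table; an animal far from the bulk of the distribution (for example an AB frequency above 0.5) is a candidate for the sample-level problems the abstract lists.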
Percent of the total ICBF genotype database represented by each major breed, by date.
| AA | 4.41 | 7.22 | 9.51 | 9.58 |
| AU | 1 | 0.53 | 0.60 | 0.61 |
| BA | 0.04 | 0.59 | 0.77 | 0.81 |
| BB | 1.01 | 2.75 | 3.63 | 3.63 |
| CH | 9.09 | 19.2 | 19.94 | 21.48 |
| HE | 3.23 | 4.73 | 5.13 | 5.19 |
| HO | 68.02 | 30.08 | 13.89 | 13.63 |
| JE | 0.17 | 0.67 | 0.81 | 0.52 |
| LM | 9.72 | 22.93 | 30.67 | 31.5 |
| MO | 0.14 | 0.05 | 0.16 | 0.17 |
| PI | 0.54 | 0.17 | 0.21 | 0.21 |
| PT | 0.17 | 0.64 | 0.67 | 0.7 |
| SA | 0.04 | 0.92 | 1.70 | 1.72 |
| SI | 0.06 | 1.57 | 1.90 | 1.76 |
| SM | 2.34 | 6.77 | 8.12 | 7.75 |
| Total | 99.98 | 98.82 | 97.7 | 99.26 |
The breed represents the animal's major breed component, so an animal that is 75% LM and 25% HO is counted as a LM individual.
Breed abbreviations defined in Table .
Total values do not add up to 100% as not all breeds are represented.
Crossbred animals are counted by their main breed. F1 animals (50% breed 1 and 50% breed 2) are counted under whichever breed is alphabetically first.
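The counting rule in these footnotes can be written as a short function. This is a minimal sketch: the dictionary input format and the name `major_breed` are hypothetical, and applying the alphabetical rule to any tie (not only 50/50 F1 animals) is an assumption that goes slightly beyond what the footnote states.

```python
def major_breed(composition):
    """Return the breed code with the largest fraction; ties go to the alphabetically first code.

    `composition` maps breed codes to fractions, e.g. {"LM": 0.75, "HO": 0.25}.
    """
    if not composition:
        raise ValueError("empty breed composition")
    # Sort key: largest fraction first, then alphabetical breed code.
    return min(composition, key=lambda breed: (-composition[breed], breed))


if __name__ == "__main__":
    print(major_breed({"LM": 0.75, "HO": 0.25}))  # the 75% LM example from the footnote
    print(major_breed({"LM": 0.5, "HO": 0.5}))    # an F1 animal
```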
Figure 2. Parentage pipeline.
Figure 3. Animal quality control pipeline.
Figure 4. The percent of 37,281 males and 568,641 females at each heterozygosity level for chromosome X SNP located in the pseudoautosomal region (PAR) and non-pseudoautosomal region (nPAR) of chromosome X.
Figure 5. Plot of principal components (PC) 1 and 2 from a PCA of the 22,610 reference animals from 14 breeds. Breed abbreviations listed in Table S1.
Breed Composition reference population size and the correlation between the predicted and listed breed compositions from 713,814 animals.
| NA | 0.9410 | 0.9493 | 0.9021 | 0.9036 | 0.9432 | 0.6220 | 0.9258 | 0.8799 | 0.9321 | 0.9385 | 0.8971 | 0.9497 | 0.9083 | 0.9272 | 0.9014 |
| Neither validated | 0.9044 | 0.9367 | 0.8752 | 0.7871 | 0.8869 | 0.4478 | 0.8907 | 0.7967 | 0.9219 | 0.8886 | 0.8607 | 0.9290 | 0.8775 | 0.8854 | 0.8492 |
| One validates | 0.9607 | 0.9520 | 0.9240 | 0.9435 | 0.9647 | 0.7306 | 0.9488 | 0.9135 | 0.9518 | 0.9617 | 0.9111 | 0.9606 | 0.9293 | 0.9492 | 0.9287 |
| Both validate | 0.9713 | 0.9656 | 0.9196 | 0.9385 | 0.9660 | 0.7265 | 0.9612 | 0.9408 | 0.7627 | 0.9642 | 0.9346 | 0.9706 | 0.9462 | 0.9583 | 0.9233 |
| NA | 0.9417 | 0.9501 | 0.9035 | 0.9043 | 0.9432 | 0.6225 | 0.9261 | 0.8830 | 0.9344 | 0.9385 | 0.9096 | 0.9497 | 0.9113 | 0.9285 | 0.9033 |
| Neither validated | 0.9057 | 0.9381 | 0.8773 | 0.7894 | 0.8870 | 0.4483 | 0.8911 | 0.8032 | 0.9250 | 0.8885 | 0.8812 | 0.9291 | 0.8823 | 0.8878 | 0.8524 |
| One validates | 0.9610 | 0.9524 | 0.9250 | 0.9437 | 0.9648 | 0.7308 | 0.9490 | 0.9153 | 0.9529 | 0.9617 | 0.9185 | 0.9606 | 0.9308 | 0.9499 | 0.9297 |
| Both validate | 0.9717 | 0.9659 | 0.9205 | 0.9386 | 0.9660 | 0.7267 | 0.9614 | 0.9408 | 0.7720 | 0.9642 | 0.9411 | 0.9706 | 0.9474 | 0.9585 | 0.9247 |
| NA | 0.9429 | 0.9511 | 0.9049 | 0.9049 | 0.9433 | 0.6245 | 0.9267 | 0.8836 | 0.9363 | 0.9386 | 0.9167 | 0.9499 | 0.9140 | 0.9303 | 0.9049 |
| Neither validated | 0.9075 | 0.9398 | 0.8792 | 0.7907 | 0.8872 | 0.4504 | 0.8922 | 0.8045 | 0.9273 | 0.8885 | 0.8921 | 0.9294 | 0.8864 | 0.8913 | 0.8547 |
| One validates | 0.9615 | 0.9531 | 0.9258 | 0.9440 | 0.9648 | 0.7318 | 0.9492 | 0.9155 | 0.9537 | 0.9617 | 0.9236 | 0.9607 | 0.9320 | 0.9508 | 0.9306 |
| Both validate | 0.9726 | 0.9662 | 0.9213 | 0.9389 | 0.9660 | 0.7291 | 0.9617 | 0.9410 | 0.7791 | 0.9643 | 0.9450 | 0.9707 | 0.9494 | 0.9590 | 0.9260 |
| NA | 0.9815 | 0.9833 | 0.9620 | 0.9528 | 0.9778 | 0.8313 | 0.9821 | 0.9498 | 0.9720 | 0.9758 | 0.9599 | 0.9816 | 0.9649 | 0.9742 | 0.9606 |
| Neither validated | 0.9677 | 0.9839 | 0.9611 | 0.9085 | 0.9493 | 0.7576 | 0.9631 | 0.9315 | 0.9725 | 0.9524 | 0.9624 | 0.9769 | 0.9572 | 0.9532 | 0.9427 |
| One validates | 0.9874 | 0.9823 | 0.9674 | 0.9686 | 0.9863 | 0.8442 | 0.9892 | 0.9496 | 0.9747 | 0.9848 | 0.9522 | 0.9852 | 0.9709 | 0.9826 | 0.9661 |
| Both validate | 0.9869 | 0.9841 | 0.9535 | 0.9554 | 0.9810 | 0.8887 | 0.9883 | 0.9755 | 0.8733 | 0.9790 | 0.9656 | 0.9832 | 0.9686 | 0.9783 | 0.9615 |
Correlation on 710,000 animals between the predicted and listed breed composition for 14 breeds.
Parent SNP validation possibilities: NA, results for all animals regardless of parentage validation status; Neither validated, neither parent validated, either parent not SNP genotyped or SNP failed; One validates, one parent SNP validates other SNP failed or not SNP genotyped; Both validate, both parents SNP validate.
Breed abbreviations defined in Table .
Comparison results of parentage validation test for 300,020 animals between the ICBF800 and the smaller ISAG100 and ISAG200 SNP panels.
| Parent | Count | ISAG100 (% different) | ISAG200 (% different) |
| Sire | 292462 | 0.249 | 0.022 |
| Dam | 27330 | 0.040 | 0.004 |
| Total | 319792 | 0.231 | 0.021 |
ISAG200 SNP with clustering issues in Table .
Count of SNP parentage checks analyzed, by sire, dam, or total.
Percent of animals that had a different parentage verification result for the 100 SNP panel when compared to the ICBF800 panel at the 1% misconcordance level.
19,774 animals had both sire and dam SNP checked.
MAF summary of parentage SNP across 852,087 animals and for 25 breeds.
| ICBF800 | Max | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 |
| ICBF800 | Min | 0.267 | 0.255 | 0.269 | 0.040 | 0.053 | 0.091 | 0.058 | 0.073 | 0.000 | 0.082 | 0.017 | 0.082 | 0.017 | 0.000 | 0.000 | 0.051 | 0.000 | 0.069 | 0.000 | 0.000 | 0.114 | 0.132 | 0.000 | 0.027 | 0.050 | 0.035 | 0.024 | 0.038 |
| ICBF800 | Ave | 0.470 | 0.459 | 0.471 | 0.349 | 0.380 | 0.390 | 0.368 | 0.404 | 0.334 | 0.387 | 0.337 | 0.398 | 0.327 | 0.308 | 0.215 | 0.393 | 0.311 | 0.366 | 0.359 | 0.348 | 0.388 | 0.388 | 0.312 | 0.349 | 0.340 | 0.366 | 0.328 | 0.377 |
| Non ISAG | Max | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.499 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.499 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.499 | 0.500 | 0.500 |
| Non ISAG | Min | 0.428 | 0.345 | 0.435 | 0.040 | 0.109 | 0.091 | 0.105 | 0.144 | 0.000 | 0.083 | 0.028 | 0.124 | 0.034 | 0.000 | 0.000 | 0.051 | 0.000 | 0.069 | 0.000 | 0.000 | 0.130 | 0.132 | 0.000 | 0.027 | 0.072 | 0.094 | 0.024 | 0.058 |
| Non ISAG | Ave | 0.484 | 0.470 | 0.485 | 0.354 | 0.385 | 0.394 | 0.371 | 0.408 | 0.337 | 0.392 | 0.338 | 0.396 | 0.332 | 0.312 | 0.219 | 0.401 | 0.312 | 0.364 | 0.364 | 0.349 | 0.392 | 0.390 | 0.312 | 0.352 | 0.342 | 0.373 | 0.333 | 0.377 |
| ISAG | Max | 0.499 | 0.500 | 0.498 | 0.499 | 0.499 | 0.500 | 0.500 | 0.498 | 0.500 | 0.499 | 0.497 | 0.498 | 0.500 | 0.500 | 0.500 | 0.499 | 0.500 | 0.500 | 0.500 | 0.500 | 0.499 | 0.499 | 0.500 | 0.500 | 0.499 | 0.500 | 0.500 | 0.500 |
| ISAG | Min | 0.267 | 0.255 | 0.269 | 0.042 | 0.053 | 0.109 | 0.058 | 0.073 | 0.000 | 0.082 | 0.017 | 0.082 | 0.017 | 0.000 | 0.000 | 0.052 | 0.000 | 0.071 | 0.000 | 0.000 | 0.114 | 0.145 | 0.017 | 0.038 | 0.050 | 0.035 | 0.024 | 0.038 |
| ISAG | Ave | 0.427 | 0.422 | 0.428 | 0.334 | 0.363 | 0.374 | 0.356 | 0.389 | 0.327 | 0.374 | 0.334 | 0.402 | 0.310 | 0.296 | 0.201 | 0.370 | 0.308 | 0.371 | 0.340 | 0.345 | 0.376 | 0.381 | 0.313 | 0.339 | 0.332 | 0.346 | 0.315 | 0.376 |
Value when only purebred animals are analyzed, including breeds with <10 purebred animals genotyped.
Values when only crossbred animals are analyzed.
Breed abbreviations defined in Table .
ISAG includes all ISAG200 SNP except those excluded for QC issues in Table .
Number of genotyped animals.
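For readers unfamiliar with how Max, Min, and Ave rows like those in the MAF table are derived, the sketch below computes a SNP's minor allele frequency from its genotype counts and then summarizes a set of SNP. It uses the standard allele-counting definition of MAF; the function names and the input layout are hypothetical, and nothing here reproduces the paper's actual data or code.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF of one SNP from its AA, AB, and BB genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes for this SNP")
    freq_b = (2 * n_bb + n_ab) / (2 * n)  # each animal carries two alleles
    return min(freq_b, 1 - freq_b)


def summarize_maf(mafs):
    """Max, Min, and Ave of a list of per-SNP MAF values (one column of the table)."""
    if not mafs:
        raise ValueError("no SNP to summarize")
    return {"max": max(mafs), "min": min(mafs), "ave": sum(mafs) / len(mafs)}


if __name__ == "__main__":
    # Three illustrative SNP given as (AA, AB, BB) counts; the numbers are made up.
    panel = [(25, 50, 25), (81, 18, 1), (100, 0, 0)]
    print(summarize_maf([minor_allele_frequency(*snp) for snp in panel]))
```

A minimum of 0.000 in a breed column corresponds to the third case above: at least one SNP in the panel is fixed for a single allele in that group of animals.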