Moran Gershoni, Andrey Shirak, Rotem Raz, Eyal Seroussi.
Abstract
Microarray-based genomic selection is a central tool to increase the genetic gain of economically significant traits in dairy cattle. Yet, the effectiveness of this tool is slightly limited, as estimates based on genotype data only partially explain the observed heritability. In the analysis of the genomes of 17 Israeli Holstein bulls, we compared genotyping accuracy between whole-genome sequencing (WGS) and microarray-based techniques. Using the standard GATK pipeline, the short-variant discovery within sequence reads mapped to the reference genome (ARS-UCD1.2) was compared to the genotypes from the Illumina BovineSNP50 BeadChip and to an alternative method, which computationally mimics the hybridization procedure by mapping reads to 50 bp spanning the BeadChip source sequences. The number of mismatches between the BeadChip and WGS genotypes was low (0.2%). However, 17,197 SNPs (40% of the informative SNPs) had extra variation within 50 bp of the targeted SNP site, which might interfere with hybridization-based genotyping. Consequently, with respect to genotyping errors, BeadChip varied significantly and systematically from WGS genotyping, introducing null allele-like effects and Mendelian errors (<0.5%), whereas the GATK algorithm of local de novo assembly of haplotypes successfully resolved the genotypes in the extra-variable regions. These findings suggest that the microarray design should avoid polymorphic genomic regions that are prone to extra variation and that WGS data may be used to resolve erroneous genotyping, which may partially explain missing heritability.
Keywords: genomic evaluation; genotyping platforms; single nucleotide polymorphism
Year: 2022 PMID: 35328039 PMCID: PMC8948885 DOI: 10.3390/genes13030485
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Comparison between BeadChip and WGS genotyping.
| Sire | NGS ¹ | # of Spots ² | Match GAP5 ³ | Match VCF ³ | Miss GAP5 ⁴ | Miss VCF ⁴ |
|---|---|---|---|---|---|---|
| 3376 | NS | 401,967,974 | 34,493 | 35,030 | 3 | 25 |
| 3651 | NS | 361,519,877 | 34,452 | 35,081 | 3 | 13 |
| 3756 | NS | 449,885,842 | 34,507 | 35,033 | 6 | 18 |
| 3811 | NS | 399,980,764 | 34,543 | 35,061 | 0 | 17 |
| 7165 | NS | 382,156,099 | 34,397 | 35,055 | 5 | 21 |
| 7592 | NS | 458,256,777 | 33,397 | 33,796 | 0 | 9 |
| 7733 | NS | 439,943,852 | 42,542 | 38,845 | 16 | 81 |
| 7396 | HS | 310,285,759 | 44,238 | 40,753 | 15 | 54 |
| 7400 | HS | 319,575,305 | 43,364 | 41,313 | 5 | 67 |
| 7424 | HS | 337,946,649 | 44,879 | 41,136 | 5 | 57 |
| 7510 | HS | 300,382,216 | 44,890 | 41,376 | 2 | 62 |
| 7559 | HS | 273,572,927 | 44,626 | 41,221 | 6 | 62 |
| 7679 | HS | 285,115,013 | 43,762 | 40,470 | 8 | 62 |
| 7733 | HS | 302,259,900 | 42,224 | 38,849 | 16 | 78 |
| 7738 | HS | 296,384,014 | 44,778 | 41,334 | 6 | 76 |
| 7851 | HS | 327,126,812 | 44,945 | 41,320 | 5 | 56 |
| 7936 | HS | 317,888,134 | 44,721 | 41,154 | 7 | 56 |
| 9078 | HS | 304,472,448 | 44,890 | 41,355 | 10 | 64 |
¹ WGS was performed on the NovaSeq (NS) and HiSeq (HS) platforms. ² Each spot produced two reads (forward and reverse). ³ Match GAP5 and Match VCF represent the number of concordant genotypes between the BeadChip data and the GAP5 and VCF methods, respectively. ⁴ Miss GAP5 and Miss VCF represent the number of non-concordant genotypes between the BeadChip data and the GAP5 and VCF methods, respectively. ⁵ Means and their standard errors are given for each of the platforms (boldface).
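The match and miss counts in the table can be turned into per-sire discordance rates for comparison with the mismatch level quoted in the abstract. The following is a minimal sketch, not the authors' procedure: the function name `discordance_rate` and the choice of denominator (concordant plus non-concordant genotypes) are assumptions made for illustration, and only two rows of the table are copied in.

```python
# Counts copied from two rows of the comparison table:
# (sire, platform, match_gap5, match_vcf, miss_gap5, miss_vcf)
ROWS = [
    ("3376", "NS", 34493, 35030, 3, 25),
    ("7396", "HS", 44238, 40753, 15, 54),
]


def discordance_rate(match: int, miss: int) -> float:
    """Fraction of compared genotypes that disagree with the BeadChip call.

    Assumption: the denominator is concordant + non-concordant genotypes.
    """
    total = match + miss
    if total == 0:
        raise ValueError("no genotypes were compared")
    return miss / total


for sire, platform, match_gap5, match_vcf, miss_gap5, miss_vcf in ROWS:
    gap5 = discordance_rate(match_gap5, miss_gap5)
    vcf = discordance_rate(match_vcf, miss_vcf)
    print(f"sire {sire} ({platform}): GAP5 {gap5:.4%}, VCF {vcf:.4%}")
```

Under this definition, the GAP5 method, which mimics hybridization, disagrees with the BeadChip less often than the VCF method does for both rows shown.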
Figure 1. Output of the assembly visualizers for SNPs ARS-BFGL-NGS-72133 and BTB-01793064. Position 118,228,308 on BTA1 in the genome of sire 7424 was examined. (a) An IGV output. Using the VCF method, this SNP was genotyped as heterozygous, with 17 reads of the A allele and 13 reads of the G allele, whereas the BeadChip and GAP5 genotypes were homozygous AA. (b) GAP5 output with 9 reads of the A allele. The pink background denotes the template with 50 bp spanning the SNP site (black background on the consensus line). Bases identical to the consensus sequence are shown as dots with a background color that corresponds to the quality score, with light gray indicating higher quality. (c) Position 94,190,023 on BTA1 in the genomes of a father and son (7396 and 7851, respectively) was examined. Using the VCF method, the father’s SNP was genotyped as heterozygous, with 12 reads of the C allele and 7 reads of the T allele, whereas the son was homozygous for the T allele (19 reads).
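Panel (c) is a father and son comparison, which is the setting in which Mendelian errors are detected. As a hedged illustration, a sire-son pair with an unknown dam genotype can only be flagged as inconsistent when the two animals share no allele (opposing homozygotes). The function name `is_mendelian_consistent` and the two-letter genotype encoding are assumptions for this sketch, not part of the original analysis.

```python
def is_mendelian_consistent(father: str, son: str) -> bool:
    """Return True if the son could have inherited one allele from the father.

    Genotypes are two-character strings such as "CT" or "TT" (an assumed
    encoding). With the dam unknown, the only detectable conflict is a pair
    of opposing homozygotes, i.e. no allele in common.
    """
    if len(father) != 2 or len(son) != 2:
        raise ValueError("genotypes must contain exactly two alleles")
    return bool(set(father) & set(son))


# The VCF-method genotypes described for panel (c): father C/T, son T/T.
print(is_mendelian_consistent("CT", "TT"))
# A hypothetical opposing-homozygote pair, which would count as an error.
print(is_mendelian_consistent("CC", "TT"))
```

A null allele at the probe site can make a true heterozygote appear homozygous, which is one way a pair that is consistent in sequence data could fail this check on array data.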
BeadChip SNPs with additional variation within the probe target sequence ¹.
| | Total | Polymorphic | Variation Upstream | Variation Downstream | Any Extra Variation | Not SNP ² | Potential Problem |
|---|---|---|---|---|---|---|---|
| # of SNPs | 50,392 | 42,848 | | | 17,197 | | 17,265 |
| % | 100 | 85 | 24.2 | 23.9 | 40.1 | 0.9 | 40.3 |

¹ Additional variation within 50 bp of an SNP site that was found in the 17 analyzed genomes. ² At the SNP site, more than two alleles, an indel, or a non-single-nucleotide variation was found.
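The upstream and downstream columns correspond to a simple window test around each targeted SNP. The sketch below shows one way such a test could be written, assuming a sorted list of variant positions on the same chromosome. The function name `flanking_variation`, the inclusive 50 bp window, and the use of reference coordinates to define upstream and downstream (ignoring probe strand) are all assumptions for illustration, and the positions in the usage lines are made up.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def flanking_variation(
    target: int, sorted_positions: Sequence[int], window: int = 50
) -> Tuple[bool, bool]:
    """Report whether other variants lie within `window` bp of `target`.

    `sorted_positions` must be ascending positions on one chromosome and may
    include the target itself. Returns (upstream, downstream), where upstream
    means lower coordinates (an assumption; probe strand is ignored).
    """
    # Variants in [target - window, target - 1]
    upstream = bisect_left(sorted_positions, target - window) < bisect_left(
        sorted_positions, target
    )
    # Variants in [target + 1, target + window]
    downstream = bisect_right(sorted_positions, target) < bisect_right(
        sorted_positions, target + window
    )
    return upstream, downstream


# Hypothetical positions, for illustration only.
variants = [100, 130, 151]
for site in variants:
    up, down = flanking_variation(site, variants)
    print(site, "upstream:", up, "downstream:", down, "any:", up or down)
```

An SNP would count toward "Any Extra Variation" when either flag is true, which is why that column is smaller than the sum of the upstream and downstream percentages.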