Robert J Schaefer, Mikkel Schubert, Ernest Bailey, Danika L Bannasch, Eric Barrey, Gila Kahila Bar-Gal, Gottfried Brem, Samantha A Brooks, Ottmar Distl, Ruedi Fries, Carrie J Finno, Vinzenz Gerber, Bianca Haase, Vidhya Jagannathan, Ted Kalbfleisch, Tosso Leeb, Gabriella Lindgren, Maria Susana Lopes, Núria Mach, Artur da Câmara Machado, James N MacLeod, Annette McCoy, Julia Metzger, Cecilia Penedo, Sagi Polani, Stefan Rieder, Imke Tammen, Jens Tetens, Georg Thaller, Andrea Verini-Supplizi, Claire M Wade, Barbara Wallner, Ludovic Orlando, James R Mickelson, Molly E McCue.
Abstract
BACKGROUND: To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next-generation, high-density equine SNP array.
Keywords: Equine genomics; Linkage disequilibrium; SNP chip; SNP discovery; SNP informativeness; SNP validation; SNP-tagging; Variant recalibration; Whole genome sequence
Year: 2017 PMID: 28750625 PMCID: PMC5530493 DOI: 10.1186/s12864-017-3943-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
SNP sets used at various steps in array design
| Set Name | Number of unique sites | Set Description | WGS Data | Array Data |
|---|---|---|---|---|
| 23M | 22,557,988 | Possible/Discovered Variants | x | - |
| 10M | 11,435,936 | Array Compatible | x | - |
| 5M | 5,443,950 | Array Candidate | x | - |
| 2M | 2,001,826 | Test Array | x | x |
| 1.8M | 1,846,988 | Test Array Converted | x | x |
| 670k | 670,805 | Commercial Array | x | x |
Variants discovered from whole genome sequencing were filtered at successive steps, either for quality control or according to array design criteria. This manuscript describes six distinct sets of variants. They range from the initial ~23M high-quality variants discovered from WGS to the 670k variants available on the commercial genotyping array.
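The table above describes a filtering funnel, and it can be summarized as retention fractions. The short Python sketch below uses only the unique-site counts from the table. The helper name `retention` is hypothetical, and each fraction compares a set with the row above it. The sketch does not assume how the sets were derived from one another.

```python
# Unique-site counts for each SNP set, copied from the table above
# (ordered from the largest set to the smallest).
SNP_SETS = [
    ("23M", 22_557_988),
    ("10M", 11_435_936),
    ("5M", 5_443_950),
    ("2M", 2_001_826),
    ("1.8M", 1_846_988),
    ("670k", 670_805),
]


def retention(sets):
    """Return (name, fraction_of_previous, fraction_of_first) for each set.

    fraction_of_previous compares a set with the row above it in the table;
    fraction_of_first compares it with the initial 23M discovery set.
    """
    first = sets[0][1]
    previous = first
    rows = []
    for name, n_sites in sets:
        rows.append((name, n_sites / previous, n_sites / first))
        previous = n_sites
    return rows


rows = retention(SNP_SETS)
for name, step, overall in rows:
    print(f"{name:>5}: {step:6.1%} of previous set, {overall:6.2%} of the 23M set")
```

On the table values, roughly 3% of the initially discovered sites reach the commercial 670k array. The 2M to 1.8M conversion step keeps more than 90% of the test-array sites.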
Fig. 1 Comparing high-QUAL genotypes called de novo with the SNP50 array in Morgans and Standardbreds. SNPs on chromosome 1 (ECA1) with genotype calls from both WGS and the SNP50 array were ranked by decreasing QUAL score. Morgans (MOR) were analyzed at 6X (3759 SNPs; 12 individuals) and 12X (3751 SNPs; 6 individuals) coverage. Standardbreds (STD) were analyzed at 6X (4422 SNPs; 5 individuals) and 12X (4380 SNPs; 3 individuals) coverage. The cumulative proportion of genotypes that agree between platforms was computed along this QUAL ranking of the de novo variants. Variants with missing data on either platform were excluded.
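The ranking in Fig. 1 can be sketched as a running concordance curve. This minimal Python illustration rests on assumed inputs. The record layout, the 0/1/2 ALT-dosage coding and the name `cumulative_concordance` are hypothetical and do not come from the authors' pipeline.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-SNP record: (QUAL, WGS genotypes, array genotypes).
# Each genotype is an ALT-allele dosage (0, 1 or 2) for one individual; None = missing call.
Record = Tuple[float, Sequence[Optional[int]], Sequence[Optional[int]]]


def cumulative_concordance(records: Sequence[Record]) -> List[float]:
    """Rank SNPs by decreasing QUAL and return, after each SNP, the running
    proportion of genotypes that agree between WGS and the array.

    Genotype pairs with a missing call on either platform are skipped.
    """
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    agree = 0
    total = 0
    curve = []
    for _qual, wgs_calls, array_calls in ranked:
        for wgs_gt, array_gt in zip(wgs_calls, array_calls):
            if wgs_gt is None or array_gt is None:
                continue  # missing on either platform: excluded
            total += 1
            agree += int(wgs_gt == array_gt)
        curve.append(agree / total if total else float("nan"))
    return curve


# Toy data: three SNPs genotyped in three, two and two individuals.
example = [
    (50.0, [0, 1, 2], [0, 1, 2]),
    (10.0, [1, None], [2, 1]),
    (30.0, [2, 2], [2, 1]),
]
curve = cumulative_concordance(example)
print(curve)
```

The curve stays high while only the best-scored SNPs are included. It then falls as lower-QUAL SNPs with more discordant calls are added.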
Fig. 2 Comparing QUAL-ranked SNPs with VQSLOD-ranked SNPs. VQSLOD scores were calculated with GATK VariantRecalibrator using three different “gold standard” reference groups in Morgans (MOR) and Standardbreds (STD). Compared with QUAL scores (red line), variants with high VQSLOD scores (top 10% by score) show fewer genotype mismatches between the SNP50 BeadChip and the variants discovered de novo.
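The top-10% comparison in Fig. 2 amounts to ranking variants by a score and counting genotype mismatches among the best-ranked ones. The Python sketch below takes precomputed scores as input and does not run GATK VariantRecalibrator. The toy QUAL and VQSLOD values and the helper name `mismatches_in_top_fraction` are hypothetical.

```python
from typing import Sequence


def mismatches_in_top_fraction(
    scores: Sequence[float], mismatches: Sequence[int], fraction: float = 0.10
) -> int:
    """Total genotype mismatches among the top `fraction` of variants ranked by
    `scores` (highest score first). At least one variant is always selected."""
    if len(scores) != len(mismatches):
        raise ValueError("scores and mismatches must have the same length")
    k = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(mismatches[i] for i in order[:k])


# Toy data: per-variant mismatch counts between WGS calls and the SNP50 array,
# plus two alternative quality scores for the same ten variants.
mismatches = [3, 0, 1, 0, 2, 0, 0, 1, 0, 0]
qual = [99, 95, 90, 80, 70, 60, 50, 40, 30, 20]
vqslod = [-1.0, 8.5, 2.0, 7.9, -3.0, 6.1, 5.0, 0.5, 4.2, 3.3]

# With only ten toy variants, a 30% cut is used so that several variants are selected.
by_qual = mismatches_in_top_fraction(qual, mismatches, fraction=0.3)
by_vqslod = mismatches_in_top_fraction(vqslod, mismatches, fraction=0.3)
print(by_qual, by_vqslod)
```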
Number of breed-specific tagging SNPs
| Breed Group | Number Of Tag SNPs |
|---|---|
| Thoroughbred | 144,175 |
| Lusitano | 148,097 |
| Icelandic | 148,206 |
| F. Montagne | 199,244 |
| Arabian | 199,264 |
| Belgian | 217,882 |
| Maremmano | 223,568 |
| Standardbred | 245,149 |
| Trotter | 256,790 |
| Warmblood | 304,510 |
| Morgan | 335,677 |
| Land Race | 338,040 |
| Quarter Horse | 366,702 |
| Draft | 370,701 |
| Pony | 387,279 |
Number of tagging SNPs required to reconstruct haplotypes in each breed (MAF > 0.10 and an r2 threshold of 0.90; see Methods).
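The tag-SNP counts above come from choosing SNPs that capture the others at an r2 of 0.90 among variants with MAF > 0.10. The Python sketch below shows a greedy pairwise tagging scheme as one illustration. It is not necessarily the procedure described in the Methods. It differs in several assumed ways:

- r2 is estimated as the squared correlation of unphased genotype dosages rather than from haplotypes.
- The 0.90 threshold is treated as inclusive.
- Physical distance windows are ignored.

All names and the toy genotypes are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype-dosage vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def minor_allele_freq(dosages: Sequence[int]) -> float:
    """MAF from diploid ALT-allele dosages (0, 1, 2)."""
    alt = sum(dosages) / (2 * len(dosages))
    return min(alt, 1 - alt)


def greedy_tag_snps(
    genotypes: Dict[str, Sequence[int]], min_maf: float = 0.10, r2_min: float = 0.90
) -> List[str]:
    """Greedy pairwise tagging: repeatedly pick the SNP that tags the most
    still-untagged SNPs (r2 >= r2_min) until every SNP with MAF > min_maf is tagged."""
    snps = [s for s, g in genotypes.items() if minor_allele_freq(g) > min_maf]
    # covers[s] = SNPs tagged by s, always including s itself.
    covers = {
        s: {t for t in snps if t == s or r_squared(genotypes[s], genotypes[t]) >= r2_min}
        for s in snps
    }
    untagged = set(snps)
    tags = []
    while untagged:
        # max() keeps the first SNP in input order when counts tie.
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [0, 1, 2, 1, 0, 2],  # identical to snp1
    "snp3": [2, 1, 0, 1, 2, 0],  # mirror image of snp1, so r2 = 1
    "snp4": [0, 0, 1, 2, 1, 0],  # uncorrelated with snp1
    "snp5": [0, 0, 0, 0, 0, 1],  # MAF 1/12 < 0.10, filtered out
}
tags = greedy_tag_snps(genotypes)
print(tags)
```

Greedy selection is a common heuristic for this set-cover style problem. It does not guarantee the smallest possible tag set.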
Imputation accuracy of the MNEc670k SNP genotyping array
| Tagging Group | Imputation accuracy, 670k to 2M (mean ± s.e.m.) | Number of imputed samples |
|---|---|---|
| Land Race | 0.981 ± 0.001 | 3 |
| Arabian | 0.993 ± 0.0009 | 13 |
| Belgian | 0.992 ± 0.0005 | 7 |
| Draft | 0.9658 ± 0.0140 | 6 |
| F. Montagne | 0.988 ± 0.0017 | 10 |
| Icelandic | 0.989 ± 0.0012 | 6 |
| Lusitano | 0.992 ± 0.0004 | 7 |
| Maremmano | 0.994 ± 0.0005 | 9 |
| Morgan | 0.988 ± 0.0036 | 20 |
| Pony | 0.991 ± 0.0025 | 19 |
| Quarter Horse | 0.983 ± 0.0033 | 25 |
| Standardbred | 0.989 ± 0.0045 | 13 |
| Thoroughbred | 0.991 ± 0.0068 | 9 |
| Warmblood | 0.985 ± 0.0116 | 9 |
| Trotter | 0.9857 ± 0.0046 | 9 |
Breed-specific imputation accuracy (mean ± s.e.m.) of genotypes imputed from the MNEc670k SNP set to the MNEc2M SNP set. In each tagging breed group, one third of the samples had their genotypes masked down to the lower-density SNP set and were removed from the reference population of 485 horses. Imputation was performed using Beagle 4.0, and concordance was determined with VCFtools.
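The per-breed values in the table above summarize genotype concordance between masked true genotypes and their imputed values. The Beagle 4.0 imputation and the VCFtools comparison are not reproduced here, and the Python sketch below only shows the arithmetic. It assumes concordance is computed per imputed sample over the masked sites and then summarized as mean ± s.e.m. across the samples in a group. The function names and toy genotypes are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def genotype_concordance(
    truth: Sequence[Optional[int]], imputed: Sequence[Optional[int]]
) -> float:
    """Fraction of masked sites where the imputed genotype equals the true genotype.
    Sites with a missing genotype on either side are ignored."""
    pairs = [(t, i) for t, i in zip(truth, imputed) if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Toy data: true and imputed genotypes (ALT-allele dosages) at masked sites
# for three samples in one hypothetical breed group.
samples = [
    ([0, 1, 2, 2, 1], [0, 1, 2, 2, 1]),
    ([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]),
    ([1, 1, 0, 2, None], [1, 1, 0, 2, 2]),
]
per_sample = [genotype_concordance(t, i) for t, i in samples]
mean, sem = mean_and_sem(per_sample)
print(f"{mean:.3f} +/- {sem:.3f}")
```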
Fig. 3 MNEc2M and MNEc670k Inter-SNP Distance. Distance between SNPs on each array was calculated at various minor allele frequency (MAF) cutoffs. Considering all SNPs genotyped on the MNEc2M and MNEc670k arrays, markers are separated on average by 1250 bp and 3756 bp, respectively (see Table 4 for average inter-SNP distances at each MAF cutoff). Median values (red lines) and mean values (red boxes) were calculated at each MAF cutoff.
MNEc2M and MNEc670k Inter-SNP distance at various minor allele frequency cutoffs
| Chip | MAF cutoff | Mean inter-SNP distance (bp) | Median inter-SNP distance (bp) | Number of SNPs at cutoff |
|---|---|---|---|---|
| MNEc2M | All SNPs | 1250 | 785 | 1,986,984 |
| | MAF > 0 | 1255 | 787 | 1,978,913 |
| | MAF > 0.01 | 1334 | 835 | 1,862,844 |
| | MAF > 0.03 | 1590 | 991 | 1,562,205 |
| | MAF > 0.05 | 1876 | 1162 | 1,324,205 |
| | MAF > 0.10 | 2676 | 1623 | 928,235 |
| MNEc670k | All SNPs | 3756 | 2172 | 661,349 |
| | MAF > 0 | 3768 | 2178 | 659,278 |
| | MAF > 0.01 | 3837 | 2226 | 647,481 |
| | MAF > 0.03 | 4534 | 2606 | 547,858 |
| | MAF > 0.05 | 5199 | 2980 | 477,719 |
| | MAF > 0.10 | 6240 | 3651 | 398,055 |
Inter-SNP distance was calculated between SNPs that are informative at minor allele frequency cutoffs greater than 0, 0.01, 0.03, 0.05 and 0.10. The number of SNPs retained at each cutoff is also given. Distance and informativeness were recalculated for both the MNEc2M and MNEc670k arrays and further broken down by tagging breed group (see Additional file 5: Table S5).
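The inter-SNP distances above can be illustrated by filtering SNPs on MAF and measuring the gaps between neighboring markers on each chromosome. The Python sketch below makes two assumptions. Distances are taken between consecutive retained SNPs within a chromosome, and "All SNPs" means that no MAF filter is applied. The record layout, the function names and the toy positions are hypothetical and are not array data.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical SNP record: (chromosome, position in bp, minor allele frequency).
Snp = Tuple[str, int, float]


def inter_snp_distances(snps: Iterable[Snp]) -> List[int]:
    """Distances (bp) between consecutive SNPs on the same chromosome."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos, _maf in snps:
        by_chrom[chrom].append(pos)
    gaps: List[int] = []
    for positions in by_chrom.values():
        positions.sort()
        gaps.extend(b - a for a, b in zip(positions, positions[1:]))
    return gaps


def distance_summary(snps: List[Snp], min_maf: Optional[float]) -> Tuple[int, float, float]:
    """(number of SNPs kept, mean gap, median gap); min_maf=None keeps all SNPs."""
    kept = [s for s in snps if min_maf is None or s[2] > min_maf]
    gaps = inter_snp_distances(kept)
    return len(kept), statistics.mean(gaps), statistics.median(gaps)


snps = [
    ("1", 1000, 0.20), ("1", 1800, 0.02), ("1", 3000, 0.00), ("1", 4500, 0.35),
    ("2", 500, 0.12), ("2", 2500, 0.04), ("2", 2900, 0.15),
]
for cutoff in (None, 0.0, 0.01, 0.03, 0.05, 0.10):
    label = "All SNPs" if cutoff is None else f"MAF > {cutoff}"
    print(label, distance_summary(snps, cutoff))
```

Raising the MAF cutoff removes markers, so both the mean and the median gap grow. This is the same trend the table shows for both arrays.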
Fig. 4 MNEc2M and MNEc670k Alternate Allele Frequency. Genotypes from horses on the 2M test array (SNP Only; n = 332) and from whole genome sequence (WGS Only; n = 153) were combined (WGS + SNP; n = 485). The combined set was used to estimate the alternate allele frequency of the SNPs represented on the (a) 2M and (b) 670k arrays. Figure 4 shows kernel density estimate (KDE) distributions of alternate allele (ALT) frequency in each sample group, using the variants present on each array. Boxplots show the median (red line) and mean (red square) of the ALT frequency distribution, as well as its spread (see Table 5 for values).
MNEc2M and MNEc670k variant mean and median alternate allele frequency
| SNP chip | Sample split | Mean ALT allele frequency | Median ALT allele frequency |
|---|---|---|---|
| MNEc2M variants | WGS + SNP | 0.2115920561 | 0.0975609756 |
| | SNP Only | 0.208498773 | 0.0900621118 |
| | WGS Only | 0.2313047405 | 0.1258278146 |
| MNEc670k variants | WGS + SNP | 0.267280485 | 0.1729559748 |
| | SNP Only | 0.2647029832 | 0.1682389937 |
| | WGS Only | 0.2836436744 | 0.1895424837 |
Average (mean and median) alternate allele frequencies of variants on the MNEc2M and MNEc670k arrays. Values are broken down by the source of genotype information: WGS only, SNP array only, or WGS + SNP array.
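The pooled estimate in the table above can be illustrated in three steps:

1. For each SNP, concatenate the genotypes from the WGS and array cohorts.
2. Compute an ALT allele frequency from the diploid dosages.
3. Take the mean and median across SNPs.

This Python sketch assumes the two cohorts contain different horses, which is consistent with n = 332 + 153 = 485. It also assumes missing calls are simply dropped. The function names and toy genotypes are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence

Dosages = Sequence[Optional[int]]


def alt_allele_frequency(dosages: Dosages) -> Optional[float]:
    """ALT allele frequency at one biallelic SNP from diploid dosages (0, 1, 2).
    Missing calls (None) are ignored; returns None if no sample was called."""
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def alt_frequency_summary(groups: Sequence[Dict[str, Dosages]]) -> Dict[str, float]:
    """Pool the samples of all groups per SNP (same SNP keys assumed in every group),
    then return the mean and median ALT allele frequency across SNPs."""
    freqs: List[float] = []
    for snp in groups[0]:
        pooled: List[Optional[int]] = []
        for group in groups:
            pooled.extend(group[snp])
        freq = alt_allele_frequency(pooled)
        if freq is not None:
            freqs.append(freq)
    return {"mean": statistics.mean(freqs), "median": statistics.median(freqs)}


# Toy genotypes for the same three SNPs in two disjoint sets of horses.
wgs = {"snpA": [0, 1, 2], "snpB": [0, 0, 1], "snpC": [2, 2, None]}
chip = {"snpA": [0, 0, 0, 1], "snpB": [0, 0, 0, 0], "snpC": [1, 2, 2, 2]}

for label, groups in [("WGS + SNP", [wgs, chip]), ("SNP Only", [chip]), ("WGS Only", [wgs])]:
    print(label, alt_frequency_summary(groups))
```

A median well below the mean, as in the table, indicates a distribution skewed toward low ALT frequencies.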
Fig. 5 MNEc2M Breed-Specific Alternate Allele Frequency. Alternate allele frequencies of variants present on the MNEc2M chip were split by breed group. Samples (WGS + SNP) were divided into 15 tagging breed groups (see Additional file 5: Table S5). Breed groups marked with an asterisk (*) combine several studbook breeds. Boxplots show the median (red line) and mean (red box) of the alternate allele frequency distribution.
Fig. 6 Linkage Disequilibrium decay within and across breeds. Pairwise r2 was calculated between all pairs of SNPs that have a minor allele frequency greater than 0.05 and lie within 1 Mb of each other. LD curves are shown for each breed and for the all-breed WGS and all-breed SNP cohorts. Breeds are ordered in the legend (top to bottom, left to right) by their r2 values at 400 kb.
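The LD decay curves in Fig. 6 summarize pairwise r2 as a function of physical distance, and the Python sketch below is a minimal illustration. The following choices are assumptions, not details from the source:

- r2 is the squared correlation of unphased genotype dosages.
- Distances are grouped into 100 kb bins.
- "Within 1 Mb" is read as strictly less than 1,000,000 bp.

The toy data and the function names are also hypothetical. The double loop is quadratic in the number of SNPs, so it is only suitable for illustration.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two genotype-dosage vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def minor_allele_freq(dosages: Sequence[int]) -> float:
    """MAF from diploid ALT-allele dosages (0, 1, 2)."""
    alt = sum(dosages) / (2 * len(dosages))
    return min(alt, 1 - alt)


def ld_decay(
    positions: Sequence[int],
    genotypes: Sequence[Sequence[int]],
    max_dist: int = 1_000_000,
    bin_size: int = 100_000,
    min_maf: float = 0.05,
) -> Dict[int, float]:
    """Mean pairwise r2 per distance bin for SNPs with MAF > min_maf that lie
    less than max_dist bp apart. Keys are bin start distances in bp."""
    keep = [i for i, g in enumerate(genotypes) if minor_allele_freq(g) > min_maf]
    bins: Dict[int, List[float]] = defaultdict(list)
    for a, i in enumerate(keep):
        for j in keep[a + 1:]:
            dist = abs(positions[j] - positions[i])
            if dist < max_dist:
                bins[(dist // bin_size) * bin_size].append(r_squared(genotypes[i], genotypes[j]))
    return {start: statistics.mean(vals) for start, vals in sorted(bins.items())}


positions = [0, 50_000, 150_000, 1_200_000]
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [0, 0, 1, 2, 1, 0],
    [2, 1, 0, 1, 2, 0],
]
decay = ld_decay(positions, genotypes)
print(decay)
```

In this binning scheme, reading a curve at 400 kb (the distance used to order the breeds in the legend) corresponds to the bin starting at 400,000 bp.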