Paul M VanRaden, Melvin E Tooker, Jeffrey R O'Connell, John B Cole, Derek M Bickhart.
Abstract
BACKGROUND: Millions of genetic variants have been identified by population-scale sequencing projects, but subsets of these variants are needed for routine genomic predictions or genotyping arrays. Methods for selecting sequence variants were compared using simulated sequence genotypes and real July 2015 data from the 1000 Bull Genomes Project.
Year: 2017 PMID: 28270096 PMCID: PMC5339980 DOI: 10.1186/s12711-017-0307-4
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Reliabilities for five simulated traits from ten sources of genetic information
| Trait | Parent average | 60 k | 60 k + 24 k GWA | 60 k + 24 k ES | 60 k + 24 k EV | 60 k + 24 k G | 60 k + 10 k QTL | Only 10 k QTL | 600 k | 1.1 m genic |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 24.4 | 77.9 | 79.2 | 81.6 | 81.3 | 85.4 | 84.6 | 87.2 | 80.3 | 86.7 |
| 2 | 31.2 | 77.9 | 79.3 | 81.4 | 81.2 | 85.3 | 84.9 | 87.7 | 80.1 | 86.7 |
| 3 | 32.7 | 78.3 | 79.5 | 81.7 | 81.5 | 84.9 | 85.0 | 87.8 | 80.4 | 86.1 |
| 4 | 23.3 | 76.6 | 77.7 | 80.2 | 79.8 | 83.5 | 82.9 | 85.9 | 78.6 | 84.8 |
| 5 | 30.4 | 78.3 | 80.0 | 82.5 | 82.2 | 86.0 | 85.2 | 87.5 | 81.2 | 87.6 |
| Average | 28.4 | 77.8 | 79.2 | 81.5 | 81.2 | 85.0 | 84.5 | 87.2 | 80.1 | 86.4 |
Reliabilities expressed as percentages; 24 k markers selected from 600 k SNPs by GWA p value, multiple regression effect size (ES) or effect variance (EV); 24 k markers selected from sequence SNPs in or near genes (G) by effect size; 600 k markers plus 500 k SNPs in or near genes (1.1 m genic) by effect size
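The three selection criteria named in the footnote (GWA p value, effect size, effect variance) can rank the same markers differently. The following is a minimal sketch, not the authors' software: the marker records, field names, and the `select_top` helper are hypothetical, and effect variance is assumed to be the usual additive variance of a biallelic locus, 2p(1 − p)a², which the source does not spell out.

```python
import math

def select_top(markers, k, key):
    """Return the names of the k markers with the largest value of key(marker)."""
    ranked = sorted(markers, key=key, reverse=True)
    return [m["name"] for m in ranked[:k]]

def effect_size(m):
    # Rank on the absolute estimated allele substitution effect.
    return abs(m["effect"])

def effect_variance(m):
    # Assumed formula: 2 * p * (1 - p) * a^2 for allele frequency p and effect a.
    p = m["freq"]
    return 2.0 * p * (1.0 - p) * m["effect"] ** 2

def gwa_significance(m):
    # Smaller p values are stronger signals, so rank on -log10(p).
    return -math.log10(m["p_value"])

# Hypothetical toy markers, for illustration only.
markers = [
    {"name": "snp_a", "effect": 0.90, "freq": 0.02, "p_value": 1e-4},
    {"name": "snp_b", "effect": 0.40, "freq": 0.50, "p_value": 1e-6},
    {"name": "snp_c", "effect": 0.10, "freq": 0.30, "p_value": 1e-2},
]

by_es = select_top(markers, 1, effect_size)
by_ev = select_top(markers, 1, effect_variance)
by_gwa = select_top(markers, 1, gwa_significance)
```

In this toy data the rare marker with a large effect wins on effect size, while the common marker with a moderate effect wins on effect variance and on p value, which illustrates why the criteria can yield different 24 k subsets.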
Computer resources needed to select markers from 30 million simulated variants
| Variant selection step | Number of threads | Computational time (h) | GB of memory | GB of disk space |
|---|---|---|---|---|
| Simulate 30 million | 1 | 56 | 210 | 32 |
| Prune linkage | 10 | 1 | 27 | 10 |
| Impute 8.4 million | 20 | 38 | 13 | 220 |
| Select 25,000 | 30 | 0.5 | <1 | <1 |
| Predict 1 million | 5 | 22 | 20 | <1 |
1000 sequenced and 25,984 genotyped bulls
Edits applied to simulated data and real data from Test 3
| SNP edit | Simulated data | Real data |
|---|---|---|
| Original number of SNPs called | 30 million | 39 million |
| Removed for MAF of <0.01 | 3 million | 20 million |
| Removed for linkage of >0.95 | 18 million | 13 million |
| Removed for imputation inaccuracy | 0 | 3 million |
| Remained after edits | 8 million | 3 million |
Test 3 included candidate SNPs, InDels, and intergenic and intronic variants
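The first two edits in the table (minor allele frequency below 0.01, linkage above 0.95) can be sketched as a single pass over variants in map order. This is an illustrative simplification under stated assumptions: genotypes are coded 0, 1, 2, linkage is measured as the absolute genotype correlation, and each variant is compared only with the most recently kept variant. The function names are hypothetical and the source does not describe the actual pruning procedure at this level of detail.

```python
from statistics import mean

def minor_allele_frequency(genotypes):
    """MAF from allele counts coded 0, 1, 2 per animal."""
    freq = sum(genotypes) / (2.0 * len(genotypes))
    return min(freq, 1.0 - freq)

def correlation(x, y):
    """Pearson correlation; returns 0.0 if either variant is monomorphic."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def edit_variants(variants, min_maf=0.01, max_corr=0.95):
    """variants: list of (name, genotypes) in map order. Returns kept names."""
    kept = []
    for name, geno in variants:
        if minor_allele_frequency(geno) < min_maf:
            continue  # removed for low MAF
        if kept and abs(correlation(kept[-1][1], geno)) > max_corr:
            continue  # removed for high linkage with the previous kept variant
        kept.append((name, geno))
    return [name for name, _ in kept]

# Hypothetical toy genotypes for six animals.
variants = [
    ("v1", [0, 1, 2, 1, 0, 2]),
    ("v2", [0, 1, 2, 1, 0, 2]),  # identical to v1
    ("v3", [0, 0, 0, 0, 0, 0]),  # monomorphic
    ("v4", [2, 1, 0, 0, 1, 2]),
]
kept = edit_variants(variants)
```

Here `v2` is dropped for redundancy with `v1` and `v3` for its zero MAF, leaving `v1` and `v4`.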
Fig. 1 Accuracy of imputing sequence variants. Test 1 included 762,588 candidate SNPs. Test 2 included candidate SNPs plus 249,966 InDels for a total of 1,003,453 variants. Test 3 included candidate SNPs, InDels, and intergenic and intronic variants for a total of 3,148,506 variants. Chromosome 30 refers to the pseudo-autosomal region of chromosome X, and chromosome 31 refers to X-specific loci
Computation time^a required with real sequence data for the longest (BTA1) and shortest (BTA29) chromosomes
| Computational step | BTA1 | BTA29 |
|---|---|---|
| Unzip VCF files | 6 | 2 |
| Read and transpose sequence | 95 | 36 |
| Subset sequenced animals | 1 | 1 |
| Subset matching HD markers | 8 | 10 |
| Merge sequence and HD data | 143 | 6 |
| Compute sequence linkage | 3 | 1 |
| Subset edited variants | 3 | 1 |
| Fix Mendelian conflicts | 3 | 1 |
| Impute with edited data | 16 | 10 |
| Reduce some sequence to HD data | 1 | 1 |
| Impute with reduced data | 17 | 9 |
| Total | 296 | 78 |
^a Time in minutes
Reliability gains when adding real sequence variants to HD or 60 K
| Trait | Reliability for PA (%) | Gain for HD SNPs only | Gain for HD SNPs + 481,904 candidate SNPs^a | Gain for HD SNPs + 481,904 candidate SNPs + InDels | Gain for 60 k SNPs only | Gain for 60 k SNPs + 16,648 candidate SNPs^b,c |
|---|---|---|---|---|---|---|
| Milk | 37.9 | 34.1 | 33.9 (−0.2) | 33.9 | 34.3 | 35.7 (1.4) |
| Fat | 37.9 | 33.7 | 34.0 (0.3) | 33.4 | 34.3 | 35.1 (0.8) |
| Protein | 37.9 | 27.9 | 27.0 (−0.9) | 26.7 | 27.5 | 28.2 (0.7) |
| Fat percentage | 37.9 | 49.2 | 52.7 (3.5) | 52.4 | 52.9 | 54.8 (1.9) |
| Protein percentage | 37.9 | 42.1 | 41.6 (−0.5) | 43.0 | 41.6 | 44.3 (2.7) |
| Productive life | 32.0 | 36.1 | 35.8 (−0.3) | 36.4 | 35.6 | 38.2 (2.6) |
| Somatic cell score | 34.7 | 35.9 | 36.1 (0.2) | 37.1 | 35.1 | 37.0 (1.9) |
| Daughter pregnancy rate | 31.5 | 30.8 | 30.0 (−0.8) | 31.2 | 29.0 | 33.0 (4.0) |
| Cow conception rate | 29.8 | 28.7 | 28.1 (−0.6) | 28.8 | 28.9 | 31.8 (2.9) |
| Heifer conception rate | 30.0 | 19.0 | 20.3 (1.3) | 19.7 | 20.5 | 21.5 (1.0) |
| Sire calving ease | 29.9 | 27.8 | 27.7 (−0.1) | 25.2 | 24.5 | 28.5 (4.0) |
| Daughter calving ease | 25.3 | 32.5 | 30.8 (−1.7) | 29.9 | 31.5 | 31.4 (−0.1) |
| Sire stillbirth | 29.0 | 7.6 | 7.3 (−0.3) | 7.1 | 7.6 | 7.8 (0.2) |
| Daughter stillbirth | 23.8 | 37.4 | 37.0 (−0.4) | 35.8 | 35.4 | 38.0 (2.6) |
| Final score | 36.2 | 24.7 | 25.5 (0.8) | 25.8 | 24.6 | 27.8 (3.2) |
| Stature | 38.2 | 30.4 | 32.4 (2.0) | 32.8 | 30.3 | 34.7 (4.3) |
| Strength | 37.4 | 29.9 | 31.8 (1.9) | 31.8 | 29.9 | 34.5 (4.6) |
| Dairy form | 37.4 | 33.8 | 35.3 (1.5) | 35.8 | 35.0 | 38.2 (3.2) |
| Foot angle | 36.7 | 17.3 | 17.6 (0.3) | 18.2 | 17.2 | 19.6 (2.4) |
| Rear legs (side view) | 37.3 | 21.9 | 22.7 (0.8) | 22.0 | 22.1 | 24.1 (2.0) |
| Body depth | 37.6 | 31.0 | 33.1 (2.1) | 33.7 | 31.2 | 36.0 (4.8) |
| Rump angle | 37.8 | 32.7 | 34.0 (1.3) | 33.5 | 32.9 | 36.1 (3.2) |
| Rump width | 37.1 | 29.2 | 30.4 (1.2) | 30.2 | 29.1 | 32.5 (3.4) |
| Fore udder attachment | 37.5 | 35.1 | 36.4 (1.3) | 36.1 | 35.0 | 39.0 (4.0) |
| Rear udder height | 37.3 | 24.7 | 25.7 (1.0) | 25.8 | 24.1 | 27.3 (3.2) |
| Udder depth | 38.0 | 40.2 | 42.6 (2.4) | 42.8 | 40.6 | 44.6 (4.0) |
| Udder cleft | 37.1 | 23.7 | 24.5 (0.8) | 24.0 | 23.6 | 25.5 (1.9) |
| Front teat placement | 37.6 | 32.6 | 33.4 (0.8) | 32.3 | 30.9 | 35.0 (4.1) |
| Teat length | 37.7 | 29.0 | 30.3 (1.3) | 29.9 | 28.0 | 32.7 (4.7) |
| Rear legs (rear view) | 36.0 | 20.7 | 20.3 (−0.4) | 20.1 | 20.4 | 22.8 (2.4) |
| Feet and leg score | 36.4 | 16.9 | 16.5 (−0.4) | 16.6 | 15.9 | 18.3 (2.4) |
| Rear teat placement | 37.4 | 33.1 | 33.6 (0.5) | 32.1 | 32.9 | 35.2 (2.3) |
| Net merit | 34.4 | 23.8 | 24.3 (0.5) | 24.4 | 23.4 | 24.7 (1.3) |
| Average | 35.2 | 28.8 | 29.4 (0.6) | 29.2 | 29.5 | 32.2 (2.7) |
Reliability gains in percentage points over parent average reliability
PA: parent average
^a Difference from reliability gain for HD SNPs only in parentheses
^b Difference from reliability gain for 60 k SNPs only in parentheses
^c Does not include 6584 60 k markers that were not available in sequence data
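As a worked reading of the table, gains are percentage points added to the parent average reliability, and the values in parentheses are differences between two gains. The short sketch below reproduces that arithmetic for a few entries; the helper names are hypothetical, and rounding to one decimal mirrors the precision of the table.

```python
def total_reliability(pa_reliability, gain):
    """Genomic reliability (%) = parent average reliability + gain."""
    return round(pa_reliability + gain, 1)

def gain_difference(gain_with_candidates, gain_baseline):
    """Value shown in parentheses: extra gain from adding candidate variants."""
    return round(gain_with_candidates - gain_baseline, 1)

# Milk: PA reliability 37.9 %, gain of 34.1 points from HD SNPs only.
milk_hd_total = total_reliability(37.9, 34.1)

# Fat percentage: HD + candidate SNPs (52.7) versus HD only (49.2).
fat_pct_hd_diff = gain_difference(52.7, 49.2)

# Milk: 60 k + candidate SNPs (35.7) versus 60 k only (34.3).
milk_60k_diff = gain_difference(35.7, 34.3)
```

Read this way, milk reaches a reliability of 72.0 % with HD SNPs only, and the parenthesized 3.5 and 1.4 in the table follow directly from the neighbouring columns.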
Coefficients for regression of validation data on genomic predictions when adding real sequence variants to HD or 60 k
| Trait | PA | HD SNPs only | HD SNPs + 481,904 candidate SNPs | HD SNPs + 481,904 candidate SNPs + InDels | 60 k SNPs only | 60 k SNPs + 16,648 candidate SNPs^a |
|---|---|---|---|---|---|---|
| Milk | 0.81 | 1.03 | 1.06 | 1.06 | 1.04 | 1.05 |
| Fat | 0.68 | 0.92 | 0.95 | 0.94 | 0.94 | 0.93 |
| Protein | 0.75 | 0.93 | 0.96 | 0.95 | 0.94 | 0.95 |
| Fat percentage | 0.97 | 1.14 | 1.13 | 1.12 | 1.12 | 1.09 |
| Protein percentage | 0.77 | 0.96 | 0.98 | 0.97 | 0.95 | 0.96 |
| Productive life | 1.24 | 1.30 | 1.32 | 1.25 | 1.27 | 1.25 |
| Somatic cell score | 0.89 | 1.09 | 1.10 | 1.06 | 1.08 | 1.06 |
| Daughter pregnancy rate | 1.20 | 1.47 | 1.49 | 1.48 | 1.43 | 1.43 |
| Cow conception rate | 0.72 | 0.94 | 0.95 | 0.92 | 0.91 | 0.91 |
| Heifer conception rate | 0.75 | 0.97 | 1.03 | 0.98 | 0.94 | 0.92 |
| Sire calving ease | 0.65 | 0.83 | 0.83 | 0.81 | 0.82 | 0.84 |
| Daughter calving ease | 0.80 | 1.04 | 1.03 | 1.02 | 1.04 | 1.02 |
| Sire stillbirth | 0.84 | 0.75 | 0.76 | 0.78 | 0.77 | 0.76 |
| Daughter stillbirth | 0.77 | 1.15 | 1.16 | 1.15 | 1.12 | 1.16 |
| Final score | 0.71 | 0.93 | 0.92 | 0.92 | 0.91 | 0.88 |
| Stature | 0.84 | 1.04 | 1.02 | 1.01 | 1.01 | 1.00 |
| Strength | 0.80 | 1.05 | 1.03 | 1.02 | 1.01 | 0.99 |
| Dairy form | 0.82 | 1.10 | 1.08 | 1.08 | 1.07 | 1.05 |
| Foot angle | 0.71 | 0.84 | 0.82 | 0.82 | 0.81 | 0.79 |
| Rear legs (side view) | 0.87 | 1.01 | 0.99 | 0.99 | 0.98 | 0.96 |
| Body depth | 0.76 | 1.01 | 0.99 | 0.99 | 0.97 | 0.96 |
| Rump angle | 0.80 | 1.08 | 1.07 | 1.05 | 1.06 | 1.05 |
| Rump width | 0.78 | 1.01 | 0.99 | 0.98 | 0.98 | 0.96 |
| Fore udder attachment | 0.80 | 1.06 | 1.04 | 1.03 | 1.03 | 1.01 |
| Rear udder height | 0.78 | 0.97 | 0.96 | 0.96 | 0.94 | 0.93 |
| Udder depth | 0.76 | 1.11 | 1.09 | 1.08 | 1.07 | 1.06 |
| Udder cleft | 0.87 | 1.00 | 0.99 | 0.99 | 0.98 | 0.95 |
| Front teat placement | 0.80 | 1.05 | 1.03 | 1.01 | 1.02 | 0.99 |
| Teat length | 0.91 | 1.06 | 1.06 | 1.04 | 1.04 | 1.03 |
| Rear legs (rear view) | 0.58 | 0.86 | 0.85 | 0.83 | 0.83 | 0.80 |
| Feet and leg score | 0.54 | 0.74 | 0.72 | 0.72 | 0.71 | 0.68 |
| Rear teat placement | 0.90 | 1.13 | 1.10 | 1.09 | 1.09 | 1.04 |
| Net merit | 0.85 | 0.82 | 0.84 | 0.81 | 0.83 | 0.81 |
| Average | 0.81 | 1.01 | 1.01 | 1.00 | 0.99 | 0.98 |
PA: parent average
^a Does not include 6584 60 k markers that were not available in sequence data
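The coefficients in this table are slopes from regressing validation data on genomic predictions. A minimal sketch of that calculation with ordinary least squares follows; the function name and the toy numbers are hypothetical, and any weighting or adjustment used in the actual validation is not shown in the source and is not modelled here.

```python
from statistics import mean

def regression_slope(predictions, observations):
    """Least-squares slope of observations regressed on predictions."""
    mx, my = mean(predictions), mean(observations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictions, observations))
    sxx = sum((x - mx) ** 2 for x in predictions)
    return sxy / sxx

# Hypothetical toy data constructed so that observation = 0.5 + 0.8 * prediction.
predictions = [1.0, 2.0, 3.0, 4.0]
observations = [1.3, 2.1, 2.9, 3.7]
slope = regression_slope(predictions, observations)
```

A slope near 1 means the predictions are on the same scale as the later data; a slope below 1 suggests the predictions are too spread out, and a slope above 1 suggests they are too compressed.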
Fig. 2 Example of variant selection on chromosome 5. For 1719 SNPs, windows were designated for SNPs with the largest effects. Then, only SNPs with larger effects were retained in those windows (1026 SNPs excluded)
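One way to picture the window-based selection in this caption is sketched below. It is a loose interpretation rather than the authors' procedure: the caption does not give the window size or the retention rule, so the fixed half-width in base pairs, the `keep_per_window` limit, and the function name are all assumptions made for illustration.

```python
def select_in_windows(snps, half_width, keep_per_window):
    """snps: list of (position, effect). Returns retained positions, sorted.

    Repeatedly centre a window on the largest remaining effect, keep the
    keep_per_window largest effects inside it, and discard the rest of
    the window.
    """
    # Order by absolute effect, largest first.
    remaining = sorted(snps, key=lambda s: abs(s[1]), reverse=True)
    retained = []
    while remaining:
        centre = remaining[0][0]
        # in_window preserves the largest-first order of remaining.
        in_window = [s for s in remaining if abs(s[0] - centre) <= half_width]
        retained.extend(s[0] for s in in_window[:keep_per_window])
        remaining = [s for s in remaining if abs(s[0] - centre) > half_width]
    return sorted(retained)

# Hypothetical positions (bp) and effects.
snps = [
    (100, 0.90), (150, 0.20), (180, -0.50),
    (900, 0.30), (920, 0.05), (950, 0.10),
]
retained = select_in_windows(snps, half_width=100, keep_per_window=2)
```

With these toy values two windows form, around positions 100 and 900, and the smallest effect in each window (positions 150 and 920) is excluded.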
Fig. 3 Maximum correlations with neighbouring variants
Fig. 4 Cumulative distributions for minor allele frequencies