Lambros Koufariotis, Yi-Ping Phoebe Chen, Sunduimijid Bolormaa, Ben J Hayes.
Abstract
BACKGROUND: In livestock, as in humans, the number of genetic variants that can be tested for association with complex quantitative traits, or used in genomic predictions, is increasing exponentially as whole-genome sequencing becomes more common. The power to identify variants associated with traits, particularly those of small effect, could be increased if certain regions of the genome were known a priori to be enriched for associations. Here, we investigate whether twelve genomic annotation classes were enriched or depleted for significant associations in genome-wide association studies (GWAS) of complex traits in beef and dairy cattle. We also describe a variance component approach to determine the proportion of genetic variance captured by each annotation class.
Year: 2014 PMID: 24903263 PMCID: PMC4070550 DOI: 10.1186/1471-2164-15-436
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Total number of variants annotated in different genomic classes in the beef and dairy data sets
| Class | Number of variants in dairy (proportion of total) | Number of variants in beef (proportion of total) |
|---|---|---|
| Total Genomic | 632003 | 729254 |
| Intergenic | 428255 (0.68) | 490982 (0.67) |
| Intragenic (genic regions) | 203534 (0.32) | 237914 (0.33) |
| Upstream variants | 23799 (0.04) | 27861 (0.04) |
| Downstream variants | 26453 (0.04) | 30780 (0.04) |
| Introns | 195012 (0.31) | 227938 (0.31) |
| Exon | 8913 (0.01) | 10416 (0.01) |
| Protein Coding Sequence (CDS) | 6364 (0.01) | 7490 (0.01) |
| Synonymous | 3968 (0.01) | 4768 (0.01) |
| Missense | 2392 (0.004) | 2718 (0.004) |
| UTR (5' & 3') | 2386 (0.004) | 2748 (0.004) |
| Non-Coding Conserved | 6331 (0.01) | 7399 (0.01) |
| microRNA Predicted Target | 1932 (0.003) | 2213 (0.003) |
| Transcription Factor Binding Sites | 229 (0.0004) | 271 (0.0004) |
| Frameshift SNPs | 1 | 1 |
| Mature microRNA SNPs | 1 | 2 |
| All Splice Sites | 720 (0.001) | 855 (0.001) |
| All Stop sites | 86 (0.0001) | 88 (0.0001) |
The total number of SNPs differs between data sets because the beef data included more breeds, so more SNPs were polymorphic. Annotations were obtained from Ensembl version 73 [51], with the exception of mature miRNA SNPs and predicted miRNA targets, which came from the miRBase databases [28, 53]; transcription factor binding sites, which came from Bickhart [27]; and non-coding conserved regions, which were obtained from a phastCons study [55].
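The values in parentheses in the table above can be read as each class's variant count divided by the total number of variants in that data set. The short Python sketch below reproduces a few of them from the tabulated counts. The number of decimal places used for each row is inferred from the table rather than stated in the source, and the helper name `proportion` is hypothetical.

```python
# Variant counts copied from the annotation table (dairy and beef data sets).
TOTALS = {"dairy": 632003, "beef": 729254}
COUNTS = {
    "Intergenic": {"dairy": 428255, "beef": 490982},
    "Intragenic (genic regions)": {"dairy": 203534, "beef": 237914},
    "Introns": {"dairy": 195012, "beef": 227938},
    "Missense": {"dairy": 2392, "beef": 2718},
    "Transcription Factor Binding Sites": {"dairy": 229, "beef": 271},
}


def proportion(cls: str, data_set: str, ndigits: int) -> float:
    """Share of all annotated variants in `data_set` that fall in class `cls`,
    rounded to `ndigits` decimal places (hypothetical helper)."""
    return round(COUNTS[cls][data_set] / TOTALS[data_set], ndigits)


for name in COUNTS:
    print(name, proportion(name, "dairy", 4), proportion(name, "beef", 4))
```

For example, 428255 / 632003 ≈ 0.678, which rounds to the 0.68 reported for intergenic variants in the dairy data.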
Dairy and beef trait descriptions, including the number of phenotype records used in the GWAS
| Trait ID | Animal | Trait name and description | Number of phenotypes |
|---|---|---|---|
| Fat | Dairy | Fat Volume | 16812 |
| Fat Percent | Dairy | Fat Percent | 16812 |
| Milk | Dairy | Milk Volume | 16812 |
| Protein | Dairy | Protein Volume | 16812 |
| Protein Percent | Dairy | Protein Percent | 16812 |
| Angularity | Dairy | Angularity | 6910 |
| BCS | Dairy | Body Condition Score | 6910 |
| Mammary System | Dairy | Mammary System | 6910 |
| Fertility | Dairy | Fertility | 15430 |
| Survival Direct | Dairy | Survival Direct | 15352 |
| SCC | Dairy | Somatic Cell Count | 16297 |
| LLPF | Beef | Peak force measured in Longissimus lumborum muscle (kg) | 5358 |
| CIMF | Beef | Percent intramuscular fat measured in Longissimus lumborum muscle | 5824 |
| CRIB | Beef | Rib fat at slaughter | 5464 |
| SEMA | Beef | Exit scanned eye muscle area | 4539 |
| SC12 | Beef | Scrotal circumference measured at 12 months of age | 1112 |
| PNS24 | Beef | Percent normal sperm at the age of 24 months (%) | 964 |
| PPAI_BB | Beef | Post partum anoestrus interval in BB (days) | 629 |
| PPAI_TC | Beef | Post partum anoestrus interval in TC (days) | 863 |
| AGECL_BB | Beef | Age at first detected corpus luteum in BB (days) | 1007 |
| AGECL_TC | Beef | Age at first detected corpus luteum in TC (days) | 1108 |
Traits enriched or depleted for trait-associated variants (TAVs) in each annotation class, in dairy and beef cattle
| Trait | Cattle type | Intergenic | Upstream | Downstream | Intragenic | Intron | CDS | Synonymous | Missense | UTR (5' & 3') | Non-coding conserved | TFBS | micro RNA target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fat | Dairy | ns | + | + | + | - | + | + | + | ns | ns | ns | ns |
| Fat Percent | Dairy | - | + | + | + | ns | + | + | + | ns | ns | ns | + |
| Milk | Dairy | - | + | + | + | + | + | + | + | ns | ns | ns | + |
| Protein | Dairy | - | + | + | + | + | + | ns | + | ns | - | ns | ns |
| Protein Percent | Dairy | - | + | + | + | + | + | + | + | + | ns | ns | ns |
| Angularity | Dairy | - | + | + | + | + | ns | ns | + | ns | ns | ns | ns |
| BCS | Dairy | ns | + | + | ns | ns | + | ns | + | ns | ns | ns | ns |
| Mammary System | Dairy | ns | + | + | - | - | + | ns | ns | ns | ns | ns | + |
| Fertility | Dairy | + | + | ns | - | - | ns | ns | ns | ns | ns | ns | ns |
| Survival Direct | Dairy | ns | + | + | - | - | ns | ns | ns | + | ns | ns | ns |
| SCC | Dairy | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| LLPF | Beef | - | + | + | + | + | + | + | ns | + | ns | ns | + |
| CIMF | Beef | - | + | ns | + | + | ns | ns | ns | ns | ns | ns | ns |
| CRIB | Beef | ns | + | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| SEMA | Beef | ns | ns | ns | - | - | ns | ns | ns | ns | ns | ns | ns |
| SC12 | Beef | + | + | + | - | - | ns | ns | ns | ns | + | ns | ns |
| PNS24 | Beef | + | + | + | - | - | ns | ns | ns | ns | ns | ns | - |
| AGECL_BB | Beef | ns | ns | - | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| AGECL_TC | Beef | ns | ns | - | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| PPAI_BB | Beef | ns | - | ns | ns | ns | ns | ns | ns | ns | ns | ns | ns |
| PPAI_TC | Beef | - | ns | ns | + | + | ns | ns | ns | ns | ns | ns | ns |
Significance was assessed by permutation testing. For each class, a null distribution was constructed by drawing 1000 random sets of SNPs, each containing the same number of SNPs as the class, and counting the number of SNPs significant at P < 0.0001 in each set. A class was deemed enriched if the observed number of significant SNPs in that class exceeded the number in the 950th of the 1000 random sets ranked in ascending order (the 95th percentile of the null distribution), and depleted if it was lower than the number in the 50th set (the 5th percentile). Traits enriched for TAVs in a class are marked with +, traits depleted for TAVs are marked with -, and traits with no significant enrichment or depletion are marked ns.
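The permutation test described above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `permutation_enrichment` is hypothetical, and drawing each random set without replacement from all SNPs (class members included) is an assumption, since the source only says the random SNPs were "randomly chosen".

```python
import numpy as np


def permutation_enrichment(pvalues, in_class, n_perm=1000, alpha=1e-4, seed=0):
    """Label an annotation class '+' (enriched), '-' (depleted) or 'ns'.

    pvalues  : GWAS P-values for every SNP tested for one trait
    in_class : boolean mask, True for SNPs annotated to the class
    """
    rng = np.random.default_rng(seed)
    significant = np.asarray(pvalues) < alpha
    in_class = np.asarray(in_class, dtype=bool)
    n_class = int(in_class.sum())
    observed = int(significant[in_class].sum())

    # Null distribution: number of significant SNPs in random sets of the
    # same size, drawn without replacement from all SNPs (assumption).
    null = np.empty(n_perm, dtype=int)
    for i in range(n_perm):
        idx = rng.choice(significant.size, size=n_class, replace=False)
        null[i] = significant[idx].sum()
    null.sort()  # ascending order

    upper = null[n_perm * 95 // 100 - 1]  # 950th of 1000 sets (95th percentile)
    lower = null[n_perm * 5 // 100 - 1]   # 50th of 1000 sets (5th percentile)
    if observed > upper:
        return "+"
    if observed < lower:
        return "-"
    return "ns"
```

Called once per trait and class, with that trait's P-values and the class mask, it returns the same symbol used in the table. Comparing against fixed order statistics of the null, rather than a fitted parametric distribution, keeps the test free of assumptions about how significant SNPs are distributed along the genome.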
Figure 1. Enrichment or depletion of trait-associated variants in annotation classes. Permutation testing was used to determine whether the number of significant variants in each class was greater or smaller than expected by chance, given the total number of variants in that class. The number of significant traits is shown in blue for dairy and orange for beef. Enriched traits are shown as positive dark blue bars for dairy and positive dark orange bars for beef. Depleted traits are shown below the corresponding class as light blue bars for dairy and light orange bars for beef.
Results from logistic regression analysis
| Traits | Intergenic | Upstream | Downstream | Intragenic | Intron | CDS | Synonymous | Missense | UTR (5'&3') | Non-coding conserved | TFBS | micro RNA target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fat | 0.05 (−0.01) | 4.92 (0.15) | 3.36 (0.12) | 0.55 (0.10) | 0.17 (−0.11) | 4.70 (0.12) | 0.00 (0.02) | 0.26 (0.10) | 0.27 (−0.21) | 0.07 (0.02) | 0.22 (0.18) | 0.55 (−0.16) |
| Fat Percent | 0.68 (−0.08) | 6.44 (0.14) | 4.89 (0.11) | 12.71 (0.52) | 0.89 (−0.46) | 11.69 (−0.44) | 0.29 (0.36) | 0.06 (0.38) | 0.67 (−0.40) | 0.48 (−0.08) | 0.26 (−0.32) | 1.53 (0.28) |
| Milk | 0.67 (−0.06) | 1.91 (0.03) | 2.80 (0.05) | 2.16 (0.64) | 2.12 (−0.66) | 3.28 (0.14) | 0.84 (−0.66) | 0.37 (−0.54) | 0.05 (−0.66) | 0.93 (−0.10) | 0.09 (−0.09) | 1.26 (0.20) |
| Protein | 3.05 (−0.17) | 2.60 (0.00) | 5.12 (0.01) | 4.20 (0.39) | 1.41 (−0.49) | 2.85 (0.06) | 0.51 (−0.47) | 0.86 (−0.28) | 0.19 (−0.46) | 1.11 (−0.11) | 0.20 (0.13) | 0.48 (0.10) |
| Protein Percent | 0.21 (−0.02) | 15.12 (0.21) | 9.59 (0.17) | 5.10 (0.38) | 0.84 (−0.33) | 8.60 (−0.10) | 0.00 (0.00) | 0.46 (0.11) | 4.06 (−0.06) | 0.57 (−0.07) | 0.55 (0.29) | 0.84 (0.15) |
| Angularity | 0.89 (−0.25) | 2.59 (0.11) | 1.31 (0.02) | 7.74 (−0.75) | 0.78 (0.79) | 1.41 (0.86) | 0.06 (−0.26) | 1.12 (0.37) | 0.16 (0.38) | 0.93 (0.29) | 0.09 (0.21) | 0.71 (0.40) |
| BCS | 1.14 (−0.60) | 2.16 (0.02) | 2.12 (0.02) | 0.12 (0.34) | 0.24 (−0.88) | 3.04 (2.44) | 2.18 (−3.07) | 0.21 (−2.13) | 0.00 (−0.81) | 0.73 (−0.94) | 1.05 (1.66) | 0.14 (0.22) |
| Mammary System | 1.27 (0.31) | 2.70 (0.51) | 1.16 (0.38) | 2.54 (1.01) | 0.52 (−0.86) | 1.51 (−0.45) | 0.00 (0.07) | 0.04 (0.04) | 1.34 (−0.32) | 0.64 (−0.31) | 0.24 (−5.29) | 1.61 (0.65) |
| Fertility | 2.52 (0.90) | 1.46 (0.97) | 0.15 (0.59) | 2.44 (3.86) | 5.45 (−3.37) | 0.57 (−2.92) | 0.16 (0.31) | 0.62 (−0.98) | 0.04 (−3.33) | 0.27 (0.23) | 0.17 (−3.91) | 0.67 (−3.93) |
| Survival Direct | 4.19 (0.28) | 7.40 (0.41) | 6.19 (0.39) | 3.21 (0.43) | 0.32 (−0.23) | 0.75 (−0.71) | 0.41 (0.62) | 0.11 (0.55) | 4.97 (0.22) | 0.95 (−0.16) | 0.13 (0.14) | 0.53 (−0.20) |
| SCC | 0.88 (0.30) | 0.04 (0.17) | 0.82 (0.37) | 0.55 (0.06) | 0.06 (0.16) | 0.05 (0.09) | 0.00 (0.00) | 0.06 (0.10) | 0.72 (0.55) | 2.05 (0.51) | 0.23 (−5.00) | 0.66 (−0.88) |
| LLPF | 0.51 (0.12) | 28.17 (0.88) | 16.27 (0.74) | 10.89 (0.37) | 0.06 (0.11) | 3.11 (0.22) | 0.05 (0.22) | 0.40 (−0.08) | 1.66 (0.19) | 0.99 (−0.43) | 0.07 (0.14) | 1.50 (0.59) |
| CIMF | 0.00 (0.01) | 3.38 (0.46) | 0.1 (−0.05) | 8.15 (−0.19) | 0.32 (0.57) | 0.71 (0.55) | 0.00 (0.08) | 0.04 (0.03) | 0.08 (0.35) | 0.1 (−0.10) | 2.96 (1.87) | 0.31 (0.30) |
| CRIB | 0.18 (−0.1) | 1.16 (0.21) | 0.07 (−0.13) | 0.19 (0.06) | 0.04 (−0.13) | 0.42 (−0.46) | 0.08 (0.78) | 0.81 (−0.77) | 0.28 (0.17) | 0.75 (−0.67) | 0.20 (−4.17) | 0.08 (−0.16) |
| SEMA | 1.52 (1.55) | 0.69 (−0.15) | 0.43 (1.24) | 2.44 (1.07) | 0.04 (−0.27) | 0.07 (−1.02) | 0.04 (1.21) | 0.44 (−1.23) | 0.35 (0.55) | 0.90 (−2.40) | 0.11 (−2.32) | 0.41 (0.86) |
| SC12 | 0.22 (−0.06) | 2.13 (0.12) | 2.99 (0.17) | 11.18 (0.19) | 0.47 (−0.46) | 0.17 (−0.39) | 0.00 (0.00) | 0.06 (0.07) | 0.04 (−0.30) | 1.45 (0.26) | 0.05 (0.09) | 0.00 (−0.01) |
| PNS24 | 3.58 (0.45) | 0.49 (0.40) | 0.40 (0.38) | 105.99 (0.65) | 1.57 (−1.10) | 0.04 (−0.48) | 0.04 (0.12) | 0.27 (−0.06) | 0.27 (−0.34) | 0.08 (0.01) | 1.00 (0.86) | 1.27 (−0.74) |
| AGECL_BB | 2.75 (−0.81) | 0.30 (−0.78) | 0.67 (−0.83) | 1.30 (−0.48) | 0.11 (−0.22) | 0.12 (−0.23) | 0.04 (0.12) | 0.17 (−0.09) | 0.84 (0.29) | 0.00 (0.01) | 0.25 (−4.97) | 0.79 (−0.99) |
| AGECL_TC | 0.04 (−0.06) | 0.14 (−0.14) | 1.49 (−1.11) | 0.86 (−0.03) | 0.04 (0.15) | 0.05 (−1.29) | 0.05 (1.45) | 0.61 (−1.78) | 0.00 (0.09) | 0.20 (−0.33) | 0.14 (−3.09) | 0.66 (0.88) |
| PPAI_BB | 0.58 (−0.99) | 1.10 (−1.85) | 0.24 (−1.07) | 0.71 (−0.71) | 0.00 (−0.10) | 0.62 (0.23) | 0.00 (0.31) | 0.19 (−0.23) | 0.57 (−2.78) | 0.17 (0.25) | 0.14 (−2.88) | 0.49 (−2.89) |
| PPAI_TC | 0.77 (−0.56) | 0.11 (−0.35) | 0.21 (−0.52) | 5.11 (−1.46) | 0.74 (1.33) | 0.36 (−0.53) | 0.09 (1.98) | 0.91 (−2.18) | 1.01 (1.78) | 0.20 (−0.26) | 0.18 (−3.73) | 0.67 (−3.70) |
In each cell, the first value is −log10 of the P-value for the annotation class in that trait, and the second value (in parentheses) is the logistic regression coefficient for that annotation class and trait. Enriched annotations have positive coefficients; depleted annotations have negative coefficients.
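The table does not give the model specification, so the Python sketch below assumes the simplest version: for one trait and one class, a logistic regression of each SNP's significance indicator on a 0/1 class-membership indicator, fitted by Newton-Raphson (IRLS), with a two-sided Wald test on the class coefficient. Whether the authors fitted classes one at a time or jointly, and which test produced their P-values, is not stated; the function name `logistic_enrichment` is hypothetical. As a reading aid, −log10(P) = 1.30 corresponds to P = 0.05.

```python
import math

import numpy as np


def logistic_enrichment(is_significant, in_class, n_iter=25):
    """Fit logit P(significant) = b0 + b1 * in_class by Newton-Raphson (IRLS).

    Returns (b1, -log10 P), where P is a two-sided Wald test of b1 = 0.
    A positive b1 means the class is enriched for significant SNPs.
    """
    y = np.asarray(is_significant, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(in_class, dtype=float)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = mu * (1.0 - mu)                    # IRLS weights
        info = X.T @ (X * w[:, None])          # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = math.sqrt(np.linalg.inv(info)[1, 1])  # standard error of b1
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal P-value
    return float(beta[1]), -math.log10(p)
```

With a single binary predictor the fitted coefficient equals the log odds ratio of the 2×2 table of class membership against significance. For example, 30 of 100 class SNPs significant versus 100 of 1000 other SNPs gives b1 = log((30 × 900)/(70 × 100)) ≈ 1.35, a positive (enriched) effect.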
Figure 2. Heat map of the similarity between the genomic relationship matrices (GRMs) for each annotation class. Highly similar matrices are shown in red and highly dissimilar matrices in white.
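The caption does not say how each GRM was built or how similarity between GRMs was measured. The Python sketch below assumes a common construction, VanRaden's method 1 (centred genotypes scaled by 2Σp(1 − p)), and measures similarity as the Pearson correlation between the off-diagonal elements of two GRMs. Both choices, and the function names `vanraden_grm` and `grm_similarity`, are assumptions for illustration only.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix, VanRaden method 1 (assumed construction).

    M : (n_animals, n_snps) genotypes coded 0/1/2 for the SNPs of one class.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0      # sample allele frequency per SNP
    Z = M - 2.0 * p               # centre each SNP column on 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def grm_similarity(G1, G2):
    """Pearson correlation between the off-diagonal elements of two GRMs."""
    iu = np.triu_indices_from(G1, k=1)  # upper triangle, diagonal excluded
    return float(np.corrcoef(G1[iu], G2[iu])[0, 1])
```

One GRM would be built from the SNP columns of each annotation class, and `grm_similarity` applied to every pair of classes to fill a matrix suitable for a heat map. Because allele frequencies are estimated from the same animals, each row of the resulting GRM sums to zero; the scaling fails if every SNP in a class is monomorphic.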
Figure 3. Traits showing a significant increase or decrease in the variance explained by each annotation class tested, compared with the same number of randomly chosen SNPs. The analysis was replicated 5 times, and significance was declared when the variance explained by a class was greater or less than the mean proportion of variance explained by the randomly chosen SNPs ± 2 standard errors. Blue bars indicate a significant increase in the variance explained relative to the same number of randomly chosen intergenic SNPs. Orange bars indicate a significant decrease (depletion) in the variance explained for that class.
Difference in the proportion of variance explained for each trait by each annotation class tested, compared with the same number of randomly chosen SNPs
| Traits | Upstream | Downstream | Intron | CDS | Synonymous | Missense | UTR (5'&3') | Non-coding conserved | Micro RNA target sites |
|---|---|---|---|---|---|---|---|---|---|
| Fat | −0.2 | 0.0 | −2.4 | 3.8* | 5.0* | 5.1* | 1.8 | −1.1 | −0.9 |
| Fat Percent | 8.7* | 4.6* | 1.9 | 18.4* | 21.0* | 20.0* | 5.0* | −3.1* | −3.2* |
| Milk | 2.2 | 1.2 | −1.3 | 6.4* | 7.5* | 7.7* | 3.4* | −0.3 | 0.5 |
| Protein | −1.0 | −1.4 | −3.5* | 1.2 | 1.7 | 3.2* | 2.2 | −0.6 | 0.6 |
| Protein Percent | 2.2 | 1.6 | 0.2 | 9.4* | 10.9* | 9.3* | 5.9* | 0.2 | 1.9 |
| Angularity | −1.9 | 0.3 | −1.2 | −0.9 | −0.5 | −0.4 | −1.2 | 0.9 | −1.2 |
| BCS | −0.9 | −1.3 | −1.1 | 0.6 | −0.8 | 1.4 | −1.3 | 1.2 | −0.7 |
| Mammary System | −1.1 | −2.4 | −0.9 | 0.1 | 1.1 | −0.1 | −0.3 | 0.1 | −1.6 |
| Fertility | −0.8 | −0.2 | 0.5 | −0.4 | −0.2 | 0.1 | 0.0 | 0.1 | −0.3 |
| Survival Direct | −0.8 | −1.6 | −2.2 | −1.9 | −0.8 | 0.1 | −1.0 | −1.6 | −0.7 |
| SCC | −3.0 | −3.3* | −2.2 | 0.5 | 0.3 | 0.1 | −0.5 | −0.1 | −0.8 |
The analysis was replicated 5 times. Significance was declared when the proportion of variance explained by a class fell outside the mean proportion of variance explained by the same number of randomly selected intergenic SNPs, averaged across the five replicates, ± 2 standard errors. Values marked with “*” indicate either a significant increase in the variance explained (the difference is greater than 2 standard errors of the random SNP sets) or a significant decrease (the difference is less than −2 standard errors of the random SNP sets).
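The ± 2 standard error rule above can be written as a short Python sketch. It assumes the standard error is the standard error of the mean across the five random replicates (sample standard deviation divided by √5); the source does not say how the standard error was computed, and the function name `compare_to_random` is hypothetical.

```python
import statistics


def compare_to_random(class_pve, random_pves):
    """Compare a class's proportion of variance explained (PVE) with random SNP sets.

    random_pves : PVE of the same number of randomly chosen SNPs, one per replicate.
    Returns (difference, flag); flag is "*" when the difference lies outside
    +/- 2 standard errors of the mean of the random replicates.
    """
    mean_rand = statistics.mean(random_pves)
    se = statistics.stdev(random_pves) / len(random_pves) ** 0.5  # assumed SE
    diff = class_pve - mean_rand
    flag = "*" if abs(diff) > 2 * se else ""
    return diff, flag
```

For instance, with hypothetical replicate values of 10.0, 11.0, 9.0, 10.5 and 9.5, the mean is 10.0 and 2 × SE ≈ 0.71, so `compare_to_random(12.0, [10.0, 11.0, 9.0, 10.5, 9.5])` returns `(2.0, "*")`, while a class value of 10.5 gives a non-significant difference of 0.5.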
Figure 4. Proportion of genetic variance explained per SNP for each annotation class when all classes were fitted jointly in the model. The genetic variance per SNP is expressed as a percentage divided by 10⁻⁴ (i.e., in units of 10⁻⁴ %). These results show how much variance a single SNP in each class contributes, on average, for each trait.
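The per-SNP scale in Figure 4 can be illustrated with a small Python sketch. It assumes the per-SNP value is the class's percentage of total genetic variance divided by the number of SNPs in the class, then expressed in units of 10⁻⁴ %; the caption does not spell out this calculation, and the function name `per_snp_variance` is hypothetical.

```python
def per_snp_variance(class_variance, total_genetic_variance, n_snps_in_class):
    """Percentage of genetic variance explained per SNP, in units of 1e-4 %.

    Assumes: per-SNP share = (class share of genetic variance, in %) / n SNPs.
    """
    if n_snps_in_class <= 0:
        raise ValueError("a class must contain at least one SNP")
    pct_class = 100.0 * class_variance / total_genetic_variance
    return pct_class / n_snps_in_class / 1e-4
```

With hypothetical values, a class that captures 5% of the genetic variance (variance 2.0 out of 40.0) spread over 5000 SNPs contributes 0.001% per SNP, which is about 10 on this scale. Dividing by class size is what allows small classes such as missense SNPs to be compared directly with large classes such as introns.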