Ruidong Xiang, Irene van den Berg, Iona M MacLeod, Benjamin J Hayes, Claire P Prowse-Wilkins, Min Wang, Sunduimijid Bolormaa, Zhiqian Liu, Simone J Rochfort, Coralie M Reich, Brett A Mason, Christy J Vander Jagt, Hans D Daetwyler, Mogens S Lund, Amanda J Chamberlain, Michael E Goddard.
Abstract
Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In an additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
Keywords: animal breeding; cattle; evolution; gene regulation; quantitative traits
Year: 2019 PMID: 31501319 PMCID: PMC6765237 DOI: 10.1073/pnas.1904159116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Overview of the analysis. The discovery analysis involved the selection of variants from functional and evolutionary datasets; this figure shows examples of some of the datasets used. In the test analysis, each of the variant sets was used to make GRMs. Then, each one was analyzed in the GREML (gGi), together with the high-density SNP chip GRM (gGHD) for each of the 34 traits (Yj). Once the heritability, h², of each gGi was calculated, it was averaged across traits and adjusted for the number of variants used to build the gGi to calculate the per-variant h². The FAETH scoring of each variant was derived based on their memberships to differentially partitioned sets and the per-variant h². In the validation analysis, variants with high and low FAETH ranking were tested in a Danish cattle dataset for GREML and genomic prediction of 3 production traits. The Australian test dataset contained 9,739 bulls and 22,899 cows of Holstein breed, 2,059 bulls and 6,174 cows of Jersey, 2,850 cows of mixed breeds, and 125 bulls and 424 cows of Australian Red. The Danish reference set contained 4,911 Holstein, 957 Jersey, and 745 Danish Red bulls, and the Danish validation population contained 500 Holstein, 517 Jersey, and 192 Danish Red bulls.
Variant sets selected from functional and evolutionary partitions
| Partitions | Targeted variant sets (no. of variants) | Animal no. |
| Gene expression QTLs | geQTLs with metaanalysis | 209 |
| Exon expression QTLs | eeQTLs with metaanalysis | 209 |
| Splicing QTLs | sQTLs with metaanalysis | 209 |
| Allele specific expression QTLs | aseQTLs with metaanalysis | 112 |
| Polar lipid metabolite QTLs | mQTLs with metaanalysis | 338 |
| ChIP-seq peaks | Under H3K4Me3 and H3K27Ac peaks from liver, muscle, and mammary gland (1,166,795) | 15 |
| Variant annotation | Annotated as UTR (42,350), intergenic (11,869,145), gene end (1,007,214), intron (4,629,025), splice.sites (11,080), coding.related (105,969), and noncoding.related (4,589) | na |
| Predicted CTCF sites | Variants tagged by mapped CTCF-binding motifs from humans, mice, dogs, and macaques as published in ref. | na |
| HPRS | Genome sites within the top 1% gkm SVM score from the HPRS as published in ref. | na |
| Conserved 100 species | Bovine genome sites lifted over from human sites with PhastCon score ( | na |
| Selection signature | GWAS | 1,370 |
| Young variants | Ranked within the bottom 1% of the proportion of positive correlations (PPRR) with rare variants, 1000 Bull Genome (893,986) | 2,330 |
| LD score quartiles | First quartile (4,417,033/4,416,205), second quartile (4,418,731/4,419,930), third quartile (4,415,633/4,415,481), and fourth quartile (4,417,975/4,417,756) | 44,270 |
| Variant density quartiles | First quartile (4,429,833), second quartile (4,414,996), third quartile (4,427,220), and fourth quartile (4,397,323) | |
| MAF quartiles | First quartile (4,414,292/4,417,036), second quartile (4,421,093/4,417,428), third quartile (4,416,834/4,418,157), and fourth quartile (4,417,153/4,418,157) | |
For the 3 categories of quartiles, the numbers of variants on the left and right of the slash are for the bulls and cows, respectively. LD score is the sum of linkage disequilibrium correlations between a variant and all variants in the surrounding 50-kb region, calculated with GCTA-LDS (38). The details of the variant annotations can be found in . The animal numbers are the sample sizes of each discovery analysis. Quartiles are ordered by score: fourth > third > second > first. na, not applicable.
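The LD score definition in the footnote above can be illustrated with a minimal Python sketch. It is not the GCTA-LDS implementation. It assumes that the "correlation" is the squared genotype correlation r², that the "surrounding 50-kb region" means all variants within 50 kb on either side, and that a variant does not count itself. The function name `ld_scores` and the toy genotypes are hypothetical.

```python
import numpy as np

def ld_scores(genotypes, positions, window_bp=50_000):
    """Sum of r^2 between each variant and its neighbours within window_bp.

    genotypes: (n_animals, n_variants) allele counts (0/1/2), no constant columns.
    positions: base-pair position of each variant on one chromosome.
    Assumptions (not from the source): r^2 is used, the window is +/- window_bp,
    and the variant itself is excluded from its own score.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    positions = np.asarray(positions)
    # Pairwise squared Pearson correlation between variant columns.
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    scores = np.empty(len(positions))
    for i, pos in enumerate(positions):
        in_window = np.abs(positions - pos) <= window_bp
        in_window[i] = False  # drop the variant itself
        scores[i] = r2[i, in_window].sum()
    return scores

# Toy data: variants 0 and 1 are in perfect LD; variant 2 lies far away.
toy_genotypes = [[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1]]
toy_positions = [1_000, 2_000, 200_000]
toy_scores = ld_scores(toy_genotypes, toy_positions)
```

Sorting variants by such a score and cutting at the 25th, 50th, and 75th percentiles would give quartile partitions of the kind listed in the table.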
Fig. 2. Examples of regulatory and evolutionary signals from the discovery analysis. (A) A Manhattan plot of the metaanalysis of sQTLs from white blood and milk cells and liver and muscle tissues. (B) A Manhattan plot of the metaanalysis of aseQTLs in the white blood cells. (C) A distribution density plot of variants tagged by H3K4Me3 ChIP-seq mark from mammary gland within 2 Mb of gene transcription start site. (D) Artificial selection signatures between 8 dairy and 7 beef cattle breeds with the linear mixed-model approach using the 1000 Bull Genome database. The blue line indicates −log10(P value) = 4.
The relative proportion of selected variants in each set compared to the total number of variants analyzed (genome fraction) and their averaged heritability in bulls and cows, across 34 traits
| Category | Genome fraction, % | Averaged h² in bulls (SE) | Averaged h² in cows (SE) |
| eeQTLs | 4.77 | 14.52 (2.2) | 3.96 (1.2) |
| sQTLs | 5.57 | 15.08 (2.5) | 3.88 (1.2) |
| aseQTLs | 5.21 | 11.0 (2.0) | 2.47 (0.7) |
| mQTLs | 0.03 | 0.71 (0.2) | 0.12 (0.04) |
| geQTLs | 0.53 | 1.54 (0.4) | 0.19 (0.06) |
| ChIP-seq | 6.60 | 4.21 (0.8) | 0.90 (0.3) |
| Noncoding.related | 0.03 | 0.06 (0.02) | 0.013 (0.004) |
| Splice.sites | 0.06 | 0.08 (0.02) | 0.02 (0.005) |
| UTR | 0.24 | 0.18 (0.03) | 0.03 (0.01) |
| Coding.related | 0.60 | 0.26 (0.06) | 0.04 (0.012) |
| Geneend | 5.70 | 3.76 (0.8) | 0.80 (0.2) |
| Intron | 26.2 | 5.56 (0.7) | 1.53 (0.3) |
| Intergenic | 67.2 | 10.3 (1.3) | 17.3 (2.2) |
| Predicted CTCF sites | 1.43 | 0.36 (0.08) | 0.046 (0.02) |
| HPRS | 0.96 | 0.31 (0.08) | 0.045 (0.02) |
| Conserved 100 species | 2.1 | 41.4 (2.6) | 17.4 (2.3) |
| Selection signatures | 0.02 | 0.011 (0.004) | 0.002 (0.0008) |
| Young variants | 0.54 | 0.78 (0.2) | 0.12 (0.05) |
| LD score q1 | 25 | 4.57 (0.6) | 1.18 (0.3) |
| LD score q2 | 25 | 5.56 (0.7) | 1.45 (0.3) |
| LD score q3 | 25 | 6.38 (0.8) | 1.75 (0.4) |
| LD score q4 | 25 | 6.94 (0.9) | 2.01 (0.5) |
| Variant density q1 | 25 | 5.59 (0.7) | 1.49 (0.3) |
| Variant density q2 | 25 | 5.42 (0.7) | 1.45 (0.3) |
| Variant density q3 | 25 | 5.72 (0.7) | 1.55 (0.3) |
| Variant density q4 | 25 | 5.99 (0.7) | 1.65 (0.4) |
| MAF q1 | 25 | 1.36 (0.2) | 0.35 (0.08) |
| MAF q2 | 25 | 11.5 (1.3) | 3.51 (0.7) |
| MAF q3 | 25 | 29.2 (2.4) | 10.3 (1.8) |
| MAF q4 | 25 | 40.5 (2.8) | 15.6 (2.4) |
SEs are in parentheses. q1 to q4 are the genome partitions based on the first, second, third, and fourth quartiles of MAF, LD score, and the number of variants (variant density) per 50-kb window. Fourth quartile > third quartile > second quartile > first quartile.
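One way to read the table above is to divide the heritability a set explains by its genome fraction, which is proportional to a per-variant heritability. The Python sketch below does this for a few rows. It uses only the first estimate column, which the table title suggests is the bull estimate, and that is an assumption. It is a simplified stand-in for the per-variant ranking, which per the Fig. 3 caption is averaged across bulls and cows on a log10 scale. The name `enrichment` is hypothetical.

```python
# (genome fraction in %, first heritability column) copied from the table above.
sets = {
    "Conserved 100 species": (2.1, 41.4),
    "mQTLs": (0.03, 0.71),
    "geQTLs": (0.53, 1.54),
    "Young variants": (0.54, 0.78),
    "ChIP-seq": (6.60, 4.21),
    "Selection signatures": (0.02, 0.011),
}

def enrichment(genome_fraction_pct, h2):
    """Heritability explained per unit of genome fraction (hypothetical helper)."""
    return h2 / genome_fraction_pct

# Rank sets from most to least heritability per unit of genome covered.
ranked = sorted(sets, key=lambda name: enrichment(*sets[name]), reverse=True)

for name in ranked:
    print(f"{name}: {enrichment(*sets[name]):.2f}")
```

On these rows, mQTLs and conserved sites come out well ahead of the other sets, in line with the ordering described in the abstract.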
Fig. 3. The proportion of genetic variances explained by sets of variants selected from functional and evolutionary categories. The ranking of variant sets based on the log10 scale of per-variant h², averaged across bulls (left error bar) and cows (right error bar).
Fig. 4. Examples of top-ranked variant sets in important bovine trait QTL. (A) Manhattan plot of the metaanalysis of GWAS of 34 traits in the ±2 Mb region surrounding the beta casein (CSN2) gene, a major QTL for milk protein yield. (B) Manhattan plot of the metaanalysis of GWAS of 34 traits in the ±1 Mb region of the microsomal GST 1 (MGST1) gene, a major QTL for milk fat yield. The dots are colored based on their set memberships. The black bar between the gray dots and the X-axis indicates the gene locations.
Fig. 5. Further tests of the variant FAETH score. (A) The heritability of high and low FAETH ranking variants for the multibreed GRM and the within-breed GRM (2 GRMs fitted together) estimated across 34 traits in the Australian data. The error bars are the SE of heritability calculated across 34 traits. (B) The heritability of high and low FAETH ranking variants for 3 traits additional to the 34 traits in the Australian data used to calculate the FAETH score. (C) The multibreed heritability of high and low FAETH variants for 3 production traits in Danish data. The error bars are the SEs of the heritability of each GREML analysis. (D) Prediction accuracy of gBLUP of 3 production traits in Danish data using high and low FAETH variants (averaged between bulls and cows). The genomic predictors were trained in multiple breeds and predicted into single breeds (HOL, Holstein; JER, Jersey). P values of significant difference based on Z-score test: •P < 0.1; **P < 0.01; ***P < 0.001; ****P < 0.0001. Note that for the prediction accuracy r, the significance of difference was based on the sample sizes of the Danish candidate subset, where there were 500 Holstein, 517 Jersey, and 192 Danish Red.
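The caption above names a Z-score test but does not give its formula, so the following Python sketch shows one common form of it under stated assumptions. For two estimates with SEs it uses z = (a − b) / sqrt(SE_a² + SE_b²). For two prediction accuracies it uses the Fisher z-transformation with the sample sizes. Both treat the two estimates as independent, which is a simplification when high and low FAETH variants are evaluated in the same animals. The function names and all input values are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided P value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def z_test_estimates(est_a, se_a, est_b, se_b):
    """Compare two estimates (e.g. heritabilities) given their SEs.

    Assumes the two estimates are independent (not stated in the source).
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, two_sided_p(z)

def z_test_correlations(r_a, r_b, n_a, n_b):
    """Compare two correlations via Fisher's z-transformation and sample sizes."""
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (math.atanh(r_a) - math.atanh(r_b)) / se
    return z, two_sided_p(z)

# Hypothetical inputs, not values from the study.
z_h2, p_h2 = z_test_estimates(0.30, 0.03, 0.15, 0.03)
z_r, p_r = z_test_correlations(0.50, 0.40, 500, 500)
print(f"h2 difference: z = {z_h2:.2f}, P = {p_h2:.2g}")
print(f"accuracy difference: z = {z_r:.2f}, P = {p_r:.2g}")
```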
FAETH annotation of previously identified causal or putative causal mutations for dairy cattle complex traits using the top variant sets
| Loci | Causal candidates | Annotation | Tagging variant sets | FAETH ranking |
| | Chr1:144377960 ( | Intron | aseQTL | High |
| | Chr14:1802266 ( | Coding.related | mQTL, eeQTL, sQTL, aseQTL, ChIP-seq | High |
| | Chr19:51386735 ( | Intron | mQTL, eeQTL, sQTL, ChIP-seq | High |
| | Chr20:31909478 ( | Coding.related | Conserved 100 species | High |
“High” means that the variant was ranked within the top 1/3 of the FAETH score.
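The "High" label can be sketched in Python as a rank cut at the top third of scores. This is a minimal illustration assuming a plain descending rank with ties broken by input order. The excerpt does not give the cutoffs for other classes, so everything else is simply left unlabeled. The function name `high_faeth_mask` and the scores are hypothetical.

```python
import numpy as np

def high_faeth_mask(scores):
    """Boolean mask marking variants ranked within the top 1/3 by score.

    Assumption: ties are broken by input order (stable sort).
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # indices from highest to lowest score
    n_high = len(scores) // 3                   # size of the top third
    mask = np.zeros(len(scores), dtype=bool)
    mask[order[:n_high]] = True
    return mask

# Hypothetical scores for 6 variants; the 2 highest are flagged.
example_scores = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2]
example_mask = high_faeth_mask(example_scores)
```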