Irene van den Berg, Ruidong Xiang, Janez Jenko, Hubert Pausch, Mekki Boussaha, Chris Schrooten, Thierry Tribout, Arne B Gjuvsland, Didier Boichard, Øyvind Nordbø, Marie-Pierre Sanchez, Mike E Goddard.
Abstract
BACKGROUND: Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes is available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to data at the individual level of the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision.
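To make the idea of combining summary statistics concrete, the sketch below pools per-study z-scores for a single variant with a sample-size weighted (Stouffer) scheme. This is one common approach and is an assumption here: the text above does not say which weighting the study used. The function name `meta_z` and the z-scores in the example are hypothetical; only the sample sizes are taken from the GWAS description table (AUSB, HOLF, NR).

```python
import math


def meta_z(effects):
    """Combine per-study z-scores for one variant.

    effects: list of (z_score, n_individuals), one entry per GWAS in which
    the variant was tested. Weights are sqrt(n), an assumed scheme.
    Returns the combined z-score and its two-sided p-value.
    """
    numerator = sum(math.sqrt(n) * z for z, n in effects)
    denominator = math.sqrt(sum(n for _, n in effects))
    z = numerator / denominator
    # Two-sided normal tail probability; erfc stays accurate for small p.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical z-scores; sample sizes from the AUSB, HOLF and NR rows.
z_combined, p_combined = meta_z([(3.0, 11_923), (2.5, 6_375), (4.0, 21_540)])
print(f"combined z = {z_combined:.2f}, p = {p_combined:.1e}")
```

Only z-scores and sample sizes enter the calculation, which is why no individual-level genotypes or phenotypes need to be shared between cohorts. A variant missing from some GWAS is handled by passing only the studies in which it was tested.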
Year: 2020 PMID: 32635893 PMCID: PMC7339598 DOI: 10.1186/s12711-020-00556-4
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Description of GWAS used in the meta-analysis
| Acronym | Country | Breeds | Sex | Pheno | GWAS | impRef | impSoft | nIds | nVar |
|---|---|---|---|---|---|---|---|---|---|
| AUSB | Australia, New Zealand, the Netherlands | Holstein, Jersey, Australian Red | Bulls | DYD | GCTA | 1000_Run6 | Minimac3 | 11,923 | 15,474,359 |
| AUSC | Australia | Holstein, Jersey, Australian Red | Cows | YD | GCTA | 1000_Run6 | Minimac3 | 32,347 | 15,400,322 |
| HOLF | France | Holstein | Bulls | DYD | GCTA | 1000_Run4 | FImpute | 6375 | 13,885,363 |
| MON | France | Montbéliarde | Bulls | DYD | GCTA | 1000_Run4 | FImpute | 2588 | 14,409,070 |
| NOR | France | Normande | Bulls | DYD | GCTA | 1000_Run4 | FImpute | 2319 | 13,937,693 |
| NR | Norway | Norwegian Red | Bulls, cows | (D)YD | GCTA | within breed | Minimac4 | 21,540 | 12,985,160 |
| HOLG | Germany | Holstein | Bulls | EBV | EMMAX | 1000_Run4 | Minimac3 | 8805 | 14,804,061 |
| BRAU | Switzerland | Braunvieh | Bulls | EBV | EMMAX | 1000_Run5 | Minimac3 | 1646 | 15,813,995 |
| FLCK | Germany/Austria | Fleckvieh | Bulls | DYD | EMMAX | 1000_Run5 | Minimac3 | 6778 | 17,042,717 |
| Total | | | | | | | | 94,321 | 25,702,992 |
Pheno: phenotypes used, i.e. daughter yield deviations (DYD, bulls), yield deviations (YD, cows), or estimated breeding values (EBV); GWAS: software used for GWAS; impRef: imputation reference; impSoft: imputation software; nIds: number of individuals; nVar: number of variants. nIds and nVar were the same for both traits except in Norwegian Red (21,550 individuals and 12,985,177 variants for protein content)
Description of the GWAS used in the validation meta-analysis
| Acronym | Country | Breeds | Sex | Pheno | GWAS | impRef | impSoft | nIds |
|---|---|---|---|---|---|---|---|---|
| VAUSC | Australia | Holstein, Jersey | Cows | YD | GCTA | 1000_Run6 | Minimac3 | 26,953 |
| VHOLF | France | Holstein | Cows | YD | GCTA | 1000_Run4 | FImpute | 2216 |
| VMON | France | Montbéliarde | Cows | YD | GCTA | 1000_Run4 | FImpute | 3032 |
| VNOR | France | Normande | Cows | YD | GCTA | 1000_Run4 | FImpute | 2659 |
| Total | | | | | | | | 34,860 |
Pheno: phenotypes used were yield deviations (YD, cows); GWAS: software used for GWAS; impRef: imputation reference; impSoft: imputation software; nIds: number of individuals
Fig. 1 Manhattan plot of the meta-analysis for fat percentage. Red line indicates p = 10⁻⁸
Fig. 2 Manhattan plot of the meta-analysis for protein percentage. Red line indicates p = 10⁻⁸
Number of variants and QTL detected in the GWAS and meta-analysis for fat and protein percentage
| Analysis | Fat % nS | Fat % FDR | Fat % nQ | Fat % nS/nQ | Prot % nS | Prot % FDR | Prot % nQ | Prot % nS/nQ |
|---|---|---|---|---|---|---|---|---|
| AUSB | 8871 | 1.7 × 10⁻⁵ | 56 | 158 | 13,955 | 1.1 × 10⁻⁵ | 52 | 268 |
| AUSC | 9502 | 1.6 × 10⁻⁵ | 74 | 128 | 13,475 | 1.1 × 10⁻⁵ | 49 | 275 |
| HOLF | 10,124 | 1.4 × 10⁻⁵ | 22 | 460 | 11,033 | 1.3 × 10⁻⁵ | 35 | 315 |
| MON | 3971 | 3.6 × 10⁻⁵ | 13 | 305 | 5383 | 2.7 × 10⁻⁵ | 19 | 283 |
| NOR | 2981 | 4.7 × 10⁻⁵ | 11 | 271 | 3379 | 4.1 × 10⁻⁵ | 16 | 211 |
| NR | 4304 | 3.0 × 10⁻⁵ | 30 | 143 | 6231 | 2.1 × 10⁻⁵ | 37 | 168 |
| HOLG | 9244 | 1.6 × 10⁻⁵ | 20 | 462 | 10,102 | 1.5 × 10⁻⁵ | 38 | 266 |
| BRAU | 2483 | 6.4 × 10⁻⁵ | 13 | 191 | 2117 | 7.5 × 10⁻⁵ | 9 | 235 |
| FLCK | 9492 | 1.8 × 10⁻⁵ | 20 | 475 | 5654 | 3.0 × 10⁻⁵ | 33 | 171 |
| GWAS | 31,559 | – | 124 | 255 | 42,518 | – | 104 | 409 |
| META | 27,820 | 9.2 × 10⁻⁶ | 138 | 202 | 44,095 | 5.8 × 10⁻⁶ | 176 | 251 |
fat %: fat percentage; prot %: protein percentage; nS: number of significant variants; FDR: false discovery rate; nQ: number of QTL; nS/nQ: number of significant variants per QTL; AUSB: Australian bull dataset; AUSC: Australian cow dataset; HOLF: French Holstein; MON: Montbéliarde; NOR: Normande; NR: Norwegian Red; HOLG: German Holstein; BRAU: Braunvieh; FLCK: Fleckvieh; GWAS: non-overlapping significant variants selected in any of the 9 GWAS; META: meta-analysis
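The legend does not state how the FDR was calculated. As a hedged illustration, the tabulated values are consistent with dividing the expected number of false positives (significance threshold times number of variants tested) by the number of significant variants; this relation is inferred from the numbers, not quoted from the source. The function name `expected_fdr` is hypothetical, the threshold of 10⁻⁸ is the one given in the figure captions, and the variant counts come from the GWAS description table.

```python
def expected_fdr(p_threshold, n_tested, n_significant):
    """Expected false positives divided by observed significant variants.

    Assumed relation, inferred from the tabulated values rather than
    stated in the source.
    """
    return p_threshold * n_tested / n_significant


# AUSB, fat percentage: 15,474,359 variants tested, 8871 significant.
fdr_ausb_fat = expected_fdr(1e-8, 15_474_359, 8871)
# Meta-analysis, fat percentage: 25,702,992 variants, 27,820 significant.
fdr_meta_fat = expected_fdr(1e-8, 25_702_992, 27_820)

print(f"{fdr_ausb_fat:.1e}")  # 1.7e-05, matching the AUSB row
print(f"{fdr_meta_fat:.1e}")  # 9.2e-06, matching the META row
```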
Fig. 3 Distribution of minor allele frequencies (MAF) of all variants and significant variants. Significant variants had a p-value ≤ 10⁻⁸ in the meta-analysis
Percentage of variants in functional classes of annotation
| Annotation | All variants | p_fat % ≤ 10⁻⁸ | p_prot % ≤ 10⁻⁸ |
|---|---|---|---|
| Intergenic | 65.85 | 51.45 | 50.07 |
| Intron | 26.54 | 35.12 | 35.84 |
| Upstream_gene | 3.49 | 5.91 | 5.89 |
| Downstream_gene | 3.04 | 4.24 | 4.83 |
| Synonymous | 0.36 | 0.99 | 0.91 |
| Missense | 0.32 | 0.54 | 0.54 |
| 3_prime_UTR | 0.22 | 0.32 | 0.44 |
| Splice_region | 0.06 | 0.14 | 0.14 |
| 5_prime_UTR | 0.05 | 0.12 | 0.11 |
| Non_coding_transcript_exon | 0.03 | 0.15 | 0.12 |
| Other | 0.02 | 0.01 | 0.02 |
| Not annotated | 0.02 | 1.01 | 1.09 |
p_fat % ≤ 10⁻⁸ and p_prot % ≤ 10⁻⁸ = variants with a p-value ≤ 10⁻⁸ in the meta-analysis for fat and protein percentage, respectively
Overlap between eQTL and significant variants
| Set | Milk cells: Total | Milk cells: eQTL | Milk cells: % | Blood cells: Total | Blood cells: eQTL | Blood cells: % |
|---|---|---|---|---|---|---|
| All | 9,191,239 | 6678 | 0.07 | 8,587,100 | 52,802 | 0.61 |
| pgwas_fat % ≤ 10⁻⁸ | 22,152 | 9 | 0.04 | 20,702 | 476 | 2.30 |
| pmeta_fat % ≤ 10⁻⁸ | 20,087 | 3 | 0.01 | 18,735 | 633 | 3.38 |
| pgwas_prot % ≤ 10⁻⁸ | 28,967 | 5 | 0.02 | 27,781 | 1081 | 3.89 |
| pmeta_prot % ≤ 10⁻⁸ | 33,911 | 13 | 0.04 | 32,505 | 1496 | 4.60 |
Milk cells: cells collected from milk samples; All: all variants present in both the meta-analysis and the eQTL study; pgwas_fat % ≤ 10⁻⁸, pgwas_prot % ≤ 10⁻⁸, pmeta_fat % ≤ 10⁻⁸ and pmeta_prot % ≤ 10⁻⁸: variants with a p-value ≤ 10⁻⁸ in at least one of the within-population GWAS or in the meta-analysis, for fat and protein percentage, respectively; Total: total number of variants in a set; eQTL: number of variants in a set that were eQTL; %: eQTL / Total × 100%
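The percentage column can be reproduced directly from the counts. The short sketch below does this for two blood-cell rows and then takes their ratio as a rough fold enrichment; that ratio is an added illustration and is not reported in the table. The function name `eqtl_percent` is hypothetical.

```python
def eqtl_percent(n_eqtl, n_total):
    """Percentage of variants in a set that are eQTL."""
    return 100.0 * n_eqtl / n_total


# Blood cells, all variants shared by the meta-analysis and the eQTL study.
blood_all = eqtl_percent(52_802, 8_587_100)
# Blood cells, variants significant in the protein percentage meta-analysis.
blood_meta_prot = eqtl_percent(1496, 32_505)

# Ratio of the two percentages (illustrative; not a value from the table).
enrichment = blood_meta_prot / blood_all

print(f"{blood_all:.2f}%  {blood_meta_prot:.2f}%  ratio = {enrichment:.1f}")
```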
Fig. 4 Distribution of the proportion of GWAS with the same direction of effect. From left to right: all variants (all variants), variants with p ≤ 10⁻⁸ in the meta-analysis (p ≤ 10⁻⁸), variants included in all GWAS (variants in all GWAS), and variants included in all GWAS and p ≤ 10⁻⁸ in the meta-analysis (p ≤ 10⁻⁸ & in all GWAS). Top = fat percentage (fat %), bottom = protein percentage (prot %)
Fig. 5 F_ST of all sequence variants and significant variants for fat percentage. Empirical cumulative distribution (ECDF) of F_ST values of all variants (black) and significant variants (red), for variants with a minor allele frequency between 0.002 and 0.01 (top left), 0.01 and 0.05 (top right), 0.05 and 0.10 (bottom left), and 0.10 and 0.50 (bottom right)
Fig. 6 F_ST of all sequence variants and significant variants for protein percentage. Empirical cumulative distribution (ECDF) of F_ST values of all variants (black) and significant variants (red), for variants with a minor allele frequency between 0.002 and 0.01 (top left), 0.01 and 0.05 (top right), 0.05 and 0.10 (bottom left), and 0.10 and 0.50 (bottom right)
Fig. 7 Fold enrichment and significance of keywords in DAVID clusters for fat and protein percentage. Log fold enrichment (logFE) and −log10 p-value after Benjamini–Hochberg correction for multiple testing (−log10(p)) for keywords in the top three clusters detected in the DAVID analysis. For each cluster, the keyword the top significant variant is annotated on the graph