Angela H Chen, Alexander E Lipka.
Abstract
A typical plant genome-wide association study (GWAS) uses a mixed linear model (MLM) that includes a trait as the response variable, a marker as an explanatory variable, and fixed and random effect covariates accounting for population structure and relatedness. Although effective in controlling for false positive signals, this model typically fails to detect signals that are correlated with population structure or are located in high linkage disequilibrium (LD) genomic regions. This result likely arises from each tested marker being used to estimate population structure and relatedness. Previous work has demonstrated that it is possible to increase the power of the MLM by estimating relatedness (i.e., kinship) with markers that are not located on the chromosome where the tested marker resides. To quantify the number of additional significant signals one can expect using this so-called K_chr model, we reanalyzed Mendelian, polygenic, and complex traits in two maize (Zea mays L.) diversity panels that have been previously assessed using the traditional MLM. We demonstrated that the K_chr model could find more significant associations, especially in high LD regions. This finding is underscored by our identification of novel genomic signals proximal to the tocochromanol biosynthetic pathway gene ZmVTE1 that are associated with a ratio of tocotrienols. We conclude that the K_chr model can detect more intricate sources of allelic variation underlying agronomically important traits, and should therefore become more widely used for GWAS. To facilitate the implementation of the K_chr model, we provide code written in the R programming language.
Keywords: GWAS; linkage disequilibrium; maize; marker subsets; mixed model
Year: 2016 PMID: 27233668 PMCID: PMC4978891 DOI: 10.1534/g3.116.029090
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
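The leave-one-chromosome-out idea behind the K_chr model described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' R code. The VanRaden-style kinship formula, the 0/1/2 genotype coding, the function name `kinship_excluding_chromosome`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def kinship_excluding_chromosome(genotypes, marker_chr, tested_chr):
    """Estimate kinship using only markers NOT located on `tested_chr`.

    genotypes  : (n_individuals, n_markers) array, coded 0/1/2 (assumed coding)
    marker_chr : (n_markers,) array giving the chromosome of each marker
    tested_chr : chromosome whose markers are about to be tested
    """
    genotypes = np.asarray(genotypes, dtype=float)
    keep = np.asarray(marker_chr) != tested_chr   # drop the tested chromosome
    G = genotypes[:, keep]
    p = G.mean(axis=0) / 2.0                      # allele frequency per marker
    Z = G - 2.0 * p                               # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))           # VanRaden-style scaling (assumption)
    return Z @ Z.T / denom

# Toy data: 6 individuals, 40 markers spread evenly over 4 chromosomes.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(6, 40))
chrom = np.repeat(np.arange(1, 5), 10)

# One kinship matrix per chromosome; markers on chromosome c would be
# tested in a mixed model that uses kinships[c] as the relatedness matrix.
kinships = {int(c): kinship_excluding_chromosome(geno, chrom, c)
            for c in np.unique(chrom)}
```

Each matrix in `kinships` is independent of the genotypes on its own chromosome, which is the property that keeps a tested marker (and markers in LD with it) out of the relatedness estimate.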
Summary information for three classes of traits that were analyzed in the Goodman association panel described in Flint-Garcia
| Trait Class | No. Traits Analyzed | Sample Size | No. Markers for GWAS | Data Source |
|---|---|---|---|---|
| Carotenoid | 15 | 201 | 291,939 | |
| Tocochromanol | 20 | 252 | 293,863 | |
| Flowering time | 3 | 278 | 299,253 | |
GWAS, genome-wide association study.
Summary information for the traits that were analyzed in the North Central Regional Plant Introduction Station maize association panel described in Romay
| Trait | Sample Size | No. Markers for GWAS | Data Source |
|---|---|---|---|
| Sweet | 2631 | 387,612 | |
| Days to silking | 2279 | 391,060 | |
| Days to anthesis | 2293 | 391,044 | |
| Plant height | 2293 | 391,044 | |
| Ear height | 2293 | 391,044 | |
GWAS, genome-wide association study.
Both best linear unbiased predictors and best linear unbiased estimators of these three traits are available in the supplement of Peiffer . We used best linear unbiased estimators for this analysis.
Number of significant associations detected at 5% and 10% false discovery rates by the K_chr model and the traditional unified mixed linear model in the Goodman diversity panel
| Trait Class | Genetic Architecture | No. Significant Associations (5% FDR), K_chr | No. Significant Associations (5% FDR), Trad. MLM | No. Significant Associations (10% FDR), K_chr | No. Significant Associations (10% FDR), Trad. MLM | No. Significant Associations Identified Using K_chr Model in Novel Regions | No. Significant Associations Identified Using Traditional MLM in Novel Regions |
|---|---|---|---|---|---|---|---|
| Carotenoid | Polygenic | 48 | 30 | 82 | 40 | 28 | 0 |
| Tocochromanol | Polygenic | 110 | 77 | 207 | 146 | 47 | 6 |
| Flowering time | Complex | 0 | 0 | 0 | 0 | 0 | 0 |
FDR, false discovery rate; MLM, mixed linear model; Trad., traditional.
A marker that is significantly associated with a trait at 10% FDR when using the K_chr model was declared to be in a novel genomic region if there is no marker within ± 250 kb that is significantly associated with the same trait at 10% FDR when using the traditional unified MLM.
A marker that is significantly associated with a trait at 10% FDR when using the traditional unified MLM was declared to be in a novel genomic region if there is no marker within ± 250 kb that is significantly associated with the same trait at 10% FDR when using the K_chr model.
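The novel-region rule in the two footnotes above can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions: the function name `novel_markers`, the `(chromosome, position)` tuple layout, and the toy positions are hypothetical, and the ± 250 kb window is treated as inclusive.

```python
from bisect import bisect_left, bisect_right

WINDOW_BP = 250_000  # ± 250 kb, as in the footnote definition

def novel_markers(sig_a, sig_b, window=WINDOW_BP):
    """Markers significant under model A with no model-B hit within ± window.

    sig_a, sig_b : iterables of (chromosome, position_bp) tuples holding the
                   significant markers for ONE trait under each model.
    """
    # Index model-B hits by chromosome, sorted for binary search.
    by_chr = {}
    for chrom, pos in sig_b:
        by_chr.setdefault(chrom, []).append(pos)
    for positions in by_chr.values():
        positions.sort()

    novel = []
    for chrom, pos in sig_a:
        positions = by_chr.get(chrom, [])
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        if lo == hi:  # no model-B hit falls inside the window
            novel.append((chrom, pos))
    return novel

# Toy positions (not taken from the study).
kchr_hits = [(1, 1_000_000), (1, 5_000_000), (2, 300_000)]
mlm_hits = [(1, 1_200_000), (2, 900_000)]

print(novel_markers(kchr_hits, mlm_hits))  # [(1, 5000000), (2, 300000)]
print(novel_markers(mlm_hits, kchr_hits))  # [(2, 900000)]
```

Swapping the two arguments gives the mirror-image count reported for the traditional MLM, so the same function covers both footnotes.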
Number of significant associations detected at a 5% false discovery rate by the K_chr model and the traditional unified mixed linear model in the North Central Regional Plant Introduction Station diversity panel
| Trait | Genetic Architecture | No. Significant Associations (5% FDR), K_chr | No. Significant Associations (5% FDR), Trad. MLM | No. Significant Associations Identified Using K_chr Model in Novel Regions | No. Significant Associations Identified Using Traditional MLM in Novel Regions |
|---|---|---|---|---|---|
| Sweet | Mendelian | 22,600 | 21,985 | 18 | 0 |
| Days to silking | Complex | 30,590 | 32,874 | 97 | 0 |
| Days to anthesis | Complex | 17,254 | 11,564 | 263 | 0 |
| Plant height | Complex | 488 | 227 | 33 | 0 |
| Ear height | Complex | 2596 | 1016 | 311 | 0 |
FDR, false discovery rate; MLM, mixed linear model; Trad., traditional.
A marker that is significantly associated with a trait at 5% FDR when using the K_chr model was declared to be in a novel genomic region if there is no marker within ± 250 kb that is significantly associated with the same trait at 5% FDR when using the traditional unified MLM.
A marker that is significantly associated with a trait at 5% FDR when using the traditional unified MLM was declared to be in a novel genomic region if there is no marker within ± 250 kb that is significantly associated with the same trait at 5% FDR when using the K_chr model.
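The tables above declare significance at 5% or 10% FDR, but the text shown here does not name the procedure used to control the FDR. As a sketch only, the following assumes the Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the toy P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where a test is significant at `fdr`.

    Assumes the Benjamini-Hochberg step-up procedure (an assumption; the
    source text does not state which FDR procedure was applied).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Toy P-values (not from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
sig_5 = benjamini_hochberg(toy_p, fdr=0.05)
sig_10 = benjamini_hochberg(toy_p, fdr=0.10)
print(sum(sig_5), sum(sig_10))  # 2 4
```

Relaxing the threshold from 5% to 10% can only add markers to the significant set, which is consistent with the larger counts in the 10% FDR columns.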
Figure 1. Manhattan plots depicting all SNPs significantly associated with carotenoid (A) and tocochromanol (B) traits at 10% FDR using the K_chr model located in novel genomic regions. Such a SNP is in a novel genomic region if there are no SNPs within ± 250 kb significantly associated with that same trait at 10% FDR when using the traditional unified mixed linear model. (A) The X-axis depicts the B73 RefGen_v2 position along the maize genome and the Y-axis shows the −log10 P-values for each significant SNP at 10% FDR located in a novel genomic region. The blue dots represent novel genomic signals for β-xanthophylls/α-xanthophylls, the light orange dot represents such a signal for α-carotene/zeinoxanthin, and the dark orange dots represent such genomic signals for zeinoxanthin/lutein. The minor allele frequencies of the SNPs depicted in the figure range from 0.09 to 0.45. (B) The X- and Y-axes are as described in (A). The blue dots represent novel genomic signals for γ-tocopherol/(γ-tocopherol + α-tocopherol), the light orange dots represent such signals for δ-tocotrienol/(γ-tocotrienol + α-tocotrienol), the dark orange dots represent such signals for δ-tocotrienol/γ-tocotrienol, and the purple dots represent such signals for α-tocopherol/γ-tocopherol. The minor allele frequencies of the SNPs depicted in the figure range from 0.08 to 0.48. The approximate B73 RefGen_v2 positions of relevant biosynthetic pathway genes are depicted by dotted gray arrows. FDR, false discovery rate; SNP, single nucleotide polymorphism.
For each indicated trait analyzed in the Goodman diversity panel, the number of significant markers identified by the K_chr model at 10% false discovery rate that are located in novel genomic regions is presented
| Trait Name | No. Significant Associations in Novel Regions | B73 RefGen_v2 Position of Nearest Novel Significant Association to Candidate Gene | Candidate Gene Name and B73 RefGen_v2 Position |
|---|---|---|---|
| β-Xanthophylls/α-xanthophylls | 25 | Chr 2: 51,751,723 | |
| | | Chr 8: 131,533,827 | |
| α-Carotene/zeinoxanthin | 1 | Chr 1: 92,347,976 | |
| Zeinoxanthin/lutein | 2 | Chr 1: 92,347,976 | |
| γ-Tocopherol/(γ-tocopherol + α-tocopherol) | 2 | NA | NA |
| δ-Tocotrienol/(γ-tocotrienol + α-tocotrienol) | 10 | Chr 5: 132,656,905 | |
| δ-Tocotrienol/γ-tocotrienol | 30 | Chr 5: 133,501,858 | |
| α-Tocopherol/γ-tocopherol | 5 | NA | NA |
For all such markers that are on the same chromosome as an a priori candidate gene, information about the corresponding candidate gene is provided. Chr., chromosome; NA, not applicable.
A marker that is significantly associated with a trait at 10% false discovery rate (FDR) when using the K_chr model was declared to be in a novel genomic region if there is no marker within ± 250 kb that is significantly associated with the same trait at 10% FDR when using the traditional unified mixed linear model (MLM).
If at least one of the markers significantly associated with a trait at 10% FDR using the K_chr model is located in a novel genomic region on the same chromosome as a relevant candidate gene, then the B73 RefGen_v2 position of the closest such marker to the candidate gene is reported.
When applicable, the name of the nearest candidate gene (as depicted in Owens and Lipka ) as well as their B73 RefGen_v2 ORF (open reading frame) start and stop bp are reported.
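The rule for the third column (report the novel marker closest to a candidate gene on the same chromosome, otherwise NA) can be sketched as below. This is a hypothetical Python helper, not the authors' code. Measuring distance to the nearest edge of the ORF interval is an assumption, since the footnotes do not state how distance was computed, and the coordinates are toy values.

```python
def nearest_marker_to_gene(markers, gene_chr, gene_start, gene_stop):
    """Return the (chromosome, position) marker closest to a candidate gene.

    markers    : iterable of (chromosome, position_bp) for novel-region hits
    gene_chr   : chromosome of the candidate gene
    gene_start : ORF start (bp); gene_stop : ORF stop (bp)
    Returns None (reported as "NA") if no marker lies on the gene's chromosome.
    """
    def distance(pos):
        # Distance to the nearest ORF edge; 0 if the marker is inside the ORF.
        if pos < gene_start:
            return gene_start - pos
        if pos > gene_stop:
            return pos - gene_stop
        return 0

    same_chr = [(c, p) for c, p in markers if c == gene_chr]
    if not same_chr:
        return None
    return min(same_chr, key=lambda marker: distance(marker[1]))

# Toy coordinates (not from the study).
hits = [(3, 1_500_000), (3, 2_100_000), (7, 2_001_000)]
print(nearest_marker_to_gene(hits, 3, 2_000_000, 2_004_000))  # (3, 2100000)
print(nearest_marker_to_gene(hits, 9, 2_000_000, 2_004_000))  # None
```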
Figure 2. Manhattan plot depicting all SNPs significantly associated with the traits evaluated in the North Central Regional Plant Introduction Station panel at 5% FDR using the K_chr model located in novel genomic regions. Such a SNP is in a novel genomic region if there are no SNPs within ± 250 kb significantly associated with that same trait at 5% FDR when using the traditional unified mixed linear model. The X-axis depicts the B73 RefGen_v2 position along the maize genome and the Y-axis shows the −log10 P-values for each significant SNP at 5% FDR located in a novel genomic region. The blue dots represent novel genomic signals for sweet vs. starchy corn, the light orange dots represent such signals for days to silking, the red dots represent such signals for days to anthesis, the black dots represent such signals for plant height, and the purple dots represent such signals for ear height. The minor allele frequencies of the SNPs depicted in the figure range from 0.05 to 0.50. The approximate B73 RefGen_v2 positions of relevant candidate genes and regulatory elements are depicted by dotted gray arrows. FDR, false discovery rate; SNP, single nucleotide polymorphism.
For each indicated trait analyzed in the North Central Regional Plant Introduction Station panel, the number of significant markers identified by the K_chr model at 5% false discovery rate that are located in novel genomic regions is presented
| Trait Name | No. Significant Associations in Novel Regions | B73 RefGen_v2 Position of Nearest Novel Significant Association to Candidate Gene | Candidate Gene/Regulatory Element Name and B73 RefGen_v2 Position |
|---|---|---|---|
| Sweet | 18 | NA | NA |
| Days to silking | 97 | Chr 10: 58,673,233 | |
| Days to anthesis | 263 | Chr 8: 96,929,838 | |
| | | Chr 8: 150,876,807 | |
| | | Chr 10: 94,588,819 | |
| Plant height | 33 | NA | NA |
| Ear height | 311 | NA | NA |
For all such markers that are on the same chromosome as an a priori candidate gene or regulatory element, corresponding genomic information is provided. NA, not applicable; Chr., chromosome.
A marker that is significantly associated with a trait at 5% false discovery rate (FDR) when using the K_chr model was declared to be in a novel genomic region if there is no marker within ± 250 kb that is significantly associated with the same trait at 5% FDR when using the traditional unified MLM.
If at least one of the markers significantly associated with a trait at 5% FDR using the K_chr model is located in a novel genomic region on the same chromosome as a relevant candidate gene or regulatory element, then the B73 RefGen_v2 position of the closest such marker is reported.
When applicable, the name of the nearest candidate gene or regulatory element as well as their B73 RefGen_v2 ORF (open reading frame) start and stop bp are reported.
Figure 3. Distribution of P-values obtained from the K_chr model and traditional unified mixed linear model (MLM) at six specific genomic regions, each of which contains at least one candidate gene. Each graph compares the distribution of P-values from the K_chr model (red box plot, left) to those from the traditional unified MLM (blue box plot, right). The −log10 P-values are presented on the Y-axis. (A) Distribution of P-values from the K_chr model and MLM when markers within the chromosome 5 region surrounding ZmVTE1 were tested for association with δ-tocotrienol/γ-tocotrienol. (B) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 1 region surrounding lut1 were tested for association with zeinoxanthin. (C) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 4 region surrounding Su1 were tested for associations with sweet vs. starchy corn. (D) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 5 region surrounding ZmVTE4 were tested for associations with α-tocopherol. (E) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 2 region surrounding zep1 were tested for associations with β-xanthophylls/α-xanthophylls. (F) Distribution of P-values from the K_chr model and MLM when markers in the chromosome 8 region surrounding ZCN8 and ZmRap2.7 were tested for associations with days to silking. For the regions with high local linkage disequilibrium (LD; i.e., those presented in A, B, and C), the P-values from the K_chr model are noticeably lower than those from the traditional unified MLM. The same trend is observed for the two regions analyzed using data from the powerful North Central Regional Plant Introduction Station panel (presented in C and F). Finally, the distributions of P-values from the two models are more similar in regions of lower LD (presented in D and E) analyzed using data from the smaller Goodman diversity panel.
For each genomic region surrounding the indicated a priori candidate gene that was assessed using results from the Goodman diversity panel, the Wilcoxon signed rank test was used to compare the distribution of P-values obtained from the K_chr model to those from the traditional unified mixed linear model
| Trait Analyzed | P-value | Trait Analyzed | P-value | Trait Analyzed | P-value | Trait Analyzed | P-value |
|---|---|---|---|---|---|---|---|
| δ-Tocotrienol | < 2.20 × 10⁻¹⁶ | Zeinoxanthin | 5.29 × 10⁻² | α-Tocopherol | 8.69 × 10⁻¹ | Zeaxanthin | 2.17 × 10⁻¹ |
| δ-Tocotrienol/(γ-tocotrienol + α-tocotrienol) | < 2.20 × 10⁻¹⁶ | α-Carotene/zeinoxanthin | 1.25 × 10⁻³ | δ-Tocopherol/α-tocopherol | 4.29 × 10⁻² | β-Xanthophylls/α-xanthophylls | 1.16 × 10⁻¹ |
| δ-Tocotrienol/γ-tocotrienol | < 2.20 × 10⁻¹⁶ | Zeinoxanthin/lutein | 4.45 × 10⁻³ | γ-Tocopherol/(γ-tocopherol + α-tocopherol) | 7.30 × 10⁻¹ | | |
| α-Tocopherol/γ-tocopherol | 2.5 × 10⁻¹ | | | | | | |
For each indicated trait, P-values from the Wilcoxon signed rank test are reported. The genomic regions surrounding ZmVTE1 and lut1 are in high linkage disequilibrium (LD), while the genomic regions surrounding ZmVTE4 and zep1 are in lower LD.
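The paired comparison behind this table can be illustrated with a small, dependency-free sketch of the Wilcoxon signed rank test. It is an approximation written for illustration, not the routine used in the study: it uses the large-sample normal approximation without tie or continuity corrections, and the function name and toy values are hypothetical. (In R, a paired test of this kind is typically run with `wilcox.test(x, y, paired = TRUE)`.)

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed rank test via the normal approximation.

    x, y : paired observations, e.g. the P-values of the same markers under
           the K_chr model and the traditional MLM.
    Zero differences are dropped and tied |differences| get average ranks.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, p_value

# Toy paired values for 20 markers in one region (not from the study).
model_a = [3.0 + 0.1 * i for i in range(20)]
model_b = [v - 0.5 - 0.01 * i for i, v in enumerate(model_a)]
w, p = wilcoxon_signed_rank(model_a, model_b)
print(w, p < 0.001)  # 210.0 True
```

Because the test is paired by marker, it asks whether one model gives systematically smaller or larger values than the other across the same set of markers in a region.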
For each genomic region surrounding the indicated a priori candidate gene or regulatory element that was assessed using results from the North Central Regional Plant Introduction Station panel, the Wilcoxon signed rank test was used to compare the distribution of P-values obtained from the K_chr model to those from the traditional unified mixed linear model
| Trait Analyzed | P-value | Trait Analyzed | P-value |
|---|---|---|---|
| Sweet | < 2.20 × 10⁻¹⁶ | Days to anthesis | < 2.20 × 10⁻¹⁶ |
| Days to silking | < 2.20 × 10⁻¹⁶ | | |
| Plant height | < 2.20 × 10⁻¹⁶ | | |
| Ear height | < 2.20 × 10⁻¹⁶ | | |
For each indicated trait, P-values from the Wilcoxon signed rank test are reported. The genomic region surrounding Su1 is in high linkage disequilibrium (LD), while the genomic region surrounding ZCN8 and ZmRap2.7 is in lower LD.