Ravi V Mural, Marcin Grzybowski, Chenyong Miao, Alyssa Damke, Sirjan Sapkota, Richard E Boyles, Maria G Salas Fernandez, Patrick S Schnable, Brandi Sigmon, Stephen Kresovich, James C Schnable.
Abstract
Community association populations are composed of phenotypically and genetically diverse accessions. Once these populations are genotyped, the resulting marker data can be reused by different groups investigating the genetic basis of different traits. Because the same genotypes are observed and scored for a wide range of traits in different environments, these populations represent a unique resource to investigate pleiotropy. Here, we assembled a set of 234 separate trait datasets for the Sorghum Association Panel, a group of 406 sorghum genotypes widely employed by the sorghum genetics community. Comparison of genome-wide association studies (GWAS) conducted with two independently generated marker sets for this population demonstrates that existing genetic marker sets do not saturate the genome and likely capture only 35–43% of potentially detectable loci controlling variation for traits scored in this population. While limited evidence for pleiotropy was apparent in cross-GWAS comparisons, a multivariate adaptive shrinkage approach recovered both known pleiotropic effects of existing loci and new pleiotropic effects, particularly significant impacts of known dwarfing genes on root architecture. In addition, we identified new loci with pleiotropic effects consistent with known trade-offs in sorghum development. These results demonstrate the potential for mining existing trait datasets from widely used community association populations to enable new discoveries as new, denser genetic marker datasets are generated for these populations.
Keywords: GWAS; community genetic resources; pleiotropy; quantitative genetics; sorghum
Year: 2021 PMID: 34100945 PMCID: PMC9335936 DOI: 10.1093/genetics/iyab087
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.402
Papers scoring traits in the sorghum association population
| Reference | Study type | Phenotypes scored | # of SAP accessions evaluated | Genetic associations? | Trait data online? |
|---|---|---|---|---|---|
| | Vegetative | 8 | 377 | AS^a | No |
| | Height & Inflorescence | 6 | 378 | AS | Yes |
| | Biomass Composition | 2 | 377 | No | Yes |
| | Drought Stress | 10 | 300 | No | No |
| | Grain Quality | 10 | 300 | AS | No |
| | Tannins | 1 | 161 | AS | Yes |
| | Various | 6 | 355 | GWAS | Used public data |
| | Flavonoids | 2 | 259–387 | GWAS | Yes |
| | Phosphorus Deficiency | 10 | 287 | AS | No |
| | Tillering and inflorescence | 9 | 377 | GWAS | No |
| | Plant Architecture | 6 | 315 | GWAS | Released here |
| | Polyphenols | 3 | 308 | GWAS | No |
| | Disease Resistance | 6 | 300 | GWAS | Yes, but IDs ambiguous |
| | Drought Stress | 28 | 267 | GWAS | Yes |
| | Disease Resistance | 3 | 177 | No | Yes |
| | Grain Quality | 12 | 100 | No | Yes |
| | Height | 3 | 307 | GWAS | Used public data |
| | Height & Inflorescence | 12 | 354 | GWAS | No |
| | Seed Size | 6 | 354 | GWAS | Yes |
| | Various | 13 | 378 | GWAS | Released here |
| | Elemental Abundance | 22 | 407 | GWAS | Yes |
| | Plant Architecture | 9 | 315 | GWAS | Released here |
| | Grain Quality | 10 | 378 | GWAS | Released here |
| | Heat Stress | 2 | 374 | GWAS | No |
| | Heat and Cold Stress | 12 | 300 | GWAS | Yes |
| | Vegetative (HTP) | 4 | 307 | GWAS | No |
| | Photosynthesis/Cold Stress | 24 | 304 | GWAS | No |
| | Elemental Abundance | 18 | 100 | No | Yes |
| | Polyphenols | 3 | 266 | GWAS | Yes |
| | Grain Quality | 4 | 265 | GWAS | Yes |
| | Disease Resistance | 2 | 335 | GWAS | Yes |
| | Vegetative (HTP) | 6 | 325 | GWAS | Released here |
| | Disease Resistance | 2 | 331 | GWAS | Yes |
| | Mycotoxin | 2 | 98 | No | Yes, but IDs ambiguous |
| | Cold Stress | 13 | 351 | GWAS | Yes |
| | Various | 4 | 334 | GWAS | Used public data |
| | Disease Resistance | 1 | 359 | GWAS | Yes |
| | Root Architecture (HTP) | 12 | 294 | GWAS | Yes |
| | Inflorescence (HTP) | 8 | 302 | GWAS | Yes |
| | Height | 1 | 357 | GWAS | Yes |

^a “AS” denotes association studies that were not conducted using genome-wide sets of markers; “GWAS” denotes association studies that did use genome-wide sets of markers.
Figure 1. Characteristics of Sorghum Association Panel trait datasets. (A) Geographic distribution of trials where trait datasets were collected. Circle size indicates the number of traits collected at a given location; circle color indicates the types of trait datasets collected there, with the color key given in Panel (B). A set of 30 traits scored in Nova Porteirinha, Minas Gerais, Brazil (Queiroz ; Paiva ) is not visible in this panel. (B) Representation of seven broad phenotypic categories among the 234 traits collected here. Category assignments for individual traits are provided in Supplementary Table S1. (C) Distributions of narrow-sense heritability values, calculated using the 2020 genetic marker dataset (Miao ), across the same seven broad phenotypic categories shown in Panel (B).
Figure 2. Combined Manhattan plots comparing marker-trait associations (MTAs) identified using different marker datasets for the same individuals. (A) Combined Manhattan plot for 36 traits with at least one significant GWAS hit when analyzed using the 2013 genetic marker dataset and considering data from only those 304 sorghum lines genotyped in both the 2013 and 2020 datasets. Green lines topped with circles indicate the physical position and -log10(P-value) of the single most significant SNP within a GWAS peak identified for a particular trait. Text labels for individual traits employ the trait names provided in Supplementary Table S1. The dashed red line indicates the cutoff for statistical significance calculated from the effective SNP number in the 2013 genetic marker dataset. (B) Combined Manhattan plot for 40 traits with at least one significant GWAS hit when analyzed using the 2020 genetic marker dataset and considering data from only those 304 sorghum lines genotyped in both the 2013 and 2020 datasets. Locations and P-values of the most significant SNP within each peak and the statistical significance cutoff are labeled as in Panel (A). Blue labels indicate peaks shared between datasets. Red labels indicate traits where at least one significant GWAS peak was identified in both datasets but none of the peaks are shared between datasets. Black labels indicate traits where one or more significant GWAS peaks were identified with one of the marker datasets but none were identified with the other. (C) Relationship between the identification of one or more significant GWAS peaks for a given trait dataset in each of the two genetic marker datasets. (D) Number of GWAS peaks identified using both, or only one, of the two genetic marker datasets tested.
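The caption describes two computations: a significance cutoff derived from the effective SNP number, and the selection of the single most significant SNP within each peak. The sketch below is one plausible reading, not the authors' implementation. It assumes a Bonferroni-style correction (alpha divided by the effective SNP number) with alpha = 0.05, neither of which is stated in the caption, and every SNP identifier and P-value is hypothetical.

```python
import math

ALPHA = 0.05  # assumed family-wise error rate; not stated in the caption


def significance_cutoff(effective_snp_number, alpha=ALPHA):
    """Return the -log10(P) cutoff for a Bonferroni-style correction that
    divides alpha by the effective number of independent SNPs (assumption)."""
    if effective_snp_number <= 0:
        raise ValueError("effective_snp_number must be positive")
    return -math.log10(alpha / effective_snp_number)


def summit_snp(peak):
    """Given a peak as a list of (snp_id, p_value) pairs, return the most
    significant SNP as (snp_id, -log10(p_value))."""
    snp_id, p_value = min(peak, key=lambda snp: snp[1])
    return snp_id, -math.log10(p_value)


# Hypothetical inputs, for illustration only.
cutoff = significance_cutoff(100_000)
peak = [("snp_a", 3e-6), ("snp_b", 2e-9), ("snp_c", 4e-7)]
summit_id, summit_score = summit_snp(peak)
is_significant = summit_score > cutoff
```

Under these assumptions, a peak is plotted only when its summit SNP exceeds the cutoff, which corresponds to a green line rising above the dashed red line in the figure.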
Figure 3. Combined Manhattan plot for GWAS using all 343 individuals genotyped in the 2020 SNP set. (A) Combined Manhattan plot for 43 traits with at least one significant GWAS hit when analyzed using the 2020 genetic marker dataset and all 343 sorghum lines genotyped in that dataset. Green lines topped with circles indicate the physical position and -log10(P-value) of the single most significant SNP within a GWAS peak identified for a given trait. Text labels employ the trait names provided in Supplementary Table S1. The dashed red line indicates the cutoff for statistical significance calculated from the effective SNP number in the 2020 genetic marker dataset. The lower panel indicates the positions of a set of cloned sorghum mutants, taken from (Boyles ). Estimates of LD among the summit SNPs of each peak are shown in Supplementary Figure S5A. (B) Summary of results from GWAS analysis using all 343 SAP lines included in the 2020 marker dataset. (C) Number of traits where one or more significant GWAS peaks were identified in the 2013 dataset considering only accessions shared with the 2020 dataset, the 2020 dataset considering only accessions shared with the 2013 dataset, and/or all accessions in the 2020 dataset.
Summary of the GWAS results when data from all 343 accessions in the 2020 marker set are employed
| Chr # | GWAS hits | Unique genomic regions^a | Single trait peaks | Two trait peaks | ≥3 trait peaks |
|---|---|---|---|---|---|
| Chr 1 | 5 | 5 | 5 | 0 | 0 |
| Chr 2 | 8 | 2 | 1 | 0 | 1 |
| Chr 3 | 10 | 10 | 10 | 0 | 0 |
| Chr 4 | 2 | 1 | 0 | 1 | 0 |
| Chr 5 | 4 | 3 | 2 | 1 | 0 |
| Chr 6 | 9 | 4 | 3 | 0 | 1 |
| Chr 7 | 4 | 3 | 2 | 1 | 0 |
| Chr 8 | 1 | 1 | 1 | 0 | 0 |
| Chr 9 | 13 | 2 | 1 | 0 | 1 |
| Chr 10 | 0 | 0 | 0 | 0 | 0 |
| Total | 56 | 31 | 25 | 3 | 3 |
^a GWAS hits within 500 kb of each other on the genome were merged into a single interval. Given the low incidence of observed pleiotropy, a conservatively large interval (greater than the 50–350 kb range reported for LD decay in sorghum) was selected to reduce the incidence of false negatives (i.e. true cases of pleiotropic effects misclassified as independent signals from distinct loci).
^b The locus on Chr2 associated with ≥3 traits is associated with seven total traits, all related to seed composition: SeedProteinSum_H, SeedStarch&SumProtFatRatio_H, SeedStarchProtFatSum_H, SeedProtein_H, SeedStarchProteinRatio_H, SeedStarchFatRatio_H, and SeedOil_H.
^c While these peaks were identified for multiple datasets, the datasets all represent independent measures of similar phenotypes.
^d The locus on Chr6 associated with ≥3 traits is associated with six total traits, all related to plant height: PlantVolume_V, PlantHeight_U, PlantHeight_V, HeightFlagLeaf_T, PlantHeight_W, and PanicleHeight_C.
^e The locus on Chr9 associated with ≥3 traits is associated with 13 total traits, all related to plant height: PlantArea_V, PanicleHeight_C, PanicleExsertion_U, HeightFlagLeaf_T, PlantHeight_T, PreflagHeight_C, HeightFlagLeaf_W, PlantHeight_W, FlagToApex_W, PlantVolume_V, PlantHeight_U, PlantHeight_V, and PlantSurfaceArea_V.
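The first table footnote describes merging GWAS hits within 500 kb of each other into a single interval, which yields the single-trait, two-trait, and ≥3-trait columns. A minimal sketch of that step follows. It assumes single-linkage chaining (a hit joins a region when it lies within 500 kb of the region's last hit), which is one reading of "within 500 kb of each other" and not necessarily the rule the authors used. The function names and all positions are hypothetical.

```python
from collections import defaultdict

MERGE_WINDOW_BP = 500_000  # merge distance given in the table footnote


def merge_hits(hits, window=MERGE_WINDOW_BP):
    """Group (chrom, pos, trait) hits into regions per chromosome.

    Assumption: single-linkage chaining, i.e. a hit is added to the current
    region when it is within `window` bp of that region's last hit.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, trait in hits:
        by_chrom[chrom].append((pos, trait))

    regions = []
    for chrom in sorted(by_chrom):
        current = None
        for pos, trait in sorted(by_chrom[chrom]):
            if current is not None and pos - current["end"] <= window:
                current["end"] = pos
                current["traits"].add(trait)
            else:
                current = {"chrom": chrom, "start": pos, "end": pos,
                           "traits": {trait}}
                regions.append(current)
    return regions


def classify(regions):
    """Count regions by the number of distinct traits they contain."""
    counts = {"single": 0, "two": 0, "three_plus": 0}
    for region in regions:
        n = len(region["traits"])
        key = "single" if n == 1 else "two" if n == 2 else "three_plus"
        counts[key] += 1
    return counts


# Hypothetical hits: trait names come from the footnotes, positions are invented.
example_hits = [
    ("Chr6", 1_000_000, "PlantHeight_U"),
    ("Chr6", 1_400_000, "PlantHeight_V"),
    ("Chr6", 1_800_000, "PanicleHeight_C"),
    ("Chr6", 5_000_000, "PlantVolume_V"),
    ("Chr9", 100, "PlantHeight_T"),
    ("Chr9", 500_200, "FlagToApex_W"),
]
example_regions = merge_hits(example_hits)
example_counts = classify(example_regions)
```

A wider window merges more hits into shared regions, so it is less likely to split one pleiotropic locus into several apparently independent signals. This matches the footnote's stated reason for choosing a conservatively large interval.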
Figure 4. Pleiotropic analysis of SAP phenotypes. (A) Markers assigned significant Bayes factor values in MashR analysis. Green lines topped with circles indicate the physical position and log10 Bayes factor of the most significant SNP within a peak identified for a pleiotropic locus. Text labels indicate the position and name of the most significant marker within each peak. The number of trait datasets significantly associated with a marker at lfsr < 0.001 is indicated in brackets. Note that trait datasets include both measurements of different traits and the same trait scored across different environments in different studies. The dashed red line indicates the cutoff for statistical significance at a log10 Bayes factor of 4. Estimates of linkage disequilibrium among the summit SNPs of each distinct peak are shown in Supplementary Figure S5B. (B) Distribution of the number of trait datasets significantly associated with each unique peak. (C) Distribution of effect sizes and directions for a subset of the 60 trait datasets for which the genetic marker S04_6082617 has a significant effect (lfsr < 0.001). To aid readability, only the subset of trait datasets where the effect size is >0.05 or <−0.05 is shown. Bar thickness is proportional to the relative estimated statistical significance of each association: the thickest bars mark the most significantly associated trait for a given marker, and the thinnest mark the least significantly associated trait that still passed all filtering criteria. (D) Distribution of effect sizes and directions for a subset of the 87 trait datasets for which the genetic marker S06_60660773 has significant effects. Cutoffs for visualization are the same as applied for panel C. (E) Distribution of effect sizes and directions for a subset of the 90 trait datasets for which the genetic marker S08_55231823 has significant effects. Cutoffs for visualization are the same as applied for panels C and D.
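The filtering described in this caption can be sketched in two steps: count the trait datasets associated with a marker at lfsr < 0.001, then keep only those with a sizeable effect in either direction for plotting. The sketch assumes the per-trait lfsr values and effect estimates have already been exported from the MashR analysis into plain dictionaries. It reads the display cutoff as |effect| > 0.05, which is an interpretation of the caption. The function names and all numeric values are hypothetical.

```python
LFSR_CUTOFF = 0.001     # significance cutoff given in the caption
MIN_ABS_EFFECT = 0.05   # display cutoff, read here as |effect| > 0.05 (assumption)


def significant_traits(lfsr_by_trait, cutoff=LFSR_CUTOFF):
    """Return the sorted names of trait datasets with lfsr below the cutoff."""
    return sorted(t for t, value in lfsr_by_trait.items() if value < cutoff)


def traits_to_plot(effect_by_trait, lfsr_by_trait,
                   cutoff=LFSR_CUTOFF, min_abs_effect=MIN_ABS_EFFECT):
    """Keep significant trait datasets whose effect is large in either direction."""
    return {
        trait: effect_by_trait[trait]
        for trait in significant_traits(lfsr_by_trait, cutoff)
        if abs(effect_by_trait[trait]) > min_abs_effect
    }


# Hypothetical values for one marker; trait names are borrowed from the table
# footnotes and the numbers are invented for illustration.
effects = {"PlantHeight_U": 0.21, "SeedOil_H": -0.08,
           "PanicleHeight_C": 0.02, "SeedProtein_H": 0.30}
lfsr = {"PlantHeight_U": 1e-6, "SeedOil_H": 4e-4,
        "PanicleHeight_C": 2e-5, "SeedProtein_H": 0.2}

n_associated = len(significant_traits(lfsr))  # the bracketed count in panel A
plotted = traits_to_plot(effects, lfsr)       # the bars drawn in panels C to E
```

Keeping the two steps separate reflects the caption's distinction between the bracketed count, which includes every significantly associated trait dataset, and the smaller subset shown in the effect-size panels.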