John Stanton-Geddes, Timothy Paape, Brendan Epstein, Roman Briskine, Jeremy Yoder, Joann Mudge, Arvind K Bharti, Andrew D Farmer, Peng Zhou, Roxanne Denny, Gregory D May, Stephanie Erlandson, Mohammed Yakub, Masayuki Sugawara, Michael J Sadowsky, Nevin D Young, Peter Tiffin.
Abstract
Genome-wide association studies (GWAS) have revolutionized the search for the genetic basis of complex traits. To date, GWAS have generally relied on relatively sparse sampling of nucleotide diversity, which is likely to bias results by preferentially sampling high-frequency SNPs not in complete linkage disequilibrium (LD) with causative SNPs. To avoid these limitations, we conducted GWAS with >6 million SNPs identified by sequencing the genomes of 226 accessions of the model legume Medicago truncatula. We used these data to identify candidate genes and the genetic architecture underlying phenotypic variation in plant height, trichome density, flowering time, and nodulation. The characteristics of candidate SNPs differed among traits: candidates for flowering time and trichome density fell in distinct clusters of high LD, and the minor allele frequencies (MAF) of candidates underlying variation in flowering time and height were significantly greater than the MAF of candidates underlying variation in other traits. Candidate SNPs tagged several characterized genes, including the nodulation-related genes SERK2, MtnodGRP3, MtMMPL1, NFP, CaML3, and MtnodGRP3A and the flowering time gene MtFD, as well as uncharacterized genes that become candidates for further molecular characterization. By comparing sequence-based candidates to candidates identified by in silico 250K SNP arrays, we provide an empirical example of how reliance on even high-density reduced-representation genomic markers can bias GWAS results. Depending on the trait, only 30-70% of the top 20 in silico array candidates were within 1 kb of sequence-based candidates. Moreover, the sequence-based candidates tagged by array candidates were heavily biased towards common variants; these comparisons underscore the need for caution when interpreting results from GWAS conducted with sparsely covered genomes.
Year: 2013 PMID: 23741505 PMCID: PMC3669257 DOI: 10.1371/journal.pone.0065688
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Proportion of variance attributed to accessions and linkage disequilibrium for the top 50 candidate SNPs.
| Trait | Among-accession/total variance | Proportion of top 50 SNPs not in LD (r² < 0.8) | Proportion of top 50 SNPs not in LD (r² < 0.3) | Linear regression r² (number of SNPs in final model) | Correlation between MAF and effect size, top 200 SNPs (P-value) |
| Height | 0.58 | 0.98 | 0.91 | 0.75 (33) | −0.12 (0.10) |
| Flowering | 0.74 | 0.89 | 0.59 | 0.64 (31) | −0.05 (0.48) |
| Trichomes | 0.45 | 0.32 | 0.68 | 0.41 (17) | 0.22 (0.002) |
| Nodules on upper roots | 0.34 | 0.95 | 0.73 | 0.65 (27) | 0.01 (0.93) |
| Nodules on lower roots | 0.35 | 0.98 | 0.86 | 0.69 (32) | −0.21 (0.002) |
| Total nodules | 0.38 | 0.99 | 0.94 | 0.74 (30) | −0.08 (0.24) |
| Strain occupancy in upper roots | 0.24 | 0.98 | 0.92 | 0.67 (27) | 0.10 (0.14) |
| Strain occupancy in lower roots | 0.22 | 0.96 | 0.87 | 0.61 (24) | 0.17 (0.01) |
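The LD columns in the table above rest on pairwise r² between candidate SNPs. As a rough illustration only, and not the authors' pipeline, the sketch below computes r² as the squared Pearson correlation between two SNPs coded 0/1 per accession, then reports the share of SNPs whose r² with every other SNP in the set stays below a threshold. The 0/1 coding, the function names, the toy genotypes, and this particular definition of "not in LD" are all assumptions made for the example.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two 0/1 genotype vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        # Monomorphic SNP: r2 is undefined; treated as 0 in this sketch.
        return 0.0
    return cov * cov / (var_a * var_b)


def proportion_not_in_ld(genotypes, threshold):
    """Share of SNPs whose r2 with every other SNP is below `threshold`.

    Assumed definition for illustration; the source does not spell out
    how "not in LD" was scored.
    """
    n = len(genotypes)
    linked = set()
    for i, j in combinations(range(n), 2):
        if r_squared(genotypes[i], genotypes[j]) >= threshold:
            linked.update((i, j))
    return (n - len(linked)) / n


# Toy data (hypothetical): 4 SNPs scored in 6 accessions.
toy_snps = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],  # identical to the first SNP, so r2 = 1
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]
share_unlinked = proportion_not_in_ld(toy_snps, threshold=0.8)
```

A stricter threshold such as 0.3 can only lower this share or leave it unchanged, because any pair linked at r² ≥ 0.8 is also linked at r² ≥ 0.3.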
Figure 1. Manhattan plots showing candidate SNPs.
(a) Flowering time, (b) nodules in lower roots, and (c) nodule occupancy in lower roots. Colors indicate the MAF of the top 200 SNPs. The Y-axis shows –log10(P) and the X-axis shows physical location along each of the 8 chromosomes, followed by uncaptured transcribed contigs (T) and unanchored BACs (U).
Characterized genes associated with candidate SNPs for nodulation traits.
| Trait | Gene name | Function |
| | Calmodulin | signaling during nodule formation |
| | | nod factor receptor, acts upstream of other nod signaling genes |
| | | signaling during defense and development |
| | | nodule development, nodule-specific expression induced by rhizobial infection |
| | | chitinase with rhizobial strain-specific expression |
| | | nod factor induced |
| | Calmodulin | signaling during nodule formation |
| | | predominant ATPase functioning in symbiotic Ca²⁺ signaling |
| | | nodule-specific glycine-rich protein, expressed primarily in young nodules, in nodule apex |
| | | NO₃⁻-dependent expression, involved in primary root growth and NO₃⁻ sensing |
| | | strongly expressed in nodules, binds |
| | | nodulin with rhizobia-signal dependent expression, affects infection thread size and number of viable bacteria inside of nodules |
Overlap in candidate SNPs identified using sequence data compared to in silico SNP arrays.
| Trait | Top 20, within 1 kb | Top 20, within 20 kb | Top 50, within 1 kb | Top 50, within 20 kb |
| Height | 14.1 (9–19) | 17.2 (12–20) | 18.9 (12–26) | 30.2 (22–38) |
| Nodule number, lower roots | 9.4 (4–15) | 11.5 (6–16) | 14.0 (6–21) | 21.2 (12–29) |
| Strain occupancy, lower roots | 6.3 (2–12) | 9.4 (4–16) | 8.4 (3–16) | 16.5 (10–25) |
Shown are the average numbers of top 20 and top 50 in silico candidate SNPs within 1 kb and 20 kb of one of the top 200 sequence-based candidates. Data are from 100 in silico 250K SNP platforms; the minimum and maximum numbers of tagged sequence candidates are in parentheses.
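The overlap counts in this table come down to asking, for each array candidate, whether any sequence-based candidate lies within a fixed distance on the same chromosome. The sketch below is one plausible way to do that count and to summarize it across platforms as a mean with minimum and maximum. It is an illustration under assumptions, not the authors' code: the `(chromosome, position)` tuples, the function names, and the toy coordinates are hypothetical.

```python
from bisect import bisect_left


def count_tagging(array_candidates, sequence_candidates, window_bp):
    """Count array candidates within `window_bp` of any sequence-based
    candidate on the same chromosome.

    Both inputs are lists of (chromosome, position) tuples.
    """
    by_chrom = {}
    for chrom, pos in sequence_candidates:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    tagged = 0
    for chrom, pos in array_candidates:
        positions = by_chrom.get(chrom, [])
        # First sequence candidate at or beyond the left edge of the window.
        i = bisect_left(positions, pos - window_bp)
        if i < len(positions) and positions[i] <= pos + window_bp:
            tagged += 1
    return tagged


def summarize(counts):
    """Return (mean, minimum, maximum) of per-platform counts."""
    return sum(counts) / len(counts), min(counts), max(counts)


# Toy coordinates (hypothetical), not taken from the study.
seq_hits = [("chr1", 1000), ("chr1", 50000), ("chr2", 7000)]
array_hits = [("chr1", 1500), ("chr1", 30500), ("chr2", 30000), ("chr3", 1000)]

within_1kb = count_tagging(array_hits, seq_hits, window_bp=1_000)
within_20kb = count_tagging(array_hits, seq_hits, window_bp=20_000)
```

Running `count_tagging` once per simulated platform and passing the resulting list to `summarize` would give values in the same "mean (min–max)" form as the table.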
Figure 2. MAF distribution of genomic and candidate SNPs (minor allele frequency >0.02) identified using sequence data and 250K SNP arrays.
Shown are (a) all assayed SNPs, (b) sequence-based candidates for height, (c) top 50 candidate SNPs from 100 in silico platforms, and (d) distributions of sequence-based candidates within 1 kb of any of the top 50 in silico candidates.
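To make the MAF >0.02 cutoff concrete, the following sketch computes a minor allele frequency from per-accession calls and applies the filter. It assumes each accession carries a single 0/1 call per SNP (reference or alternate), with `None` for missing data; that coding, the function names, and the handling of missing calls are assumptions for illustration rather than details given in the source.

```python
def minor_allele_frequency(calls):
    """MAF from 0/1 allele calls; None marks a missing call and is skipped."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / len(observed)
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(calls, min_maf=0.02):
    """True when the SNP's MAF is strictly greater than `min_maf`."""
    return minor_allele_frequency(calls) > min_maf


# With 226 accessions and no missing calls, 5 minor-allele carriers give
# 5/226 (about 0.022), which passes; 4 carriers give 4/226 (about 0.018),
# which does not.
five_carriers = [1] * 5 + [0] * 221
four_carriers = [1] * 4 + [0] * 222
```

Under these assumptions, a SNP needs the minor allele in at least 5 of the 226 accessions to clear the 0.02 threshold.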