Babu Valliyodan1, Gunvant Patil1, Peng Zeng2, Jiaying Huang2, Lu Dai2, Chengxuan Chen2, Yanjun Li2, Trupti Joshi3,4, Li Song1, Tri D Vuong1, Theresa A Musket1, Dong Xu3, J Grover Shannon5, Cheng Shifeng2, Xin Liu2, Henry T Nguyen1.
Abstract
Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild accessions, landraces, and elite lines were re-sequenced at an average depth of 17x with 97.5% coverage. Over 10 million high-quality SNPs were discovered, 35.34% of which have not been previously reported. Additionally, 159 putative domestication sweeps were identified, which include 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits, including oil and protein content, salinity, and domestication traits, resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding.
Year: 2016 PMID: 27029319 PMCID: PMC4814817 DOI: 10.1038/srep23598
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Phylogeny and population structure of soybean lines.
(a) Phylogenetic tree constructed using SNP data. (b) Principal component analysis (PCA) of the 106 soybean lines, conducted using 10,417,285 SNPs. (c) Bayesian clustering (STRUCTURE, K = 3) of soybean accessions. (d) Summary of nucleotide diversity and population divergence: values for each group are measures of nucleotide diversity, and values between pairs of groups indicate population divergence (Fst).
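The Fst values in panel (d) summarize how far allele frequencies have diverged between pairs of groups. This extract does not say which estimator the authors used, so the sketch below substitutes Hudson's estimator (in its ratio-of-sums form) purely for illustration. The function name `hudson_fst` and the toy allele frequencies are hypothetical and are not taken from the study.

```python
def hudson_fst(sites):
    """Hudson's Fst computed as a ratio of sums over biallelic sites.

    `sites` is an iterable of (p1, n1, p2, n2): the sample allele frequency
    and the number of sampled chromosomes in population 1 and population 2.
    This estimator is an assumption made for illustration; the source does
    not state which Fst estimator was used.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, n1, p2, n2 in sites:
        if n1 < 2 or n2 < 2:
            raise ValueError("each population needs at least 2 chromosomes")
        # Squared frequency difference, corrected for sampling variance.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        # Probability that one allele drawn from each population differs.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return float("nan")  # no between-population variation to partition
    return numerator / denominator


# Toy example (made-up frequencies): one fixed difference, one shared site.
toy_sites = [(1.0, 20, 0.0, 20), (0.5, 20, 0.5, 20)]
print(round(hudson_fst(toy_sites), 3))
```

Summing numerators and denominators across sites before dividing, rather than averaging per-site ratios, keeps rare variants from dominating the estimate.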
Figure 2. LD decay of wild, landrace and elite soybeans.
LD decay determined by squared correlations of allele frequencies (r²) against distance between polymorphic sites in elite (green), landrace (blue) and wild (red) soybeans.
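The squared correlation r² between two biallelic sites can be sketched as follows. This is a minimal illustration that assumes phased haplotypes coded as 0/1; the function name `ld_r2` and the example haplotypes are hypothetical, and the software the authors actually used is not stated in this extract. An LD decay curve is then typically obtained by averaging r² over SNP pairs binned by physical distance.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic sites.

    `hap_a` and `hap_b` are equal-length sequences of 0/1 alleles, one entry
    per sampled chromosome (phased haplotypes are assumed).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # freq. of allele 1 at site A
    p_b = sum(hap_b) / n                                  # freq. of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # freq. of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # r^2 is undefined when a site is monomorphic
    return d * d / denom


# Made-up haplotypes for illustration only.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly correlated sites
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent sites
```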
Summary of SNP statistics in soybean genotypes and the distribution of SNPs in whole genome and genic regions.
| Total SNPs | θπ (10⁻³) | θw (10⁻³) | Intergenic | Intron | 5′-UTR | 3′-UTR | Exon (total) | Exon (nonsynonymous) | Exon (synonymous) | Exon (nonsyn/syn) |
|---|---|---|---|---|---|---|---|---|---|---|
| 8,106,944 | 2.79 | 2.34 | 6,688,631 | 891,196 | 86,445 | 134,627 | 306,045 | 177,729 | 128,485 | 1.38 |
| 7,022,002 | 1.78 | 1.49 | 5,812,993 | 753,093 | 76,649 | 112,834 | 266,433 | 158,476 | 108,082 | 1.47 |
| 7,148,434 | 1.6 | 1.4 | 5,921,685 | 764,306 | 77,156 | 113,474 | 271,813 | 162,697 | 109,271 | 1.49 |
| 8,430,864 | 1.9 | 1.49 | 6,989,257 | 898,983 | 90,548 | 133,771 | 318,305 | 191,266 | 127,215 | 1.50 |
*SNPs located in overlapping regions of different transcripts were annotated independently. Some SNPs are synonymous in one transcript and non-synonymous in another, overlapping transcript, and vice versa. Thus, the sum of synonymous and non-synonymous SNPs is greater than the number of SNPs in the CDS regions.
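The arithmetic behind the table can be checked directly. The sketch below copies the four rows as printed (group labels are not available in this extract, so rows are identified only by position) and verifies that the region counts add up to the total, that the nonsynonymous/synonymous ratio matches the reported value, and that the nonsynonymous plus synonymous counts exceed the exon count, as the footnote describes. The names `ROWS` and `check_row` are hypothetical.

```python
# Rows copied from the table, in table order:
# (total, intergenic, intron, utr5, utr3, exon, nonsyn, syn, reported_ratio)
ROWS = [
    (8106944, 6688631, 891196, 86445, 134627, 306045, 177729, 128485, 1.38),
    (7022002, 5812993, 753093, 76649, 112834, 266433, 158476, 108082, 1.47),
    (7148434, 5921685, 764306, 77156, 113474, 271813, 162697, 109271, 1.49),
    (8430864, 6989257, 898983, 90548, 133771, 318305, 191266, 127215, 1.50),
]


def check_row(row):
    """Return consistency checks for one table row."""
    total, intergenic, intron, utr5, utr3, exon, nonsyn, syn, reported = row
    return {
        # The five genomic regions should partition the total SNP count.
        "regions_sum_to_total": intergenic + intron + utr5 + utr3 + exon == total,
        # Recomputed ratio, rounded to the two decimals used in the table.
        "ratio_matches": round(nonsyn / syn, 2) == reported,
        # SNPs annotated as both nonsynonymous and synonymous
        # (in different overlapping transcripts), per the footnote.
        "counted_in_both_classes": nonsyn + syn - exon,
    }


for index, row in enumerate(ROWS, start=1):
    print(index, check_row(row))
```

The last value is small relative to the exon totals in every row, which is consistent with the footnote's explanation that only SNPs in overlapping transcripts are counted twice.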
Figure 3. Summary of resequencing data of 106 soybean germplasm.
Figure 4. Genetic diversity (π) of a domestication region.
Diversity pattern on Gm15 for G. soja, landraces and elite cultivars in a region associated with yield and flower number related QTLs (ref. 48) (4.5–4.9 Mb). This region was previously identified as a large QTL (3.1–6.7 Mb) containing 47 genes. In the present study the region is narrowed down to 4.5–4.9 Mb and contains 5 genes. Of these, Glyma15g00750 showed relatively lower π in cultivated soybean and is functionally annotated as a bHLH/circadian protein, suggesting that this gene might be involved in yield-related traits in cultivated soybean.
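Nucleotide diversity (π) in a region is commonly computed as the average number of pairwise differences per site, and a drop in π in cultivated lines relative to wild lines is the kind of signal the figure describes. The following is a minimal sketch for biallelic SNPs, assuming every site in the window was callable. The function names, window coordinates, and allele counts are hypothetical and are not values from the study.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise differences at one biallelic site.

    `alt_count` is the number of chromosomes carrying the alternate allele
    out of `n_chrom` sampled chromosomes.
    """
    if n_chrom < 2:
        raise ValueError("need at least 2 sampled chromosomes")
    if not 0 <= alt_count <= n_chrom:
        raise ValueError("alt_count must lie between 0 and n_chrom")
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(sites, n_chrom, start, end):
    """Per-base nucleotide diversity in the half-open window [start, end).

    `sites` is an iterable of (position, alt_count). Dividing by the window
    length assumes all bases in the window were callable.
    """
    if end <= start:
        raise ValueError("window end must be greater than start")
    total = sum(
        site_pi(alt_count, n_chrom)
        for position, alt_count in sites
        if start <= position < end
    )
    return total / (end - start)


# Made-up allele counts for two groups over the same 1 kb window.
wild = [(120, 10), (480, 7), (900, 12)]
cultivated = [(120, 1), (480, 0), (900, 2)]
print(window_pi(wild, 20, 0, 1000), window_pi(cultivated, 20, 0, 1000))
```

Sliding such a window along the chromosome for each group separately gives diversity tracks that can be compared region by region.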