Tae-Sung Kim1,2, Qiang He1, Kyu-Won Kim1,2, Min-Young Yoon1, Won-Hee Ra1, Feng Peng Li1, Wei Tong1, Jie Yu1, Win Htet Oo1, Buung Choi1, Eun-Beom Heo1, Byoung-Kook Yun3, Soon-Jae Kwon4,5, Soon-Wook Kwon6, Yoo-Hyun Cho7, Chang-Yong Lee3, Beom-Seok Park8, Yong-Jin Park9,10.
Abstract
BACKGROUND: Rice germplasm collections continue to grow in number and size around the world. Since maintaining and screening such massive resources remains challenging, it is important to establish practical methods to manage them. A core collection, by definition, refers to a subset of the entire population that preserves the majority of genetic diversity, enhancing the efficiency of germplasm utilization.
Keywords: Core collection; GWAS; Germplasm; INDEL; Rice; SNP; Whole-genome resequencing
Year: 2016; PMID: 27229151; PMCID: PMC4882841; DOI: 10.1186/s12864-016-2734-y
Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
Summary of sequencing statistics for KRICE_CORE
| Variation range | Sequence reads | Clean read rate (%) | Deduplicated reads | Deduplication rate (%) | Mapping rate (%) | Average depth (×) |
|---|---|---|---|---|---|---|
| Max | 59,334,970 | 95.1 | 54,800,574 | 99.4 | 98.9 | 13.9 |
| Min | 29,090,954 | 86.4 | 25,021,515 | 95.9 | 89.5 | 6.3 |
| Average | 39,753,733.9 | 91.6 | 36,006,200 | 98.8 | 95.5 | 9.0 |
Summary of chromosomal SNP and INDEL distribution for KRICE_CORE
| Chromosome | Total SNP count | Total SNP density (a) | High-quality SNP count | High-quality SNP density | Total INDEL count | Total INDEL density | High-quality INDEL count | High-quality INDEL density |
|---|---|---|---|---|---|---|---|---|
| 1 | 1,492,529 | 34.5 | 245,749 | 5.7 | 157,701 | 3.6 | 34,662 | 0.8 |
| 2 | 1,194,215 | 33.2 | 219,868 | 6.1 | 123,316 | 3.4 | 28,870 | 0.8 |
| 3 | 1,117,000 | 30.7 | 227,031 | 6.2 | 116,347 | 3.2 | 30,925 | 0.8 |
| 4 | 1,482,884 | 41.8 | 151,464 | 4.3 | 118,490 | 3.3 | 19,101 | 0.5 |
| 5 | 1,136,915 | 37.9 | 190,680 | 6.4 | 98,439 | 3.3 | 23,019 | 0.8 |
| 6 | 1,279,651 | 41.0 | 175,920 | 5.6 | 117,347 | 3.8 | 22,733 | 0.7 |
| 7 | 1,268,980 | 42.7 | 152,153 | 5.1 | 112,520 | 3.8 | 20,766 | 0.7 |
| 8 | 1,323,731 | 46.5 | 148,873 | 5.2 | 113,720 | 4.0 | 18,831 | 0.7 |
| 9 | 969,987 | 42.2 | 128,492 | 5.6 | 85,643 | 3.7 | 16,566 | 0.7 |
| 10 | 1,081,145 | 46.6 | 141,153 | 6.1 | 91,306 | 3.9 | 17,727 | 0.8 |
| 11 | 1,425,491 | 49.1 | 148,897 | 5.1 | 128,449 | 4.4 | 17,902 | 0.6 |
| 12 | 1,352,277 | 49.1 | 116,249 | 4.2 | 115,237 | 4.2 | 15,040 | 0.5 |
| Total | 15,124,805 | 40.5 | 2,046,529 | 5.5 | 1,378,515 | 3.7 | 266,142 | 0.7 |
(a) Density = SNP or INDEL count divided by the length of the chromosome
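The density footnote and the Total row can be checked numerically. The sketch below recomputes the chromosome 1 SNP density and sums the per-chromosome high-quality SNP counts from the table. Two things are assumptions, not stated in the table: that densities are expressed per kb, and that chromosome 1 is 43,270,923 bp long (the IRGSP-1.0 Nipponbare assembly). The helper name `density_per_kb` is illustrative.

```python
# Assumption: densities in the table are counts per kb of chromosome length.
# Assumption: chromosome 1 length taken from the IRGSP-1.0 reference assembly.
CHR1_LENGTH_BP = 43_270_923


def density_per_kb(count: int, length_bp: int) -> float:
    """Variant count divided by chromosome length expressed in kb."""
    return count / (length_bp / 1_000)


# High-quality SNP counts for chromosomes 1-12, copied from the table.
hq_snps = [
    245_749, 219_868, 227_031, 151_464, 190_680, 175_920,
    152_153, 148_873, 128_492, 141_153, 148_897, 116_249,
]

chr1_density = density_per_kb(1_492_529, CHR1_LENGTH_BP)
hq_total = sum(hq_snps)

print(f"chr1 total SNP density: {chr1_density:.1f} per kb")
print(f"high-quality SNPs, all chromosomes: {hq_total:,}")
```

Under these assumptions the recomputed values agree with the table entries for chromosome 1 and for the high-quality SNP total.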
Fig. 1 SNP and INDEL frequency in the KRICE_CORE population. a Comparison of the mean density of SNPs and INDELs in KRICE_CORE. b SNP and INDEL distribution across various genome regions. The promoter of the genic region refers to the region 2 kb upstream of the transcription start site. c Correlation between SNP and INDEL occurrences across KRICE_CORE chromosomes
Fig. 2 Genome-wide distribution of SNPs and INDELs of KRICE_CORE. a SNP density in 50-kb windows across the KRICE_CORE genome. b Correlation between SNP and INDEL occurrence across the KRICE_CORE genome. c Functional categories of genes in SNP/INDEL-enriched regions. d Distribution of Tajima’s D values genome-wide and in the top 5% SNP-enriched regions
Fig. 3 Population structure of KRICE_CORE. a Neighbor-joining analysis among KRICE_CORE accessions using 2,046,529 HQSNPs. Landraces and weedy accessions are denoted as red triangles and yellow circles in the tree, respectively. b Pie graph showing the proportion of each subgroup. c Population structure analysis using FRAPPE. Each color represents one population. Each accession is represented by a horizontal bar, and the size of each colored segment in each bar represents the proportion contributed by each ancestral population. The K value represents the number of assumed clusters or populations
Fig. 4 Genetic diversity of KRICE_CORE. a Nucleotide diversity (π) of wild rice (O. rufipogon) vs KRICE_CORE (upper) and the resulting ROD value (Methods). b Mean RODs among the subgroups in genome-wide or domesticated regions. Error bars indicate the standard error of the mean (SEM). Sliding window analyses of π or ROD are shown for chromosome 1 with a 10-kb window
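To make the π and ROD quantities in this caption concrete, here is a minimal sketch. It assumes the commonly used definition ROD = 1 − π(cultivated) / π(wild); the caption defers the exact definition to the paper's Methods, so this may differ in detail. The sequences are invented toy data, and `nucleotide_diversity` and `reduction_of_diversity` are illustrative names.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list[str]) -> float:
    """Mean number of pairwise differences per site among aligned sequences."""
    n_sites = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(seq_x, seq_y)) for seq_x, seq_y in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def reduction_of_diversity(pi_cultivated: float, pi_wild: float) -> float:
    """Assumed definition: ROD = 1 - pi_cultivated / pi_wild."""
    return 1 - pi_cultivated / pi_wild


# Toy aligned windows (hypothetical data, not from the paper).
wild = ["ACGTACGT", "ACGAACGT", "TCGTACGA", "ACGTTCGT"]
cultivated = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGT"]

pi_wild = nucleotide_diversity(wild)
pi_cult = nucleotide_diversity(cultivated)
rod = reduction_of_diversity(pi_cult, pi_wild)

print(f"pi wild = {pi_wild}, pi cultivated = {pi_cult}, ROD = {rod}")
```

A sliding-window analysis like the one in the figure would apply the same two functions to each 10-kb window in turn; an ROD near 1 marks a window where the cultivated panel has lost most of the diversity seen in the wild sample.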
Fig. 5 Genome-wide association studies of ‘pericarp color’ (a), ‘amylose content’ (b), ‘rice seed protein content’ (c), and ‘number of panicles per plant’ (d). Manhattan plots from the linear model (for a) or the mixed linear model (for b–d) show negative log10-transformed P-values plotted against positions on each of the twelve chromosomes. A red horizontal line indicates the genome-wide significance threshold
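The y-axis of a Manhattan plot is simply −log10(P). As a small illustration, the sketch below applies that transform to the peak P-values reported for each trait in the association table of this record. The dictionary labels are illustrative, and no significance threshold is computed because its value is not given here.

```python
import math

# Peak P-values per trait, copied from the association table.
peak_p_values = {
    "Pericarp color (chr 7)": 3.05e-19,
    "Amylose content (chr 6)": 5.55e-09,
    "Protein content (chr 7)": 1.14e-06,
    "Panicle number (chr 3)": 3.13e-08,
}

# Transform each P-value to the Manhattan-plot scale.
neg_log10 = {trait: -math.log10(p) for trait, p in peak_p_values.items()}

# Print traits from strongest to weakest signal.
for trait, score in sorted(neg_log10.items(), key=lambda item: -item[1]):
    print(f"{trait}: -log10(P) = {score:.2f}")
```

On this scale the pericarp color peak stands well above the others, which matches its much smaller P-value in the table.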
Genome-wide significant association signals of agronomic traits using the linear and compressed MLM
| Traits | Chr | Position | Major allele (count) | Minor allele (count) | MAF (a) | R² | P value | Known loci (reference) |
|---|---|---|---|---|---|---|---|---|
| Pericarp color | 7 | 6068072 | A(95) | G(41) | 0.30 | 0.465 | 3.05E-19 | Rc (46) |
| | 6 | 21304669 | T(93) | C(43) | 0.32 | 0.286 | 4.87E-11 | |
| | 3 | 3640060 | G(116) | A(13) | 0.12 | 0.268 | 2.48E-10 | |
| | 5 | 11479616 | C(108) | T(26) | 0.20 | 0.255 | 7.57E-10 | |
| Amylose content | 6 | 1765761 | G(73) | T(61) | 0.46 | 0.555 | 5.55E-09 | Wx (21) |
| Protein content | 7 | 24810573 | A(77) | G(59) | 0.43 | 0.348 | 1.14E-06 | |
| | 2 | 26791708 | C(110) | T(25) | 0.19 | 0.347 | 1.27E-06 | |
| | 6 | 18133739 | G(93) | A(42) | 0.31 | 0.339 | 2.15E-06 | |
| Panicle number | 3 | 30877559 | C(128) | T(9) | 0.07 | 0.404 | 3.13E-08 | |
| | 9 | 7760141 | C(128) | T(9) | 0.07 | 0.386 | 1.09E-07 | |
(a) Minor allele frequency
Fig. 6 Regions of the associated signals near the Rc (a) and Wx (b) regions. The top of each panel shows the ROD for a 1-Mb window around the peak SNPs. Negative log10 P-values for each SNP from the linear or compressed mixed linear model are plotted. Blue or dark orange dashed horizontal lines indicate the genome-wide significance cutoff for the Rc and Wx regions, respectively. The bottom of each panel within the blue or red box denotes the range of SNPs over the cutoff. c Mean ROD values for 400 kb around the Rc and Wx regions. d LD decay plots of the regions compared to genome-wide LD decay. e and f LD blocks of 400 kb on each side of the association peak for the Rc (e) and Wx (f) regions