Jiancheng Wang, Yajing Guan, Yang Wang, Liwei Zhu, Qitian Wang, Qijuan Hu, Jin Hu.
Abstract
A core collection is an ideal resource for genome-wide association studies (GWAS), and a subcore collection is a subset of a core collection. This study proposes a strategy, based on Monte Carlo simulation, for finding the optimal sampling percentage of a plant subcore collection. A cotton germplasm group of 168 accessions with 20 quantitative traits was used to construct subcore collections. A mixed linear model approach was used to eliminate the environment effect and the GE (genotype × environment) effect. The least distance stepwise sampling (LDSS) method, combined with 6 commonly used genetic distances and the unweighted pair-group average (UPGMA) cluster method, was adopted to construct the subcore collections. A homogeneous population assessing method was used to assess the validity of 7 evaluating parameters of a subcore collection. Monte Carlo simulation was conducted on the sampling percentage, the number of traits, and the evaluating parameters. A new method for "distilling free-form natural laws from experimental data" was adopted to find the best formula for determining the optimal sampling percentage. The results showed that the coincidence rate of range (CR) was the most valid evaluating parameter and was suitable as a threshold for finding the optimal sampling percentage. Principal component analysis showed that the subcore collections constructed at the optimal sampling percentages calculated by the present strategy were well representative.
Year: 2014 PMID: 24574893 PMCID: PMC3918405 DOI: 10.1155/2014/503473
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
The number of homogeneous populations in Tukey's test (α = 0.05) for the 7 evaluating parameters in each germplasm population, at sampling percentages from 10% to 30%. Columns are the 6 genetic distances.
| Parameter | Euclid | Seuclid | Mahal | Cityblock | Cosine | Correlation |
|---|---|---|---|---|---|---|
| MD | 3 | 4 | 2 | 3 | 1 | 1 |
| VD | 1 | 3 | 3 | 1 | 1 | 1 |
| CR | 8 | 7 | 5 | 8 | 6 | 2 |
| VR | 15 | 11 | 6 | 12 | 1 | 1 |
| CRmax | 6 | 5 | 4 | 5 | 6 | 4 |
| CRmin | 3 | 3 | 3 | 3 | 3 | 1 |
| CRmea | 3 | 5 | 3 | 2 | 1 | 1 |
| Total | 39 | 38 | 26 | 34 | 19 | 11 |
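The Total row can be checked by summing each genetic-distance column. The following is a minimal Python sketch with the counts transcribed from the table above; the variable names are illustrative only.

```python
# Counts of homogeneous populations, transcribed from the table above.
# Column order: Euclid, Seuclid, Mahal, Cityblock, Cosine, Correlation.
distances = ["Euclid", "Seuclid", "Mahal", "Cityblock", "Cosine", "Correlation"]
rows = {
    "MD":    [3, 4, 2, 3, 1, 1],
    "VD":    [1, 3, 3, 1, 1, 1],
    "CR":    [8, 7, 5, 8, 6, 2],
    "VR":    [15, 11, 6, 12, 1, 1],
    "CRmax": [6, 5, 4, 5, 6, 4],
    "CRmin": [3, 3, 3, 3, 3, 1],
    "CRmea": [3, 5, 3, 2, 1, 1],
}
reported_totals = [39, 38, 26, 34, 19, 11]

# Sum each column across the 7 evaluating parameters.
column_totals = [sum(col) for col in zip(*rows.values())]

for name, total, reported in zip(distances, column_totals, reported_totals):
    print(f"{name}: computed {total}, reported {reported}")
```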
Figure 1. The 3D surface of CR as a function of the sampling percentage and the number of traits.
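A surface like the one in Figure 1 comes from repeatedly sampling subsets and averaging CR over a grid of sampling percentages and trait counts. The sketch below illustrates only that bookkeeping, under several assumptions not taken from this record: CR is computed with the commonly used definition (the mean over traits of the subset range divided by the full-collection range, times 100), subsets are drawn at random rather than by LDSS, and the data are synthetic stand-ins for the 168 × 20 cotton matrix. The function names are hypothetical.

```python
import random

def coincidence_rate_of_range(subset, full, traits):
    """CR (%) under the assumed definition: mean range ratio over the given traits."""
    ratios = []
    for t in traits:
        full_vals = [row[t] for row in full]
        sub_vals = [row[t] for row in subset]
        ratios.append((max(sub_vals) - min(sub_vals)) / (max(full_vals) - min(full_vals)))
    return 100.0 * sum(ratios) / len(ratios)

def simulate_cr_surface(full, percentages, trait_counts, n_rep, rng):
    """Monte Carlo estimate of mean CR for every (percentage, trait count) pair."""
    n_traits_total = len(full[0])
    surface = {}
    for pct in percentages:
        size = max(2, round(len(full) * pct / 100))
        for k in trait_counts:
            total = 0.0
            for _ in range(n_rep):
                traits = rng.sample(range(n_traits_total), k)  # random trait subset
                subset = rng.sample(full, size)                # random accessions (not LDSS)
                total += coincidence_rate_of_range(subset, full, traits)
            surface[(pct, k)] = total / n_rep
    return surface

rng = random.Random(0)
# Synthetic stand-in data: 168 accessions x 20 traits (NOT the cotton data).
data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(168)]
surface = simulate_cr_surface(data, [5, 10, 20, 30], [1, 5, 20], 30, rng)

for (pct, k), cr in sorted(surface.items()):
    print(f"sampling {pct:>2}% with {k:>2} traits: mean CR = {cr:.1f}")
```

Replacing the random draw with a distance-based construction such as LDSS changes only the line that builds `subset`; the grid loop stays the same.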
The formulas distilled by Eureqa based on the simulation results of CR.
| Size<sup>a</sup> | Formula | Error<sup>b</sup> | R<sup>2</sup> <sup>c</sup> | FN<sup>d</sup> |
|---|---|---|---|---|
| 34 | | 0.075 | 0.9935 | (12) |
| 28 | | 0.075 | 0.9934 | (11) |
| 26 | | 0.075 | 0.9934 | (10) |
| 24 | | 0.076 | 0.9932 | (9) |
| 20 | | 0.078 | 0.9926 | (8) |
| 18 | | 0.083 | 0.9917 | (7) |
| 15 | | 0.090 | 0.9896 | (6) |
| 13 | | 0.114 | 0.9757 | (5) |
| 12 | | 0.120 | 0.9746 | (4) |
| 11 | | 0.161 | 0.9619 | (3) |
| 9 | | 0.180 | 0.9571 | (2) |
| 7 | | 0.389 | 0.7222 | (1) |

<sup>a</sup>The complexity of the formula; <sup>b</sup>the error of the fitted formula; <sup>c</sup>R<sup>2</sup>: the coefficient of determination; <sup>d</sup>FN: formula number.
Figure 2. The fit of the 12 formulas. The x-axis shows the index of the validation data and the y-axis shows the value of the validation data. The dots show the validation data, and the polyline shows the solution based on the selected formula. The numbers in parentheses are the formula numbers.
Figure 3. The relation curve between the sampling percentage and the number of traits when CR was set to 80%. The optimal sampling percentages were 25.01% and 6.07% when the number of traits was 1 and 20, respectively.
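The thresholding step behind Figure 3 amounts to solving CR(p, n) = 80% for the sampling percentage p at a given number of traits n. The fitted formulas are not reproduced in this record, so the sketch below uses a hypothetical placeholder `cr_model` (a made-up saturating curve, not a formula from the paper) and shows only how bisection could recover the percentage at which a monotonically increasing CR curve crosses the threshold. Its numerical results are therefore not comparable with the 25.01% and 6.07% reported above.

```python
import math

def cr_model(pct, n_traits):
    """HYPOTHETICAL placeholder for a fitted CR formula (not from the paper).

    It is chosen only because it increases monotonically with the sampling percentage.
    """
    return 100.0 * (1.0 - math.exp(-0.05 * pct * math.sqrt(n_traits)))

def optimal_percentage(n_traits, threshold=80.0, lo=0.0, hi=100.0, tol=1e-6):
    """Smallest sampling percentage with cr_model >= threshold, found by bisection."""
    if cr_model(hi, n_traits) < threshold:
        raise ValueError("threshold is not reached within the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cr_model(mid, n_traits) < threshold:
            lo = mid   # CR still below the threshold: move right
        else:
            hi = mid   # threshold reached: tighten from the right
    return hi

for n in (1, 5, 10, 20):
    print(f"{n:>2} traits: placeholder optimum = {optimal_percentage(n):.2f}%")
```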
The values of five evaluating parameters in subcore collections constructed at three sampling percentages with 20 traits.
| Subcore collection | Sampling percentage | CR | VR | CRmax | CRmin | CRmea |
|---|---|---|---|---|---|---|
| Treat<sup>a</sup> | 6.07% | 83.46 | 167.13 | 95.85 | 97.49 | 97.54 |
| Treat<sup>a</sup> | 10.00% | 89.84 | 152.09 | 97.55 | 97.89 | 99.30 |
| Treat<sup>a</sup> | 15.00% | 94.88 | 140.39 | 98.90 | 99.53 | 99.15 |
| CK<sup>b</sup> | 6.07% | 48.91 | 95.62 | 92.50 | 85.35 | 100.52 |
| CK<sup>b</sup> | 10.00% | 56.00 | 94.51 | 94.86 | 85.75 | 101.05 |
| CK<sup>b</sup> | 15.00% | 61.49 | 94.62 | 95.36 | 87.64 | 100.36 |

<sup>a</sup>Subcore collection constructed by the LDSS method based on Seuclid distance combined with the UPGMA cluster method; <sup>b</sup>subcore collection constructed by completely random selection.
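The two construction schemes compared in this table can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes that each LDSS round removes one randomly chosen member of the closest remaining pair (the first merge of any agglomerative clustering, including UPGMA, is the minimum-distance pair, so no linkage routine is needed here). It also assumes that Seuclid denotes the standardized Euclidean distance, with each trait difference divided by that trait's standard deviation. The data are synthetic and the function names are hypothetical.

```python
import math
import random

def seuclid_matrix(data):
    """Pairwise standardized Euclidean distances between accessions (rows)."""
    n, m = len(data), len(data[0])
    sds = []
    for t in range(m):
        col = [row[t] for row in data]
        mean = sum(col) / n
        sds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)))
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(((data[i][t] - data[j][t]) / sds[t]) ** 2 for t in range(m)))
            dist[i][j] = dist[j][i] = d
    return dist

def ldss_sketch(data, percentage, rng):
    """Simplified LDSS: repeatedly drop one member of the closest remaining pair."""
    dist = seuclid_matrix(data)
    keep = set(range(len(data)))
    target = max(2, round(len(data) * percentage / 100))
    while len(keep) > target:
        idx = sorted(keep)
        pairs = ((a, b) for k, a in enumerate(idx) for b in idx[k + 1:])
        i, j = min(pairs, key=lambda p: dist[p[0]][p[1]])
        keep.remove(rng.choice((i, j)))  # assumed rule: remove one of the pair at random
    return sorted(keep)

def random_selection(data, percentage, rng):
    """CK-style baseline: completely random selection of the same size."""
    target = max(2, round(len(data) * percentage / 100))
    return sorted(rng.sample(range(len(data)), target))

rng = random.Random(1)
# Synthetic stand-in data: 60 accessions x 5 traits (NOT the cotton data).
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
treat = ldss_sketch(data, 10, rng)
ck = random_selection(data, 10, rng)
print("LDSS sketch keeps:", treat)
print("Random baseline keeps:", ck)
```

Removing near-duplicates first tends to retain accessions at the edges of the trait space, which fits the higher CR values in the Treat rows than in the CK rows of the table.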
Figure 4. Principal component plots of core accessions and reserve accessions at sampling percentages of 6.07%, 10%, and 15%. The axes represent the first two principal components. The upward-pointing triangles represent the core accessions; the crosses represent the reserve accessions. The left column shows plots for the subcore collections constructed by the LDSS method based on Seuclid distance combined with the UPGMA cluster method (Treat); the right column shows plots for the subcore collections constructed by completely random selection (CK).