Patrick Thorwarth, Eltohamy A A Yousef, Karl J Schmid.
Abstract
Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding.
Keywords: GenPred; Genomic Selection; Shared Data Resources; cauliflower; gene bank; genome-wide association study; genomic prediction; genotyping-by-sequencing
Year: 2018; PMID: 29255118; PMCID: PMC5919744; DOI: 10.1534/g3.117.300199
Source DB: PubMed; Journal: G3 (Bethesda); ISSN: 2160-1836; Impact factor: 3.154
Figure 1. (A) Discriminant analysis of principal components (DAPC) plot for the five clusters inferred with the k-means algorithm (Jombart and Ahmed 2011). (B) Boxplots of the number of days to budding for each DAPC-inferred cluster. Letters above the boxplots show Tukey test results; clusters sharing a letter do not differ significantly from each other. Values within the boxplots give the mean time to budding for each cluster.
Figure 2. Linkage disequilibrium (LD) decay in the whole population (A) and in clusters 1–5 (B–F). The dashed horizontal line indicates the average background LD across all chromosomes of the respective population. The dashed vertical line indicates the maximum distance between linked markers and serves as the reference point for the LD decay.
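To illustrate the quantity plotted in an LD decay figure, here is a minimal sketch of one common way to measure LD: the squared Pearson correlation (r²) between the allele dosages of two markers, averaged within distance bins. The record does not state which LD estimator, bin widths, or software the authors used, so the function names, the 0/1/2 dosage coding, and the binning are assumptions made for illustration only.

```python
import numpy as np

def pairwise_r2(genotypes, positions):
    """Return (distance, r2) arrays for all marker pairs on one chromosome.

    genotypes: (n_individuals, n_markers) array of 0/1/2 allele dosages
               (assumed coding; monomorphic markers must be removed first).
    positions: (n_markers,) array of base-pair positions.
    """
    corr = np.corrcoef(genotypes, rowvar=False)       # marker-by-marker Pearson r
    i, j = np.triu_indices(genotypes.shape[1], k=1)   # visit each pair once
    return np.abs(positions[j] - positions[i]), corr[i, j] ** 2

def binned_decay(dist, r2, bin_edges):
    """Mean r2 per distance bin; bins without any marker pair give NaN."""
    idx = np.digitize(dist, bin_edges) - 1
    return np.array([r2[idx == b].mean() if np.any(idx == b) else np.nan
                     for b in range(len(bin_edges) - 1)])
```

A background level comparable to the dashed horizontal line could be approximated by applying the same r² calculation to pairs of markers on different chromosomes, although that is also an assumption about how the reference level was obtained.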
Overview of significant associations detected with EMMAX and MLMM
| Trait | Chr | Pos | Method | Rank (RR) | Rank (BayesB) | Gene ID | Ortholog | Description | Phenotypic variance explained (%) | Genetic variance explained (%) | MAF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Apical Length | 2 | 26,787,029 | MLMMI | 30,559 | 29,378 | Bol035969 | AtSec6 | Cell growth | 4.1 | 37.3 | 0.05 |
| Apical Length | 3 | 20,986,662 | MLMMI | 2397 | 1712 | Bol035507 | AT1G53730 | Protein coding | 11.6 | 105.3 | 0.03 |
| Apical Length | 6 | 4,914,166 | MLMMI | 4902 | 1785 | Bol032997 | — | Protein kinase | 10.6 | 96.0 | 0.08 |
| Apical Length | 6 | 34,494,415 | MLMMU | 2111 | 1194 | Bol040102 | AT1G71780 | Biological processes | 2.2 | 20.3 | 0.13 |
| Cluster Width | 2 | 5,063,181 | EMMAXU | 8 | 4 | Bol007138 | AT1G03220 | Proteolysis | 19.1 | 34.1 | 0.22 |
| Curd Width | 2 | 3,528,844 | EMMAXU | 1 | 1 | Bol021232 | AT5G19110 | Heat stress | 14.4 | 32.6 | 0.44 |
| Curd Width | 2 | 5,063,181 | EMMAXU | 6 | 4 | Bol007138 | AT1G03220 | Proteolysis | 12.1 | 27.4 | 0.22 |
| Length of Nearest Branch | 9 | 25,012,587 | MLMMU | 2 | 2 | Bol012235 | — | unknown | 7.9 | 158.7 | 0.06 |
| Number of Branches | 6 | 2,323,306 | EMMAXU | 2 | 2 | Bol035509 | AT1G75310 | Protein binding | 6.2 | 23.7 | 0.06 |
| Number of Branches | 6 | 2,323,306 | MLMMU | 2 | 2 | Bol035509 | AT1G75310 | Protein binding | 6.2 | 23.7 | 0.06 |
| Number of Branches | 7 | 41,524,584 | EMMAXU | 1 | 1 | Bol024369 | AT2G17050 | NBS gene family | 8.5 | 32.6 | 0.21 |
| Number of Branches | 7 | 41,524,584 | MLMMU | 1 | 1 | Bol024369 | AT2G17050 | NBS gene family | 8.5 | 32.6 | 0.21 |
| Number of Days to Budding | 1 | 37,688,065 | EMMAXU | 1 | 1 | Bol023068 | AT3G09240 | Signaling pathway | 13.5 | 14.4 | 0.2 |
| Number of Days to Budding | 1 | 37,688,065 | MLMMU | 1 | 1 | Bol023068 | AT3G09240 | Signaling pathway | 13.5 | 14.4 | 0.2 |
| Number of Days to Budding | 2 | 2,708,156 | MLMMU | 5 | 6 | Bol024638 | AT5G10090 | Flowering related | 16.4 | 17.4 | 0.57 |
| Number of Days to Budding | 2 | 2,708,163 | MLMMU | 6 | 5 | Bol024638 | AT5G65160 | Flowering related | 16.4 | 17.4 | 0.57 |
| Number of Days to Budding | 2 | 2,708,182 | MLMMU | 7 | 4 | Bol024638 | AT5G65180 | Flowering related | 16.4 | 17.4 | 0.57 |
| Number of Days to Budding | 6 | 2,949,314 | EMMAXI | 1861 | 572 | Bol026132 | AT1G75010 | Flowering | 25.2 | 26.8 | 0.05 |
| Number of Days to Budding | 6 | 2,949,314 | MLMMI | 1861 | 572 | Bol026132 | AT1G75010 | Flowering | 25.2 | 26.8 | 0.05 |
| Number of Days to Budding | 7 | 936,738 | EMMAXI | 2 | 5 | Bol027177 | — | unknown | 4.2 | 4.5 | 0.36 |
| Number of Days to Budding | 7 | 936,738 | MLMMI | 2 | 5 | Bol027177 | — | unknown | 4.2 | 4.5 | 0.36 |
| Number of Days to Budding | 7 | 936,770 | EMMAXI | 1 | 1 | Bol027177 | — | unknown | 6.4 | 6.8 | 0.23 |
| Number of Days to Budding | 7 | 936,770 | MLMMI | 1 | 1 | Bol027177 | — | unknown | 6.4 | 6.8 | 0.23 |
| Number of Days to Budding | 7 | 41,524,584 | MLMMU | 2 | 3 | Bol024369 | AT2G17050 | NBS gene family | 4.1 | 4.4 | 0.21 |
The Rank columns give the rank of each significant association among all marker effects estimated in the genomic prediction with ridge regression (RR) or BayesB. The last letter in the Method column indicates the data set in which the QTL was discovered (U, unimputed; I, imputed). The phenotypic and genetic variance columns give the percentage of the respective variance explained by the SNP. Chr, chromosome; MAF, minor allele frequency; Pos, position.
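As an illustration of how a rank among marker effects can be obtained, the following sketch estimates ridge regression marker effects and ranks them by absolute size, with rank 1 for the largest effect. It assumes 0/1/2 dosage coding and a fixed shrinkage parameter `lam`; both function names are hypothetical, the record does not say how the shrinkage was chosen, and BayesB is not implemented here.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge regression marker effects for centered genotypes and phenotypes.

    X: (n_individuals, n_markers) dosage matrix (assumed 0/1/2 coding).
    lam: shrinkage parameter (an assumed, fixed value in this sketch).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    # Solve (X'X + lam * I) beta = X'y without forming an explicit inverse.
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def effect_ranks(effects):
    """Rank markers by absolute effect; rank 1 is the largest |effect|."""
    order = np.argsort(-np.abs(effects))   # indices from largest to smallest
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(effects) + 1)
    return ranks
```

Under this reading, a rank of 1 or 2 means the GWAS hit is also among the largest effects in the prediction model, while a rank in the thousands means the prediction model assigns that SNP only a small effect.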
Prediction ability for six curd-related traits with different data sets using GBLUP
| Trait | Unimputed Data | BEAGLE (imputed) | fastPHASE (imputed) | Corrected | Mean |
|---|---|---|---|---|---|
| Curd Width | 0.38 | 0.45 | 0.45 | 0.45 | 0.43 |
| Cluster Width | 0.62 | 0.65 | 0.65 | 0.59 | 0.63 |
| Number of Branches | 0.34 | 0.38 | 0.38 | 0.31 | 0.35 |
| Apical Length | 0.13 | 0.13 | 0.14 | 0.08 | 0.12 |
| Nearest Branch | 0.22 | 0.27 | 0.28 | 0.21 | 0.25 |
| Number of Days | 0.63 | 0.63 | 0.64 | 0.39 | 0.57 |
| Mean | 0.39 | 0.42 | 0.42 | 0.34 | 0.39 |
Unimputed: prediction ability using 675 SNPs. Imputed: prediction ability using BEAGLE and fastPHASE imputed data. Corrected: prediction ability for the GBLUP model with a realized relationship matrix corrected for population structure.
Prediction ability for six curd-related traits with different data sets using BayesB
| Trait | Unimputed Data | BEAGLE (imputed) | fastPHASE (imputed) | Mean |
|---|---|---|---|---|
| Curd Width | 0.35 | 0.40 | 0.44 | 0.40 |
| Cluster Width | 0.60 | 0.64 | 0.66 | 0.64 |
| Number of Branches | 0.38 | 0.35 | 0.41 | 0.38 |
| Apical Length | 0.09 | 0.12 | 0.10 | 0.10 |
| Nearest Branch | 0.23 | 0.28 | 0.29 | 0.26 |
| Number of Days | 0.66 | 0.66 | 0.61 | 0.64 |
| Mean | 0.39 | 0.41 | 0.42 | 0.40 |
Unimputed: prediction ability using 675 SNPs. Imputed: prediction ability using BEAGLE and fastPHASE imputed data.
Figure 3. Effect of the number of markers on prediction ability in a five-fold cross-validation with 10 replications using a standard GBLUP model. Values represent averages of 100 runs. For each run, 10, 25, 50, 100, 250, or 500 markers were sampled at random.
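The marker-subsampling procedure in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes a VanRaden-style genomic relationship matrix, a fixed variance ratio `lam` in place of estimated variance components, and prediction ability defined as the Pearson correlation between predicted and observed values in each validation fold. All function names and default values are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from 0/1/2 dosages (VanRaden-style scaling)."""
    p = M.mean(axis=0) / 2.0                  # allele frequencies
    Z = M - 2.0 * p                           # center each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_cv_ability(M, y, lam=1.0, k=5, rng=None):
    """Mean correlation of predicted vs. observed values over k folds.

    lam is the assumed ratio of residual to genetic variance.
    """
    rng = np.random.default_rng(rng)
    G = vanraden_g(M)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        mu = y[train].mean()
        A = G[np.ix_(train, train)] + lam * np.eye(len(train))
        # BLUP of genetic values for the validation individuals.
        g_hat = G[np.ix_(test, train)] @ np.linalg.solve(A, y[train] - mu)
        cors.append(np.corrcoef(g_hat, y[test])[0, 1])
    return float(np.mean(cors))

def ability_by_marker_count(M, y, counts, reps=10, seed=0):
    """Average prediction ability for random marker subsets of each size."""
    rng = np.random.default_rng(seed)
    out = {}
    for n in counts:
        runs = []
        for _ in range(reps):
            cols = rng.choice(M.shape[1], size=n, replace=False)
            runs.append(gblup_cv_ability(M[:, cols], y, rng=rng))
        out[n] = float(np.mean(runs))
    return out
```

Calling `ability_by_marker_count(M, y, [10, 25, 50, 100, 250, 500])` on a dosage matrix `M` and phenotype vector `y` would return one averaged value per marker count, which is the kind of curve the figure describes.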