Akihiro Nakaya, Sachiko N Isobe.
Abstract
BACKGROUND: Genomic selection or genome-wide selection (GS) has been highlighted as a new approach for marker-assisted selection (MAS) in recent years. GS is a form of MAS that selects favourable individuals based on genomic estimated breeding values. Previous studies have suggested the utility of GS, especially for capturing small-effect quantitative trait loci, but GS has not become a popular methodology in the field of plant breeding, possibly because there is insufficient information available on GS for practical use. SCOPE: In this review, GS is discussed from a practical breeding viewpoint. Statistical approaches employed in GS are briefly described, before the recent progress in GS studies is surveyed. GS practices in plant breeding are then reviewed before future prospects are discussed.
Year: 2012 PMID: 22645117 PMCID: PMC3478044 DOI: 10.1093/aob/mcs109
Source DB: PubMed Journal: Ann Bot ISSN: 0305-7364 Impact factor: 4.357
Fig. 1. Schemes of genomic selection (GS) (left) and traditional MAS for the selection of quantitative traits (right). Both GS and traditional MAS contain training and breeding phases. In the training phase, quantitative trait loci (QTLs) are identified in traditional MAS, whereas formulae for genomic estimated breeding value (GEBV) prediction, i.e. GS models, are produced in GS. In the breeding phase, favourable individuals are selected based on the genotypes of the selected markers in MAS, whereas GEBVs are used for selection in GS.
Fig. 2. Relationships between marker genotypes (x₁: 0 and 1) and phenotypes (y) of the individuals (open circles) in a training population. If the marker genotype is correlated with the phenotype, segregation is modelled using the bold line (y = β₀ + x₁β₁, where β₀ and β₁ are parameters to be determined).
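The single-marker model in Fig. 2 can be illustrated with a minimal least-squares sketch. The function name `fit_single_marker` and the toy genotype and phenotype values are hypothetical and are not taken from the review; with a 0/1 marker, the fitted intercept is the mean phenotype of genotype 0 and the slope is the difference between the two genotype means.

```python
import numpy as np

def fit_single_marker(x1, y):
    """Least-squares estimates of (beta0, beta1) in y = beta0 + x1 * beta1."""
    x1 = np.asarray(x1, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the marker genotype.
    design = np.column_stack([np.ones_like(x1), x1])
    (beta0, beta1), *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta0, beta1

# Hypothetical training population: three individuals per marker genotype.
genotypes = [0, 0, 0, 1, 1, 1]
phenotypes = [9.0, 10.0, 11.0, 13.0, 14.0, 15.0]

beta0, beta1 = fit_single_marker(genotypes, phenotypes)
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
```

A GS model extends this idea from one marker to all genotyped markers fitted simultaneously, which is why shrinkage methods such as ridge regression are needed once markers outnumber individuals.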
Features of test populations, number of genotyped loci, and ranges of GEBV accuracy investigated in empirical plant GS studies
| Species | Population type | Size of population used | Training population ratio* | No. of genotyped markers† | Accuracy of GEBVs‡ | Models for GEBV prediction¶ | Traits | Reference |
|---|---|---|---|---|---|---|---|---|
| Maize | RILs derived from single cross | 223 | 0·43, 0·65, 0·80 | 1339 SSRs and RFLPs | 0·48–0·73 | BLUP | 8 morphological traits, 3 chemical components, grain moisture | |
| Maize | RILs derived from single cross | 119 | 0·80 | 1339 SSRs and RFLPs | 0·40–0·50 | BLUP | 5 morphological traits, grain moisture | |
| Maize | F2 derived from single cross | 349 | 0·08, 0·13, 0·20 | 160 SSRs | 0·59–0·72 | BLUP | 3 morphological traits, grain moisture | |
| Maize | Testcrosses of DHLs | 371 | 0·13, 0·26, 0·32 | 125 SNPs | 0·31–0·55 | BLUP | 3 morphological traits, grain moisture | |
| | RILs derived from single cross | 415 | 0·12, 0·23, 0·29, 0·32 | 69 SSRs | 0·90–0·93 | BLUP | Flowering time, dry matter, free amino acids | |
| Barley | DHLs derived from single cross | 150 | 0·36, 0·64, 0·80 | 223 RFLPs | 0·64–0·83 | BLUP | Plant height, grain yield, 3 chemical components | |
| Barley | DHLs derived from single cross | 140 | 0·34, 0·69, 0·80 | 107 RFLPs and AFLPs | 0·66–0·85 | BLUP | Plant height, two chemical components | |
| Maize | DHLs derived from single cross | 208 | 1·00 | 136 SNPs and SSRs | 1·00§ | RR, POW, EXP, GAU, SPH | Kernel dry weight | |
| Wheat | Lines bred in CIMMYT | 599 | 0·10 | 1279 DArTs | 0·48–0·61 | PM-RKHS | Grain yield | |
| Maize | Lines bred in CIMMYT | 300 | 0·90 | 1148 SNPs | 0·42–0·79 | M-BL | Grain yield, female flowering, male flowering, anthesis-silking interval | |
| Wheat | DHLs derived from single cross | 209 | 0·11, 0·23, 0·46 | 399 SSRs, DArTs, AFLPs, TRAPs, STS | 0·32–0·84 | RR-BLUP | 8 grain quality traits | |
| Wheat | DHLs derived from single cross | 174 | 0·14, 0·28, 0·55 | 574 DArTs | 0·41–0·73 | RR-BLUP | 8 grain quality | |
| Maize | 25 nested association mapping populations | (126–196) × 25 populations | 0·20, 0·40, 0·60, 0·80 | 1106 SNPs | 0·26–0·57 | RR-BLUP | Three flowering traits | |
| Loblolly pine | 61 full-sib families derived from 32 parents | 790–840 | Not shown | 3938 SNPs | 0·64–0·77 | BLUP | Diameter at breast height, total height | |
| Loblolly pine | Full-sib offspring | 149 | Not shown | 3406 SNPs | 0·3–0·83 | Pedigree model | Growth and quality traits | |
| Eucalyptus | 43 full-sib families, 11 interspecific hybrids | 783 | 0·90 | 3120 DArTs | 0·53–0·69 | BLUP | Height, diameter at breast height, wood density, pulp yield, lignin content | |
| Eucalyptus | 75 full-sib families, 55 elite parents (hybrids) | 920 | 0·90 | 3564 DArTs | 0·54–0·62 | BLUP | | |
* Ratio of the number of individuals in the training population to the number in the whole population.
† SSRs, simple sequence repeat markers; RFLPs, restriction fragment length polymorphism markers; SNPs, single nucleotide polymorphism markers; DArTs, diversity array technology markers; AFLPs, amplified fragment length polymorphism markers; TRAPs, target region amplification polymorphism markers; STS, sequence tagged site markers.
‡ Correlation between observed phenotypic values and GEBVs.
§ Correlation between adjusted mean and GEBVs. Error variance is not fixed.
¶ Models with the highest or higher accuracy of GEBVs when multiple methods were used for GEBV prediction. BLUP, Best linear unbiased prediction. Spatial models using: POW, power; EXP, exponential; GAU, Gaussian; SPH, spherical models. PM-RKHS, Pedigree plus molecular marker model using reproducing kernel Hilbert space regression. M-BL, Regression model using the Bayesian LASSO; RR, ridge regression.
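The accuracy measure defined in the footnotes above (correlation between observed phenotypic values and GEBVs) and the training population ratio can be sketched as follows. This is a minimal illustration on simulated data, not the procedure of any study listed in the table: the ridge penalty `lam`, the helper names, the random split and the simulated marker matrix are all assumptions made for the example.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge-regression estimates of marker effects on centred data."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) b = Xc'(y - mean(y)) for the marker effects b.
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    effects = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    return x_mean, y_mean, effects

def gebv_accuracy(X, y, training_ratio, lam=1.0, seed=0):
    """Correlation between observed phenotypes and GEBVs in the validation set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    order = rng.permutation(n)
    n_train = int(round(training_ratio * n))
    train, valid = order[:n_train], order[n_train:]
    x_mean, y_mean, effects = ridge_marker_effects(X[train], y[train], lam)
    gebv = y_mean + (X[valid] - x_mean) @ effects
    return np.corrcoef(y[valid], gebv)[0, 1]

# Simulated (hypothetical) population: 200 inbred lines, 50 biallelic markers.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 50)).astype(float)
true_effects = rng.normal(0.0, 1.0, size=50)
y = X @ true_effects + rng.normal(0.0, 1.0, size=200)

accuracy = gebv_accuracy(X, y, training_ratio=0.8)
print(f"GEBV accuracy at training ratio 0.8: {accuracy:.2f}")
```

Calling `gebv_accuracy` with several values of `training_ratio` mirrors the way the studies in the table report accuracy over a range of training population sizes. A ratio of 1·00 leaves no validation individuals, so this sketch does not apply to that case.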
Features of test populations, number of genotyped SNPs and ranges of GEBV accuracy in empirical animal GS studies
| Species | Population type | Size of test population | Training : validating<sup>a</sup> | No. of genotyped SNPs | Accuracy of GEBVs<sup>b</sup> | Models for GEBV prediction<sup>c</sup> | Traits | Reference |
|---|---|---|---|---|---|---|---|---|
| Mice | A heterogeneous population derived from eight inbred lines | 1884 | 942 : 942 | 10 946 | 0·16–0·25<sup>f</sup>, 0·27–0·67<sup>g</sup> | Linear mixed model with SNP genotypes, not polygenetic effects | Weight, growth slope, body length, body mass index | |
| Dairy cattle; Australian Holstein-Friesian | Bull progeny tested by Genetics Australia | 730 | Bulls born in 1998–2002 : bulls born in 2003 | 38 259 | 0·14–0·55 | BayesA | Breeding value, profit ranking, selection value, protein yield and protein percentage | |
| Dairy cattle; Norwegian Red | 34 sires and 466 sons | 500 | 100 : 400<sup>d</sup>, 34–100 : 400–466<sup>e</sup> | 18 991 | 0·20–0·61<sup>d</sup>, 0·15–0·62<sup>e</sup> | G-BLUP | Milk yield, fat yield, protein yield, clinical mastitis, calving ease | |
| Dairy cattle; North American Holstein | American Holstein bulls born between 1952 and 2002 | 5335 | 3576 : 1759 | 38 416 | 0·33–0·69 | Linear mixed model | 27 traits for milk production, body size, shape and fertility | |
| Beef cattle; Angus | Parental identified steers and sires | 2405 | 85–2405 : 84–2405 | 41 028 | 0·23–0·44 | Genomic relationship matrices | Average daily feed intake, residual feed intake, average daily gain | |
| Beef cattle; Angus, Charolais, University of Alberta hybrid bulls | Admixture population | 721 | 198–203 : 721 | 37 959 | −0·07–0·48 | RR-BLUP (with top 200 SNPs) | Average daily gain, dry matter intake, residual feed intake | |
| Chicken; brown-egg layer line | Five consecutive generations in a single line | 2708 | 768–2167 : 274–289 | 23 356 | 0·2–0·7<sup>h</sup> | G-BLUP and Bayes-C-π | 13 traits for eggs and 3 traits for chicken bodies | |
Only studies that investigated the accuracy of GEBVs based on the correlation between observed phenotypic values and GEBVs are listed.
a Number of individuals used for GEBV prediction (training population) versus that used for validation (validating population).
b Correlation between observed phenotypic values and GEBVs.
c Models with the highest or higher accuracy of GEBVs when multiple methods were used for GEBV prediction. G-BLUP, genomic best linear unbiased prediction; RR-BLUP, random regression best linear unbiased prediction.
d Random masking.
e Cohort masking.
f Across families.
g Within families.
h Data cited from Figs 1 and 2.
Fig. 3. Variation of LD intensity in different populations of a single species. (A) Allele frequency and LD indices (r²) between marker I and the other markers in an unrelated population. Roman numerals represent markers mapped on a linkage group at 20-cM intervals. The two allele types are shown in white and black. 'White allele freq.' is the frequency of white alleles at markers II–V, given separately for the cases where the marker I allele is white or black. In this example, the white allele frequencies of markers II, III, IV and V are all 0·5, and the r² values between marker I and the other markers are all zero (completely random association). (B) A population of clonally propagated individuals. Assume that one individual is selected from the unrelated population (outlined in blue in population 'A') and clonally propagated. All individuals in population 'B' share the same genotype, so the r² values between marker I and the other markers are all 1·0 (complete LD). (C) Suppose two individuals are selected from population 'A' (outlined in blue and red) and recombinant inbred lines (RILs) are developed from a cross between them. Recombination occurs during meiosis in the F1, so the white allele frequency varies with the distance between marker I and each of the other markers, and LD decay is observed in the RILs.
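The LD index r² used in Fig. 3 can be computed from phased allele calls at two biallelic markers. The sketch below uses the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B; the function name `ld_r2`, the 0/1 coding (0 = white allele, 1 = black allele) and the toy haplotypes are assumptions for illustration only.

```python
import numpy as np

def ld_r2(marker_a, marker_b):
    """r^2 between two biallelic markers coded 0/1 on phased haplotypes."""
    a = np.asarray(marker_a, dtype=float)
    b = np.asarray(marker_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()   # frequencies of allele 1 at each marker
    p_ab = (a * b).mean()           # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b            # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")         # undefined when a marker is monomorphic
    return d * d / denom

# Random association: all four haplotypes equally frequent.
r2_random = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
# Complete LD: only the 0-0 and 1-1 haplotypes are present.
r2_complete = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
# Partial LD: some recombinant haplotypes, as in a RIL population.
r2_partial = ld_r2([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 0])
print(r2_random, r2_complete, r2_partial)
```

The three toy cases correspond loosely to panels A, B and C: no association, complete association, and intermediate association that weakens as recombinant haplotypes accumulate.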
Fig. 4. Allele types of flanking markers for a targeted gene. Roman numerals represent the markers (I, II, IV and V) mapped on a linkage group. 'G' indicates a targeted gene. Distances between adjacent markers and the gene G are 20 cM. White and black represent the allele types of the markers, while grey and yellow indicate the allele types of the targeted gene. Suppose that yellow is the favourable allele of the targeted gene G. Alleles at gene G, marker II and marker IV are associated completely at random in a pool of breeding materials (unrelated population), whereas significant LD (r² = 0·8) is observed in RILs developed from biparental crosses (1 and 2), as shown in Fig. 3. When the two individuals outlined in red are selected for a biparental cross (B: cross 1), the alleles of the flanking markers (II and IV) linked to the yellow allele of gene G are white. By contrast, when the two individuals outlined in blue are selected for a biparental cross (C: cross 2), the alleles of the flanking markers (II and IV) linked to the yellow allele of gene G are black. This example indicates that the marker alleles in significant LD with the targeted gene differ between the two crosses.
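The point of Fig. 4, that the strength of LD can be the same in two crosses while the marker allele linked to the favourable gene allele differs, can be illustrated with a signed correlation. The data below are hypothetical and do not reproduce the r² = 0·8 of the figure; markers are coded 0 = white, 1 = black, and the gene is coded 1 = yellow (favourable), 0 = grey.

```python
import numpy as np

def signed_ld(marker, gene):
    """Pearson correlation between marker and gene alleles (sign gives the phase)."""
    return np.corrcoef(marker, gene)[0, 1]

# Hypothetical RILs: gene G alleles (1 = yellow) in ten lines of each cross.
gene = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1])

# Cross 1: yellow is mostly linked to the white (0) marker allele.
marker_cross1 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# Cross 2: yellow is mostly linked to the black (1) marker allele.
marker_cross2 = 1 - marker_cross1

r_cross1 = signed_ld(marker_cross1, gene)
r_cross2 = signed_ld(marker_cross2, gene)
print(f"cross 1: r = {r_cross1:.2f}, r^2 = {r_cross1 ** 2:.2f}")
print(f"cross 2: r = {r_cross2:.2f}, r^2 = {r_cross2 ** 2:.2f}")
```

The two crosses give the same r² but opposite signs of r, so a marker allele chosen for selection in one cross would select against the favourable gene allele in the other.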