Noelle L Anglin, Ahmed Amri, Zakaria Kehel, Dave Ellis.
Abstract
Genebanks are responsible for collecting, maintaining, characterizing, documenting, and distributing plant genetic resources for research, education, and breeding purposes. The rationale for requesting plant materials varies widely, from anthropology, social science, smallholder farming, the commercial sector, and rehabilitation of degraded systems to crop improvement and basic research. Matching "the right" accessions to a particular request is not always straightforward, especially when genetic resource collections are large and the user does not already know which accession, or even which species, they want to study. Some requestors have limited knowledge of the crop and do not know where to begin, so they start by consulting crop curators, who help direct the request to the most suitable germplasm. One way to enhance the use of genebank material and aid in the selection of genetic resources is to have thoroughly cataloged agronomic, biochemical, genomic, and other traits linked to genebank accessions. In general, traits of importance to most users include genotypes that thrive under various biotic and abiotic stresses, morphological traits (color, shape, size of fruits), plant architecture, disease resistance, nutrient content, yield, and crop-specific quality traits. In this review, we discuss methods for linking traits to genebank accessions, examples of linked traits, and some of the complexities involved, while reinforcing why it is critical to have well-characterized accessions with clear trait data publicly available.
Keywords: FIGS; GWAS; genebanks; genetic resources; molecular markers; trait association
Year: 2018 PMID: 30325668 PMCID: PMC6204556 DOI: 10.1089/bio.2018.0033
Source DB: PubMed Journal: Biopreserv Biobank ISSN: 1947-5543 Impact factor: 2.300
Different Approaches Discussed in This Review That Genebanks Can Utilize to Link Traits to Accessions Along with Their Major Advantages and Disadvantages
| Approach | Advantages | Disadvantages |
| --- | --- | --- |
| Mining public data | Low cost, no research required; the only cost is personnel time to mine data and format information. | Loss of quality control, no input on experimental design, may not include enough replications or GxE analysis, difficult to summarize all metadata within the genebank database or to harmonize trait scoring across different experiments/labs. |
| User feedback | No cost; long-term users in the community are invested and are conscientious about data fidelity/quality. | Difficult to receive feedback before publication, and after publication feedback is generally never received; more than anecdotal evidence is needed to link a trait to an accession. |
| Brute force phenotyping | Quality control; traits important to the breeding/user community can be selected. | Requires significant funding and personnel time, especially for large numbers of accessions and multi-location testing. |
| Core/Mini core | Reduces the number of accessions to screen for a trait of interest. | Not all desired traits can be found in a core/mini core. |
| Focused identification of germplasm strategy | Limits the number of accessions to screen and gives a "best bet" set of accessions for finding the trait of interest using available data. High probability of finding sought traits in a manageable subset. | Traits are not always predictable, depending on the information available and the complexity of the trait. Some evaluation data are needed to develop the algorithms for predicting the relationship between the environmental conditions and the trait. |
| Marker assisted selection | Straightforward approach to identify traits without needing mature plants to evaluate; can screen large numbers of accessions efficiently. | Does not work on complex quantitative traits, expensive if markers are not already developed, requires specialized laboratory equipment. |
| Genome wide association studies | Links markers to traits of interest that can be used for selection/screening in germplasm; traits with few loci of large effect work well. Families not required. | Spurious associations occur, so validation is required; genotyping and phenotyping are both required, so cost is significant; complex traits can be difficult for GWAS; no guarantee on which trait(s) will have a strong association with molecular markers. Affected by heritability, GxE, population size, and genotyping quality. |
GWAS, genome wide association studies; GxE, genotype by environment.
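The focused identification of germplasm strategy row above describes learning a relationship between collection-site environment and a trait from accessions that have already been evaluated, then using it to pick a "best bet" subset of unevaluated accessions. The following is only a minimal sketch of that idea, assuming a simple nearest-centroid rule; the function names, accession identifiers, environmental variables, and values are all hypothetical, and published applications of the strategy may use quite different models.

```python
import math

def centroid(rows):
    """Mean of each environmental variable over equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def distance(a, b):
    """Euclidean distance between two environment vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_bet_subset(evaluated, unevaluated, k):
    """Return the k unevaluated accessions whose site environment is most
    similar to evaluated accessions with the trait, relative to those without."""
    with_trait = centroid([env for env, has_trait in evaluated if has_trait])
    without_trait = centroid([env for env, has_trait in evaluated if not has_trait])
    scored = []
    for accession_id, env in unevaluated.items():
        # Positive score: closer to the "has trait" centroid than to the other one.
        score = distance(env, without_trait) - distance(env, with_trait)
        scored.append((score, accession_id))
    scored.sort(reverse=True)
    return [accession_id for _, accession_id in scored[:k]]

# Hypothetical, pre-scaled site data: [rainfall index, temperature index]
evaluated = [
    ([2.0, 2.4], True), ([2.5, 2.2], True), ([2.2, 2.5], True),
    ([6.0, 1.5], False), ([7.0, 1.2], False), ([6.5, 1.4], False),
]
unevaluated = {
    "ACC-001": [2.3, 2.3],
    "ACC-002": [6.8, 1.3],
    "ACC-003": [3.0, 2.0],
    "ACC-004": [5.5, 1.6],
}

subset = best_bet_subset(evaluated, unevaluated, k=2)
print(subset)
```

Only the accessions in `subset` would then go forward to phenotyping, which is where the reduction in screening effort comes from. The sketch also makes the stated disadvantage visible: without the labeled `evaluated` rows there is nothing to build the centroids from.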
Confirmed Traits Identified by the Focused Identification of Germplasm Strategy Approach for Wheat, Barley, and Faba Bean
| Trait | Crop |
| --- | --- |
| Resistance to Russian Wheat Aphid | Wheat |
| Resistance to stem rust (UG99) | Wheat |
| Resistance to yellow or stripe rust | Wheat |
| Resistance to powdery mildew | Wheat |
| Resistance to net blotch | Barley |
| Resistance to Sunn pest | Wheat |
| Tolerance to drought | Faba bean |
| Tolerance to boron toxicity | Wheat |
| Water use efficiency | Faba bean |

Density of GFP based on 8861 data points. The information in the genebank produced a bimodal distribution. The x-axis is GFP in days and the y-axis is the density. GFP, grain filling period.
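A density curve with two peaks like the one described in this caption can be obtained with a kernel density estimate. The sketch below is an illustration only: it uses synthetic values rather than the genebank data, and the group means, spreads, sample sizes, and bandwidth are all assumptions chosen so that two modes are easy to see.

```python
import math
import random

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

def find_modes(density, grid):
    """Grid points where the estimated density is a strict local maximum."""
    ys = [density(x) for x in grid]
    return [grid[i] for i in range(1, len(grid) - 1) if ys[i - 1] < ys[i] > ys[i + 1]]

random.seed(0)
# Synthetic values only: two hypothetical groups of grain filling periods (days).
gfp_days = ([random.gauss(30, 2) for _ in range(300)]
            + [random.gauss(45, 2) for _ in range(300)])

density = gaussian_kde(gfp_days, bandwidth=1.5)
grid = [20 + 0.5 * i for i in range(81)]  # 20 to 60 days in half-day steps
modes = find_modes(density, grid)
print(modes)
```

The low point of the density between the two modes is one natural place to split accessions into short and long GFP classes, although the bandwidth choice affects how clearly the two peaks separate.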

Scatter plot (PC1 vs. PC2) of landraces from a principal component analysis of climatic data. The symbols indicate each landrace's GFP class (plus signs for long GFP and circles for short GFP).
Performance Metrics for Three Machine Learning Classification Algorithms
| Accuracy | 0.834 | 0.838 | 0.817 |
| 95% CI | 0.799–0.865 | 0.804–0.868 | 0.781–0.849 |
| No information rate | 0.762 | 0.762 | 0.762 |
| P-value (accuracy > NIR) | 3.58E-05 | 1.37E-05 | 0.001423371 |
| Kappa | 0.563 | 0.557 | 0.467 |
| Sensitivity | 0.722 | 0.675 | 0.54 |
| Specificity | 0.869 | 0.889 | 0.903 |
Accuracy is the fraction of predictions the model got right; 95% CI is the confidence interval for accuracy; NIR is the proportion of the data in the majority class, reported with a p-value testing whether accuracy is better than NIR; Kappa compares the observed accuracy with the accuracy expected by random chance; sensitivity is the proportion of truly positive cases that were classified as positive; and specificity is the proportion of truly negative cases that were classified as negative.
CI, confidence interval; NIR, no information rate.
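The definitions above can be made concrete by computing each metric from a two-class confusion matrix. This is a minimal sketch with hypothetical counts, not the data behind the table, and the p-value uses a one-sided exact binomial test of accuracy against NIR, which is one common choice and an assumption here. The 95% CI is not computed.

```python
import math

def binomial_upper_tail(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for k in range(successes, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total

def classification_metrics(tp, fn, fp, tn):
    """Metrics from counts of true/false positives and negatives."""
    n = tp + fn + fp + tn
    actual_pos, actual_neg = tp + fn, fp + tn
    predicted_pos, predicted_neg = tp + fp, fn + tn
    accuracy = (tp + tn) / n
    # No information rate: accuracy of always predicting the majority class.
    nir = max(actual_pos, actual_neg) / n
    # Agreement expected by chance, given the row and column totals.
    expected = (actual_pos * predicted_pos + actual_neg * predicted_neg) / n ** 2
    return {
        "accuracy": accuracy,
        "nir": nir,
        "p_value": binomial_upper_tail(tp + tn, n, nir),
        "kappa": (accuracy - expected) / (1.0 - expected),
        "sensitivity": tp / actual_pos,
        "specificity": tn / actual_neg,
    }

# Hypothetical counts for 100 test accessions.
metrics = classification_metrics(tp=30, fn=10, fp=5, tn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.4g}")
```

In the table, accuracy differs little between the three algorithms while sensitivity ranges from 0.54 to 0.722, which is why the class-specific metrics are worth reporting alongside accuracy when one class is the majority (NIR of 0.762).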

Predicted probability for the entire ICARDA durum wheat landrace collection based on the machine learning model. Dark and light gray are the probabilities of being classified as long or short GFP, respectively. ICARDA, International Center for Agricultural Research in the Dry Areas.

Predicted GFP class for the entire ICARDA durum wheat landrace collection. White and black circles are long and short GFP landraces, respectively.