Priyanka Singh1,2, Jasper Engel2, Jeroen Jansen2, Jorn de Haan1, Lutgarde Maria Celina Buydens3.
Abstract
BACKGROUND: Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We propose using Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict Bacterial wilt (BW) in tomatoes using SNPs as predictors. To assess its performance, DPLS was compared with Genomic Best Linear Unbiased Prediction (GBLUP) and with single-SNP regression in which the SNP is a fixed effect.
Keywords: Bacterial wilt; Dissimilarity based Partial Least Squares; Genetic distance; Genomic prediction; Phenotype prediction
Year: 2016 PMID: 27142305 PMCID: PMC4855361 DOI: 10.1186/s12864-016-2651-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
The selected dissimilarity measures used to calculate genomic distance among tomato accessions from SNPs^a
| Distance | Equation | R packages | References |
|---|---|---|---|
| Euclidean | | gstudio | [ |
| Gower | | daisy | [ |
| Allele share | | Custom R script | [ |
| Nei | | gstudio | [ |
| Bray | | vegan | [ |
| Jaccard | | vegan | [ |
| Kulczynski | | vegan | [ |
| GRM | | Custom R script | [ |
x_{i1k} and x_{i2k}: SNPs at locus k for accessions x_{i1} and x_{i2}, respectively
d_{i1i2k}: distance between samples i1 and i2 for SNPs at locus k
B: Bray-Curtis dissimilarity
G: genomic relationship matrix
Z: genotype information for all tomato accessions
p: frequency of the allele at locus k
^a d_{i1i2}: distance between tomato accessions i1 and i2
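The equations of the table did not survive in this record, so the following is only a minimal sketch of three of the listed measures, assuming their common textbook definitions and assuming SNPs coded as allele counts (0, 1, 2). The Jaccard form 2B/(1+B), with B the Bray-Curtis dissimilarity, is the quantitative variant documented for vegan's `vegdist`; the function names and toy genotypes are hypothetical and are not taken from the study.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two SNP vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def jaccard(x, y):
    """Quantitative Jaccard dissimilarity derived from Bray-Curtis B as 2B / (1 + B)."""
    b = bray_curtis(x, y)
    return 2 * b / (1 + b)

def distance_matrix(genotypes, metric):
    """Symmetric accession-by-accession matrix of pairwise dissimilarities."""
    n = len(genotypes)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(genotypes[i], genotypes[j])
    return d

# Toy data (assumption): 3 accessions x 4 SNP loci, coded as allele counts.
snps = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 1],
]
D_euc = distance_matrix(snps, euclidean)
D_bray = distance_matrix(snps, bray_curtis)
D_jac = distance_matrix(snps, jaccard)
```

Each resulting matrix is square and symmetric with a zero diagonal, which is the form a dissimilarity-based method would take as input.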
Summarized Mantel correlation statistics for analyzed genomic dissimilarity matrices*
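For readers unfamiliar with the statistic, a Mantel correlation can be sketched as the Pearson correlation between the off-diagonal entries of two distance matrices, with significance assessed by jointly permuting the rows and columns of one matrix. This is a generic illustration under those assumptions, not the procedure or software used in the study; `mantel`, `n_perm` and the toy matrices are hypothetical.

```python
import math
import random

def upper_triangle(d):
    """Flatten the strictly upper triangle of a square matrix."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def mantel(d1, d2, n_perm=999, seed=0):
    """Return (Mantel r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(d1)
    v2 = upper_triangle(d2)
    r_obs = pearson(upper_triangle(d1), v2)
    hits = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # Permute rows and columns of d1 with the same relabelling.
        d1p = [[d1[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(d1p), v2) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Toy data (assumption): distances between points on a line, and the same distances doubled.
points = [0, 1, 3, 7]
d_a = [[abs(a - b) for b in points] for a in points]
d_b = [[2 * v for v in row] for row in d_a]
r_obs, p_value = mantel(d_a, d_b, n_perm=99)
```

A correlation near 1 indicates that two dissimilarity measures rank accession pairs almost identically.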
Fig. 1 Heatmap representation of dissimilarity scores between 242 tomato accessions for Bray (a) and Euclidean (b) distances. The pixels are colored in proportion to the genotypic dissimilarity between tomato accessions. The Euclidean and Bray heatmaps represent distance groups I and II, respectively
Fig. 2 Multi-dimensional scaling (MDS) score representation of Bray (a) and Euclidean (b) distances. MDS scores are visualized in the first two dimensions of MDS space, where MDS1 and MDS2 represent scores in the first and second dimensions, respectively. The size and color of the bubbles are proportional to the actual trait values (the measure of resistance against Bacterial wilt) of the tomato accessions. Larger bubbles represent accessions with higher resistance
Dissimilarity-based partial least squares (DPLS) prediction results over the whole dataset in a 10-fold CV setup
| Distance | PQ^c (R2^d) | RMSE^a | Optimal LVs^b |
|---|---|---|---|
| Euclidean | 0.62 ± 0.005 | 370 ± 2.7 | 4 |
| Gower | 0.60 ± 0.0052 | 380 ± 2.8 | 6 |
| Allele share | 0.61 ± 0.005 | 380 ± 2.8 | 6 |
| Nei | 0.59 ± 0.005 | 390 ± 2.9 | 6 |
| Bray | 0.63 ± 0.004 | 370 ± 2.6 | 4 |
| Jaccard | 0.64 ± 0.0043 | 360 ± 2.8 | 4 |
| Kulczynski | 0.61 ± 0.0053 | 380 ± 2.8 | 4 |
| GRM | 0.62 ± 0.005 | 370 ± 2.9 | 5 |
| GBLUP | 0.61 ± 0.001 | 369.9 ± 0.66 | NA |
All results presented in the table are significant (with respect to p-values computed from permutation analysis). The results are averaged over a 10-fold CV scheme that was repeated 50 times. The standard error (se) was calculated over the repetitions of the 10-fold CV. The last row presents the prediction results obtained from GBLUP. PQ (R2), RMSE and LVs denote prediction quality, root mean square error and latent variables, respectively
a RMSE stands for root mean square error
b LVs stands for latent variables used for model building
c PQ represents prediction quality
d The R2 values presented in the table are estimated on the test set, not on the training model. The value is calculated in a cross-validation setup (sometimes denoted Q2) and is referred to as prediction quality in this study
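The quantities in the footnotes can be illustrated with a short sketch. It assumes the common definition Q2 = 1 - PRESS/TSS computed on held-out predictions, and a standard error equal to the sample standard deviation across repetitions divided by the square root of their number; the study may have computed prediction quality differently (for example as a squared correlation), and all names and numbers below are hypothetical.

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Root mean square error of held-out predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def q2(y_true, y_pred):
    """Cross-validated R2 (Q2), assumed here to be 1 - PRESS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - press / tss

def standard_error(values):
    """Standard error of the mean across CV repetitions."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Toy data (assumption): observed trait values and their held-out predictions.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 6.0]
example_rmse = rmse(y_obs, y_hat)
example_q2 = q2(y_obs, y_hat)

# Toy per-repetition Q2 values (assumption), summarized as mean and standard error.
q2_per_repetition = [0.61, 0.62, 0.63]
mean_q2 = statistics.mean(q2_per_repetition)
se_q2 = standard_error(q2_per_repetition)
```

Because Q2 is computed only on accessions left out of model fitting, it can be lower than the R2 of the fitted training model.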
Fig. 3 Dissimilarity-based partial least squares (DPLS) score representation for Bray (a) and Euclidean (b) distances. The DPLS scores are visualized in the first two latent variables (LVs), where PLS1 and PLS2 represent scores in the first and second LV space, respectively. Each bubble represents a tomato accession. The size and color of the bubbles correspond to the actual trait values of the tomato accessions, with larger bubbles corresponding to accessions with higher resistance. PVE denotes the phenotypic variance explained by the DPLS prediction model
Fig. 4 Dissimilarity-based partial least squares (DPLS) prediction plot for Bray (a) and Euclidean (b) distances. The prediction for each accession was obtained in a repeated 10-fold CV scheme. Each point indicates the mean predicted value for an accession. Original and predicted values of the BW trait are plotted on the X and Y axes, respectively. R2 represents prediction quality, and the red line indicates the trend line of the regression model
Fig. 5 Illustration of distance matrix segmentation in double cross-validation. D, Dc and Dt are squared distance matrices that represent the distance scores between all accessions (M), accessions in the calibration set (Mc) and accessions in the training set (Mt), respectively
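The segmentation described in the caption of Fig. 5 amounts to selecting sub-blocks of the full distance matrix by accession index. The sketch below shows that slicing for one fold, under assumptions: the fold construction, the helper names, and the held-out-by-calibration block `D_cross` are hypothetical illustrations of how left-out accessions could be expressed relative to a calibration set, and are not taken from the figure.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sub_block(d, rows, cols):
    """Extract the block of matrix d at the given row and column indices."""
    return [[d[i][j] for j in cols] for i in rows]

# Toy data (assumption): a 6 x 6 distance matrix between accessions placed on a line.
n = 6
D = [[abs(i - j) for j in range(n)] for i in range(n)]

folds = k_fold_indices(n, k=3)
held_out = sorted(folds[0])                         # accessions left out in this fold
calib = sorted(i for f in folds[1:] for i in f)     # accessions used for calibration

D_c = sub_block(D, calib, calib)          # calibration x calibration distances
D_cross = sub_block(D, held_out, calib)   # held-out x calibration distances
```

Slicing by index keeps every accession's distances consistent with the full matrix D, so no distance has to be recomputed inside the cross-validation loop.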