Eliane Thaines Bodah, Bruce Weir.
Abstract
In this work, we investigated the suitability of performing partial least squares regression (PLSR) on genotype-phenotype datasets to identify marker-trait associations. We utilized data collected on a cotton (Gossypium hirsutum L.) recombinant inbred line (RIL) mapping population that was evaluated under contrasting irrigation treatments, well-watered and water-limited conditions, in a hot, arid environment in 2012. Two phenotypic datasets were used in combination with the genetic data, which consisted of 841 marker loci assigned to 117 linkage groups. The first dataset contained canopy traits that were gathered using a mobile, high-throughput phenotyping platform and included canopy temperature (CT), normalized difference vegetation index (NDVI), and canopy height (CHT), with leaf area index (LAI) being derived from NDVI and CHT measurements. The second phenotypic dataset consisted of 14 elemental concentration measurements corresponding to the following elements: P, K, Ca, Mn, Fe, Zn, Ni, Cu, As, Co, Rb, Mo, S, and Mg. To conduct the PLSR analyses, we used the "pls" and "plsdepot" packages available in R statistical software version 3.2.4. The PLSR biplot from the analysis of the first dataset showed that three (LAI, NDVI, and CHT) of the four canopy traits were highly correlated, and by using multivariate analysis of variance (MANOVA), we detected 22 significant (p<0.01) marker-trait associations for the four traits. In contrast to the canopy trait analysis, our PLSR biplot for the second dataset showed varying correlations for each of the 14 traits. Because of the lack of distinct trait similarities, MANOVA was not an ideal option to test for marker-trait associations, so we implemented a jackknife resampling technique. Jackknife resampling failed to detect significant marker effects for several of the 14 elemental concentration traits.
Thus, our future work aims to test other resampling techniques, such as bootstrapping, for traits that do not exhibit high correlation. Overall, PLSR was a very informative way to comprehend data structure, displaying correlations within markers, within traits, and between markers and traits in one biplot. Further studies are still needed to leverage detection of additional variance in correlated datasets and to prevent spurious results. To the best of our knowledge, this is the first time PLSR has been reported in such a context.
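The jackknife idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simple single-marker regression slope as the marker effect (rather than a PLSR coefficient), leave-one-out resampling, and a small made-up dataset. The function names `slope` and `jackknife_slope` are hypothetical.

```python
import math

def slope(x, y):
    """Ordinary least squares slope of y on x (used here as the marker effect)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def jackknife_slope(x, y):
    """Return the full-data slope, its jackknife standard error, and their ratio."""
    n = len(x)
    estimate = slope(x, y)
    # Re-estimate the slope n times, each time leaving one line out.
    leave_one_out = [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    # Standard leave-one-out jackknife variance formula.
    se = math.sqrt((n - 1) / n * sum((b - mean_loo) ** 2 for b in leave_one_out))
    return estimate, se, estimate / se

# Hypothetical data: marker genotype coded 0/1 and a trait value per line.
genotype = [0, 0, 0, 0, 1, 1, 1, 1]
trait = [10.2, 9.8, 10.5, 9.9, 12.1, 11.7, 12.4, 11.6]

estimate, se, t_value = jackknife_slope(genotype, trait)
print(f"estimate={estimate:.2f}, SE={se:.3f}, t={t_value:.2f}")
```

The ratio of the estimate to its jackknife standard error plays the role of an approximate t statistic, which is then compared against a t distribution to obtain a p-value.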
Keywords: Marker-trait association; Multivariate analyses; PLSR; Plant methods
Year: 2017 PMID: 30345411 PMCID: PMC6195366 DOI: 10.19080/ARTOAJ.2017.12.555864
Source DB: PubMed Journal: Agric Res Technol ISSN: 2471-6774
Figure 1. Partial Least Squares Regression correlation biplot, showing trait components in red and marker components in gray and black (significantly associated markers). Within traits, negative correlations for LAI, NDVI, and CHT and a positive correlation for CT were detected.
Partial correlation coefficients between Y (CT, NDVI, CHT, LAI) and Xj were observed for each latent vector of matrix T (t1, t2, and t3).
| Trait | t1 | t2 | t3 |
|---|---|---|---|
| CT | 0.56 | 0.28 | 0.48 |
| NDVI | −0.61 | −0.21 | −0.5 |
| CHT | −0.63 | −0.42 | −0.13 |
| LAI | −0.72 | −0.42 | −0.22 |
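How a latent vector relates to a trait can be sketched for the simplest case. The example below is an illustration under stated assumptions, not the analysis behind the table: it handles a single response (PLS1) instead of the four traits jointly, computes only the first latent vector t1, and uses made-up genotype and canopy temperature values. The helper names are hypothetical.

```python
import math

def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    a, b = center(a), center(b)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

def first_latent_vector(X, y):
    """First PLS1 weight vector w and score vector t = Xw (X and y centered)."""
    cols = [center(list(c)) for c in zip(*X)]      # one centered column per marker
    yc = center(y)
    w = [sum(xv * yv for xv, yv in zip(col, yc)) for col in cols]  # X'y
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]                      # unit-length weights
    t = [sum(cols[j][i] * w[j] for j in range(len(cols))) for i in range(len(yc))]
    return w, t

# Hypothetical data: 6 lines x 3 markers (coded 0/1) and one trait (e.g. CT).
X = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 0, 0]]
ct = [30.1, 28.4, 31.0, 29.2, 28.0, 31.5]

w, t1 = first_latent_vector(X, ct)
r_t1 = pearson(t1, ct)
print("weights:", [round(v, 3) for v in w])
print("correlation of trait with t1:", round(r_t1, 3))
```

By construction the first score vector has positive covariance with the response it was built from; in a multi-trait analysis, traits that load with opposite signs on the same latent vector (as CT does relative to NDVI, CHT, and LAI in the table above) are negatively related along that component.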
Figure 2. Principal Component Analysis (PCA) results for each drought-related trait, showing variation within each component; from left to right: canopy temperature, NDVI, canopy height, and LAI. The y-axis is the percent variation explained by the PC on the x-axis.
Figure 3. Markers associated with drought identified using Principal Component Analysis followed by Multivariate Analysis of Variance in different linkage groups (color-coded). The y-axis displays −log(p-value), where −log(0.01) ≈ 4.6 (natural logarithm). A total of 22 significant associations were detected at p<0.01 based on the Pillai-Bartlett trace test.
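The Pillai-Bartlett statistic and the significance threshold named in the Figure 3 caption can be sketched as follows. This is a minimal illustration, assuming a one-way layout with two genotype classes and only two traits (so the matrices are 2×2 and can be inverted by hand); the data are made up and the function name `pillai_trace` is hypothetical. Converting the trace to a p-value requires an F approximation that is not shown here.

```python
import math

def pillai_trace(groups):
    """Pillai-Bartlett trace V = tr(H (H + E)^-1) for two traits.

    groups: one list per genotype class, each holding (trait1, trait2) tuples.
    """
    all_obs = [obs for g in groups for obs in g]
    n = len(all_obs)
    grand = [sum(o[k] for o in all_obs) / n for k in range(2)]
    H = [[0.0, 0.0], [0.0, 0.0]]  # between-group (hypothesis) sums of squares
    E = [[0.0, 0.0], [0.0, 0.0]]  # within-group (error) sums of squares
    for g in groups:
        m = [sum(o[k] for o in g) / len(g) for k in range(2)]
        for i in range(2):
            for j in range(2):
                H[i][j] += len(g) * (m[i] - grand[i]) * (m[j] - grand[j])
                E[i][j] += sum((o[i] - m[i]) * (o[j] - m[j]) for o in g)
    T = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    T_inv = [[T[1][1] / det, -T[0][1] / det],
             [-T[1][0] / det, T[0][0] / det]]
    # Trace of the matrix product H * T_inv.
    return sum(H[i][j] * T_inv[j][i] for i in range(2) for j in range(2))

# Hypothetical trait pairs for the two genotype classes of one marker.
class_a = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
class_b = [(11.0, 12.0), (12.0, 11.0), (13.0, 13.0)]

v_separated = pillai_trace([class_a, class_b])   # classes differ strongly
v_identical = pillai_trace([class_a, class_a])   # classes have equal means

# Threshold on the natural-log scale used for the y-axis.
threshold = -math.log(0.01)
print(round(v_separated, 3), round(v_identical, 3), round(threshold, 2))
```

With two genotype classes the trace lies between 0 and 1: it is 0 when the class means coincide and approaches 1 as the classes separate relative to the within-class scatter.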
Significantly drought-associated markers at p<0.01, showing marker ID, linkage group (LG), and trait coefficients for canopy height (CHT), normalized difference vegetation index (NDVI), canopy temperature (CT), and leaf area index (LAI).
| Marker ID | LG | CHT | NDVI | CT | LAI |
|---|---|---|---|---|---|
| TMB0283a | 1 | −0.47 | 0.82 | 1.24 | 1.37 |
| SNP0192 | 1 | −0.71 | 1.03 | 1.62 | 1.73 |
| BNL3594a | 17 | −0.75 | 0.75 | 0.59 | 0.97 |
| SNP0132 | 22 | −1.06 | 1.31 | 1.04 | 1.53 |
| SNP0168 | 22 | −1.3 | 1.95 | 1.31 | 1.92 |
| SNP0129 | 22 | −0.77 | 1.09 | 0.64 | 1.12 |
| SNP0044 | 22 | −0.46 | 0.7 | 0.92 | 1.21 |
| SNP0004 | 22 | −0.53 | 0.85 | 1.19 | 1.44 |
| DC30147a | 42 | 0.41 | −1.13 | 0.42 | 0.019 |
| MUSB1117a | 45 | −0.13 | 0.09 | −0.57 | −0.72 |
| SHIN-0208a | 45 | −0.05 | −0.05 | −0.6 | −0.77 |
| SHIN-1490a | 45 | −0.31 | 0.14 | −0.44 | −0.58 |
| SNP0365 | 45 | −0.56 | 0.36 | −0.41 | −0.49 |
| SNP0348 | 52 | −0.79 | 0.75 | 1.4 | 1.6 |
| SNP0248 | 52 | −0.95 | 0.75 | 1.73 | 1.91 |
| SNP0019 | 53 | 0.25 | −0.58 | −1.49 | −1.47 |
| SNP0119 | 55 | −0.9 | 1.12 | −0.39 | −0.11 |
| SNP0104 | 55 | −0.52 | 1.13 | −0.65 | −0.43 |
| BNL2655a | 104 | 2.44 | −2.6 | −1.17 | −1.3 |
| BNL2499a | 104 | 2.052 | −2.36 | −1.14 | −1.33 |
| SNP0046 | 110 | 0.46 | −0.7 | −0.92 | −1.21 |
| SNP0045 | 110 | −0.46 | 0.7 | 0.92 | 1.21 |
Partial correlation coefficients between Y (P, K, Ca, Mn, Fe, Zn, Mo, S, Ni, Cu, As, Mg, Co, and Rb) and Xj were observed for each latent vector of matrix T (t1, t2, and t3).
| Ion | t1 | t2 | t3 |
|---|---|---|---|
| P | 0.32599596 | 0.35472220 | 0.333397 |
| K | 0.1577072 | −0.31668904 | −0.30585 |
| Ca | −0.17344796 | −0.11929882 | −0.33944 |
| Mn | −0.09707526 | −0.50462247 | −0.20505 |
| Fe | −0.31679096 | −0.17088226 | −0.02917 |
| Zn | −0.45062098 | −0.11295626 | 0.016705 |
| Mo | 0.0563702 | 0.22072761 | −0.38464 |
| S | 0.13152467 | 0.20619102 | −0.24154 |
| Ni | −0.33292305 | −0.09658496 | −0.00291 |
| Cu | −0.20325922 | −0.0886891 | −0.17913 |
| As | 0.41771351 | −0.08055791 | −0.08282 |
| Mg | 0.59218889 | 0.13912230 | 0.004347 |
| Co | 0.01165428 | 0.0185707 | 0.268 |
| Rb | 0.01356858 | 0.11154640 | 0.32252 |
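Significant marker effects on the traits (for traits P, K, Ca, Mn, Mo, and Rb), showing marker name, estimate, standard error, degrees of freedom (DF), t value, p-value, and regression coefficient.
| Marker | Estimate | Standard error | DF | t value | p-value | Regression coefficient |
|---|---|---|---|---|---|---|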
| SHIN.0473a | −70.04 | 35.21 | 94 | −1.99 | 0.04* | 2.64E+00 |
| DPL0755a | −38.1 | 18.3 | 94 | −2.08 | 0.04* | −1.00E-02 |
| SNP0002 | 67.2 | 27.47 | 94 | 2.45 | 0.02* | 8.70E-03 |
| DPL1550a | 104.18 | 44.11 | 94 | 2.36 | 0.02* | 2.08E-02 |
| SNP0126 | 85.28 | 40.29 | 94 | 2.12 | 0.04* | 3.76E+00 |
| SNP0138 | 81.29 | 34.82 | 94 | 2.33 | 0.02* | 1.32E-02 |
| SNP0043 | −103.45 | 46.35 | 94 | −2.23 | 0.03* | 4.28E-03 |
| SNP0188 | 90.93 | 41.75 | 94 | 2.18 | 0.03* | −5.78E-03 |
| SNP0452 | −81.38 | 37.49 | 94 | −2.17 | 0.03* | 4.51E-03 |
| SNP0108 | −76.29 | 36.3 | 94 | −2.1 | 0.04* | 4.13E-03 |
| SNP0031 | −98.13 | 46.4 | 94 | −2.11 | 0.04* | 3.71E-03 |
| SNP0067 | −98.13 | 46.4 | 94 | −2.11 | 0.04* | 3.71E-03 |
| P-associated | ||||||
| SNP0236 | −49.13 | 23.8105 | 94 | −2.06 | 0.04* | 9.813208e- |
| SNP0024 | 34.9 | 17.2903 | 94 | 2.02 | 0.04* | −1.811791e- |
| SNP0163 | 45.31 | 19.7703 | 94 | 2.29 | 0.02* | −1.920897e- |
| DPL1154b | 42.33 | 19.5603 | 94 | 2.16 | 0.03* | −1.984356e- |
| DPL0750a | −55.43 | 25.84 | 94 | −2.14 | 0.03* | 5.696450e- |
| DPL1846a | 44.2 | 21.94 | 94 | 2.01 | 0.04* | 4.15E-03 |
| K-associated | ||||||
| Ca-associated | ||||||
| SNP0173 | −47 | 22.38 | 94 | −2.1 | 0.04* | −2.83E-03 |
| Mn-associated | ||||||
| SNP0470 | 0.24 | 0.12 | 94 | 2.04 | 0.04* | 2.33E-03 |
| DPL1144a | 0.26 | 0.13 | 94 | 2.04 | 0.04* | −2.472913075 |
| Mo-associated | ||||||
| SNP0470 | −16.3 | 8.2 | 94 | −1.99 | 0.04* | −3.74E-04 |
| DPL1144a | −17.85 | 8.63 | 94 | −2.07 | 0.04* | −3.78E-04 |
| Rb-associated | ||||||
| DPL1550a | 3.62E-02 | 1.63E-02 | 94 | 2.21 | 0.03* | 1.59E-02 |
| CM0007a | 2.37E-02 | 1.15E-02 | 94 | 2.05 | 0.04* | 1.21E-02 |
| BNL1414a | 2.40E-02 | 1.16E-02 | 94 | 2.06 | 0.04* | 7.14E-03 |