Abstract
Linear mixed model (LMM) analysis has recently been used extensively to estimate additive genetic variances and narrow-sense heritability in many genomic studies. While LMM analysis is computationally less intensive than Bayesian algorithms, it remains infeasible for large-scale genomic data sets. In this paper, we advocate the use of a statistical procedure known as symmetric differences squared (SDS), which may serve as a viable alternative when LMM methods have difficulty with, or fail on, large data sets. The SDS procedure is a general and computationally simple method based only on least squares regression analysis. We carry out computer simulations and empirical analyses to compare the SDS procedure with two commonly used LMM-based procedures. Our results show that the SDS method is less precise than the LMM methods for small data sets, but it becomes progressively better and can match the precision of the LMM estimates for data sets with large sample sizes. Its major advantage is that, as samples grow larger, it continues to work with increasing precision of estimation, whereas the commonly used LMM methods can no longer run under current typical computing capacity. These results suggest that the SDS method can serve as a viable alternative, particularly when analyzing 'big' genomic data sets.
Year: 2014; PMID: 25025305; PMCID: PMC4099369; DOI: 10.1371/journal.pone.0102715
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
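The abstract describes SDS as relying only on least squares regression. The Python sketch below illustrates one way such an estimator can be written, assuming the model y = mu + g + e with Var(g) = A·σ²a and Var(e) = I·σ²e, so that E[(y_i − y_j)²] = (A_ii + A_jj − 2A_ij)·σ²a + 2σ²e. This model, the function name `sds_heritability`, the marker-based relatedness matrix `A`, and the simulated data are assumptions made for illustration; this is not the authors' SDS/R implementation.

```python
import numpy as np


def sds_heritability(y, A):
    """Least-squares estimate of h2 from squared pairwise differences.

    Assumed model (an illustration, not taken from the paper):
    E[(y_i - y_j)^2] = (A_ii + A_jj - 2*A_ij) * var_a + 2 * var_e.
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    i, j = np.triu_indices(len(y), k=1)        # all pairs with i < j
    d2 = (y[i] - y[j]) ** 2                    # response: squared differences
    diag = np.diag(A)
    x = diag[i] + diag[j] - 2.0 * A[i, j]      # predictor from relatedness
    X = np.column_stack([np.ones_like(x), x])  # intercept + slope design
    coef, *_ = np.linalg.lstsq(X, d2, rcond=None)
    var_e = coef[0] / 2.0                      # intercept estimates 2 * var_e
    var_a = coef[1]                            # slope estimates var_a
    return var_a / (var_a + var_e), var_a, var_e


if __name__ == "__main__":
    # Hypothetical simulated sample: n individuals, m unlinked markers.
    rng = np.random.default_rng(1)
    n, m, h2_true = 300, 500, 0.5
    p = rng.uniform(0.1, 0.9, size=m)                  # allele frequencies
    geno = rng.binomial(2, p, size=(n, m))             # 0/1/2 genotype codes
    Z = (geno - 2 * p) / np.sqrt(2 * p * (1 - p))      # standardized genotypes
    A = Z @ Z.T / m                                    # realized relatedness
    g = Z @ rng.normal(0.0, np.sqrt(h2_true / m), size=m)
    e = rng.normal(0.0, np.sqrt(1 - h2_true), size=n)
    h2_hat, var_a, var_e = sds_heritability(g + e, A)
    print(f"h2 estimate: {h2_hat:.3f}")
```

Nothing constrains the regression coefficients, so an estimate from a sketch like this can fall outside the interval [0, 1]. That behaviour is in line with the SDS ranges above 1 in the simulation table and the negative lower confidence bounds in the fruit fly table.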
Figure 1. Effects of marker density on correlation between realized genetic relatedness and its expected value under the AR1 model.
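Means and 90% ranges of 100 SDS and REML estimates of narrow-sense heritability (h²) for 27 simulation trials.ᵃ
| h² | n | m | Realized: mean±SD | Realized: 90% range | SDS: mean±SD | SDS: 90% range | REML: mean±SD | REML: 90% range |
|---|---|---|---|---|---|---|---|---|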
| 0.2 | 500 | 200 | 0.187±0.030 | 0.146–0.243 | 0.193±0.078 | 0.081–0.313 | 0.187±0.047 | 0.121–0.249 |
| 0.2 | 500 | 2000 | 0.190±0.032 | 0.140–0.244 | 0.197±0.077 | 0.082–0.342 | 0.199±0.065 | 0.095–0.303 |
| 0.2 | 500 | 20000 | 0.186±0.031 | 0.136–0.238 | 0.184±0.088 | 0.054–0.319 | 0.192±0.069 | 0.095–0.300 |
| 0.2 | 1000 | 200 | 0.194±0.029 | 0.153–0.243 | 0.198±0.069 | 0.106–0.337 | 0.197±0.039 | 0.134–0.261 |
| 0.2 | 1000 | 2000 | 0.189±0.022 | 0.157–0.223 | 0.192±0.045 | 0.118–0.262 | 0.193±0.040 | 0.132–0.263 |
| 0.2 | 1000 | 20000 | 0.191±0.024 | 0.155–0.228 | 0.201±0.086 | 0.091–0.342 | 0.194±0.048 | 0.113–0.273 |
| 0.2 | 5000 | 200 | 0.199±0.011 | 0.181–0.216 | 0.196±0.041 | 0.136–0.275 | 0.198±0.011 | 0.179–0.216 |
| 0.2 | 5000 | 2000 | 0.199±0.013 | 0.181–0.218 | 0.198±0.039 | 0.149–0.269 | 0.199±0.016 | 0.173–0.223 |
| 0.2 | 5000 | 20000 | 0.200±0.011 | 0.184–0.217 | 0.204±0.050 | 0.124–0.297 | 0.201±0.019 | 0.174–0.233 |
| 0.5 | 500 | 200 | 0.476±0.049 | 0.407–0.562 | 0.492±0.138 | 0.296–0.725 | 0.482±0.048 | 0.415–0.561 |
| 0.5 | 500 | 2000 | 0.480±0.052 | 0.394–0.564 | 0.495±0.119 | 0.300–0.702 | 0.499±0.070 | 0.384–0.602 |
| 0.5 | 500 | 20000 | 0.474±0.051 | 0.386–0.556 | 0.480±0.150 | 0.271–0.694 | 0.480±0.085 | 0.337–0.602 |
| 0.5 | 1000 | 200 | 0.487±0.046 | 0.419–0.562 | 0.494±0.114 | 0.344–0.683 | 0.495±0.035 | 0.435–0.548 |
| 0.5 | 1000 | 2000 | 0.481±0.036 | 0.428–0.535 | 0.482±0.072 | 0.374–0.613 | 0.491±0.047 | 0.417–0.565 |
| 0.5 | 1000 | 20000 | 0.485±0.038 | 0.423–0.542 | 0.511±0.144 | 0.343–0.774 | 0.490±0.050 | 0.408–0.563 |
| 0.5 | 5000 | 200 | 0.498±0.017 | 0.470–0.525 | 0.497±0.063 | 0.408–0.615 | 0.497±0.010 | 0.479–0.513 |
| 0.5 | 5000 | 2000 | 0.497±0.020 | 0.468–0.528 | 0.498±0.052 | 0.428–0.595 | 0.498±0.016 | 0.472–0.521 |
| 0.5 | 5000 | 20000 | 0.499±0.017 | 0.474–0.526 | 0.507±0.069 | 0.411–0.609 | 0.501±0.022 | 0.464–0.534 |
| 0.8 | 500 | 200 | 0.782±0.034 | 0.733–0.837 | 0.802±0.177 | 0.538–1.096 | 0.790±0.022 | 0.753–0.828 |
| 0.8 | 500 | 2000 | 0.785±0.035 | 0.722–0.838 | 0.801±0.135 | 0.592–1.022 | 0.797±0.044 | 0.725–0.860 |
| 0.8 | 500 | 20000 | 0.781±0.035 | 0.715–0.833 | 0.797±0.202 | 0.536–1.114 | 0.783±0.064 | 0.670–0.874 |
| 0.8 | 1000 | 200 | 0.790±0.031 | 0.742–0.837 | 0.797±0.137 | 0.612–1.008 | 0.797±0.014 | 0.775–0.819 |
| 0.8 | 1000 | 2000 | 0.786±0.024 | 0.749–0.821 | 0.785±0.084 | 0.652–0.953 | 0.795±0.027 | 0.754–0.842 |
| 0.8 | 1000 | 20000 | 0.789±0.026 | 0.746–0.825 | 0.829±0.191 | 0.573–1.189 | 0.796±0.032 | 0.736–0.841 |
| 0.8 | 5000 | 200 | 0.799±0.011 | 0.780–0.815 | 0.799±0.081 | 0.687–0.955 | 0.798±0.004 | 0.792–0.806 |
| 0.8 | 5000 | 2000 | 0.798±0.013 | 0.779–0.817 | 0.800±0.057 | 0.715–0.899 | 0.799±0.007 | 0.787–0.809 |
| 0.8 | 5000 | 20000 | 0.799±0.011 | 0.783–0.816 | 0.809±0.088 | 0.665–0.937 | 0.801±0.013 | 0.778–0.819 |
*Indicates a significant deviation from h² according to a t-test.
ᵃ The 27 simulation trials consist of three levels of true narrow-sense heritability (h²), three sample sizes (n), and three marker densities (m). In each simulation sample, h² is estimated by the symmetric differences squared (SDS) method implemented in our R package, SDS/R, and by the residual maximum likelihood (REML) method implemented in the GCTA software.
SD = standard deviation.
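Each row of the simulation table summarizes 100 replicate estimates by a mean, an SD, and a 90% range. The sketch below shows one way to compute such a summary in Python. It assumes that the 90% range runs from the 5th to the 95th percentile and that the SD is the sample SD (divisor n − 1); neither detail is stated here. The function name `summarize_replicates` and the replicate values are hypothetical placeholders, not results from the paper.

```python
import numpy as np


def summarize_replicates(estimates, coverage=0.90):
    """Return (mean, sample SD, lower bound, upper bound) of replicate estimates.

    The range is taken as the central `coverage` fraction of the replicates,
    which is an assumption about how the 90% ranges were formed.
    """
    est = np.asarray(estimates, dtype=float)
    tail = (1.0 - coverage) / 2.0 * 100.0          # e.g. 5 for a 90% range
    lo, hi = np.percentile(est, [tail, 100.0 - tail])
    return est.mean(), est.std(ddof=1), lo, hi


if __name__ == "__main__":
    # Placeholder replicates drawn at random purely to exercise the function.
    rng = np.random.default_rng(0)
    fake_estimates = rng.normal(0.5, 0.07, size=100)
    mean, sd, lo, hi = summarize_replicates(fake_estimates)
    print(f"{mean:.3f}±{sd:.3f} | {lo:.3f}–{hi:.3f}")
```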
Actual computational efficiency of the SDS and REML procedures for samples of seven sizes (n).ᵃ
| n | REML (GCTA): time (s) | REML (GCTA): RAM (GB) | REML (rrBLUP): time (s) | REML (rrBLUP): RAM (GB) | SDS: time (s) | SDS: RAM (GB) |
|---|---|---|---|---|---|---|
| 500 | 0.235 | <0.01 | 1.438 | 0.27 | 0.054 | 0.26 |
| 1000 | 0.746 | 0.02 | 17.856 | 0.42 | 0.061 | 0.32 |
| 2000 | 3.881 | 0.33 | 112.780 | 0.94 | 0.084 | 0.42 |
| 5000 | 58.451 | 1.91 | 2738.159 | 4.77 | 0.231 | 0.59 |
| 10000 | 226.827 | 7.46 | 9054.193 | 19.33 | 0.756 | 1.14 |
| 20000 | 1610.518 | 28.90 | NA | 65.60 | 2.851 | 2.05 |
| 40000 | NA | NA | NA | NA | 11.286 | 5.13 |
ᵃ The computational times in seconds (s) and memory requirements in gigabytes (GB) needed to analyze a simulated sample of size n, with true heritability h² = 0.5 and marker density m = 2000, by the symmetric differences squared (SDS) method and by two residual maximum likelihood (REML) implementations, GCTA and rrBLUP. Each time and memory requirement is an average over five simulation samples.
NA indicates that no information is available due to termination of the analysis.
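The timings in the table can be condensed into an empirical scaling exponent by fitting a straight line to log(time) against log(n). The Python sketch below does this with the table's own values, omitting NA entries. The slope is only a descriptive summary over the measured range, not a statement about asymptotic complexity, and the helper name `scaling_exponent` is hypothetical.

```python
import numpy as np

# Sample sizes and run times (s) copied from the table above; NA entries omitted.
n = np.array([500, 1000, 2000, 5000, 10000, 20000, 40000], dtype=float)
times = {
    "GCTA": [0.235, 0.746, 3.881, 58.451, 226.827, 1610.518],
    "rrBLUP": [1.438, 17.856, 112.780, 2738.159, 9054.193],
    "SDS": [0.054, 0.061, 0.084, 0.231, 0.756, 2.851, 11.286],
}


def scaling_exponent(sizes, seconds):
    """Slope of the least-squares line of log(seconds) on log(sizes)."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(seconds), 1)
    return slope


# Each method is paired with the first len(t) sample sizes it completed.
exponents = {
    name: scaling_exponent(n[: len(t)], np.array(t, dtype=float))
    for name, t in times.items()
}

for name, k in exponents.items():
    print(f"{name}: time grows roughly as n^{k:.2f} over the measured range")
```

For SDS, the smallest samples take only a few hundredths of a second, so fixed overhead probably flattens the fitted slope. A fit restricted to the larger sample sizes would give a steeper estimate.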
SDS and REML estimates of heritability for wheat grain yield in four environments (E1–E4).ᵃ
| Environment | SDS: estimate | SDS: SD | SDS: CI95 | REML: estimate | REML: SD | REML: CI95 |
|---|---|---|---|---|---|---|
| E1 | 0.564 | 0.086 | 0.362–0.706 | 0.498 | 0.049 | 0.399–0.589 |
| E2 | 0.452 | 0.114 | 0.192–0.629 | 0.448 | 0.048 | 0.324–0.520 |
| E3 | 0.379 | 0.070 | 0.223–0.497 | 0.423 | 0.061 | 0.305–0.544 |
| E4 | 0.481 | 0.071 | 0.304–0.587 | 0.430 | 0.058 | 0.292–0.524 |
ᵃ Heritability estimates for the wheat data set of Crossa et al. [33], obtained by the symmetric differences squared (SDS) method and by a residual maximum likelihood (REML) method, GCTA.
The 95% confidence intervals (CI95) are constructed from 1000 bootstrap samples.
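The footnote above states only that the intervals come from 1000 bootstrap samples. The Python sketch below shows a generic percentile bootstrap under the assumption that observations are resampled with replacement and that the interval is read off the 2.5th and 97.5th percentiles of the re-estimated statistic. How the authors actually resampled is not stated here, so the function `percentile_bootstrap_ci` and its arguments are hypothetical.

```python
import numpy as np


def percentile_bootstrap_ci(statistic, data, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` evaluated on `data`.

    Rows of `data` are resampled with replacement `n_boot` times, and the
    (alpha/2, 1 - alpha/2) percentiles of the replicates form the interval.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample indices with replacement
        reps[b] = statistic(data[idx])
    lo, hi = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi


if __name__ == "__main__":
    # Placeholder data; the sample mean stands in for a heritability estimator.
    rng = np.random.default_rng(42)
    sample = rng.normal(0.0, 1.0, size=200)
    lo, hi = percentile_bootstrap_ci(np.mean, sample)
    print(f"95% CI for the mean: {lo:.3f} to {hi:.3f}")
```

For heritability, `statistic` would have to re-estimate h² on each resample, which means resampling phenotypes together with the matching rows and columns of the relatedness matrix. Duplicated individuals then appear as pairs with zero phenotypic difference, so a real analysis would need to decide how to handle them.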
SDS and REML estimates of heritability for bristle number in a large wild-caught cohort of fruit fly (Drosophila melanogaster).ᵃ
| Sex | Trait | SDS: estimate | SDS: SD | SDS: 95% CI | REML: estimate | REML: SD | REML: 95% CI |
|---|---|---|---|---|---|---|---|
| Female | ABN | 0.035 | 0.027 | −0.028–0.074 | 0.000 | 0.004 | 0.000–0.015 |
| Female | SBN | 0.008 | 0.023 | −0.034–0.053 | 0.000 | 0.006 | 0.000–0.020 |
| Male | ABN | 0.090 | 0.039 | −0.015–0.136 | 0.014 | 0.013 | 0.000–0.043 |
| Male | SBN | 0.005 | 0.027 | −0.033–0.072 | 0.000 | 0.009 | 0.000–0.031 |
ᵃ Heritability estimates for the fruit fly data set of Macdonald et al. [34], obtained by the symmetric differences squared (SDS) method and by a residual maximum likelihood (REML) method, GCTA.
ABN = abdominal bristle number and SBN = sternopleural bristle number.
The 95% confidence intervals (95% CI) are constructed from 1000 bootstrap samples.