Mitchell J Feldmann, Hans-Peter Piepho, William C Bridges, Steven J Knapp.
Abstract
The development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations has been driven by advances in high-throughput genotyping. For many complex traits, the underlying genetic variation is caused by the segregation of one or more 'large-effect' loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance (p) and marker heritability for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate p and marker heritability. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. Algebraic solutions for bias-correcting estimates of p and marker heritability were found that depend only on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have not previously been used in complex trait analyses, yielded unbiased estimates of p and marker heritability, in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of p and marker heritability. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.
Year: 2021 PMID: 34437540 PMCID: PMC8425577 DOI: 10.1371/journal.pgen.1009762
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
REML estimates of marker-associated variance, the fraction of the genetic variance explained by markers (p), and marker heritability from random marker effects analyses, and coefficients of determination (R²) from Type II and Type III fixed marker effects analyses, for large-effect loci identified in cattle, sunflower, and strawberry studies.
| Study | Source | df | k | Variance component (AMV) | p (AMV) | Heritability (AMV) | Variance component (ASV) | p (ASV) | Heritability (ASV) | Type II R² | Type III R² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cattle White Spotting | | 25 | — | 7.92 | — | 0.76 | 3.88 | — | 0.37 | — | — |
| | | 2 | 0.35 | 0.62 | — | 0.06 | 0.21 | — | 0.02 | 0.04 | 0.00 |
| | | 2 | 0.41 | 2.91 | — | 0.28 | 1.20 | — | 0.11 | 0.21 | 0.08 |
| | | 2 | 0.54 | 3.81 | — | 0.37 | 2.04 | — | 0.20 | 0.23 | 0.10 |
| | | 4 | 0.58 | 0.00 | — | 0.00 | 0.00 | — | 0.00 | 0.00 | 0.00 |
| | | 4 | 0.67 | 0.00 | — | 0.00 | 0.00 | — | 0.00 | 0.01 | 0.01 |
| | | 4 | 0.70 | 0.37 | — | 0.04 | 0.26 | — | 0.02 | 0.01 | 0.01 |
| | | 7 | 0.77 | 0.22 | — | 0.02 | 0.17 | — | 0.02 | 0.01 | 0.01 |
| | | 2,935 | — | 5.26 | — | — | 5.26 | — | — | — | — |
| Sunflower Oil Content | Entry | 145 | — | 21.61 | — | 0.95 | 21.61 | — | 0.95 | — | — |
| | | 145 | — | 30.76 | 1.42 | 1.35 | 22.15 | 1.02 | 0.98 | — | — |
| | | 7 | — | 17.85 | 0.83 | 0.79 | 9.24 | 0.43 | 0.41 | — | — |
| | | 1 | 0.48 | 11.57 | 0.54 | 0.51 | 5.59 | 0.26 | 0.25 | 0.21 | 0.26 |
| | | 1 | 0.47 | 1.26 | 0.06 | 0.06 | 0.60 | 0.03 | 0.03 | 0.02 | 0.04 |
| | | 1 | 0.49 | 2.9 | 0.13 | 0.13 | 1.41 | 0.07 | 0.06 | 0.05 | 0.10 |
| | | 1 | 0.77 | 0.21 | 0.01 | 0.01 | 0.17 | 0.01 | 0.01 | 0.01 | 0.01 |
| | | 1 | 0.78 | 1.89 | 0.09 | 0.08 | 1.46 | 0.07 | 0.06 | 0.03 | 0.04 |
| | | 1 | 0.77 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | 1 | 0.88 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 |
| | | 138 | — | 12.91 | 0.60 | 0.57 | 12.91 | 0.60 | 0.57 | — | — |
| | Residual | 144 | — | 2.07 | — | — | 2.07 | — | — | — | — |
| Strawberry Fusarium Wilt | Entry | 557 | — | 3.26 | — | 0.98 | 3.26 | — | 0.98 | — | — |
| | | 557 | — | 4.77 | 1.46 | 1.44 | 2.39 | 0.73 | 0.72 | — | — |
| | | 2 | 0.47 | 4.48 | 1.37 | 1.35 | 2.09 | 0.64 | 0.63 | 0.84 | 0.84 |
| | | 555 | — | 0.30 | 0.09 | 0.09 | 0.30 | 0.09 | 0.09 | — | — |
| | Residual | 1,631 | — | 0.23 | — | — | 0.23 | — | — | — | — |
| Strawberry Fusarium Wilt | Entry | 540 | — | 3.30 | — | 0.98 | 3.30 | — | 0.98 | — | — |
| | | 540 | — | 4.01 | 1.21 | 1.20 | 3.45 | 1.05 | 1.03 | — | — |
| | | 2 | 0.62 | 1.48 | 0.45 | 0.44 | 0.93 | 0.28 | 0.28 | 0.22 | 0.22 |
| | | 538 | — | 2.53 | 0.77 | 0.75 | 2.53 | 0.77 | 0.75 | — | — |
| | Residual | 1,584 | — | 0.23 | — | — | 0.23 | — | — | — | — |
Statistics are shown for three marker loci (rs10, rs45, and rs20) associated with genetic variation for white spotting (%) in a cattle population (n = 2,973) with a single phenotypic observation per individual and highly unbalanced marker data [85]. The marker loci were identified by GWAS. The linear mixed model for the cattle analysis was identical to that for the sunflower analysis without replications (r = 1). k coefficient equations for three loci with unbalanced data are shown in S3 Text.
Statistics are shown for three marker loci (BR, PHY, and HYP) associated with genetic variation for seed oil content (%) in a sunflower recombinant inbred line (RIL) population (n = 146) with nearly balanced marker data and multiple phenotypic observations (replications) per RIL [53]. The marker loci were identified by QTL mapping. Variance components were estimated from LMM (27) for the AMV method and LMM (S13) for the ASV method.
Statistics are shown for two SNP markers (AX396 and AX493) associated with genetic variation for resistance to Fusarium wilt in a strawberry population (n = 565) with unbalanced SNP marker data and multiple phenotypic observations per individual [86]. AX396 and AX493 are tightly linked, and both were in LD with a dominant gene (FW1) conferring resistance to Fusarium wilt, but they had significantly different genotypic ratios among individuals in the population. Variance components were estimated from LMM (2) for the AMV method and LMM (15) for the ASV method. The k coefficients for a single locus with unbalanced data are shown in S1 Text.
Type II R² is the coefficient of partial determination estimated from a Type II ANOVA, where the main and interaction effects of markers are fixed. For the cattle example, the reduction in sums of squares (SS) for each main effect was estimated with the other main effects in the genetic model without interactions, e.g., the reduction in SS for rs10 was R(rs10|rs45, rs20). Similarly, the reduction in SS for each two-locus interaction was estimated without main or three-way interaction effects in the genetic model, e.g., the Type II reduction in SS for the rs10 × rs45 interaction was R(rs10 × rs45|rs45, rs20, rs10 × rs20, rs45 × rs20), and so on for the other two-locus interactions. Finally, the reduction in SS for the three-locus interaction was R(rs10 × rs45 × rs20|rs10, rs45, rs20, rs10 × rs45, rs10 × rs20, rs45 × rs20).
Type III R² is the coefficient of partial determination estimated from a Type III ANOVA, where the main and interaction effects of markers are fixed, e.g., the reduction in sums of squares for rs10 in the cattle example was estimated by fitting rs10 with all other factors in the model: R(rs10|rs45, rs20, rs10 × rs45, rs10 × rs20, rs45 × rs20, rs10 × rs45 × rs20).
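As a rough consistency check on the AMV and ASV variance-component columns, the sketch below recomputes two quantities from values transcribed from the table. It rests on two assumptions that are read off the printed numbers rather than stated in the footnotes: that p is the marker-associated variance divided by the entry variance, and that each ASV estimate is approximately the corresponding AMV estimate multiplied by its k coefficient. The names `P_ROWS`, `K_ROWS`, `fraction_explained`, and `scale_by_k` are hypothetical.

```python
# Values transcribed from the table (rounded as printed there).
# Assumption: p = marker-associated variance / entry variance.
P_ROWS = [
    # (label, entry variance, marker variance, tabulated p)
    ("sunflower AMV", 21.61, 30.76, 1.42),
    ("sunflower ASV", 21.61, 22.15, 1.02),
    ("strawberry (557 df) AMV", 3.26, 4.77, 1.46),
    ("strawberry (557 df) ASV", 3.26, 2.39, 0.73),
    ("strawberry (540 df) AMV", 3.30, 4.01, 1.21),
    ("strawberry (540 df) ASV", 3.30, 3.45, 1.05),
]

# Assumption: ASV estimate is roughly k * AMV estimate for a single source.
K_ROWS = [
    # (label, k, AMV estimate, tabulated ASV estimate)
    ("cattle, k = 0.35", 0.35, 0.62, 0.21),
    ("cattle, k = 0.41", 0.41, 2.91, 1.20),
    ("cattle, k = 0.54", 0.54, 3.81, 2.04),
    ("sunflower, k = 0.48", 0.48, 11.57, 5.59),
    ("sunflower, k = 0.47", 0.47, 1.26, 0.60),
    ("sunflower, k = 0.49", 0.49, 2.9, 1.41),
    ("strawberry, k = 0.47", 0.47, 4.48, 2.09),
    ("strawberry, k = 0.62", 0.62, 1.48, 0.93),
]


def fraction_explained(marker_var: float, entry_var: float) -> float:
    """Fraction of the entry (genetic) variance attributed to markers."""
    return marker_var / entry_var


def scale_by_k(k: float, amv_estimate: float) -> float:
    """AMV estimate multiplied by the design-specific k coefficient."""
    return k * amv_estimate


if __name__ == "__main__":
    for label, entry_var, marker_var, p_tab in P_ROWS:
        p = fraction_explained(marker_var, entry_var)
        print(f"{label}: recomputed p = {p:.3f}, tabulated p = {p_tab:.2f}")
    for label, k, amv, asv_tab in K_ROWS:
        asv = scale_by_k(k, amv)
        print(f"{label}: k * AMV = {asv:.3f}, tabulated ASV = {asv_tab:.2f}")
```

Because every input is rounded to two decimals, the recomputed values should only be expected to agree with the table to within rounding error. Values of p above 1 in the AMV columns are the overestimation the abstract describes.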
Type I, II, and III sums of squares for fixed effect analyses of markers associated with QTL identified in GWAS and QTL mapping experiments in cattle and sunflower.
| Study | Source | Type I SS (ABC) | Type I SS (ACB) | Type I SS (BAC) | Type I SS (BCA) | Type I SS (CAB) | Type I SS (CBA) | Type II SS | Type III SS |
|---|---|---|---|---|---|---|---|---|---|
| Cattle White Spotting | | 3,552.3 | 3,552.3 | 1,707.2 | 591.4 | 1,208.7 | 591.4 | 542.5 | 22.1 |
| | | 6,539.7 | 4,259.6 | 8,384.8 | 8,384.8 | 4,259.6 | 4,876.8 | 4,282.7 | 1,394.9 |
| | | 4,880.5 | 7,160.7 | 4,880.5 | 5,996.4 | 9,504.4 | 9,504.4 | 4,834.3 | 1,788.4 |
| | | 12.7 | 12.7 | 12.7 | 12.7 | 12.7 | 12.7 | 14.3 | 47.4 |
| | | 132.7 | 132.7 | 132.7 | 132.7 | 132.7 | 132.7 | 107.4 | 234.0 |
| | | 193.1 | 193.1 | 193.1 | 193.1 | 193.1 | 193.1 | 193.1 | 91.5 |
| | | 143.5 | 143.5 | 143.5 | 143.5 | 143.5 | 143.5 | 143.5 | 143.5 |
| | | 15,512.9 | 15,512.9 | 15,512.9 | 15,512.9 | 15,512.9 | 15,512.9 | 15,512.9 | 15,512.9 |
| Sunflower Oil Content | | 1,624.0 | 1,624.0 | 1,708.8 | 1,904.0 | 1,829.7 | 1,904.0 | 1,881.4 | 1,711.2 |
| | | 298.2 | 254.2 | 213.4 | 213.4 | 254.3 | 180.0 | 220.2 | 208.3 |
| | | 537.1 | 581.0 | 537.1 | 342.0 | 375.4 | 375.4 | 507.0 | 511.6 |
| | | 57.9 | 57.9 | 57.9 | 57.9 | 57.9 | 57.9 | 49.7 | 50.0 |
| | | 168.0 | 168.0 | 168.0 | 168.0 | 168.0 | 168.0 | 172.1 | 195.5 |
| | | 11.1 | 11.1 | 11.1 | 11.1 | 11.1 | 11.1 | 11.1 | 7.6 |
| | | 36.6 | 36.6 | 36.6 | 36.6 | 36.6 | 36.6 | 36.6 | 36.6 |
| | | 4,113.4 | 4,113.4 | 4,113.4 | 4,113.4 | 4,113.4 | 4,113.4 | 4,113.4 | 4,113.4 |
| | Residual | 553.8 | 553.8 | 553.8 | 553.8 | 553.8 | 553.8 | 553.8 | 553.8 |
For each Type I ANOVA, the six possible orders of the three main effects (marker loci A, B, and C) were tested in the genetic model, where A = rs10, B = rs45, and C = rs20 for the cattle example and A = BR, B = PHY, and C = HYP for the sunflower example. The interactions were added to the genetic model in a single sequence: A × B, A × C, B × C, and A × B × C. The three letters indicate the sequence in which marker loci entered the genetic model, e.g., for the ABC order, the sums of squares for the three main effects were SS(A|μ), SS(B|A, μ), and SS(C|A, B, μ), where μ is the population mean and factors to the right of the vertical bar were included in the model. Similarly, for the CBA order, the sums of squares for the three main effects were SS(C|μ), SS(B|C, μ), and SS(A|B, C, μ). The sequence in which interactions were added to the genetic model was identical in the six Type I analyses, e.g., the sum of squares for the A × B interaction was SS(A × B|A, B, C, μ) and for the three-way interaction was SS(A × B × C|A, B, C, A × B, A × C, B × C, μ).
Statistics are shown for three marker loci (rs10, rs45, and rs20) associated with genetic variation for white spotting (%) in a cattle population (n = 2,973) with a single phenotypic observation per individual and highly unbalanced marker data [85]. The markers were identified by GWAS. The linear model for the cattle analysis was identical to the linear model for the sunflower analysis without replications (r = 1); hence, the residual in the cattle analysis was the entry nested in marker source of variation. k coefficients for three loci with unbalanced data are shown in S3 Text.
Statistics are shown for three marker loci (BR, PHY, and HYP) associated with genetic variation for seed oil content (%) in a sunflower recombinant inbred line (RIL) population (n = 146) with nearly balanced marker data and multiple phenotypic observations (replications) per RIL [53]. k coefficients for three loci with unbalanced data are shown in S3 Text.
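The order dependence of Type I (sequential) sums of squares described for the ABC to CBA analyses can be illustrated with a small, self-contained sketch. It uses made-up unbalanced data for three two-level factors and fits main effects only (no interactions) by ordinary least squares. The data, the helper names (`solve`, `rss`, `type1_ss`), and the dummy coding are all assumptions for illustration and are not the authors' analysis.

```python
from itertools import permutations

# Hypothetical unbalanced data: (A, B, C, y). Not from the cattle or sunflower studies.
RAW = [
    (0, 0, 0, 10.0), (0, 0, 0, 9.0), (0, 0, 0, 11.0), (0, 0, 1, 12.0),
    (0, 1, 0, 11.0), (0, 1, 1, 15.0), (0, 1, 1, 14.0), (1, 0, 0, 14.0),
    (1, 0, 1, 13.0), (1, 1, 0, 18.0), (1, 1, 0, 17.0), (1, 1, 1, 20.0),
    (1, 1, 1, 21.0),
]
DATA = [dict(A=a, B=b, C=c, y=y) for a, b, c, y in RAW]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(data, factors):
    """Residual SS after fitting the mean plus main effects of `factors`."""
    # Dummy coding: the first level of each factor is the reference level.
    levels = {f: sorted({rec[f] for rec in data})[1:] for f in factors}
    rows = [
        [1.0] + [1.0 if rec[f] == lv else 0.0 for f in factors for lv in levels[f]]
        for rec in data
    ]
    y = [rec["y"] for rec in data]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def type1_ss(data, order):
    """Sequential SS: each factor is credited with the RSS drop when it enters."""
    out, fitted = {}, []
    prev = rss(data, fitted)
    for f in order:
        fitted.append(f)
        cur = rss(data, fitted)
        out[f] = prev - cur
        prev = cur
    return out


if __name__ == "__main__":
    for order in permutations("ABC"):
        ss = type1_ss(DATA, order)
        print("".join(order), {f: round(v, 2) for f, v in ss.items()})
```

In this sketch the sum of squares for a factor depends only on which factors entered before it, so A receives the same value in the ABC and ACB orders and the same value in the BCA and CBA orders, which mirrors the pattern of repeated entries in the Type I columns of the table. With balanced data the six orders would give identical values.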
Fig 1. Accuracy of AMV and ASV estimators of marker heritability.
AMV and ASV estimates of marker heritability are shown for 1,000 segregating populations simulated for different numbers of entries (n individuals, families, or strains), five replications/entry (r = 5), true marker heritability ranging from 0 to 1, and one to three marker loci with three genotypes/marker locus. AMV estimates of marker heritability (red highlighted observations) and ASV estimates of marker heritability (blue highlighted observations) are shown for: (A) one locus with balanced data for n = 540 entries (study design 1); (B) two marker loci with interaction (M1, M2, and M1×M2) and balanced data for n = 540 (study design 2); (C) three marker loci with interactions (M1, M2, M3, M1×M2, M1×M3, M2×M3, and M1×M2×M3) and balanced data for n = 540 (study design 3); (D) a population segregating 1:2:1 for one marker locus with 135 entries for each homozygote and 270 heterozygous entries, and n = 540 (study design 4); (E) one locus with 10% randomly missing data among 540 entries (study design 5); and (F) one locus with 33% randomly missing data among 540 entries (study design 6). Study design details are shown in S1 Table.
Fig 2. Effect of r, n, and marker heritability on the relative bias of AMV and ASV estimators of marker heritability.
(A and B) Phenotypic observations were simulated for 1,000 populations segregating for a single marker locus with three genotypes, n = 900 progeny, and r = 1, 2, 5, 10, or 20 (study designs 7–11). The marker locus was assumed to be in complete linkage disequilibrium with a single QTL that explains 50% of the phenotypic variance. (A) Distribution of the relative biases of AMV estimates of marker heritability for different r. The relative bias was identical for different r. (B) Distribution of the relative biases of ASV estimates of marker heritability for different r. The relative bias was identical for different r. (C and D) Phenotypic observations were simulated for 1,000 populations segregating for a single marker locus with three genotypes, five replications/entry (r = 5), and n = 450, 900, 1,800, 3,600, or 7,200 entries/population (study designs 12–16). The marker locus was assumed to be in complete linkage disequilibrium with a single QTL that explains 50% of the phenotypic variance. (C) Distribution of the relative biases of AMV estimates of marker heritability for different n. The relative bias was identical across the variables tested. (D) Distribution of the relative biases of ASV estimates of marker heritability for different n. The relative bias was identical across the variables tested. (E and F) Phenotypic observations were simulated for 1,000 populations segregating for a single marker locus with three genotypes, five replications/entry (r = 5), and n = 450 entries/population. The marker locus was assumed to be in complete linkage disequilibrium with a single QTL that explains 5–95% of the phenotypic variance (study designs 17–21). (E) Distribution of the relative biases of AMV estimates of marker heritability for different true marker heritabilities. The relative bias was identical across the variables tested. (F) Distribution of the relative biases of ASV estimates of marker heritability for different true marker heritabilities. The relative bias was identical across the variables tested.
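The constant relative bias described in this caption can be mimicked with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' simulation: it uses a balanced one-locus design with three genotypes, a method-of-moments (ANOVA) estimator in place of REML, fixed genotype effects of -1, 0, and 1, marker variance rather than marker heritability, and it assumes the bias correction for balanced data is k = (g - 1)/g for g marker genotypes. Relative bias is taken here as (mean estimate - true value)/true value. All function and parameter names are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def anova_marker_variance(rng, effects, n_per_class, r, sd_entry, sd_resid):
    """Method-of-moments estimate of marker variance for one simulated population."""
    g = len(effects)
    entry_means = []  # one list of entry means per marker genotype
    for m in effects:
        cls = []
        for _ in range(n_per_class):
            entry_effect = rng.gauss(0.0, sd_entry)
            obs = [m + entry_effect + rng.gauss(0.0, sd_resid) for _ in range(r)]
            cls.append(_mean(obs))
        entry_means.append(cls)
    class_means = [_mean(cls) for cls in entry_means]
    grand = _mean(class_means)
    # Mean squares for markers and for entries nested in markers.
    ms_marker = r * n_per_class * sum((c - grand) ** 2 for c in class_means) / (g - 1)
    ms_entry = r * sum(
        (e - c) ** 2 for cls, c in zip(entry_means, class_means) for e in cls
    ) / (g * (n_per_class - 1))
    # Not truncated at zero, unlike a REML estimate would be.
    return (ms_marker - ms_entry) / (r * n_per_class)


def relative_biases(n_sims=300, seed=1):
    """Return (relative bias of raw estimate, relative bias after scaling by k)."""
    rng = random.Random(seed)
    effects = [-1.0, 0.0, 1.0]            # assumed fixed genotype effects
    g = len(effects)
    centre = _mean(effects)
    true_var = sum((m - centre) ** 2 for m in effects) / g
    k = (g - 1) / g                        # assumed correction for balanced data
    raw = [
        anova_marker_variance(rng, effects, 50, 2, 1.0, 1.0) for _ in range(n_sims)
    ]
    raw_mean = _mean(raw)
    return (raw_mean - true_var) / true_var, (k * raw_mean - true_var) / true_var


if __name__ == "__main__":
    raw_bias, scaled_bias = relative_biases()
    print(f"relative bias, unscaled estimate: {raw_bias:.3f}")
    print(f"relative bias, estimate scaled by k: {scaled_bias:.3f}")
```

Under these assumptions the unscaled estimate targets the sum of squared genotype effects divided by g - 1 instead of g, so its relative bias should sit near 1/(g - 1) = 0.5 whatever the values of n and r, while the scaled estimate should sit near zero.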