Frederic Kosmowski, Jordan Chamberlin, Hailemariam Ayalew, Tesfaye Sida, Kibrom Abay, Peter Craufurd.
Abstract
Agricultural statistics and applied analyses have benefitted from moving from farmer estimates of yield to crop cut based estimates, now regarded as a gold standard. However, in practice, crop cuts and other sample-based protocols vary widely in the details of their implementations and little empirical work has documented how alternative yield estimation methods perform. Here, we undertake a well-measured experiment of multiple yield estimation methods on 237 smallholder maize plots in Amhara region, Ethiopia. We compare yield from a full plot harvest with farmer assessments and with estimates from a variety of field sampling protocols: W-walk, transect, random quadrant, random octant, center quadrant, and 3 diagonal quadrants. We find that protocol choices are important: alternative protocols vary considerably in their accuracy relative to the whole plot, with absolute mean errors ranging from 23 (farmer estimates) to 10.6 (random octant). Furthermore, while most methods approximate the sample mean reasonably well, the divergence of individual measures from true plot-level values can be considerable. We find that randomly positioned quadrants outperform systematic sampling schemes: the random octant had the best accuracy and was the most cost-effective. The nature of bias is non-classical: bias is correlated with plot size as well as with plot management characteristics. In summary, our results advocate that even "gold standard" crop cut measures should be interpreted cautiously, and more empirical work should be carried out to validate and extend our conclusions.
Keywords: Agricultural systems; Crop production; Crop yields; Farm survey data; Measurement errors; Sampling methods
Year: 2021 PMID: 34898811 PMCID: PMC8639447 DOI: 10.1016/j.foodpol.2021.102122
Source DB: PubMed Journal: Food Policy ISSN: 0306-9192 Impact factor: 4.552
Fig. 1. Overview of the yield measurement methods. Note: These methods were preceded by farmer prediction on crop production and followed by a full plot harvest. In the random quadrant and octant methods, the location of the sample unit is only presented as an example.
Yield estimation methods used in this study.
| Measure | Description | Average duration (in minutes) |
|---|---|---|
| Farmer est. | Farmer estimate | 1 |
| W-walk | W-walk with cob collection | 38 |
| Transect | Transect with cob collection | 35* |
| Random quadrant | Randomly placed 16 m² quadrant | 28 |
| Random octant | Random octant | 27** |
| Center quadrant | 16 m² quadrant in plot center | 14 |
| 3 diag. quadrants | 3 × 16 m² quadrants along plot diagonal | 41 |
| Full harvest | Full harvest | 218 |
Note: * The transect method also involved picture-taking of maize cobs, with QR codes on a dark background: mean duration is consequently higher than it would be under a strict application of the protocol; ** Enumerator time only, for data recording. The harvest was performed by hired laborers.
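To put the durations in perspective, the short sketch below expresses each protocol's mean duration as a share of the full harvest time. The figures are copied from the table above; the variable names are illustrative, and the caveats in the note (extra picture-taking time for the transect, enumerator-only time where flagged) still apply to the resulting ratios.

```python
# Mean durations in minutes, copied from the table above.
durations = {
    "Farmer est.": 1,
    "W-walk": 38,
    "Transect": 35,
    "Random quadrant": 28,
    "Random octant": 27,
    "Center quadrant": 14,
    "3 diag. quadrants": 41,
    "Full harvest": 218,
}

full_harvest_minutes = durations["Full harvest"]

# Share of the full-harvest duration taken by each protocol.
share_of_full = {
    name: minutes / full_harvest_minutes
    for name, minutes in durations.items()
}

for name, share in share_of_full.items():
    print(f"{name}: {share:.1%} of full harvest time")
```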
Fig. 2. Box plots of mean yields by estimation method. Qt/ha denotes quintals per hectare (1 quintal = 100 kg).
Pairwise correlations between sampling methods.
| | Full harvest | Farmer estimate | W-walk | Transect | Random quadrant | Random octant | Center quadrant | 3 diag. quadrants |
|---|---|---|---|---|---|---|---|---|
| Full harvest | 1 | | | | | | | |
| Farmer estimate | 0.287*** | 1 | | | | | | |
| W-walk | 0.256*** | 0.123 | 1 | | | | | |
| Transect | 0.267*** | 0.089 | 0.710 | 1 | | | | |
| Random quadrant | 0.629*** | 0.120 | 0.258 | 0.262*** | 1 | | | |
| Random octant | 0.812*** | 0.198** | 0.189** | 0.242*** | 0.553*** | 1 | | |
| Center quadrant | 0.545*** | 0.129* | 0.323 | 0.393* | 0.547*** | 0.513*** | 1 | |
| 3 diag. quadrants | 0.579*** | 0.125 | 0.251 | 0.357*** | 0.548*** | 0.551*** | 0.872*** | 1 |
Note: Spearman’s correlation coefficients are calculated on yield measurements from different methods. *** p < 0.001; ** p < 0.01; * p < 0.05.
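As a rough illustration of how a Spearman coefficient like those above is obtained, the sketch below ranks two yield series (averaging ranks for ties) and takes the Pearson correlation of the ranks. The yield values and function names are hypothetical, not data or code from the study, and no significance test is included.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical yields (qt/ha) for five plots, for illustration only.
full_harvest = [20.0, 35.0, 28.0, 50.0, 41.0]
random_octant = [22.0, 30.0, 31.0, 55.0, 39.0]
rho = spearman(full_harvest, random_octant)
print(round(rho, 3))
```

Because the coefficient depends only on ranks, it rewards a method for ordering plots correctly even when its yield levels are biased.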
Effect of plot cob distribution and variance on downward and upward measurement errors.
| | W-walk, Errors < 0 (Underest.) | W-walk, Errors > 0 (Overest.) | Transect, Errors < 0 (Underest.) | Transect, Errors > 0 (Overest.) | Random quadrant, Errors < 0 (Underest.) | Random quadrant, Errors > 0 (Overest.) | Random octant, Errors < 0 (Underest.) | Random octant, Errors > 0 (Overest.) | Center quadrant, Errors < 0 (Underest.) | Center quadrant, Errors > 0 (Overest.) | 3 diag. quadrants, Errors < 0 (Underest.) | 3 diag. quadrants, Errors > 0 (Overest.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total cob count | 0.27*** | 0.71*** | 0.17 | 0.76*** | −0.04 | 0.23*** | −0.18** | 0.22*** | −0.01 | 0.27*** | 0.06 | 0.27** |
| | (0.08) | (0.08) | (0.09) | (0.07) | (0.06) | (0.06) | (0.06) | (0.06) | (0.08) | (0.05) | (0.06) | (0.08) |
| Area in ha | −66.82* | −360.36*** | −27.93 | −388.81*** | −0.67 | −139.66* | 92.89** | −89.97* | 60.19 | −215.46*** | 44.03 | −204.72** |
| | (30.11) | (53.60) | (31.24) | (48.90) | (35.22) | (52.70) | (29.93) | (40.94) | (49.98) | (57.15) | (43.80) | (62.06) |
| Cob count (CV) | 0.26* | −0.06 | −0.02 | 0.27 | 0.13 | 0.13 | −0.17 | 0.19 | 0.28 | 0.46* | 0.27 | 0.46 |
| | (0.12) | (0.14) | (0.13) | (0.23) | (0.12) | (0.19) | (0.12) | (0.14) | (0.23) | (0.22) | (0.22) | (0.28) |
| Mean cob weight (CV) | −0.26 | −0.02 | −0.21* | −0.08 | −0.28* | 0.18 | −0.08 | 0.16 | −0.33* | −0.06 | 0.03 | −0.14 |
| | (0.14) | (0.08) | (0.08) | (0.09) | (0.13) | (0.17) | (0.12) | (0.11) | (0.16) | (0.08) | (0.08) | (0.22) |
| Constant | −19.71*** | 30.22*** | −17.10*** | 23.60*** | −11.27* | 16.99* | −8.44* | 5.18 | −31.76** | 18.51** | −32.78** | 18.30* |
| | (4.39) | (5.64) | (4.13) | (6.10) | (4.50) | (7.97) | (3.94) | (5.67) | (10.99) | (6.01) | (11.34) | (9.09) |
| Observations | 120 | 116 | 130 | 106 | 121 | 115 | 115 | 121 | 148 | 88 | 174 | 62 |
| R² | 0.11 | 0.44 | 0.08 | 0.46 | 0.05 | 0.11 | 0.12 | 0.19 | 0.05 | 0.29 | 0.03 | 0.28 |
Note: Table shows coefficient estimates from linear regression, on samples divided by the direction of measurement error. The dependent variable is the measurement error per method, calculated as the difference between the standardized full harvest and the standardized method output, in qt/ha. All continuous predictors are mean-centered and scaled by 1 standard deviation. CV = coefficient of variation. Standard errors, in parentheses, are heteroscedasticity robust. *** p < 0.001; ** p < 0.01; * p < 0.05. R² is the coefficient of determination.
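The data preparation described in the note can be sketched roughly as follows: compute a per-plot error, split plots by the sign of that error, and derive a coefficient of variation and standardized predictors. This is a minimal sketch under assumptions, not the authors' code: the error is taken as method yield minus full-harvest yield so that negative values mean underestimation (matching the column labels), plots with exactly zero error are left out of both groups, and all numbers and function names are hypothetical.

```python
from statistics import mean, stdev


def measurement_errors(full_harvest, method):
    """Per-plot error, ASSUMED here to be method minus full harvest (qt/ha)."""
    return [m - f for f, m in zip(full_harvest, method)]


def split_by_sign(errors):
    """Split errors into underestimates (< 0) and overestimates (> 0)."""
    under = [e for e in errors if e < 0]
    over = [e for e in errors if e > 0]
    return under, over


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def standardize(values):
    """Mean-center and scale by one sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical yields (qt/ha) and within-plot cob counts, for illustration.
full = [30.0, 40.0, 50.0, 20.0]
sampled = [25.0, 46.0, 50.0, 28.0]
errors = measurement_errors(full, sampled)
under, over = split_by_sign(errors)
cob_cv = coefficient_of_variation([10, 12, 14])
z_scores = standardize([10, 12, 14])
```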
Association of measurement errors with the full harvest latent variable.
| | Farmer estimate | W-walk | Transect | Random quadrant | Random octant | Center quadrant | 3 diag. quadrants |
|---|---|---|---|---|---|---|---|
| Full harvest (qt/ha) | 0.71*** | 0.83*** | 0.82*** | 0.28*** | 0.10 | 0.28*** | 0.26*** |
| | (0.07) | (0.05) | (0.04) | (0.06) | (0.06) | (0.08) | (0.07) |
| Constant | −34.51*** | −46.92*** | −48.48*** | −16.68*** | −4.64 | −26.42*** | −24.58*** |
| | (4.19) | (3.01) | (2.82) | (3.66) | (3.17) | (4.33) | (3.44) |
| Observations | 237 | 237 | 237 | 237 | 237 | 237 | 237 |
| R² | 0.35 | 0.64 | 0.61 | 0.09 | 0.02 | 0.06 | 0.06 |
Note: Table shows coefficient estimates from linear regression. The dependent variable is the measurement error per method, calculated as the difference between the standardized full harvest and the standardized method output, in qt/ha. All continuous predictors are mean-centered and scaled by 1 standard deviation. Standard errors, in parentheses, are heteroscedasticity robust. *** p < 0.001; ** p < 0.01; * p < 0.05. R² is the coefficient of determination.
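A regression of this kind, with one predictor and heteroscedasticity-robust standard errors, can be sketched in a few lines. The note does not say which robust variant was used; the sketch below assumes the HC1 correction (the HC0 sandwich estimator scaled by n / (n − 2)), and the function name and data are hypothetical.

```python
def ols_with_robust_se(x, y):
    """Simple OLS of y on x with an intercept.

    Returns (slope, intercept, robust_se_of_slope). The standard error uses
    the HC1 variant, which is an ASSUMPTION; the source only says 'robust'.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 sandwich variance of the slope: sum((x - mean)^2 * e^2) / Sxx^2.
    hc0 = sum(((xi - mean_x) ** 2) * (e ** 2)
              for xi, e in zip(x, residuals)) / sxx ** 2
    # HC1 applies a degrees-of-freedom correction for the two parameters.
    robust_se = (hc0 * n / (n - 2)) ** 0.5
    return slope, intercept, robust_se


# Hypothetical full-harvest yields (qt/ha) and measurement errors (qt/ha).
full_harvest = [15.0, 25.0, 35.0, 45.0, 55.0, 65.0]
errors = [-6.0, -1.0, 2.0, 9.0, 11.0, 20.0]
slope, intercept, se = ols_with_robust_se(full_harvest, errors)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, robust SE={se:.3f}")
```

A slope that differs significantly from zero means the error varies systematically with the true yield, which is the pattern the abstract describes as non-classical.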
Association of measurement errors with plot management and plant growth factors.
| | Farmer estimate | W-walk | Transect | Random quadrant | Random octant | Center quadrant | 3 diag. quadrants |
|---|---|---|---|---|---|---|---|
| Full harvest (qt/ha) | 0.72*** | 0.76*** | 0.75*** | 0.31*** | 0.12* | 0.26* | 0.21* |
| | (0.07) | (0.06) | (0.05) | (0.06) | (0.06) | (0.11) | (0.11) |
| Area in ha | 117.25** | 26.31 | 27.65 | −81.07** | 30.03 | −0.90 | 29.67 |
| | (35.66) | (21.43) | (19.99) | (29.04) | (26.66) | (34.15) | (31.32) |
| Seed (kg/ha) | −0.13 | 0.00 | −0.06 | −0.04 | 0.03 | 0.03 | 0.07 |
| | (0.08) | (0.04) | (0.04) | (0.04) | (0.03) | (0.05) | (0.05) |
| Improved seed | −14.15* | 2.75 | 2.43 | −1.91 | −6.99 | 2.49 | −1.71 |
| | (6.14) | (2.95) | (3.19) | (6.29) | (4.09) | (5.12) | (4.41) |
| N (kg/ha) | 0.00 | 0.01 | 0.01 | 0.00 | −0.01 | 0.00 | 0.00 |
| | (0.01) | (0.00) | (0.01) | (0.00) | (0.00) | (0.01) | (0.01) |
| Labor (days/ha) | −0.02 | 0.00 | 0.01 | 0.00 | 0.02* | 0.01 | 0.02 |
| | (0.01) | (0.01) | (0.01) | (0.02) | (0.01) | (0.02) | (0.01) |
| Irrigated plot: 1 = Yes | −0.13 | 5.69** | 6.22** | −3.14 | −3.10 | 2.61 | 3.48 |
| | (3.20) | (2.01) | (2.37) | (2.85) | (2.20) | (3.45) | (3.09) |
| Erosion control methods: 1 = Yes | 0.18 | −5.42** | −6.28** | −8.21** | 0.47 | −3.33 | −3.65 |
| | (2.82) | (1.84) | (1.97) | (2.79) | (2.03) | (3.35) | (2.98) |
| Constant | −41.99*** | −50.48*** | −49.19*** | 1.81 | −11.38 | −27.70** | −32.69*** |
| | (8.43) | (4.89) | (4.95) | (7.63) | (6.15) | (8.84) | (8.07) |
| Observations | 237 | 237 | 237 | 237 | 237 | 237 | 237 |
| R² | 0.47 | 0.67 | 0.65 | 0.17 | 0.08 | 0.07 | 0.08 |
Note: The dependent variable is the measurement error per method, calculated as the difference between the standardized full harvest and the standardized method output, in qt/ha (1 qt = 100 kg). All continuous predictors are mean-centered and scaled by 1 standard deviation. Standard errors, in parentheses, are heteroscedasticity robust. *** p < 0.001; ** p < 0.01; * p < 0.05. R² is the coefficient of determination.
Maize production function estimates.
| | Full harvest | Farmer estimate | W-walk | Transect | Random quadrant | Random octant | Center quadrant | 3 diag. quadrants |
|---|---|---|---|---|---|---|---|---|
| Area in ha | −0.61 | −2.81 | −0.34 | −0.39 | 1.84 | −1.25 | 0.16 | −0.34 |
| | (0.68) | (1.56) | (0.40) | (0.37) | (0.99) | (0.84) | (0.69) | (0.64) |
| LN (Seed (kg/ha)) | 0.13 | 0.28* | 0.05 | 0.06 | 0.17 | 0.08 | 0.09 | 0.03 |
| | (0.08) | (0.14) | (0.05) | (0.04) | (0.10) | (0.09) | (0.08) | (0.06) |
| Improved seed: 1 = Yes | −0.04 | −0.36** | 0.05 | 0.04 | 0.15 | −0.20 | 0.02 | −0.04 |
| | (0.07) | (0.12) | (0.07) | (0.06) | (0.30) | (0.12) | (0.11) | (0.09) |
| LN (N (kg/ha)) | 0.10 | −0.05 | −0.03 | −0.01 | 0.05 | 0.20 | 0.04 | 0.04 |
| | (0.06) | (0.06) | (0.03) | (0.03) | (0.07) | (0.11) | (0.08) | (0.07) |
| LN (Labor (days/ha)) | 0.00 | 0.02 | 0.00 | 0.01 | 0.02 | −0.02 | 0.00 | 0.00 |
| | (0.02) | (0.04) | (0.01) | (0.01) | (0.03) | (0.02) | (0.02) | (0.02) |
| Irrigated plot: 1 = Yes | 0.25*** | 0.23 | −0.03 | −0.04 | 0.32*** | 0.27** | 0.18** | 0.16** |
| | (0.06) | (0.13) | (0.04) | (0.04) | (0.09) | (0.10) | (0.07) | (0.06) |
| Erosion control methods: 1 = Yes | −0.01 | −0.12 | 0.09* | 0.11** | 0.26* | −0.04 | 0.07 | 0.05 |
| | (0.06) | (0.12) | (0.04) | (0.04) | (0.10) | (0.08) | (0.06) | (0.05) |
| Constant | 2.94*** | 3.65*** | 3.95*** | 3.86*** | 2.21*** | 2.84*** | 3.42*** | 3.82*** |
| | (0.46) | (0.70) | (0.30) | (0.28) | (0.66) | (0.76) | (0.55) | (0.50) |
| Observations | 237 | 237 | 237 | 237 | 237 | 237 | 237 | 237 |
| R² | 0.14 | 0.07 | 0.04 | 0.06 | 0.11 | 0.14 | 0.05 | 0.05 |
Note: Dependent variable = LN (Maize yield (in qt/ha)). Standard errors, in parentheses, are heteroscedasticity robust. *** p < 0.001; ** p < 0.01; * p < 0.05. R² is the coefficient of determination.
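Because the dependent variable is in logs, a coefficient on a 0/1 variable is usually read as an approximate proportional change in yield. The sketch below applies the standard conversion, 100 × (exp(b) − 1), to the irrigation coefficient from the full harvest column. This interpretation is a common convention rather than something stated in the source, and the function name is illustrative.

```python
import math


def dummy_effect_percent(coefficient):
    """Percent change in the outcome implied by a 0/1 regressor when the
    dependent variable is in natural logs: 100 * (exp(b) - 1)."""
    return (math.exp(coefficient) - 1) * 100


# Irrigation coefficient from the full harvest column of the table above.
irrigated_full_harvest = 0.25
effect = dummy_effect_percent(irrigated_full_harvest)
print(f"Irrigated plots: about {effect:.1f}% higher yield, other inputs held fixed")
```

Applied to the W-walk and transect columns (−0.03 and −0.04, neither significant), the same conversion gives an effect near zero, which shows how the choice of yield measure can change the conclusion drawn from a production function.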
Fig. 3. Protocol durations (a) and cost-effectiveness (b) of alternative sampling methods. Mean protocol duration, measured in minutes, is indicated by black dots while the line corresponds to one standard deviation from the mean. Cost-effectiveness is measured by additional changes in accuracy per US$1,000 spent on data collection.