
Misperceiving and misreporting input quality: Implications for input use and productivity.

Tesfamicheal Wossen¹, Kibrom A Abay², Tahirou Abdoulaye³.

Abstract

Farmers in developing countries routinely misperceive or misreport input quality for various reasons, which introduces substantial measurement error in farm survey data. In this paper, we motivate and illustrate, both analytically and empirically, the inferential and behavioral implications of misperception and misreporting using unique crop variety identification data from Nigeria. Using a non-parametric framework for testing the presence of measurement error, we show that crop variety misclassification in our data is mostly driven by misperception. We then demonstrate the inferential challenges of treating misperception as misreporting and vice versa. Finally, we show that misperception induces crowding-in (crowding-out) of complementary agricultural inputs, but these misperception-driven input allocations may not necessarily be yield-enhancing. As such, rectifying misperception by addressing agricultural input market imperfections may improve farmers' investment choices and productivity outcomes.

Entities: Chemical

Keywords: Agricultural inputs; Misclassification; Misperception; Misreporting; Nigeria; Smallholders

Year: 2022 PMID: 35711574 PMCID: PMC9193619 DOI: 10.1016/j.jdeveco.2022.102869

Source DB: PubMed Journal: J Dev Econ ISSN: 0304-3878

Introduction

Agricultural input markets in many sub-Saharan African (SSA) countries remain imperfect, contributing to sub-optimal use of improved agricultural inputs (Ashour et al., 2019, Bold et al., 2017). In particular, information asymmetry about input quality often limits farmers’ ability to use complementary agricultural inputs consistently (Ashour et al., 2019, Bold et al., 2017, Michelson et al., 2021, Gilligan and Karachiwalla, 2019, Wossen et al., 2020). In the absence of institutional and technological instruments for quality assurance, farmers often rely on information available in local markets and on other observable attributes of inputs to form subjective beliefs about the quality and variety of the inputs they acquire (Ashour et al., 2019, Bold et al., 2017, Michelson et al., 2021, Berazneva et al., 2018). However, locally available information and observable attributes may not accurately reveal the authenticity and quality of inputs, and hence farmers may misperceive and/or misreport true input quality. In the context of improved crop varieties, the focus of this paper, the difficulty of observing quality in local markets is often compounded by the credence nature of seed quality: neither genetic nor physical quality traits can be accurately observed and assessed by farmers at the time of exchange (Ashour et al., 2019, Michelson et al., 2021). For instance, using genetic-test-based varietal identification, recent studies find that farmers in developing countries substantially misreport or misperceive the improvement status of the crop varieties they grow (Wossen et al., 2019a, Wineman et al., 2020, Floro et al., 2018, Kosmowski et al., 2016, Maredia et al., 2016). More generally, measurement error in input quality or type is a pervasive feature of farm survey data in many developing countries (Carletto et al., 2013, Gourlay et al., 2019, Beegle et al., 2012). 
Focusing on crop variety identification, this paper considers two important sources of measurement error in farm survey data: misreporting, which arises from poor measurement (e.g., erroneous survey responses or recording errors during data collection), and misperception, or mistaken beliefs, which arises from frictions such as seed market imperfections (Wilhelm, 2019, Drerup et al., 2017). In the case of misreporting, farmers’ latent perceptions are expected to coincide with the truth: farmers who misreport are assumed to know their true perceptions and to base their decisions on these latent perceptions rather than on their measurement (Lewbel, 2007, Chetty, 2012). Misperception, on the other hand, is not a survey error per se; it refers to the deviation between farmers’ latent perceptions and the truth, an informational asymmetry that shapes farmers’ underlying decision-making process, including their input and production choices (Drerup et al., 2017). As such, while self-reported values may differ from the truth under both misreporting and misperception, farmers’ observed input use and intensification choices are expected to deviate from the frictionless optimum under misperception but not under misreporting, in the sense that misperception cannot be conditionally independent of farmers’ decision-making behavior (Drerup et al., 2017, Hu and Wansbeek, 2017, Wilhelm, 2019, Chetty, 2012). Moreover, when the misreported or misperceived variable of interest is binary, its measurement error (i.e., misclassification) is inherently nonclassical: because the true value and its measurement share the same discrete support, the error can only be non-positive when the true value is one and non-negative when it is zero, so it is necessarily negatively correlated with the true value, making the classical measurement error assumptions implausible (Mahajan, 2006, Lewbel, 2007, Hausman et al., 1998, Hu, 2008, DiTraglia and García-Jimeno, 2019, Meyer and Mittag, 2017).1 Therefore, understanding the inferential and behavioral implications of misclassification requires identifying its origin: survey misreporting or misperception. 
Using unique self-reported and genetic-test-based cassava variety identification data from Nigeria, this paper aims to identify the origins of variety misclassification based on the informational content of farmers’ complementary input choices (i.e., fertilizer and herbicide use in our context). Variety misclassification in this paper specifically refers to whether a farmer knows or believes that he/she is growing an improved or local variety, which can be driven either by misreporting (i.e., the farmer knows the true genetic quality of the variety but misreports it in the survey) or by misperception (i.e., the farmer holds mistaken beliefs about the true genetic quality of the variety).2 In the context of Nigeria, cassava variety misperception is related, at least partially, to the informality of the underlying cassava seed market. In particular, due to the credence nature of seed quality, farmers may acquire planting material that is not what they think it is, either because sellers have lost track of the variety type being exchanged or through willful deception (Wineman et al., 2020, Wossen et al., 2020, Jaleta et al., 2020). We focus on crop variety misperception since accurate knowledge of variety traits affects the adoption of farm inputs that complement improved seed (Emerick et al., 2016, Macours, 2019). For instance, Emerick et al. (2016) find evidence of crowding-in of fertilizer and other modern cultivation practices among adopters of improved rice varieties in India. In the context of Tanzania, Wineman et al. (2020) find that farmers apply more fertilizer when they perceive (correctly or incorrectly) that they are growing an improved maize variety. Similarly, Ashour et al. (2019) find that the supply of adulterated and substandard agricultural inputs in local markets affects farmers’ perceptions of input quality and their willingness to adopt modern inputs. 
Against this backdrop, this paper examines the following inferential and behavioral implications of crop variety misclassification. First, building on the existing measurement error literature (Mahajan, 2006, Wilhelm, 2019), we identify and decompose the origins of variety misclassification into misreporting and misperception. We do so in a data-driven fashion, by non-parametrically examining the conditional means of input use functions, which are expected to differ across the two underlying sources of misclassification. Second, we examine the inferential implications of variety misclassification, focusing on the estimation of input complementarity relationships. In particular, we derive and characterize the expected bias under misperception and misreporting and show, both analytically and empirically, the inferential challenges of treating misperception as misreporting. Third, we propose appropriate empirical specifications for hypothesis testing and statistical inference on input complementarity relationships in the presence of misperception and misreporting. Fourth, we empirically study the behavioral implications of misperception, focusing on farmers’ observed fertilizer and herbicide use and intensification choices. Finally, we analytically and empirically relate variety misperception and the implied input use behavior of farmers to inference on yield. Our non-parametric test results suggest that farmers’ latent variety perceptions are identical to their reports but not to the genetic test results, implying that variety misclassification, at least in our setting, is mostly driven by misperception rather than misreporting. We then show that these misperceptions have important inferential implications, particularly for the estimation of input complementarity relationships as well as input and land productivity. 
More specifically, we show that treating variety misperception as misreporting results in substantially biased and spurious estimates of input complementarity relationships. Consistent with our non-parametric test results, we find that using farmer variety reports as a proxy for their true but unobserved perceptions yields unbiased estimates of input complementarity relationships. However, using the genetic test results as a proxy for farmers’ unobserved variety perceptions yields input complementarity estimates that are biased downward by up to 70 percent, implying that treating misperception as misreporting can lead to wrong policy prescriptions. In terms of behavioral response, we show that farmers’ variety perceptions (correct or incorrect) influence the allocation of fertilizer and herbicide inputs. Specifically, we find that false positive (false negative) variety perceptions induce crowding-in (crowding-out) of fertilizer and herbicide inputs. However, our estimates of the yield response to these inputs suggest that misperception-induced input allocations may not necessarily be yield-enhancing. That is, by distorting the allocation of complementary farm inputs, misperception could introduce inefficiencies in the production process. Beyond these inferential and behavioral implications, our results underscore the potential costs of agricultural input (and information) market imperfections for farmers’ investment choices and productivity outcomes. In particular, crop variety misperception due to seed market imperfections might explain the co-existence of input overuse (underuse) and low agricultural productivity, as input complementarities are driven by farmer perceptions (correct or incorrect), irrespective of the true genetic quality of the crop varieties they grow. 
As such, rectifying misperception by addressing agricultural input and seed market imperfections can play an important role in improving farmers’ investment choices and productivity outcomes. This paper broadly relates to the literature on nonclassical measurement error (Mahajan, 2006, Black et al., 2000, Frazis and Loewenstein, 2003, Lewbel, 2007, Aigner, 1973, Battistin et al., 2014, Bollinger, 1996, Schennach, 2007, Kane et al., 1999, Hu, 2008, DiTraglia and García-Jimeno, 2019) and makes a unique contribution to the evolving literature on the inferential and behavioral implications of misperception and misreporting in farm survey data (Wossen et al., 2019a, Abay et al., 2021). Our results provide first-hand empirical evidence that measurement error in survey responses can be driven by misperception rather than misreporting, with distinct inferential and behavioral implications. Our paper also relates to recent studies uncovering substantial misallocation of agricultural inputs across plots and crops in African agriculture (Gollin and Udry, 2019, Restuccia and Santaeulalia-Llopis, 2017). Finally, this paper contributes to the literature on the behavioral dimensions of technology adoption (Bulte et al., 2014, Macours, 2019).
The remainder of the paper is organized as follows. Section 2 describes our analytical framework for motivating and illustrating the inferential and behavioral implications of crop variety misclassification. Sections 3 and 4 present the data and research setting, and our empirical estimation strategies, respectively. Section 5 reports the main results, and Section 6 provides concluding remarks.

Analytical framework

In this section, we lay out an analytical framework to motivate and illustrate the inferential and behavioral implications of misclassification. Focusing on farmers’ crop variety misclassification, our analytical framework proceeds as follows. First, we characterize the expected bias under survey misreporting and misperception. We then introduce a non-parametric framework for testing the presence of misclassification. Finally, we discuss our non-parametric and parametric estimation methods under our maintained identification assumptions.

Input use decision

Suppose that a farmer makes input use decisions based on her perceived crop variety type ($D^*$) at the time of her input choices. Although our empirical tests are based on non-parametric specifications, for simplicity we assume the following parametric specification for input use:
$$X = \alpha + \beta D^* + \varepsilon \tag{1}$$
where $D^*$ is the farmer’s variety perception (i.e., whether the variety is improved or not) at the time of her input choices, $X$ denotes the farmer’s observed complementary input choices (e.g., fertilizer and herbicide inputs) and $\varepsilon$ is a mean-zero error term.3 Throughout this paper, we assume that $\beta \neq 0$, since otherwise there is no meaningful relationship to estimate, and that $D^*$ is exogenous, so that $E[\varepsilon \mid D^*] = 0$. However, since an individual’s latent variety perception ($D^*$) is unobservable, inference via Eq. (1) is infeasible. Our main goal is thus to identify the conditional mean function, $E[X \mid D^*]$, from the joint distribution of the observables in our data. Although $D^*$ is unobserved, we do observe two relevant proxies (surrogates) in our data. The first is farmers’ self-reported variety information from the survey, which we denote by $D^F$ (farmer reports), and the second is the result of the genetic test, which we denote by $D^G$ (genetic test results). Both $D^F$ and $D^G$ are binary with support $\{0, 1\}$. The relationships between $D^F$ and $D^*$ and between $D^G$ and $D^*$ can be written as:
$$D^F = D^* + \nu^F \tag{2a}$$
$$D^G = D^* + \nu^G \tag{2b}$$
where $\nu^F$ and $\nu^G$ are assumed to be uncorrelated with the error term of Eq. (1) (i.e., $E[\varepsilon\nu^F] = 0$ and $E[\varepsilon\nu^G] = 0$; see Assumption 2.1 below) and $\nu^F, \nu^G \in \{-1, 0, 1\}$, depending on the values of $D^F$ and $D^G$ relative to $D^*$. In the absence of misreporting, farmer reports are expected to equal farmers’ perceptions ($D^F = D^*$); misreporting arises when $D^F$ is a misclassified version of $D^*$ (i.e., $D^F \neq D^*$). In our setting, $D^F$ could differ from $D^*$ due to survey errors (e.g., recording errors by enumerators, intentional misreporting by farmers, etc.). Similarly, the genetic test result ($D^G$) may differ from $D^*$ when farmers hold mistaken variety beliefs due to various market frictions. 
For instance, due to the informality of the underlying cassava seed market in Nigeria, farmers may hold mistaken variety beliefs due to asymmetric quality information either because sellers have lost track of the variety type being exchanged, or through willful deception (Wineman et al., 2020, Wossen et al., 2019a).

Bias under misreporting

Suppose that the mechanism underlying variety misclassification is misreporting. In the presence of misreporting, Eq. (1) is expected to be the true data generating process (DGP) guiding farmers’ input use decisions. However, the researcher observes farmer reports ($D^F$), which are a misclassified version of $D^*$. Under this scenario, the researcher operationalizes the following DGP for input use: $X = \alpha + \beta D^F + \tilde{\varepsilon}$. Recalling that $D^F = D^* + \nu^F$ in Eq. (2a), the error term of the operational model can be expressed as $\tilde{\varepsilon} = \varepsilon - \beta\nu^F$. To simplify notation, let $p$ and $q$ be $P(D^* = 1)$ and $P(D^F = 1)$, respectively. Denote $\pi_0 = P(D^F = 1 \mid D^* = 0)$ and $\pi_1 = P(D^F = 0 \mid D^* = 1)$ as the probabilities of false positive and false negative variety reports, respectively.4 Given our assumptions that $E[\varepsilon\nu^F] = 0$ and $E[\varepsilon \mid D^*] = 0$, attempting to recover $\beta$ using $D^F$ as a measure of farmers’ true but unobserved variety perceptions ($D^*$) leads to the following bias:
$$\operatorname{plim}\hat{\beta}^F = \frac{\operatorname{Cov}(X, D^F)}{\operatorname{Var}(D^F)} = \beta\,\frac{(1 - \pi_0 - \pi_1)\,p(1-p)}{q(1-q)} \tag{3}$$
The identity in Eq. (3) suggests that the size and sign of the bias in $\hat{\beta}^F$ depend on the extent of misreporting (i.e., $\pi_0 + \pi_1$). For instance, assuming $\beta > 0$, $\hat{\beta}^F$ would be biased towards zero provided that $\pi_0 + \pi_1 < 1$, and would even have the wrong sign when $\pi_0 + \pi_1 > 1$. We note that, under this scenario, $\beta$ can be identified directly using $D^G$ with the additional assumption that farmer perceptions are equal to the genetic test results (i.e., $D^* = D^G$). That is, in the absence of variety misperception, the informational content of $D^G$ reveals $D^*$, and hence estimating Eq. (1) using $D^G$ instead of $D^F$ would overcome the bias in $\hat{\beta}^F$.
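As a quick numerical check of Eq. (3), the sketch below evaluates the attenuation factor $(1-\pi_0-\pi_1)p(1-p)/[q(1-q)]$ in two equivalent ways: directly, and as the difference $P(D^*=1 \mid D^F=1) - P(D^*=1 \mid D^F=0)$, which is the quantity that scales $\beta$ when the regressor is binary (the OLS slope is then a difference in group means). It is an illustrative sketch, not the authors' code; the function names and the misreporting rates in the usage loop are hypothetical.

```python
def attenuation_factor(p: float, pi0: float, pi1: float) -> float:
    """Ratio plim(beta_hat_F) / beta implied by Eq. (3).

    p:   P(D* = 1), share of plots perceived as improved
    pi0: P(D^F = 1 | D* = 0), false positive report rate
    pi1: P(D^F = 0 | D* = 1), false negative report rate
    """
    q = p * (1 - pi1) + (1 - p) * pi0          # P(D^F = 1) by total probability
    return (1 - pi0 - pi1) * p * (1 - p) / (q * (1 - q))


def attenuation_via_posteriors(p: float, pi0: float, pi1: float) -> float:
    """Same ratio written as P(D*=1 | D^F=1) - P(D*=1 | D^F=0).

    With a binary regressor, E[X | D^F=1] - E[X | D^F=0] equals beta
    times this difference of posterior probabilities.
    """
    q = p * (1 - pi1) + (1 - p) * pi0
    post1 = p * (1 - pi1) / q                   # P(D* = 1 | D^F = 1)
    post0 = p * pi1 / (1 - q)                   # P(D* = 1 | D^F = 0)
    return post1 - post0


if __name__ == "__main__":
    # Hypothetical misreporting rates: none, moderate, severe (pi0 + pi1 > 1).
    for pi0, pi1 in [(0.0, 0.0), (0.1, 0.2), (0.6, 0.6)]:
        print(pi0, pi1, round(attenuation_factor(0.6, pi0, pi1), 3))
```

The factor equals one without misreporting, lies strictly between zero and one when $\pi_0 + \pi_1 < 1$, and turns negative once $\pi_0 + \pi_1 > 1$, which is the sign reversal described in the text.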

Bias under misperception

Next, let us assume that farmer reports ($D^F$) are the same as their true but unobserved variety perceptions ($D^*$) but that these reports deviate from the genetic test results (i.e., $D^F = D^* \neq D^G$). Under this scenario, the researcher who only has access to farmer reports will recover $\beta$ via Eq. (3), since there is no misclassification in $D^F$ (i.e., $\pi_0 = \pi_1 = 0$). Suppose that, in addition to $D^F$, the researcher observes $D^G$, which, compared to $D^F$, is a more reliable and accurate measure of the genetic quality of the cassava varieties grown by farmers. Let $r$ be $P(D^G = 1)$ and denote $\delta_0 = P(D^G = 1 \mid D^* = 0)$ and $\delta_1 = P(D^G = 0 \mid D^* = 1)$. Under this scenario, attempting to recover $\beta$ via $D^G$ under the assumption that farmer perceptions are identical to the genetic test results leads to the following bias:5
$$\operatorname{plim}\hat{\beta}^G = \beta\,\frac{(1 - \delta_0 - \delta_1)\,p(1-p)}{r(1-r)} \tag{4}$$
Eq. (4) shows that $\operatorname{plim}\hat{\beta}^G \neq \beta$ so long as $\delta_0 + \delta_1 > 0$. Our characterization of the expected bias under misreporting in Eq. (3) and under misperception in Eq. (4) underscores the importance of identifying the origins of misclassification. However, these origins are unknown ex ante, and hence we cannot determine a priori whether $D^F$ or $D^G$ is the appropriate measure of $D^*$ for carrying out inference on farmers’ input choices and input complementarity relationships. Therefore, drawing on the misclassification literature (e.g., Mahajan, 2006, Wilhelm, 2019), the next section introduces a non-parametric framework for testing the presence of misclassification in $D^F$ and/or $D^G$.
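A small Monte Carlo sketch can make Eq. (4) concrete. Under a hypothetical misperception DGP in which reports equal perceptions ($D^F = D^*$) while the genetic test differs, the OLS slope of $X$ on $D^F$ should recover $\beta$, whereas the slope on $D^G$ should settle near the attenuated value implied by Eq. (4). The parameter values ($p = 0.6$, $\delta_0 = 0.3$, $\delta_1 = 0.2$, $\beta = 2$), the normal error and all function names are illustrative assumptions, not taken from the paper's data.

```python
import random
from statistics import fmean


def slope_on_binary(x, d):
    """OLS slope of x on one binary regressor: the difference in group means."""
    x1 = [xi for xi, di in zip(x, d) if di == 1]
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    return fmean(x1) - fmean(x0)


def simulate_misperception(n=200_000, p=0.6, d0=0.3, d1=0.2,
                           alpha=1.0, beta=2.0, seed=0):
    """Hypothetical DGP: reports equal perceptions (D^F = D*), genetic test differs.

    d0 = P(D^G = 1 | D* = 0) and d1 = P(D^G = 0 | D* = 1), as in Eq. (4).
    Only the joint distribution of (D*, D^G) matters for the OLS bias, so
    D* is drawn first and D^G is drawn conditional on it.
    """
    rng = random.Random(seed)
    x, d_f, d_g = [], [], []
    for _ in range(n):
        d_star = 1 if rng.random() < p else 0
        flip = rng.random() < (d1 if d_star == 1 else d0)
        d_g.append(1 - d_star if flip else d_star)
        d_f.append(d_star)                                      # no misreporting
        x.append(alpha + beta * d_star + rng.gauss(0.0, 1.0))   # Eq. (1)
    return x, d_f, d_g


def eq4_slope(p, d0, d1, beta):
    """plim of the OLS slope on D^G implied by Eq. (4)."""
    r = p * (1 - d1) + (1 - p) * d0                             # P(D^G = 1)
    return beta * (1 - d0 - d1) * p * (1 - p) / (r * (1 - r))


if __name__ == "__main__":
    x, d_f, d_g = simulate_misperception()
    print("slope on D^F:", round(slope_on_binary(x, d_f), 3))   # near beta
    print("slope on D^G:", round(slope_on_binary(x, d_g), 3))   # near Eq. (4)
    print("Eq. (4) value:", eq4_slope(0.6, 0.3, 0.2, 2.0))
```

With these illustrative rates Eq. (4) halves the slope, so treating $D^G$ as the perception measure understates the input response even though $D^F$ recovers it.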

Testing for misclassification

This section lays out a non-parametric framework for testing the presence of misclassification in $D^F$ and/or $D^G$. As discussed above, our central premise is that when the underlying misclassification mechanism (misreporting or misperception) is unknown, $D^F$ and $D^G$ can only be considered as two surrogates of the true but unobserved variety perception ($D^*$). To fix ideas, we maintain the following non-differential misclassification assumption for input use throughout this paper.

Assumption 2.1 (Non-Differential Misclassification): $E[X \mid D^*, D^F, D^G] = E[X \mid D^*]$

Assumption 2.1 is a conditional mean independence assumption: what matters for farmers’ input choices is their variety perception ($D^*$), not the two proxies $D^F$ and $D^G$. It thus ensures that once $D^*$ is known, $D^F$ and $D^G$ are not informative about the conditional mean of $X$ (i.e., $E[X \mid D^*, D^F, D^G] = E[X \mid D^*]$).6 Assumption 2.1 is standard in the measurement error literature (Mahajan, 2006, Wilhelm, 2019, Imai and Yamamoto, 2010). Hence, our hypothesis tests for the presence of misclassification in $D^F$ and/or $D^G$ rely on the validity of Assumption 2.1. The assumption of non-differential misclassification of $D^G$ for input use is reasonable since the genetic test results were not available to farmers at the time of their input choices, and genotyping error, if any, is likely to be random. Hence, the informational content of $D^G$ can only affect farmers’ input choices through its correlation with their variety perceptions (i.e., $D^G$ is expected to be a strong predictor of farmers’ variety perceptions ($D^*$), and conditional on $D^*$ it is not informative about the conditional mean of $X$). Although the assumption of non-differential misclassification in $D^F$ is somewhat restrictive, it can be justified in our setting since all cassava farmers are subject to the same seed market shocks. The exclusion restriction stated in Assumption 2.1 depends on the latent variety perception variable ($D^*$) and thus cannot be tested directly. However, as shown in Wilhelm (2019), testing this exclusion restriction does not require solving for the distribution of the unobservable latent perceptions. We therefore exploit our assumptions on the expected DGP of input use under misreporting and misperception to express Assumption 2.1 in terms of observables in our data (i.e., $X$, $D^F$ and $D^G$).

Assumption 2.2 (No Misreporting): $D^F = D^*$ w.r.t. Assumption 2.1

Assumption 2.2 implies that farmer perceptions are equal to their reports. Under Assumptions 2.1 and 2.2, we can derive the following testable implication:
$$E[X \mid D^F, D^G] = E[X \mid D^F] \tag{5}$$
That is, under Assumption 2.1, the conditional mean of $X$ in Eq. (5) does not depend on $D^G$ if and only if there is no misclassification in $D^F$ (Mahajan, 2006). The equality in (5) depends only on observables and can be tested directly without imposing any parametric assumptions about how the conditional mean of $X$ depends on $D^F$ (Mahajan, 2006, Wilhelm, 2019, Lee and Wilhelm, 2019). It is, however, important to note that the equality in (5) is a joint test of Assumptions 2.1 and 2.2. Therefore, rejecting the equality in (5) implies that one or both of Assumptions 2.1 and 2.2 do not hold. Failure to reject the equality, however, does not imply that both assumptions are true; rather, it implies that our data are consistent with the joint assumption.
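Because $D^F$ and $D^G$ are both binary, the equality in (5) reduces to comparing cell means: within each value of $D^F$, plots with $D^G = 1$ and $D^G = 0$ should have the same mean input use. The sketch below computes a Welch $t$-statistic for each cell under two hypothetical DGPs, one with misperception only (Eq. (5) should hold) and one with misreporting only (Eq. (5) should fail). It is a simplified illustration, not necessarily the test used in the paper: it ignores covariates and clustering and reports cell-by-cell statistics instead of a joint test. All names and parameter values are assumptions.

```python
import math
import random
from statistics import fmean, variance


def welch_t(a, b):
    """Welch t-statistic for the difference in means of samples a and b."""
    return (fmean(a) - fmean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def eq5_tstats(x, d_f, d_g):
    """For each value c of D^F, test E[X | D^F=c, D^G=1] = E[X | D^F=c, D^G=0]."""
    out = {}
    for c in (0, 1):
        g1 = [xi for xi, fi, gi in zip(x, d_f, d_g) if fi == c and gi == 1]
        g0 = [xi for xi, fi, gi in zip(x, d_f, d_g) if fi == c and gi == 0]
        out[c] = welch_t(g1, g0)
    return out


def simulate(mechanism, n=20_000, p=0.6, e0=0.3, e1=0.2, beta=2.0, seed=1):
    """Hypothetical DGP with X = 1 + beta * D* + N(0, 1) noise.

    "misperception": D^F = D*, and D^G is a misclassified copy of D*.
    "misreporting":  D^G = D*, and D^F is a misclassified copy of D*.
    e0 / e1 are the false positive / false negative rates of the noisy proxy.
    """
    rng = random.Random(seed)
    x, d_f, d_g = [], [], []
    for _ in range(n):
        d_star = 1 if rng.random() < p else 0
        flip = rng.random() < (e1 if d_star == 1 else e0)
        noisy = 1 - d_star if flip else d_star
        if mechanism == "misperception":
            d_f.append(d_star)
            d_g.append(noisy)
        elif mechanism == "misreporting":
            d_f.append(noisy)
            d_g.append(d_star)
        else:
            raise ValueError("unknown mechanism")
        x.append(1.0 + beta * d_star + rng.gauss(0.0, 1.0))
    return x, d_f, d_g


if __name__ == "__main__":
    for mech in ("misperception", "misreporting"):
        print(mech, eq5_tstats(*simulate(mech)))
```

Small statistics in both cells are what Eq. (5) predicts when reports equal perceptions; large ones signal misclassification in $D^F$ (or a failure of Assumption 2.1).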

Assumption 2.3 (No Misperception): $D^G = D^*$ w.r.t. Assumption 2.1

Assumption 2.3 implies that farmer perceptions are equal to the genetic test results. Under Assumptions 2.1 and 2.3, we can derive the following testable implication:
$$E[X \mid D^F, D^G] = E[X \mid D^G] \tag{6}$$
Again, the testable implication in Eq. (6) is a joint test of Assumptions 2.1 and 2.3. However, under our maintained assumption that misclassification in the two proxies is non-differential for input use, the testable implications in Eqs. (5) and (6) are informative about whether our data support Assumption 2.2 or Assumption 2.3. Hence, under Assumption 2.1, if Eq. (5) holds but Eq. (6) does not, our data reject Assumption 2.3 (i.e., $D^G \neq D^*$) and are consistent with Assumption 2.2 (i.e., $D^F = D^*$). Conversely, if Eq. (6) holds but Eq. (5) does not, our data reject Assumption 2.2 and are consistent with Assumption 2.3. Alternatively, the above assumptions can be tested by imposing linearity on the conditional expectations of the input use functions. As discussed above, the testable implication in Eq. (5) is equivalent to the restriction that, conditional on $D^F$, $D^G$ is not informative about the conditional mean of $X$. Hence, imposing linearity, the testable implication can be reformulated as follows:
$$X = \beta_0 + \beta_1 D^F + \beta_2 D^G + \eta \tag{7}$$
Under Assumption 2.1, Assumption 2.2 implies that $\beta_2 = 0$ in Eq. (7). The hypothesis test in Eq. (7) relies on linearity of the conditional expectations of the input use functions and hence should only be considered a predictive (suggestive) test.7 For instance, the test will be uninformative in the presence of misspecification (i.e., nonlinearities in the regression equations), since measurement error and nonlinearities can manifest themselves in similar ways (Wilhelm, 2019). In fact, in Eq. (7), testing for the presence of misclassification in $D^F$ is equivalent to testing the exclusion restriction on $D^G$ (Mahajan, 2006, D’Haultfœuille et al., 2021).8
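The linear version in Eq. (7) amounts to regressing $X$ on both proxies and checking whether the coefficient on $D^G$ is zero. The dependency-free sketch below fits Eq. (7) by ordinary least squares with classical standard errors and applies it to the same two hypothetical DGPs as before; the solver, the simulated data and all names are illustrative assumptions, and a real application would use a regression package with covariates and clustered errors.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(y, regressors):
    """OLS of y on an intercept plus `regressors`; returns (coefficients, classical SEs)."""
    n = len(y)
    z = [[1.0] + [float(col[i]) for col in regressors] for i in range(n)]
    k = len(z[0])
    ztz = [[sum(row[a] * row[b] for row in z) for b in range(k)] for a in range(k)]
    zty = [sum(row[a] * yi for row, yi in zip(z, y)) for a in range(k)]
    inv = invert(ztz)
    coef = [sum(inv[a][b] * zty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(z, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return coef, [math.sqrt(s2 * inv[a][a]) for a in range(k)]


def simulate(mechanism, n=20_000, p=0.6, e0=0.3, e1=0.2, beta=2.0, seed=1):
    """Hypothetical DGP with X = 1 + beta * D* + N(0, 1) noise (see Eq. (1))."""
    rng = random.Random(seed)
    x, d_f, d_g = [], [], []
    for _ in range(n):
        d_star = 1 if rng.random() < p else 0
        flip = rng.random() < (e1 if d_star == 1 else e0)
        noisy = 1 - d_star if flip else d_star
        if mechanism == "misperception":      # reports = perceptions
            d_f.append(d_star)
            d_g.append(noisy)
        else:                                 # "misreporting": test = perceptions
            d_f.append(noisy)
            d_g.append(d_star)
        x.append(1.0 + beta * d_star + rng.gauss(0.0, 1.0))
    return x, d_f, d_g


if __name__ == "__main__":
    for mech in ("misperception", "misreporting"):
        x, d_f, d_g = simulate(mech)
        coef, se = ols(x, [d_f, d_g])         # Eq. (7): X on D^F and D^G
        print(mech, "beta2 =", round(coef[2], 3), "t =", round(coef[2] / se[2], 1))
```

Under the misperception DGP the conditional mean is exactly linear in $D^F$, so $\hat\beta_2$ should be near zero; under misreporting the roles flip and $D^G$ carries the signal.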

Point identification

The above non-parametric tests do not require solving for the distribution of the unobservable latent perceptions (i.e., estimating the misclassification probabilities) and hence are very informative as a first-step model specification test (Mahajan, 2006, Wilhelm, 2019). For instance, under Assumption 2.1, $\beta$ can be point-identified via Eq. (3) if Assumption 2.2 holds or via Eq. (4) if Assumption 2.3 holds. However, if both assumptions are rejected, $\beta$ cannot be point-identified via either Eq. (3) or Eq. (4).9 Following Mahajan (2006), this section describes the conditions for non-parametric identification of $\beta$ using $D^F$ as a proxy for $D^*$ and $D^G$ as an instrumental-like variable (ILV) to deal with the potential measurement error in $D^F$. Hence, in addition to Assumption 2.1 above, we maintain the following conditions.

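Identification Assumptions

Assumption 2.4: (i) conditional independence, $D^F \perp D^G \mid D^*$; (ii) monotonicity, $\pi_0 + \pi_1 < 1$; (iii) relevance, $P(D^* = 1 \mid D^G = 1) \neq P(D^* = 1 \mid D^G = 0)$.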

Assumption 2.4(i) states that, conditional on $D^*$, $D^F$ and $D^G$ have independent sources of error. This assumption is expected to hold in our case because of the way the data were collected. In particular, $D^F$ originates from the survey (farmers) and $D^G$ from a genetic test, which is beyond farmers’ control, as the genetic test results were not revealed to farmers at the time of their input choices (i.e., genotyping error, if any, originates from the laboratory analysis and is hence independent of farmers’ reporting errors in the survey, conditional on their true variety perceptions). Hence, conditional on $D^*$, the error in $D^F$ has a very different origin from the error in $D^G$. Assumption 2.4(ii) is the monotonicity condition; it restricts the extent of misclassification in $D^F$ to ensure a positive correlation between $D^F$ and $D^*$. That is, the misclassification in $D^F$ is not so severe that $D^F$ stops being positively correlated with $D^*$ (i.e., $\pi_0 + \pi_1 < 1$) (Mahajan, 2006, Hu, 2008, DiTraglia and García-Jimeno, 2019). Assumption 2.4(iii) is the instrumental variable relevance condition; it ensures a strong correlation between $D^G$ and $D^*$, so that $D^G$ is informative about $D^*$. Under Assumptions 2.1 and 2.4, the misreporting rates ($\pi_0$ and $\pi_1$) can be point-identified as functions of the observable moments of $(X, D^F, D^G)$ in our data.10 To see this, recall that, since $D^*$ is binary, the relationship between $D^F$ and $D^*$ is governed by $\pi_0 = P(D^F = 1 \mid D^* = 0)$ and $\pi_1 = P(D^F = 0 \mid D^* = 1)$, and denote $p = P(D^* = 1)$ and $q = P(D^F = 1)$. Using the law of total probability, the relationship between $q$ and $p$ is given by:
$$q = (1 - \pi_1)\,p + \pi_0\,(1 - p) \tag{8}$$
Hence, if $\pi_0$ and $\pi_1$ are known, $p$ is identified via Eq. (8) and $\beta$ via Eq. (3), since $q$ is observable in our data. Thus, we can estimate $\beta$ consistently by using $D^G$ as an ILV to deal with the misclassification in $D^F$ in a Generalized Method of Moments (GMM) estimator.
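The last step of this argument, going from known misreporting rates to $p$ and $\beta$, is simple algebra: invert Eq. (8) for $p$, then divide the naive slope by the attenuation factor in Eq. (3). The sketch below shows only that inversion and takes $\pi_0$ and $\pi_1$ as given; it is not Mahajan's (2006) estimator and does not estimate the misreporting rates from $D^G$. The function name and the round-trip values are hypothetical.

```python
def correct_for_misreporting(q, slope_f, pi0, pi1):
    """Recover P(D* = 1) and beta from observables, taking pi0 and pi1 as given.

    q:       observed share of plots reported as improved, P(D^F = 1)
    slope_f: naive OLS slope of X on D^F (the left-hand side of Eq. (3))
    pi0/pi1: false positive / false negative report rates, assumed known
             here (e.g., from a first-stage estimator not shown)
    """
    if pi0 + pi1 >= 1:
        raise ValueError("monotonicity requires pi0 + pi1 < 1")
    p = (q - pi0) / (1 - pi0 - pi1)                          # invert Eq. (8)
    if not 0 < p < 1:
        raise ValueError("implied P(D* = 1) lies outside (0, 1)")
    factor = (1 - pi0 - pi1) * p * (1 - p) / (q * (1 - q))   # Eq. (3)
    return p, slope_f / factor


if __name__ == "__main__":
    # Round trip with hypothetical values: p = 0.6, pi0 = 0.1, pi1 = 0.2, beta = 2.
    p_true, pi0, pi1, beta = 0.6, 0.1, 0.2, 2.0
    q = (1 - pi1) * p_true + pi0 * (1 - p_true)              # Eq. (8)
    naive = beta * (1 - pi0 - pi1) * p_true * (1 - p_true) / (q * (1 - q))
    print(correct_for_misreporting(q, naive, pi0, pi1))
```

The monotonicity check mirrors Assumption 2.4(ii): without $\pi_0 + \pi_1 < 1$ the inversion of Eq. (8) is not well defined.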

Implications for inference on productivity

In this section, we show the inferential implications of misclassification, focusing on the reduced-form biophysical relationship between improved genetics and yield ($y$, cassava output per hectare). However, the non-differential misclassification assumption of the two proxies for input use (Assumption 2.1) is unlikely to hold for yield, since $D^G$ is expected to affect cassava yields directly, over and above its indirect effect via $D^*$ and $X$ (i.e., $E[y \mid D^*, D^F, D^G] \neq E[y \mid D^*]$, since improved varieties generate higher yields even after conditioning on $D^*$ and $X$). Hence, with the additional assumption that $D^G$ measures the true genetic quality of the cassava varieties grown by farmers without error, we specify the reduced-form biophysical relationship using $D^G$ instead of the latent perception variable ($D^*$). Since $D^G$ is observed in our data, estimating the reduced-form biophysical relationship between $D^G$ and $y$ is straightforward in the absence of variety misperception (i.e., $D^G = D^*$ w.r.t. Assumption 2.1). Under this scenario, the relevant unconditional non-differential measurement error assumption for yield becomes:
Assumption 2.5: $E[y \mid D^G, D^*, D^F] = E[y \mid D^G, D^*]$
Assumption 2.5 states that farmer reports are not informative about yields once we know the genetic test results as well as farmers’ perceptions at the time of input choice. When $D^* = D^G$, the testable implication of Assumption 2.5 is that, conditional on $D^G$, farmer reports provide no additional information about the conditional mean of $y$ (i.e., $E[y \mid D^G, D^F] = E[y \mid D^G]$).11 Assuming a linear relationship, the reduced-form biophysical relationship between the true genetic quality of the cassava varieties grown by farmers ($D^G$) and $y$ can be specified using the following production function:
$$y = \gamma_0 + \gamma_1 D^G + u \tag{9}$$
To demonstrate the expected bias due to variety misreporting, the error term $u$ is assumed to be mean zero and uncorrelated with $D^G$.12 As before, let us assume that the researcher has access only to farmer reports ($D^F$), which are a misclassified version of $D^G$. 
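Recalling that $D^F = D^G + \nu^F$ in the presence of variety misreporting only, the production function estimated by the researcher is $y = \gamma_0 + \gamma_1 D^F + (u - \gamma_1\nu^F)$. Hence, estimating Eq. (9) using $D^F$ instead of $D^G$ leads to the following bias:
$$\operatorname{plim}\hat{\gamma}_1^F = \gamma_1\,\frac{\operatorname{Cov}(D^G, D^F)}{\operatorname{Var}(D^F)} = \gamma_1\,\frac{(1 - \pi_0 - \pi_1)\,p(1-p)}{q(1-q)} \tag{10}$$
where, as in Eq. (3), $p = P(D^G = 1)$, $q = P(D^F = 1)$, and $\pi_0$ and $\pi_1$ are the false positive and false negative report rates, now defined relative to $D^G$. Since $\operatorname{Cov}(\nu^F, D^G) \neq 0$ due to the non-classical nature of variety misreporting, it follows that $\operatorname{plim}\hat{\gamma}_1^F \neq \gamma_1$. In this case, using $D^G$ instead of $D^F$ as the regressor would completely overcome the bias in $\hat{\gamma}_1$. If the testable implication of Assumption 2.5 is rejected (i.e., conditional on $D^G$, $D^F$ provides relevant information about the conditional mean of $y$), it must be the case that $D^G = D^*$ w.r.t. Assumption 2.1 and/or Assumption 2.5 do not hold for yield. In fact, in the presence of misperception, since input use decisions are based on perceived ($D^*$) rather than true genetic quality ($D^G$), misperception introduces an endogeneity problem in Eq. (9): perception-driven inputs enter the error term $u$ and are correlated with $D^G$ through $D^*$, so the OLS estimator of Eq. (9) could be biased because the assumption that $E[u \mid D^G] = 0$ no longer holds. Hence, we reformulate our non-differential measurement error assumption for yield as follows:
Assumption 2.6: $E[y \mid D^G, D^*, D^F, X] = E[y \mid D^G, D^*, X]$
Assumption 2.6 invokes the non-differential measurement error assumption conditional on farmers’ perceptions at the time of input choice ($D^*$), ex-post input choices ($X$) and the true genetic quality of the cassava varieties grown by farmers ($D^G$). When $D^* = D^G$, the testable implication of Assumption 2.6 is that farmer variety reports are not informative about the conditional mean of $y$ once we know the genetic test results and farmers’ ex-post input choices (i.e., $E[y \mid D^G, D^F, X] = E[y \mid D^G, X]$). Given this non-differential variety misperception assumption for yield, the reduced-form biophysical relationship between $D^G$ and $y$ can be reformulated as follows:
$$y = \gamma_0 + \gamma_1 D^G + X'\theta + u \tag{11}$$
Assuming $u$ is mean zero and uncorrelated with both $D^G$ and $X$, the OLS estimator of Eq. (11) will be unbiased. Analogously to Eq. (10), when estimating Eq. (11) using $D^F$ instead of $D^G$, we implicitly assume the regression specification $y = \gamma_0 + \gamma_1 D^F + X'\theta + (u - \gamma_1\nu^F)$. In this case, the expected bias in $\hat{\gamma}_1$ depends not only on the non-classical nature of variety misreporting (i.e., $\operatorname{Cov}(\nu^F, D^G) \neq 0$) but also on whether $\operatorname{Cov}(\nu^F, X)$ is zero. 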
In particular, even if both $D^G$ and $X$ are exogenous in Eq. (11), the correlation between $\nu^F$ and $X$ cannot be assumed to be zero as long as $X$ is strongly correlated with the variety variables ($D^G$ or $D^F$) (Nguimkeu et al., 2021).13 Since true genetic quality ($D^G$) is assumed to be measured without error, rejecting the equality implied by the observable implication of Assumption 2.6 would imply that Eq. (11) does not include all the relevant input choices (i.e., some relevant inputs are missing from $X$) and/or that measurement error is differential for yield, in the sense that farmer perceptions affect yields not only through input choices but also through other channels. For the sake of exposition, let us assume that farmer perceptions affect yields only through input choices. Imposing this additional assumption, Assumption 2.6 can be reformulated as follows:
Assumption 2.7: $E[y \mid D^G, D^*, D^F, X] = E[y \mid D^G, X]$
Under Assumption 2.7, conditional on $D^G$ and $X$, farmer variety perceptions or reports are not informative about the conditional mean of $y$. In fact, the assumption that farmer perceptions affect yields only through input choices is equivalent to the restriction that, conditional on $X$, $D^* = D^G$ w.r.t. Assumption 2.5. As discussed above, in the presence of misperception, input adjustments that are not included in $X$ remain in the error term of Eq. (11) (i.e., our assumption that $u$ is mean zero and uncorrelated with both $D^G$ and $X$ will not hold). However, by imposing Assumption 2.7, we can probe the relevance of misperception by conditioning our regressions on $D^G$, $X$ and $D^F$, in the spirit of a control function approach. That is, since both $D^F$ and $D^G$ measure the same concept (the genetic quality of the cassava varieties grown by farmers) and $D^G$ is assumed to measure genetic quality without error, we assume that, conditional on $D^G$ and $X$, $D^F$ captures misperception-induced input choices that are not included in $X$. 
Under this scenario, the appropriate regression specification for yield becomes:
$$y = \gamma_0 + \gamma_1 D^G + X'\theta + \lambda D^F + e \tag{12}$$
Since $e$ is mean zero and uncorrelated with $D^G$ and $X$ conditional on $D^F$ by Assumption 2.7, the OLS estimator of Eq. (12) will be unbiased.14 In fact, under Assumption 2.7, if all the relevant input choices are captured by $X$ as assumed in Eq. (11), the coefficient on $D^F$ is expected to be statistically insignificant (i.e., $\lambda = 0$; conditional on $D^G$ and $X$, $D^F$ is not relevant for $y$).15
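To illustrate the logic of Eq. (12), the sketch below simulates a hypothetical DGP in which perceptions coincide with reports ($D^* = D^F$) but can differ from the true genetic status $D^G$, and perception-driven inputs affect yield. When every such input is included in $X$, the coefficient $\lambda$ on $D^F$ should be close to zero; when a perception-driven input is left out of $X$, $D^F$ picks it up and $\lambda$ moves away from zero. The DGP, the omitted input `h`, all parameter values and all names are illustrative assumptions, and the normal-equations OLS helper is a plain solver, not the estimator used in the paper.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(y, regressors):
    """OLS of y on an intercept plus `regressors`; returns (coefficients, classical SEs)."""
    n = len(y)
    z = [[1.0] + [float(col[i]) for col in regressors] for i in range(n)]
    k = len(z[0])
    ztz = [[sum(row[a] * row[b] for row in z) for b in range(k)] for a in range(k)]
    zty = [sum(row[a] * yi for row, yi in zip(z, y)) for a in range(k)]
    inv = invert(ztz)
    coef = [sum(inv[a][b] * zty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(z, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return coef, [math.sqrt(s2 * inv[a][a]) for a in range(k)]


def simulate_yield(theta_h, n=20_000, p=0.7, m0=0.3, m1=0.2, seed=2):
    """Hypothetical DGP: y = 5 + 2*D^G + 1*f + theta_h*h + N(0, 1) noise.

    D^G: true genetic status; perception D* = D^F is a misperceived copy of it.
    f:   perception-driven input that IS included in X (e.g., fertilizer).
    h:   perception-driven input that is NOT included in X.
    """
    rng = random.Random(seed)
    y, d_g, d_f, f_list = [], [], [], []
    for _ in range(n):
        g = 1 if rng.random() < p else 0
        flip = rng.random() < (m1 if g == 1 else m0)
        perc = 1 - g if flip else g
        f = 0.5 + 1.0 * perc + rng.gauss(0.0, 0.5)
        h = 0.3 + 0.8 * perc + rng.gauss(0.0, 0.5)
        y.append(5.0 + 2.0 * g + 1.0 * f + theta_h * h + rng.gauss(0.0, 1.0))
        d_g.append(g)
        d_f.append(perc)
        f_list.append(f)
    return y, d_g, d_f, f_list


if __name__ == "__main__":
    for theta_h in (0.0, 1.0):
        y, d_g, d_f, f = simulate_yield(theta_h)
        coef, se = ols(y, [d_g, f, d_f])      # Eq. (12) with X = f only
        print(f"theta_h={theta_h}: lambda = {coef[3]:.3f} (t = {coef[3] / se[3]:.1f})")
```

In the second run $E[h \mid D^F] = 0.3 + 0.8\,D^F$, so $\hat\lambda$ should land near 0.8: a significant $\lambda$ flags perception-driven inputs missing from $X$, as the text argues.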

Data and descriptive statistics

To examine the implications of variety misclassification, we use data from the Cassava Monitoring Survey (CMS) of Nigeria.16 The CMS was conducted in 2016 to measure the adoption rate of improved cassava cultivars in Nigeria. Data were collected from 16 states, which together account for more than 80 percent of the country's total cassava production. To obtain nationally representative data, a multistage stratified sampling design was employed. First, the 16 states were grouped into four geopolitical regions. Then, 125 enumeration areas (EAs) were selected from each region using probability proportional to size (PPS) sampling. From each EA, 5 cassava-growing households were randomly selected for interview. During the post-planting survey, detailed information on the cassava varieties grown by farmers was collected using two different approaches. The first asked farmers to report the type of cassava variety they grow, specifically whether it is improved or not; this corresponds to the standard variety data collection approach often employed in household surveys.17 Second, leaf samples were collected from all identified cassava plots to identify the improvement status of the varieties through DNA fingerprinting. Since DNA fingerprinting is independent of environmental conditions and plant growth stage, it identifies the improvement status of the cassava varieties grown by an individual farmer accurately (Rabbi et al., 2015).18 We then match farmer reports with the genetic-test results to explore the extent of mismatch between the two variety identification methods.
Table 1 presents summary statistics on variety identification, input use and other household and plot characteristics.19 In our analysis, the farmer report (TR) is a binary indicator based on the farmer's response about the improvement status of the cassava variety grown: for each identified cassava plot, it takes the value one if the farmer identifies the variety as improved and zero otherwise. Similarly, the genetic test result (TG) is coded one if the variety on the plot is confirmed to be improved and zero otherwise. According to farmer reports, about 56% of the plots are planted with improved cassava varieties, whereas the genetic test results show that 70% are. From the matched plot-level variety information, we find mismatches between farmer reports and genetic test results on about 36% of the plots in the full sample: plots with TR = 0 and TG = 1 and plots with TR = 1 and TG = 0 account for about 25% and 11% of the full sample, while plots with TR = 0 and TG = 0 and plots with TR = 1 and TG = 1 account for about 19% and 45%, respectively.20 As discussed in Section 2, since the underlying misclassification mechanism is unknown ex ante, these mismatches between farmer reports and genetic-test results suggest the presence of either variety misperception or misreporting in our data.21
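The four cell shares are pinned down by the two marginals and the overall mismatch rate alone. The short Python sketch below (the function name joint_cells is ours, for illustration) recovers them from the rounded percentages quoted above; small deviations from the published shares can arise from rounding.

```python
def joint_cells(p_r, p_g, p_mis):
    """Recover the 2x2 joint distribution of (TR, TG) from
        P(1,1) + P(1,0) = p_r    (share reported improved)
        P(1,1) + P(0,1) = p_g    (share genetically improved)
        P(1,0) + P(0,1) = p_mis  (share with TR != TG)
    """
    both_improved = (p_r + p_g - p_mis) / 2                     # P(TR=1, TG=1)
    false_pos = p_r - both_improved                             # P(TR=1, TG=0)
    false_neg = p_g - both_improved                             # P(TR=0, TG=1)
    both_local = 1.0 - both_improved - false_pos - false_neg    # P(TR=0, TG=0)
    return {(1, 1): both_improved, (1, 0): false_pos,
            (0, 1): false_neg, (0, 0): both_local}

# Rounded plot-level shares reported in the text.
cells = joint_cells(p_r=0.56, p_g=0.70, p_mis=0.36)
for (tr, tg), share in sorted(cells.items(), reverse=True):
    print(f"TR={tr}, TG={tg}: {share:.2f}")
```

Running it prints shares of 0.45, 0.11, 0.25 and 0.19, matching the four cells described above.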

Table 1

Descriptive statistics.

	Mean	Standard deviation
Improved variety (TR)	0.56	0.49
Improved variety (TG)	0.70	0.46
Fertilizer use (1 = Yes; 0 = No)	0.26	0.44
Fertilizer (kg/ha)	22.9	50.4
Herbicide use (1 = Yes; 0 = No)	0.47	0.50
Herbicide (lit/ha)	6.18	12.35
Labor (person-days)	55.25	88.6
Yield (t/ha)	14.81	9.64
Household size (# members)	4.6	2.4
Age of household head	51.7	13.7
Education of household head (Years of schooling)	8.7	4.9
Gender of household head (1 = Male, 0 = Female)	0.9	0.3
Access to extension (1 = Yes; 0 = No)	0.36	0.48
Membership in cassava growers’ association (1 = Yes; 0 = No)	0.21	0.41
Plot manager (1 = Men, 0 = Jointly/Women)	0.37	0.48
Plot soil fertility status (1 = Good, 0 = Medium/Poor)	0.74	0.44
Plot distance from residence (kilometers)	2.02	2.1

No. observations	3933

From each cassava plot, we also elicited production and input use information. For instance, fertilizer and herbicide were applied on about 26% and 47% of the plots, respectively. To describe farmers' observed input use behavior, Table 2 reports mean fertilizer and herbicide use conditional on farmer reports and on genetic test results. The results suggest a positive correlation between adoption of improved cassava varieties and input use decisions. However, this correlation is much stronger conditional on farmer reports than on genetic test results. Table 2 also reports mean cassava yield conditional on farmer reports and on genetic-test results. Unlike the input use patterns, the yield gap between improved and local varieties is much larger when varieties are classified by genetic test results than by farmer reports. Together, these patterns suggest that while farmers' variety reports are strongly correlated with their input use behavior, yields are more closely aligned with the true genetic attributes of the cassava varieties grown by farmers. We formally test these patterns using both parametric and non-parametric methods in Section 5.

Table 2

Conditional means of yield, fertilizer and herbicide inputs.

Variable	(1) TR=0	(2) TR=1	(3) TG=0	(4) TG=1	(5) Total	T-test difference (2)−(1)	T-test difference (4)−(3)
	Mean (SE)	Mean (SE)	Mean (SE)	Mean (SE)	Mean (SE)
Fertilizer use	0.199 (0.010)	0.312 (0.010)	0.229 (0.012)	0.276 (0.009)	0.262 (0.007)	0.113***	0.047***
Herbicide use	0.259 (0.010)	0.640 (0.010)	0.370 (0.014)	0.515 (0.010)	0.472 (0.008)	0.381***	0.145***
Yield (t/ha)	13.485 (0.211)	15.869 (0.217)	11.844 (0.236)	16.08 (0.19)	14.812 (0.154)	2.384***	4.076***
No. observations	1744	2189	1175	2758	3933

Notes: The values displayed for the t-tests are the differences in means across the groups. ***, **, and * indicate significance at the 1, 5, and 10 percent levels.
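The difference columns of Table 2 can be reproduced from the group means, and rough t statistics can be formed from the reported standard errors. The Python sketch below assumes the two groups are independent samples and ignores clustering by enumeration area, so its t statistics only approximate the tests in the table; the helper name diff_and_t is ours.

```python
import math

# Mean and standard error by group, copied from Table 2.
TABLE2 = {
    "Fertilizer use": {"TR0": (0.199, 0.010), "TR1": (0.312, 0.010),
                       "TG0": (0.229, 0.012), "TG1": (0.276, 0.009)},
    "Herbicide use": {"TR0": (0.259, 0.010), "TR1": (0.640, 0.010),
                      "TG0": (0.370, 0.014), "TG1": (0.515, 0.010)},
}

def diff_and_t(treated, control):
    """Difference in means and an unpooled (Welch-type) t statistic,
    treating the two groups as independent samples."""
    (m1, se1), (m0, se0) = treated, control
    diff = m1 - m0
    return diff, diff / math.sqrt(se1 ** 2 + se0 ** 2)

gaps = {}
for var, g in TABLE2.items():
    gaps[var, "TR"] = diff_and_t(g["TR1"], g["TR0"])  # column (2) - (1)
    gaps[var, "TG"] = diff_and_t(g["TG1"], g["TG0"])  # column (4) - (3)
    print(f"{var}: TR gap {gaps[var, 'TR'][0]:.3f} (t = {gaps[var, 'TR'][1]:.1f}), "
          f"TG gap {gaps[var, 'TG'][0]:.3f} (t = {gaps[var, 'TG'][1]:.1f})")
```

All four gaps clear the 1 percent critical value of about 2.58, consistent with the stars in the table, and the report-based gaps are the larger ones for both inputs.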


Estimation

We now describe the empirical strategy we employ to estimate the quantities of interest. We start by carrying out the non-parametric tests described in Section 2.2 for each input, at both the intensive and extensive margins of input use. Under our maintained assumption that misclassification in the two proxies is non-differential for input use, we test the null of no misclassification in TR (i.e., the hypothesis that farmer perceptions are equal to their reports, as stated in Assumption 2.2) by running a non-parametric regression of X on TR and TG and testing

E[X | TR, TG] = E[X | TR],    (13)

where X stands for observed input use by a household on a given plot. That is, under Assumption 2.1, the conditional mean of X does not depend on TG if and only if there is no misclassification in TR. Hence, Eq. (13) can be viewed as a test for misreporting, that is, for whether poor measurement drives misclassification in our data. Similarly, we test the null of no misclassification in TG (i.e., the hypothesis that farmer perceptions are equal to the genetic test results, as stated in Assumption 2.3) using

E[X | TG, TR] = E[X | TG].    (14)

Under Assumption 2.1, the conditional mean of X does not depend on TR if and only if there is no misclassification in TG. Hence, Eq. (14) is a test for misperception, that is, for whether market frictions such as asymmetric quality information drive misclassification in our data (Wilhelm, 2019, Lee and Wilhelm, 2019). That is, in the absence of market frictions, farmers' input allocation decisions are expected to be based on the true genetic quality of the crop varieties they grow (i.e., on TG). We carry out these tests of the null of no misclassification in the two proxies using the Cramer–von Mises (CvM) and Kolmogorov–Smirnov (KS) test statistics. Under Assumption 2.1, if Eq. (13) holds but not Eq. (14), misperception is considered to be the main source of variety misclassification in our data. On the other hand, if Eq.
(14) holds but not Eq. (13), misreporting is considered to be the main source of variety misclassification in our data. Next, we carry out hypothesis tests for the presence of misclassification under a parametric model specification and examine the inferential and behavioral implications of misclassification. To examine the inferential implications, we follow our analytical framework and estimate input complementarity relationships using the OLS estimator from a regression of X on a constant and TR, as well as from a regression of X on a constant and TG. As discussed in Section 2.1, the input complementarity parameter β can be point-identified via Eq. (3) if Assumption 2.2 holds or via Eq. (4) if Assumption 2.3 holds. Thus, following our analytical framework, we estimate adoption propensity equations,22 Eqs. (15a) and (15b), which regress X on both TR and TG (and a constant), estimated without and with household fixed effects.23 Imposing linearity in the input equations, Eqs. (15a) and (15b) test the restriction that, conditional on TR, TG is not informative about the conditional mean of X. Under our maintained assumption that misclassification in TR and TG is non-differential for input use, a non-zero coefficient on TR (β̂R ≠ 0) together with a zero coefficient on TG (β̂G = 0) implies that TR is a more accurate measure of T∗ than TG. In this case, the OLS estimator from a regression of X on TR (and a constant) is expected to be unbiased for β. Conversely, β̂G ≠ 0 and β̂R = 0 imply that TG is a more accurate measure of T∗ than TR; in this case, the OLS estimator from a regression of X on TG (and a constant) is expected to be unbiased for β. Next, we explore the behavioral implications of misclassification, focusing on farmers' observed input use and intensification choices. To facilitate comparisons, we introduce a discrete variable, V, which takes the following values depending on TR and TG: V = 0 if TR = 1 and TG = 1; V = 1 if TR = 1 and TG = 0; V = 2 if TR = 0 and TG = 1; and V = 3 if TR = 0 and TG = 0 (Eq. (16)). Using this variable, we estimate Eq. (17), which regresses input use X on indicators for the categories of V and on other determinants of input use.
Unlike the more simplified analytical frameworks, our empirical specification in Eq. (17) is richer and accounts for additional household and plot level characteristics.24 Assuming a positive relationship between adoption of improved cassava varieties and input use, and letting V3 (correctly perceived local variety) be the base category, we expect positive and statistically significant coefficients on V0 and V1 but an insignificant coefficient on V2 in the presence of misperception. Similarly, letting V0 be the base category, the coefficient on V1 is expected to be statistically insignificant while the coefficients on V2 and V3 are expected to be negative and significant in the presence of variety misperception. In terms of inference, the OLS estimator of ϕ0 (the coefficient on V0) in Eq. (17) can also be used to establish a lower bound on β (Black et al., 2000): under our maintained assumptions and an additional condition, this estimate is expected not to exceed β in our data. Having examined the behavioral and inferential implications of misclassification for input use and intensification choices, we extend our analysis to the inferential implications of misclassification for productivity. We do so with two objectives: (i) to show the implications of misclassification for inference related to yield, and (ii) to explore whether misperception-driven input crowding-in(out) is yield-enhancing. To this end, we estimate linear regression specifications for log-transformed plot-level cassava yield, Eqs. (18a) to (18c). As discussed in Section 2.4, Eq. (18a) generates biased estimates due to the non-classical nature of variety misclassification, whereas Eq. (18b) conditions on input use to account for input adjustments. For the purpose of hypothesis testing, we also estimate Eq. (18c), which additionally includes TR: as explicated in Eq. (12), under the assumption that misperception affects yields only through input choices, TR is expected to be insignificant conditional on TG and X.
Finally, we examine the inferential implications of misperception and the associated input adjustments for the marginal yield response of inputs, using a yield specification in which all terms are as defined above. To quantify the implications of input allocation driven by misperception, we interact farmer variety perceptions with input use. These interaction terms indicate whether (mis)perception-driven allocation of inputs such as fertilizer is yield-enhancing. For instance, if misperception-induced crowding-in of fertilizer has no discernible influence on yield, then this allocation of fertilizer can be argued to be inefficient.
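The testing logic behind Eqs. (13) and (14) can be sketched with a simple permutation analogue. This is not the Wilhelm (2019) CvM/KS bootstrap used in the paper: the two statistics below are hypothetical stand-ins (a weighted sum of squared cell-mean gaps, CvM-flavoured, and a maximum absolute gap, KS-flavoured), the null is imposed by shuffling one proxy within cells of the other, and the simulated data, in which reports equal perceptions (TR = T∗) and genetic status is a noisy proxy, are our assumption.

```python
import random

def cell_gap_stats(x, cond, other):
    """For binary proxies, compare E[x | cond=c, other=1] with E[x | cond=c, other=0]
    in each cell c. Returns (weighted sum of squared gaps, max absolute gap)."""
    sums = {(c, o): 0.0 for c in (0, 1) for o in (0, 1)}
    counts = {(c, o): 0 for c in (0, 1) for o in (0, 1)}
    for xi, ci, oi in zip(x, cond, other):
        sums[ci, oi] += xi
        counts[ci, oi] += 1
    sq, mx = 0.0, 0.0
    for c in (0, 1):
        n1, n0 = counts[c, 1], counts[c, 0]
        if n1 == 0 or n0 == 0:
            continue
        gap = sums[c, 1] / n1 - sums[c, 0] / n0
        sq += n1 * n0 / (n1 + n0) * gap ** 2
        mx = max(mx, abs(gap))
    return sq, mx

def permutation_test(x, cond, other, reps=100, seed=0):
    """p-values for H0: E[x | cond, other] = E[x | cond], imposing the null by
    shuffling `other` within each cell of `cond`."""
    rng = random.Random(seed)
    obs_sq, obs_mx = cell_gap_stats(x, cond, other)
    cells = [[i for i, ci in enumerate(cond) if ci == c] for c in (0, 1)]
    hits_sq = hits_mx = 0
    for _ in range(reps):
        perm = list(other)
        for members in cells:
            vals = [other[i] for i in members]
            rng.shuffle(vals)
            for i, v in zip(members, vals):
                perm[i] = v
        sq, mx = cell_gap_stats(x, cond, perm)
        hits_sq += sq >= obs_sq
        hits_mx += mx >= obs_mx
    return hits_sq / reps, hits_mx / reps

def simulate(n=3000, beta=0.3, seed=3):
    """Hypothetical data: input use depends only on perception T*, and TR = T*."""
    rng = random.Random(seed)
    x, tr, tg = [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.70 else 0
        s = 1 if rng.random() < (0.7 if g else 0.4) else 0
        x.append(0.2 + beta * s + rng.gauss(0.0, 0.3))
        tr.append(s)
        tg.append(g)
    return x, tr, tg

x, tr, tg = simulate()
print("Eq. (13) analogue, H0: T* = TR ->", permutation_test(x, tr, tg))
print("Eq. (14) analogue, H0: T* = TG ->", permutation_test(x, tg, tr))
```

In this simulated setting the Eq. (14) analogue rejects decisively, while the Eq. (13) analogue should typically fail to reject, mirroring the pattern reported in Panel A of Table 3.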

Results

Non-parametric and parametric test results

In this section, we present the results of our formal non-parametric and parametric tests for the presence of misclassification in TR and TG. As discussed in Section 2.2, these tests gauge whether farmer perceptions are equal to their reports (i.e., T∗ = TR) or to the genetic test results (i.e., T∗ = TG). For each input, at both the extensive and intensive margins, the results of the non-parametric tests are reported in Panel A of Table 3. For each input, the first two columns report the results of the unconditional and conditional versions of our tests, respectively. In the conditional version, the non-differential misclassification assumption of the two proxies for input use is expected to hold conditional on T∗ and additional controls (the conditioning variables listed in the note to Table 3). We use the p-values of both the Cramer–von Mises (CvM) and Kolmogorov–Smirnov (KS) test statistics to test the equality of the conditional input use expectations (i.e., the null hypothesis of no misclassification in each proxy).

Table 3

Test results for the presence of misclassification.

Panel A: Non-parametric test results

T∗ = TR w.r.t. E[X | T∗, TR, TG] = E[X | T∗]

	Fertilizer (dummy)		Fertilizer (kg/ha)		Herbicide (dummy)		Herbicide (lit/ha)
	Uncond.	Cond.	Uncond.	Cond.	Uncond.	Cond.	Uncond.	Cond.
p(CvM < CvM∗)	0.222	0.328	0.746	0.438	0.016	0.332	0.154	0.884
p(KS < KS∗)	0.174	0.558	0.622	0.638	0.006	0.208	0.108	0.836

T∗ = TG w.r.t. E[X | T∗, TG, TR] = E[X | T∗]

p(CvM < CvM∗)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
p(KS < KS∗)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

No. of bootstrap replications	500	500	500	500	500	500	500	500

Panel B: Parametric test results

β̂R (coefficient on TR)	0.109***	0.107***	0.667***	0.661***	0.372***	0.320***	0.783***	0.674***
	(0.023)	(0.023)	(0.115)	(0.115)	(0.028)	(0.030)	(0.08)	(0.088)
β̂G (coefficient on TG)	0.017	0.017	0.043	0.042	0.044	0.023	0.071	0.007
	(0.024)	(0.024)	(0.117)	(0.116)	(0.029)	(0.029)	(0.086)	(0.088)

Naive Wald (p-value)	0.15	0.153	0.468	0.473	0.001	0.131	0.097	0.892
LM-Wald (p-value)	0.264	0.266	0.581	0.585	0.011	0.188	0.192	0.906

Panel C: Parametric test results with household fixed effects

β̂R (coefficient on TR)	0.141***	0.141***	0.860***	0.860***	0.294***	0.294***	0.526***	0.526***
	(0.039)	(0.04)	(0.199)	(0.199)	(0.048)	(0.048)	(0.160)	(0.160)
β̂G (coefficient on TG)	−0.006	−0.006	−0.071	−0.07	0.009	0.009	−0.058	−0.058
	(0.033)	(0.033)	(0.148)	(0.148)	(0.031)	(0.031)	(0.105)	(0.105)

Other controls	No	Yes	No	Yes	No	Yes	No	Yes
No. observations	3933	3933	3933	3933	3933	3933	3933	3933

Note. For parametric tests, standard errors clustered at the enumeration area level are reported in parentheses. ***, **, and * indicate significance at the 1, 5, and 10 percent levels. For input use at the intensive margin, we employ the inverse hyperbolic sine (IHS) transformation to retain observations with zero input use. The conditioning variable in the fertilizer input equation is distance to the nearest fertilizer dealer. In the herbicide input equation, the conditioning variables are self-reported incidence of cassava pests and herbicide price. The FE estimates for the unconditional and conditional versions of our tests are identical because the conditioning variables are plot invariant.

In the first two rows of Panel A, we test whether farmer perceptions are equal to their reports (i.e., T∗ = TR). In this case, conditional on TR, TG is expected to be uninformative about the conditional mean of X. In the unconditional version of our tests, except for herbicide use at the extensive margin, both the CvM and KS versions of our tests consistently fail to reject the null of no misclassification in TR (i.e., T∗ = TR). In the conditional version of our tests, we consistently fail to reject the null of no misclassification in TR. Following a similar logic, we test whether farmer perceptions are equal to the genetic test results in the third and fourth rows of Panel A. In this case, conditional on TG, TR is expected to be uninformative about the conditional mean of X. Both the conditional and unconditional versions of our tests consistently reject the null of no misclassification in TG (i.e., T∗ = TG). Hence, under Assumption 2.1, our data reject the assumption of no misclassification in TG and are consistent with the assumption of no misclassification in TR. Imposing linearity, we also report additional predictive parametric tests by regressing X on both TR and TG (and a constant) in Panels B and C of Table 3. Under our maintained assumptions, the hypothesis that T∗ = TR holds if β̂R ≠ 0 and β̂G = 0. The results in Panels B and C of Table 3 show that β̂R is positive and significant while β̂G is statistically indistinguishable from zero in every specification. In addition, the p-values of the naive Wald statistics reported at the bottom of Panel B show that the exclusion restriction of TG is satisfied in our data, except for herbicide use at the extensive margin in the unconditional specification.25 The p-values of the Lochner–Moretti (LM) statistics, which are robust to misspecification (i.e., nonlinearities in the regression specifications), further confirm this pattern, with the same exception (Lochner and Moretti, 2015). Thus, consistent with our non-parametric tests, the results reported in Panels B and C of Table 3 support the hypothesis that farmer perceptions are equal to their reports but not to the genetic-test results.
However, since TG is expected to measure the true genetic quality of the cassava varieties grown by farmers more accurately than TR, these test results imply that the main source of variety misclassification in our data is misperception rather than misreporting. Note that, since our non-parametric tests rely on the weaker conditional mean independence assumption (Assumption 2.1) rather than on the absence of misclassification in TR, the results imply that misclassification in TR, if any, does not distort the conditional mean of X (Wilhelm, 2019).26 That is, for randomly drawn cassava farmers, E[X | TR, TG] = E[X | TR] but E[X | TR, TG] ≠ E[X | TG] in our data. Our results differ slightly from those of Abay et al. (2021), who report that measurement error in plot size represents a mixture of misreporting and misperception.27 However, our results are intuitive in the sense that farmers are more likely to be prone to variety misperception given the credence nature of crop variety traits.
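The parametric check in Panels B and C (a significant coefficient on TR and an insignificant one on TG) can be illustrated with simulated data in which reports equal perceptions (TR = T∗) and genetic status is a noisy proxy. The Python sketch below is hypothetical: the data-generating process, misperception rates and helper names are our assumptions. It also shows that the OLS slope on TG alone is attenuated, here to 0.3β, because E[X | TG] shifts by β(0.7 − 0.4) = 0.3β between genetic groups.

```python
import random

def cov(a, b):
    """Covariance with 1/n normalisation (the factor cancels in the slope ratios)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def ols_slopes(y, *regs):
    """Slopes of y on one or two regressors plus a constant (centred normal equations)."""
    if len(regs) == 1:
        (x,) = regs
        return (cov(x, y) / cov(x, x),)
    x1, x2 = regs
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

def simulate(n=100_000, beta=0.15, seed=7):
    """Hypothetical DGP: X depends on perception T* only; TR = T*; TG is a noisy proxy."""
    rng = random.Random(seed)
    x, tr, tg = [], [], []
    for _ in range(n):
        g = 1 if rng.random() < 0.70 else 0
        s = 1 if rng.random() < (0.7 if g else 0.4) else 0   # P(T*=1 | TG)
        x.append(0.2 + beta * s + rng.gauss(0.0, 0.3))
        tr.append(s)
        tg.append(g)
    return x, tr, tg

x, tr, tg = simulate()
(b_r,) = ols_slopes(x, tr)                      # TR alone: close to beta
(b_g,) = ols_slopes(x, tg)                      # TG alone: close to 0.3 * beta
b_r_joint, b_g_joint = ols_slopes(x, tr, tg)    # joint regression, as in Panel B
print(f"TR alone {b_r:.3f}, TG alone {b_g:.3f}, joint: TR {b_r_joint:.3f}, TG {b_g_joint:.3f}")
```

This mirrors the structure of Table 3: the report carries the information about input use, and the genetic test adds essentially none once the report is controlled for.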

Point identification results

In this section, we directly explore the inferential implications of misclassification, focusing on the estimation of input complementarity relationships. As discussed in Section 2.3, the results of the above non-parametric tests are informative about the appropriate estimation strategy, since the input complementarity relationships estimated using TR and TG are expected to be unbiased in the absence of misreporting and misperception, respectively. In our case, since both the parametric and non-parametric tests consistently fail to reject the hypothesis that farmer perceptions are equal to their reports, we expect the difference between the OLS estimate based on TR (βR) and the true parameter β to be statistically insignificant. To confirm this, Table 4 reports the GMM point-estimates of β together with naive OLS and IV estimates that ignore misclassification. In particular, for each input, at both the extensive and intensive margins, the first row reports the GMM point-estimates of β. The second and third rows report the OLS estimates βR and βG, which use TR and TG as proxies for T∗ (i.e., by regressing X on TR (and a constant) and on TG (and a constant), respectively). The fourth row reports the results of our parameter equality tests (i.e., βR = βG).

Table 4

Point identification results.

Estimation strategy	Fertilizer		Herbicide
	Dummy	Kg/ha	Dummy	Lit/ha
E(X | T∗): GMM	0.145***	0.742***	0.450***	0.925***
	(0.042)	(0.216)	(0.050)	(0.15)
E(X | TR): OLS	0.113***	0.677***	0.382***	0.799***
	(0.023)	(0.112)	(0.027)	(0.080)
E(X | TG): OLS	0.047**	0.224**	0.145***	0.284***
	(0.023)	(0.113)	(0.03)	(0.088)

E(X | TR) − E(X | TG)	0.066***	0.453***	0.237***	0.515***
	(0.026)	(0.133)	(0.033)	(0.094)

Estimated misclassification rates in TR and TG

Misreporting rates (α0+α1)	0.156	0.098	0.155*	0.114
	(0.233)	(0.221)	(0.081)	(0.139)
Misperception rates (κ0+κ1)	0.703***	0.745***	0.728***	0.733***
	(0.068)	(0.064)	(0.033)	(0.041)

Additional estimation results

βIVR: 2SLS	0.172*	0.824*	0.532***	1.04***
	(0.088)	(0.422)	(0.103)	(0.312)
βIVG: 2SLS	0.488***	2.93***	1.65***	3.46***
	(0.108)	(0.549)	(0.187)	(0.445)
(TG=TR) sample: OLS	0.115***	0.690***	0.398***	0.802***
	(0.031)	(0.143)	(0.036)	(0.113)

No. observation	3933	3933	3933	3933

Note. Standard errors clustered at the enumeration area level are reported in parentheses. ***, **, and * indicate significance at the 1, 5, and 10 percent levels. For input use at the intensive margin, we employ the inverse hyperbolic sine (IHS) transformation to retain observations with zero input use. The (TG=TR) sample has 2504 observations.

Consistent with our non-parametric test results, the estimates reported in Table 4 suggest that the GMM point-estimates of β are close to the OLS estimates that use TR as a proxy for T∗ (i.e., βR ≈ β). For instance, for fertilizer use (kg/ha), the estimates of βR and β are 0.677 and 0.742, respectively, implying a bias in βR of only about 9 percent. Note that the difference between the two conditional means is expected to be statistically insignificant (as confirmed by our non-parametric test results above). In contrast, the OLS estimator that uses TG as a proxy for T∗ (βG) yields estimates of β that are biased towards zero by up to 70 percent. Under our maintained assumptions, the difference between the GMM point-estimates of β and βG is attributable to measurement error in TG (i.e., misperception). Moreover, since our GMM estimation approach point-identifies the misclassification rates in TR and TG, we can also gauge how far the farmer reports and genetic test results are from the true, but unobserved, perception (T∗). The estimated misclassification rates reported in Table 4 show that the misclassification rate in TR is small (albeit non-zero) and mostly statistically insignificant, whereas the misclassification rate in TG is very high and statistically significant.28 In particular, while the maximum misreporting rate implied by our GMM estimates is about 15% (measurement error in TR, α0 + α1), the corresponding misperception rate in our data is about 75% (measurement error in TG, κ0 + κ1). Given these estimates (i.e., βR ≈ β), the inferential implications of misperception can be assessed by testing the equality of βR and βG. As expected, the parameter equality test reported in the fourth row of Table 4 (i.e., βR − βG) detects a statistically significant difference between the two conditional means.
In fact, we reject the null of parameter equality for all inputs, at both the intensive and extensive margins. This comparison reveals that the estimated input complementarity relationship is stronger when farmer reports rather than genetic-test results are used as a proxy for farmers' unobserved true variety perceptions. Therefore, failure to identify the origins of misclassification could undermine inference on input complementarity relationships: treating misperception as survey misreporting, and vice versa, can lead to wrong policy prescriptions. For instance, if we wrongly assumed misreporting instead of misperception to be the source of variety misclassification (which amounts to assuming that TG is a more reliable and accurate measure than TR of both farmers' unobserved true variety perceptions (T∗) and the genetic quality of the cassava varieties they grow), inference on input complementarity relationships would rely on βG instead of βR. However, the parameter equality tests reported in the fourth row of Table 4 suggest that βR is roughly two and a half to three times as large as βG. To provide additional perspective on the inferential implications of misperception, the seventh and eighth rows of Table 4 report naive IV estimates that ignore misclassification, from a regression of X on TR using TG as the instrument (βIVR) and from a regression of X on TG using TR as the instrument (βIVG), respectively. In the final row, we also report OLS estimates from a restricted sample that discards observations with conflicting TR and TG values, as if these observations were missing at random.
Under our maintained assumptions, we expect the naive IV estimates to be larger in magnitude than their OLS counterparts and the GMM point-estimates of β, since non-differential measurement error in our setting is expected to inflate the IV estimates (Aigner, 1973, DiTraglia and García-Jimeno, 2020).29 The IV estimates reported in the seventh and eighth rows of Table 4 show that the naive IV estimator generates inflated estimates compared to the GMM point-estimates of β. As expected, the IV estimator that uses TR as a proxy and TG as the instrument (βIVR) is close to the GMM point-estimates of β. As mentioned above, in the absence of misclassification in TR (i.e., when α0 + α1 = 0), βIVR is expected to be identical to β. In our case, given the estimated misreporting rates (α0 + α1, reported in the fifth row of Table 4), we would expect βIVR = β/(1 − α0 − α1), as long as TG is a valid instrument. By contrast, the IV estimator that uses TG as a proxy and TR as the instrument (βIVG) is up to four times larger in magnitude than the GMM point-estimates of β. This is expected given the high misclassification rate in TG in our data: given the estimated misperception rate (κ0 + κ1, reported in the sixth row of Table 4), we would expect βIVG = β/(1 − κ0 − κ1), as long as TR is a valid instrument.
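The bias percentages and IV benchmarks discussed above can be checked against Table 4 with a few lines of Python. The sketch assumes the standard probability limit for IV with a misclassified binary regressor, β/(1 − misclassification sum), which holds when misclassification is non-differential and the instrument's errors are independent of the regressor's; the helper names are ours.

```python
# Point estimates from Table 4:
# (GMM beta, OLS on TR, OLS on TG, alpha0+alpha1, kappa0+kappa1, 2SLS IV_R, 2SLS IV_G)
TABLE4 = {
    "Fertilizer (dummy)": (0.145, 0.113, 0.047, 0.156, 0.703, 0.172, 0.488),
    "Fertilizer (kg/ha)": (0.742, 0.677, 0.224, 0.098, 0.745, 0.824, 2.93),
    "Herbicide (dummy)": (0.450, 0.382, 0.145, 0.155, 0.728, 0.532, 1.65),
    "Herbicide (lit/ha)": (0.925, 0.799, 0.284, 0.114, 0.733, 1.04, 3.46),
}

def attenuation(beta, beta_proxy):
    """Share of the GMM estimate lost when a proxy stands in for T*."""
    return (beta - beta_proxy) / beta

def implied_iv(beta, misclassification_sum):
    """IV probability limit when the instrumented proxy has misclassification rates
    summing to `misclassification_sum` and the instrument's errors are independent."""
    return beta / (1.0 - misclassification_sum)

for name, (beta, b_r, b_g, a_sum, k_sum, iv_r, iv_g) in TABLE4.items():
    print(f"{name}: bias TR {attenuation(beta, b_r):.0%}, TG {attenuation(beta, b_g):.0%}; "
          f"IV_R implied {implied_iv(beta, a_sum):.3f} vs {iv_r}, "
          f"IV_G implied {implied_iv(beta, k_sum):.3f} vs {iv_g}")
```

The implied and reported IV estimates agree to within about 1 percent for every outcome, consistent with reading the inflation in the seventh and eighth rows as a direct consequence of the estimated misreporting and misperception rates.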

Implications for input use behavior

In this section, we explore the relationship between the different forms of variety misperception and farmers' observed input use behavior, since farmers are expected to act upon their perceptions (right or wrong). To quantify and compare the behavioral implications of the different forms of misperception, we categorize farmer perceptions of improved varieties into four groups following Eq. (16) in Section 4: (i) correct improved-variety perception (TR = 1, TG = 1; V0); (ii) false positive perception (TR = 1, TG = 0; V1); (iii) false negative perception (TR = 0, TG = 1; V2); and (iv) correct local-variety perception (TR = 0, TG = 0; V3). Our estimates of farmers' actual input use and intensification behavior are reported in Table 5. For each input, we report both OLS and FE estimates from Eq. (17), using correctly identified local varieties (V3) as the reference group. At the bottom of Table 5, we also report parameter equality tests, which probe whether observed input allocation decisions differ across the forms of variety perception by testing the equality of the estimated coefficients (i.e., ϕ0 = ϕ1, ϕ0 = ϕ2 and ϕ1 = ϕ2).
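The four-way coding of Eq. (16), and the meaning of the Eq. (17) coefficients in the specification without controls, can be made concrete in code. In the Python sketch below the helper names and toy data are hypothetical; it relies on the fact that OLS on a full set of category dummies (minus a reference category) returns each category's mean minus the reference mean, so ϕ0, ϕ1 and ϕ2 become differences in mean input use relative to V3. The FE and controlled specifications in Table 5 do not reduce to this.

```python
# (TR, TG) -> category index of Eq. (16)
CATEGORY = {
    (1, 1): 0,  # V0: correct improved-variety perception
    (1, 0): 1,  # V1: false positive (reported improved, genetically local)
    (0, 1): 2,  # V2: false negative (reported local, genetically improved)
    (0, 0): 3,  # V3: correct local-variety perception (reference group)
}

def perception_dummies(tr, tg):
    """Indicators (V0, V1, V2) for one plot; V3 is the omitted reference."""
    v = CATEGORY[(tr, tg)]
    return tuple(int(v == k) for k in range(3))

def category_effects(x, tr, tg):
    """Without controls, OLS of x on the V0-V2 dummies yields each category's
    mean input use minus the V3 mean."""
    groups = {k: [] for k in range(4)}
    for xi, r, g in zip(x, tr, tg):
        groups[CATEGORY[(r, g)]].append(xi)
    base = sum(groups[3]) / len(groups[3])
    return [sum(groups[k]) / len(groups[k]) - base for k in range(3)]

# Toy example: made-up fertilizer dummies for six plots, where use follows TR only.
x = [1, 1, 1, 0, 0, 0]
tr = [1, 1, 1, 0, 0, 0]
tg = [1, 1, 0, 1, 0, 0]
print(perception_dummies(1, 0))        # (0, 1, 0)
print(category_effects(x, tr, tg))     # [1.0, 1.0, 0.0]
```

When input use follows reports rather than genetics, as in the toy data, the sketch yields ϕ0 = ϕ1 > 0 and ϕ2 = 0, the misperception pattern described in Section 4.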

Table 5

Misperception and input use behavior.

Panel A: Fertilizer use behavior
	Dummy				Kg/ha
	OLS	FE	OLS	FE	OLS	FE	OLS	FE
ϕ0(V0)	0.115***	0.125***	0.106***	0.123***	0.690***	0.780***	0.636***	0.769***
	(0.031)	(0.043)	(0.031)	(0.043)	(0.143)	(0.210)	(0.142)	(0.21)
ϕ1(V1)	0.057	0.105*	0.059***	0.103*	0.572***	0.828***	0.579***	0.819***
	(0.041)	(0.061)	(0.04)	(0.061)	(0.217)	(0.297)	(0.215)	(0.297)
ϕ2(V2)	−0.016	−0.03	−0.027	−0.03	−0.018	−0.092	−0.074	−0.096
	(0.029)	(0.043)	(0.029)	(0.043)	(0.127)	(0.192)	(0.129)	(0.19)

ϕ0=ϕ1 (p-value)	0.128	0.686	0.210	0.688	0.568	0.832	0.778	0.823
ϕ0=ϕ2 (p-value)	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
ϕ1=ϕ2 (p-value)	0.06	0.032	0.024	0.032	0.005	0.002	0.002	0.002

Panel B: Herbicide use behavior

ϕ0(V0)	0.398***	0.298***	0.265***	0.296***	0.802***	0.440***	0.624***	0.428**
	(0.036)	(0.059)	(0.036)	(0.058)	(0.113)	(0.172)	(0.111)	(0.171)
ϕ1(V1)	0.289***	0.276***	0.204***	0.274***	0.540***	0.426**	0.401***	0.417**
	(0.051)	(0.062)	(0.053)	(0.062)	(0.143)	(0.193)	(0.144)	(0.192)
ϕ2(V2)	−0.009	−0.003	−0.009	−0.004	−0.084	−0.124	−0.081	−0.130
	(0.034)	(0.051)	(0.033)	(0.051)	(0.113)	(0.17)	(0.110)	(0.170)

ϕ0=ϕ1 (p-value)	0.020	0.477	0.040	0.492	0.003	0.906	0.067	0.928
ϕ0=ϕ2 (p-value)	0.000	0.000	0.000	0.000	0.000	0.002	0.000	0.002
ϕ1=ϕ2 (p-value)	0.000	0.000	0.000	0.000	0.000	0.007	0.000	0.007

Other controls	No	No	Yes	Yes	No	No	Yes	Yes
No. observations	3933	3933	3933	3933	3933	3933	3933	3933

Note. Standard errors clustered at the enumeration area level are reported in parentheses. ***, **, and * indicate significance at the 1, 5, and 10 percent levels. For input use at the intensive margin, we employ the inverse hyperbolic sine (IHS) transformation to retain observations with zero input use. The additional controls in the conditional regressions include the household and plot level characteristics listed in Table 1.

The estimates reported in Table 5 show that farmers are more likely to use fertilizer and herbicide when they perceive that they are growing an improved variety, irrespective of the true genetic quality of the cassava varieties they grow. The reverse holds when farmers perceive that they are growing an unimproved variety. As such, farmers crowd in (out) fertilizer and herbicide when they perceive the cassava varieties they planted to be improved (unimproved), implying that farmer variety perceptions affect their adoption of complementary farm inputs. Reassuringly, these input use patterns also hold at the intensive margins of fertilizer and herbicide use, implying that farmers are not only crowding in (out) fertilizer and herbicide but also changing their intensification behavior. The parameter equality tests reported at the bottom of Table 5 provide additional behaviorally relevant information about farmers' input use and intensification choices. In particular, while we consistently reject (at least at the 10 percent level) the nulls of ϕ0 = ϕ2 and ϕ1 = ϕ2, we mostly fail to reject the null of ϕ0 = ϕ1, the exceptions being the OLS estimates for herbicide. As shown in our analytical exercise, such input use behavior is expected when misperception is the dominant source of variety misclassification (i.e., when misperception shapes farmers' underlying input use behavior). These results broadly corroborate the findings of Wineman et al. (2020), Emerick et al. (2016) and Bulte et al. (2014). In the context of Tanzania, Wineman et al. (2020) report that farmers apply more fertilizer when they perceive that they are growing an improved maize variety. Emerick et al. (2016) find evidence of crowding-in of fertilizer and other modern cultivation practices among adopters of improved rice varieties in India. Similarly, in Tanzania, Bulte et al. (2014) find significant differences in farmers' effort depending on their perceptions of the improvement status of the cowpea varieties they planted.
Our results have important implications for determining whether existing input complementarity estimates are driven by farmer perceptions or by agronomic synergies. Our findings suggest that farmers' allocation of fertilizer and herbicide, which are complementary to improved seed, is driven by their perceptions (right or wrong) rather than by the true genetic quality of the cassava varieties they planted. Such input use behavior may introduce inefficiency into the production process. This calls into question whether existing packaging and recommendations for complementary agricultural inputs and other productivity-enhancing inputs can improve the technical efficiency of smallholders in the presence of pervasive input market imperfections (Abay et al., 2018).

Misperception and productivity

In this section, we explore the inferential implications of misperception, focusing on land productivity. Following our analytical framework, we start by testing which non-differential misperception assumption is relevant for yield, as a first-step model specification test. Table 6 reports the results of our non-parametric and parametric tests. In the first column of Table 6, the unconditional non-differential misperception assumption is rejected, implying that, conditional on TG, TR is informative about the conditional mean of Y. We then probe the non-differential misperception assumption conditioning on fertilizer and herbicide inputs (X), at the extensive and intensive margins in the second and third columns of Table 6, respectively. After conditioning on X, TR is not informative about the conditional mean of yield. In Panel B, we provide similar test results under parametric specifications. In the first column, without controlling for X, the coefficient associated with TR is highly significant, implying that, conditional on TG, TR provides relevant information about the conditional mean of yield. In the second column, where we control for input use at the extensive margin, the coefficient associated with TR remains significant, but the effect size declines considerably. In the third column, where we control for input use at the intensive margin, the coefficient associated with TR becomes insignificant. These test results imply that the reduced form biophysical relationship between adoption of improved cassava varieties and yield can be estimated consistently using after conditioning on X.30
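The exact CvM and KS statistics behind Table 6 follow the paper's analytical framework, which this section does not reproduce. The sketch below is one common way to build such a test, under our own assumptions: discrete TG, TR and an optional binary input dummy X (so the continuous-X column is not covered), a residual-marked empirical process R(t) evaluated at the sample points, and a wild bootstrap with Rademacher weights. The function `nondifferential_test` and its helpers are hypothetical names, and the data are simulated.

```python
import math
import random
from collections import defaultdict


def _leq(a, b):
    """Componentwise a <= b for tuples of equal length."""
    return all(ai <= bi for ai, bi in zip(a, b))


def _cell_means(y, cells):
    """Mean of y within each discrete cell (estimates E(Y | cell))."""
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, c in zip(y, cells):
        sums[c] += yi
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def _ks_cvm(resid, w):
    """KS and CvM statistics of R(t) = n**-0.5 * sum_i e_i * 1{W_i <= t},
    with t ranging over the sample values of W."""
    n = len(resid)
    cell_sum, cell_count = defaultdict(float), defaultdict(int)
    for e, wi in zip(resid, w):
        cell_sum[wi] += e
        cell_count[wi] += 1
    ks = cvm = 0.0
    for t in cell_sum:
        r = sum(s for wi, s in cell_sum.items() if _leq(wi, t)) / math.sqrt(n)
        ks = max(ks, abs(r))
        cvm += cell_count[t] * r * r / n  # average of R(W_j)**2 over j
    return ks, cvm


def nondifferential_test(y, tg, tr, x=None, n_boot=500, seed=0):
    """Bootstrap p-values (KS, CvM) for H0: E(Y|TG,TR[,X]) = E(Y|TG[,X]).

    tg, tr and the optional x must be discrete (e.g. 0/1 dummies).
    """
    rng = random.Random(seed)
    cond = [(g,) for g in tg] if x is None else list(zip(tg, x))
    w = [c + (r,) for c, r in zip(cond, tr)]
    m = _cell_means(y, cond)
    fitted = [m[c] for c in cond]
    resid = [yi - f for yi, f in zip(y, fitted)]
    ks_obs, cvm_obs = _ks_cvm(resid, w)
    ks_hits = cvm_hits = 0
    for _ in range(n_boot):
        # Wild bootstrap with Rademacher weights: E(Y* | W) depends on the
        # conditioning cell only, so the simulated data satisfy H0.
        y_b = [f + e * rng.choice((-1.0, 1.0)) for f, e in zip(fitted, resid)]
        m_b = _cell_means(y_b, cond)
        ks_b, cvm_b = _ks_cvm([yb - m_b[c] for yb, c in zip(y_b, cond)], w)
        ks_hits += int(ks_b >= ks_obs)
        cvm_hits += int(cvm_b >= cvm_obs)
    return ks_hits / n_boot, cvm_hits / n_boot


# Simulated example (not the paper's data): yield depends on the report TR
# even after conditioning on TG, so H0 should be rejected.
sim = random.Random(1)
tg = [sim.randint(0, 1) for _ in range(600)]
tr = [g if sim.random() < 0.7 else 1 - g for g in tg]  # 30% misclassified
y = [1.0 + 0.5 * g + 1.0 * r + sim.gauss(0.0, 0.5) for g, r in zip(tg, tr)]
print(nondifferential_test(y, tg, tr, n_boot=200))
```

The p-values are the share of bootstrap statistics at least as large as the observed ones; the default of 500 replications mirrors the bootstrap row of Table 6. The sketch does not attempt any clustering at the enumeration-area level.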

Table 6

Non-parametric and parametric test results for yield.

Panel A: Non-parametric test results
	E(Y|TG,TR)=E(Y|TG)	E(Y|TG,TR,X)=E(Y|TG,X)
	Unconditional	X (dummy)	X (continuous)
p(CvM < CvM*)	0.000	0.444	0.476
p(KS < KS*)	0.000	0.172	0.184
Bootstrap replications	500	500	500

Panel B: Parametric test results

TG	0.292***	0.286***	0.288***
	(0.031)	(0.031)	(0.032)
TR	0.098***	0.061**	0.035
	(0.031)	(0.031)	(0.031)

No. observations	3933	3933	3933

Following these test results, we examine how misperception affects inference about land productivity. As discussed in Section 2.4, in the presence of variety misperception, the traditional regression specifications via or may generate biased parameter estimates. To highlight the inferential challenges of variety misperception, we estimate the reduced form biophysical relationship between improved genetics and yield with and without accounting for farmers’ ex-post input choices (X). Our estimation results are reported in Panels A and B of Table 7. In each panel, the first and second columns report unconditional OLS and fixed effect estimates. The third and fourth columns report OLS and fixed effect estimates, respectively, controlling for farmers’ ex-post input choices (X). In the final two columns, we report OLS and fixed effect estimates controlling for X and additional farmer and plot level characteristics. As shown in Panel B of Table 7, the size and sign of the expected bias due to variety misclassification are inferred by comparing the estimated coefficients associated with TG and TR (i.e., γ̂G − γ̂R).
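A small simulation can show why a specification based on TR is biased towards zero and what γ̂G − γ̂R picks up. The assumptions are ours rather than the paper's (Section 2.4 is not reproduced here): a single binary regressor, no controls, and non-differential misclassification, meaning the report depends on TG but not on yield. With one binary regressor the OLS slope is a difference in means, and the report-based slope converges to γ times λ = P(TG=1 | TR=1) − P(TG=1 | TR=0). All parameter values are hypothetical.

```python
import random


def diff_in_means(y, d):
    """OLS slope of y on one binary regressor d: mean(y | d=1) - mean(y | d=0)."""
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


rng = random.Random(42)
n, gamma = 20000, 0.4  # hypothetical sample size and true effect of improved genetics
tg = [int(rng.random() < 0.5) for _ in range(n)]  # true status (genetic test)
# Non-differential misclassification: the report depends on TG only, not on yield
tr = [g if rng.random() < 0.75 else 1 - g for g in tg]
y = [1.0 + gamma * g + rng.gauss(0.0, 1.0) for g in tg]

gamma_g = diff_in_means(y, tg)  # close to gamma
gamma_r = diff_in_means(y, tr)  # attenuated towards zero
# Attenuation factor lambda = P(TG=1 | TR=1) - P(TG=1 | TR=0)
lam = diff_in_means(tg, tr)
print(f"gamma_G={gamma_g:.3f} gamma_R={gamma_r:.3f} "
      f"gap={gamma_g - gamma_r:.3f} lambda={lam:.3f}")
```

With 75% correct reports and a 50% adoption rate, λ is about 0.5, so the report-based estimate is roughly half the genetics-based one. Misperception that shifts inputs, and hence yield, would break the non-differential assumption used here.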

Table 7

Misperceptions and cassava yield.

Panel A: Using farmer reports (TR)
	OLS	FE	OLS	FE	OLS	FE
γ̂R	0.165***	0.287***	0.096**	0.207***	0.096***	0.204***
	(0.031)	(0.069)	(0.029)	(0.065)	(0.031)	(0.065)

Panel B: Using the genetic test results (TG)

γ̂G	0.319***	0.399***	0.294***	0.389***	0.294***	0.388***
	(0.031)	(0.049)	(0.03)	(0.048)	(0.03)	(0.048)

γ̂G − γ̂R	0.154***	0.112**	0.197***	0.182***	0.198***	0.184***
	(0.36)	(0.046)	(0.036)	(0.051)	(0.048)	(0.057)

Inputs (Kg/ha)	No	No	Yes	Yes	Yes	Yes
Other controls	No	No	No	No	Yes	Yes
No. observations	3933	3933	3933	3933	3933	3933

Note. Standard errors clustered at enumeration area level are reported in parentheses. For input use at the intensive margin, we employ inverse hyperbolic sine (IHS) transformation to keep observations with zero input use. All our estimations control for those household and plot characteristics listed in Table 1. * , ** , *** .

Note to Table 6 (Non-parametric and parametric test results for yield). For parametric tests, standard errors clustered at enumeration area level are reported in parentheses. * , ** , *** . For input use at the intensive margin, we employ inverse hyperbolic sine (IHS) transformation to keep observations with zero input use. The dependent variable is log-transformed cassava yield.

Consistent with our analytical predictions, the naïve regression specifications based on TR generate parameter estimates that are biased towards zero compared to estimates based on TG. The parameter equality test reported in Panel B of Table 7 (i.e., γ̂G − γ̂R) further confirms our prediction, as the difference between the two estimates is statistically significant.31

Next, we consider whether the input crowding-in and crowding-out induced by variety misperception has any bearing on productivity. A central question is how the overall returns to improved genetics differ with and without variety misperception. In Table 8, we report estimates from regression specifications that quantify the correlation between the different forms of cassava variety perceptions and land productivity (i.e., cassava yield). Based on the estimates reported in Table 5, farmers’ input allocations on improved varieties and on unimproved varieties that are incorrectly perceived as improved are statistically indistinguishable. Similarly, farmers’ input allocations on unimproved varieties and on improved varieties that are incorrectly perceived as unimproved are statistically indistinguishable.
Estimates reported in the first four columns of Table 8 suggest that while input allocations are more responsive to farmer perceptions (right or wrong), yield responses to these input adjustments depend on the true genetic quality of the cassava varieties grown by farmers. For instance, the returns associated with improved varieties that are incorrectly perceived as unimproved are lower than those of varieties that are correctly perceived as improved (i.e., ) but higher than the returns associated with varieties that are correctly perceived as unimproved (i.e., ). Moreover, the returns associated with improved varieties that are incorrectly perceived as unimproved are still higher than those of unimproved varieties that are incorrectly perceived as improved (i.e., ). In fact, conditional on observed input choices, farmer perceptions become irrelevant, as the conditional mean of yield varies only with the genetic quality of the cassava varieties grown by farmers (i.e., whether they are improved or not). In particular, after conditioning on X, we consistently fail to reject the null of and .
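The perception groups in Table 8 follow directly from crossing the genetic test result TG with the farmer report TR. As a minimal sketch with hypothetical records and labels, the code below assigns each plot to a group and computes mean log yield by group. In a regression with only the three group dummies and a constant, each OLS dummy coefficient equals the group mean minus the base-group mean; the controls, inputs and fixed effects behind the reported columns are not reproduced here.

```python
from collections import defaultdict

# (TG, TR) -> group labels used in Table 8; TG is the genetic test result,
# TR the farmer report (1 = improved, 0 = unimproved).
GROUPS = {
    (1, 1): "V0 correct improved",
    (0, 1): "V1 false positive",
    (1, 0): "V2 false negative",
    (0, 0): "base: correct local",
}


def mean_log_yield_by_group(records):
    """Mean log yield per perception group; records are (tg, tr, log_yield)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for tg, tr, log_yield in records:
        group = GROUPS[(tg, tr)]
        sums[group] += log_yield
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical plot records (tg, tr, log yield)
records = [(1, 1, 9.5), (1, 1, 9.7), (0, 1, 9.1), (1, 0, 9.4), (0, 0, 9.0)]
means = mean_log_yield_by_group(records)
base = means["base: correct local"]
# With only the three group dummies and a constant, each OLS dummy
# coefficient equals the group mean minus the base-group mean.
contrasts = {g: m - base for g, m in means.items() if not g.startswith("base")}
for group in sorted(contrasts):
    print(f"{group}: {contrasts[group]:+.2f}")
```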

Table 8

Misperceptions and input productivity.

Dependent variable: Log cassava yield
	OLS	FE	OLS	FE	OLS	FE
Correct improved (V0)	0.375***	0.570***	0.301***	0.492***	0.169***	0.332***
	(0.041)	(0.07)	(0.039)	(0.067)	(0.049)	(0.085)
False positives (V1)	0.028	0.189**	−0.018	0.121	−0.089	0.012
	(0.054)	(0.083)	(0.057)	(0.082)	(0.076)	(0.110)
False negatives (V2)	0.247***	0.348***	0.254***	0.363***	0.227***	0.394***
	(0.044)	(0.066)	(0.042)	(0.062)	(0.050)	(0.073)
Fertilizer (Kg/ha)			0.07***	0.048***	0.032**	0.031
			(0.006)	(0.012)	(0.014)	(0.021)
Herbicide (lit/ha)			0.03***	0.064***	−0.006	0.059**
			(0.009)	(0.015)	(0.017)	(0.028)
Correct improved × Fertilizer					0.056***	0.044*
					(0.017)	(0.026)
False positives × Fertilizer					0.038*	−0.001
					(0.022)	(0.033)
False negatives × Fertilizer					0.013	0.007
					(0.019)	(0.024)
Correct improved × Herbicide					0.064***	0.043
					(0.022)	(0.036)
False positives × Herbicide					0.039	0.064
					(0.035)	(0.047)
False negatives × Herbicide					0.015	−0.040
					(0.0245)	(0.037)

E(Y|V0)=E(Y|V1) (p-value)	0.000	0.000	0.000	0.000	0.000	0.001
E(Y|V0)=E(Y|V2) (p-value)	0.000	0.006	0.137	0.083	0.216	0.475
E(Y|V1)=E(Y|V2) (p-value)	0.000	0.095	0.000	0.011	0.000	0.000

Inputs (X)	No	No	Yes	Yes	Yes	Yes
Other controls	No	No	Yes	Yes	Yes	Yes
No. observations	3933	3933	3933	3933	3933	3933

Note. Standard errors clustered at enumeration area level are reported in parentheses. * , ** , *** . The dependent variable is log-transformed cassava yield. Base category: Correctly identified local varieties . For input use at the intensive margin, we employ inverse hyperbolic sine (IHS) transformation to keep observations with zero input use. The additional controls include household and plot characteristics listed in Table 1.

To understand the mechanisms through which variety misperception may affect farmers’ actual production efficiency, we next explore heterogeneity in yield responses to inputs by interacting farmers’ observed fertilizer and herbicide allocations with the different forms of variety perceptions. We do so because the different forms of variety perceptions affect the allocation of fertilizer and herbicide, which are complementary inputs to improved seed. In particular, in the presence of variety misperception, input allocation is expected to be based on TR, which differs from an allocation based on TG. Hence, farmers holding false positive or false negative variety perceptions base their input allocation decisions on the wrong variety type, which is likely to be inefficient. Our estimates reported in the last two columns of Table 8 show that the association between input use and yield is weaker in the presence of variety misperception, in the sense that the yield response to fertilizer and herbicide inputs is higher when there is no misperception. For instance, OLS estimates reported in the fifth column of Table 8 show that fertilizer is yield-enhancing even when farmers hold a false positive variety perception, but not at the rate that would have been achieved with a correct variety perception (i.e., while the interaction term between correct improved (V0) and fertilizer is 0.056, the interaction term between false positives (V1) and fertilizer is only 0.038). In addition, while the interaction term between V0 and fertilizer is statistically significant in both the OLS and FE specifications, the interaction term between V1 and fertilizer becomes statistically insignificant in the FE specification.

The results reported in Table 5, Table 7 and Table 8 also (indirectly) highlight the opportunity cost of misperceptions. In particular, our results suggest that input allocations follow farmer perceptions (right or wrong), whereas yield responses to these input adjustments depend on the true genetic quality of the cassava varieties grown by farmers. Intuitively, this implies that variety misperception may introduce inefficiencies in the production process by distorting the allocation of complementary inputs to improved seed relative to their allocation under correct variety perception (i.e., irrespective of its optimality). For instance, farmers holding a false positive variety perception could have reduced their input use without lowering their yield by a significant margin had they acted upon perfect information about the genetic quality of the cassava varieties they planted (i.e., by crowding-out fertilizer and herbicide inputs). On the other hand, farmers holding a false negative variety perception could have increased their cassava yield had they acted upon perfect information about the genetic quality of the cassava varieties they planted (i.e., by crowding-in more fertilizer). As such, rectifying misperception may induce efficiency-enhancing reallocation of fertilizer and herbicide inputs by farmers.32 When quality is difficult to observe, economic theory suggests that interventions such as regulatory systems that impose quality standards could play a vital role in improving overall market efficiency (Maredia et al., 2019, Gilligan and Karachiwalla, 2019).
For instance, Gilligan and Karachiwalla (2019) show that an input assurance scheme in Uganda influences farmers’ beliefs regarding the quality of inputs (i.e., herbicide and hybrid maize seeds) as well as their adoption patterns. As such, addressing agricultural input and seed market frictions may improve farmers’ investment choices and productivity outcomes.
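The heterogeneity analysis in the last two columns of Table 8 compares yield responses to fertilizer across perception groups. A minimal sketch, assuming a single input, no controls and no fixed effects (so it is not the specification behind Table 8), is to estimate a separate slope of log yield on IHS-transformed fertilizer for each group. With a base group as reference, the fertilizer main effect corresponds to the base group's slope and each interaction coefficient to the difference between a group's slope and that base slope. The helper names and the noise-free data are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_group(rows):
    """Group-specific slope of log yield on IHS-transformed fertilizer.

    rows: (group, ihs_fertilizer, log_yield). Fitting each group separately
    gives the same slopes as one regression with group dummies and
    group x fertilizer interactions, provided there are no other regressors.
    """
    data = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        data[group][0].append(x)
        data[group][1].append(y)
    return {g: ols_slope(xs, ys) for g, (xs, ys) in data.items()}


# Made-up, noise-free data: each group has its own intercept and slope
fert = [0.0, 1.0, 2.0, 3.0, 4.0]
rows = ([("base", f, 9.0 + 0.03 * f) for f in fert]
        + [("V0", f, 9.2 + 0.09 * f) for f in fert]
        + [("V1", f, 9.0 + 0.07 * f) for f in fert])
slopes = slopes_by_group(rows)
# Interaction coefficients = group slope minus base-group slope
interactions = {g: s - slopes["base"] for g, s in slopes.items() if g != "base"}
print({g: round(v, 3) for g, v in interactions.items()})
```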

Conclusions

This paper illustrates, both analytically and empirically, the inferential and behavioral implications of misperceiving and misreporting crop varieties, using unique data from Nigeria that combine self-reported and genetic test based identification of cassava varieties. By examining the conditional means of input use functions non-parametrically, we identify and decompose the origins of variety misclassification into misreporting and misperception. In particular, we show that crop variety misclassification in our data is mostly driven by misperception. A key implication of this result is that misperception reflects the underlying decision making process of farmers and hence should not be regarded as misreporting or survey error per se, as it provides behaviorally relevant information about farmers’ input use and intensification choices. Our findings have important implications both for improving statistical inference and for understanding the potential costs of input market frictions. In terms of inference, we characterize the expected bias under misperception and misreporting and show the implications of treating misperception as misreporting and vice versa. We also show that crop variety misperception induces crowding-in(out) of fertilizer and herbicide inputs. However, our estimates of the yield responses to these inputs suggest that misperception-induced input allocation may not necessarily be yield-enhancing. As such, our results underscore the potential costs of agricultural input and seed market imperfections, and addressing these frictions can play an important role in improving farmers’ investment choices and productivity outcomes. Several extensions for future research follow from our results.
First, understanding the existence and persistence of misperception, and the role of potential remedies such as access to quality information, verification and regulatory mechanisms, will be important for addressing agricultural input market frictions. For instance, Gilligan and Karachiwalla (2019) show that an input assurance scheme in Uganda changes farmers’ beliefs regarding the quality of inputs as well as their adoption patterns. Second, a more accurate diagnosis of farmers’ input use and production choices will require improved measurement of crop variety traits and of input and production data. This implies increased investments in data quality by national governments and their development partners. In the short term, such investments may focus on identifying and mainstreaming methodological innovations and traceability systems (e.g., DNA fingerprinting for monitoring varietal adoption) into large scale national surveys.33 In terms of data collection efforts and strategies, we argue that the choice of the most appropriate method for collecting variety data can be studied as an optimal survey design problem. Third, our results are only as credible as the key identification assumptions we use to derive our estimates. In particular, our main non-parametric tests for identifying the presence of misclassification are applied in the context of an exogenous regressor subject to non-differential misclassification. Hence, our results might not hold if the misclassification process in the two proxies is differential. Therefore, it would be useful to explore approaches for identifying and decomposing misclassification into misreporting and misperception in the presence of differential (i.e., endogenous) misclassification. Our estimates may also be sensitive to possible nonclassical measurement error in input and production data (Abay et al., 2019).
In particular, since we relied on self-reported input and production data, our estimates can be biased in the presence of correlated measurement error in crop variety, input and production data. As such, combining innovative data collection approaches in research designs that allow for causal inference can help confirm our findings and provide additional insights on the inferential and behavioral implication of misperception.

CRediT authorship contribution statement

Tesfamicheal Wossen: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. Kibrom A. Abay: Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing. Tahirou Abdoulaye: Conceptualization, Funding acquisition, Project administration, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
