
Measurement accuracy and uncertainty in plant biomechanics.

Nathanael Nelson¹, Christopher J Stubbs², Ryan Larson¹, Douglas D Cook¹.

Abstract

All scientific measurements are affected to some degree by both systematic and random errors. The quantification of these errors supports correct interpretation of data, thus supporting scientific progress. Absence of information regarding reliability and accuracy can slow scientific progress, and can lead to a reproducibility crisis. Here we consider both measurement theory and plant biomechanics literature. Drawing from measurement theory literature, we review techniques for assessing both the accuracy and uncertainty of a measurement process. In our survey of plant biomechanics literature, we found that direct assessment of measurement accuracy and uncertainty is not yet common. The advantages and disadvantages of efforts to quantify measurement accuracy and uncertainty are discussed. We conclude with recommended best practices for improving the scientific rigor in plant biomechanics through attention to the issues of measurement accuracy and uncertainty.


Keywords: Best practices; error; measurement; repeatability; uncertainty; validation

Year: 2019 PMID: 31301144 PMCID: PMC6650135 DOI: 10.1093/jxb/erz279

Source DB: PubMed Journal: J Exp Bot ISSN: 0022-0957 Impact factor: 6.992

Introduction

Although it seems paradoxical, science is based upon both skepticism and trust. As scientists, we are keenly skeptical, but can be convinced by data that are deemed to be trustworthy. In 2016, a Nature survey revealed that 90% of scientists felt that there existed either a slight or significant reproducibility crisis (Baker, 2015); that is, scientists cannot recreate another’s experiment or results. Both engineering and biology were identified as troubled areas, with only 24% of respondents in physics and engineering reporting that they had taken steps to improve reproducibility in their lab. The scientists involved in this survey were also asked to provide suggestions for increasing scientific reproducibility. The most common suggestion was to develop more robust experimental designs. The Nature study was in part a reaction to the reproducibility crisis in the fields of psychology and biomedicine (Ioannidis, 2005; Collins and Tabak, 2014; Baker, 2015; Bustin and Nolan, 2016). As a relatively small and developing field, plant biomechanics has a unique opportunity to act more quickly than older, more developed fields. A concerted effort to increase scientific rigor will have many long-term benefits (Huth, 1987), allowing our field to progress rapidly. In the broadest terms, reproducible research can be conceptualized as arising from two factors: reliable methods and a thorough documentation of these methods (Goodman ). The current review focuses on methods for increasing the reliability of mechanical measurements and also touches on the documentation of measurement methods. It does not delve into the ways in which data can be misused, as this has been reported in a number of other previous studies, both in the popular press (Randall, 2018) and in the scientific literature (Kerr, 1998; Head ). Although this review focuses on mechanical measurements, the principles and techniques described are readily applicable to other types of measurements. 
In many fields, measurement standards and well-defined best practices help ensure that measurement results are accurate and reliable (Boone ; Atkinson and Nevill, 1998; Van Gheluwe ; Weir, 2005; Bartlett and Frost, 2008). Indeed, the purpose of measurement standards and best practices is to establish trust between the generators of information and the users of information (Taylor and Kuyatt, 1994). In fields such as physics, chemistry, and engineering, specimens are typically fabricated according to well-defined specifications and standards. Several standards organizations exist to curate these standards [e.g. ASTM International, the National Institute of Standards and Technology (NIST), the International Organization for Standardization (ISO), and the International Committee for Weights and Measures (CIPM)]. Obtaining accurate, reliable data from mechanical measurements of plant specimens is fraught with challenges (Huth, 1987). The irregular geometry of plant specimens cannot be fabricated in a controlled fashion (Sacks and Sun, 2003; Lim ). Also, even genetically identical individuals that are maintained in seemingly identical environments will eventually exhibit unique differences due to the accumulated influence of minor environmental differences (Johnson ; Allard and Bradshaw, 1964). These are just two of many challenges in this area. Here we review several methods for quantifying the accuracy and reliability of scientific measurements. The purposes of this article are (i) to provide a brief review of methods for quantifying and reporting measurement error and uncertainty in the field of plant biomechanics; and (ii) to survey the field of plant biomechanics research in an attempt to describe the current state of our measurement and reporting practices. Although much of the article focuses on mechanical tissue measurements, the principles described herein are applicable to many other types of measurements in plant biomechanics. 
The review consists of four major sections: measurement theory; case studies; an examination of current practices in the plant biomechanics literature; and, finally, conclusions and recommendations.

Measurement theory

Error and uncertainty

Researchers collect data in order to test a hypothesis. However, all data are subject to both measurement error and uncertainty (Beckwith ,). Error is the difference between a measured value and the specimen’s true value (Croarkin ). Because all measurement methods are imperfect, the true value can never be perfectly measured (Beckwith ). Error is, therefore, an abstract idea—an extremely useful concept, but inherently unknowable. Although we cannot measure error directly, we can make informed estimates regarding error. The field of measurement theory was developed to provide methods for dealing with the uncertainty surrounding measurement error. Errors arise from both random and systematic effects (Beckwith ). Random effects are often due to a multitude of small factors that cannot be experimentally controlled and are difficult or impossible to identify and eliminate. Systematic effects are typically due to some kind of measurement process error such as a poorly calibrated measurement device or a sample preparation procedure that induces a systematic error. The term ‘bias error’ is sometimes used to refer to systematic effects because most systematic effects introduce some amount of bias into the data (Sanderson ). However, in certain circumstances, systematic errors can be introduced that may have no detectable bias; instead they may tend to increase the variance of collected data (Bland and Altman, 1996). Briefly stated, systematic effects can be attributed to some cause while random effects cannot.

Quantifying random effects: measurement repeatability

Measurement repeatability is one way of quantifying measurement uncertainty. As will be shown in the ‘Case studies’ section, it can also provide valuable insights into the measurement process itself. Measurement repeatability has been defined as the degree of agreement between the results of successive measurements carried out under identical conditions (Taylor and Kuyatt, 1994). This concept is distinct from reproducibility: the ability to reproduce another scientist’s method or results (Taylor and Kuyatt, 1994). Repeatability provides an estimate of the amount of uncertainty that should be attributed to the testing process itself. In industrial practice, an assessment of measurement repeatability is often carried out on a number of specimens, with the assumption that the specimens are, for all intents and purposes, identical to each other. This approach is referred to as between-specimen repeatability. However, where variation between individual specimens is significant (as in plant biomechanics), within-specimen repeatability is more relevant. Within-specimen repeatability is the degree of agreement between repeated measurements that are performed with the same specimen (Atkinson, 1995; Gobbe ; D’Onofrio ; Al-Zube ). However, within-specimen repeatability cannot be assessed if the test causes damage to the specimen. In these cases, inter-specimen repeatability must be used, with the specimens chosen to be as similar to one another as possible. To obtain a single repeatability value, one specimen is tested multiple times in relatively quick succession. When this is done, we assume that nothing about the process or specimen changes between testing cycles. For most mechanical tests, this requires the application of relatively low strain values so as to prevent tissue damage. The standard deviation of the resulting data is used to obtain a single estimate of repeatability for the test method. 
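As a minimal sketch of the procedure described above, the following computes a within-specimen repeatability value for each specimen as the standard deviation of repeated tests expressed as a percentage of the mean, and then averages those values across specimens. The specimen names and modulus values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean, stdev


def repeatability_percent(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical data: five repeated low-strain tests on each of three
# specimens (modulus values in GPa). These numbers are illustrative only.
repeated_tests = {
    "specimen_A": [10.0, 10.2, 9.8, 10.1, 9.9],
    "specimen_B": [12.1, 11.9, 12.0, 12.3, 11.7],
    "specimen_C": [8.0, 8.4, 7.6, 8.2, 7.8],
}

# One repeatability value per specimen
per_specimen = {
    name: repeatability_percent(values)
    for name, values in repeated_tests.items()
}

# Aggregate repeatability, and how much it varies between specimens
repeatability_values = list(per_specimen.values())
aggregate = mean(repeatability_values)
spread = stdev(repeatability_values)

for name, value in per_specimen.items():
    print(f"{name}: {value:.2f}%")
print(f"aggregate repeatability: {aggregate:.2f}% (SD between specimens: {spread:.2f})")
```

Reporting the spread alongside the aggregate value shows whether repeatability is consistent between specimens or dominated by a few poorly behaved ones.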
Repeatability is often reported as a percentage of the mean, also known as the coefficient of variation (the standard deviation divided by the mean). Because the repeatability can vary from specimen to specimen, this process can be repeated a number of times to assess the variation in repeatability values between samples. The set of repeatability values may then be averaged to obtain an aggregate repeatability value. Mechanical measurements of plant tissues typically require the combination of multiple measurements using some kind of model. For example, the measurement of Young’s modulus using a compression test requires the measurement of the cross-sectional area, the applied force, and the resulting deformation of the specimen (Young and Budynas, 2002; Gibson, 2011). Repeatability values can be obtained for each step of the process independently, and combined using the law of propagation of uncertainty (Taylor and Kuyatt, 1994). This approach is advantageous because it allows the scientist to assess the repeatability of each step of the process independently. A disadvantage of this approach is that the law of propagation of uncertainty tends to exaggerate the predicted overall uncertainty (Hall, 2004). An alternative approach is to follow the measurement process from start to finish a number of times, each of which involves the collection of each individual measurement. All measurements for a single specimen are then combined in the normal fashion. The overall repeatability is then obtained by examining the repeatability of the final product of the measurement process. The repeatability of individual steps in the process can also be assessed from this data set.
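For the step-by-step approach, the combination can be written out explicitly. The sketch below assumes a uniaxial compression model in which Young's modulus is computed from the applied force F, the cross-sectional area A, the deformation δ, and a gauge length L; the gauge length and the assumption that the inputs are uncorrelated are ours, introduced only for illustration. Under those assumptions, the first-order law of propagation of uncertainty reduces to a sum of squared relative uncertainties:

```latex
% Assumed model (illustrative): E = stress / strain
\[
E = \frac{F/A}{\delta/L} = \frac{F\,L}{A\,\delta}
\]

% First-order propagation for uncorrelated inputs, where u_x is the
% standard uncertainty (e.g. the repeatability) of quantity x
\[
\left(\frac{u_E}{E}\right)^{2}
  = \left(\frac{u_F}{F}\right)^{2}
  + \left(\frac{u_A}{A}\right)^{2}
  + \left(\frac{u_\delta}{\delta}\right)^{2}
  + \left(\frac{u_L}{L}\right)^{2}
\]
```

Written this way, the term with the largest relative uncertainty is immediately visible as the dominant contributor to the overall value.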

Interpreting repeatability values

Repeatability values provide information about the reliability and consistency of a measurement process. Because repeatability is highly context specific, we have chosen not to list (arbitrary) limits or ranges for ‘acceptable’ and ‘unacceptable’ values. However, several in-context repeatability values are provided in the ‘Case studies’ section later in this review. Under ideal conditions, individual repeatability values will be relatively small and, as a group, repeatability values will be consistent between specimens (i.e. the standard deviation of repeatability values will be small). High repeatability values or a high variation between samples are indicators that the measurement process is adversely affected by unknown factors. For example, sample preparation processes often require subjective judgement and manual preparation. This can result in a wide variation of measured or calculated values between samples. An assessment of each step of the measurement process as described in the previous section is an extremely beneficial means of identifying the largest contributors to measurement uncertainty. As will be shown in the ‘Case studies’ section, an examination of these factors often leads to new insights for improving the measurement process. However, repeatability values alone provide no information about the accuracy of the measurement process. In fact, it is entirely possible for a measurement process to have excellent repeatability while also being highly erroneous.

Accuracy and validation

Accuracy is the degree of agreement between a measured quantity and the actual (i.e. exact, but unknown) value (BIPM ). Validation is the process of providing objective data to confirm that accuracy is appropriate for the purposes of a given study. One way to estimate measurement accuracy is through independent replication of the experiment. This is typically not practical or cost-effective. Another approach is to use a validation technique referred to as triangulation.

Quantifying accuracy: validation studies

Validation by triangulation is performed by measuring the same quantity using two or more independent measurement methods (Lawlor ). The same set of specimens is used, and the average discrepancies between methods (if any) are used as an estimate of the overall error. Direct, quantitative validation can be achieved by showing that the data from two or more methods are statistically equivalent. The approach described above relies upon an assumption that different measurement methods are unlikely to be susceptible to the same sources and amounts of bias error. The statistical power is greater for paired testing, but this is not always possible. If the same set of specimens cannot be used for both tests, a single larger set of specimens can be randomly sorted into two groups. A less rigorous form of validation is obtained when data from two or more different tests are statistically different, but the absolute numeric difference between tests is judged to be of no practical significance. For example, one type of test may be suspected of producing a slight bias in a predictable direction. Validation efforts are informative, regardless of the outcome. If validation is achieved, we have higher confidence in the measured values as well as an estimate of the error. On the other hand, if validation is not achieved, we are often led to insights that would not otherwise have been possible. Consider a situation in which two tests will be used to (hopefully) obtain validation. The researcher begins the study with an expectation that these two tests will produce results that are in agreement with each other. However, a discrepancy between the results of the two tests, along with further experimentation, may lead the researcher to the conclusion that the hierarchical nature of plant tissues causes the tissue to respond differently to these two different tests (Bidhendi and Geitmann, 2019). 
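A paired comparison of this kind can be sketched as follows. The function name, the data, and the choice of a paired t statistic are assumptions made for illustration; the text above does not prescribe a particular statistical test. The sketch reports the mean discrepancy between two methods applied to the same specimens, which serves as the error estimate, together with a t statistic that can be compared against a tabulated critical value with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_discrepancy(method_a, method_b):
    """Summarize specimen-by-specimen differences between two methods.

    Assumes the differences are not all identical (nonzero standard
    deviation); otherwise the t statistic is undefined.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired testing needs one value per specimen per method")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    return {
        "n": n,
        "mean_difference": mean_diff,
        # Discrepancy relative to the mean of the second method
        "percent_difference": 100.0 * mean_diff / mean(method_b),
        "t_statistic": mean_diff / (sd_diff / sqrt(n)),
    }


# Hypothetical modulus values (GPa) for the same five specimens
bending = [10.1, 11.3, 9.6, 12.0, 10.8]
compression = [10.4, 11.0, 9.9, 12.3, 10.6]

result = paired_discrepancy(bending, compression)
print(result)
```

A small t statistic alone does not demonstrate equivalence; a formal equivalence claim also needs a margin of practical significance chosen in advance, which is left out of this sketch.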
The purpose of validation is to confirm that measurement processes are acceptably accurate, or to obtain an estimate of the accuracy of these methods. In many fields, researchers achieve validation through inter-study triangulation: by comparing new results with those previously reported in the literature. This approach can be problematic when dealing with biological specimens. This is because biological tissues are typically influenced by three major categories of factors: measurement techniques (including specimen preparation), genotype, and environment. Each of these categories has the potential to dramatically influence measurement results. A matching between genotype, method, and environment between studies can allow for inter-study triangulation, but, at this time, it is relatively rare to find another study with matching genotype and environmental data. When the goal is to assess the accuracy of a measurement process, major confounding factors such as genotype and environment must be controlled in some way. The most reliable approach is for each researcher to perform his/her own validation. Intra-study triangulation allows for a direct, quantitative assessment of measurement accuracy. This information is extremely valuable, and the attainment of this data need not be overly onerous. The great advantage of intra-study triangulation is the elimination of genetic and environmental factors. With these factors controlled, only the measurement processes themselves will influence the measured results. Ideally, triangulation is performed before the primary experiment is conducted and need not consist of a full replication of the entire study. Instead, a sample of specimens can be tested using two or more methods to obtain validation, and then the most advantageous measurement method can be used for the majority of data collection.

Case studies

In this section, a number of case studies from the authors’ previous research are reviewed in order to give concrete examples of the concepts introduced above. We also include examples of mistakes that were made, but not included in the associated scientific publications. We hope that these examples will be helpful in both illustrating the use of these techniques and in aiding future researchers to avoid similar mistakes.

Repeatability

The assessment of repeatability was instrumental in revealing measurement limitations in our research group’s recent study on maize tissues (Al-Zube ). The purpose of that study was to quantify the longitudinal Young’s modulus of maize rind tissues using a compression testing approach. Although best practices from ASTM standards were used (spherical platens and local measurement of strain, ASTM-E9, 2009; ASTM-D695, 2015), the initial repeatability of this test method was ±24%. Based upon the authors’ previous experience with this equipment and measurement techniques, this repeatability value suggested an error in the measurement process. We began by re-examining the assumptions behind the test methodology. Two factors were identified as potential contributors. First, our initial testing method was based on the assumption that strain was evenly distributed within the specimen, but this assumption had not been confirmed. Secondly, if the cause was uneven strain distribution, the position of the specimen relative to the axis of rotation of the spherical platens could potentially exacerbate this problem. Two small repeatability experiments were designed to test these hypotheses. The specimen position was found to have little effect on repeatability values, but the assumption of uniform strain was found to be erroneous. As shown in Fig. 1, the distribution of values obtained for a single specimen decreased dramatically as the number of strain measurements was increased (Al-Zube ). By taking multiple measurements of surface strain, average strain was assessed more accurately, leading to better repeatability values and a much improved test method. The final test method had an average repeatability value of ±5%, which represents a nearly 5-fold improvement over the initial repeatability value.

Fig. 1.

The distribution of single-specimen modulus of elasticity (E) values obtained in a series of repeatability tests. Repeatability is represented in this figure by distribution widths. From Al-Zube .

Another example of repeatability testing is available from a follow-up study (also from our research group). In this study, we quantified the repeatability of four test types. We were surprised to learn that the three-point bending method had the lowest (best) average repeatability at 1.5%. The repeatability values for tension and compression tests were found to be 1.9% and 3.8–3.9%, respectively (Al-Zube ). In the authors’ experience, repeatability values of biological tissues >5% may be influenced by a methodological error, but repeatability values <5% are most likely to be due to random error. This threshold is not at all concrete, but rather depends upon the equipment, type of measurement, specimen type, etc. It is worth noting here that one common pitfall when computing repeatability is to use a single input measurement repeatedly in a series of tests employing a single specimen. For example, when performing a compression test, the cross-sectional area of a test specimen is required to obtain the Young’s modulus. It is tempting to collect this measurement once, and then perform five compression tests using the specimen. Indeed, we initially overlooked this issue. When the same cross-sectional area is used to compute the Young’s modulus for each of these tests, the resulting repeatability value reflects only the repeatability of the compression test data, not the overall uncertainty of the entire measurement process. This approach is not necessarily incorrect, but any reported repeatability measurements should clearly specify the scope of the repeatability measurement process.
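The effect of reusing a single area measurement can be illustrated with a small numerical sketch. All values below are hypothetical, and the model (modulus equals the force-per-unit-strain slope divided by the cross-sectional area) is a simplifying assumption. The first repeatability value reuses one area measurement for all five tests; the second re-measures the area before each test.

```python
from statistics import mean, stdev


def repeatability_percent(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical repeated tests on ONE specimen.
# slope: force per unit strain from each compression test (N)
# area:  cross-sectional area re-measured before each test (mm^2)
slopes = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
areas = [10.0, 10.4, 9.7, 10.2, 9.7]

# Narrow scope: area measured once and reused for every test
modulus_single_area = [s / areas[0] for s in slopes]

# Full scope: every input is re-measured for every test
modulus_full_process = [s / a for s, a in zip(slopes, areas)]

cv_single = repeatability_percent(modulus_single_area)
cv_full = repeatability_percent(modulus_full_process)

print(f"compression test only: {cv_single:.2f}%")
print(f"entire measurement process: {cv_full:.2f}%")
```

With these illustrative numbers the full-process value is noticeably larger, because it also captures the variation in the area measurement. Either value can be reported, provided its scope is stated.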

Triangulation

Intra-study triangulation was used in two closely related studies from our research group (Robertson et al., 2014, 2015) to reveal weaknesses in bending test methods. In these studies, specimens were subjected to two types of three-point bending tests: a short-span test which has been used in numerous studies (Jenkins, 1930; Hondroyianni ; Tongdi ; Hu ; Gomez et al., 2017, 2018), and a revised three-point bending test protocol that used a longer test span (Robertson ). Short-span tests were shown to induce premature failure of the stalk, producing bending test results that ranged from 1/2 to 1/4 of the values obtained when using the long-span test protocol. By using two types of test along with engineering analysis, we were able to demonstrate that both the span and the placement of the loading anvil contributed to the erroneous results obtained under the short-span method. Another example of discrepancies between test methods is available in a study that examined differences in the Young’s modulus of wheat, barley, and maize tissues (Wright ). This study was notable in that four different test methods were used: three-point bending, four-point bending, compression, and tension tests. The bending test results were in relatively close agreement, but compression and tension test results were quite different. Figure 2 depicts representative box plots based on the data reported by Wright . On average, the tension test results were 2.2 times higher than the bending results, and 6.3 times higher than the compression test results. Compression results were approximately half the value of bending results. The striking disagreement between test results indicates potential problems with the testing methods.

Fig. 2.

Data from Wright , depicting differing modulus of elasticity values measured for barley and wheat, as assessed using four different methods. Bars represent mean values. As per the data in Wright et al., whiskers represent the standard error of the mean.

A similar study was conducted by Al-Zube . This study also used three-point bending, two types of compression test, and a tension test to assess the Young’s modulus of maize. Although there were discrepancies between the results from these testing methods, the degrees of discrepancy were relatively moderate. For example, the largest discrepancy was between compression and bending results: 12.87 GPa and 10.1 GPa (a 27% discrepancy). The average value for the Young’s modulus of maize rind reported by Al-Zube et al. was 11.4 GPa, but the average value reported by Wright et al. was 0.38 GPa. This represents a discrepancy of >2900% (see Fig. 3). Although this discrepancy is notable, it is difficult to determine how much of the discrepancy should be attributed to testing method, genotype, or environment. One difference between these studies is that Al-Zube et al. provided intra-study validation, whereas the measurements in Wright et al. were inconsistent.
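The two percentages quoted for the maize comparison are consistent with expressing the gap relative to the smaller of the two values. That convention is inferred from the numbers rather than stated in the text, so the following worked arithmetic should be read as a check under that assumption:

```latex
% Intra-study discrepancy (compression vs. bending)
\[
\frac{12.87 - 10.1}{10.1} \times 100\% \approx 27\%
\]

% Inter-study discrepancy (11.4 GPa vs. 0.38 GPa)
\[
\frac{11.4 - 0.38}{0.38} \times 100\% \approx 2900\%
\]
```

The second figure corresponds to a 30-fold ratio between the two reported averages.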

Fig. 3.

Reported modulus of elasticity values for maize rind tissues. Both charts use the same scaling to facilitate comparisons. Left: data from four varieties of maize as reported in Wright . In this study, maize tissues were measured only in compression. The solid bar represents these reported data while the dashed bars represent estimates that were imputed based on the data trends shown in Fig. 2. Right: data from five varieties of maize reported in Al-Zube .

Another instance of intra-study validation involved the measurement of turgor pressure. Tomato cell turgor pressure was measured using two methods: a pressure probe and force-sensing micromanipulation (Wang ). The reported values were found to be 3.3 bar and 3.2 bar, respectively. Because these values were not statistically different, the accuracy of each method could be reported as better than ±5%.

Literature meta-analysis

In preparing this article, the authors carefully examined 40 papers from the plant biomechanics literature that reported the measurement of mechanical tissue properties. These papers are indicated in the reference list by a dagger symbol. As a disclaimer, several of these papers (Robertson et al., 2014, 2015; Al-Zube et al., 2017, 2018) originated from the last author’s research group (The Crop Biomechanics Laboratory). Two of these papers (5% of the sample) provided measurement repeatability values for their measurement process (Al-Zube et al., 2017, 2018). Validation was addressed in some form in 15 of 40 studies (38%) (Mattheck, 1995; Moulia and Fournier, 1997; Henry and Thomas, 2002; Ryden ; Wright ; Green ; Wang ; Onoda ; Masselter ; Milani ; Sharma ; Robertson et al., 2014, 2015; Leblicq ; Al-Zube ). Among these 15 studies, intra-study validation was discussed in four studies (Green, 2006; Wang ; Sharma, 2013; Al-Zube ). Based on this survey of the literature, it seems that a minority of studies from the plant biomechanics literature report on measurement repeatability or measurement validation. Also, the authors’ research is not an exception to this trend: the majority of our papers have not included repeatability or validation data. However, it is certainly not the case that the plant biomechanics community has simply ignored the issue of validation altogether. In reviewing the literature, we found that validation was most often addressed through two informal approaches: inter-study comparisons and justification by previously reported method.

Inter-study comparisons

Inter-study triangulation is an important part of science. The purpose of these comparisons is to place each study in the broader scientific context, providing comparisons between data, trends, results, and conclusions of other studies (Glass, 1976). Inter-study comparisons are regularly performed in the plant biomechanics literature. For example, Kokubo studied brittle barley culms and used existing research on non-brittle strains of barley as well as maize as points of comparison for their results. Leblicq used bending tests on wheat and barley stems, comparing the results with other values found in the literature. Many other examples can be found in the literature (Rüggeberg ; Masselter ; Al-Zube et al., 2017, 2018). Although inter-study triangulation is useful for many purposes, it should not be misinterpreted as measurement validation.

Justification by prior method

Another common practice is the justification of a measurement technique via citation of a prior study that used the same method. This argument is based upon two important assumptions: (i) the accuracy of the cited method has been validated previously; and (ii) the method as described by the original authors has been followed precisely in subsequent studies. As an exercise, we examined a set of 12 studies that used justification by method. For each study, we followed the citation trail back into the literature to determine if any prior study had reported direct, quantitative validation. None of the selected citations could be traced back to a fully validated measurement process. Based on our experience and observations, it seems that published studies may sometimes be misinterpreted as ‘validation by peer review’. Subsequent researchers may therefore place undue trust in these prior studies, which causes the methods to be used repeatedly, but each time without validation. This pattern has been observed by the authors in the case of flawed bending methods (Evans ; Robertson et al., 2014, 2015), omission of test set-up justification (Crook and Ennos, 1996; Abasolo ), and misapplication of measurement standards (Ampofo ). Similar patterns have been observed in other fields of biomechanics (Cook, 2009; Alipour ). Publication, even in a high-quality journal, does not guarantee freedom from measurement error.

Discussion

A cognitive bias towards trusting data?

Data are at the heart of the scientific endeavor. They are the raw material for new scientific insights and for testing scientific hypotheses. However, even data that are carefully collected by experienced researchers can be erroneous. Based on the authors’ experience and review of the literature, the community regularly asks important questions such as ‘Do the methods seem appropriate for the stated purpose?’ ‘Have these methods been used before in the literature?’ ‘Do the results agree with some other data in the literature?’ There exists an opportunity to improve the review process by asking additional questions such as, ‘Is there direct, quantitative evidence that these measurements are accurate?’ and ‘Is there quantitative evidence that the measurement uncertainty has been assessed?’ These questions have the potential to bring a healthy skepticism to the evaluation of data, a skepticism that acknowledges the difficulty of collecting accurate data in the field of plant biomechanics. We also wish to address a counter-argument. Namely, is it necessary to rigorously validate every type of measurement? In our opinion, skepticism should be inversely proportional to the proven reliability of the measurement process. For example, it is probably not necessary to perform intra-study validation for an electronic mass balance purchased from a reliable manufacturer. This is because the mass balance represents a mature technology, and research-grade devices are typically certified to provide measurements with stated values of accuracy and uncertainty. In contrast, some measurement processes performed in the field of plant biomechanics represent emerging methods and technologies that have yet to be assessed in a similar fashion. An argument could also be made that an assessment of measurement accuracy and uncertainty is not necessary when performing comparative studies. 
For example, comparisons between mutant and wild-type varieties are common in the plant biomechanics literature (Ryden ; Paul-Victor and Rowe, 2011; Park and Cosgrove, 2012). Studies of this kind are primarily interested in differences between the two varieties rather than precise physical values. However, if a measurement technique is inaccurate, there is the possibility that the intended quantity and the measured quantity differ in important ways. For example, if an effort to assess the shear modulus of some tissue is in error by an unknown amount, should this measurement be interpreted to represent the behavior of the actual shear modulus? Without a validated measurement of shear modulus, we have no way of knowing if an erroneous shear modulus measurement follows the same trends as the actual shear modulus; there are many layers of uncertainty surrounding an unvalidated measurement. Although these data may be intended for comparative purposes, they may be used by future researchers for non-comparative purposes (e.g. computational modeling). The future misinterpretation of data is problematic even if inaccuracies in measurement do not directly affect the accuracy of intended comparisons.

Why bother to assess repeatability and accuracy?

Some researchers may feel that the effort required to assess repeatability and validation outweighs the benefits. We frankly admit that these activities require additional effort. Here are several reasons why we think that the additional effort is justified. First, an assessment of repeatability can often be accomplished with a small amount of additional experimentation. A reasonable estimate of repeatability can be obtained from 5–10 repeated tests on 5–10 samples. Secondly, repeatability data often reveal weaknesses in a measurement process. In addition, improvements to the repeatability of a measurement process will have a direct effect on the statistical power of the test (i.e. all other factors held constant, lowering random error will increase statistical power). Thirdly, reporting repeatability values is extremely useful for future researchers, both as a metric for judging which method to use in an upcoming experiment, and as a diagnostic tool to confirm that a previously described process has been correctly implemented. The assessment of measurement accuracy requires more investment as compared with the assessment of repeatability. However, there are distinct advantages to doing so. The primary reason to perform validation is to gain a quantitative understanding of the accuracy of one’s own measurement process. After all, if the measurement is erroneous, fallacious scientific conclusions may be reached. Secondly, even when validation is not achieved, the results can provide valuable insights. Thirdly, validation of this type accelerates research progress by contributing to a scientific literature that is based upon quantified accuracy instead of unknown amounts of accuracy in each study. This is important because incorrect results, once published, can be difficult to overturn (Lehrer, 2010). Finally, as validation and uncertainty quantification are practised, they quickly become more natural and less time-consuming. 
Thus, the effort to incorporate these techniques has a cost which diminishes over time, but the advantages to this approach remain constant.
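As an illustration of how little computation a basic repeatability estimate requires, the following is a minimal sketch in Python. It assumes that repeatability is summarized as the pooled within-specimen standard deviation of repeated tests, together with a coefficient of variation relative to the grand mean. That choice of metric, the function name `repeatability_summary`, and the example values are our assumptions for illustration; they are not a procedure prescribed in this review.

```python
from statistics import mean, variance


def repeatability_summary(measurements):
    """Summarize random error from repeated tests.

    measurements: list of lists, one inner list of repeated test
    results per specimen (at least two tests per specimen).
    """
    # Degrees of freedom summed over specimens.
    dof = sum(len(tests) - 1 for tests in measurements)
    # Pooled within-specimen variance: variation between repeated tests
    # of the same specimen, i.e. random error of the measurement process.
    within_var = sum((len(tests) - 1) * variance(tests) for tests in measurements) / dof
    within_sd = within_var ** 0.5
    grand_mean = mean([x for tests in measurements for x in tests])
    return {
        "within_sd": within_sd,
        "cv_percent": 100.0 * within_sd / grand_mean,
    }


# Hypothetical stiffness values: 5 specimens, 5 repeated tests each.
example = [
    [10.1, 9.8, 10.3, 10.0, 9.9],
    [12.4, 12.9, 12.6, 12.5, 12.7],
    [8.7, 8.9, 8.6, 8.8, 8.8],
    [11.2, 11.0, 11.5, 11.1, 11.3],
    [9.5, 9.4, 9.9, 9.6, 9.7],
]
print(repeatability_summary(example))
```

A small within-specimen standard deviation relative to the differences between specimens suggests that the process can distinguish the specimens; a large one points to a weakness in the measurement process that is worth investigating before the main experiment.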

Reporting reproducibility, accuracy, and methods

It is commonly understood that a detailed description of one’s experimental methods is required for subsequent researchers to build on and perhaps replicate these studies (Baker, 2015). However, it has also been shown in numerous recent studies that scientific publications often do not contain the level of detail that would be required for actual replication (Cook, 2009; Alipour ; Baker, 2015). The plant biomechanics literature appears to be consistent with these trends from other fields. Specimen selection and preparation are typically well documented, but equipment information is sometimes incomplete. Based on our reading, finer details of the test procedure (such as pre-conditioning, methods of fixation, applied strain rate, and load cycling) are sometimes not reported in sufficient detail. In addition, the results are sometimes reported only as derived quantities (mean, standard deviation, modulus, etc.) instead of as raw data. In previous years, page limits imposed by journals may have inadvertently deterred researchers from fully describing their measurement methods and results. However, most journals now allow authors to upload supplementary data files and descriptions of methods. This option creates a new opportunity for more complete explanations of measurement techniques, including raw measurement data and the valuable tips and pitfalls that the authors discovered in the course of research. Both authors and reviewers can help improve this aspect of plant biomechanics research by including or requesting this information as part of the peer-review process.

Conclusions

Authors and reviewers in the field of plant biomechanics can increasingly seek to apply principles of measurement theory in their work. In this review, we have discussed three best practices associated with data collection and reporting in plant biomechanics: (i) the assessment of measurement repeatability; (ii) the use of intra-study triangulation to estimate the accuracy of measurement techniques; and (iii) detailed reporting of both measurement methods and the associated repeatability and triangulation data. Journal editors and reviewers can play a pivotal role by beginning to expect authors to use these best practices. Specifically, editors and reviewers can emphasize questions such as ‘Is there direct, quantitative evidence that the reported measurements are accurate?’, ‘Is there quantitative evidence that measurement repeatability has been assessed?’, and/or ‘Are the test methods described in sufficient detail for replication?’ The adoption of these practices has advantages for both individual scientists and the broader community. Specific advantages to individual scientists include the following: (i) identification of flaws or limitations of a measurement process; (ii) increased statistical power; and (iii) insurance against potentially erroneous conclusions that could be made when using unvalidated data. These best practices also have benefits for the broader scientific community, including the following: (i) increased confidence in reported results; (ii) reported data can be used as quantitative benchmarks for future researchers; (iii) encouragement of future use or replication of reported methods; and (iv) accelerated research progress. As the field of plant biomechanics seeks to implement measurement best practices, the accuracy of and confidence in our scientific findings will increase. At the same time, studies that utilize these best practices will be more useful and informative to future researchers. Overall, the application of these methods will serve to accelerate research progress in the field of plant biomechanics.

Accuracy: the degree of agreement between a set of measurements and the actual value (i.e. average error) (BIPM ).

Error: the difference between a single measured value and the specimen’s actual (i.e. exact, but unknown) value (Croarkin ).

Repeatability: the degree of agreement between the results of successive measurements carried out under identical conditions (Taylor and Kuyatt, 1994).

Triangulation: validation obtained by measuring the same quantity using two or more independent measurement methods. Validation is obtained when the results are sufficiently similar (Lawlor ).

Validation: the process of providing objective data to confirm that accuracy is appropriate for the purposes of a given study (BIPM ).
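The triangulation defined above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two methods are applied to the same specimens, agreement is summarized as the mean percent difference between paired results, and "sufficiently similar" is represented by a user-chosen tolerance. The function name `triangulate`, the `tolerance_percent` parameter, and the example values are hypothetical; the appropriate tolerance depends on the purposes of a given study.

```python
from statistics import mean, stdev


def triangulate(method_a, method_b, tolerance_percent):
    """Compare paired results from two independent methods.

    method_a, method_b: measurements of the same quantity on the same
    specimens, listed in the same specimen order.
    tolerance_percent: assumed threshold for "sufficiently similar".
    """
    if len(method_a) != len(method_b):
        raise ValueError("methods must be paired specimen by specimen")
    # Percent difference of each pair, relative to the mean of the pair.
    diffs = [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(method_a, method_b)]
    mean_diff = mean(diffs)
    return {
        "mean_percent_difference": mean_diff,      # systematic disagreement
        "sd_percent_difference": stdev(diffs),     # scatter of the disagreement
        "agrees": abs(mean_diff) <= tolerance_percent,
    }


# Hypothetical modulus values from two independent test methods.
bending = [10.2, 12.5, 8.9, 11.1, 9.6]
compression = [10.0, 12.9, 8.7, 11.4, 9.5]
print(triangulate(bending, compression, tolerance_percent=5.0))
```

Reporting the mean and scatter of the differences, and not only the pass/fail outcome, gives future researchers a quantitative benchmark even when the two methods do not agree.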

42 in total

1. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants.

Authors: M J Sanderson; M F Wojciechowski; J M Hu; T S Khan; S G Brady
Journal: Mol Biol Evol Date: 2000-05 Impact factor: 16.240

2. Reliability and accuracy of biomechanical measurements of the lower extremities.

Authors: Bart Van Gheluwe; Kevin A Kirby; Philip Roosen; Robert D Phillips
Journal: J Am Podiatr Med Assoc Date: 2002-06

Review 3. Multiaxial mechanical behavior of biological materials.

Authors: Michael S Sacks; Wei Sun
Journal: Annu Rev Biomed Eng Date: 2003-04-18 Impact factor: 9.590

Review 4. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM.

Authors: Joseph P Weir
Journal: J Strength Cond Res Date: 2005-02 Impact factor: 3.775

5. Biomechanics of Wheat/Barley Straw and Corn Stover.

Authors: Christopher T Wright; Peter A Pryfogle; Nathan A Stevens; Eric D Steffler; J Richard Hess; Thomas H Ulrich
Journal: Appl Biochem Biotechnol Date: 2005 Impact factor: 2.926

6. HARKing: hypothesizing after the results are known.

Authors: N L Kerr
Journal: Pers Soc Psychol Rev Date: 1998

7. Micromechanics of plant tissues beyond the linear-elastic range.

Authors: Lothar Köhler; Hanns-Christof Spatz
Journal: Planta Date: 2002-02-06 Impact factor: 4.116

8. Tensile properties of Arabidopsis cell walls depend on both a xyloglucan cross-linked microfibrillar network and rhamnogalacturonan II-borate complexes.

Authors: Peter Ryden; Keiko Sugimoto-Shirasu; Andrew Charles Smith; Kim Findlay; Wolf-Dieter Reiter; Maureen Caroline McCann
Journal: Plant Physiol Date: 2003-05-22 Impact factor: 8.340

9. Why most published research findings are false.

Authors: John P A Ioannidis
Journal: PLoS Med Date: 2005-08-30 Impact factor: 11.613

10. Measurement repeatability of corneal aberrations.

Authors: Marine Gobbe; Michel Guillon; Cecile Maissa
Journal: J Refract Surg Date: 2002 Sep-Oct Impact factor: 3.573
