Rory Wilson, Anne-Laure Boulesteix, Stefan Buchka, Alexander Hapfelmeier, Paul P Gardner.
Abstract
Most research articles presenting new data analysis methods claim that "the new method performs better than existing methods," but the veracity of such statements is questionable. Our manuscript discusses and illustrates consequences of the optimistic bias occurring during the evaluation of novel data analysis methods, that is, all biases resulting from, for example, selection of datasets or competing methods, better ability to fix bugs in a preferred method, and selective reporting of method variants. We quantitatively investigate this bias using an example from epigenetic analysis: normalization methods for data generated by the Illumina HumanMethylation450K BeadChip microarray.
Keywords: Benchmarking; Illumina HumanMethylation450K BeadChip; Neutral comparison study; Normalization; Optimistic bias
Year: 2021 PMID: 33975646 PMCID: PMC8111726 DOI: 10.1186/s13059-021-02365-4
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1We examine the point in time where a method “A” exists in the literature, and a new method “B” is introduced. In the paper presenting method B, the authors compare it to method A: a “non-neutral comparison” (dark red color, see also Fig. 2), as method B’s authors may have some bias in presenting the results. Some time later, a paper introducing another method, “C”, is published. The authors compare C to existing methods A and B, which implies also a comparison of B to A, although this comparison is not the study’s focus. This latter study is assumed to be neutral (dark blue color, see also Fig. 2) with respect to A and B and is termed a “neutral benchmark study”
Fig. 2Comparisons between 450K data normalization methods: percentages of comparisons identifying the newer method as better than the older. The x-axis shows which two methods are being compared (see Additional files 1 and 2 for the references of the methods, abbreviations, and any aliases used in the paper). “Introducing” (i.e., “non-neutral”) comparisons (red dots) are those from publications introducing the new method. “Neutral” comparisons (blue dots) are from subsequent studies written by authors who developed neither of the methods being compared. Two analyses are presented. In the first analysis (light dots), the percentage is calculated over papers, i.e., where the overall rank from a paper is taken as the comparison. The percentage is thus either 0% or 100% for introducing comparisons, as for each pair there is only one introducing study (that in which the newer method is introduced). Papers were excluded from this analysis if individual components of the evaluation used only a subset of the methods examined in the paper (“partial substudies”). In the second analysis (dark dots), the percentage is calculated over substudies, i.e., individual comparisons within a paper. The “number of comparisons” refers to the number of papers (for the first analysis, which takes papers as units) or to the number of substudies (for the second analysis, which takes substudies as units)
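The two analyses in Fig. 2 differ only in the unit of aggregation: the percentage of comparisons favoring the newer method is computed either over papers or over individual substudies. A minimal sketch of that distinction, using entirely hypothetical data and assuming (for illustration only; the study itself derives a paper's verdict from its overall rank) that a paper's overall verdict is the majority result of its substudies:

```python
from collections import defaultdict

# Hypothetical records for one method pair: each tuple is
# (paper_id, newer_method_won) for a single substudy comparison.
substudies = [
    ("paper1", True), ("paper1", True), ("paper1", False),
    ("paper2", False), ("paper2", False),
    ("paper3", True),
]

# Second analysis: percentage over substudies (each comparison is a unit).
pct_substudies = 100 * sum(win for _, win in substudies) / len(substudies)

# First analysis: percentage over papers. Here a paper's overall verdict is
# taken as the majority of its substudies (an assumption; the study uses the
# paper's overall rank), with ties counted in favor of the newer method.
by_paper = defaultdict(list)
for paper, win in substudies:
    by_paper[paper].append(win)
paper_verdicts = [2 * sum(wins) >= len(wins) for wins in by_paper.values()]
pct_papers = 100 * sum(paper_verdicts) / len(paper_verdicts)

print(f"newer method better: {pct_substudies:.0f}% of substudies, "
      f"{pct_papers:.0f}% of papers")
```

With this toy data the two units of analysis give different answers (50% of substudies vs. 67% of papers), which is precisely why the figure reports both.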