Prevalence and Prevention of Reproducibility Deficiencies in Life Sciences Research: Large-Scale Meta-Analyses.

Nadine M Mansour¹,², E Andrew Balas¹, Frances M Yang³, Marlo M Vernon⁴.

Abstract

BACKGROUND Studies have found that many published life sciences research results are irreproducible. Our goal was to provide comprehensive risk estimates of familiar reproducibility deficiencies to support quality improvement in research. MATERIAL AND METHODS Reports included were peer-reviewed, published between 1980 and 2016, and presented frequency data of basic biomedical research deficiencies. Manual and electronic literature searches were performed in seven bibliographic databases. For deficiency concepts with at least four frequency studies and with a sample size of at least 15 units in each, a meta-analysis was performed. RESULTS Overall, 68 publications met our inclusion criteria. The study identified several major groups of research quality defects: study design, cell lines, statistical analysis, and reporting. In the study design group of 3 deficiencies, missing power calculation was the most frequent (82.3% [95% Confidence Interval (CI): 69.9-94.6]). Among the 6 cell line deficiencies, mixed contamination was the most frequent (22.4% [95% CI: 10.4-34.3]). Among the 3 statistical analysis deficiencies, the use of chi-square test when expected cells frequency was <5 was the most prevalent (15.7% [95% CI: -3.2-34.7]). In the reporting group of 12 deficiencies, failure to state the number of tails was the most frequent (65% [95% CI: 39.3-90.8]). CONCLUSIONS The results of this study could serve as a general reference when consistently measurable sources of deficiencies need to be identified in research quality improvement.

Year: 2020 PMID: 32960878 PMCID: PMC7519945 DOI: 10.12659/MSM.922016

Source DB: PubMed Journal: Med Sci Monit ISSN: 1234-1010

Background

Reproducibility is a crucial requirement of scientific validity. Lack of rigor, non-repeatable research, and quality defects are increasingly mentioned concerns. Much preclinical research may be irreproducible, wasting, by one estimate, billions of research dollars each year [1]. In some specific types of study (e.g., drug target identification and validation), the majority of published preclinical results could not be validated, implying poor quality research and wasted efforts to replicate [2,3]. Others pointed out that the majority of published biomedical research findings may be unreliable due to the use of invalid statistical methods [4]. The high failure rate of clinical trials is partly blamed on promising but unreliable results coming out of preclinical research [5]. Retractions of scientific papers have also increased 15-fold according to Thomson Reuters Web of Science. Between 2000 and 2010, a large percentage (73.5%) of retracted papers in medicine and science were withdrawn simply for deficiencies [6,7]. Analysis of 423 retracted articles showed that the most common causes of retraction were laboratory deficiencies (55.8%), analytical deficiencies (18.9%), and other sources of irreproducibility (16.1%). Cell line contamination was a common cause for retraction in the past, whereas analytical deficiencies were found to be increasing in frequency [8]. The old adage “publish or perish” has elevated tension in the current era of limited funding. According to a study by Foster and colleagues, the majority of published biomedical research studies were based on a traditional model (studying existing known relationships in the biochemistry literature) as opposed to innovation (results that introduce novel relationships, as evidenced by scientific prizes) [9]. With the increasing pressures of publications and grant attainment for academics globally, it is no wonder that inadvertent or careless deficiencies appear in scientific research.
Additionally, one may also raise the question of how economic resources and country income level relate to deficiency frequencies, especially at a time when the National Institutes of Health (NIH) is implementing policies to promote international biomedical research collaboration [10]. Although the NIH has taken notice that basic biomedical research is most susceptible to reproducibility concerns, the significance of quality defects is still underestimated by the research community [11]. Many researchers are in denial, believing that these quality problems either do not exist or at least do not exist “in my lab”. Meanwhile, the number of articles reporting the frequency of various research deficiencies is steadily increasing. Measurement of defects is integral to improving the quality of research in the life sciences. Identifying measurable defect frequencies can reveal opportunities to assess the progress of quality improvement and can also guide improvement initiatives by identifying the most frequent types of defects in the research enterprise. To address inaccurate perceptions and to orient improvement efforts, there is a need for risk or frequency estimates of deficiencies based on large and diverse deficiency frequency studies of life sciences research. The purpose of developing this series of meta-analyses was to assist basic scientists as readers and producers of research results by not only itemizing deficiencies responsible for non-reproducible results, but also by providing frequency estimates. In this study, measurements of defects were searched based on the availability of the necessary published data and on the repeated NIH calls to enhance the reproducibility and integrity of research. While this series of analyses was comprehensive to the extent possible, it should not be considered all-inclusive, as the list of recognized and measured research quality deficiencies is continuously evolving.

Material and Methods

In this study, a series of meta-analyses of research deficiency frequency studies was conducted at the Biomedical Research Innovation Laboratory at Augusta University between January and October 2017.

Eligibility criteria

We identified studies that met the following eligibility criteria: (i) provided a quantitative assessment of the frequency of one or more quality defects in life sciences research (i.e., calculated the frequency of specific deficiencies by dividing the total number of studies showing the defect by the total number of studies reporting the particular quality aspect); (ii) presented original frequency data about defects (numerator and denominator); (iii) were peer-reviewed scientific articles that at least had an abstract with numeric results and were written in English; (iv) were published between 1980 and 2016. This study focused on preclinical studies that met the stated eligibility criteria. Defects of randomized controlled trials are discussed elsewhere and, therefore, were ineligible for inclusion in this study. Quality aspects that did not meet the criteria, including the necessary number of independent studies for a meta-analysis, are recognized but could not be included. Ineligibility criteria also included human clinical trials and articles without online access. Given the goals of this meta-analysis of deficiency frequency publications, all editorials, commentaries, letters, surveys, and case reports that did not present data on the frequency of defects were excluded. Studies reporting deficiency frequencies in populations already known to be defective were also ineligible (e.g., studies that analyzed deficiency detection in cell lines that were already known to be contaminated).
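The frequency definition in criterion (i) can be sketched as follows. This is a minimal illustration, not the authors' extraction tool; the function name and the counts are hypothetical.

```python
def deficiency_frequency(defective: int, assessed: int) -> float:
    """Share of assessed units (articles or cell lines) that show the defect.

    `defective` is the numerator and `assessed` the denominator, i.e. only
    units for which the quality aspect could actually be evaluated.
    """
    if assessed <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= defective <= assessed:
        raise ValueError("numerator must lie between 0 and the denominator")
    return defective / assessed


# Hypothetical counts, not taken from any included study.
print(f"{deficiency_frequency(45, 150):.1%}")
```

Requiring both numerator and denominator, as in criterion (ii), is what makes this calculation reproducible from the published report.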

Search strategy

Electronic and manual literature searches were performed to identify all eligible quantitative studies. This study applied a comprehensive search strategy that was based on various combinations of terms and it is available together with all data collected through the Augusta University Scholarly Commons. The searches included the following databases: MEDLINE, CINAHL, Google Scholar, ProQuest Nursing and Allied Health Source, and WOS. The literature search strategy was developed using medical subject headings (MeSH) as well as all key terms related to the following 4 research deficiency groups: study design, statistical analysis, reporting, and cell line. Numerous iterations and combinations of search expressions and phrases were used to achieve maximum retrieval. For example, search terms included “statistical analysis”, “methodology”, “statistical method”, “inappropriate design”, “cell line authentication”, and “contamination in cell lines” and others in combination with terms of “deficiency,” “defect,” “flaw,” or “faulty interpretation.” In addition, manual searches were performed by screening the citations of review articles and bibliographies of potentially eligible studies. The reference list of included studies, relevant reviews, and authors’ personal files were searched to ensure literature saturation.

Study selection and quality assessment

All eligible articles were downloaded in portable document format (PDF). The search strategy followed a 5-step approach (illustrated in Figure 1). Each paper was first assessed for potential relevance by screening its title and abstract. Subsequently, the full text of articles meeting the eligibility criteria was retrieved and reviewed. Two reviewers (NM and MV) judged the full texts of the potentially eligible reports. If there was a difference in the perceived eligibility of a study, 3 authors (NM, MV, and AB) discussed the report to arrive at a consensus, and the reason for the decision was recorded. We used the PRISMA-P (Preferred Reporting Items for Systematic review and Meta-Analysis Protocols) guidelines to maintain a high level of quality control throughout the entire study [12].

Figure 1

Search Strategy of the Meta-Analysis. (PRISMA, 2009).

Data extraction and classification

Relevant data from each deficiency frequency report were extracted into a structured spreadsheet. We extracted the quality defect(s) as defined by the author, frequency data (numerator and denominator), sample description, the detection methods used, and citations. Deficiencies were assigned to one of the following deficiency groups: (i) study design; (ii) statistical analysis; (iii) cell lines; or (iv) reporting. Within each group, identical or essentially similar deficiencies were identified as deficiency concepts (e.g., sample size/power calculation deficiency; mycoplasma contamination of cell lines; or parametric test for non-parametric data). In the groups of study design, statistical analysis, and reporting deficiencies, we used the modified framework of Emerson and Colditz for definition of deficiency concepts [13]. For cell line deficiency concepts, we used the modified framework of both Capes-Davis et al. and Drexler et al. [14,15]. Subsequently, results of the collected frequency studies were pooled by deficiency concept for meta-analysis.

Data analysis

For deficiency concepts with 4 or more deficiency frequency studies and with a sample size of at least 15 in each, a meta-analysis was performed. Using the Meta-Essentials calculation formulas and software [16], the overall frequency and 95% confidence intervals were calculated for each eligible deficiency concept. The results of this analysis were displayed in multiple forest plots. To estimate heterogeneity among studies, I² was used. According to the Cochrane Handbook, heterogeneity is divided into 4 levels: low heterogeneity, 0–25%; moderate heterogeneity, 25–50%; high heterogeneity, 50–75%; and extremely high heterogeneity, 75–100%. Where p<0.05 indicated significant heterogeneity, it could be accepted if I² was ≤50% [17]. Due to the diversity of study sources, we assumed heterogeneity, which was confirmed by the heterogeneity test. The random effects model based on the DerSimonian and Laird approach was used for all studies [18]. Subgroup analysis was performed to explore possible sources of heterogeneity based on the income level of countries, using the World Bank categorizations [18], which assign the world’s economies to 4 income groups: high, upper-middle, lower-middle, and low. We combined the upper-middle and lower-middle groups into one middle category; none of the studies came from the low-income group. We assessed potential regional variation of research quality when a sufficient number of deficiency frequency reports was available for both high-income and middle-income countries. Publication bias was assessed by funnel plot. Egger regression was used to examine funnel plot asymmetry (p<0.05 indicated significant publication bias). The Begg and Mazumdar rank correlation test was used to examine funnel plot asymmetry if the deficiency frequency had been published by 10 or more studies [17].
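The random-effects pooling and the I² statistic described above can be sketched in a few lines. This is a minimal illustration of the DerSimonian and Laird estimator applied to raw proportions, not a reproduction of the Meta-Essentials workbook; the function name, the use of untransformed proportions, and the example counts are assumptions made for illustration.

```python
import math


def dersimonian_laird(events, totals):
    """Pool raw proportions with the DerSimonian-Laird random-effects estimator.

    Returns (pooled proportion, standard error, tau^2, I^2 in percent).
    Assumes every study has a frequency strictly between 0 and 1.
    """
    if len(events) < 2:
        raise ValueError("need at least 2 studies")
    props = [e / n for e, n in zip(events, totals)]
    # Within-study variance of a raw proportion.
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    if any(v == 0 for v in variances):
        raise ValueError("a study with 0% or 100% frequency needs a correction first")

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    df = len(props) - 1

    # Between-study variance (tau^2) and I^2, both truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_star, props)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2, i2


# Hypothetical counts for four frequency studies of one deficiency concept.
pooled, se, tau2, i2 = dersimonian_laird([30, 45, 12, 60], [50, 60, 40, 80])
print(f"pooled={pooled:.3f} SE={se:.3f} tau2={tau2:.4f} I2={i2:.1f}%")
```

When the studies disagree more than sampling error alone would explain, tau² becomes positive and the weights flatten, so small studies count relatively more than under a fixed-effect model.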
Additionally, the trim and fill method was applied to all forest plots to identify and correct for funnel plot asymmetry arising from publication bias, as well as to estimate the number of missing studies that might exist [19,20]. Following recent recommendations, the p-value threshold was set at 0.005 in this study [21]. In addition to the search strategies, all data collected and underlying the findings described in this article are fully available without restriction through Scholarly Commons, the institutional repository for Augusta University.
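The Egger regression named in the data analysis can be illustrated as an ordinary least-squares fit of the standardized effect (effect divided by its standard error) on precision (the reciprocal of the standard error); an intercept far from zero points to funnel plot asymmetry. The sketch below is an assumption-laden illustration rather than the authors' computation: the function name is hypothetical, and it returns the intercept's t statistic and degrees of freedom, leaving the p-value lookup to a statistics package.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression: standardized effect regressed on precision.

    Returns (intercept, t statistic of the intercept, degrees of freedom).
    Assumes the standard errors are positive and not all identical.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least 3 studies")
    x = [1 / se for se in std_errors]                        # precision
    y = [eff / se for eff, se in zip(effects, std_errors)]   # standardized effect

    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance and the usual OLS standard error of the intercept.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1 / n + x_bar ** 2 / sxx))
    return intercept, intercept / se_intercept, n - 2


# Hypothetical study frequencies and standard errors.
effects = [0.60, 0.75, 0.30, 0.75, 0.50]
std_errors = [0.070, 0.056, 0.072, 0.048, 0.090]
intercept, t_stat, df = egger_intercept(effects, std_errors)
print(f"intercept={intercept:.3f} t={t_stat:.2f} df={df}")
```

The t statistic would then be compared against a Student's t distribution with the returned degrees of freedom to obtain the p-value.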

Results

Searches in the listed databases and screening for eligibility resulted in 68 studies that fully met our criteria (Figure 1). After a careful reading of the full text, 206 articles were excluded because they were irrelevant, measured defects of clinical trials, were not in English, or did not contain frequency data. The remaining 96 articles were further reviewed in detail, and 28 of them were excluded for lacking 4 or more deficiency frequency studies of the same deficiency concept with a sample size of at least 15 in each. The 68 included studies were aggregate quality assessment studies. These aggregate studies analyzed a large number of original research publications and specimens. Ultimately, 10 203 original research articles and 64 810 cell lines were assessed by the included aggregate studies and served as the basis for our statistical analysis. Several of these publications reported the analysis of more than one quality aspect. Ultimately, there were 128 quality aspects analyzed in the collected studies (19 in study design, 63 in cell lines, 18 in statistical analysis, and 28 in reporting). The included studies were from the USA, Canada, Sweden, Australia, Austria, Spain, Germany, the UK, France, Italy, the Czech Republic, the Netherlands, Croatia, Korea, Japan, China, Brazil, Egypt, India, Iran, Turkey, and Pakistan. Additional basic characteristics of all studies are shown in Table 1.
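The pooling rule used for this final exclusion step (at least 4 frequency studies per deficiency concept, each with a sample of at least 15) can be sketched as a simple filter. This is one possible reading of the rule, in which undersized samples are dropped before the studies are counted; the record layout, names, and example data are hypothetical.

```python
from collections import defaultdict

MIN_STUDIES = 4       # at least 4 frequency studies per deficiency concept
MIN_SAMPLE_SIZE = 15  # each study sample must contain at least 15 units


def poolable_concepts(records):
    """Return {concept: [study labels]} for concepts that qualify for pooling.

    `records` is an iterable of (concept, study label, sample size) tuples.
    """
    by_concept = defaultdict(list)
    for concept, study, sample_size in records:
        if sample_size >= MIN_SAMPLE_SIZE:
            by_concept[concept].append(study)
    return {c: s for c, s in by_concept.items() if len(s) >= MIN_STUDIES}


# Hypothetical extraction records, not the study's data.
records = [
    ("randomization deficiency", "Study A", 120),
    ("randomization deficiency", "Study B", 56),
    ("randomization deficiency", "Study C", 15),
    ("randomization deficiency", "Study D", 240),
    ("dysfunctional reagents", "Study E", 40),
    ("dysfunctional reagents", "Study F", 12),
]
print(sorted(poolable_concepts(records)))
```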

Table 1

Baseline characteristics of the included studies.

Author	Year	Country	Deficiency Group	Sample type	Size	Author	Year	Country	Deficiency Group	Sample type	Size
Armstrong [25]	2010	USA	Mycoplasma contamination	Human & animal cell cultures	38225	Mariotti [26]	2008	Italy	Mycoplasma contamination & misidentification	Human & animal cell lines	37
Avram [27]	1985	USA	Design, statistics, & reporting	Anesthesia articles	243	McGarrity [28]	1986	USA	Mycoplasma contamination	Cell cultures	2589
Azari [29]	2007	Iran	Misidentification	Human cell lines	100	McGuigan [30]	1995	UK	Design, statistics, & reporting	Psychiatry research articles	164
Berglind [31]	2008	Sweden	Misidentification	Human cancer cell lines	384	McKinney [32]	1989	USA	Design, statistics, & reporting	Medical research articles	56
Bölske [33]	1988	Sweden	Mycoplasma contamination	Cell cultures	1424	Mirjalili [34]	2005	Iran	Mixed contamination	Human and animal cell lines	138
Capes-Davis [14]	2010	Australia	Misidentification	Human cell lines	360	Neville [35]	2006	USA	Design, statistics, & reporting	Dermatology research articles	155
Capes-Davis [36]	2013	Australia	Misidentification	Human cell lines	1157	Nour-Eldein [37]	2016	Egypt	Design, statistics, & reporting	Medical Research articles	60
Cobo [38]	2007	Spain	Mixed contamination	Stem cell cultures	151	Olarerin-George [39]	2015	USA	Mycoplasma contamination	Human & animal cell cultures	484
Didion [40]	2014	USA	Cross-, mixed contamination, & misidentification	Mouse cell lines	99	Oliver [41]	1989	Australia	Design, statistics, & reporting	Surgery research articles	240
Drexler [42]	1999	Germany	Cross-contamination	Human hematopoietic cell lines	189	Onwuegbuzie [43]	2002	USA	Design, statistics, & reporting	Educational Research	36
Drexler [15]	2002	Germany	Contamination, cross-contamination, & false	Human leukemia lymphoma cell lines	1404	Patel [44]	2014	India	Design, statistics, & reporting	Basic Medical articles	128
Drexler [45]	2003	Germany	Misidentification	Human leukemia lymphoma cell lines	550	Pienkowska [46]	1998	Canada	Viral contamination	Human cell lines	75
Drexler [47]	2010	Germany	Mycoplasma contamination & misidentification	Human leukemia lymphoma cell lines	1331	Pilčèk [48]	2003	Czech	Design, statistics, & reporting	Biomedical articles	171
Drexler [49]	2017	Germany	Mycoplasma contamination & misidentification	Human leukemia lymphoma cell lines	330	Roulland-Dussoix [50]	1994	France	Mycoplasma contamination	Cell cultures	372
Ercan [51]	2012	Turkey	Design, statistics, & reporting	Medical sciences articles	181	Schweppe [52]	2008	USA	Misidentification	Human thyroid cancer cell lines	40
Ercan [53]	2015	Turkey	Design, statistics, & reporting	Medical sciences articles	217	Šimundić [54]	2009	Croatia	Design, statistics, & reporting	Medical research articles	55
Ercan [55]	2017	Turkey	Design, statistics, & reporting	Veterinary sciences articles	204	Spierenburg [56]	1988	Netherlands	Mycoplasma contamination	Animal cell lines	115
Felson [57]	1984	USA	Design, statistics, & reporting	Rheumatology research articles	74	Störmer [58]	2009	Germany	Mycoplasma contamination	Human cell lines	176
Hanif [59]	2011	Pakistan	Design, statistics, & reporting	Medical sciences articles	80	Strasak [60]	2007	Austria	Design, statistics, & reporting	Medical sciences articles	15
Hassan [61]	2015	India	Design, statistics, & reporting	Medical research articles	2012	Strasak [62]	2007	Austria	Design, statistics, & reporting	Medical sciences articles	53
Hopert [63]	1993	Germany	Mycoplasma contamination	Continuous cell lines	42	Teyssou [64]	1993	France	Mycoplasma contamination	Animal cell cultures	82
Huang [65]	2017	China	Misidentification & cross-contamination	Tumor cell lines	278	Timenetsky [66]	2006	Brazil	Mycoplasma contamination	Human cell cultures	301
Hué [67]	2010	UK	Viral contamination	Human cell lines	411	Uchio-Yamada [68]	2017	Japan	Misidentification	Mouse cell lines	80
Hukku [69]	1984	USA	Mixed contamination	Cell cultures	275	Uphoff [70]	2002	Germany	Mycoplasma contamination	Leukemia lymphoma cell lines	451
Ishikawa [71]	2006	Japan	Mycoplasma contamination	Cell cultures	337	Uphoff [72]	2010	Germany	Viral contamination	Animal cell lines	465
Jin [73]	2010	China	Design, statistics, & reporting	Medical research articles	2913	Uphoff [74]	2015	Germany	Viral contamination	Human cell lines	577
Jung [75]	2003	USA	Mycoplasma contamination	Human & animal+ cell lines	15	Van Kuppeveld [76]	1994	Netherlands	Mycoplasma contamination	Human & animal cell cultures	95
Kazemiha [77]	2009	Iran	Mycoplasma contamination	Mammalian cell lines	200	Welch II [78]	2002	USA	Design, statistics, & reporting	OB/GYN research articles	195
Kazemiha [79]	2014	Iran	Mycoplasma contamination	Human and animal cell lines	40	Welch [80]	1996	USA	Design, statistics, & reporting	OB/GYN research articles	145
Korch [81]	2012	USA	Misidentification	Endometrial & ovarian cancer	51	Wu [82]	2011	China	Design, statistics, & reporting	Medical research articles	2145
Kurichi [83]	2006	USA	Design, statistics, & reporting	Surgery research articles	187	Ye [84]	2015	China	Cross-contamination	Human cell lines	380
Lucena [22]	2011	Spain	Design, statistics, & reporting	Dentistry research studies	226	Yim [85]	2010	Korea	Design, statistics, & reporting	Medical research articles	139
MacArthur [86]	1984	USA	Design, statistics, & reporting	Medical sciences articles	64	Yoshino [87]	2006	Japan	Misidentification	Human cell lines	400
MacLeod [88]	1999	Germany	Cross-contamination	Human tumor cell lines	252	Zhao [89]	2011	USA	Cross-contamination	Human cell lines	122

Studies of multiple samples and deficiencies

In the pool of eligible studies, 18 reports on deficiency frequency presented results obtained from more than one sample. When a deficiency frequency report analyzed multiple samples, each sample was given a unique reference number added to the author’s name. For example, the Strasak 2007 study was considered as multiple separate studies and was referenced as Strasak 1 2007, Strasak 2 2007, and so on, for each different sample. Another illustration is the composite publication by Hassan (2015) that reviewed original research studies in multiple groups; therefore, the composite publication was considered a collection of 18 different groups of studies numbered accordingly. Most deficiency frequency publications analyzed multiple deficiency concepts, rather than a single concept, in one sample. For example, Lucena [22] estimated the frequency of several study design deficiencies (e.g., eligibility criteria use, power calculation, and randomization) using a sample of 226 dentistry articles. To illustrate the concept of information aggregation, Figure 2 gives a partial representation of aggregating studies in the meta-analysis of randomization deficiencies: (a) the left side of the figure shows the level of aggregating information; (b) the middle part shows the pyramid of aggregation from original research studies through deficiency frequency studies to the meta-analysis of deficiency frequency studies, along with the number of studies aggregated; and (c) the right side shows illustrative statements from each level of aggregation.

Figure 2

Simplified illustration of the aggregation of information. (A) Description of study levels; (B) pyramid of aggregating information about research deficiencies; and (C) illustrative study statements at each level.

Pool of samples and deficiency concepts

As noted above, the 68 included publications covered 128 quality aspects (19 in study design, 63 in cell lines, 18 in statistical analysis, and 28 in reporting). There were 128 samples and 24 different measured deficiency concepts in the pool of 85 deficiency frequency publications. Based on this information, a total of 24 meta-analyses were performed for quality defects. Deficiency concepts were meta-analyzed in 4 separate groups: study design, cell lines, statistical analysis, and reporting deficiencies. For the defects in study design, 3 meta-analyses were conducted based on frequency data provided by 12 research studies that reviewed 1842 original research articles (Figure 3). The sample size/power calculation deficiency was the most frequent in the study design category, showing an overall frequency of 82.3% [95% CI: 69.9–94.6%; SE ±6.3%].

Figure 3

Frequency estimates of 3 (A–C) study design deficiencies in original research articles.

Six meta-analyses covered deficiencies in 64 810 cell lines used in life sciences research, drawing on 42 deficiency frequency studies (Figure 4). The most frequent deficiency was mixed contamination in cell lines, with an overall frequency of 22.4% [95% CI: 10.4–34.3%; SE ±5.3%]. Figure 5 shows the meta-analyses of 3 deficiencies in the statistical analysis of 2419 published research studies provided by 12 deficiency studies. The use of the chi-square test when the expected cell frequency was <5 was the most frequent (15.7% [95% CI: −3.2–34.7%; SE ±7.4%]).

Figure 4

Frequency estimates of 6 (D–I) cell line defects.

Figure 5

Frequency estimates of 3 (J–L) statistical analysis deficiencies in original research articles.

Based on a combined number of 19 studies, 12 meta-analyses were conducted for defects in the reporting of 5942 original research results (Figure 6). The most frequent defects were number of tails not stated, p-values reported without a statistical test, and statistical software not mentioned, showing overall frequencies of 65% [95% CI: 39.3–90.8%; SE ±10.9%], 61.5% [95% CI: 51–72%; SE ±3.8%], and 54.5% [95% CI: 34.2–74.9%; SE ±8.6%], respectively.

Figure 6

Frequency estimates of 12 (M–X) reporting deficiencies in original research articles.
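The intervals reported in this section are symmetric around the estimate, so each has the form estimate ± m × SE. The multiplier m implied by the published numbers can be recovered as a reader's consistency check. The sketch below is not the authors' procedure, the function name is hypothetical, and rounding in the reported values limits how precisely m can be recovered. A multiplier above the normal-theory 1.96 would be expected if a Student's t critical value with few degrees of freedom were used, which is an inference from the numbers rather than something stated in the text. The symmetric form also explains how a lower bound for a proportion can fall below zero, as with the chi-square deficiency.

```python
def implied_multiplier(lower: float, upper: float, se: float) -> float:
    """Half-width of a symmetric interval expressed in units of the standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (upper - lower) / 2 / se


# (label, lower %, upper %, SE %) as reported in the Results text.
reported = [
    ("Sample size/power calculation", 69.9, 94.6, 6.3),
    ("Mixed contamination of cell lines", 10.4, 34.3, 5.3),
    ("Chi-square with expected frequency <5", -3.2, 34.7, 7.4),
    ("Number of tails not stated", 39.3, 90.8, 10.9),
]
for label, lower, upper, se in reported:
    print(f"{label}: m = {implied_multiplier(lower, upper, se):.2f}")
```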

Subgroup analyses

To investigate the influence of other possible factors on heterogeneity across the studies, subgroup analyses were conducted based on country income level. Table 2 shows the separately estimated I² variation across studies for both high- and middle-income countries. Our results indicated that there were no significant differences from total variation after separately pooling studies from high- and middle-income countries, except for one deficiency concept. Therefore, based on our results, the country income-based subgroup analysis failed to explain heterogeneity, except for the mean (SD) used for non-normal or ordinal data. Several other subgroup analyses were explored (e.g., year of publication; cell line type, non-cancer/cancer; human/animal/mixed), but none were ultimately feasible due to the insufficient number of deficiency frequency reports. As a result, there was a lack of evidence for quality improvement over time.

Table 2

Subgroup analysis of the estimated variation of reproducibility deficiencies in high-income and middle-income countries.

Deficiency concepts	Studies	Sample	Combined I² %	High-income I² %	Middle-income I² %
Sample/power calculation deficiency	16	1486	81.78%	84.17%	85.40%
Misidentified cell lines	18	5610	94.45%	93.14%	97.96%
Mycoplasma contamination in cell lines	30	57052	99.39%	99.08%	69.99%
Parametric test for non-parametric data vice versa	9	753	78.73%	88.09%	83.71%
Related data independent test or vice versa	10	1695	91.82%	95.68%	95.26%
Mean (SD) used for non-normal or ordinal data	8	3331	73.99%	38.17%	10.16%
Failure to report the exact p-value	12	4094	98.13%	74.62%	99.22%
P-value significance level not defined	6	434	0.00%	69.01%	0.00%
Name of statistical software not mentioned	8	758	82.15%	70.89%	92.63%
Number of tails not stated	8	608	84.46%	85.63%	74.92%
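The I² values in Table 2 can be read against the 4 heterogeneity levels given in the data analysis section. Because the stated bands share their endpoints (0–25%, 25–50%, 50–75%, 75–100%), the sketch below assumes each boundary value belongs to the higher band; that choice, like the function name, is an assumption made for illustration.

```python
def heterogeneity_level(i2_percent: float) -> str:
    """Map an I^2 value (in percent) to the 4 levels used in this study."""
    if not 0 <= i2_percent <= 100:
        raise ValueError("I^2 must lie between 0 and 100")
    if i2_percent < 25:
        return "low"
    if i2_percent < 50:
        return "moderate"
    if i2_percent < 75:
        return "high"
    return "extremely high"


# Combined I^2 values for three concepts, copied from Table 2.
combined_i2 = {
    "Sample/power calculation deficiency": 81.78,
    "Mean (SD) used for non-normal or ordinal data": 73.99,
    "P-value significance level not defined": 0.00,
}
for concept, i2 in combined_i2.items():
    print(f"{concept}: {i2:.2f}% -> {heterogeneity_level(i2)}")
```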

Analysis of publication bias showed funnel plot symmetry and a corresponding lack of statistically significant bias for the majority of studied deficiency concepts. There were some exceptions, suggesting publication bias in the literature: cell line bacterial contamination other than mycoplasma (p=0.002); mixed contamination of cell lines (p=0.002); chi-square test used when expected cell frequency was <5 (p=0.007); and p-value significance level not defined (p=0.007) (Figure 7).

Figure 7

Funnel plots. (A) Eligibility criteria not mentioned or inappropriate; (B) Randomization deficiency; (C) Sample/power calculation deficiency; (D) Cell line bacterial contamination other than mycoplasma; (E) Cell line cross-contamination; (F) Misidentified cell lines; (G) Mixed contamination of cell lines; (H) Mycoplasma cell line contamination; (I) Viral contamination of cell lines; (J) Chi-square test used when expected cell frequency is <5; (K) Parametric test for non-parametric data and vice versa; (L) Related data independent test and vice versa; (M) Mean (SD) used for non-normal or ordinal data; (N) Variability description +/− notation undefined; (O) Failure to report exact p-value; (P) p-value significance level not defined; (Q) p-value reported without statistical test; (R) Significance stated without providing statistical test; (S) Statistical software not mentioned; (T) Statistical test name incorrect; (U) Study population baseline characteristics not described; (V) Number of tails not stated; (W) Reporting of “Where appropriate” statement; (X) Statistical test used for dataset not specified.

Discussion

While research is inherently innovative and variable, many methodologies became routinely used; therefore, associated deficiency rates are increasingly recognized. Due to the growing number of deficiency frequency studies, integration of results is becoming possible and necessary. This series of meta-analyses is the first comprehensive study to provide numeric frequency estimates for 21 different deficiencies in life sciences research. The complexity of the life sciences research process makes it prone to deficiencies. Some research studies use pioneering or unique methodologies, but many studies use standard methods (e.g., knockout mouse, standard cell lines). Results of this study indicate that the frequency of deficiencies in life sciences research can be reliably measured. Interestingly, the studies on possible reasons for non-reproducibility have been largely based on expert opinion and are themselves non-reproducible. This study represents the first comprehensive collection of research deficiency detection studies that solely relies on deficiency definitions successfully reproduced in several studies. We found that deficiency rates vary between 1.3% and 82.3%, depending on the particular type of deficiency in life sciences research. Our comprehensive meta-analysis indicates that the following deficiencies in life sciences research are particularly frequent (i.e., meta-frequency exceeding 20%): sample size/power calculation deficiency, tails number not stated, p-values reported without statistical test, statistical software not mentioned, eligibility criteria incomplete, failure to report the exact p-value, p-value significance level not defined, randomization deficiency, statistical test used for a dataset not specified, mixed contamination of cell lines, and no description of the study population. When many researchers use at least partly identical methodologies, certain deficiencies are becoming recognizable and their frequencies can be estimated. 
This does not mean that the particular methodology is flawed, only that it is vulnerable to certain deficiencies. For example, the use of cancer cell lines is an excellent laboratory methodology, but it is occasionally vulnerable to misidentification or contamination. Researchers need to be aware of such sources of deficiencies and prepared to prevent and detect them. Scientific quality control has long been reliant on peer review. However, such control is too late when the research itself is already done. For many defects, it would be more advantageous to consider them while the research is still progressing. This meta-analysis provides actionable and measurable defect identification, unlike the majority of articles on quality control in research. When scientists get these numbers, they should know which errors are more frequent and what needs to be considered at a particular phase of their study. This study focused on 4 key research aspects relevant to the reproducibility of results from the initial to the late phases of basic biomedical research (study design, cell lines, statistical analysis, and reporting). This information should be valuable for researchers and also research administrators in recognizing the most frequent errors and to prevent them most effectively. As new aggregate research deficiency studies will be emerging, they can be added to expand the scope and applicability to quality improvement in research laboratories and institutions. The deficiencies highlighted by this meta-analysis were the most frequent within their own category (e.g., cell line contamination). It should also be recognized that the deficiencies reported were not necessarily the most important sources of irreproducibility either during the review period or for the present time. There might be other errors that have not been systematically measured yet but that can be included in the future when pertinent frequency measurements arise. 
Our study should encourage further and wider-ranging studies on the frequency of deficiencies of biomedical research. This study did not find evidence that variations in the frequency of research reproducibility deficiencies are explained by differences between high-income and middle-income countries. Apparently, the income environment does not influence the quality of research, although it may influence the choice of research focus and access to resources. There are many distinguished scientists from the developing world who are making important contributions to the scientific community worldwide. It is well recognized that the number of scientific publications is rapidly growing worldwide. The rising trends of research publications can be partly attributed to the increase of international scientific collaboration. Researchers, funders, and journal editors communicate science the same way all over the world. The method of science has to meet the same quality standards everywhere and is not linked to the region. The potential for quality improvement over time was considered, but we found no evidence of such trends. It is possible that the timeframe of available data-driven quality studies was not sufficient to detect changes/improvement over time. The lack of evidence for research quality improvement over time is not surprising, for several reasons. The potential for quality improvement over time was not the scope of this analysis, as the included studies have different methodologies and sample types, making comparisons difficult. According to the principles of management science, general improvement in quality comes from systematic and regular measurements of deficiencies and organized efforts to manage quality (e.g., car manufacturing industry, health care quality improvement in many countries). With rare exceptions, such systematic institutional quality management initiatives are uncommon in the biomedical research enterprise. 
A limitation of the present study is the reliance on already-published numeric analyses of research deficiencies. There are many more suspected and actual life sciences research deficiencies that have not yet been analyzed by a sufficient number of studies to be included in this meta-analysis (e.g., dysfunctional reagents). Further, subgroup analysis by sample type was not possible due to insufficient sample size. Defects in research are also probably underreported. Moreover, in the cell line group, different studies used different techniques for identifying the various defects in cell lines. Our study selection was restricted to articles published in English. It is possible that studies published in other languages or unpublished studies could shift the overall conclusion. The 4 deficiency categories were selected based on reviewing the literature, talking to scientists in the field, and the repeated NIH calls to enhance the reproducibility and integrity of research [23]. Deficiency in animal studies was one of these categories. In collecting studies for these meta-analyses, several highly publicized articles on research quality issues did not provide frequency estimates and thus were not eligible for inclusion. Due to the diversity of issues and defects, animal modeling studies were not the target of our study. While this series of meta-analyses was intentionally comprehensive, it should not be considered all-inclusive, as the list of recognized research quality deficiencies is continuously evolving. Management science often stresses that narratives without data are rarely effective in improving quality. This meta-analysis shows the theoretical and practical significance of measuring quality in basic biomedical research. With more emphasis on continuous quality improvement, the number of deficiency frequency studies is likely to grow substantially.

Conclusions

Research quality improvement should be a continuous and comprehensive process, from the design and conduct of research to the publication of results. With periodic analyses, corrective actions should be recommended and implemented to reduce the chances of deficiencies. Life sciences research deficiencies fall into one of two types. The first type is the project-dependent deficiency. Such deficiencies are produced in the research process and are fully under the control of the researcher or principal investigator. Deficiencies in study design, statistical analysis, or reporting are examples. To prevent them, researchers should use rigorous design, standards, and methods when conducting their projects [24]. Among other tools, the deficiency concepts of this meta-analysis should be used by researchers and reviewers as a checklist for deficiency prevention. The second type of deficiency is supplier-dependent. Here, the researcher is the receiver of commercially available goods and services (e.g., cell lines), and individual researchers need to be alert and take appropriate quality cross-check measures. More importantly, universities and research institutions have to take greater responsibility for selecting and controlling suppliers. In particular, they should take responsibility for ensuring that the provided cell lines are authentic and contaminant-free. Research quality safeguarding should be part of institutional infrastructural support (F&A). Infrequently occurring deficiencies of either type are particularly hard to recognize and prevent at the level of individual research laboratories. Institutions with research laboratories should gather information about deficiencies and help to protect their research from them. 
In other words, institutional quality assessment and improvement efforts are needed to ensure that the conducted research is based on rigorous practices and on the prevention of deficiencies that can threaten reproducibility. In spite of the growing literature, recognition of the threats to research quality and of the need for studies on research quality lags behind expectations, as does understanding of comprehensive research quality improvement. It is important that error definitions themselves become reproducible and measurable so that improvement can be tracked. Continuous quality improvement is a major challenge that needs to be fully recognized by research institutes and universities. A collaborative culture at the institutional level is needed to eliminate deficiencies in life sciences research. Researchers and research institutions need to appreciate the value of measuring deficiencies and work together to implement the needed changes. Improvement efforts should be built on these comprehensive measures, which should reduce deficiencies, increase research productivity, and multiply meritorious scientific discoveries.
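
To illustrate what a measurable deficiency definition can look like in practice, the following minimal Python sketch estimates the frequency of a single deficiency from an internal audit and attaches an approximate 95% confidence interval. It is only an illustration under stated assumptions: the function name `wilson_interval` and the audit counts are hypothetical, and the Wilson score interval is a standard single-sample method chosen here for simplicity. It is not the pooling method used in this meta-analysis.

```python
import math

def wilson_interval(defects, audited, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%).

    defects: number of audited units showing the deficiency
    audited: total number of units audited
    """
    if audited <= 0 or not 0 <= defects <= audited:
        raise ValueError("need audited > 0 and 0 <= defects <= audited")
    p = defects / audited
    denom = 1 + z**2 / audited
    centre = (p + z**2 / (2 * audited)) / denom
    half = z * math.sqrt(p * (1 - p) / audited + z**2 / (4 * audited**2)) / denom
    return centre - half, centre + half

# Hypothetical audit (illustrative numbers, not data from this study):
# 12 of 40 reviewed manuscripts lacked a power calculation.
defects, audited = 12, 40
low, high = wilson_interval(defects, audited)
print(f"frequency {defects / audited:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Repeating such a measurement at regular intervals with the same deficiency definition would let an institution see whether the frequency changes over time. The Wilson interval was chosen for this sketch because, unlike the simple normal approximation, its bounds stay between 0% and 100% even for small audits.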

71 in total

1. Statistical errors in microleakage studies in operative dentistry. A survey of the literature 2001-2009.

Authors: Cristina Lucena; José M López; Camilo Abalos; Virginia Robles; Rosa Pulgar
Journal: Eur J Oral Sci Date: 2011-12 Impact factor: 2.612

2. Detection of squirrel monkey retroviral sequences in interferon samples.

Authors: M Pienkowska; A Seth
Journal: J Hepatol Date: 1998-03 Impact factor: 25.083

3. Sensitivity of biochemical test in comparison with other methods for the detection of mycoplasma contamination in human and animal cell lines stored in the National Cell Bank of Iran.

Authors: Vahid Molla Kazemiha; Amir Amanzadeh; Arash Memarnejadian; Shahram Azari; Mohammad Ali Shokrgozar; Reza Mahdian; Shahin Bonakdar
Journal: Cytotechnology Date: 2014-02-04 Impact factor: 2.058

4. Sources of error in the retracted scientific literature.

Authors: Arturo Casadevall; R Grant Steen; Ferric C Fang
Journal: FASEB J Date: 2014-06-13 Impact factor: 5.191

5. Widespread intraspecies cross-contamination of human tumor cell lines arising at source.

Authors: R A MacLeod; W G Dirks; Y Matsuo; M Kaufmann; H Milch; H G Drexler
Journal: Int J Cancer Date: 1999-11-12 Impact factor: 7.396

6. A retrospective survey of research design and statistical analyses in selected Chinese medical journals in 1998 and 2008.

Authors: Zhichao Jin; Danghui Yu; Luoman Zhang; Hong Meng; Jian Lu; Qingbin Gao; Yang Cao; Xiuqiang Ma; Cheng Wu; Qian He; Rui Wang; Jia He
Journal: PLoS One Date: 2010-05-25 Impact factor: 3.240

7. Analysis of statistical methods and errors in the articles published in the Korean Journal of Pain.

Authors: Kyoung Hoon Yim; Francis Sahngun Nahm; Kyoung Ah Han; Soo Young Park
Journal: Korean J Pain Date: 2010-03-10

Review 8. Misuse of statistical methods in 10 leading Chinese medical journals in 1998 and 2008.

Authors: Shunquan Wu; Zhichao Jin; Xin Wei; Qingbin Gao; Jian Lu; Xiuqiang Ma; Cheng Wu; Qian He; Meijing Wu; Rui Wang; Jinfang Xu; Jia He
Journal: ScientificWorldJournal Date: 2011-11-02

9. SNP array profiling of mouse cell lines identifies their strains of origin and reveals cross-contamination and widespread aneuploidy.

Authors: John P Didion; Ryan J Buus; Zohreh Naghashfar; David W Threadgill; Herbert C Morse; Fernando Pardo-Manuel de Villena
Journal: BMC Genomics Date: 2014-10-03 Impact factor: 3.969

10. Building global capacity for brain and nervous system disorders research.

Authors: Linda B Cottler; Joseph Zunt; Bahr Weiss; Ayeesha Kamran Kamal; Krishna Vaddiparti
Journal: Nature Date: 2015-11-19 Impact factor: 49.962