
Publication practices during the COVID-19 pandemic: Expedited publishing or simply an early bird effect?

Abstract

This study explores the evolution of publication practices associated with the SARS-CoV-2 research papers, namely, peer-reviewed journal and review articles indexed in PubMed and their associated preprints posted on bioRxiv and medRxiv servers: a total of 4,031 journal article-preprint pairs. Our assessment of various publication delays during the January 2020 to March 2021 period revealed the early bird effect that lies beyond the involvement of any publisher policy action and is directly linked to the emerging nature of new and 'hot' scientific topics. We found that when the early bird effect and data incompleteness are taken into account, COVID-19 related research papers show only a moderately expedited speed of dissemination as compared with the pre-pandemic era. Medians for peer-review and production stage delays were 66 and 15 days, respectively, and the entire conversion process from a preprint to its peer-reviewed journal article version took 109.5 days. The early bird effect produced an ephemeral perception of a global rush in scientific publishing during the early days of the coronavirus pandemic. We emphasize the importance of considering the early bird effect in interpreting publication data collected at the outset of a newly emerging event.


Keywords: COVID‐19; expedited publication practices; publication delays

Year: 2022 PMID: 35941841 PMCID: PMC9349734 DOI: 10.1002/leap.1483

Source DB: PubMed Journal: Learn Publ ISSN: 0953-1513

The early bird effect is observed as extremely short publication delays for scientific manuscripts on new and ‘hot’ topics at the outset of a newly emerging event. The early bird effect produced an ephemeral perception of a global rush in scientific publishing during the early days of the coronavirus pandemic. Publication delays for SARS‐CoV‐2 research papers show only moderate expediting compared with the pre‐pandemic era; the median peer‐review and production stage delays were 66 and 15 days, respectively, during the 1 January 2020 to 31 March 2021 period. Early bird manuscripts and data incompleteness are both intrinsic features of publication data and should be taken into account when interpreting the publishing landscape.

INTRODUCTION

A pneumonia of unknown origin was first reported in Wuhan, China on 30 December 2019 (ProMED International Society for Infectious Diseases, 2019), and in about 3 months, the coronavirus was declared a pandemic (WHO, 2020). The response of the scientific community was outstanding; already in January, the first research reports appeared as preprints, clinical trials and journal articles (Fidahic et al., 2020). On 31 January 2020, various journal publishers, and research organizations signed the Statement on Data Sharing in Public Health Emergencies reaffirming the principles of rapid access to research data and publications relevant to the COVID‐19 outbreak (Welcome Fund, 2020). Within a few months of the outbreak, journal publishers partially or completely lowered their paywalls concerning the sharing of SARS‐CoV‐2 related research (Retta, 2021), supported and encouraged scientific communication through preprints (Eisen et al., 2020), and ensured a fast‐track peer‐review process for COVID‐19 works (CFP for COVID‐19 works, 2020). A successful example of the latter is the initiative Rapid Reviews: COVID‐19 (RR:C19), launched on 27 April 2020, by Hindawi, the Royal Society, PLOS and PeerJ to create and share a pool of expert reviewers for COVID‐19 manuscripts (EurekAlert, 2020; OASPA, 2020). These enhanced peer‐review practices and publication policies, coupled with an increasing number of preprints, suggested the emergence of a new era in scientific communication prompted by the COVID‐19 pandemic (Callaway, 2020; Krumholz et al., 2020; Kupferschmidt, 2020). The urgency and transparency of scientific communication were genuinely welcomed early in the pandemic by the public as well as the scientific community, but the unprecedented volume of research jeopardized the previously established standards for peer‐review and publication policies (Dinis‐Oliveira, 2020). 
For instance, eLife announced curtailing requests for additional experiments when reviewing the SARS‐CoV‐2 papers (Eisen et al., 2020). Further, early analyses of publication practices related to COVID‐19 manuscripts reported a median peer‐review time of 6 days (Kun, 2020; Palayew et al., 2020), which stands in stark contrast with a standard peer‐review time that remained at around 100 days for the last 30 years (Powell, 2016). The daily rate of COVID‐19 preprints posted on servers increased dramatically and many of them had a strong impact on the public health policy‐making during the early pandemic (Fraser et al., 2021). The following wave of article retractions (Retraction Watch, 2020) summoned initial concerns that the fast publication speed of COVID‐19 works could be incompatible with a rigorous peer‐review process and may lead to damaging the integrity of science communication (Dreisbach, 2020; Steinberg, 2020). Indeed, the year 2020 was fraught with intense debates on social media around several controversial, high‐profile journal publications that affected COVID‐19 related health policies. To cite a few, a published clinical trial describing the successful use of hydroxychloroquine in COVID‐19 patients (Gautret et al., 2020) was actively promoted by the Trump administration (Baker et al., 2020) despite multiple concerns regarding the quality of its study design (Fauci et al., 2020; Servick, 2020; Voss, 2020). Another study, a letter published in the Lancet Respiratory Medicine journal early in the pandemic raising concerns about the use of ibuprofen to treat COVID‐19 symptoms (Fang et al., 2020), was initially supported by WHO (Moffitt, 2020) and led doctors to advise against treating COVID‐19 fever with nonsteroidal anti‐inflammatory drugs (NSAIDs), like ibuprofen (Day, 2020), and the French Health Ministry to completely ban NSAIDs (DGS‐urgent, 2020); all these actions being swiftly reversed once data insufficiency became apparent (Drake et al., 2021). 
These disputes demonstrate the pressure that the peer‐review system and the entire scientific community experienced during the early coronavirus pandemic (Chirico et al., 2020). In an effort to maintain high‐quality standards in reviewing COVID‐19 studies, in April 2020, EASE (European Association of Science Editors) encouraged all editors to enforce the previously established guidelines on authors and require a clear statement of study limitations (EASE, 2020). Throughout the first year of the pandemic, a number of editorials further reiterated the importance of maintaining a rigorous peer‐review process during such a large‐scale public health emergency (Sepúlveda‐Vildósola et al., 2020; Smart, 2020). To cite a few, the editor of JAMA warned that ‘Rushing publication, if there are mistakes, will ultimately undermine public trust in science’ (Bauchner et al., 2020); the editor of Thorax stressed that ‘…it is crucial that journals streamline, but maintain high‐quality peer‐review processes’ (Smyth et al., 2020); and The Lancet Global Health called for ‘a need to slow down’ and ‘resist pressure from researchers and their institutions to expedite every step’ because ‘When research, writing, and peer review are rushed, the consequences may be damaging’ (The Lancet Global Health, 2020). Judging by the subsequent analyses that demonstrated consistently longer publication delays for COVID‐19 manuscripts as the pandemic evolved, these concerns seemed to be addressed. For example, early in the pandemic, from January to April 2020, median peer‐review time for COVID‐19 manuscripts was reported as 6 days (Kun, 2020; Palayew et al., 2020) and median elapsed time for COVID‐19 preprints, which is how long it takes for a preprint to transform into a peer‐reviewed journal article, was 22.5 days (Fraser et al., 2020). 
However, later in the pandemic, based on data we collected in October 2020 (Sevryugina & Dicks, 2021b), these median delays lengthened to 37 and 57 days, respectively; and within a month, the latter extended to 68 days (Fraser et al., 2021). Herein, we initially set out to determine whether the same trend was maintained throughout the first 15 months of the pandemic (January 2020 to March 2021) by exploring the evolution of publication practices during this period. We also inquired into the origin of the abovementioned lengthening of publication delays and found a novel phenomenon, addressed herein as the early bird effect, that produced an ephemeral perception of a global rush in scientific publishing during the early pandemic. We will show that this new effect lies beyond the involvement of any publisher policy action and is directly linked to the emerging nature of new and ‘hot’ scientific topics.

METHODS

Scope

The scope of this study is biomedical literature related to the SARS‐CoV‐2 research, namely, peer‐reviewed journal and review articles indexed in PubMed and their associated preprints posted on bioRxiv and medRxiv servers, a total of 4,031 deduplicated journal article‐preprint pairs. Since our study focuses on publication delays experienced by manuscripts as they transition from preprints to peer‐reviewed journal articles, we do not include in our analysis “unpublished” preprints that do not have peer‐reviewed articles associated with them. Based on the 70% publication rate for the bioRxiv preprint server (Sever et al., 2019), we believe that we covered the majority of COVID‐19 publications associated with the two major preprint servers for biomedical literature.

Timeline

Two data sets are discussed herein. One of them was collected on 4 May 2021 and it includes a total of 4,031 deduplicated journal article‐preprint pairs, where preprints were posted on preprint servers from 1 January 2020 to 31 March 2021. Another data set was collected on 19 October 2020 and it includes 1,099 journal article‐preprint pairs, where preprints were posted on preprint servers from 1 January 2020 to 30 September 2020. The October 2020 data set has been discussed in our preprint (Sevryugina & Dicks, 2021b) and its data analyses were posted on Zenodo (Sevryugina & Dicks, 2021a).

Data sources

This paper examines data acquired from a number of sources, including the database of COVID‐19 SARS‐CoV‐2 preprints from medRxiv and bioRxiv (BioRxiv API, 2021), Crossref (Crossref REST API, 2021), E‐utilities (Bethesda, 2010), Dimensions (Herzog et al., 2020), CORD‐19 (Wang et al., 2020) and CADRE (Mabry et al., 2020). Metadata for each individual COVID‐19 preprint deposited to bioRxiv or medRxiv was gathered by accessing the bioRxiv database of COVID‐19 SARS‐CoV‐2 preprints from medRxiv and bioRxiv, hereafter referred to as the BioRxiv API (BioRxiv API, 2021). Data were retrieved in JavaScript Object Notation (JSON) format. Data analysis and visualization were done in Python (pandas, numpy, requests, matplotlib, bokeh and seaborn) using Jupyter Notebook. Crossref (Crossref REST API, 2021) is an official DOI registration agency of the International DOI Foundation that establishes a cross‐publisher citation linking system for academic content, including journals, conference proceedings, books, data sets, and so forth. It works with thousands of publishers to provide authorized access to their metadata including DOI, publication date and other basic information. To search PubMed, we used Entrez Programming Utilities (E‐utilities) (Bethesda, 2010), an application programming interface (API) that allows searching 38 databases from the National Center for Biotechnology Information (NCBI). We used PubMed (through E‐Utilities) to obtain metadata on peer‐reviewed articles of ‘Journal Article’ and ‘Review’ article types as the most traditional types of scholarly output. According to Kun's (2020) estimates, these two types constitute about 24% of all PubMed publications, which include 187 different publication types (NLM, 2020). We used the single‐term search query ‘COVID‐19’, following the recommendations of Lazarus et al. (2020). From E‐Utilities, data were downloaded as CSV and converted to Microsoft Excel for further analysis and visualization. 
Dimensions (Herzog et al., 2020) is a comprehensive database that links scholarly outputs to a research analytics suite to track the impact of research across its life cycle. Dimensions tracks many preprint servers (Altmetric, 2020) but we only used it for bioRxiv and medRxiv preprints (Data Flow Chart in Data S1). CORD‐19 or COVID‐19 Open Research Data set (Wang et al., 2020) is a free resource of over 200,000 scholarly articles about COVID‐19, SARS‐CoV‐2 and related coronaviruses prepared by the Allen Institute for AI (AI2) in collaboration with many partners and released on 16 March 2020. We used its 4 March 2021 release downloaded on 26 April 2021 from CADRE (Mabry et al., 2020) for metadata associated with refereed journal articles.

Data availability

Source data for all figures have been provided in supporting files that were deposited in a Zenodo repository with DOI 10.5281/zenodo.6415280.

Analysis of published preprints

When a preprint is published in a peer‐reviewed journal, a reference to the new DOI of the journal article appears next to its title, and the DOIs of the preprint and the published article are permanently linked in indexing platforms and tools, which pull from various APIs. We found that the most reliable method of extracting metadata about each individual preprint was by accessing the BioRxiv API (BioRxiv API, 2021). Using the Python library requests, we were able to extract information about each preprint based on DOI, which gave us a column called ‘published.’ Within this column, if the preprint was also published in a journal, the metadata provided the DOI that corresponded to the published version of the paper. To ensure we found all published preprints, we also accessed data from Crossref, Dimensions and CORD‐19 APIs. To establish the linkage between the preprints and corresponding peer‐reviewed journal articles, we performed both DOI and title matching. All channels were then combined and duplicates were dropped. For a detailed demonstration of data obtained by every data channel, see published collections in Data S1. To validate whether we found all peer‐reviewed preprint versions based on a combination of Crossref, CORD‐19, Dimensions and BioRxiv API, we randomly selected a sample of 100 preprints that our data returned as ‘unpublished’ from both bioRxiv and medRxiv, and searched Google Scholar by title. Our analysis of ‘unpublished’ preprints returned 10% of bioRxiv and 4% of medRxiv preprints as being published in refereed journals. All journal publications found in this way had slight modifications in article titles or author lists, and the original ‘unpublished’ preprints were not linked on preprint servers to the corresponding published versions. This false‐negative rate is lower than the 37.5% reported by Abdill and Blekhman (2019) and is similar to the 9.1% rate reported by Cabanac et al. (2021). 
All manually found journal article versions of ‘unpublished’ preprints were manually added to data discussed in this article.
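The combination of DOI matching and title matching described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the record layout, the helper names `normalize_title` and `link_pairs`, and the DOIs in the sample data are all hypothetical, and exact matching on a normalized title is an assumed simplification of whatever matching rule was actually used.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents, and replace punctuation runs with single spaces."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link_pairs(preprints, articles):
    """Return sorted, deduplicated (preprint DOI, article DOI) pairs.

    A preprint is linked by its 'published' DOI when one is present and known;
    otherwise by an exact match on the normalized title.
    """
    by_doi = {a["doi"].lower(): a for a in articles}
    by_title = {normalize_title(a["title"]): a for a in articles}
    pairs = set()
    for p in preprints:
        published = (p.get("published") or "").lower()
        if published in by_doi:
            pairs.add((p["doi"], by_doi[published]["doi"]))
            continue
        match = by_title.get(normalize_title(p["title"]))
        if match is not None:
            pairs.add((p["doi"], match["doi"]))
    return sorted(pairs)


# Hypothetical records: DOIs and titles are made up for illustration.
preprints = [
    {"doi": "10.1101/pre.1", "title": "A study of X", "published": "10.1000/J.1"},
    {"doi": "10.1101/pre.2", "title": "COVID-19: Effects on Y", "published": None},
    {"doi": "10.1101/pre.3", "title": "Never published", "published": None},
]
articles = [
    {"doi": "10.1000/j.1", "title": "A study of X (revised)"},
    {"doi": "10.1000/j.2", "title": "COVID 19 effects on Y"},
]
pairs = link_pairs(preprints, articles)
print(pairs)
```

Exact matching on normalized titles misses pairs whose titles were reworded during review, which is consistent with the residual false negatives found in the manual Google Scholar check.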

Double DOIs

When we looked for published preprints based on title matching, we encountered a few instances when two published DOIs existed for a peer‐reviewed preprint version. In one case, it was an erratum for the paper and in the other case it was a publication on another preprint server. In both cases, we used only the DOI for the article in the peer‐reviewed journal, and the publication on another preprint server was removed from further analysis. We also encountered a few cases when preprints with different DOIs were linked to the same DOI of the published version. On inspection, preprints with different DOIs were somewhat similar in titles and author lists but not identical. For our analysis, we kept only the DOI of the preprint that was posted earlier.
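The rule for several preprints pointing at one published DOI could look like the sketch below. It assumes that "earlier" refers to the preprint posting date; the function name `keep_earliest_preprint` and the sample DOIs are hypothetical.

```python
from datetime import date


def keep_earliest_preprint(pairs):
    """Keep one (preprint DOI, article DOI, posting date) row per article DOI.

    When several preprints link to the same article, the one with the
    earliest posting date is retained (assumed reading of the rule).
    """
    earliest = {}
    for preprint_doi, article_doi, posted in pairs:
        current = earliest.get(article_doi)
        if current is None or posted < current[2]:
            earliest[article_doi] = (preprint_doi, article_doi, posted)
    return sorted(earliest.values(), key=lambda row: row[1])


# Hypothetical pairs: two preprints resolve to the same journal article.
pairs = [
    ("10.1101/pre.10", "10.1000/j.5", date(2020, 4, 20)),
    ("10.1101/pre.11", "10.1000/j.5", date(2020, 4, 2)),
    ("10.1101/pre.12", "10.1000/j.6", date(2020, 5, 1)),
]
deduplicated = keep_earliest_preprint(pairs)
print(deduplicated)
```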

Analysis of publication delays

Preprint posting dates were extracted from Crossref. For the received, accepted and published‐online dates of journal articles, we used E‐Utilities: PubMedPubDate@PubStatus = ‘received’; PubMedPubDate@PubStatus = ‘accepted’ and ArticleDate@DateType = ‘Electronic’. When ArticleDate@DateType = ‘Electronic’ from PubMed was not available, we substituted it with the ‘created‐date’ from Crossref. A detailed description of our selection process for the appropriate dates was reported in our preprint (Sevryugina & Dicks, 2021b).

Pre‐submission time (tα)

Interval between the date when a preprint is deposited to the server and the date when it is submitted to the journal. Pre‐submission time = date the journal article was ‘received’—preprint deposition date.

Review time (tR)

Interval between the date when manuscript is submitted to the journal and the date it is accepted for publication. Review time = date the journal article was ‘accepted’—date the journal article was ‘received’.

Production stage time (tβ)

Interval between the acceptance date for a manuscript and the date the peer‐reviewed journal article appears online. Production stage time = date the journal article was posted online (‘Electronic’)—date the journal article was ‘accepted’.

Elapsed time (TΣ)

Interval between the date when a preprint was deposited to the server and publication date for its journal article analogue. Elapsed time = date the journal article was posted online (‘Electronic’)—preprint deposition date.
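The four definitions above reduce to date subtractions. The sketch below is an illustration rather than the authors' code: the function names and example dates are hypothetical, the fallback mirrors the substitution of the Crossref ‘created‐date’ when the PubMed electronic date is missing, and the publication time (review plus production) is included for convenience.

```python
from datetime import date


def online_date(pubmed_electronic, crossref_created):
    """Prefer the PubMed electronic date; fall back to the Crossref created date."""
    return pubmed_electronic if pubmed_electronic is not None else crossref_created


def publication_delays(posted, received, accepted, online):
    """Return all delays in days for one preprint-article pair."""
    return {
        "t_alpha": (received - posted).days,   # pre-submission time
        "t_R": (accepted - received).days,     # review time
        "t_beta": (online - accepted).days,    # production stage time
        "T_P": (online - received).days,       # publication time = t_R + t_beta
        "T_sigma": (online - posted).days,     # elapsed time = t_alpha + t_R + t_beta
    }


# Hypothetical pair: the PubMed electronic date is missing, so Crossref is used.
example = publication_delays(
    posted=date(2020, 3, 1),
    received=date(2020, 3, 8),
    accepted=date(2020, 5, 13),
    online=online_date(None, date(2020, 5, 28)),
)
print(example)

# A preprint posted after journal submission gives a negative pre-submission time.
late_posting = publication_delays(
    posted=date(2021, 2, 10),
    received=date(2021, 2, 1),
    accepted=date(2021, 3, 1),
    online=date(2021, 3, 10),
)
print(late_posting["t_alpha"])
```

Because the elapsed time is the sum of the three component intervals, any pair with a missing received or accepted date can still contribute an elapsed time, which would explain why the sample sizes differ between the delay types.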

Altmetric data

Altmetric Attention Scores were retrieved from Dimensions by querying for articles published during the 1 January to 30 September 2020 period using the recommended query for COVID‐19 (DSL, 2020). We matched DOI's of articles found by Dimensions to DOI's of articles we identified earlier as published preprints.

Statistical analysis

Descriptive analysis of the data and Student's t test were conducted on the Statistical Package for Social Sciences version 27 (SPSS).

RESULTS AND DISCUSSION

Our study is based on data we collected on 4 May 2021 that includes peer‐reviewed journal articles indexed in PubMed as related to the SARS‐CoV‐2 research and their associated preprints posted on bioRxiv and medRxiv servers from 1 January 2020 to 31 March 2021, a total of 4,031 deduplicated journal article‐preprint pairs. The inclusion of preprint data in our study allows for a glimpse into the pre‐submission history of draft manuscripts through their timestamping on preprint servers. In discussing the various publication delays involved in scholarly publishing, we will further refer to the simplified model depicted in Scheme 1, where the elapsed time (T_Σ) is made up of three components: pre‐submission time (t_α), peer‐review time (t_R), and production stage time (t_β), where the combined (t_R + t_β) represents the publication time (T_P).

SCHEME 1

Preprint publication process (Icons, 2021). Pre‐submission time (t_α): interval between the date when a preprint is posted on the preprint server and the date when it is submitted to the journal. Review time (t_R): interval between the date when the manuscript is submitted to the journal and the date it is accepted for publication. Production stage time (t_β): interval between the acceptance date for a manuscript and the date the peer‐reviewed journal article appears online. Publication time (T_P): interval between the date when the manuscript is submitted to the journal and the publication date of the journal article. Elapsed time (T_Σ): interval between the date when a preprint was deposited to the server and the publication date of its peer‐reviewed journal article analogue.

The 15‐month assessment of publication delays yielded medians of 67 days for peer‐review time (t_R), 82 days for publication time (T_P), and 104 days for elapsed time (T_Σ), in agreement with the abovementioned trend of the overall lengthening of publication delays as the pandemic evolved. To shed some light on the observed trend, we divided our data into monthly intervals and noted that publication delays averaged by month followed a convex‐shaped curve (Fig. 1). The curvature of the plot revealed two striking features: (i) the earliest as well as the most recent dates both displayed short publication delays, and (ii) publication delays were longer than the previously reported values. To exemplify (i), the 3‐month period, 1 January–31 March, in 2020 and in 2021, yielded mean publication delays much shorter than the 3‐month period in between (T_P = 72.7 days and 54.1 days, respectively, vs. 109.6 days during 1 June–31 August 2020). To exemplify (ii), we obtained a median peer‐review time of 13 days for COVID‐19 works published during the 30 January to 23 April 2020 period versus the previously reported 6 days for the same period (Palayew et al., 2020). 
Intrigued by these features, we retrieved the data collected in an identical way on 19 October 2020 (Sevryugina & Dicks, 2021b) and confirmed that it displayed similar key characteristics (Fig. 1).

FIGURE 1

Average publication delays by month for COVID‐19 journal articles that were posted as preprints since 1 January 2020: T_Σ and T_P are shown in shades of brown and green, and t_α = T_Σ − T_P. Two data sets are overlaid: one collected in October 2020 includes all preprints posted before 1 October 2020 (dark shades), the other collected in May 2021 includes all preprints posted before 1 April 2021 (light shades).

Furthermore, by overlaying the October 2020 and May 2021 data sets, the variation between our two estimates became apparent (Fig. 1); for example, the estimated average publication time (T_P) for COVID‐19 works posted on preprint servers in July 2020 is 51.5 versus 112.8 days, respectively. We then subtracted the data we collected in October 2020 from the data we collected in May 2021 and analysed the remaining data (added data (D) in Table 1). We identified two major factors contributing to the observed discrepancy between the two data sets. One is the emergence of previously invisible manuscripts that were undergoing review or production stages at the time of analysis, all of which have a publication date after 30 September 2020. The second factor is related to a persistent problem of unreliable linking between preprints and their peer‐reviewed journal article versions (Lin et al., 2020). Despite our best efforts, title matching across multiple databases was only partially effective in identifying the missing links (Sevryugina & Dicks, 2021b). During the 6‐month period between our two data collections, preprint servers continued to link preprints and their journal article versions, enabling us to re‐discover older manuscripts that previously appeared to lack preprint analogs. This second factor accounts for the unexpectedly short publication delays found for the added data, such as the mean publication time of 78 days in February 2020, which implies that those articles were already published in October 2020 but were not included in our October 2020 data set. 
Both aforementioned factors are well‐known culprits for introducing systematic errors in publication data (Abdill & Blekhman, 2019; Sevryugina & Dicks, 2021b). The data incompleteness resulting from these two factors is the main reason for observing the aforementioned elongation of publication delays with widening of the analysis window.

TABLE 1

Average publication delays by month for COVID‐19 preprints and corresponding peer‐reviewed articles

	October data set (d)			Added data (D)				Student's t test for T_P
Month	N	T_P, days	t_α, days	N	T_P, days	t_α, days	Completeness, %	df	t	p	Cohen's d
Jan 2020 ^a	14	29.2	3.3	3	23.5	8.5	82
Feb 2020	70	51.9	9.5	36	78.0	39.0	66	34.4	1.66	0.11
Mar 2020	174	61.3	8.8	108	113.1	50.5	62	103.8	5.18	<0.001	0.85
Apr 2020	298	60.0	10.0	256	136.8	42.5	54	313.5	13.70	<0.001	1.35
May 2020	295	61.2	5.2	488	131.0	35.8	38	656.7	17.87	<0.001	1.28
Jun 2020	151	57.7	0.4	401	126.8	32.1	27	417.6	16.01	<0.001	1.27
Jul 2020	75	51.5	−2.1	392	127.4	20.8	16	290.1	15.87	<0.001	1.22
Aug 2020	20	42.9	−12.9	317	113.2	18.1	6	20.1	8.82	<0.001	1.38
Sep 2020 ^a	2	38.0	−18.5	310	103.9	12.9	1
Oct 2020				205	95.4	6.8
Nov 2020				193	88.4	4.3
Dec 2020				107	71.5	2.1
Jan 2021				71	58.1	−2.5
Feb 2021				24	57.3	−11.1
Mar 2021				22	40.0	−16.2
Overall	973	58.4	5.8	2,467	113.5	22.9		2,690.4	25.01	<0.001	0.98

Note: October data set (d) includes preprint posting dates 1 January–30 September 2020. The added data (D) is the difference between the two data sets, one collected in May 2021 and the other collected in October 2020. The Student's t test is used to compare publication delays (T_P) between the October data set (d) and added data (D). N corresponds to the number of analysed preprint‐journal article pairs. p is two‐sided. % Completeness = 100 × d/(d + D), where d and D are the respective N values.

^a Not enough papers to support the Student's t test.
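As a worked check of the completeness column, the sketch below recomputes the percentage from the N values of four rows of Table 1. The function name and the 20% missing-data flag are illustrative choices (the threshold is the one used later in the text); the counts are copied from the table.

```python
def completeness(d: int, D: int) -> float:
    """Share (in %) of the final data that was already present in October 2020."""
    return 100 * d / (d + D)


# (N in October data set, N in added data) for selected months of Table 1.
rows = {
    "Jan 2020": (14, 3),
    "Feb 2020": (70, 36),
    "Jun 2020": (151, 401),
    "Aug 2020": (20, 317),
}

for month, (d, D) in rows.items():
    share = completeness(d, D)
    # Flag months where more than 20% of the data was missing in October 2020.
    flag = "incomplete" if 100 - share > 20 else "near complete"
    print(f"{month}: {share:.0f}% ({flag})")
```

For these rows the rounded values are 82, 66, 27 and 6, matching the table.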

The most intriguing feature of the data plot in Fig. 1 is its convex shape. Taking into account that the most recent publication delays are subject to data incompleteness, we focused on publication delays during the early pandemic. We propose that their low values can be explained through a phenomenon related to what we will refer to herein as early bird manuscripts. Those are fast‐tracked articles published within 2 months of their submission date (T_P < 60 days) and in the journal of their first choice (median t_α = 3 days). Despite being inherent entities of the publishing landscape, early bird manuscripts represent only a small fraction of the total published works; for example, about 24% of manuscripts in our May 2021 data set could be considered early birds. We found that the mean Altmetric Attention Score for early bird manuscripts related to the SARS‐CoV‐2 research is nearly three times that for the remainder of manuscripts on the same topic (393.4 vs. 145.3, respectively). The latter explains our observation of negative pre‐submission delays for early bird manuscripts (Fig. 1), an indication that their authors prefer posting them on preprint servers after their submission to journals, likely in an attempt to avoid scooping (Anderson, 2019); for example, among the most recently posted preprints (1 February–31 March 2021), 51% have t_α < 0 as compared with 24% of preprints posted during the 1 April–31 December 2020 period. The practice of promoting the visibility of important research findings by publishing them rapidly is certainly not new and has been exploited by a number of medical (CMAJ, 2001; Ghali & Cornuz, 2000; Goldbeck‐Wood & Robinson, 1999; Kassirer & Angell, 1997; McNamee & Horton, 1997; Winker & Fontanarosa, 1999) and multidisciplinary high‐profile journals, among which are Science (Science, 2021) and Nature (AOP Nat Immunol, 2001; AOP Nat Med, 2002a; AOP Nat Struct Biol, 2002b), as early as 1997. For example, prior to the coronavirus pandemic, selected articles submitted to JAMA were published in 10 to 12 days (Bauchner et al., 2020), and a study identifying the cellular receptor for anthrax toxin submitted to Nature was reviewed in 19 days and posted on the Web in 13 days (Bradley et al., 2001). During the early months of the coronavirus pandemic, these pre‐existing fast‐track publication practices proved extremely useful in disseminating the critical knowledge on what was in January simply known as the ‘novel coronavirus’. The subtle presence of early bird manuscripts in the publishing landscape becomes most apparent either at the outset of a newly emerging event or in the most recent data; the latter explains the shortening of publication delays as data analysis approaches the most recent dates (Scheme 2). The temporal bias that early bird manuscripts introduce into publication data, observed in shorter than usual publication delays, is normally continuously compensated by new manuscripts that took a long time to be published. 
However, the sudden onset of the coronavirus pandemic meant that prior to 1 January 2020, no COVID‐19 publications existed and early bird manuscripts became then the dominant species of the publishing landscape; indeed, the number of COVID‐19 manuscripts reviewed in less than a week was 59% in April 2020 (Palayew et al., 2020), but only 4% by the end of 2020. Along this line of thought, we suggest that short publication delays for COVID‐19 manuscripts during the early pandemic are primarily the result of the inadvertent bias introduced by early bird manuscripts, namely the early bird effect (Scheme 2). The latter is a permanent feature of the publishing landscape as early bird COVID‐19 manuscripts will forever remain the only publication entities during the early months of the pandemic.
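A toy model can make the mechanism concrete. The sketch below is not the authors' analysis and uses no real data: it assumes one manuscript is submitted per day for each of five fixed delays (20 to 180 days, true mean 100 days), that nothing on the topic exists before day 0, and that data are collected on day 360. All names and numbers are hypothetical. Grouping by publication month shows the onset effect; grouping by submission month shows the matching effect in the most recent data.

```python
from collections import defaultdict

DELAYS = (20, 60, 100, 140, 180)  # assumed mix of publication delays, in days
SUBMISSION_DAYS = range(360)      # the topic emerges on day 0
COLLECTION_DAY = 360              # day on which the data set is collected


def observed_records(collection_day):
    """Only manuscripts already published by the collection day are visible."""
    return [
        (submitted, delay)
        for submitted in SUBMISSION_DAYS
        for delay in DELAYS
        if submitted + delay <= collection_day
    ]


def mean_delay_by_month(records, key):
    """Average delay per 30-day bucket, bucketing by the day returned by key."""
    buckets = defaultdict(list)
    for submitted, delay in records:
        buckets[key(submitted, delay) // 30].append(delay)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


records = observed_records(COLLECTION_DAY)
by_publication = mean_delay_by_month(records, key=lambda s, d: s + d)
by_submission = mean_delay_by_month(records, key=lambda s, d: s)

# Onset: early publication months contain only the fastest manuscripts.
print({m: by_publication[m] for m in (0, 2, 6)})
# Recent data: late submission months contain only the fastest manuscripts.
print({m: by_submission[m] for m in (0, 11)})
```

In this toy setting the mean delay among articles published in the first month is 20 days and only reaches the true mean of 100 days once the slowest manuscripts have had time to appear, even though the underlying delay mix never changes.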

SCHEME 2

The early bird effect. Data set 1 is collected earlier than data set 2.

The early bird effect is not unique to the coronavirus pandemic but, we believe, is associated with any new and ‘hot’ topic. As an example, we verified that Zika‐related manuscripts were published much faster during the onset of the Zika outbreak (mean T_P < 54 days during February–April 2016) as compared to the following 18 months of the Zika virus epidemic (66 days < mean T_P < 182 days) (Fig. 2).

FIGURE 2

Average publication delays (T_P) by month (N > 4) for a total of 482 peer‐reviewed Zika manuscripts (of which 20% had preprint analogs) published between 1 February 2016 and 31 August 2017.

Our study further suggests that the observed steady growth of publication delays following the early months of the coronavirus pandemic is not specifically related to a slowdown in scientific publishing; rather, it demonstrates a natural evolution of the publishing landscape upon the emergence of a new high‐impact topic. Over time, new arrivals, together with the initially present early bird manuscripts, form a complete publication data set capable of representing the diverse collection of journal articles. To evaluate the length of this stabilization period for the COVID‐19 publication data set, we referred to the data completeness calculated in Table 1. We found that in the October 2020 data set (Sevryugina & Dicks, 2021b), the last 9 months of data, February to October 2020, were subject to data incompleteness with the amount of missing data exceeding 20% (Table 1). Similarly, our May 2021 data set displays a reduction in publication delays starting in August 2020, 9 months prior to the data collection date on 4 May 2021 (Fig. 1). Taking into account the previous studies that showed that as the amount of missing data approaches 20%, the time‐dependent bias likely becomes inconsequential (Derrick et al., 2017; Schlomer et al., 2010), we trimmed off the last 9 months of publication data (1 August 2020 to 31 March 2021), which retained 68.5% of the initial May 2021 data set. The resulting data are time‐independent and bias‐free, as they now represent a collection of manuscripts with various publication experiences. The revised publication delays (Table 2 and Fig. 3) display only a moderate expediting in publishing COVID‐19 works as compared to the pre‐pandemic period. 
While the median peer‐review time for COVID‐19 manuscripts (t = 66 days) is shorter than the 100‐day benchmark established prior to the pandemic (Powell, 2016), it is considerably longer than that reported in April 2020 (t = 6 days; Kun, 2020; Palayew et al., 2020) or in September 2020 (t = 37 days; Sevryugina & Dicks, 2021b). Likewise, while the mean production stage for COVID‐19 manuscripts (t = 20.4 days) is 86% shorter than that of almost a decade ago (t = 146.6 days; Björk & Solomon, 2013), it is 40% longer than that in September 2020 (t = 14.6 days; Sevryugina & Dicks, 2021b) and more than twice as long as that in March 2020 (t = 9.3 days; Horbach, 2020). Our current data show that the median elapsed time for the entire transformation from a COVID‐19 preprint to a peer‐reviewed journal article was 109.5 days, which is 1.6 times as long as the 68 days found in October 2020 (Fraser et al., 2021) and almost five times as long as the 22.5 days reported in April 2020 (Fraser et al., 2020), but considerably shorter than the 150 days reported for Zika‐ or Ebola‐related preprints (Johansson et al., 2018). Clearly, biomedical journal publishers succeeded in expediting the publishing of COVID‐19 works, but the magnitude of this success was rather mild.
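Several of the relative comparisons in this paragraph follow from simple arithmetic on the reported delays. The sketch below recomputes a few of them; the helper names `pct_change` and `ratio` are hypothetical, and the convention used (change expressed relative to the earlier value) is an assumption about how such figures are usually derived, not a statement of the authors' procedure.

```python
def pct_change(new, old):
    """Percent change of `new` relative to `old`; negative values mean a shorter delay."""
    return (new - old) / old * 100

def ratio(new, old):
    """How many times as long `new` is compared with `old`."""
    return new / old

production_mean = 20.4   # mean production delay, days (Table 2)
elapsed_median = 109.5   # median elapsed time, days (Table 2)

# Production stage versus earlier reports cited in the text
print(round(pct_change(production_mean, 146.6), 1))  # vs. Björk & Solomon, 2013
print(round(pct_change(production_mean, 14.6), 1))   # vs. September 2020

# Preprint-to-article elapsed time versus earlier reports cited in the text
print(round(ratio(elapsed_median, 68), 2))    # vs. October 2020
print(round(ratio(elapsed_median, 22.5), 2))  # vs. April 2020
```

Stating the convention explicitly matters here, because "faster" and "slower" percentages differ depending on whether the old or the new value is used as the denominator.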

TABLE 2

Revised publication delays (in days) for COVID‐19 manuscripts and their preprint versions

Publication delays	Symbol	Mean	SD	Median	IQR	Mode	N
Pre‐submission	t_α	22.6	43.9	7	37	0	2,367
Review	t_R	79.3	59.2	66	79	15	2,312
Production	t_β	20.4	20.1	15	21	4	2,338
Elapsed time	T_Σ	120.3	77.2	109.5	116	42 and 51	2,760

Note: Descriptive statistics reflect the preprint posting period 1 January–31 July 2020.
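For readers who wish to compute the same kind of summary for their own delay data, the columns of Table 2 can be approximated with Python's standard `statistics` module. This is a minimal sketch, not the authors' procedure: the function name `summarize` and the sample values are hypothetical, and the choice of the sample standard deviation and the "inclusive" quartile method are assumptions, since the text does not state which conventions were used.

```python
import statistics

def summarize(delays):
    """Mean, sample SD, median, IQR, mode(s), and N for a list of delays in days."""
    # Quartiles via the inclusive method (an assumption; other methods give slightly different IQRs).
    q1, _, q3 = statistics.quantiles(delays, n=4, method="inclusive")
    return {
        "mean": statistics.mean(delays),
        "sd": statistics.stdev(delays),           # sample standard deviation
        "median": statistics.median(delays),
        "iqr": q3 - q1,
        "modes": statistics.multimode(delays),    # returns every most-common value
        "n": len(delays),
    }

# Made-up delays in days, for illustration only.
sample = [4, 4, 10, 15, 21, 30, 56]
stats = summarize(sample)
print(stats)
```

`multimode` is used rather than `mode` so that a distribution with two equally frequent values, such as the elapsed-time row of Table 2, reports both.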

FIGURE 3

Revised publication delays (in days) for COVID‐19 manuscripts and their preprint versions. Box plot displays data for the preprint posting period 1 January–31 July 2020.


CONCLUSION

In summary, we observed that publication delays for COVID‐19 manuscripts were indeed shortened, thanks to various initiatives launched a few months into the pandemic, but this advance was only incremental. We showed that the alarmingly short publication delays reported during the early pandemic are a result of the early bird effect, an unrecognized bias introduced into publication data by the fast‐tracked manuscripts that dominated the publishing landscape during the first 2 months of the pandemic. These delays, however, are in no way representative of the whole period of the coronavirus pandemic and do not reflect the publication delays of the entire collection of published COVID‐19 research articles. Our revised publication delays present no reason for concern regarding the deterioration of scientific literature, at least based on how fast the COVID‐19 works were published. In fact, we believe that the scientific community successfully withstood yet another test by skillfully employing the previously developed tools of scholarly communication and data sharing. Finally, we urge scientometrics researchers to consider the early bird effect and data completeness when analysing publication data.

AUTHOR CONTRIBUTIONS

YS conceived the project and wrote the article; YS and AD developed the methodology, performed searches, study assessments, data extraction, and created the images.

Appendix S1. Supporting Information.

33 in total

1. Best practices for missing data management in counseling psychology.

Authors: Gabriel L Schlomer; Sheri Bauman; Noel A Card
Journal: J Couns Psychol Date: 2010-01

2. Will the pandemic permanently alter scientific publishing?

Authors: Ewen Callaway
Journal: Nature Date: 2020-06 Impact factor: 49.962

3. "Questionable" peer review in the publishing pandemic during the time of COVID-19: implications for policy makers and stakeholders.

Authors: Francesco Chirico; Jaime A Teixeira da Silva; Nicola Magnavita
Journal: Croat Med J Date: 2020-07-05 Impact factor: 1.351

4. Preprint servers: a 'rush to publish' or 'just in time delivery' for science?

Authors: Alan Robert Smyth; Claire Rawlinson; Gisli Jenkins
Journal: Thorax Date: 2020-04-20 Impact factor: 9.139

5. Tracking the popularity and outcomes of all bioRxiv preprints.

Authors: Richard J Abdill; Ran Blekhman
Journal: Elife Date: 2019-04-24 Impact factor: 8.140

6. Covid-19 - Navigating the Uncharted.

Authors: Anthony S Fauci; H Clifford Lane; Robert R Redfield
Journal: N Engl J Med Date: 2020-02-28 Impact factor: 91.245

7. Day-to-day discovery of preprint-publication links.

Authors: Guillaume Cabanac; Theodora Oikonomidi; Isabelle Boutron
Journal: Scientometrics Date: 2021-04-18 Impact factor: 3.238

8. Non-steroidal anti-inflammatory drug use and outcomes of COVID-19 in the ISARIC Clinical Characterisation Protocol UK cohort: a matched, prospective cohort study.

Authors: Thomas M Drake; Cameron J Fairfield; Riinu Pius; Stephen R Knight; Lisa Norman; Michelle Girvan; Hayley E Hardwick; Annemarie B Docherty; Ryan S Thwaites; Peter J M Openshaw; J Kenneth Baillie; Ewen M Harrison; Malcolm G Semple
Journal: Lancet Rheumatol Date: 2021-05-07

9. Searching PubMed to Retrieve Publications on the COVID-19 Pandemic: Comparative Analysis of Search Strings.

Authors: Jeffrey V Lazarus; Adam Palayew; Lauge Neimann Rasmussen; Tue Helms Andersen; Joey Nicholson; Ole Norgaard
Journal: J Med Internet Res Date: 2020-11-26 Impact factor: 5.428

10. COVID-19 research: pandemic versus "paperdemic", integrity, values and risks of the "speed science".

Authors: Ricardo Jorge Dinis-Oliveira
Journal: Forensic Sci Res Date: 2020-06-10