Michael Gordon, Michael Bishop, Yiling Chen, Anna Dreber, Brandon Goldfedder, Felix Holzmeister, Magnus Johannesson, Yang Liu, Louisa Tran, Charles Twardy, Juntao Wang, Thomas Pfeiffer.
Abstract
Many publications on COVID-19 were released on preprint servers such as medRxiv and bioRxiv. It is unknown how reliable these preprints are, and which ones will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric scores. Most of these preprints were published within 1 year of upload on a preprint server (70%), with a considerable fraction (45%) appearing in a high-impact journal with a journal impact factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict whether preprints will be published after 1 year and whether the publishing journal has high impact. Forecasts are also informative with respect to Google Scholar citations within 1 year of upload on a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than traditional peer review, it remains to be investigated whether such an assessment is suited to identifying methodological problems in preprints.
Keywords: forecasting; preprinting; science policy
Year: 2022 PMID: 36177198 PMCID: PMC9515639 DOI: 10.1098/rsos.220440
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 3.653
Figure 1. Forecasts of publication outcomes. (a) The fraction of published preprints depending on the mean forecasted probability. Of the preprints with the lowest forecasted probability of being published within 1 year (lowest quantile), about 30% were published. For the highest quantile, this fraction is about 60%. Publication outcomes are correlated with mean forecasts (r = 0.23, t(245) = 3.69, p < 0.001). Preprints for which the publication status was known during the forecasting period were excluded from this figure. (b) The fraction of preprints published in high-impact journals depending on the corresponding forecast. Of the preprints with the lowest forecasted probability (lowest quantile) of being published in a high-impact journal, only about 15% were published in a high-impact journal. For the highest quantile, this fraction is over 60%. High-impact publication is correlated with predicted high-impact publication (r = 0.38, t(122) = 4.58, p < 0.001). Unpublished preprints and preprints for which the publication status was known during the forecasting period were excluded from this figure.
Figure 2. Actual citation ranks versus forecasted citation ranks. This plot shows the relationship between mean forecasted citation ranks and actual citation ranks. The colour of the markers indicates the publication status (irrespective of the impact factor of the journal), with blue markers indicating published preprints and red markers indicating preprints not published within 1 year. While the mean forecasts are highly correlated with realized citation ranks (r = 0.75, t(398) = 22.36, p < 0.001), the aggregated forecasts are not extreme enough, with few preprints forecasted to be ranked below 25 or above 75.
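The t statistics quoted in the captions of Figures 1 and 2 can be related to the correlation coefficients with the standard test of a Pearson correlation against zero, t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2. The record does not state how the authors computed these values, so the following is only a minimal sketch under that assumption; the function name `t_from_r` is hypothetical.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """t statistic for testing a Pearson correlation r against zero.

    Assumes the standard relation t = r * sqrt(df) / sqrt(1 - r^2),
    where df = n - 2 is the degrees of freedom.
    """
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# (r, df, reported t) as quoted in the captions of Figures 1 and 2.
reported = [(0.23, 245, 3.69), (0.38, 122, 4.58), (0.75, 398, 22.36)]

for r, df, t_reported in reported:
    t = t_from_r(r, df)
    print(f"r = {r:.2f}, df = {df}: t from rounded r = {t:.2f}, reported t = {t_reported}")
```

Because the captions give r to only two decimal places, the recomputed values match the reported ones only approximately; small differences are what rounding of r would be expected to produce.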
Pairwise Pearson correlation coefficients of survey questions and outcomes. The order of questions 3 and 4 was randomized so that some participants always saw question 3 first and some always saw question 4 first. We found evidence for order effects for question 3, and so report question 3 both for all answers (labelled 'Q3: agreement with other papers' in the table) and split by whether it was asked first or second.
| | Q1: not published | Q1: published (IF < 10) | Q1: published (IF > 10) | combined published forecast (any IF) | Q2: cite rank | Q3: agreement with other papers | Q3: agreement with other papers (asked first) | Q3: agreement with other papers (asked second) | Q4: helpful | publication outcome | published (IF > 10) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1: published (IF < 10) | –0.71 | | | | | | | | | | |
| Q1: published (IF > 10) | –0.73 | 0.05 | | | | | | | | | |
| combined published forecast (any IF) | –1.00 | 0.71 | 0.73 | | | | | | | | |
| Q2: cite rank | –0.78 | 0.44 | 0.68 | 0.78 | | | | | | | |
| Q3: agreement with other papers | –0.64 | 0.50 | 0.43 | 0.64 | 0.52 | | | | | | |
| Q3: agreement with other papers (asked first) | –0.53 | 0.38 | 0.38 | 0.53 | 0.50 | 0.90 | | | | | |
| Q3: agreement with other papers (asked second) | –0.56 | 0.45 | 0.35 | 0.56 | 0.35 | 0.76 | 0.46 | | | | |
| Q4: helpful | –0.70 | 0.36 | 0.65 | 0.70 | 0.69 | 0.62 | 0.56 | 0.48 | | | |
| publication outcome | –0.23 | 0.21 | 0.13 | 0.23 | 0.25 | 0.36 | 0.32 | 0.33 | 0.23 | | |
| published (IF > 10) | –0.23 | 0.02 | 0.31 | 0.23 | 0.41 | 0.30 | 0.29 | 0.23 | 0.33 | 0.45 | |
| actual cite rank | –0.54 | 0.31 | 0.47 | 0.54 | 0.75 | 0.43 | 0.43 | 0.27 | 0.50 | 0.41 | 0.54 |
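
A lower-triangular table of pairwise Pearson coefficients like the one above can be built as in the following minimal sketch. The data are invented toy values, not the study's forecasts, and the names `pearson`, `lower_triangle` and the column keys are hypothetical. The toy "published" column is defined as one minus the "not published" column, which presumably is why the table reports –1.00 for the corresponding pair.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(columns):
    """Map (row name, column name) to r for every pair below the diagonal."""
    names = list(columns)
    return {
        (names[i], names[j]): pearson(columns[names[i]], columns[names[j]])
        for i in range(len(names))
        for j in range(i)
    }


# Hypothetical toy data: one value per preprint, NOT taken from the study.
toy = {
    "not_published": [0.6, 0.3, 0.1, 0.2],
    "published_any_if": [0.4, 0.7, 0.9, 0.8],  # equals 1 - not_published
    "cite_rank": [20.0, 55.0, 80.0, 60.0],
}

table = lower_triangle(toy)
for (row, col), r in table.items():
    print(f"{row} vs {col}: r = {r:.2f}")
```

Only the cells below the diagonal are computed, since the matrix is symmetric and the diagonal is trivially 1.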