The currency and completeness of specialized databases of COVID-19 publications.

Robyn Butcher, Margaret Sampson, Rachel J Couban, James Edward Malin, Sara Loree, Stacy Brody.

Abstract

OBJECTIVE: Several specialized collections of COVID-19 literature have been developed during the global health emergency. These include the WHO COVID-19 Global Literature Database, Cochrane COVID-19 Study Register, CAMARADES COVID-19 SOLES, Epistemonikos' COVID-19 L-OVE, and LitCovid. Our objective was to evaluate the completeness of these collections and to measure the time from when COVID-19 articles are posted to when they appear in the collections.
STUDY DESIGN AND SETTING: We tested each selected collection for the presence of 440 included studies from 25 COVID-19 systematic reviews. We sampled 112 journals and prospectively monitored their websites until a new COVID-19 article appeared. We then monitored for 2 weeks to see when the new articles appeared in each collection. PubMed served as a comparator.
RESULTS: Every collection provided at least one record not found in PubMed. Four records (1%) were not in any of the sources studied. Collections contained between 83% and 93% of the primary studies, with Epistemonikos' COVID-19 L-OVE being the most complete. By 2 weeks, between 60% and 78% of tracked articles had appeared.
CONCLUSION: Our findings support the use of the best performing COVID-19 collections by systematic reviews to replace paywalled databases.
Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.

Keywords:  COVID-19 publication; Completeness; Currency; Database; Evaluation; PubMed

Year:  2022        PMID: 35341949      PMCID: PMC8942908          DOI: 10.1016/j.jclinepi.2022.03.006

Source DB:  PubMed          Journal:  J Clin Epidemiol        ISSN: 0895-4356            Impact factor:   7.407


Multiple specialized collections of COVID-19 literature have been developed during the global health emergency. Five of them (the WHO COVID-19 Global Literature Database, Cochrane COVID-19 Study Register, CAMARADES COVID-19 SOLES, Epistemonikos’ COVID-19 L-OVE, and LitCovid) were evaluated for both completeness and currency, using PubMed as a comparator. The majority of early COVID-19 systematic reviews relied on traditional literature databases, suggesting a lack of awareness of, or comfort with, COVID-19 collections. Per our analysis, most of the COVID-19 collections are comparable to PubMed in both completeness and currency. This study suggests that COVID-19 collections could replace or at least supplement traditional paywalled databases in systematic reviews. Those seeking COVID-19 information should search the Epistemonikos and WHO collections.

Introduction

COVID-19 became the dominant subject of the scientific and medical literature in 2020. Finding the latest research has become a time-consuming task for decision-makers and stakeholders as the volume of literature continues to grow. To address this problem, several groups have developed specialized collections of COVID-19 literature. These include the WHO COVID-19 Global literature on coronavirus disease, The Cochrane Collaboration’s COVID-19 Study Register, CAMARADES COVID-19 SOLES, Epistemonikos’ COVID-19 L-OVE, and the United States National Library of Medicine’s LitCovid. Every collection is curated, and the inclusion criteria differ between the databases. However, the purpose of all collections is to create centralized repositories of COVID-19 literature for simplified access and discovery. Each collection has a different methodology for identifying material to include. All involve regular human and/or machine searching and screening of bibliographic databases such as PubMed and Embase, as well as preprint sites, journal publishers’ websites, and in some cases, other collections such as the CDC and WHO COVID-19 websites. Except for LitCovid, all include preprints. All the collections describe their methodology on their websites. Those undertaking evidence synthesis projects can search any of these specialized collections along with standard methods such as searching bibliographic databases and study registries, checking reference lists, and contacting researchers. Alternatively, if their currency and completeness could be demonstrated, a COVID-19 collection could be a single source for rapid evidence synthesis. Our research question asks, “how current and how complete are the COVID-19 collections?”

Objectives

To evaluate the completeness of the collections. To measure the time from when COVID-19 articles are posted by journals on the journal website to when they are retrievable from the specialized collections.

Methods

This study is divided into two parts: currency and completeness.

Selection of COVID-19 collections for study

The following collections were selected for study: WHO COVID-19 Global Literature Database (WHO), CAMARADES COVID-19 SOLES (SOLES), Cochrane COVID-19 Study Register (Cochrane), United States National Library of Medicine’s LitCovid (LitCovid), and Epistemonikos’ COVID-19 L-OVE (Epistemonikos). To be selected, a collection had to be established and functional as of April 2020, when we started the study, have COVID-19 as its primary focus, and offer at least a basic search function. Living Systematic Reviews or datasets that only allowed browsing were excluded. PubMed was also searched as a comparator.

Completeness

Our methods are based on established methods for studying database completeness [[1], [2], [3], [4], [5]] and the methodology used in a recent Cochrane COVID-19 Study Register Sensitivity Evaluation [6]. We used the included studies from systematic reviews that met the eligibility criteria described below until we had gathered a sample of 500 references. These references were then searched in each of the COVID-19 specialized collections under study to determine how many were contained in each collection. The systematic reviews were selected from the Epistemonikos’ COVID-19 L-OVE. Any study classified as a systematic review by Epistemonikos was eligible if it otherwise met the criteria, regardless of the terminology used by the authors (eg, meta-analyses, qualitative reviews, or rapid reviews), and regardless of the language of publication. We sought a mix of interventional, diagnostic, prediction, and observational reviews as described in the selection process below. To be eligible, systematic reviews must have reported searching at least four databases. A minimum of five included studies from each review must have been published or posted in 2020. For reviews with fewer than five included studies, all must have been published or posted in 2020. Note that we completed this portion of the study in 2020.

The selection process for systematic reviews

Systematic reviews were considered in sequence as they were listed in each of the five Epistemonikos categories (Prevention/Treatment, Diagnosis, Etiology, Epidemiology, and Prognosis). Once an eligible review was identified, the count of its eligible included primary studies was made. A review was then selected from the next category. This continued until 500 primary studies were identified. Five reviewers (R.B., M.S., R.C., J.M., S.L.) screened reviews and extracted primary studies. If ineligible, the main exclusion reason was noted. In all cases, the sources that were searched and the number of studies included in the review were recorded. Once 100 primary studies were identified from each category, the sets were examined for duplicate primary studies.

Sample of primary studies

Based on the Epistemonikos’ COVID-19 L-OVE, we estimated that there were 8,778 primary studies relevant to COVID-19 as of July 24, 2020. A 5% sample would have been approximately 440 studies. We screened systematic reviews until we identified 500 primary studies with the expectation that there would be duplicate studies among the systematic reviews.

Eligibility criteria for primary studies

Studies must have been cited as an included study in one of the systematic reviews selected. Studies must have been primary studies but could be in published articles, preprints, conference abstracts, or trial registrations. No restrictions were imposed based on the language of publication. Publications without primary data were excluded. This included protocols or trial registrations that did not report results. Smartphone applications or datasets that were included in systematic reviews were excluded. If a systematic review included more than 50 primary studies, the first 50, as they appeared in the reference list, were selected.

Determination of inclusion of primary studies of interest in the COVID-19 collections

Primary studies were assigned to searchers who searched for each of the studies in each specialized COVID-19 collection and recorded its presence or absence. Searchers were drawn from the registered searchers of the Librarian Reserve Corps [7] and are named in section 6.0. Searchers could search using the DOI, a search string composed of author and title fragments, or any other combination of features they judged suitable for the study and collection being searched. The searchers could make more than one attempt to retrieve an item, until satisfied that it was or was not present in the collection. If a study was found to be present in a collection, the date it was added to the collection was recorded when available. This work occurred in December 2020. See protocol section “Determination of inclusion in COVID-19 collection” for more detail.

Outcome measures

The primary outcome measure was completeness, defined as the proportion of relevant primary studies found in a collection. The denominator was adjusted for the Cochrane collection as its inclusion criteria were narrower than those of the other collections. Each record not found in the Cochrane collection was assessed by one investigator (RB) and reviewed by a second investigator (MS) to determine if it was in the scope of that collection. The inclusion status of each selected study was shared with the database creators to allow them to audit any missed studies as a quality control measure.
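As an illustration of the completeness measure with a collection-specific denominator, the following is a minimal sketch, not the authors' analysis code. The class and function names are hypothetical; the counts are those reported in Table 3, where the Cochrane denominator is reduced to the records judged to be in scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionResult:
    name: str
    in_scope: int  # denominator: records within the collection's scope
    found: int     # records located by the searchers


def completeness(result: CollectionResult) -> float:
    """Return the percentage of in-scope records found in the collection."""
    if result.in_scope <= 0:
        raise ValueError("in_scope must be positive")
    return 100.0 * result.found / result.in_scope


# Counts as reported in Table 3; Cochrane uses its adjusted denominator.
epistemonikos = CollectionResult("Epistemonikos", in_scope=440, found=410)
cochrane = CollectionResult("Cochrane", in_scope=407, found=358)

for result in (epistemonikos, cochrane):
    print(f"{result.name}: {completeness(result):.1f}%")
```

Keeping the denominator as part of each collection's record makes the scope adjustment explicit rather than hiding it in the calculation.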

Currency

We sought a sample of 50 newly appearing studies to track in the databases. One hundred and twelve journals were selected and tracked daily. Once a newly released article (preprint or final) relevant to the COVID-19 pandemic was identified on a journal website, tracking of that journal was halted. The identified article was then searched for in each of the collections each weekday until it appeared or until a 2-week observation period had ended. This process continued until we accrued 50 studies.

Journal selection

Journals were selected from a list created by searching Web of Science with default settings and the search string “COVID-19”. Web of Science, all segments, was chosen to ensure that all aspects of the COVID-19 pandemic, including economic and social aspects, were included. This search yielded 64,032 records on January 5, 2021. The Web of Science “analyze” feature was used to obtain a list of journals sorted by the number of publications. We retained journals that, when sorted by the number of publications, accounted for two-thirds of the total number of publications. This equated to a cut point of 16 or more articles. Note that we included all journals with 16 or more articles, although this put us slightly over the two-thirds mark (66.98%). To select 50 journals from this set, we selected every 11th journal, sorted by the number of articles retrieved from that journal. We added a margin of 10 journals (20%) in anticipation that some of these journals would not publish another COVID-19 article within 30 days. When this did not yield enough studies, we drew a second sample of 50 journals, starting at the seventh journal in the original list and selecting every 11th from that point (Appendix 1).
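The journal selection can be sketched in two steps: find the cut point at which the most productive journals cover two-thirds of the records (keeping ties, which is why the retained set may land slightly above the target), then take every 11th journal. This is a sketch under stated assumptions: the function names and toy data are hypothetical, and starting the first sample at the first journal in the list is an assumption, since the text specifies only the starting point of the second sample.

```python
from typing import Sequence


def retain_top_share(counts: dict[str, int], share: float = 2 / 3) -> list[str]:
    """Keep the most productive journals until they cover `share` of all records.

    Journals tied with the cut-point count are kept as well.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    target = share * sum(counts.values())
    running = 0
    cut_point = None
    for _, n in ranked:
        running += n
        if running >= target:
            cut_point = n
            break
    if cut_point is None:  # empty input
        return []
    return [name for name, n in ranked if n >= cut_point]


def systematic_sample(items: Sequence[str], step: int, start: int = 0) -> list[str]:
    """Take every `step`-th item, beginning at zero-based index `start`."""
    return list(items[start::step])


# Hypothetical data for illustration only.
toy_counts = {"A": 40, "B": 30, "C": 16, "D": 16, "E": 5, "F": 3}
retained = retain_top_share(toy_counts)  # D is kept because it ties with C

journals = [f"J{i}" for i in range(1, 26)]
first_sample = systematic_sample(journals, step=11)            # assumed start
second_sample = systematic_sample(journals, step=11, start=6)  # seventh journal
```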

Article identification

The selected journals were divided into sets of six journals and distributed to 10 searchers. Five searchers similarly tracked the second sample following the end of the first observation period. This work occurred from January to April 2021.

Article eligibility criteria

To be eligible, an article had to deal with COVID-19 or a topic relevant to the COVID-19 pandemic response, such as personal protective equipment, and present the results of a primary study. The article had to be either a preprint in peer review or an article accepted by the journal, whether in manuscript or published form, and had to have a DOI. Protocols, editorials, commentaries, discussion papers, and guidance articles that do not present primary results were excluded. There were no restrictions based on the language of publication.

Appearance in COVID-19 collections

Once a relevant article was identified, the searcher stopped checking the journal site and began tracking that article daily (weekdays only) until its date of appearance in each of the collections or until the end of the observation period, 2 weeks from appearance. Beyond 2 weeks, the collection was deemed to be “not current” for that article; otherwise, the date of appearance was recorded.
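The rule for deciding whether a collection was current for an article can be sketched as follows. This is an illustrative sketch only: the function names and dates are hypothetical, the lag is counted in calendar days, and treating an appearance on day 14 itself as current is an assumption.

```python
from datetime import date
from typing import Optional

OBSERVATION_DAYS = 14  # the 2-week observation period described in the text


def lag_in_days(posted: date, appeared: Optional[date]) -> Optional[int]:
    """Calendar days from posting on the journal site to appearance in a collection.

    Returns None when the article never appeared during tracking.
    """
    if appeared is None:
        return None
    return (appeared - posted).days


def is_current(posted: date, appeared: Optional[date]) -> bool:
    """True when the article appeared within the observation period."""
    lag = lag_in_days(posted, appeared)
    return lag is not None and lag <= OBSERVATION_DAYS


# Hypothetical dates for illustration only.
posted = date(2021, 1, 11)
print(is_current(posted, date(2021, 1, 15)))  # appeared after 4 days
print(is_current(posted, date(2021, 1, 26)))  # appeared after 15 days
print(is_current(posted, None))               # never appeared while tracked
```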

Meta-data collected

Upon completion of tracking, articles were classified by two investigators (JM, MS), reaching a consensus on the dimensions of the population studied (human/animal/other), and for clinical studies, the study design and question type: therapy (prevention/treatment), diagnosis, etiology, epidemiology, prognosis or other.

Results

One hundred and one systematic reviews were examined to identify 25 eligible systematic reviews. Characteristics of the systematic reviews that were screened and selected are presented in Table 1. Sources searched in the reviews are shown in Table 2. PubMed was the most frequently searched source, used in 62 of the 101 systematic reviews examined. Of the specialized collections, WHO or CDC was searched in 13, and Epistemonikos was searched in 3. LitCovid, Cochrane, and SOLES were not searched by any of the systematic reviews examined. The inclusion rate of primary studies in each COVID-19 collection ranged from 93.2% for Epistemonikos to 83.4% for the SOLES collection (Table 3).
Table 1

Characteristics of systematic reviews screened

| Question type | Eligible (N = 25) | Ineligible (N = 76) | Total (N = 101) |
| --- | --- | --- | --- |
| Prevention/Treatment | 9 | 49 | 58 |
| Etiology | 5 | 10 | 15 |
| Diagnosis | 4 | 6 | 10 |
| Epidemiology | 4 | 4 | 8 |
| Prognosis | 3 | 7 | 10 |
| N of sources searched, median (Q1, Q3) | 5 (4, 6) | 3 (2, 5) | 4 (3, 5) |
| N of included studies, median (Q1, Q3) | 18 (9, 40) | 9 (0, 19) | 11 (2, 22) |

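Table 2

Sources searched by screened systematic reviews (N = 101)

| Category | Source | N of reviews |
| --- | --- | --- |
| Databases | PubMed | 62 |
| Databases | Embase | 45 |
| Databases | MEDLINE | 29 |
| Databases | Web of Science | 25 |
| Databases | Cochrane Library | 18 |
| Databases | WHO or CDC^a | 13 |
| Databases | China National Knowledge Infrastructure (CNKI) | 10 |
| Databases | Wan-fang database | 9 |
| Databases | CINAHL | 7 |
| Databases | CENTRAL | 6 |
| Databases | Medline/PubMed | 5 |
| Databases | Chinese Biomedical Literature Databases | 4 |
| Databases | Chinese Scientific Journal Database (VIP) | 4 |
| Databases | LILACS | 4 |
| Databases | Epistemonikos’ COVID-19 L-OVE | 3 |
| Databases | PsycINFO | 2 |
| Databases | PubMed Central | 2 |
| Databases | Academic Search Premier | 1 |
| Databases | China Academic Literature Database | 1 |
| Databases | Global Health | 1 |
| Databases | PEDro | 1 |
| Databases | SciELO | 1 |
| Databases | Toxline | 1 |
| Publishers | Science Direct (Elsevier) | 5 |
| Publishers | Springer Nature | 1 |
| Publishers | Wiley Online Library | 1 |
| Search engines and platforms | Google Scholar | 29 |
| Search engines and platforms | Google | 2 |
| Search engines and platforms | OVID | 2 |
| Preprint servers | medRxiv | 14 |
| Preprint servers | bioRxiv | 4 |
| Preprint servers | Research Square | 1 |
| Preprint servers | preprints.org | 1 |
| Preprint servers | Unspecified preprint server | 1 |
| Trial registries | ClinicalTrials.gov | 6 |
| Trial registries | Chinese clinical trials registry | 1 |
| Trial registries | Cochrane COVID-19 Study Register | 1 |
| Trial registries | EudraCT | 1 |
| Trial registries | ISRCTN registry | 1 |
| Journals | BMJ | 1 |
| Journals | Cells | 1 |
| Journals | JAMA | 1 |
| Journals | Lancet | 1 |
| Journals | Nature | 1 |
| Journals | New England Journal of Medicine | 1 |
| Journals | Science | 1 |
| Sources not stated | | 11 |

^a It was not always clear if the agency’s COVID-19 database was searched or if it was a general search of the agency website.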

Table 3

Completeness–inclusion rate of primary studies in each COVID-19 collection

| Collection | Records in scope | Records found (N) | Records found (%) |
| --- | --- | --- | --- |
| Epistemonikos | 440 | 410 | 93.2 |
| WHO | 440 | 405 | 92.0 |
| LitCovid | 440 | 399 | 90.7 |
| PubMed | 440 | 385 | 87.5 |
| SOLES | 440 | 367 | 83.4 |
| Cochrane COVID-19 study register | 407 | 358 | 88.0 |

Four publications were not found in any of the collections.

Following the removal of duplicates, 440 primary studies were searched to determine their inclusion status in each of the COVID-19 collections. The study flow is illustrated in Figure 1.
Fig. 1

PRISMA flow diagram for identification of primary studies for completeness.

We examined all possible pairs of databases and looked at the overlap and unique publications. In any pairing, both databases contributed records not found in the other. This ranged from five records found in COVID-19 SOLES that were not found in WHO to 60 records found in Epistemonikos’ COVID-19 L-OVE that were not found in COVID-19 SOLES. Of greater practical interest is the gain from each database beyond what was available from PubMed (Table 4).
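The pairwise comparison amounts to taking set differences between the holdings of each pair of sources. The sketch below is illustrative only; the function name, source labels, and record identifiers are hypothetical and do not reproduce the study data.

```python
from itertools import permutations


def pairwise_unique_counts(holdings: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Map each ordered pair (a, b) to the number of records held by a but not by b."""
    return {
        (a, b): len(holdings[a] - holdings[b])
        for a, b in permutations(holdings, 2)
    }


# Hypothetical holdings for illustration only.
holdings = {
    "CollectionX": {"r1", "r2", "r3"},
    "CollectionY": {"r2", "r3", "r4"},
    "PubMed": {"r1", "r2"},
}

counts = pairwise_unique_counts(holdings)

# Gain beyond PubMed: records a collection holds that PubMed does not.
gain_x = counts[("CollectionX", "PubMed")]
gain_y = counts[("CollectionY", "PubMed")]
```

Using ordered pairs keeps both directions of each comparison, which matters because each source in a pair can contribute records the other lacks.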
Table 4

Records found beyond those available through PubMed

CollectionEpistemonikosWHOLitCovidSOLESCochrane
Epistemonikos464141
WHO740412
LitCovid2929192510
SOLES6715346
Cochrane191811624
Illustrated are the absolute and relative yields of publications of interest that were not included in PubMed. The diagonal represents the number of records found in that database and not found in PubMed. Other cells in the table show the unique records in the column database relative to the database shown in the row. As an example, Table 4 shows that Epistemonikos has 46 records not found in PubMed. Seven of these were not found in the WHO database, and 29 were not found in LitCovid. Looking at the next column, the WHO had 40 records not found in PubMed. Of these, four were not found in Epistemonikos, 29 were not found in LitCovid, and so on.

Six publications of interest were found in only one of the collections studied. Epistemonikos had four of these unique publications, and Cochrane and PubMed had one unique publication each.

One hundred and twelve journals were monitored to yield eligible studies published by 50 journals within the monitoring period.

Characteristics of primary studies

Forty-four of the 50 studies (88%) were human studies, one (2%) involved both humans and animals, and five (10%) were laboratory, simulation, or health systems studies. Forty-one (82%) of the articles pertained to clinical studies. We classified those according to the type of question asked and the study design used. The largest group of clinical studies involved epidemiological questions (17 studies, 41.5% of clinical studies), followed by prevention or treatment (10, 24.4%) and etiology or harms (8, 19.5%) (Table 5).
Table 5

Question type of the studies used to determine the currency of the collections

| Question type | N | % |
| --- | --- | --- |
| Prevention or treatment | 10 | 24.4 |
| Epidemiology | 17 | 41.5 |
| Etiology/harms | 8 | 19.5 |
| Prognosis | 5 | 12.2 |
| Diagnosis | 1 | 2.4 |

See supplemental material for the study designs found in the sample of studies (Table A4). We considered a collection to be current for an article if that article appeared in the collection within 2 weeks of appearing on the journal website. Our a priori primary outcome was the proportion of in-scope tracked articles identifiable in a collection within 3 days of appearance at the journal site. The proportion available at 1, 5, 7, 10, and 14 days is also reported by collection (Table 6). For our primary outcome, PubMed had 27/50 articles (54%) at day 3, substantially more than any other collection examined. The WHO collection included 72% of our test articles within 2 weeks but accrued these articles more slowly than PubMed. The median time to the appearance of these articles was 5 days for the WHO collection. SOLES showed the slowest accrual, with a median of 8 days to appearance, and it contained only 60% of the target articles by the 2-week mark (Table 6).
Table 6

Currency and accrual rate of test articles tracked to appearance in each collection

| Outcome | Epistemonikos | WHO | LitCovid | SOLES | Cochrane | PubMed |
| --- | --- | --- | --- | --- | --- | --- |
| N current^a | 32/48 (67%) | 36/50 (72%) | 36/50 (72%) | 30/50 (60%) | 28/41 (68%) | 38/50 (76%) |
| Median lag in days^b [1st, 3rd quartile] | 4 [1, 7] | 5 [2, 6.5] | 4 [2, 7] | 8 [6.2, 9.8] | 6 [3, 8] | 2 [0, 4] |
| Number present after 1 day | 5 | 5 | 5 | 1 | 0 | 7 |
| Number present after 3 days^c | 16 | 15 | 16 | 2 | 9 | 27 |
| Number present after 5 days | 22 | 25 | 23 | 4 | 13 | 30 |
| Number present after 7 days | 24 | 33 | 28 | 12 | 18 | 35 |
| Number present after 10 days | 28 | 36 | 35 | 23 | 24 | 37 |
| Number present after 14 days | 32 | 39 | 36 | 30 | 28 | 38 |

^a Denominator corrected. Epistemonikos identified two of the target articles as “excluded.” Nine of the target articles were assessed as out of scope for the Cochrane collection by one investigator (RB) and verified by a second investigator (MS).

^b Calculated using articles that appeared within the 2-week monitoring period.

^c Proportion of in-scope tracked articles identifiable in a collection within 3 days of appearance at the journal site was our primary outcome for the currency portion.
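The accrual counts and median lag reported for each collection can be derived from a list of per-article lags. The following is a minimal sketch with hypothetical lag values, not the study data; the function names are illustrative, and the choice of the "inclusive" quartile method is an assumption, since the text does not state how quartiles were calculated.

```python
from statistics import quantiles

CHECKPOINTS = (1, 3, 5, 7, 10, 14)  # days at which accrual is reported


def accrual(lags: list[int], checkpoints: tuple[int, ...] = CHECKPOINTS) -> dict[int, int]:
    """Count the articles that had appeared by each checkpoint day."""
    return {day: sum(1 for lag in lags if lag <= day) for day in checkpoints}


def lag_summary(lags: list[int]) -> tuple[float, float, float]:
    """Return (median, first quartile, third quartile) of the observed lags.

    Only articles that appeared within the monitoring period should be passed in.
    """
    q1, q2, q3 = quantiles(lags, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical lags in days for eight tracked articles that appeared in time.
toy_lags = [0, 1, 2, 2, 4, 6, 9, 13]

present_by_day = accrual(toy_lags)
median_lag, first_quartile, third_quartile = lag_summary(toy_lags)
share_at_day_3 = present_by_day[3] / len(toy_lags)
```

Articles that never appeared are excluded from the lag summary but still belong in the denominator when reporting the proportion that is current.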

Discussion

Considering completeness, none of the specialized collections studied here included all publications of interest. Three of the six collections had unique records, but only six records were unique to one source, and only one source had more than one unique record. The contribution of each database beyond material found in PubMed is important. PubMed indexed 87.5% of publications of interest. It is the benchmark against which other collections must be measured when considering completeness. It has the best-developed search features, and thus, PubMed records may be much easier to retrieve through subject searching than those found in the specialized collections. We tested for the presence of records using known-item searching by skilled searchers drawn from the Librarian Reserve Corps. Presence in the collection does not assure that a record will be retrieved. Factors of interface features and searcher skill will limit retrieval success. Tests of retrieval through subject searching would be necessary to fully assess the functionality of these collections, and this is beyond the scope of this study. However, our searchers reported usability issues with some of the collections, so we tabulated their basic features (Appendix 3). Searching involves a trade-off between recall of relevant items and the number needed to read (NNR) to identify one relevant record. A search of PubMed may retrieve many records not relevant to the question at hand because of its breadth of coverage. Special collections have the benefit of very high precision and a low NNR. The ideal combination of sources depends on factors such as subject coverage, access to the source, and the skill of the searcher. All specialized collections studied here contained more than 80% of the publications of interest. Yet, except for PubMed, they were rarely, if ever, used as sources in early COVID-19 systematic reviews (Table 2). This may have been due to a lack of awareness or confidence in their completeness. 
Our findings support their use, and they could replace paywalled databases often used to conduct systematic reviews. All sources studied here are open access, while four of the five databases searched most often in our sample of systematic reviews, shown in Table 2, are available only through subscription. Special COVID-19 collections were not able to identify and include articles faster than PubMed. The next most complete collection at the 3-day point was LitCovid, a derivative of PubMed produced by the National Library of Medicine, followed by Epistemonikos, WHO, Cochrane, and finally SOLES. Measured at 2 weeks, WHO was current for the most articles, followed by PubMed, LitCovid, Epistemonikos, Cochrane, and SOLES, in that order. Comparing this to the results from the completeness portion of this study, where the order (from most complete to least complete) was Epistemonikos, WHO, LitCovid, PubMed, SOLES, and Cochrane, we see some differences in the order, but with strong performances by the first four databases in both currency and completeness. There have been other studies that evaluated the completeness and currency of special COVID-19 collections that looked at one or two of the collections but not all five. Pierre et al. examined the sensitivity of Cochrane and Epistemonikos [8]. They found similar results with 88% accuracy in Cochrane for RCTs, 82% for observational studies. Epistemonikos had 100% accuracy in both study types [8]. Verdugo-Paiva et al. only looked at Epistemonikos but evaluated both the comprehensiveness and currency, finding comprehensiveness of 100% and the currency of 96.4% [9]. These two studies used different methodologies than this one but came to similar conclusions about the usefulness of these special collections in conducting systematic reviews. All of these evaluations reflect performance at a point in time. As with any source, users must be alert to changes in coverage, indexing practices, and timeliness. 
Preprints became a very important means for COVID-19 researchers to disseminate findings [10]. The indexing of preprints is reflected in our completeness results to the extent that they were included in the systematic reviews sampled. While we studied the speed of indexing of articles rather than preprints for the currency portion, all collections except COVID-19 SOLES actively monitored preprint sites in the same manner as journal sites, so we expect that the speed of inclusion would be similar.

Limitations

This study has three main limitations. First, both the currency and completeness results represent a particular point in time. Collections may have changed their procedures and the resources allocated to collection maintenance since we took our measurements. For example, COVID-19 SOLES has not updated its database since October 2021. Second, several studies that were marked as not found in the databases were confirmed to be there by the developers of the collections after the fact. These discrepancies may be due to challenges some of the searchers had with the search interfaces/functions of the different databases. Finally, the completeness study is retrospective.

Conclusions

Open-access special collections are an excellent resource for those looking for comprehensive and up-to-date sources. Experienced searchers may prefer PubMed.