
The currency and completeness of specialized databases of COVID-19 publications.

Robyn Butcher¹, Margaret Sampson², Rachel J Couban³, James Edward Malin⁴, Sara Loree⁵, Stacy Brody⁶.

Abstract

OBJECTIVE: Several specialized collections of COVID-19 literature have been developed during the global health emergency. These include the WHO COVID-19 Global Literature Database, Cochrane COVID-19 Study Register, CAMARADES COVID-19 SOLES, Epistemonikos' COVID-19 L-OVE, and LitCovid. Our objective was to evaluate the completeness of these collections and to measure the time from when COVID-19 articles are posted to when they appear in the collections.
STUDY DESIGN AND SETTING: We tested each selected collection for the presence of 440 included studies from 25 COVID-19 systematic reviews. We sampled 112 journals and prospectively monitored their websites until a new COVID-19 article appeared. We then monitored for 2 weeks to see when the new articles appeared in each collection. PubMed served as a comparator.
RESULTS: Every collection provided at least one record not found in PubMed. Four records (1%) were not in any of the sources studied. Collections contained between 83% and 93% of the primary studies, with Epistemonikos being the most complete (93.2%), followed closely by the WHO database (92.0%). By 2 weeks, between 60% and 78% of tracked articles had appeared.
CONCLUSION: Our findings support the use of the best performing COVID-19 collections by systematic reviews to replace paywalled databases.


Keywords: COVID-19 publication; Completeness; Currency; Database; Evaluation; PubMed


Year: 2022 PMID: 35341949 PMCID: PMC8942908 DOI: 10.1016/j.jclinepi.2022.03.006

Source DB: PubMed Journal: J Clin Epidemiol ISSN: 0895-4356 Impact factor: 7.407

Multiple specialized collections of COVID-19 literature have been developed during the global health emergency. Five of them (the WHO COVID-19 Global Literature Database, Cochrane COVID-19 Study Register, CAMARADES COVID-19 SOLES, Epistemonikos’ COVID-19 L-OVE, and LitCovid) were evaluated for both completeness and currency, using PubMed as a comparator. The majority of early COVID-19 systematic reviews relied on traditional literature databases, suggesting a lack of awareness of, or comfort with, COVID-19 collections. Per our analysis, most of the COVID-19 collections are comparable to PubMed in both completeness and currency. This study suggests that COVID-19 collections could replace or at least supplement traditional paywalled databases in systematic reviews. Those seeking COVID-19 information should search the Epistemonikos and WHO collections.

Introduction

COVID-19 became the dominant subject of the scientific and medical literature in 2020. Finding the latest research has become a time-consuming task for decision-makers and stakeholders as the volume of literature continues to grow. To address this problem, several groups have developed specialized collections of COVID-19 literature. These include the WHO COVID-19 Global literature on coronavirus disease, The Cochrane Collaboration’s COVID-19 Study Register, CAMARADES COVID-19 SOLES, Epistemonikos’ COVID-19 L-OVE, and the United States National Library of Medicine’s LitCovid. Every collection is curated, and the inclusion criteria differ between the databases. However, the purpose of all collections is to create centralized repositories of COVID-19 literature for simplified access and discovery. Each collection has a different methodology for identifying material to include. All involve regular human and/or machine searching and screening of bibliographic databases such as PubMed and Embase, as well as preprint sites, journal publishers’ websites, and in some cases, other collections such as the CDC and WHO COVID-19 websites. Except for LitCovid, all include preprints. All the collections describe their methodology on their websites. Those undertaking evidence synthesis projects can search any of these specialized collections along with standard methods such as searching bibliographic databases, searching study registries, checking reference lists, and contacting researchers. Alternatively, if its currency and completeness could be demonstrated, a COVID-19 collection could serve as a single source for rapid evidence synthesis. Our research question is: “How current and how complete are the COVID-19 collections?”

Objectives

To evaluate the completeness of the collections. To measure the time from when COVID-19 articles are posted by journals on the journal website to when they are retrievable from the specialized collections.

Methods

This study is divided into two parts: currency and completeness.

Selection of COVID-19 collections for study

The following collections were selected for study: WHO COVID-19 Global Literature Database (WHO), CAMARADES COVID-19 SOLES (SOLES), Cochrane COVID-19 Study Register (Cochrane), United States National Library of Medicine’s LitCovid (LitCovid), and Epistemonikos’ COVID-19 L-OVE (Epistemonikos). To be selected, a collection had to be established and functional as of April 2020, when we started the study; COVID-19 had to be its primary focus; and it had to offer at least a basic search function. Living Systematic Reviews or datasets that only allowed browsing were excluded. PubMed was also searched as a comparator.

Completeness

Our methods are based on established methods for studying database completeness [1], [2], [3], [4], [5] and on the methodology used in a recent Cochrane COVID-19 Study Register Sensitivity Evaluation [6]. We used the included studies from systematic reviews that met the eligibility criteria described below until we had gathered a sample of 500 references. These references were then searched in each of the COVID-19 specialized collections under study to determine how many were contained in each collection. The systematic reviews were selected from the Epistemonikos’ COVID-19 L-OVE. Any study classified as a systematic review by Epistemonikos was eligible if it otherwise met the criteria, regardless of the terminology used by the authors (eg, meta-analyses, qualitative reviews, or rapid reviews) and regardless of the language of publication. We sought a mix of interventional, diagnostic, prediction, and observational reviews as described in the selection process below. To be eligible, systematic reviews must have reported searching at least four databases. A minimum of five included studies from each review must have been published or posted in 2020. For reviews with fewer than five included studies, all must have been published or posted in 2020. Note that we completed this portion of the study in 2020.

The selection process for systematic reviews

Systematic reviews were considered in sequence as they were listed in each of the five Epistemonikos categories (Prevention/Treatment, Diagnosis, Etiology, Epidemiology, and Prognosis). Once an eligible review was identified, the count of its eligible included primary studies was made. A review was then selected from the next category. This continued until 500 primary studies were identified. Five reviewers (R.B., M.S., R.C., J.M., S.L.) screened reviews and extracted primary studies. If ineligible, the main exclusion reason was noted. In all cases, the sources that were searched and the number of studies included in the review were recorded. Once 100 primary studies were identified from each category, the sets were examined for duplicate primary studies.

Sample of primary studies

Based on the Epistemonikos’ COVID-19 L-OVE, we estimated that there were 8,778 primary studies relevant to COVID-19 as of July 24, 2020. A 5% sample would have been 440 studies. We screened systematic reviews until we identified 500 primary studies with the expectation that there would be duplicate studies among the systematic reviews.

Eligibility criteria for primary studies

Studies must have been cited as an included study in one of the systematic reviews selected. Studies must have been primary studies but could be in published articles, preprints, conference abstracts, or trial registrations. No restrictions were imposed based on the language of publication. Publications without primary data were excluded. This included protocols or trial registrations that did not report results. Smartphone applications or datasets that were included in systematic reviews were excluded. If a systematic review included more than 50 primary studies, the first 50, as they appeared in the reference list, were selected.

Determination of inclusion of primary studies of interest in the COVID-19 collections

Primary studies were assigned to searchers, who searched for each of the studies in each specialized COVID-19 collection and recorded its presence or absence. Searchers were drawn from the registered searchers of the Librarian Reserve Corps [7] and are named in section 6.0. Searchers could search using the DOI, a search string composed of author and title fragments, or any other combination of features that they judged suitable for the study and collection being searched. The searchers could make more than one attempt to retrieve an item, until satisfied that it was or was not present in the collection. If a study was found to be present in a collection, the date it was added to that collection was recorded when available. This work occurred in December 2020. See protocol section “Determination of inclusion in COVID-19 collection” for more detail.

Outcome measures

The primary outcome measure was completeness, defined as the proportion of relevant primary studies found in a collection. The denominator was adjusted for the Cochrane collection as its inclusion criteria were narrower than those of the other collections. Each record not found in the Cochrane collection was assessed by one investigator (RB) and reviewed by a second investigator (MS) to determine if it was in the scope of that collection. The inclusion status of each selected study was shared with the database creators to allow them to audit any missed studies as a quality control measure.
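As a minimal sketch of the completeness calculation, assuming it is a simple found-over-in-scope percentage rounded to one decimal place: the function name and the rounding are illustrative assumptions, not the authors' code. The counts are those reported in Table 3 of this article.

```python
def completeness(found: int, in_scope: int) -> float:
    """Percentage of in-scope primary studies found in a collection.

    Hypothetical helper for illustration only.
    """
    if in_scope <= 0:
        raise ValueError("in_scope must be positive")
    if not 0 <= found <= in_scope:
        raise ValueError("found must be between 0 and in_scope")
    return round(100 * found / in_scope, 1)


# Counts as reported in Table 3 of this article.
print(completeness(410, 440))  # Epistemonikos, full denominator
print(completeness(358, 407))  # Cochrane, denominator adjusted for scope
print(completeness(358, 440))  # Cochrane, unadjusted, for comparison only
```

The last call shows why the adjustment matters: judging a narrower-scope collection against the full sample of 440 would understate its completeness for the material it intends to cover.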

Currency

We sought a sample of 50 newly appearing studies to track in the databases. One hundred and twelve journals were selected and tracked daily. Once a newly released article (preprint or final) relevant to the COVID-19 pandemic was identified on a journal website, tracking of that journal was halted. The identified article was then searched for in each of the collections each weekday until it appeared or until a 2-week observation period had ended. This process continued until we accrued 50 studies.

Journal selection

Journals were selected from a list created by searching Web of Science with default settings and the search string “COVID-19”. Web of Science, all segments, was chosen to ensure that all aspects of the COVID-19 pandemic, including economic and social aspects, were included. This search yielded 64,032 records on January 5, 2021. The Web of Science “analyze” feature was used to obtain a list of journals sorted by the number of publications. We retained journals that, when sorted by the number of publications, accounted for two-thirds of the total number of publications. This equated to a cut point of 16 or more articles. Note that we included all journals with 16 or more articles, although this put us slightly over the two-thirds mark (66.98%). To select 50 journals from this set, we selected every 11th journal, sorted by the number of articles retrieved from that journal. We added a margin of 10 journals (20%) in anticipation that some of these journals would not publish another COVID-19 article within 30 days. When this did not yield enough studies, we drew a second sample of 50 journals, starting at the seventh journal in the original list and selecting every 11th from that point (Appendix 1).
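The journal selection described above combines a cumulative-share cut-off with systematic (every k-th) sampling. The sketch below illustrates that logic on made-up data. The function names, the toy journal counts, and the 1-based `start` parameter are assumptions for illustration; the starting position of the first sample is not stated in the text.

```python
from typing import List, Sequence, Tuple

Journal = Tuple[str, int]  # (journal name, number of COVID-19 records)


def retain_top_share(journals: Sequence[Journal], share: float = 2 / 3) -> List[Journal]:
    """Keep the most productive journals that together reach `share` of all records.

    Every journal tied at the cut point is kept, so the retained share
    can end up slightly above the target.
    """
    ranked = sorted(journals, key=lambda j: j[1], reverse=True)
    total = sum(n for _, n in ranked)
    running = 0
    cut = 0
    for _, n in ranked:
        running += n
        cut = n
        if running >= share * total:
            break
    return [j for j in ranked if j[1] >= cut]


def every_kth(items: Sequence, k: int = 11, start: int = 1) -> List:
    """Systematic sample: take the item at 1-based position `start`, then every k-th."""
    return list(items[start - 1::k])


# Hypothetical toy data, not the Web of Science results.
toy = [("A", 50), ("B", 10), ("C", 10), ("D", 10), ("E", 10), ("F", 10)]
print(retain_top_share(toy))                    # ties at the cut point are all kept
print(every_kth(list(range(1, 101)), 11, 7))    # positions 7, 18, 29, ...
```

Drawing the second sample from a different starting position with the same step yields a set of journals that does not overlap the first, as long as the two starting positions differ by less than the step.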

Article identification

The selected journals were divided into sets of six journals and distributed to 10 searchers. Five searchers similarly tracked the second sample following the end of the first observation period. This work occurred from January to April 2021.

Article eligibility criteria

To be eligible, the article must deal with COVID-19 or a topic relevant to the COVID-19 pandemic response, such as personal protective equipment, and present the results of a primary study. The article must either be a preprint in peer review or an article accepted by the journal, whether in manuscript or published form, and must have a DOI. Protocols, editorials, commentaries, discussion papers, and guidance articles that do not present primary results were excluded. There were no restrictions based on the language of publication.

Appearance in COVID-19 collections

Once a relevant article was identified, the searcher stopped checking the journal site and began tracking that article daily (weekdays only) until its date of appearance in each of the collections or until the end of the observation period, 2 weeks from appearance. Beyond 2 weeks, the collection was deemed to be “not current” for that article; otherwise, the date of appearance was recorded.
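The per-article rule above can be sketched as a date comparison. This is an illustration under stated assumptions: the function names and example dates are hypothetical, the lag is counted in calendar days, and an article never seen in a collection during the observation period is represented by `None`.

```python
from datetime import date
from typing import Optional

OBSERVATION_DAYS = 14  # the 2-week observation period described in the text


def lag_days(on_journal_site: date, in_collection: Optional[date]) -> Optional[int]:
    """Calendar days from appearance on the journal site to appearance in a collection."""
    if in_collection is None:
        return None  # not seen in the collection while it was being tracked
    return (in_collection - on_journal_site).days


def is_current(on_journal_site: date, in_collection: Optional[date]) -> bool:
    """True if the collection held the article within the observation period."""
    lag = lag_days(on_journal_site, in_collection)
    return lag is not None and lag <= OBSERVATION_DAYS


# Hypothetical dates for illustration.
posted = date(2021, 2, 1)
print(lag_days(posted, date(2021, 2, 4)), is_current(posted, date(2021, 2, 4)))
print(is_current(posted, date(2021, 2, 16)))  # day 15, outside the window
print(is_current(posted, None))               # never appeared
```

Because collections were checked on weekdays only, a recorded appearance date may trail the true one by a day or two; this sketch does not attempt to model that.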

Meta-data collected

Upon completion of tracking, articles were classified by two investigators (JM, MS), reaching a consensus on the dimensions of the population studied (human/animal/other), and for clinical studies, the study design and question type: therapy (prevention/treatment), diagnosis, etiology, epidemiology, prognosis or other.

Results

One hundred and one systematic reviews were examined to identify 25 eligible systematic reviews. Characteristics of the systematic reviews that were screened and selected are presented in Table 1. Sources searched in the reviews are shown in Table 2. PubMed was the most frequently searched source, used in 62 of the 101 systematic reviews examined. Of the specialized collections, WHO or CDC was searched in 13, and Epistemonikos was searched in 3. LitCovid, Cochrane, and SOLES were not searched by any of the systematic reviews examined. The inclusion rate of primary studies in each COVID-19 collection ranged from 93.2% for Epistemonikos to 83.4% for the SOLES collection (Table 3).

Table 1

Characteristics of systematic reviews screened

Question type	Eligible N = 25	Ineligible N = 76	Total N = 101
Prevention/Treatment	9	49	58
Etiology	5	10	15
Diagnosis	4	6	10
Epidemiology	4	4	8
Prognosis	3	7	10
N of sources searched (Median, Q1, Q3)	5 (4, 6)	3 (2, 5)	4 (3, 5)
N of included studies (Median, Q1, Q3)	18 (9, 40)	9 (0, 19)	11 (2, 22)

Table 2

Sources searched by screened systematic reviews (N = 101)

Source	N
Databases
PubMed	62
Embase	45
MEDLINE	29
Web of Science	25
Cochrane Library	18
WHO or CDC (a)	13
China National Knowledge Infrastructure (CNKI)	10
Wan-fang database	9
CINAHL	7
CENTRAL	6
Medline/PubMed	5
Chinese Biomedical Literature Databases	4
Chinese Scientific Journal Database (VIP)	4
LILACS	4
Epistemonikos’ COVID-19 L-OVE	3
PsycINFO	2
PubMed Central	2
Academic Search Premier	1
China Academic Literature Database	1
Global Health	1
PEDro	1
SciELO	1
Toxline	1
Publishers
Science Direct (Elsevier)	5
Springer Nature	1
Wiley Online Library	1
Search engines and platforms
Google Scholar	29
Google	2
OVID	2
Preprint servers
medRxiv	14
bioRxiv	4
Research Square	1
preprints.org	1
Unspecified preprint server	1
Trial registries
Clinicaltrials.gov	6
Chinese clinical trials registry	1
Cochrane COVID-19 Study register	1
EudraCT	1
ISRCTN registry	1
Journals
BMJ	1
Cells	1
JAMA	1
Lancet	1
Nature	1
New England Journal of Medicine	1
Science	1
Sources not stated	11

(a) It was not always clear if the agency’s COVID-19 database was searched or if it was a general search of the agency website.

Table 3

Completeness–inclusion rate of primary studies in each COVID-19 collection

Collection	Records in scope	Records found (N)	Records found (%)
Epistemonikos	440	410	93.2
WHO	440	405	92.0
LitCovid	440	399	90.7
PubMed	440	385	87.5
SOLES	440	367	83.4
Cochrane COVID-19 study register	407	358	88.0

Four publications were not found in any of the collections.

Following the removal of duplicates, 440 primary studies were searched to determine their inclusion status in each of the COVID-19 collections. The study flow is illustrated in Figure 1.

Fig. 1

PRISMA flow diagram for identification of primary studies for completeness.

We examined all possible pairs of databases and looked at the overlap and unique publications. In any pairing, both databases contributed records not found in the other. This ranged from five records found in COVID-19 SOLES that were not found in WHO to 60 records found in Epistemonikos’ COVID-19 L-OVE that were not found in COVID-19 SOLES. Of greater practical interest is the gain from each database beyond what was available from PubMed (Table 4).
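The pairwise comparison and the gain beyond PubMed can both be expressed as set differences over the records each source holds. The sketch below uses made-up record identifiers and hypothetical function names. It assumes that "gain beyond PubMed" means records present in a collection but absent from PubMed, and that an off-diagonal cell counts how many of those extra records are also missing from a second collection.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Holdings = Dict[str, Set[int]]  # source name -> IDs of sample records it contains


def pairwise_unique(holdings: Holdings) -> Dict[Tuple[str, str], int]:
    """For each ordered pair (a, b): number of records found in a but not in b."""
    return {(a, b): len(holdings[a] - holdings[b]) for a, b in permutations(holdings, 2)}


def gain_beyond(holdings: Holdings, baseline: str) -> Dict[Tuple[str, str], int]:
    """Cells keyed (row, column).

    Diagonal: records in the column source that the baseline lacks.
    Off-diagonal: how many of those extra records the row source also lacks.
    """
    names = [n for n in holdings if n != baseline]
    extra = {n: holdings[n] - holdings[baseline] for n in names}
    return {
        (row, col): len(extra[col]) if row == col else len(extra[col] - holdings[row])
        for row in names
        for col in names
    }


# Hypothetical toy holdings, not study data.
toy = {"PubMed": {1, 2, 3, 4}, "A": {1, 2, 5, 6, 7}, "B": {2, 5}}
print(pairwise_unique(toy)[("A", "B")])        # records in A that B lacks
print(gain_beyond(toy, "PubMed")[("A", "A")])  # A's gain over PubMed
print(gain_beyond(toy, "PubMed")[("B", "A")])  # of A's gain, how many B lacks
```

Keeping the holdings as sets of record identifiers makes each cell a one-line set operation and avoids double counting when the same study was cited by more than one review.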

Table 4

Records found beyond those available through PubMed

Collection	Epistemonikos	WHO	LitCovid	SOLES	Cochrane
Epistemonikos	46	4	1	4	1
WHO	7	40	4	1	2
LitCovid	29	29	19	25	10
SOLES	6	7	15	34	6
Cochrane	19	18	1	16	24

Illustrated are the absolute and relative yields of publications of interest that were not included in PubMed. The diagonal represents the number of records found in that database and not found in PubMed. Other cells in the table show the unique records in the column database relative to the database shown in the row. As an example, Table 4 shows that Epistemonikos has 46 records not found in PubMed. Seven of these were not found in the WHO database, and 29 were not found in LitCovid. Looking at the next column, the WHO had 40 records not found in PubMed. Of these, four were not found in Epistemonikos, 29 were not found in LitCovid, and so on. Six publications of interest were found in only one of the collections studied. Epistemonikos had four of these unique publications, and Cochrane and PubMed had one unique publication each.

One hundred and twelve journals were monitored to yield eligible studies published by fifty journals within the monitoring period.

Characteristics of primary studies

Forty-four of the 50 studies (88%) were human studies, one (2%) involved both humans and animals, and five (10%) were laboratory, simulation, or health systems studies. Forty-one (82%) of the articles pertained to clinical studies. We classified those according to the type of question asked and the study design used. The largest group of clinical studies involved epidemiological questions (17 studies, 41.5% of clinical studies), followed by prevention or treatment (10, 24.4%) and etiology or harms (8, 19.5%) (Table 5).

Table 5

Question type of the studies used to determine the currency of the collections

Question type	N	%
Prevention or treatment	10	24.4
Epidemiology	17	41.5
Etiology/harms	8	19.5
Prognosis	5	12.2
Diagnosis	1	2.4

See supplemental material for the study designs found in the sample of studies (Table A4).

We considered a collection to be current for an article if that article appeared in the collection within 2 weeks of appearing on the journal website. Our a priori primary outcome was the proportion of in-scope tracked articles identifiable in a collection within 3 days of appearance at the journal site. The number available at 1, 5, 7, 10, and 14 days is also reported by collection (Table 6). For our primary outcome, PubMed had 27/50 articles (54%) at day 3, substantially more than any other collection examined. The WHO collection included 78% of our test articles within 2 weeks but accrued these articles more slowly than PubMed. The median time to the appearance of these articles was 5 days for the WHO collection. SOLES showed the slowest accrual, with a median of 8 days to appearance, and it contained only 60% of the target articles by the 2-week mark (Table 6).

Table 6

Currency and accrual rate of test articles tracked to appearance in each collection

Outcome	Epistemonikos	WHO	LitCovid	SOLES	Cochrane	PubMed
N current (a)	32/48 (67%)	39/50 (78%)	36/50 (72%)	30/50 (60%)	28/41 (68%)	38/50 (76%)
Median lag in days (b) [1st, 3rd quartile]	4 [1, 7]	5 [2, 6.5]	4 [2, 7]	8 [6.2, 9.8]	6 [3, 8]	2 [0, 4]
Number present after
1 day	5	5	5	1	0	7
3 days (c)	16	15	16	2	9	27
5 days	22	25	23	4	13	30
7 days	24	33	28	12	18	35
10 days	28	36	35	23	24	37
14 days	32	39	36	30	28	38

(a) Denominator corrected. Epistemonikos identified two of the target articles as “excluded.” Nine of the target articles were assessed as out of scope for the Cochrane collection by one investigator (RB) and verified by a second investigator (MS).

(b) Calculated using articles that appeared within the 2-week monitoring period.

(c) Proportion of in-scope tracked articles identifiable in a collection within 3 days of appearance at the journal site was our primary outcome for the currency portion.
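The accrual rows and the median lag in Table 6 can be derived from a list of per-article lags. The following is a sketch under stated assumptions: "present after d days" is read as a lag of at most d days, and quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`, which may differ from the method the authors used. The function names and the toy lags are hypothetical.

```python
from statistics import median, quantiles
from typing import Dict, Optional, Sequence, Tuple

CHECKPOINTS = (1, 3, 5, 7, 10, 14)  # days reported in Table 6


def present_by_day(lags: Sequence[Optional[int]],
                   checkpoints: Sequence[int] = CHECKPOINTS) -> Dict[int, int]:
    """Cumulative number of articles present in a collection at each checkpoint."""
    observed = [lag for lag in lags if lag is not None]
    return {d: sum(1 for lag in observed if lag <= d) for d in checkpoints}


def lag_summary(lags: Sequence[Optional[int]], window: int = 14) -> Tuple[float, float, float]:
    """Median, first quartile, and third quartile of lags within the window.

    Articles that never appeared (None) are left out, mirroring the table
    note that the median uses only articles seen during monitoring.
    """
    observed = sorted(lag for lag in lags if lag is not None and lag <= window)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return median(observed), q1, q3


# Hypothetical lags in days for ten tracked articles; None = never appeared.
toy_lags = [0, 1, 2, 3, 5, 8, 13, None, None, None]
print(present_by_day(toy_lags))
print(lag_summary(toy_lags))
```

Since the median is calculated only over articles that did appear, it can look favourable for a collection that missed many articles, so it is best read alongside the count of current articles.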


Discussion

Considering completeness, none of the specialized collections studied here included all publications of interest. Three of the six collections had unique records, but only six records were unique to one source, and only one source had more than one unique record. The contribution of each database beyond material found in PubMed is important. PubMed indexed 87.5% of publications of interest. It is the benchmark against which other collections must be measured when considering completeness. It has the best-developed search features, and thus, PubMed records may be much easier to retrieve through subject searching than those found in the specialized collections. We tested for the presence of records using known-item searching by skilled searchers drawn from the Librarian Reserve Corps. Presence in the collection does not assure that a record will be retrieved. Factors of interface features and searcher skill will limit retrieval success. Tests of retrieval through subject searching would be necessary to fully assess the functionality of these collections, and this is beyond the scope of this study. However, our searchers reported usability issues with some of the collections, so we tabulated their basic features (Appendix 3). Searching involves a trade-off between recall of relevant items and the number needed to read (NNR) to identify one relevant record. A search of PubMed may retrieve many records not relevant to the question at hand because of its breadth of coverage. Special collections have the benefit of very high precision and a low NNR. The ideal combination of sources depends on factors such as subject coverage, access to the source, and the skill of the searcher. All specialized collections studied here contained more than 80% of the publications of interest. Yet, except for PubMed, they were rarely, if ever, used as sources in early COVID-19 systematic reviews (Table 2). This may have been due to a lack of awareness or confidence in their completeness. 
Our findings support their use, and they could replace paywalled databases often used to conduct systematic reviews. All sources studied here are open access, while four of the five databases searched most often in our sample of systematic reviews, shown in Table 2, are available only through subscription.

Special COVID-19 collections were not able to identify and include articles faster than PubMed. The next most complete collection at the 3-day point was LitCovid, a derivative of PubMed produced by the National Library of Medicine, followed by Epistemonikos, WHO, Cochrane, and finally SOLES. Measured at 2 weeks, WHO was current for the most articles, followed by PubMed, LitCovid, Epistemonikos, Cochrane, and SOLES, in that order. Comparing this to the results from the completeness portion of this study, where the order (from most complete to least complete) was Epistemonikos, WHO, LitCovid, PubMed, SOLES, and Cochrane, we see some differences in the order, but with strong performances by the first four databases in both currency and completeness.

Other studies have evaluated the completeness and currency of special COVID-19 collections, but each looked at one or two of the collections rather than all five. Pierre et al. examined the sensitivity of Cochrane and Epistemonikos [8]. They found similar results, with 88% accuracy in Cochrane for RCTs and 82% for observational studies. Epistemonikos had 100% accuracy in both study types [8]. Verdugo-Paiva et al. only looked at Epistemonikos but evaluated both comprehensiveness and currency, finding comprehensiveness of 100% and currency of 96.4% [9]. These two studies used different methodologies than this one but came to similar conclusions about the usefulness of these special collections in conducting systematic reviews. All of these evaluations reflect performance at a point in time. As with any source, users must be alert to changes in coverage, indexing practices, and timeliness.
Preprints became a very important means for COVID-19 researchers to disseminate findings [10]. The indexing of preprints is reflected in our completeness results to the extent that they were included in the systematic reviews sampled. While we studied the speed of indexing of articles rather than preprints for the currency portion, all collections except COVID-19 SOLES actively monitored preprint sites in the same manner as journal sites, so we expect that the speed of inclusion would be similar.

Limitations

This study has three main limitations. First, both the currency and completeness results represent a particular point in time. Collections may have changed their procedures and the resources allocated to collection maintenance since we took our measurements. For example, COVID-19 SOLES has not updated its database since October 2021. Second, several studies that were marked as not found in the databases were confirmed to be there by the developers of the collections after the fact. These discrepancies may be due to challenges some of the searchers had with the search interfaces/functions of the different databases. Finally, the completeness study is retrospective.

Conclusions

Open-access special collections are an excellent resource for those looking for comprehensive and up-to-date sources. Experienced searchers may prefer PubMed.
