Ginger Tsueng, Julia L Mullen, Manar Alkuzweny, Marco Cano, Benjamin Rush, Emily Haag, Alaa Abdel Latif, Xinghua Zhou, Zhongchao Qian, Emory Hufbauer, Mark Zeller, Kristian G Andersen, Chunlei Wu, Andrew I Su, Karthik Gangavarapu, Laura D Hughes.
Abstract
To combat the ongoing COVID-19 pandemic, scientists have been conducting research at breakneck speeds, producing over 52,000 peer-reviewed articles within the first year. To address the challenge in tracking the vast amount of new research located in separate repositories, we developed outbreak.info Research Library, a standardized, searchable interface of COVID-19 and SARS-CoV-2 resources. Unifying metadata from twelve repositories, we assembled a collection of over 270,000 publications, clinical trials, datasets, protocols, and other resources as of May 2022. We used a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public API, and R package. Finally, we discuss the challenges inherent in combining metadata from scattered and heterogeneous resources and provide recommendations to streamline this process to aid scientific research.
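The schema-based harmonization described in the abstract can be illustrated with a minimal Python sketch. It is not the Research Library's actual schema or parser code: the `Resource` fields, the two source names, and the raw field names (`doi`, `tags`, `registry_id`, and so on) are all hypothetical, chosen only to show how records from differently shaped sources might be mapped onto one common shape and then searched together.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Resource:
    """Common record shape. Field names are illustrative assumptions."""
    identifier: str
    resource_type: str  # e.g. "Publication", "ClinicalTrial", "Dataset"
    name: str
    source: str
    date_published: str = ""
    keywords: List[str] = field(default_factory=list)


def from_preprint(raw: Dict[str, Any]) -> Resource:
    # Hypothetical preprint-server record: uses "doi", "title", "date", "tags".
    return Resource(
        identifier=raw["doi"],
        resource_type="Publication",
        name=raw["title"].strip(),
        source="example-preprint-server",
        date_published=raw.get("date", ""),
        keywords=[k.strip().lower() for k in raw.get("tags", [])],
    )


def from_trial_registry(raw: Dict[str, Any]) -> Resource:
    # Hypothetical trial-registry record with differently named fields.
    return Resource(
        identifier=raw["registry_id"],
        resource_type="ClinicalTrial",
        name=raw["official_title"].strip(),
        source="example-trial-registry",
        date_published=raw.get("first_posted", ""),
        keywords=[k.strip().lower() for k in raw.get("conditions", [])],
    )


# One parser per source, so a new source only needs a new entry here.
PARSERS: Dict[str, Callable[[Dict[str, Any]], Resource]] = {
    "preprint": from_preprint,
    "trial": from_trial_registry,
}


def harmonize(source: str, raw: Dict[str, Any]) -> Resource:
    """Convert a raw record to the common shape and check required fields."""
    resource = PARSERS[source](raw)
    for required in ("identifier", "name"):
        if not getattr(resource, required):
            raise ValueError(f"{source} record is missing '{required}'")
    return resource


def search(resources: List[Resource], term: str) -> List[Resource]:
    """Match a term against names (substring) and keywords (exact), any type."""
    term = term.lower()
    return [r for r in resources if term in r.name.lower() or term in r.keywords]
```

Once every record shares the same fields, a single query such as `search(records, "delta variant")` can return publications and clinical trials side by side, which is the behavior the unified interface is meant to provide.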
Year: 2022 PMID: 35132411 PMCID: PMC8820656 DOI: 10.1101/2022.01.20.477133
Source DB: PubMed Journal: bioRxiv
Figure 1. What are resources, who contributes to the proliferation of resources, why are resources difficult to find and use, and how can we support their use?
Figure 2. Supporting resource centralization and standardization by developing a harmonizing schema. a, Distribution of resources by resource type and source. Note that the x-axes of the bar graphs have different scales. b, Heterogeneous and filterable resources (i.e. publications, clinical trials, datasets, etc.) resulting from a single search of the phrase “Delta Variant”.
Figure 3. Aggregating resource metadata by leveraging community contributions. a, The community contribution pipeline and technology stack for outbreak.info’s Research Library. Curators may submit dataset metadata using the DDE built-in guide or from GitHub via the DDE/BioThings SDK. Python-savvy contributors can create parsers to contribute even more metadata via the BioThings SDK plugin architecture. A resource plugin allows the site to automatically ingest and update metadata from the corresponding external resource. Blue arrows indicate manual steps, yellow arrows indicate steps that can be automated after an initial setup, and green arrows indicate completely automated steps. b, An example of a detailed metadata record manually curated by volunteers as it appears in the Research Library.
Figure 4. Enabling exploration of the resources. a, Selectable options for filtering results by topic category or other facets enhance searchability and exploration from the search results view. b, Links to other records or to additional potential searches of interest enabling further exploration from a record view. c, Links from the Omicron Variant report to related resources.