
A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses.

Andra Waagmeester, Egon L Willighagen, Andrew I Su, Martina Kutmon, Jose Emilio Labra Gayo, Daniel Fernández-Álvarez, Quentin Groom, Peter J Schaap, Lisa M Verhagen, Jasper J Koehorst.

Abstract

BACKGROUND: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting with a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link to knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
RESULTS: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model can be used to make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written to regularly update Wikidata from these sources and added to a platform that runs these updates automatically.
CONCLUSIONS: Although this workflow was developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).
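
ILLUSTRATION: The kind of check that an entity schema enables can be sketched in a few lines of Python. This is a simplified stand-in for Shape Expressions, not ShEx itself and not the schemas published with the paper. The property IDs are Wikidata properties (P351 Entrez Gene ID, P703 found in taxon, P688 encodes); the cardinalities, the `Constraint` class, the `violations` helper, and the dictionary representation of an item are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Wikidata property IDs used in this sketch.
ENTREZ_GENE_ID = "P351"
FOUND_IN_TAXON = "P703"
ENCODES = "P688"


@dataclass(frozen=True)
class Constraint:
    """One property with the allowed number of values (hypothetical helper)."""
    prop: str
    min_count: int
    max_count: Optional[int]  # None means no upper bound


# Assumed cardinalities for a gene item; the real entity schema may differ.
GENE_SHAPE = [
    Constraint(ENTREZ_GENE_ID, 1, 1),
    Constraint(FOUND_IN_TAXON, 1, 1),
    Constraint(ENCODES, 0, None),
]


def violations(item: Dict[str, List[str]], shape: List[Constraint]) -> List[str]:
    """Return the properties whose value count falls outside the shape."""
    problems = []
    for c in shape:
        n = len(item.get(c.prop, []))
        too_few = n < c.min_count
        too_many = c.max_count is not None and n > c.max_count
        if too_few or too_many:
            problems.append(c.prop)
    return problems


# Placeholder values only; these are not real identifiers.
candidate = {ENTREZ_GENE_ID: ["GENE_ID_PLACEHOLDER"], FOUND_IN_TAXON: ["TAXON_ITEM_PLACEHOLDER"]}
print(violations(candidate, GENE_SHAPE))
```

A bot could run a check of this kind before writing an item, so that data from different sources ends up in the same structure.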

Keywords: COVID-19; Linked data; Open Science; ShEx; Wikidata

Year: 2021; PMID: 33482803; PMCID: PMC7820539; DOI: 10.1186/s12915-020-00940-y

Source DB: PubMed; Journal: BMC Biol; ISSN: 1741-7007; Impact factor: 7.431


