
PMC text mining subset in BioC: about three million full-text articles and growing.

Donald C Comeau, Chih-Hsuan Wei, Rezarta Islamaj Doğan, Zhiyong Lu.

Abstract

MOTIVATION: Interest in text mining full-text biomedical research articles is growing. To facilitate automated processing of nearly 3 million full-text articles (in PubMed Central® Open Access and Author Manuscript subsets) and to improve interoperability, we convert these articles to BioC, a community-driven simple data structure in either XML or JavaScript Object Notation format for conveniently sharing text and annotations.
RESULTS: The resultant articles can be downloaded via both File Transfer Protocol for bulk access and a Web API for updates or a more focused collection. Since the availability of the Web API in 2017, our BioC collection has been widely used by the research community.
AVAILABILITY AND IMPLEMENTATION: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/. © Published by Oxford University Press 2019. This work is written by a US Government employee and is in the public domain in the US.
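To make the BioC structure concrete, here is a minimal sketch that reads a BioC XML collection with Python's standard library. It assumes the usual BioC nesting: collection, then document, then passage, with free-form `infon` key/value pairs, a passage `offset`, passage `text`, and `annotation` elements carrying a `location`. The toy document, the `section_type` infon key, and the helper name `read_bioc` are hypothetical and for illustration only; real files come from the FTP site or the Web API.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy collection (not taken from the PMC subset), following the
# collection/document/passage/annotation layout of BioC XML.
SAMPLE_BIOC = """<collection>
  <source>PMC</source>
  <date>20190101</date>
  <key>BioC.key</key>
  <document>
    <id>PMC0000000</id>
    <passage>
      <infon key="section_type">TITLE</infon>
      <offset>0</offset>
      <text>BRCA1 variants in breast cancer</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_bioc(xml_text):
    """Flatten a BioC XML collection into a list of passage dictionaries."""
    root = ET.fromstring(xml_text)
    passages = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for psg in doc.findall("passage"):
            # Direct-child infons only; annotation infons are read separately.
            infons = {i.get("key"): i.text for i in psg.findall("infon")}
            annotations = []
            for ann in psg.findall("annotation"):
                loc = ann.find("location")
                annotations.append({
                    "type": ann.findtext("infon[@key='type']"),
                    "text": ann.findtext("text"),
                    "offset": int(loc.get("offset")),
                    "length": int(loc.get("length")),
                })
            passages.append({
                "doc_id": doc_id,
                "infons": infons,
                "offset": int(psg.findtext("offset")),
                "text": psg.findtext("text") or "",
                "annotations": annotations,
            })
    return passages


if __name__ == "__main__":
    for p in read_bioc(SAMPLE_BIOC):
        print(p["doc_id"], p["infons"].get("section_type"), p["text"])
        for a in p["annotations"]:
            # Assumes annotation offsets are document-level, like passage
            # offsets, so subtract the passage offset before slicing.
            start = a["offset"] - p["offset"]
            print("  ", a["type"], p["text"][start:start + a["length"]])
```

Keeping offsets in document coordinates, as the sketch assumes, lets annotations from different tools be merged on the same text without re-aligning passages.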


Year:  2019        PMID: 30715220      PMCID: PMC6748740          DOI: 10.1093/bioinformatics/btz070

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937
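The Web API mentioned in the abstract returns one article per request, addressed by an identifier such as the PMCID listed above (PMC6748740). The sketch below only assembles a request URL and offers an optional download helper. The base URL and the `BioC_{format}/{ID}/{encoding}` path pattern reflect our understanding of the service and are assumptions; the availability page given in the abstract is authoritative. The helper names `bioc_pmc_url` and `fetch_bioc` are hypothetical.

```python
import urllib.request

# Assumed endpoint of the BioC-PMC RESTful service; verify against the
# availability page before relying on it.
BASE_URL = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def bioc_pmc_url(article_id, fmt="xml", encoding="unicode"):
    """Build a request URL for one article in BioC XML or JSON (assumed pattern)."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    if encoding not in ("unicode", "ascii"):
        raise ValueError("encoding must be 'unicode' or 'ascii'")
    return f"{BASE_URL}/BioC_{fmt}/{article_id}/{encoding}"


def fetch_bioc(article_id, fmt="xml", encoding="unicode", timeout=30):
    """Download a single article as text; requires network access."""
    url = bioc_pmc_url(article_id, fmt, encoding)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_bioc() to actually download.
    print(bioc_pmc_url("PMC6748740", fmt="json"))
```

For bulk processing the FTP archives are the better fit; per-article requests like this suit updates or a focused set of articles, as the abstract notes.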


  15 in total

1.  Mining biological networks from full-text articles. [Review]

Authors:  Jan Czarnecki; Adrian J Shepherd
Journal:  Methods Mol Biol       Date:  2014

2.  Large-scale event extraction from literature with multi-level gene normalization.

Authors:  Sofie Van Landeghem; Jari Björne; Chih-Hsuan Wei; Kai Hakala; Sampo Pyysalo; Sophia Ananiadou; Hung-Yu Kao; Zhiyong Lu; Tapio Salakoski; Yves Van de Peer; Filip Ginter
Journal:  PLoS One       Date:  2013-04-17       Impact factor: 3.240

3.  Concept annotation in the CRAFT corpus.

Authors:  Michael Bada; Miriam Eckert; Donald Evans; Kristin Garcia; Krista Shipley; Dmitry Sitnikov; William A Baumgartner; K Bretonnel Cohen; Karin Verspoor; Judith A Blake; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2012-07-09       Impact factor: 3.169

4.  Construction of phosphorylation interaction networks by text mining of full-length articles using the eFIP system.

Authors:  Catalina O Tudor; Karen E Ross; Gang Li; K Vijay-Shanker; Cathy H Wu; Cecilia N Arighi
Journal:  Database (Oxford)       Date:  2015-03-31       Impact factor: 3.451

5.  tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles.

Authors:  Juan Miguel Cejuela; Peter McQuilton; Laura Ponting; Steven J Marygold; Raymund Stefancsik; Gillian H Millburn; Burkhard Rost
Journal:  Database (Oxford)       Date:  2014-04-07       Impact factor: 3.451

6.  BioC implementations in Go, Perl, Python and Ruby.

Authors:  Wanli Liu; Rezarta Islamaj Doğan; Dongseop Kwon; Hernani Marques; Fabio Rinaldi; W John Wilbur; Donald C Comeau
Journal:  Database (Oxford)       Date:  2014-06-23       Impact factor: 3.451

7.  Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus.

Authors:  Donald C Comeau; Haibin Liu; Rezarta Islamaj Doğan; W John Wilbur
Journal:  Database (Oxford)       Date:  2014-06-16       Impact factor: 3.451

8.  Section level search functionality in Europe PMC.

Authors:  Şenay Kafkas; Xingjun Pi; Nikos Marinos; Francesco Talo'; Andrew Morrison; Johanna R McEntyre
Journal:  J Biomed Semantics       Date:  2015-03-10

9.  BioC: a minimalist approach to interoperability for biomedical text processing.

Authors:  Donald C Comeau; Rezarta Islamaj Doğan; Paolo Ciccarese; Kevin Bretonnel Cohen; Martin Krallinger; Florian Leitner; Zhiyong Lu; Yifan Peng; Fabio Rinaldi; Manabu Torii; Alfonso Valencia; Karin Verspoor; Thomas C Wiegers; Cathy H Wu; W John Wilbur
Journal:  Database (Oxford)       Date:  2013-09-18       Impact factor: 3.451

10.  BC4GO: a full-text corpus for the BioCreative IV GO task.

Authors:  Kimberly Van Auken; Mary L Schaeffer; Peter McQuilton; Stanley J F Laulederkind; Donghui Li; Shur-Jen Wang; G Thomas Hayman; Susan Tweedie; Cecilia N Arighi; James Done; Hans-Michael Müller; Paul W Sternberg; Yuqing Mao; Chih-Hsuan Wei; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2014-07-28       Impact factor: 3.451

  15 in total

1.  LitSense: making sense of biomedical literature at sentence level.

Authors:  Alexis Allot; Qingyu Chen; Sun Kim; Roberto Vera Alvarez; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

2.  PubTator central: automated concept annotation for biomedical full text articles.

Authors:  Chih-Hsuan Wei; Alexis Allot; Robert Leaman; Zhiyong Lu
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

3.  TeamTat: a collaborative text annotation tool.

Authors:  Rezarta Islamaj; Dongseop Kwon; Sun Kim; Zhiyong Lu
Journal:  Nucleic Acids Res       Date:  2020-07-02       Impact factor: 16.971

4.  Towards a unified search: Improving PubMed retrieval with full text.

Authors:  Won Kim; Lana Yeganova; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal:  J Biomed Inform       Date:  2022-09-21       Impact factor: 8.000

5.  Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration.

Authors:  Dhouha Grissa; Alexander Junge; Tudor I Oprea; Lars Juhl Jensen
Journal:  Database (Oxford)       Date:  2022-03-28       Impact factor: 4.462

6.  CoCoScore: context-aware co-occurrence scoring for text mining applications using distant supervision.

Authors:  Alexander Junge; Lars Juhl Jensen
Journal:  Bioinformatics       Date:  2020-01-01       Impact factor: 6.937

7.  Identification of most influential co-occurring gene suites for gastrointestinal cancer using biomedical literature mining and graph-based influence maximization.

Authors:  Charles C N Wang; Jennifer Jin; Jan-Gowth Chang; Masahiro Hayakawa; Atsushi Kitazawa; Jeffrey J P Tsai; Phillip C-Y Sheu
Journal:  BMC Med Inform Decis Mak       Date:  2020-09-03       Impact factor: 2.796

8.  NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature.

Authors:  Rezarta Islamaj; Robert Leaman; Sun Kim; Dongseop Kwon; Chih-Hsuan Wei; Donald C Comeau; Yifan Peng; David Cissel; Cathleen Coss; Carol Fisher; Rob Guzman; Preeti Gokal Kochar; Stella Koppel; Dorothy Trinh; Keiko Sekiya; Janice Ward; Deborah Whitman; Susan Schmidt; Zhiyong Lu
Journal:  Sci Data       Date:  2021-03-25       Impact factor: 6.444

9.  SIB Literature Services: RESTful customizable search engines in biomedical literature, enriched with automatically mapped biomedical concepts.

Authors:  Julien Gobeill; Déborah Caucheteur; Pierre-André Michel; Luc Mottin; Emilie Pasche; Patrick Ruch
Journal:  Nucleic Acids Res       Date:  2020-07-02       Impact factor: 16.971

10.  PEDL: extracting protein-protein associations using deep language models and distant supervision.

Authors:  Leon Weber; Kirsten Thobe; Oscar Arturo Migueles Lozano; Jana Wolf; Ulf Leser
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

