A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

David Westergaard, Hans-Henrik Stærfeldt, Christian Tønsberg, Lars Juhl Jensen, Søren Brunak.

Abstract

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, because abstracts are readily available. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 200 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms text mining of abstracts alone.
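
The kind of comparison the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy dictionary, the sentence-level co-occurrence rule, the example sentences, and the two-pair gold standard are all assumptions made up for illustration. The sketch tags entity names with a dictionary lookup, treats two entities mentioned in the same sentence as a candidate association, and scores the candidates from an abstract and from a longer full text against the same gold standard.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary mapping lowercase surface names to identifiers.
DICTIONARY = {"tp53": "TP53", "p53": "TP53", "mdm2": "MDM2", "brca1": "BRCA1"}


def tag_entities(sentence):
    """Return the set of dictionary identifiers mentioned in one sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return {DICTIONARY[token] for token in tokens if token in DICTIONARY}


def extract_pairs(text):
    """Collect unordered entity pairs that co-occur within a sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = sorted(tag_entities(sentence))
        pairs.update(combinations(entities, 2))
    return pairs


def precision_recall(predicted, gold):
    """Score predicted pairs against a gold-standard set of pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example texts; the full text contains the abstract plus one sentence.
abstract = "MDM2 regulates p53."
full_text = abstract + " In addition, BRCA1 binds TP53 in damaged cells."

# Invented gold standard, with each pair stored in sorted order.
gold = {("MDM2", "TP53"), ("BRCA1", "TP53")}

abstract_pairs = extract_pairs(abstract)
full_text_pairs = extract_pairs(full_text)

print("abstract:", precision_recall(abstract_pairs, gold))
print("full text:", precision_recall(full_text_pairs, gold))
```

In this toy case the full text recovers an association that the abstract never mentions, so recall rises while precision stays the same. Real corpora behave less cleanly, because longer texts also introduce more spurious co-occurrences, which is why benchmarking against gold standard data sets matters.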

Year:  2018        PMID: 29447159      PMCID: PMC5831415          DOI: 10.1371/journal.pcbi.1005962

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


  29 in total

1.  PMC text mining subset in BioC: about three million full-text articles and growing.

Authors:  Donald C Comeau; Chih-Hsuan Wei; Rezarta Islamaj Doğan; Zhiyong Lu
Journal:  Bioinformatics       Date:  2019-09-15       Impact factor: 6.937

2.  PubTator central: automated concept annotation for biomedical full text articles.

Authors:  Chih-Hsuan Wei; Alexis Allot; Robert Leaman; Zhiyong Lu
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

3.  Text mining for modeling of protein complexes enhanced by machine learning.

Authors:  Varsha D Badal; Petras J Kundrotas; Ilya A Vakser
Journal:  Bioinformatics       Date:  2021-05-01       Impact factor: 6.937

4.  Towards a unified search: Improving PubMed retrieval with full text.

Authors:  Won Kim; Lana Yeganova; Donald C Comeau; W John Wilbur; Zhiyong Lu
Journal:  J Biomed Inform       Date:  2022-09-21       Impact factor: 8.000

5.  Machine Learning Approach to Facilitate Knowledge Synthesis at the Intersection of Liver Cancer, Epidemiology, and Health Disparities Research.

Authors:  Travis C Hyams; Ling Luo; Brionna Hair; Kyubum Lee; Zhiyong Lu; Daniela Seminara
Journal:  JCO Clin Cancer Inform       Date:  2022-05

6.  Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries.

Authors:  Balu Bhasuran
Journal:  Methods Mol Biol       Date:  2022

7.  BioBERT and Similar Approaches for Relation Extraction.

Authors:  Balu Bhasuran
Journal:  Methods Mol Biol       Date:  2022

Review 8.  Past and future uses of text mining in ecology and evolution.

Authors:  Maxwell J Farrell; Liam Brierley; Anna Willoughby; Andrew Yates; Nicole Mideo
Journal:  Proc Biol Sci       Date:  2022-05-18       Impact factor: 5.530

9.  Working the literature harder: what can text mining and bibliometric analysis reveal?

Authors:  Yu Han; Sara A Wennersten; Maggie P Y Lam
Journal:  Expert Rev Proteomics       Date:  2019-12-16       Impact factor: 3.940

Review 10.  Data-Driven Modeling of Pregnancy-Related Complications.

Authors:  Camilo Espinosa; Martin Becker; Ivana Marić; Ronald J Wong; Gary M Shaw; Brice Gaudilliere; Nima Aghaeepour; David K Stevenson
Journal:  Trends Mol Med       Date:  2021-02-08       Impact factor: 15.272
