
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

David Westergaard¹,², Hans-Henrik Stærfeldt¹, Christian Tønsberg³, Lars Juhl Jensen², Søren Brunak¹

Abstract

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, because abstracts are far more readily available than full texts. Here we present an analysis of 15 million English-language scientific full-text articles published during the period 1823–2016. We describe how article length and publication sub-topics have developed over these nearly 200 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular localization associations using a named entity recognition system, and we quantitatively report their accuracy using gold-standard benchmark data sets. We then compare these findings with the corresponding results obtained from 16.5 million abstracts in MEDLINE and show that text mining of full-text articles consistently outperforms text mining of abstracts alone.

Entities: Chemical, Disease, Gene, Species

Substances: Proteins

Year: 2018; PMID: 29447159; PMCID: PMC5831415; DOI: 10.1371/journal.pcbi.1005962

Source DB: PubMed; Journal: PLoS Comput Biol; ISSN: 1553-734X; Impact factor: 4.475
