
Enhancing text categorization with semantic-enriched representation and training data augmentation.

Xinghua Lu, Bin Zheng, Atulya Velivelli, Chengxiang Zhai.

Abstract

OBJECTIVE: Acquiring and representing biomedical knowledge is an increasingly important component of contemporary bioinformatics. A critical step in this process is to efficiently identify and retrieve relevant documents from the vast volume of modern biomedical literature. In practice, many information retrieval tasks are difficult because of high data dimensionality and a lack of annotated examples with which to train a retrieval algorithm. Under these conditions, the performance of information retrieval algorithms is often unsatisfactory, and improvements are needed.
DESIGN: We studied two approaches to enhancing text categorization performance on sparse, high-dimensional data: (1) semantic-preserving dimension reduction by representing text with semantic-enriched features; and (2) augmenting training data with semi-supervised learning. A probabilistic topic model was applied to extract major semantic topics from a corpus of text of interest. The representation of documents was projected from the high-dimensional vocabulary space onto a semantic topic space with reduced dimensionality. A semi-supervised learning algorithm based on graph theory was applied to identify potential positive training cases, which were then used to augment the training data. The effects of data transformation and augmentation on text categorization by support vector machine (SVM) were evaluated.
RESULTS AND CONCLUSION: Semantic-enriched data transformation and training data augmented with pseudo-positive cases enhance the efficiency and performance of text categorization by SVM.
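
The augmentation step described in the DESIGN section can be illustrated with a minimal sketch. The abstract does not specify which graph-based algorithm was used, so the code below substitutes a simple iterative label propagation over a cosine-similarity graph; it is not the authors' implementation. It assumes each document has already been projected onto a topic space, so the topic-proportion vectors, the function names, and the 0.6 threshold are all hypothetical values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(vectors, labels, n_iter=50):
    """Spread labels over a fully connected cosine-similarity graph.

    labels[i] is 1.0 (positive), 0.0 (negative), or None (unlabeled).
    Returns one score in [0, 1] per document; labeled documents stay clamped.
    """
    n = len(vectors)
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled documents start at a neutral score of 0.5.
    scores = [0.5 if y is None else y for y in labels]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(labels[i])  # clamp known labels
                continue
            total = sum(weights[i])
            # Weighted average of the neighbors' current scores.
            weighted = sum(w * s for w, s in zip(weights[i], scores))
            updated.append(weighted / total if total else 0.5)
        scores = updated
    return scores


def pseudo_positives(vectors, labels, threshold=0.6):
    """Indices of unlabeled documents whose propagated score reaches threshold."""
    scores = propagate_labels(vectors, labels)
    return [
        i
        for i, (y, s) in enumerate(zip(labels, scores))
        if y is None and s >= threshold
    ]


# Hypothetical topic proportions for four documents over three topics.
topic_vectors = [
    [0.90, 0.05, 0.05],  # labeled positive
    [0.05, 0.90, 0.05],  # labeled negative
    [0.85, 0.10, 0.05],  # unlabeled, resembles the positive document
    [0.10, 0.85, 0.05],  # unlabeled, resembles the negative document
]
labels = [1.0, 0.0, None, None]

print(pseudo_positives(topic_vectors, labels))  # [2]
```

In a full pipeline, the documents flagged this way would be added to the positive training set before fitting the SVM. The threshold trades the number of added cases against the risk of introducing mislabeled ones.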

Year:  2006        PMID: 16799127      PMCID: PMC1561790          DOI: 10.1197/jamia.M2051

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  8 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup.

Authors:  Alexander S Yeh; Lynette Hirschman; Alexander A Morgan
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

3.  Finding scientific topics.

Authors:  Thomas L Griffiths; Mark Steyvers
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-10       Impact factor: 11.205

4.  Text categorization models for high-quality article retrieval in internal medicine.

Authors:  Yindalon Aphinyanaphongs; Ioannis Tsamardinos; Alexander Statnikov; Douglas Hardin; Constantin F Aliferis
Journal:  J Am Med Inform Assoc       Date:  2004-11-23       Impact factor: 4.497

5.  Will a biological database be different from a biological journal?

Authors:  Philip Bourne
Journal:  PLoS Comput Biol       Date:  2005-08       Impact factor: 4.475

6.  Overview of BioCreAtIvE: critical assessment of information extraction for biology.

Authors:  Lynette Hirschman; Alexander Yeh; Christian Blaschke; Alfonso Valencia
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

7.  The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology.

Authors:  Janan T Eppig; Carol J Bult; James A Kadin; Joel E Richardson; Judith A Blake; A Anagnostopoulos; R M Baldarelli; M Baya; J S Beal; S M Bello; W J Boddy; D W Bradt; D L Burkart; N E Butler; J Campbell; M A Cassell; L E Corbani; S L Cousins; D J Dahmen; H Dene; A D Diehl; H J Drabkin; K S Frazer; P Frost; L H Glass; C W Goldsmith; P L Grant; M Lennon-Pierce; J Lewis; I Lu; L J Maltais; M McAndrews-Hill; L McClellan; D B Miers; L A Miller; L Ni; J E Ormsby; D Qi; T B K Reddy; D J Reed; B Richards-Smith; D R Shaw; R Sinclair; C L Smith; P Szauter; M B Walker; D O Walton; L L Washburn; I T Witham; Y Zhu
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

8.  Identifying biological concepts from a protein-related corpus with a probabilistic topic model.

Authors:  Bin Zheng; David C McLean; Xinghua Lu
Journal:  BMC Bioinformatics       Date:  2006-02-08       Impact factor: 3.169

  8 in total

1.  Mapping annotations with textual evidence using an scLDA model.

Authors:  Bo Jin; Vicky Chen; Lujia Chen; Xinghua Lu
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

2.  A Kernel Theory of Modern Data Augmentation.

Authors:  Tri Dao; Albert Gu; Alexander J Ratner; Virginia Smith; Christopher De Sa; Christopher Ré
Journal:  Proc Mach Learn Res       Date:  2019-06

3.  Learning to Compose Domain-Specific Transformations for Data Augmentation.

Authors:  Alexander J Ratner; Henry R Ehrenberg; Zeshan Hussain; Jared Dunnmon; Christopher Ré
Journal:  Adv Neural Inf Process Syst       Date:  2017-12

4.  Artificial Intelligence in Medicine: Chances and Challenges for Wide Clinical Adoption. (Review)

Authors:  Julian Varghese
Journal:  Visc Med       Date:  2020-10-12

5.  Developing Embedded Taxonomy and Mining Patients' Interests From Web-Based Physician Reviews: Mixed-Methods Approach. (Review)

Authors:  Jia Li; Minghui Liu; Xiaojun Li; Xuan Liu; Jingfang Liu
Journal:  J Med Internet Res       Date:  2018-08-16       Impact factor: 5.428

6.  Augmentation and heterogeneous graph neural network for AAAI2021-COVID-19 fake news detection.

Authors:  Andrea Stevens Karnyoto; Chengjie Sun; Bingquan Liu; Xiaolong Wang
Journal:  Int J Mach Learn Cybern       Date:  2022-01-08       Impact factor: 4.377

7.  Deep Transfer Learning for Question Classification Based on Semantic Information Features of Category Labels.

Authors:  Lei Su; Wenqian Kang; Liping Wu; Di Jiang
Journal:  Comput Intell Neurosci       Date:  2022-09-30

8.  Multi-label literature classification based on the Gene Ontology graph.

Authors:  Bo Jin; Brian Muller; Chengxiang Zhai; Xinghua Lu
Journal:  BMC Bioinformatics       Date:  2008-12-08       Impact factor: 3.169

