
Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it.

K Bretonnel Cohen, Lawrence E Hunter, Martha Palmer.

Abstract

Significant progress has been made in addressing the scientific challenges of biomedical text mining. However, the transition from a demonstration of scientific progress to the production of tools on which a broader community can rely requires that fundamental software engineering requirements be addressed. In this paper we characterize the state of biomedical text mining software with respect to software testing and quality assurance. Biomedical natural language processing software was chosen because it often explicitly claims to offer production-quality services, rather than just research prototypes. We examined twenty web sites offering a variety of text mining services. On each web site, we performed the most basic software test known to us and classified the results. Seven of the twenty web sites returned either bad results or the worst class of results in response to this simple test. We conclude that biomedical natural language processing tools require greater attention to software quality. We suggest a linguistically motivated approach to granular evaluation of natural language processing applications, and show how it can be used to detect performance errors in several systems and to predict overall performance on specific equivalence classes of inputs. We also assess the ability of linguistically motivated test suites to provide good software testing, as compared to large corpora of naturally occurring data. We measure code coverage and find that it is considerably higher when even small structured test suites are used than when large corpora are used.
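As an illustration only, the idea of evaluating a system on equivalence classes of inputs might be sketched as follows. Nothing here comes from the paper itself: the tagger (`toy_tagger`), the class names, and the test sentences are all hypothetical, chosen to show how a small structured test suite can localize failures to one linguistic feature at a time.

```python
import re

# Hypothetical system under test (not from the paper): a toy tagger that
# only recognizes names made of letters followed by digits, e.g. "BRCA1".
def toy_tagger(text):
    return re.findall(r"\b[A-Za-z]+[0-9]+\b", text)

# Hypothetical structured test suite. Each equivalence class groups inputs
# that share one linguistic feature; each case pairs a sentence with the
# name that a correct system is expected to find in it.
TEST_SUITE = {
    "letters_then_digits": [
        ("BRCA1 is mutated.", "BRCA1"),
        ("We studied p53.", "p53"),
    ],
    "hyphenated": [
        ("IL-2 was elevated.", "IL-2"),
    ],
    "multiword": [
        ("The breast cancer 1 gene is mutated.", "breast cancer 1"),
    ],
}

def evaluate_by_class(tagger, suite):
    """Return {class name: (cases passed, cases total)} for each class."""
    results = {}
    for class_name, cases in suite.items():
        passed = sum(1 for text, expected in cases if expected in tagger(text))
        results[class_name] = (passed, len(cases))
    return results

if __name__ == "__main__":
    for class_name, (passed, total) in evaluate_by_class(toy_tagger, TEST_SUITE).items():
        print(f"{class_name}: {passed}/{total}")
```

Reporting results per class rather than as one aggregate score is the design choice that matters here: a failure is tied to a specific feature of the input (hyphenation, multiword names), which an overall score on a large corpus would not reveal.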

Year:  2013        PMID: 34308448      PMCID: PMC8300901          DOI: 10.1007/978-3-642-45260-4_6

Source DB:  PubMed          Journal:  Trust Eternal Syst Via Evol Softw Data Knowl (2012)


