Assisting document triage for human kinome curation via machine learning.

Yi-Yu Hsu, Chih-Hsuan Wei, Zhiyong Lu.

Abstract

In the era of data explosion, the growing volume of published articles presents unusual challenges for meeting the specific curation requirements of bio-literature databases. Recognizing these demands, we designed a document triage system with automatic methods that retrieves the most relevant articles in curation workflows more efficiently and reduces the workload of biocurators. Since BioCreative VI (2017), we have implemented text mining processing in our system with the aim of curating articles related to human kinase proteins more effectively. We tested several machine learning methods together with state-of-the-art concept extraction tools. For features, we extracted rich co-occurrence and linguistic information to model the curation process of human kinome articles by the neXtProt database. As shown in the official evaluation on the human kinome curation task in BioCreative VI, our system can effectively retrieve 5.2 and 6.5 kinase articles with the relevant disease (DIS) and biological process (BP) information, respectively, among the top 100 returned results. Compared with neXtA5, our system demonstrates significant improvements in prioritizing kinome-related articles: on the DIS axis, our system achieves a mean average precision (MAP) of 0.458 and a maximum precision observed of 0.109, whereas neXtA5's best-reported values are 0.41 and 0.04, respectively. Our system also outperforms neXtA5 on the BP axis, with a MAP of 0.195 against neXtA5's reported value of 0.11. These results suggest that our system may be able to assist neXtProt biocurators in practice.
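The ranking metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes each query yields a ranked list of binary relevance judgments and that every relevant article appears somewhere in that list. The function names and toy data below are hypothetical, and this is not the official BioCreative VI evaluation script, whose exact definitions may differ.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list (True = relevant article).

    Assumption: all relevant articles are present in the list, so the
    sum of precisions is divided by the number of relevant hits found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k returned articles that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


# Toy rankings for two hypothetical queries (not data from the paper).
toy_queries = [
    [True, False, True],
    [False, True],
]
map_score = mean_average_precision(toy_queries)
top2_precision = precision_at_k([True, False, True, False], 2)
```

In this sketch MAP rewards systems that place relevant articles near the top of each ranked list, which is the property a triage system for curators is meant to have.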

Year:  2018        PMID: 30239677      PMCID: PMC6146134          DOI: 10.1093/database/bay091

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451

