Assisting document triage for human kinome curation via machine learning.

Yi-Yu Hsu¹, Chih-Hsuan Wei¹, Zhiyong Lu¹.

Abstract

In the era of data explosion, the increasing rate at which articles are published makes it difficult for bio-literature databases to meet their specific curation requirements. To address this, we designed a document triage system that uses automatic methods to retrieve the most relevant articles more efficiently within curation workflows and to reduce the workload of biocurators. Since BioCreative VI (2017), we have incorporated text mining into our system to curate articles related to human kinase proteins more effectively. We tested several machine learning methods together with state-of-the-art concept extraction tools. For features, we extracted rich co-occurrence and linguistic information to model how the neXtProt database curates human kinome articles. As shown in the official evaluation of the human kinome curation task in BioCreative VI, our system retrieves 5.2 and 6.5 kinase articles with the relevant disease (DIS) and biological process (BP) information, respectively, among the top 100 returned results. Compared with neXtA5, our system shows significant improvements in prioritizing kinome-related articles: on the DIS axis it achieves a mean average precision (MAP) of 0.458 and a maximum precision observed of 0.109, whereas the best values reported for neXtA5 are 0.41 and 0.04, respectively. Our system also outperforms neXtA5 on the BP axis, with a MAP of 0.195 against a reported value of 0.11 for neXtA5. These results suggest that our system may be able to assist neXtProt biocurators in practice.
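The abstract reports results as mean average precision (MAP) and as counts of relevant articles among the top-ranked results. As a minimal sketch of how such ranking metrics are commonly computed, the Python below uses the textbook definitions of precision at k, average precision, and MAP. It is not the official BioCreative VI evaluation script, and the function names, query names, and document identifiers are hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at each rank where a relevant document appears,
    normalized by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [average_precision(runs[q], qrels.get(q, set())) for q in runs]
    return sum(scores) / len(scores)


# Hypothetical toy data: two queries (for example, two kinases) with ranked document IDs.
runs = {
    "kinase_A": ["d1", "d2", "d3", "d4"],
    "kinase_B": ["d5", "d6"],
}
qrels = {
    "kinase_A": {"d1", "d3"},  # documents judged relevant by curators
    "kinase_B": {"d6"},
}

map_score = mean_average_precision(runs, qrels)
p_at_2 = precision_at_k(runs["kinase_A"], qrels["kinase_A"], 2)
```

For the toy data, the average precision is (1/1 + 2/3) / 2 for the first query and 1/2 for the second, so the MAP is their mean, roughly 0.667. Evaluation campaigns may define the normalization or the cut-off differently, so the official task description should be consulted before comparing numbers.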

Substances: Proteome, Protein Kinases

Year: 2018 PMID: 30239677 PMCID: PMC6146134 DOI: 10.1093/database/bay091

Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
