PySeqLab: an open source Python package for sequence labeling and segmentation.

Ahmed Allam, Michael Krauthammer.

Abstract

MOTIVATION: Text and genomic data are composed of sequential tokens, such as words and nucleotides, that give rise to higher-order syntactic constructs. In this work, we aim to provide a comprehensive Python library implementing conditional random fields (CRFs), a class of probabilistic graphical models, for robust prediction of these constructs from sequential data.
RESULTS: Python Sequence Labeling (PySeqLab) is an open source package for performing supervised learning in structured prediction tasks. It implements CRF models, that is, discriminative models ranging (i) from first-order to higher-order linear-chain CRFs and (ii) from first-order to higher-order semi-Markov CRFs (semi-CRFs). Moreover, it provides multiple learning algorithms for estimating model parameters: (i) stochastic gradient descent (SGD) and its variants, (ii) the structured perceptron with multiple averaging schemes, supporting exact and inexact search through the 'violation-fixing' framework, (iii) the search-based probabilistic online learning algorithm (SAPO) and (iv) an interface to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and limited-memory BFGS algorithms. Viterbi and Viterbi A* are used for inference and decoding of sequences. Using PySeqLab, we built models (classifiers) and evaluated their performance in three different domains: (i) biomedical natural language processing (NLP), (ii) predictive DNA sequence analysis and (iii) human activity recognition (HAR). State-of-the-art performance, comparable to machine-learning-based systems, was achieved in all three domains without feature engineering or the use of knowledge sources.
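To illustrate the decoding step mentioned above, here is a minimal sketch of Viterbi decoding for a first-order linear-chain model. It is not PySeqLab's API or implementation: the function name `viterbi_decode` and the `emission` and `transition` score tables are hypothetical stand-ins for the per-position and label-to-label scores that a trained CRF would supply.

```python
def viterbi_decode(emission, transition):
    """Return (best_path, best_score) for a first-order linear-chain model.

    emission[t][k]   : score of label k at position t (hypothetical input)
    transition[i][j] : score of moving from label i to label j (hypothetical input)
    """
    n_labels = len(transition)
    score = list(emission[0])      # best score of any path ending in each label
    backptrs = []                  # backptrs[t-1][j] = best previous label for j at t
    for t in range(1, len(emission)):
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Pick the previous label that maximizes the score of reaching j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transition[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transition[best_i][j] + emission[t][j])
        score = new_score
        backptrs.append(ptrs)
    # Trace the best path back from the best final label.
    last = max(range(n_labels), key=lambda k: score[k])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


# Toy example with two labels and three positions (made-up scores).
emission = [[3.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
transition = [[1.0, -1.0], [-1.0, 1.0]]
print(viterbi_decode(emission, transition))  # ([0, 1, 1], 7.0)
```

The dynamic program keeps only the best score per label at each position, so the cost grows linearly with sequence length and quadratically with the number of labels. Higher-order and semi-Markov models, as described in the abstract, require a larger state space than this sketch covers.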
AVAILABILITY AND IMPLEMENTATION: PySeqLab is available through https://bitbucket.org/A_2/pyseqlab with tutorials and documentation.
CONTACT: ahmed.allam@yale.edu or michael.krauthammer@yale.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

Year: 2017; PMID: 29036277; PMCID: PMC5872256; DOI: 10.1093/bioinformatics/btx451

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


  3 in total

1.  nhKcr: a new bioinformatics tool for predicting crotonylation sites on human nonhistone proteins based on deep learning.

Authors:  Yong-Zi Chen; Zhuo-Zhi Wang; Yanan Wang; Guoguang Ying; Zhen Chen; Jiangning Song
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622

2.  Neural networks versus Logistic regression for 30 days all-cause readmission prediction.

Authors:  Ahmed Allam; Mate Nagy; George Thoma; Michael Krauthammer
Journal:  Sci Rep       Date:  2019-06-26       Impact factor: 4.379

3.  Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition.

Authors:  Wangjin Lee; Jinwook Choi
Journal:  BMC Med Inform Decis Mak       Date:  2019-07-15       Impact factor: 2.796
