
Allerdictor: fast allergen prediction using text classification techniques.

Ha X Dang, Christopher B Lawrence.

Abstract

MOTIVATION: Accurately identifying and eliminating allergens from biotechnology-derived products is important for human health. From a biomedical research perspective, it is also important to identify allergens in sequenced genomes. Many allergen prediction tools have been developed in recent years. Although these tools achieve a certain level of specificity, when applied to large-scale allergen discovery (e.g. at a whole-genome scale) they still yield many false positives and thus low precision (even at low recall), because the data are extremely skewed (allergens are rare). Moreover, the most accurate tools are relatively slow because they use protein sequence alignment to build feature vectors for allergen classifiers. Additionally, the current allergen prediction tools are publicly available only as web servers, without support for large batch submission. These weaknesses make large-scale allergen discovery ineffective and inefficient in the public domain.
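The link between skewed data and low precision can be illustrated with a short calculation. The sketch below is not taken from the paper: the sensitivity, specificity and prevalence values are hypothetical, chosen only to show how the same classifier loses precision when positives become rare.

```python
def precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Expected precision (positive predictive value) of a binary classifier.

    sensitivity: fraction of true allergens that are flagged (recall)
    specificity: fraction of non-allergens that are correctly rejected
    prevalence:  fraction of all sequences that are true allergens
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% recall, 95% specificity (illustrative values only).
balanced = precision(0.90, 0.95, 0.50)   # half of the sequences are allergens
skewed = precision(0.90, 0.95, 0.005)    # 0.5% of the sequences are allergens

print(f"balanced data: precision = {balanced:.3f}")  # roughly 0.95
print(f"skewed data:   precision = {skewed:.3f}")    # roughly 0.08
```

With these assumed numbers, the classifier that looks accurate on balanced data returns mostly false positives once allergens make up a fraction of a percent of the input, which is the situation at whole-genome scale.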
RESULTS: We developed Allerdictor, a fast and accurate sequence-based allergen prediction tool that models protein sequences as text documents and uses a support vector machine for text classification to predict allergens. Test results on multiple highly skewed datasets demonstrated that Allerdictor predicted allergens with high precision at high recall and at fast speed. For example, Allerdictor took only ∼6 min on a single-core PC to scan the whole Swiss-Prot database of ∼540 000 sequences, and it identified <1% of them as allergens.
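To make the idea of treating a protein sequence as a text document concrete, here is a minimal sketch, not Allerdictor's actual code. It assumes that the "words" are overlapping k-mers (the abstract does not say how sequences are tokenized), uses k = 3 for readability, and trains a small linear support vector machine with a Pegasos-style subgradient method written in plain Python. The function names and the toy sequences are invented for illustration and are not real allergen data.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers, treated as 'words'."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(sequence, k=3):
    """Sparse k-mer count vector for one 'document', scaled to unit L2 length."""
    counts = Counter(kmer_tokens(sequence, k))
    norm = sum(c * c for c in counts.values()) ** 0.5
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def score(weights, doc):
    """Linear decision value; positive means 'allergen-like' in this toy setup."""
    return sum(weights.get(tok, 0.0) * v for tok, v in doc.items())


def train_linear_svm(docs, labels, epochs=50, lam=0.01):
    """Linear SVM (hinge loss, L2 penalty) via Pegasos-style subgradient steps.

    labels must be +1 or -1. Examples are visited in a fixed order so the
    result is deterministic; there is no bias term.
    """
    weights = {}
    t = 0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            margin = y * score(weights, doc)
            shrink = 1.0 - eta * lam         # effect of the L2 penalty
            for tok in weights:
                weights[tok] *= shrink
            if margin < 1.0:                 # hinge loss is active for this example
                for tok, v in doc.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * y * v
    return weights


# Made-up toy sequences (hypothetical, NOT real allergens or real negatives).
train_seqs = ["MKVLAAGIVGLLLA", "MKVLAAGIAGLLMA", "GSSPEPTDDEEQRR", "GSSPEPSDDEEQKR"]
train_labels = [+1, +1, -1, -1]

model = train_linear_svm([bag_of_words(s) for s in train_seqs], train_labels)

pos_score = score(model, bag_of_words("MKVLAAGIVALLLA"))  # resembles the +1 class
neg_score = score(model, bag_of_words("GSSPEPTDEEEQRR"))  # resembles the -1 class
print(kmer_tokens("MKVLA"))        # ['MKV', 'KVL', 'VLA']
print(pos_score > 0, neg_score < 0)
```

Because each sequence only needs to be tokenized and scored against a weight vector, a scheme of this kind avoids sequence alignment at prediction time, which is consistent with the speed argument made in the abstract.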
AVAILABILITY AND IMPLEMENTATION: Allerdictor is implemented in Python and available as standalone and web server versions at http://allerdictor.vbi.vt.edu
CONTACT: lawrence@vbi.vt.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


Year:  2014        PMID: 24403538      PMCID: PMC3982160          DOI: 10.1093/bioinformatics/btu004

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


