Paul Bobby, Seetharaman Balaji, Variath Sathyanath, Santhosh J Eapen.
Abstract
The recognition of gene/protein names in the literature is one of the pivotal steps in processing biological literature for information extraction or data mining. We have compiled a lexicon of biomedical words (conserved patterns/potential motifs) composed only of the 20 letters that denote the amino acids. The remaining 6 letters of the English alphabet (B, J, O, U, X, Z) are treated as invalid amino acid characters in this context. For ease of use, we have jumbled these 6 letters into the name 'JUZBOX', and words containing any of these characters were filtered out of the biomedical lexicon. The generation of biomedical words from protein sequences using JUZBOX has applications in functional annotation. AVAILABILITY: JUZBOX is freely available at http://www.spices.res.in/juzbox.
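The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`is_amino_acid_word`, `filter_lexicon`, `find_words`), the sample words, the sample sequence, and the `min_length` cutoff are all assumptions made for illustration. The sketch only assumes that a word is kept when it contains none of the letters J, U, Z, B, O, X, and that retained words are then located in a protein sequence by plain substring search.

```python
# Hypothetical sketch of the JUZBOX filtering idea; not the authors' code.
INVALID_LETTERS = frozenset("JUZBOX")  # letters treated as invalid in the abstract


def is_amino_acid_word(word: str) -> bool:
    """Return True if the word uses only ASCII letters outside J, U, Z, B, O, X."""
    letters = word.upper()
    return (
        letters.isascii()
        and letters.isalpha()
        and not (set(letters) & INVALID_LETTERS)
    )


def filter_lexicon(words):
    """Keep only the words that could be spelled with amino acid letters."""
    return [w for w in words if is_amino_acid_word(w)]


def find_words(sequence: str, lexicon, min_length: int = 3):
    """Locate lexicon words in a protein sequence by substring search.

    Returns a sorted list of (zero-based start position, word) pairs.
    min_length is an assumed cutoff to skip very short words.
    """
    seq = sequence.upper()
    hits = []
    for word in lexicon:
        w = word.upper()
        if len(w) < min_length:
            continue
        start = seq.find(w)
        while start != -1:
            hits.append((start, w))
            start = seq.find(w, start + 1)  # allow overlapping matches
    return sorted(hits)


# Made-up example data, for illustration only.
candidates = ["kinase", "protein", "binding", "zinc", "ligand", "actin"]
lexicon = filter_lexicon(candidates)
matches = find_words("MKINASELIGANDACTIN", lexicon)
```

With this made-up input, "protein", "binding", and "zinc" are dropped because they contain O, B, and Z respectively, and the remaining words are searched for in the sample sequence. A real lexicon would be far larger, so a trie or Aho-Corasick matcher would likely replace the simple loop.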
Keywords: JUZBOX; biomedical words; lexicon
Year: 2009 PMID: 20461154 PMCID: PMC2859571 DOI: 10.6026/97320630004179
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1: Flowchart illustrating the methodology (in anticlockwise direction)
Figure 2: JUZBOX sequence input
Figure 3: JUZBOX result page