Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, Sebastien Pesseat, Antony F Quinn, Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, Sarah Hunter.
Abstract
MOTIVATION: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open source code is hosted at Google Code.
Year: 2014 PMID: 24451626 PMCID: PMC3998142 DOI: 10.1093/bioinformatics/btu031
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Comparison of the processing steps used by two different member database applications, TMHMM and Pfam.
Fig. 2. Overall system architecture of InterProScan 5.
Fig. 3. Use of JMS to manage allocation of jobs across a compute resource. This figure shows the primary tier of Master JVM-spawned workers. Jobs are added to a RequestQueue by the Master JVM, and any available worker JVMs will poll this queue to request work.
Fig. 4. Portion of the graphical output from InterProScan 5. This view of a protein’s match data is the same in both the HTML and SVG formats.
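The queue-based allocation described in the Fig. 3 caption can be illustrated with a minimal sketch. This is not InterProScan code and does not use JMS: Python's in-process `queue.Queue` and threads stand in for the JMS RequestQueue and the worker JVMs. The function name `run_master`, the response queue, the stop sentinel, and the job labels are all hypothetical and chosen only for illustration.

```python
import queue
import threading


def run_master(jobs, n_workers=3):
    """Put jobs on a shared request queue and let workers pull from it.

    Assumption: threads stand in for worker JVMs, and queue.Queue stands
    in for the JMS RequestQueue named in the caption.
    """
    request_queue = queue.Queue()   # stand-in for the RequestQueue
    response_queue = queue.Queue()  # hypothetical channel for results
    stop = object()                 # hypothetical sentinel telling a worker to exit

    def worker(worker_id):
        while True:
            job = request_queue.get()  # wait until a job (or sentinel) is available
            if job is stop:
                break
            # Placeholder for running a member database analysis.
            response_queue.put((worker_id, job, "done:" + job))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()

    # The master adds jobs; whichever worker is free takes the next one.
    for job in jobs:
        request_queue.put(job)
    # One sentinel per worker so that every worker terminates.
    for _ in threads:
        request_queue.put(stop)
    for t in threads:
        t.join()

    results = []
    while not response_queue.empty():
        results.append(response_queue.get())
    return results


if __name__ == "__main__":
    for worker_id, job, status in run_master(["TMHMM:seq1", "Pfam:seq1", "Pfam:seq2"]):
        print(worker_id, job, status)
```

Which worker handles which job is not deterministic, which is the point of the pattern: the master does not assign work to a specific worker, it only publishes jobs, and any idle worker picks up the next one.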