Robert D Finn, Jody Clements, Sean R Eddy.
Abstract
HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has been available mainly as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, a multiple protein sequence alignment or a profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and on rapidly displaying tabular results, regardless of the number of matches found, and have developed graphical summaries of the search results to allow quick, intuitive appraisal.
Year: 2011 PMID: 21593126 PMCID: PMC3125773 DOI: 10.1093/nar/gkr367
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
A comparison of HMMER and BLAST programs for protein sequence analysis
| | HMMER | BLAST | Comments |
|---|---|---|---|
| Program | | | Produces similar results in terms of homolog detection. Searching with the sequence from PDB ID 2abl, chain A against PDB yields 244 matches compared with 214 matches for |
| Query | Single sequence | | |
| Target database | Sequence database | | |
| Program | | | Typically used for detection of domains on a sequence. Profile HMMs are used by the majority of protein family databases. Both are run by default as part of the |
| Query | Single sequence | | |
| Target database | Profile HMM database, e.g. Pfam | PSSM database, e.g. CDD | |
| Program | | Not applicable | There is no equivalent to |
| Query | Profile HMM | | |
| Target database | Sequence database | | |
| Program | | | Both are used to iteratively search sequence databases. Subsequent iterations use the significantly scoring sequences from the previous round as input data. |
| Query | Single sequence | | |
| Target database | Sequence database | | |
Figure 1. (a) A screenshot of the phmmer search results page, using the sequence IMDH1_HUMAN (UniProtKB) and default parameters. The results from the ‘Pfam search’ are shown as both a graphic and a table, which has been revealed in this figure. Below this are a hit distribution graph and a table of the results from the phmmer search. Key features in the table are labeled and discussed in the text. This search resulted in over 5000 sequence matches, which can be accessed either by paging through the table or by navigating with the hit distribution histogram. (b) An enlarged version of the hit distribution graph, showing the taxonomic ranges of sequences matched in the search. The tool tip indicates the E-value range represented by the bar and the number of sequences from the search that fall within that range. (c) As more and more sequences are deposited, the large comprehensive sequence databases contain increasing numbers of duplicate sequences. There is no need to show an identical sequence match several times, but the annotation assigned to these duplicate sequences can differ. Thus, a number in the results table (Figure 1a) indicates when identical sequences have not been displayed. When this number is clicked, a ‘pop-up’ displays the additional annotations for these sequences. As in this example, when there are more than 20 sequences, the list is paginated. (d) Example of an alignment between a match and a query. When the show link is clicked in the results table as shown in (a), the table is expanded to show the alignments between the query and the target sequence. The query is color coded according to the match line found below it in the alignment block (identical residues are colored red, similar residues pink). The target sequence is colored according to the posterior probability, with lighter shades of gray indicating regions where the alignment confidence is lower.
Each character in the ‘PP’ line represents the probability (or alignment accuracy) that the residue in the row above is assigned to the corresponding HMM state found in the first row of the alignment block. The posterior probability is encoded as one of 11 possible characters (0–9 and *): 0.0 ≤ P < 0.05 is coded as 0, 0.05 ≤ P < 0.15 is coded as 1, and so on; 0.85 ≤ P < 0.95 is coded as 9 and 0.95 ≤ P ≤ 1.0 is coded as ‘*’.
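The posterior probability coding described above can be sketched in a few lines of Python. This is only an illustration of the binning rule as stated in the caption, not HMMER's own implementation; the names `PP_THRESHOLDS`, `PP_CHARS`, `encode_pp` and `decode_pp` are hypothetical.

```python
from bisect import bisect_right
from typing import Tuple

# Lower bounds of the bins coded '1'..'9' and '*', as given in the caption.
# (Hypothetical names; this is not taken from the HMMER source code.)
PP_THRESHOLDS = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
PP_CHARS = "0123456789*"


def encode_pp(p: float) -> str:
    """Map a posterior probability in [0, 1] to its 'PP' line character."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    # bisect_right counts how many lower bounds are <= p, which is the bin index.
    return PP_CHARS[bisect_right(PP_THRESHOLDS, p)]


def decode_pp(ch: str) -> Tuple[float, float]:
    """Return the (low, high) probability range represented by a 'PP' character."""
    if len(ch) != 1 or ch not in PP_CHARS:
        raise ValueError("expected one of the characters 0-9 or *")
    i = PP_CHARS.index(ch)
    low = 0.0 if i == 0 else PP_THRESHOLDS[i - 1]
    high = 1.0 if i == len(PP_THRESHOLDS) else PP_THRESHOLDS[i]
    return low, high
```

For example, `"".join(encode_pp(p) for p in [0.99, 0.97, 0.6, 0.12])` gives `"**61"`, and `decode_pp("9")` returns `(0.85, 0.95)`. Note that the upper bound is exclusive for every character except ‘*’, whose range includes 1.0.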
Figure 2. After the submission of a batch job, the user is taken to a table indicating the progress of the job, similar to the one shown in this figure. The top bar indicates the progress of the batch job. As sequences are successfully searched, the results are immediately viewable. Pending sequence searches are indicated as ‘grayed out’ entries in the table.