Robert D Finn, Jaina Mistry, John Tate, Penny Coggill, Andreas Heger, Joanne E Pollington, O Luke Gavin, Prasad Gunasekaran, Goran Ceric, Kristoffer Forslund, Liisa Holm, Erik L L Sonnhammer, Sean R Eddy, Alex Bateman.
Abstract
Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
Year: 2009 PMID: 19920124 PMCID: PMC2808889 DOI: 10.1093/nar/gkp985
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Clans which have been merged between Pfam release 23.0 and Pfam release 24.0
| Clan | Description |
|---|---|
| CL0008 (DEAD-like superfamily) | All members of this clan have been moved to CL0023 (P-loop containing NTP hydrolase superfamily) |
| CL0017 (G-protein superfamily) | All members of this clan have been moved to CL0023 (P-loop containing NTP hydrolase superfamily) |
| CL0019 (Armadillo repeat superfamily) | All members of this clan have been moved to CL0020 (TPR repeat superfamily) |
| CL0024 (Reverse transcriptase superfamily) | All members of this clan have been moved to CL0027 (RNA dependent RNA polymerase superfamily) |
| CL0102 (Methyltransferase superfamily) | All members of this clan have been moved to CL0063 (FAD/NAD(P)-binding Rossmann fold superfamily) |
| CL0138 (Chemoreceptor superfamily) | All members of this clan have been moved to CL0192 (Family A G protein-coupled receptor-like superfamily) |
| CL0150 (Peptidase MX superfamily) | All members of this clan have been moved to CL0126 (Peptidase MA superfamily) |
| CL0152 (Xylose isomerase-like TIM barrel superfamily) | All members of this clan have been moved to CL0036 (Common phosphate binding-site TIM barrel superfamily) |
| CL0185 (Frizzled/OA1/CAR/Secretin receptor-like superfamily) | All members of this clan have been moved to CL0192 (Family A G protein-coupled receptor-like superfamily) |
| CL0211 (GDE-like sugar enzyme superfamily) | All members of this clan have been moved to CL0059 (Six-hairpin glycosidase superfamily) |
| CL0216 (DNA recombination protein RecA-like superfamily) | All members of this clan have been moved to CL0023 (P-loop containing NTP hydrolase superfamily) |
| CL0253 (DsbD like superfamily) | All members of this clan have been moved to CL0292 (LysE transporter superfamily) |
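
The merges listed in the table can be applied to annotations made against Pfam release 23.0 with a simple lookup. The following is a minimal sketch, not part of Pfam itself: the accessions are transcribed from the table, while the names `MERGED_CLANS` and `current_clan` are hypothetical and chosen for illustration only.

```python
# Clan merges between Pfam 23.0 and 24.0, transcribed from the table above.
# Key: clan retired in release 24.0; value: clan that absorbed its members.
MERGED_CLANS = {
    "CL0008": "CL0023",
    "CL0017": "CL0023",
    "CL0019": "CL0020",
    "CL0024": "CL0027",
    "CL0102": "CL0063",
    "CL0138": "CL0192",
    "CL0150": "CL0126",
    "CL0152": "CL0036",
    "CL0185": "CL0192",
    "CL0211": "CL0059",
    "CL0216": "CL0023",
    "CL0253": "CL0292",
}


def current_clan(accession: str) -> str:
    """Map a release 23.0 clan accession to its release 24.0 clan.

    Accessions that do not appear in the merge table are returned unchanged
    (hypothetical helper, for illustration only).
    """
    return MERGED_CLANS.get(accession, accession)
```

For example, `current_clan("CL0017")` gives `"CL0023"`, while an accession that was not merged, such as `"CL0023"` itself, is returned as-is.
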
Residue and sequence coverage of a number of complete proteomes in Pfam 24.0, with the percentage points change between Pfam releases 23.0 and 24.0 given in brackets. Archaeal species are coloured pale red, bacterial orange and eukaryotic species purple
Figure 1. Sequence search results page. Results page for a single sequence search. At the top is the graphic of the domains matched along the length of the query sequence, with any active-site or metal-binding residues marked where present. Below it come, in order, the significant matches to Pfam-A families, the insignificant matches to Pfam-A families and the significant matches to Pfam-B families. At the bottom are the expanded match results, in which the #HMM line is coloured so that residues identical to those in the query are cyan and similar residues are dark blue. A #PP (posterior probability) line gives the posterior probability at each position, and the #SEQ (query) line is colour-coded accordingly.
Figure 2. New Pfam display of a protein domain architecture. Pfam-A families classified as type ‘family’ or ‘domain’ are represented by a lozenge shape, and families of type ‘repeat’ or ‘motif’ are represented by rectangles. The alignment co-ordinates are depicted in a solid colour, and the envelope co-ordinates in a lighter shade of the same colour. Where the profile HMM match for a domain or family is only of partial length, the curved end of the lozenge/rectangle is replaced by a jagged edge. Active-site residues are marked with a lollipop with a diamond-shaped head. An example tooltip showing the domain description, co-ordinates and source is shown for the fourth domain. Note the overlapping envelopes between the fourth and fifth domains.
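
The drawing rules in this legend can be summarised as a small decision function. This is only a sketch under stated assumptions, not the Pfam website's code: the names `Glyph` and `domain_glyph`, the parameters `hmm_start`, `hmm_end` and `model_length`, and the rule that an end is jagged exactly when the match does not reach that end of the model are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    shape: str       # "lozenge" or "rectangle"
    left_edge: str   # "regular" or "jagged"
    right_edge: str  # "regular" or "jagged"


def domain_glyph(family_type: str, hmm_start: int, hmm_end: int,
                 model_length: int) -> Glyph:
    """Choose a glyph for one Pfam-A match (hypothetical helper)."""
    kind = family_type.lower()
    if kind in ("family", "domain"):
        shape = "lozenge"
    elif kind in ("repeat", "motif"):
        shape = "rectangle"
    else:
        raise ValueError(f"unknown Pfam-A type: {family_type!r}")

    # Assumption: a partial match is one that misses the first or last
    # model position; the missing side gets the jagged edge.
    left = "jagged" if hmm_start > 1 else "regular"
    right = "jagged" if hmm_end < model_length else "regular"
    return Glyph(shape, left, right)
```

Under these assumptions, a ‘domain’ match covering model positions 1 to 80 of a 100-position profile HMM would be drawn as a lozenge with a regular left end and a jagged right end.
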
Figure 3. New alignment confidence display. The colour of each residue reflects the alignment uncertainty and is based on the posterior probability calculated by HMMER3. A green residue indicates a high posterior probability, meaning that the alignment of the amino acid to the match/insert state in the profile HMM is very likely to be correct. Where the posterior probability is lower, and the alignment is therefore less certain, the colour moves closer to red. This allows users to quickly identify regions of the alignment where some sequences are aligned with less certainty.
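
The green-to-red colouring described in this legend can be illustrated with a short sketch. HMMER3 writes posterior probabilities in its PP line as the characters `0` to `9` and `*`, where `*` marks the highest-confidence bin. The decoding below approximates each bin by a single value, and the linear red-to-green interpolation is an assumption: the exact palette used by the Pfam website is not given here. The function names are hypothetical.

```python
from typing import Optional, Tuple


def pp_to_probability(symbol: str) -> Optional[float]:
    """Approximate the posterior probability encoded by one PP character.

    '*' is treated as 1.0 and a digit d as d / 10 (a coarse approximation
    of HMMER3's probability bins). Any other character, such as the '.'
    used where no residue is aligned, yields None.
    """
    if symbol == "*":
        return 1.0
    if len(symbol) == 1 and symbol in "0123456789":
        return int(symbol) / 10
    return None


def probability_to_rgb(p: float) -> Tuple[int, int, int]:
    """Interpolate linearly from red (p = 0) to green (p = 1).

    Assumed colour scheme, for illustration only.
    """
    p = min(max(p, 0.0), 1.0)
    return (round(255 * (1 - p)), round(255 * p), 0)


def colour_pp_line(pp_line: str):
    """Return one RGB tuple (or None for gaps) per PP character."""
    colours = []
    for symbol in pp_line:
        p = pp_to_probability(symbol)
        colours.append(None if p is None else probability_to_rgb(p))
    return colours
```

With this scheme a `*` position is drawn in pure green and a `0` position in pure red, with intermediate characters falling between the two.
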
Figure 4. New BioLit/TOPSAN views. Left: using the web services provided by BioLit, we display the abstract, figures and figure legends from the publication associated with a particular PDB entry (only where articles are published in open access journals). In this case, we have retrieved open access articles that reference the PDB entry 1dan. Right: using the web services provided by TOPSAN, we display images and text from the TOPSAN wiki, together with a link that lets users contribute to the wiki. In this example, we show the information contained in TOPSAN describing PDB entry 1kq3.