Sara El-Gebali, Jaina Mistry, Alex Bateman, Sean R Eddy, Aurélien Luciani, Simon C Potter, Matloob Qureshi, Lorna J Richardson, Gustavo A Salazar, Alfredo Smart, Erik L L Sonnhammer, Layla Hirsh, Lisanna Paladin, Damiano Piovesan, Silvio C E Tosatto, Robert D Finn.
Abstract
The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of their domain boundaries, their classification into Pfam clans, and their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison with a structural classification database, the Evolutionary Classification of Protein Domains (ECOD), which led to the creation of 825 new families based on ECOD's set of uncharacterized families (EUFs). Furthermore, we connected Pfam entries to the Sequence Ontology (SO) by mapping the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled linking of the authorship of all Pfam entries to the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link it to their ORCID record.
Year: 2019 PMID: 30357350 PMCID: PMC6324024 DOI: 10.1093/nar/gky995
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The modification of the domain structure of PDB:4z7k (UniProt: A4FXZ3) between Pfam releases 31.0 and 32.0. The structure of PDB:4z7k is represented as a ribbon cartoon of the Cα backbone, with PF17262 coloured pink, the new Pfam entry PF17955 coloured blue, and regions not covered by Pfam coloured grey. In release 31.0 (left panel), the domain boundaries for PF17262 (pink, PDB residues 67–217) extend into the N-terminal structural domain. The right panel shows the coverage of the same structure by Pfam 32.0. The boundaries of the C-terminal domain, PF17262 (PDB residues 107–218), have been corrected and the entry renamed from DUF5328 to Cas6b_C. A new Pfam entry, PF17955, named Cas6b_N, was created (blue, PDB residues 1–105) to represent the N-terminal domain.
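The boundary correction described in the caption can be checked with simple interval arithmetic. The sketch below is illustrative only: it uses the residue ranges quoted in the caption, assumes that the range of the new PF17955 entry (residues 1 to 105) is a reasonable stand-in for the N-terminal structural domain, and the helper names `overlap` and `length` are hypothetical rather than part of any Pfam tool.

```python
from typing import Optional, Tuple

# Inclusive (start, end) residue range in PDB numbering for PDB:4z7k.
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the inclusive overlap of two residue ranges, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


def length(iv: Interval) -> int:
    """Number of residues in an inclusive range."""
    return iv[1] - iv[0] + 1


# Residue ranges taken from the figure caption.
PF17262_R31 = (67, 217)   # PF17262 in release 31.0 (then named DUF5328)
PF17262_R32 = (107, 218)  # PF17262 in release 32.0 (Cas6b_C)
PF17955_R32 = (1, 105)    # PF17955 in release 32.0 (Cas6b_N)

# Assumption: the PF17955 range approximates the N-terminal structural domain.
old_overlap = overlap(PF17262_R31, PF17955_R32)
new_overlap = overlap(PF17262_R32, PF17955_R32)

if old_overlap is not None:
    print("Release 31.0 overlap:", old_overlap, "residues:", length(old_overlap))
print("Release 32.0 overlap:", new_overlap)
```

Under that assumption, the release 31.0 boundaries of PF17262 intrude on the N-terminal region, while the two release 32.0 entries do not overlap.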
SO terms that have been added in Pfam 32.0 for each Pfam type
| Type | SO id | SO name |
|---|---|---|
| Coiled-coil | SO:0001080 | coiled_coil |
| Disordered | SO:0100003 | intrinsically_unstructured_polypeptide_region |
| Domain | SO:0000417 | polypeptide_domain |
| Family | SO:0100021 | polypeptide_conserved_region |
| Motif | SO:0001067 | polypeptide_motif |
| Repeat | SO:0001068 | polypeptide_repeat |
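
The mapping in the table can be held in a small lookup structure. The following is a minimal sketch, not Pfam's own implementation: the identifiers and names are transcribed from the table, while the dictionary name `PFAM_TYPE_TO_SO` and the helper `so_term_for` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Pfam type -> (SO id, SO name), transcribed from the table above.
PFAM_TYPE_TO_SO: Dict[str, Tuple[str, str]] = {
    "Coiled-coil": ("SO:0001080", "coiled_coil"),
    "Disordered": ("SO:0100003", "intrinsically_unstructured_polypeptide_region"),
    "Domain": ("SO:0000417", "polypeptide_domain"),
    "Family": ("SO:0100021", "polypeptide_conserved_region"),
    "Motif": ("SO:0001067", "polypeptide_motif"),
    "Repeat": ("SO:0001068", "polypeptide_repeat"),
}


def so_term_for(pfam_type: str) -> Tuple[str, str]:
    """Look up the SO term for a Pfam type, ignoring case and outer whitespace.

    Hypothetical helper; raises KeyError for a type not listed in the table.
    """
    key = pfam_type.strip().lower()
    for name, term in PFAM_TYPE_TO_SO.items():
        if name.lower() == key:
            return term
    raise KeyError(f"No SO term listed for Pfam type {pfam_type!r}")


so_id, so_name = so_term_for("Domain")
print(so_id, so_name)
```

A plain dictionary is sufficient here because the table lists only six Pfam types, each with exactly one SO term.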