Zachary Chiang, Ake Vastermark, Marco Punta, Penelope C Coggill, Jaina Mistry, Robert D Finn, Milton H Saier.
Abstract
Transport systems comprise roughly 10% of all proteins in a cell, playing critical roles in many processes. Improving and expanding their classification is an important goal that can affect studies ranging from comparative genomics to potential drug target searches. It is not surprising that different classification systems for transport proteins have arisen, be it within a specialized database, focused on this functional class of proteins, or as part of a broader classification system for all proteins. Two such databases are the Transporter Classification Database (TCDB) and the Protein family (Pfam) database. As part of a long-term endeavor to improve consistency between the two classification systems, we have compared transporter annotations in the two databases to understand the rationale for differences and to improve both systems. Differences sometimes reflect the fact that one database has a particular transporter family while the other does not. Differing family definitions and hierarchical organizations were reconciled, resulting in recognition of 69 Pfam 'Domains of Unknown Function', which proved to be transport protein families to be renamed using TCDB annotations. Of over 400 potential new Pfam families identified from TCDB, 10% have already been added to Pfam, and TCDB has created 60 new entries based on Pfam data. This work, for the first time, reveals the benefits of comprehensive database comparisons and explains the differences between Pfam and TCDB.
Keywords: Pfam; TCDB; data integration; transport protein classification
Year: 2015 PMID: 25614388 PMCID: PMC4570203 DOI: 10.1093/bib/bbu053
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Graphical overview of the mapping of the Pfam DMT clan to TCDB’s DMT superfamily. (A) Each segment in the circle represents one of the 20 Pfam families found in the DMT clan (release 27.0). To the left of the circle is the full list of accession numbers for families in this clan. The first segment at 12 o’clock (with black background) corresponds to the first family in the list starting from the top (i.e. PF00892). PF00892 maps to 18 subfamilies in the DMT superfamily in TCDB, hence the number 18 reported in this segment. Moving clockwise from PF00892, we find all other families in the list, from PF01545 (mapping to eight subfamilies in TCDB) to PF08627 (mapping to a single TCDB subfamily). Different shades of gray are meant to help identify the families. The diamond-shaped lollipops indicate Pfam families that have been used to create new subfamilies in the TCDB DMT superfamily: PF04342 was used to create subfamily 2.A.7.34 and PF10639 to create subfamilies 2.A.7.32 and 2.A.7.33. The pin-shaped lollipops represent families for which TCDB is considering creating hyperlinks to the DMT superfamily (Zip (PF02535) and Cation efflux (PF01545), see main text). (B) Each segment in the circle represents one of the 20 Pfam families found in the DMT clan. The order of the families is the same as outlined in (A); however, segments have sizes that correspond to the percentage of DMT domains found in that family (of the total number of domains for the whole DMT clan). For example, family PF00892 comprises about 66 000 domains, or almost 58% of the clan’s domains.
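The proportions described for panel (B) can be illustrated with a short calculation. The following is a minimal Python sketch, not the authors' plotting code: it uses only the figures quoted in the caption (about 66 000 domains and almost 58% for PF00892), and the function names `implied_clan_total` and `segment_angle` are hypothetical. Because both caption values are rounded, the results are approximate.

```python
# Values quoted in the Figure 1 caption; the other clan families are omitted
# because the caption does not give their numbers.
tcdb_subfamily_counts = {"PF00892": 18, "PF01545": 8, "PF08627": 1}
new_tcdb_subfamilies = {
    "PF04342": ["2.A.7.34"],
    "PF10639": ["2.A.7.32", "2.A.7.33"],
}


def implied_clan_total(family_domains: int, family_share: float) -> int:
    """Estimate the clan-wide domain count from one family's count and share."""
    return round(family_domains / family_share)


def segment_angle(family_share: float) -> float:
    """Angle in degrees of a circle segment whose size is proportional to the share."""
    return 360.0 * family_share


# PF00892: about 66 000 domains, almost 58% of the clan (both rounded in the caption).
approx_total = implied_clan_total(66_000, 0.58)
pf00892_angle = segment_angle(0.58)
print(f"Implied DMT clan total: ~{approx_total} domains")
print(f"PF00892 segment in panel (B): ~{pf00892_angle:.1f} degrees")
```

Under these rounded inputs the clan would hold somewhat under 115 000 domains, and PF00892 alone would occupy more than half of the circle in panel (B).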
Figure 2. Complex relationships between families in TCDB and Pfam. (A) The same domain in Pfam, the 2 TMS repeat unit Mito_carr (PF00153), covers sequences that are parts of functionally diverse subfamilies in TCDB: 2.A.29.1, three copies of Mito_carr; 2.A.29.2, also three copies of Mito_carr, etc. (B) The same domain in Pfam, CBS (PF00571) (soluble), is found in systems belonging to two different TCDB classes (primary active and secondary carriers): 3.A.1.12.1 {on the left: three components, [OpuAC (PF04069)—soluble], [ABC_tran (PF00005)—soluble and CBS (PF00571)—soluble], [BPD_transp_1 (PF00528)—membrane-inserted]} and 2.A.49.1.1 {on the right: one component, [Voltage_CLC (PF00654)—multispanning membrane-inserted] and [CBS (PF00571)—soluble]}. (C) The Cytochrome Ba3 oxidase three-component system is represented by a single entry in TCDB (3.D.4.2.1), while in Pfam each of its three constituent chains maps to one or more families annotated as evolutionarily unrelated. From left to right: CoxIIa (PF08113), COX2-transmemb (PF09125) and COX2 (PF00116, soluble), COX1 (PF00115). (D) A single-component system in TCDB (VIC superfamily member, 1.A.1.2.3) maps to multiple domains in Pfam: Shal-type (PF11601, N-terminal), BTB_2 (PF02214), Ion_trans (PF00520, membrane-inserted) and DUF3399 (PF11879, C-terminal); only Ion_trans is a transmembrane domain.
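The many-to-many relationships in panels (B) to (D) can be made concrete with a small lookup table. The sketch below is illustrative only and is not taken from either database's tooling: it encodes just the TCDB entries and Pfam accessions named in the caption, and the helper names `classes_per_domain` and `domains_spanning_classes` are hypothetical. It relies on the TC numbering convention in which the first component of an identifier is the TCDB class.

```python
from collections import defaultdict

# TCDB system -> Pfam domains, exactly as listed in the Figure 2 caption.
tcdb_to_pfam = {
    "3.A.1.12.1": ["PF04069", "PF00005", "PF00571", "PF00528"],  # panel B, left
    "2.A.49.1.1": ["PF00654", "PF00571"],                        # panel B, right
    "3.D.4.2.1": ["PF08113", "PF09125", "PF00116", "PF00115"],   # panel C
    "1.A.1.2.3": ["PF11601", "PF02214", "PF00520", "PF11879"],   # panel D
}


def classes_per_domain(mapping):
    """Invert the table: Pfam accession -> set of TCDB classes it occurs in."""
    classes = defaultdict(set)
    for tcdb_id, pfam_ids in mapping.items():
        tcdb_class = tcdb_id.split(".")[0]  # first TC component is the class
        for pfam_id in pfam_ids:
            classes[pfam_id].add(tcdb_class)
    return dict(classes)


def domains_spanning_classes(mapping):
    """Keep only Pfam domains that occur in more than one TCDB class."""
    return {
        pfam_id: tcdb_classes
        for pfam_id, tcdb_classes in classes_per_domain(mapping).items()
        if len(tcdb_classes) > 1
    }


spanning = domains_spanning_classes(tcdb_to_pfam)
for pfam_id, tcdb_classes in spanning.items():
    print(pfam_id, "occurs in TCDB classes", sorted(tcdb_classes))
```

With only the caption's entries loaded, the CBS domain (PF00571) is the single accession that spans two TCDB classes, which is the situation panel (B) depicts.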