January Weiner1, Andrew D Moore, Erich Bornberg-Bauer.
Abstract
BACKGROUND: Creating new protein domain arrangements is a frequent mechanism of evolutionary innovation. While some domains always form the same combinations, others form many different arrangements. This ability, which is often referred to as versatility or promiscuity of domains, [...] a random evolutionary model in which a domain's promiscuity is based on its relative frequency of domains.
Year: 2008 PMID: 18854028 PMCID: PMC2588589 DOI: 10.1186/1471-2148-8-285
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Different measures used for assessing domain combination tendencies
| Measure | Abbr. | Description | Reference |
|---|---|---|---|
| Co-occurrence | NCO | Number of domains that are found at least once in the same proteins as the given domain | [ |
| Number of neighbours | NN | Number of direct neighbours found for a given domain | [ |
| Number of triplets | NTRP | For a given domain | [ |
| Weighted bigram frequency index | π | See original paper for exact definition | [ |
| Domain versatility index | DVI | Strength of the relationship between the number of occurrences and the number of neighbours | this study |
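The count-based measures in the table can be illustrated with a minimal Python sketch. It assumes each protein is given as an ordered list of domain identifiers. The function name `domain_measures`, the input layout, and the choice to ignore a domain's own repeats as neighbours are assumptions made for illustration, not details taken from the paper. The number of triplets is left out because its definition is truncated in the table.

```python
from collections import defaultdict


def domain_measures(proteins):
    """Return N, NCO and NN for every domain in a set of proteins.

    `proteins` is a list of domain arrangements, each an ordered list of
    domain identifiers (hypothetical input format for this sketch).
    """
    occurrences = defaultdict(int)   # N: total occurrences of the domain
    co_occurring = defaultdict(set)  # distinct domains sharing a protein
    neighbours = defaultdict(set)    # distinct directly adjacent domains

    for arrangement in proteins:
        distinct = set(arrangement)
        for i, domain in enumerate(arrangement):
            occurrences[domain] += 1
            co_occurring[domain] |= distinct - {domain}
            # Assumption: an adjacent copy of the same domain (a repeat)
            # is not counted as a neighbour.
            if i > 0 and arrangement[i - 1] != domain:
                neighbours[domain].add(arrangement[i - 1])
            if i + 1 < len(arrangement) and arrangement[i + 1] != domain:
                neighbours[domain].add(arrangement[i + 1])

    return {
        d: {"N": occurrences[d],
            "NCO": len(co_occurring[d]),
            "NN": len(neighbours[d])}
        for d in occurrences
    }


if __name__ == "__main__":
    # Three toy proteins with made-up domain names.
    toy = [["A", "B", "C"], ["A", "D"], ["C"]]
    for name, values in sorted(domain_measures(toy).items()):
        print(name, values)
```

In the toy data, domain A co-occurs with three domains (B, C, D) but has only two direct neighbours (B, D), which shows how NCO and NN can differ for the same domain.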
Figure 1. Comparing different measures of domain promiscuity. Comparison of the different measures of versatility showing that they are correlated with the number of occurrences of a domain. Data were obtained from Pfam (for details refer to Methods). Each point represents a different domain. Left, correlation with the number of occurrences of a domain. Right, correlation with the number of immediate neighbours. N – number of occurrences, NCO – co-occurrences, NN – number of direct neighbours, NTRP – number of triplets. Spearman rank correlation coefficients between the different measures are given in the respective panels.
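The Spearman rank correlation mentioned in the caption can be sketched in plain Python as the Pearson correlation of average ranks. This is a generic textbook formulation rather than the authors' code; the function names and the toy numbers are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up occurrence (N) and neighbour (NN) counts for five domains.
    n_counts = [3, 10, 25, 60, 200]
    nn_counts = [1, 4, 4, 12, 30]
    print(round(spearman(n_counts, nn_counts), 3))
```

Because only ranks enter the calculation, any monotonically increasing relationship between two measures gives a coefficient of 1, regardless of its shape.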
Figure 2. The relationship between N and NN for selected examples. A) Correlation between the number of occurrences (N) and number of neighbours (NN) for the methyltransferase domain (PF08241) and the Sushi domain (PF00084) (corrected for repeats, see Methods, DVI calculation). Each data point corresponds to the number of occurrences and the number of neighbours that a domain has in one genome. B) Correlation between the number of occurrences (N) and number of neighbours (NN) for selected domains. Each data point corresponds to the number of occurrences and the number of neighbours that a domain has in one genome. Domain ID, description and DVI are given in the upper left corner of the respective graph. For a definition of DVI, see section "The domain versatility index".
Figure 3. Exemplary calculation of the DVI. Sets of proteins belonging to two distinct genomes are indicated as strings of domains represented by boxes in the top left. The occurrence of two exemplary domains, A and B, is displayed in the table, along with two measures of domain promiscuity. N denotes the total occurrence, NN the total number of direct neighbours and NCO the total number of co-occurrences for a given domain in its respective genome. Grey shaded fields within the NN and NCO fields indicate the specific domains that yield the respective values. In essence, the DVI represents the strength of the relationship between N and NN, indicated by the graph to the right. Each line represents a domain as indicated by associated boxes. The slope for the two domains, A and B, signifies the DVI. The desired unlinking of the versatility measurement from the total occurrence is clearly illustrated: despite the overall lower occurrence of domain B, it tends to form new combinations more readily, as indicated by the steeper slope in the relationship between N and NN.
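The idea of reading versatility off a slope can be sketched as an ordinary least-squares fit over per-genome (N, NN) pairs, returning the slope together with its standard error. This is a minimal illustration only: the function name `regression_slope`, the `log_transform` option, and the toy data are assumptions, and this section does not state whether the authors regress on raw or transformed counts.

```python
import math


def regression_slope(points, log_transform=False):
    """Least-squares slope of NN on N and the standard error of the slope.

    `points` holds one (N, NN) pair per genome. `log_transform` is a
    hypothetical option of this sketch; the transformation used in the
    paper is not given in this section.
    """
    if log_transform:
        points = [(math.log(n), math.log(nn))
                  for n, nn in points if n > 0 and nn > 0]
    k = len(points)
    if k < 3:
        raise ValueError("need at least three genomes for slope and error")

    mean_x = sum(x for x, _ in points) / k
    mean_y = sum(y for _, y in points) / k
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all N values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares with k - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    std_err = math.sqrt(rss / (k - 2) / sxx)
    return slope, std_err


if __name__ == "__main__":
    # Made-up per-genome counts: A is frequent but gains neighbours slowly,
    # B is rarer but gains a new neighbour with almost every occurrence.
    domain_a = [(4, 1), (8, 2), (12, 3), (16, 4)]
    domain_b = [(1, 1), (2, 2), (3, 3), (4, 4)]
    print("A:", regression_slope(domain_a))
    print("B:", regression_slope(domain_b))
```

With these toy values, domain B has the steeper slope despite its lower occurrence, which mirrors the situation the figure describes.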
Domain Versatility Indices (DVI) for 30 selected Pfam A domains. DVI, the domain versatility index. Err (±), the calculated error of the regression coefficient. Description, description as taken from the Pfam database.
| Domains with a low DVI | | | |
|---|---|---|---|
| Domain | DVI | ± | Description |
| PF02861.9 | 0.231 | 0.015 | Clp_N; Clp amino terminal domain |
| PF06815.2 | 0.236 | 0.014 | RVT_connect; Reverse transcriptase connection domain |
| PF00353.9 | 0.244 | 0.022 | HemolysinCabind; Hemolysin-type calcium-binding repeat (2 copies) |
| PF00030.8 | 0.269 | 0.021 | Crystall; Beta/Gamma crystallin |
| PF03130.5 | 0.275 | 0.020 | HEAT_PBS; PBS lyase HEAT-like repeat |
| PF00009.15 | 0.276 | 0.006 | GTP_EFTU; Elongation factor Tu GTP binding domain |
| PF00402.7 | 0.282 | 0.020 | Calponin; Calponin family repeat |
| PF00954.11 | 0.288 | 0.028 | S_locus_glycop; S-locus glycoprotein family |
| PF00228.9 | 0.296 | 0.018 | Bowman-Birk_leg; Bowman-Birk serine protease inhibitor family |
| PF02012.9 | 0.306 | 0.030 | BNR; BNR/Asp-box repeat |
| Domains with a medium DVI | | | |
| Domain | DVI | ± | Description |
| PF02362.12 | 0.626 | 0.014 | B3; B3 DNA binding domain |
| PF01825.10 | 0.627 | 0.035 | GPS; Latrophilin/CL-1-like GPS domain |
| PF07721.4 | 0.627 | 0.025 | TPR_4; Tetratricopeptide repeat |
| PF01302.13 | 0.628 | 0.031 | CAP_GLY; CAP-Gly domain |
| PF00176.12 | 0.630 | 0.009 | SNF2_N; SNF2 family N-terminal domain |
| PF00567.14 | 0.631 | 0.039 | TUDOR; Tudor domain |
| PF01390.8 | 0.631 | 0.044 | SEA; SEA domain |
| PF07686.5 | 0.632 | 0.016 | V-set; Immunoglobulin V-set domain |
| PF00067.11 | 0.633 | 0.012 | p450; Cytochrome P450 |
| PF00165.11 | 0.633 | 0.009 | HTH_AraC; Bacterial regulatory helix-turn-helix proteins, AraC family |
| PF00249.19 | 0.635 | 0.014 | Myb_DNA-binding; Myb-like DNA-binding domain |
| Domains with a high DVI | | | |
| Domain | DVI | ± | Description |
| PF00004.18 | 0.826 | 0.006 | AAA; ATPase family associated with various cellular activities (AAA) |
| PF00583.13 | 0.827 | 0.006 | Acetyltransf_1; Acetyltransferase (GNAT) family |
| PF00250.7 | 0.828 | 0.011 | Fork_head; Fork head domain |
| PF01926.11 | 0.828 | 0.011 | MMR_HSR1; GTPase of unknown function |
| PF00001.11 | 0.830 | 0.011 | 7tm_1; 7 transmembrane receptor (rhodopsin family) |
| PF01464.9 | 0.846 | 0.025 | SLT; Transglycosylase SLT domain |
| PF04055.9 | 0.857 | 0.006 | Radical_SAM; Radical SAM superfamily |
| PF00496.11 | 0.858 | 0.015 | SBP_bac_5; Bacterial extracellular solute-binding proteins, family 5 Middle |
| PF02393.6 | 0.872 | 0.040 | US22; US22 like |
| PF08241.1 | 0.911 | 0.007 | Methyltransf_11; Methyltransferase domain |
Figure 4. The relationship between the DVI, domain position and domain age. Left: Domain age and the DVI. OLD – domains that are common to all three main branches of life (Bacteria, Archaea, Eukaryota); MID – domains that are present in all taxa of one of these branches (e.g. domains that can be found only in Bacteria, but not in Archaea or Eukaryota); NEW – domains that are present only in one subgroup of one of these branches (e.g. domains that occur only in vertebrates). Right: DVI and position of the domain within the protein. NTERM – N-terminal domains; NTERM1 – next-to N-terminal domains in proteins with four domains or more; CTERM – C-terminal domains; CTERM1 – next-to C-terminal domains in proteins with four domains or more; MID – all remaining (non-terminal) domains; SINGLE – domains in single-domain proteins. On the y axis, domain versatility index (DVI). Bold line denotes the median; boxes denote the first and second quartiles; whiskers show the minimum and maximum values not including outliers.
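The position categories in the right-hand panel can be expressed as a short labelling function. The sketch below follows the definitions in the caption; the function name `classify_positions` and the input format (an ordered list of domain identifiers per protein) are assumptions for illustration.

```python
def classify_positions(arrangement):
    """Label each domain in one protein by its position category.

    Categories follow the figure caption: SINGLE, NTERM, NTERM1, MID,
    CTERM1 and CTERM. NTERM1 and CTERM1 are only assigned in proteins
    with four or more domains.
    """
    n = len(arrangement)
    if n == 0:
        return []
    if n == 1:
        return ["SINGLE"]

    labels = ["MID"] * n  # default for all remaining non-terminal domains
    labels[0] = "NTERM"
    labels[-1] = "CTERM"
    if n >= 4:
        labels[1] = "NTERM1"
        labels[-2] = "CTERM1"
    return labels


if __name__ == "__main__":
    # Made-up five-domain protein.
    protein = ["d1", "d2", "d3", "d4", "d5"]
    for domain, label in zip(protein, classify_positions(protein)):
        print(domain, label)
```

Note that in a protein with exactly four domains no domain is labelled MID, and in a three-domain protein the middle domain is MID because the NTERM1 and CTERM1 categories require at least four domains.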