Sankaran Sandhya, Saane Sudha Rani, Barah Pankaj, Madabosse Kande Govind, Bernard Offmann, Narayanaswamy Srinivasan, Ramanathan Sowdhamini.
Abstract
BACKGROUND: Related protein domains of a superfamily can be specified by proteins of diverse lengths. The structural and functional implications of indels in a domain scaffold have been examined.
Year: 2009 PMID: 19333395 PMCID: PMC2659687 DOI: 10.1371/journal.pone.0004981
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distributions of domain length variations in members of the 353 multi-membered PASS2 domain superfamilies.
For every member, the degree of length variation was calculated as a ratio: the difference between the member's length and the mean domain size of its superfamily, divided by that mean domain size.
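The ratio described in this legend can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `length_deviation` and the example lengths are hypothetical, and because the legend does not say whether the difference is signed or absolute, a signed value is assumed.

```python
from statistics import mean


def length_deviation(lengths):
    """Return (length - mean) / mean for each member of one superfamily.

    Assumption: the difference is signed, so positive values mark members
    longer than the superfamily mean and negative values mark shorter ones.
    """
    if not lengths:
        raise ValueError("need at least one domain length")
    avg = mean(lengths)
    return [(n - avg) / avg for n in lengths]


# Hypothetical domain lengths (in residues) for a single superfamily
example = [70, 100, 130]
print([round(d, 2) for d in length_deviation(example)])
```

Multiplying each value by 100 gives the percentage length variation used in the classification thresholds.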
List of ‘length-deviant’ domain superfamilies.
| S.No | Superfamily (SCOP superfamily code) | PASS2 members | Average domain size | Standard deviation | No. of SCOP families | Superfold |
| | | 26 | 197 | 38.5 | 21 | Y |
| | | 14 | 71 | 15 | 1 (38) | Y |
| | | 5 | 100 | 33.4 | 6 | Y |
| | | 6 | 166 | 51.1 | 2 (32) | - |
| | | 10 | 99 | 14.2 | 4 (62) | - |
| | | 13 | 99 | 13.7 | 2 (23) | Y |
| | | 39 | 112 | 37.9 | 14 | Y |
| | | 30 | 225 | 34.8 | 4 (99) | - |
| | | 7 | 119 | 28 | 3 (19) | - |
| | | 5 | 127 | 38.6 | 10 | - |
| | | 12 | 78 | 15.9 | 5 | - |
| | | 6 | 38 | 8.3 | 1 (10) | - |
| | | 5 | 85 | 20.5 | 2 (11) | - |
| | | 42 | 122 | 31.4 | 20 | Y |
| | | 46 | 360 | 67.4 | 14 | Y |
| | | 5 | 341 | 122.5 | 6 | Y |
| | | 6 | 240 | 55.2 | 6 | - |
| | | 42 | 109 | 34.7 | 22 | - |
| | | 63 | 221 | 49 | 24 | Y |
| | | 12 | 234 | 59.7 | 5 | - |
| | | 7 | 153 | 34 | 5 | - |
| | | 21 | 238 | 38.9 | 52 | - |
| | | 13 | 251 | 37 | 15 | - |
| | | 39 | 354 | 97.4 | 35 | Y |
| | | 7 | 400 | 119.9 | 4 (14) | - |
| | | 13 | 376 | 79.4 | 1 (15) | - |
| | | 15 | 255 | 55.7 | 2 (69) | Y |
| | | 12 | 193 | 54.4 | 2 (22) | - |
| | | 8 | 176 | 49.7 | 1 (16) | - |
| | | 9 | 278 | 111.4 | 16 | - |
| | | 11 | 134 | 31.7 | 12 | - |
| | | 11 | 101 | 16.3 | 7 | - |
| | | 13 | 143 | 45.8 | 1 (12) | - |
| | | 6 | 95 | 19.9 | 2 (12) | - |
| | | 8 | 95 | 36.7 | 6 | Y |
| | | 7 | 155 | 38.4 | 4 (15) | - |
| | | 22 | 120 | 19.7 | 8 | - |
| | | 10 | 194 | 57.3 | 9 | Y |
| | | 12 | 259 | 119.7 | 5 | Y |
| | | 22 | 142 | 23.2 | 3 (38) | Y |
| | | 35 | 125 | 40.4 | 11 | - |
| | | 6 | 76 | 22.9 | 2 (8) | Y |
| | | 6 | 308 | 71.3 | 4 (10) | - |
| | | 9 | 369 | 222.8 | 22 | Y |
| | | 9 | 202 | 92 | 4 (23) | Y |
| | | 7 | 136 | 28.8 | 3 (11) | - |
| | | 7 | 184 | 37.7 | 7 | - |
| | | 32 | 146 | 33.7 | 7 | - |
| | | 31 | 227 | 73 | 9 | - |
| | | 22 | 101 | 24.3 | 8 | - |
| | | 6 | 191 | 86.7 | 12 | - |
| | | 20 | 313 | 187 | 3 (4) | - |
| | | 8 | 243 | 112.6 | 20 | Y |
| | | 14 | 194 | 37 | 2 (39) | - |
| | | 7 | 205 | 36.0 | 13 | |
| | | 32 | 64 | 13.6 | 17 | Y |
| | | 6 | 92 | 20.6 | 3 (13) | Y |
| | | 5 | 90 | 20.4 | 7 | - |
| | | 12 | 88 | 29.7 | 4 (38) | - |
| | | 5 | 75 | 40.4 | 8 | - |
| | | 48 | 88 | 20.7 | 68 | Y |
| | | 49 | 183 | 39.7 | 12 | Y |
| | | 5 | 215 | 40.6 | 4 (6) | - |
| | | 9 | 187 | 60.9 | 1 (32) | - |
Standard deviations are good indicators of the extent of domain length variation but were not the sole criterion used to classify superfamilies as ‘length-deviant’ or ‘length-rigid’. The number of members in each superfamily that showed <10% or >30% length variation was also considered, and classifications were based on trends in length variation for at least 75% of member proteins. The SCOP code of each domain superfamily is given in brackets after the superfamily name. The number of SCOP families is given in the last-but-one column; when there are fewer than 5 families, the number of structural protein domains is also given in brackets. The superfold status of the domain is given in the last column.
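One possible reading of these criteria is sketched below. The 10%, 30% and 75% thresholds come from the note above, but how they were combined is not stated, so the decision rule here is an assumption: a superfamily is called length-rigid when at least 75% of members deviate by less than 10% from the mean domain size, and length-deviant when at least 75% deviate by more than 30%. The function name `classify_superfamily` and the example lengths are hypothetical.

```python
from statistics import mean


def classify_superfamily(lengths, rigid_cut=0.10, deviant_cut=0.30, majority=0.75):
    """Label a superfamily from its member domain lengths (in residues).

    Assumed rule (not confirmed by the source): the label follows the
    behaviour of at least `majority` of the members.
    """
    avg = mean(lengths)
    # Absolute fractional deviation of each member from the superfamily mean
    devs = [abs(n - avg) / avg for n in lengths]
    frac_rigid = sum(d < rigid_cut for d in devs) / len(devs)
    frac_deviant = sum(d > deviant_cut for d in devs) / len(devs)
    if frac_rigid >= majority:
        return "length-rigid"
    if frac_deviant >= majority:
        return "length-deviant"
    return "unclassified"


# Hypothetical superfamilies
print(classify_superfamily([100, 102, 98, 101]))   # members cluster near the mean
print(classify_superfamily([50, 50, 150, 150]))    # members lie far from the mean
```

Superfamilies that meet neither condition are left unclassified in this sketch; the source does not say how such cases were handled.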
List of length-rigid domain superfamilies (with at least 4 members).
| S.No | Superfamily | Average domain size | Average sequence identity (%) | Standard deviation | No. of families |
| | | 417 | 21 | 31.4 | 1 (21) |
| | | 323 | 14 | 24.6 | 5 |
| | | 250 | 25 | 14.8 | 1 (32) |
| | | 204 | 23 | 18.4 | 6 |
| | | 114 | 26 | 9.6 | 1 (9) |
| | | 145 | 7 | 7.0 | 1 (13) |
| | | 135 | 29 | 3.2 | 3 (13) |
| | | 133 | 24 | 7.4 | 2 (20) |
| | | 118 | 22 | 5.0 | 2 (2) |
| | | 94 | 29 | 4.9 | 1 (2) |
| | | 75 | 33 | 4.6 | 5 |
| | | 474 | 23 | 32.6 | 2 (15) |
| | | 299 | 15 | 18.6 | 8 |
| | | 254 | 23 | 20.1 | 1 (6) |
| | | 239 | 22 | 21.2 | 11 |
| | | 253 | 30 | 8.2 | 2 (17) |
| | | 167 | 30 | 8.0 | 2 (33) |
| | | 111 | 36 | 5.1 | 1 (14) |
| | | 151 | 36 | 10.7 | 4 (41) |
| | | 124 | 22 | 6.6 | 2 (11) |
| | | 87 | 29 | 7.4 | 4 (73) |
| | | 70 | 36 | 2.5 | 1 (8) |
| | | 70 | 38 | 8.9 | 1 (24) |
| | | 67 | 33 | 5.2 | 3 (15) |
Domain contexts in top-10 length-deviant protein domain superfamilies.
| S.No | Superfamily | Single chain: single domain | Single chain: multi domain | Single chain: repeats | Multi-chain: single domain | Multi-chain: multi domain | Multi-chain: repeats | Domain repeats | Oligomers |
| 1 | Cytochrome C | 17 | 4 | - | 5 | 1 | 6 | Y | Y |
| Domain generally specified in a single chain or in separate chains. In multi-chain proteins, other non-self domains may be specified by individual chains. |
| 2 | SAM like domain | 25 | 3 | | 15 | 3 | - | N | Y |
| Some members are multi-domain proteins and involve multiple chains. Repeats are not observed in this superfamily. |
| 3 | 6-phosphogluconate dehydrogenase C-terminal like | 1 | 5 | - | 5 | 1 | 1 | Y | Y |
| Usually a two-domain protein that involves the NADP-binding Rossmann fold. Members of the Hydroxyacyl-CoA dehydrogenase protein family contain repeating copies of the structural domain. |
| 4 | Viral proteins | - | - | 2 | - | - | 3 | Y | Y |
| Viral jelly roll, characteristic of this superfamily, repeats with varying lengths of interconnecting loops. These loops are involved in different subunit interactions. |
| 5 | RmlC-like cupins | 14 | - | - | 17 | - | 2 | Y | Y |
| Includes members in diverse oligomeric arrangements and domains such as glycinins that have repeating copies of the entire cupin domain. |
| 6 | Actin like ATPase domain | 1 | - | 7 | 1 | - | 10 | Y | Y |
| Tandem repeats of the domain in a single chain are common. Members, all involving an ATP binding site, act on diverse substrates such as actin, glycerol kinase and hexokinase type I. |
| 7 | Phospholipase D | 1 | - | - | - | - | 4 | Y | Y |
| Giant members typically involve tandem repeats of entire domains. Dwarfs are single domain proteins that usually dimerize to function. |
| 8 | PRTase-like | 2 | - | 5 | 8 | 1 | 3 | Y | Y |
| Diverse oligomeric states dictate an important role for loops of varying lengths. |
| 9 | Lysozyme-like | 6 | - | - | 3 | 1 | 1 | N | N |
| Single domain protein on a single chain except for 1k28 (Tail associated lysozyme gp5), which has multiple domains involving multiple chains. |
| 10 | Concanavalin A-like lectins | 15 | 6 | 4 | 12 | 3 | - | Y | Y |
| Most member proteins are involved in carbohydrate metabolism and occur as single domain proteins in single or multiple chains. Multi-domain proteins interacting with 2–3 domains also exist. |
Figure 2. Members of the Cytochrome C-like domain superfamily (a–e) show two-fold length variation.
Additional residues contribute to differences in the lengths of loops around the substrate-binding site. Cytochrome-C552 (1c52–: Thermus thermophilus) acquires two β-strands that further protect the bound heme (not shown) from solvent. All structures preserve the hydrophobic pocket, formed by at least three helices (shown in golden yellow), that surrounds the heme group (not shown).
Role of indels in top-10 length-deviant domain superfamilies.
| S.No | Superfamily | Domain | Description | Role | Number of SCOP families |
| 1 | | G: 1c52 (130) | Cytochrome C-552 | | |
| | | D: 1c75 (70) | Cytochrome C-553 | | |
| 2 | | G: 1pgj (299) | 6-phospho gluconate dehydrogenase | | |
| | | D: 1dlj (97) | UDP-glucose dehydrogenase | | |
| 3 | | G: 1p30 (631) | Adenovirus hexons | | |
| | | D: 1hx6 (139) | | | |
| 4 | | G: 1pmi (439) | Phosphomannose isomerase | | |
| | | D: 1dgw (177) | Canavalin | | |
| 5 | | G: 1f3l (320) | Arginine methyltransferase | | |
| | | D: 1ej0 (179) | RNA methyltransferase (ftsj) | | |
| 6 | | G: 1bu6 (250) | Glycerol kinase | | |
| | | D: 1j6z (142) | Actin alpha 1 | | |
| 7 | | G: 1ecf (242) | Glutamine phosphoribosyl transferase | | |
| | | D: 1dkr (149) | Phosphoribosyl pyrophosphate synthetase | | |
| 8 | | G: 1f0i (257) | Phospholipase D | | |
| | | D: 1byr (149) | Endonuclease | | |
| 9 | | G: 1qus (321) | Soluble lytic transglycosylase Slt35 | | |
| | | D: 1iiz (119) | Insect lysozyme | | |
| 10 | | G: 1dyp (266) | Kappa carrageenase | | |
| | | D: 1slt (133) | S-lectin | | |
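The fold difference in length between each giant (G) and dwarf (D) domain can be worked out directly from the table. The sketch below assumes that the numbers in parentheses are domain lengths in residues; the names `PAIRS` and `fold_differences` are illustrative and do not come from the source.

```python
# (giant PDB code, giant length, dwarf PDB code, dwarf length), copied from the table.
# Assumption: the parenthesised values are domain lengths in residues.
PAIRS = [
    ("1c52", 130, "1c75", 70),
    ("1pgj", 299, "1dlj", 97),
    ("1p30", 631, "1hx6", 139),
    ("1pmi", 439, "1dgw", 177),
    ("1f3l", 320, "1ej0", 179),
    ("1bu6", 250, "1j6z", 142),
    ("1ecf", 242, "1dkr", 149),
    ("1f0i", 257, "1byr", 149),
    ("1qus", 321, "1iiz", 119),
    ("1dyp", 266, "1slt", 133),
]


def fold_differences(pairs):
    """Map 'giant/dwarf' to the ratio of giant length to dwarf length."""
    return {f"{g}/{d}": g_len / d_len for g, g_len, d, d_len in pairs}


for pair, ratio in fold_differences(PAIRS).items():
    print(f"{pair}: {ratio:.2f}")
```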
Figure 3. Domain members of the viral protein domain superfamily.
A single subunit of the adenovirus type 5 hexon (1rux) and of P3 of the bacteriophage (1hx6) each involves two viral jelly roll domains (1ruxa1, 1ruxa2 and 1hx6a1, 1hx6a2 respectively). All four domains conserve the structural scaffold involving the viral jelly roll (in green). The structural variations acquired by each domain (in brown) differ, and loop lengths vary extensively even within a subunit. Additionally, residues in adenovirus (three-fold difference in length) form a subdomain involved in more extensive subunit interactions.
Figure 4. Structures of the giant and dwarf domain members of the PLD domain superfamily.
Endonuclease (1byra-) and Phospholipase D (1f0i), the dwarf and giant domains of the PLD-like superfamily, adopt different oligomeric states. Phospholipase D, a pseudo-dimer (1f0ia1: (256) and 1f0ia2: (240)), shows a duplication of the core domain of Endonuclease (1byra), which is a functional dimer. The PLD domain of endonuclease represents the minimum structural scaffold for acting on the phospho-diester bond of a substrate. The core conserved strands in both structures are highlighted in green. In endonuclease, residues from two HKD motifs (in red, ball and stick), one from each protomer, interact with the substrate. Phospholipase D has two copies of the motif and also shows some additional structures that protect the active site from solvent and move it deeper into the protein. The active sites involve similar residues that lie in similar structural contexts (in ball and stick).
Figure 5. Domain members of the SAM domain-like superfamily.
ftsj (1ej0a-), Catechol-O methyl transferase (1vid–), VP39 (3mag–) and PRMT3 (1f3la-) show insertions that do not affect the common core structural scaffold (in green). Residues that interact with the AdoMet cofactor (ball and stick representation, in red) and others that interact with the different substrates (not shown) are spatially proximate, and their locations are conserved across the different members. In VP39 (3mag–), a large 100-residue insert in the C-terminus appears to shield the core scaffold. In PRMT3 (1f3la-), the truncated SAM domain acquires a large barrel-like extension at the C-terminus. This subdomain-like indel contributes some residues to substrate binding and may adopt an auto-regulatory role by interacting with AdoMet-binding residues of the neighboring subunit during dimer formation.
Figure 6. Lysozyme-like superfamily.
Structures of lytic murein transglycosylase b (1qusa-, 321 residues) and insect lysozyme (1iiza-, 120 residues) show a well-conserved lysozyme-like fold (in green). Lytic murein transglycosylase acquires two additional subdomain-like structures, one N-terminal and one C-terminal, that are implicated in membrane interactions (highlighted in faint pink).
Figure 7. Actin-like ATPase domain superfamily.
Superposed structures of acetate kinase (242 residues, in gold) and actin alpha1 (142 residues, in blue) show that the giant member acquires longer helices. The additional helical insert observed in acetate kinase forms a closed loop that brings the residues that interact with the substrate close to the Mg²⁺ ion binding site. In other dwarf members of the superfamily, the same residues are involved in both ion binding and catalysis, which obviates the need for such extra structural elements. The lower panel shows a graphical projection of the alignments. Large differences in length correspond to insertions of different structural elements in either protein (helix: red; strand: blue; coil: green; indels: magenta).