Hulikal Shivashankara Santosh Kumar1, Vadlapudi Kumar2.
Abstract
The inventory of proteins used in different kingdoms appears surprisingly similar across all sequenced eukaryotic genomes. Protein domains represent the basic evolutionary units that form proteins. Domain duplication and shuffling by recombination are probably the most important forces driving protein evolution and hence the complexity of the proteome. While the duplication of whole genes as well as domain-encoding exons increases the abundance of domains in the proteome, domain shuffling increases versatility, i.e. the number of distinct contexts in which a domain can occur. In this study we considered five important adapter domain families, namely WD40, KELCH, Ankyrin, PDZ and Pleckstrin Homology (PH domain), to compare domain versatility, abundance and domain sharing between them. We applied ecological statistics methods such as Jaccard's Similarity Index (JSI), Detrended Correspondence Analysis and k-Means clustering to the domain distribution data. We found a high propensity for domain sharing between PH and PDZ. We found a higher abundance of only a few selected domains in the PH, PDZ, ANK and KELCH families. We also found that the WD40 family has high versatility, less redundant domain occurrence and less domain sharing. Hence, assigning functions to more orphan WD40 proteins will help in the identification of suitable drug targets.
Keywords: Rosetta Stones; WD40; detrended correspondence analysis; domain sharing; domain versatility
Year: 2016 PMID: 28246462 PMCID: PMC5295043 DOI: 10.6026/97320630012285
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. Representative structures of the domains studied: a. Ankyrin repeat, b. PDZ domain, c. PH domain, d. KELCH repeat and e. WD40 repeat.
Domain composition profile of different families under study
| S. No | Domains | WD | PDZ | PH | KELCH | ANK | S. No | Domains | WD | PDZ | PH | KELCH | ANK |
| 1 | ANK | 1 | 1 | 13 | 0 | 0 | 26 | LISH | 8 | 0 | 0 | 1 | 0 |
| 2 | ArfGap | 0 | 0 | 18 | 0 | 6 | 27 | LRR | 2 | 2 | 1 | 0 | 2 |
| 3 | B41 | 0 | 7 | 5 | 0 | 1 | 28 | MORN | 0 | 0 | 1 | 0 | 1 |
| 4 | BAR_3 | 0 | 0 | 6 | 0 | 2 | 29 | MYSc | 0 | 1 | 1 | 0 | 1 |
| 5 | BRCT | 0 | 0 | 1 | 0 | 1 | 30 | Oxysterol_BP | 0 | 0 | 11 | 0 | 1 |
| 6 | BTB | 2 | 0 | 0 | 32 | 3 | 31 | PDZ | 0 | 0 | 5 | 0 | 0 |
| 7 | C1 | 0 | 0 | 11 | 0 | 2 | 32 | PH | 0 | 7 | 0 | 0 | 5 |
| 8 | C2 | 0 | 3 | 15 | 0 | 0 | 33 | Pkinase | 1 | 0 | 0 | 0 | 4 |
| 9 | CH | 0 | 1 | 3 | 0 | 0 | 34 | Pkinase_Tyr | 1 | 0 | 0 | 0 | 2 |
| 10 | DAGKa | 0 | 0 | 3 | 0 | 2 | 35 | PTB | 0 | 2 | 0 | 0 | 2 |
| 11 | DAGKc | 0 | 0 | 4 | 0 | 2 | 36 | RA | 0 | 5 | 6 | 0 | 0 |
| 12 | dDENN | 1 | 0 | 1 | 0 | 0 | 37 | RasGEF | 0 | 2 | 3 | 0 | 0 |
| 13 | DENN | 1 | 0 | 1 | 0 | 0 | 38 | RasGEF_N | 0 | 2 | 2 | 0 | 0 |
| 14 | DEP | 0 | 6 | 4 | 0 | 0 | 39 | RBD | 0 | 2 | 2 | 0 | 0 |
| 15 | EGF | 0 | 0 | 0 | 3 | 1 | 40 | RCC1 | 1 | 0 | 1 | 0 | 1 |
| 16 | EGF_CA | 0 | 0 | 0 | 1 | 1 | 41 | RhoGAP | 0 | 2 | 13 | 0 | 0 |
| 17 | FA | 0 | 1 | 1 | 0 | 0 | 42 | RhoGEF | 0 | 3 | 35 | 0 | 0 |
| 18 | FERM_C | 0 | 4 | 1 | 0 | 0 | 43 | RING | 3 | 3 | 0 | 0 | 4 |
| 19 | FN3 | 0 | 0 | 0 | 2 | 2 | 44 | S_TK_X | 0 | 1 | 1 | 0 | 0 |
| 20 | FYVE | 3 | 0 | 6 | 0 | 1 | 45 | S_TKc | 0 | 2 | 5 | 0 | 1 |
| 21 | GRAM | 1 | 0 | 1 | 0 | 0 | 46 | SAM | 1 | 3 | 7 | 0 | 8 |
| 22 | HECTc | 1 | 0 | 0 | 0 | 2 | 47 | SH3 | 1 | 17 | 18 | 0 | 6 |
| 23 | HR1 | 0 | 1 | 2 | 0 | 0 | 48 | SOCS box | 3 | 0 | 0 | 0 | 9 |
| 24 | IQ | 1 | 1 | 3 | 0 | 2 | 49 | WD | 0 | 0 | 0 | 0 | 0 |
| 25 | KELCH | 0 | 0 | 0 | 0 | 0 | 50 | WW | 0 | 2 | 4 | 0 | 0 |
| 51 | ZU5 | 0 | 1 | 0 | 0 | 3 | |||||||
| Note: Except for the WD40 family, all the other families show redundant occurrence of a few selected domains | |||||||||||||
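The abundance and versatility comparison can be illustrated with a minimal sketch over a handful of rows copied from the table above. It assumes that the number of partner domains with a non-zero count approximates a family's versatility and that the column total approximates its abundance; the study's exact definitions may differ, and the function name `summarize` and the dictionary keys are hypothetical.

```python
# Column order used in the domain composition table.
FAMILIES = ["WD", "PDZ", "PH", "KELCH", "ANK"]

# Five rows copied from the table (a subset only, not the full data set).
counts = {
    "BTB": [2, 0, 0, 32, 3],
    "SH3": [1, 17, 18, 0, 6],
    "RhoGEF": [0, 3, 35, 0, 0],
    "LISH": [8, 0, 0, 1, 0],
    "FYVE": [3, 0, 6, 0, 1],
}


def summarize(counts, families):
    """Per family: number of distinct partner domains and total occurrences.

    Assumption: distinct partners stand in for versatility and the column
    total stands in for abundance.
    """
    summary = {}
    for i, family in enumerate(families):
        column = [row[i] for row in counts.values()]
        summary[family] = {
            "distinct_partners": sum(1 for c in column if c > 0),
            "total_occurrences": sum(column),
        }
    return summary


summary = summarize(counts, FAMILIES)
for family, stats in summary.items():
    print(family, stats)
```

On this subset, a family whose total is dominated by one partner (for example BTB in KELCH or RhoGEF in PH) shows the kind of redundant occurrence described in the table note, whereas the WD counts are spread more evenly.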
Figure 3. Number of tethered proteins, tethered domains, and percentage of shared and unique domains. Note that PH has the highest tethering number and KELCH the lowest, while WD has the highest percentage of unique domains.
Jaccard’s similarity index (JSI) matrix. JSI values are highest among PH, PDZ and ANK and lowest for WD40 and KELCH.
| | WD40 | PDZ | PH | KELCH | ANK |
| WD40 | -- | -- | -- | -- | -- |
| PDZ | 0.038 | -- | -- | -- | -- |
| PH | 0.068 | 0.194 | -- | -- | -- |
| KELCH | 0.024 | 0 | 0 | -- | -- |
| ANK | 0.076 | 0.079 | 0.113 | 0.028 | -- |
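As a rough illustration of how a pairwise matrix like this can be computed, the sketch below applies the standard Jaccard definition, the size of the intersection divided by the size of the union, to sets of partner domains. The sets are small hypothetical subsets chosen for illustration, so the values will not match the published matrix, and the helper names `jaccard` and `jsi_matrix` are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity index |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical partner-domain sets (illustrative subsets, NOT the study's data).
partners = {
    "WD40": {"LISH", "FYVE", "RING", "BTB"},
    "PDZ": {"SH3", "B41", "DEP", "RING"},
    "PH": {"SH3", "B41", "DEP", "RhoGEF", "FYVE"},
    "KELCH": {"BTB", "EGF"},
}


def jsi_matrix(sets):
    """Return {(family_x, family_y): JSI} for every unordered pair of families."""
    return {
        (x, y): jaccard(sets[x], sets[y])
        for x, y in combinations(sets, 2)
    }


matrix = jsi_matrix(partners)
for (x, y), value in matrix.items():
    print(f"{x}-{y}: {value:.3f}")
```

Only the lower triangle needs to be stored because the index is symmetric, which matches the layout of the matrix above.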
Figure 2. JSI-based clustering of tethered domains (only significant domains shown).
Figure 5. Detrended correspondence analysis (DCA) of shared domains (black dots) across different domain families (blue dots). Note that WD40 and KELCH lie outside, signifying their uniqueness.
Figure 4. Contribution of domains to sharing by different families. Note that KELCH and WD40 contribute less to domain sharing.
Domain sharing between different families
| S. No. | PDZ and ANK | WD40 and ANK | WD40 and PDZ | PH and PDZ | PH and ANK | WD40 and PH | KELCH and ANK | KELCH and WD40 | PH and KELCH | PDZ and KELCH |
| 1 | ANK | BTB | IQ | ANK | ANK | DENN | BTB | BTB | 0 | 0 |
| 2 | B41 | COR | LRR | B41 | ArfGap | FYVE | EGF | LISH | | |
| 3 | IQ | FYVE | RING | C2 | B41 | GRAM | FN3 | | | |
| 4 | LRR | HECTc | SAM | CH | BAR_3 | IQ | | | | |
| 5 | MYSc | IQ | SH3 | CRIC_ras_sig | BRCT | KISc | | | | |
| 6 | PH | LRR | | DEP | C1 | LRR | | | | |
| 7 | PTB | PKINASE | | DUF1170 | DAGKa | RCC1 | | | | |
| 8 | RING | Pkinase_Tyr | | FA | DAGKc | RCC1_2 | | | | |
| 9 | SAM | RCC1 | | FERM_C | FN3 | SAM | | | | |
| 10 | SH3 | RING | | FHA | FYVE | SH3 | | | | |
| 11 | S_TKc | SAM | | IQ | IQ | dDENN | | | | |
| 12 | ZU5 | SH3 | | LRR | LRR | | | | | |
| 13 | | SOCS Box | | MYSc | MORN | | | | | |
| 14 | | TPR | | PDZ | MYSc | | | | | |
| 15 | | ZnF_C2H2 | | PH | Oxysterol_BP | | | | | |
| 16 | | | | PX | PH | | | | | |
| 17 | | | | RA | RCC1 | | | | | |
| 18 | | | | RBD | SAM | | | | | |
| 19 | | | | RGS | SH3 | | | | | |
| 20 | | | | RasGEF | S_TKc | | | | | |
| 21 | | | | RhoGAP | | | | | | |
| 22 | | | | RhoGEF | | | | | | |
| 23 | | | | SAM | | | | | | |
| 24 | | | | SH3 | | | | | | |
| 25 | | | | S_TK_X | | | | | | |
| 26 | | | | S_TKc | | | | | | |
| 27 | | | | WW | | | | | | |
| Note: PH and PDZ share the largest number of common domains. | ||||||||||