Broto Chakrabarty, Nita Parekh.
Abstract
Ankyrin is one of the most abundant protein repeat families found across all forms of life. In humans, ankyrin repeats occur in a variety of multi-domain and single-domain proteins with varying numbers of repeating units. They are found in functionally diverse proteins, such as transcriptional initiators, cell cycle regulators, cytoskeletal organizers, ion transporters, signal transducers, developmental regulators, and toxins; consequently, defects in ankyrin repeat proteins have been associated with a number of human diseases. In this study, we have classified the human ankyrin proteins into clusters based on the sequence similarity of their ankyrin repeat domains. We analyzed the amino acid compositional bias and the consensus ankyrin motif sequence of the clusters to understand the diversity of the human ankyrin proteins. We carried out network-based structural analysis of human ankyrin proteins across different clusters and showed the association of conserved residues with topologically important residues identified by network centrality measures. The analysis of conserved and structurally important residues helps in understanding their role in the structural stability and function of these proteins. We also discuss the significance of these conserved residues in disease association across the human ankyrin protein clusters.
Keywords: ankyrin conserved residues; ankyrin repeat domain; ankyrin repeats; human ankyrins; protein contact network
Year: 2022 PMID: 35056738 PMCID: PMC8781854 DOI: 10.3390/molecules27020423
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. (a) The 3D view of the ankyrin repeat motif. The helices are colored blue, the turns red, and the coils grey. (b) The ankyrin repeat domain in the designed ankyrin repeat protein 1N0R, comprising 4 repeat copies. The concave surface is highlighted in red and the convex surface in blue.
Figure 2. Analysis of 257 human ankyrin repeat proteins in UniProtKB. (a) Frequency distribution of the number of ankyrin repeat copies reported in a protein. (b) Coverage of the ankyrin repeat region compared to the number of repeat copies. (c) Frequency distribution of the length of the ankyrin repeat unit. (d) Sequence logo obtained from multiple sequence alignment of all ankyrin repeat copies. (e) Amino acid compositional bias in the ankyrin repeat region compared to the base composition in all UniProtKB proteins.
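Panel (e) compares residue frequencies in the repeat region with a background composition. The following is a minimal sketch of one common way to express such a bias, the log2 ratio of observed to background frequency. The function name, the toy sequences, and the background values are illustrative assumptions; the excerpt does not state which measure the authors used.

```python
import math
from collections import Counter

def composition_bias(repeat_seqs, background):
    """Return log2(observed / background) frequency per amino acid.

    repeat_seqs: iterable of repeat-region sequences (one-letter codes).
    background:  dict mapping amino acid -> background frequency.
    Residues absent from the repeat region are skipped (log of 0 is undefined).
    """
    counts = Counter("".join(repeat_seqs))
    # Only residues listed in the background contribute to the total.
    total = sum(counts[aa] for aa in background)
    bias = {}
    for aa, bg in background.items():
        if counts[aa] and bg > 0:
            bias[aa] = math.log2((counts[aa] / total) / bg)
    return bias

# Toy input (made up): a four-letter alphabet with a uniform background.
toy_background = {"G": 0.25, "L": 0.25, "A": 0.25, "K": 0.25}
toy_bias = composition_bias(["GLLA", "GLAA"], toy_background)
```

Positive values mark residues enriched in the repeat region relative to the background, negative values mark depleted ones.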
Top 10 clusters obtained using CD-HIT and their domain information.
| Cluster No. | No. of Proteins | Average Copy No. | Std. Dev. Copy No. | InterPro Domains |
|---|---|---|---|---|
| 1 | 34 | 10.9 | 8.1 | Ankyrin repeat-containing domain, ZU5 domain, Death domain, EGF-like domain, Notch domain, Sterile alpha motif domain |
| 2 | 32 | 5.2 | 0.8 | Ankyrin repeat-containing domain, CCDC144C-like coiled-coil domain |
| 3 | 13 | 10.7 | 6.8 | Ankyrin repeat-containing domain, K Homology domain, Sterile alpha motif domain, Protein kinase domain |
| 4 | 9 | 3.7 | 1 | Ankyrin repeat-containing domain, BRCT domain, BCL-6 corepressor, PCGF1 binding domain |
| 5 | 8 | 6.4 | 0.7 | Ankyrin repeat-containing domain, NFkappaB IPT domain, Rel homology domain (RHD), Death domain |
| 6 | 7 | 9.1 | 2.4 | Ankyrin repeat-containing domain, SOCS box domain |
| 7 | 6 | 6.0 | 2.4 | Ankyrin repeat-containing domain |
| 8 | 6 | 7.8 | 4.4 | Ankyrin repeat-containing domain |
| 9 | 6 | 5.0 | 1.5 | Ankyrin repeat-containing domain, Ion transport domain |
| 10 | 6 | 4.0 | 0 | Ankyrin repeat-containing domain, Transient receptor ion channel domain, Ion transport domain |
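CD-HIT writes cluster membership to a `.clstr` text file. As a hedged sketch of how clusters could be ranked by size to obtain a "top 10" list like the one above, the parser below reads that file format; the protein names, lengths, and identities in the example are made up, and the thresholds the authors passed to CD-HIT are not given in this excerpt.

```python
import re
from collections import defaultdict

# Member line, e.g. "0\t1881aa, >protA... *" (the "*" marks the representative).
MEMBER = re.compile(r"^\d+\s+(\d+)aa,\s+>(.+?)\.\.\.\s+(\*|at\s+.+)$")

def parse_clstr(text):
    """Parse CD-HIT .clstr text into {cluster_id: [(name, length, is_rep), ...]}."""
    clusters = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            continue
        match = MEMBER.match(line)
        if match and current is not None:
            length, name, tag = match.groups()
            clusters[current].append((name, int(length), tag == "*"))
    return dict(clusters)

def largest_clusters(clusters, n=10):
    """Return cluster ids sorted by member count, largest first."""
    return sorted(clusters, key=lambda cid: len(clusters[cid]), reverse=True)[:n]

# Hypothetical .clstr content, for illustration only.
EXAMPLE = (
    ">Cluster 0\n"
    "0\t300aa, >protC... *\n"
    ">Cluster 1\n"
    "0\t1881aa, >protA... *\n"
    "1\t1001aa, >protB... at 62.50%\n"
)
example_clusters = parse_clstr(EXAMPLE)
example_ranking = largest_clusters(example_clusters)
```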
Figure 3. Sequence logo for the ANK motif obtained from multiple sequence alignment (MSA) of the top 10 sequence clusters. The x-axis corresponds to the residue number in the ANK motif.
Top 10 sequence clusters and their consensus sequence. The conserved positions are highlighted in blue color.
| Cluster No. | Consensus (50%) |
|---|---|
| 1 | X |
| 2 | XX |
| 3 | X |
| 4 | X |
| 5 | X |
| 6 | XXXX |
| 7 | X |
| 8 | X |
| 9 | X |
| 10 | XXXX |
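A 50% consensus is conventionally read as: report a residue at an alignment column only if it occurs in at least half of the aligned sequences, otherwise report a placeholder such as `X`. The sketch below illustrates that convention under stated assumptions; the aligned toy sequences are invented, and the exact tool and tie-breaking rule used for the table are not specified in this excerpt.

```python
from collections import Counter

def consensus(aligned, threshold=0.5, unknown="X", gap="-"):
    """Consensus string of equal-length aligned sequences.

    A column yields its most frequent residue when that residue's share of
    all rows (gaps included in the denominator) reaches `threshold`;
    otherwise it yields `unknown`.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            out.append(unknown)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(column) >= threshold else unknown)
    return "".join(out)

# Toy alignment (made up), four sequences of six columns.
toy_alignment = ["GATPLH", "GNTPLH", "GKSALH", "DQTPL-"]
toy_consensus = consensus(toy_alignment)  # "GXTPLH"
```

Raising `threshold` turns more columns into `X`, which is why a stricter consensus highlights only the most conserved motif positions.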
Figure 4. Consensus secondary structure for the top 10 clusters. H: helix, T: turn, C: coil.
Figure 5. Amino acid connectivity information of the top 10 clusters. (a) Average number of connections (degree) for each amino acid position within the repeat copy. (b) Average number of intra-repeat edges. (c) Average number of inter-repeat edges.
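In a protein contact network, residues are nodes and an edge joins two residues whose representative atoms lie within a distance cutoff. The sketch below shows how degree, intra-repeat edges, and inter-repeat edges can be counted from such a network. The 7 Å cutoff, the toy coordinates, and the repeat assignment are assumptions for illustration; the excerpt does not state the authors' cutoff or atom choice, and the figure reports per-position averages rather than the totals computed here.

```python
import math
from itertools import combinations

def contact_edges(coords, cutoff=7.0):
    """Edges (i, j), i < j, between residues within `cutoff` angstroms."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(coords), 2)
            if math.dist(a, b) <= cutoff]

def summarize(edges, repeat_of):
    """Return (degree per residue, intra-repeat edge count, inter-repeat edge count).

    repeat_of[i] is the index of the repeat copy that residue i belongs to.
    """
    degree = [0] * len(repeat_of)
    intra = inter = 0
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if repeat_of[i] == repeat_of[j]:
            intra += 1
        else:
            inter += 1
    return degree, intra, inter

# Toy coordinates (made up): two stacked "repeats" of three residues each.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (0.0, 5.0, 0.0), (3.8, 5.0, 0.0), (7.6, 5.0, 0.0)]
toy_repeat_of = [0, 0, 0, 1, 1, 1]
toy_edges = contact_edges(toy_coords)
toy_degree, toy_intra, toy_inter = summarize(toy_edges, toy_repeat_of)
```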
Figure 6. (a) Three-dimensional network view showing the intra-repeat edges in grey and the inter-repeat edges in yellow for 3 copies of the ankyrin repeat. The nodes and the backbone edges of the three copies are shown in blue, green, and orange, respectively. The residue positions in the intermediate copy (green) are marked. (b) The 3D structure view of the 3 copies, shown in the same orientation as the network view.
Figure 7. Average node centrality profiles for the top 10 clusters. (a) Betweenness centrality. (b) Eigenvector centrality.
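Betweenness centrality counts how often a node lies on shortest paths between other nodes, while eigenvector centrality scores a node by the scores of its neighbours. The following is a self-contained sketch of both measures for an undirected, unweighted graph, using Brandes' algorithm and power iteration. It is not the authors' implementation; the toy path graph and the choice to leave betweenness unnormalized are assumptions for illustration.

```python
import math
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness (Brandes) for a graph {node: set(neighbours)}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s (-1 = unseen)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was visited from both sides.
    return {v: b / 2 for v, b in bc.items()}

def eigenvector(adj, max_iter=1000, tol=1e-12):
    """Eigenvector centrality by power iteration, scaled to unit Euclidean norm."""
    if not adj:
        return {}
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        # Iterating on (A + I) keeps the leading eigenvector of A while
        # avoiding oscillation on bipartite graphs.
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Toy path graph 0-1-2-3-4 (made up); the middle node is the most central.
toy_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
toy_bc = betweenness(toy_adj)
toy_ev = eigenvector(toy_adj)
```

Averaging such per-residue scores over equivalent motif positions across repeat copies would give a position-wise profile of the kind plotted in the figure.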
Figure 8. The 3D views of example ankyrin repeat proteins from the top 4 clusters. The ankyrin repeat domain is colored red, and all other domains are shown in grey. The 3D coordinates of the proteins were obtained from AlphaFold DB. (a) ANK1 (UniProt: P16157) from Cluster 1. (b) POTEJ (UniProt: P0CG39) from Cluster 2. (c) TNI3K (UniProt: Q59H18) from Cluster 3. (d) BARD1 (UniProt: Q99728) from Cluster 4.
Variations associated with the most conserved positions of the ankyrin motif.
| Residue Position | Gene Name | Copy No. | Cluster No. | Variation | Condition(s) |
|---|---|---|---|---|---|
| Gly2 | ANKRD29 | 4 | 1 | G112E | G -> E (in dbSNP:rs17855552) |
| Gly2 | RNASEL | 2 | 3 | G59S | G -> S (in dbSNP:rs151296858) |
| Gly2 | BARD1 | 1 | 4 | G428* | Hereditary cancer-predisposing syndrome |
| Gly2 | CDKN2B | 2 | 7 | G47E | Lung adenocarcinoma |
| Thr4 | ANKK1 | 8 | 1 | T595I | T -> I (in dbSNP:rs55787008) |
| Pro5 | ANKK1 | 8 | 1 | P596L | P -> L (in dbSNP:rs7104979) |
| Pro5 | ASB2 | 4 | 6 | P160S | P -> S (in dbSNP:rs2295213) |
| Pro5 | CDKN2A | 4 | 7 | P114L | Non-small cell lung carcinoma |
| Pro5 | CDKN2A | 4 | 7 | P114S | Melanoma; loss of CDK4 binding; dbSNP:rs104894104 |
| Pro5 | CDKN2A | 2 | 7 | P48L | CMM2 $; also found in head and neck tumor |
| Pro5 | CDKN2A | 2 | 7 | P48T | Hereditary melanoma; Hereditary cancer-predisposing syndrome |
| Pro5 | CDKN2A | 3 | 7 | P81L | Melanoma; impairs the function |
| Pro5 | CDKN2A | 3 | 7 | P81T | CMM2; loss of CDK4 binding |
| Leu6 | ANK1 | 8 | 1 | L276R | Spherocytosis type 1 |
| Leu6 | ANK1 | 17 | 1 | L573fs | Spherocytosis type 1 |
| Leu6 | ANKK1 | 1 | 1 | L366F | L -> F (in dbSNP:rs56339158) |
| Leu6 | NFKBIA | 2 | 5 | 115..120 | LHLAVI->AHAAVA: Greatly reduced nuclear localization. Great reduction in its ability to inhibit DNA binding of RELA. |
| Leu6 | CDKN2A | 1 | 7 | L16fs | Hereditary cancer-predisposing syndrome |
| Leu6 | CDKN2A | 1 | 7 | L16fs | Squamous cell lung carcinoma; Hereditary melanoma; Hereditary cancer-predisposing syndrome |
| Leu6 | CDKN2A | 1 | 7 | L16P | Biliary tract tumor; familial melanoma |
| Leu6 | CDKN2A | 1 | 7 | L16P | Hereditary melanoma |
| Leu6 | CDKN2A | 1 | 7 | L16R | Hereditary melanoma; Hereditary cancer-predisposing syndrome |
| Leu6 | INVS | 15 | 8 | L493S | NPHP2; impairs ability to target DVL1 for degradation |
| His7 | ANKK1 | 1 | 1 | H367Q | H -> Q (in dbSNP:rs34298987) |
| His7 | CDKN2A | 3 | 7 | H83N | Lung tumor |
| His7 | CDKN2A | 3 | 7 | H83Q | H -> Q (in dbSNP:rs34968276) |
| His7 | CDKN2A | 3 | 7 | H83Y | Pancreas tumor; head and neck tumor |
| Ala9 | CDKN2A | 1 | 7 | A19T | CMM2; loss of CDK4 binding |
| Ala9 | CDKN2A | 4 | 7 | A118T | CMM2 |
| Ala9 | CDKN2A | 3 | 7 | A85T | A -> T (in dbSNP:rs878853646) |
| Gly13 | BARD1 | 3 | 4 | G505fs | Hereditary cancer-predisposing syndrome; Familial cancer of breast |
| Gly13 | CDKN2A | 4 | 7 | G122R | CMM2 |
| Gly13 | CDKN2A | 4 | 7 | G122S | Biliary tract tumor |
| Gly13 | CDKN2A | 1 | 7 | G23D | Pancreas tumor; melanoma; loss of CDK4 binding |
| Gly13 | CDKN2A | 3 | 7 | G89D | CMM2 |
| Gly13 | CDKN2A | 3 | 7 | G89S | CMM2 |
| Gly13 | TRPC6 | 1 | 10 | G109S | Focal segmental glomerulosclerosis 2 (FSGS2); increases calcium ion transport |
| Leu21 | CDKN2A | 3 | 7 | L97R | CMM2; loss of CDK4 binding |
| Leu22 | CDKN2A | 1 | 7 | L32P | Hereditary cancer-predisposing syndrome; Hereditary melanoma |
| Gly25 | ANK2 | 20 | 1 | G685E | Breast cancer |
| Gly25 | ANKK1 | 3 | 1 | G451R | G -> R (in dbSNP:rs34983219) |
| Gly25 | ANKHD1 | 1 | 3 | G228C | G -> C (in dbSNP:rs17850572) |
| Gly25 | CDKN2A | 3 | 7 | G101W | Hereditary melanoma; Cutaneous malignant melanoma 2; Melanoma-pancreatic cancer syndrome; Hereditary cancer-predisposing syndrome |
| Gly25 | CDKN2A | 1 | 7 | G35A | CMM2; biliary tract tumor; uveal melanoma; partial loss of CDK4 binding |
| Gly25 | CDKN2A | 1 | 7 | G35E | CMM2 |
| Gly25 | CDKN2A | 1 | 7 | G35V | CMM2; loss of CDK4 binding |
| Ala26 | ANKRD16 | 3 | 6 | A128G | A -> G (in dbSNP:rs2296136) |
| Ala26 | CDKN2A | 3 | 7 | A102E | Seminoma; medulloblastoma tissues from Li-Fraumeni syndrome patients carrying a mutation in TP53 |
| Ala26 | CDKN2A | 3 | 7 | A102T | A -> T (in dbSNP:rs35741010) |
| Ala26 | CDKN2A | 1 | 7 | A36fs | Hereditary melanoma; Hereditary cancer-predisposing syndrome |

$ CMM2: cutaneous malignant melanoma 2.