Susanna S Sologova, Sergey P Zavadskiy, Innokenty M Mokhosoev, Nurbubu T Moldogazieva.
Abstract
Short linear motifs (SLiMs) are evolutionarily conserved functional modules of proteins, consisting of stretches of 3 to 10 amino acid residues. The biological activities of two short peptide segments of human alpha-fetoprotein (AFP), a major embryo-specific and cancer-related protein, have been confirmed experimentally. These are the heptapeptide segment LDSYQCT in domain I, designated AFP14–20, and the nonapeptide segment EMTPVNPGV in domain III, designated GIP-9. In our work, we searched the UniProtKB database for human proteins that contain SLiMs with sequence similarity to both segments of human AFP and undertook gene ontology (GO)-based functional categorization of the retrieved proteins. Gene set enrichment analysis included GO terms for biological process, molecular function, metabolic pathway, KEGG pathway, and protein–protein interaction (PPI) categories. We identified the SLiMs of interest in a variety of non-homologous proteins involved in multiple cellular processes underlying embryonic development, cancer progression, and, unexpectedly, the regulation of redox homeostasis. These included transcription factors, cell adhesion proteins, ubiquitin-activating and ubiquitin-conjugating enzymes, cell signaling proteins, and oxidoreductase enzymes. They function by regulating cell proliferation and differentiation, the cell cycle, DNA replication/repair/recombination, metabolism, the immune/inflammatory response, and apoptosis. In addition to the retrieved genes, new interacting genes were identified. Our data support the hypothesis that conserved SLiMs are incorporated into non-homologous proteins to serve as functional blocks for their orchestrated functioning.
Keywords: alpha-fetoprotein; bioinformatics; cancer; embryonic development; redox regulation; short linear motifs
Year: 2022 PMID: 35629968 PMCID: PMC9144484 DOI: 10.3390/metabo12050464
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Selected human proteins retrieved from the UniProtKB knowledgebase containing AFP14–20-like motifs.
| Protein Name | Entry Code | Gene Symbol | Alignment | Aa Positions | Identity | E-Value | GO Molecular Functions | GO Biological Processes | Reference |
|---|---|---|---|---|---|---|---|---|---|
| Tripartite motif-containing protein 3 (RING finger protein HAC1) | TR:Q1KXY7 | | LDSYQCT | 26–32 | 71.4% | 2.2 × 10⁻⁴ | Metal ion binding, ubiquitin-protein ligase/transferase activity | Transcriptional regulation, UPS-mediated protein degradation | [ |
| Zinc finger protein 714 | TR:A0A087WV13 | | LDSYQCT | 15–21 | 57.1% | 3.0 × 10⁻² | Transcription factor | Transcriptional regulation | [ |
| Hematopoietically-expressed homeobox protein HHEX | TR:F8VU08 | | LDSYQCT | 59–65 | 71.4% | 9.4 × 10⁻⁴ | DNA binding, transcription activator activity | Transcriptional regulation, anterior–posterior pattern specification, B-cell differentiation | [ |
| Neurogenic locus notch homolog protein 2 | TR:A0A494C1F1 | | LDSYQCT | 87–93 | 57.1% | 7.9 × 10⁻² | Calcium ion binding, signaling receptor activity | Tissue morphogenesis, cell fate determination | [ |
| von Willebrand factor A domain-containing protein 2 | SP:Q5GFL6-2 | | LDSYQCT | 315–321 | 71.4% | 0.36 | Calcium binding activity | Cell–matrix adhesion, insulin receptor signaling | [ |
| EGF-containing fibulin-like extracellular matrix protein 2 | TR:E9PRQ8 | | LDSYQCT | 144–150 | 71.4% | 4.0 | Calcium ion binding | Extracellular matrix assembly, developmental processes | [ |
| Slit-Robo Rho GTPase-activating protein 2B | TR:A0A087X1G6 | | LDSYQCT | 22–28 | 57.1% | 0.17 | GTPase activator activity | Neuronal morphogenesis, developmental process | [ |
| Calcium and integrin-binding family member 2 | TR:H0YND4 | | LDSYQ-CT | 13–20 | 75.0% | 3.3 × 10⁻² | Calcium ion binding, integrin binding | Calcium ion homeostasis, response to ATP | [ |
| F-box protein Fbx3 | TR:Q9UKC5 | | LDSYQCT | 137–143 | 57.1% | 4.0 × 10⁻² | Ubiquitin-protein transferase activity | Protein ubiquitination and degradation | [ |
| Ubiquitin-like modifier-activating enzyme 6 | TR:Q2MD40_ | | LDSYQCT | 151–157 | 71.4% | 5.0 × 10⁻³ | ATP binding, ubiquitin-activating enzyme activity | Response to DNA damage, protein ubiquitination, embryonic development | [ |
| Epidermal growth factor | TR:Q6QBS2 | | LDSYQCT | 26–32 | 57.1% | 3.8 × 10⁻² | Growth factor activity | Cell proliferation and survival | [ |
| Proliferating cell nuclear antigen | TR:Q7Z6A3 | | LDSYQCT | 57–63 | 42.9% | 0.42 | DNA binding | Cell cycle regulation, DNA replication and repair | [ |
| Ethanolamine-phosphate cytidylyltransferase | TR:I3L1F9 | | LDSYQCT | 24–30 | 57.1% | 5.7 × 10⁻³ | Catalytic activity | Biosynthetic process, cell division, cell fusion, and apoptosis | [ |
| Cysteine protease ATG4D | SP:Q86TL0-2 | | LDSYQCT | 40–46 | 57.1% | 0.18 | Peptidase activity | Apoptosis/autophagy/mitophagy/proteolysis | [ |
| CTP:phosphoethanolamine cytidylyltransferase | TR:I3L1C4 | | LDSYQCT | 24–30 | 57.1% | 0.21 | Transferase activity | Biosynthetic process | [ |
| B-cell linker protein | TR:Q2MD40 | | LDSYQCT | 1–7 | 57.1% | 9.7 × 10⁻⁴ | SH2-domain binding, signaling adaptor activity | B-cell differentiation, immune and inflammatory response | [ |
| 3-alpha hydroxysteroid dehydrogenase III | TR:Q1KXY7 | | LDS-YQCT | 1–7 | 62.5% | 4.4 × 10⁻⁵ | Oxidoreductase, metabolic activity | Steroid hormone metabolism | [ |
| Prostaglandin G/H synthase 1 | SP:P23219-3 | | LDSYQCT | 26–32 | 71.4% | 0.35 | Cyclooxygenase/peroxidase activity, heme binding, metal ion binding | Response to oxidative stress, inflammatory process | [ |
| Glutathione S-transferase LANCL1 | TR:H7C2E3 | | LDSYQCT | 59–65 | 57.1% | 0.22 | Glutathione binding, zinc ion binding | Oxidative stress response | [ |
| HSPB1-associated protein 1 | SP:Q96EW2-2 | | LDSYQCT | 176–182 | 71.4% | 0.12 | Oxidoreductase, dioxygenase activity | Brain development | [ |
| Prolyl hydroxylase EGLN2 | TR:M0R2X9 | | LDSYQCT | 45–51 | 57.1% | 2.5 | Dioxygenase activity, oxygen sensor activity | Cell redox homeostasis, response to hypoxia | [ |
Note: colons between the aligned sequences indicate identity of the residues, whereas dots indicate similarity between residues.
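The Identity column is consistent with the number of identical residues divided by the number of aligned columns (for example, 5 of 7 gives 71.4%, and 6 of 8 gives 75.0% when a gap is present). A minimal Python sketch of that arithmetic follows. The subject segments in it are hypothetical and invented for illustration, since the matched sequences are not shown in the table, and the function name `percent_identity` is likewise an assumption.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent identity over all aligned columns of two equal-length strings.

    A gap character ('-') occupies a column but never counts as a match.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return round(100.0 * matches / len(query), 1)


# Hypothetical subject segments (not taken from the table).
print(percent_identity("LDSYQCT", "LDSFQCS"))    # 5 of 7 columns -> 71.4
print(percent_identity("LDSYQ-CT", "LDSYQACS"))  # 6 of 8 columns -> 75.0
```

Counting gapped columns in the denominator is an assumption chosen because it reproduces the 75.0% and 62.5% values reported for the two gapped alignments.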
Selected human proteins retrieved from the UniProtKB knowledgebase containing GIP-9-like motifs.
| Protein Name | Entry Code | Gene Symbol | Alignment | Aa Positions | Identity | E-Value | GO Molecular Functions | GO Biological Processes | Reference |
|---|---|---|---|---|---|---|---|---|---|
| Zinc finger protein 547 | TR:M0QYW2 | | EMTPVNPGV | 60–68 | 55.6% | 3.4 | DNA binding, metal ion binding, transcription factor activity | Transcriptional regulation | [ |
| C-C motif chemokine 4-like | SP:Q8NHW4-7 | | EMTPVNPGV | 31–39 | 66.7% | 1.7 × 10⁻² | Chemokine activity | Response to IFN-γ, IL-1, and TNF-α; cell signaling | [ |
| Axin 2 | TR:A0A024R8M3 | | EMTPVNPGV | 361–369 | 66.7% | 1.7 | Beta-catenin binding, ubiquitin protein ligase binding | Regulation of Wnt signaling, cell death, bone mineralization | [ |
| L1 cell adhesion molecule | TR:Q7Z2J6 | | EMTPVNPGV | 54–62 | 44.4% | 1.0 | Cell adhesion molecule activity | Nervous system development | [ |
| Paired-like homeodomain transcription factor LEUTX | SP:A8MZ59-1 | | EMTPVNPGV | 69–77 | 44.4% | 2.3 | DNA binding activity | Transcriptional regulation, embryogenesis | [ |
| Homeobox protein Hox-C5 | SP:Q00444 | | EMTPVNPGV | 90–98 | 55.6% | 0.48 | DNA-binding activity, transcription factor | Anterior/posterior specification, embryonic development | [ |
| Forkhead box protein O1 | SP:Q12778 | | EMTPVNPGV | 476–484 | 77.8% | 0.12 | DNA-binding activity, transcription factor | Transcriptional regulation, metabolic response to oxidative stress | [ |
| RUNX1/CBFA2T2 fusion protein type 1 | TR:D1LYX4 | | EMTPVNPGV | 50–58 | 44.4% | 8.5 | Transcription corepressor activity | Transcriptional regulation | [ |
| Cyclin-dependent kinase inhibitor 1B | TR:H7C2T1 | | EMTPVNPGV | 91–99 | 55.6% | 2.3 | Cyclin binding, chaperone binding | Cell-cycle regulation, autophagy, response to chemicals | [ |
| DNA replication complex GINS protein PSF2 | SP:Q9Y248 | | EMTPVNPGV | 32–40 | 44.4% | 7.2 | DNA binding | DNA replication, DNA repair | [ |
| IGF-like family receptor 1 | TR:K7ESC2 | | EMTPVNPGV | 124–132 | 55.6% | 2.7 | Receptor activity | IGF-mediated signaling, inflammation process | [ |
| Brain-specific angiogenesis inhibitor 1-associated protein 2-like protein 2 | TR:B0QYF0 | | EMTPVNPGV | 98–106 | 66.7% | 4.5 × 10⁻² | Cadherin-binding and cytoskeletal-binding activities | Actin cytoskeleton organization, brain development | [ |
| E3 ubiquitin-protein ligase TRIM35 | TR:H0YBF3 | | EMTPVNPGV | 33–41 | 55.6% | 4.6 | Zinc ion binding, ubiquitin-protein ligase activity | Protein ubiquitination, innate immune response, apoptotic process | [ |
| Ceruloplasmin | TR:H7C5N5 | | EMTPVNPGV | 162–170 | 55.6% | 2.8 | Oxidoreductase activity, copper binding | Redox homeostasis | [ |
| Pyridoxine-5’-phosphate oxidase | TR:A0A286YF38 | | EMTPVNPGV | 46–54 | 44.4% | 1.5 | Oxidoreductase activity | Biosynthetic process | [ |
| Growth hormone receptor | TR:Q9NRZ8 | | EMTPVNPGV | 11–19 | 44.4% | 1.3 | Cytokine receptor activity | Response to stimulus, cell signaling | [ |
Note: colons between the aligned sequences indicate identity of the residues, whereas dots indicate similarity between residues.
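To illustrate how an "Aa Positions" range and an identity value relate to a motif hit, the sketch below slides the query motif along a protein sequence and reports the best ungapped window with 1-based, inclusive coordinates. This is a simplified stand-in, not the similarity search that produced the E-values in the tables. The function name `best_window` and the example sequence are hypothetical.

```python
def best_window(motif: str, sequence: str) -> tuple[int, int, float]:
    """Return (start, end, percent identity) of the best ungapped window.

    Positions are 1-based and inclusive; ties keep the first window found.
    """
    k = len(motif)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the motif")
    best = (0, 0, -1.0)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        matches = sum(a == b for a, b in zip(motif, window))
        identity = round(100.0 * matches / k, 1)
        if identity > best[2]:
            best = (i + 1, i + k, identity)
    return best


# Hypothetical protein fragment, invented for illustration only.
fragment = "MKTAEMSPVNPGLRRA"
print(best_window("EMTPVNPGV", fragment))  # (5, 13, 77.8)
```

An ungapped scan cannot reproduce gapped hits or statistical significance; it only shows how the coordinates and the percentage are tied to the same window.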
Figure 1. Representation of gene-ontology-based biological process enrichment categorization of genes encoding (A) AFP14–20-like motif-containing proteins and (B) GIP-9-like motif-containing proteins. The gene list enrichment analysis tool of PANTHER 17.0 was applied. All query genes were retrieved from the UniProtKB knowledgebase and then converted to ENSEMBL gene IDs.
Figure 2. Gene ontology term-based molecular function categorization of genes encoding (A,B) AFP14–20-like motif-containing proteins and (C,D) GIP-9-like motif-containing proteins with the use of the ShinyGO v0.75 suite. Categories are ranked by (A,C) number of genes and (B,D) fold enrichment. Lollipop charts with an aspect ratio of 1.5 and −log10 (FDR) heat maps for each category are shown.
Figure 3. KEGG pathway enrichment analysis of genes encoding (A,C) AFP14–20-like motif-containing proteins and (B,D) GIP-9-like motif-containing proteins. Rankings by both number of genes (A,B) and fold enrichment value (C,D) are shown. Bar plot representation with an aspect ratio of 1.5 and −log10 (FDR) heat maps for each category are shown.
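The fold enrichment and −log10 (FDR) values used to rank and color categories in the enrichment figures can be sketched as follows. This is a minimal illustration that assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment; the exact procedures of the tools are not stated here, and all counts and p-values below are hypothetical.

```python
import math


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): share of query genes in a category vs. the background share."""
    return (k / n) / (K / N)


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K belong to the category."""
    total = math.comb(N, n)
    upper = min(n, K)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 5 of 20 query genes fall in a category
# that holds 100 of 20,000 background genes.
print(fold_enrichment(5, 20, 100, 20000))
p = hypergeom_pvalue(5, 20, 100, 20000)
fdr = benjamini_hochberg([p, 0.04, 0.03])  # other p-values are made up
print([-math.log10(q) for q in fdr])
```

A larger −log10 (FDR) corresponds to a smaller adjusted p-value, which is why it serves as the heat-map scale alongside the fold enrichment ranking.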
Figure 4. Hierarchical tree representation of the Reactome metabolic pathway categories of genes encoding (A) AFP14–20-like motif-containing proteins and (B) GIP-9-like motif-containing proteins. The tree summarizes the correlation among significant pathways in the gene enrichment list. Pathways with many shared genes are clustered together. Larger dots indicate more significant p-values.
Figure 5. Protein–protein interaction networks constructed with the STRING resource for genes encoding (A) AFP14–20-like motif-containing proteins and (B) GIP-9-like motif-containing proteins. ENSEMBL gene IDs or STRING-db protein IDs were used. Colored nodes: query proteins and first shell of interactions; white nodes: second shell of interactions; filled nodes: proteins of known or predicted 3D structure; empty nodes: proteins of unknown 3D structure. Known interactions: blue, from curated databases; violet, experimentally determined. Predicted interactions: red, gene fusions; green, gene neighborhood; purple, gene co-occurrence. Other interactions: lilac, protein homology; black, gene coexpression; light green, text mining.