David Mary Rajathei, Subbiah Parthasarathy, Samuel Selvaraj.
Abstract
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences helps to clarify their abundance, structure and function in the protein universe. In the present study, amino acid repeats of length >50 (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The underlying structures and functions of these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and enzyme and non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats occur predominantly in certain architectures, such as sandwich, bundle, barrel, and roll, and within these architectures they are abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats lie in the domain regions of the protein families and are involved in the function of the proteins. After grouping the proteins into enzyme and non-enzyme classes, we observed an abundant occurrence of long repeats in specific catalytic and binding functions. In this study, we have analyzed the occurrence of long repeats in the protein sequence universe, as distinct from the well-characterized short tandem repeats, together with their structures and functions at the domain level. The present study suggests that long repeats may play an important role in the structure and function of protein domains.
Keywords: domain; enzyme and non-enzyme classes; long repeats; protein; protein family; structural fold
Year: 2019 PMID: 31649924 PMCID: PMC6795024 DOI: 10.3389/fbioe.2019.00250
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Flow diagram of the identification and analysis of long repeats from a non-redundant set of UniProt sequences.
Figure 2. (A) Number of proteins plotted against long repeat length (>50), binned into the ranges <100, 101–200, 201–300, 301–400, 401–500, and >500; most long repeat lengths fall below 200. (B) Distribution of repeat numbers, showing that most long repeat proteins contain 2 or 3 repeats.
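The length binning described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example lengths are hypothetical, and a length of exactly 100 is assumed to belong to the first bin, since the caption leaves that boundary open.

```python
from collections import Counter

# Upper bounds and labels follow the ranges named in the Figure 2 caption.
# Assumption: a length of exactly 100 is counted in the first bin.
BINS = [
    (100, "<=100"),
    (200, "101-200"),
    (300, "201-300"),
    (400, "301-400"),
    (500, "401-500"),
]


def length_bin(repeat_length):
    """Return the bin label for one long-repeat length."""
    if repeat_length <= 50:
        # The study defines long repeats as repeats of length > 50.
        raise ValueError("long repeats are defined as length > 50")
    for upper, label in BINS:
        if repeat_length <= upper:
            return label
    return ">500"


def bin_counts(lengths):
    """Count how many repeat lengths fall in each bin."""
    return Counter(length_bin(n) for n in lengths)


# Illustrative lengths only; these are not the study's data set.
counts = bin_counts([57, 85, 93, 137, 183, 244, 520])
```

Here `counts` maps each bin label to the number of lengths it received, which is the quantity a histogram such as panel (A) would plot.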
Figure 3. Number of long repeat-containing proteins assigned to different architectures of the α, β, and α/β classes using CATH.
Figure 4. Folds that occur in a substantial number of long repeat proteins: Arc repressor mutant subunit A (Orthogonal bundle architecture); Four helix bundle (Up-down bundle); Jelly Rolls and Immunoglobulin (Beta sandwich); PH-domain like fold (Beta roll); OB Roll (Beta barrel); Rossmann fold (3-layer sandwich); Alpha beta plaits and Herpes Virus-1 domain (2-layer sandwich); TIM barrel (Alpha-beta barrel); and UB Roll (Alpha-beta roll).
Number of proteins containing long repeats in the architectures and folds of the proteins.
| Class | Architecture | No. of proteins | Fold | No. of proteins |
| --- | --- | --- | --- | --- |
| Alpha (α) class | Orthogonal bundle | 1,272 | Arc repressor mutant, subunit A | 391 |
| | Alpha horseshoe | 1,072 | Leucine-rich repeats variant | 488 |
| | Up-down bundle | 265 | Four helix bundle | 32 |
| Beta (β) class | Beta sandwich | 909 | i) Jelly Rolls ii) Immunoglobulin | 228 |
| | 7 Propeller | 549 | Methylamine dehydrogenase | 549 |
| | Beta roll | 334 | PH-domain like | 175 |
| | Beta barrel | 234 | OB Roll | 93 |
| Alpha Beta (αβ) | 3-Layer (aba) Sandwich | 1,885 | 3-layer (αβα) sandwich | 1,223 |
| | 2-Layer sandwich | 1,332 | Alpha beta plaits | 123 |
| | Alpha-beta barrel | 432 | TIM barrel | 334 |
| | Alpha-beta complex | 390 | Spore coat polysaccharide biosynthesis protein SpsA | 156 |
| | Alpha-beta roll | 180 | UB Roll | 32 |
Figure 5. Long repeats that form structural repeats in the Orthogonal bundle and Up-down bundle folds of the alpha class; the Immunoglobulin fold, Jelly Roll, and OB fold of the beta class; and the Rossmann fold, Alpha Beta Plait, TIM barrel, and UB roll of the alpha-beta class.
Figure 6. Larger repeat lengths of varying size are observed in the Rossmann fold of (A) Cobalt chelatase CbiK (2xvyA), with repeat length 103, and (B) Phosphoribosyl pyrophosphate synthetase (3mbiA), with repeat length 121. The repeat regions are highlighted in different colors.
Figure 7. List of the 36 protein families that have long repeats in more than 40 member proteins.
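List of the 41 member proteins of the Peptidase S8 family, with their long repeat regions and their functional domain regions assigned using Pfam.
| No. | Protein (UniProt ID, length) | Long repeat regions (repeat length) | Peptidase S8 domain region (length) | Other Pfam domains (region) |
| --- | --- | --- | --- | --- |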
| 1 | Aqualysin-1 ( | 157–215/228–313 (57) | 157–401 (244) | Inhibitor_I9 (54–125) |
| 2 | Bacillopeptidase F (P16397 1434) | i)198–279/280–352/355–436/437–529 (85) ii)568–609/615–701/1,044–1,167 (79) | 218–504 (286) | Peptidase_M6 (667–801) |
| 3 | Calcium-dependent protease (Q59149 663) | 219–313/315–412 (93) | 228–530 (302) | P_proprotein (547–662) |
| 4 | Cell wall-associated protease (P54423 895) | 737–803/818–885 (67) | 458–729 (271) | |
| 5 | Cuticle-degrading protease (P29138 389) | 72–171/172–270/277–355 (84) | 139–383 (244) | Inhibitor_I9 (41–107) |
| 6 | Extracellular serine protease (P29805 1046) | i)158–240/241–381/385–491 (137) ii)509–585/586–685/687–754/771–831 (83) | 71–397 (326) | Autotransporter (769–1,045) |
| 7 | Microbial serine proteinase (P31339 622) | 167–237/389–458 (67) | 89–411 (322) | P_proprotein (491–572) |
| 8 | Minor extracellular protease vpr (P29141 807) | 18–148/149–208/340–473/475–532/658–711 (183) | 184–594 (410) | Inhibitor_I9 (57–143); |
| 9 | Minor extracellular protease Epr (P16396 646) | 39–112/169–240/249–327 (74) | 137–380 (243) | |
| 10 | MycP4 protease (I6YC58 456) | 25–208/222–407 (159) | 86–389 (303) | |
| 11 | MycP1 protease (A0QNL1 450) | 91–172/173–327/332–428 (127) | 83–381 (298) | |
| 12 | Nisin leader peptide-processing serine protease (Q07596 683) | 228–281/379–418/504–557 (52) | 255–546 (291) | |
| 13 | PIII-type proteinase (P15292 1963) | 156–206/208–209/295–380 (82) | 212–698 (486) | |
| 14 | Proprotein convertase subtilisin/kexin type 9 (Q80W65 695) | 467–537/540–611/616–682 (140) | 185–423 (238) | Inhibitor_I9 (80–152) |
| 15 | Pyrolysin (P72186 1399) | i)225–274/276–339/341–400 (62) ii)959–1,006/1,011–1,158/1,169–1,296 (126) | i)174–380 (206) ii)408–654 (246) | |
| 16 | Putative subtilisin-like proteinase 1 (Q8SQJ3 466) | 23–92/94–160/165–195 (67) | 144–422 (278) | Inhibitor_I9 (19–90) |
| 17 | Putative subtilisin-like proteinase 2 (Q8SS86 536) | 106–165/278–336/362–390 (60) | 272–452 (180) | |
| 18 | Probable subtilase-type serine protease DR_A0283 (Q9RYM8 729) | 84–126/131–209/232–310/320–378 (78) | 183–470 (287) | Peptidase_M14NE-CP-C_like (486–558); PPC (624–693) |
| 19 | Subtilase-type proteinase psp3 (Q9UTS0 452) | 217–283/349–407 (56) | 202–429 (227) | Inhibitor_I9 (80–162) |
| 20 | Subtilase-type proteinase RRT12 (P25381 492) | 53–107/269–320 (52) | 156–389 (233) | |
| 21 | Subtilisin-like protease SBT3.13 (Q8GUK4 767) | 320–431/608–719 (107) | 153–588 (453) | Inhibitor_I9 (41–119); |
| 22 | Subtilisin-like protease SBT4.4 (Q9FGU3 742) | 65–226/416–581 (150) | 137–581 (444) | Inhibitor_I9 (34–112); PA (338–458) |
| 23 | Subtilisin-like protease SBT4.10 (Q9FIM8 694) | 138–284/387–534 (139) | 138–526 (388) | Inhibitor_I9 (35–113); PA (332–371) |
| 24 | Subtilisin-like protease SBT4.14 (Q9LLL8 750) | 202–333/336–464/467–596 (129) | 141–594 (453) | Inhibitor_I9 (38–115); PA (346–467) |
| 25 | Subtilisin-like protease SBT2.4 (F4HYR6 833) | 245–361/362–547/548–736 (178) | 169–691 (522) | Inhibitor_I9 (70–138); PA (389–533) |
| 26 | Subtilisin-like protease SBT4.15 (Q9LZS6 767) | 284–379/450–550 (92) | 137–590 (453) | Inhibitor_I9 (35–113); PA (342–474) |
| 27 | Subtilisin-like protease SBT3.18 (Q9STQ2 780) | i)179–224/495–575/707–756 (76) ii)318–388/405–476 (65) | 137–613 (476) | Inhibitor_I9 (30–109); |
| 28 | Subtilisin-like protease SBT6.1 (Q0WUG6 1039) | 556–644/812–901 (86) | 208–486 (278) | |
| 29 | Subtilisin-like protease SBT2.2 (Q9SUN6 857) | 163–222/226–283 (54) | 184–674 (490) | Inhibitor_I9 (98–159); |
| 30 | Subtilisin-like protease SBT2.6 (Q9SZV5 817) | 155–186/195–224/315–398 (59) | 151–635 (484) | Inhibitor_I9 (61–124); |
| 31 | Subtilisin-like protease SBT3.6 (Q8L7I2 779) | 216–306/307–392 (71) | 138–593 (455) | Inhibitor_I9 (34–113); |
| 32 | Subtilisin-like protease SBT1.2 (O64495 776) | 150–259/527–633 (101) | 127–587 (460) | Inhibitor_I9 (27–112); |
| 33 | Subtilisin-like protease SBT1.4 (Q9LVJ1 778) | 169–268/435–532 (86) | 133–589 (456) | Inhibitor_I9 (32–110); |
| 34 | Serotype-specific antigen 1 (P31631 933) | 378–433/466–588/594–709 (115) | 54–408 (354) | Autotransporter superfamily (673–916) |
| 35 | Subtilisin-like protease 12 (D4AQA9 417) | 253–300/305–372 (64) | 145–399 (254) | Inhibitor_I9 (35–116) |
| 36 | Subtilisin-like protease CPC735_047380 (C5PFR5 401) | 84–138/140–196 (53) | 143–363 (220) | Inhibitor_I9 (35–114) |
| 37 | Tripeptidyl-peptidase 2 (Q09541 1375) | 370–469/688–782 (87) | 89–559 (470) | TPPII (832–1,017) |
| 38 | Tripeptidyl-peptidase 2 homolog (Q9UT05 1275) | i)265–379/413–522 (96) ii)637–807/820–958 (121) | 90–545 (450) | TPPII (837–1,008) |
| 39 | Thermophilic serine proteinase (Q45670 402) | 121–202/203–282/283–358 (79) | 151–392 (241) | |
| 40 | Tripeptidyl-peptidase 2 (F4JVN6 1381) | i)143–207/343–403/717–760 (62) ii)1,043–1,188/1,236–1,380 (135) | 140–620 (480) | TPPII (897–1,078) |
| 41 | Subtilisin-like protease (Q00139 371) | 20–72/80–135 (52) | 83–255 (172) | P_proprotein (240–370) |
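To check how much of a protein's long repeat regions falls inside its Peptidase S8 domain region, the cells of the table above can be parsed and compared. The sketch below is illustrative only and is not the authors' pipeline: the function names are hypothetical, and it assumes each cell lists residue ranges as `start–end` separated by `/`, with an optional parenthesised length and optional `i)`/`ii)` set labels.

```python
import re

# A residue range such as "157–215" (en dash) or "83-255" (hyphen).
RANGE = re.compile(r"(\d+)\s*[–-]\s*(\d+)")


def parse_regions(cell):
    """Extract (start, end) residue ranges from one table cell."""
    # Remove thousands separators so "1,044–1,167" reads as 1044–1167.
    cleaned = re.sub(r"(?<=\d),(?=\d{3})", "", cell)
    # Parenthesised lengths such as "(57)" contain no dash and are skipped.
    return [(int(a), int(b)) for a, b in RANGE.findall(cleaned)]


def overlap(a, b):
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def fraction_in_domain(repeat_cell, domain_cell):
    """Fraction of repeat residues lying inside the listed domain ranges.

    Assumes the domain ranges in a cell do not overlap one another.
    """
    repeats = parse_regions(repeat_cell)
    domains = parse_regions(domain_cell)
    total = sum(end - start + 1 for start, end in repeats)
    if total == 0:
        return 0.0
    covered = sum(overlap(r, d) for r in repeats for d in domains)
    return covered / total


# Row 1 (Aqualysin-1): both repeats lie inside the domain region.
aqualysin = fraction_in_domain("157–215/228–313 (57)", "157–401 (244)")
# Row 4 (Cell wall-associated protease): the repeats lie after the domain.
cell_wall = fraction_in_domain("737–803/818–885 (67)", "458–729 (271)")
```

For these two rows the fractions are 1.0 and 0.0 respectively, which shows that the listed repeat regions do not coincide with the Peptidase S8 domain region in every member protein.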
Figure 8. Repeat regions of Aqualysin-1 (P085594), Bacillopeptidase F (P16397), and Thermophilic serine proteinase (Q45670), and their alignments, which lie within the Peptidase S8 domain region assigned by Pfam.
Figure 9. Functional residues of the repeats (157–215/228–313) of Aqualysin-1, located in the structural regions (30–88)/(101–186) (highlighted by red and green inverted triangles with red dots), identified using a PDBsum search.