Abstract
Amino acid repeats (AARs) are abundant in protein sequences and play particular roles in protein function and evolution. Simple repeat patterns generated by DNA slippage tend to introduce length variations and point mutations in repeat regions. Because their length is variable, such repeats carry a potential risk of disease through loss of normal function or gain of abnormal function. Repeats with complex patterns mostly correspond to functional domain repeats, such as the well-known leucine-rich repeat and WD repeat, which are frequently involved in protein–protein interactions. They are mainly derived from internal gene duplication events and are stabilized by ‘gate-keeper’ residues, which play crucial roles in preventing inter-domain aggregation. AARs are widely distributed in proteomes across a broad taxonomic range and are especially abundant in eukaryotic proteins. However, their specific evolutionary and functional scenarios remain poorly understood. Identifying AARs in protein sequences is the first step toward further investigation of their biological function and evolutionary mechanisms. In principle, this is an NP-hard problem, as most repeat fragments have been shaped by a series of sophisticated evolutionary events and have become latent periodic patterns. No uniform criterion can be defined for detecting and verifying the various repeat patterns; instead, different algorithms based on different strategies have been developed to cope with different repeat patterns. In this review, we attempt to describe the amino acid repeat-detection algorithms currently available and compare their strategies, based on an in-depth analysis of the biological significance of protein repeats.
Year: 2014 PMID: 23418055 PMCID: PMC4103538 DOI: 10.1093/bib/bbt003
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
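The simplest repeat class mentioned in the abstract, homopolymer runs such as the polyQ tracts produced by DNA slippage, can be located with a single regular-expression scan. The sketch below is illustrative only and is not one of the reviewed algorithms; the function name `find_homopolymer_runs` and the default minimum run length of 6 are hypothetical choices, not thresholds taken from the review.

```python
import re
from typing import List, Tuple

def find_homopolymer_runs(seq: str, min_len: int = 6) -> List[Tuple[str, int, int]]:
    """Return (residue, start, end) for each single-residue run of length >= min_len.

    Coordinates are 0-based and half-open, like Python slices.
    min_len=6 is an arbitrary, hypothetical default.
    """
    runs = []
    # "(.)\1*" greedily matches a maximal run of one repeated character,
    # so successive matches tile the whole sequence.
    for match in re.finditer(r"(.)\1*", seq):
        if match.end() - match.start() >= min_len:
            runs.append((match.group(1), match.start(), match.end()))
    return runs

if __name__ == "__main__":
    # Synthetic sequence, not a real protein: a 7-residue polyQ tract.
    print(find_homopolymer_runs("MAQQQQQQQPL"))  # [('Q', 2, 9)]
```

Because the threshold is a plain length cutoff, this only captures perfect runs; compositionally biased regions such as the D/E-rich stretch of calsequestrin-2 need a window-based measure instead.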
Some examples of AARs in human proteins
| UniProt ID | Protein | Length (aa) | Repeat pattern |
|---|---|---|---|
| SECR_HUMAN | Secretin | 121 | polyL |
| PRIO_HUMAN | Major prion protein | 253 | (PHGGGWGQ)4 |
| ANKR1_HUMAN | Ankyrin repeat domain-containing protein 1 | 319 | Ankyrin repeat |
| CASQ2_HUMAN | Calsequestrin-2 | 399 | D/E-Rich |
| ESX1_HUMAN | Homeobox protein ESX1 | 406 | (PPxxPxPPx)9 |
| WDR1_HUMAN | WD repeat-containing protein 1 | 606 | WD repeat |
| UBC_HUMAN | Polyubiquitin-C | 685 | Ubiquitin |
| FOXP2_HUMAN | Forkhead box protein P2 | 715 | polyQ |
| LRRN1_HUMAN | Leucine-rich repeat neuronal protein 1 | 716 | Leucine-rich repeat |
| ANDR_HUMAN | Androgen receptor | 919 | polyQ, polyG, polyP |
| SRBP2_HUMAN | Sterol regulatory element-binding protein 2 | 1141 | polyS, (PQ)4, (SGSS)2 |
| BRD4_HUMAN | Bromodomain-containing protein 4 | 1362 | polyP, polyH, polyQ, K-Rich, S-Rich |
| CO1A1_HUMAN | Collagen alpha-1(I) chain | 1464 | (GPP)n |
| CAC1A_HUMAN | Brain calcium channel I | 2505 | polyQ, polyH, polyG |
| HD_HUMAN | Huntington disease protein | 3142 | polyQ, polyP, polyT, polyE, HEAT domain |
| MLL2_HUMAN | Histone-lysine N-methyltransferase MLL2 | 5537 | (S/P-P-P-E/P-E/A)15 |
| TITIN_HUMAN | Titin | 34350 | Several types of repeating domains: TPR, WD, RCC1, PEVK, Kelch, Z and Ig repeats |
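Several entries above are exact tandem arrays written as (unit)n, for example (PHGGGWGQ)4 in the prion protein. A naive way to recover such perfect repeats is to try every start position and every period up to a bound, and count how many exact copies of the unit follow back to back. This is a hypothetical sketch (`find_perfect_tandem_repeats`, `max_period` and `min_copies` are our own names and defaults), not the method of any tool reviewed here; it tolerates no substitutions or indels, which is precisely why divergent and domain repeats require the alignment- and model-based approaches listed in the next table.

```python
from typing import List, Optional, Tuple

def find_perfect_tandem_repeats(
    seq: str, max_period: int = 10, min_copies: int = 3
) -> List[Tuple[str, int, int]]:
    """Greedy scan for exact tandem arrays (unit)n, reported as (unit, start, copies)."""
    hits: List[Tuple[str, int, int]] = []
    i = 0
    while i < len(seq):
        best: Optional[Tuple[int, str, int]] = None  # (covered_length, unit, copies)
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:  # not enough sequence left for this period
                break
            copies = 1
            # Count how many exact copies of `unit` follow back to back.
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            covered = copies * period
            # Strict ">" keeps the shortest period when two periods cover the same span.
            if copies >= min_copies and (best is None or covered > best[0]):
                best = (covered, unit, copies)
        if best is not None:
            hits.append((best[1], i, best[2]))
            i += best[0]  # jump past the reported array
        else:
            i += 1
    return hits

if __name__ == "__main__":
    # Synthetic sequence built around the octapeptide unit from the table.
    print(find_perfect_tandem_repeats("MK" + "PHGGGWGQ" * 4 + "SA"))  # [('PHGGGWGQ', 2, 4)]
```

The scan costs roughly O(n · P · n) in the worst case for sequence length n and maximum period P, and it reports the unit in the phase where the array is first met, so a (GPP)n array may come back as PPG or PGP depending on its flanks.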
Repeat detection algorithms
| Method | Repeat type | Ref | Availability |
|---|---|---|---|
| **Self-comparison** | | | |
| REP | Domain | [ | |
| COACH | Domain | [ | |
| TPRpred | Domain | [ | |
| REPRO | Domain | [ | |
| TRUST | Divergent | [ | |
| Internal Repeat Finder | Divergent | [ | |
| HHrep | Divergent | [ | |
| RADAR | Divergent | [ | |
| HHrepID | Divergent | [ | |
| **Pattern recognition** | | | |
| REPETITA | Solenoid | [ | |
| LSTM | Domain | [ | |
| ARD | Alpha-Rod | [ | |
| **Complexity measurement** | | | |
| SIMPLE | Simple | [ | |
| GBA | Simple | [ | xli@cise.ufl.edu |
| **Others** | | | |
| XSTREAM | NPTR | [ | |
| Apriod | PPP | [ | hwan@mindgen.org |
| LocRepeat | PPP | [ | |
| REPfind | NPTR | [ | adebiyi@informatik.uni-tuebingen.de |
| Reptile | Perfect | [ | |
| SUFFIX | Perfect | [ | |
Abbreviations: NPTR = non-perfect tandem repeat; PPP = pseudo-periodic partitions.
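The "Complexity measurement" group flags regions whose residue composition is unusually uniform. As a rough illustration of that idea only, the sketch below slides a fixed window along the sequence, computes the Shannon entropy of the residue composition in bits, and merges windows at or below a cutoff into intervals. It is not the SIMPLE or GBA algorithm; the window length of 12 and the 2.2-bit cutoff are hypothetical values chosen for the example (for comparison, a window of 12 distinct residues scores log2(12), about 3.58 bits).

```python
import math
from collections import Counter
from typing import List, Tuple

def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def low_complexity_windows(
    seq: str, window: int = 12, max_bits: float = 2.2
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals made of low-entropy windows.

    window=12 and max_bits=2.2 are hypothetical defaults, not published thresholds.
    """
    intervals: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_bits:
            start, end = i, i + window
            # Extend the previous interval if this window overlaps or touches it.
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals
```

Because entropy ignores residue order, this measure also picks up compositionally biased stretches such as D/E-rich or S-rich regions, but it cannot tell a periodic repeat from a scrambled region of the same composition; the choice of window length trades sensitivity to short tracts against noise.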