Valentina Rudenko, Eugene Korotkov.
Abstract
We report a Method to Search for Highly Divergent Tandem Repeats (MSHDTR) in protein sequences, which considers pairwise correlations between adjacent residues. MSHDTR was compared with previously developed methods for finding tandem repeats (TRs) in amino acid sequences, such as T-REKS and XSTREAM. These methods focus on identifying TRs with significant sequence similarity, whereas MSHDTR detects repeats that have diverged significantly during evolution by accumulating deletions, insertions, and substitutions. Applying MSHDTR to the Swiss-Prot databank revealed over 15 thousand TR-containing amino acid sequences that were difficult to find using the other methods. Among the detected TRs, the most represented were those with consensus lengths of two and seven residues; these TRs were subjected to cluster analysis, and classes of patterns were identified. All TRs detected in this study have been combined into a databank accessible over the WWW.
Keywords: amino acid sequence; cyclic alignment; mathematical method; pairwise correlation; protein; tandem repeats
Year: 2021 PMID: 34281150 PMCID: PMC8269118 DOI: 10.3390/ijms22137096
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Number of TRs with different degrees of evolutionary divergence identified in 5400 artificial sequences by MSHDTR, T-REKS, and XSTREAM. Values are the number (percent) of detected sequences with TRs.
| Divergence Degree | MSHDTR | T-REKS | XSTREAM |
|---|---|---|---|
| 25 | 3194 (59.15) | 2997 (55.50) | 2622 (48.56) |
| 50 | 2462 (45.59) | 160 (2.96) | 39 (0.72) |
| 75 | 1730 (32.04) | 7 (0.13) | 4 (0.07) |
| 90 | 1276 (23.63) | 1 (0) | 0 (0) |
| 100 | 949 (17.57) | 0 (0) | 0 (0) |
| 110 | 604 (11.19) | 0 (0) | 0 (0) |
| 120 | 324 (6.00) | 0 (0) | 0 (0) |
| 150 | 42 (0.78) | 0 (0) | 0 (0) |
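The percentages in the table above appear to be each count divided by the 5400 artificial sequences. A minimal sketch of that arithmetic follows; the names `detected`, `detection_percent`, and `percent_table` are hypothetical and chosen only for illustration, and only the first two and the last rows of the table are copied in to keep the example short.

```python
# Counts of artificial sequences in which each method found a TR,
# copied from the table above. Keys are divergence degrees.
# Assumption: every percentage is computed against the same total of 5400.
TOTAL_SEQUENCES = 5400

detected = {
    25: {"MSHDTR": 3194, "T-REKS": 2997, "XSTREAM": 2622},
    50: {"MSHDTR": 2462, "T-REKS": 160, "XSTREAM": 39},
    150: {"MSHDTR": 42, "T-REKS": 0, "XSTREAM": 0},
}


def detection_percent(count, total=TOTAL_SEQUENCES):
    """Return the share of sequences with a detected TR, in percent (2 decimals)."""
    return round(100.0 * count / total, 2)


def percent_table(counts):
    """Convert a {degree: {method: count}} mapping into percentages."""
    return {
        degree: {method: detection_percent(n) for method, n in row.items()}
        for degree, row in counts.items()
    }


if __name__ == "__main__":
    for degree, row in percent_table(detected).items():
        print(degree, row)
```

Read this way, the table shows the gap between the methods opening up at a divergence degree of 50, where MSHDTR still finds TRs in about 46% of the sequences while T-REKS and XSTREAM fall below 3%.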
Figure 1. The number of TRs (N) found in Swiss-Prot by different methods depending on (a) the TR length (L) and (b) the degree of TR divergence (S).
Number of TRs detected by MSHDTR, T-REKS, and XSTREAM before and after filtering.
| Method | Without Filter | |||||
|---|---|---|---|---|---|---|
| MSHDTR | 15,407 | 15,323 | 15,223 | 15,042 | 14,684 | 13,986 |
| T-REKS | 41,375 | 37,769 | 17,599 | 92 | 0 | 0 |
| XSTREAM | 19,255 | 17,305 | 10,440 | 5549 | 871 | 1 |
TRs detected in eight artificial sequences by MSHDTR and HHRepID.
| Divergence | TR Boundaries: Left | TR Boundaries: Right | TR Boundaries: Length | MSHDTR: Left | MSHDTR: Right | MSHDTR: Z | HHRepID: Period | HHRepID: Left | HHRepID: Right | HHRepID: e-Value |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 161 | 864 | 7 | 161 | 865 | 58.4 | 7 | 164 | 866 | 1.6 × 10⁻¹⁷² |
| 50 | 14 | 715 | 7 | 161 | 775 | 48.8 | 7 | 13 | 333 | 5.7 × 10⁻⁵⁴ |
| 75 | 263 | 960 | 7 | 222 | 984 | 22.3 | 56 | 373 | 600 | 2.1 × 10⁻¹² |
| 90 | 483 | 1182 | 7 | 445 | 1227 | 20.1 | 7 | 1103 | 1144 | 3.3 × 10⁻⁶ |
| 100 | 254 | 959 | 7 | 220 | 1244 | 17.7 | 15 | 20 | 48 | 2.6 × 10⁻⁵ |
| 110 | 553 | 1254 | 7 | 385 | 1254 | 13.5 | - | - | - | - |
| 120 | 144 | 839 | 7 | 130 | 848 | 11.3 | - | - | - | - |
| 150 | 214 | 915 | 7 | 1 | 1188 | 5.0 | - | - | - | - |
The 10 most common consensus lengths of TRs detected in amino acid sequences from Swiss-Prot.
| Consensus Length | Number of Detected TRs | % of TRs |
|---|---|---|
| 2 | 2104 | 13.65 |
| 3 | 1349 | 8.75 |
| 4 | 1205 | 7.82 |
| 5 | 1180 | 7.66 |
| 6 | 1005 | 6.52 |
| 7 | 1650 | 10.71 |
| 8 | 759 | 4.92 |
| 9 | 699 | 4.54 |
| 10 | 450 | 2.92 |
| 11 | 528 | 3.43 |
Multiple alignment of 64-character repeats found in the sequence of Mus musculus hexokinase-1 (P17710).
Figure 2. Cluster dendrogram of 2104 TRs with the consensus length of two residues. Red vertical lines indicate the division into classes.
Figure 3. Cluster dendrogram of 1650 TRs with the consensus length of seven residues. Red vertical lines indicate the division into classes.
Figure 4. 3D structure of the 73–969 residue region of the Mus musculus hexokinase-1 monomer.
Multiple alignment of six-character repeats found in aspartyl/glutamyl-tRNA (Asn/Gln) amidotransferase subunit B (Q02GV7) from Pseudomonas aeruginosa.
Figure 5. 3D structure of the 3–403 residue region of the aspartyl/glutamyl-tRNA (Asn/Gln) amidotransferase subunit B monomer.