Marco Pellegrini, M Elena Renda, Alessio Vecchio.
Abstract
MOTIVATION: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is a prerequisite for enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.
Year: 2010 PMID: 20529928 PMCID: PMC2881393 DOI: 10.1093/bioinformatics/btq209
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. BMS as a function of copy number for NTR. Motif lengths 60 (a), 100 (b), 200 (c) and 300 (d). The total length of the input sequence is 10 000 bp; the numbers of substitutions, insertions and deletions are each equal to 10% of the motif length (for a total allowed error of 30%). Every point is the average of 30 measurements and the 95% confidence intervals are shown.
Fig. 2. BMS as a function of copy number for Steiner-STR. Motif lengths 60 (a), 100 (b), 200 (c) and 300 (d). The total length of the input sequence is 10 000 bp; the numbers of substitutions, insertions and deletions are each equal to 10% of the motif length (for a total allowed error of 30%). Every point is the average of 30 measurements and the 95% confidence intervals are shown.
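The synthetic inputs described in the two captions can be approximated with a short generator. The following is only a sketch under stated assumptions: the captions do not say how the errors are distributed, so here each copy independently receives substitutions, insertions and deletions amounting to 10% of the motif length each, and the repeat is embedded at a random position in uniformly random flanking sequence. The names `mutate` and `synthetic_sequence` are hypothetical and do not come from the paper.

```python
import random

ALPHABET = "ACGT"


def mutate(motif, rate, rng):
    """Return a copy of motif with round(rate * len(motif)) substitutions,
    the same number of insertions, and the same number of deletions.
    Assumption: errors are placed uniformly at random."""
    n = round(rate * len(motif))
    seq = list(motif)
    for _ in range(n):  # substitutions: replace a base with a different one
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([c for c in ALPHABET if c != seq[i]])
    for _ in range(n):  # insertions: add a random base at a random position
        seq.insert(rng.randrange(len(seq) + 1), rng.choice(ALPHABET))
    for _ in range(n):  # deletions: drop a random base
        del seq[rng.randrange(len(seq))]
    return "".join(seq)


def synthetic_sequence(motif_len, copies, total_len=10_000, rate=0.10, seed=0):
    """Build a random sequence of total_len bp containing one fuzzy TR.
    Returns (sequence, tr_start, tr_end) with tr_end exclusive."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(motif_len))
    tr = "".join(mutate(motif, rate, rng) for _ in range(copies))
    if len(tr) > total_len:
        raise ValueError("repeat does not fit in the requested sequence length")
    flank = total_len - len(tr)
    start = rng.randrange(flank + 1)
    left = "".join(rng.choice(ALPHABET) for _ in range(start))
    right = "".join(rng.choice(ALPHABET) for _ in range(flank - start))
    return left + tr + right, start, start + len(tr)


if __name__ == "__main__":
    seq, start, end = synthetic_sequence(motif_len=60, copies=5)
    print(len(seq), end - start)
```

Because insertions and deletions are applied in equal numbers, each mutated copy keeps the motif length, which makes the planted repeat boundaries easy to compare against a detector's output.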
Table 1. Recall of the three methods under evaluation
| Algorithm | Filter 90% | Filter 70% |
|---|---|---|
| Frataxin | | |
| TRStalker (TRF filter) | 59 (56.2) | 43 (56.5) |
| TRStalker (ATR filter) | 43 (41.0) | 30 (39.4) |
| TRF | 24 (22.9) | 18 (23.6) |
| ATRHunter | 24 (22.9) | 23 (30.2) |
| Union | 105 (100.0) | 76 (100.0) |
| TRStalker (TRF filter) | 22 557 (59.1) | 14 137 (60.2) |
| TRStalker (ATR filter) | 18 124 (47.5) | 11 427 (48.7) |
| TRF | 9977 (26.1) | 8521 (36.0) |
| ATRHunter | 7392 (19.3) | 7034 (29.6) |
| Union | 38 218 (100.0) | 23 743 (100.0) |
| TRStalker (TRF filter) | 7168 (61.8) | 4656 (63.5) |
| TRStalker (ATR filter) | 5621 (48.4) | 3655 (49.9) |
| TRF | 2892 (24.9) | 2518 (34.1) |
| ATRHunter | 2037 (17.6) | 1958 (26.4) |
| Union | 11 616 (100.0) | 7407 (100.0) |
Each entry in the table gives the absolute number of unique TRs found and, in parentheses, the percentage of unique TRs with respect to the union of the three methods. For TRStalker, we applied both a TRF-like filter and a more restrictive ATRHunter-like filter to the TRs found.
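The percentages in parentheses follow directly from the counts: each is the method's number of unique TRs divided by the size of the union. A minimal sketch of that arithmetic, using the Frataxin counts at the 90% filter from the table, is given below; the function name `recall_vs_union` is hypothetical.

```python
def recall_vs_union(unique_counts, union_size):
    """Map each method to its share of the union, in percent (one decimal)."""
    return {
        method: round(100.0 * count / union_size, 1)
        for method, count in unique_counts.items()
    }


# Frataxin, filter 90%: counts copied from the table above.
frataxin_90 = {
    "TRStalker (TRF filter)": 59,
    "TRStalker (ATR filter)": 43,
    "TRF": 24,
    "ATRHunter": 24,
}

if __name__ == "__main__":
    for method, pct in recall_vs_union(frataxin_90, union_size=105).items():
        print(f"{method}: {pct}")
```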
Table 2. Examples of TRs found by TRStalker and missed by TRF and ATRHunter
| No. | Sequence | Seq. length | TR start | TR end | TR length | Consensus | Repetitions | Score | Norm. score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | HSBT | 684 973 | 411 000 | 413 127 | 2127 | 1061 | 2.00 | 2868 | 1.384 |
| 2 | HSBT | 684 973 | 448 001 | 449 687 | 1686 | 842 | 2.00 | 2310 | 1.370 |
| 3 | HSBT | 684 973 | 636 116 | 638 622 | 2506 | 1253 | 2.00 | 3323 | 1.326 |
| 4 | YCh1 | 230 208 | 186 168 | 188 347 | 2179 | 1089 | 2.00 | 3053 | 1.401 |
| 5 | FRDA | 2465 | 2029 | 2407 | 378 | 188 | 2.011 | 501 | 1.325 |
We report the original sequence name and length, the TR starting and ending positions, the TR length, the length of the TR repeating unit (consensus) and the copy number. The score is computed by assigning +2 to matches and −1 to mismatches and gaps with respect to the consensus string. The normalized score is the score divided by the TR length.
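The scoring rule in the note above can be sketched in a few lines. This is an illustration only: it assumes the copy and the consensus are already aligned to equal length with `-` marking gaps, which the note does not specify, and the helper names `tr_score` and `normalized_score` are hypothetical.

```python
def tr_score(aligned_copy, aligned_consensus, match=2, penalty=-1):
    """Score a gapped pairwise alignment column by column:
    +2 for a match, -1 for a mismatch or a gap ('-')."""
    if len(aligned_copy) != len(aligned_consensus):
        raise ValueError("aligned strings must have equal length")
    score = 0
    for a, b in zip(aligned_copy, aligned_consensus):
        if a != "-" and a == b:
            score += match
        else:
            score += penalty
    return score


def normalized_score(score, tr_length):
    """Normalized score = score divided by the TR length."""
    return score / tr_length


if __name__ == "__main__":
    # Toy alignment: four matches, one mismatch, one gap.
    print(tr_score("ACGT-A", "ACCTTA"))
    # Rows 4 and 5 of the table: score and TR length as reported.
    print(round(normalized_score(3053, 2179), 3))
    print(round(normalized_score(501, 378), 3))
```

With +2 per match, a perfect repeat would reach a normalized score close to 2, so the values in the table give a rough sense of how divergent each repeat is.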
Table 3. Motif/repeat alignment scores computed by JAligner using the BLOSUM62 score matrix, with the gap open penalty set to 10.0 and the gap extend penalty set to 0.5, for the TRs reported in Table 2
| Seq. | No. | Repeat | Length | Identity, n (%) | Gaps, n (%) | Score |
|---|---|---|---|---|---|---|
| HSBT | 1 | 1 | 1107 | 805 (72.72) | 91 (8.22) | 3657.00 |
| - | 1 | 2 | 1093 | 895 (81.88) | 70 (6.40) | 4291.00 |
| HSBT | 2 | 1 | 878 | 638 (72.67) | 85 (9.68) | 3045.50 |
| - | 2 | 2 | 866 | 716 (82.68) | 52 (6.00) | 3568.00 |
| HSBT | 3 | 1 | 1300 | 1000 (76.92) | 94 (7.23) | 5206.00 |
| - | 3 | 2 | 1313 | 1004 (76.47) | 120 (9.14) | 5176.50 |
| YCh1 | 4 | 1 | 1130 | 895 (79.20) | 83 (7.35) | 4280.50 |
| - | 4 | 2 | 1123 | 901 (80.23) | 77 (6.86) | 4345.50 |
| FRDA | 5 | 1 | 193 | 149 (77.20) | 10 (5.18) | 723.50 |
| - | 5 | 2 | 191 | 146 (76.44) | 5 (2.62) | 765.00 |