Daniel P Depledge, Ryan P J Lower, Deborah F Smith.
Abstract
BACKGROUND: Amino acid repeat-containing proteins have a broad range of functions, and their identification is relevant to many experimental biologists. In human-infective protozoan parasites (such as the kinetoplastid and Plasmodium species), they are implicated in immune evasion and have been shown to influence virulence and pathogenicity. RepSeq (http://repseq.gugbe.com) is a new database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application, which also provides links to related online tools and databases for further analyses.
Year: 2007 PMID: 17428323 PMCID: PMC1854910 DOI: 10.1186/1471-2105-8-122
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Examples of amino acid repeats.
| Single Amino Acid Repeat (SAAR) | MJRK |
| Di-peptide Repeat (DPR) | MJRK |
| Sequence Repeat Region (SRR) | MJRK |
Each repeat type is shown within a sequence (bold, highlighted).
Test data set analysis.
| Total | True positives | False positives | Total | True positives | False positives | ||
| 342 | 250 (100%) | 92 | 248 | 248 (99.2%) | 0 | ||
| 1306 | 1250 (100%) | 56 | 1237 | 1237 (99.0%) | 0 | ||
| 674 | 500 (100%) | 174 | 492 | 492 (98.4%) | 0 | ||
| 2633 | 2500 (100%) | 133 | 2466 | 2466 (98.6%) | 0 | ||
| Total | True positives | False positives | Total | True positives | False positives | ||
| 256 | 250 (100%) | 6 | 248 | 248 (99.2%) | 0 | ||
| 1253 | 1248 (99.8%) | 5 | 1237 | 1237 (99.0%) | 0 | ||
| 506 | 499 (99.8%) | 7 | 492 | 492 (98.4%) | 0 | ||
| 2504 | 2496 (99.8%) | 8 | 2466 | 2466 (98.6%) | 0 | ||
| Total | True positives | False positives | Total | True positives | False positives | ||
| 245 | 245 (98.0%) | 0 | 244 | 244 (97.6%) | 0 | ||
| 1220 | 1220 (97.6%) | 0 | 1219 | 1219 (97.5%) | 0 | ||
| 485 | 485 (97.0%) | 0 | 484 | 484 (96.8%) | 0 | ||
| 2424 | 2424 (97.0%) | 0 | 2420 | 2420 (96.8%) | 0 | ||
Proteomes containing 5000 or 10000 proteins (5% or 25% of which contained repeat regions) were created and analysed using RepSeq.
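The percentages in the test data table can be checked against this caption. The sketch below is illustrative only: it assumes that the first row of the first block belongs to the 5000-protein proteome with 5% repeat-bearing proteins (250 expected positives, which matches the 100% entry), and that the two column groups are two analysis passes whose labels were not preserved. The class and field names are hypothetical and are not part of RepSeq.

```python
from dataclasses import dataclass


@dataclass
class TestRun:
    """One half-row of the test data table (hypothetical container)."""
    proteome_size: int    # proteins in the synthetic proteome
    repeat_percent: int   # percentage of proteins seeded with a repeat
    total_hits: int       # proteins reported as repeat-bearing
    true_positives: int   # reported proteins that really carry a repeat

    @property
    def expected(self) -> int:
        # Number of repeat-bearing proteins placed in the proteome.
        return self.proteome_size * self.repeat_percent // 100

    @property
    def sensitivity(self) -> float:
        # Share of the seeded repeat proteins that were recovered.
        return self.true_positives / self.expected

    @property
    def precision(self) -> float:
        # Share of the reported proteins that are genuine.
        return self.true_positives / self.total_hits


# Counts copied from the first table row; the proteome assignment is an assumption.
runs = [
    TestRun(5000, 5, total_hits=342, true_positives=250),
    TestRun(5000, 5, total_hits=248, true_positives=248),
]

for run in runs:
    print(f"expected={run.expected} "
          f"sensitivity={run.sensitivity:.1%} "
          f"precision={run.precision:.1%}")
```

Under these assumptions the first pass recovers every seeded protein but at the cost of 92 false positives, whereas the second pass loses two true positives and reports no false positives.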
Figure 1. RepSeq database UML design. The database schema consists of three tables in which data redundancy is eliminated by data linking from child tables via foreign keys.
Figure 2. RepSeq query interface. The query interface contains a number of options that can be adjusted to limit/expand the search. The user is also able to search for specific genes or annotations.
Figure 3. RepSeq output table. The top table shows the initial output of the input queries. Selecting a gene then displays the second image, indicating where each repeat is located (red) and allowing the user to determine its motif.
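The three-table, foreign-key design described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the actual RepSeq schema: the table and column names (`species`, `protein`, `repeat_region` and their fields) are hypothetical, and SQLite is used only as a self-contained stand-in for whatever database engine RepSeq runs on.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical parent/child/grandchild layout linked by foreign keys."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE species (
            species_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            species_id INTEGER NOT NULL REFERENCES species(species_id),
            gene_name  TEXT NOT NULL,
            annotation TEXT
        );
        CREATE TABLE repeat_region (
            repeat_id   INTEGER PRIMARY KEY,
            protein_id  INTEGER NOT NULL REFERENCES protein(protein_id),
            repeat_type TEXT NOT NULL CHECK (repeat_type IN ('SAAR', 'DPR', 'SRR')),
            start_pos   INTEGER NOT NULL,
            end_pos     INTEGER NOT NULL
        );
    """)


conn = sqlite3.connect(":memory:")
build_schema(conn)

# Made-up rows: species and gene details are stored once and referenced by key.
conn.execute("INSERT INTO species (species_id, name) VALUES (1, 'example species')")
conn.execute("INSERT INTO protein (protein_id, species_id, gene_name) "
             "VALUES (1, 1, 'gene0001')")
conn.execute("INSERT INTO repeat_region (protein_id, repeat_type, start_pos, end_pos) "
             "VALUES (1, 'SAAR', 10, 17)")

rows = conn.execute("""
    SELECT s.name, p.gene_name, r.repeat_type, r.start_pos, r.end_pos
    FROM repeat_region AS r
    JOIN protein AS p ON p.protein_id = r.protein_id
    JOIN species AS s ON s.species_id = p.species_id
""").fetchall()
print(rows)
```

Because each repeat row stores only a `protein_id`, and each protein row only a `species_id`, species and gene details are never duplicated across repeat records, which is the redundancy-elimination property the caption refers to.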
Protozoan parasite species currently available in RepSeq.
| 7046 | |
| 8183 | |
| 8302 | |
| 8758 | |
| 25401 | |
| 17203 | |
| 13498 | |
| 9766 | |
| 12235 | |
| 15007 | |
| 5479 | |
| 5352 | |
| 8761 |
Amino acid repeat distribution of selected species.
| 6+ | 7+ | 8+ | 9+ | 11+ | 12+ | 13+ | 14+ | 15+ | ||
| 974 | 447 | 207 | 108 | 37 | 19 | 9 | 4 | 4 | ||
| 558 | 291 | 191 | 126 | 53 | 38 | 29 | 26 | 21 | ||
| 6050 | 5361 | 4719 | 4197 | 3374 | 3128 | 2920 | 2737 | 2585 | ||
| 1904 | 1594 | 1346 | 1053 | 658 | 512 | 429 | 364 | 310 | ||
| 3+ | 5+ | 2+ | 4+ | |||||||
| 507 | 25 | 531 | 108 | |||||||
| 419 | 31 | 328 | 133 | |||||||
| 2318 | 553 | 3676 | 1122 | |||||||
| 745 | 119 | 1874 | 1171 | |||||||
Changing the search criteria can dramatically alter the number of proteins identified as bearing significant amino acid repeats. The values in bold indicate the cut-off chosen for the analyses presented in this study.
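The effect of the minimum-length cut-off can be illustrated with a toy search for single amino acid repeats. The sketch below is not RepSeq's algorithm. It uses a plain regular-expression back-reference as a stand-in, and both the function names and the four sequences are made up for illustration.

```python
import re


def find_saars(sequence: str, min_length: int) -> list[tuple[int, str, int]]:
    """Return (start, residue, run_length) for each run of one residue
    that is at least min_length long."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # (.) captures one residue; \1{n-1,} demands n-1 or more further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(sequence)]


def count_repeat_proteins(proteome: dict[str, str], min_length: int) -> int:
    """Count proteins carrying at least one qualifying repeat."""
    return sum(1 for seq in proteome.values() if find_saars(seq, min_length))


# Hypothetical toy proteome; these are not real protein sequences.
toy_proteome = {
    "toy1": "MKT" + "Q" * 6 + "LVA",
    "toy2": "MA" + "S" * 9 + "GH" + "N" * 7,
    "toy3": "MKLVAGGTE",
    "toy4": "M" + "E" * 12 + "K",
}

for cutoff in (6, 8, 10, 13):
    print(cutoff, count_repeat_proteins(toy_proteome, cutoff))
```

Raising the cut-off from 6 to 13 residues reduces the number of qualifying toy proteins from three to none, mirroring on a small scale the steep drop across the columns of the table.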
Amino acid repeat frequency in protozoan parasitic proteomes.
| 7046 | 34 | 40 | 123 | 190 | 2.70% | |
| 8183 | 60 | 60 | 158 | 259 | 3.17% | |
| 8302 | 80 | 85 | 174 | 315 | 3.79% | |
| 8758 | 86 | 97 | 177 | 346 | 3.95% | |
| 17203 | 105 | 60 | 504 | 643 | 3.73% | |
| 25401 | 594 | 245 | 514 | 1264 | 4.98% | |
| 13498 | 3741 | 1003 | 2060 | 4627 | 34.28% | |
| 9766 | 10 | 7 | 257 | 272 | 2.79% | |
| 12235 | 58 | 104 | 346 | 496 | 4.05% | |
| 15007 | 37 | 45 | 249 | 328 | 2.19% | |
| 5479 | 853 | 256 | 1490 | 1835 | 33.49% | |
| 5352 | 111 | 113 | 1050 | 1157 | 21.62% | |
| 8761 | 103 | 155 | 1024 | 1182 | 13.49% |
* Some proteins contain several individual repeats. These are taken into account here.