Hong Luo, Ke Lin, Audrey David, Harm Nijveen, Jack A M Leunissen.
Abstract
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database by repeat characteristics such as repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats in the N-terminus of eukaryotic protein sequences, the differences in repeat abundance among proteomes, the functional classification of repeat-containing proteins and the GC content constraints of the repeats' corresponding codons.
Year: 2011 PMID: 22102581 PMCID: PMC3245022 DOI: 10.1093/nar/gkr1019
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
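The repeat characteristics named in the abstract (repeat unit, unit length, number of repetitions and position in the protein) can be illustrated with a small sketch. This is not ProRepeat's detection method, which the abstract does not describe; the function name `find_perfect_tandem_repeats`, the `Repeat` record and the thresholds `min_copies` and `max_unit` are all hypothetical choices made for illustration.

```python
import re
from typing import List, NamedTuple


class Repeat(NamedTuple):
    unit: str    # the repeated unit, e.g. "Q" or "GS"
    copies: int  # number of consecutive copies of the unit
    start: int   # 0-based start of the repeat in the sequence
    end: int     # end of the repeat (exclusive)


def find_perfect_tandem_repeats(
    seq: str, min_copies: int = 3, max_unit: int = 10
) -> List[Repeat]:
    """Scan left to right for exact, back-to-back copies of a short unit.

    min_copies and max_unit are illustrative thresholds (assumptions),
    not values taken from ProRepeat.
    """
    # The lazy group prefers the shortest unit; the backreference then
    # requires at least (min_copies - 1) further exact copies.
    pattern = re.compile(r"(.{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    repeats = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        repeats.append(Repeat(unit, copies, match.start(), match.end()))
    return repeats


# Toy sequence: a run of five Q and three copies of GS.
for r in find_perfect_tandem_repeats("MKQQQQQAPGSGSGSW"):
    print(r.unit, r.copies, r.start, r.end)
# Q 5 2 7
# GS 3 9 15
```

Because matches are non-overlapping and taken left to right, this sketch reports one repeat per region; approximate tandem repeats and low-complexity regions would need a different approach.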
Figure 1. Statistical analysis results of repeat properties for PTRs of all taxa in UniProt, with the ‘Isomorphic Search’ option on and the default evidence at protein level.
Repeat abundance in four kingdoms
| Kingdom | Repeat number (PTR) | Repeat number (ATR) | Repeat abundanceᵃ (%) | Protein abundanceᵇ (%) | Relative abundanceᶜ |
|---|---|---|---|---|---|
| Eukaryota | 1 163 368 | 1 195 655 | 63.10 | 27.2 | 2.32 |
| Bacteria | 498 071 | 705 575 | 32.20 | 63.9 | 0.50 |
| Archaea | 12 584 | 18 631 | 0.85 | 1.8 | 0.47 |
| Viruses | 75 109 | 68 821 | 3.85 | 6.9 | 0.56 |
ᵃ Percentage of repeat numbers across the four kingdoms. ᵇ Percentage of protein numbers across the four kingdoms, based on UniProtKB (the 0.2% of unclassified entries are not listed). ᶜ Percentage of repeat abundance divided by percentage of protein abundance.
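The relative abundance column can be checked against the other columns: for Eukaryota, 63.10 / 27.2 ≈ 2.32, i.e. repeat abundance divided by protein abundance. The sketch below recomputes both derived columns from the raw counts. It assumes that repeat abundance is each kingdom's share of the combined PTR and ATR counts, which is an inference from the numbers rather than something the table states; the names `KINGDOMS` and `relative_abundance` are made up for this example.

```python
# (PTR count, ATR count, protein abundance in %) copied from the table.
KINGDOMS = {
    "Eukaryota": (1_163_368, 1_195_655, 27.2),
    "Bacteria": (498_071, 705_575, 63.9),
    "Archaea": (12_584, 18_631, 1.8),
    "Viruses": (75_109, 68_821, 6.9),
}


def relative_abundance(table):
    """Return {kingdom: (repeat abundance in %, relative abundance)}.

    Assumption: repeat abundance is the kingdom's share of all
    PTR + ATR repeats listed in the table.
    """
    total = sum(ptr + atr for ptr, atr, _ in table.values())
    result = {}
    for kingdom, (ptr, atr, protein_pct) in table.items():
        repeat_pct = 100.0 * (ptr + atr) / total
        result[kingdom] = (repeat_pct, repeat_pct / protein_pct)
    return result


for kingdom, (repeat_pct, ratio) in relative_abundance(KINGDOMS).items():
    print(f"{kingdom:10s} {repeat_pct:6.2f} {ratio:5.2f}")
```

Under that assumption the recomputed values agree with the published columns to within rounding (about 0.02), with Eukaryota the only kingdom whose share of repeats exceeds its share of proteins.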
Repeat properties in representative species
| Species | Most abundant SAARs (%) | N/C SAARs | GC1 | GC2 | L1 | L2 |
|---|---|---|---|---|---|---|
| HIV | E(45.3), A(27.0), N(8.6) | SA/INP | 42.0 | 41.9 | 462 | 10.7 |
| HDV | E(99.6), P(0.4) | Na/Na | Na | 41.5 | 113 | 5.2 |
| | L(32.0), A(29.5), G(9.4) | LAT/GAV | 50.0 | 58.0 | 765 | 18.7 |
| | A(23.8), L(19.8), S(19.8) | LKA/KSG | 43.5 | 48.5 | 481 | 15.0 |
| | E(22.0), V(18.0), L(18.0) | ER/KTL | 48.6 | 51.9 | 389 | 10.1 |
| | E(25.9), K(22.2), L(11.1) | ILE/KGR | 31.0 | 31.7 | 412 | 10.8 |
| | S(24.0), Q(18.7), N(11.7) | SQN/KDQ | 38.1 | 44.3 | 759 | 18.5 |
| | S(27.2), G(12.3), P(11.5) | SLE/GES | 36.0 | 50.9 | 812 | 16.0 |
| | S(14.9), T(13.8), Q(13.6) | SLQ/QGS | 35.0 | 51.7 | 1103 | 25.0 |
| | Q(31.9), A(15.2), S(11.3) | QAS/QAS | 41.0 | 61.3 | 1338 | 15.6 |
| | S(21.4), E(17.6), P(13.1) | LAG/ESK | 37.6 | 54.4 | 1286 | 37.9 |
| | E(17.7), P(15.1), S(13.4) | LAG/ESK | 50.0 | 62.5 | 1099 | 20.8 |
| | E(19.2), P(14.6), A(11.6) | LAG/EPA | 41.7 | 60.9 | 1304 | 26.6 |
| | E(16.0), P(16.0), A(14.3) | LAG/ESP | 40.9 | 63.0 | 1390 | 31.2 |
N/C SAARs, most abundant N- and C-terminal SAARs, corresponding to 5% and 95% of RCP length, respectively; the midpoint of a repeat fragment is defined as the position of the repeat; GC1, genomic GC content; GC2, average GC content of repeat codons; L1, average length of RCPs; L2, average length of repeat fragments.
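The definitions in this footnote can be sketched in a few lines. The code takes the midpoint of a repeat fragment as its position and computes the GC content of a set of codons. It also assumes that "5% and 95% of RCP length" means the midpoint falls within the first 5% or beyond 95% of the protein; that reading, the 0-based coordinates and all function names are assumptions for illustration.

```python
from typing import Iterable


def repeat_position(start: int, end: int, protein_length: int) -> float:
    """Relative position (0 to 1) of the repeat midpoint.

    start is 0-based and end is exclusive (a convention chosen here).
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return ((start + end) / 2) / protein_length


def terminal_class(rel_pos: float, n_cut: float = 0.05, c_cut: float = 0.95) -> str:
    """Classify a repeat by the relative position of its midpoint.

    Assumed reading of the footnote: at or before 5% of the protein
    length is N-terminal, at or beyond 95% is C-terminal.
    """
    if rel_pos <= n_cut:
        return "N-terminal"
    if rel_pos >= c_cut:
        return "C-terminal"
    return "internal"


def gc_content(codons: Iterable[str]) -> float:
    """Percentage of G and C bases over all codons of a repeat."""
    bases = "".join(codons).upper()
    if not bases:
        raise ValueError("no codons given")
    return 100.0 * sum(base in "GC" for base in bases) / len(bases)


# A ten-residue leucine repeat near the start of a 400-residue protein.
pos = repeat_position(2, 12, 400)
print(terminal_class(pos))                   # N-terminal
print(gc_content(["CTG", "CTC", "TTA"]))     # 4 of 9 bases are G or C
```

Averaging `gc_content` over all repeat codons of a species would give a value comparable to GC2, and comparing it with the genomic value GC1 is the kind of contrast the table reports.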