Mridul K Kalita, Gowthaman Ramasamy, Sekhar Duraisamy, Virander S Chauhan, Dinesh Gupta.
Abstract
BACKGROUND: Genome-wide and cross-species comparison of amino acid repeats is an intriguing problem in biology, mainly because of the highly polymorphic nature and diverse functions of these repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as is evident from analyses of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB (http://bioinfo.icgeb.res.in/repeats/) is a relational database of perfect and mismatch repeats, designed as a resource and a collection of tools for the detection and cross-species comparison of different types of amino acid repeats.
DESCRIPTION: ProtRepeatsDB (v1.2) contains perfect as well as mismatch amino acid repeats in the protein sequences of 141 organisms whose genomes are now available. The web interface of ProtRepeatsDB provides tools to perform repeat searches based on protein IDs, organism name, repeat sequences, keywords in FASTA headers, size, frequency, gene ontology (GO) annotation IDs and regular expressions (REGEXP) describing repeats. These tools also allow the formulation of a variety of simple, complex and logical queries to facilitate mining and large-scale cross-species comparisons of amino acid repeats. In addition, the database provides sequence analysis tools to determine repeats in user input sequences.
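The abstract does not spell out how perfect repeats are detected, so the following is only a minimal Python sketch of one plausible approach, using regular-expression backreferences. The function names, the `min_run` and `min_copies` thresholds, and the toy sequence are illustrative assumptions and are not taken from ProtRepeatsDB.

```python
import re

# Assumed minimum run length; NOT a threshold taken from ProtRepeatsDB.
MIN_RUN = 3


def find_homo_repeats(sequence, min_run=MIN_RUN):
    """Return (residue, start, length) for each run of a single residue
    repeated at least min_run times in a row."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_run - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(sequence)]


def find_tandem_hetero_repeats(sequence, unit_len, min_copies=2):
    """Return (unit, start, copies) for each tandem perfect repeat whose
    unit has unit_len residues and contains more than one residue type."""
    pattern = re.compile(r"([A-Z]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(sequence):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip units made of one residue (homo repeats)
            hits.append((unit, m.start(), len(m.group(0)) // unit_len))
    return hits


# Toy sequence, invented for illustration only.
toy = "MAQQQQQGSNANANANK"
homo_hits = find_homo_repeats(toy)
hetero_hits = find_tandem_hetero_repeats(toy, unit_len=2)
```

In this sketch a homo repeat is a run of one residue and a hetero repeat is a tandemly repeated unit of mixed residues. Scattered repeats and mismatch repeats, which the database also covers, would need a different strategy.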
Year: 2006 PMID: 16827924 PMCID: PMC1538635 DOI: 10.1186/1471-2105-7-336
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. (a) Simplified schema of the ProtRepeatsDB pipeline. (b) Snapshots of the ProtRepeatsDB main page, overlaid with the output obtained for the analysis of homo repeats in androgen receptors (see text for details).
Figure 2. ProtRepeatsDB: (a) A snapshot of the REGEXP search tool; regular expression searches can be performed on different ProtRepeatsDB sections. (b) Consolidated graphical view of all the proteins with conserved polyQ in androgen receptors (see text for more details).
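A REGEXP query of the kind shown in Figure 2(a) can be approximated offline. The sketch below is a hedged illustration, not the database's own implementation: `parse_fasta`, `regexp_search`, the `header_keyword` filter and the toy FASTA records are all hypothetical, and the sequences are invented rather than real androgen receptor sequences.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def regexp_search(records, seq_pattern, header_keyword=None):
    """Return (header, start, matched_text) for every match of seq_pattern,
    optionally restricted to records whose header contains header_keyword."""
    compiled = re.compile(seq_pattern)
    hits = []
    for header, seq in records:
        if header_keyword and header_keyword.lower() not in header.lower():
            continue
        for m in compiled.finditer(seq):
            hits.append((header, m.start(), m.group(0)))
    return hits


# Invented toy records; headers and sequences are placeholders only.
TOY_FASTA = """>toy1 androgen receptor [hypothetical species A]
MEVQLGLQQQQQQQQQQETSPR
>toy2 androgen receptor [hypothetical species B]
MEVQLGLQQQQQQETSPR
>toy3 unrelated protein
MKKLAQQAGG
"""

records = parse_fasta(TOY_FASTA)
# Runs of five or more glutamines (polyQ) in records whose header mentions "androgen".
polyq_hits = regexp_search(records, r"Q{5,}", header_keyword="androgen")
```

Combining a sequence pattern with a header keyword mirrors the idea of mixing REGEXP and keyword criteria in one query. The real interface also supports organism, size, frequency and GO filters, which are not modeled here.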
Relative distribution of homo repeats (tandem or scattered) of size <10 (occurring one or more times) and of size ≥10 (occurring more than once). Figures in round brackets give the number of homo repeat proteins for the corresponding amino acid and organism. Figures in square brackets give those proteins as a percentage of all homo repeat proteins in that organism.
| Size <10 | Size <10 | Size <10 | Size <10 | Size <10 | Size ≥10 | Size ≥10 | Size ≥10 |
|---|---|---|---|---|---|---|---|
| Sce(94) [13.33] | Pfa(270) [11.56] | Ncr(280) [9.18] | Dre(81) [8.7] | Ara(395) [7.7] | Sce(3) [0.43] | Pfa(6) [0.44] | Ara(3) [0.39] |
| Hsa(996) [21.95] | Mus(761) [20.35] | Dre(187) [20.08] | Rat(654) [19.53] | Spo(53) [15.54] | Mus(28) [4.11] | Hsa(25) [2.43] | Rat(9) [1.48] |
| Osa(545) [12.5] | Sav(48) [11.11] | Sco(44) [10.06] | Rat(161) [4.81] | Mus(151) [4.03] | Mus(1) [0.15] | -- | -- |
| Pfa(1100) [47.11] | Lin(12) [19.6] | Spo(37) [10.8] | Dre(92) [9.88] | Sce(59) [8.37] | Ara(1) [1.43] | Ncr(3) [0.56] | Pfa(2) [0.14] |
| Ano(153) [6.28] | Dme(282) [6.06] | Ncr(75) [2.87] | Cel(68) [2.69] | Ara(135) [2.63] | -- | -- | -- |
| Lpl(25) [36.23] | Ara(1742) [33.99] | Spo(112) [32.84] | Sce(194) [27.52] | Dre(221) [23.73] | Hsa(20) [1.94] | Mus(9) [1.32] | Dme(14) [0.76] |
| Lpl(14) [20.28] | Cel(307) [12.15] | Ncr(316) [12.09] | Dme(516) [11.11] | Ano(235) [9.65] | Dme(6) [0.33] | Ano(2) [0.35] | Rat(1) [0.16] |
| Pfa(1675) [71.73] | Sce(88) [12.48] | Dme(465) [9.99] | Ncr(146) [5.58] | Ara(234) [4.56] | Pfa(280) [20.29] | Ncr(8) [1.49] | Dme(2) [0.11] |
| Dme(1582) [34.01] | Lma(25) [31.6] | Ano(576) [23.65] | Ncr(525) [20.09] | Sce(129) [18.30] | Dme(131) [7.12] | Ncr(34) [6.34] | Hsa(43) [4.18] |
| Mus(25) [0.66] | Dre(6) [0.64] | Rat(21) [0.62] | Ano(11) [0.45] | Hsa(21) [0.44] | -- | -- | -- |
| Pfa(30) [1.28] | Ype(1) [1.23] | Cel(7) [0.27] | Ano(3) [0.12] | Osa(5) [0.11] | -- | -- | -- |
| Osa(1294) [26.69] | Ano(601) [24.68] | Dme(957) [20.57] | Ncr(454) [17.38] | Mbo(47) [15.87] | Ara(11) [1.43] | Hsa(10) [0.97] | Dme(8) [0.44] |
| Mbo(191) [64.52] | Sav(222) [51.38] | Sco(223) [51.02] | Osa(1520) [34.87] | Dme(1209) [25.98] | Dme(34) [1.85] | Mus(10) [1.47] | Hsa(11) [1.07] |
| Osa(79) [1.89] | Ano(29) [1.19] | Rat(35) [1.04] | Ara(53) [1.03] | Cel(23) [0.91] | Rat(1) [0.10] | -- | -- |
| Tth(48) [49.48] | Dra(49) [42.61] | Pae(49) [32.23] | Rat(662) [19.77] | Hsa(860) [18.26] | Xfa(1) [9.09] | Ano(1) [0.18] | -- |
| Sto(7) [12.72] | Pfa(18) [0.77] | Dre(7) [0.75] | Mus(26) [0.69] | Cel(14) [0.55] | -- | -- | -- |
| Hsa(851) [18.07] | Osa(712) [16.33] | Mus(608) [16.21] | Cel(394) [15.59] | Rat(506) [15.11] | Ara(10) [1.30] | Hsa(13) [1.26] | Mus(4) [0.59] |
| Ara(16) [0.33] | Dre(3) [0.32] | Pfa(5) [0.21] | Ano(5) [0.20] | Ncr(3) [0.11] | Ara(1) [0.13] | -- | -- |
| Pfa(65) [2.78] | Ara(43) [0.83] | Cel(19) [0.75] | Mus(23) [0.61] | Osa(16) [0.37] | Mus(1) [0.15] | Osa(1) [0.13] | -- |
| Ano(3) [0.12] | Osa(3) [0.06] | Ncr(1) [0.03] | -- | -- | -- | -- | -- |
Note: Ano: Anopheles gambiae; Ara: Arabidopsis thaliana; Cel: Caenorhabditis elegans; Dre: Danio rerio; Dra: Deinococcus radiodurans; Dme: Drosophila melanogaster; Hsa: Homo sapiens; Lpl: Lactobacillus plantarum; Lma: Leishmania major; Lin: Leptospira interrogans; Mus: Mus musculus; Mbo: Mycobacterium bovis; Ncr: Neurospora crassa; Osa: Oryza sativa; Pfa: Plasmodium falciparum; Pae: Pseudomonas aeruginosa; Rat: Rattus norvegicus; Sce: Saccharomyces cerevisiae; Spo: Schizosaccharomyces pombe; Sav: Streptomyces avermitilis; Sco: Streptomyces coelicolor; Sto: Sulfolobus tokodaii; Tth: Thermus thermophilus; Ype: Yersinia pestis; Xfa: Xylella fastidiosa
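The bracketed percentages in the table follow from the caption's definition: the count for one amino acid and organism divided by the number of all homo repeat proteins in that organism. The short Python sketch below works through the Sce entries of size <10. The per-organism total of 705 is not stated in the source; it is back-calculated from the table values and should be read as an inference, and both function names are hypothetical.

```python
def repeat_percentage(count, total):
    """Percentage of an organism's homo repeat proteins that carry a given
    amino acid's homo repeat, rounded to two decimals as in the table."""
    return round(100.0 * count / total, 2)


def implied_total(count, percentage):
    """Back-calculate the organism-wide total implied by one table entry."""
    return round(100.0 * count / percentage)


# Sce entries (size <10) copied from the table: count -> bracketed percentage.
sce_entries = {94: 13.33, 59: 8.37, 194: 27.52, 88: 12.48, 129: 18.30}

# Every Sce entry implies the same total, which supports the reading of the caption.
implied = {implied_total(count, pct) for count, pct in sce_entries.items()}

# Recompute each percentage from the inferred total (an assumption, see above).
INFERRED_SCE_TOTAL = 705
recomputed = {count: repeat_percentage(count, INFERRED_SCE_TOTAL)
              for count in sce_entries}
```

Because a protein can carry homo repeats of several amino acids, the percentages within one organism need not sum to 100.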
Figure 3. Comparative distribution of different types of repeat proteins in ProtRepeatsDB. (a) The percentage of repeat proteins in the three superkingdoms containing perfect repeats (including hetero and homo repeats), perfect hetero repeats, perfect homo repeats, mismatch repeats and repeats with PROSITE repeat profiles. (b) The percentage of different types of perfect repeat proteins in a few eukaryotic genomes in ProtRepeatsDB. (c) The percentage of different types of perfect repeat proteins in a few prokaryotic genomes. (d) The percentage of different types of perfect repeat proteins in a few archaeal genomes. Note: in panels b-d, the organisms with the 6 highest and 6 lowest repeat protein percentages are shown.