Luciano Brocchieri, Samuel Karlin.
Abstract
We analyzed length differences of eukaryotic, bacterial and archaeal proteins in relation to function, conservation and environmental factors. Comparing Eukaryotes and Prokaryotes, we found that the greater length of eukaryotic proteins is pervasive over all functional categories and involves the vast majority of protein families. The magnitude of these differences suggests that the evolution of eukaryotic proteins was influenced by the fusion of single-function proteins into extended multi-functional and multi-domain proteins. Comparing Bacteria and Archaea, we determined that the small but significant length difference observed between their proteins results from a combination of three factors: (i) bacterial proteomes include a greater proportion of longer proteins involved in metabolism or cellular processes than archaeal proteomes do; (ii) within most functional classes, protein families unique to Bacteria are generally longer than protein families unique to Archaea; and (iii) within the same protein family, homologs from Bacteria tend to be longer than the corresponding homologs from Archaea. These differences are interpreted with respect to evolutionary trends and prevailing environmental conditions within the two prokaryotic groups.
Year: 2005 PMID: 15951512 PMCID: PMC1150220 DOI: 10.1093/nar/gki615
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Proteomic collections
| Group | Species | Abbreviation |
|---|---|---|
| Eukaryota | | HUMAN |
| | Drosophila melanogaster | DROME |
| | | CAEEL |
| | Saccharomyces cerevisiae | YEAST |
| | | ARATH |
| Euryarchaeota | Pyrococcus abyssi | PYRAB |
| | Pyrococcus horikoshii | PYRHO |
| | Pyrococcus furiosus | PYRFU |
| | Archaeoglobus fulgidus | ARCFU |
| | Thermoplasma acidophilum | THEAC |
| | Thermoplasma volcanium | THEVO |
| | Methanothermobacter thermoautotrophicus | METTH |
| | Methanococcus jannaschii | METJA |
| | | METAC |
| | | METMA |
| | | METKA |
| | | HALN1 |
| Crenarchaeota | Aeropyrum pernix | AERPE |
| | | PYRAE |
| | Sulfolobus solfataricus | SULSO |
| | Sulfolobus tokodaii | SULTO |
| γ-Proteobacteria | | ECOLI |
| | Salmonella typhimurium | SALTY |
| | Salmonella enterica | SALTI |
| | | YERPE |
| | | SHIFL |
| | | WIGBR |
| | | BUCAI |
| | | BUCAP |
| | Haemophilus influenzae | HAEIN |
| | Pasteurella multocida | PASMU |
| | | XANCP |
| | | XANAC |
| | | XYLFA |
| | Pseudomonas aeruginosa | PSEAE |
| | Vibrio cholerae | VIBCH |
| | | SHEON |
| β-Proteobacteria | | NEIMB |
| | | RALSO |
| α-Proteobacteria | | AGRT5 |
| | Mesorhizobium loti | RHILO |
| | Sinorhizobium meliloti | RHIME |
| | | BRUME |
| | | BRUSU |
| | Caulobacter crescentus | CAUCR |
| | Rickettsia prowazekii | RICPR |
| | Rickettsia conorii | RICCN |
| ε-Proteobacteria | | HELPY |
| | Campylobacter jejuni | CAMJE |
| Actinobacteria | | MYCTU |
| | Mycobacterium leprae | MYCLE |
| | | STRCO |
| | Corynebacterium glutamicum | CORGL |
| | | COREF |
| | | BIFLO |
| Firmicutes | | OCEIH |
| | Bacillus subtilis | BACSU |
| | Bacillus halodurans | BACHD |
| | | STAAN |
| | Listeria innocua | LISIN |
| | Listeria monocytogenes | LISMO |
| | Lactococcus lactis | LACLA |
| | | STRA5 |
| | | STRMU |
| | | STRPN |
| | | STRPY |
| | Clostridium acetobutylicum | CLOAB |
| | | CLOPE |
| | | THETN |
| | Ureaplasma urealyticum | UREPA |
| | Mycoplasma genitalium | MYCGE |
| | Mycoplasma pneumoniae | MYCPN |
| | Mycoplasma pulmonis | MYCPU |
| | | MYCPE |
| | | FUSNN |
| Cyanobacteria | | SYNY3 |
| | | ANASP |
| | | SYNEL |
| Chlamydiae | Chlamydia trachomatis | CHLTR |
| | Chlamydia muridarum | CHLMU |
| | | CHLPN |
| Spirochaetes | Borrelia burgdorferi | BORBU |
| | Treponema pallidum | TREPA |
| | | LEPIN |
| Others | Deinococcus radiodurans | DEIRA |
| | | CHLTE |
| | Thermotoga maritima | THEMA |
| | Aquifex aeolicus | AQUAE |
(a) Proteins from these species are not classified in the COG database and are excluded from the functional group analyses.
(b) The COG classification of proteins from this species does not follow the standard coding and has been excluded from the COG analyses.
Median protein lengths in eukaryotic, bacterial and archaeal organisms
| Species | All species: Number | All species: Median | Classified in COG: Number | Classified in COG: Median | Classified in Pfam-A: Number | Classified in Pfam-A: Median |
|---|---|---|---|---|---|---|
| Eukarya | 104 394 | 361 | 5177 | 471 | 71 584 | 419 |
| HUMAN | 33 869 | 375 | – | – | 21 686 | 416 |
| DROME | 14 226 | 373 | 3092 | 492 | 13 091 | 475 |
| CAEEL | 21 124 | 344 | – | – | 13 316 | 391 |
| YEAST | 6315 | 379 | 2085 | 438 | 3953 | 448 |
| ARATH | 28 860 | 356 | – | – | 19 538 | 407 |
| Bacteria | 191 541 | 267 | 83 513 | 304 | 131 915 | 306 |
| ECOLI | 4289 | | 3289 | | 3483 | |
| SALTY | 4553 | | 3408 | | 3527 | |
| SALTI | 4767 | | 3258 | | 3118 | |
| YERPE | 4083 | | 2991 | | 3003 | |
| SHIFL | 4180 | | – | – | 2613 | |
| WIGBR | 654 | | – | – | 571 | |
| BUCAI | 574 | | 558 | | 544 | 285 |
| BUCAP | 545 | | – | – | 536 | 285 |
| HAEIN | 1709 | | 1470 | | 750 | |
| PASMU | 2014 | | 1740 | | 780 | |
| XANCP | 4181 | | – | – | 2976 | |
| XANAC | 4312 | | – | – | 3056 | |
| XYLFA | 2832 | 201 | 1549 | | 1544 | |
| PSEAE | 5565 | | 4355 | | 4309 | |
| VIBCH | 3828 | | 2794 | | 2731 | |
| Chromosome 1 | 2736 | 273 | 2133 | 314 | | |
| Chromosome 2 | 1092 | 225 | 661 | 316 | | |
| SHEON | 4778 | 245 | – | – | 2913 | |
| NEIMB | 2025 | 239 | 1448 | | 1310 | |
| RALSO | 5116 | | – | – | 3518 | |
| Chromosome 1 | 3440 | 271 | | | | |
| Chromosome 2 | 1676 | 296 | | | | |
| AGRT5 | 5402 | | 3984 | | 4062 | |
| Circular chr. | 2785 | 258 | 2098 | 305 | | |
| Linear chr. | 1876 | 302 | 1424 | 329 | | |
| Plasmids | 741 | 273 | 462 | 316 | | |
| RHILO | 7275 | | 5184 | | 5107 | |
| Chromosome | 6746 | 270 | 4888 | 304 | | |
| Plasmids | 529 | 243 | 296 | 327 | | |
| RHIME | 6205 | | 4614 | | 4669 | |
| Chromosome | 3341 | 276 | 2602 | 302 | | |
| Plasmid A | 1294 | 265 | 890 | 310 | | |
| Plasmid B | 1570 | 303 | 1122 | 330 | | |
| BRUME | 3198 | | – | – | 2322 | |
| Chromosome 1 | 2059 | 252 | | | | |
| Chromosome 2 | 1139 | 279 | | | | |
| BRUSU | 3264 | | – | – | 2351 | |
| Chromosome 1 | 2116 | 239 | | | | |
| Chromosome 2 | 1148 | 278 | | | | |
| CAUCR | 3737 | | 2551 | | 2686 | |
| RICPR | 834 | | 687 | | 672 | |
| RICCN | 1374 | 173 | 861 | 247 | 769 | |
| HELPY | 1566 | | 1083 | | 1052 | |
| CAMJE | 1634 | | 1309 | | 1197 | |
| MYCTU | 3918 | | 2554 | | 2213 | |
| MYCLE | 1605 | | 1145 | | 1138 | |
| STRCO | 7897 | | – | – | 5330 | |
| CORGL | 2993 | | 1954 | | 1985 | |
| COREF | 2950 | | – | – | 1961 | |
| BIFLO | 1729 | | – | – | 1286 | |
| OCEIH | 3496 | | – | – | 2583 | |
| BACSU | 4100 | | 2818 | | 2974 | |
| BACHD | 4066 | | 2838 | | 2916 | |
| STAAN | 2625 | | 1801 | | 1542 | |
| LISIN | 3043 | | 2176 | | 2234 | |
| LISMO | 2846 | | 2206 | | 2211 | |
| LACLA | 2266 | | 1602 | | 1690 | 281 |
| STRA5 | 2124 | | – | – | 1480 | |
| STRMU | 1960 | | – | – | 1445 | |
| STRPN | 2043 | 243 | 1465 | | 1409 | |
| STRPY | 1696 | | 1178 | | 1159 | |
| CLOAB | 3848 | | 2487 | | 2634 | |
| CLOPE | 2723 | | – | – | 1997 | |
| THETN | 2588 | | – | – | 1903 | |
| UREPA | 614 | | 409 | | 395 | |
| MYCGE | 484 | | 384 | | 375 | |
| MYCPN | 677 | | 407 | | 507 | |
| MYCPU | 782 | | 489 | | 498 | |
| MYCPE | 1037 | | – | – | 664 | |
| FUSNN | 2067 | | – | – | 1432 | |
| SYNY3 | 3169 | | 2141 | | 2344 | |
| ANASP | 6129 | | – | – | 3600 | |
| SYNEL | 2475 | | – | – | 1759 | |
| CHLTR | 894 | | 615 | | 639 | |
| CHLMU | 916 | | 644 | | 641 | |
| CHLPN | 1052 | | 646 | | 716 | |
| BORBU | 1637 | 220 | 635 | | 981 | 265 |
| Chromosome | 850 | 286 | – | – | | |
| Plasmids | 787 | 179 | – | – | | |
| TREPA | 1031 | | 708 | | 691 | |
| LEPIN | 4727 | 207 | – | – | 2243 | |
| Chromosome 1 | 4360 | 206 | | | | |
| Chromosome 2 | 367 | 223 | | | | |
| DEIRA | 3182 | | 2249 | | 2050 | |
| Chromosome 1 | 2629 | 257 | 1873 | 294 | | |
| Chromosome 2 | 368 | 304 | 265 | 347 | | |
| Plasmids | 185 | 303 | 111 | 336 | | |
| CHLTE | 2252 | 239 | – | – | 1431 | |
| THEMA | 1846 | | 1509 | | 1459 | |
| AQUAE | 1560 | | 1321 | | 1231 | |
| Archaea | 37 141 | 247 | 18 219 | 283 | 24 067 | 288 |
| PYRAB | 1765 | 265 | 1450 | 281 | 1407 | 282 |
| PYRHO | 1801 | 257 | 1398 | 283 | 1312 | 285 |
| PYRFU | 2065 | 253 | 1627 | 273 | 1477 | 281 |
| ARCFU | 2420 | 243 | 1887 | 270 | 1720 | 276 |
| THEAC | 1482 | 269 | 1233 | 293 | 1083 | 301 |
| THEVO | 1499 | 259 | 1247 | 287 | 1074 | 304 |
| METTH | 1869 | 242 | 1382 | 273 | 1325 | 277 |
| METJA | 1770 | 241 | 1298 | 266 | 1260 | 272 |
| METAC | 4540 | 256 | – | – | 2677 | 306 |
| METMA | 3371 | 255 | – | – | 2141 | 294 |
| METKA | 1691 | 257 | – | – | 1067 | 272 |
| HALN1 | 2622 | 242 | 1746 | 297 | 1471 | 303 |
| AERPE | 1840 | 239 | 1191 | 293 | 1067 | 301 |
| PYRAE | 2603 | 208 | – | – | 1411 | 267 |
| SULSO | 2977 | 251 | 1983 | 294 | 1917 | 293 |
| SULTO | 2826 | 226 | 1777 | 284 | 1658 | 279 |
(a) See Table 1 for abbreviations.
(b) Number: number of proteins in each set.
(c) Median: median length.
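
The Median columns above summarize each set of proteins by the median of their lengths. As a minimal sketch of that summary statistic (the function name and the toy sequences are hypothetical and are not taken from any proteome in the table), the calculation can be written as:

```python
from statistics import median


def median_protein_length(sequences):
    """Return the median length, in amino acids, of an iterable of protein sequences."""
    lengths = [len(seq) for seq in sequences]
    if not lengths:
        raise ValueError("no sequences supplied")
    return median(lengths)


# Toy proteome: made-up sequences used only to illustrate the calculation.
toy_proteome = ["MKV", "MSTNPKPQ", "MAAAL", "MG", "MKKLLPTAAAG"]

# Lengths are 3, 8, 5, 2 and 11, so the median is 5.
print(median_protein_length(toy_proteome))
```

The median is used rather than the mean because protein length distributions are strongly right-skewed, so a few very long proteins would otherwise dominate the summary.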
COG functional classification
| Information storage and processing (Isp) | |
| J | Translation, ribosomal structure and biogenesis |
| K | Transcription |
| L | DNA replication, recombination and repair |
| Cellular processes (Cp) | |
| D | Cell division and chromosome partitioning |
| O | Posttranslational modification, protein turnover, chaperones |
| M | Cell envelope biogenesis, outer membrane |
| N | Cell motility and secretion |
| P | Inorganic ion transport and metabolism |
| T | Signal transduction mechanisms |
| Metabolism (Me) | |
| C | Energy production and conversion |
| G | Carbohydrate transport and metabolism |
| E | Amino acid transport and metabolism |
| F | Nucleotide transport and metabolism |
| H | Coenzyme metabolism |
| I | Lipid metabolism |
| Q | Secondary metabolites biosynthesis, transport and catabolism |
| Poorly characterized (Pc) | |
| R | General function prediction only |
| S | Function unknown |
Figure 1. Relative median length of proteins within major functional classes in Eukaryotes (Euk), Bacteria (Bac) and Archaea (Arc). Lengths are normalized by the global median length within each phylum. Major functional classes follow the definition in COG (see also Table 3): Isp, information storage and processing; Cp, cellular processes; Me, metabolism; Pc, poorly characterized. N.C. signifies proteins not classified in the COG database.
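
The normalization described in the caption of Figure 1 can be sketched as a simple ratio. The class medians below are copied from the Med columns of the table of median protein lengths in COG functional classes. The choice of denominator is an assumption: the sketch uses the all-protein medians from the median-lengths table (361, 267 and 247), which may not be the exact global medians used for the figure. The function name is hypothetical.

```python
# Median lengths per major COG class (Med columns of the COG-class table).
class_medians = {
    "Euk": {"Isp": 399, "Cp": 507, "Me": 494, "Pc": 444},
    "Bac": {"Isp": 252, "Cp": 325, "Me": 338, "Pc": 262},
    "Arc": {"Isp": 218, "Cp": 297, "Me": 331, "Pc": 246},
}

# ASSUMPTION: the "global median" is the median over all proteins of each group.
global_medians = {"Euk": 361, "Bac": 267, "Arc": 247}


def relative_medians(medians_by_group, global_by_group):
    """Divide every class median by the global median of its own group."""
    return {
        group: {cls: med / global_by_group[group] for cls, med in classes.items()}
        for group, classes in medians_by_group.items()
    }


rel = relative_medians(class_medians, global_medians)
for group, classes in rel.items():
    print(group, {cls: round(value, 2) for cls, value in classes.items()})
```

A value above 1 means that proteins of that class are longer than the typical protein of the same group, which makes the three groups comparable despite their different overall lengths.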
Figure 2. Representation in eukaryotic and prokaryotic proteomes of proteins belonging to the major functional classes. Isp, information storage and processing; Cp, cellular processes; Me, metabolism; Pc, poorly characterized. N.C. signifies proteins not classified in the COG database.
Median protein lengths in COG functional classes
| COG class (all COGs) | Tot grp | EUK Grp | EUK Seq | EUK Med | BAC Grp | BAC Seq | BAC Med | ARC Grp | ARC Seq | ARC Med | P (BAC vs ARC) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Isp | 541 | 258 | 1157 | 399 | 434 | 17 621 | 252 | 345 | 4034 | 218 | 0.000 |
| J | 220 | 175 | 623 | 296 | 155 | 6110 | 208 | 155 | 1736 | 205 | 0.231 |
| K | 139 | 31 | 295 | 444 | 114 | 6122 | 240 | 75 | 1027 | 156 | 0.000 |
| L | 188 | 55 | 318 | 723 | 167 | 5389 | 315 | 116 | 1271 | 321 | 0.309 |
| Cp | 689 | 161 | 1355 | 507 | 667 | 19 275 | 325 | 279 | 2720 | 297 | 0.000 |
| D | 35 | 8 | 33 | 439 | 34 | 936 | 346 | 12 | 181 | 282 | 0.000 |
| O | 116 | 47 | 421 | 370 | 110 | 3167 | | 62 | 584 | 246 | 0.012 |
| M | 166 | 24 | 82 | 449 | 166 | 4713 | | 49 | 550 | 341 | 0.011 |
| N | 131 | 13 | 43 | 508 | 121 | 3194 | | 38 | 368 | 292 | 0.077 |
| P | 167 | 50 | 338 | 538 | 164 | 4014 | 314 | 91 | 732 | 294 | 0.003 |
| T | 89 | 21 | 439 | 605 | 87 | 3251 | 323 | 28 | 305 | 253 | 0.001 |
| Me | 1005 | 438 | 2034 | 494 | 970 | 31 258 | 338 | 640 | 6625 | 331 | 0.000 |
| C | 228 | 85 | 377 | 480 | 210 | 5149 | 366 | 160 | 1620 | 346 | 0.000 |
| G | 178 | 61 | 456 | 519 | 175 | 6478 | 371 | 84 | 929 | 372 | 0.506 |
| E | 240 | 117 | 577 | 515 | 227 | 8268 | 356 | 163 | 1632 | 353 | 0.240 |
| F | 89 | 61 | 186 | 376 | 82 | 2396 | 274 | 65 | 586 | 244 | 0.002 |
| H | 147 | 73 | 184 | 393 | 137 | 3434 | 307 | 106 | 911 | 283 | 0.000 |
| I | 80 | 47 | 240 | 518 | 78 | 2658 | 313 | 41 | 515 | 360 | 0.000 |
| Q | 68 | 15 | 225 | 505 | 63 | 2875 | 294 | 22 | 432 | 261 | 0.000 |
| Pc | 1372 | 207 | 1057 | 444 | 1167 | 15 359 | 262 | 645 | 4840 | 246 | 0.000 |
| R | 501 | 146 | 962 | 459 | 423 | 9003 | 291 | 282 | 2806 | 274 | 0.000 |
| S | 897 | 64 | 95 | 318 | 764 | 6355 | | 368 | 2035 | 202 | 0.018 |
| Chr | 2201 | 845 | 4791 | 481 | 2051 | 68 154 | 315 | 1261 | 13 379 | 299 | 0.000 |
| All | 3482 | 1027 | 5177 | 471 | 3162 | 83 513 | 304 | 1894 | 18 219 | 283 | 0.000 |
| N.C. | n.a. | n.a. | 15 492 | 365 | n.a. | 31 739 | 158 | n.a. | 6717 | 157 | 0.342 |
(a) Chr = Isp + Cp + Me; All = Chr + Pc; N.C. = not classified in COGs. See Table 3 for other class symbols.
(b) Grp: number of COG groups within each class.
(c) Seq: number of sequences.
(d) Med: median length.
(e) Probability of the median length difference observed between bacterial and archaeal sequences.
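
The last footnote reports a probability for the median length difference between bacterial and archaeal sequences, but this excerpt does not say which statistical test produced it. As an illustration only, the sketch below uses a generic two-sided permutation test on the difference of medians. This is an assumed stand-in, not the authors' procedure, and the function name, the number of permutations and the toy length lists are all hypothetical.

```python
import random
from statistics import median


def median_permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the absolute difference in medians.

    ASSUMPTION: generic resampling test, not necessarily the test used in the source.
    """
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled lengths
        diff = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy length samples (made up): clearly separated versus identical distributions.
p_separated = median_permutation_test(list(range(300, 330)), list(range(200, 230)))
p_identical = median_permutation_test([250, 260, 270], [250, 260, 270])
print(p_separated, p_identical)
```

With an estimate of this kind, a tabulated value of 0.000 would simply mean that the probability is below the resolution of the reported three decimals.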
Median lengths of proteins unique to Bacteria or Archaea among Prokaryotes (a)
| COG class | BAC Grp | BAC Seq | BAC Med | ARC Grp | ARC Seq | ARC Med | P |
|---|---|---|---|---|---|---|---|
| Isp | 192 | 5771 | 240 | 103 | 1161 | 172 | 0.000 |
| J | 61 | 2178 | 166 | 61 | 653 | 145 | 0.009 |
| K | 62 | 1990 | 256 | 23 | 281 | 136 | 0.000 |
| L | 71 | 1607 | 276 | 20 | 227 | 369 | 0.000 |
| Cp | 406 | 8432 | 312 | 18 | 165 | 219 | 0.000 |
| D | 23 | 464 | 384 | 1 | 2 | 253 | 0.516 |
| O | 53 | 984 | 234 | 5 | 39 | 368 | 0.008 |
| M | 117 | 2492 | 348 | 0 | 0 | n.a. | n.a. |
| N | 92 | 1990 | 269 | 9 | 99 | 197 | 0.000 |
| P | 74 | 1061 | 386 | 1 | 8 | 376 | 0.611 |
| T | 61 | 1447 | 254 | 2 | 17 | 287 | 0.402 |
| Me | 359 | 6757 | 331 | 29 | 145 | 237 | 0.000 |
| C | 64 | 1009 | 383 | 14 | 58 | 312 | 0.122 |
| G | 91 | 2010 | 339 | 0 | 0 | n.a. | n.a. |
| E | 70 | 1062 | 389 | 6 | 25 | 273 | 0.000 |
| F | 20 | 440 | 214 | 3 | 29 | 189 | 0.053 |
| H | 36 | 862 | 286 | 5 | 27 | 181 | 0.000 |
| I | 37 | 927 | 300 | 0 | 0 | n.a. | n.a. |
| Q | 42 | 447 | 374 | 1 | 6 | 263 | 0.120 |
| Pc | 698 | 6744 | 228 | 176 | 1246 | 224 | 0.221 |
| R | 193 | 2520 | 284 | 52 | 471 | 273 | 0.105 |
| S | 524 | 4247 | 196 | 128 | 776 | 191 | 0.215 |
| Chr | 940 | 20 945 | 295 | 150 | 1471 | 186 | 0.000 |
| All | 1588 | 27 570 | 275 | 320 | 2709 | 202 | 0.000 |
(a) See Table 3 and footnotes of Table 4 for abbreviations.
Median lengths of orthologs shared by Bacteria and Archaea (a)
| COG class | # Grp | BAC Seq | BAC Med | ARC Seq | ARC Med | P |
|---|---|---|---|---|---|---|
| Isp | 242 | 11 850 | 260 | 2873 | 246 | 0.001 |
| J | 94 | 3932 | 245 | 1083 | 255 | 0.105 |
| K | 52 | 4132 | 221 | 746 | 157 | 0.000 |
| L | 96 | 3782 | | 1044 | 306 | 0.004 |
| Cp | 261 | 10 843 | 330 | 2555 | 302 | 0.000 |
| D | 11 | 472 | | 179 | 283 | 0.021 |
| O | 57 | 2183 | | 545 | 246 | 0.002 |
| M | 49 | 2221 | 359 | 550 | 341 | 0.000 |
| N | 29 | 1204 | | 269 | 357 | 0.023 |
| P | 90 | 2953 | | 724 | 293 | 0.058 |
| T | 26 | 1804 | 358 | 288 | 252 | 0.000 |
| Me | 611 | 24 501 | 340 | 6480 | 332 | 0.000 |
| C | 146 | 4140 | | 1562 | 346 | 0.011 |
| G | 84 | 4468 | | 929 | 372 | 0.020 |
| E | 157 | 7206 | 351 | 1607 | 354 | 0.312 |
| F | 62 | 1956 | 310 | 557 | 257 | 0.000 |
| H | 101 | 2572 | 312 | 884 | 285 | 0.000 |
| I | 41 | 1731 | 322 | 515 | | 0.019 |
| Q | 21 | 2428 | 281 | 426 | 261 | 0.000 |
| Pc | 469 | 8614 | 281 | 3595 | 253 | 0.000 |
| R | 230 | 6483 | 294 | 2335 | 274 | 0.000 |
| S | 240 | 2108 | 234 | 1259 | 206 | 0.000 |
| Chr | 1111 | 47 209 | 321 | 11 908 | 311 | 0.000 |
| All | 1574 | 55 943 | 315 | 15 510 | 298 | 0.000 |
(a) See Table 3 and footnotes of Table 4 for abbreviations.
Median length relations of orthologs conserved between Bacteria and Archaea (a)
| COG class | # Grp | Bacteria > Archaea: # | Bacteria > Archaea: % | Archaea > Bacteria: # | Archaea > Bacteria: % |
|---|---|---|---|---|---|
| Isp | 242 | 145 | 59.9 | 92 | 38.0 |
| J | 94 | 46 | 48.9 | 47 | 50.0 |
| K | 52 | 35 | 67.3 | 16 | 30.8 |
| L | 96 | 64 | 66.7 | 29 | 30.2 |
| Cp | 261 | 163 | 62.5 | 96 | 36.8 |
| D | 11 | 11 | 100.0 | 0 | 0.0 |
| O | 57 | | 59.6 | 23 | 40.4 |
| M | 49 | 33 | 67.3 | 15 | 30.6 |
| N | 29 | 16 | 55.2 | 12 | 41.4 |
| P | 90 | 50 | 55.6 | 40 | 44.4 |
| T | 26 | 20 | 76.9 | 6 | 23.1 |
| Me | 611 | 409 | 66.9 | 195 | 31.9 |
| C | 146 | 95 | 65.1 | 50 | 34.2 |
| G | 84 | 60 | 71.4 | 24 | 28.6 |
| E | 157 | 107 | 68.2 | 46 | 29.3 |
| F | 62 | 39 | 62.9 | 21 | 33.9 |
| H | 101 | 67 | 66.3 | 34 | 33.7 |
| I | 41 | | 65.9 | 14 | 34.1 |
| Q | 21 | | 71.4 | 6 | 28.6 |
| Pc | 469 | 284 | 60.6 | 178 | 38.0 |
| R | 230 | 145 | 63.0 | 81 | 35.2 |
| S | 240 | 140 | 58.3 | 97 | 40.4 |
| Chr | 1111 | 717 | 64.5 | 380 | 34.2 |
| All | 1574 | 997 | 63.3 | 556 | 35.3 |
| 1–20 amino acids | 750 | 469 | 62.5 | 281 | 37.5 |
| 21–100 amino acids | 567 | 366 | 64.6 | 201 | 35.4 |
| >100 amino acids | 246 | 162 | 65.9 | 74 | 34.1 |
(a) # is the number of COG groups within each collection; % is the corresponding percentage of COG groups relative to the total within each class (# Grp). See text, Table 3 and footnotes of Table 4 for other abbreviations.
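
The percentages in the table above follow directly from the footnote: each is a count of COG groups divided by the class total (# Grp). The sketch below reproduces that arithmetic and also shows how a reported percentage constrains the underlying count. The function names are hypothetical, and rounding to one decimal is an assumption inferred from the tabulated values.

```python
def percent_of_groups(count, total):
    """Percentage of COG groups in a class, rounded to one decimal (assumed rounding)."""
    return round(100.0 * count / total, 1)


def counts_matching_percent(percent, total):
    """All integer counts whose rounded percentage of `total` equals `percent`."""
    return [k for k in range(total + 1) if percent_of_groups(k, total) == percent]


# Isp row: 145 of 242 groups are longer in Bacteria, 92 of 242 are longer in Archaea.
print(percent_of_groups(145, 242))  # 59.9
print(percent_of_groups(92, 242))   # 38.0

# A percentage of 59.6 out of 57 groups is compatible with a single integer count.
print(counts_matching_percent(59.6, 57))  # [34]
```

The two percentages in a row need not sum to 100, because groups with equal median lengths in Bacteria and Archaea are counted in neither column.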
Structural/functional domains in eukaryotic, bacterial and archaeal proteomes
| Database | Domains | EUK | BAC | ARC |
|---|---|---|---|---|
| Pfam-A | Total number | 154 979 | 192 680 | 33 372 |
| | Mean number/seq. | 2.16 | 1.46 | 1.39 |
| | Median length (amino acids) | 185 | 188 | 179 |
| Pfam-A + Pfam-B | Total number | 249 163 | 275 630 | 48 060 |
| | Mean number/seq. | 3.48 | 2.09 | 2.00 |
| | Median length (amino acids) | 257 | 217 | 205 |
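
The mean number of domains per sequence can be checked against the other tabulated values. The sketch below assumes that the denominator is the number of proteins classified in Pfam-A for each group (71 584, 131 915 and 24 067 in the median-lengths table); this assumption is not stated in the excerpt, but it reproduces the reported means. The variable and function names are hypothetical.

```python
# Total domain counts from the domains table.
pfam_a_domains = {"EUK": 154_979, "BAC": 192_680, "ARC": 33_372}
pfam_ab_domains = {"EUK": 249_163, "BAC": 275_630, "ARC": 48_060}

# ASSUMPTION: denominators are the numbers of proteins classified in Pfam-A.
pfam_a_proteins = {"EUK": 71_584, "BAC": 131_915, "ARC": 24_067}


def mean_domains_per_sequence(domains, proteins):
    """Mean number of domains per protein for each group, rounded to two decimals."""
    return {group: round(domains[group] / proteins[group], 2) for group in domains}


print(mean_domains_per_sequence(pfam_a_domains, pfam_a_proteins))
# {'EUK': 2.16, 'BAC': 1.46, 'ARC': 1.39}
print(mean_domains_per_sequence(pfam_ab_domains, pfam_a_proteins))
# {'EUK': 3.48, 'BAC': 2.09, 'ARC': 2.0}
```

Under this assumption, eukaryotic proteins carry roughly 1.5 times as many Pfam-A domains per sequence as bacterial or archaeal proteins, in line with the multi-domain interpretation given in the abstract.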
Figure 3. Relation between the median length of genomic proteins included in the Pfam-A database of curated alignments and the optimal growth temperature (OGT) of the corresponding organism. Each point represents the median protein length within each bacterial (red) or archaeal (green) species.
Median length of orthologous groups from mesophilic or thermophilic Prokaryotes (a)
| Type of ortholog | # of groups | Set I (Bacteria) | Set I Seq | Set I Med | Set II (Archaea) | Set II Seq | Set II Med | P |
|---|---|---|---|---|---|---|---|---|
| Shared | 977 | BT | 2036 | | AT | 10 664 | 298 | 0.071 |
| | 860 | BM | 36 891 | | AM | 1481 | 309 | 0.064 |
| Unique | 465 / 831 | BT | 794 | 267 | AT | 5810 | 244 | 0.002 |
| | 2250 / 187 | BM | 43 791 | 294 | AM | 265 | 206 | 0.000 |
(a) # of groups is the number of COG groups in each comparison (the two numbers shown for comparisons of Unique orthologs correspond to the unique groups found in Bacteria and Archaea, respectively). Pairwise comparisons are made between bacterial thermophiles (BT), archaeal thermophiles (AT), bacterial mesophiles (BM) and archaeal mesophiles (AM). See text and footnote of Table 4 for other abbreviations.
Number of conserved orthologous groups longer in bacterial or archaeal thermophiles and mesophiles (a)
| Set I | Set II | # Grp | Set I > Set II: # | Set I > Set II: % | Set II > Set I: # | Set II > Set I: % |
|---|---|---|---|---|---|---|
| BT | AT | 977 | 484 | 49.5 | 478 | 48.9 |
| BM | AM | 860 | 411 | 47.8 | 435 | 50.6 |
| BM | BT | 1390 | 947 | 68.1 | 422 | 30.4 |
| AM | AT | 961 | 585 | 60.9 | 361 | 37.6 |
(a) See text and footnote of Table 7 for abbreviations.