I K Jordan, F A Kondrashov, I B Rogozin, R L Tatusov, Y I Wolf, E V Koonin.
Abstract
BACKGROUND: Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
Year: 2001 PMID: 11790256 PMCID: PMC64838 DOI: 10.1186/gb-2001-2-12-research0053
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Schematic of an orthologous protein phylogeny and the null hypothesis (H₀) of a constant relative rate of protein evolution. Comparison of multiple orthologous proteins is predicted to reveal a constant relative rate of evolution. This should be manifest as an approximately constant ratio of phylogenetic branch lengths (bold lines) bA/(bB + bC). The branch lengths were calculated using the evolutionary distances (dashed lines) between the proteins from three species (dAB, dAC, dBC) as described in Materials and methods.
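The relationship between the pairwise distances and the branch lengths in Figure 1 can be illustrated with a minimal Python sketch. It assumes the distances are additive on the three-taxon tree (dAB = bA + bB, dAC = bA + bC, dBC = bB + bC); the Materials and methods section is not reproduced here, so the authors' actual calculation may differ. The function names are hypothetical.

```python
def branch_lengths(d_ab, d_ac, d_bc):
    """Solve the additive three-taxon system for (bA, bB, bC).

    Assumption (not taken from the source): d_ab = bA + bB,
    d_ac = bA + bC and d_bc = bB + bC.
    """
    b_a = (d_ab + d_ac - d_bc) / 2
    b_b = (d_ab + d_bc - d_ac) / 2
    b_c = (d_ac + d_bc - d_ab) / 2
    return b_a, b_b, b_c


def relative_rate_ratio(d_ab, d_ac, d_bc):
    """Return bA / (bB + bC), the ratio expected to be roughly constant under H0."""
    b_a, b_b, b_c = branch_lengths(d_ab, d_ac, d_bc)
    return b_a / (b_b + b_c)
```

Note that under this assumption bB + bC equals dBC, so the ratio reduces to (dAB + dAC - dBC) / (2 dBC).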
Figure 2. Phylogenies of the three analyzed species groups. Branch lengths are the average of all branch lengths for a given species group. The different branches bA, bB and bC are indicated.
Figure 3. Correlation between the branch lengths in different phylogenetic partitions. Linear regression where the bA branch length (y-axis) is plotted against the sum of bB + bC branch lengths (x-axis) for all orthologous protein sets of each species group. (a) Chlamydiaceae species group including C. pneumoniae, C. muridarum and C. trachomatis. (b) Pyrococcus species group including P. furiosus, P. abyssi and P. horikoshii. (c) Human-mouse-rat species group including H. sapiens, M. musculus and R. norvegicus. The equation for the linear regression trend line (y = mx + b), the correlation coefficient (r) and the level of significance for the correlation (P) are shown on each plot. The linear regression trend line is shown in bold black and the upper and lower limits, corresponding to an expectation value of 0.05, are shown in light gray. For each plot, only 0.05 points are expected to fall outside of these limits by chance. All values are shown as diamonds and the values outside the upper and lower limits that represent functionally diversified orthologous protein sets are indicated by larger squares.
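The regression and outlier limits described in Figure 3 can be sketched as follows. This is an illustration only: it assumes that residuals around the trend line are normally distributed and that the limits are placed so that the expected number of points outside them by chance is 0.05 for the whole plot. The caption does not state how the limits were derived, so both assumptions, and the function names, are hypothetical.

```python
import math
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least-squares fit y = m*x + b; also returns Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r


def flag_outliers(xs, ys, expectation=0.05):
    """Indices of points outside symmetric limits around the trend line.

    Assumption: normal residuals; the per-point two-sided tail
    probability p is chosen so that n * p == expectation.
    """
    m, b, _ = fit_line(xs, ys)
    n = len(xs)
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    z = NormalDist().inv_cdf(1 - (expectation / n) / 2)
    return [i for i, e in enumerate(residuals) if abs(e) > z * sd]
```

In this sketch, x would hold the bB + bC values and y the bA values for one species group; the returned indices correspond to the candidate functionally diversified protein sets.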
Synonymous (Ks) and non-synonymous (Ka) substitution rates for the human-mouse-rat orthologous protein sets
| | All† | Selected‡ | P§ |
| Human versus rodent* | | | |
| Ks | 0.496 ± 0.14 | 0.692 ± 0.19 | 4.5 × 10⁻³ |
| Ka | 0.079 ± 0.08 | 0.349 ± 0.08 | 2.4 × 10⁻⁷ |
| Ka/Ks | 0.147 ± 0.13 | 0.537 ± 0.18 | 1.3 × 10⁻⁵ |
| Mouse versus rat¶ | | | |
| Ks | 0.172 ± 0.06 | 0.193 ± 0.07 | 0.57 |
| Ka | 0.029 ± 0.03 | 0.148 ± 0.04 | 5.7 × 10⁻³ |
| Ka/Ks | 0.161 ± 0.16 | 0.787 ± 0.12 | 2.0 × 10⁻³ |
*Average Ks and Ka for the human-mouse and human-rat pairwise comparisons. †Average and standard deviation values for all protein sets. ‡Average and standard deviation values for the protein sets identified as having accelerated rates of amino-acid substitution. §P value associated with a t-test comparing the means of the all versus selected protein sets. ¶ Average Ks and Ka for the mouse-rat pairwise comparisons.
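The quantities in this table can be illustrated with a short Python sketch that computes per-protein Ka/Ks ratios and a t statistic comparing two groups of values. The footnote says only that a t-test was used; the unequal-variance (Welch) form below is an assumption, as are the function names. The sketch returns the statistic and degrees of freedom but not the P value.

```python
import math
from statistics import mean, stdev


def ka_ks_ratios(ka_values, ks_values):
    """Per-protein Ka/Ks ratios for paired lists of Ka and Ks estimates."""
    return [ka / ks for ka, ks in zip(ka_values, ks_values)]


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    Assumption: the source does not say which t-test variant was used.
    """
    # Squared standard error of each sample mean.
    va = stdev(sample_a) ** 2 / len(sample_a)
    vb = stdev(sample_b) ** 2 / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df
```

An average of per-protein ratios generally differs from the ratio of the averages, which is presumably why the Ka/Ks rows do not equal the Ka rows divided by the Ks rows.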
Figure 4. Non-synonymous (Ka) versus synonymous (Ks) substitution rates for the human-mouse-rat orthologous protein sets. (a) Average Ks and Ka for the human-mouse and human-rat pairwise comparisons. (b) Ks and Ka for the mouse-rat pairwise comparisons. Thick diagonal line, Ks = Ka; thin horizontal line, average Ka. All values are shown with circles and the values corresponding to the functionally diversified proteins are indicated by larger squares.
Domain architecture and functional predictions for functionally diversified proteins
| Gene name (GI numbers)* | Domain organization† | Phyletic distribution‡ | Predicted function§ |
| CT079/CP0424/TC0351 (4376613, 3328474, 7190393) | Signal peptide, four transmembrane regions (TMs) | | Membrane protein, potential receptor |
| CT288/CP0709/TC0561 (4376321, 3328702, 7190602) | Four TMs, coiled coil | | Membrane protein, potential receptor |
| CT656/CP0029/TC0027 (4377021, 3329106, 7190067) | NA | | Unknown |
| CT006/CP0311/TC0274 (4376725, 3328394, 7190315) | Three TMs | | Membrane protein, potential receptor |
| CT036/CP0642/TC0306 (4376393, 3328427, 7190347) | Two TMs | | Membrane protein, potential receptor |
| CT147/CP0623/TC0424 (4376417, 3328548, 7190467) | Three to four TMs, coiled coil | | Membrane protein, potential receptor |
| CT695/CP0071/TC0067 (4376977, 3329149, 7190102) | Low sequence complexity | | Non-globular protein of unknown function |
| PH0310 (301754, 3256700, 5459075) | Cathepsin-like cysteine protease, signal peptide, three to seven TMs | Orthologs only in pyrococci; distantly related cathepsins in animals | Pyrolysin - hyperthermostable membrane protease |
| PH1993, PAB1163 (1849588, 3258437, 5459195) | Signal peptide, deacetylase superfamily hydrolase domain | | Predicted secreted deacetylase |
| PH1708, PAB2041 (76490, 3258139, 5457898) | Signal peptide, low-sequence-complexity regions | | Secreted, non-globular protein |
| PH0103, PAB0064 (693117, 3256489, 5457538) | Type IV restriction endonuclease, | Sporadic distribution in archaea and bacteria (COG2810) | Predicted restriction endonuclease |
| PH1340, PAB1824 (1411868, 3257763, 5458235) | NA | | Unknown |
| PH0996, PAB0660 (984613, 3257410, 5458405) | Signal peptide, coiled coil, low complexity | | Secreted, non-globular protein |
| PH0617, PAB1428 (506618, 3257023, 5458851) | Signal peptide, seven TMs | | Integral membrane protein |
| PH0692, PAB0621 (917540, 3257100, 5458351) | Signal peptide, low-sequence-complexity regions | | Secreted, non-globular protein |
| PH0228, PAB0142 (1132938, 3256617, 5457641) | Acetate/butyrate kinase domain | Conserved orthologs in some archaea ( | Predicted butyrate kinase |
| PH1703, PAB0312 (1599997, 3258134, 5457902) | Signal peptide, low-sequence-complexity regions | | Secreted small protein |
| PH0538, PAB0257 (396645, 3256944, 5457867) | Signal peptide, four TMs | | Integral membrane protein |
| Interferon precursor (32680, 309328, 2317784) | Signal peptide, interferon gamma domain | Vertebrate-specific | Interferon gamma |
| Interferon-beta-2 (32674, 52702, 204926) | Signal peptide, interleukin 6 domain | Vertebrate-specific | Interleukin 6 (interferon-beta-2) |
| Glycoprotein 34 (219666, 551081, 3779224) | Tumor necrosis factor domain, one TM | Mammal-specific | OX40 ligand (membrane-associated cytokine) |
| Uteroglobin (23132, 49691, 206040) | Signal peptide, uteroglobin domain | Mammal-specific | Secreted phospholipid-binding protein |
| Eotaxin (1280141, 995911, 1707665) | Signal peptide, interleukin 8 domain | Vertebrate-specific | Eotaxin (small inducible CxC cytokine) |
| Taste receptor T2R1 (9625043, 10048430, 7262627) | Signal peptide, seven TMs | Mammal-specific | Taste receptor |
| Relaxin, H2 (35927, 414781, 57044) | Signal peptide, insulin-like growth factor/relaxin family domain | Mammal-specific | Relaxin |
| IgE receptor (34003, 193246, 313673) | Lectin C-type domain (CTL), one transmembrane segment, coiled coil domain | Orthologs only in mammals, lectin domain in all animals | Immunoglobulin E receptor |
| Lactadherin (1381162, 4586464, 1620007) | Signal peptide, epidermal growth factor-like domains, coagulation factor 5/8 C-terminal (discoidin) domain (2) | Orthologs in mammals only, discoidin domain animal-specific | Lactadherin (secreted integrin- and phospholipid-binding protein, involved in antimicrobial defense) |
| Secretin preproprotein (11345450, 313711, 206888) | Glucagon-like hormone domain | Mammal-specific | Secretin |
| C-reactive protein (30213, 50564, 203592) | Signal peptide, pentraxin/C-reactive protein family domain | Vertebrate-specific | C-reactive protein, phosphorylcholine-binding, involved in host defense against bacterial infection |
| Preproapolipoprotein AI (28772, 50015, 202945) | Signal peptide, apolipoprotein A1/A4/E family domain | Vertebrate-specific | Preproapolipoprotein AI |
| EDAG-1 (7677357, 11244774, 11140172) | Regions of low sequence complexity | Mammal-specific | Unknown |
| Deoxyribonuclease II beta (11427442, 6175550, 6470131) | Signal peptide, deoxyribonuclease | Mammals, nematodes, insects (animal-specific) | Deoxyribonuclease (lysosomal enzyme, implicated in apoptosis) |
| Hydroxysteroid sulfotransferase (306702, 496152, 2104492) | Sulfotransferase domain | Animals, plants, mycobacteria | Hydroxysteroid sulfotransferase |
| Hydroxy acid oxidase 3 (7208440, 8926328, 311833) | FMN-dependent oxidoreductase | All eukaryotes, many bacteria | Hydroxy acid oxidase (peroxisomal enzyme) |
*Global identifiers (GIs) for the selected orthologous protein sets. The first GI of each Pyrococcus orthologous set corresponds to the P. furiosus identifiers used at the sequencers' site. †Domain organization was assessed using the SMART and CD-Search servers, and PSI-BLAST search results. TM, predicted transmembrane α-helix; NA (not applicable) indicates that no distinct domains could be identified. ‡Species or taxa in which homologs were detected in BLAST or PSI-BLAST searches. §Functional prediction was based on the domain architecture and comparison of the results of BLAST searches with the protein annotation in Entrez. The COGnitor server was used to query the COG database for Chlamydiaceae and Pyrococcus proteins. Only Pyrococcus proteins grouped into any existing COGs.