Benjamin D Greenbaum, Pradeep Kumar, Albert Libchaber.
Abstract
In this work, we study the first passage statistics of amino acid primary sequences, that is, the probability of observing an amino acid for the first time at a certain number of residues away from a fixed amino acid. By using this rich mathematical framework, we are able to capture the background distribution for an organism and infer lengths at which the first passage has a probability that differs from what is expected. While many features of an organism's genome are due to natural selection, others are related to amino acid chemistry and the environment in which an organism lives, constraining the randomness of genomes upon which selection can further act. We therefore use this approach to infer amino acid correlations, and then study how these correlations vary across a wide range of organisms with a wide range of optimal growth temperatures. We find a nearly universal exponential background distribution, consistent with the idea that most amino acids are globally uncorrelated with other amino acids in genomes. When we are able to extract significant correlations, these correlations are reliably dependent on optimal growth temperature, across phylogenetic boundaries. Some of the correlations we extract, such as the enhanced probability of finding, for the first time, a cysteine three residues away from a cysteine or a glutamic acid two residues away from an arginine, likely relate to thermal stability. However, other correlations, likely appearing on alpha helical surfaces, have a less clear physicochemical interpretation and may relate to thermal stability or unusual metabolic properties of organisms that live in high-temperature environments.
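As an illustration of the first passage count defined in the abstract, the following is a minimal sketch, not the authors' implementation. The function name `first_passage_counts`, the `max_distance` cutoff, the forward scanning direction, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter


def first_passage_counts(sequences, source, target, max_distance=50):
    """Tally first passage distances from `source` to `target`.

    For every occurrence of `source`, scan toward higher indices and record
    the distance at which `target` appears for the first time. Occurrences
    with no `target` within `max_distance` residues (or before the end of
    the sequence) are not counted. Scanning direction and cutoff are
    assumptions of this sketch.
    """
    counts = Counter()
    for seq in sequences:
        for i, residue in enumerate(seq):
            if residue != source:
                continue
            for n in range(1, max_distance + 1):
                j = i + n
                if j >= len(seq):
                    break  # ran off the end of this protein
                if seq[j] == target:
                    counts[n] += 1
                    break  # only the first occurrence counts
    return counts


# Hypothetical toy "proteome", for illustration only.
toy_proteome = ["MCAACKRCE", "CGGCREC"]
cc_counts = first_passage_counts(toy_proteome, "C", "C")
re_counts = first_passage_counts(toy_proteome, "R", "E")
```

Applied to a full proteome, a histogram of such counts against distance would correspond to the kind of curve described for cysteine to cysteine and arginine to glutamic acid first passages.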
Year: 2014 PMID: 25000191 PMCID: PMC4084998 DOI: 10.1371/journal.pone.0101665
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Number of first passages from cysteine to cysteine in (a) T. petrophila and (b) human proteomes, and from arginine to glutamic acid in (c) T. petrophila and (d) human proteomes.
Over-represented First Passage Statistics.
| N-Terminal Direction | | | |
| Pairs | Number of Genomes | | |
| | 75 | 3 | 1.545 |
| | 32 | 3 | 0.325 |
| | 41 | 3 | 0.3094 |
| | 57 | 2 | 0.2878 |
| | 26 | 4 | 0.222 |
| | 47 | 2 | 0.2207 |
| | 31 | 4 | 0.1884 |
| | 48 | 2 | 0.1836 |
| | 35 | 2 | 0.1812 |
| | 31 | 2 | 0.1626 |
| | 34 | 2 | 0.1583 |
Under-represented First Passage Statistics.
| N-Terminal Direction | | | |
| Pairs | Number of Genomes | | |
| GD | 29 | | −0.1660 |
Figure 2. Optimal growth temperature (OGT) as a function of the logarithm of the ratio of real to expected first passage probability, essentially the non-random part of the amplitudes for peaks such as those in Figure 1.
These are plotted for (a) CC at three residues, (b) RE at three residues, (c) LE at two residues, and (d) LK at two residues, where blue circles indicate OGT ranging from 5°C to about 60°C, and red circles represent hyperthermophilic organisms with OGT above 60°C.
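The logarithm of the real to expected first passage probability can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes the expected background is the geometric distribution p(1 - p)^(n - 1) that arises when residues are independent with target frequency p (the discrete analogue of an exponential decay), it uses the natural logarithm, and it ignores finite-length truncation effects. The names `geometric_background` and `log_ratio` and the example counts are hypothetical.

```python
import math


def geometric_background(p_target, n):
    """Expected first passage probability at distance n for independent
    residues, where p_target is the frequency of the target amino acid.
    (Assumed background model for this sketch.)"""
    return p_target * (1.0 - p_target) ** (n - 1)


def log_ratio(counts, p_target, n):
    """Natural log of observed over expected first passage probability.

    `counts` maps distance -> number of first passages at that distance.
    Returns -inf when no first passage was observed at distance n.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no first passages were counted")
    observed = counts.get(n, 0) / total
    if observed == 0.0:
        return float("-inf")
    return math.log(observed / geometric_background(p_target, n))


# Hypothetical counts, for illustration only.
example_counts = {1: 50, 2: 25, 3: 25}
enrichment_at_3 = log_ratio(example_counts, p_target=0.5, n=3)
enrichment_at_1 = log_ratio(example_counts, p_target=0.5, n=1)
```

A positive value indicates a distance at which the first passage is over-represented relative to this assumed background, and a negative value indicates under-representation; one such value per genome could then be plotted against OGT.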