Brett Trost, Rolando Pajon, Teenus Jayaprakash, Anthony Kusalik.
Abstract
Numerous aspects of the relationship between bacteria and humans have been investigated. One aspect that has recently received attention is sequence overlap at the proteomic level. However, there has not yet been a study that comprehensively characterizes the level of sequence overlap between bacteria and humans, especially as it relates to bacterial characteristics like pathogenicity, G-C content, and proteome size. In this study, we began by performing a general characterization of the range of bacteria-human similarity at the proteomic level, and identified characteristics of the most- and least-similar bacterial species. We then examined the relationship between proteomic similarity and numerous other variables. While pathogens and nonpathogens had comparable similarity to the human proteome, pathogens causing chronic infections were found to be more similar to the human proteome than those causing acute infections. Although no general correspondence between a bacterium's proteome size and its similarity to the human proteome was noted, no bacteria with small proteomes had high similarity to the human proteome. Finally, we discovered an interesting relationship between similarity and a bacterium's G-C content. While the relationship between bacteria and humans has been studied from many angles, their proteomic similarity still needs to be examined in more detail. This paper sheds further light on this relationship, particularly with respect to immunity and pathogenicity.
Year: 2012 PMID: 22558081 PMCID: PMC3338800 DOI: 10.1371/journal.pone.0034007
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distribution of bacterial similarity to the human proteome.
All bacteria in set 1 were used in creating this histogram. Each bar represents the number of bacteria whose percentage of 5-mers not found in the human proteome falls in the bin beginning at x%, where x is the number below the bar.
Pathogenic bacteria used in this study (set 2).
| Species | Disease | Mode of transmission | Proteins |
|---|---|---|---|
| Set 2a (acute infections) | | | |
| | Anthrax | Contact with infected animals | 5284 |
| | Oroya fever | Sandfly bites | 1255 |
| | Glanders | Contact with infected animals | 4797 |
| | Campylobacteriosis | Food; contact with animals | 1836 |
| | Tetanus | Breaks in skin | 2415 |
| | Diphtheria | Respiratory/physical contact | 2264 |
| | Q-fever | Inhalation around animal reservoirs | 1815 |
| | Tularemia | Many; not person-to-person | 1528 |
| | Legionnaires’ | Inhalation; not person-to-person | 3202 |
| | Leptospirosis | Animal urine-infected water/food | 3654 |
| | Listeriosis | Contaminated food | 2844 |
| | Pneumonia | Person-to-person | 2193 |
| | Cholera | Contaminated food/water | 3784 |
| | Plague | Many, including person-to-person | 3821 |
| Set 2b (chronic infections) | | | |
| | Lyme disease | Tick bites | 1556 |
| | Brucellosis | Contaminated food/water | 3077 |
| | Chlamydia | Person-to-person (sexual) | 884 |
| | Chancroid | Person-to-person (sexual) | 1694 |
| | Tuberculosis | Person-to-person (through the air) | 4201 |
| | Syphilis | Person-to-person (sexual) | 1028 |
The species name, associated disease, mode of transmission, and proteome size are listed for each bacterium. The specific strain used is indicated in parentheses after each species name. Some bacteria cause varying symptoms or diseases; as such, a representative disease was chosen for each. These bacteria constitute set 2 as described in the text, and were divided into sets 2a and 2b, which represent bacteria causing acute infections and bacteria causing chronic infections, respectively. Modes of transmission were derived from the Centres for Disease Control website (http://emergency.cdc.gov).
Nonpathogenic bacteria used in this study (set 3).
| Species | Proteins | Species | Proteins |
|---|---|---|---|
| | 4006 | | 1859 |
| | 1724 | | 2002 |
| | 2946 | | 3018 |
| | 5014 | | 2207 |
| | 3545 | | 3831 |
| | 3695 | | 5714 |
| | 2352 | | 4124 |
The species name and proteome size are listed for each bacterium, with the specific strain shown in parentheses. These bacteria comprise set 3 as described in the text.
The most similar and dissimilar bacteria to the human proteome.
| Bacterium | 5-mers absent from the human proteome (%) | G-C content | Proteins |
|---|---|---|---|
| Most similar to the human proteome | | | |
| | 3.71 | 0.70 | 2227 |
| | 3.72 | 0.70 | 2201 |
| | 3.79 | 0.74 | 4674 |
| | 3.98 | 0.74 | 4795 |
| | 4.03 | 0.75 | 4345 |
| | 4.05 | 0.74 | 6912 |
| | 4.05 | 0.75 | 4444 |
| | 4.06 | 0.75 | 4461 |
| | 4.09 | 0.68 | 1935 |
| | 4.09 | 0.72 | 3710 |
| Least similar to the human proteome | | | |
| | 8.63 | 0.29 | 583 |
| | 8.49 | 0.43 | 3982 |
| | 8.46 | 0.32 | 610 |
| | 8.39 | 0.44 | 4773 |
| | 8.11 | 0.44 | 4597 |
| | 8.09 | 0.44 | 4234 |
| | 7.97 | 0.46 | 3831 |
| | 7.93 | 0.42 | 3545 |
| | 7.86 | 0.38 | 2761 |
The ten bacteria with the highest and lowest percentage of 5-mers found zero times in the human proteome are listed, along with each bacterium’s G-C content and proteome size.
Figure 2. Pathogenic versus nonpathogenic bacteria.
Relative similarity to the human proteome of the bacteria in set 1 (all bacteria sequenced to date), set 2 (20 pathogenic bacteria), and set 3 (14 nonpathogenic bacteria). Each point indicates that, on average, y% of the 5-mers in the proteomes in that set were found x times in the human proteome. Because only rare 5-mers are of interest in this study, bacterial 5-mers that were found more than ten times in the human proteome are not represented. The length in one direction of the error bar associated with each point represents the standard deviation of the measurements that were averaged to calculate that point.
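The quantity plotted here (the percentage of a bacterium's 5-mers that occur x times in the human proteome) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, proteomes are assumed to be plain lists of amino-acid strings, and it is an assumption that every overlapping 5-mer position in the bacterial proteome is counted (rather than each distinct 5-mer once).

```python
from collections import Counter

K = 5  # peptide length used throughout the study (5-mers)


def kmers(protein, k=K):
    """Yield every overlapping k-mer of one protein sequence."""
    for i in range(len(protein) - k + 1):
        yield protein[i:i + k]


def count_kmers(proteome, k=K):
    """Count how often each k-mer occurs across all proteins in a proteome."""
    counts = Counter()
    for protein in proteome:
        counts.update(kmers(protein, k))
    return counts


def occurrence_profile(bacterial, human, max_x=10, k=K):
    """Return {x: percent of bacterial k-mers found exactly x times in human}.

    k-mers found more than max_x times are left out of the result,
    mirroring the cutoff of ten described in the caption.
    Assumption: each k-mer position in the bacterial proteome counts once.
    """
    human_counts = count_kmers(human, k)
    tally = Counter()
    total = 0
    for protein in bacterial:
        for kmer in kmers(protein, k):
            tally[human_counts[kmer]] += 1  # a missing k-mer counts as 0
            total += 1
    if total == 0:
        return {x: 0.0 for x in range(max_x + 1)}
    return {x: 100.0 * tally[x] / total for x in range(max_x + 1)}


# Toy sequences, invented for illustration only.
toy_human = ["ACDEFGH", "ACDEF"]
toy_bacterium = ["ACDEFG", "WWWWWW"]
profile = occurrence_profile(toy_bacterium, toy_human)
```

Under these assumptions, `profile[0]` is the percentage of 5-mers found zero times in the human proteome, and averaging such profiles over the proteomes in a set would give one point per value of x.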
Figure 3. Similarity to the human proteome of surface-accessible proteins from pathogens and nonpathogens.
The similarity to the human proteome is shown for proteins (A) from Gram-positive bacteria that are predicted to localize to the cell wall, and (B) from Gram-negative bacteria that are predicted to localize to the outer membrane. Bacterial 5-mers that were found more than ten times in the human proteome are not represented. The length in one direction of the error bar associated with each point represents the standard deviation of the measurements that were averaged to calculate that point.
Figure 4. Acute pathogens versus chronic pathogens.
Relative similarity to the human proteome of the bacteria in set 2a (bacteria that cause acute infections), set 2b (bacteria that cause chronic infections), set 4a (one randomly-generated bacterial proteome corresponding to each proteome in set 2a), and set 4b (one randomly-generated bacterial proteome corresponding to each proteome in set 2b). Bacterial 5-mers that were found more than ten times in the human proteome are not represented. The length in one direction of the error bar associated with each point represents the standard deviation of the measurements that were averaged to calculate that point.
Figure 5. Relationship between proteome size and percentage of 5-mers absent from the human proteome.
The best-fit line (in green) was calculated using least squares and had an R² value of 0.062.
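As a rough illustration of how such a best-fit line and its goodness of fit can be computed, the sketch below fits y = slope · x + intercept by ordinary least squares. It assumes the reported value is the coefficient of determination (R²); the function name and the toy data are hypothetical and are not taken from the study.

```python
import math


def least_squares_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns (slope, intercept, r_squared), where r_squared is
    1 - SS_res / SS_tot (NaN if all y values are identical).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares for the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan
    return slope, intercept, r_squared


# Toy data (invented): x = proteome size, y = percent of 5-mers absent.
toy_sizes = [1000, 2000, 3000, 4000]
toy_percent_absent = [6.0, 5.0, 6.0, 5.0]
slope, intercept, r2 = least_squares_fit(toy_sizes, toy_percent_absent)
```

A value close to zero, like the 0.062 reported in the caption, means the line explains very little of the variation in the y values.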
Figure 6. Relationship between G-C content and percentage of 5-mers absent from the human proteome.
As the plot exhibits two distinct regions, two best-fit lines were calculated. The green best-fit line was calculated using points with G-C contents less than 0.52 and had an R² value less than 0.01, whereas the blue best-fit line was calculated using points with G-C contents greater than or equal to 0.52 and had an R² value of 0.74.
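The two-region analysis can be sketched as follows: split the points at a G-C content of 0.52 and score a separate least-squares line for each side. The sketch assumes the reported goodness-of-fit value is R², and relies on the fact that for a straight-line fit with one predictor, R² equals the squared Pearson correlation. Function names and toy points are hypothetical.

```python
import math

GC_THRESHOLD = 0.52  # split point stated in the Figure 6 caption


def r_squared(points):
    """Squared Pearson correlation of (x, y) pairs.

    For a simple least-squares line this equals the fit's R^2.
    Returns NaN when fewer than two points or no variance is available.
    """
    n = len(points)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        return math.nan
    return (sxy * sxy) / (sxx * syy)


def fit_by_region(points, threshold=GC_THRESHOLD):
    """Return (R^2 below threshold, R^2 at or above threshold)."""
    low = [(x, y) for x, y in points if x < threshold]
    high = [(x, y) for x, y in points if x >= threshold]
    return r_squared(low), r_squared(high)


# Toy (G-C content, percent of 5-mers absent) points, invented for illustration.
toy_points = [
    (0.30, 8.0), (0.40, 6.0), (0.50, 8.0),   # low G-C: no linear trend
    (0.52, 7.0), (0.60, 6.0), (0.70, 4.75),  # high G-C: exactly linear
]
low_r2, high_r2 = fit_by_region(toy_points)
```

With the toy points, the low-G-C region shows essentially no linear trend while the high-G-C region is perfectly linear, which mirrors in exaggerated form the contrast the caption reports (less than 0.01 versus 0.74).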