Brett Trost, Monique Haakensen, Vanessa Pittet, Barry Ziola, Anthony Kusalik.
Abstract
BACKGROUND: The increasing availability of whole genome sequences allows the gene or protein content of different organisms to be compared, leading to burgeoning interest in the relatively new subfield of pan-genomics. However, while several studies have analyzed protein content relationships in specific groups of bacteria, there has yet to be a study that provides a general characterization of protein content relationships in a broad range of bacteria.
Year: 2010 PMID: 20942950 PMCID: PMC3020658 DOI: 10.1186/1471-2180-10-258
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Table 1. Bacteria used in this study
| Genus | Number of isolates | Number of species |
|---|---|---|
| | 16 | 10 |
| | 8 | 5 |
| | 19 | 10 |
| | 19 | 10 |
| | 15 | 12 |
| | 14 | 11 |
| | 6 | 2 |
| | 15 | 7 |
| | 4 | 2 |
| | 11 | 9 |
| | 7 | 4 |
| | 18 | 4 |
| | 31 | 9 |
| | 8 | 5 |
| | 8 | 3 |
| | 12 | 3 |
For each bacterial genus used in this study, the number of isolates used and the number of species those isolates represent are indicated.
Figure 1. Relationship between the E-value threshold and numbers of unique proteins in pairs of isolates. For a given comparison, these graphs denote the number of proteins in the first isolate (e.g. Pseudomonas putida GB-1) that are not found in the second isolate (e.g. Pseudomonas putida KT2440). The relationship between pairs of isolates is: (A) same species; (B) same genus but different species; and (C) different genera. As an E-value threshold of 10^-13 was ultimately chosen for our analyses, a vertical line corresponding to this E-value is indicated on each graph.
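The counting behind Figure 1 can be illustrated with a minimal sketch. It assumes that a protein from the first isolate counts as "found" in the second isolate when its best hit has an E-value at or below the threshold; the exact comparison rule, the function name `count_unique_proteins`, and the toy E-values are assumptions for illustration, not the authors' implementation.

```python
def count_unique_proteins(best_evalues, threshold=1e-13):
    """Count proteins of the first isolate that are not found in the second.

    best_evalues maps a protein ID from the first isolate to the E-value of
    its best hit in the second isolate, or to None when there is no hit.
    Assumption: a protein is "found" when its best E-value <= threshold.
    """
    return sum(
        1 for evalue in best_evalues.values()
        if evalue is None or evalue > threshold
    )


# Hypothetical best-hit E-values for four proteins of the first isolate.
toy_hits = {"p1": 1e-80, "p2": 1e-20, "p3": 1e-6, "p4": None}

# Sweeping the threshold shows how the unique-protein count depends on it.
for exponent in (-5, -13, -30):
    print(exponent, count_unique_proteins(toy_hits, threshold=10.0 ** exponent))
```

A stricter (smaller) threshold can only keep or increase the number of proteins classed as unique, which is why each curve in such a plot is monotonic.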
Figure 2. Comparison of the protein content characteristics of selected genera. For each of the bacterial genera listed in Table 1, the relationship is given between the median proteome size of a genus and (A) its core proteome size, (B) its unique proteome size, and (C) the average number of singlets per isolate.
Table 2. Results of comparison between protein content similarity and 16S rRNA gene percent identity
| Genus | 16S range | Shared proteins: range | Shared proteins: slope | Shared proteins: R2 | Average unique proteins: range | Average unique proteins: slope | Average unique proteins: R2 |
|---|---|---|---|---|---|---|---|
| | 90.4-100% | 1741-5204 | 231 | 0.83* | 248-3000 | -176 | 0.69* |
| | 99.9-100% | 2495-3060 | ND | ND | 154-454 | ND | ND |
| | 93.8-100% | 2861-6337 | 192 | 0.26* | 337-4554 | -394 | 0.67* |
| | 80.3-100% | 917-3333 | 38 | 0.47* | 141-2987 | -60 | 0.36* |
| | 85.8-100% | 720-2348 | 42 | 0.49* | 235-1595 | -46 | 0.19* |
| | 91.3-100% | 1258-4327 | 99 | 0.13* | 87-2994 | -151 | 0.47* |
| | 98.4-100% | 1470-1794 | -263 | 0.19 | 206-753 | 305 | 0.03 |
| | 93.1-100% | 2368-5339 | 68 | 0.06* | 383-2847 | -129 | 0.37* |
| | 98.9-99.9% | 3482-4690 | 178 | 0.03 | 1296-2095 | 12 | 0.00 |
| | 97.2-100% | 743-1275 | 92 | 0.49* | 48-556 | 51 | 0.07 |
| | 97.4-99.7% | 2781-3481 | 122 | 0.13 | 463-1185 | -113 | 0.11 |
| | 97.4-100% | 1674-2653 | 72 | 0.41* | 49-923 | -18 | 0.02 |
| | 92.6-100% | 929-1954 | 46 | 0.28* | 84-1028 | -35 | 0.15* |
| | 90.9-99.8% | 2345-3879 | 142 | 0.81* | 396-2167 | -21 | 0.03 |
| | 99.8-100% | 2802-3982 | ND | ND | 201-1653 | ND | ND |
| | 97.2-100% | 2675-3825 | 347 | 0.94* | 216-1319 | -27 | 0.94* |
For each genus, the range of 16S rRNA gene percent identities for all pairs of isolates from that genus is listed. Under the "shared proteins" heading, "range" indicates the range of shared proteins in pairs of isolates from that genus. The "slope" column indicates the slope of the regression line when the number of shared proteins in each pair of isolates is plotted against their 16S rRNA gene percent identities. The "R2" column contains the square of the standard correlation coefficient between these two variables, and indicates the strength of their relationship. The data under the "average unique proteins" heading are analogous to those under the "shared proteins" heading. Isolates sharing ≥ 99.5% identity of the 16S rRNA gene were not used in the calculation of slope or R2. Values marked with "ND" were not determined; despite having different species names, all isolates with sequenced genomes within these genera shared ≥ 99.5% identity of the 16S rRNA gene. An asterisk (*) beside an R2 value indicates that it is statistically significant with P-value < 0.05.
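The slope and R2 calculation described in this caption can be sketched as follows. This is a minimal illustration using ordinary least squares on (16S percent identity, protein count) pairs, dropping pairs with identity of 99.5% or more as the caption states. The function name `slope_and_r2` and the toy data are hypothetical, and returning `None` when too few pairs remain is an assumption that mirrors the "ND" entries.

```python
def slope_and_r2(pairs, identity_cutoff=99.5):
    """Least-squares slope and R^2 of protein count against 16S identity.

    pairs is a list of (percent_identity, protein_count) tuples, one per
    pair of isolates. Pairs at or above identity_cutoff are excluded.
    Returns None when a regression cannot be computed (assumed to
    correspond to "ND" in the table).
    """
    kept = [(x, y) for x, y in pairs if x < identity_cutoff]
    if len(kept) < 2:
        return None
    n = len(kept)
    mean_x = sum(x for x, _ in kept) / n
    mean_y = sum(y for _, y in kept) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in kept)
    syy = sum((y - mean_y) ** 2 for _, y in kept)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in kept)
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)  # square of the correlation coefficient
    return slope, r_squared


# Hypothetical isolate pairs: (16S percent identity, shared proteins).
toy_pairs = [(92.0, 1800), (95.0, 2300), (97.5, 2500), (99.8, 4100)]
print(slope_and_r2(toy_pairs))
```

The significance test behind the asterisks is not reproduced here; it would need a P-value for the correlation, which this sketch leaves out.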
Figure 3. Phylogenetic relationships among the organisms used in this study. Three phylogenetic trees were constructed, each of which used a different distance metric. Panel (A) depicts the tree constructed using the 16S rRNA gene similarity between two isolates, while panels (B) and (C) depict trees based on shared proteins and average unique proteins, respectively. Due to space constraints, collapsed trees are shown; the full trees are available as additional files 2, 3, and 4. The length of the base of each triangle represents the number of species within the genus, while the height indicates the amount of intra-genus divergence.
Table 3. Results of protein content cohesiveness experiments
| Species (S) | N | Core proteome size | Average core proteome size (random sets) | P (core) | Random sets with larger core proteome | Unique proteome size | Average unique proteome size (random sets) | P (unique) | Random sets with larger unique proteome |
|---|---|---|---|---|---|---|---|---|---|
| | 3 | 4941 | 2123 | ** | 0/25 | 168 | 1 | ** | 0/25 |
| | 4 | 2881 | 1840 | ** | 0/25 | 2 | 0 | - | 0/25 |
| | 2 | 4255 | 2864 | ** | 5/25 | 4 | 7 | n.s. | 7/25 |
| | 3 | 2699 | 2603 | ** | 6/25 | 2 | 1 | * | 4/25 |
| | 2 | 3025 | 2760 | ** | 2/24 | 5 | 4 | n.s. | 5/24 |
| | 2 | 5609 | 3798 | ** | 1/25 | 198 | 17 | ** | 0/25 |
| | 3 | 5908 | 3352 | ** | 0/25 | 168 | 0 | ** | 0/25 |
| | 4 | 3623 | 3086 | ** | 1/25 | 18 | 0 | - | 0/25 |
| | 4 | 4972 | 3086 | ** | 0/25 | 45 | 0 | - | 0/25 |
| | 8 | 1514 | 763 | ** | 0/25 | 10 | 0 | - | 0/25 |
| | 3 | 2110 | 1085 | ** | 0/25 | 298 | 0 | ** | 0/25 |
| | 2 | 2355 | 959 | ** | 0/25 | 593 | 5 | ** | 0/25 |
| | 2 | 1372 | 959 | ** | 0/25 | 222 | 5 | ** | 0/25 |
| | 2 | 1402 | 959 | ** | 0/25 | 120 | 5 | ** | 0/25 |
| | 2 | 3822 | 2577 | ** | 1/25 | 36 | 38 | n.s. | 3/25 |
| | 3 | 3724 | 2118 | ** | 0/25 | 26 | 17 | n.s. | 3/25 |
| | 2 | 1795 | 1560 | ** | 0/8 | 229 | 3 | ** | 0/8 |
| | 4 | 1547 | 1426 | ** | 0/14 | 75 | 4 | ** | 0/14 |
Column headings are, from left to right: S, species; N, number of sequenced isolates of species S; the core proteome size of the sequenced isolates of S; the average core proteome size of the randomly generated sets; P, the probability that the average core proteome size of the randomly generated sets is different from the core proteome size of the sequenced isolates of S; and the fraction of random sets having a core proteome larger than that of S. The four columns under "Unique proteomes" are analogous to the four core proteome columns and refer to the comparisons involving the number of proteins found in all sequenced isolates of S but in no other isolates from the same genus ("unique proteomes"). In some cases, all of the random sets corresponding to a particular species had zero unique proteins. No P-value could be computed for these because the standard deviation of these values was zero; in these situations, the unique proteome P column contains a dash (-). The averages in both random-set columns are rounded to the nearest whole number. For certain rows, the average unique proteome size of the random sets is shown as 0; in some cases this value is exact, while in others it is due to rounding. If it is due to rounding, then the standard deviation of the random sets is non-zero, and the unique proteome P column contains a P-value. In both P columns, "n.s." means "not significant", a single asterisk indicates a P-value of less than 0.05, and a double asterisk indicates a P-value of less than 0.001. See Table 4 for the continuation of this table.
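The quantities in this table can be illustrated with a small sketch. It assumes each isolate is represented as a set of protein family identifiers, so that a core proteome is a set intersection and a unique proteome is that intersection minus everything seen in the other isolates of the genus. Drawing each random set uniformly from the isolates of the same genus is also an assumption, and all names (`core_proteome`, `unique_proteome`, `count_random_sets_with_larger_core`) are hypothetical rather than the authors' code.

```python
import random


def core_proteome(isolates):
    """Proteins present in every isolate of the given list of sets."""
    return set.intersection(*isolates)


def unique_proteome(group, others):
    """Proteins in every isolate of group and in no isolate of others."""
    core = core_proteome(group)
    if others:
        core -= set.union(*others)
    return core


def count_random_sets_with_larger_core(genus, species_isolates, n_sets=25, seed=0):
    """Compare a species' core proteome size with that of random sets.

    genus maps isolate name -> set of protein family IDs (assumed input).
    Returns (species core size, number of random sets with a larger core).
    Simplification: a random set may coincide with the species itself.
    """
    target = len(core_proteome([genus[name] for name in species_isolates]))
    rng = random.Random(seed)
    names = sorted(genus)
    larger = 0
    for _ in range(n_sets):
        sample = rng.sample(names, len(species_isolates))
        if len(core_proteome([genus[name] for name in sample])) > target:
            larger += 1
    return target, larger


# Hypothetical genus with two species of two isolates each.
toy_genus = {
    "a1": {1, 2, 3, 4},
    "a2": {1, 2, 3, 4, 9},
    "b1": {1, 5},
    "b2": {1, 6},
}
print(count_random_sets_with_larger_core(toy_genus, ["a1", "a2"]))
print(unique_proteome([toy_genus["a1"], toy_genus["a2"]],
                      [toy_genus["b1"], toy_genus["b2"]]))
```

The P columns would additionally require a statistical test comparing the species value with the distribution over random sets; that step is omitted because the test used is not stated in this section.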
Table 4. Results of protein content cohesiveness experiments (continued)
| Species (S) | N | Core proteome size | Average core proteome size (random sets) | P (core) | Random sets with larger core proteome | Unique proteome size | Average unique proteome size (random sets) | P (unique) | Random sets with larger unique proteome |
|---|---|---|---|---|---|---|---|---|---|
| | 3 | 4959 | 2877 | ** | 0/25 | 571 | 1 | ** | 0/25 |
| | 2 | 4206 | 3199 | ** | 0/25 | 142 | 6 | ** | 0/25 |
| | 4 | 3799 | 2592 | ** | 0/25 | 69 | 0 | ** | 0/25 |
| | 3 | 3894 | 2877 | ** | 0/25 | 290 | 1 | ** | 0/25 |
| | 2 | 4700 | 4063 | n.s. | 0/4 | 431 | 176 | n.s. | 0/4 |
| | 2 | 3678 | 4063 | n.s. | 2/4 | 148 | 176 | n.s. | 2/4 |
| | 2 | 1277 | 850 | ** | 0/25 | 219 | 1 | ** | 0/25 |
| | 2 | 1221 | 850 | ** | 0/25 | 93 | 1 | ** | 0/25 |
| | 2 | 3170 | 2989 | ** | 1/17 | 95 | 12 | ** | 0/17 |
| | 3 | 3255 | 2770 | ** | 0/25 | 130 | 6 | ** | 0/25 |
| | 14 | 1917 | 1486 | ** | 0/25 | 157 | 0 | ** | 0/25 |
| | 2 | 2080 | 1798 | ** | 0/25 | 131 | 0 | ** | 0/25 |
| | 3 | 1688 | 1019 | ** | 0/25 | 156 | 0 | - | 0/25 |
| | 6 | 1543 | 922 | ** | 0/25 | 150 | 0 | - | 0/25 |
| | 13 | 1348 | 811 | ** | 0/25 | 49 | 0 | - | 0/25 |
| | 2 | 1971 | 1087 | ** | 0/25 | 336 | 0 | ** | 0/25 |
| | 3 | 1359 | 1019 | ** | 0/25 | 145 | 0 | - | 0/25 |
| | 2 | 3384 | 2764 | ** | 1/25 | 425 | 20 | ** | 0/25 |
| | 2 | 3380 | 2764 | ** | 1/25 | 447 | 20 | ** | 0/25 |
| | 2 | 3882 | 2764 | ** | 0/25 | 321 | 20 | ** | 0/25 |
| | 4 | 3376 | 2818 | ** | 0/25 | 49 | 4 | ** | 0/25 |
| | 3 | 3276 | 2915 | ** | 5/25 | 299 | 0 | ** | 0/25 |
| | 7 | 2986 | 2717 | ** | 4/25 | 21 | 0 | ** | 0/25 |
| | 4 | 3424 | 3003 | ** | 0/25 | 21 | 0 | ** | 0/25 |
For the meanings of each column, see Table 3.