Maria Anisimova, Joseph Bielawski, Katherine Dunn, Ziheng Yang.
Abstract
BACKGROUND: In comparative analyses of bacterial pathogens, it has been common practice to discriminate between two types of genes: (i) those shared by pathogens and their non-pathogenic relatives (core genes), and (ii) those found exclusively in pathogens (pathogen-specific accessory genes). Rather than attempting to delineate genes a priori into sets more or less relevant to pathogenicity, we took a broad approach to the analysis of Streptococcus species by investigating the strength of natural selection in all clusters of homologous genes. The genus Streptococcus comprises a wide variety of both pathogenic and commensal lineages, and we relate our findings to the pre-existing knowledge of Streptococcus virulence factors.
Year: 2007 PMID: 17760998 PMCID: PMC2031904 DOI: 10.1186/1471-2148-7-154
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Twelve complete genomes of congeneric Streptococcus used in this study
| 1,852,441 | 1,697 (1509) | [60] | ||
| 1,895,017 | 1,845 (1584) | [7] | ||
| 1,900,521 | 1,865 (1596) | [61] | ||
| 1,894,275 | 1,861 (1573) | [62] | ||
| 1,899,877 | 1,886 (1582) | [63] | ||
| 2,211,485 | 2,094 (1887) | [8] | ||
| 2,160,267 | 2,124 (1831) | [64] | ||
| 2,038,615 | 2,043 (1723) | [65] | ||
| 2,160,837 | 2,094 (1701) | [66] | ||
| 2,030,921 | 1,960 (1690) | [67] | ||
| 1,796,846 | 1,889 (1541) | [68] | ||
| 1,796,226 | 1,915 (1554) | [68] |
Figure 1. A genome-scale estimate of phylogenetic relationships among species of Streptococcus. The tree topology is derived from the joint analysis of 504 single-copy gene sequences using the BIONJ algorithm [55]. Each gene was represented by exactly one sequence from each of the 12 genomes; the topology shown is simplified to show only the relationships among the five nominal species of Streptococcus. The topology is unrooted, and branch lengths indicate the mean number of substitutions per codon, as inferred under codon model M0 [11].
Expanded analyses of ten genes detected to be under positive selection in a genome scan
| ω | ||||||||
| Cell envelope proteinase (prtS); | 5 | 28 | 4947 | 0.0000 | 0.0000 | 0.010 | 15.0 | |
| Lactocepin Dipeptidase (pepD) | 8 | 20 | 1416 | 0.0002 | 0.0000 | 0.016 | 8.0 | |
| Ftsk/SpoIII family protein (a) | 4 | 10 | 756 | 0.0000 | 0.0000 | 0.049 | ∞ | |
| Hypothetical protein | 4 | 10 | 1098 | 0.0000 | 0.0000 | 0.017 | 27.5 | |
| IS1191 transposase (truncated) | 4 | 6 | 372 | 0.0020 | 0.0020 | 0.087 | 25.0 | |
| Hypothetical protein (b) | 3 | 13 | 1092 | 0.0491 | 0.0004 | 0.033 | 12.4 | |
| Hypothetical protein (c) | 4 | 7 | 402 | 0.0214 | 0.0214 | 0.026 | 31.2 | |
| grab | 3 | 11 | 735 | 0.0035 | 0.0034 | 0.062 | 15.5 | |
| rplS (d) | all | 11 | 30 | 345 | 0.1254 | 0.0080 | 0.010 | 6.7 |
| Hypothetical protein (e) | 4 | 13 | 1023 | 0.0679 | 0.0442 | 0.18 | 2.6 | |
| Homologue of mac/sib38 | ||||||||
Notes: N1 indicates the number of gene sequences in the original gene cluster. N2 indicates the number of sequences in the expanded dataset. Lnt is the length of the gene sequence in nucleotides. p1 indicates the proportion of codon sites estimated to be subject to positive selection under model M8.
ω is the parameter in codon model M8 for the nonsynonymous to synonymous rate ratio (dN/dS).
Additional remarks:
a Best non-GAS blastp hit: gb|AAK79835.1|AE007695_8 (AE007695) FtsK-like DNA segregation ATPase, YDCQ B. subtilis orthologue [Clostridium acetobutylicum];
b Fibronectin-binding protein (probable antigen);
c Best non-GAS blastp hit: gb|AAB97959.1| (U96166) ATP-binding cassette lipoprotein [S. cristatus];
d May play a role in the structure and function of the aminoacyl-tRNA binding site;
e Novel immunoglobulin binding protein; best blastp hit: sp|P11215|ITAM_human cell surface glycoprotein MAC-1 α subunit precursor (CR-3 α chain) (CD11B) (leukocyte adhesion receptor MO1) (integrin α-M) (neutrophil adherence receptor).
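As a rough illustration of how Lnt and p1 from the notes above relate, the sketch below converts a gene length in nucleotides to a codon count and multiplies by p1 to give the expected number of codon sites in the positively selected class. The function name is hypothetical, and the calculation assumes that Lnt counts coding nucleotides only and is a multiple of three; it is not taken from the original analysis.

```python
def expected_selected_sites(length_nt: int, p1: float) -> float:
    """Expected number of codon sites in the positive-selection class.

    Assumption (not from the source): length_nt covers only coding
    nucleotides, so the number of codons is length_nt // 3.
    """
    if length_nt < 0 or not 0.0 <= p1 <= 1.0:
        raise ValueError("length_nt must be >= 0 and p1 must lie in [0, 1]")
    n_codons = length_nt // 3  # three nucleotides per codon
    return p1 * n_codons


# Values for the prtS row of the table: Lnt = 4947, p1 = 0.010
print(expected_selected_sites(4947, 0.010))
```

For the prtS row this gives roughly 16 codon sites out of 1,649, which shows that a small p1 can still correspond to a non-trivial number of sites in a long gene.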
Odds of positive selection in gene clusters categorised according to COG-derived functional assignments
| 0.07 | 0.05 | 0.08 | 0.61 | 0.565 | |
| 0.10 | 0.10 | 0.10 | 1.04 | 1 | |
| 0.08 | 0.04 | 0.12 | 0.35 | 0.02158 | |
| 0.05 | 0.07 | 0.02 | 2.92 | 0.2802 | |
| 0.08 | 0.08 | 0.00 | Infinite | 1 | |
| 0.07 | 0.06 | 0.09 | 0.68 | 1 | |
Notes: The odds of positive selection were computed as the relative frequency of genes under positive selection divided by the relative frequency of genes not under positive selection. The odds ratio is simply the ratio of the odds of positive selection in two different categories of genes. The categories were based on functional assignments found in the COG database. The hypothesis that an odds ratio differed from one was tested using Fisher's exact test.
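The calculation described in these notes can be sketched for a single 2×2 table as follows. This is a minimal standard-library illustration, not the authors' code: the function names are hypothetical, the counts in the usage line are invented for demonstration, and the two-sided p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a: genes in the category, under positive selection
    b: genes in the category, not under positive selection
    c: genes outside the category, under positive selection
    d: genes outside the category, not under positive selection
    """
    num, den = a * d, b * c
    if den == 0:
        # Odds in the comparison group are zero (or undefined).
        return float("inf") if num > 0 else float("nan")
    return num / den


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value with fixed table margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x selected genes in the category.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts, for illustration only.
print(odds_ratio(3, 1, 1, 3), fisher_exact_two_sided(3, 1, 1, 3))
```

An odds ratio reported as "Infinite" in the table corresponds to the case handled explicitly here, where no gene in the comparison group is under positive selection.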
Figure 2. A Venn diagram showing the distribution of 26 proteins known to have body-site specific patterns of gene expression during invasive disease in Streptococcus pyogenes and to have evolved under positive Darwinian selection. Expression data are from Orihuela et al. [6] for infected blood (IB), cerebrospinal fluid (CSF) and epithelial cell contact (ECC).