Laurent X Nouvel, Pascal Sirand-Pugnet, Marc S Marenda, Eveline Sagné, Valérie Barbe, Sophie Mangenot, Chantal Schenowitz, Daniel Jacob, Aurélien Barré, Stéphane Claverol, Alain Blanchard, Christine Citti.
Abstract
BACKGROUND: While the genomic era is accumulating a tremendous amount of data, the question of how genomics can describe a bacterial species remains to be fully addressed. The recent sequencing of the genome of the Mycoplasma agalactiae type strain has challenged our general view of mycoplasmas by suggesting that these simple bacteria are able to exchange a significant amount of genetic material via horizontal gene transfer. Yet the events that shape mycoplasma genomes and that underlie diversity within this species remain to be fully evaluated. For this purpose, we compared two strains that are representative of the genetic spectrum encountered in this species: the type strain PG2, whose genome is already available, and a field strain, 5632, which was fully sequenced and annotated in this study.
Year: 2010 PMID: 20122262 PMCID: PMC2824730 DOI: 10.1186/1471-2164-11-86
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
General properties of M. agalactiae PG2 and 5632 strains
| | PG2 | 5632 |
|---|---|---|
| Date of isolation | 1952 | <1991 |
| Country | Spain | Spain |
| Source | nk | articulation |
| Host | caprine | caprine |
| Genome size (bp) | 877,438 | 1,006,702 |
| G+C (%) | 29.70 | 29.62 |
| Gene density (%) | 88.5 | 88.7 |
| Total number of CDS | 752 | 826 |
| HP (Hypothetical Protein) | 138 | 148 |
| CHP (Conserved HP) | 186 | 150 |
| CDS with predicted function | 404 | 505 |
| Pseudogenes | 45 | 23 |
| rRNAs sets | 2 | 2 |
| tRNAs | 34 | 34 |
| GenBank accession number | ||
| ICE number | (1 vestigial) | 3 (+2 vestigial) |
| Transposase number | 1^a (+2 pseudogenes) | 15 (+2 pseudogenes) |
| Genomic DNA digested by: | ||
| | Yes | No |
| | Yes | Yes |
| Relative colony size^b | 100% | 180% |
Data were extracted from the MolliGen database http://cbi.labri.fr/outils/molligen/.
a One CDS, MAG3410, was annotated as transposase and was detected by proteomics analysis in this study but no inverted repeat sequences could be found.
b Relative colony sizes as defined on agar medium with PG2 as reference. Colonies of 5632 were repeatedly found to be approximately 1.8 times larger than those of PG2.
nk, not known
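As a quick arithmetic check on the table above, the following minimal sketch recomputes how much larger the 5632 genome is than the PG2 genome, using only the tabulated genome sizes and CDS counts. The variable names are illustrative and are not taken from the original study.

```python
# Values copied from the table above; variable names are illustrative only.
genome_bp = {"PG2": 877_438, "5632": 1_006_702}
cds_total = {"PG2": 752, "5632": 826}

# Absolute and relative size difference, with PG2 as the reference strain.
extra_bp = genome_bp["5632"] - genome_bp["PG2"]
extra_pct = 100.0 * extra_bp / genome_bp["PG2"]

# Difference in the total number of annotated CDSs.
extra_cds = cds_total["5632"] - cds_total["PG2"]

print(f"5632 carries {extra_bp:,} bp more than PG2 ({extra_pct:.1f}% larger)")
print(f"5632 encodes {extra_cds} more CDSs than PG2")
```

With the tabulated values, the difference comes to 129,264 bp (about 14.7% of the PG2 genome) and 74 CDSs.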
Figure 1. Overall comparison of . (A) VISTA comparison [61]. The graph represents the nucleotide sequence identity (in %) using a sliding window of 100 bp and the 5632 genome as a reference. Colored boxes represent gene families or ICEs (orange for the drp genes, yellow for the vpma genes, green for the spma genes, and purple for the ICEs); blue triangles represent insertion sequences (IS) (dark blue for ISMag1, light blue for ISMag2). Filled orange and blue circles represent, respectively, the p48 lipoprotein gene and CDSs related to restriction-modification systems. Boxes or triangles surrounded by dotted lines indicate pseudogenes or ICE vestiges. (B) Comparison of CDSs using the MolliGen dot plot alignment [58]. Each dot represents a blastp hit (threshold 10^-8) between a CDS of 5632 (ordinate) and a CDS of PG2 (abscissa). On the axes, the distance between two large marks corresponds to 100 kbp. (C) Circular representation of the 5632 genome using the Artemis suite DNAplot [63]. Outer to inner circles correspond to: circle 1, 5632 mobilome with IS in red and ICEs in purple (the position of the unique vestigial ICE of strain PG2 is also indicated); circle 2, CDSs predicted as implicated in HGT with mycoplasmas of the "mycoides" group; circle 3, positive-strand annotated CDSs; circle 4, negative-strand annotated CDSs; circle 5, CDSs of interest discussed in the text (color code as in panel A); circle 6, CDSs predicted as lipoproteins; circle 7, percent G+C content (high G+C content in dark grey and low G+C content in light grey); circle 8, GC skew.
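The quantities plotted in this figure (windowed percent identity, G+C content and GC skew) are simple to compute. The sketch below is an illustration only: it assumes two already-aligned sequences of equal length with `-` marking gaps, it is not the VISTA or DNAplot implementation, and the function names are hypothetical.

```python
def window_identity(ref_aln, qry_aln, window=100, step=100):
    """Percent identity per window over two equal-length aligned sequences.

    A column counts as a match only if both residues are equal and not a gap.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    matches = [int(a == b and a != "-")
               for a, b in zip(ref_aln.upper(), qry_aln.upper())]
    return [100.0 * sum(matches[i:i + window]) / window
            for i in range(0, len(matches) - window + 1, step)]


def gc_content(seq):
    """Percent G+C of a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


if __name__ == "__main__":
    # Toy alignment with a single mismatch in the first 5-column window.
    print(window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5))
    print(gc_content("ATGC"), gc_skew("GGGC"))
```

For a genome-scale plot, the same functions would be applied to consecutive windows along the reference genome; the window and step sizes are free parameters.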
Figure 2. Comparison of entire and vestigial ICEs found in . Schematics represent ICEs encountered in 5632 (A) and in the PG2 type strain (B). Large arrows represent CDSs, with homologous CDSs labelled with the same color. The CDS nomenclature indicated below the arrows is based on the first ICE study in 5632 [16]. ICEA5632-I, -II, -III, -IV, -V extend from MAGa7100 to 6880, MAGa2980 to 3220, MAGa4850 to 5060, MAGa4050 to 4010, MAGa3690 to 3670, respectively. ICEAPG2 extends from MAG4060 to 3860. Red crosses indicate SNPs or indels between ICEs from 5632. Insertion sequence elements (ISMag1) are represented by shaded boxes with the transposase CDS in light blue. Pseudogenes are represented by hatched colours with dotted lines.
Figure 3. Location of insertion sequences and their flanking sequences in . Schematics representing genomic regions that flank insertion sequence (IS) elements in strain 5632. Large arrows represent CDSs. IS elements are represented by blue boxes filled with straight lines for ISMag1 or wavy lines for ISMag2, with the transposases indicated by open arrows filled with light blue. CDSs predicted as implicated in HGT with mycoplasmas of the "mycoides" group are filled with plain orange for drp genes or with a dotted orange pattern for the others. MAGa7110 and MAGa7120, which represent a transposase pseudogene also predicted as implicated in HGT with the "mycoides" group, are filled with hatched orange. Short lines with an asterisk (*) or an X below indicate the presence of a 14-nucleotide or a 25-nucleotide direct repeat flanking ISMag1 or ISMag2, respectively. Pseudogenes are indicated by arrows with dotted lines.
Figure 4. Comparison of . Schematics representing the comparison of genomic regions containing drp genes in strains PG2 (MAG, upper schematics) and 5632 (MAGa, lower schematics). Large arrows represent (i) CDSs corresponding to drp genes (filled with plain orange with red outlines), (ii) CDSs other than drp and predicted as implicated in HGT with mycoplasmas of the "mycoides" group (filled with dotted orange) or (iii) CDSs conserved between PG2 and 5632 (filled with plain white). Insertion sequence elements are represented as in Figure 3. Drps detected by LC-MS/MS are labelled with an asterisk (corresponding in PG2 and 5632, respectively, to MAG2430 and MAG4220, MAGa2600 and MAGa7470). Pseudogenes are represented by large arrows with dotted lines. Limits of variable regions are indicated by dotted lines connecting the orthologous regions in both strains. Numbers above and below CDSs correspond to MAG or MAGa mnemonics.
Restriction/Modification products comparison between strains PG2 and 5632
| MAGa^a | MAG^b | Product | Similarity (%) | MS/MS^c 5632 | MS/MS^c PG2 | Comments |
|---|---|---|---|---|---|---|
| MAGa1570 | MAG1530 | Type III R/M system:Methylase | 75.3 | + | + | |
| MAGa1580 | | | 77.6 | + | | |
| MAGa1770 | MAG1790 | DNA methylase | 97.8 | - | - | |
| MAGa2070 | MAG2070 | DNA methylase | 98.9 | + | - | |
| MAGa2700 | | Adenine-specific DNA methyltransferase | 65.8 | -^d | - | Pseudogene in PG2 |
| MAGa2710 | | Type II restriction endonuclease | 46.1 | - | - | Pseudogene in PG2 |
| No homolog | MAG3310 | CpG DNA methylase | na | na | - | |
| No homolog | MAG4030 | Conserved hypothetical protein | na | na | - | BBH: |
| MAGa4470 | MAG4250 | Pseudogene of CpG DNA methylase (N-terminal) | 83.4 | - | - | |
| MAGa4480 | MAG4260 | Pseudogene of CpG DNA methylase (C-terminal) | 94.7 | - | - | |
| MAGa6280 | MAG5640 | Type I R/M system specificity subunit | 75.0 | - | +^d | Locus |
| MAGa6290 | MAG5650 | Modification (Methylase) protein of type I restriction-modification system HsdM | 98.3 | + | - | Locus |
| MAGa6310 | MAG5680 | Type I R/M system specificity subunit | 32.4 | - | +^d | Locus |
| MAGa6330 | | HsdR, R/M enzyme subunit R | 95.0 | + | - | Pseudogenes in PG2 |
| MAGa6340 | MAG5720 | Type I R/M system specificity subunit | 30.9 | + | - | Locus |
| MAGa6350 | MAG5730 | Modification (Methylase) protein of type I restriction-modification system HsdM | 90.3 | + | + | Locus |
| MAGa7650 | MAG6680 | Modification methylase | 97.6 | + | - | Modification methylase |
| MAGa3200 MAGa5050 MAGa6900 | No homolog | CDSH | na | - | na | BBH: 92.0% with MCAP0297 - |
| MAGa4250 | No homolog | Modification methylase Bsp6I | na | + | na | BBH: 81.7% |
| MAGa4260 | No homolog | Type II restriction enzyme Bsp6I | na | + | na | BBH: 55.1% |
| MAGa3950 | No homolog | Cytosine-specific methyltransferase | na | + | na | BBH: |
| MAGa3970 | No homolog | Type II site-specific deoxyribonuclease, | na | - | na | BBH: |
a, CDS of M. agalactiae strain 5632 (MolliGen Mnemonic).
b, CDS of PG2 (MolliGen Mnemonic), pseudogenes are indicated in italic.
c, Proteomic analyses (see materials and methods): (+) indicates that peptides were detected by MS/MS for the corresponding CDS, suggesting expression of the corresponding gene, (-) indicates that no specific peptides were detected for the corresponding CDS.
d, only one peptide detected.
e, MAG5640 and MAG5680 have common peptides.
BBH, Best Blast Hit; Mmm SC, Mycoplasma mycoides subsp. mycoides SC; Mcap, M. capricolum subsp. capricolum; na, not applicable; R/M, Restriction/Modification
Lipoproteins and MS/MS detection in Tx-114 phase
| MAGa^a | MAG^b | Gene name / Product | Tx 5632^c | Tx PG2^c | Comments | |
|---|---|---|---|---|---|---|
| MAGa0140 | MAG0120 | Conserved hypothetical protein, predicted lipoprotein, P48 | + | + | ||
| MAGa0380 | MAG0380 | Oligopeptide ABC transporter, substrate-binding protein (OppA), predicted lipoprotein | + | + | ||
| MAGa1090 | MAG1000 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa1140 | MAG1050 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa1490 | MAG1450 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa1550 | MAG1510 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa1620 | None | Conserved hypothetical protein, P48-like | | na | No signal peptide or lipobox unless the length of a poly G10 tract (+/-1) upstream of the chosen start varies | |
| MAGa1680 | MAG1670 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa1980 | MAG1980 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa1980 | MAG1980 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2000 | MAG2000 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2330 | MAG2220 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2500 | MAG2340 | Conserved hypothetical protein, predicted lipoprotein | + | - | Not predicted as lipoprotein in PG2 due to variation of the length of a poly A (A6 in PG2, A7 in 5632) | |
| MAGa2510 | MAG2350 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2570 | MAG2400 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2580 | MAG2410 | P40, predicted lipoprotein | + | + | ||
| MAGa2600 | MAG2430 | Conserved hypothetical protein, predicted lipoprotein, DUF285 family | + | + | ||
| MAGa2670 | MAG2510 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2690 | | Hypothetical protein, Vpma-like, predicted lipoprotein | + | + | For PG2, only MAG2540 was detected; it corresponds to the 5' coding end of a pseudogene in PG2 | |
| MAGa2740 | MAG2610 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa2820 | MAG2690 | Alkylphosphonate ABC transporter, substrate-binding protein, predicted lipoprotein | + | + | ||
| MAGa2970 | MAG2840 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa3160 | None | CDS14 | + | na | ICE | |
| MAGa3250 | MAG2870 | Conserved hypothetical protein, predicted lipoprotein | + | - | None | |
| | MAG2950 | Hypothetical protein, predicted lipoprotein | - | + | Variation of the length of a poly C (C9 in PG2, C8 in 5632) downstream of MAGa3330 may be responsible for frameshifting | |
| MAGa3640 | MAG3240 | Conserved hypothetical protein, predicted lipoprotein | + | + | Not predicted as lipoprotein in PG2 | |
| MAGa3820 | | Hypothetical protein, predicted lipoprotein | + | - | Variation of the length of a poly G (G8 in PG2, G9 in 5632) upstream of MAG3460 may be responsible for frameshifting | |
| MAGa3830 | MAG3470 | P30, predicted lipoprotein | - | + | Mutation in the | |
| MAGa3980 | MAG3590 | Hypothetical protein, predicted lipoprotein | - | + | None | |
| MAGa3990 | MAG3600 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa4680 | MAG4460 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa5010 | None | CDS14 | + | na | ICE | |
| MAGa5110 | MAG4640 | Conserved hypothetical protein, predicted lipoprotein | - | + | None | |
| MAGa5190 | MAG4720 | Conserved hypothetical protein, predicted lipoprotein | - | - | MAGa5190 was detected in the insoluble pellet | |
| MAGa5210 | MAG4740 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa5420 | | Conserved hypothetical protein, predicted lipoprotein | + | - | MAG4960+MAG4950 previously annotated as pseudogenes and detected in total proteins but not in the detergent TX-114 phase | |
| MAGa5490 | None^d | Hypothetical protein, predicted lipoprotein | + | + | CDS missed during annotation of PG2 (nt 586236 to 585832) | |
| MAGa5500 | MAG5030 | P80, predicted lipoprotein | + | + | ||
| MAGa5510 | MAG5040 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa5560 | MAG5080 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa5630 | MAG5150 | Hypothetical protein, predicted lipoprotein | + | + | Not predicted as lipoprotein in PG2 due to the start chosen during annotation | |
| MAGa5830 | None | Variable surface lipoprotein C (VpmaC) | + | na | Duplicated (MAGa8080) | |
| MAGa5850 | None | Variable surface lipoprotein E (VpmaE) | + | na | Duplicated (MAGa8090) | |
| MAGa5860 | None | Variable surface lipoprotein F1 (VpmaF1) | + | na | Duplicated (MAGa8170) | |
| MAGa5870 | None | Variable surface lipoprotein D2 (VpmaD2) | + | na | Duplicated (MAGa8120) | |
| MAGa6560 | MAG5910 | 5'Nucleotidase, predicted lipoprotein | + | + | ||
| MAGa6940 | None | CDS14 | + | na | ICE | |
| MAGa7130 | MAG6170 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa7160 | MAG6200 | Hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa7470 | | Hypothetical protein, predicted lipoprotein, DUF285 family | + | - | Variation of the length of a poly A (A6 in 5632, A7 in PG2) may be responsible for frameshift | |
| MAGa7490 | MAG6520 | Conserved hypothetical protein, predicted lipoprotein | + | + | ||
| MAGa8040 | None | Variable surface lipoprotein G (VpmaG) | + | na | ||
| MAGa8050 | None | Variable surface lipoprotein F2 (VpmaF2) | + | na | ||
| MAGa8060 | MAG7070 | Variable surface lipoprotein X (VpmaX) | + | + | ||
| MAGa8070 | MAG7060 | Variable surface lipoprotein W (VpmaW) | + | + | ||
| MAGa8100 | None | Variable surface lipoprotein B (VpmaB) | + | na | Duplicated (MAGa8100) | |
| MAGa8110 | None | Variable surface lipoprotein A (VpmaA) | + | na | Duplicated (MAGa8110) | |
| MAGa8150 | None | Variable surface lipoprotein H (VpmaH) | + | na | ||
| MAGa8160 | None | Variable surface lipoprotein I (VpmaI) | + | na | ||
| MAGa8180 | None | Variable surface lipoprotein J (VpmaJ) | + | na | ||
| MAGa8210 | None | Variable surface lipoprotein D1 (VpmaD1) | + | na | Duplicated (MAGa5840) | |
| MAGa8260 | MAG7130 | Hypothetical protein, predicted lipoprotein | + | - | Not predicted as lipoprotein in PG2 due to a point mutation: | |
| None | MAG1570 | Hypothetical protein | - | + | No signal peptide and lipobox except if variation of the length of a poly G9 (+/-1) next to the chosen start | |
| None | MAG7050 | Variable surface lipoprotein V (VpmaV) | na | + | ||
| None | MAG7080 | Variable surface lipoprotein Y (VpmaY) | na | + | ||
| None | MAG7090 | Variable surface lipoprotein U (VpmaU) | na | + | ||
| None | MAG7100 | Variable surface lipoprotein Z (VpmaZ) | na | + |
a CDS of M. agalactiae strain 5632 (MolliGen Mnemonic).
b CDS of PG2 (MolliGen Mnemonic), pseudogenes are indicated in italic and bold.
c Peptides detected by MS/MS in the Triton-X114 phase (Tx) (see the Methods section): (+) indicates that peptides corresponding to CDS were detected, suggesting expression of the corresponding gene, (-) indicates that no peptides corresponding to CDS were detected.
d CDS detected by proteomics but for which no mnemonic was defined because it was missed during the annotation of the PG2 genome [12].
na, not applicable.
Figure 5. The . Schematics representing the genomic organization of the spma loci in strains PG2 and 5632 (A) and the structural features of the corresponding spma gene products in both strains (B). In panel A, CDSs corresponding to spma genes are filled in green. The letter S represents the sequence corresponding to a signal peptide. Other CDSs conserved between PG2 and 5632 are filled with light yellow. Tracts of repeated nucleotides (Gn, where n is the number of residues) found before spma coding sequences are also indicated above the line. In panel B, predicted Spma proteins are represented schematically by large arrows generally beginning with a homologous amino-acid leader sequence (black boxes labelled S) followed by regions that have homology between spma gene products or that are repeated within the same product (blue dotted and grey boxes).
Figure 6. Analysis of the p48-like sequence of . The schematic represents the p48-like genomic region (A). CDSs are represented by large arrows, with MAGa1620, corresponding to the p48-like gene, filled in blue. Translation of the DNA region flanking the polyG tract is given in the three frames (B). The polyG tract suspected to vary in length (G10 +/-1) is underlined by a bold red bar. The putative beginning of a P48-like lipoprotein with an entire signal peptide sequence is shaded in red, while the currently annotated MAGa1620 open reading frame is in blue. Global amino-acid alignment with Needle (program available at http://www.ebi.ac.uk/Tools/emboss/align/) between the P48-like protein of M. agalactiae 5632 and the P68 lipoprotein of M. bovis PG45, for which a similar polyG tract was previously described [48], gives 89.3% identity and 92.1% similarity.
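The effect described in this caption, where a one-nucleotide change in a homopolymeric tract shifts the downstream reading frame, can be illustrated with a minimal sketch. The toy sequence below is invented for illustration and is not the MAGa1620 region; the function names are hypothetical. The codon table is the mycoplasma genetic code (NCBI translation table 4), in which TGA encodes tryptophan rather than stop.

```python
import re

# Genetic code table 4 (Mycoplasma/Spiroplasma): as the standard code,
# except that TGA encodes Trp (W) instead of a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}


def translate(seq, frame=0):
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODONS.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def three_frames(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


def set_tract_length(seq, base, new_len, min_run=6):
    """Resize the first run of `base` that is at least `min_run` long."""
    pattern = re.compile(f"{base}{{{min_run},}}")
    return pattern.sub(base * new_len, seq, count=1)


if __name__ == "__main__":
    # Invented toy sequence: start codon, one Lys codon, a G9 tract, short tail.
    toy = "ATGAAA" + "G" * 9 + "CATTTTTAA"
    print(translate(toy))                              # G9: tail read in frame
    print(translate(set_tract_length(toy, "G", 10)))   # G10: tail frameshifted
    print(three_frames(toy))
```

In the toy example, the G9 version reads through to the stop codon, whereas the G10 version shifts the downstream codons into another frame and loses that stop. The same kind of variation is what the caption proposes could restore a complete signal peptide upstream of the annotated start.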