Mali Mærk, Jostein Johansen, Helga Ertesvåg, Finn Drabløs, Svein Valla.
Abstract
BACKGROUND: Gene duplication and horizontal gene transfer are common processes in bacterial and archaeal genomes, and are generally assumed to result in either diversification or loss of the redundant gene copies. However, a recent analysis of the genome of the soil bacterium Azotobacter vinelandii DJ revealed an abundance of highly similar homologs among carbohydrate metabolism genes. In many cases these multiple genes did not appear to be the result of recent duplications, or to function only as a means of stimulating expression by increasing gene dosage, as the homologs were located in varying functional genetic contexts. Based on these initial findings we here report in-depth bioinformatic analyses focusing specifically on highly similar intra-genome homologs, or synologs, among carbohydrate metabolism genes, as well as an analysis of the general occurrence of very similar synologs in prokaryotes.
Year: 2014 PMID: 24625193 PMCID: PMC4022178 DOI: 10.1186/1471-2164-15-192
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Distribution of carbohydrate metabolism and transport protein families in A. vinelandii and pseudomonad genomes. The number of intra-genome protein families (identified using OrthoMCL) assigned to the functional categories a) carbohydrate metabolism and b) carbohydrate transport in the genomes of A. vinelandii DJ and 15 fully sequenced strains of the genus Pseudomonas. For carbohydrate metabolism, the number of synologs is clearly higher in the A. vinelandii genome than in the Pseudomonas strains included in this study. The total number of families for each genome is shown as stacked columns, with block patterning indicating the number of proteins in the identified families. Members of the same intra-genome protein family were regarded as synologs.
Summary of statistical data for SEED data sets
| Statistic | All synolog pairs: Median ± MAD | All synolog pairs: Min | All synolog pairs: Max | ≥90% protein sequence identity: Median ± MAD | ≥90% protein sequence identity: Min | ≥90% protein sequence identity: Max |
|---|---|---|---|---|---|---|
| Number of carbohydrate metabolism synologs | 49.0 ± 35.0 | 0 | 394 | 0.0 ± 0.0 | 0 | 47 |
| Number of carbohydrate metabolism synolog groups | 20.0 ± 13.0 | 0 | 128 | 0.0 ± 0.0 | 0 | 16 |
| Average protein sequence identity between synolog pairs² [%] | 36.8 ± 4.2 | 13.4 | 100.0 | 97.3 ± 1.8 | 90.0 | 100.0 |
| Synolog fraction of carbohydrate metabolism genes [%] | 30.0 ± 9.7 | 0.0 | 85.7 | 0.0 ± 0.0 | 0.0 | 34.6 |
¹ Median, minimum and maximum values for the carbohydrate metabolism gene set extracted from 943 prokaryote genomes in the SEED database [19], with no set cutoff and with a cutoff set at 90% protein sequence identity between synologs. Synologs are here defined as intra-genome sequences assigned to the same FIGfam (see text). The synolog fraction describes the ratio of the total number of synologs relative to the total number of genes in a genome. MAD is median absolute deviation. The median number of carbohydrate metabolism genes in the data set was 160.0 ± 74.0. The minimum and maximum numbers of carbohydrate metabolism genes observed among the included genomes were 4 and 585 genes, respectively.
² Calculated from the genomes containing carbohydrate metabolism synologs at the given cutoff.
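The "Median ± MAD" entries in the table above can be reproduced with a few lines of code. The sketch below is only an illustration of the statistic, assuming MAD is the unscaled median of absolute deviations from the median; the function name `median_and_mad` and the sample counts are hypothetical and are not taken from the study.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, MAD) for a non-empty sequence of numbers.

    MAD is computed here as the plain (unscaled) median of the
    absolute deviations from the median. This is an assumption;
    the source does not say whether a scaling factor was applied.
    """
    values = list(values)
    if not values:
        raise ValueError("values must be non-empty")
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


# Hypothetical per-genome synolog counts, for illustration only.
example_counts = [0, 12, 49, 51, 394]
med, mad = median_and_mad(example_counts)
print(f"{med:.1f} ± {mad:.1f}")
```

The median and MAD are robust to extreme genomes such as the one with 394 synologs, which is presumably why they are reported instead of mean and standard deviation.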
Figure 2. Distribution of synolog groups, synolog fractions and average synolog sequence identity for carbohydrate metabolism genes. Distribution of a) number of synolog groups, b) synolog fractions and c) average synolog sequence identity at the protein level for carbohydrate metabolism synologs in a data set consisting of 943 [a) and b)] or 897 [c)] bacterial and archaeal genomes, illustrating that a high fraction of very similar carbohydrate metabolism synologs is rare among the genomes included in this analysis. Synologs are here defined as intra-genome sequences assigned to the same FIGfam in the SEED database [19]. The synolog fraction describes the ratio of the total number of carbohydrate metabolism synologs relative to the total number of carbohydrate metabolism genes in a genome.
Figure 3. Occurrence of synologs with ≥90% identity among carbohydrate metabolism proteins in bacteria and archaea. Number of carbohydrate metabolism synolog groups with internal protein sequence identity ≥90% identified in 943 investigated prokaryotic genomes. Synolog groups are here defined as two or more intra-genome sequences assigned to the same FIGfam in the SEED database [19].
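The definitions used in these captions (synolog group, synolog fraction, and the ≥90% identity cutoff) can be sketched in code. This is a minimal illustration and not the authors' pipeline: the function names, gene IDs, family labels and identity values are hypothetical, and the rule for applying the cutoff (keep genes that share at least the cutoff identity with one other member of their family) is an assumption, since the captions do not spell it out.

```python
from collections import defaultdict


def synolog_groups(gene_to_family):
    """Group the genes of one genome by family label.

    Only families with two or more members count as synolog groups.
    """
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)
    return {fam: sorted(genes) for fam, genes in families.items() if len(genes) >= 2}


def synolog_fraction(gene_to_family):
    """Percentage of the genes in the set that belong to a synolog group."""
    if not gene_to_family:
        return 0.0
    groups = synolog_groups(gene_to_family)
    n_synologs = sum(len(genes) for genes in groups.values())
    return 100.0 * n_synologs / len(gene_to_family)


def filter_by_identity(groups, pair_identity, cutoff=90.0):
    """Keep genes sharing >= cutoff % identity with another group member.

    pair_identity maps frozenset({gene_a, gene_b}) to percent identity.
    Assumed rule: missing pairs are treated as 0% identity.
    """
    kept = {}
    for fam, genes in groups.items():
        similar = set()
        for i, a in enumerate(genes):
            for b in genes[i + 1:]:
                if pair_identity.get(frozenset((a, b)), 0.0) >= cutoff:
                    similar.update((a, b))
        if len(similar) >= 2:
            kept[fam] = sorted(similar)
    return kept


# Hypothetical genome with five carbohydrate metabolism genes.
genes = {"g1": "FAM_A", "g2": "FAM_A", "g3": "FAM_B", "g4": "FAM_C", "g5": "FAM_C"}
identities = {frozenset(("g1", "g2")): 97.3, frozenset(("g4", "g5")): 36.8}

groups = synolog_groups(genes)
print(len(groups), synolog_fraction(genes))
print(filter_by_identity(groups, identities))
```

In this toy genome two families qualify as synolog groups without a cutoff, but only one survives the 90% threshold, which mirrors the drop seen between the two halves of the summary table.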
The genomes with the highest levels of very similar carbohydrate metabolism synologs
| Genome | Total synolog groups | Central carbohydrate metabolism | Organic acids | Di- and oligosaccharides | Fermentation | One-carbon metabolism | CO₂ fixation | Aminosugars | Polysaccharides | Carbohydrates - no sub-category | Sugar alcohols | Monosaccharides |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | | | | |
| | 16 | 3 | - | 2 | 5 | - | - | - | - | - | - | 6 |
| | 15 | 8 | - | 2 | 5 | - | - | - | - | - | - | - |
| | 12 | 3 | 1 | - | 4 | - | - | - | - | - | 1 | 3 |
| | 10 | 2 | - | - | - | - | - | - | - | - | 7 | 1 |
| | 9 | 4 | 4 | - | - | - | - | - | - | - | 1 | - |
| | 9 | 3 | - | - | 3 | - | - | - | - | - | - | 3 |
| | 9 | - | - | - | 5 | 1 | - | - | - | - | - | 3 |
| | 9 | 2 | - | - | - | - | 6 | - | - | - | - | 1 |
| | 9 | 8 | - | - | 1 | - | - | - | - | - | - | - |
| | 8 | 4 | 1 | - | - | 3 | - | - | - | - | - | - |
| | 8 | 3 | 1 | - | 2 | 2 | - | - | - | - | - | - |
| | 8 | 2 | - | - | 4 | 1 | - | - | - | - | - | 1 |
| | | | | | | | | | | | | |
| | 12 | 10 | - | 2 | - | - | - | - | - | - | - | - |
| | 8 | 7 | - | 1 | - | - | - | - | - | - | - | - |
| | | | | | | | | | | | | |
| | 16 | 6 | - | 1 | 1 | 1 | - | - | - | - | - | 7 |
| | 16 | 2 | - | 5 | - | 1 | - | - | - | 1 | 2 | 5 |
| | 12 | 3 | - | 2 | - | - | - | - | - | - | 2 | 5 |
| | 11 | 3 | - | 6 | 1 | - | - | 1 | - | - | - | - |
| | 8 | - | - | - | - | - | - | - | - | - | 8 | - |
| | | | | | | | | | | | | |
| | 9 | 1 | - | 6 | - | - | - | - | - | 1 | 1 | - |
¹ The table lists the twenty genomes with the largest number of synolog groups among carbohydrate metabolism genes when a threshold of at least 90% amino acid sequence identity was used. The data set was extracted from the SEED database [19] and synologs were defined as intra-genome sequences assigned to the same FIGfam (see text). The total number of such synolog groups in these genomes as well as their distribution in the eleven subcategories defined in the SEED database is shown. The median number of synolog groups for the genomes in this data set was 2.0 ± 1.0.
² Opportunistic pathogen.
³ All synologs in this table were evaluated manually with regard to genomic context. The manual evaluation revealed that several of the synologs in V. cholerae MZO-3, E. coli B7A, S. pneumoniae OXC141, S. mitis NCTC 12261 and E. coli E110019 might be mistakenly identified as highly similar synologs due to overlapping contigs or the presence of truncated sequences. These sequences were therefore disregarded in the interpretation of the results.
Figure 4. Genomic contexts for carbohydrate metabolism synologs in A. vinelandii DJ with ≥90% identity. The figure shows that the highly similar A. vinelandii DJ carbohydrate metabolism synologs identified in this study are found in differing genomic contexts. Synologs are here defined as intra-genome sequences assigned to the same FIGfam in the SEED database [19], and a threshold was set to include only synologs displaying at least 90% protein sequence identity. Synolog groups are highlighted as same-coloured arrows. Striped arrows represent other genes annotated as carbohydrate metabolism genes, checkered arrows represent genes annotated as aromatic compounds metabolism genes, meshed arrows represent genes annotated as electron transport genes, white arrows represent genes annotated as belonging to other functional categories, and dotted arrows represent genes annotated as encoding hypothetical proteins or proteins of unknown function. Genes that had not been assigned a gene name in the annotated A. vinelandii DJ genome are marked with the number corresponding to their respective gene IDs, which in the genome annotation are written in the form Avin##### [18].