Takeshi Tsuru, Ichizo Kobayashi.
Abstract
It has been assumed that an open reading frame (ORF) represents a unit of gene evolution as well as a unit of gene expression and function. In the present work, we report a case in which a unit comprising the 3' region of an ORF, the downstream intergenic region, and the 5' region of the next downstream ORF has been conserved and has served as the unit of gene evolution. The genes are tandem paralogous genes from the bacterium Staphylococcus aureus, for which more than ten entire genomes have been sequenced. We compared these multiple genome sequences at a locus for the lpl (lipoprotein-like) cluster (encoding lipoprotein homologs presumably related to their host interaction) in the genomic island termed νSaα. A highly conserved nucleotide sequence found within every lpl ORF is likely to provide a site for homologous recombination. Comparison of phylogenies of the 5'-variable region and the 3'-variable region within the same ORF revealed significant incongruence. In contrast, pairs of the 3'-variable region of an ORF and the 5'-variable region of the next downstream ORF gave more congruent phylogenies, with distinct groups of conserved pairs. The intergenic region seemed to have coevolved with the flanking variable regions. Multiple recombination events at the central conserved region appear to have caused various types of rearrangements among strains, shuffling the two variable regions in one ORF but maintaining a conserved unit comprising the 3'-variable region, the intergenic region, and the 5'-variable region spanning adjacent ORFs. This result has a strong impact on our understanding of gene evolution because most gene lineages underwent tandem duplication and then diversified. This work also illustrates the use of multiple genome sequences for high-resolution evolutionary analysis within the same species.
Year: 2008 PMID: 18765438 PMCID: PMC2568036 DOI: 10.1093/molbev/msn192
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
The lpl homologs in Staphylococcus aureus genome and their phylogenetic tree. (A) Location of four lpl loci, Locus 0 through Locus III, on the genome of strain N315. Note that the lpl homologs are found in the corresponding loci in all the sequenced S. aureus strains. (B) A nucleotide NJ phylogenetic tree for the lpl ORFs and their homologs in two other Staphylococcus species. The uncondensed version of this tree is presented in supplementary figure S1 (Supplementary Material online).
Structure of lpl genes (top) with similarity plots for nucleotide sequences (middle) and for amino acid sequences (bottom). The central conserved region is highlighted by gray shading. A predicted signal peptide region is indicated together with a conserved cysteine residue at the C-terminus.
Phylogenetic comparison of the 5′ regions and 3′ regions of lpl ORFs. A nucleotide NJ phylogeny for 5′-variable regions and one for 3′-variable regions were compared with each other by connecting OTUs in a pair of 3′-variable region of an ORF and 5′-variable region of its downstream ORF (A) and in a pair within an ORF (B). The bootstrap values (%) were obtained from 1,000 resamplings. Groups of the conserved pairs are indicated in different colors of connecting lines in (A). A paraphyletic group, B1, and the other monophyletic groups in each phylogeny corresponding to the groups of the pairs in (A) are indicated by boxes both in (A) and (B).
Statistics of Nucleotide Sequence Alignments for lpl ORFs and Their Intergenic Region
| Name | Number of Sequences | Length in Alignment (bp) | Pairwise Identity, Minimum–Maximum (%) | Pairwise Identity, Average (%) |
|---|---|---|---|---|
| 5′-variable region | | | | |
| All | 48 | 120 | 41–100 | 65 |
| a | 6 | 96 | 89–100 | 94 |
| b | 3 | 96 | 94–98 | 96 |
| c | 3 | 87 | 100 | 100 |
| d | 2 | 96 | NR | 93 |
| e | 3 | 108 | 99–100 | 99 |
| f | 3 | 108 | 92–96 | 94 |
| g | 1 | 108 | NR | NR |
| h | 2 | 108 | NR | 98 |
| i | 2 | 108 | NR | 93 |
| j | 2 | 108 | NR | 95 |
| k | 11 | 96 | 80–100 | 88 |
| l | 1 | 96 | NR | NR |
| m | 9 | 120 | 90–100 | 96 |
| Central conserved region | | | | |
| All | 48 | 132 | 79–100 | 88 |
| 3′-variable region | | | | |
| All | 48 | 574 | 52–100 | 66 |
| A | 8 | 561 | 87–100 | 94 |
| B1 | 3 | 543 | 98–100 | 99 |
| B2 | 2 | 543 | NR | 99 |
| C | 2 | 561 | NR | 98 |
| D | 3 | 546 | 98–100 | 99 |
| E | 3 | 561 | 91–92 | 92 |
| F | 3 | 565 | 88–93 | 91 |
| G | 6 | 561 | 91–100 | 94 |
| H | 2 | 557 | NR | 99 |
| I | 1 | 567 | NR | NR |
| J | 3 | 552 | 99–100 | 99 |
| K | 2 | 558 | NR | 89 |
| L | 1 | 552 | NR | NR |
| M | 9 | 567 | 82–100 | 90 |
| Intergenic region | | | | |
| A–a | 6 | 18 | 100 | 100 |
| B1–e | 3 | 31 | 100 | 100 |
| B2–h | 2 | 31 | NR | 97 |
| C–g | 1 | 18 | NR | NR |
| C–k | 1 | 18 | NR | NR |
| D–f | 3 | 30 | 97–100 | 98 |
| E–i | 2 | 51 | NR | 98 |
| F–b | 3 | 47 | 89–98 | 93 |
| G–k | 3 | 47 | 92–96 | 94 |
| H–j | 2 | 59 | NR | 100 |
| J–c | 3 | 69 | 97–100 | 98 |
| K–d | 2 | 28 | NR | 93 |
| L–l | 1 | 60 | NR | NR |
| M–m | 9 | 223 | 81–100 | 92 |
NOTE.—NR, not relevant.
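The identity columns in the table above summarize all pairwise comparisons within a group of aligned sequences. The following Python sketch shows one way such a summary could be computed. It is not the authors' procedure: it assumes identity is counted only over alignment columns in which neither sequence has a gap, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    Other conventions (e.g. counting gaps as mismatches) give other values.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def identity_summary(aligned):
    """Return (minimum, maximum, average) percent identity over all pairs.

    Returns None when fewer than two sequences are given, since no pair exists.
    """
    values = [pairwise_identity(a, b) for a, b in combinations(aligned, 2)]
    if not values:
        return None
    return min(values), max(values), sum(values) / len(values)


# Hypothetical toy alignment, not data from the paper.
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]
print(identity_summary(toy))
```

With a single sequence no pair exists and the function returns None, which matches the rows reported as NR for groups of one sequence.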
Multiple alignments of the lpl intergenic regions. “A–a” represents, for example, an intergenic region sandwiched by the 3′-variable region of “A” group and the 5′-variable region of “a” group. A putative ribosome-binding site (Novick 1991), which could be found in all except for “K–d”, is indicated by asterisks.
(A) Schematic maps of lpl ORFs. Naming and coloring for 5′-variable region and 3′-variable region are after the grouping in figure 4. For an indel found between USA300 and COL and one between COL and NCTC8325, an apparently deleted region and the regions involved in recombination are indicated by dotted lines and numbered thick lines, respectively. (B) Alignment of the regions 1, 2, and 3 in (A) suggesting a recombination relationship between them (1 × 2 → 3). The central conserved region is indicated above the alignment. (C) Alignment of the regions 3, 4, and 5 in (A) suggesting a recombination relationship between them (3 × 4 → 5).
Structure of genes of SA1317 homologs (top) with similarity plots for nucleotide sequences (middle) and for amino acid sequences (bottom). A central conserved region is highlighted by gray shading.
Phylogenetic comparison of the 5′ regions and 3′ regions of SA1317 homologs. A nucleotide NJ phylogeny for 5′-variable regions and one for 3′-variable regions were compared with each other by connecting OTUs in a pair of 3′-variable region of an ORF and 5′-variable region of its downstream ORF (A) and in a pair within an ORF (B). The bootstrap values (%) were obtained from 1,000 resamplings. Groups in each phylogeny were assigned so that the mutual evolutionary distances within a group are equal to or shorter than 0.15.
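The grouping rule in this caption (all mutual distances within a group at most 0.15) can be illustrated with a small Python sketch. The caption does not say how the groups were actually derived from the tree, so the greedy complete-linkage merging used here is an assumption, and the function name, OTU names, and distances are hypothetical.

```python
def group_by_max_distance(names, dist, threshold=0.15):
    """Group OTUs so that every within-group distance is <= threshold.

    Assumption: greedy complete-linkage agglomeration. Two clusters are
    merged only if the largest distance between their members stays within
    the threshold; the closest such pair is merged first.
    `dist` maps unordered name pairs, given as tuples, to distances.
    """
    clusters = [[n] for n in names]

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def linkage(c1, c2):
        # Complete linkage: the largest member-to-member distance.
        return max(d(x, y) for x in c1 for y in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = linkage(clusters[i], clusters[j])
                if link <= threshold and (best is None or link < best[0]):
                    best = (link, i, j)
        if best is None:
            break  # no pair of clusters can be merged within the threshold
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical distances, not values from the paper.
toy_names = ["orf1", "orf2", "orf3", "orf4"]
toy_dist = {
    ("orf1", "orf2"): 0.05, ("orf1", "orf3"): 0.30, ("orf1", "orf4"): 0.40,
    ("orf2", "orf3"): 0.35, ("orf2", "orf4"): 0.38, ("orf3", "orf4"): 0.10,
}
print(group_by_max_distance(toy_names, toy_dist))
```

Complete linkage is chosen because it guarantees the stated property directly: a merge is accepted only when every cross-cluster distance is within the threshold.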
Maps of SA1317 homolog clusters in various Staphylococcus aureus strains. The SA1317 homologs are drawn in bold lines. Naming of their 5′-variable region and 3′-variable region is after the tree-based grouping in figure 9. The larger intervening ORF, SAU1320 and its homologous genes, is observed in all the strains except for MRSA252, whereas the shorter intervening ORF, SAA1377 and its homologs, is observed in strains USA300, COL, and NCTC8325. SAB1350 and SAB1349 in RF122 are truncated genes homologous to SAU1320. Insertion of a prophage into the larger intervening ORF observed in USA300, NCTC8325, and MW2 is indicated by a black triangle.
An elementary process of diversification through homologous recombination between the central conserved regions. A crossing-over will change combinations of the 5′-variable region and the 3′-variable region of an ORF, but linkage of 3′-variable region of an ORF, its downstream intergenic region, and 5′-variable region of its downstream ORF will be maintained.
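The elementary process described in this caption can be made concrete with a toy model. The Python sketch below is illustrative only: the representation of a tandem array, the `Orf` type, the function names, and the segment labels are all hypothetical. It models a single crossing-over at the central conserved region and checks that the within-ORF combination of variable regions changes while every 3′-variable, intergenic, 5′-variable unit in the recombinant already existed in the parent.

```python
from typing import List, NamedTuple, Optional


class Orf(NamedTuple):
    five: str                  # label of the 5'-variable region
    three: str                 # label of the 3'-variable region
    intergenic: Optional[str]  # region downstream of this ORF (None if last)


def crossover(x: List[Orf], i: int, y: List[Orf], j: int) -> List[Orf]:
    """Cross over at the central conserved regions of x[i] and y[j].

    The recombinant keeps x up to the conserved region of ORF i and
    continues with y from the conserved region of ORF j onward.
    """
    hybrid = Orf(x[i].five, y[j].three, y[j].intergenic)
    return list(x[:i]) + [hybrid] + list(y[j + 1:])


def orf_pairs(array: List[Orf]):
    """Set of (5'-variable, 3'-variable) combinations found within ORFs."""
    return {(o.five, o.three) for o in array}


def linked_units(array: List[Orf]):
    """Set of (3'-variable, intergenic, 5'-variable) units spanning adjacent ORFs."""
    return {(up.three, up.intergenic, down.five)
            for up, down in zip(array, array[1:])}


# Hypothetical three-ORF tandem array.
parent = [Orf("a1", "B1", "i1"), Orf("a2", "B2", "i2"), Orf("a3", "B3", None)]

# Unequal crossing-over between two copies of the same array (ORF 0 with ORF 1).
deletion = crossover(parent, 0, parent, 1)

print(deletion)
print(orf_pairs(deletion) - orf_pairs(parent))        # new within-ORF combinations
print(linked_units(deletion) <= linked_units(parent)) # spanning units are preserved
```

The reciprocal product, `crossover(parent, 1, parent, 0)`, gives a four-ORF array, which corresponds to duplication in this toy model.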
Formation of various types of rearrangements is explained by multiple rounds of crossing-over events at the central conserved region (gray bar). (A) An unequal crossing-over between sister chromosomes ([b]) will cause deletion ([a] to [c]) and duplication ([a] to [d]). An additional unequal crossing-over ([e]) will result in apparent conversion ([a] to [f]). An intra-chromosomal, unequal crossing-over ([g]) can form a circle ([h]) and a deletion ([a] to [i]), and ensuing re-integration of the circle ([j]) will result in apparent translocation ([a] to [k]). The two routes of deletion formation ([a] to [c]; [a] to [i]) can result in apparent substitution ([c] and [i]). (B) Inter-molecular recombination involving horizontal gene transfer also explains formation of various types of rearrangements. Double cross-overs between a donor [l] and a recipient [m] will cause deletion ([m] to [n]; [m] to [o]), resulting in apparent substitution ([n] and [o]). Additional inter-molecular recombination between [n] and [o] can result in an apparent conversion ([m] to [p]). Another round of recombination between [p] and [n] can result in an apparent translocation ([m] to [q]).
Tandem Gene Clusters in Staphylococcus aureus N315 Genome
| Cluster Name | Genes in N315 |
|---|---|
| Lipoprotein-like (Lpl; Locus I) | SAU0092, SAU0092, SAU0093, SAU0094, SAU0095, SAU0096 |
| Hypothetical protein | SAU0282, SAU0286, SAU0287, SAU0288, SAU0289, SAU0290 |
| Superantigen-like (Ssl; νSaα) | SAU0382, SAU0383, SAU0384, SAU0385, SAU0386, SAU0387, SAU0388, SAU0389, SAU0390 |
| Lipoprotein-like (Lpl; νSaα) | SAU0396, SAU0397, SAU0398, SAU0399, SAU0400, SAU0401, SAU0402, SAU0403, SAU0404, SAU0405 |
| Ser–Asp–rich fibrinogen-binding protein | SAU0519, SAU0520, SAU0521 |
| Superantigen-like protein | SAU1009, SAU1010, SAU1011 |
| ECM-binding protein homologue | SAU1267, SAU1268 |
| Hypothetical protein | SAU1317, SAU1318, SAU1319, SAU1321 |
| Serine protease-like (Spl; νSaβ) | SAU1627, SAU1628, SAU1629, SAU1630, SAU1631 |
| Enterotoxin | SAU1642, SAU1643, SAU1644, SAU1645, SAU1646, SAU1647, SAU1648 |
| Hemolysin | SAU2207, SAU2208, SAU2209 |
| Hypothetical protein | SAU2263, SAU2264, SAU2265 |
| Lipoprotein-like (Lpl; Locus III) | SAU2269, SAU2273, SAU2274, SAU2275 |
| Fibronectin-binding protein | SAU2290, SAU2291 |