Blessing Tawari, Ibne Karim M Ali, Claire Scott, Michael A Quail, Matthew Berriman, Neil Hall, C Graham Clark.
Abstract
Genome sequencing of the protistan parasite Entamoeba histolytica HM-1:IMSS revealed that almost all the tRNA genes are organized into tandem arrays that make up over 10% of the genome. The 25 distinct array units contain up to 5 tRNA genes each and some also encode the 5S RNA. Between adjacent genes in array units are complex short tandem repeats (STRs) resembling microsatellites. To investigate the origins and evolution of this unique gene organization, we have undertaken a genome survey to determine the array unit organization in 4 other species of Entamoeba (Entamoeba dispar, Entamoeba moshkovskii, Entamoeba terrapinae, and Entamoeba invadens) and have explored the STR structure in other isolates of E. histolytica. The genome surveys revealed that E. dispar has the same array unit organization as E. histolytica, including the presence and numerical variation of STRs between adjacent genes. However, the individual repeat sequences are completely different from those in E. histolytica. All other species of Entamoeba studied also have tandem arrays of clustered tRNA genes, but the gene composition of the array units often differs from that in E. histolytica/E. dispar. None of the other species' arrays exhibit the complex STRs between adjacent genes, although simple tandem duplications are occasionally seen. The degree of similarity in organization reflects the phylogenetic relationships among the species studied. Within individual isolates of E. histolytica, most copies of the array unit are uniform in sequence, with only minor variation in the number and organization of the STRs. Between isolates, however, substantial differences in STR number and organization can exist, although the individual repeat sequences tend to be conserved. The origin of this unique gene organization in the genus Entamoeba clearly predates the common ancestor of the species investigated to date, and its function remains unclear.
Year: 2007 PMID: 17974548 PMCID: PMC2652664 DOI: 10.1093/molbev/msm238
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Phylogenetic relationships among Entamoeba species. The tree depicted is redrawn from that in Clark, Kaffashian, et al. (2006). Species producing cysts with different nuclear number are indicated. Species referred to in this work are shown in boldface. The scale bar represents the evolutionary distance equivalent to 0.1 changes per site.
Comparative Organization of tRNA Array Units in Entamoeba Species
| tRNA | E. histolytica / E. dispar | E. moshkovskii | E. terrapinae | E. invadens |
|---|---|---|---|---|
| AlaAGC | [AAGC] | [AAGC] | [AAGC], [ADSS] | [AAGC] |
| AlaTGC | [ASD] | [ADSSD] | [ADSD], [ADRML] | [ASS] |
| AlaCGC | [ALL] | [ALL] | [ALLSL5] | [ALI] |
| ArgACG | [R5] | [R5] | [IR] | [VMEDR5E] |
| ArgCCG | # | # | # | # |
| ArgCCT | [RT] | [RT] | [PTRL] | [PPTRL] |
| ArgTCG | [MR] | [MR] | [MRM5], [ADRML] | [MR] |
| ArgTCT | [RTCT] | [RTCT] | [RTCT] | [RTCT] |
| AsnGTT | [NK] | [NK] | [NK1], [NK2] | [NKQCK], [NK] |
| AspGTC | [ASD], [SD] | [ADSSD] | [ADSD], [ADSS], [ADRML] | [VMEDR5E], [FVDTX] |
| CysGCA | [SPPCK], [SQCK] | [SPPPCK], [SQCK] | [CK], [SQCK] | [NKQCK] |
| GlnCTG | [SQCK] | [SQCK] | [SQCK], [SQK] | [NKQCK] |
| GlnTTG | [TQ] | [TQ] | [VQ51], [VQ52] | [TQQ] |
| GluCTC | [VME5] | [VME5] | [VME] | [VMEDR5E], [EIDLLL] |
| GluTTC | [YE] | [YE] | [YE] | [VMEDR5E], [YE] |
| GlyCCC | # | # | # | # |
| GlyGCC | [GGCC] | [GGCC] | [GGCC] | [GGCC] |
| GlyTCC | [GTCC] | [GTCC] | [GTCC] | #, #SGI |
| HisGTG | [HGTG] | [HGTG] | [HGTG] | [HGTG] |
| IleAAT | [WI] | [WI1], [WI2] | [IR], [WI1], [WI2] | [WI], [EIDLLL] |
| IleTAT | # | # | # | [ALI], #SGI |
| LeuAAG | [LT] | [LT], [PL] | [PTRL], [LT], [PL] | [PPTRL] |
| LeuCAG | [LS] | [LS] | [ALLSL5], [LLLSL5] | [LS] |
| LeuCAA | [ALL] | [ALL] | [ALLSL5], [LLLSL5] | [ALI], [EIDLLL] |
| LeuTAA | [ALL], # | [ALL], # | [ALLSL5], [ADRML], [LLLSL5] | [EIDLLL] |
| LeuTAG | # | # | # | # |
| LysCTT | [NK] | [NK] | [NK1], [NK2] | [NKQCK], [NK] |
| LysTTT | [SPPCK], [SQCK] | [SPPPCK], [SQCK] | [SQK], [CK], [SQCK] | [NKQCK] |
| eMetCAT | [MR] | [MR] | [MRM5], [ADRML] | [MR] |
| iMetCAT | [VME5] | [VME5], [MV5] | [VME] | [VMEDR5E] |
| PheGAA | [VF] | [VF] | [VF] | [FVV5], [FVDTX] |
| ProCGG | [SPPCK] | [SPPPCK] | [SPPP] | [PPTRL] |
| ProAGG | [SPPCK] | [SPPPCK] | [SPPP] | [SPP] |
| ProTGG | [PTGG] | [SPPPCK] | [PTRL], [PL], [SPPP] | [SPP], [PPTRL] |
| SerAGA | [SPPCK], [SQCK] | [SPPPCK], [SQCK] | [SQK], [SQCK], [SPPP] | [SPP] |
| SerGCT | [ASD] | [ADSSD] | [ADSD], [ADSS] | [ASS] |
| SerCGA | [LS] | [LS] | [ALLSL5], [LLLSL5] | [LS], #SGI |
| SerTGA | [SD] | [ADSSD] | [ADSS] | [ASS] |
| ThrAGT | [LT] | [LT], [RT] | [PTRL], [LT] | [PPTRL] |
| ThrCGT | [TQ] | [TQ] | # | [TQQ] |
| ThrTGT | [TX] | [TX] | [TX] | [FVDTX] |
| TrpCCA | [WI] | [WI] | [WI1], [WI2] | [WI] |
| TyrGTA | [YE] | [YE] | [YE] | [YE] |
| ValCAC | [VME5] | [VME5] | [VQ51], [VQ52] | [FVV5] |
| ValGAC | [VF] | [VF] | [VF] | [FVV5], [FVDTX] |
| ValTAC | [V5] | [MV5] | [VME] | [VMEDR5E] |
Note.—The tRNA gene content of each array is shown using the single-letter amino acid code. The 5S RNA gene is identified by “5” when present. Where 2 distinct variants of an array exist, the numbers 1 and 2 are appended to the gene complement. The E. invadens array containing only 5S RNA genes is not listed. # indicates dispersed tRNA genes. The dispersed cluster of 3 genes in E. invadens is indicated by #SGI.
Unit organization in Entamoeba histolytica HM-1:IMSS (Clark, Ali, et al. 2006) and E. dispar SAW760 is identical except for [NK], which exists as 2 distinct arrays in E. histolytica but only 1 in E. dispar.
X is a gene encoding the same unidentified small RNA in one array in each species (Banerjee and Lohia 2003; Clark, Ali, et al. 2006).
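The bracket notation described in the note can be decoded mechanically. The following is a minimal sketch, not part of the original analysis: the function name `parse_array_unit` and the set `SINGLE_GENE_UNITS` are hypothetical, and the set is an inference from the table, where rows such as AlaAGC and ArgTCT show labels ([AAGC], [RTCT]) that read as one amino acid letter followed by the anticodon of a single-gene array.

```python
import re

# Assumption: these labels name single-gene arrays as "amino acid + anticodon".
# They are read off the table rows (AlaAGC, ArgTCT, GlyGCC, GlyTCC, HisGTG, ProTGG).
SINGLE_GENE_UNITS = {"AAGC", "RTCT", "GGCC", "GTCC", "HGTG", "PTGG"}


def parse_array_unit(label, single_gene_units=SINGLE_GENE_UNITS):
    """Split a label such as '[VMEDR5E]' into (genes, variant).

    genes   -- list of single-letter amino acid codes, with '5S' for the
               5S RNA gene; 'X' (the unidentified small RNA) is kept as is.
    variant -- 1 or 2 when the note's variant number is appended, else None.
    """
    match = re.fullmatch(r"\[([A-Z5]+)([12]?)\]", label.strip())
    if match is None:
        # '#' and '#SGI' mark dispersed genes and are not array unit labels.
        raise ValueError(f"not an array unit label: {label!r}")
    body, variant = match.groups()
    if body in single_gene_units:
        genes = [body]  # one gene, named by amino acid letter plus anticodon
    else:
        genes = ["5S" if symbol == "5" else symbol for symbol in body]
    return genes, (int(variant) if variant else None)


if __name__ == "__main__":
    for label in ["[VMEDR5E]", "[NK2]", "[VQ51]", "[RTCT]"]:
        print(label, parse_array_unit(label))
```

Because the variant suffix is only ever 1 or 2 and the 5S RNA gene is written as 5, a trailing digit can be classified without ambiguity; the single-gene labels are the one case that needs an explicit lookup.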
Fig. 2. Mapping of array organization onto phylogenetic tree. The relationships among surveyed species from figure 1 (Entamoeba invadens branch shortened for simplicity) are shown adjacent to a depiction of the corresponding array. Arrows indicate the orientation of the tRNA/5S RNA gene and contain the single-letter amino acid code and anticodon for the encoded tRNA. (A) The array unit organization involving the gene encoding tRNA ValTAC. (B) The array unit organization involving the genes encoding tRNAs SerGCT and SerTGA.
Fig. 3. Array organization of genes encoding ProTGG. The relationships among surveyed species plus Entamoeba ecuadoriensis from figure 1 (Entamoeba invadens branch shortened for simplicity) are shown adjacent to a depiction of the corresponding tRNA-Pro-encoding arrays. Arrows indicate the orientation of the tRNA gene and contain the single-letter amino acid code and anticodon for the encoded tRNA.
Base Composition of Array Intergenic Regions
| Organism | Mean | Range | Mean | Range |
|---|---|---|---|---|
| | 80.0 | 77.7–83.7 | 70.8 | 62.5–76.3 |
| | 81.5 | 79.3–85.3 | 67.3 | 59.9–69.8 |
| | 69.3 | 63.2–71.4 | 75.8 | 56.6–79.7 |
| | 71.2 | 67.6–74.3 | 53.2 | 50.4–56.8 |
| | 81.8 | 66.4–84.7 | 56.9 | 51.0–74.7 |
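A mean and range of this kind can be computed from a set of intergenic sequences as sketched below. This is an illustration only: it assumes the tabulated values are percentages of A+T, which the surviving table does not state, and the function names `at_percent` and `summarize` and the example sequences are made up.

```python
def at_percent(sequence):
    """Percentage of A and T among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


def summarize(sequences):
    """Mean and range of A+T percentage over a collection of sequences."""
    values = [at_percent(seq) for seq in sequences]
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    toy_intergenic = ["AATTAATTAC", "ATATATGCAT", "TTTTAAAACG"]
    print(summarize(toy_intergenic))
```

Ambiguous bases such as N are excluded from the denominator so that low-quality positions in survey reads do not depress the percentage.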
Fig. 4. Intraspecific differences in STR organization in the Entamoeba histolytica R-R intergenic region. Sequence types identified through sequencing of STR regions from different isolates are shown. Blocks of STRs are indicated, with distinct repeat sequences being assigned different shading. The arrow indicates the position of the tRNA gene and contains the 3-letter amino acid code and anticodon for the encoded tRNA. The complete [RTCT] array unit is depicted as there is only 1 tRNA gene in this array.
Fig. 5. Intraspecific differences in STR organization in the Entamoeba histolytica STGA-D intergenic region. Sequence types identified through sequencing of STR regions from different isolates are shown. Blocks of STRs are indicated, with distinct repeat sequences being assigned different shading. The arrows indicate the position of the tRNA genes and contain the 3-letter amino acid code and anticodon for the encoded tRNA.
Fig. 6. Intraspecific differences in STR organization in the Entamoeba histolytica N-K2 intergenic region. Sequence types identified through sequencing of STR regions from different isolates are shown. Blocks of STRs are indicated, with distinct repeat sequences being assigned different shading. The arrows indicate the position of the tRNA genes and contain the 3-letter amino acid code and anticodon for the encoded tRNA.
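Blocks of repeats like those drawn in figures 4 to 6 can be located in a sequence with a simple scan for perfect tandem copies. The sketch below is not the method used in the study: it finds only exact, uninterrupted repeats, whereas the STRs described here are complex, and the function name `find_tandem_repeats`, its default thresholds, and the test sequence are all hypothetical.

```python
import re


def find_tandem_repeats(sequence, min_unit=2, max_unit=12, min_copies=3):
    """Return (start, unit, copies) for each block of perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest repeat unit first
    at each position, so 'TATATATA' is reported as 4 copies of 'TA' rather
    than 2 copies of 'TATA'.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    blocks = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        blocks.append((match.start(), unit, copies))
    return blocks


if __name__ == "__main__":
    # Made-up sequence with two repeat blocks, not data from the study.
    print(find_tandem_repeats("GGTATATATACCAGCAGCAGTT"))
```

Comparing the list of blocks between isolates, by unit sequence and copy number, gives a rough textual analogue of the shaded block diagrams; imperfect or interleaved repeats would need a dedicated tandem-repeat finder.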