Franc-Christophe Baurens, Stéphanie Bocs, Mathieu Rouard, Takashi Matsumoto, Robert N G Miller, Marguerite Rodier-Goud, Didier MBéguié-A-MBéguié, Nabila Yahiaoui.
Abstract
BACKGROUND: Comparative sequence analysis of complex loci such as resistance gene analog clusters makes it possible to estimate the degree of sequence conservation and the mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species, Musa acuminata (A genome) and Musa balbisiana (B genome), contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease, but little is known about the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW).
Year: 2010 PMID: 20637079 PMCID: PMC3017797 DOI: 10.1186/1471-2229-10-149
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Table 1. Features of M. balbisiana BAC sequences containing RGA08 clusters
| Feature | RGA cluster B1 | RGA cluster B2 | Flanking sequences B1 | Flanking sequences B2 |
|---|---|---|---|---|
| Size (bp) | 151959 | 131218 | 101407 | 92476 |
| Exons & ψexons (%) | 36.1 | 33.7 | 18.1 | 19.1 |
| Introns & ψintrons (%) | 13.4 | 13.3 | 33.7 | 30.1 |
| Intergenic (%) | 33.6 | 36.7 | 42.9 | 49.9 |
| TE (%) | 15.7 | 15.3 | 4.4 | 0 |
| SSR (%) | 1.2 | 0.9 | 0.9 | 0.9 |
| Gene & ψgene density (kb/gene) | 6.9 | 6.6 | 6 | 5.4 |
| Total predicted gene & ψgene number (RGA & ψRGA) | 22 (18) | 20 (13) | 17 | 17 |
| ψgene number (ψRGA) | 12 (8) | 14 (7) | 4 | 4 |
| Number of exon/gene | 1 | 1 | 5.2 | 5.7 |
| Number of gene & ψgene on the direct strand | 4 | 3 | 10 | 11 |
| Number of gene & ψgene on the complementary strand | 18 | 17 | 7 | 6 |
Percentages of exons and ψexons, introns and ψintrons, TE and SSR were computed relative to the size of the nucleotide sequence. The percentage of intergenic sequence is the remainder of the sequence. The five and four Gag-Polymerase-encoding genes of B1 and B2, respectively, were not taken into account in these statistics. ψ, pseudogene; TE, transposable element; SSR, simple sequence repeat; RGA, resistance gene analog.
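The footnote describes two derived quantities: gene density (sequence size divided by gene count) and the intergenic percentage (the remainder once the other categories are subtracted). The sketch below recomputes both from the values in the table; the function and variable names are illustrative and are not taken from the source. The recomputed figures should agree with the table to within rounding.

```python
# Values copied from the table above, per region:
# (size in bp, total gene + pseudogene count, exon %, intron %, TE %, SSR %)
REGIONS = {
    "RGA cluster B1": (151959, 22, 36.1, 13.4, 15.7, 1.2),
    "RGA cluster B2": (131218, 20, 33.7, 13.3, 15.3, 0.9),
    "Flanking B1": (101407, 17, 18.1, 33.7, 4.4, 0.9),
    "Flanking B2": (92476, 17, 19.1, 30.1, 0.0, 0.9),
}


def gene_density_kb(size_bp, n_genes):
    """Return the average number of kb per gene (or pseudogene)."""
    return size_bp / 1000 / n_genes


def intergenic_percent(exon, intron, te, ssr):
    """Return the share of the sequence not covered by the other categories."""
    return 100.0 - (exon + intron + te + ssr)


for name, (size_bp, n_genes, exon, intron, te, ssr) in REGIONS.items():
    density = gene_density_kb(size_bp, n_genes)
    intergenic = intergenic_percent(exon, intron, te, ssr)
    print(f"{name}: {density:.1f} kb/gene, {intergenic:.1f}% intergenic")
```

Because the published percentages are themselves rounded to one decimal place, the recomputed intergenic share can differ from the tabulated value by about 0.1.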
Figure 1. Dotplot analysis of the . The dotplot alignment between the B1 and B2 sequences reveals patterns of sequence colinearity and divergence. The X axis corresponds to the 253 kb MbP026I06-MbP032N20 contig and the Y axis corresponds to the 223 kb of BAC MbP036B13. Genes are represented as arrow boxes with the head indicating the direction of transcription; light grey arrow boxes represent RGAs and black boxes represent transposable elements. The location of the CT repeat microsatellite mMaCIR341, with the corresponding number of repeats, is indicated by arrows. Colinear patterns at the beginning and the end of the contig (highlighted at the top of the figure with a solid line) and the complex repetitive pattern corresponding to the RGA cluster (dashed line) are indicated.
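For readers unfamiliar with dotplots, the idea can be sketched with a simple shared-word search: every position pair where the two sequences share an identical k-mer becomes a dot, so colinear regions appear as diagonals and repeated regions as blocks of parallel lines. The source does not state which dotplot program or parameters were used; the function name, the word size `k`, and the toy sequences below are assumptions for illustration only.

```python
from collections import defaultdict


def dotplot_matches(seq_x, seq_y, k=4):
    """Return (x, y) start positions where seq_x and seq_y share a k-mer.

    Only forward-strand exact matches are reported in this sketch.
    """
    # Index every k-mer of the X-axis sequence by its start positions.
    index = defaultdict(list)
    for i in range(len(seq_x) - k + 1):
        index[seq_x[i:i + k]].append(i)

    # Look up every k-mer of the Y-axis sequence in that index.
    hits = []
    for j in range(len(seq_y) - k + 1):
        for i in index.get(seq_y[j:j + k], ()):
            hits.append((i, j))
    return hits


# Toy example: seq_y is an internal segment of seq_x, so the hits
# form a single diagonal offset by three positions.
hits = dotplot_matches("AAACCCGGGTTT", "CCCGGG", k=4)
print(hits)
```

Plotting the returned coordinates as points (for example with a scatter plot) gives the dotplot; real tools additionally handle reverse-complement matches and filter low-complexity noise.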
Figure 2. Genomic organization of the . M. balbisiana B1 and B2 haplotypes, corresponding to BAC MbP032N20c and MbP036B13 respectively, are represented. Predicted genes, pseudogenes and transposable elements are represented as plain arrowheads with the head indicating the direction of transcription. RGA08 genes coding for CC-NB-LRR proteins are indicated as green boxes with a corresponding letter. Genes tested for expression are labelled with a star. White arrows represent MTERF genes. Other predicted genes, shown as pink arrows, are numbered according to sequence annotation. Repetitive elements are represented as black arrows and numbered according to the list in Table 2. RE4 is a low complexity region with no clear structure. The 1 kb duplicated intergenic sequence is shown as orange boxes. Simple colinear patterns between haplotypes based on sequence similarity (blast2seq, E-value ≤ 1e-20) are highlighted for genes and intergenic sequences using light pink trapezoids in the flanking regions of the RGA clusters. Syntenic regions within the RGA clusters are highlighted using light green trapezoids. Identified unequal recombination events are highlighted with yellow trapezoids. The RE3 (Mooz) sequence, inverted between haplotypes, is indicated with twisted connections.
Table 2. Type, localization and characteristics of repeated elements in the RGA08 locus of M. balbisiana
| Name | Type | ID | Location (bp) | Length (bp) | 5' LTR length | 5' LTR TG | 5' LTR CA | 3' LTR length | 3' LTR TG | 3' LTR CA | TSD | Divergence time (MY) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RE1 | Copia-like element | MbP032N20c_te030 | 129477 | 2855 | - | - | - | - | - | - | - | - |
| RE1 | Copia-like element | MbP036B13_te010 | 100745 | 2882 | - | - | - | - | - | - | - | - |
| Clio | LARD | MbP032N20c_te010 | 28466 | 4441 | 382 | TG | CA | 382 | TG | CA | ATAC/ATAC | 0.294 |
| Clio | LARD | MbP032N20c_te040 | 148954 | 4441 | 382 | TG | CA | 382 | TG | CA | GGAG/GGAG | 1.189 |
| Clio | LARD | MbP036B13_te020 | 120343 | 4441 | 382 | TG | CA | 382 | TG | CA | GGAG/- | |
| Clio* | LARD | MbP036B13_te030 | 124784 | 1606 | - | - | - | - | - | - | - | |
| Mooz | Copia-like element | MbP032N20c_te050 | 166534 | 3849 | 115 | - | - | 112 | - | - | - | - |
| Mooz | Copia-like element | MbP032N20c_te060 | 225945 | 3346 | 231 | - | - | 447 | TG | CA | - | - |
| Mooz | Copia-like element | MbP036B13_te050 | 185390 | 3688 | 447 | TG | CA | 447 | TG | CA | CTTTG/CTTTC | 2.183 |
| Mooz* | Copia-like element | MbP036B13_te040 | 181339 | 2464 | 447 | TG | CA | - | - | - | CTTTG/- | |
| RE5 | Copia-like element | MbP032N20c_te020 | 124026 | 4301 | 94 | TG | - | 171 | TG | CA | ACCAC/ACCAC | - |
LTR features (length, presence of TG and CA at their ends) and Target Site Duplications (TSD) were determined using LTR_FINDER followed by manual annotation. For complete elements with a clearly identified TSD and with an LTR length exceeding the empirical limit of 300 bp, divergence time was calculated from the Kimura 2-parameter distance between LTRs, with a substitution rate of 0.9 × 10⁻⁸ mutations per site per year. Stars indicate TE fragments. MY, million years.
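The dating step in the footnote can be sketched as follows. The Kimura 2-parameter distance is K = -½ ln[(1 - 2P - Q) √(1 - 2Q)], where P and Q are the proportions of transitions and transversions between the two aligned LTRs. The conversion to time shown here, T = K / (2r), is the convention commonly used for LTR pairs (both copies accumulate substitutions independently after insertion); the source does not spell out the formula, so treat it as an assumption. Function names and the toy sequences are illustrative only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def ltr_divergence_time_my(distance, rate=0.9e-8):
    """Convert a distance to million years, assuming T = K / (2 * rate)."""
    return distance / (2 * rate) / 1e6


# Toy pair (not real LTR data): 380 aligned sites with two transitions.
ltr5 = "A" * 380
ltr3 = "GG" + "A" * 378
k = k2p_distance(ltr5, ltr3)
print(f"K = {k:.6f}, T = {ltr_divergence_time_my(k):.3f} MY")
```

With LTRs of a few hundred base pairs, a single substitution shifts the estimate by roughly 0.15 MY at this rate, which is one reason short LTRs (below the 300 bp limit mentioned above) are not dated.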
Figure 3. Phylogenetic relationships of . Different colored boxes indicate the RGA protein domains identified manually after a multiple alignment of RGA08 CDS. Blue boxes represent the N-terminal coiled-coil region (CC). Green boxes represent the nucleotide-binding region and the ARC domain shared by APAF-1, R and CED-4 proteins (NB-ARC domain). Pink boxes represent the 15 leucine-rich repeats (LRR). Yellow boxes represent simple sequence repeats. Sites predicted to be under positive selection according to PAML analysis are indicated by stars. The sizes of boxes and the gaps between domains are drawn to scale.
Figure 4. Unequal recombination breakpoints within the . (A) An intragenic unequal recombination event between RGA08X and RGA08W resulted in the current structure of the RGA08R sequence. The RDP software detected the region of sequence identity between RGA08X and RGA08R and indicated a recombination breakpoint on RGA08X. Manual inspection of RGA08 sequence alignments indicated RGA08W and RGA08X as the likely parental sequences. The percent nucleotide identity of RGA08X (red curve) and RGA08W (blue curve) is plotted along the sequence of RGA08R. A shift in sequence identity levels (black arrows) is observed, indicating a putative recombination event. (B) Unequal recombination mediated by duplicated 1 kb intergenic sequences (orange boxes). The intergenic sequences between RGA08K-RGA08L and RGA08O-RGA08P in B1 (top) have recombined to produce the current intergenic sequence structure between RGA08K-RGA08P in B2 (bottom). The percent nucleotide identity between the 1 kb intergenic sequences of B1 is plotted along the B2 sequence. High sequence identity is visible at the beginning of the RGA08K-RGA08L intergenic sequence (red curve) and at the end of the RGA08O-RGA08P intergenic sequence (blue curve). Black arrows indicate the breakpoint positions.
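The identity curves described in this caption can be approximated with a sliding-window comparison: the putative recombinant is compared to each candidate parent window by window, and a breakpoint shows up where the two identity profiles cross. This is only a sketch of the plotting idea, not the RDP method used in the source; the window and step sizes, the function name, and the toy sequences are assumptions.

```python
def sliding_identity(query, parent, window=50, step=10):
    """Percent identity of `query` to `parent` in sliding windows.

    Both sequences must come from the same alignment (equal length).
    Gap columns ('-') are ignored. Returns a list of (start, percent).
    """
    if len(query) != len(parent):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(query[start:start + window],
                            parent[start:start + window])
            if a != "-" and b != "-"
        ]
        if not pairs:
            continue
        matches = sum(a == b for a, b in pairs)
        profile.append((start, 100.0 * matches / len(pairs)))
    return profile


# Toy data: two fully divergent "parents" and a recombinant that
# switches from parent X to parent W at position 100.
parent_x = "ACGT" * 50
parent_w = "TGCA" * 50
recombinant = parent_x[:100] + parent_w[100:]

profile_x = sliding_identity(recombinant, parent_x)
profile_w = sliding_identity(recombinant, parent_w)
for (start, id_x), (_, id_w) in zip(profile_x, profile_w):
    print(start, id_x, id_w)
```

In the toy output, identity to parent X starts at 100% and falls to 0% past the switch point while identity to parent W does the opposite, mirroring the shift marked by the black arrows in the figure.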