Derek M Bickhart, Johann P Gogarten, Pascal Lapierre, Louis S Tisa, Philippe Normand, David R Benson.
Abstract
BACKGROUND: Genome analysis of three Frankia sp. strains has revealed a high number of transposable elements in two of the strains. Twelve out of the 20 major families of bacterial Insertion Sequence (IS) elements are represented in the 148 annotated transposases of Frankia strain HFPCcI3 (CcI3) comprising 3% of its total coding sequences (CDS). EAN1pec (EAN) has 183 transposase ORFs from 13 IS families comprising 2.2% of its CDS. Strain ACN14a (ACN) differs significantly from the other strains with only 33 transposase ORFs (0.5% of the total CDS) from 9 IS families.
Year: 2009 PMID: 19821988 PMCID: PMC2770080 DOI: 10.1186/1471-2164-10-468
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
IS family diversity in three Frankia strains
| IS3 | 2 | 11 | - | - | - | - |
| IS4 | 7 | 10 | - | 63 | 1 | 25 |
| IS5 | - | - | - | 2 | - | - |
| IS6 | - | 1 | - | 1 | - | 1 |
| IS30 | - | - | - | 2 | - | 1 |
| IS66 | - | - | - | 16 | - | 1 |
| IS110 | - | 9 | - | 5 | 1 | 7 |
| IS200 | - | 2 | - | 8 | - | - |
| IS605 | - | 9 | - | 6 | - | 13 |
| IS630 | - | - | - | 4 | 1 | 15 |
| ISL3 | 1 | - | - | - | - | 5 |
| Mutator | - | 5 | - | - | - | - |
| Tn3 | - | - | - | 1 | 2 | 3 |
| Unclassified Transposase | 4 | 16 | 5 | 35 | 7 | 32 |
| Cutoff 4 | 15 | 6 | 3 | - | - | - |
| Total | 29 | 69 | 8 | 143 | 12 | 103 |
1 Determined from the annotated transposase ORFs for EAN and CcI3. ACN transposases were reannotated after the BLAST search to show IS family diversity of transposase ORFs, as all transposases were originally annotated as "putative." Transposases in EAN and CcI3 were not reannotated.
2 Transposase ORFs that hit other ORFs in EAN and CcI3 but not in ACN.
3 Transposase ORFs that hit other ORFs in ACN and one of the other two strains, but not in all three Frankia strains.
4 The number of transposase ORFs in each strain that did not hit any sequence in the nr database with an E-value smaller than 10^-15. In all cases this was due to a sequence size of less than 80 amino acids.
Figure 1. Transposase ORF distribution in Frankia. Putatively shared ORFs were found using BLAST searches of each transposase ORF against the non-redundant (nr) database. Numbers on the arrow heads closest to the strain buttons indicate the number of transposase ORFs that the strain had in each homologue category. Double headed arrows represent ORFs that had BLAST hits between two strains of Frankia sp. The three innermost arrows pointing away from the button labelled "all" indicate the number of transposase ORFs from each strain that hit ORFs in both of the other strains. The three outermost arrows represent the number of ORFs in each strain that only had BLAST hits in other species. The numbers inside the buttons for each strain indicate the transposase ORFs that had no BLAST hits above an E-value of 10^-15.
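The homologue categories described in this legend can be illustrated with a minimal Python sketch. It assumes the BLAST results have already been reduced to (query ORF, source of the hit, E-value) tuples. The function name, the tuple layout, the category labels, and the choice to ignore hits within the query's own strain are all illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-15               # cutoff stated in the legend
STRAINS = {"ACN", "CcI3", "EAN"}     # the three Frankia strains


def classify_orfs(orf_strain, hits):
    """Assign each transposase ORF to a sharing category.

    orf_strain: dict mapping ORF id -> strain that carries the ORF.
    hits: iterable of (orf_id, source, e_value); source is a Frankia
          strain name, or any other label for a non-Frankia hit.
    """
    frankia_partners = defaultdict(set)
    has_other_hit = set()
    for orf_id, source, e_value in hits:
        if e_value >= E_VALUE_CUTOFF:
            continue                      # hit does not pass the cutoff
        if source in STRAINS:
            if source != orf_strain[orf_id]:
                frankia_partners[orf_id].add(source)
            # hits within the same strain are ignored (assumption)
        else:
            has_other_hit.add(orf_id)

    categories = {}
    for orf_id, own in orf_strain.items():
        partners = frankia_partners.get(orf_id, set())
        if len(partners) == 2:
            categories[orf_id] = "all"
        elif len(partners) == 1:
            categories[orf_id] = "shared:" + "+".join(sorted({own} | partners))
        elif orf_id in has_other_hit:
            categories[orf_id] = "other_species_only"
        else:
            categories[orf_id] = "no_hit"
    return categories
```

An ORF from CcI3 whose only passing Frankia hit is in EAN would be labelled `shared:CcI3+EAN` under this scheme.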
Figure 2. Percent of shared IS content in Frankia. Shared transposase ORFs as a percent of the total number of transposases annotated in Frankia sp. (364 ORFs). The majority of ORFs (~67.3%) are shared by strains CcI3 and EAN. Only 29.2% of all transposase ORFs are found in a single strain, with 19.1% unique to EAN. This distribution suggests that the majority of transposase ORFs have been maintained by and have proliferated within the Frankia strains despite geographic isolation.
Results of PSI-TBLASTN for top 5 transposase families
| Initial number 1 | 13 | 102 | 98 |
| Hits Identified | 17 | 160 | 170 |
| True Positives | 17 | 154 | 165 |
| Intergenic remnants (by PSI-BLAST) 2 | 2 | 21 | 32 |
| Reannotated 3 | 2 | 32 | 45 |
| False Positives 4 | 0 | 6 | 5 |
| False Negatives | 0 | 0 | 0 |
1 Number of original annotated transposase ORFs from the five major IS families (IS4, IS110, IS66, IS630 and IS605) that were used in the PSI-BLAST.
2 Hits that involved a majority of nucleotides that were in between annotated ORFs.
3 ORFs that were not initially annotated as transposases of the PSSM IS family but were hits of the PSI-BLAST search. This included ORFs that were annotated as putative transposases.
4 Hits that were lower than 40% ID and/or less than 50 bp in length.
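The false-positive rule in footnote 4 can be written as a small filter. This is a minimal sketch: the `Hit` record and its field names are hypothetical, and only the two thresholds (40% identity, 50 bp) come from the footnote.

```python
from dataclasses import dataclass

MIN_IDENTITY = 40.0   # percent identity threshold from footnote 4
MIN_LENGTH_BP = 50    # length threshold (bp) from footnote 4


@dataclass
class Hit:
    """Hypothetical record for one PSI-TBLASTN hit."""
    name: str
    percent_identity: float
    length_bp: int


def is_false_positive(hit):
    """A hit is rejected if it fails either threshold ("and/or")."""
    return hit.percent_identity < MIN_IDENTITY or hit.length_bp < MIN_LENGTH_BP


def split_hits(hits):
    """Return (true_positives, false_positives) as two lists."""
    true_pos = [h for h in hits if not is_false_positive(h)]
    false_pos = [h for h in hits if is_false_positive(h)]
    return true_pos, false_pos
```

This matches the arithmetic of the table, where Hits Identified equals True Positives plus False Positives in every column (17 = 17 + 0, 160 = 154 + 6, 170 = 165 + 5).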
Figure 3. CcI3 gene deletions and IS clustering. (a) Genes that were deleted in strain CcI3 but were present in both EAN and ACN were plotted using a 250 kb sliding window (dark green line). (b) Transposase ORF positions (including those identified by PSI-BLAST analysis) were plotted using a 250 kb sliding window for each strain (dark blue line). Regions of each genome that corresponded to significant clusters of gene deletion in strain CcI3 are highlighted and lettered (red boxes). Confidence intervals determined from calculation of the probability mass function are listed on the right of the graphs, with points greater than 95% confidence in light gray boxes and points greater than 99% confidence in dark gray boxes.
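A sliding-window count of the kind plotted in this figure might be sketched as follows. Only the 250 kb window size comes from the legend. The 50 kb step, the linear (non-circular) treatment of the genome, and the use of a binomial distribution for the probability mass function are assumptions made for illustration, since the legend does not specify them.

```python
from bisect import bisect_left
from math import comb


def sliding_window_counts(positions, genome_length, window=250_000, step=50_000):
    """Count features in each half-open window [start, start + window).

    The genome is treated as linear here; a circular chromosome would
    also need windows that wrap around the origin.
    """
    pos = sorted(positions)
    counts = []
    for start in range(0, max(genome_length - window, 0) + 1, step):
        lo = bisect_left(pos, start)
        hi = bisect_left(pos, start + window)
        counts.append((start, hi - lo))
    return counts


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly from the PMF."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

Under these assumptions, a window holding `k` of the genome's `n` features could be scored with `binomial_upper_tail(k, n, window / genome_length)`, and windows with a tail probability below 0.05 or 0.01 would correspond to the 95% and 99% levels.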
Figure 4. Neighbourhood analysis of highlighted windows in Figure 3. Each ORF is depicted as a box. Boxes on top are transcribed from the forward strand (+). Those on the bottom are transcribed from the reverse strand (-). ORFs common to all three strains are highlighted in green. Transposases are in red, and ORFs that are only common to two strains are in blue. ORFs that were not present in the other genomes in each window are in white boxes. These ORFs were either present in the other genomes in different loci, or were unique to strains ACN and EAN and were not present in strain CcI3. Dashed lines indicate points of reference between the three genomes.
Number of Transposases in Breaks in Synteny
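| Strain | Nucleotides in breaks in synteny 1 | Transposase ORFs in breaks 2 | P value 3 | Duplicated non-transposase ORFs in breaks 4 |
|---|---|---|---|---|
| EAN | 5964668 (66%) | 223 (86%) | 1.28 × 10^-3 | 262 (306) * |
| CcI3 | 2418797 (45%) | 134 (67%) | 2.18 × 10^-6 | 18 (28) |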
1 The number of nucleotides in regions of the strain that did not show MAUVE alignment synteny with strain ACN in a continuous 10 kb+ stretch. Numbers in parentheses indicate the percentage of the genome to which these stretches correspond.
2 The number of transposase ORFs in the breaks in synteny. Numbers in parentheses indicate the percentage of all transposases in the strain that were found, including those identified in the PSI-TBLASTN search.
3 P value derived from a chi-squared test of the number of transposases in the region against an expected number.
4 The number of the top 3 duplicated gene family ORFs in each respective strain that were not transposases in this region. Numbers in parentheses are the total number of duplicated genes of the three top duplicated gene families in that strain. (*) indicates a P value less than 0.005.
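A chi-squared test of this kind can be sketched in plain Python. The footnotes do not say how the expected number was derived, so this sketch assumes it is proportional to the fraction of the genome lying in breaks in synteny. The function name and arguments are hypothetical, and the result is not expected to reproduce the P values reported in the table.

```python
from math import erfc, sqrt


def chi_squared_enrichment(observed_in, total, fraction_in):
    """One-degree-of-freedom chi-squared goodness-of-fit test.

    observed_in: features (e.g. transposase ORFs) inside the regions.
    total:       all features in the genome.
    fraction_in: fraction of the genome covered by the regions
                 (assumed to set the expected count).
    """
    expected_in = total * fraction_in
    expected_out = total - expected_in
    observed_out = total - observed_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-squared distribution with 1 df:
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value
```

With illustrative (not measured) counts, `chi_squared_enrichment(60, 100, 0.5)` tests whether 60 of 100 features falling in half of a genome departs from an even spread.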
Figure 5. MAUVE backbone files were imaged using the Genvision™ Adobe Illustrator© plug-in to show distinct LCR events. The top of each genome's histogram represents syntenic LCBs that are in the same order with respect to the ACN genome. The middle layer represents LCBs that are inverted with respect to ACN. The bottom histogram shows positioning of originally annotated (black) and PSI-BLAST determined (red) transposase ORFs in each genome.