Theoden Vigil-Stenman, John Larsson, Johan A A Nylander, Birgitta Bergman.
Abstract
BACKGROUND: Insertion sequences (ISs) are approximately 1 kbp long "jumping" genes found in prokaryotes. ISs encode the protein transposase, which facilitates the excision and reinsertion of ISs in genomes, making these sequences a type of class I ("cut-and-paste") Mobile Genetic Elements. ISs are proposed to be involved in the reductive evolution of symbiotic prokaryotes. Our previous sequencing of the genome of the cyanobacterium 'Nostoc azollae' 0708, which lives in a tight perpetual symbiotic association with a plant (the water fern Azolla), revealed an eroding genome with a high number of ISs together with an unprecedentedly large proportion of pseudogenes. To investigate the role of ISs in the reductive evolution of 'Nostoc azollae' 0708, and potentially in the formation of pseudogenes, a bioinformatic investigation of the IS identities and positions in 47 cyanobacterial genomes was conducted. To widen the scope, the IS contents were analysed qualitatively and quantitatively in 20 other genomes representing both free-living and symbiotic bacteria.
Year: 2015 PMID: 25885210 PMCID: PMC4369082 DOI: 10.1186/s12864-015-1386-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The water fern Azolla, its cyanobacterial symbiont (NoAz) and the NoAz genome. A. The water fern Azolla filiculoides growing in the greenhouse. B. Close-up of Azolla leaves. Each leaf contains a specialized cavity (marked with a dotted circle) where a colony of ‘Nostoc azollae’ 0708 resides. C. ‘Nostoc azollae’ 0708 filaments. Arrows indicate the differentiated nitrogen-fixing heterocysts. Bar is 10 μm. D. Repeat sequences and pseudogenes in the NoAz genome (5.35 Mbps). Red ticks indicate positions of pseudogenes among the black-ticked non-affected functional genes, while green ticks indicate the positions of IS elements within the genome.
Organisms included in the study
| Organism | Abbreviation | Size (Mbps) | GC content | Habitat | Morphology | Lifestyle | Total repeat hits | Total IS hits | Annotated pseudogenes | Repeats per Mbps | Pseudogenes per Mbps |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cyanobacteria | | | | | | | | | | | |
| | Acam | 8.36 | 0.47 | Marine | unicellular | Free-living | 588 | 553 | 32 | 70 | 4 |
| | Anav | 7.11 | 0.41 | Terrestrial | filamentous, heterocystous | Free-living | 329 | 313 | 40 | 46 | 6 |
| | Artm | 6.00 | 0.45 | High-salt lakes | filamentous, non-heterocystous | Free-living | 1208 | 906 | 0 | 201 | 0 |
| | Artp | 5.00 | 0.44 | High-salt lakes | filamentous, non-heterocystous | Free-living | 206 | 141 | 0 | 41 | 0 |
| | Crow | 6.24 | 0.37 | Marine | unicellular | Free-living | 1611 | 1494 | 0 | 258 | 0 |
| cyanobacterium UCYN-A | Ucyn | 1.44 | 0.31 | Marine, Host: prymnesiophytes | unicellular | obligate unclassified symbiont | 1 | 1 | 0 | 1 | 0 |
| | Cya8802 | 4.80 | 0.40 | Marine | unicellular | Free-living | 203 | 176 | 204 | 42 | 43 |
| | Cya | 6.09⁵ | 0.41 | Marine | unicellular | Free-living | 246 | 205 | 167 | 40 | 27 |
| | Cylr | 3.88 | 0.40 | Freshwater lakes | filamentous, heterocystous | Free-living | 517 | 303 | 1 | 133 | 0 |
| | Glov | 4.66 | 0.62 | Terrestrial | unicellular | Free-living | 143 | 82 | 0 | 31 | 0 |
| | Lyns | 7.04 | 0.41 | Marine, Freshwater | filamentous, non-heterocystous | Free-living | 443 | 339 | 0 | 63 | 0 |
| | Mica | 5.84 | 0.42 | Freshwater lake | unicellular | Free-living | 1664 | 1563 | 0 | 285 | 0 |
| Nodularia spumigena CCY9414 | Nods | 5.32 | 0.41 | Brackish water | filamentous, heterocystous | Free-living | 352 | 294 | 0 | 66 | 0 |
| ‘Nostoc azollae’ 0708 | NoAz | 5.49 | 0.38 | Host: Azolla ferns | filamentous, heterocystous | obligate extracellular symbiont | 970 | 907 | 1670 | 177 | 304 |
| | Nosp | 9.06 | 0.41 | Terrestrial | filamentous, heterocystous | facultative intracellular | 399 | 366 | 371 | 44 | 41 |
| | Noss | 7.21 | 0.41 | Terrestrial | filamentous, heterocystous | Free-living | 311 | 278 | 0 | 43 | 0 |
| | Prom | 1.86¹² | 0.36 | Marine | unicellular | Free-living | 18 | 12 | 15 | 10 | 8 |
| | Rapb | 3.19 | 0.40 | Freshwater lake | filamentous, non-heterocystous | Free-living | 109 | 80 | 0 | 34 | 0 |
| | Syne | 2.72 | 0.55 | Freshwater lake | unicellular | Free-living | 8 | 7 | 2 | 3 | 1 |
| | SynJA | 2.99² | 0.59 | Hot spring | unicellular | Free-living | 331 | 330 | 51 | 111 | 17 |
| | Syn | 2.57 | 0.56 | Marine | unicellular | Free-living | 6 | 5 | 10 | 2 | 4 |
| | Scys6803 | 3.95 | 0.47 | Freshwater lake | unicellular | Free-living | 161 | 151 | 0 | 41 | 0 |
| | Thee | 2.59 | 0.54 | Hot spring | unicellular | Free-living | 128 | 99 | 0 | 49 | 0 |
| | Trie | 7.75 | 0.34 | Marine | filamentous, non-heterocystous | Free-living | 1311 | 1144 | 625 | 169 | 81 |
| Non-cyanobacterial symbionts | | | | | | | | | | | |
| | Borp | 4.09 | 0.68 | Host: Human | unicellular | obligate intracellular | 388 | 380 | 358 | 95 | 88 |
| | Cana | 1.88 | 0.35 | Host: Acanthamoeba | unicellular | obligate intracellular | 315 | 273 | 222 | 168 | 118 |
| | Fra | 7.00⁵ | 0.71 | Host: Plants | filamentous | facultative intracellular | 139 | 130 | 53 | 20 | 8 |
| Onion yellows phytoplasma OY-M | PhyOY | 0.85 | 0.28 | Host: Plants | unicellular | obligate intracellular | 113 | 29 | 0 | 133 | 0 |
| | Orit | 2.13 | 0.31 | Host: Human | unicellular | obligate intracellular | 1122 | 703 | 997 | 527 | 468 |
| | Shif | 4.60 | 0.51 | Host: Human | unicellular | facultative intracellular | 466 | 450 | 378 | 101 | 82 |
| | Myc | 1.18² | 0.24 | Host: Animals | unicellular | obligate intracellular | 77 | 75 | 1 | 65 | 1 |
| | Yer | 4.82 | 0.48 | Host: Animals | unicellular | facultative intracellular | 306 | 294 | 155 | 63 | 32 |
| | Wol | 1.38⁶ | 0.35 | Host: insects and nematodes | unicellular | obligate intracellular | 361 | 161 | 67 | 260 | 48 |
a. Several similar bacteria are included in entries with superscripts. Genome sizes are averages, with the number of species indicated by the superscript.
Table of genomes included in the study. Very similar organisms, e.g. several genomes of Cyanothece, have been included as a single row, and the numbers therein represent averages of the data found. The number of organisms included in each such group is indicated with a superscript in the Size column of the table. Total repeat hits: the number of hits to repeated sequences of any kind. Repeated genes with known functions that are not mobile DNA are not included in this count. Total IS hits: the number of hits to repeated sequences that are Insertion Sequences. Annotated pseudogenes: the number of pseudogenes in the genome, according to the integrated microbial genomes and metagenomes database (IMG Data Management and Analysis Systems). The last two columns show the number of repeated sequences and pseudogenes per Mbps of genome.
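The two density columns can be checked against the raw counts with a short calculation. The sketch below is illustrative only: the function name `per_mbp` is hypothetical, and rounding to the nearest integer is an assumption that happens to reproduce the tabulated values for the rows shown.

```python
def per_mbp(count: float, genome_size_mbp: float) -> int:
    """Density of hits per megabase pair, rounded to the nearest integer.

    Rounding is an assumption; the table does not state how values were rounded.
    """
    if genome_size_mbp <= 0:
        raise ValueError("genome size must be positive")
    return round(count / genome_size_mbp)


# Acam row: 588 repeat hits, 32 annotated pseudogenes, 8.36 Mbps genome.
print(per_mbp(588, 8.36))   # 70
print(per_mbp(32, 8.36))    # 4

# NoAz row: 970 repeat hits, 1670 annotated pseudogenes, 5.49 Mbps genome.
print(per_mbp(970, 5.49))   # 177
print(per_mbp(1670, 5.49))  # 304
```

For grouped rows, which report averages over several genomes, the ratio of the averaged columns need not match the tabulated density exactly.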
Figure 2. Genome size and repeat density in investigated genomes. Genomes are ordered by amount of IS base pairs. IS content is displayed in red, while the rest of the bars indicate genome size and symbiotic state.
Figure 3. Repeat classifications and fragmentation of repeat sequences in bacteria. Panel A: Number of repeat sequences and copies found in the 67 genomes of the investigated bacteria. Notably, the majority of the discovered repeats are IS elements, while phages and unidentified repeats were few. Number of repeat sequences refers to the number of different repeat sequences found for each class of repeats. Number of copies found refers to the total number of repeat copies found for each class of repeat. ISfinder IS: Repeats listed in the ISfinder database. Putative IS: Repeats with BLAST similarities to Insertion Sequences. Phages: Repeats with BLAST similarities to phage genes. Putative MGE: Repeats with no decisive data on kind of replication mechanism, but which are nevertheless suspected to be self-replicating and/or mobile when their similarities to other kinds of Mobile Genetic Elements are considered. Highly repeated: Repeats with low similarities to known proteins, but which appear in high copy numbers in the investigated organisms. DNA interacting: Repeats which are similar to proteins interacting with DNA (e.g. nucleases, helicases), making mobility and/or self-replication feasible. Panel B: Repeat fragment size distribution. Distribution of fragment lengths in NoAz compared to the average distribution in all other 66 bacteria investigated. X-axis: Fragment size in per cent of full size. Y-axis: Frequency of fragment size, expressed as percentage of total number of copies.
Figure 4. Genomic areas with non-random IS density in the cyanobacterial symbiont NoAz. Locations of ISs >70% target length for selected repeats in NoAz. Red ticks indicate IS locations. The area of interest is marked with green borders. The middle of each circle lists the probability that the IS elements of interest would occur in close proximity by chance alone, together with the data used to compute this probability. For NoAz_R_21, a circular tree based on uncorrected p-distance (see Methods) has been superimposed on the genome, showing that elements that are similar in nucleotide sequence are located close to each other. NoAz_R_21 elements were chosen for this example because of the simplicity of display; other IS elements make up similar, albeit more tangled, trees. Numbers at tick marks show genomic position in Mbps.
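The caption does not say how the chance probability was computed, so the following is only one plausible sketch, not the authors' method. It assumes that IS copies land independently and uniformly along the genome and asks how likely it is that at least a given number of them fall inside one pre-specified window (a binomial tail). The function name and parameters are hypothetical, and the sketch ignores the correction needed when the window is chosen after looking at the data.

```python
from math import comb


def clustering_probability(n_elements: int, n_in_window: int,
                           window_bp: int, genome_bp: int) -> float:
    """P(at least n_in_window of n_elements fall in a fixed window).

    Assumes independent, uniform placement (an assumption of this sketch).
    """
    if not 0 < window_bp <= genome_bp:
        raise ValueError("window must be positive and no larger than the genome")
    p = window_bp / genome_bp  # chance that a single element lands in the window
    return sum(
        comb(n_elements, k) * p**k * (1 - p) ** (n_elements - k)
        for k in range(n_in_window, n_elements + 1)
    )


# Toy numbers (not from the paper): 10 copies, all inside half of a 1 Mbp genome.
print(clustering_probability(10, 10, 500_000, 1_000_000))
```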
Figure 5. Box plot of uncorrected p-distance by separation distance in NoAz. Box plot comparing uncorrected p-distance between pairs of IS elements at different distances (in bp) from each other. Boxes show inner quartiles.
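For readers unfamiliar with the measure, the uncorrected p-distance between two aligned sequences is the proportion of compared sites at which they differ. A minimal sketch follows; the function name is hypothetical, and skipping alignment columns that contain a gap is an assumption, since the caption does not say how gaps were handled.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected p-distance: differing sites / compared sites.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# One mismatch over eight compared sites.
print(p_distance("ACGTACGT", "ACGTTCGT"))  # 0.125
```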
Figure 6. IS elements in proximity to pseudogenes in the NoAz genome. The selected examples show both IS elements and regular genes in close proximity to IS elements being converted into pseudogenes. Green arrows represent IS elements and orange arrows regular non-affected genes. Genes annotated as pseudogenes are illustrated by dashed borders. Frame shifts, where known, are indicated by black vertical bars. HP = Hypothetical Protein. PLP = Phycobilisome Linker Polypeptide. Ruler indicates position on the main chromosome. A: The IS element Noaz_R_5 inserted into Noaz_R_9, generating flanking pseudogenes. Further down are two full-length NoAz IS elements, NoAz_R_22 and NoAz_R_10, with flanking pseudogenes. B: NoAz_R_12 surrounded by pseudogenes with frame shifts. Most genes are hypothetical or other insertion sequences. C: The IS element NoAz_R_27 may have fragmented surrounding genes and pseudogenized a glutaminase gene. D: NoAz_R_23 surrounded by pseudogenes. E: NoAz_R_9, itself probably pseudogenized, may have generated the pseudogenes on the left. The phycobilisome linker polypeptide proteins to the right of NoAz_R_9 are probably critical to NoAz fulfilling its symbiotic role, and are therefore intact.
Figure 7. Phylogeny and repeat density in genomes of cyanobacteria. Phylogenetic tree depicting the two major cyanobacterial clades, with Clade 1 representing cyanobacteria with primarily larger genomes and Clade 2 unicellular cyanobacteria with smaller genomes (after [12]). Morphology and symbiotic state are indicated by coloured squares. Open squares and “ND” in “% occupied” indicate that this bacterium was not included in the study. Genome size (light grey bars) and portion occupied by repeats (red) are displayed on the right.
Figure 8. Fragmentation of repeats in IS-rich bacteria. Bars represent the fraction of repeat copies found in a given size category. Light green graphs represent 16 cyanobacteria, with NoAz given in pink, and light yellow bars represent the other 20 investigated bacteria. The total of the bar heights for each graph is 1. X-axis: Fragment size in per cent of full size. Y-axis: proportion of total repeat copies that belong to a certain size category.
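The normalisation described in this caption (bar heights summing to 1 over size categories) can be sketched as below. This is an illustration under assumptions: the function name, the 10% bin width, and the choice to place full-length copies in the top bin are not taken from the paper.

```python
def fragment_size_distribution(copy_lengths, full_length, bin_width_pct=10):
    """Fraction of repeat copies per size category (percent of full length).

    Returns one fraction per bin; the fractions sum to 1.
    Bin width and edge handling are assumptions of this sketch.
    """
    if not copy_lengths:
        raise ValueError("no repeat copies given")
    if full_length <= 0 or 100 % bin_width_pct != 0:
        raise ValueError("full_length must be positive and bins must divide 100")
    n_bins = 100 // bin_width_pct
    counts = [0] * n_bins
    for length in copy_lengths:
        pct = 100 * length / full_length
        # Copies at (or above) 100% of full length go into the last bin.
        index = min(int(pct // bin_width_pct), n_bins - 1)
        counts[index] += 1
    total = len(copy_lengths)
    return [count / total for count in counts]


# Toy example (not from the paper): four copies of a 1000 bp repeat.
print(fragment_size_distribution([100, 500, 1000, 1000], 1000))
```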
Figure 9. Pseudogene enrichment with increasing distance from IS elements in the symbiotic cyanobacterium NoAz. X-axis: Distance from gene or IS elements within which all pseudogenes are counted. Y-axis: Pseudogene enrichment, i.e. (average number of pseudogenes within distance to IS elements)/(average number of pseudogenes within distance to regular genes). Filled circles indicate that the difference in pseudogene enrichment is statistically significant (p < 0.05); empty circles indicate that the difference is not significant.
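The enrichment quotient defined in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of a single point coordinate for each feature, and the treatment of the chromosome as linear rather than circular are all assumptions, and the significance test behind the filled circles is not reproduced.

```python
from bisect import bisect_left, bisect_right


def mean_pseudogenes_nearby(anchor_positions, sorted_pseudogenes, distance):
    """Average number of pseudogenes within `distance` bp of each anchor."""
    if not anchor_positions:
        raise ValueError("no anchor positions given")
    total = 0
    for pos in anchor_positions:
        lo = bisect_left(sorted_pseudogenes, pos - distance)
        hi = bisect_right(sorted_pseudogenes, pos + distance)
        total += hi - lo
    return total / len(anchor_positions)


def pseudogene_enrichment(is_positions, gene_positions,
                          pseudogene_positions, distance):
    """(mean pseudogenes near IS elements) / (mean pseudogenes near regular genes)."""
    pseudogenes = sorted(pseudogene_positions)
    near_is = mean_pseudogenes_nearby(is_positions, pseudogenes, distance)
    near_genes = mean_pseudogenes_nearby(gene_positions, pseudogenes, distance)
    if near_genes == 0:
        raise ValueError("no pseudogenes near regular genes; quotient undefined")
    return near_is / near_genes


# Toy coordinates in bp (not from the paper).
print(pseudogene_enrichment(
    is_positions=[1000, 5000],
    gene_positions=[20000, 30000],
    pseudogene_positions=[1100, 4900, 5100, 20050],
    distance=500,
))  # 3.0
```

A value above 1 means pseudogenes are, on average, more common around IS elements than around regular genes at that distance.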
Figure 10. Pseudogene enrichment with increasing distance from IS elements in genomes of bacteria with different lifestyles. Heat map depicting pseudogene enrichment at seven distances from a gene or IS element in symbiotic and free-living cyanobacteria and in obligate or facultative intracellular bacteria. Pseudogene enrichment is computed as (avg. number of pseudogenes within [distance] of IS element)/(avg. number of pseudogenes within [distance] of annotated gene). Deeper red colours indicate a higher quotient. Blurred squares indicate that the difference in pseudogene enrichment between IS elements and annotated genes is not statistically significant (p ≥ 0.05). Organisms with fewer than two significant points were omitted, as were those with no annotated pseudogenes.