Karel Kopejtka, Yan Lin, Markéta Jakubovičová, Michal Koblížek, Jürgen Tomasch.
Abstract
In Bacteria, chromosome replication starts at a single origin of replication and proceeds on both replichores. Due to its asymmetric nature, replication influences chromosome structure and gene organization, mutation rate, and expression. To date, little is known about the distribution of highly conserved genes over the bacterial chromosome. Here, we used a set of 101 fully sequenced Rhodobacteraceae representatives to analyze the relationship between conservation of genes within this family and their distance from the origin of replication. Twenty-two of the analyzed species had core genes clustered significantly closer to the origin of replication, with representatives of the genus Celeribacter being the most apparent example. Interestingly, there were also eight species with the opposite organization. In particular, Rhodobaca barguzinensis and Loktanella vestfoldensis showed a significant increase of core genes with distance from the origin of replication. The uneven distribution of low-conserved regions is particularly pronounced in genomes in which the halves of one replichore differ in their conserved gene content. Phage integration and horizontal gene transfer partially explain the scattered nature of Rhodobacteraceae genomes. Our findings lay the foundation for a better understanding of bacterial genome evolution and the role of replication therein.
Keywords: Rhodobacteraceae; genome architecture; genome evolution; origin of replication
Year: 2019 PMID: 31273387 PMCID: PMC6699656 DOI: 10.1093/gbe/evz138
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Comparison of phylogenomic and 16S rRNA trees. Both trees comprise the same set of 109 Rhodobacteraceae strains. Pannonibacter phragmitetus 31801, Labrenzia sp. VG12, Labrenzia sp. CP4, and Labrenzia aggregata RMAR6-6 were used to root the trees as outgroup species. Scale bars represent changes per position. Bootstrap values >50% are shown. Bold vertical bars refer to different clustering patterns of the Rhodovulum spp. and RR (Rhodobacter-Rhodobaca) group inside both trees. *Collapsed Phaeobacter (P.) branches involve species Phaeobacter gallaeciensis (strains DSM 26640, P11, P63, P73, P75, P128, and P129), Phaeobacter inhibens (strains 2.10, DOK1-1, DSM 17395, P10, P24, P30, P48, P51, P54, P57, P59, P66, P70, P72, P74, P78, P80, P83, P88, and P92), Phaeobacter piscinae (strains P13, P14, P18, P23, P36, P42, and P71), and Phaeobacter porticola P97; **Collapsed Ketogulonicigenium vulgare branches involve strains Hbe602, SKV, SPU B805, WSH-001, and Y25; ***Collapsed Rhodobacter sphaeroides branches involve strains ATCC 17025, ATCC 17029, MBTLJ-8, MBTLJ-13, MBTLJ-20, and KD131 in both trees, with the additional strain org2181 in the 16S rRNA tree. Maximum-likelihood (ML) tree (left panel) based on concatenated alignments of amino acid sequences of the 85 highly conserved core-genome proteins (27,668 common amino acid positions). Amino acid sequences were identified using Proteinortho with cut-off criteria of e-value ≤1e-10, sequence identity ≥60%, and sequence coverage ≥80%. The ML tree was calculated with 100 bootstrap replicates. 16S rRNA phylogenetic tree (right panel). Nucleotide sequences were aligned using ClustalX version 2.1, resulting in an alignment with 1,260 common nucleotide positions after applying Gblocks. The phylogenetic tree was inferred using the ML algorithm with the GTR nucleotide substitution model and 1,000 bootstrap replicates. When possible, the strains were listed in the same order as in the phylogenomic tree.
Characteristics of Pan-Genome Data Sets Used in This Study
| Data Set | E-value Cutoff | Minimum Sequence Coverage (%) | Minimum Identity (%) | Number of Protein Families | Core Protein Families (including paralogs) | Soft-Core^a | Core Protein Families (no paralogs) | Core Protein Families (no paralogs and connectivity >0.9) |
|---|---|---|---|---|---|---|---|---|
| pan60 | 1e-10 | 80 | 60 | 37,326 | 161 | 499 | 141 | 85 |
| pan30 | 1e-10 | 70 | 30 | 25,143 | 464 | 911 | 411 | 352 |
| pan15 | 1e-05 | 70 | 15 | 24,317 | 479 | 936 | 422 | 362 |

^a Soft-Core is defined as protein families found in 95% of the strains (104 out of 109).
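
The core and soft-core definitions used in the table can be illustrated with a minimal Python sketch. The function name `classify_families`, the input layout (a mapping from family identifier to the set of strains carrying it), and the "accessory" label are assumptions made for illustration; they are not taken from the study. The sketch assigns each family its strictest label, and it says nothing about how paralogs or connectivity were filtered.

```python
import math


def classify_families(presence, n_strains, soft_core_fraction=0.95):
    """Label protein families by how many strains carry them.

    presence: dict mapping a family id to the set of strain names that
    contain at least one member of that family (hypothetical layout).
    """
    # Smallest strain count that reaches the soft-core fraction,
    # e.g. ceil(0.95 * 109) = 104 strains.
    soft_min = math.ceil(soft_core_fraction * n_strains)
    labels = {}
    for family, strains in presence.items():
        count = len(strains)
        if count == n_strains:
            labels[family] = "core"        # present in every strain
        elif count >= soft_min:
            labels[family] = "soft-core"   # present in at least 95% of strains
        else:
            labels[family] = "accessory"   # illustrative label, not from the source
    return labels
```

With 109 strains, a family present in 104 strains is labelled soft-core and one present in 103 strains is not.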
Fig. 2.—Gradient in the number of conserved genes with increasing distance from oriC. Analysis is based on the pan15 data set. A linear model was fitted for the average ortholog score within sliding windows of 20 genes in relation to the midpoint distance of the sliding window to oriC for each strain. (A) Quantile–quantile plot comparing the slope values extracted from the linear model of each genome to a theoretical normal distribution. Increasing slope values reflect the increase in ortholog score with increasing distance from oriC. Deviations from the normal distribution are indicated by increasing distance from the sloped blue dashed line. The horizontal blue dashed line highlights the coordinate on the y axis where the slope value is equal to 0. Red dots represent strains with slope values significantly different from 0 (P < 0.05). Names of the strains with the highest negative and positive slope values are shown. These strains represent groups with different genome architecture. (B) Average ortholog score compared with distance from oriC for Rhodobaca barguzinensis (upper panel) and Celeribacter manganoxidans (lower panel). The linear function (red line) fitted to the data showed a significant increase (upper panel) or decline (lower panel) in average ortholog score with increasing relative distance from oriC. Results for all three data sets (pan15, pan30, pan60) are shown in supplementary figures S2–S5, Supplementary Material online.
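
The sliding-window fit described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the use of gene start positions, the definition of relative distance (0 at oriC, 1 on the opposite side of a circular chromosome), and the plain least-squares fit are all hypothetical choices. The per-gene ortholog score is treated as a given number, and the significance test (P < 0.05) is not reproduced.

```python
def distance_from_ori(position, ori, genome_length):
    """Relative distance on a circular chromosome: 0 at oriC, 1 opposite to it."""
    offset = (position - ori) % genome_length
    return min(offset, genome_length - offset) / (genome_length / 2)


def window_slope(positions, scores, ori, genome_length, window=20):
    """Fit score ~ distance over sliding windows of `window` consecutive genes.

    positions: gene coordinates sorted along the chromosome (assumed input).
    scores: one ortholog score per gene, in the same order.
    Returns (slope, intercept) of an ordinary least-squares line.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    if len(scores) <= window:
        raise ValueError("need more genes than the window size")

    xs, ys = [], []
    for start in range(len(scores) - window + 1):
        end = start + window
        # Midpoint of the window, then its distance to oriC.
        midpoint = (positions[start] + positions[end - 1]) / 2
        xs.append(distance_from_ori(midpoint, ori, genome_length))
        # Average ortholog score of the genes in the window.
        ys.append(sum(scores[start:end]) / window)

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all windows are at the same distance from oriC")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In this sketch a positive slope corresponds to scores that rise with distance from oriC, and a negative slope to scores that fall with distance.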
Fig. 3.—Chromosome plots of four strains representing two different kinds of chromosome architecture in Rhodobacteraceae. (A) Two representatives from the group of strains for which the ortholog score increases with the distance from the origin of replication. (B) Two representatives from the major group of strains for which the ortholog score decreases with the distance from the origin of replication. The outer to inner rings represent: scale of genome size in Mb and position of oriC; position of ORFs encoded on the plus strand; position of ORFs encoded on the minus strand; groups of HT genes as defined in the graphical legend below; position of core genes with orthologs in all 108 Rhodobacteraceae strains; bar chart displaying the ortholog score of each representative’s genes; GC-skew; polar plot showing the average ortholog score in each of eight segments. Polar plot in the middle: average ortholog score in each segment calculated as an average for all strains; the darker the shade of blue, the higher the number. See supplementary figure S6, Supplementary Material online, for Tukey’s HSD test for the eight segments. Orthologs were identified using Proteinortho with cut-off criteria of e-value ≤1e-05, sequence identity ≥15%, and sequence coverage ≥70% (pan15 data set).
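
The per-segment averages shown in the polar plots can be approximated with the short sketch below. It assumes, purely for illustration, that the eight segments are equal-sized stretches of the chromosome counted from oriC; the caption does not state how the segment boundaries were drawn, and `segment_means` is a hypothetical helper name.

```python
def segment_means(positions, scores, ori, genome_length, n_segments=8):
    """Average score of the genes falling into each equal-sized segment.

    Segment 0 starts at oriC (an assumption of this sketch); segments
    without any gene yield None.
    """
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for position, score in zip(positions, scores):
        # Coordinate measured from oriC around the circular chromosome.
        offset = (position - ori) % genome_length
        index = min(int(offset * n_segments / genome_length), n_segments - 1)
        sums[index] += score
        counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Averaging the returned lists across strains, segment by segment, would give one value per segment for a combined plot.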
Fig. 4.—Distribution of HT genes along the chromosome in 101 Rhodobacteraceae. (A) Mean numbers of phage regions identified by Phaster (phages, left panel), Genomic Islands identified by AlienHunter (AH, middle panel), and IslandViewer (IV, right panel) were calculated for each third of the chromosome. (B) Proportion of DNA found in phages or genomic islands, panel order as in (A). The orange horizontal lines represent median values. ANOVA was used to test for significant differences between the three parts of the chromosome. Asterisks indicate significant differences between comparisons identified using Tukey’s HSD test (*P < 0.05, **P < 0.01).
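
The ANOVA step named in this caption can be illustrated by computing the one-way F statistic by hand. This is a generic textbook sketch, not the authors' analysis: the function name `one_way_anova` and the toy input are assumptions, and the p-value and the Tukey's HSD post hoc comparisons are left out because they require an F distribution and a studentized range distribution (available, for example, in SciPy as `scipy.stats.f_oneway`).

```python
def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA.

    groups: list of lists of numbers, e.g. one list of per-genome values
    for each third of the chromosome (hypothetical layout).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and spare observations")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the toy groups `[1, 2, 3]`, `[2, 3, 4]`, and `[5, 6, 7]`, the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26/2)/(6/6) = 13 with 2 and 6 degrees of freedom.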