Jianglin Zhou1, Hongguang Ren1, Mingda Hu1, Jing Zhou1, Beiping Li1, Na Kong1,2, Qi Zhang1, Yuan Jin1, Long Liang1, Junjie Yue1.
Abstract
Recombination and positive selection are two key forces driving population adaptation and diversification in pathogenic microorganisms. The Burkholderia cepacia complex (Bcc) comprises closely related bacterial species that can cause severe infections in patients with chronic granulomatous disease or cystic fibrosis (CF). To date, no genome-wide study has investigated how these two evolutionary forces act on the Bcc core genome, and the general characteristics of that core genome remain poorly described. In this study, we explored the core orthologous genes of 116 Bcc strains using comparative genomic analysis and studied the two adaptive evolutionary forces: recombination and positive selection. We identified 1005 orthogroups consisting entirely of single-copy genes. Single-copy orthologous genes in some Clusters of Orthologous Groups (COG) categories differed significantly in several evolutionary properties, and the encoded proteins were relatively simple and compact. Our findings showed that 5.8% of the core orthologous genes strongly supported recombination, whereas 1.1% supported positive selection. Genes involved in protein synthesis and in material transport and metabolism were favored by selection pressure. More importantly, homologous recombination contributed more genetic variation to a large number of genes and largely maintained genetic cohesion in Bcc. This high level of recombination between Bcc species blurs their taxonomic boundaries, making Bcc species difficult or impossible to distinguish phenotypically and genotypically.
Keywords: Burkholderia cepacia complex; COG; core genome; positive selection; recombination
Year: 2020 PMID: 32528528 PMCID: PMC7253759 DOI: 10.3389/fgene.2020.00506
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Pan-genome analysis of 116 Bcc whole-genome sequences. (A) Bar plot showing the frequencies of orthologous clusters. (B) Stacked bar plot of the percentage of orthogroups in the core, soft core, shell, and cloud genomes. (C) Gene accumulation curves of the Bcc pan-genome (blue) and core genome (green). The estimation was made by including genomes one by one. (D) Quantitative relationship between new genes and sequenced genomes.
FIGURE 2. The species tree of Bcc against the gene presence/absence matrix. The species tree was constructed using the STAG algorithm and rooted using the STRIDE algorithm in OrthoFinder (Emms and Kelly, 2019). The heatmap on the right shows the presence (dark green) or absence (light green) of all 17,740 orthogroups. Each row in the matrix corresponds to a branch on the tree (i.e., one genome) and each column represents an orthogroup.
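A presence/absence matrix like the one in Figure 2 is enough to derive both the core/soft core/shell/cloud partition and the accumulation curves of Figure 1. The following is a minimal sketch, not the authors' pipeline: the frequency thresholds (99%, 95%, 15%) are an assumed, commonly used convention that the source does not state, and the function names and toy matrix are hypothetical.

```python
from typing import List, Tuple


def classify_orthogroup(n_present: int, n_genomes: int) -> str:
    """Assign an orthogroup to a pan-genome compartment by its frequency.

    Thresholds are ASSUMED (a common convention), not taken from the paper.
    """
    frac = n_present / n_genomes
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


def accumulation(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Pan- and core-genome sizes as genomes are added one by one.

    matrix[g][o] is 1 if genome g contains orthogroup o, else 0.
    Genomes are added in the order given (no permutation averaging).
    """
    pan: set = set()
    core = None
    pan_sizes: List[int] = []
    core_sizes: List[int] = []
    for row in matrix:
        present = {o for o, flag in enumerate(row) if flag}
        pan |= present                                   # union grows the pan-genome
        core = present if core is None else core & present  # intersection shrinks the core
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes


# Hypothetical toy matrix: 3 genomes x 4 orthogroups.
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
pan_curve, core_curve = accumulation(toy)
```

In practice the genome order is usually shuffled many times and the curves averaged; the single fixed order here only keeps the sketch short.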
Relationships of COGs in single copy orthogroups with the descriptive variables.
| COG | Function | Number of genes | Bonferroni-corrected | |||||||
| analyzed | between genes in a given COG and the others | |||||||||
| More informative sites | Fewer informative sites | Higher | Lower | Higher | Lower | Higher codon bias | Lower codon bias | |||
| E | Amino acid metabolism and transport | 87 | <0.001 | 0.024 | <0.001 | |||||
| F | Nucleotide metabolism and transport | 55 | 0.014 | |||||||
| C | Energy production and conversion | 68 | 0.006 | 0.047 | 0.005 | |||||
| M | Cell wall/membrane/envelope biogenesis | 51 | <0.001 | |||||||
| J | Translation, including ribosomal structure and biogenesis | 97 | <0.001 | <0.001 | <0.001 | |||||
| K | Transcription | 82 | 0.001 | <0.001 | ||||||
| S | Function unknown | 194 | 0.01 | <0.001 | <0.001 | <0.001 | ||||
| – | Not in COGs | 48 | <0.001 | <0.001 | <0.001 | |||||
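The values in the table above are Bonferroni-corrected. As a minimal sketch of what that correction does (the underlying statistical test is not shown in this record, and the raw values below are hypothetical), each raw P-value is multiplied by the number of comparisons and capped at 1:

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each P-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P-values, for illustration only.
adjusted = bonferroni([0.001, 0.02, 0.5])
```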
FIGURE 3. The protein composition and functional distribution of single-copy genes in Bcc. (A) The distribution of domain content repertoires and domain organization repertoires across all 1005 orthologous families. The domain content repertoire of an ortholog is defined as the set of domains that occur in the proteins of the orthologous family; domain organization is the sequential order of protein domains; for an ortholog, the domain organization repertoire is the group of domain organizations of the encoded proteins within the orthologous family. Note that one ortholog with four different domain organizations is not shown in the figure. (B) The COG functional distribution of all single-copy genes, evidently recombinant genes, and genes under positive selection. The number of orthologs in each set is given in parentheses. The functional classes are colored as listed at the bottom. Each category is graphed as a percentage of the total number of orthologs in the corresponding gene set.
FIGURE 4. Homologously recombined genes are distributed evenly across all COG categories. The x-axis shows the COG functional categories and the y-axis shows the proportion of genes in each category. The blue and orange bars represent the proportion of single-copy genes and of recombined genes (FDR < 10%) in each COG, respectively. The COG categories are: D, cell cycle control, cell division, and chromosome partitioning; M, cell wall/membrane/envelope biogenesis; N, cell motility; O, post-translational modification, protein turnover, and chaperones; T, signal transduction mechanisms; U, intracellular trafficking, secretion, and vesicular transport; V, defense mechanisms; B, chromatin structure and dynamics; J, translation, including ribosomal structure and biogenesis; K, transcription; L, replication, recombination, and repair; C, energy production and conversion; E, amino acid metabolism and transport; F, nucleotide metabolism and transport; G, carbohydrate metabolism and transport; H, coenzyme metabolism and transport; I, lipid metabolism and transport; P, inorganic ion metabolism and transport; Q, biosynthesis, transport, and catabolism of secondary metabolites; S, function unknown; –, proteins not assigned to any COG category.
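The FDR < 10% cut-off in Figure 4 implies a multiple-testing adjustment of the per-gene recombination P-values. The record does not name the procedure, so the sketch below assumes the Benjamini-Hochberg step-up method, which is a common choice; the input values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical per-gene P-values, for illustration only.
p = [0.01, 0.04, 0.03, 0.5]
q = benjamini_hochberg(p)
recombinant = [i for i, qi in enumerate(q) if qi < 0.10]  # genes passing FDR < 10%
```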
FIGURE 5. Partition of recombination events (“inner” fragments) detected by GENECONV. Each purple ellipse represents a bacterial species in Bcc, and the area of the ellipse corresponds to the number of genomes of that species. A line between two ellipses indicates recombination events between strains of the corresponding species; a loop indicates recombination events between strains of the same species. The number of recombination events is shown by the width and color of the line. The figure was drawn and colored with Cytoscape version 3.7.1.
Genes under positive selection.
| Cluster ID | Gene | COG | Function | Positively selected sites (PBEB ≥ 0.95) | ω(M2a) | | Domain (Pfam) | Subcellular localization | Feature |
| OG0001459 | | H | tRNA 2-selenouridine synthase | 363 | 7.966 | 4.77E-06 | – | Cytoplasm | Unknown |
| OG0001809 | – | S | Conserved hypothetical protein | 91 | 4.522 | 0.00019 | DUF2889 (PF11136) | Cytoplasm | Unknown |
| OG0001972 | | J | Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit B | 119 | 4.479 | 0.00097 | GatB/GatE catalytic domain (PF02934) | Cytoplasm | Unknown |
| OG0001276 | | S | 4-hydroxybenzoyl-CoA thioesterase | 155 | 5.109 | 0.00097 | – | Cytoplasm | Unknown |
| OG0002150 | – | P | Flavin-containing monooxygenase FMO | 210 | 4.047 | 0.00227 | Flavin-binding monooxygenase-like (PF00743) | Cytoplasm | Unknown |
| OG0001150 | | V | Transport permease protein | 31, 35 | 4.289 | 0.00697 | ABC2_membrane (PF01061) | Plasma membrane | Transmembrane alpha helix |
| OG0001116 | | J | Ribosomal protein L5 | 94 | 6.183 | 0.03080 | Ribosomal_L5_C (PF00673) | Cytoplasm | Unknown |
| OG0001513 | – | K | MerR family regulatory protein | 7, 137 | 3.437 | 0.04362 | – | Cytoplasm | Unknown |
| OG0001392 | | E | Leucine efflux protein | 108, 125 | 2.681 | 0.04704 | LysE (PF01810) | Plasma membrane | Transmembrane alpha helix |
| OG0001464 | | P | Sulfate ABC transporter, inner membrane subunit CysW | 301, 311 | 3.96 | 0.06831 | – | Plasma membrane | Transmembrane alpha helix |
| OG0002263 | – | Q | DOPA 4,5-dioxygenase | 109 | 3.672 | 0.06880 | DOPA_dioxygen (PF08883) | Cytoplasm | Unknown |
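The table reports ω under the site model M2a and the sites whose BEB posterior probability is at least 0.95. The sketch below illustrates, under stated assumptions, how such results are typically screened: a likelihood ratio test of M2a against the nearly neutral model M1a (2 degrees of freedom) followed by a posterior-probability filter. The record does not describe the test details, so the M1a comparison, the log-likelihoods, and the posterior values are assumptions for illustration; only the site numbers 31 and 35 come from the table.

```python
import math
from typing import Dict, List, Tuple


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test with 2 degrees of freedom.

    For a chi-square distribution with df = 2 the survival function is
    exactly exp(-x / 2), so no statistics library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.exp(-stat / 2.0)


def selected_sites(posteriors: Dict[int, float], cutoff: float = 0.95) -> List[int]:
    """Return the sites whose posterior probability meets the cutoff."""
    return [site for site, prob in sorted(posteriors.items()) if prob >= cutoff]


# Hypothetical log-likelihoods for M1a (null) and M2a (alternative).
stat, p_value = lrt_pvalue_df2(lnl_null=-1000.0, lnl_alt=-990.0)

# Hypothetical posterior probabilities; sites 31 and 35 mirror OG0001150.
sites = selected_sites({31: 0.97, 35: 0.96, 40: 0.50})
```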
FIGURE 6. Three-dimensional structural models of the material transport proteins yadH (A) and leuE/lysE (B). Orange spheres represent amino acid residues under strong positive selection (BEB posterior probability ≥95%).
FIGURE 7. Three-dimensional structural model of a transcriptional regulator in the MerR family. Orange spheres represent the positively selected amino acid residues (BEB posterior probability ≥95%).