Leonardo T Rosa, Vicki Springthorpe, Matheus E Bianconi, Gavin H Thomas, David J Kelly.
Abstract
Lineage-specific expansion (LSE) of protein families is a widespread phenomenon in many eukaryotic genomes, but is generally more limited in bacterial genomes. Here, we report the presence of 434 genes encoding solute-binding proteins (SBPs) from the tripartite tricarboxylate transporter (TTT) family, within the 8.2 Mb genome of the α-proteobacterium Rhodoplanes sp. Z2-YC6860, a gene family over-representation of unprecedented abundance in prokaryotes. Representing over 6 % of the total number of coding sequences, the SBP genes are distributed across the whole genome but are found rarely in low-GC islands, where the gene density for this family is much lower. This observation, and the much higher sequence identity between the 434 Rhodoplanes TTT SBPs compared with the average identity between homologues from different species, is indicative of a key role for LSE in the expansion. The TTT SBP genes were found in the vicinity of genes encoding membrane components of transport systems from different families, as well as regulatory proteins such as histidine-kinases and transcription factors, indicating a broad range of functions around the sensing, response and transport of organic compounds. A smaller expansion of TTT SBPs is known in some species of the β-proteobacteria Bordetella and we observed similar expansions in other β-proteobacterial lineages, including members of the genus Comamonas and the industrial biotechnology organism Cupriavidus necator, indicating that strong environmental selection can drive SBP duplication and specialisation from multiple evolutionary starting points.
Keywords: Gene duplication; Solute transporter; gene over-representation; lineage specific expansion; periplasmic-binding protein
Year: 2018 PMID: 29667925 PMCID: PMC5994714 DOI: 10.1099/mgen.0.000176
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Distribution of TTT SBPs in the genomes of bacteria. The outer circle represents the number of TTT SBPs present in each genome, using a log2 scale. The tree was inferred using 16S rRNA sequences retrieved from the genome of each organism, and aligned using MAFFT v7 [37]. A maximum-likelihood tree was inferred using RAxML v8.2.11 [38] under the GTRCAT model, with 100 bootstrap pseudoreplicates. Bootstrap support values are indicated on nodes of major lineages as filled circles when 50 % or higher and as open circles when lower than 50 %. Major branches are coloured as indicated in the key. Non-coloured branches are minor lineages of Bacteria.
Fig. 2. Circular genome plot of the 8.193 Mb Rhodoplanes sp. Z2-YC6860 genome. The outer two tracks represent CDS on the forward strand (blue) and reverse strand (green), with black arrows indicating the location of TTT SBP genes. Orange blocks indicate genomic islands predicted by IslandViewer software [25]. The next inner circular plot represents percentage GC content calculated over a 10 kb sliding window with a range of 52.4 to 69.1 % and mean of 63.5 %. The innermost circular plot represents GC skew calculated as (G−C)/(G+C) over a 10 kb sliding window. The characteristic GC skew reversal at the origin and terminus of replication is indicated by dashed lines.
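The two windowed statistics named in the Fig. 2 caption can be illustrated with a minimal Python sketch. The function name `gc_windows`, the non-overlapping step and the toy sequence are assumptions made for illustration. The caption gives only the 10 kb window size and the skew formula (G−C)/(G+C), not the step size or the software used.

```python
def gc_windows(seq, window=10_000, step=10_000):
    """Yield (start, gc_percent, gc_skew) for each full window of `seq`.

    `step` is an assumption: the caption does not say how far the window slides.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Percentage of G and C bases in the window.
        gc_percent = 100.0 * (g + c) / window
        # GC skew as given in the caption; 0.0 if the window has no G or C.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_percent, gc_skew


# Toy sequence only; this is not the Rhodoplanes genome.
toy = "GGGC" * 5 + "CCCG" * 5
results = list(gc_windows(toy, window=20, step=20))
for start, gc_percent, gc_skew in results:
    print(start, gc_percent, gc_skew)
```

In the toy input the skew changes sign between the first and second window. This is the kind of reversal the caption describes at the origin and terminus of replication, although here it is entirely artificial.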
Occurrence of TTT SBP genes as single genes or in larger arrays of genes, in the genome of Rhodoplanes sp. Z2-YC6860 and two closely related bacteria
| Array size (number of genes) | Frequency in Rhodoplanes sp. Z2-YC6860 | Frequency in related bacterium 1 | Frequency in related bacterium 2 |
|---|---|---|---|
| 1 | 294 | 93 | 39 |
| 2 | 48 | 3 | 2 |
| 3 | 6 | – | – |
| 4 | 3 | – | – |
| 5 | 1 | – | – |
| 6 | – | – | – |
| 7 | – | – | – |
| 8 | – | – | – |
| 9 | 1 | – | – |
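As a quick arithmetic check, multiplying each array size by its frequency and summing gives the number of genes each column accounts for. The sketch below assumes the first frequency column belongs to Rhodoplanes sp. Z2-YC6860 and treats dashes as zero. The variable names are illustrative, and the two related bacteria are left unnamed because the table does not identify them.

```python
def genes_from_arrays(frequency):
    """Total genes implied by a mapping of array size -> number of arrays."""
    return sum(size * count for size, count in frequency.items())


# Values copied from the table; rows shown as dashes are omitted (treated as zero).
rhodoplanes = {1: 294, 2: 48, 3: 6, 4: 3, 5: 1, 9: 1}
second_column = {1: 93, 2: 3}
third_column = {1: 39, 2: 2}

total = genes_from_arrays(rhodoplanes)
in_arrays = total - rhodoplanes[1]  # genes located in arrays of two or more

print(total)      # 434
print(in_arrays)  # 140
print(genes_from_arrays(second_column), genes_from_arrays(third_column))  # 99 43
```

The first column sums to 434, which matches the gene count given in the abstract. Of these, 140 genes (roughly a third) sit in arrays of two or more genes.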
Fig. 3. Analysis of the relationships between the nine TTT family proteins located in the largest tandem array found in Rhodoplanes sp. Z2-YC6860. (a) Similarity matrix for amino acid sequences between proteins belonging to the nine-gene cluster. With the exception of RHPLAN_16830, which shared highest identity with RHPLAN_10480 (53.1 %), the remaining shared similarities were below 50 %. (b) Representation of highest identity for individual proteins in the cluster through blastp searches against 2323 bacterial genomes. The arrows represent the highest shared identity for each protein. Of the nine proteins in the array, only three members shared highest amino acid identity with other members of the same array, while four members shared highest identity with RHPLAN_29570, located elsewhere in the same genome. Two members were more similar to the TTT SBP ANW05692.1_3832 from Bradyrhizobium icense than to any gene inside the genome of Rhodoplanes sp. Z2-YC6860.
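The idea behind panel (b), one arrow from each protein to its most similar partner, can be sketched in Python. This is only an illustration under stated assumptions, not the authors' method: the caption reports blastp searches, whereas this sketch scores pre-aligned toy sequences. The identity definition (matches over columns where neither sequence has a gap), the function names and the sequences `protA` to `protC` are all hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def best_hits(aligned):
    """Map each sequence name to (most similar other sequence, percent identity)."""
    hits = {}
    for name, seq in aligned.items():
        others = {
            other: percent_identity(seq, other_seq)
            for other, other_seq in aligned.items()
            if other != name
        }
        partner = max(others, key=others.get)
        hits[name] = (partner, others[partner])
    return hits


# Hypothetical aligned toy sequences, not real TTT SBPs.
toy = {
    "protA": "MKTLLA-G",
    "protB": "MKTLVAQG",
    "protC": "MRSLVAQG",
}
hits = best_hits(toy)
for name, (partner, identity) in hits.items():
    print(f"{name} -> {partner} ({identity:.1f} %)")
```

Best hits need not be reciprocal. In the toy data `protC` points to `protB`, while `protB` points to `protA`. This is why several proteins in the figure can share the same highest-identity partner.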