Anukriti Sharma1, Jack A Gilbert2,3,4, Rup Lal1.
Abstract
Despite its serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. The genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine the distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. the Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across the three genomes revealed a significant negative correlation (P < 0.05) between the frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine carboxypeptidase; transposase) dated the divergence event from the most recent common ancestor at 300 million years ago. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM's genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as a population segregation factor across the in situ cohorts. Using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium.
Year: 2016 PMID: 27151933 PMCID: PMC4858710 DOI: 10.1038/srep25527
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Phylogenomic analysis and comparative functional potential of Cellulosimicrobium strains.
(a) Rooted maximum-likelihood tree based on the Jukes-Cantor model for the family Promicromonosporaceae, using 31 16S rRNA gene sequences with Cellulomonas aerilata 5420S-23 as outgroup. (b) Rooted tree based on 31 single-copy genes from 7 whole genomes, using Cellulomonas flavigena DSM 20109 as outgroup. Both trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The percentage (> 70%) of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. (c) Heatmap with column dendrogram showing the top 50 metabolic pathways reconstructed for the three Cellulosimicrobium genomes, i.e. strains MM, LMG16121 and J36. The three strains were clustered based on functional pathways using the Manhattan distance metric; the top 50 pathways with standard deviation 0.4 and having at least 0.8% of the total abundance were selected. The colour scale represents the relative abundance of each functional pathway.
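The Manhattan distance used to cluster the strains in panel (c) can be illustrated with a minimal sketch. The pathway abundances below are hypothetical values chosen for illustration, not data from the study, and the helper name `manhattan` is an assumption.

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length abundance profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical relative abundances (%) of four pathways per strain.
profiles = {
    "MM": [2.0, 1.0, 0.5, 3.0],
    "LMG16121": [2.5, 1.0, 1.5, 3.0],
    "J36": [2.5, 1.5, 1.5, 3.0],
}

# Pairwise distances between all strains (each pair listed once).
names = sorted(profiles)
distances = {
    (a, b): manhattan(profiles[a], profiles[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

# The pair with the smallest distance would be joined first in a dendrogram.
closest_pair = min(distances, key=distances.get)
```

With only three strains, the pair at the smallest distance forms the first cluster and the remaining strain joins it last, which is all a three-leaf column dendrogram can show.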
Figure 2. Scatter plot analysis showing coupling between G+C content, Fopt and dN/dS across Cellulosimicrobium genomes.
(a) Pearson product-moment correlation between %G+C and Fopt, with labeled R² and P-value, for Cellulosimicrobium sp. strain MM, Cellulosimicrobium cellulans LMG16121 and Cellulosimicrobium cellulans J36, based on the 791 orthologues common to all three genomes. (b) dN/dS values for orthologous proteins in independent pairs of strains C. cellulans J36, C. cellulans LMG16121 and Cellulosimicrobium sp. strain MM. The black dotted line at a dN/dS value of 1 represents the baseline criterion for positive natural selection. (c) Pairwise correlation analysis between Fopt and dN/dS values for the three genome pairs.
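As a rough illustration of the statistic in panel (a), the sketch below computes per-gene %G+C and the Pearson correlation coefficient in plain Python. The function names and the toy values are assumptions made for illustration; the P-value reported in the figure would need a separate significance test that is not shown here.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene values: %G+C and frequency of optimal codons (Fopt).
gc_values = [70.0, 72.0, 74.0, 76.0]
fopt_values = [0.60, 0.55, 0.52, 0.45]

r = pearson_r(gc_values, fopt_values)
r_squared = r ** 2  # the R² value labeled on a scatter plot
```

A negative `r` on real data would correspond to the negative correlation between Fopt and gene G+C content described in the abstract.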
Figure 3. Illustration of pathogenicity islands and metagenomic islands across Cellulosimicrobium sp. strain MM.
(a) Twelve MGIs depicted across the genome of Cellulosimicrobium sp. strain MM after mapping of metagenomic reads from biofilm at Manikaran hot springs. (b) Whole genome alignments. Rings from inside to outside: 1, Whole genome synteny plot of three Cellulosimicrobium genomes using 5 kb window size. Black, blue and red rings represent whole genome sequences for strain MM, LMG16121, and J36. BLASTN comparisons of strain MM with two reference genomes i.e. strains LMG16121 and J36: 2, Solid black ring represents the genome sequence of strain MM. 3, Green colored rings represent the location of 5 PAIs deciphered across strain MM. 4, Circular black line graph shows %G+C content of strain MM with regions highlighted for sudden variability (P-value < 0.05) across the extent of 5 PAIs. 5, Blue ring represents genome sequence of strain LMG16121. 6, Red ring represents genome sequence of strain J36. (c) The schematic representation for the annotation of 5 PAIs deciphered in the genome of Cellulosimicrobium sp. strain MM. The direction of the ORFs shows the gene orientation. A standard nomenclature was followed for each PAI belonging to strain MM as MM_CPAI1, MM_CPAI2, MM_CPAI3, MM_CPAI4, MM_CPAI5 where “MM” stands for the strain and “C” stands for the genus Cellulosimicrobium. Blue and red colored blocks represent non-virulent and virulent ORFs, respectively, as predicted by VirulentPred. Grey colored blocks represent hypothetical proteins. On MM_CPAI2, ORFs for Fic and VbhA following ‘selfish operon’ theory are highlighted.
General features of PAIs determined across three Cellulosimicrobium genomes.
| S.No. | PAI designation | Start | End | Size (kbp) | Codon usage bias | %G+C difference | Number of ORFs predicted | Annotated proteins | Hypothetical proteins | Metagenome recruitment |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. | MM_PAI1 | 278605 | 298609 | 20 | 0.17 | 0.063 | 15 | 9 | 4 | Yes |
| 2. | MM_PAI2 | 1279161 | 1304985 | 25.8 | 0.194 | 0.038 | 29 | 14 | 1 | Yes |
| 3. | MM_PAI3 | 2243392 | 2258060 | 14.7 | 0.21 | 0.057 | 11 | 9 | 4 | Yes |
| 4. | MM_PAI4 | 2720926 | 2723161 | 2.2 | 0.168 | 0.054 | 12 | 7 | 1 | No |
| 5. | MM_PAI5 | 2781105 | 2801106 | 20 | 0.124 | 0.048 | 28 | 7 | 0 | Yes |
| 1. | LMG_PAI1 | 2985001 | 3010000 | 25 | 0.281 | 0.072 | 14 | 13 | 0 | No |
| 2. | LMG_PAI2 | 3460001 | 3480000 | 20 | 0.198 | 0.039 | 18 | 14 | 7 | No |
| 3. | LMG_PAI3 | 3490001 | 3495000 | 5 | 0.203 | 0.035 | 6 | 6 | 4 | No |
| 4. | LMG_PAI4 | 3500001 | 3550000 | 50 | 0.216 | 0.046 | 48 | 42 | 15 | No |
| 5. | LMG_PAI5 | 4220001 | 4230000 | 10 | 0.23 | 0.028 | 8 | 8 | 3 | No |
| 1. | J36_PAI1 | 850001 | 890000 | 40 | 0.285 | 0.068 | 26 | 20 | 6 | Yes |
| 2. | J36_PAI2 | 915001 | 970000 | 55 | 0.233 | 0.037 | 57 | 41 | 18 | No |
| 3. | J36_PAI3 | 1445001 | 1460000 | 15 | 0.221 | 0.05 | 10 | 10 | 0 | No |
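The size column in the table above can be related to the start and end coordinates with a short sketch. It assumes the coordinates are 1-based and inclusive, which is not stated in the source; the helper name `island_size_kbp` is hypothetical.

```python
def island_size_kbp(start, end):
    """Island length in kbp, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start + 1) / 1000


# Start and end coordinates copied from the first three rows of the table.
islands = {
    "MM_PAI1": (278605, 298609),
    "MM_PAI2": (1279161, 1304985),
    "MM_PAI3": (2243392, 2258060),
}

# Rounded to one decimal place, as in the table's size column.
sizes = {
    name: round(island_size_kbp(start, end), 1)
    for name, (start, end) in islands.items()
}
```

Under that assumption the three computed values round to 20.0, 25.8 and 14.7 kbp, in line with the sizes listed for these islands.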
Figure 4. Maximum clade credibility tree summarizing the results of the Bayesian molecular clock analysis of (a) Fic protein, (b) D-alanyl-D-alanine carboxypeptidase, and (c) transposase. The protein sequences of these genes harbored by PAIs of different bacterial lineages were aligned by CLUSTALW and evolutionary rate estimation was performed using BEAST. The timeline indicates the age (mya, million years ago) of nodes. Values above the branches indicate posterior probability values and blue horizontal node bars show the length of the 95% highest posterior density (HPD) interval of node ages. The Cellulosimicrobium strains are labeled in red. The branch color gradient (red to black) and width are set according to the increasing substitution rate (r at 95% HPD interval), with black and increased thickness representing the higher substitution rate.
Annotation of hypothetical proteins deciphered on PAIs across three Cellulosimicrobium genomes using ACLAME database.
| S.No. | ACLAME Annotation | Origin | Host | MGE class | Coordinates |
|---|---|---|---|---|---|
| CPAI2 | | | | | |
| 1. | Putative MrcB penicillin binding protein B | Plasmid; pSymA | | 607 | 3768, 6348 |
| 2. | Hypothetical protein | Plasmid; pRHL1 | | 814 | 16606, 17598 |
| CPAI3 | | | | | |
| 3. | Putative outer membrane protein | Plasmid; pKPN3 | | 1959 | 7084, 8875 |
| 4. | Mobilization protein | Plasmid; pKJ50 | | 247 | 8912, 9427 |
| 5. | Orf15 | Viral peptides | | 328 | 11205, 11626 |
| 6. | Hypothetical protein | Plasmid | | 1145 | 12071, 13867 |
| CPAI4 | | | | | |
| 7. | Phage terminase | Prophage | | 2612 | 9596, 10915 |
| CPAI3 | | | | | |
| 8. | Hypothetical protein | Plasmid; pREL1 | | 773 | 1993, 2385 |
| 9. | Putative ATP/GTP-binding protein | Plasmid | | 656 | 5790, 7312 |
| 10. | Hypothetical protein | Plasmid | | 1812 | 9579, 10760 |
| 11. | Type I site-specific deoxyribonuclease, HsdR family | Plasmid; pPNAP05 | | 1820 | 10905, 11417 |
| 12. | Hypothetical protein | Plasmid; ColIb-P9 | | 515 | 11857, 12980 |
| 13. | Hypothetical protein | Plasmid; pBD2 | | 579 | 17843, 19254 |
| CPAI4 | | | | | |
| 14. | Hypothetical protein | Plasmid | | 656 | 247, 1968 |
| 15. | Conjugal transfer protein | Plasmid; pXF51 | | 686 | 2269, 3240 |
| CPAI5 | | | | | |
| 16. | Hypothetical protein | Plasmid; pREL1 | | 773 | 5006, 5892 |
| 17. | Site-specific recombinase for integration and excision | Viral peptides; phi-105 | | 329 | 7307, 7675 |
| 18. | Hypothetical protein | Plasmid; pREC1 | | 1126 | 8234, 8950 |
| 19. | Putative transcriptional regulator | Plasmid; pCM2 | | 1957 | 8984, 10622 |
| 20. | Integrase | Prophage | | 2619 | 11024, 11988 |
| 21. | Hypothetical protein | Plasmid; pMFLV02 | | 1958 | 12080, 12985 |
| 22. | Hypothetical protein | Plasmid; pREL1 | | 773 | 13074, 14654 |
| 23. | Hypothetical protein | Plasmid | | 786 | 16493, 16714 |
| 24. | Hypothetical protein | Plasmid; pRL11 | | 741 | 35361, 35924 |
| 25. | Hypothetical protein | Plasmid; pRL12 | | 779 | 35967, 36521 |
| 26. | Hypothetical protein | Plasmid; pSymA | Sinorhizobium | 607 | 41413, 41835 |
| CPAI6 | | | | | |
| 27. | Transfer gene complex protein-like protein | Plasmid; p103 | | 576 | 408, 1903 |
| 28. | Integral membrane protein, putative | Plasmid; TC2 | | 1812 | 1916, 2799 |
| 29. | Putative septum site-determining protein (MinD) | Plasmid; pBD2 | | 579 | 5987, 7340 |
| CPAI1 | | | | | |
| 30. | Hypothetical protein | Plasmid | | 715 | 11725, 11943 |
| 31. | Clp N terminal domain protein | Plasmid; pMFLV02 | | 1958 | 19589, 19837 |
| 32. | Type IV secretion/conjugal transfer ATPase | Plasmid | | 910 | 27425, 31462 |
| 33. | Hypothetical protein | Plasmid | | 714 | 35373, 35855 |
| CPAI2 | | | | | |
| 34. | Phage tail tape measure protein | Viral peptides; phiN315 | | 170 | 6168, 6698 |
| 35. | Putative alkylmercury lyase | Plasmid | | 752 | 6714, 7025 |
| 36. | Putative DNA primase/helicase | Plasmid; pSLA2-L | | 661 | 11012, 11650 |
| 37. | DNA primase catalytic core | Plasmid; pNOCA01 | | 1808 | 11656, 14345 |
| 38. | Hypothetical protein | Plasmid | | 786 | 14933, 15175 |
| 39. | TraF/VirB10-like protein | Plasmid; pF1947 | | 720 | 17652, 18719 |
| 40. | Transfer gene complex protein-like protein | Plasmid; p103 | | 576 | 25494, 26919 |
| 41. | Putative secreted protein | Plasmid; pCM2 | | 1957 | 27007, 27747 |
| 42. | Phage lambda-related host specificity protein J | Plasmid; pMT1 | | 1123 | 36826, 38202 |
| 43. | LtrC-like protein | Plasmid; pC4602-2 | | 1968 | 43611, 44552 |
| 44. | Hypothetical protein | Plasmid; pCC7120epsilon | | 495 | 44756, 45819 |
| 45. | Phage-related protein | Viral peptides | | 1466 | 52588, 53604 |
Figure 5. Recruitment plot showing binning of metagenomic reads from biofilm at Manikaran hot springs on pathogenicity islands (PAIs).
Each dot represents one read aligned onto a PAI: (a) MM_CPAI1, (b) MM_CPAI2, (c) MM_CPAI3 and (d) MM_CPAI5 of Cellulosimicrobium sp. strain MM, and (e) J36_CPAI2 of Cellulosimicrobium cellulans J36. The x and y axes represent sequence coordinates and sequence identity, respectively. Blue, red and gray blocks represent non-virulent proteins, virulent proteins (as predicted by VirulentPred) and hypothetical proteins, respectively.