Joseph L Sevigny, Derek Rothenheber, Krystalle Sharlyn Diaz, Ying Zhang, Kristin Agustsson, R Daniel Bergeron, W Kelley Thomas.
Abstract
BACKGROUND: Although high-throughput marker gene studies provide valuable insight into the diversity and relative abundance of taxa in microbial communities, they do not directly measure the functional capacity of those communities. There is growing interest in predicting the functional profiles of microbial communities from phylogenetic identifications inferred from marker genes, and tools have recently been developed to link the two. However, to date, no large-scale examination has quantified the correlation between marker-gene-based taxonomic identity and protein-coding gene conservation. Here we use 4872 representative prokaryotic genomes from NCBI to investigate the relationship between marker gene identity and shared protein-coding gene content.
Keywords: 16S rRNA; Amplicon; Comparative genomics; Functional capacity; Marker gene; Metabarcoding; Metagenomics
Year: 2019 PMID: 30947688 PMCID: PMC6449922 DOI: 10.1186/s12864-019-5641-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Workflow of data analysis. The workflow starts at the upper left box, “NCBI Representative Genome Database”, and follows two major tracks. The first leads to a comparison between bacterial genome orthology (determined by OrthoFinder) and marker gene sequence cluster groups (determined by UCLUST). The second uses the protein-coding gene sets to determine which functions are shared or unshared across the bacterial genomes. Arrows correspond to the movement of data through the pipeline
Fig. 2 Taxonomic classifications of NCBI’s RefSeq representative prokaryotic genomes. A KronaTool map representing the relative taxonomic breakdown of the genomes used in this study. The inner circle represents genomes at the domain level, the middle circle at the phylum level, and the outer circle at the class level
Fig. 3 Relationships between intra-organism 16S rRNA copy number and the percent identity across copies. A scatter bubble plot represents the relationship between 16S rRNA copy number and the percent identity between those copies. The circle size corresponds to the number of bacterial genomes with the same percent identity and copy number
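The per-genome quantity plotted in Fig. 3 can be illustrated with a minimal sketch. It assumes the 16S rRNA copies of one genome are already aligned to equal length and that identity is summarised as the mean over all copy pairs; the caption does not say how identity across copies was summarised, so both choices, the function names, and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned sequences.

    Assumption: sequences are already aligned and of equal length;
    gap characters simply count as mismatches.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_pairwise_identity(copies):
    """Mean percent identity over all pairs of copies in one genome."""
    pairs = list(combinations(copies, 2))
    if not pairs:
        # A single-copy genome is trivially identical to itself.
        return 100.0
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy "copies" (real 16S rRNA genes are roughly 1500 bp).
toy_copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
copy_number = len(toy_copies)
identity = mean_pairwise_identity(toy_copies)
print(copy_number, round(identity, 2))
```

Each genome would contribute one (copy number, identity) point, and the bubble size would then be the count of genomes sharing that point.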
Fig. 4 16S rRNA clustering statistics. a The relationship between the number of 16S rRNA clustering groups and the number of bacterial genomes represented in each cluster at various percent identity thresholds. b Taxonomic resolution (genus level) based on clustered marker genes for each of the three amplicon datasets. c and d The percentage of genomes whose 16S rRNA genes clustered into one, two, or greater than two different clustering groups for the 16S rRNA and V4 16S rRNA datasets respectively
Fig. 5 Phylogenetic marker(s) and single-copy ortholog(s) relationship to shared gene content. Shown are box and whisker plots depicting the percentage of shared genes between genomes clustered at various percent identity intervals: (a) 16S rRNA, (b) V4 16S rRNA, (c) Five-concatenated MLSA orthologs. Boxplots show the first and third quartile (bottom and top lines of the box), the median (middle line of the box), and the smallest and largest data-points excluding outliers (bottom and top whiskers). Data-points outside the whiskers correspond to outliers
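The boxplot elements described in the Fig. 5 caption can be computed as in the sketch below. It assumes the common Tukey rule (points beyond 1.5 times the interquartile range from the box are outliers); the caption does not state which outlier rule was used, and the input values are hypothetical, not data from the study.

```python
import statistics


def box_summary(values):
    """Return quartiles, whisker ends, and outliers for a list of numbers.

    Assumption: whiskers follow Tukey's 1.5 * IQR rule, which the
    caption does not confirm.
    """
    data = sorted(values)
    # method="inclusive" interpolates quartiles within the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points that are not outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical shared-gene percentages for one percent identity interval.
example = [10.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0]
summary = box_summary(example)
print(summary)
```

Plotting libraries differ in their default quartile interpolation, so the box edges of a published figure may not match this sketch exactly.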
Fig. 6 Relationship between 99% similar V4 16S rRNA and shared gene content across select microbial lineages. Violin plots representing the distribution of phylogenetically identical organisms (99% V4 16S rRNA) across select microbial lineages and the percentage of shared gene content. The dotted black line corresponds to the mean shared gene content of the entire dataset and the width of the violin represents the relative concentration of data (i.e. wider regions contain more data points)
Significant shared and unshared gene ontology terms between phylogenetically identical organisms (99% V4 16S rRNA)
| Ontology | GO.ID | Term | Count-shared | Count-unshared | p-value |
|---|---|---|---|---|---|
| Molecular Function | GO:0004803 | transposase activity | 4591 | 8641 | < 1e-30 |
| Molecular Function | GO:0003964 | RNA-directed DNA polymerase … | 165 | 288 | < 1e-30 |
| Molecular Function | GO:0097351 | toxin-antitoxin pair type II bind … | 72 | 274 | < 1e-30 |
| Molecular Function | GO:0090729 | toxin activity | 357 | 915 | < 1e-30 |
| Molecular Function | GO:0009036 | type II site-specific deoxyribon … | 24 | 180 | < 1e-30 |
| Molecular Function | GO:0019843 | rRNA binding | 42,179 | 808 | < 1e-30 |
| Molecular Function | GO:0046872 | metal ion binding | 124,123 | 28,675 | < 1e-30 |
| Molecular Function | GO:0003735 | structural constituent of ribos … | 63,194 | 2123 | < 1e-30 |
| Molecular Function | GO:0003723 | RNA binding | 32,770 | 4032 | < 1e-30 |
| Molecular Function | GO:0000287 | magnesium ion binding | 50,000 | 8454 | < 1e-30 |
| Biological Function | GO:0032196 | transposition | 1435 | 1887 | < 1e-30 |
| Biological Function | GO:0045927 | positive regulation of growth | 69 | 327 | < 1e-30 |
| Biological Function | GO:0045926 | negative regulation of growth | 86 | 338 | < 1e-30 |
| Biological Function | GO:0051607 | defense response to virus | 235 | 756 | < 1e-30 |
| Biological Function | GO:0043571 | maintenance of CRISPR repeat … | 162 | 560 | < 1e-30 |
| Biological Function | GO:0006412 | translation | 70,775 | 2978 | < 1e-30 |
| Biological Function | GO:0071555 | cell wall organization | 18,821 | 1788 | < 1e-30 |
| Biological Function | GO:0006457 | protein folding | 11,000 | 826 | < 1e-30 |
| Biological Function | GO:0009252 | peptidoglycan biosynthetic proc. … | 16,336 | 977 | < 1e-30 |
| Biological Function | GO:0008360 | regulation of cell shape | 17,552 | 1049 | < 1e-30 |
| Cellular Component | GO:0012506 | vesicle membrane | 143 | 220 | < 1e-30 |
| Cellular Component | GO:0009341 | beta-galactosidase complex | 487 | 567 | < 1e-30 |
| Cellular Component | GO:0031469 | polyhedral organelle | 37 | 149 | < 1e-30 |
| Cellular Component | GO:0008305 | integrin complex | 38 | 83 | < 1e-30 |
| Cellular Component | GO:0030077 | plasma membrane light-harv … | 68 | 147 | < 1e-30 |
| Cellular Component | GO:0015934 | large ribosomal subunit | 8067 | 252 | < 1e-30 |
| Cellular Component | GO:0005623 | cell | 15,927 | 4518 | < 1e-30 |
| Cellular Component | GO:0005886 | plasma membrane | 157,863 | 45,460 | < 1e-30 |
| Cellular Component | GO:0015935 | small ribosomal subunit | 7833 | 98 | < 1e-30 |
| Cellular Component | GO:0005737 | cytoplasm | 248,487 | 26,478 | < 1e-30 |
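One way to read the counts in this table is to convert each row into the fraction of a term's annotations that fall in the unshared set. The sketch below does only that arithmetic, using four rows copied from the table; it is not the statistical test behind the reported significance values, which this record does not describe, and the function name is hypothetical.

```python
def unshared_fraction(count_shared, count_unshared):
    """Fraction of a GO term's annotations that fall in the unshared set."""
    total = count_shared + count_unshared
    if total == 0:
        raise ValueError("term has no annotations")
    return count_unshared / total


# (Count-shared, Count-unshared) pairs copied from the table above.
counts = {
    "GO:0004803 transposase activity": (4591, 8641),
    "GO:0051607 defense response to virus": (235, 756),
    "GO:0019843 rRNA binding": (42179, 808),
    "GO:0006412 translation": (70775, 2978),
}

fractions = {
    term: unshared_fraction(shared, unshared)
    for term, (shared, unshared) in counts.items()
}

# List terms from most to least concentrated in the unshared set.
for term, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {frac:.3f}")
```

On these rows, the mobile-element and defense terms sit mostly in the unshared set, while the ribosome and translation terms sit almost entirely in the shared set.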