Gregory Ditzler, J Calvin Morrison, Yemin Lan, Gail L Rosen.
Abstract
BACKGROUND: Some current software tools for comparative metagenomics let ecologists investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a sub-field of machine learning, can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to identify the protein family abundances that best discriminate between age groups in the human gut microbiome.
Year: 2015 PMID: 26538306 PMCID: PMC4634798 DOI: 10.1186/s12859-015-0793-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pseudocode for selecting features with a greedy search algorithm that attempts to maximize
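The caption above is cut off before it names the objective being maximized, so the following is only a minimal sketch of greedy forward selection, assuming a JMI-style score (the sum, over already-selected features, of the mutual information between a candidate-plus-selected feature pair and the label). The function names `mutual_information` and `greedy_jmi`, the plug-in estimator, and the discrete toy data are illustrative assumptions, not Fizzy's implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def greedy_jmi(features, labels, k):
    """Greedily pick up to k feature indices (hypothetical helper).

    `features` is a list of discrete-valued columns, one per feature.
    """
    remaining = set(range(len(features)))
    # Seed the search with the single most informative feature.
    first = max(sorted(remaining),
                key=lambda j: mutual_information(features[j], labels))
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < k:
        def score(j):
            # Assumed JMI-style score: sum over selected s of I((X_j, X_s); Y).
            return sum(
                mutual_information(list(zip(features[j], features[s])), labels)
                for s in selected
            )
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: three binary features, eight samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    [0, 0, 0, 1, 1, 1, 1, 1],  # partially informative
]
selected = greedy_jmi(features, labels, k=2)
```

Each step evaluates every remaining candidate against the current selection, so the cost grows with both the number of features and the number selected; real OTU counts would also need to be discretized before a plug-in estimate like this is meaningful.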
List of the top ranking features for omnivores and vegetarians in the 16S data collected from the American Gut Project detected using JMI within Fizzy
| (Feature rank) | Operational taxonomic unit classification | (OTU ID) |
|---|---|---|
| (F1) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae | (GGID4329132) |
| (F2) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae | (GGID185584) |
| (F3) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID177150) |
| (F4) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID197367) |
| (F5) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID199716) |
| (F6) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID188887) |
| (F7) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID312140) |
| (F8) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID4401110) |
| (F9) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID198449) |
| (F10) | Firmicutes, Bacilli, Bacillales, Paenibacillaceae, Paenibacillus | (GGID4470837) |
| (F11) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae, Faecalibacterium prausnitzii | (GGID359314) |
| (F12) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID2859978) |
| (F13) | Firmicutes, Clostridia, Clostridiales | (GGID197832) |
| (F14) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID205904) |
| (F15) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae, Faecalibacterium prausnitzii | (GGID520413) |
The number following “F” indicates the order in which Fizzy selected the OTU, and “GGID” gives the Greengenes OTU ID from the taxonomic classification
List of the top ranking features for omnivores and vegetarians in the 16S data collected from the American Gut Project detected using NPFS-JMI
| (Feature rank) | Operational taxonomic unit classification | (OTU ID) |
|---|---|---|
| (F1) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae, Shuttleworthia | (GGID4424924) |
| (F2) | Cyanobacteria, Oscillatoriophycideae, Chroococcales, Xenococcaceae, Chroococcidiopsis | (GGID649518) |
| (F3) | Proteobacteria, Betaproteobacteria, Gallionellales, Gallionellaceae, Gallionella | (GGID3239358) |
| (F4) | Firmicutes, Clostridia, Clostridiales | (GGID176062) |
| (F5) | Firmicutes, Bacilli, Gemellales, Gemellaceae | (GGID967433) |
| (F6) | Firmicutes, Erysipelotrichi, Erysipelotrichales, Erysipelotrichaceae, Erysipelothrix | (GGID4478325) |
| (F7) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae | (GGID183576) |
| (F8) | Firmicutes, Clostridia, Clostridiales, Clostridiaceae, Clostridium | (GGID174688) |
| (F9) | Firmicutes, Clostridia, Clostridiales, Clostridiaceae | (GGID1137375) |
| (F10) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae, Blautia | (GGID305997) |
| (F11) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae | (GGID288682) |
| (F12) | Proteobacteria, Gammaproteobacteria, Pasteurellales, Pasteurellaceae, Haemophilus | (GGID995893) |
| (F13) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID4450198) |
| (F14) | Firmicutes, Clostridia, Clostridiales | (GGID267502) |
| (F15) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID531722) |
The number following “F” indicates the order in which NPFS selected the OTU, and “GGID” gives the Greengenes OTU ID from the taxonomic classification
List of the top ranking features for omnivores and vegetarians in the 16S data collected from the American Gut Project detected using Random Forests
| (Feature rank) | Operational taxonomic unit classification | (OTU ID) |
|---|---|---|
| (F1) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides ovatus | (GGID180606) |
| (F2) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides fragilis | (GGID4386507) |
| (F3) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae, Roseburia | (GGID4335815) |
| (F4) | Actinobacteria, Actinobacteria, Actinomycetales, Corynebacteriaceae, Corynebacterium simulans | (GGID912997) |
| (F5) | Bacteroidetes, Bacteroidia, Bacteroidales, Rikenellaceae | (GGID175375) |
| (F6) | Firmicutes, Clostridia, Clostridiales, Lachnospiraceae | (GGID194112) |
| (F7) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae | (GGID189924) |
| (F8) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID1105984) |
| (F9) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID197367) |
| (F10) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae | (GGID174818) |
| (F11) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae | (GGID4324040) |
| (F12) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae | (GGID197204) |
| (F13) | Bacteroidetes, Bacteroidia, Bacteroidales, Bacteroidaceae, Bacteroides | (GGID1944498) |
| (F14) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae | (GGID196307) |
| (F15) | Firmicutes, Clostridia, Clostridiales, Ruminococcaceae, Ruminococcus flavefaciens | (GGID1122673) |
The number following “F” indicates the order in which the Random Forest selected the OTU, and “GGID” gives the Greengenes OTU ID from the taxonomic classification
List of the largest differences in abundance between omnivores and vegetarians in the 16S data collected from the American Gut Project using LEfSe. Note that LEfSe does not return the Greengenes IDs
| Operational taxonomic unit classification |
|---|
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Actinomycetaceae, Actinobaculum |
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Micrococcaceae, Kocuria, rhizophila |
| Bacteria, Proteobacteria, Gammaproteobacteria, Xanthomonadales, Xanthomonadaceae, Dyella |
| Archaea, Euryarchaeota, Methanomicrobia, Methanosarcinales |
| Bacteria, Proteobacteria, Alphaproteobacteria, Rhizobiales, Bradyrhizobiaceae, Bradyrhizobium |
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Mycobacteriaceae, Mycobacterium, celatum |
| Bacteria, Actinobacteria, Actinobacteria, Bifidobacteriales, Bifidobacteriaceae, Alloscardovia |
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Mycobacteriaceae |
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Actinomycetaceae, Actinomyces, europaeus |
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Micromonosporaceae |
| Bacteria, Proteobacteria, Betaproteobacteria, Burkholderiales, Comamonadaceae, Paucibacter |
| Bacteria, Firmicutes, Bacilli, Bacillales, Bacillaceae, Bacillus, coagulans |
| Bacteria, Firmicutes, Bacilli, Bacillales, Bacillaceae, Bacillus, humi |
| Archaea, Euryarchaeota, Methanomicrobia, Methanosarcinales, Methanosarcinaceae, Methanosarcina, mazei |
| Archaea, Euryarchaeota, Methanomicrobia |
| Archaea, Euryarchaeota, Methanomicrobia, Methanosarcinales, Methanosarcinaceae |
| Bacteria, Bacteroidetes, Flavobacteriia, Flavobacteriales, Flavobacteriaceae, Capnocytophaga |
| Bacteria, Proteobacteria, Alphaproteobacteria, Rhodospirillales, Acetobacteraceae, Acetobacter |
| Bacteria, Actinobacteria, Actinobacteria, Actinomycetales, Nocardioidaceae, Nocardioides |
Fig. 2 Joint Mutual Information (JMI) was configured to select 500 features from the 25k+ OTUs in the American Gut Project’s fecal samples. The diet of the sample is the dependent variable. The selected Greengenes (GG) OTUs are sorted by the absolute difference between the omnivores and vegetarians. The numerical values on the x-axis that correspond to an OTU can be found in the text
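The sorting step described in this caption can be sketched as follows. The caption does not say which quantity is differenced, so the sketch assumes the difference in mean abundance between the two diet groups; the function name `rank_by_group_difference`, the group labels, and the toy counts are hypothetical.

```python
def rank_by_group_difference(abundance, diets,
                             group_a="omnivore", group_b="vegetarian"):
    """Sort OTU IDs by |mean(group_a) - mean(group_b)|, largest first.

    `abundance` maps an OTU ID to one abundance value per sample, and
    `diets` holds the diet label of each sample in the same order.
    Assumption: the compared quantity is the per-group mean abundance.
    """
    def mean(values):
        return sum(values) / len(values)

    diffs = {}
    for otu, counts in abundance.items():
        a = [c for c, d in zip(counts, diets) if d == group_a]
        b = [c for c, d in zip(counts, diets) if d == group_b]
        diffs[otu] = abs(mean(a) - mean(b))
    order = sorted(diffs, key=diffs.get, reverse=True)
    return order, diffs


# Toy usage with made-up OTU names and counts.
diets = ["omnivore", "omnivore", "vegetarian", "vegetarian"]
abundance = {
    "otu_a": [1, 3, 2, 2],
    "otu_b": [5, 5, 1, 1],
    "otu_c": [0, 1, 2, 3],
}
order, diffs = rank_by_group_difference(abundance, diets)
```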
List of the top five ranked Pfams as selected by Fizzy’s Mutual Information Maximization (MIM) applied to MetaHIT
| Rank | IBD features | Obese features |
|---|---|---|
| 1 | ABC transporter (PF00005) | ABC transporter (PF00005) |
| 2 | Phage integrase family (PF00589) | MatE (PF01554) |
| 3 | Glycosyl transferase family 2 (PF00535) | TonB dependent receptor (PF00593) |
| 4 | Acetyltransferase (GNAT) family (PF00583) | Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase (PF02518) |
| 5 | Helix-turn-helix (PF01381) | Response regulator receiver domain (PF00072) |
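MIM ranks each feature independently by its mutual information with the class label and keeps the top-scoring ones. A minimal sketch of that ranking, assuming discretized features and a plug-in estimator, is given below; `mutual_information`, `mim_top_k`, and the toy feature names are illustrative and do not reflect Fizzy's actual interface.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mim_top_k(features, labels, k):
    """Return the k feature names with the highest I(X; Y) (hypothetical helper).

    `features` maps a feature name to its discrete values, one per sample.
    """
    scores = {name: mutual_information(col, labels)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage with made-up feature names.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "pf_a": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "pf_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "pf_c": [0, 0, 0, 1, 1, 1, 1, 1],  # partially informative
}
top = mim_top_k(features, labels, k=2)
```

Because every feature is scored on its own, MIM ignores redundancy between features, which is one reason the same Pfam (for example, an ABC transporter) can rank first for more than one phenotype.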
Fig. 3 Number of features selected by JMI, mRMR, MIM, Lasso, NPFS, and Random Forests as a function of the evaluation time