Lei Chen1,2, Yu-Hang Zhang3, Tao Huang3, Yu-Dong Cai1.
Abstract
The gut microbiome is shaped and modified by the polymorphisms of microorganisms in the intestinal tract. Its composition shows strong individual specificity and may play a crucial role in the human digestive system and metabolism. Several factors can affect the composition of the gut microbiome, such as eating habits, living environment, and antibiotic usage. Thus, different races are characterized by distinct gut microbiome characteristics. In the present study, we examined the gut microbiomes of three different races, including individuals of Asian, European and American races. The gut microbiome and the expression levels of gut microbiome genes were analyzed in these individuals. Advanced feature selection methods (minimum redundancy maximum relevance and incremental feature selection) and four machine-learning algorithms (random forest, nearest neighbor algorithm, sequential minimal optimization, Dagging) were employed to capture key differentially expressed genes. As a result, sequential minimal optimization was found to yield the best performance using 454 genes, which could effectively distinguish the gut microbiomes of different races. Our analyses of the extracted genes support the widely accepted hypotheses that eating habits, living environments and metabolic levels in different races can influence the characteristics of the gut microbiome.
Year: 2016 PMID: 26975620 PMCID: PMC4791684 DOI: 10.1038/srep23075
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Four IFS curves show the results of the four prediction engines obtained using the IFS method.
The X-axis represents the number of features used for classification, whereas the Y-axis represents the MCC. The SMO prediction engine produces the best performance.
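The MCC plotted on the Y-axis can be computed for a three-class problem as sketched below. This is a minimal sketch that assumes the standard multiclass generalization of the Matthews correlation coefficient; the function name `multiclass_mcc` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (assumed generalization)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly predicted samples
    t_counts = Counter(y_true)  # how often each class truly occurs
    p_counts = Counter(y_pred)  # how often each class is predicted
    classes = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    var_true = s * s - sum(v * v for v in t_counts.values())
    var_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(var_true * var_pred)
    # By convention, return 0 when one side has only a single class.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels: one American sample is misclassified as Asian.
y_true = ["American", "American", "Asian", "Asian", "European", "European"]
y_pred = ["American", "Asian", "Asian", "Asian", "European", "European"]
print(round(multiclass_mcc(y_true, y_pred), 3))
```

A perfect prediction gives an MCC of 1, and a single misclassification lowers it more sharply than it lowers plain accuracy, which is why MCC is a stricter summary than overall accuracy.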
The best performance of four prediction engines.
| Prediction engine | Number of features | Accuracy for American race | Accuracy for Asian race | Accuracy for European race | Overall prediction accuracy | MCC |
|---|---|---|---|---|---|---|
| Random forest | 890 | 0.986 | 0.989 | 0.999 | 0.994 | 0.990 |
| Nearest neighbor algorithm | 80 | 0.964 | 0.981 | 0.991 | 0.985 | 0.972 |
| SMO | 454 | 0.986 | 0.992 | 1.000 | 0.996 | 0.993 |
| Dagging | 956 | 0.964 | 0.978 | 1.000 | 0.990 | 0.981 |
Figure 2. The IFS curve of SMO for X-axis values between 4 and 100.
The X-axis represents the number of features used for classification, whereas the Y-axis represents the MCC. The first 25 features in the mRMR feature list yielded an MCC greater than 0.960.
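The way an IFS curve of this kind is produced can be sketched as follows: each prefix of the ranked feature list is scored, and the prefix with the best score is kept. This is an illustrative sketch only. The helper names (`incremental_feature_selection`, `loo_1nn_accuracy`), the toy data, and the use of leave-one-out nearest-neighbor accuracy as a stand-in for the MCC are all assumptions made for this example.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score every prefix of a ranked feature list.

    Returns the full curve as (k, score) pairs, plus the smallest k
    that reaches the best score and that best score.
    """
    curve = []
    for k in range(1, len(ranked_features) + 1):
        curve.append((k, evaluate(ranked_features[:k])))
    best_k, best_score = max(curve, key=lambda point: point[1])
    return curve, best_k, best_score


def loo_1nn_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (stand-in score)."""
    correct = 0
    for i, x in enumerate(samples):
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((x[f] - samples[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(samples)


# Hypothetical toy data: feature 0 separates one class, feature 1 separates
# the other two, and feature 2 is noise.
samples = [
    (0.0, 0.0, 0.0), (0.1, 0.1, 3.0),   # American
    (1.0, 0.0, 3.1), (1.1, 0.1, 0.1),   # Asian
    (1.0, 1.0, 0.2), (1.1, 1.1, 2.9),   # European
]
labels = ["American", "American", "Asian", "Asian", "European", "European"]
ranked = [0, 1, 2]  # assumed to come from a ranking step such as mRMR

curve, best_k, best_score = incremental_feature_selection(
    ranked, lambda feats: loo_1nn_accuracy(samples, labels, feats)
)
print(curve, best_k, best_score)
```

In this toy setting the score peaks once the two informative features are included and drops when the noise feature is added, which is the general shape an IFS curve is meant to reveal.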
Genes that are the most important for distinguishing the gut microbiomes of different races.
| Gene ID | Gene symbol | Gene name | Species Annotation (Phylum Level) | Species Annotation (Genus Level) | Gene Function | MI value | Rank in the mRMR feature list |
|---|---|---|---|---|---|---|---|
| 5098122 | ppsA | O2.UC22-1_GL0086706 | Firmicutes | unknown | Carbon metabolism, methane metabolism, pyruvate metabolism | 0.328 | 1 |
| 2388211 | — | N032A_GL0031359 | Bacteroidetes | Bacteroides | Intracellular trafficking, secretion, and vesicular transport | 0.256 | 3 |
| 5465406 | TVAG_129840 | 763982056-stool2_revised_scaffold11317_1_gene87646 | Bacteroidetes | Bacteroides | Intracellular trafficking, secretion, and vesicular transport | 0.253 | 7 |
| 3974863 | N-CoR1 | NLM004_GL0000416 | Bacteroidetes | unknown | Transcription associated processes, immune associated processes | 0.252 | 11 |
| 6917923 | ffh | N013A_GL0070806 | Bacteroidetes | Bacteroides | Bacterial secretion, protein export processes | 0.238 | 20 |
| 8616426 | jcdB | 508703490-stool1_revised_scaffold18525_1_gene60248 | Bacteroidetes | Bacteroides | Function unknown | 0.238 | 15 |
| 5528883 | — | MH0053_GL0075770 | Bacteroidetes | Bacteroides | Intracellular trafficking, secretion, and vesicular transport | 0.236 | 22 |
| 7263275 | ahpC | T2D-120A_GL0064788 | Bacteroidetes | Bacteroides | Transcription associated processes | 0.236 | 16 |
| 7647738 | — | V1.UC49-0_GL0182914 | Bacteroidetes | Bacteroides | Function unknown | 0.234 | 5 |
| 6373942 | hemE | V1.UC35-4_GL0001594 | Firmicutes | unknown | Amino acid transport and metabolism, uroporphyrinogen metabolism | 0.229 | 2 |
| 5165148 | Gura_R0049 | T2D-132A_GL0073376 | Bacteroidetes | Bacteroides | Translation, ribosomal structure and biogenesis, aminoacyl-tRNA biosynthesis | 0.224 | 9 |
| 6809955 | SSPA0672 | 764143897-stool2_revised_C578533_1_gene62401 | Bacteroidetes | Bacteroides | Transferring nitrogenous groups | 0.216 | 19 |
| 8881455 | Snas_0276 | 508703490-stool1_revised_scaffold1234_2_gene24275 | Bacteroidetes | Bacteroides | Thioesterase associated metabolism processes | 0.214 | 13 |
| 8062488 | SORBIDRAFT_01g014353 | V1.CD12-3_GL0107538 | Bacteroidetes | Alistipes | Function unknown | 0.195 | 24 |
| 5689732 | AZC_1524 | MH0135_GL0082377 | Firmicutes | unknown | General function prediction only | 0.185 | 6 |
| 6663602 | BURPSS13_K0148 | V1.FI03_GL0058372 | Firmicutes | unknown | Amino acid transport and metabolism | 0.184 | 14 |
| 3417347 | PB400914.00.0 | MH0346_GL0140692 | unknown | unknown | Secondary metabolites biosynthesis, transport and catabolism (General function prediction only) | 0.173 | 18 |
| 8949875 | EAM_2103 | DLF001_GL0011563 | Firmicutes | Streptococcus | Function unknown | 0.169 | 10 |
| 2539602 | erg6 | MH0444_GL0048102 | Bacteroidetes | Bacteroides | Transcription, steroid biosynthesis, ergocalciferol biosynthesis | 0.169 | 4 |
| 4483132 | Mmc1_3137 | MH0311_GL0031693 | Firmicutes | Roseburia | Porphyrin and chlorophyll metabolism | 0.162 | 21 |
| 3901267 | CYB_1400 | MH0150_GL0048296 | Firmicutes | unknown | Function unknown | 0.158 | 12 |
| 6425695 | RP11-20H2 | V1.CD54-0_GL0116394 | Bacteroidetes | Alistipes | Function unknown | 0.158 | 8 |
| 4493132 | gcvT | V1.UC4-5_GL0212543 | Bacteroidetes | Bacteroides | Signal transduction mechanisms, carbon metabolism, glyoxylate and dicarboxylate metabolism | 0.156 | 17 |
| 9244908 | Ndas_1062 | DLF004_GL0039505 | unknown | unknown | Function unknown | 0.146 | 23 |
| 1065492 | SPs0355 | MH0077_GL0010885 | Bacteroidetes | Bacteroides | Amino sugar and nucleotide sugar metabolism, fructose and mannose metabolism, phosphotransferase system (PTS) associated biological processes | 0.146 | 25 |
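In the table above, the rank in the mRMR feature list does not follow the MI values in strict descending order. That is expected for a method that rewards relevance to the class label while penalizing redundancy with features already chosen. The greedy sketch below illustrates the idea, assuming the difference form of the mRMR criterion (relevance minus mean redundancy); the exact variant used in the study is not stated here, and the function names and toy features `g1`, `g2`, `g3` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (nxy / n) * log2(nxy * n / (px[x] * py[y]))
        for (x, y), nxy in pxy.items()
    )


def mrmr_rank(features, target):
    """Greedy mRMR ranking (assumed difference form).

    features: dict mapping a feature name to its list of discrete values.
    Returns the ranked names and each feature's MI with the target.
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected, relevance


# Hypothetical toy data: g2 is a noisy copy of g1, g3 carries new information.
target = ["American"] * 4 + ["Asian"] * 4 + ["European"] * 4
features = {
    "g1": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "g2": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "g3": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
}
order, relevance = mrmr_rank(features, target)
print(order)
print({name: round(mi, 3) for name, mi in relevance.items()})
```

In this toy example `g3` is ranked ahead of `g2` even though `g2` has the higher MI with the class label, because almost everything `g2` says is already covered by `g1`.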
Figure 3. The relationships among 25 important genes and three different races.
The three circles represent the sets of genes that are specific to the American, Asian and European races, respectively. Genes in the overlapping regions are specific to more than one race. A total of ten genes are specific to only one race, eight genes are specific to two races, and seven genes are shared by all three races analyzed.
Figure 4. A heat map showing the expression levels, in each race, of the seven genes shared by all three races.