Toshio Kobayashi, Akira Andoh.
Abstract
The human intestinal microbiota is closely related to health maintenance and to the causes of disease, and a vast number of scientific papers on this topic have been published recently. Some progress has been made in identifying the causes of disease or the related microbial species, and successful results of data mining are reviewed here. Humans who are targets of a disease have their own individual characteristics, which introduce various types of noise arising from individual lifestyle and history. The quantitatively dominant bacterial species are not always closely connected with a target disease. Instead of conventional simple statistical comparisons, the Gini-coefficient (i.e., an evaluation of the uniformity of a group) was applied here to minimize the effects of various types of noise in the data. A series of results is reviewed comparatively for normal daily life, disease and technical aspects of data mining. Some representative cases (i.e., heavy smokers, Crohn's disease, coronary artery disease and prediction accuracy of diagnosis) are discussed in detail. In conclusion, data mining is useful for general diagnostic applications with reasonable cost and reproducibility.
Keywords: data mining; decision tree; intestinal microbiota; operational taxonomic unit; terminal restriction fragment length polymorphism
Year: 2018 PMID: 29610551 PMCID: PMC5874238 DOI: 10.3164/jcbn.17-84
Source DB: PubMed Journal: J Clin Biochem Nutr ISSN: 0912-0009 Impact factor: 3.114
Singh RK et al. “Effects of protein on gut microbiota”
| Microbial diversity | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Animal protein | ↑ | ↑↓ | ↑↓ | ↑ | ↑ | ↑ | ↓ | ↑↓ | |
| Whey protein extract | ↑ | ↑ | ↑ | ↓ | ↓ | ||||
| Pea protein extract | ↑ | ↑ | ↑ |
Arrow thickness corresponds to relative number of studies supporting the relationship; Reprinted with permission from Ref (1).
List of the successful results by data-mining analysis
| Target | Number of subjects | T-RFLP or NGSA | Results of DM-analyses | Most related OTU | References | |||
|---|---|---|---|---|---|---|---|---|
| Target | Ref | Total | ||||||
| Daily life | ||||||||
| Ages | M, 92 | — | M, 92 | HA323 | ||||
| Over a hundred years old | F | B505 | ||||||
| Residential areas | 40 + 30 + 35 + 16 | (4-NP) | 121 | Hh32 | ||||
| BMI/obese, Lean | M, 92 | — | M, 92 | A95 | ||||
| 10 - 10 | — | 20 | NGSA | |||||
| Disease | ||||||||
| Smoking habit | M, 16 | M,76 | M, 92 | HA291 | ||||
| Nicotine-gum | 10 | 10 | 20 | B517 | ||||
| Smoking cessation period | M, 35 | M, 57 | M, 92 | M216 | ||||
| Drinking habits | M, 45 | M, 47 | M, 92 | A47 | ||||
| Crohn’s disease | 66 + 51 + 43 (3-NP) | 121 | 281 | Hh93 | ||||
| Coronary artery disease | 39 | 30 | 69 | B853 | ||||
| Sarcopenia, loss of grip/muscle mass | M | NGSA | ||||||
| F | NGSA | |||||||
| Sarcopenia (by amino acid composition) | M | 21 kinds of amino acids | α-aminobutyric acid | |||||
| F | 21 kinds of amino acids | α-aminobutyric acid | ||||||
| Diabetes | M, 8 + 9 (2-NP) | M, 19 | M, 36 | B749 | ||||
| F, 7 + 6 (2-NP) | F, 12 | F, 25 | B366 | |||||
| Technical aspects of DM-analyses | ||||||||
| Restriction enzymes | Operating examples were Smoking, Drinking habits and Ages. | — | ||||||
| Prediction accuracies | Operating examples were Smoking and Drinking habits. | — | ||||||
| M, 92 | — | M, 92 | ||||||
| Personal identification | M, 92 | — | M, 92 | — | ||||
Restriction enzymes: BslI, 516f-BslI; HaeIII, 516f-HaeIII; MspI, 27f-MspI; AluI, 27f-AluI; QHhaI, 35f-HhaI; QMspI, 35f-MspI; and QAluI, 35f-AluI. DM, data mining; Ref, reference; NGSA, 16S rRNA sequencing amplicon analysis; M, male; F, female; NP, nominal partition. $1: the same subjects before and after using nicotine gum for a month (Akira Andoh, Shiga Univ. of Medical Science, personal communication). $2: amino acid composition of fasting blood collected early in the morning was analyzed instead of the intestinal microbiota in feces. $3: successful research and analyses are currently in progress. &: DM results were good, with no classification errors in a decision tree of 5 steps or fewer; #: few errors in a decision tree of 5 steps. The grey cells are described in further detail.
Fig. 1 Smoking habit: Dt obtained by DM with 92 healthy men. Each square is called a ‘node’. The leftmost node is called the ‘Root-node’ and is the starting point of tree construction. The Dt grows toward the right. Labels such as HA291 indicate the dividing OTU, and the numerical dividing point is shown for each. Each node shows the composition of its subjects. A: non-smokers, B: smokers. Reprinted with permission from Ref (15).
Fig. 2 Dt results of ‘Crohn’s disease’. The HhaI 93-bp OTU and the MspI 208-bp OTU are abbreviated as Hh93 and M208. The cut-off values are also calculated by the Gini-coefficient. The details of the decision tree and the pathway indicate the species and quantities of OTUs. Reprinted with permission from Ref (19).
Fig. 3 Dt results of ‘Coronary artery disease’ with BslI. CAD, coronary artery disease; Ctrl, control. Dark areas in the circle charts represent CAD. The cut-off value of each dividing step is calculated by optimization of the Gini-coefficient. Reprinted with permission from Ref (24).
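The figure captions above state that each cut-off value is obtained by optimizing the Gini-coefficient. The following is a minimal sketch of how such a cut-off could be found for a single OTU, assuming the CART-style Gini impurity (1 minus the sum of squared class proportions) as the measure of group uniformity. The function names, the sample abundances and the class labels are hypothetical illustrations, not values or code from the reviewed studies.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a group: 0.0 means every subject has the same label."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_cutoff(values, labels):
    """Scan candidate cut-offs for one OTU and return (cutoff, weighted impurity).

    Candidates are midpoints between consecutive distinct sorted values.
    Returns (None, parent impurity) if no split improves on the unsplit group.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_impurity(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut-off can separate identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        if weighted < best[1]:
            best = ((low + high) / 2.0, weighted)
    return best


# Hypothetical relative abundances of one OTU; A = non-smoker, B = smoker.
abundance = [0.5, 1.0, 1.5, 4.0, 5.0, 6.0]
habit = ["A", "A", "A", "B", "B", "B"]
cutoff, impurity = best_cutoff(abundance, habit)
print(cutoff, impurity)
```

A full decision tree would repeat this search over every OTU at each node and recurse into the two resulting groups; that part is omitted here.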
Comparison of prediction accuracies for specific examples
Evaluation of the Dt constructed with 7 kinds of R.Enz and their combinations for smoking and drinking habits. The row ‘N of false identifications in 92 records’ shows the accuracy of identification. R.Enz, restriction enzyme; Dt, decision tree; N, number; B, BslI; HA, HaeIII; M, MspI; A, AluI; QHh, 35f-HhaI; QM, 35f-MspI; QA, 35f-AluI; Dilute·Conc., location of falsely identified nodes in the Dt; D2/C1·$, features of false nodes in the Dt. Reprinted with permission from Ref (17).
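The caption above reports accuracy as a count of false identifications in 92 records. As a small sketch, that count can be converted to a fraction of correctly identified records as shown here. The function name and the example count are hypothetical, and the counts reported in the original table are not reproduced.

```python
def prediction_accuracy(n_false, n_records=92):
    """Fraction of records identified correctly, given the number of false identifications."""
    if not 0 <= n_false <= n_records:
        raise ValueError("n_false must lie between 0 and n_records")
    return 1.0 - n_false / n_records


# Hypothetical example: 4 false identifications among 92 records.
print(round(prediction_accuracy(4), 3))
```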