
Comparison of the accuracy and mechanism of data mining identification of the intestinal microbiota with 7 restriction enzymes.

Abstract

The intestinal microbiota compositions of 92 Japanese men were identified following consumption of identical meals for 3 days, and collected feces were analyzed through terminal restriction fragment length polymorphism. The obtained operational taxonomic units (OTUs) and subjects' smoking and drinking habits, which had 2 nominal partitions, yes or no, were analyzed by Data mining software. Identification of subjects for each habit was successfully performed and reported previously, but the identification accuracy was closely dependent on the species of the applied restriction enzymes for PCR. For the sake of better selection of enzymes and understanding the mechanisms of Data mining analysis, 516f-BslI and 516f-HaeIII, 27f-MspI and 27f-AluI and 35f-HhaI, 35f-MspI and 35f-AluI, altogether 7 enzymes, were examined comparatively. Data mining analysis provides a Decision tree for identification of subjects and their dividing pathways that is produced using a limited number of OTUs, which affects the accuracy of the results. The present report discusses not only a global comparison of accuracies for characteristics, but also the detailed mechanisms that result in better or worse results and the practical roles and functions of OTUs. The OTU at the 1st step of the constructed Decision tree was the most important for any identification, and for all cases, the combination of subsequent OTUs, which formed later in the Decision tree, was also unignorable. Detailed dividing pathways were traced and compared for the 7 enzymes and the future supporting ideas were provided for better Data mining analysis of the human intestinal microbiota.


Keywords: accuracy and mechanism of identification; data mining analysis; decision tree; human intestinal microbiota; operational taxonomic unit; restriction enzyme; smoking and drinking habit

Year: 2013 PMID: 24936373 PMCID: PMC4034332 DOI: 10.12938/bmfh.32.139

Source DB: PubMed Journal: Biosci Microbiota Food Health ISSN: 2186-3342

INTRODUCTION

The human intestinal microbiota (HIM) is closely related to our health, and practical research on its relationship with the human immune system and diseases is now being widely performed. In order to analyze and compare the HIM of each subject, various factors that can directly affect the composition of the HIM, e.g., diets and drugs, must be controlled. In particular, it is essential to unify the dietary factors because daily feeding habits vary among individuals. However, a fixed diet itself may also affect the composition of the HIM, so we cannot apply a fixed diet to long-term feeding experiments. Here we have also tried to apply Data mining analysis (DM) to identify or discriminate the relation between the characteristics of subjects and the HIM data obtained from feces. Our previous papers [1, 2] examined how this worked and how we were able to trace the bacteria related to smokers. The obtained results were fruitful, but because they were the first applications of DM analysis to the HIM, deliberate accumulation of a variety of case studies was required to further expand and deepen our ability to apply DM stably. In particular, the selection of primer-restriction enzyme systems (R.Enz.) is the most notable point for practical utilization of DM, because an operational taxonomic unit (OTU) obtained by terminal restriction fragment length polymorphism (T-RFLP) usually contains many different bacteria, and its composition directly affects the results and dictates the accuracy of DM identifications. Regarding the comparison of DM processing with other existing analysis methods, the major difference is that DM mostly utilizes a limited range of OTU data to avoid noise that is not related to an assigned characteristic. This paper focuses not only on examination and comparison of practical DM analysis with regard to OTUs but also on increasing the accuracy of DM identifications.
Because DM identifications did not always derive the best or better results, as shown in later tables, much operational experience and many trials were necessary. Here smoking and drinking habits were taken as examples of a subject’s characteristics. DM analysis is also able to discriminate OTU data for characteristics that have many nominal partitions, but here we applied only 2 simple nominal partitions, because deliberate progress of DM analysis is required for better and stable understanding and utilization.

MATERIALS AND METHODS

To avoid the influences of dietary factors, as already reported [2, 3], we designed identical meals (1,879 kcal/day), which were fed to 92 healthy male volunteers living in Japan for 3 days. The ages and body mass indexes (BMI) of the subjects were 21–59 years (average: 36.8) and 17.3–30.2 kg/m² (average: 22.6), respectively. Fecal samples were analyzed by T-RFLP using 7 R.Enz. [2, 3, 4]. The reasons for applying T-RFLP were as follows. First, the numerical data obtained from T-RFLP are reproducible. Second, the processing is comparatively easy and reasonable for handling large numbers of subjects. Third, T-RFLP provides an appropriate amount of data for subsequent DM analysis, which requires a balance between the number of subjects (vertical axis) and the number of OTU species fields (horizontal axis). In other words, DM analysis requires square or vertically long data, not horizontally long data, so for 100 or so subjects T-RFLP provides an acceptable balance compared with other molecular techniques, e.g., metagenomics, which yield 1,000 or more species fields. The studies were performed in accordance with the protocol approved by the RIKEN Research Ethics Committee, and the OTU data were accumulated by the Benno Laboratory, RIKEN, Japan. Bacterial DNA was isolated from 40–100 mg of feces using the modified method described by Matsuki et al. [5]. Amplification of the fecal 16S rRNA, restriction enzyme digestion, size fractionation of the T-RFs and T-RFLP analysis were carried out as previously described [6, 7, 8]. The details of amplification and T-RFLP analysis with the 7 R.Enz., i.e., 516f-BslI, 516f-HaeIII, 27f-MspI, 27f-AluI, 35f-HhaI, 35f-MspI and 35f-AluI, were described in our previous papers [2, 3]. The amount for each OTU represents the fluorescence intensity and hence the concentration of bacteria.
The obtained OTU data were abbreviated as B--- (--- being the base pair number) for 516f-BslI, HA--- for 516f-HaeIII, M--- for 27f-MspI, A--- for 27f-AluI, QHh--- for 35f-HhaI, QM--- for 35f-MspI and QA--- for 35f-AluI. We had 2 groups of OTUs: one was 516f- + 27f-, altogether 4 R.Enz., and the other was 35f-, altogether 3 R.Enz. The component numbers of these 7 OTU groups were 27·B, 33·HA, 20·M, 40·A, 31·QHh, 34·QM and 48·QA, so if we combined all the enzyme components of each group, the former had a maximum of 120 OTUs, and the latter had a maximum of 113 OTUs. On account of the balance between the number of subjects, i.e., 92, and the number of OTU species fields, and the problem of field alignment sequence described later, we did not mix the data of the 2 OTU groups. Various sets of R.Enz. combinations were applied, and the data were arranged with the answers of the 92 subjects. The resulting 2-dimensional Excel data were analyzed with DM software (IBM-SPSS, Clementine 14). DM analysis, especially the Classification and Regression Tree (C&RT) DM modeling system, which is the most typical method of subject identification, provides a Decision tree (Dt). The Dt identifies explicitly the various groups of subjects according to the assigned characteristic, i.e., smoking [A (No, 76 subjects), B (Yes, 16)] or drinking [A (No, 47), B (Yes, 45)] habit in this paper. The C&RT divides subjects into two subsets by comparing the Gini coefficient according to the OTU data, such that the subjects within each subset are more homogeneous than in the previous subset. The C&RT system is quite flexible, and allows unequal misclassification costs to be considered in the other modeling systems of DM. A major specialty of DM and the constructed Dt is that a single selected OTU is used for each step of Dt construction.
The default setting of the C&RT system grows a Dt up to 5 steps; this is modifiable, e.g., to 7 steps, but considering the capability of OTUs to discriminate an assigned characteristic, we used 5 steps for the Dt in the present DM identification.
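The splitting rule described above can be illustrated with a minimal sketch. It is not the Clementine implementation: it assumes the "Gini coefficient" refers to the Gini impurity commonly used in C&RT, and the OTU names (`OTU_x`, `OTU_y`) and amounts are hypothetical toy values, not study data. The sketch searches every OTU and every candidate dividing point for the single split that leaves the two subsets most homogeneous.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (otu_name, threshold, weighted_gini) for the best binary split.

    rows   : list of dicts mapping OTU name -> amount, one dict per subject
    labels : list of class labels ("A" or "B"), one per subject
    """
    best = None
    n = len(labels)
    for otu in rows[0]:
        values = sorted({r[otu] for r in rows})
        # Candidate dividing points: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[otu] <= t]
            right = [y for r, y in zip(rows, labels) if r[otu] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (otu, t, score)
    return best


# Hypothetical toy data: 4 subjects, 2 OTUs (values are NOT from the study).
rows = [
    {"OTU_x": 0.5, "OTU_y": 2.0},
    {"OTU_x": 1.0, "OTU_y": 9.0},
    {"OTU_x": 4.0, "OTU_y": 3.0},
    {"OTU_x": 5.0, "OTU_y": 8.0},
]
labels = ["A", "A", "B", "B"]

print(best_split(rows, labels))  # ('OTU_x', 2.5, 0.0)
```

Applying `best_split` recursively to each resulting subset, up to 5 levels, would give a tree of the general shape described in this section; pruning and misclassification costs are deliberately left out of the sketch.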

RESULTS

Detailed example of a Dt, its construction and accuracy

DM provided a Dt, as shown in Fig. 1 for smoking habit, that discriminates explicitly the various subject groups (i.e., nodes) as boxes. The node at the left end is called the root node in reference to a growing tree, which is the starting point of tree construction. Toward the right side, the Dt grows to divide the subjects appropriately according to the assigned characteristic, i.e., smoking (A: No; B: Yes), with OTUs of 3 R.Enz., i.e., 27·B+33·HA+20·M, a total of 80 OTUs in this case. In Fig. 1, five dotted vertical lines show the growing steps from left to right as Dt 1st step to Dt 5th step, which illustrates the progress of Dt construction. The details of the Dt and the pathway to reach the terminal node indicated clearly the species and quantities of the related OTUs, which played a role in dividing the various groups of subjects. The Dt also provided practical values of dividing points, that is, the 92 men were divided at Dt 1st step by HA291 into 2 subsets at the left end of Fig. 1 and so on. The critical value for division was 3.13, and at the lower Dt 2nd step, HA291 was also utilized. A major specialty of the C&RT system is that it uses a single selected OTU for each step of the Dt. Seven large arrows indicate terminal nodes containing all 16 smokers (B), and a large dotted arrow indicates a terminal node that contained 56 nonsmokers, i.e., 74% of the ‘A’ group, designated Node-19.

Fig. 1.

Decision tree (Dt) obtained by DM, smoking habit with 80-OTUs: 27∙BslI+33∙HaeIII+20∙MspI, case 3-A in Table 3. The 7 large arrows indicate all 16 smokers, and the large dotted arrow indicates the 56 gathered nonsmokers. Each box is called a node. The left end, the root node, is the starting point of tree growth toward the right. The name of the OTU, e.g., HA291, that played a role in division is indicated; the numerical dividing point is shown only at the Dt 1st step with a thin dotted vertical arrow.

Table 3.

Smoking habit, Identifying detailed OTUs for reaching Dt∙5th step, 2 Nominal Partitions, representing restriction enzymes and their combinations

Comparisons between 7 R.Enz.

Table 1 shows a comparison of the results for identification of smoking habit using a single R.Enz. versus combinations of the 7 R.Enz. The upper half of Table 1 shows the 516f- + 27f- group of 4 R.Enz., and the lower half shows the 35f- group of 3 R.Enz. The second row of each group shows the OTU of the Dt 1st step, because this OTU was recognized as having a main role in dividing subjects. The third row for each group indicates the accuracy of DM, i.e., the number of falsely identified subjects among the 92 men up to the Dt 5th step, which was the main evaluation term for Dt identification; this number ranged from 0 to 7. A value of 0 represented the best accuracy, and values higher than 0 expressed less accuracy.
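The accuracy measure used in these tables can be sketched as a simple count, on the assumption that a subject is "falsely identified" when it belongs to the minority class of its terminal node. The node contents here are hypothetical; only the class totals (76 nonsmokers, 16 smokers) follow the text.

```python
def misidentified(terminal_nodes):
    """Count subjects that fall in the minority class of their terminal node."""
    return sum(min(node["A"], node["B"]) for node in terminal_nodes)


# Hypothetical terminal nodes (A: No, B: Yes); counts are illustrative only,
# chosen so that they sum to the 76 nonsmokers and 16 smokers of the study.
nodes = [
    {"A": 56, "B": 0},
    {"A": 19, "B": 0},
    {"A": 0, "B": 15},
    {"A": 1, "B": 1},  # mixed node: one subject is necessarily misidentified
]

print(misidentified(nodes))  # 1
```

A tree whose terminal nodes all contain 0 at either the A or the B site scores 0 under this count, which corresponds to the best accuracy in the tables.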

Table 1.

Smoking habit, 2 Nominal Partitions, accuracy of DM identifications with various restriction enzymes for T-RFLP and their combinations

In Table 2, the drinking habit of the 92 subjects was identified with their OTUs, as shown for smoking in Table 1. Comparing Table 1 and Table 2, the latter reveals slightly less accuracy than the former, especially in the three 35f- R.Enz. cases.

Table 2.

Drinking habit, 2 Nominal Partitions, accuracy of DM identifications with various restriction enzymes for T-RFLP and their combinations

Detailed aspects of Dt construction

Comparing the various aspects of the obtained Dts, we examined the detailed components of each Dt. Table 3 shows a comparison for smoking habit for cases that were selected from Table 1 based on better accuracy. To assist in understanding the table notation and the Dt pathway, a sample case (case 3-A), which can be traced through Fig. 1, is marked with an asterisk (*) in Table 3. Table 4 shows a similar comparison for drinking habit. As in the case of Fig. 1, to assist in understanding the Dt structure together with the data in the cited table, a sample case (case 2-C’), which is marked (#) in Table 4, is shown in Fig. 2.

Table 4.

Drinking habit, Identifying detailed OTUs for reaching Dt∙5th step, 2 Nominal Partitions, representing restriction enzymes and their combinations

Fig. 2.

Decision tree (Dt) obtained by DM, drinking habit with 60-OTUs: 20∙27f-MspI+40∙27f-AluI, case 2-C’ in Table 4. The solid arrow indicates the major drinkers group (node), and the 2 dotted arrows indicate the major nondrinkers groups. Other notations are the same as in Fig. 1.

Location of false terminal nodes

To understand the mechanisms of OTU identification, it was beneficial to trace not only the OTUs that resulted in better accuracy but also those that resulted in worse accuracy. So the locations of falsely identified nodes within the Dt were traced, and an example is shown in Fig. 3, which is for drinking and includes 3 misidentified subjects. As can be seen in Table 2 and Table 4, identification was better when the OTU A47 was located at the Dt 1st step. To classify the location of falsely identified nodes, the Dt 1st step was taken as the main dividing position. The upper half of the Dt was the diluted region (D, Dilute) of A47, in which the concentration was ≤6.65, and the lower half was the concentrated region (C, Conc.) of A47, in which the concentration was >6.65. The border line is shown as a dotted horizontal line in Fig. 3. The terminal nodes for the false identifications are shown by 3 large dotted arrows. In this case, 2 misleads are situated in Dilute (D2), and 1 is situated in Conc. (C1). In Table 5, which covers both smoking and drinking habits, typical cases of false locations are shown with emphasis on the misled locations, marked with ‘Dilute or Conc.’. A sample case, i.e., Fig. 3, is indicated with a dollar sign ($) and D2/C1 in the lower part of Table 5.
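The Dilute/Conc. labelling described above can be written as a short sketch. The boundary 6.65 is the A47 dividing value given in the text; the A47 amounts of the misidentified subjects are hypothetical, and the function names are illustrative only.

```python
A47_BOUNDARY = 6.65  # dividing value of A47 at the Dt 1st step (from the text)


def region(a47_amount, boundary=A47_BOUNDARY):
    """Return 'D' (Dilute, amount <= boundary) or 'C' (Conc., amount > boundary)."""
    return "D" if a47_amount <= boundary else "C"


def false_location_label(a47_of_misidentified):
    """Summarize misidentified subjects as a label such as 'D2/C1'."""
    d = sum(1 for v in a47_of_misidentified if region(v) == "D")
    c = len(a47_of_misidentified) - d
    parts = []
    if d:
        parts.append(f"D{d}")
    if c:
        parts.append(f"C{c}")
    return "/".join(parts) if parts else "none"


# Hypothetical A47 amounts for 3 misidentified subjects (NOT study data).
print(false_location_label([1.2, 6.65, 10.4]))  # D2/C1
```

A value exactly at the boundary is placed in Dilute, matching the ≤6.65 convention used for the upper half of the Dt.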

Fig. 3.

Decision tree (Dt) obtained by DM, drinking habit with 73-OTUs: 33∙HaeIII+40∙27f-AluI, marked as $ in Table 5. The 3 dotted arrows indicate the location of falsely identified nodes, ‘D2/C1’. Other notations are the same as in Fig. 1.

Table 5.

Location of false terminal nodes, compared with the OTU of the Dt∙1st step and various R.Enz.

DISCUSSION

Construction of various Dts and their accuracy

In Fig. 1, only 8 OTUs out of 80 were active, with 2 OTUs, i.e., HA291 and B749, being applied twice, which indicates that the remaining 72 OTUs were neglected in construction of the Dt. In other words, 8 OTUs were closely related to the subjects’ smoking characteristics, and the other 72 were recognized as OTUs unrelated to smoking, like a kind of noise. These facts were the main differences compared with the former classification methods for the HIM, such as clustering, correlation coefficient and principal component analysis (PCA), which consider all OTU data without any selection, so that their results are inevitably obscure. Looking at the lower right side of Fig. 1, terminal nodes were assembled by the Dt 3rd step and were not present at the 4th and 5th steps, which meant that simpler discrimination of OTUs was carried out in this case. Furthermore, all the terminal nodes in Fig. 1 show the details of identification of the subjects, and there were no false identifications; that is, all terminal nodes contained 0 at either the A or the B site. In some other Dts shown in Table 1 and Table 2, however, there were false identifications, i.e., terminal nodes up to the Dt 5th step that contained a mixture of ‘A’ and ‘B’ (not 0). These misidentifications were used to evaluate the accuracy of the Dt, which was tightly related to the applied OTU groups, i.e., species and combinations of R.Enz. So, this study focused on understanding the mechanisms and roles of OTUs in order to construct a better Dt structure with R.Enz.
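The distinction between active and neglected OTUs can be made concrete with a small sketch that walks a tree and lists the OTUs actually used for splits. The nested-dictionary layout, the keys `otu`, `low` and `high`, and the toy tree of 10 generic OTUs are all assumptions for illustration; they do not reproduce Fig. 1.

```python
def split_otus(node):
    """Return the OTU names used at each split, in depth-first order."""
    if not node or "otu" not in node:  # empty dict or None marks a terminal node
        return []
    return [node["otu"]] + split_otus(node.get("low")) + split_otus(node.get("high"))


# Hypothetical toy tree: 'low'/'high' are the subsets below/above the dividing point.
toy_tree = {
    "otu": "OTU_1",
    "low": {
        "otu": "OTU_2",
        "low": {},
        "high": {"otu": "OTU_1", "low": {}, "high": {}},  # OTU_1 is applied twice
    },
    "high": {"otu": "OTU_3", "low": {}, "high": {}},
}

all_otus = [f"OTU_{i}" for i in range(1, 11)]  # 10 candidate OTUs offered to the tree
used = split_otus(toy_tree)
distinct = sorted(set(used))
unused = [o for o in all_otus if o not in set(used)]

print(used)         # ['OTU_1', 'OTU_2', 'OTU_1', 'OTU_3']
print(len(unused))  # 7
```

In this toy case 3 of 10 OTUs are active and 7 are ignored, which mirrors in miniature the 8 active and 72 neglected OTUs reported for Fig. 1.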

Accuracies of the 7 R.Enz.

Looking at the cases with better accuracy with regard to smoking habit in Table 1, the Dt 1st step always had the same OTU, i.e., HA291 for the 516f- + 27f- group and QM124 for the 35f- group, though the latter had less accuracy than the former. This fact confirmed that the Dt 1st step played the most important role in better identification. Furthermore, applying a larger number of OTUs, e.g., 4 R.Enz. for the 516f- + 27f- group, did not always result in better accuracy. On the other hand, application of a single R.Enz. also showed less accuracy than a combination of 2 or 3 enzymes. There were 7 cases of better accuracy, i.e., accuracy = 1, all of which are shown in the 516f- + 27f- group in Table 1. Their constituent subjects were traced and are indicated with 1¢, 1§, 1* and 1£ in the table, with the same mark representing the same subject, which meant that the OTU data of some subjects lay near boundary values and were therefore easily misclassified. Comparing Table 1 and Table 2, similar features with regard to accuracy were observed. Also, contrasting the 516f- + 27f- group with the 35f- group, the latter had less accuracy than the former. But this was examined only with the two cases, i.e., smoking and drinking habit, so the trend should be confirmed with other cases. A simple conjecture is that the OTU elements would be more dispersed or more uniform with the 35f- group than with the 516f- + 27f- group, so that clear Dt construction in DM analysis would become rather difficult with the 35f- group.

Details of better accuracy

As already described, the C&RT system of DM always divides subjects into two subsets, and in principle each Dt step has 2^(n−1) OTUs, where ‘n’ is the step number, i.e., 1 to 5. So the Dt 5th step fundamentally has 16 OTUs. The accuracies obtained with combinations of OTUs are shown in Table 1 to Table 5. In almost all of the better cases, except case 1-A in Table 3, the OTU in the Dt 1st step was HA291, and this meant that, as already reported in our previous paper [2], HA291 had a very close relation with smoking among all 4 of these R.Enz., i.e., 120 OTUs. The 4 best cases in Table 3, i.e., cases 2-A, 3-A, 3-C and 3-D, had no false identifications, but the components of their Dt pathways differed slightly between them. Focusing on the Dt 3rd to 5th steps, the components of 2-A and 3-A were completely the same. In other words, although ‘M’ had been applied, the ‘M: 27f-MspI’ of R.Enz. had no effect on 3-A. For 3 cases, 3-A, 3-C and 3-D, we realized that the alignment sequences of the 3 R.Enz. were different and that this had greatly affected the pathway construction. The front alignments in the data sequence were more effective than the back alignments, which is understandable given the fundamental algorithms of the C&RT. At the Dt 3rd step, B749 to M558, B124 to HA175, M133 and B919 to HA83 and M224 (underlined in Table 3) played the same roles in their Dts. A similar trend was also observed with other cases in Table 3. Looking at case 1-B, although we already knew that HA291 was very notable for smoking, a single R.Enz., even ‘HA’ itself, provided less accuracy than a combination of R.Enz. The reason for this was thought to be as follows: even though HA291 was very effective, HA868 in case 1-B at the Dt 2nd step was comparatively less capable of division than B469 and A87 in the same row. Also, case 1-B had no Dt 5th step. This indicated interesting mechanisms, revealing that a single R.Enz.
had insufficient OTU data compared with 2 or 3 combined enzymes or that limited species of bacteria had gathered only in HA291 and that other OTUs of ‘HA’ had fewer relations and were insufficient for identification of smoking. In Table 4 for drinking habit, all the best cases, which had no false identifications, i.e., case 2-A’, 2-C’, 3-B’, 3-C’ and 4-B’, had A47 as their OTU at the Dt 1st step. Concerning their Dt 2nd step, M216 was preferable, but B332 in the case of 2-A’ was replaceable with a combination of OTUs at subsequent steps. In these 5 best cases, none of the cases followed the same pathway before the Dt 3rd step, which seemed to suggest based on comparison that this characteristic, drinking habit, had milder and wider effects on OTUs and more species related to OTUs than smoking. A vigorous OTU like HA291 in Table 3 was not observed in Table 4. In addition, comparing general features between Table 3 and Table 4, the pathways for drinking contained many OTUs at the Dt 5th step and seemed rather packed compared with the pathways for smoking. Generally, if the number of OTUs utilized in Table 3 and Table 4 is considered, the pathways for drinking contained more OTUs, i.e., more related OTUs species, than the pathways for smoking.
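The step-wise OTU counts quoted at the start of this subsection follow from binary splitting: each split produces two subsets, so the number of dividing OTUs can at most double from one step to the next. A brief sketch of that upper bound, assuming a fully grown tree and the 5-step limit used here:

```python
def max_splits_at_step(n):
    """Upper bound on the number of dividing OTUs at Dt step n in a binary tree."""
    return 2 ** (n - 1)


per_step = [max_splits_at_step(n) for n in range(1, 6)]
total = sum(per_step)

print(per_step)  # [1, 2, 4, 8, 16]
print(total)     # 31
```

A fully grown 5-step Dt could therefore draw on at most 31 split positions, and fewer distinct OTUs whenever one OTU is reused or a branch terminates early, as in the cases discussed above.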

Detailed aspects of worse accuracy

In Table 5, the location of falsely identified nodes in the Dt, whether in the Dilute or the Concentrated region shown in Fig. 3, revealed the features of the misleading mechanisms. Most of the false locations (around 3/4) were situated in Dilute, as separated by the Dt 1st step, and this indicated that identification was difficult for them because the signatures of the OTUs for smoking or drinking were too ambiguous or obscured to distinguish. On the other hand, around 1/4 of the false locations were found in the Concentrated region (Conc.) in Table 5, and this meant that mechanisms different from those in Dilute misled the pathways. That is, the subsequent OTU species after the Dt 2nd step were not able to distinguish subjects properly due to slight differences in the fitted OTUs in Conc. The 2 rare cases of ‘D2/C1’ in Table 5, observed only for drinking, indicated a mixture of the mechanisms described above. Furthermore, Table 5 suggested that the OTUs, i.e., groups of bacteria, related to an assigned characteristic were not originally numerous; therefore, how the limited species of bacteria were assigned to certain OTUs was the key mechanism of these DM identifications, which was also recognized in Table 3 and Table 4. On the other hand, if a plentiful number of related bacterial species did exist, not limited to a small amount, then the locations of false terminal nodes like those in Table 5 would be more spread out and blended. Finally, the major results of applying DM analysis to the HIM were identification of subjects with only several closely related OTUs, i.e., groups of bacteria, and the finding of explicit and numerically clear interdependences between the assigned characteristic and OTUs. These were the most remarkable differences compared with the former classification methods, like clustering, correlation coefficients and PCA.
The constructed Dt ignored most of the less related OTUs, treating them like noise, which was not possible with any of the former methods. However, as already described, identification of OTUs for assigned characteristics was not easily performed with a Dt. Precise and thoughtful preparations are necessary, and in particular, the most important step in preparation is the selection of R.Enz. species. Here we applied only 2 characteristics, smoking and drinking, so recommendations for specific sets of R.Enz. species could not be made at this time, but based on Table 1 to Table 5, some advice can be proffered. That is, sets of 2 to 3 R.Enz. species were more suitable than a single R.Enz. or 4 R.Enz. for a characteristic with 2 nominal partitions and for 100 or so subjects. To obtain clear identifications or the best accuracy, the OTU at the Dt 1st step played the most important role, and the reason why an OTU, e.g., HA291 for smoking, revealed such an activity depended on each characteristic and the specialty of the R.Enz., which should be clarified by accumulation of similar analyses and experience in the future. However, it was certain that a combination of several OTUs at subsequent steps of the Dt was required to obtain the best accuracy, actual examples of which are shown in Table 3 and Table 4. As for the OTU species that were situated at the Dt 4th to 5th steps, which were surely less related to the assigned characteristics than those situated at the Dt 1st to 3rd steps, there was some slight doubt concerning the need for further detailed tracing. Considering the accuracy of identification with various OTU data, the fundamental algorithms of DM software and the unknown symbiotic characters of various uncultured bacteria, which have already been reported [2], this was not something easy to resolve. Some intuitive and practical streamlining focused on clinical applications would be preferable to strict adherence to accuracy with regard to subjects who are borderline.
The main reasons were as follows. First, the OTUs themselves were not absolutely firm and stable, and would be affected by peripheral circumstances, e.g., latest meals and drugs, personal factors and living localities of the subjects. Second, the assigned characteristics, smoking or drinking, were not always strictly defined, and so it was possible that they were defined based on personal concepts and that they contained wide intermediate stages. Third, thinking about the future application of Dts for predictive analysis of diseases, such as alimentary disorders, it is expected that identifications and predictions derived from a Dt would support diagnoses and be sufficient to suggest other possibilities. Once a certain Dt is constructed with a well-considered group of subjects, a subsequent new group of similar subjects for which there is less clinical information can be run on the same Dt, and their possibilities of suffering similar disorders can be identified, which will be very effective for new forms of preventive medicine. The HIM is known to be different between individuals, very sensitive and closely related to various physiological characteristics, suggesting that the HIM and OTUs can become a new source or reservoir of health information that can be used to evaluate patients and make predictions using the Dt structure constructed with DM analysis.

7 in total

1. Application of new primer-enzyme combinations to terminal restriction fragment length polymorphism profiling of bacterial populations in human feces.

Authors: Koji Nagashima; Takayoshi Hisada; Maremi Sato; Jun Mochizuki
Journal: Appl Environ Microbiol Date: 2003-02 Impact factor: 4.792

2. Analysis of the human intestinal microbiota from 92 volunteers after ingestion of identical meals.

Authors: J S Jin; M Touyama; R Kibe; Y Tanaka; Y Benno; T Kobayashi; M Shimakawa; T Maruo; T Toda; I Matsuda; H Tagami; M Matsumoto; G Seo; O Chonan; Y Benno
Journal: Benef Microbes Date: 2013-06-01 Impact factor: 4.205

3. Restriction fragment-length polymorphism analysis of 16S rDNA from oral asaccharolytic Eubacterium species amplified by polymerase chain reaction.

Authors: T Sato; M Sato; J Matsuyama; S Kalfas; G Sundqvist; E Hoshino
Journal: Oral Microbiol Immunol Date: 1998-02

4. Dynamics of fecal microbiota in hospitalized elderly fed probiotic LKM512 yogurt.

Authors: Mitsuharu Matsumoto; Mitsuo Sakamoto; Yoshimi Benno
Journal: Microbiol Immunol Date: 2009-08 Impact factor: 1.955

5. Quantitative PCR with 16S rRNA-gene-targeted species-specific primers for analysis of human intestinal bifidobacteria.

Authors: Takahiro Matsuki; Koichi Watanabe; Junji Fujimoto; Yukiko Kado; Toshihiko Takada; Kazumasa Matsumoto; Ryuichiro Tanaka
Journal: Appl Environ Microbiol Date: 2004-01 Impact factor: 4.792

6. Identification of Heavy Smokers through Their Intestinal Microbiota by Data Mining Analysis.

Authors: Toshio Kobayashi; Kenji Fujiwara
Journal: Biosci Microbiota Food Health Date: 2013-04-27

7. Identification of Human Intestinal Microbiota of 92 Men by Data Mining for 5 Characteristics, i.e., Age, BMI, Smoking Habit, Cessation Period of Previous Smokers and Drinking Habit.

Authors: Toshio Kobayashi; Jong-Sik Jin; Ryoko Kibe; Mutsumi Touyama; Yoshiki Tanaka; Yoshiko Benno; Kenji Fujiwara; Masaki Shimakawa; Toshiya Maruo; Toshiya Toda; Isao Matsuda; Hiroyuki Tagami; Mitsuharu Matsumoto; Genichirou Seo; Naoki Sato; Osamu Chounan; Yoshimi Benno
Journal: Biosci Microbiota Food Health Date: 2013-05-15

