
Applying Data Mining to Classify Age by Intestinal Microbiota in 92 Healthy Men Using a Combination of Several Restriction Enzymes for T-RFLP Experiments.

Toshio Kobayashi¹, Takako Osaki², Shinya Oikawa³.

Abstract

The composition of the intestinal microbiota was measured following consumption of identical meals for 3 days in 92 Japanese men, and terminal restriction fragment length polymorphism (T-RFLP) was used to analyze their feces. The obtained operational taxonomic units (OTUs) and the subjects' ages were classified using data mining (DM) software, with age treated both as continuous data and as 5 partitions divided at 5-year intervals between the ages of 30 and 50. The DM provided decision trees in which the selected OTUs were closely related to the ages of the subjects. DM was also used to compare the OTUs from the T-RFLP data obtained with seven restriction enzymes (516f-BslI and 516f-HaeIII; 27f-MspI and 27f-AluI; and 35f-HhaI, 35f-MspI and 35f-AluI) and their various combinations. The OTUs derived from the enzyme digests were analyzed for the five partitions to classify the age clusters. For use in future DM processing, we discuss which enzymes were effective for accurate classification. We selected two OTUs (HA624 and HA995) that were useful for classifying the subjects' ages. Based on the 16S rRNA sequences of the OTUs, Ruminococcus obeum clones 1-4 were present in 18 of 36 bacterial candidates in the older age group-related OTU (HA624). On the other hand, Ruminococcus obeum clones 1-33 were present in 65 of 269 candidates in the younger age group-related OTU (HA995).


Keywords: human intestinal microbiota; Ruminococcus obeum; classification of age; data mining analysis; decision tree; operational taxonomic unit; terminal restriction fragment length polymorphism

Year: 2014 PMID: 25003020 PMCID: PMC4081184 DOI: 10.12938/bmfh.33.65

Source DB: PubMed Journal: Biosci Microbiota Food Health ISSN: 2186-3342

INTRODUCTION

The human intestinal microbiota (HIM) is closely related to our health, and its relationship with the human immune system and diseases is now being widely researched. One of the barriers to further progress has been the limited ability to quantitatively analyze HIM data, such as terminal restriction fragment length polymorphism (T-RFLP) data from feces, which are presented as operational taxonomic units (OTUs). Bicluster analysis, principal component analysis, and analysis with correlation coefficients have been applied to OTU data, but the results have not been clear, and it has been difficult to convince the community of the merits of these approaches. Kobayashi successfully introduced the application of data mining (DM) to classification of the relationships between characteristics and OTUs, and the results were reported in a series of papers [1,2,3]. Another barrier to understanding the detailed relationships between the components of the HIM is the effect of the daily dietary habits of subjects, which usually vary between individuals. It is difficult to unify the dietary factors across many subjects, and the process requires much effort and cooperation. A further common barrier to clarifying the HIM is the need to gather T-RFLP data for many different restriction enzymes in order to compare them and select a suitable enzyme system for DM. Fortunately, our previous studies [1,2,3] and that of Jin et al. [4] have provided T-RFLP data for the same subjects. These data overcome the barriers above, and the data obtained with seven restriction enzymes were particularly valuable for further DM analysis. This paper focuses on the ages of the subjects in order to clarify the hidden relationship of the HIM with age.

MATERIALS AND METHODS

As discussed previously [1,2,3,4], it is important to avoid the influence of dietary factors. Thus, we designed identical meals (1,879 kcal/day), which were fed for 3 days to 92 healthy male volunteers living in Japan. The ages of the subjects ranged from 21 to 59 years (average: 36.8), and their body mass indexes (BMI) ranged from 17.3 to 30.2 kg/m² (average: 22.6). The cumulative frequencies for age in the 92 men are shown in Fig. 1. T-RFLP with seven restriction enzymes was used to analyze the fecal samples [2, 4]. The reasons for using T-RFLP analysis were its reproducibility, its relatively low cost, and the convenience of using it with DM processing, because the numbers of subjects and OTUs are similar. Written informed consent was obtained from each participant prior to enrolment, and the study was performed in accordance with the protocol approved by the RIKEN Research Ethics Committee (Wakou 2009-3rd 21-13). The OTUs were accumulated by the Benno Laboratory, RIKEN, Japan.

Fig. 1.

Cumulative frequency of the ages of the 92 Subjects

Each dot represents a subject.

The procedure for extraction of DNA from feces has been described previously [2, 4], and the OTU data were analyzed in a similar manner [1,2,3,4]. Fluorescence intensity was used to measure the amount of each OTU. The OTU data were abbreviated as B--- (where --- represents the base-pair number) for 516f-BslI, HA--- for 516f-HaeIII, M--- for 27f-MspI, A--- for 27f-AluI, QHh--- for 35f-HhaI, QM--- for 35f-MspI, and QA--- for 35f-AluI. We had two groups of OTUs: one was 516f- + 27f-, treated with four restriction enzymes, and the other was 35f-, treated with three restriction enzymes. The component numbers for these seven restriction enzymes were 27·B, 33·HA, 20·M, 40·A, 31·QHh, 34·QM, and 48·QA. When all enzyme components were combined for each group, the 516f- + 27f- group had a maximum of 120 OTUs and the 35f- group had 113 OTUs. In order to balance the number of OTUs (120 or 113) with the number of subjects (92) and to avoid the problem of field alignment sequences (which was reported in our previous papers [2, 3]), we did not mix the data of the two OTU groups, i.e., 516f- + 27f- and 35f-. The data from T-RFLP with the various restriction enzymes were combined with the written answers of the 92 subjects to a questionnaire. The resulting two-dimensional Excel data sets were analyzed by DM software (Clementine14, IBM-SPSS). Pearson correlation coefficients were obtained using other software (Statistics17.0, IBM-SPSS). Each OTU was matched with an accession number using a University of Idaho database [5]. The sequence of 16S rRNA was determined by the accession number derived from the National Center for Biotechnology Information (NCBI). The nearest bacterial species (≥97% identity) was identified by BLAST searches from an available genomic databank. A phylogenetic tree was obtained using MEGA5 [6, 13] by comparison of these sequences.
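The OTU naming scheme and the per-enzyme component counts given above can be written down as a small lookup, which makes the group totals (120 and 113) easy to check. The following is a minimal sketch; the function names `parse_otu` and `group_size` are illustrative assumptions and are not part of the original analysis.

```python
# OTU label prefixes and the primer-enzyme system each one denotes (from the text).
OTU_PREFIXES = {
    "B": "516f-BslI",
    "HA": "516f-HaeIII",
    "M": "27f-MspI",
    "A": "27f-AluI",
    "QHh": "35f-HhaI",
    "QM": "35f-MspI",
    "QA": "35f-AluI",
}

# Number of OTU components reported for each enzyme (from the text).
OTU_COUNTS = {"B": 27, "HA": 33, "M": 20, "A": 40, "QHh": 31, "QM": 34, "QA": 48}

# The two OTU groups, which were analyzed separately.
GROUPS = {
    "516f- + 27f-": ["B", "HA", "M", "A"],
    "35f-": ["QHh", "QM", "QA"],
}


def parse_otu(label):
    """Split a label such as 'HA624' into (prefix, fragment length in base pairs)."""
    # Longer prefixes are tried first so that 'QA...' is never read as 'A...'.
    for prefix in sorted(OTU_PREFIXES, key=len, reverse=True):
        rest = label[len(prefix):]
        if label.startswith(prefix) and rest.isdigit():
            return prefix, int(rest)
    raise ValueError(f"unrecognised OTU label: {label}")


def group_size(group):
    """Total number of OTUs in one group when all its enzymes are combined."""
    return sum(OTU_COUNTS[prefix] for prefix in GROUPS[group])
```

For example, `parse_otu("HA624")` yields the prefix `HA` (516f-HaeIII) with a fragment length of 624 bp, and `group_size("35f-")` reproduces the 113 OTUs stated above.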

RESULTS

As a target characteristic for DM, we considered age in two different ways. The first was simply to apply the ages of the subjects as a continuous variable (continuance), and the second was to divide the ages into two nominal partitions (2-NP), in which a boundary age was used to divide the subjects into two age regions. We did not use three or more nominal partitions because we wanted better accuracy, and our experience has repeatedly been that larger numbers of partitions reduce accuracy. We applied a classification and regression tree (C&RT), which is the most common DM processing algorithm. It provides a decision tree (Dt) that can be evaluated and can distinguish the OTUs that are effective for constructing a Dt. The C&RT optimizes the Gini coefficient to divide the subjects into two subsets according to the OTUs. The result is that the subjects within each subsequent subset are more homogeneous than those in the previous subset. Compared with other DM processing algorithms, the C&RT system is flexible and allows unequal misclassification costs to be considered. Its major advantages are that it is reproducible and that it delivers only a single selected OTU for each step of constructing the Dt.
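The Gini-based division described above can be illustrated with a short sketch for a two-class target. This is not the Clementine implementation, which also handles pruning and misclassification costs. It only shows how one cutoff on one OTU is chosen so that the two resulting subsets are as homogeneous as possible. The function names and the example intensities are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (cutoff, weighted impurity) of the best binary split on one OTU.

    Candidate cutoffs are midpoints between consecutive distinct values.
    If no split improves on the unsplit node, the cutoff is None.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= cutoff]
        right = [y for x, y in zip(values, labels) if x > cutoff]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cutoff, score)
    return best


# Hypothetical fluorescence intensities of one OTU and the age partition of each subject.
intensities = [0.5, 1.0, 2.0, 3.5, 4.0, 6.0]
partitions = ["younger", "younger", "younger", "older", "older", "older"]
cutoff, impurity = best_split(intensities, partitions)
```

A full tree repeats this search over every OTU at every node and keeps the single OTU with the lowest weighted impurity, which is why each step of the Dt is associated with exactly one OTU.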

DM with unpartitioned age

DM can be applied to a continuous target or to one that has been divided into a few nominal partitions. The resulting Dts are similar but have notable differences. The former is more delicate and has a more finely divided structure than the latter. The first three steps of the Dt are shown in Fig. 2, which explicitly classifies the clusters of the subjects' ages, i.e., the nodes. The root node is the starting point of the tree construction, and the Dt grows from left to right, dividing the subjects according to age. The details of the Dt and the pathway to reach the Terminal nodes clearly indicate the names and the quantities of the OTUs that are applied to divide the branches. The default setting of the software results in five steps of the Dt; this can be changed, but we focused here only on the first three steps, because further steps had less effect on the age and were easily affected by the upstream OTUs. At the 3rd step of the Dt (Dt 3rd step, see Fig. 2), there were 8 nodes. At the Dt 5th step, however, there were 20 nodes, each covering a narrow age range of 1 to 3 years; these were more complex than was required for our present purpose.
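A tree for a continuous target, limited to three steps and reporting n, the average age and σ at each node as in Fig. 2, might be sketched as follows. One assumption should be stated loudly: the split score used here is the summed squared error of the two child nodes, which is the usual CART-style criterion for continuous targets, and it may differ from the criterion applied by the software in this study. The OTU names `X1` and `X2` and all values are hypothetical.

```python
from statistics import mean, pstdev


def sse(ages):
    """Sum of squared deviations of the ages from their mean."""
    m = mean(ages)
    return sum((a - m) ** 2 for a in ages)


def split_node(rows, ages, otus):
    """Return (score, otu, cutoff) minimising the children's summed squared error."""
    best = None
    for otu in otus:
        distinct = sorted({row[otu] for row in rows})
        for low, high in zip(distinct, distinct[1:]):
            cut = (low + high) / 2
            left = [a for row, a in zip(rows, ages) if row[otu] <= cut]
            right = [a for row, a in zip(rows, ages) if row[otu] > cut]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, otu, cut)
    return best  # None when no OTU varies within this node


def grow(rows, ages, otus, depth=0, max_depth=3):
    """Grow the tree recursively; each node stores n, average age and sigma."""
    node = {"n": len(ages), "average": mean(ages), "sigma": pstdev(ages)}
    best = split_node(rows, ages, otus) if depth < max_depth else None
    if best is None:
        return node  # terminal node
    _, otu, cut = best
    node["otu"], node["cutoff"] = otu, cut
    left = [(row, a) for row, a in zip(rows, ages) if row[otu] <= cut]
    right = [(row, a) for row, a in zip(rows, ages) if row[otu] > cut]
    node["children"] = [
        grow([r for r, _ in part], [a for _, a in part], otus, depth + 1, max_depth)
        for part in (left, right)
    ]
    return node


# Hypothetical OTU intensities and ages, not the study data.
rows = [
    {"X1": 1.0, "X2": 5.0},
    {"X1": 2.0, "X2": 1.0},
    {"X1": 8.0, "X2": 4.0},
    {"X1": 9.0, "X2": 2.0},
]
ages = [25, 27, 50, 54]
tree = grow(rows, ages, ["X1", "X2"])
```

With `max_depth=3` a fully grown tree has at most 2³ = 8 nodes at the third step, which matches the count given for Fig. 2.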

Fig. 2.

Dt obtained by DM with unpartitioned age

Node-0: starting point of Dt construction. Node-1 to Node-14: subject groups divided by DM processing and classification. n: number of subjects. Average: average age at each node. σ: standard deviation of age at each node. Node-0 was divided into Node-1 and Node-2 by HA624 at 9.32 with the optimization of the Gini coefficient, and similar steps were repeated for constructing the Dt.


DM with 2 nominal partitions (2-NP)

We considered various “boundary ages” within the range of the subjects’ ages (21 to 59), and we chose five different options (at 5-year intervals) for dividing the 92 subjects into partitions, as shown in Table 1. No group contained fewer than 10 subjects, which was approximately 10% of the data. The Dt structure for 2-NP, partitioned at 40/41, is shown in Fig. 3, which shows the tree structure up to the Dt 5th step. The fundamental differences between this Dt and the one in Fig. 2, which used unpartitioned age for DM processing, are easy to see. Terminal nodes appeared earlier with nominal partitioning, as shown in Fig. 3, and the Dt structure became much simpler than that shown in Fig. 2.
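The 2-NP labelling can be sketched in a few lines. Only the 40/41 boundary is stated explicitly in the text. The other four boundaries used below (30/31, 35/36, 45/46 and 50/51) are an assumption inferred from the description of 5-year intervals between 30 and 50, and the example ages are hypothetical.

```python
# Boundary ages: "younger" means age <= boundary (e.g. 40 for the 40/41 split).
# Only 40/41 is given explicitly in the text; the other four are assumed.
BOUNDARIES = [30, 35, 40, 45, 50]


def two_np(ages, boundary):
    """Label each age as 'younger' (<= boundary) or 'older' (> boundary)."""
    return ["younger" if age <= boundary else "older" for age in ages]


def partition_sizes(ages):
    """Map each boundary age to (number of younger, number of older) subjects."""
    sizes = {}
    for boundary in BOUNDARIES:
        labels = two_np(ages, boundary)
        sizes[boundary] = (labels.count("younger"), labels.count("older"))
    return sizes


# Hypothetical ages, not the study data.
example_ages = [21, 26, 31, 36, 40, 41, 46, 51, 59]
sizes = partition_sizes(example_ages)
```

Checking `min(sizes[b])` for every boundary is a simple way to confirm that no partition leaves a group that is too small to classify.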

Table 1.

Five different partitions for the 92 subjects by age

Fig. 3.

Dt obtained by DM with 2-NP partitioned at 40/41

The 92 subjects were divided at Node-1 and Node-2 by HA323, with a Gini coefficient cutoff value of 2.86. The following divisions were made in an analogous way. The numbers of subjects at each node are shown. The arrow at Node-19 indicates that a subject was falsely classified; this was subject #21, who was 55 years of age. His OTUs corresponded to those of someone younger than 40. There are 11 Terminal nodes: four for the younger group (age range 21-40), which are shaded, and 7 for the older group, which are not shaded. Four of these nodes contain only one subject.


Pearson correlation coefficients

To determine the correlation with age, we analyzed a total of six ways of partitioning the data: unpartitioned age and five partitions with different boundary ages, as shown in Table 1. To compare DM with another common statistical method for OTUs, we considered the results with Pearson correlation coefficients. For the 120 OTUs obtained with the 516f- + 27f- restriction enzymes, i.e., 27·B, 33·HA, 20·M and 40·A, the obtained results are shown in Table 2. As is well known, Pearson coefficients can be positive or negative, and the top 10 of both ends are shown here. In this table, some sets of OTUs are written in bold letters; these were the ones that appeared with the same partitions until the Dt 3rd step. Namely, two OTUs, HA995 and HA624 in the top row for “continuance” in Table 2, are also found in Fig. 2. Another five OTUs in Fig. 2, which were used to construct the Dt, i.e., A47, M216, M485, B124, and HA868, are not in the same row in Table 2. In other words, these five OTUs had correlation coefficients that were lower than those of the top 10 of both sides, and these are shown in the right-end column in Table 2. The number of the OTUs until the Dt 3rd step, i.e., 5, is shown in the top row. We also had a similar table of Pearson coefficients for the 113 OTUs obtained with the 35f- restriction enzymes system, but it was not included here to save space.
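The ranking behind Table 2, i.e., the Pearson coefficient of each OTU with age followed by selection of the most positive and most negative OTUs, can be sketched as below. The function names and the small example table are hypothetical. Returning 0.0 for a constant column is a convention chosen for this sketch, since the coefficient is undefined in that case.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # convention used here: r is undefined for a constant column
    return sxy / sqrt(sxx * syy)


def rank_otus(otu_table, ages, k=10):
    """Return the k most positive and the k most negative OTUs by r with age."""
    r = {name: pearson(values, ages) for name, values in otu_table.items()}
    ordered = sorted(r, key=r.get, reverse=True)
    return ordered[:k], ordered[::-1][:k]


# Hypothetical OTU intensities for four subjects, not the study data.
example_ages = [20, 30, 40, 50]
example_table = {
    "U1": [1, 2, 3, 4],
    "U2": [4, 3, 2, 1],
    "U3": [1, 3, 2, 4],
}
top_positive, top_negative = rank_otus(example_table, example_ages, k=1)
```

With `k=10` and the 120 OTUs of the 516f- + 27f- group, the two returned lists correspond to the 20 OTUs at both ends of the ranking.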

Table 2.

Pearson correlation coefficients for the 120 OTUs obtained with the 516f- + 27f- restriction enzymes

Comparison between the various DM results and comparison of those with Pearson coefficients

The various DM results are summarized in Table 3, which also shows Pearson coefficients for comparison. In Table 3, the same restriction enzymes were used throughout for each of the 516f- + 27f- and 35f- groups, and the results were compared between “continuance” and “2-NP”. DM with “continuance” did not estimate the age but simply delivered the average age at each node, so “N. of falsely classified subjects” was not given. The rows from the Dt 1st step to the Dt 3rd step allow the abbreviated Dt structures, such as those in Figs. 2 and 3, to be compared. “False nodes-x” are given to show where false classification took place and to indicate the shape of the detailed Dt structure. Bold letters in the Dt 1st step to the Dt 3rd step indicate OTUs whose Pearson coefficients were among the top 10 positive or negative values; these are also shown in Table 2.

Table 3.

Comparison of DM results and details of major related OTUs for continuance and five partitions by age

Similarly, Table 4 presents a comparison of the different combinations of restriction enzymes that could be applied for effective and accurate DM processing in the future. Age is one of the fundamental characteristics of life, and so it is important to know which restriction enzymes are the most effective for classifying the ages of subjects. The notation used in Table 4 is the same as that used in Table 3, and 2 rows, “Partition at DM” and “Age area”, are shown to make comparison with Table 3 easier.

Table 4.

Further comparison of DM results and details of major related OTUs for different combinations of restriction enzymes with the same 2-NP and categories of age

Tracing the personal features of subjects

DM provides a general classification of subjects, but it is also interesting to know how personal differences could be applied to determine age. We considered the subjects who were falsely classified by the DM processing. The results are shown in Table 5. In the former figures and tables, the results with 35f- restriction enzymes were skipped because the number of false classifications was larger than in the 516f- + 27f- group. But here we saw that those results could be applied for determining the differences between individuals, as shown in the lower half of Table 5. The column “subject’s #” indicates the numbers assigned individually to each of the 92 subjects.

Table 5.

Detailed personal tracing of false subjects by DM (Table 3)

DISCUSSION

In Fig. 2, we identified 7 of the 120 OTUs that were closely related to age in the subjects between 21 and 59 years of age. These OTUs were HA624, HA995, A47, M216, M485, B124, and HA868. We carried out the same DM processing that we reported in a previous paper [2], and the results were the same. Here, however, Fig. 2 presents the detailed structure of the Dt, with the OTU cutoff values used at each division and the number of subjects classified at each node. Node-9 was notable, containing 48 subjects, i.e., 52.2% of the 92 subjects. The average age at Node-9 was 31.2, which meant that Node-9 consisted mostly of younger subjects. The pathway to reach Node-9, i.e., HA624, HA995, and M485, could also be used to classify younger subjects. The next major node was Node-7 in Fig. 2, which had 19 subjects, i.e., 20.7% of the 92 subjects, and an average age of 42.5. Node-7 was thought to represent middle age. Although not major nodes, Node-5 and Node-8 were notable. Node-5 was located at the Dt 2nd step, and it represented older subjects. Node-8 had the youngest average age of all the nodes. The other younger subjects, e.g., those 21–26 years of age, who were not separated at Node-8 were placed at Node-9. There were 9 subjects who were 26 years old, which was the mode of the 92 subjects, but 2 of them were separated and belonged to Node-8. Most nodes between Node-7 and Node-14, except for Node-8 and Node-9, represented older groups. Thus, we saw that aging was accompanied by a diversity of OTUs. The pathways to reach these nodes and their cutoff values were very different, and the reasons for this will be partly clarified later. For a better and more precise understanding of the OTUs related to age, we designed five partitions with 2-NP for the 92 subjects. This was because it was difficult to distinguish between ages and to clarify the progress of aging using only a simple Dt structure like that in Fig. 2. Comparing Figs. 2 and 3, although the target characteristic was the same (age), we were unable to find the same OTUs in both. HA323, which is present in Fig. 3 and is most closely related to age partitioned at 40/41, is not present in Fig. 2. For a more universal understanding of the influence of age, we compared DM processing with unpartitioned age (i.e., 21–59 years of age, see Fig. 2) with DM processing with partitioned age (e.g., divided at 40/41, see Fig. 3). No relation between these similar instances of DM was observed. As described above, the Dt with 2-NP provided a much simpler structure that had more Terminal nodes at the second step and later, as shown in Fig. 3. Up to the Dt 5th step, there were a total of 11 Terminal nodes. Four of these nodes were for the younger subjects, aged 21–40, namely, Node-7, Node-14, Node-17 and Node-19, as indicated by shading. On the other hand, there were 7 Terminal nodes for the older subjects, aged 41–59, and 4 of these nodes contained only one subject (5 nodes if Node-19 is included). When these nodes were compared, the differences in the divided contents were very clear. Comparison of these two partitions along the DM pathway indicated that aging was accompanied by a diversity of OTUs.

Comparison with Pearson correlation coefficients

Not only Pearson coefficients but also bicluster and principal component analyses have been used for HIM and OTU data [4, 7,8,9] to classify various characteristics of subjects, but no general conclusions have been reached regarding any related peculiarities. These were individual and local case studies and were not universally applicable to other similar data. Here, we applied Pearson correlation coefficients as a representative of the existing types of analysis. Rather wide coverage was achieved, with 20 OTUs selected out of 120 for the 516f- + 27f- restriction enzymes, but the bolded OTUs in Table 2, which appeared for both Pearson coefficients and DM, overlapped only slightly. This indicated that the methods were fundamentally different. DM frequently selected OTUs that had smaller Pearson coefficients. This was due to the following two major differences between the methods. The first is that Pearson coefficients cover the data for all the OTUs of the 92 subjects, including various sources of noise, while DM focuses on a limited number of subjects. This can be easily seen in the pathways in Figs. 2 and 3. For example, in Fig. 2, the 92 subjects were first divided by HA624, which had the highest positive Pearson coefficient, but the subsequent divisions by HA995 and A47 each applied to a limited number of subjects: HA995 to 77 subjects and A47 to 15 subjects. The DM processing proceeded in a similar manner at each step, with limited or partial optimization over the records, i.e., subjects, so that noise had less influence. The second major difference between Pearson coefficients and DM lies in the detailed features of the OTU data. Figure 4 shows an example of the major OTUs related to age. Four OTUs, i.e., B955, HA323, HA624 and M295, were selected from Tables 2 and 3. With these scatter diagrams, we were unable to identify a linear relationship between age and the OTUs. A linear relationship is necessary, however, for Pearson coefficients to be valid. In contrast, DM does not require the relationship to be linear. Because the OTU data had these technical problems, it was necessary to introduce a new method of analysis.

Fig. 4.

Scatter diagram of major OTUs for age

Data features for 4 major OTUs related to age, which are shown at the Dt 1st step in Table 3.


Tracing the personal feature of subjects

As shown in Table 5, we traced the few subjects who were repeatedly misclassified. Subject #21, aged 55, appeared 3 times in this table. In the lower half of Table 5, there were 5 subjects in the 35f- group who appeared twice: #23, #53, #62, #67 and #77. These 6 subjects were easily misclassified by the OTUs. The column for “False nodes” indicates the forms of misclassification that occurred at the cited nodes. For instance, “1-(44)” in the first row of Table 5 indicates that one younger subject (i.e., subject #45, aged 27) was misclassified and that 44 of the older subjects were correctly categorized. Similarly, “(50)-1” in the third row of Table 5 indicates that 50 subjects in the younger group contained one false subject, who was an older subject (i.e., #21, aged 55); this subject is shown in Fig. 3 at Node-19 with an arrow. With the 2-NP used for this DM processing, misclassification could occur in only two ways: an older subject was misclassified as belonging to the younger group, or a younger subject was misclassified as belonging to the older group. Of the 6 subjects who were falsely categorized by DM (as shown in Table 5), all who were misclassified more than 2 times were of the former type, i.e., their OTUs seemed to be younger than their actual age; they could be said to have a young intestine. Although some younger subjects were misclassified as older, this was an anomaly and was related to only certain OTUs. We took this to mean that increasing age leads to a wider diversity of OTUs, so that accurate classification of older subjects is more difficult than it is for younger ones.
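The “False nodes” notation can be read mechanically, as in the following sketch. It assumes, based only on the two examples quoted above, that the first number counts younger subjects, the second counts older subjects, and the parenthesised side is the class assigned to the node. The function name and the returned field names are hypothetical.

```python
import re

# One side is a plain count and the other is a count in parentheses, e.g. "1-(44)".
_ENTRY = re.compile(r"^(\(\d+\)|\d+)-(\(\d+\)|\d+)$")


def parse_false_node(entry):
    """Parse a 'False nodes' entry such as '1-(44)' or '(50)-1'."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised entry: {entry}")
    younger, older = match.groups()
    if younger.startswith("(") == older.startswith("("):
        raise ValueError("exactly one side must be in parentheses")
    # Assumption: the parenthesised side is the class the node is assigned to.
    predicted = "younger" if younger.startswith("(") else "older"
    n_younger = int(younger.strip("()"))
    n_older = int(older.strip("()"))
    return {
        "younger": n_younger,
        "older": n_older,
        "predicted": predicted,
        "misclassified": n_older if predicted == "younger" else n_younger,
    }
```

Under this reading, `parse_false_node("(50)-1")` describes a node assigned to the younger group that contains one misclassified older subject, which corresponds to subject #21 at Node-19.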

Comparison between the DM results

Various DM results are comprehensively shown in Tables 3 and 4. Table 3 shows the DM results with the same restriction enzymes, i.e., 27·B, 33·HA, 20·M and 40·A, and 31·QHh, 34·QM and 48·QA, using all OTUs of the 516f- + 27f- and 35f- groups. Table 4 shows the same “Age area” but different combinations of restriction enzymes. In Table 3, we can see how the OTUs change with age by examining the changes between the 5 partitions of age between 30 and 50. At the first step of the 516f- + 27f- group, A199, B955, HA323, M316 and M295 were clearly the remarkable OTUs for these 5 partitions. Similarly, in the 35f- group, QHh574, QHh377 and QHh555 played the same role but were less definite than in the former case. Given the way restriction enzymes function, we needed to know whether an OTU resulted in overlapping classification of bacteria, and this seemed apparent with QHh555. The changes in OTUs with age can be seen clearly in Table 3. Thus, if we had another similar set of OTU data for which the ages of the subjects were not known, we would be able to estimate their ages using the Dt structures in Table 3 in a so-called predictive analysis. A similar predictive analysis will be applied in the future to various diseases as a kind of easy and preliminary diagnosis. The main purpose of Table 4 was to demonstrate and compare the effects of suitable combinations of restriction enzymes for further DM processing. As we already reported [2, 3], the 35f- group was less effective than the 516f- + 27f- group. Also, 33·HA, i.e., 516f-HaeIII, is the most effective restriction enzyme to apply in future DM to determine the ages of subjects; this differs slightly from the results for other characteristics, i.e., smoking and drinking habits, which were reported in our previous papers [2, 3]. We confirmed that the best choice of restriction enzymes depends on the property to be determined, e.g., age, smoking, or drinking habits.

Determination of closely related bacterial species by BLAST search

We compared our results for HA624 with those of Mitsuoka [6] in the left-hand column (516f- + 27f- groups) in Table 3. According to Fig. 2, HA624 clearly divided age at the Dt 1st step, and a high level of HA624 indicated the older group of subjects, i.e., Node-2 and its downstream nodes. We traced this gene to determine its accession number. With Microbial Community Analysis III of the University of Idaho [5], simple tracing of HA624 produced 724 bacterial 16S rDNA gene sequences. When HA624 was compared with the OTUs of 3 other restriction enzymes (BslI, 27f-MspI, and 27f-AluI), closely related OTUs were found by DM processing. These OTUs might have been derived from the same bacteria. The lists obtained from the University of Idaho were scanned and crosschecked for the same accession number, and 36 candidates were determined. However, all candidates of the OTUs were identified as uncultured bacteria, so the 16S rRNA sequences of the candidates were searched by BLAST to find closely related bacterial species. All these bacteria were gram-positive facultative anaerobes. After alignment analysis, a phylogram of an OTU (HA624) was obtained, which is shown in Appendix Fig. A1 and abbreviated in Appendix Table 1 to save space. We detected R. obeum, Ruminococcus sp. DJF VR66 and R. obeum clone 1-4 as closely related to 23 of 36 candidates by phylogenetic analysis. Furthermore, R. obeum, Ruminococcus sp. DJF VR66, R. obeum clone 1-4, Lachnospiraceae, butyrate-producing bacterium, Ruminococcus sp. WAL 17306, Blautia sp., Clostridium symbiosum, Eubacterium contortum, Pseudobutyrivibrio ruminis, Butyrivibrio sp., Ruminococcus sp. and E. rectale were obtained with more than 97% identity. The primary structure of 16S rRNA is easier to determine than hybridization between DNA strands, and the strength of sequence analysis is that it can identify the level at which DNA pairing studies need to be performed, which certainly applies to similarities of 97% and higher [10].
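The crosschecking step, in which the candidate lists for several OTUs are scanned for shared accession numbers and the BLAST hits are then filtered at 97% identity, can be sketched with set operations. The accession numbers, OTU keys and hits below are invented placeholders for illustration only, and the function names are hypothetical; this is not the database query itself.

```python
def shared_accessions(candidates_by_otu):
    """Accession numbers that occur in the candidate list of every OTU."""
    sets = [set(ids) for ids in candidates_by_otu.values()]
    return set.intersection(*sets) if sets else set()


def filter_hits(hits, min_identity=97.0):
    """Keep BLAST hits, given as (species, percent identity), at or above the threshold."""
    return [species for species, identity in hits if identity >= min_identity]


# Placeholder accession numbers, not real database entries.
candidates = {
    "HA_otu": ["ACC001", "ACC002", "ACC003", "ACC004"],
    "B_otu": ["ACC002", "ACC003", "ACC009"],
    "M_otu": ["ACC003", "ACC002", "ACC007"],
}
common = shared_accessions(candidates)

# Placeholder BLAST results, not real search output.
hits = [("species A", 99.1), ("species B", 96.5), ("species C", 97.0)]
kept = filter_hits(hits)
```

Requiring an accession number to appear in every list is what reduces the 724 sequences traced for HA624 alone to a much smaller set of candidates.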

Appendix Table 1.

Bacterial candidates of OTUs (A, HA624; B, HA995) by BLAST search of 16S rRNA gene sequences

For the younger group, the concentrated OTU was observed at Node-4 in Fig. 2, which was divided by HA995. With Microbial Community Analysis III of the University of Idaho [5], simple tracing of HA995 produced 4,551 bacterial 16S rDNA gene sequences. When HA995 was compared with the OTUs of 2 other restriction enzymes, i.e., BslI and 27f-AluI, closely related OTUs were found by DM processing. After processing with these 3 OTUs in the same way as for HA624, 269 candidates with the same accession number were obtained. In the phylogram of an OTU (HA995) shown in Appendix Fig. A2 and in Appendix Table 1, R. obeum clone 1-33 was detected as closely related to 65 of the 269 candidates. Furthermore, B. wexlerae, Ruminococcus sp. K-1, Blautia luti, Ruminococcus sp. DJF VR70k1, Ruminococcus sp. DJF VR52, R. obeum, Clostridiales bacterium, Lachnospiraceae bacterium G41, Lachnospiraceae bacterium RM44, Firmicutes sp. oral clone CK051, Johnsonella sp. oral taxon, Lachnospiraceae bacterium G11, Clostridium leptum and Dorea longicatena were found with more than 97% identity.

We compared our results with Mitsuoka’s report of bacteria whose rates increase with aging, such as C. perfringens, Lactobacillus, Enterobacteriaceae, and Streptococcus [6]. As shown in Appendix Table 1, butyric acid-producing bacteria were found in the intestines of older subjects in our study. This might be one of the mechanisms of the aging process that leads to the OTU. Benno et al. [11] examined the fecal microbiota of elderly persons in rural and urban areas of Japan and reported that Ruminococcus sp. was isolated from elderly persons in both areas. Ruminococcus sp. is a strict anaerobe that requires special methods for culture, but we could find such anaerobes easily by database analysis. We found R. obeum in two OTUs (HA624 and HA995); however, R. obeum clone 1-33 is classified into a different cluster than R. obeum clone 1-4 in the phylogenetic analysis of bacterial 16S rRNA gene sequences [12]. R. obeum clone 1-33 might be replaced by clone 1-4 during aging. Further studies are needed to clarify this phenomenon.

8 in total

1. Phylogenetic analysis of anaerobic psychrophilic enrichment cultures obtained from a Greenland glacier ice core.

Authors: Peter P Sheridan; Vanya I Miteva; Jean E Brenchley
Journal: Appl Environ Microbiol Date: 2003-04 Impact factor: 4.792

2. Multicenter analysis of fecal microbiota profiles in Japanese patients with Crohn's disease.

Authors: Akira Andoh; Hiroyuki Kuzuoka; Tomoyuki Tsujikawa; Shiro Nakamura; Fumihito Hirai; Yasuo Suzuki; Toshiyuki Matsui; Yoshihide Fujiyama; Takayuki Matsumoto
Journal: J Gastroenterol Date: 2012-05-11 Impact factor: 7.527

3. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.

Authors: Koichiro Tamura; Daniel Peterson; Nicholas Peterson; Glen Stecher; Masatoshi Nei; Sudhir Kumar
Journal: Mol Biol Evol Date: 2011-05-04 Impact factor: 16.240

4. Comparison of the fecal microbiota profiles between ulcerative colitis and Crohn's disease using terminal restriction fragment length polymorphism analysis.

Authors: Akira Andoh; Hirotsugu Imaeda; Tomoki Aomatsu; Osamu Inatomi; Shigeki Bamba; Masaya Sasaki; Yasuharu Saito; Tomoyuki Tsujikawa; Yoshihide Fujiyama
Journal: J Gastroenterol Date: 2011-01-21 Impact factor: 7.527

5. Analysis of the human intestinal microbiota from 92 volunteers after ingestion of identical meals.

Authors: J S Jin; M Touyama; R Kibe; Y Tanaka; Y Benno; T Kobayashi; M Shimakawa; T Maruo; T Toda; I Matsuda; H Tagami; M Matsumoto; G Seo; O Chonan; Y Benno
Journal: Benef Microbes Date: 2013-06-01 Impact factor: 4.205

6. Comparison of fecal microflora of elderly persons in rural and urban areas of Japan.

Authors: Y Benno; K Endo; T Mizutani; Y Namba; T Komori; T Mitsuoka
Journal: Appl Environ Microbiol Date: 1989-05 Impact factor: 4.792

7. Identification of Heavy Smokers through Their Intestinal Microbiota by Data Mining Analysis.

Authors: Toshio Kobayashi; Kenji Fujiwara
Journal: Biosci Microbiota Food Health Date: 2013-04-27

8. Identification of Human Intestinal Microbiota of 92 Men by Data Mining for 5 Characteristics, i.e., Age, BMI, Smoking Habit, Cessation Period of Previous Smokers and Drinking Habit.

Authors: Toshio Kobayashi; Jong-Sik Jin; Ryoko Kibe; Mutsumi Touyama; Yoshiki Tanaka; Yoshiko Benno; Kenji Fujiwara; Masaki Shimakawa; Toshiya Maruo; Toshiya Toda; Isao Matsuda; Hiroyuki Tagami; Mitsuharu Matsumoto; Genichirou Seo; Naoki Sato; Osamu Chounan; Yoshimi Benno
Journal: Biosci Microbiota Food Health Date: 2013-05-15
