Jasminka Hasic Telalovic, Azra Music.
Abstract
BACKGROUND: A decade ago, advances in microbiome sequencing techniques launched research into the microbiome and its relationship with the host organism, followed by the development of sophisticated bioinformatics and data science tools for analyzing large amounts of data. Since then, gut microbiome data, where the microbiome is defined as the network of microorganisms inhabiting the human intestinal system, have been associated with several conditions, such as irritable bowel syndrome (IBS), colorectal cancer, diabetes, obesity, and metabolic syndrome, and have more recently been studied in Parkinson's and Alzheimer's diseases as well. This paper aims to characterize the differences between the microbial data of individuals diagnosed with multiple sclerosis and those without the diagnosis by applying data science techniques to publicly available data.
Keywords: Data science; Machine learning; Microbiome; Multiple sclerosis
Year: 2020 PMID: 33046051 PMCID: PMC7549194 DOI: 10.1186/s12911-020-01263-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Telalovic and Kilic [17] Dataset [18] description
| Characteristic | Healthy controls (n = 43) | MS (n = 60) |
|---|---|---|
| Age (years) | 42.2±9.61 | 49.7±8.50 |
| Male | 6 (14%) | 19 (32%) |
| Female | 37 (86%) | 41 (68%) |
| Body mass index | 26.4±6.3 | 27.2±4.7 |
| Caucasian | 43 | 58 |
| Disease duration | NA | 12.8±8.3 |
| Untreated | NA | 28 |
Fig. 1 The taxonomy tree shows the genus-level archaebacteria identified as statistically significant (p < 0.05), together with their phylogenetic tree; these were used to develop the classification model. Archaebacteria in bold are those to which the classifier assigned a high feature importance score.
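The caption does not name the statistical test behind the p < 0.05 threshold. As a sketch only, assuming a two-sided Mann-Whitney U test with the normal approximation (no tie correction) and hypothetical names such as `select_significant`, per-taxon filtering of MS (label 1) versus control (label 0) samples could look like this:

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_significant(abundance, labels, alpha=0.05):
    """Keep taxa whose abundance differs between MS (label 1) and controls (label 0)."""
    selected = []
    for taxon, values in abundance.items():
        ms = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        if mann_whitney_p(ms, controls) < alpha:
            selected.append(taxon)
    return selected
```

With real data, `abundance` would map each taxon to its per-sample abundances in the same order as `labels`; a multiple-testing correction may also be appropriate, which this sketch omits.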
Telalovic and Kilic [17] Number of bacteria identified per taxonomic level in the [18] dataset
| Phylum | 13 | 13 |
| Class | 29 | 26 |
| Order | 47 | 39 |
| Family | 88 | 79 |
| Genus | 192 | 174 |
| Species | 257 | 234 |
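Counting taxa per level, and building per-level abundance features for classification, both amount to collapsing feature abundances to a taxonomic rank. A minimal sketch, assuming each feature carries a seven-rank lineage (kingdom to species); `collapse_to_level`, `taxa_per_level`, and the toy lineages in the test are hypothetical:

```python
LEVELS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def collapse_to_level(feature_counts, lineages, level):
    """Sum per-sample counts of features that share the same lineage down to `level`."""
    depth = LEVELS.index(level) + 1
    collapsed = {}
    for feature, counts in feature_counts.items():
        # The lineage prefix down to `level` identifies the taxon at that rank.
        key = tuple(lineages[feature][:depth])
        previous = collapsed.get(key, [0] * len(counts))
        collapsed[key] = [a + b for a, b in zip(previous, counts)]
    return collapsed


def taxa_per_level(feature_counts, lineages):
    """Number of distinct taxa at each rank from phylum to species."""
    return {lvl: len(collapse_to_level(feature_counts, lineages, lvl)) for lvl in LEVELS[1:]}
```

Two features assigned to the same genus but to different (or unclassified) species merge at genus level and stay separate at species level, which is why counts grow with taxonomic resolution.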
Telalovic and Kilic [17] Bacteria with a high feature importance score in the classification model built on the [18] dataset
| Taxonomic level | Bacteria (feature importance score) |
|---|---|
| Phylum | Euryarchaeota (0.0158), Bacteroidetes (0.0456), Verrucomicrobia (0.0059), Tenericutes (0.0492) |
| Class | Verrucomicrobiae (0.0059), Bacteroidia (0.0458), Methanobacteria (0.016), Mollicutes (0.0491) |
| Order | Verrucomicrobiales (0.0059), Bacteroidales (0.0458), Methanobacteriales (0.016), RF39 (0.0482), bacteria from class Clostridia (0.0181) |
| Family | Verrucomicrobiaceae (0.0059), bacteria from order RF39 (0.0482), Barnesiellaceae (0.0133), Methanobacteriaceae (0.016), bacteria from class Clostridia (0.0181), Paraprevotellaceae (0.034) |
| Genus | Akkermansia (0.0059), bacteria from family Ruminococcaceae (0.0437), Butyricimonas (0.0359), bacteria from family Barnesiellaceae (0.0133), Methanobrevibacter (0.0159) |
| Species | Akkermansia muciniphila (0.0059), bacteria from family Ruminococcaceae (0.0437), bacteria from genus Butyricimonas (0.0359), bacteria from family Barnesiellaceae (0.0133), bacteria from genus cc_115 (0.0496) |
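The table reports a feature importance score per taxon without stating how it was computed; tree ensembles, for example, commonly report impurity-based importances. As a swapped-in, model-agnostic alternative (not necessarily the paper's method), permutation importance measures how much accuracy drops when one feature's values are shuffled across samples. A minimal sketch, where `model` is any hypothetical callable mapping a feature row (a list) to a label:

```python
import random


def accuracy(model, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted; other features stay intact.
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model relies on scores higher the more accuracy collapses without it.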
Classification accuracy (%) obtained on the dataset used to develop the classification model [18], with bacterial abundance at each taxonomic level as the basis for classification. The values differ from those reported in [17] because cross-validation was introduced.
| Phylum | 61.90 |
| Class | 64.32 |
| Order | 69.32 |
| Family | 75.16 |
| Genus | 76.82 |
| Species | 53.44 |
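The caption attributes the difference from [17] to cross-validation. The sketch below computes k-fold cross-validated accuracy in percent; the nearest-centroid classifier, `k=5`, and the function names are illustrative assumptions, not the classifier used in the paper. Running it once per taxonomic level, on abundances collapsed to that level, would give one accuracy per row of a table like the one above.

```python
import random


def nearest_centroid_fit(X, y):
    """Fit per-class mean vectors; predict the class whose centroid is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def cross_val_accuracy(X, y, k=5, seed=0):
    """Percent of samples predicted correctly when each fold is held out once."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model(X[i]) == y[i] for i in fold)
    return 100 * correct / len(y)
```

Every sample is scored exactly once by a model that never saw it during training, which is why cross-validated accuracy is usually lower than accuracy measured on the training data itself.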
KEGG pathways with a high feature importance score in the classification model developed using the dataset from [18]
| KEGG hierarchical level | Pathways |
|---|---|
| 2nd hierarchical level | Signaling Molecules and Interaction, Amino Acid Metabolism, Excretory System, Lipid Metabolism, Genetic Information Processing, Nervous System, Energy Metabolism |
| 3rd hierarchical level | Carotenoid biosynthesis, Influenza A, Glycosyltransferases, Basal transcription factors, Biosynthesis of unsaturated fatty acids, Caprolactam degradation, Signal transduction mechanisms, Flavonoid biosynthesis, Caffeine metabolism, Chloroalkane and chloroalkene degradation, Non-homologous end-joining, Hepatitis C, Chagas disease (American trypanosomiasis), Butirosin and neomycin biosynthesis, Chlorocyclohexane and chlorobenzene degradation, Phenylalanine, tyrosine and tryptophan biosynthesis, Ubiquinone and other terpenoid-quinone biosynthesis, Vibrio cholerae infection, Nitrotoluene degradation, Steroid hormone biosynthesis, Aminoacyl-tRNA biosynthesis, Steroid biosynthesis, Bacterial toxins, Novobiocin biosynthesis, Phenylalanine metabolism, Pantothenate and CoA biosynthesis, Meiosis – yeast, Cell cycle |
Classification accuracy (%) obtained on the dataset used to develop the classification model [18], with KEGG pathways at different hierarchical levels as the basis for classification.
| 2nd hierarchical level | 62.03 |
| 3rd hierarchical level | 70.95 |
Classification accuracy (%) obtained on the validation datasets [19, 26], with bacterial abundance at each taxonomic level as the basis for classification.
| Phylum | 52.78 |
| Class | 58.33 |
| Order | 63.89 |
| Family | 69.44 |
| Genus | 75.00 |
| Species | 51.56 |
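Applying a model developed on [18] to the validation datasets [19, 26] requires expressing every validation sample in the development feature space. A minimal sketch, assuming samples arrive as taxon-to-count dictionaries, that counts are converted to relative abundance, and that development taxa absent from a sample are set to zero; `align`, `external_accuracy`, and the toy model in the test are hypothetical:

```python
def to_relative(sample):
    """Convert raw counts to relative abundances (fractions summing to 1)."""
    total = sum(sample.values())
    if total == 0:
        return {taxon: 0.0 for taxon in sample}
    return {taxon: count / total for taxon, count in sample.items()}


def align(sample, feature_names):
    """Order abundances by the development feature list; missing taxa become 0.0.

    Relative abundance is computed over all taxa in the sample, including
    taxa that the development model never saw.
    """
    relative = to_relative(sample)
    return [relative.get(name, 0.0) for name in feature_names]


def external_accuracy(model, samples, labels, feature_names):
    """Percent of validation samples classified correctly by an already-trained model."""
    rows = [align(s, feature_names) for s in samples]
    correct = sum(model(row) == label for row, label in zip(rows, labels))
    return 100 * correct / len(labels)
```

Unlike cross-validation, the model is trained once on the development data and never refit on the validation samples.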
Classification accuracy (%) obtained on the validation dataset [19], with bacterial abundance at each taxonomic level as the basis for classification. This dataset contains samples from individuals with MS.
| Phylum | 55.56 |
| Class | 61.11 |
| Order | 66.67 |
| Family | 72.22 |
| Genus | 77.78 |
Classification accuracy (%) obtained on the validation dataset [26], with bacterial abundance at each taxonomic level as the basis for classification. This dataset contains samples from individuals who self-reported as healthy.
| Phylum | 50.00 |
| Class | 55.56 |
| Order | 61.11 |
| Family | 66.67 |
| Genus | 72.22 |
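Because [19] contains only MS samples and [26] only self-reported healthy samples, the per-dataset accuracies act as sensitivity and specificity, and the accuracy reported for [19, 26] together is their sample-weighted combination. The reported percentages are consistent with 18 samples per validation dataset (for example, 14/18 = 77.78% and 13/18 = 72.22% at genus level, pooled 27/36 = 75.00%); that sample size is an inference from the numbers, not a figure stated in the tables. A minimal check with hypothetical function names:

```python
def percent(correct, total):
    """Share of correctly classified samples, in percent."""
    return 100 * correct / total


def pooled_accuracy(correct_counts, sizes):
    """Accuracy over several datasets combined: total correct over total samples."""
    return percent(sum(correct_counts), sum(sizes))


# Hypothetical genus-level counts inferred from the reported percentages (18 samples per dataset).
sensitivity = percent(14, 18)  # [19]: MS samples only
specificity = percent(13, 18)  # [26]: self-reported healthy samples only
combined = pooled_accuracy([14, 13], [18, 18])
print(f"{sensitivity:.2f} {specificity:.2f} {combined:.2f}")  # prints: 77.78 72.22 75.00
```

The same arithmetic reproduces the phylum, class, order, and family rows (for example, (10 + 9) / 36 = 52.78% at phylum level).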
Classification accuracy (%) obtained on the validation datasets [19, 26], with KEGG pathways at different hierarchical levels as the basis for classification.
| 2nd hierarchical level | 58.83 |
| 3rd hierarchical level | 67.24 |
Fig. 2 The confusion matrix shows the number of samples that were correctly (green background) and incorrectly (pink background) classified. Classification became more accurate as taxonomic resolution increased from phylum to genus.
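A confusion matrix like the one in Fig. 2 counts samples by true and predicted class; the diagonal holds the correctly classified samples, so accuracy is the diagonal sum divided by the total. A minimal sketch with hypothetical class labels `"healthy"` and `"MS"`:

```python
def confusion_matrix(y_true, y_pred, labels=("healthy", "MS")):
    """Rows are true classes, columns are predicted classes, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Percent correct: diagonal (correct) counts divided by all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100 * correct / total


y_true = ["MS", "MS", "healthy", "healthy"]
y_pred = ["MS", "healthy", "healthy", "healthy"]
m = confusion_matrix(y_true, y_pred)
print(m, accuracy_from_matrix(m))  # [[2, 0], [1, 1]] 75.0
```

Off-diagonal cells separate the two error types: the lower-left cell counts MS samples predicted as healthy, the upper-right cell healthy samples predicted as MS.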
Comparison of the results of this study with other MS studies. ↑ indicates that MS samples show a statistically significant increase in the abundance of a bacterium, and ↓ indicates a statistically significant decrease (* indicates that the result is not statistically significant). Green indicates agreement between our results and other MS studies; orange indicates disagreement with previous MS studies; results in black had no MS study available for comparison.
| Phylum | Euryarchaeota | ||||||||
| Phylum | Bacteroidetes | ||||||||
| Phylum | Verrucomicrobia | ||||||||
| Phylum | Tenericutes | ||||||||
| Phylum | Firmicutes | ||||||||
| Phylum | Actinobacteria | ||||||||
| Phylum | Proteobacteria | ||||||||
| Phylum | Fusobacteria | ||||||||
| Family | Methanobacteriaceae | ||||||||
| Family | Verrucomicrobiaceae | ||||||||
| Family | Uncultured (Clostridia) | ||||||||
| Family | Barnesiellaceae | ||||||||
| Family | Paraprevotellaceae | ||||||||
| Family | Uncultured (RF39) | ||||||||
| Family | Lachnospiraceae | ||||||||
| Family | Bacteroidaceae | ||||||||
| Genus | Methanobrevibacter | ||||||||
| Genus | Desulfovibrio | ||||||||
| Genus | Anaerofustis | ||||||||
| Genus | Akkermansia | ||||||||
| Genus | Butyricimonas | ||||||||
| Genus | Uncultured (Ruminococcaceae) | ||||||||
| Genus | Uncultured (RF39) | ||||||||
| Genus | Ruminococcus | ||||||||
| Genus | Bifidobacterium | ||||||||
| Genus | Faecalibacterium | ||||||||
| Genus | Prevotella | ||||||||
| Genus | Streptococcus | ||||||||
| Genus | Acinetobacter | ||||||||
| Genus | Parabacteroides | ||||||||
| Genus | Bilophila | ||||||||
| Genus | Christensenellaceae | ||||||||
| Genus | Bacteroides | ||||||||
| Genus | Anaerostipes | ||||||||
| Genus | Pseudomonas | ||||||||
| Genus | Mycoplana | ||||||||
| Genus | Haemophilus | ||||||||
| Genus | Dorea | ||||||||
| Species | Methanobrevibacter smithii | ||||||||
| Species | Akkermansia muciniphila | ||||||||
| Species | Butyricimonas virosa |
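The arrow notation in the comparison table can be derived mechanically once a test result is available. A minimal sketch, assuming the direction is taken from group medians and the p-value comes from a separate two-group test; `direction`, `same_direction`, and the `"="` marker for equal medians are hypothetical, and the sketch compares only the arrow, not significance:

```python
from statistics import median


def direction(ms, controls, p_value, alpha=0.05):
    """Return '↑' or '↓' for MS vs controls, with '*' appended when p >= alpha."""
    if median(ms) == median(controls):
        return "="  # hypothetical marker: no median shift, so neither arrow applies
    arrow = "↑" if median(ms) > median(controls) else "↓"
    return arrow if p_value < alpha else arrow + "*"


def same_direction(ours, theirs):
    """True when two findings point the same way, ignoring the significance marker."""
    return ours.rstrip("*") == theirs.rstrip("*")
```

Whether a non-significant finding (marked `*`) should count as agreement is a judgment call the caption leaves open; this sketch treats it as agreement whenever the arrows match.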
Comparison of results with other MS studies. ↑ indicates that MS samples show a statistically significant increase in a predicted metabolic function, and ↓ indicates a statistically significant decrease. Green indicates agreement between our results and other MS studies; results in black had no MS study available for comparison.
| KEGG hierarchical level | Predicted metabolic function | | |
|---|---|---|---|
| 2nd | Energy metabolic functions | ||
| 2nd | Excretory system functions | ||
| 2nd | Signal transduction mechanisms | ||
| 2nd | Replication and repair functions | ||
| 2nd | Amino acid metabolism | ||
| 2nd | Lipid metabolism | ||
| 2nd | Inorganic ion transport and metabolism | ||
| 2nd | Unknown functions | ||
| 3rd | Chromosome functions | ||
| 3rd | Peptidases functions | ||
| 3rd | Homologous recombination functions | ||
| 3rd | DNA replication | ||
| 3rd | Peroxisome and cyanoamino acid metabolism | ||
| 3rd | Vitamin B6 metabolism | ||
| 3rd | |||
| 3rd | Inorganic ion transport and metabolism | ||
| 3rd | Mismatch repair functions | ||
| 3rd | Galactose metabolism | ||
| 3rd | Steroid hormone biosynthesis | ||
| 3rd | Tuberculosis functions | ||
| 3rd | Bacterial secretion system | ||
| 3rd | Influenza A | ||
| 3rd | Valine, leucine and isoleucine biosynthesis | ||
| 3rd | Hepatitis C | ||
| 3rd | Cell motility and secretion |