Yu-Hang Zhang, Wei Guo, Tao Zeng, ShiQi Zhang, Lei Chen, Margarita Gamarra, Romany F Mansour, José Escorcia-Gutierrez, Tao Huang, Yu-Dong Cai.
Abstract
Type 2 diabetes (T2D) is a systemic chronic metabolic condition characterized by abnormal sugar metabolism; its complications are the most harmful aspect of the disease and may become life-threatening over the long term. Given its high incidence and late-stage severity, researchers have focused on identifying specific biomarkers and potential drug targets for T2D at the genomic, epigenomic, and transcriptomic levels. Microbes participate in the pathogenesis of multiple metabolic diseases, including diabetes. However, related studies remain non-systematic and lack functional exploration of the identified microbes. To fill this gap between gut microbiome and diabetes research, we first introduced the eggNOG database and the KEGG ORTHOLOGY (KO) database for orthologous (protein/gene) annotation of the microbiota. Two datasets with these annotations were analyzed with multiple machine-learning models to identify significant microbiota biomarkers of T2D. The feature selection method Max-Relevance and Min-Redundancy (mRMR) was first applied to each dataset, yielding a ranked feature list for each. Each list was then fed into incremental feature selection (IFS), with a support vector machine (SVM) as the classification algorithm, to extract essential annotations and build efficient classifiers. This study not only revealed potential pathological factors for diabetes at the microbiome level but also provided new candidates for drug development against diabetes.
Keywords: feature selection; gut microbiome; machine learning; microbiota biomarkers; support vector machine; type 2 diabetes
Year: 2021 PMID: 34305880 PMCID: PMC8299781 DOI: 10.3389/fmicb.2021.711244
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Entire procedure for investigating the T2D datasets with eggNOG or KO annotations. Each dataset is first analyzed by the Max-Relevance and Min-Redundancy method, resulting in a feature list. This list is fed into the incremental feature selection method, with a support vector machine as the classification algorithm, to extract essential annotations and construct efficient classifiers.
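The ranking step in this procedure can be illustrated with a minimal sketch of greedy mRMR. It assumes discrete feature values and the difference form of the criterion (relevance minus mean redundancy); the source does not say which variant or discretization the authors used, and the names `mutual_information`, `mrmr_rank`, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_rank(features, labels):
    """Greedy mRMR ranking (difference form, an assumption of this sketch).

    features: dict mapping feature name -> list of discrete values per sample
    labels:   list of class labels, one per sample
    Returns all feature names, ordered from first selected to last selected.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    remaining = list(features)
    ranked = []
    while remaining:
        def score(f):
            if not ranked:
                return relevance[f]
            # Mean mutual information with the features already selected
            redundancy = sum(
                mutual_information(features[f], features[g]) for g in ranked
            ) / len(ranked)
            return relevance[f] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data (hypothetical): "a_copy" duplicates "a", so it is pushed down the list.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 1, 0, 0, 1, 0, 1, 1],
}
toy_ranking = mrmr_rank(toy_features, toy_labels)
```

In the toy data, the duplicated feature is as relevant as the original but fully redundant with it, so the less relevant yet complementary feature `c` is ranked ahead of it. Continuous abundance values would need to be discretized, or a different mutual information estimator used, before applying this sketch.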
FIGURE 2. IFS curves with support vector machine (SVM) classifiers on different numbers of eggNOG features. (A) IFS curve with an interval of 10; the highest MCC is 0.844, obtained with the top 2090 features. (B) IFS curve with an interval of one; the highest MCC is still 0.844, but it is reached with only the top 2082 features.
FIGURE 3. IFS curves with support vector machine (SVM) classifiers on different numbers of KO features. (A) IFS curve with an interval of 10; the highest MCC is 0.687, obtained with the top 200 features. (B) IFS curve with an interval of one; the highest MCC is still 0.687, obtained with the same top 200 features.
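The IFS curves described in these captions can be sketched as a scan over prefixes of the ranked feature list. This is a minimal illustration, not the authors' implementation: `score_fn` is a placeholder for whatever produces the score of a feature subset (in the study, the MCC of an SVM classifier), and `toy_score` is a hypothetical stand-in that plateaus after seven features.

```python
def incremental_feature_selection(ranked_features, score_fn, step=1):
    """Score the top-k prefixes of a ranked feature list for k = step, 2*step, ...

    Returns (best_k, best_score, curve), where curve is a list of (k, score)
    pairs. On ties, the smallest k that reaches the best score is kept.
    """
    curve = []
    best_k, best_score = None, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = score_fn(ranked_features[:k])
        curve.append((k, score))
        if score > best_score:  # strict comparison keeps the smaller subset on ties
            best_k, best_score = k, score
    return best_k, best_score, curve


def toy_score(subset):
    """Hypothetical score: improves up to 7 features, then stays flat."""
    return min(len(subset), 7) / 7


ranked = [f"f{i}" for i in range(30)]
coarse_k, coarse_score, _ = incremental_feature_selection(ranked, toy_score, step=10)
fine_k, fine_score, _ = incremental_feature_selection(ranked, toy_score, step=1)
```

With the toy score, the scan at an interval of 10 and the scan at an interval of one reach the same best score, but the finer scan finds it with fewer features, which mirrors the pattern reported for the eggNOG features in Figure 2.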
MCC performance of classifiers with different features.
| Annotation | Number of features | MCC |
|---|---|---|
| eggNOG | 2082 | 0.844 |
| KO | 200 | 0.687 |
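For reference, the Matthews correlation coefficient (MCC) reported in this table can be computed from a binary confusion matrix as in the sketch below. The function name `mcc` and the example counts are illustrative only and are not taken from the study; returning 0 when the denominator is zero is a common convention, assumed here.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal total is zero
    return numerator / denominator if denominator else 0.0


# Illustrative counts (hypothetical, not from the paper)
example = mcc(tp=45, tn=40, fp=10, fn=5)
```

MCC ranges from -1 to 1: a perfect classifier scores 1, and one that is right exactly half the time on balanced classes scores 0.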
FIGURE 4. Detailed performance of the optimal support vector machine (SVM) with eggNOG or KO annotations. Except for SP, the other four measurements are all quite high.
Top annotations from eggNOG or KO databases.
| Annotation ID | Database | Description |
|---|---|---|
| NOG275679 | eggNOG | S-layer protein |
| COG4678 | eggNOG | Muramidase (phage lambda lysozyme) |
| NOG70379 | eggNOG | ATP-binding protein |
| NOG10530 | eggNOG | Hypothetical protein |
| COG0810 | eggNOG | TonB-like protein |
| K00244 | KO | Fumarate reductase flavoprotein subunit |
| K14744 | KO | rzpD, prophage endopeptidase |
| K03367 | KO | dltA, |
| K03201 | KO | virB6, lvhB6, type IV secretion system protein VirB6 |
| K01006 | KO | ppdK, pyruvate, orthophosphate dikinase |