Shao-Wu Zhang1, Xiang-Yang Jin1, Teng Zhang1.
Abstract
Next-generation sequencing technologies used in metagenomics yield numerous sequencing fragments that come from thousands of different species. Accurately identifying genes in metagenomic fragments is one of the most fundamental problems in metagenomics. In this article, by fusing multiple features (i.e., monocodon usage, monoamino acid usage, ORF length coverage, and Z-curve features) and using a deep stacking network learning model, we present a novel method (called Meta-MFDL) to predict metagenomic genes. The results of 10-fold cross-validation (10 CV) and independent tests show that Meta-MFDL is a powerful tool for identifying genes from metagenomic fragments.
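
Two of the feature families named in the abstract can be illustrated with a minimal sketch. The function names `monocodon_usage` and `z_curve` are hypothetical, and both are simplified: codons are read in a single frame, and the Z-curve components are computed once over the whole sequence from base counts. The feature definitions actually used by Meta-MFDL may differ (for example, by being phase-specific).

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 codons in a fixed order, so the feature vector layout is stable.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def monocodon_usage(orf):
    """Relative frequency of each of the 64 codons, read in frame 0.

    Assumption: codons containing ambiguous bases (e.g. N) are ignored.
    """
    orf = orf.upper()
    counts = Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    return [counts[c] / total if total else 0.0 for c in CODONS]


def z_curve(seq):
    """Length-normalized Z-curve components (x, y, z) from base counts.

    x: purine (A, G) vs pyrimidine (C, T)
    y: amino (A, C) vs keto (G, T)
    z: weak (A, T) vs strong (G, C) hydrogen bonding
    """
    n = Counter(seq.upper())
    length = sum(n[b] for b in BASES)
    if length == 0:
        return (0.0, 0.0, 0.0)
    x = (n["A"] + n["G"]) - (n["C"] + n["T"])
    y = (n["A"] + n["C"]) - (n["G"] + n["T"])
    z = (n["A"] + n["T"]) - (n["G"] + n["C"])
    return (x / length, y / length, z / length)


usage = monocodon_usage("ATGAAATAA")   # three codons: ATG, AAA, TAA
components = z_curve("AACG")           # A=2, C=1, G=1, T=0
```

Concatenating such vectors is one simple way to fuse several feature families into a single classifier input.
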
Year: 2017 PMID: 29250541 PMCID: PMC5698827 DOI: 10.1155/2017/4740354
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
The performance of Orphelia, FragGeneScan, Meta-MFSVM, and Meta-MFDL on the Set700 and Set120 training datasets in 10 CV test.
| Predictors | Set700 | | | Set120 | | |
|---|---|---|---|---|---|---|
| | TPR (%) | PPV (%) | | TPR (%) | PPV (%) | |
| Orphelia | 88.61 ± 1.89 | 89.05 ± 1.51 | 0.888 ± 0.016 | 83.19 ± 0.98 | 82.44 ± 1.20 | 0.847 ± 0.985 |
| Meta-MFSVM | 89.59 ± 2.39 | 90.23 ± 1.34 | 0.908 ± 0.018 | 84.70 ± 0.66 | 85.07 ± 1.26 | 0.849 ± 0.543 |
| FragGeneScan | 90.38 ± 2.23 | 91.89 ± 1.98 | 0.918 ± 0.013 | 86.56 ± 0.71 | 86.78 ± 0.54 | 0.868 ± 0.013 |
| Meta-MFDL | 91.47 ± 1.37 | 93.26 ± 1.97 | 0.923 ± 0.008 | 89.28 ± 0.63 | 90.58 ± 0.61 | 0.899 ± 0.006 |
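
The TPR and PPV entries above are reported as mean ± standard deviation over the 10 cross-validation folds. The sketch below shows how such entries can be produced from per-fold counts. The fold counts are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def tpr_ppv(tp, fp, fn):
    """Return (TPR, PPV) in percent from true/false positive and false negative counts."""
    tpr = 100.0 * tp / (tp + fn)   # true positive rate (sensitivity)
    ppv = 100.0 * tp / (tp + fp)   # positive predictive value (precision)
    return tpr, ppv


def summarize(folds):
    """Summarize per-fold (tp, fp, fn) counts as (mean, sd) pairs for TPR and PPV.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    tprs, ppvs = zip(*(tpr_ppv(*fold) for fold in folds))
    return (mean(tprs), stdev(tprs)), (mean(ppvs), stdev(ppvs))


# Hypothetical counts for three folds, not taken from the paper.
example_folds = [(90, 10, 10), (88, 9, 12), (92, 11, 8)]
(tpr_mean, tpr_sd), (ppv_mean, ppv_sd) = summarize(example_folds)
print(f"TPR {tpr_mean:.2f} ± {tpr_sd:.2f}, PPV {ppv_mean:.2f} ± {ppv_sd:.2f}")
```
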
The overall accuracy (%) of Orphelia, FragGeneScan, MGC, MetaGUN, Meta-MFSVM, and Meta-MFDL on the TesData700 independent testing dataset.
| Species | Orphelia | Meta-MFSVM | MGC | FragGeneScan | MetaGUN | Meta-MFDL |
|---|---|---|---|---|---|---|
| | 89.49 | 92.25 | 92.47 | 93.14 | 94.30 | 94.70 |
| | 84.73 | 89.87 | 89.90 | 91.67 | 94.40 | 94.12 |
| | 86.13 | 91.68 | 91.69 | 95.91 | 95.03 | 94.65 |
| | 94.16 | 93.43 | 93.75 | 95.60 | 96.24 | 96.05 |
| | 88.19 | 90.71 | 90.76 | 91.53 | 94.13 | 94.15 |
| | 91.56 | 92.55 | 91.78 | 92.49 | 93.86 | 93.89 |
| | 86.03 | 90.15 | 90.85 | 91.32 | 91.31 | 91.62 |
| | 83.59 | 88.48 | 88.43 | 88.07 | 90.68 | 90.66 |
| | 91.50 | 91.98 | 92.06 | 93.45 | 93.58 | 93.66 |
| | 92.88 | 93.30 | 93.48 | 94.98 | 93.81 | 94.49 |
| | 85.05 | 92.42 | 92.47 | 94.42 | 94.03 | 93.89 |
| | 89.16 | 91.24 | 91.36 | 90.82 | 93.73 | 93.88 |
| | 78.66 | 84.70 | 84.78 | 75.67 | 83.69 | 87.34 |
| Average | 87.78 | 90.98 | 91.06 | 91.47 | 92.98 | 93.31 |
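
The final row of the TesData700 table is consistent with a per-column arithmetic mean over the 13 preceding rows. A quick check for the Orphelia column, with values copied from the table (the variable names are illustrative):

```python
from statistics import mean

# Orphelia accuracies (%) for the 13 per-species rows of the TesData700 table.
orphelia_tesdata700 = [
    89.49, 84.73, 86.13, 94.16, 88.19, 91.56, 86.03,
    83.59, 91.50, 92.88, 85.05, 89.16, 78.66,
]

average = round(mean(orphelia_tesdata700), 2)
print(average)  # 87.78, matching the last row of the table
```
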
The overall accuracy (%) of Orphelia, FragGeneScan, MGC, MetaGUN, Meta-MFSVM, and Meta-MFDL on the TesData120 independent testing dataset.
| Species | Orphelia | Meta-MFSVM | MGC | FragGeneScan | MetaGUN | Meta-MFDL |
|---|---|---|---|---|---|---|
| | 84.05 | 85.22 | 85.67 | 85.92 | 86.49 | 87.38 |
| | 82.43 | 82.79 | 82.96 | 82.94 | 83.12 | 83.03 |
| | 82.11 | 84.25 | 84.69 | 84.80 | 85.23 | 85.59 |
| | 85.91 | 87.50 | 88.21 | 88.96 | 89.96 | 90.06 |
| | 84.28 | 84.73 | 84.93 | 85.10 | 85.37 | 85.60 |
| | 88.13 | 88.36 | 88.46 | 88.51 | 88.60 | 87.85 |
| | 80.95 | 83.19 | 84.64 | 85.96 | 87.38 | 91.04 |
| | 79.17 | 79.67 | 80.02 | 80.16 | 80.52 | 79.22 |
| | 85.98 | 86.31 | 86.56 | 86.70 | 86.89 | 86.91 |
| | 88.09 | 89.37 | 90.66 | 91.42 | 92.51 | 92.81 |
| | 83.71 | 84.28 | 84.57 | 84.84 | 85.15 | 84.67 |
| | 88.26 | 88.76 | 88.98 | 89.11 | 89.71 | 88.73 |
| | 74.75 | 76.50 | 77.72 | 78.51 | 79.23 | 79.55 |
| Average | 83.68 | 84.69 | 85.24 | 85.61 | 86.17 | 86.34 |
The effect of the random sampling strategy on the Meta-MFDL predictor in 10 CV test.
| Sampling times | TPR (%) | PPV (%) | |
|---|---|---|---|
| (1) | 91.27 ± 1.00 | 91.79 ± 1.10 | 0.915 ± 0.009 |
| (2) | 91.38 ± 0.92 | 91.38 ± 0.92 | 0.916 ± 0.007 |
| (3) | 91.64 ± 0.66 | 91.74 ± 0.88 | 0.916 ± 0.007 |
| (4) | 92.09 ± 0.95 | 91.73 ± 0.92 | 0.919 ± 0.008 |
| (5) | 92.16 ± 0.77 | 91.93 ± 0.51 | 0.920 ± 0.006 |
| (6) | 92.22 ± 0.63 | 91.98 ± 0.53 | 0.922 ± 0.004 |
| (7) | 92.14 ± 0.82 | 92.57 ± 0.91 | 0.924 ± 0.006 |
| (8) | 92.40 ± 1.26 | 92.64 ± 0.87 | 0.925 ± 0.007 |
| (9) | 91.92 ± 0.67 | 93.28 ± 1.50 | 0.926 ± 0.087 |
| (10) | 91.47 ± 1.37 | 93.26 ± 1.97 | 0.923 ± 0.008 |
| Average | 91.87 ± 0.90 | 92.23 ± 1.01 | 0.921 ± 0.015 |
Results of Meta-MFDL and four individual feature deep learning classifiers on Set700 dataset in 10 CV test.
| Features | TPR (%) | PPV (%) | |
|---|---|---|---|
| ORFC-DL | 87.64 ± 1.24 | 92.28 ± 1.35 | 0.899 ± 0.003 |
| MC-DL | 87.97 ± 0.75 | 92.37 ± 1.05 | 0.899 ± 0.003 |
| MA-DL | 88.53 ± 0.49 | 92.68 ± 0.86 | 0.905 ± 0.004 |
| ZC-DL | 88.95 ± 0.36 | 93.85 ± 0.75 | 0.913 ± 0.002 |
| Meta-MFDL | 91.47 ± 1.37 | 93.26 ± 1.97 | 0.923 ± 0.008 |