Tsukasa Fukunaga, Wataru Iwasaki
Abstract
MOTIVATION: Phylogenetic profiling is a powerful computational method for revealing the functions of genes of unknown function. Although conventional similarity metrics used in phylogenetic profiling achieve high prediction accuracy, they are subject to two estimation biases: an evolutionary bias and a spurious correlation bias. Previous studies have reduced the evolutionary bias by taking a phylogenetic tree into account, but few have analyzed the spurious correlation bias.
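As background, one conventional similarity metric for phylogenetic profiles is the mutual information (MI) between two presence/absence vectors across genomes. The minimal sketch below is a textbook MI on binary profiles, not the method of this paper; `mutual_information` is a hypothetical helper. It treats every genome as an independent sample and ignores the phylogenetic tree, which is the setting in which the evolutionary bias mentioned above arises.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two binary presence/absence profiles.

    x[i] and y[i] are 1 if the gene family is present in genome i, else 0.
    Genomes are treated as independent samples (no phylogenetic correction).
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))       # counts of each (x_i, y_i) state combination
    px, py = Counter(x), Counter(y)  # marginal counts of each state
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Perfectly co-occurring profiles share information; independent ones do not.
print(mutual_information([1, 1, 0, 0], [1, 1, 0, 0]))  # ln 2, about 0.693
print(mutual_information([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```

Because MI scores each pair in isolation, a pair A–C can score high merely because A co-occurs with B and B with C; this is one plausible reading of how a spurious correlation can inflate a pairwise metric.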
Year: 2022 PMID: 35060594 PMCID: PMC8963296 DOI: 10.1093/bioinformatics/btac034
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A–C) Overall discrimination performance of each metric, measured by AUROC. The x-axis shows the threshold th that defines the positive dataset; the y-axis shows the AUROC score. Panels (A), (B) and (C) show results for the Archaea, Micrococcales and Fungi datasets, respectively. (D–F) Prediction performance of each metric for highly ranked orthologous group (OG) pairs (th = 0.7). The x-axis shows the M value; the y-axis shows the positive predictive value (PPV). Panels (D), (E) and (F) show results for the Archaea, Micrococcales and Fungi datasets, respectively. Yellow, blue, green and red denote SMI, EMI, SDI and EDI, respectively.
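Fig. 1 evaluates each metric in two ways: AUROC over all OG pairs, and PPV among the M highest-scoring pairs. The sketch below computes both for one metric. The caption does not say which reference score th is applied to, so `label_pairs` simply assumes, for illustration, that a pair is positive when some reference association score is at least th; all function names are hypothetical.

```python
def label_pairs(reference_scores, th):
    """Hypothetical labeling: a pair is positive if its reference score >= th."""
    return [r >= th for r in reference_scores]


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_at_m(scores, labels, m):
    """Fraction of positives among the m highest-scoring pairs."""
    if m <= 0:
        raise ValueError("m must be positive")
    top = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:m]
    if not top:
        raise ValueError("no pairs to rank")
    return sum(1 for _, lab in top if lab) / len(top)


metric = [0.9, 0.8, 0.3, 0.1]                         # one metric's score per OG pair
labels = label_pairs([0.99, 0.2, 0.75, 0.4], th=0.7)  # [True, False, True, False]
print(auroc(metric, labels))         # 0.75
print(ppv_at_m(metric, labels, 2))   # 0.5
```

The pairwise AUROC loop is quadratic, which is fine for illustration; a rank-based formula scales better to the many pairs in a real dataset.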
Fig. 2. (A–C) Overall discrimination performance of the integrated metrics, measured by AUROC. The x-axis shows the threshold th that defines the positive dataset; the y-axis shows the AUROC score. Panels (A), (B) and (C) show results for the Archaea, Micrococcales and Fungi datasets, respectively. (D–F) Prediction performance of the integrated metrics for highly ranked OG pairs (th = 0.7). The x-axis shows the M value; the y-axis shows the PPV. Panels (D), (E) and (F) show results for the Archaea, Micrococcales and Fungi datasets, respectively. Gray and black denote the best-performing single metric and the integrated metric, respectively.
Top five OG pairs detected by combining all three metrics, for each taxonomic dataset
| Taxonomy | Rank | OG1 | OG2 | Prediction score | STRING score |
|---|---|---|---|---|---|
| Archaea | 1 | COG0803 | COG1108 | 10.0 | 0.992 |
| Archaea | 2 | COG1203 | COG1688 | 14.7 | 0.996 |
| Archaea | 3 | COG1108 | COG1121 | 17.3 | 0.994 |
| Archaea | 4 | COG2998 | COG4662 | 21.3 | 0.999 |
| Archaea | 5 | COG1336 | COG1604 | 24.0 | 0.999 |
| Micrococcales | 1 | COG3181 | COG3333 | 1.7 | 0.989 |
| Micrococcales | 2 | COG1135 | COG2011 | 7.0 | 0.995 |
| Micrococcales | 3 | COG1464 | COG2011 | 10.7 | 0.996 |
| Micrococcales | 4 | COG1135 | COG1464 | 12.3 | 0.987 |
| Micrococcales | 5 | COG1484 | COG4584 | 12.7 | 0.986 |
| Fungi | 1 | COG0043 | COG0163 | 34.3 | 0.998 |
| Fungi | 2 | KOG4501 | NOG13474 | 143.3 | 0.0 |
| Fungi | 3 | COG5441 | COG5564 | 620.3 | 0.988 |
| Fungi | 4 | COG0843 | COG1290 | 682.3 | 0.969 |
| Fungi | 5 | COG2051 | KOG3504 | 774.7 | 0.0 |
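The table does not state how the three metrics are combined. Every prediction score in it is a multiple of 1/3 (for example, 14.7 ≈ 44/3 and 1.7 ≈ 5/3), and lower scores are ranked higher, which is consistent with, though not proof of, averaging each pair's rank under three metrics. The minimal sketch below assumes that reading; `average_ranks` and `mean_rank` are hypothetical helpers, not the authors' code.

```python
def average_ranks(scores):
    """1-based ranks where the highest score gets rank 1; ties share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mean_rank(metric_scores):
    """Average each pair's rank across metrics (assumed: lower = stronger prediction)."""
    per_metric = [average_ranks(s) for s in metric_scores]
    n_pairs = len(metric_scores[0])
    return [sum(r[p] for r in per_metric) / len(per_metric) for p in range(n_pairs)]


# Three hypothetical metrics scoring the same three OG pairs.
scores = [
    [0.9, 0.5, 0.1],
    [0.2, 0.8, 0.4],
    [0.7, 0.6, 0.6],
]
combined = mean_rank(scores)
print([round(c, 1) for c in combined])  # [1.7, 1.8, 2.5]
```

Averaging ranks rather than raw scores sidesteps the fact that different similarity metrics live on different numeric scales.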