Roman Feldbauer1, Lukas Gosch1, Lukas Lüftinger1,2, Patrick Hyden1, Arthur Flexer3, Thomas Rattei1.
Abstract
MOTIVATION: Protein orthologous group databases are powerful tools for evolutionary analysis, functional annotation, or metabolic pathway modeling across lineages. Sequences are typically assigned to orthologous groups with alignment-based methods, such as profile hidden Markov models, which have become a computational bottleneck.
Year: 2020 PMID: 33367584 PMCID: PMC8016488 DOI: 10.1093/bioinformatics/btaa1051
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Main characteristics of orthologous groups databases
| | COG 2014 | eggNOG 5 (bacteria) |
|---|---|---|
| Number of proteins | 1 674 176 | 13 836 642 |
| Number of OGs | 4631 | 206 782 |
| OG population range | 1–10 632 | 1–97 670 |
| Sequence length range | 21–29 202 | 23–24 921 |
Fig. 1. DeepNOG network architecture
Fig. 2. Assignment accuracy for COG (minimum member threshold 500 and 100)
Fig. 3. Amino acid representations in the encoding layer of DeepNOG. Random initialization before training (left), and tuned representations after DeepNOG training on -100 (right). Amino acids are colored based on biochemical properties. Ambiguous codes that were not present in the dataset are excluded from the plots
Inference time (seconds/1000 sequences) for COG and eggNOG 5 (bacteria level)
| | COG-500 | COG-100 | | |
|---|---|---|---|---|
| DIAMOND | 161.7 | 214.5 | 781.6 | 810.0 |
| pHMMs | 96.3 | 207.0 | 218.9 | 253.7 |
| DeepFam | 49.0 | 50.2 | n/a | n/a |
| DeepFam light | 32.7 | 35.0 | 34.9 | 38.7 |
| DeepNOG (CPU) | | | | |
| pHMMs (parallel) | 4.8 | 5.1 | 9.5 | 14.4 |
| DeepNOG (GPU) | 0.6 | 0.6 | 0.6 | 0.6 |
Note: Fastest single-core method in bold. Averages over three replicates. Parallel pHMMs used 29×16 CPU cores.
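The inference times above are easier to compare as ratios. The following is a minimal sketch, not part of the original record, that computes relative speedups and throughput from the tabulated values. It covers only the COG-500 and COG-100 columns and only rows whose values appear in the table; the names `TIMES`, `speedup`, and `sequences_per_second` are illustrative assumptions.

```python
# Inference times in seconds per 1000 sequences, copied from the table above.
# Assumption: only the COG-500 and COG-100 columns are used, and only rows
# whose values are present in the table are included.
TIMES = {
    "DIAMOND": {"COG-500": 161.7, "COG-100": 214.5},
    "pHMMs": {"COG-500": 96.3, "COG-100": 207.0},
    "DeepFam": {"COG-500": 49.0, "COG-100": 50.2},
    "DeepFam light": {"COG-500": 32.7, "COG-100": 35.0},
    "pHMMs (parallel)": {"COG-500": 4.8, "COG-100": 5.1},
    "DeepNOG (GPU)": {"COG-500": 0.6, "COG-100": 0.6},
}


def speedup(baseline: str, method: str, dataset: str) -> float:
    """Return how many times faster `method` is than `baseline` on `dataset`."""
    return TIMES[baseline][dataset] / TIMES[method][dataset]


def sequences_per_second(method: str, dataset: str) -> float:
    """Convert seconds per 1000 sequences into sequences per second."""
    return 1000.0 / TIMES[method][dataset]


if __name__ == "__main__":
    reference = "DeepNOG (GPU)"
    for dataset in ("COG-500", "COG-100"):
        print(f"{dataset}: {sequences_per_second(reference, dataset):.0f} "
              f"sequences/s with {reference}")
        for baseline in TIMES:
            if baseline == reference:
                continue
            ratio = speedup(baseline, reference, dataset)
            print(f"  {reference} vs {baseline}: {ratio:.1f}x faster")
```

Note that such ratios compare different hardware budgets (single core, 29×16 cores, or a GPU), so they describe wall-clock time rather than computational cost.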
Fig. 4. Assignment accuracy for -100 (tax 1) and -100 (tax 2) and three clustering regimes. Note: * DeepFam not trainable on 24 G GPU for eggNOG datasets
Fig. 5. Assignment confidences (bin size 0.01)
Fig. 6. T6SS assignment overlap of different methods