Jhih-Hua Jhong1, Lantian Yao1,2, Yuxuan Pang1,2, Zhongyan Li1,3, Chia-Ru Chung4, Rulan Wang1,2, Shangfu Li1, Wenshuo Li1,2, Mengqi Luo1, Renfei Ma1, Yuqi Huang3, Xiaoning Zhu3, Jiahong Zhang3, Hexiang Feng3, Qifan Cheng3, Chunxuan Wang1, Kun Xi1, Li-Ching Wu5, Tzu-Hao Chang6, Jorng-Tzong Horng4, Lizhe Zhu1,3, Ying-Chih Chiang3, Zhuo Wang1, Tzong-Yi Lee1,3.
Abstract
The last 18 months, or more, have seen a profound shift in our global experience, with many of us navigating a once-in-100-year pandemic. To date, COVID-19 remains a life-threatening pandemic with little to no targeted therapeutic recourse. The discovery of novel antiviral agents, such as vaccines and drugs, can provide therapeutic solutions to save human beings from severe infections; however, there is no specifically effective antiviral treatment confirmed for now. Thus, great attention has been paid to the use of natural or artificial antimicrobial peptides (AMPs) as these compounds are widely regarded as promising solutions for the treatment of harmful microorganisms. Given the biological significance of AMPs, it was obvious that there was a significant need for a single platform for identifying and engaging with AMP data. This led to the creation of the dbAMP platform that provides comprehensive information about AMPs and facilitates their investigation and analysis. To date, the dbAMP has accumulated 26 447 AMPs and 2262 antimicrobial proteins from 3044 organisms using both database integration and manual curation of >4579 articles. In addition, dbAMP facilitates the evaluation of AMP structures using I-TASSER for automated protein structure prediction and structure-based functional annotation, providing predictive structure information for clinical drug development. Next-generation sequencing (NGS) and third-generation sequencing have been applied to generate large-scale sequencing reads from various environments, enabling greatly improved analysis of genome structure. In this update, we launch an efficient online tool that can effectively identify AMPs from genome/metagenome and proteome data of all species in a short period. In conclusion, these improvements promote the dbAMP as one of the most abundant and comprehensively annotated resources for AMPs. The updated dbAMP is now freely accessible at http://awi.cuhk.edu.cn/dbAMP.
Year: 2022 PMID: 34850155 PMCID: PMC8690246 DOI: 10.1093/nar/gkab1080
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparison between this update and existing AMP databases
| Features | dbAMP | LAMPv2 | DBAASPv3 | DRAMPv2 | dbAMP 2.0 |
|---|---|---|---|---|---|
| | January 2019 | March 2020 | November 2020 | May 2021 | June 2021 |
| | 12 389 | 23 253 | >15 700 | 22 151 | 26 447 AMPs and 2262 antimicrobial proteins |
| | 2048 | – | – | – | 3044 |
| | 1169 | – | >3600 (molecular dynamic trajectory >3200) | 283 | 3444 |
| | – | – | – | 263 | 458 |
| | 26 | 38 | – | 11 | 53 |
| | 1737 | – | 6560 | – | 5531 |
| | 6338 | – | – | – | 9454 |
| | Text extraction system | – | – | – | Enhanced NLP system |
| | Yes | Yes | Yes | Yes | Yes |
| | Yes | – | – | – | Yes |
| | | | | | |
| | Yes | – | – | – | Yes |
| | Yes | – | – | – | Yes |
| AMP prediction models based on multiple species | – | Prediction of general antimicrobial activity/against activity | – |
| AMP sequence alignment based on Bowtie2 | – | – | – |
Terms that could not be identified or are missing are recorded as ‘–’.
Figure 1. Highlighted improvements in dbAMP 2.0. dbAMP is the most comprehensive resource for AMPs, with this update bringing the totals for AMP sequences and curated articles to >28 000 and >4500, respectively.
Comparison of the data statistics from this update and the previous version in terms of functional activities
| Function classes | Against activity | dbAMP | dbAMP 2.0a (%) |
|---|---|---|---|
| Antibacterial | Antibacterial | 3006 | 4837 (18.29) |
| | Anti-Gram-positive | 2726 | 11 652 (44.06) |
| | Anti-Gram-negative | 2323 | 12 405 (46.91) |
| | Antimicrobial | 4816 | 8654 (32.72) |
| | Antibiofilm | 40 | 40 (0.14) |
| | Mollicute | – | 36 (0.14) |
| | Antiyeast | 4 | 5 (0.02) |
| | Antilisterial | – | 2 (0.01) |
| Antifungal | Antifungal | 1623 | 5454 (20.62) |
| Antiviral | Antiviral | 300 | 1745 (6.60) |
| | Anti-SARS/CoV | – | 186 (0.70) |
| Antiparasitic | Antiparasitic | 123 | 186 (0.70) |
| New function peptides | Mammalian cells | 308 | 402 (1.52) |
| | Anuran defense | – | 7256 (27.44) |
| | Insecticidal | 35 | 1791 (6.77) |
| | Antiprotozoal | 6 | 195 (0.74) |
| | Chemotactic | 59 | 61 (0.23) |
| | Antimalarial | 26 | 46 (0.17) |
| | Antinematode | – | 46 (0.17) |
| | Antiplasmodial | – | 35 (0.13) |
| | Cell penetrating | – | 29 (0.11) |
| | Enzyme inhibitor | 26 | 26 (0.10) |
| | Wound healing | 19 | 21 (0.07) |
| | Antibiotic | – | 19 (0.07) |
| | Immunomodulant | – | 17 (0.06) |
| | Spermicidal | 13 | 13 (0.05) |
| | Edema inducer | – | 11 (0.04) |
| Disease-associated peptides | Anticancer | 227 | 2290 (7.98) |
| | Anti-HIV | 109 | 2286 (8.64) |
| | Antitumor | 9 | 1018 (3.85) |
| | Anti-HCV | – | 67 (0.25) |
| | Antiangiogenesis | – | 13 (0.05) |
| | Anti-HSV | – | 10 (0.04) |
| | Antiallodynic | – | 1 (0.01) |
| New mechanism-associated peptides | Antihypertensive | – | 1 (0.01) |
| | Anti-MRSA | – | 874 (3.30) |
| | Antidiabetic | – | 113 (0.43) |
| | Antioxidant | 22 | 31 (0.12) |
| | Surface immobilized | 19 | 27 (0.10) |
| | Mast cell degranulating | – | 18 (0.07) |
| | Uterotonic | – | 6 (0.02) |
| | Anti-inflammatory | – | 6 (0.02) |
| | Antineurotensive | – | 4 (0.02) |
| | Plasma anticlotting | – | 3 (0.01) |
| | Proteolytic | – | 2 (0.01) |
| | Antinociceptive | – | 2 (0.01) |
| | Hypotensive | – | 1 (0.01) |
| | Sodium channel blocker | 2 | 2 (0.01) |
| Toxic | Cytotoxin | – | 1 (0.01) |
| | Hemolytica | – | 115 (0.43) |
| | Cytolytic | – | 98 (0.37) |
| | Ichthyotoxic | – | 14 (0.05) |
The numbers in parentheses give the proportion of entries in the dbAMP.
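As a quick check of how the parenthesized values relate to the counts, the sketch below recomputes a few of them. It assumes the denominator is the 26 447 AMPs reported in the abstract, which the table note does not state explicitly; the function name `percent_of_entries` is illustrative only.

```python
# Assumed denominator: the 26 447 AMPs reported in the abstract.
TOTAL_AMPS = 26_447


def percent_of_entries(count, total=TOTAL_AMPS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts taken from the dbAMP 2.0 column of the table above.
for activity, count in [("Antibacterial", 4837),
                        ("Anti-Gram-positive", 11652),
                        ("Antiviral", 1745)]:
    print(f"{activity}: {percent_of_entries(count):.2f}%")
```

Under that assumption, the three rows reproduce the tabulated 18.29, 44.06 and 6.60.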
Figure 2. The predicted structure viewer was integrated into the platform during this update. A case study describing the production of the AMP elafin (dbAMP_00487), the major antiviral protein in cervicovaginal lavage fluid, by human γδ T cells.
Figure 3. Main pipeline workflow for AMPfinder. AMPfinder is an efficient online tool that can accurately identify AMPs within genome/transcriptome and proteome data in a short period of time.
Figure 4. The schematic framework underlying AMP target prediction. First, the user-input sequences are transformed into one of four groups of peptide descriptors: amino acid composition (AAC), dipeptide composition (DPC), pseudo-amino acid composition (PAAC) and physicochemical properties (PHYC). These descriptors are then used as the feature vector for the two-stage classification process, which relies on GBDT and imbalanced learning. The evaluation produces a confidence value for each predicted AMP, ranging from 0 to 1 and interpreted as the potency level for targeting different microbes.
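To make the first two descriptor groups in the Figure 4 caption concrete, here is a minimal sketch of AAC and DPC feature extraction. The function names `aac` and `dpc`, the residue ordering, and the example sequence are illustrative assumptions and are not taken from dbAMP; PAAC, PHYC and the GBDT classifier itself are left out.

```python
from itertools import product

# The 20 standard amino acids; this ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each ordered residue pair (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    # Slide a window of width 2 along the sequence and count each pair.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical peptide used only for illustration (not a dbAMP entry).
peptide = "KKLLKKLLKK"
features = aac(peptide) + dpc(peptide)  # concatenated 420-dimensional vector
```

A vector like `features` is the kind of input a gradient-boosted classifier could consume. Sequences with non-standard residues would need extra handling, since their fractions would no longer sum to 1.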
Prediction performance for the test dataset for each of the tasks in two-stage AMP prediction
| Stage | Prediction task | SEN (%) | SPEC (%) | MeanACC (%) |
|---|---|---|---|---|
| First | AMP | 93.34 | 91.37 | 92.36 |
| Second | Anti-Gram-positive | 91.14 | 88.42 | 89.78 |
| | Anti-Gram-negative | 89.58 | 88.79 | 89.19 |
| | Antivirus | 88.87 | 81.05 | 84.96 |
| | Antifungal | 93.86 | 57.92 | 75.89 |
| | Anticancer | 84.33 | 81.24 | 82.79 |
| | Mammalian inhibition | 78.49 | 79.08 | 78.79 |
TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. The sensitivity (SEN), specificity (SPEC) and mean accuracy (MeanACC) are defined as follows: SEN = TP/(TP + FN); SPEC = TN/(TN + FP); and MeanACC = 0.5 × (SEN + SPEC). A confidence value of 0.5 is the default cutoff used to call a prediction positive or negative.
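The definitions above can be sketched in a few lines of code. The helper names and the toy confidence values below are made up for illustration, and treating a confidence exactly equal to the cutoff as positive is an assumption, since the source does not say how ties are handled.

```python
def confusion_counts(confidences, labels, cutoff=0.5):
    """Count TP, TN, FP, FN; confidence >= cutoff is called positive (assumed)."""
    tp = tn = fp = fn = 0
    for conf, is_positive in zip(confidences, labels):
        predicted_positive = conf >= cutoff
        if predicted_positive and is_positive:
            tp += 1
        elif predicted_positive and not is_positive:
            fp += 1
        elif not predicted_positive and is_positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def sen_spec_meanacc(tp, tn, fp, fn):
    """SEN = TP/(TP+FN), SPEC = TN/(TN+FP), MeanACC = 0.5*(SEN+SPEC)."""
    sen = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sen, spec, 0.5 * (sen + spec)


# Toy data, not taken from the dbAMP test set.
confidences = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
labels = [True, True, True, False, False, False]

tp, tn, fp, fn = confusion_counts(confidences, labels)
sen, spec, mean_acc = sen_spec_meanacc(tp, tn, fp, fn)
```

Because MeanACC averages SEN and SPEC rather than pooling all predictions, it is not dominated by the larger class, which fits the imbalanced-learning setting mentioned for the second stage.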
Figure 5. Summary of the properties of the peptides shown to target coronavirus and other viruses. (A) Length distribution of the anticoronavirus peptides (n = 187) and regular AVPs (n = 1664). To distinguish between AMPs and antimicrobial proteins, entries with sequence length >100 amino acids (n = 1 for anticoronavirus peptides and n = 57 for other regular AVPs) are not shown in the histogram. (B) Average amino acid composition. Amino acids are categorized according to their physicochemical properties. (C) Dimension reduction of the peptide sequences as extracted by TAPE, revealing the differences between these peptides; each point represents a peptide sequence with anticoronavirus (green) or regular antivirus (purple) activity.