Jayadev Joshi (1), Daniel Blankenberg (2,3).
Abstract
BACKGROUND: Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to the lengthy and expensive biochemical experimental methods traditionally used in peptide research, saving time and effort. However, for many researchers, a lack of expertise with programming libraries, limited access to computational resources, and the absence of flexible pipelines are major hurdles to adopting these advanced methods.
Year: 2022 PMID: 35643441 PMCID: PMC9148462 DOI: 10.1186/s12859-022-04727-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Extending peptide library analysis with the PDAUG toolset inside Galaxy. A Tools are created with Python libraries. B Implementing Galaxy tool wrappers and tests for each tool. C PDAUG toolset with 24 individual tools. D Implementing reusable workflows using PDAUG
Description of PDAUG tools. The PDAUG toolset comprises 24 different tools across 9 functional categories
| Functionality | Tool name | Major libraries used |
|---|---|---|
| Data visualization and plotting | PDAUG basic plots | matplotlib*, pandas*, seaborn*, quantiprot |
| | PDAUG fishers plot | |
| | PDAUG peptide data plotting | |
| | PDAUG peptide Ngrams | |
| | PDAUG sequence network | |
| | PDAUG peptide length distribution | |
| | PDAUG uversky plot | |
| Descriptor calculation | PDAUG AA property based peptide descriptor | modlAMP, pandas, pydpi |
| | PDAUG peptide core descriptors | |
| | PDAUG peptide global descriptors | |
| | PDAUG sequence property based descriptors | |
| | PDAUG word vector descriptor | |
| Peptide library generation | PDAUG AA property based peptide generation | modlAMP, pandas |
| | PDAUG sequence based peptide generation | |
| ML | PDAUG ML models | sklearn*, matplotlib, seaborn, pandas, gensim, nltk |
| | PDAUG word vector model | |
| Circular dichroism (CD) data analysis | PDAUG peptide CD spectral analysis | modlAMP, pandas |
| Peptide 3D structure | PDAUG peptide structure builder | fragbuilder, pandas |
| Core functionality | PDAUG peptide sequence analysis | modlAMP, pandas |
| | PDAUG peptide core functions | |
| Peptide data access | PDAUG peptide data access | modlAMP, biopython, pandas |
| Data handling and IO | PDAUG TSVtoFASTA | pandas |
| | PDAUG merge dataframes | |
| | PDAUG AddClassLabel | |
Libraries utilized for functionally important tasks are listed for each tool
*Python libraries used in data science
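The descriptor tools above wrap libraries such as modlAMP and pydpi. As a rough illustration of what "global descriptors" means in this context, here is a minimal pure-Python sketch computing length, net charge, and hydrophobic ratio for a peptide; the charge rule (K/R/H positive, D/E negative) and the hydrophobic residue set are simplifying assumptions, not PDAUG's exact implementation.

```python
# Illustrative sketch only, not PDAUG code: simple global descriptors
# of the kind computed by the descriptor-calculation tools.

POSITIVE = set("KRH")          # assumed positively charged at neutral pH
NEGATIVE = set("DE")           # assumed negatively charged
HYDROPHOBIC = set("AILMFWVY")  # an assumed hydrophobic residue set

def global_descriptors(seq: str) -> dict:
    """Return a few simple global descriptors for one peptide sequence."""
    seq = seq.upper()
    n = len(seq)
    charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hyd_ratio = sum(aa in HYDROPHOBIC for aa in seq) / n
    return {"length": n, "net_charge": charge, "hydrophobic_ratio": round(hyd_ratio, 3)}

print(global_descriptors("GIGKFLKKAKKFGKAFVKILKK"))  # a cationic, AMP-like toy sequence
```

In practice these quantities come from modlAMP's descriptor classes; the sketch only conveys the idea of mapping a sequence to a numeric feature vector.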
Fig. 2 Sequence length distributions for the anticancer and non-anticancer peptides. Mean lengths of anticancer and non-anticancer peptides are 40.06 and 32.25 AA, respectively, with less length variability among the anticancer peptides
Fig. 3 Sequence similarity network of the ACPs and non-ACPs. In comparison to the non-ACPs, the ACPs form two compact clusters, indicating relatively high sequence similarity. The non-ACPs, in contrast, form relatively scattered networks
Fig. 4 ACPs and non-ACPs datasets were compared and represented with a summary plot. A AA frequency distribution plot shows a significant difference in the frequency distribution of G, I, K, and L AA between ACPs and non-ACPs. B Global charge distribution shows a higher positive charge among the ACPs, while an overall higher negative charge occurs among the non-ACP sequences. C No significant differences are observed in the length distributions of ACPs and non-ACPs, except for a few outliers. D ACPs and non-ACPs show differences in global hydrophobicity. E A relatively smaller hydrophobic moment is observed in the non-ACPs in comparison to the ACPs. F A 3D scatter plot of global hydrophobicity, global hydrophobic moment, and global charge shows separation between ACPs and non-ACPs
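Summary comparisons like panels A and B reduce to per-residue frequencies and a net charge per sequence set. A minimal sketch of that computation, using made-up toy sequences and the same simplified charge rule (K/R/H positive, D/E negative) assumed above:

```python
# Toy illustration of the Fig. 4A-B style comparison; the sequences below
# are hypothetical examples, not from the actual ACP/non-ACP datasets.
from collections import Counter

def aa_frequencies(seqs):
    """Relative amino-acid frequencies across a set of sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in sorted(counts)}

def net_charge(seq):
    return sum(aa in "KRH" for aa in seq) - sum(aa in "DE" for aa in seq)

acps = ["KWKLFKKIGK", "FLPKKAKKV"]      # hypothetical "ACP-like" toy set
non_acps = ["DDESGLTEV", "AEGDDSSEE"]   # hypothetical "non-ACP" toy set

print(aa_frequencies(acps))
print([net_charge(s) for s in acps], [net_charge(s) for s in non_acps])
```

The cationic toy set comes out strongly positive and the acidic toy set negative, mirroring the trend the figure reports for the real datasets.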
Fig. 5 Feature space visualization of ACPs and non-ACPs. ACPs and non-ACPs are represented in a feature space defined by their mean hydropathy and AA volume. Sequences with larger hydrophobic AAs are more frequent among ACPs than non-ACPs
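One coordinate of this feature space, mean hydropathy, can be sketched directly; the version below uses the Kyte-Doolittle scale, which is an assumption for illustration (PDAUG/quantiprot may use a different scale or normalization):

```python
# Mean hydropathy per sequence using the Kyte-Doolittle scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)

print(round(mean_hydropathy("ILVF"), 3))   # strongly hydrophobic toy peptide
print(round(mean_hydropathy("DEKR"), 3))   # strongly charged toy peptide
```

Plotting each sequence's mean hydropathy against its mean AA volume yields the kind of 2D feature space shown in the figure.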
Fig. 6 Assessment of the ML algorithms trained on four descriptor sets. Performance measures (accuracy, precision, recall, F1 score, and mean AUC) were calculated for six different algorithms, with and without z-scaling normalization. Results suggest that models trained on the word vector descriptors outperform models trained on the other descriptors
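For reference, the metrics compared in this evaluation can be computed from confusion-matrix counts, and z-scaling is plain z-score normalization. A self-contained sketch with made-up counts (not the paper's results):

```python
# Illustrative only: metric definitions used in Fig. 6, with invented counts.
from statistics import mean, pstdev

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

def z_scale(values):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
print(z_scale([1.0, 2.0, 3.0]))
```

In the actual toolset these computations are handled by sklearn (metrics and `StandardScaler`); the sketch only spells out the definitions behind the plotted scores.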