
A large-scale comparative study on peptide encodings for biomedical classification.

Sebastian Spänig, Siba Mohsen, Georges Hattab, Anne-Christin Hauschild, Dominik Heider.

Abstract

Owing to the great variety of distinct peptide encodings, choosing one for a given biomedical classification task is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. Because a general guideline is lacking in the literature, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we included further sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397 700 encoded datasets. Our results demonstrate that none of the encodings is superior for all biomedical domains. Nevertheless, some encodings often outperform others, which substantially narrows the initial encoding selection. Our work enables researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for more sophisticated encoding optimization, for example, as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards.
© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
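
Illustrative note (not part of the published abstract): to make concrete what "encoding a peptide as numerical input" can mean, the sketch below computes the amino acid composition, a common sequence-based encoding chosen here purely for illustration. The abstract does not list which encodings were evaluated, and the function name `amino_acid_composition` is a hypothetical helper, not taken from the authors' workflow.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return the relative frequency of each standard amino acid.

    Hypothetical helper for illustration only. Non-standard characters
    count toward the sequence length but get no entry in the vector.
    """
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


# Encode a short example peptide as a fixed-length, 20-dimensional vector.
vector = amino_acid_composition("ACDA")
```

Whatever the peptide length, the result is a vector of 20 numbers, which is the kind of fixed-size numerical input a downstream classifier expects.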

Year:  2021        PMID: 34046590      PMCID: PMC8140742          DOI: 10.1093/nargab/lqab039

Source DB:  PubMed          Journal:  NAR Genom Bioinform        ISSN: 2631-9268


  5 in total

1.  MultiPep: a hierarchical deep learning approach for multi-label classification of peptide bioactivities.

Authors:  Alexander G B Grønning; Tim Kacprowski; Camilla Schéele
Journal:  Biol Methods Protoc       Date:  2021-11-23

2.  Chaos game representation and its applications in bioinformatics. (Review)

Authors:  Hannah Franziska Löchel; Dominik Heider
Journal:  Comput Struct Biotechnol J       Date:  2021-11-10       Impact factor: 7.271

3.  Multivalent binding kinetics resolved by fluorescence proximity sensing.

Authors:  Clemens Schulte; Alice Soldà; Sebastian Spänig; Nathan Adams; Ivana Bekić; Werner Streicher; Dominik Heider; Ralf Strasser; Hans Michael Maric
Journal:  Commun Biol       Date:  2022-10-07

4.  Vision for Improving Pregnancy Health: Innovation and the Future of Pregnancy Research. (Review)

Authors:  James M Roberts; Dominik Heider; Lina Bergman; Kent L Thornburg
Journal:  Reprod Sci       Date:  2022-05-09       Impact factor: 2.924

5.  Co-AMPpred for in silico-aided predictions of antimicrobial peptides by integrating composition-based features.

Authors:  Onkar Singh; Wen-Lian Hsu; Emily Chia-Yu Su
Journal:  BMC Bioinformatics       Date:  2021-07-30       Impact factor: 3.169

