FEGS: a novel feature extraction model for protein sequences and its applications.

Zengchao Mu, Ting Yu, Xiaoping Liu, Hongyu Zheng, Leyi Wei, Juntao Liu.

Abstract

BACKGROUND: Feature extraction of protein sequences is widely used in various research areas related to protein analysis, such as protein similarity analysis and prediction of protein functions or interactions.
RESULTS: In this study, we introduce FEGS (Feature Extraction based on Graphical and Statistical features), a novel feature extraction model for protein sequences. FEGS combines a new graphical representation of protein sequences, based on the physicochemical properties of amino acids, with the statistical features of the sequences. By fusing the graphical and statistical features, FEGS transforms a protein sequence into a 578-dimensional numerical vector. When applied to phylogenetic analysis on five protein sequence data sets, FEGS performed notably better than all of the other methods compared.
CONCLUSION: FEGS is a carefully designed and practically powerful method for extracting features from protein sequences. The current version is developed to be user-friendly and is expected to play a crucial role in studies involving protein sequence analysis.
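
To make the idea of turning a sequence into a fixed-length numerical vector concrete, here is a minimal sketch of a statistical feature block. It is not the FEGS implementation: the abstract does not say which statistics are used or how the 578 dimensions are split, so this example assumes, purely for illustration, that the statistical features are the frequencies of the 20 standard amino acids and of the 400 adjacent residue pairs. The names `AMINO_ACIDS`, `DIPEPTIDES`, and `statistical_features` are hypothetical, and the graphical features are not covered.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues (AA, AC, ..., YY).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def statistical_features(seq):
    """Return a 420-dimensional vector: 20 residue frequencies
    followed by 400 adjacent-pair frequencies (illustrative only)."""
    seq = seq.upper()
    n = len(seq)

    # Frequency of each single residue; an empty sequence gives all zeros.
    single = [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

    # Count each adjacent pair; pairs containing non-standard symbols are skipped.
    pair_counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n - 1):
        pair = seq[i:i + 2]
        if pair in pair_counts:
            pair_counts[pair] += 1

    # Normalize by the number of adjacent positions (at least 1 to avoid division by zero).
    n_pairs = max(n - 1, 1)
    double = [pair_counts[p] / n_pairs for p in DIPEPTIDES]

    return single + double


vec = statistical_features("ACAC")
print(len(vec))  # 420
```

Because every sequence maps to a vector of the same length, sequences of different lengths can be compared directly, for example with a Euclidean or cosine distance, without first aligning them.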

Keywords:  Feature extraction; Graphical representation; Physicochemical properties of amino acids; Protein similarity analysis; Statistical features

Year:  2021        PMID: 34078264     DOI: 10.1186/s12859-021-04223-3

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


