Rahul Nikam1, M Michael Gromiha1,2. 1. Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Tamil Nadu, Chennai 600036, India. 2. Advanced Computational Drug Discovery Unit (ACDD), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8501, Kanagawa, Japan.
Abstract
MOTIVATION: Machine learning techniques require various descriptors from protein and nucleic acid sequences to understand/predict their structure and function as well as distinguishing between disease and neutral mutations. Hence, availability of a feature extraction tool is necessary to bridge the gap. RESULTS: We developed a comprehensive web-based tool, Seq2Feature, which computes 252 protein and 41 DNA sequence-based descriptors. These features include physicochemical, energetic and conformational properties of proteins, mutation matrices and contact potentials as well as nucleotide composition, physicochemical and conformational properties of DNA. We propose that Seq2Feature could serve as an effective tool for extracting protein and DNA sequence-based features as applicable inputs to machine learning algorithms. AVAILABILITY AND IMPLEMENTATION: https://www.iitm.ac.in/bioinfo/SBFE/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Machine learning techniques require various descriptors from protein and nucleic acid sequences to understand/predict their structure and function as well as distinguishing between disease and neutral mutations. Hence, availability of a feature extraction tool is necessary to bridge the gap. RESULTS: We developed a comprehensive web-based tool, Seq2Feature, which computes 252 protein and 41 DNA sequence-based descriptors. These features include physicochemical, energetic and conformational properties of proteins, mutation matrices and contact potentials as well as nucleotide composition, physicochemical and conformational properties of DNA. We propose that Seq2Feature could serve as an effective tool for extracting protein and DNA sequence-based features as applicable inputs to machine learning algorithms. AVAILABILITY AND IMPLEMENTATION: https://www.iitm.ac.in/bioinfo/SBFE/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Robson P Bonidia; Douglas S Domingues; Danilo S Sanches; André C P L F de Carvalho Journal: Brief Bioinform Date: 2022-01-17 Impact factor: 11.622