A high-order representation and classification method for transcription factor binding sites recognition in Escherichia coli.

Shiquan Sun, Xiongpan Zhang, Qinke Peng.

Abstract

BACKGROUND: Identifying transcription factor binding sites (TFBSs) plays an important role in understanding gene regulatory processes. The mechanism underlying the specific binding of transcription factors (TFs) is still poorly understood. Previous machine learning-based approaches to identifying TFBSs commonly map a known TFBS to a one-dimensional vector using its physicochemical properties. However, when the dimension-to-sample ratio (i.e., number of dimensions/number of samples) is large, concatenating different physicochemical properties into a one-dimensional vector not only is likely to lose some structural information but also poses significant challenges to recognition methods.
MATERIALS AND METHOD: In this paper, we introduce a purely geometric representation, the tensor (also called a multidimensional array), to represent TFs using their physicochemical properties. Alongside the multidimensional array representation, we also develop a tensor-based recognition method, the tensor partial least squares classifier (abbreviated as TPLSC). Intuitively, multidimensional arrays enable borrowing more information than one-dimensional arrays. The performance of each method is evaluated by average F-measure on 51 Escherichia coli TFs from the RegulonDB database.
RESULTS: In our first experiment, the results show that multiple nucleotide properties yield more predictive power than dinucleotide properties. In the second experiment, the results demonstrate that our method gains prediction power, an improvement of roughly 33% over the best result from existing methods.
CONCLUSION: The representation method for TFs is an important step in TFBS recognition. We illustrate the benefits of this representation on real data through a series of experiments. This method can provide further insight into the mechanism of TF binding and be of great use for metabolic engineering applications.
Copyright © 2016 Elsevier B.V. All rights reserved.
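
The abstract reports performance as an average F-measure over the 51 TFs but does not say which variant is used. The following is a minimal sketch, assuming the standard F1 definition (harmonic mean of precision and recall) and an unweighted mean across TFs. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts.

    Assumption: the standard F1 definition; the abstract does not state
    which F-measure variant the authors use.
    """
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(counts_per_tf):
    """Unweighted mean of per-TF F1 scores.

    counts_per_tf maps a TF name to a (tp, fp, fn) tuple.
    """
    scores = [f_measure(*counts) for counts in counts_per_tf.values()]
    return sum(scores) / len(scores)


# Toy counts for illustration only; these are NOT results from the paper.
toy_counts = {"TF_A": (8, 2, 2), "TF_B": (5, 5, 0)}
print(average_f_measure(toy_counts))
```

Averaging per-TF scores gives every TF equal weight regardless of how many known binding sites it has; pooling the counts before computing one score would instead favour TFs with many sites.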

Keywords:  Classification; Computational biology; Machine learning; Partial least squares; Tensor; Transcription factor binding sites

Year: 2016  PMID: 28363453  DOI: 10.1016/j.artmed.2016.11.004

Source DB: PubMed  Journal: Artif Intell Med  ISSN: 0933-3657  Impact factor: 5.326


  1 in total

1.  Higher-order partial least squares for predicting gene expression levels from chromatin states.

Authors:  Shiquan Sun; Xifang Sun; Yan Zheng
Journal:  BMC Bioinformatics       Date:  2018-04-11       Impact factor: 3.169

