
Nsite, NsiteH and NsiteM computer tools for studying transcription regulatory elements.

Ilham A Shahmuradov, Victor V Solovyev.

Abstract

Gene transcription is mostly conducted through interactions between various transcription factors and their binding sites on DNA (regulatory elements, REs). Today, we are still far from understanding the real regulatory content of promoter regions. Computer methods for the identification of REs remain a widely used tool for studying and understanding transcriptional regulation mechanisms. The Nsite, NsiteH and NsiteM programs search for statistically significant (non-random) motifs of known human, animal and plant one-box and composite REs in a single genomic sequence, in a pair of aligned homologous sequences and in a set of functionally related sequences, respectively.
AVAILABILITY AND IMPLEMENTATION: Pre-compiled executables built under commonly used operating systems are available for download by visiting http://www.molquest.kaust.edu.sa and http://www.softberry.com. CONTACT: solovictor@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.


Year:  2015        PMID: 26142184      PMCID: PMC4612222          DOI: 10.1093/bioinformatics/btv404

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Transcription regulatory elements (REs) bound by transcription factors (TFs) are the main players in gene expression (Grünberg and Hahn, 2013). Although a large set of experimentally identified REs/TFs has been collected in several databases (Ramirez and Basu, 2009; Solovyev et al., 2010), the real RE content of the promoters of most genes remains unidentified. Established computational algorithms for RE identification are predominantly based on one of two approaches: (i) searching for motifs of known REs or (ii) comparative analysis of homologous sequences aimed at discovering new REs (Ladunga, 2010; Solovyev et al., 2010). Methods of the first type use regulatory site sequences and/or IUPAC consensus sequences, or position-weight matrices. One of the challenges in RE detection is to estimate the statistical significance of the located motifs, so as to distinguish them from random matches. In addition, in some cases TFs bind a composite RE (a pair of DNA motifs separated by a spacer of variable length) rather than a single short DNA region. Here, we present Nsite, NsiteH and NsiteM, a set of programs that predict both single and composite REs in query sequences and estimate their statistical significance.
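A minimal sketch of the first approach, consensus-based scanning, is shown below for illustration only. It is not Nsite's code: the helper names are hypothetical, sites are matched as regular expressions built from IUPAC codes, a composite RE is modeled (by assumption) as two boxes joined by a spacer of any bases within a length range, and both strands are scanned with positions reported on the forward strand. Position-weight matrices and statistical estimation are not covered here.

```python
import re

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus: str) -> str:
    """Turn an IUPAC consensus such as 'CACGTG' into a regex fragment."""
    return "".join(f"[{IUPAC[c]}]" for c in consensus.upper())


def single_box(consensus: str) -> re.Pattern:
    # The lookahead lets finditer report overlapping sites as well.
    return re.compile(f"(?=({consensus_to_regex(consensus)}))")


def composite(box1: str, box2: str, min_spacer: int, max_spacer: int) -> re.Pattern:
    # Two boxes separated by min_spacer..max_spacer arbitrary bases.
    # Greedy matching reports only the longest feasible spacer per start position.
    spacer = f"[ACGT]{{{min_spacer},{max_spacer}}}"
    return re.compile(f"(?=({consensus_to_regex(box1)}{spacer}{consensus_to_regex(box2)}))")


def scan_both_strands(seq: str, pattern: re.Pattern) -> list[tuple[str, int, str]]:
    """Return (strand, 0-based start on the forward strand, site) for every match."""
    seq = seq.upper()
    revcomp = seq.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, s in (("+", seq), ("-", revcomp)):
        for m in pattern.finditer(s):
            site = m.group(1)
            # Map a reverse-strand start back to forward-strand coordinates.
            start = m.start() if strand == "+" else len(seq) - m.start() - len(site)
            hits.append((strand, start, site))
    return hits


print(scan_both_strands("AACACGTGAA", single_box("CACGTG")))
# → [('+', 2, 'CACGTG'), ('-', 2, 'CACGTG')]
print(scan_both_strands("GATATTCACGTG", composite("GATA", "CACGTG", 1, 3)))
# → [('+', 0, 'GATATTCACGTG')]
```

Because CACGTG is palindromic, it is reported once on each strand at the same position, which matters when hits are counted per strand.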

2 Results

Previously, we proposed a probabilistic model that computes the probability of observing given sequence motifs or consensuses in random nucleotide sequences with the same length and nucleotide frequencies as a query sequence. The model also estimates the expected number of such motifs in random sequences. In particular, the model assumes that, because REs are short, numbers rather than frequencies of nucleotides should be used to describe RE consensus sequences (Shakhmuradov et al., 1986; Solovyev et al., 2010; see also Supplementary Material S4). These statistical estimates make it possible to find non-random similarities (unlikely to have occurred by chance) between a set of functional motifs and regions of an analyzed sequence. Applying this approach, we developed the Nsite, NsiteH and NsiteM programs, which use various functional motif datasets. Using data from the three largest transcription RE databases, TRANSFAC (Wingender et al., 2001), ooTFD (Ghosh, 2000) and RegSite DB (http://linux1.softberry.com/berry.phtml?topic=regsite), we composed two animal RE datasets (ooTFD and TRANSFAC) and one plant RE dataset (RegSite), containing 8030, 3486 and 2871 single or composite REs, respectively. Other frequently cited and available sources of plant transcription REs, PLACE (http://www.dna.affrc.go.jp/PLACE/info.html) and PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/), include far fewer known REs or their consensuses (469 and 435 records, respectively). Users can select one of the three datasets or provide their own RE set, and can adjust the search parameters. The format of the RE datasets is presented in Supplementary Figure S1 (Supplementary Material S1). Nsite searches for statistically non-random motifs of known REs in a single DNA sequence.
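The exact formulas of the model are given in Supplementary Material S4 and are not reproduced here. To illustrate why nucleotide numbers rather than frequencies can matter for short motifs, the sketch below assumes one possible counts-based model, a random shuffle of the query's own nucleotides, and contrasts it with independent draws from the nucleotide frequencies. This is an assumption for illustration, not necessarily the published model; the function names are hypothetical, the query is assumed to contain only A/C/G/T, and enumerating concrete words is practical only for short, weakly degenerate consensuses.

```python
from collections import Counter
from itertools import product
from math import prod

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def falling(n: int, k: int) -> int:
    """Falling factorial n * (n-1) * ... * (n-k+1); 0 if k > n."""
    return prod(range(n - k + 1, n + 1)) if k <= n else 0


def window_probability(consensus: str, counts: Counter) -> float:
    """P(one fixed window matches the consensus) when the random sequence is a
    shuffle of the query's own nucleotides (uses counts, not frequencies)."""
    total = sum(counts[b] for b in "ACGT")
    denom = falling(total, len(consensus))
    if denom == 0:
        return 0.0
    favourable = 0
    # Sum over every concrete word the consensus allows; the words are disjoint events.
    for word in product(*(IUPAC[c] for c in consensus.upper())):
        used = Counter(word)
        favourable += prod(falling(counts[b], used[b]) for b in "ACGT")
    return favourable / denom


def expected_counts(consensus: str, seq: str) -> float:
    # Every window has the same marginal probability under shuffling, so by
    # linearity of expectation the expected number is windows * probability.
    seq = seq.upper()
    windows = max(len(seq) - len(consensus) + 1, 0)
    return windows * window_probability(consensus, Counter(seq))


def expected_frequencies(consensus: str, seq: str) -> float:
    """Same estimate with independent draws from the nucleotide frequencies."""
    seq = seq.upper()
    freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    p = prod(sum(freq[b] for b in IUPAC[c]) for c in consensus.upper())
    return max(len(seq) - len(consensus) + 1, 0) * p


query = "AACCGGTT"
for motif in ("AC", "AA"):
    print(motif, expected_counts(motif, query), expected_frequencies(motif, query))
```

For the 8-nt query AACCGGTT, the counts-based expectation is about 0.5 for AC but about 0.25 for AA, because drawing one A leaves fewer A's for the next position; the frequency-based estimate is 0.4375 for both.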
A predicted motif is considered statistically significant if (i) the expected (by chance) number of such motifs is less than a given threshold and (ii) the total number of identified motifs is greater than or equal to the upper limit of the 95% confidence interval. The search and statistical estimation are performed separately on both strands of a query sequence. NsiteH discovers RE motifs with a given conservation level in a pair of aligned orthologous (homologous) sequences. The sequences should be aligned beforehand, e.g. using the program SCAN2 (http://softberry.com/scan.html). To run NsiteH, three input files are required (the two query sequences and their alignment). In contrast to Nsite, this program identifies functional motifs that show a given level of similarity between the RE motifs in the two query sequences. NsiteM searches for statistically significant RE motifs observed in many homologous sequences; this shared occurrence serves as an additional criterion for selecting putative REs. Compared with Nsite, this program applies one additional search parameter: the minimal portion of query sequences that must contain the same RE motif. As input, it requires two or more sequences in FASTA format. The output of these programs is described in Supplementary Figures S2–S4 (Supplementary Material S1), and their algorithms are outlined in Supplementary Material S4. Testing Nsite, NsiteH and NsiteM on plant and animal sequences indicates that these programs can reliably identify known promoter REs. For example, by applying NsiteH to the promoter regions of the orthologous Cab-E and Lhcb1*5 genes encoding the chlorophyll a/b-binding protein in Nicotiana plumbaginifolia and Nicotiana sylvestris, we identified a set of evolutionarily conserved REs (Fig. 1).
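The text does not state how the 95% confidence interval is computed. Assuming, for illustration only, that the number of chance matches follows a Poisson distribution whose mean is the expected count, criteria (i) and (ii), together with NsiteM's minimal-portion parameter, can be sketched as follows. The threshold `max_expected`, the Poisson assumption and all function names are hypothetical.

```python
from math import exp


def poisson_upper_limit(mean: float, level: float = 0.95) -> int:
    """Upper limit of a central `level` interval for a Poisson(mean) count:
    the smallest k with P(X <= k) >= 1 - (1 - level) / 2."""
    if mean > 700:  # exp(-mean) would underflow to 0.0 and the loop never end
        raise ValueError("use a normal approximation for large means")
    target = 1.0 - (1.0 - level) / 2.0
    k, term = 0, exp(-mean)  # term holds P(X = k)
    cdf = term
    while cdf < target:
        k += 1
        term *= mean / k
        cdf += term
    return k


def is_significant(observed: int, expected: float, max_expected: float) -> bool:
    """Criterion (i): few chance matches expected; criterion (ii): observed count
    at or above the upper confidence limit (Poisson assumption)."""
    return expected < max_expected and observed >= poisson_upper_limit(expected)


def nsitem_keep(hits_per_sequence: list[int], min_fraction: float) -> bool:
    """NsiteM-style filter: a large enough portion of the query sequences
    must contain the motif."""
    if not hits_per_sequence:
        return False
    with_motif = sum(1 for n in hits_per_sequence if n > 0)
    return with_motif / len(hits_per_sequence) >= min_fraction


print(poisson_upper_limit(0.5))         # → 2
print(is_significant(2, 0.5, 1.0))      # → True
print(is_significant(1, 0.5, 1.0))      # → False
print(nsitem_keep([1, 0, 2, 1], 0.75))  # → True
```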
The predicted GT-1 binding sites (RSP00741 and RSP00742) and G-box (CG-1 binding site; RSP01160) are involved in the photoregulation of plant genes and are known to be functional in the Lhcb1*5 gene of N.plumbaginifolia (Schindler and Cashmore, 1990).
Fig. 1.

A set of relevant REs predicted by the NsiteH program in proximal promoter regions of orthologous genes Cab-E and Lhcb1*5 encoding chlorophyll a/b-binding protein in N.plumbaginifolia and N.sylvestris. RSP02030: ERSE-I, RSP01890: W-box. For the full list of predicted REs, see Supplementary Material S2.
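Fig. 1 was produced in the NsiteH mode, which, as described in the Results, keeps motifs showing a given conservation level in two pre-aligned sequences. The text does not define that level, so the sketch below assumes it is percent identity over the alignment columns spanned by a site found in the first sequence, with `-` as the gap symbol. The function names are hypothetical, and no statistical estimation is included.

```python
import re


def residue_columns(aligned: str) -> list[int]:
    """Alignment column of each residue (gaps '-' skipped)."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def conserved_sites(aln1: str, aln2: str, site_regex: str,
                    min_identity: float) -> list[tuple[int, str, str, float]]:
    """Sites found in sequence 1 whose aligned window in sequence 2 has at
    least `min_identity` identical, non-gap columns."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    cols = residue_columns(aln1)
    seq1 = aln1.replace("-", "").upper()
    result = []
    for m in re.finditer(f"(?=({site_regex}))", seq1):
        start, end = m.start(), m.start() + len(m.group(1))
        if end == start:  # ignore empty matches from permissive patterns
            continue
        # Map the ungapped site in sequence 1 onto alignment columns.
        first, last = cols[start], cols[end - 1]
        win1, win2 = aln1[first:last + 1].upper(), aln2[first:last + 1].upper()
        same = sum(1 for a, b in zip(win1, win2) if a == b and a != "-")
        identity = same / len(win1)
        if identity >= min_identity:
            result.append((start, win1, win2, identity))
    return result


print(conserved_sites("AACACGTGAA", "AACACGTCAA", "CACGTG", 0.8))
print(conserved_sites("AACACGTGAA", "AACACGTCAA", "CACGTG", 0.9))  # → []
```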


3 Conclusion

The Nsite, NsiteH and NsiteM tools for identifying REs in promoter sequences are widely used by researchers, are accessible through the Softberry and KAUST Bioinformatics web servers (www.softberry.com and www.molquest.kaust.edu.sa), and have been cited in ∼200 research articles (according to Google Scholar). Nsite identifies RE patterns in a single query sequence. However, short functional motifs are detected more reliably when sequence conservation in the promoters of homologous genes from different organisms is taken into account. NsiteH is designed for analysis of the promoters of orthologous genes. NsiteM detects REs involved in the coordinated regulation of expression of a group of genes. Our programs make it possible to search for statistically significant sequence motifs and composite elements. Other consensus-based search tools, such as SIGNAL SCAN (http://www.dna.affrc.go.jp/sigscan/signal.html), the PlantCARE Search Tool (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) and PatSearch (http://www.bio.net/bionet/mm/bionews/1996-October/003416.html), search only for single motifs and do not provide any statistical estimates. Several studies have experimentally confirmed the functionality of RE motifs predicted by Nsite (Delatorre et al., 2012; Linher-Melville and Singh, 2014; Wu et al., 2012; Zheng et al., 2010; Zografidis et al., 2014).

Conflict of Interest: none declared.
References (13 in total)

1.  Object-oriented transcription factors database (ooTFD).

Authors:  D Ghosh
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  The TRANSFAC system on gene expression regulation.

Authors:  E Wingender; X Chen; E Fricke; R Geffers; R Hehl; I Liebich; M Krull; V Matys; H Michael; R Ohnhäuser; M Prüss; F Schacherer; S Thiele; S Urbach
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

3.  An overview of the computational analyses and discovery of transcription factor binding sites.

Authors:  Istvan Ladunga
Journal:  Methods Mol Biol       Date:  2010

4.  Identification of promoter regions and regulatory sites.

Authors:  Victor V Solovyev; Ilham A Shahmuradov; Asaf A Salamov
Journal:  Methods Mol Biol       Date:  2010

5.  Functional identification and regulation of the PtDrl02 gene promoter from triploid white poplar.

Authors:  Huiquan Zheng; Shanzhi Lin; Qian Zhang; Yang Lei; Lu Hou; Zhiyi Zhang
Journal:  Plant Cell Rep       Date:  2010-02-24       Impact factor: 4.570

6.  [Enhancer-like structures in moderately repetitive sequences of eukaryotic genomes].

Authors:  I A Shakhmuradov; N A Kolchanov; V V Solov'ev; V A Ratner
Journal:  Genetika       Date:  1986-03

7.  Molecular characterization and identification of the E2/P4 response element in the porcine HOXA10 gene.

Authors:  Di Wu; Dechao Song; Xinyun Li; Mei Yu; Changchun Li; Shuhong Zhao
Journal:  Mol Cell Biochem       Date:  2012-11-18       Impact factor: 3.396

8.  The regulation of the SARK promoter activity by hormones and environmental signals.

Authors:  Carla A Delatorre; Yuval Cohen; Li Liu; Zvi Peleg; Eduardo Blumwald
Journal:  Plant Sci       Date:  2012-05-17       Impact factor: 4.729

9.  Transcriptional regulation and functional involvement of the Arabidopsis pescadillo ortholog AtPES in root development.

Authors:  Aris Zografidis; Giorgos Kapolas; Varvara Podia; Despoina Beri; Kalliope Papadopoulou; Dimitra Milioni; Kosmas Haralampidis
Journal:  Plant Sci       Date:  2014-08-29       Impact factor: 4.729

10.  Comparative analyses of plant transcription factor databases.

Authors:  Silvia R Ramirez; Chhandak Basu
Journal:  Curr Genomics       Date:  2009-03       Impact factor: 2.236
