Greglist: a database listing potential G-quadruplex regulated genes.

Ren Zhang, Yan Lin, Chun-Ting Zhang.

Abstract

The double helix is the conformation that genomic DNA usually assumes; under certain conditions, however, guanine-rich DNA sequences can form a four-stranded structure, the G-quadruplex, which has been found to play a role in regulating gene expression. Indeed, it has been demonstrated that the G-quadruplex formed in the c-MYC promoter suppresses its transcriptional activity. Recent studies suggest that G-quadruplex motifs (GQMs) are enriched in human gene promoters. To facilitate research on G-quadruplexes, we have constructed Greglist, a database listing potentially G-quadruplex regulated genes. Greglist harbors genes that contain promoter GQMs from the genomes of several species: human, mouse, rat and chicken. Many important genes are found to contain previously unreported promoter GQMs, such as ATM, BAD, AKT1, LEPR, UCP1, APOE, DKK1, WT1, WEE1, WNT1 and CLOCK. Furthermore, we find that, in addition to protein-coding genes, 126 human microRNAs also contain promoter GQMs. Greglist therefore provides candidates for further study of G-quadruplex functions and is freely available at http://tubic.tju.edu.cn/greglist.

Year:  2007        PMID: 17916572      PMCID: PMC2238908          DOI: 10.1093/nar/gkm787

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The double helix is the conformation that genomic DNA usually assumes; however, DNA can also form other, non-classical structures (1). For instance, under certain conditions, guanine-rich DNA sequences can form a special structure called the G-quadruplex. The discovery of the G-quadruplex can be traced back to G-quartets, planar arrays of four guanines held together by hydrogen bonds, which were found by Davies and coworkers (2) about five decades ago. Later, Sen and Gilbert (3) discovered the G-quadruplex, a four-stranded structure that is stabilized by G-quartets. As an example, readers may visit www.rcsb.org to view the three-dimensional (3D) structure of a G-quadruplex (PDB code: 1XAV) formed in the promoter region of the c-MYC gene (4).

Sequences with high potential to form G-quadruplexes have been found in many different genomic regions, suggesting diverse roles for G-quadruplexes (5–11). For instance, telomeric repeats in virtually all eukaryotes have the ability to form G-quadruplexes (10,11), offering protection for the telomere 3′ overhang (12,13), which is essential for cell survival.

Recent interest in G-quadruplexes has focused on their role in transcriptional regulation. By using electron microscopy, Maizels and coworkers (14) observed that the G-quadruplex structure is formed cotranscriptionally in vivo. Hurley and coworkers have demonstrated that the region upstream of the c-MYC promoter forms a G-quadruplex; removing it increases, whereas stabilizing it decreases, the basal transcriptional activity of this promoter, suggesting that promoter G-quadruplexes act as transcriptional repressor elements (15). Sequences containing G-quadruplex motifs (GQMs) in promoter regions have been reported for only about 10 genes, including c-MYC (15–17,20), VEGF (18), BCL-2 (19), c-KIT (21) and some others (22,23). Recent bioinformatics studies, however, showed that GQMs are prevalent in the human genome (24,25). Furthermore, GQMs were found to be highly enriched in human gene promoters, with more than 40% of promoters containing at least one GQM (26).

To facilitate the study of the role of promoter G-quadruplexes, we constructed Greglist, a database listing potential G-quadruplex REgulated Genes, i.e. genes that contain promoter GQMs. The database provides detailed information about the number, position and sequence of promoter GQMs from genes of several species. Many important genes are found to contain previously unreported promoter GQMs, such as ATM, BAD, AKT1, LEPR, UCP1, APOE, DKK1, WT1, WEE1, WNT1 and CLOCK. Furthermore, we found that, in addition to protein-coding genes, 126 human microRNAs also contain promoter GQMs. Greglist contains candidates for further study of G-quadruplex functions and is another device added to the existing online G-quadruplex toolbox.

DATABASE CONSTRUCTION AND DESCRIPTION

The current version of Greglist contains genes that have promoter GQMs in the genomes of human, mouse, rat and chicken. Table 1 provides descriptive statistics of the content of the database. We defined the 1 kb of sequence upstream of the transcription start site (TSS) as the promoter region. These sequences were downloaded from Ensembl using the software BioMart. The dataset used was Ensembl 45, and the human, mouse, rat and chicken genome sequences were based on versions NCBI36, NCBIM36, RGSC3.4 and WASHUC2, respectively. The software Quadparser (26) was used to find the promoter GQM, which is $G_{3+}N_{1-7}G_{3+}N_{1-7}G_{3+}N_{1-7}G_{3+}$, where N denotes any nucleotide and the subscripts give the allowed run lengths (four runs of at least three guanines, separated by loops of one to seven nucleotides). In addition, the G-quadruplex structure can be formed on either of the two DNA strands; therefore the motif $C_{3+}N_{1-7}C_{3+}N_{1-7}C_{3+}N_{1-7}C_{3+}$ was also used, which indicates the capability of G-quadruplex formation on the complementary strand.
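The motif search described above can be illustrated with a short Python sketch. It uses the standard `re` module rather than Quadparser itself, so it is an approximation under stated assumptions: the function name `find_gqms` is hypothetical, matches are reported greedily and without overlaps, and Quadparser may resolve overlapping candidates differently.

```python
import re

# Four runs of >=3 G (or C) separated by loops of 1-7 nucleotides,
# following the motif definition given in the text.
G_MOTIF = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
C_MOTIF = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def find_gqms(promoter):
    """Return sorted (start, end, strand, sequence) tuples for a promoter.

    A G-rich hit is labelled "+" (quadruplex on the given strand); a C-rich
    hit is labelled "-" (quadruplex on the complementary strand).
    Coordinates are 0-based offsets into the input string (illustrative only).
    """
    seq = promoter.upper()
    hits = []
    for pattern, strand in ((G_MOTIF, "+"), (C_MOTIF, "-")):
        for match in pattern.finditer(seq):  # non-overlapping, greedy matches
            hits.append((match.start(), match.end(), strand, match.group()))
    return sorted(hits)


if __name__ == "__main__":
    # G-rich test sequence resembling the c-MYC promoter element
    for hit in find_gqms("TGGGGAGGGTGGGGAGGGTGGGGAAGG"):
        print(hit)
```

Converting a hit offset into a distance to the TSS depends on how the 1-kb promoter sequence is oriented in the download, so that step is left out here.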
Table 1.

Descriptive statistics of genes in Greglist

| Species (Latin name) | Species | Genome version | Number of genes having promoter GQMs | Total gene number | Percentage of genes having promoter GQMs | Average GQMs a gene has | GQM density in promoter regions (GQMs/kb) | Average GQM length (mean ± SD) |
|---|---|---|---|---|---|---|---|---|
| Homo sapiens | Human | NCBI36 | 10 277 | 31 524 | 32.60% | 1.93 | 0.63 | 29.19 ± 13.57 |
| Mus musculus | Mouse | NCBIM36 | 8962 | 28 390 | 31.57% | 1.61 | 0.51 | 28.18 ± 12.57 |
| Rattus norvegicus | Rat | RGSC3.4 | 7013 | 27 302 | 25.69% | 1.43 | 0.37 | 26.39 ± 8.82 |
| Gallus gallus | Chicken | WASHUC2 | 5949 | 17 438 | 34.12% | 1.75 | 0.60 | 28.70 ± 14.44 |
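The columns of Table 1 can be cross-checked with a few lines of Python. The sketch below recomputes the percentage column from the gene counts and, under the assumption (not stated in the text) that GQM density is the total number of GQMs divided by the total promoter length of all genes at 1 kb each, recomputes the density column as well. The dictionary and function names are illustrative.

```python
# Values copied from Table 1: genes with promoter GQMs, total genes,
# and average GQMs per GQM-containing gene.
TABLE1 = {
    "Human":   {"with_gqm": 10277, "total": 31524, "avg_gqms": 1.93},
    "Mouse":   {"with_gqm": 8962,  "total": 28390, "avg_gqms": 1.61},
    "Rat":     {"with_gqm": 7013,  "total": 27302, "avg_gqms": 1.43},
    "Chicken": {"with_gqm": 5949,  "total": 17438, "avg_gqms": 1.75},
}


def percent_with_gqm(row):
    """Percentage of genes that carry at least one promoter GQM."""
    return 100.0 * row["with_gqm"] / row["total"]


def density_per_kb(row, promoter_kb=1.0):
    """GQMs per kb of promoter, averaged over ALL genes.

    Assumption: density = (avg GQMs per positive gene * positive genes)
    divided by (total genes * promoter length in kb).
    """
    total_gqms = row["avg_gqms"] * row["with_gqm"]
    return total_gqms / (row["total"] * promoter_kb)


if __name__ == "__main__":
    for species, row in TABLE1.items():
        print(species, round(percent_with_gqm(row), 2), round(density_per_kb(row), 2))
```

Under that assumption the recomputed values agree with the published percentage and density columns to within rounding, which suggests the density is averaged over every promoter rather than only over GQM-containing ones.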
So far, only about 10 genes have been reported to contain promoter GQMs. Greglist, however, lists many more genes that contain promoter GQMs. These include ATM, BAD, AKT1, LEPR, UCP1, APOE, DKK1, WT1, WEE1, WNT1, CLOCK, ATF1 and BMP2, which have critical functions in various cellular processes, such as apoptosis and transcriptional regulation. Table 2 lists a sample of 30 genes that contain promoter GQMs, together with the positions of the GQMs and the gene functions.
Table 2.

A list of 30 human genes that have not been previously reported to contain promoter G-quadruplex motifs

| No. | Abbreviation | Gene name | Ensembl ID | Function or associated disease | Reference | Number of GQM | Distance to TSS |
|---|---|---|---|---|---|---|---|
| 1 | WNT1 | Wingless-type MMTV integration site family, member 1 | ENSG00000125084 | The Wnt signaling pathway, CNS development | (30) | 1 | 193 |
| 2 | WNT5A | Wingless-type MMTV integration site family, member 5A | ENSG00000114251 | The Wnt signaling pathway, vertebrate development | (31) | 2 | 567, 936 |
| 3 | LEPR | Leptin receptor | ENSG00000116678 | Energy metabolism | (32) | 3 | 310, 372, 495 |
| 4 | UCP1 | Uncoupling protein 1 | ENSG00000109424 | Energy metabolism | (33) | 2 | 89, 224 |
| 5 | APOE | Apolipoprotein E | ENSG00000130203 | Alzheimer's disease | (34) | 4 | 46, 65, 407, 739 |
| 6 | ATM | Ataxia telangiectasia mutated | ENSG00000149311 | Ataxia telangiectasia | (35) | 1 | 59 |
| 7 | PAX8 | Paired box gene 8 | ENSG00000125618 | Permanent congenital hypothyroidism | (36) | 1 | 133 |
| 8 | SOX1 | SRY (sex determining region Y)-box 1 | ENSG00000203883 | Lens development | (37) | 3 | 80, 726, 826 |
| 9 | SOX10 | SRY (sex determining region Y)-box 10 | ENSG00000100146 | Waardenburg–Hirschsprung disease | (38) | 2 | 130, 313 |
| 10 | HDAC1 | Histone deacetylase 1 | ENSG00000116478 | Histone modification | (39) | 1 | 34 |
| 11 | TGFβ1 | Transforming growth factor, beta 1 | ENSG00000105329 | TGFβ signaling | (40) | 1 | 151 |
| 12 | SMAD2 | MAD homolog 2 | ENSG00000175387 | TGFβ signaling | (41) | 2 | 235, 450 |
| 13 | DKK1 | Dickkopf homolog 1 | ENSG00000107984 | TGFβ signaling | (42) | 1 | 136 |
| 14 | CLOCK | Clock homolog | ENSG00000134852 | Circadian rhythms | (43) | 3 | 147, 341, 692 |
| 15 | WEE1 | WEE1 homolog | ENSG00000166483 | Cell cycle control | (44) | 1 | 542 |
| 16 | BAD | BCL2-antagonist of cell death | ENSG00000002330 | Apoptosis | (45) | 3 | 116, 628, 756 |
| 17 | AKT1 | V-akt murine thymoma viral oncogene homolog 1 | ENSG00000142208 | Apoptosis | (46) | 1 | 61 |
| 18 | GATA4 | GATA-binding protein 4 | ENSG00000136574 | Heart development | (47) | 1 | 314 |
| 19 | MYOD1 | Myogenic differentiation 1 | ENSG00000129152 | Muscle development | (48) | 2 | 128, 216 |
| 20 | WT1 | Wilms tumor 1 | ENSG00000184937 | Kidney development | (49) | 2 | 168, 900 |
| 21 | GDF1 | Growth differentiation factor 1 | ENSG00000135414 | Left–right patterning | (50) | 4 | 78, 166, 327, 766 |
| 22 | BMP2 | Bone morphogenetic protein 2 | ENSG00000125845 | Bone development | (51) | 1 | 163 |
| 23 | MEF2D | MADS box transcription enhancer factor 2D | ENSG00000116604 | Heart development | (52) | 4 | 18, 85, 169, 232 |
| 24 | STAT6 | Signal transducer and activator of transcription 6 | ENSG00000166888 | Immunity | (53) | 1 | 505 |
| 25 | SOCS1 | Suppressor of cytokine signaling 1 | ENSG00000185338 | Immunity | (54) | 5 | 112, 211, 534, 578, 758 |
| 26 | MMP2 | Matrix metallopeptidase 2 | ENSG00000167346 | Function of extracellular matrix | (55) | 1 | 576 |
| 27 | MAPK2 | Mitogen-activated protein kinase 2 | ENSG00000162889 | MAP kinase pathway | (56) | 2 | 100, 137 |
| 28 | ATF1 | Activating transcription factor 1 | ENSG00000123268 | Transcriptional regulation | (57) | 1 | 36 |
| 29 | TAF2 | TAF2 RNA polymerase II | ENSG00000064313 | Transcriptional regulation | (58) | 1 | 296 |
| 30 | RING1 | Ring finger protein 1 | ENSG00000204227 | Transcriptional regulation | (59) | 4 | 501, 559, 677, 938 |
In addition, we found that, besides protein-coding genes, many microRNAs, such as hsa-mir-639 and hsa-mir-381, also contain promoter GQMs. In total, 126 human microRNAs were found to have promoter GQMs; the full list is given in Supplementary Table 1. MicroRNAs have emerged as important regulators of gene expression. The finding that promoter regions of microRNA genes contain GQMs calls for further studies to address the role of G-quadruplexes in microRNA regulation.

Of note, the presence of a GQM only suggests the potential of a sequence to form a G-quadruplex. In addition, the G-quadruplex is a dynamic structure that is formed upon denaturation of the DNA duplex. Therefore, caution must be taken when interpreting the data in Greglist. In other words, gene records in Greglist provide a starting point for further analysis of the potential G-quadruplex structure in these genes.

Furthermore, Huppert et al. (26) reported that more than 40% of human genes contain promoter GQMs, whereas in Greglist ∼32% of human genes do. This is likely because fewer than 20 000 known genes were used in Ref. (26), whereas the current study included more than 30 000 human genes, including those classified as novel and those that encode RNAs. Greglist is therefore designed to be inclusive rather than exclusive.

Gene names, Ensembl IDs, RefSeq IDs, numbers of GQMs, distances of the GQMs to the TSS, gene ontology functional descriptions, sequences containing the GQM and coding sequences of the genes were extracted from the Ensembl database and from Quadparser output files. All the data were then organized using the open-source database management system MySQL, which allows rapid data retrieval. All gene records are linked directly to the corresponding entries in Ensembl. Users can browse each entry or download all records.
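To make the data organization concrete, the following is a minimal sketch of how such records could be stored and retrieved. It is not the Greglist schema, which the text does not describe: the table and column names are hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a database server. The three sample records are taken from Table 2.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            ensembl_id TEXT PRIMARY KEY,
            symbol     TEXT NOT NULL,
            species    TEXT NOT NULL
        );
        CREATE TABLE gqm (
            ensembl_id      TEXT NOT NULL REFERENCES gene(ensembl_id),
            distance_to_tss INTEGER NOT NULL
        );
        CREATE INDEX idx_gene_symbol ON gene(symbol);
        """
    )
    # Sample rows from Table 2 (Ensembl ID, symbol, distances to TSS).
    sample = [
        ("ENSG00000125084", "WNT1", [193]),
        ("ENSG00000130203", "APOE", [46, 65, 407, 739]),
        ("ENSG00000149311", "ATM", [59]),
    ]
    for ensembl_id, symbol, distances in sample:
        conn.execute(
            "INSERT INTO gene VALUES (?, ?, ?)", (ensembl_id, symbol, "Human")
        )
        conn.executemany(
            "INSERT INTO gqm VALUES (?, ?)", [(ensembl_id, d) for d in distances]
        )
    conn.commit()
    return conn


def gqm_positions(conn, symbol):
    """Return the sorted distances to the TSS of all GQMs for a gene symbol."""
    rows = conn.execute(
        """
        SELECT q.distance_to_tss
        FROM gqm AS q JOIN gene AS g ON g.ensembl_id = q.ensembl_id
        WHERE g.symbol = ?
        ORDER BY q.distance_to_tss
        """,
        (symbol,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(gqm_positions(db, "APOE"))
```

Keeping one row per GQM, rather than a comma-separated list per gene, is a design choice of this sketch; it lets the number of GQMs per gene be derived with a simple `COUNT(*)`.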
Because of the large volume of data, a good search function is important for this database. In Greglist, users can perform a search by entering gene accession numbers or names on the homepage and then clicking ‘Go’. For more detailed searches, users can click ‘Search’; the new page provides additional search options. For instance, users can search by gene ontology terms to obtain a list of genes with the desired functions. To further facilitate finding a gene of interest, we installed the Blast program locally, so users can input the coding sequence of their gene of interest and perform Blast searches to find homologous genes.

Many online resources for G-quadruplexes are available. These include G4P calculator (14), QGRS Mapper (27) and Quadfinder (28), which are online programs or web servers for predicting G-quadruplexes. GRSDB (29) is a database of quadruplex-forming G-rich sequences in alternatively processed mammalian pre-mRNA sequences. Greglist is another device added to the existing online G-quadruplex toolbox.

We plan to include more species in future versions of Greglist. In addition, as more experimental data become available, we plan to integrate experimental evidence into the corresponding entries. Furthermore, although the GQM used in Quadparser is commonly used, other motifs also have the potential to form G-quadruplexes, and we plan to include these motifs in future versions of the database. We welcome users’ comments, corrections and new information, which will be used for updating. Greglist is freely available at http://tubic.tju.edu.cn/greglist and should be cited with the present publication as reference.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
REFERENCES (10 of 59 listed)

D E Gilbert; J Feigon. Multistranded DNA structures. Curr Opin Struct Biol, 1999-06.

E Labbé; C Silvestri; P A Hoodless; J L Wrana; L Attisano. Smad2 and Smad3 positively and negatively regulate TGF beta-dependent transcription through the forkhead DNA-binding protein FAST2. Mol Cell, 1998-07.

T Simonsson; P Pecinka; M Kubista. DNA tetraplex formation in the control region of c-myc. Nucleic Acids Res, 1998-03-01.

S R Datta; H Dudek; X Tao; S Masters; H Fu; Y Gotoh; M E Greenberg. Akt phosphorylation of BAD couples survival signals to the cell-intrinsic death machinery. Cell, 1997-10-17.

A Brehm; E A Miska; D J McCance; J L Reid; A J Bannister; T Kouzarides. Retinoblastoma protein recruits histone deacetylase to repress transcription. Nature, 1998-02-05.

G Giannelli; J Falk-Marzillier; O Schiraldi; W G Stetler-Stevenson; V Quaranta. Induction of cell migration by matrix metalloprotease-2 cleavage of laminin-5. Science, 1997-07-11.

D J Grainger; K Heathcote; M Chiano; H Snieder; P R Kemp; J C Metcalfe; N D Carter; T D Spector. Genetic control of the circulating concentration of transforming growth factor type beta1. Hum Mol Genet, 1999-01.

T P Yamaguchi; A Bradley; A P McMahon; S Jones. A Wnt5a pathway underlies outgrowth of multiple structures in the vertebrate embryo. Development, 1999-03.

S Nishiguchi; H Wood; H Kondoh; R Lovell-Badge; V Episkopou. Sox1 directly regulates the gamma-crystallin genes and is essential for lens development in mice. Genes Dev, 1998-03-15.

P E Macchia; P Lapi; H Krude; M T Pirro; C Missero; L Chiovato; A Souabni; M Baserga; V Tassi; A Pinchera; G Fenzi; A Grüters; M Busslinger; R Di Lauro. PAX8 mutations associated with congenital hypothyroidism caused by thyroid dysgenesis. Nat Genet, 1998-05.
