Statistical analysis of over-represented words in human promoter sequences.

Leonardo Mariño-Ramírez, John L Spouge, Gavin C Kanga, David Landsman.

Abstract

The identification and characterization of regulatory sequence elements in the proximal promoter region of a gene can be facilitated by knowing the precise location of the transcriptional start site (TSS). Using known TSSs from over 5700 different human full-length cDNAs, this study extracted a set of 4737 distinct putative promoter regions (PPRs) from the human genome. Each PPR consisted of nucleotides from -2000 to +1000 bp, relative to the corresponding TSS. Since many regulatory regions contain short, highly conserved strings of fewer than 10 nucleotides, we counted eight-letter words within the PPRs, using z-scores and other related statistics to evaluate their over- and under-representation. Several over-represented eight-letter words have known biological functions described in the eukaryotic transcription factor database TRANSFAC; however, many do not. Besides calculating a P-value with the standard normal approximation associated with z-scores, we used two extra statistical controls to evaluate the significance of over-represented words. These controls have important implications for evaluating over- and under-represented words with z-scores.
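The word-counting and z-score step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the background model, so the sketch assumes independent nucleotides with frequencies estimated from the input and a binomial variance, which ignores the self-overlap of words. The function names (`count_words`, `base_freqs`, `z_score`, `upper_tail_p`) are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def count_words(seqs, k=8):
    """Count every overlapping k-letter word made only of A, C, G, T."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[w] += 1
    return counts


def base_freqs(seqs):
    """Estimate single-nucleotide frequencies from the sequences."""
    c = Counter(ch for s in seqs for ch in s.upper() if ch in "ACGT")
    total = sum(c.values())
    return {b: c[b] / total for b in "ACGT"}


def z_score(word, observed, n_positions, freqs):
    """z = (observed - expected) / sd under an ASSUMED i.i.d. background.

    n_positions is the number of windows examined; the binomial variance
    used here is a simplification that ignores overlapping occurrences.
    """
    p = 1.0
    for ch in word:
        p *= freqs[ch]
    expected = n_positions * p
    sd = sqrt(n_positions * p * (1.0 - p))
    return (observed - expected) / sd


def upper_tail_p(z):
    """One-sided P-value from the standard normal approximation."""
    return 0.5 * erfc(z / sqrt(2))
```

A typical use would be to call `count_words` on the promoter sequences, take the total of all counts as `n_positions`, and rank words by `z_score`. Because the normal approximation can be poor for rare words, a P-value obtained this way is only a first pass, which is consistent with the abstract's use of additional statistical controls.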

Year:  2004        PMID: 14963262      PMCID: PMC373387          DOI: 10.1093/nar/gkh246

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971
