BACKGROUND: Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data is a valuable resource, it is both overwhelmingly complex and incomplete, since it covers only a small fraction of all human transcription factors.

RESULTS: As part of the consortium effort to provide a concise abstraction of the data that facilitates various types of downstream analyses, we used machine-learning methods to construct statistical models that capture the genomic features of three paired types of regions: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found that a significant fraction of transcription factor binding occurs without clear sequence motifs and showed that this observation could be related to the strong DNA accessibility of these regions.

CONCLUSIONS: Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine-learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.