Daryl J Thomas, Kate R Rosenbloom, Hiram Clawson, Angie S Hinrichs, Heather Trumbower, Brian J Raney, Donna Karolchik, Galt P Barber, Rachel A Harte, Jennifer Hillman-Jackson, Robert M Kuhn, Brooke L Rhead, Kayla E Smith, Archana Thakkapallayil, Ann S Zweig, David Haussler, W James Kent.
Abstract
The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.
Year: 2006 PMID: 17166863 PMCID: PMC1781110 DOI: 10.1093/nar/gkl1017
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Composite track control and display. (A) Controls for options that apply to all data in this track (top) with checkboxes to include or exclude individual sub-tracks as desired (bottom). (B) Example of a composite track display showing the IRF1 gene, repeats, Yale transcript maps and Yale transcriptionally active regions (6,7). The latter two are composite tracks, each containing multiple datasets. The Placenta RNA checkbox is deselected above, so that the data are not displayed in the image below.
Figure 2. Conservation display. (A) Conservation track at the base level shows details of a multiple sequence alignment, conservation scores and amino acid translations in coding regions. (‘.’: base is identical to human; ‘N’: missing sequence; ‘=’: sequence that does not align to reference is present in this species; orange numbers/lines: additional bases that are present in other species). (B) Conservation track zoomed out shows pairwise identity summary and conservation scores, highlighting non-coding elements in addition to exons.
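The symbol convention in the Figure 2 legend can be illustrated with a short sketch. This is not the browser's rendering code: the function name `dot_mask` and the example sequences are hypothetical, and the sketch assumes two pre-aligned rows of equal length in which 'N', '=' and '-' are passed through unchanged.

```python
def dot_mask(reference: str, aligned: str) -> str:
    """Render an aligned row relative to a reference row.

    Bases identical to the reference become '.', while 'N' (missing
    sequence), '=' (unalignable sequence present) and '-' (gap, an
    assumption of this sketch) are kept as they are. Any other base
    is shown explicitly because it differs from the reference.
    """
    if len(reference) != len(aligned):
        raise ValueError("rows must have the same aligned length")

    passthrough = {"N", "=", "-"}
    rendered = []
    for ref_base, base in zip(reference.upper(), aligned.upper()):
        if base in passthrough:
            rendered.append(base)      # keep special symbols untouched
        elif base == ref_base:
            rendered.append(".")       # identical to the reference
        else:
            rendered.append(base)      # substitution: show the base
    return "".join(rendered)


# Hypothetical human row and one other species row.
human = "ACGTAC"
other = "ACCTNC"
print(dot_mask(human, other))
```

With this convention a highly conserved region reads as a run of dots, so substitutions and missing sequence stand out visually.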
Figure 3. Track correlation in the Table Browser. Correlation of the Boston University •OH Radical Cleavage Intensity Database (ORChID) (15–17) is shown with the CpG Island (left) and with the GC Percent (right) tracks. Statistical summaries (upper panels), scatter and residual plots (middle panels) and histograms (lower panels) are shown.
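The kind of summary shown in Figure 3 can be approximated locally once two tracks have been downloaded and reduced to paired per-window values. The sketch below is an assumption about what such a comparison might compute (Pearson correlation, a least-squares line and its residuals); the source does not specify the Table Browser's own method, and the function name `correlate_tracks` and the sample values are hypothetical.

```python
from math import sqrt


def correlate_tracks(xs, ys):
    """Return (r, slope, intercept, residuals) for two paired score lists.

    r is the Pearson correlation coefficient; slope and intercept come
    from an ordinary least-squares fit of ys on xs; residuals are the
    observed ys minus the fitted values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant track has no defined correlation")

    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return r, slope, intercept, residuals


# Hypothetical per-window values for two tracks over the same windows.
gc_percent = [38.0, 42.5, 51.0, 60.5, 47.0]
cleavage = [0.82, 0.79, 0.71, 0.64, 0.75]
r, slope, intercept, residuals = correlate_tracks(gc_percent, cleavage)
print(f"r = {r:.3f}, slope = {slope:.4f}, intercept = {intercept:.3f}")
```

The residuals list corresponds to what a residual plot would display, and the two input lists can be binned separately to reproduce per-track histograms.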