Kate R Rosenbloom, Cricket A Sloan, Venkat S Malladi, Timothy R Dreszer, Katrina Learned, Vanessa M Kirkup, Matthew C Wong, Morgan Maddren, Ruihua Fang, Steven G Heitner, Brian T Lee, Galt P Barber, Rachel A Harte, Mark Diekhans, Jeffrey C Long, Steven P Wilder, Ann S Zweig, Donna Karolchik, Robert M Kuhn, David Haussler, W James Kent.
Abstract
The Encyclopedia of DNA Elements (ENCODE), http://encodeproject.org, has completed its fifth year of scientific collaboration to create a comprehensive catalog of functional elements in the human genome, and its third year of investigations in the mouse genome. Since the last report in this journal, the ENCODE human data repertoire has grown by 898 new experiments (totaling 2886), accompanied by a major integrative analysis. In the mouse genome, results from 404 new experiments became available this year, increasing the total collected during the course of the project to 583. The University of California, Santa Cruz, makes these data available on the public Genome Browser, http://genome.ucsc.edu, for visual browsing and data mining. Downloads of both raw and processed data files are supported. The ENCODE portal provides specialized tools and information about the ENCODE data sets.
Year: 2012 PMID: 23193274 PMCID: PMC3531152 DOI: 10.1093/nar/gks1172
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. ENCODE data displayed in the UCSC Genome Browser, together with annotations from the ENCODE Analysis Hub, in the region of the nucleoporin gene NUP133 demonstrate the power this diversity of data provides for visual interpretation. The GENCODE Basic gene set shows this gene having four protein-coding splice variants and three smaller non-coding transcripts nearby. The proteogenomics track shows support for many of the coding exons, with protein localized in the nucleus, but not in plasma membrane or mitochondria. The long polyA RNA signal shows strong peaks over the exons and low intron signal in the cytosol, with greater signal in the nucleus. This is expected because nuclear mRNAs are not all completely spliced. The Combined Genome Segmentation integrates signal from many histones and classifies regions into those with characteristics of promoters (red), enhancers (yellow), insulators (blue), transcribed regions (green) and repressed regions (gray). Below are signal tracks from four of the eight histone modifications used as input to the segmentation. The promoter and transcribed regions agree with the RNA evidence and, like the RNA evidence, show no evidence of transcription of the non-coding gene to the right of NUP133. Underneath the GM12878 histone signals is a track that overlays one of the histone signals, H3K27Ac, in seven different cell lines (with GM12878 shown in red). A peak in H3K27Ac appears at the enhancer but, as is often the case with enhancers, it appears to be relatively cell specific, in contrast to the larger peak near the promoter, where the black coloration indicates the peak is shared by many cell types. The DNase hypersensitivity and transcription factor tracks also provide evidence for both promoter and enhancer. Finally, the mappability track indicates regions where short reads are not uniquely mappable, so the data there are incomplete and harder to interpret. Although most of this region is mappable, there are many small regions throughout and one larger region on the right where mapping is problematic. Overall, the ENCODE data in this region show strong evidence that this is a nuclear-localized protein-coding gene with a promoter that is used in a wide variety of cell types, and is likely to be regulated by tissue-specific enhancers as well.
The full complement of ENCODE data sets summarized by cell type [types annotated as cancer are marked with asterisk (*)]
| Cell type | Tissue | Description | Data sets: TF/His | Data sets: RNA | Data sets: Other | Data sets: Total |
|---|---|---|---|---|---|---|
| Tier 1 initial | | | | | | |
| GM12878 | Blood | Lymphoblastoid | 137 | 27 | 49 | 213 |
| K562* | Blood | Leukemia | 247 | 45 | 80 | 372 |
| Tier 1 added in 2011 | | | | | | |
| H1-hESC | Embryonic stem | Embryonic stem | 96 | 14 | 23 | 133 |
| Tier 2 initial | | | | | | |
| HeLa-S3* | Uterine cervix | Cervical carcinoma | 93 | 14 | 30 | 137 |
| HepG2* | Liver | Liver carcinoma | 118 | 19 | 26 | 163 |
| HUVEC | Umbilical endothelium | Umbilical vein endothelial | 37 | 13 | 16 | 66 |
| Tier 2 added in 2011 | | | | | | |
| A549* | Lung | Lung carcinoma | 89 | 22 | 12 | 123 |
| CD14+ | Blood | Monocyte | 17 | 4 | 4 | 25 |
| IMR90 | Lung | Lung fibroblast | 11 | 16 | 10 | 37 |
| MCF-7* | Breast | Breast carcinoma | 50 | 15 | 32 | 97 |
| SK-N-SH* | Brain | Neuroblastoma | 36 | 16 | 7 | 59 |
| Tier 2 added in 2012 | | | | | | |
| CD20+ | Blood | B cell | 11 | 5 | 4 | 20 |
| H1-neuron | Neuron | H1ES-derived neuron | 5 | 3 | 1 | 9 |
| LHCN-M2 | Muscle | Myoblast | 7 | 2 | 4 | 13 |
| Human: totals | | | | | | |
| Tier 1 + Tier 2 ( | | | 954 | 215 | 298 | 1467 |
| Tier 3 (274) | | | 591 | 94 | 734 | 1419 |
| All (288) | | | 1545 | 309 | 1032 | 2886 |
| Mouse | | | | | | |
| All (81) | | | 381 | 102 | 100 | 583 |
Studies in the human genome focused on common cell types in designated ‘tiers’, with Tier 1 most intensively studied, followed by Tier 2. A total of 10 292 files referenced to the human (hg19/GRCh37) genome have been released. For mouse (mm9/NCBI37), the comparable number is 8952 files. Data are available for download from the UCSC download server; for access see http://encodeproject.org/ENCODE/downloads.html and http://encodeproject.org/ENCODE/downloadsMouse.html. File formats are described on the ENCODE Portal File Formats page.
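The totals in the cell-type table can be cross-checked with simple arithmetic. The sketch below is illustrative only: the counts are transcribed from the table, while the names `TIER_1_2`, `column_sums`, `row_is_consistent` and `check_table` are hypothetical helpers introduced here, not part of any ENCODE or UCSC tool.

```python
# Counts transcribed from the cell-type table: (TF/His, RNA, Other, Total).
# All names below are hypothetical helpers for this illustration.
TIER_1_2 = {
    "GM12878": (137, 27, 49, 213),
    "K562": (247, 45, 80, 372),
    "H1-hESC": (96, 14, 23, 133),
    "HeLa-S3": (93, 14, 30, 137),
    "HepG2": (118, 19, 26, 163),
    "HUVEC": (37, 13, 16, 66),
    "A549": (89, 22, 12, 123),
    "CD14+": (17, 4, 4, 25),
    "IMR90": (11, 16, 10, 37),
    "MCF-7": (50, 15, 32, 97),
    "SK-N-SH": (36, 16, 7, 59),
    "CD20+": (11, 5, 4, 20),
    "H1-neuron": (5, 3, 1, 9),
    "LHCN-M2": (7, 2, 4, 13),
}
TIER_1_2_TOTALS = (954, 215, 298, 1467)
TIER_3_TOTALS = (591, 94, 734, 1419)
ALL_HUMAN_TOTALS = (1545, 309, 1032, 2886)
MOUSE_TOTALS = (381, 102, 100, 583)


def column_sums(rows):
    """Sum each column across an iterable of equal-length rows."""
    return tuple(sum(column) for column in zip(*rows))


def row_is_consistent(row):
    """Check that the last entry equals the sum of the preceding entries."""
    *parts, total = row
    return sum(parts) == total


def check_table():
    """Return a list of labels whose totals do not add up (empty if all agree)."""
    problems = [name for name, row in TIER_1_2.items() if not row_is_consistent(row)]
    if column_sums(TIER_1_2.values()) != TIER_1_2_TOTALS:
        problems.append("Tier 1 + Tier 2 totals")
    combined = tuple(a + b for a, b in zip(TIER_1_2_TOTALS, TIER_3_TOTALS))
    if combined != ALL_HUMAN_TOTALS:
        problems.append("All human totals")
    if not row_is_consistent(MOUSE_TOTALS):
        problems.append("Mouse totals")
    return problems


if __name__ == "__main__":
    print(check_table() or "all totals consistent")
```

With the values as transcribed, each row total equals the sum of its three categories, and the Tier 1 + Tier 2 and Tier 3 rows add up to the 2886 human experiments quoted in the abstract.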
The full complement of ENCODE data sets summarized by assay
| Data type/assay | Experiments |
|---|---|
| Chromatin interactions | |
| 5C | 14 |
| ChIA-PET | 8 |
| DNA methylation | |
| Methyl array | 125 |
| Methyl RRBS | 95 |
| Methyl-seq | 20 |
| Histone modifications | |
| ChIP-seq | 315 |
| ChIP-seq (MOUSE) | 179 |
| Open chromatin | |
| DNase-DGF | 56 |
| DNase-DGF (MOUSE) | 22 |
| DNase-seq | 221 |
| DNase-seq (MOUSE) | 55 |
| FAIRE-seq | 39 |
| RNA profiling | |
| CAGE | 78 |
| Exon array | 158 |
| RNA-chip | 26 |
| RNA-PET | 31 |
| RNA-seq | 245 |
| RNA-seq (MOUSE) | 102 |
| Transcription factor binding sites | |
| ChIP-seq | 1229 |
| ChIP-seq (MOUSE) | 202 |
| Other | |
| DNA cleavage | 1 |
| DNA-PET | 6 |
| GENCODE genes | 7 |
| Genotype | 64 |
| Negative regulatory elements | 2 |
| Nucleosome positioning | 2 |
| Proteogenomics | 14 |
| Replication timing | 24 |
| Replication timing (MOUSE) | 18 |
| RNA binding proteins | 47 |
| Short read mappability | 13 |
| Short read mappability (MOUSE) | 5 |
Descriptive overviews along with methods and references are included in the description page that accompanies all data sets.
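As a rough consistency check, the mouse rows of the assay table can be tallied and compared with the mouse totals in the cell-type table (381 TF/His, 102 RNA, 100 Other, 583 overall). The sketch below is illustrative only. The assignment of each assay to a TF/His, RNA or Other category is an assumption made here, not a grouping stated in the source, and `MOUSE_ASSAYS` and `tally_by_category` are hypothetical names.

```python
from collections import defaultdict

# Mouse rows transcribed from the assay table: (assay label, category, experiments).
# The category column is an assumption made for this illustration.
MOUSE_ASSAYS = [
    ("Histone modifications: ChIP-seq", "TF/His", 179),
    ("Transcription factor binding sites: ChIP-seq", "TF/His", 202),
    ("RNA-seq", "RNA", 102),
    ("DNase-DGF", "Other", 22),
    ("DNase-seq", "Other", 55),
    ("Replication timing", "Other", 18),
    ("Short read mappability", "Other", 5),
]


def tally_by_category(assays):
    """Sum experiment counts per category."""
    totals = defaultdict(int)
    for _label, category, count in assays:
        totals[category] += count
    return dict(totals)


if __name__ == "__main__":
    per_category = tally_by_category(MOUSE_ASSAYS)
    print(per_category)
    print("total:", sum(per_category.values()))
```

Under this assumed grouping, the per-category sums match the mouse row of the cell-type table, and the overall sum matches the 583 mouse experiments reported in the abstract.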
All links mentioned in this publication are collected in this table
| ENCODE: |
| Genome Browser: |
| Five-level scoring metric filtering options: |
| UCSC download server (human): |
| UCSC download server (mouse): |
| ENCODE Analysis Hub: |
| ENCODE uniform processing pipeline: |
| GEO ENCODE summary page: |
| Integrative Analysis of ENCODE data: |
| ENCODE portal: Quality Metrics: |
| ENCODE portal: Software tools: |
| ENCODE portal: Analysis tools: |
| ENCODE portal: platform characterization: |
| ENCODE portal: publications: |
| ENCODE file formats: |
| Tutorial and training materials by OpenHelix: |
| Introductory tutorial: |
| OpenHelix QRC: |
| ENCODE announcement list: |
Figure 2. The ENCODE Analysis Hub at the EBI hosts over 2800 ENCODE data sets, organized in six tracks controlled via the track menu shown here.
Figure 3. All three screens of the Experiment Matrix for mouse are shown overlaid. The Data Summary screen lists experiments by data type and links to the two matrix screens that organize the data by assay and cell type. Clicking the appropriate table row or matrix cell launches a Track or File search tool (based on the Track/File selector control) that allows further refinement of the selection for browsing or download.